Glue Iceberg REST API and PyIceberg
Access Glue Iceberg tables via the Iceberg REST API
AWS quietly released Iceberg REST API support for Glue. The Iceberg REST catalog API is a standard API for accessing Iceberg tables on different platforms. More information can be found at https://iceberg.apache.org/concepts/catalog/
PyIceberg is a Python library with generic Iceberg support, including the REST API. Other tools, such as PySpark, support it as well.
Example code that uses a Glue catalog via the Iceberg REST API:
from pyiceberg.catalog import load_catalog
import logging

# Set up logging to show debug messages
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

# Specifically for PyIceberg logging
logger = logging.getLogger('pyiceberg')
logger.setLevel(logging.DEBUG)


def main():
    rest_catalog = load_catalog(
        "ibtest1",
        **{
            "type": "rest",
            "uri": "https://glue.eu-central-1.amazonaws.com/iceberg",
            "rest.sigv4-enabled": "true",
            "rest.signing-name": "glue",
            "rest.signing-region": "eu-central-1"
        }
    )
    print(rest_catalog.list_namespaces())
    print(rest_catalog.list_tables("ibtest"))
    print(rest_catalog.load_table("ibtest.ibtest1").scan().to_pandas())


if __name__ == "__main__":
    main()
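The catalog properties above differ only by region, so they can be generated. The following is a minimal sketch, not part of the original example: the helper names `glue_rest_properties` and `open_glue_rest_catalog` are hypothetical, while the property keys and the URI pattern are copied from the code above.

```python
def glue_rest_properties(region: str) -> dict[str, str]:
    """Build PyIceberg REST catalog properties for the Glue Iceberg endpoint.

    Hypothetical helper; keys and URI pattern mirror the example above.
    """
    return {
        "type": "rest",
        "uri": f"https://glue.{region}.amazonaws.com/iceberg",
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "glue",
        "rest.signing-region": region,
    }


def open_glue_rest_catalog(name: str, region: str):
    """Load the catalog. Needs pyiceberg installed and AWS credentials in the environment."""
    # Imported lazily so that building the properties has no dependencies.
    from pyiceberg.catalog import load_catalog

    return load_catalog(name, **glue_rest_properties(region))


if __name__ == "__main__":
    # Only prints the generated properties; no network access.
    for key, value in glue_rest_properties("eu-central-1").items():
        print(f"{key} = {value}")
```

Calling `open_glue_rest_catalog("ibtest1", "eu-central-1")` should then behave like the `load_catalog` call in the example, assuming the same credentials are available.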
Glue Catalog version
For comparison, this is the native Glue catalog version in PyIceberg. It uses the boto3 API instead of the REST endpoint.
def main():
    glue_catalog = load_catalog("glue", **{"type": "glue"})

    print(glue_catalog.list_namespaces())
    print(glue_catalog.list_tables("ibtest"))
    print(glue_catalog.load_table("ibtest.ibtest1").scan().to_pandas())
Output
I’ll add the full debug output here to show that it only uses the REST API. There are no requests to S3. I ran it with AWS credentials that have Glue and S3 permissions, provided in the environment via AWS_PROFILE.
ListNameSpace
2024-12-22 15:13:41,061 - botocore.credentials - INFO - Found credentials in shared credentials file: ~/.aws/credentials
2024-12-22 15:13:41,062 - botocore.auth - DEBUG - Calculating signature using v4 auth.
2024-12-22 15:13:41,062 - botocore.auth - DEBUG - CanonicalRequest:
GET
/iceberg/v1/config

accept:*/*
accept-encoding:gzip, deflate
content-type:application/json
host:glue.eu-central-1.amazonaws.com
x-amz-date:20241222T141341Z
x-client-version:0.14.1
x-iceberg-access-delegation:vended-credentials

accept;accept-encoding;content-type;host;x-amz-date;x-client-version;x-iceberg-access-delegation
xxxxxxxx
2024-12-22 15:13:41,062 - botocore.auth - DEBUG - StringToSign:
AWS4-HMAC-SHA256
20241222T141341Z
20241222/eu-central-1/glue/aws4_request
xxxxxxx
2024-12-22 15:13:41,062 - botocore.auth - DEBUG - Signature:
xxxxxx
2024-12-22 15:13:41,062 - urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): glue.eu-central-1.amazonaws.com:443
2024-12-22 15:13:41,213 - urllib3.connectionpool - DEBUG - https://glue.eu-central-1.amazonaws.com:443 "GET /iceberg/v1/config HTTP/1.1" 200 327
2024-12-22 15:13:41,237 - botocore.credentials - INFO - Found credentials in shared credentials file: ~/.aws/credentials
2024-12-22 15:13:41,237 - botocore.auth - DEBUG - Calculating signature using v4 auth.
2024-12-22 15:13:41,237 - botocore.auth - DEBUG - CanonicalRequest:
GET
/iceberg/v1/catalogs/123456789012/namespaces

accept:*/*
accept-encoding:gzip, deflate
content-type:application/json
host:glue.eu-central-1.amazonaws.com
x-amz-date:20241222T141341Z
x-client-version:0.14.1
x-iceberg-access-delegation:vended-credentials

accept;accept-encoding;content-type;host;x-amz-date;x-client-version;x-iceberg-access-delegation
xxxxxx
2024-12-22 15:13:41,237 - botocore.auth - DEBUG - StringToSign:
AWS4-HMAC-SHA256
20241222T141341Z
20241222/eu-central-1/glue/aws4_request
xxxxxxx
2024-12-22 15:13:41,237 - botocore.auth - DEBUG - Signature:
xxxxxx
2024-12-22 15:13:41,237 - urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): glue.eu-central-1.amazonaws.com:443
2024-12-22 15:13:41,435 - urllib3.connectionpool - DEBUG - https://glue.eu-central-1.amazonaws.com:443 "GET /iceberg/v1/catalogs/123456789012/namespaces HTTP/1.1" 200 48
[('ibtest',), ('sourcedata_sales',)]
ListTable
2024-12-22 15:13:41,462 - botocore.credentials - INFO - Found credentials in shared credentials file: ~/.aws/credentials
2024-12-22 15:13:41,463 - botocore.auth - DEBUG - Calculating signature using v4 auth.
2024-12-22 15:13:41,463 - botocore.auth - DEBUG - CanonicalRequest:
GET
/iceberg/v1/catalogs/123456789012/namespaces/ibtest/tables

accept:*/*
accept-encoding:gzip, deflate
content-type:application/json
host:glue.eu-central-1.amazonaws.com
x-amz-date:20241222T141341Z
x-client-version:0.14.1
x-iceberg-access-delegation:vended-credentials

accept;accept-encoding;content-type;host;x-amz-date;x-client-version;x-iceberg-access-delegation
xxxxxx
2024-12-22 15:13:41,463 - botocore.auth - DEBUG - StringToSign:
AWS4-HMAC-SHA256
20241222T141341Z
20241222/eu-central-1/glue/aws4_request
xxxxxxx
2024-12-22 15:13:41,463 - botocore.auth - DEBUG - Signature:
xxxx
2024-12-22 15:13:41,541 - urllib3.connectionpool - DEBUG - https://glue.eu-central-1.amazonaws.com:443 "GET /iceberg/v1/catalogs/123456789012/namespaces/ibtest/tables HTTP/1.1" 200 59
[('ibtest', 'ibtest1')]
Scan Tables
2024-12-22 15:13:41,567 - botocore.credentials - INFO - Found credentials in shared credentials file: ~/.aws/credentials
2024-12-22 15:13:41,567 - botocore.auth - DEBUG - Calculating signature using v4 auth.
2024-12-22 15:13:41,567 - botocore.auth - DEBUG - CanonicalRequest:
GET
/iceberg/v1/catalogs/123456789012/namespaces/ibtest/tables/ibtest1

accept:*/*
accept-encoding:gzip, deflate
content-type:application/json
host:glue.eu-central-1.amazonaws.com
x-amz-date:20241222T141341Z
x-client-version:0.14.1
x-iceberg-access-delegation:vended-credentials

accept;accept-encoding;content-type;host;x-amz-date;x-client-version;x-iceberg-access-delegation
xxxxxx
2024-12-22 15:13:41,567 - botocore.auth - DEBUG - StringToSign:
AWS4-HMAC-SHA256
20241222T141341Z
20241222/eu-central-1/glue/aws4_request
xxxxx
2024-12-22 15:13:41,567 - botocore.auth - DEBUG - Signature:
xxxxxx
2024-12-22 15:13:41,712 - urllib3.connectionpool - DEBUG - https://glue.eu-central-1.amazonaws.com:443 "GET /iceberg/v1/catalogs/123456789012/namespaces/ibtest/tables/ibtest1 HTTP/1.1" 200 2123
    id  name                 created
0  001  test 2024-12-22 13:48:31.381
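The CanonicalRequest, StringToSign and Signature entries in the debug output are the three steps of SigV4 signing. In the runs above botocore does this work. The sketch below re-implements the steps with the standard library only, to make the log easier to read. It is an illustration under assumptions: the function and type names are hypothetical, the path and query string are assumed to be already URI-encoded, and the secret key in the usage part is a dummy value.

```python
import hashlib
import hmac
from typing import NamedTuple


class SigV4Result(NamedTuple):
    canonical_request: str
    string_to_sign: str
    signature: str


def _hmac_sha256(key: bytes, message: str) -> bytes:
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def sigv4_sign(secret_key: str, amz_date: str, region: str, service: str,
               method: str, path: str, query: str,
               headers: dict[str, str], payload: bytes = b"") -> SigV4Result:
    """Compute a SigV4 signature (illustrative sketch, not a full client)."""
    # Step 1: canonical request. Header names are lowercased and sorted,
    # values are trimmed and inner whitespace is collapsed.
    normalized = sorted((k.lower(), " ".join(v.split())) for k, v in headers.items())
    canonical_headers = "".join(f"{name}:{value}\n" for name, value in normalized)
    signed_headers = ";".join(name for name, _ in normalized)
    canonical_request = "\n".join([
        method,
        path,
        query,
        canonical_headers,
        signed_headers,
        hashlib.sha256(payload).hexdigest(),
    ])

    # Step 2: string to sign, scoped to date/region/service.
    date = amz_date[:8]
    scope = f"{date}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])

    # Step 3: derive the signing key and sign.
    key = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date)
    key = _hmac_sha256(key, region)
    key = _hmac_sha256(key, service)
    key = _hmac_sha256(key, "aws4_request")
    signature = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
    return SigV4Result(canonical_request, string_to_sign, signature)


if __name__ == "__main__":
    result = sigv4_sign(
        secret_key="dummy-secret",  # placeholder, not a real credential
        amz_date="20241222T141341Z",
        region="eu-central-1",
        service="glue",
        method="GET",
        path="/iceberg/v1/config",
        query="",
        headers={
            "Host": "glue.eu-central-1.amazonaws.com",
            "X-Amz-Date": "20241222T141341Z",
        },
    )
    print(result.string_to_sign)
```

The first three lines of the printed string to sign match the StringToSign entries in the log; the fourth line is the hash that is redacted there.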
Urls
- https://glue.eu-central-1.amazonaws.com/iceberg/v1/config
- https://glue.eu-central-1.amazonaws.com/iceberg/v1/catalogs/123456789012/namespaces
- https://glue.eu-central-1.amazonaws.com/iceberg/v1/catalogs/123456789012/namespaces/ibtest/tables
- https://glue.eu-central-1.amazonaws.com/iceberg/v1/catalogs/123456789012/namespaces/ibtest/tables/ibtest1
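The four URLs follow one pattern, so they can be built from the region, the catalog ID, the namespace and the table name. A small sketch, assuming the pattern holds in general: the helper name `glue_iceberg_urls` is hypothetical, and the catalog segment is taken to be the AWS account ID, as it appears in the log.

```python
from urllib.parse import quote


def glue_iceberg_urls(region: str, account_id: str, namespace: str, table: str) -> dict[str, str]:
    """Build the Glue Iceberg REST URLs seen in the debug output (hypothetical helper)."""
    base = f"https://glue.{region}.amazonaws.com/iceberg/v1"
    # Path segments are percent-encoded in case a name contains special characters.
    catalog = f"{base}/catalogs/{quote(account_id, safe='')}"
    namespaces = f"{catalog}/namespaces"
    tables = f"{namespaces}/{quote(namespace, safe='')}/tables"
    return {
        "config": f"{base}/config",
        "namespaces": namespaces,
        "tables": tables,
        "table": f"{tables}/{quote(table, safe='')}",
    }


if __name__ == "__main__":
    for name, url in glue_iceberg_urls("eu-central-1", "123456789012", "ibtest", "ibtest1").items():
        print(f"{name}: {url}")
```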
Curl option
I remembered that curl has a SigV4 option (--aws-sigv4). I tried it with the same IAM credentials and the signing parameter aws:amz:<region>:glue, which names the provider, the region and the service.
curl https://glue.eu-central-1.amazonaws.com/iceberg/v1/config --user "$AWS_KEY:$AWS_SEC" --aws-sigv4 "aws:amz:eu-central-1:glue"
Combining this with the URLs discovered above gives the following output.
Get the namespaces
curl https://glue.eu-central-1.amazonaws.com/iceberg/v1/catalogs/123456789012/namespaces --user "$AWS_KEY:$AWS_SEC" --aws-sigv4 "aws:amz:eu-central-1:glue"
{"namespaces":[["ibtest"],["sourcedata_sales"]]}
Get the Table Info
curl https://glue.eu-central-1.amazonaws.com/iceberg/v1/catalogs/123456789012/namespaces/ibtest/tables/ibtest1 --user "$AWS_KEY:$AWS_SEC" --aws-sigv4 "aws:amz:eu-central-1:glue" | jq
{
  "config": {
    "metadata_location": "s3://ibtest-123456789012/ibtest/ibtest1/metadata/00001-5801d3f4-952a-4a22-b63a-415aa4378d69.metadata.json",
    "previous_metadata_location": "s3://ibtest-123456789012/ibtest/ibtest1/metadata/00000-60501f31-ad65-4cd8-92be-85dc2cc99d70.metadata.json",
    "table_type": "ICEBERG"
  },
  "metadata": {
    "current-schema-id": 0,
    "current-snapshot-id": 968789183104214971,
    "default-sort-order-id": 0,
    "default-spec-id": 0,
    "format-version": 2,
    "last-column-id": 3,
    "last-partition-id": 999,
    "last-sequence-number": 1,
    "last-updated-ms": 1734875312126,
    "location": "s3://ibtest-123456789012/ibtest/ibtest1",
    "metadata-log": [
      {
        "metadata-file": "s3://ibtest-123456789012/ibtest/ibtest1/metadata/00000-60501f31-ad65-4cd8-92be-85dc2cc99d70.metadata.json",
        "timestamp-ms": 1734874992670
      }
    ],
    "partition-specs": [
      { "fields": [], "spec-id": 0 }
    ],
    "partition-statistics-files": [],
    "properties": {
      "write.parquet.compression-codec": "zstd"
    },
    "refs": {
      "main": { "snapshot-id": 968789183104214971, "type": "branch" }
    },
    "schemas": [
      {
        "fields": [
          { "doc": "", "id": 1, "name": "id", "required": false, "type": "string" },
          { "doc": "", "id": 2, "name": "name", "required": false, "type": "string" },
          { "doc": "", "id": 3, "name": "created", "required": false, "type": "timestamp" }
        ],
        "schema-id": 0,
        "type": "struct"
      }
    ],
    "snapshot-log": [
      { "snapshot-id": 968789183104214971, "timestamp-ms": 1734875312126 }
    ],
    "snapshots": [
      {
        "manifest-list": "s3://ibtest-123456789012/ibtest/ibtest1/metadata/snap-968789183104214971-1-86672725-3389-414f-b8f4-7f4aaa6401b0.avro",
        "schema-id": 0,
        "sequence-number": 1,
        "snapshot-id": 968789183104214971,
        "summary": {
          "changed-partition-count": "1",
          "added-data-files": "1",
          "total-equality-deletes": "0",
          "added-records": "1",
          "trino_query_id": "20241222_134831_00070_aiuhg",
          "total-position-deletes": "0",
          "added-files-size": "507",
          "total-delete-files": "0",
          "total-files-size": "507",
          "total-records": "1",
          "total-data-files": "1",
          "operation": "append"
        },
        "timestamp-ms": 1734875312126
      }
    ],
    "sort-orders": [
      { "fields": [], "order-id": 0 }
    ],
    "statistics-files": [],
    "table-uuid": "d4dbfb4a-93b4-4255-9ce3-cfaa280fa40c"
  },
  "metadata-location": "s3://ibtest-123456789012/ibtest/ibtest1/metadata/00001-5801d3f4-952a-4a22-b63a-415aa4378d69.metadata.json"
}
Conclusion
With the latest release of Glue, you can access Iceberg tables on AWS using the standard Iceberg REST API, which opens the infrastructure to multiple tools. The only AWS-specific part is signing the requests with SigV4 (https://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-authenticating-requests.html), which many tools already use for S3 access.
This decouples the code from AWS-specific access and allows you to use more generic tools.
Next steps
- Test with more tools (Curl update added)
- Test with the new S3 Tables (iceberg)
- Can we use this with Unity?