AWS Glue vulnerabilities in default packages
Securing AWS Glue: A Guide to Identifying and Fixing Python Package Vulnerabilities
Introduction
Did you know that the default Python packages in AWS Glue contain a number of known vulnerabilities? While instances, containers, and Lambda functions are often scanned by tools like Amazon Inspector, Trivy, and Snyk, data pipelines are frequently overlooked. Whether by accident or design, many data pipelines, often laden with Python code, interact with external systems and APIs to ingest data. As such, securing these pipelines is just as important as securing any other part of your infrastructure.
In this post, I’ll walk you through how to enhance the security of your AWS Glue data pipelines. The first issue I encountered is that these pipelines often combine system and runtime dependencies with application code. AWS Glue and Apache Airflow both provide Python environments with pre-installed packages, along with the option to add custom ones.
AWS Glue
For this post, I’ll focus specifically on the AWS Glue environment.
AWS Glue allows you to create three types of jobs:
- Glue ETL (PySpark): Glue ETL Python Libraries
- Python Shell: Python Shell Jobs in AWS Glue
- Ray (not supported)
The Glue ETL job spins up an on-demand Spark environment, while the Python Shell is more akin to a Lambda function. It doesn’t have the same 15-minute time limit but does have limited capacity.
Exporting System Requirements
While browsing the Glue documentation, I came across tables listing the pre-installed Python packages. I wrote a small program to parse these tables and export them to a requirements.txt file.
For a Python Shell job using Python 3.9, this is the output:
    awscli==1.23.5
    botocore==1.23.5

For Python Shell jobs, there’s also an option to set the library-set to analytics, which provides a set of commonly-used packages, including the useful AWS SDK for pandas. However, note that the version included is fairly outdated:
    avro==1.11.0
    awscli==1.23.5
    awswrangler==2.15.1
    botocore==1.24.21
    boto3==1.21.21
    elasticsearch==8.2.0
    numpy==1.22.3
    pandas==1.4.2
    psycopg2==2.9.3
    pyathena==2.5.3
    PyMySQL==1.0.2
    pyodbc==4.0.32
    pyorc==0.6.0
    redshift-connector==2.0.907
    requests==2.27.1
    scikit-learn==1.0.2
    scipy==1.8.0
    SQLAlchemy==1.4.36
    s3fs==2022.3.0

Now we have the system dependencies in a workable format.
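The table-to-requirements step described above can be sketched roughly as follows. This is a minimal illustration, not the program I actually used: it assumes a simple two-column HTML table (package, version), and `SAMPLE_HTML`, `TableCellParser`, and `table_to_requirements` are hypothetical names. The real documentation tables may be laid out differently.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the text of every <td> cell, grouped per <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows only contain <th>, so they stay empty
                self.rows.append(self._row)
            self._row = None


def table_to_requirements(html):
    """Turn (package, version) rows into pinned requirements.txt lines."""
    parser = TableCellParser()
    parser.feed(html)
    parser.close()
    pins = [f"{row[0]}=={row[1]}" for row in parser.rows
            if len(row) >= 2 and row[0] and row[1]]
    return "\n".join(pins)


# Hypothetical stand-in for a documentation table (assumed layout).
SAMPLE_HTML = """
<table>
  <tr><th>Library</th><th>Version</th></tr>
  <tr><td>awscli</td><td>1.23.5</td></tr>
  <tr><td>botocore</td><td>1.23.5</td></tr>
</table>
"""

requirements = table_to_requirements(SAMPLE_HTML)
print(requirements)
```

Using only the standard library keeps the sketch dependency-free; a real scraper would also need to fetch the page and cope with tables that list several runtimes side by side.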
Run-Time Dependencies
AWS Glue also allows you to install additional packages at runtime using pip. You can extend or override the pre-installed Python packages as needed.
For more details, check the official AWS Glue Programming Python Libraries documentation.
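To reason about these runtime packages, it helps to turn the job configuration into structured data. The sketch below assumes the packages are passed through the `--additional-python-modules` job parameter as a comma-separated list, which is how the AWS documentation describes it. The function name and the sample value (including the bucket path) are made up for illustration.

```python
import re


def parse_additional_modules(value):
    """Split a comma-separated module list into {name: pinned version or None}."""
    modules = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        match = re.fullmatch(r"([A-Za-z0-9_.\-]+)\s*==\s*(\S+)", item)
        if match:
            # Exact pin such as "pandas==2.2.3"; names are compared lowercase.
            modules[match.group(1).lower()] = match.group(2)
        else:
            # Unpinned names, version ranges, or S3 wheel paths: kept as-is,
            # with no version we can feed into a scanner.
            modules[item] = None
    return modules


# Hypothetical job parameter value.
job_argument = (
    "pandas==2.2.3, urllib3==2.3.0, "
    "s3://my-bucket/wheels/mylib-0.1.0-py3-none-any.whl"
)
runtime_modules = parse_additional_modules(job_argument)
print(runtime_modules)
```

Entries without an exact pin are kept with `None` so they can be flagged later, since a vulnerability scanner cannot match them against a specific version.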
Glue Inspector
With the above information, I created a tool called Glue Inspector. It downloads the AWS system dependencies, caches them locally, and then retrieves runtime dependencies. These are merged into a list and exported as a CycloneDX Software Bill of Materials (SBOM) in JSON format.
To use it:
- Set your AWS credentials in the environment.
- Run the following command to inspect a Glue job:
    glue-inspector inspect mygluejob --output mygluejob-sbom.json

You can then use the resulting SBOM to manage the software supply chain with tools like DependencyTrack, or scan for vulnerabilities using tools like Trivy:
    trivy sbom mygluejob-sbom.json --scanners vuln,license --list-all-pkgs -d --format cyclonedx --output mygluejob-sbom-trivy.json

I’ve just released version 0.2.0 of Glue Inspector.
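To make the merge-and-export idea concrete, here is a minimal sketch of the concept. It is not Glue Inspector's actual implementation: the function names and sample packages are illustrative, and it assumes that a runtime pin simply replaces the pre-installed version of the same package.

```python
import json


def merge_dependencies(system, runtime):
    """Merge two {name: version} maps; runtime pins override system ones."""
    merged = {name.lower(): version for name, version in system.items()}
    merged.update({name.lower(): version for name, version in runtime.items()})
    return merged


def to_cyclonedx(dependencies):
    """Build a minimal CycloneDX JSON document (as a dict) for PyPI packages."""
    components = [
        {
            "type": "library",
            "name": name,
            "version": version,
            # Package URL: lets scanners identify the component as a PyPI package.
            "purl": f"pkg:pypi/{name}@{version}",
        }
        for name, version in sorted(dependencies.items())
    ]
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "version": 1,
        "components": components,
    }


# Illustrative inputs: two pre-installed packages and one runtime override.
system_packages = {"requests": "2.27.1", "pandas": "1.4.2"}
runtime_packages = {"pandas": "2.2.3"}

sbom = to_cyclonedx(merge_dependencies(system_packages, runtime_packages))
print(json.dumps(sbom, indent=2))
```

The `purl` field is what allows tools such as Trivy to map each component back to a PyPI package and look up advisories for the exact version.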
AWS Vulnerabilities in Glue
While working on this tool, I was surprised by the number of critical and high-severity vulnerabilities present in the default packages. I filed a report with AWS Security, and after weeks of waiting, I was told that the runtime is isolated and therefore not considered an AWS system issue. However, users are encouraged to update their packages as needed.
I believe more awareness is needed in this area.
Glue Runtime Vulnerabilities
Here’s an overview of vulnerabilities in the Glue runtimes:
| Runtime | Critical | High | Medium | Low |
|---|---|---|---|---|
| glueetl-2.0 | 5 | 12 | 12 | 1 |
| glueetl-3.0 | 4 | 16 | 20 | 2 |
| glueetl-4.0 | 4 | 14 | 18 | 2 |
| glueetl-5.0 | 0 | 6 | 11 | 3 |
| pythonshell-3.6 | 1 | 1 | 6 | 0 |
| pythonshell-3.9 | 0 | 0 | 0 | 0 |
| pythonshell-3.9-analytics | 1 | 1 | 3 | 0 |
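A summary like this can be produced from a CycloneDX scan result. The sketch below assumes the report carries a top-level `vulnerabilities` array whose entries have `ratings` with a lowercase `severity`, as in the CycloneDX specification. When a vulnerability has several ratings it counts the most severe one, which is my assumption and may differ from the severity Trivy shows, so it will not necessarily reproduce the numbers in the table. The sample data is illustrative.

```python
from collections import Counter

SEVERITY_ORDER = ["critical", "high", "medium", "low"]


def worst_severity(vulnerability):
    """Return the most severe rating of one vulnerability entry."""
    severities = {
        (rating.get("severity") or "").lower()
        for rating in vulnerability.get("ratings", [])
    }
    for level in SEVERITY_ORDER:
        if level in severities:
            return level
    return "unknown"


def count_by_severity(bom):
    """Count vulnerabilities per severity; unknown severities are left out."""
    counts = Counter(worst_severity(v) for v in bom.get("vulnerabilities", []))
    return {level: counts.get(level, 0) for level in SEVERITY_ORDER}


# Illustrative report fragment; the second entry's extra rating is made up
# to show how multiple ratings are handled.
sample_bom = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.5",
    "vulnerabilities": [
        {"id": "CVE-2023-0286", "ratings": [{"severity": "high"}]},
        {"id": "CVE-2024-3651",
         "ratings": [{"severity": "medium"}, {"severity": "low"}]},
        {"id": "GHSA-5cpq-8wj7-hf2v", "ratings": [{"severity": "low"}]},
    ],
}

summary = count_by_severity(sample_bom)
print(summary)
```

To run it against a real report, load the JSON file written by the scanner with `json.load` and pass the resulting dict to `count_by_severity`.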
Vulnerabilities in AWS Glue 5.0 GlueETL
Here are the high-, medium-, and low-severity vulnerabilities found in the newly released Glue ETL 5.0 runtime (no critical ones were reported):
| Package | Severity | Id | Installed Version | Fixed Version | Title |
|---|---|---|---|---|---|
| Pygments | MEDIUM | CVE-2022-40896 | 2.7.4 | 2.15.0 | pygments: ReDoS in pygments |
| aiohttp | MEDIUM | CVE-2024-42367 | 3.10.1 | 3.10.2 | aiohttp: python-aiohttp: Compressed files as symlinks are not protected from path traversal |
| aiohttp | MEDIUM | CVE-2024-52304 | 3.10.1 | 3.10.11 | aiohttp: aiohttp vulnerable to request smuggling due to incorrect parsing of chunk extensions |
| cryptography | HIGH | CVE-2023-0286 | 36.0.1 | 39.0.1 | openssl: X.400 address type confusion in X.509 GeneralName |
| cryptography | HIGH | CVE-2023-50782 | 36.0.1 | 42.0.0 | python-cryptography: Bleichenbacher timing oracle attack against RSA decryption - incomplete fix for CVE-2020-25659 |
| cryptography | MEDIUM | CVE-2023-23931 | 36.0.1 | 39.0.1 | python-cryptography: memory corruption via immutable objects |
| cryptography | MEDIUM | CVE-2023-49083 | 36.0.1 | 41.0.6 | python-cryptography: NULL-dereference when loading PKCS7 certificates |
| cryptography | MEDIUM | CVE-2024-0727 | 36.0.1 | 42.0.2 | openssl: denial of service via null dereference |
| cryptography | LOW | GHSA-5cpq-8wj7-hf2v | 36.0.1 | 41.0.0 | Vulnerable OpenSSL included in cryptography wheels |
| cryptography | LOW | GHSA-jm77-qphf-c4w8 | 36.0.1 | 41.0.3 | pyca/cryptography’s wheels include vulnerable OpenSSL |
| cryptography | LOW | GHSA-v8gr-m533-ghj9 | 36.0.1 | 41.0.4 | Vulnerable OpenSSL included in cryptography wheels |
| idna | MEDIUM | CVE-2024-3651 | 2.10 | 3.7 | python-idna: potential DoS via resource consumption via specially crafted inputs to idna.encode() |
| pip | MEDIUM | CVE-2023-5752 | 21.3.1 | 23.3 | pip: Mercurial configuration injectable in repo revision when installing via pip |
| pip | MEDIUM | CVE-2023-5752 | 22.3.1 | 23.3 | pip: Mercurial configuration injectable in repo revision when installing via pip |
| setuptools | HIGH | CVE-2022-40897 | 59.6.0 | 65.5.1 | pypa-setuptools: Regular Expression Denial of Service (ReDoS) in package_index.py |
| setuptools | HIGH | CVE-2024-6345 | 59.6.0 | 70.0.0 | pypa/setuptools: Remote code execution via download functions in the package_index module in pypa/setuptools |
| urllib3 | HIGH | CVE-2021-33503 | 1.25.10 | 1.26.5 | python-urllib3: ReDoS in the parsing of authority part of URL |
| urllib3 | HIGH | CVE-2023-43804 | 1.25.10 | 2.0.6, 1.26.17 | python-urllib3: Cookie request header isn’t stripped during cross-origin redirects |
| urllib3 | MEDIUM | CVE-2023-45803 | 1.25.10 | 2.0.7, 1.26.18 | urllib3: Request body not stripped after redirect from 303 status changes request method to GET |
| urllib3 | MEDIUM | CVE-2024-37891 | 1.25.10 | 1.26.19, 2.2.2 | urllib3: proxy-authorization request header is not stripped during cross-origin redirects |
Mitigating Vulnerabilities
If your Glue jobs access external resources, be sure to update the affected packages using the runtime installation option. However, overriding pre-installed versions can lead to a “dependency hell” situation, so use your favorite tools, or something like pur, to help update the requirements.
Here’s an overview of some key packages that are outdated:
    Updated aiobotocore: 2.13.1 -> 2.16.1
    Updated aiohappyeyeballs: 2.3.5 -> 2.4.4
    Updated aiohttp: 3.10.1 -> 3.11.11
    Updated aioitertools: 0.11.0 -> 0.12.0
    Updated aiosignal: 1.3.1 -> 1.3.2
    Updated async-timeout: 4.0.3 -> 5.0.1
    Updated attrs: 24.2.0 -> 24.3.0
    Updated awscrt: 0.19.19 -> 0.23.6
    Updated boto3: 1.34.131 -> 1.35.92
    Updated botocore: 1.34.131 -> 1.35.92
    Updated certifi: 2024.7.4 -> 2024.12.14
    Updated cffi: 1.14.5 -> 1.17.1
    Updated charset-normalizer: 3.3.2 -> 3.4.1
    Updated colorama: 0.4.4 -> 0.4.6
    Updated contourpy: 1.2.1 -> 1.3.1
    Updated cryptography: 36.0.1 -> 44.0.0
    Updated distlib: 0.3.1 -> 0.3.9
    Updated distro: 1.5.0 -> 1.9.0
    Updated docutils: 0.16 -> 0.21.2
    Updated filelock: 3.0.12 -> 3.16.1
    Updated fonttools: 4.53.1 -> 4.55.3
    Updated frozenlist: 1.4.1 -> 1.5.0
    Updated fsspec: 2024.6.1 -> 2024.12.0
    Updated idna: 2.10 -> 3.10
    Updated importlib_resources: 6.4.0 -> 6.5.2
    Updated jmespath: 0.10.0 -> 1.0.1
    Updated kiwisolver: 1.4.5 -> 1.4.8
    Updated libcomps: 0.1.20 -> 0.1.21.post1
    Updated matplotlib: 3.9.0 -> 3.10.0
    Updated multidict: 6.0.5 -> 6.1.0
    Updated numpy: 1.26.4 -> 2.2.1
    Updated packaging: 24.1 -> 24.2
    Updated pandas: 2.2.2 -> 2.2.3
    Updated pillow: 10.4.0 -> 11.1.0
    Updated pip: 21.3.1 -> 24.3.1
    Updated pip: 22.3.1 -> 24.3.1
    Updated plotly: 5.23.0 -> 5.24.1
    Updated prompt-toolkit: 3.0.24 -> 3.0.48
    Updated pyarrow: 17.0.0 -> 18.1.0
    Updated pycparser: 2.20 -> 2.22
    Updated Pygments: 2.7.4 -> 2.19.0
    Updated pyparsing: 3.1.2 -> 3.2.1
    Updated pytz: 2024.1 -> 2024.2
    Updated requests: 2.32.2 -> 2.32.3
    Updated ruamel.yaml: 0.16.6 -> 0.18.9
    Updated ruamel.yaml.clib: 0.1.2 -> 0.2.12
    Updated s3fs: 2024.6.1 -> 2024.12.0
    Updated s3transfer: 0.10.2 -> 0.10.4
    Updated setuptools: 59.6.0 -> 75.7.0
    Updated six: 1.16.0 -> 1.17.0
    Updated tzdata: 2024.1 -> 2024.2
    Updated urllib3: 1.25.10 -> 2.3.0
    Updated virtualenv: 20.4.0 -> 20.28.1
    Updated wcwidth: 0.2.5 -> 0.2.13
    Updated wrapt: 1.16.0 -> 1.17.0
    Updated yarl: 1.9.4 -> 1.18.3
    Updated zipp: 3.19.2 -> 3.21.0

Luckily, Glue 5 now supports the use of a requirements.txt file uploaded to S3, which can be parsed by pip.
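As far as I understand the AWS documentation for Glue 5.0, this is enabled with two job parameters: `--python-modules-installer-option` set to `-r`, and `--additional-python-modules` pointing at the requirements file in S3. The sketch below only builds that parameter map. The helper name and the bucket path are hypothetical, so verify the parameter names against the current documentation before relying on them.

```python
def requirements_job_arguments(requirements_s3_uri):
    """Build the Glue job arguments that point pip at a requirements.txt in S3.

    Assumption: parameter names as described in the AWS Glue 5.0 docs.
    """
    if not requirements_s3_uri.startswith("s3://"):
        raise ValueError("expected an s3:// URI for the requirements file")
    return {
        # Passed through to pip, so the next value is treated as "-r <file>".
        "--python-modules-installer-option": "-r",
        "--additional-python-modules": requirements_s3_uri,
    }


# Hypothetical bucket and key.
job_arguments = requirements_job_arguments("s3://my-bucket/glue/requirements.txt")
print(job_arguments)
```

A map like this could be supplied as the job's default arguments, for example through the `DefaultArguments` field when creating or updating a job with boto3 or infrastructure-as-code tooling.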
This opens up the possibility of using local checks and tools like GitHub Dependabot to monitor your dependencies for vulnerabilities.
Conclusion
- Data pipelines are applications and need to be treated with the same level of scrutiny as any other software. Managing their lifecycle is critical for security.
- Be aware of vulnerabilities in default runtimes, whether using AWS Glue, Apache Airflow, or other similar tools.
- Use Glue Inspector to scan your Glue jobs and generate an SBOM for better software supply chain management. SBOMs are becoming an industry standard, with requirements from norms like DORA and U.S. government standards for critical infrastructure.