AWS Glue
AWS Glue is a serverless data integration and extract, transform, load (ETL) service provided by Amazon as a part of Amazon Web Services. It runs data discovery and preparation jobs and automatically manages the computing resources those jobs require. It was introduced in August 2017.[2]
Developer(s) | Amazon.com |
---|---|
Initial release | August 2017 [1] |
Operating system | Cross-platform |
Available in | English |
Website | aws |
The primary purpose of Glue is to scan other services[3] in the same Virtual Private Cloud (or an equivalent accessible network element, even if not provided by AWS), particularly S3. Jobs are billed according to compute time, with a minimum billing duration of 1 minute.[4] Glue inspects the source data and stores the associated metadata (e.g. a table's schema: field names, types, and lengths) in the AWS Glue Data Catalog, which is then accessible via the AWS console or APIs.[5]
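The billing rule can be illustrated with a small arithmetic sketch. It assumes that compute time is metered per second above the 1-minute minimum and charged per data processing unit (DPU) hour; the function names, the DPU count, and the price used below are hypothetical values chosen for illustration, not figures taken from the cited sources.

```python
MINIMUM_BILLED_SECONDS = 60  # the 1-minute minimum billing duration


def billed_seconds(runtime_seconds: float) -> float:
    """Return the billable duration: the runtime, but never below the minimum."""
    return max(runtime_seconds, MINIMUM_BILLED_SECONDS)


def job_cost(runtime_seconds: float, dpus: int, price_per_dpu_hour: float) -> float:
    """Estimate a job's cost as billable hours x DPUs x hourly DPU price.

    Both `dpus` and `price_per_dpu_hour` are assumptions supplied by the caller.
    """
    billable_hours = billed_seconds(runtime_seconds) / 3600
    return billable_hours * dpus * price_per_dpu_hour


# A 30-second job is billed as if it ran for 60 seconds;
# a 90-second job is billed for its actual 90 seconds.
short_job = job_cost(30, dpus=10, price_per_dpu_hour=0.44)   # illustrative price
longer_job = job_cost(90, dpus=10, price_per_dpu_hour=0.44)  # illustrative price
```

Under these assumptions the 30-second job costs the same as a 60-second one, so very short jobs pay for compute time they do not use.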
Catalog interrogation via API
The catalog can be read in the AWS console (via a browser) and via an API, which is divided into topics including:[7]
- Database API
- Table API
- Partition API
- Connection API
- User-Defined Function API
- Importing an Athena Catalog to AWS Glue
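As a minimal sketch of reading the catalog through the Table API, the following Python function lists each table in a database together with its column names and types. It assumes the third-party `boto3` SDK and configured AWS credentials when run against a real account; the helper name `list_table_schemas` and the database name `example_db` are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def list_table_schemas(glue_client: Any, database_name: str) -> Dict[str, List[Tuple[str, str]]]:
    """Map each table name in a catalog database to its (column, type) pairs.

    `glue_client` is expected to behave like boto3.client("glue").
    """
    schemas: Dict[str, List[Tuple[str, str]]] = {}
    # GetTables is paginated, so walk every page of results.
    paginator = glue_client.get_paginator("get_tables")
    for page in paginator.paginate(DatabaseName=database_name):
        for table in page.get("TableList", []):
            # Column metadata lives under the table's storage descriptor.
            columns = table.get("StorageDescriptor", {}).get("Columns", [])
            schemas[table["Name"]] = [(c["Name"], c.get("Type", "")) for c in columns]
    return schemas


def main() -> None:
    # Imported here so the helper above stays usable without the SDK installed.
    import boto3  # third-party AWS SDK; requires configured credentials

    client = boto3.client("glue")
    # "example_db" is a placeholder database name.
    for name, columns in list_table_schemas(client, "example_db").items():
        print(name, columns)
```

Calling `main()` prints one line per table. Passing the client as a parameter keeps the helper easy to exercise with a stub object instead of a live AWS connection.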
References
- "Introducing AWS Glue: A Simple, Flexible, and Cost-Effective Extract, Transfer, and Load (ETL) Service".
- "AWS Services List". ParkMyCloud. Retrieved October 6, 2020.
- "AWS Glue: crawlers and use cases". Retrieved July 13, 2022.
- "AWS Glue version 2.0 featuring 10x faster job start times and 1-minute minimum billing duration". AWS. August 10, 2020. Retrieved October 6, 2020.
- "AWS Glue API Documentation". AWS. Retrieved October 6, 2020.
- "AWS Glue Now Supports Scala in Addition to Python". AWS. January 12, 2018. Retrieved October 6, 2020.
- "Catalog API". AWS. Retrieved October 8, 2020.