AwsGlueJobOperator

Amazon

Creates an AWS Glue job. AWS Glue is a serverless Spark ETL service for running Spark jobs on the AWS cloud. Language support: Python and Scala.


Last Updated: May 7, 2021

Access Instructions

Install the Amazon provider package into your Airflow environment.

Import the module into your DAG file and instantiate it with your desired parameters.
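As a minimal sketch, assuming the provider package name and import path that shipped with the Amazon provider around this release (verify against your installed version):

```python
# Install the Amazon provider into your Airflow environment (assumed package name):
#   pip install apache-airflow-providers-amazon

# Import path as published in the Amazon provider at the time of writing:
from airflow.providers.amazon.aws.operators.glue import AwsGlueJobOperator
```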

Parameters

job_name (Optional[str]): unique job name per AWS account
script_location (Optional[str]): location of the ETL script; must be a local or S3 path
job_desc (Optional[str]): job description details
concurrent_run_limit (Optional[int]): the maximum number of concurrent runs allowed for the job
script_args (dict): ETL script arguments and AWS Glue arguments
retry_limit (Optional[int]): the maximum number of times to retry this job if it fails
num_of_dpus (int): number of AWS Glue DPUs to allocate to this job
region_name (str): AWS region name (example: us-east-1)
s3_bucket (Optional[str]): S3 bucket where logs and the local ETL script will be uploaded
iam_role_name (Optional[str]): AWS IAM role for Glue job execution
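To show how these parameters fit together, here is a hedged sketch of instantiating the operator inside a DAG. The job name, script path, bucket, and IAM role are placeholders, and defaults may differ between provider versions:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.glue import AwsGlueJobOperator

with DAG(
    dag_id="glue_job_example",          # hypothetical DAG id
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    submit_glue_job = AwsGlueJobOperator(
        task_id="submit_glue_job",
        job_name="my-glue-job",                          # unique job name per AWS account
        job_desc="Example Glue ETL job",                 # job description details
        script_location="s3://my-bucket/etl/script.py",  # local or S3 path to the ETL script
        script_args={"--input_path": "s3://my-bucket/input/"},  # ETL/Glue arguments
        concurrent_run_limit=1,
        retry_limit=0,
        num_of_dpus=10,
        region_name="us-east-1",
        s3_bucket="my-bucket",           # bucket for logs and uploaded local scripts
        iam_role_name="my-glue-role",    # IAM role the Glue job runs as
    )
```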

Documentation

Creates an AWS Glue job. AWS Glue is a serverless Spark ETL service for running Spark jobs on the AWS cloud. Language support: Python and Scala.

Example DAGs

Improve this module by creating an example DAG.

  1. Add an `example_dags` directory to the top-level source of the provider package with an empty `__init__.py` file.
  2. Add your DAG to this directory. Be sure to include a well-written and descriptive docstring (a minimal sketch follows this list).
  3. Create a pull request against the source code. Once the package gets released, your DAG will show up on the Registry.
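As a starting point, a contributed example DAG might look like the sketch below. The file path and all resource names are hypothetical; the module docstring is the descriptive piece the steps above call for:

```python
# example_dags/example_glue_job.py (hypothetical file in the provider package)
"""
Example DAG demonstrating AwsGlueJobOperator: submits a Python ETL script stored
in S3 as an AWS Glue job.
"""
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.glue import AwsGlueJobOperator

with DAG(
    dag_id="example_glue_job",
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
    catchup=False,
    tags=["example"],
) as dag:
    run_glue_job = AwsGlueJobOperator(
        task_id="run_glue_job",
        job_name="example-glue-job",
        script_location="s3://example-bucket/scripts/transform.py",
        iam_role_name="example-glue-role",
        region_name="us-east-1",
    )
```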
