EMRContainerOperator

Amazon

An operator that submits jobs to EMR on EKS virtual clusters.


Last Updated: Aug. 30, 2021

Access Instructions

Install the Amazon provider package into your Airflow environment.

Import the operator into your DAG file and instantiate it with the parameters listed under Parameters.
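The two steps above can be sketched as follows. This is a minimal illustration, not an official example: the import path `airflow.providers.amazon.aws.operators.emr_containers` is assumed to match the provider release current at this page's date (later releases may expose the operator under a different module or class name), and the DAG ID, task ID, cluster ID, role ARN, release label, and Spark settings are all placeholder values. The helper names `emr_job_kwargs` and `build_dag` are hypothetical.

```python
from datetime import datetime

# Install the provider first (shell): pip install apache-airflow-providers-amazon


def emr_job_kwargs() -> dict:
    """Return keyword arguments for the operator. All values are placeholders."""
    return {
        "name": "pi-example",
        "virtual_cluster_id": "your-virtual-cluster-id",  # hypothetical ID
        "execution_role_arn": "arn:aws:iam::111122223333:role/your-job-role",  # hypothetical ARN
        "release_label": "emr-6.3.0-latest",  # assumed release; use one your cluster supports
        "job_driver": {
            # Spark job parameters, following the EMR on EKS job-run request shape.
            "sparkSubmitJobDriver": {
                "entryPoint": "local:///usr/lib/spark/examples/src/main/python/pi.py",
                "sparkSubmitParameters": (
                    "--conf spark.executor.instances=2 "
                    "--conf spark.executor.memory=2G"
                ),
            }
        },
        "aws_conn_id": "aws_default",
        "poll_interval": 30,  # seconds between status checks
    }


def build_dag():
    """Create a DAG containing a single EMR on EKS job-run task."""
    # Imported here so this sketch can be loaded without Airflow installed.
    from airflow import DAG
    from airflow.providers.amazon.aws.operators.emr_containers import (
        EMRContainerOperator,
    )

    with DAG(
        dag_id="emr_eks_pi_example",
        start_date=datetime(2021, 8, 1),
        catchup=False,
    ) as dag:
        EMRContainerOperator(task_id="run_pi_job", **emr_job_kwargs())
    return dag
```

In a real DAG file, assign the result at module level (`dag = build_dag()`) so the scheduler can discover it.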

Parameters

name (str, required): The name of the job run.
virtual_cluster_id (str, required): The EMR on EKS virtual cluster ID.
execution_role_arn (str, required): The IAM role ARN associated with the job run.
release_label (str, required): The Amazon EMR release version to use for the job run.
job_driver (dict, required): Job configuration details, e.g. the Spark job parameters.
configuration_overrides (dict): The configuration overrides for the job run, specifically either application configuration or monitoring configuration.
client_request_token (str): The client idempotency token of the job run request. Use this if you want to specify a unique ID to prevent two jobs from getting started. If no token is provided, a UUIDv4 token will be generated for you.
aws_conn_id (str): The Airflow connection used for AWS credentials.
poll_interval (int): Time (in seconds) to wait between two consecutive calls to check the job run status on EMR.
max_tries (int): Maximum number of times to wait for the job run to finish. Defaults to None, which polls until the job is no longer in a pending, submitted, or running state.
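
How poll_interval and max_tries interact can be illustrated with a small stand-alone loop. This is a sketch of the behavior described above, not the provider's actual implementation: the function `poll_until_done`, its `get_state` callback, and the choice to return None when max_tries is exhausted are all hypothetical, and the uppercase state names are assumed from the EMR on EKS API.

```python
import time
from typing import Callable, Optional

# States in which polling continues, per the max_tries description.
# The uppercase spellings are an assumption based on the EMR on EKS API.
IN_PROGRESS_STATES = {"PENDING", "SUBMITTED", "RUNNING"}


def poll_until_done(
    get_state: Callable[[], str],
    poll_interval: int = 30,
    max_tries: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll get_state() until the job leaves the in-progress states.

    Returns the final state, or None if max_tries checks were made
    while the job was still in progress.
    """
    tries = 0
    while True:
        state = get_state()
        if state not in IN_PROGRESS_STATES:
            return state
        tries += 1
        # With max_tries=None this check never fires, so polling is unbounded.
        if max_tries is not None and tries >= max_tries:
            return None
        sleep(poll_interval)
```

Passing `sleep` as a parameter is only a convenience for trying the loop without real delays, for example `poll_until_done(lambda: "COMPLETED", sleep=lambda s: None)`.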

