DataprocJobBaseOperator

Google

The base class for operators that launch jobs on Dataproc.

Last Updated: May. 7, 2021

Access Instructions

Install the Google provider package into your Airflow environment.

Import the module into your DAG file and instantiate it with your desired params.

Parameters

job_name (str): The job name used in the Dataproc cluster. By default this is the task_id with the execution date appended, but it can be templated. A random number is always appended to the name to avoid name clashes.
cluster_name (str): The name of the Dataproc cluster.
dataproc_properties (dict): Map for the Hive properties. Ideal to put in default arguments. (templated)
dataproc_jars (list): HCFS URIs of jar files to add to the CLASSPATH of the Hive server and Hadoop MapReduce (MR) tasks. Can contain Hive SerDes and UDFs. (templated)
gcp_conn_id (str): The connection ID to use when connecting to Google Cloud.
delegate_to (str): The account to impersonate using domain-wide delegation of authority, if any. For this to work, the service account making the request must have domain-wide delegation enabled.
labels (dict): The labels to associate with this job. Label keys must contain 1 to 63 characters and must conform to RFC 1035. Label values may be empty but, if present, must contain 1 to 63 characters and must conform to RFC 1035. No more than 32 labels can be associated with a job.
region (str): The region where the Dataproc cluster is created.
job_error_states (set): Job states that should be considered error states. Any state in this set results in an error being raised and the task failing. For example, if the CANCELLED state should also be considered a task failure, pass in {'ERROR', 'CANCELLED'}. Possible values are currently only 'ERROR' and 'CANCELLED', but this could change in the future. Defaults to {'ERROR'}.
impersonation_chain (Union[str, Sequence[str]]): Optional service account to impersonate using short-term credentials, or a chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, each identity in the list must grant the Service Account Token Creator IAM role to the directly preceding identity, with the first account in the list granting this role to the originating account. (templated)
asynchronous (bool): Flag to return after submitting the job to the Dataproc API. This is useful for submitting long-running jobs and waiting on them asynchronously using the DataprocJobSensor.
dataproc_job_id
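
The constraints on labels listed above can be checked before a DAG is deployed. The following is a minimal sketch, not part of the provider package: the helper name find_label_problems and the constant MAX_LABELS are hypothetical, and the pattern assumes a strict reading of RFC 1035 label syntax (starts with a letter, ends with a letter or digit, hyphens allowed in between). The Dataproc API itself may accept a somewhat different character set.

```python
import re
from typing import Dict, List

# Assumption: strict RFC 1035 label syntax, 1 to 63 characters in total.
# Starts with a letter, ends with a letter or digit, hyphens allowed inside.
_RFC1035_LABEL = re.compile(r"[A-Za-z]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

# Limit stated in the parameter description for labels.
MAX_LABELS = 32


def find_label_problems(labels: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means no problem was found."""
    problems = []
    if len(labels) > MAX_LABELS:
        problems.append(f"too many labels: {len(labels)} > {MAX_LABELS}")
    for key, value in labels.items():
        if not _RFC1035_LABEL.fullmatch(key):
            problems.append(f"invalid key: {key!r}")
        # Values may be empty; non-empty values follow the same rule as keys.
        if value and not _RFC1035_LABEL.fullmatch(value):
            problems.append(f"invalid value for {key!r}: {value!r}")
    return problems
```

A dictionary that passes this check could then be supplied as the labels argument of a concrete Dataproc job operator. Returning a list of problems rather than raising on the first one makes it easier to report every offending label at once.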

Example DAGs

Improve this module by creating an example DAG.

  1. Add an `example_dags` directory to the top-level source of the provider package with an empty `__init__.py` file.
  2. Add your DAG to this directory. Be sure to include a well-written and descriptive docstring.
  3. Create a pull request against the source code. Once the package gets released, your DAG will show up on the Registry.