DataprocSubmitHadoopJobOperator

Google

Start a Hadoop job on a Cloud Dataproc cluster.


Last Updated: May 7, 2021

Access Instructions

Install the Google provider package into your Airflow environment.

Import the module into your DAG file and instantiate it with your desired params.
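As a minimal sketch (assuming the provider is distributed on PyPI as apache-airflow-providers-google, its current package name), installation and import look roughly like this:

```python
# Install the provider into the Airflow environment first, e.g.:
#   pip install apache-airflow-providers-google
# Then import the operator in your DAG file:
from airflow.providers.google.cloud.operators.dataproc import (
    DataprocSubmitHadoopJobOperator,
)
```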

Parameters

main_jar (str): The HCFS URI of the jar file containing the main class (use this or main_class, not both).
main_class (str): Name of the job class (use this or main_jar, not both).
arguments (list): Arguments for the job. (templated)
archives (list): List of archive files that will be unpacked in the working directory. Should be stored in Cloud Storage.
files (list): List of files to be copied to the working directory.
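To illustrate how these parameters fit together, here is a minimal sketch of a DAG that submits a Hadoop job to an existing cluster. The cluster name, region, bucket paths, and jar name are hypothetical placeholders, not values from this page:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataproc import (
    DataprocSubmitHadoopJobOperator,
)

with DAG(
    dag_id="example_dataproc_hadoop",  # hypothetical DAG id
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    submit_hadoop_job = DataprocSubmitHadoopJobOperator(
        task_id="submit_hadoop_job",
        # Hypothetical cluster and region; replace with your own.
        cluster_name="example-cluster",
        region="us-central1",
        # Use main_jar OR main_class, not both.
        main_jar="gs://example-bucket/jars/wordcount.jar",
        # Arguments passed to the job (templated).
        arguments=["gs://example-bucket/input/", "gs://example-bucket/output/"],
        # Archives are unpacked and files are copied into the working directory.
        archives=["gs://example-bucket/archives/resources.zip"],
        files=["gs://example-bucket/files/lookup.txt"],
    )
```

Note that this operator targets a cluster that already exists; creating and tearing down the cluster is handled by separate operators in the same module.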

Documentation

Start a Hadoop job on a Cloud Dataproc cluster.

Example DAGs

Improve this module by creating an example DAG.

  1. Add an `example_dags` directory to the top-level source of the provider package with an empty `__init__.py` file.
  2. Add your DAG to this directory. Be sure to include a well-written and descriptive docstring.
  3. Create a pull request against the source code. Once the package gets released, your DAG will show up on the Registry.
