DataprocSubmitSparkJobOperator

Google

Start a Spark Job on a Cloud Dataproc cluster.


Last Updated: May 27, 2021

Access Instructions

Install the Google provider package (apache-airflow-providers-google) into your Airflow environment.

Import the module into your DAG file and instantiate it with your desired params.
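A minimal sketch of that access pattern, assuming Airflow 2 with the Google provider installed; the DAG id, jar URI, cluster name, and region below are placeholders rather than values taken from this page:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataproc import (
    DataprocSubmitSparkJobOperator,
)

with DAG(
    dag_id="example_dataproc_spark",
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    submit_spark = DataprocSubmitSparkJobOperator(
        task_id="submit_spark",
        main_jar="gs://my-bucket/jars/my-spark-job.jar",  # hypothetical jar URI
        arguments=["--input", "gs://my-bucket/input/"],   # hypothetical job arguments
        cluster_name="example-cluster",                   # assumes an existing Dataproc cluster
        region="us-central1",
    )
```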

Parameters

main_jar (str): The HCFS URI of the jar file that contains the main class (use this or main_class, not both).
main_class (str): Name of the job class (use this or main_jar, not both).
arguments (list): Arguments for the job. (templated)
archives (list): List of archived files that will be unpacked in the working directory. Should be stored in Cloud Storage.
files (list): List of files to be copied to the working directory.
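Continuing the sketch above, a hypothetical instantiation that exercises these parameters: main_class is used instead of main_jar (the two are mutually exclusive), arguments is templated so Jinja such as {{ ds }} renders per run, and files/archives are staged from Cloud Storage into the job's working directory. All class names, URIs, and the cluster/region values are illustrative assumptions.

```python
    spark_by_class = DataprocSubmitSparkJobOperator(
        task_id="spark_by_class",
        main_class="com.example.jobs.DailyAggregate",       # use main_class OR main_jar, not both
        arguments=["--run-date", "{{ ds }}"],                # templated: rendered per DAG run
        files=["gs://my-bucket/config/job.conf"],            # copied into the working directory
        archives=["gs://my-bucket/deps/site-packages.zip"],  # unpacked in the working directory
        cluster_name="example-cluster",
        region="us-central1",
    )
```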

Documentation

Start a Spark Job on a Cloud Dataproc cluster.
