LivyOperator

Livy

This operator wraps the Apache Livy batch REST API, allowing you to submit a Spark application to the underlying cluster.

Last Updated: Mar. 17, 2021

Access Instructions

Install the Livy provider package into your Airflow environment.

Import the operator into your DAG file and instantiate it with the parameters you need.

Parameters

file (str): Path of the file containing the application to execute (required).
class_name (str): Name of the application's Java/Spark main class.
args (list): Application command-line arguments.
jars (list): Jars to be used in this session.
py_files (list): Python files to be used in this session.
files (list): Files to be used in this session.
driver_memory (str): Amount of memory to use for the driver process.
driver_cores (str or int): Number of cores to use for the driver process.
executor_memory (str): Amount of memory to use per executor process.
executor_cores (str or int): Number of cores to use for each executor.
num_executors (str or int): Number of executors to launch for this session.
archives (list): Archives to be used in this session.
queue (str): Name of the YARN queue to which the application is submitted.
name (str): Name of this session.
conf (dict): Spark configuration properties.
proxy_user (str): User to impersonate when running the job.
livy_conn_id (str): Reference to a pre-defined Livy connection.
polling_interval (int): Time in seconds between polls for job completion. Polling is disabled for values <= 0.
extra_options (dict): A dictionary of options, where each key is a string and the value depends on the option being modified.
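
Most of the parameters above correspond to fields of the JSON body that Livy accepts on `POST /batches`, where the field names are camelCase. The sketch below shows one plausible way that mapping could be done with the standard library only. The helper `build_batch_body` and the example values are hypothetical and are not the provider's actual implementation.

```python
import json
from typing import Any, Dict

# Operator-style argument names mapped to Livy batch request field names.
_FIELD_NAMES = {
    "class_name": "className",
    "args": "args",
    "jars": "jars",
    "py_files": "pyFiles",
    "files": "files",
    "driver_memory": "driverMemory",
    "driver_cores": "driverCores",
    "executor_memory": "executorMemory",
    "executor_cores": "executorCores",
    "num_executors": "numExecutors",
    "archives": "archives",
    "queue": "queue",
    "name": "name",
    "conf": "conf",
    "proxy_user": "proxyUser",
}


def build_batch_body(file: str, **options: Any) -> Dict[str, Any]:
    """Hypothetical helper: build a Livy batch request body from keyword arguments."""
    unknown = set(options) - set(_FIELD_NAMES)
    if unknown:
        raise ValueError(f"Unsupported options: {sorted(unknown)}")
    body: Dict[str, Any] = {"file": file}
    for key, value in options.items():
        if value is not None:  # leave unset options out of the request
            body[_FIELD_NAMES[key]] = value
    return body


body = build_batch_body(
    "/path/to/app.jar",                 # placeholder path
    class_name="org.example.Main",      # placeholder class
    num_executors=2,
    conf={"spark.shuffle.compress": "false"},
    queue=None,                         # unset, so it is omitted from the body
)
print(json.dumps(body, indent=2))
```

Dropping `None` values keeps the request limited to settings that were given explicitly, so Livy and Spark defaults apply to everything else.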
