SparkSqlHook

Provider: Apache Spark

This hook is a wrapper around the spark-sql binary. It requires that the “spark-sql” binary is in the PATH.


Last Updated: Apr. 27, 2021

Access Instructions

Install the Spark provider package into your Airflow environment.

Import the module into your DAG file and instantiate it with your desired parameters, as in the sketch below.
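
For example, a minimal sketch, assuming the provider package (published on PyPI as apache-airflow-providers-apache-spark) is installed and a `spark_sql_default` connection is configured; check the import path and constructor defaults against the provider version you run:

```python
from airflow.providers.apache.spark.hooks.spark_sql import SparkSqlHook

# Run a single query against the cluster described by the
# "spark_sql_default" connection. Both the connection id and the
# query below are illustrative placeholders.
hook = SparkSqlHook(sql="SELECT 1", conn_id="spark_sql_default")
hook.run_query()
```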

Parameters

sql (str): The SQL query to execute
conf (str, format: PROP=VALUE): Arbitrary Spark configuration property
conn_id (str): connection_id string
total_executor_cores (int): (Standalone & Mesos only) Total cores for all executors (Default: all the available cores on the worker)
executor_cores (int): (Standalone & YARN only) Number of cores per executor (Default: 2)
executor_memory (str): Memory per executor (e.g. 1000M, 2G) (Default: 1G)
keytab (str): Full path to the file that contains the keytab
master (str): spark://host:port, mesos://host:port, yarn, or local
name (str): Name of the job
num_executors (int): Number of executors to launch
verbose (bool): Whether to pass the verbose flag to spark-sql
yarn_queue (str): The YARN queue to submit to (Default: "default")
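
To illustrate how these parameters fit together, here is a hedged sketch of a more fully parameterized hook. The table name, resource sizes, and configuration property are placeholders, and the YARN-specific options only take effect when master is yarn:

```python
from airflow.providers.apache.spark.hooks.spark_sql import SparkSqlHook

# All values below are hypothetical; tune them to your cluster.
hook = SparkSqlHook(
    sql="SELECT COUNT(*) FROM sales",         # the SQL query to execute
    conf="spark.sql.shuffle.partitions=200",  # a single PROP=VALUE pair
    conn_id="spark_sql_default",
    master="yarn",
    yarn_queue="default",                     # YARN queue to submit to
    executor_cores=2,                         # cores per executor
    executor_memory="2G",                     # memory per executor
    num_executors=4,                          # executors to launch
    name="airflow-spark-sql-example",         # job name
    verbose=True,                             # pass the verbose flag to spark-sql
)
hook.run_query()
```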

Documentation

This hook is a thin wrapper around the spark-sql binary: the query is handed to spark-sql, which is launched as a subprocess, so the binary must be discoverable on the PATH of the environment where the task runs.
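
Because the hook shells out to spark-sql rather than using a client library, a quick pre-flight check on the worker can save a confusing task failure. A minimal sketch:

```python
import shutil

# spark-sql must be resolvable on the PATH of the environment that
# executes the task; fail fast with a clear message if it is not.
if shutil.which("spark-sql") is None:
    raise RuntimeError("spark-sql was not found on PATH")
```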

Example DAGs

Improve this module by creating an example DAG.

  1. Add an `example_dags` directory to the top-level source of the provider package with an empty `__init__.py` file.
  2. Add your DAG to this directory. Be sure to include a well-written and descriptive docstring (a skeleton is sketched after this list).
  3. Create a pull request against the source code. Once the package gets released, your DAG will show up on the Registry.
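
A hedged skeleton of what such an example DAG file might look like; the DAG id, schedule, and query are placeholders, and imports should be adjusted to the Airflow version you target:

```python
"""
Example DAG demonstrating SparkSqlHook.

Runs a single spark-sql query from a PythonOperator task. Requires the
apache-airflow-providers-apache-spark package and a configured
spark_sql_default connection.
"""
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.apache.spark.hooks.spark_sql import SparkSqlHook


def run_spark_sql():
    # Placeholder query; replace with something meaningful.
    SparkSqlHook(sql="SELECT 1", conn_id="spark_sql_default").run_query()


with DAG(
    dag_id="example_spark_sql_hook",
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
    catchup=False,
    tags=["example"],
) as dag:
    PythonOperator(task_id="run_spark_sql_query", python_callable=run_spark_sql)
```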
