HiveStatsCollectionOperator

Hive

Gathers partition statistics using a dynamically generated Presto query and inserts the stats into a MySQL table (see the schema under Documentation). Stats for the same date/partition are overwritten if you rerun the operator.


Last Updated: Jan. 23, 2021

Access Instructions

Install the Hive provider package into your Airflow environment.

Import the operator into your DAG file and instantiate it with your desired parameters; a minimal sketch follows.
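The sketch below covers both steps. It assumes Airflow 2.x and the apache-airflow-providers-apache-hive package; the DAG id, table name, partition spec, and connection IDs are placeholders to replace with your own.

# Install the provider package into the environment where Airflow runs:
#   pip install apache-airflow-providers-apache-hive

from datetime import datetime

from airflow import DAG
from airflow.providers.apache.hive.operators.hive_stats import HiveStatsCollectionOperator

with DAG(
    dag_id="hive_stats_collection",  # hypothetical DAG id
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    collect_stats = HiveStatsCollectionOperator(
        task_id="collect_stats",
        table="my_database.my_table",           # placeholder Hive table
        partition={"ds": "{{ ds }}"},           # placeholder partition spec, rendered per run
        metastore_conn_id="metastore_default",  # connection IDs: adjust to your environment
        presto_conn_id="presto_default",
        mysql_conn_id="airflow_db",
    )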

Documentation

Gathers partition statistics using a dynamically generated Presto query and inserts the stats into a MySQL table with the following format. Stats for the same date/partition are overwritten if you rerun the operator.

CREATE TABLE hive_stats (
    ds VARCHAR(16),
    table_name VARCHAR(500),
    metric VARCHAR(200),
    value BIGINT
);
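As an illustration of that schema in use (this is not part of the operator itself), the collected stats for one partition could be read back with Airflow's MySQL hook. The snippet assumes the apache-airflow-providers-mysql package and a connection that points at the database holding hive_stats; the connection ID, table name, and date are placeholders.

from airflow.providers.mysql.hooks.mysql import MySqlHook

# Placeholder connection ID for the MySQL database that holds hive_stats.
hook = MySqlHook(mysql_conn_id="airflow_db")

# All metrics collected for one table/partition. Per the documentation above,
# rerunning the operator for the same date/partition overwrites these rows.
rows = hook.get_records(
    "SELECT metric, value FROM hive_stats WHERE table_name = %s AND ds = %s",
    parameters=("my_database.my_table", "2021-01-23"),
)
for metric, value in rows:
    print(metric, value)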

Example DAGs

Improve this module by creating an example DAG.

  1. Add an `example_dags` directory to the top-level source of the provider package with an empty `__init__.py` file.
  2. Add your DAG to this directory. Be sure to include a well-written and descriptive docstring (a minimal sketch follows this list).
  3. Create a pull request against the source code. Once the package gets released, your DAG will show up on the Registry.
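For illustration only, a contributed example DAG module (hypothetical file name example_dags/example_hive_stats.py) might look like the sketch below; it mirrors the instantiation shown under Access Instructions and adds the module-level docstring that step 2 calls for.

"""
Example DAG for HiveStatsCollectionOperator.

Gathers partition statistics for a sample Hive table once per day and inserts
them into the MySQL hive_stats table described in the documentation.
"""
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.hive.operators.hive_stats import HiveStatsCollectionOperator

with DAG(
    dag_id="example_hive_stats",  # hypothetical DAG id
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    HiveStatsCollectionOperator(
        task_id="gather_partition_stats",
        table="default.sample_table",  # placeholder Hive table
        partition={"ds": "{{ ds }}"},  # one partition per daily run
    )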
