Orchestrating Multiple Azure Data Factory Pipelines in Airflow
This DAG demonstrates orchestrating multiple Azure Data Factory (ADF) pipelines using Airflow to perform classic ELT operations.
Run this DAG
1. Install the Astronomer CLI (skip this step if you already have our CLI installed):
2. Download the repository:
3. Navigate to where the repository was cloned and start the DAG:
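The steps above can be sketched as shell commands. The repository URL and project directory are placeholders for the actual registry repository; `astro dev start` spins up the project locally with the Astro CLI.

```shell
# 1. Install the Astronomer CLI (skip if already installed):
curl -sSL install.astronomer.io | sudo bash -s

# 2. Download the repository (placeholder URL):
git clone <repository-url>

# 3. Navigate into the cloned project and start it locally:
cd <project-directory>
astro dev start
```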
These ADF pipelines extract daily currency exchange rates from an API, persist the data to a data lake in Azure Blob Storage, perform data-quality checks on the staged data, and finally load the data into a daily aggregate table in Azure SQL Database using SCD Type-2 logic. Two ADF pipelines, extractDailyExchangeRates and loadDailyExchangeRates, perform the ELT:
The extractDailyExchangeRates ADF pipeline extracts data from the open Exchange Rate API for the USD and EUR currencies, initially stores the response data in a "landing" container within Azure Blob Storage, copies the extracted data to a "data-lake" container, loads the landed data into a staging table in Azure SQL Database via a T-SQL stored procedure, and finally deletes the landed data file.
The loadDailyExchangeRates ADF pipeline performs a data-quality check on the ingested currency codes against a dimensional reference dataset. If the check passes, another T-SQL stored procedure inserts the data into a daily aggregate table of exchange rates comparing the US Dollar, Euro, Japanese Yen, and Swiss Franc.
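The logic of the loadDailyExchangeRates pipeline can be illustrated in plain Python. This is only a sketch of the data-quality check and the SCD Type-2 insert; the actual implementation lives in T-SQL stored procedures, and the function names, field names, and reference set below are hypothetical.

```python
from datetime import date

# Hypothetical reference dimension of valid currency codes.
REFERENCE_CURRENCIES = {"USD", "EUR", "JPY", "CHF"}

def check_currency_codes(staged_rows):
    """Data-quality check: every staged currency code must exist in the reference dimension."""
    ingested = {row["currency_code"] for row in staged_rows}
    unknown = ingested - REFERENCE_CURRENCIES
    return len(unknown) == 0, unknown

def apply_scd2(history, new_row, effective_date):
    """SCD Type-2 insert: expire the current record for the key, then append the new version."""
    key = new_row["currency_code"]
    for row in history:
        if row["currency_code"] == key and row["is_current"]:
            row["is_current"] = False
            row["end_date"] = effective_date
    history.append({**new_row, "start_date": effective_date, "end_date": None, "is_current": True})
    return history

# Only rows that pass the quality check are loaded into the aggregate table.
staged = [{"currency_code": "EUR", "rate": 0.92}, {"currency_code": "JPY", "rate": 151.2}]
ok, bad_codes = check_currency_codes(staged)
history = []
if ok:
    for row in staged:
        apply_scd2(history, row, date(2024, 1, 2))
```

The key design point of SCD Type-2 is that prior versions of a row are never overwritten: the old record is closed out with an end date while the new record becomes current.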
Note: A custom operator and the AzureDataFactoryPipelineRunStatusSensor were created for this DAG to execute an ADF pipeline and to check the status of the pipeline run as an operational checkpoint -- both soon to be part of the Microsoft Azure provider in Airflow.