Deploying Kedro Pipelines to Apache Airflow
Overview
Kedro is an open-source Python framework for creating reproducible, maintainable, and modular data science code. It borrows concepts from software engineering and applies them to machine learning code.
While Kedro is an excellent option for data engineers and data scientists looking to author their data pipelines and projects with software engineering practices, it extends even further by integrating with Apache Airflow for distributed scheduling and execution of the resulting pipelines.
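If you haven't written Kedro code before, a pipeline is a set of plain Python functions wired together as nodes via named datasets. Here is a minimal sketch; the function and dataset names are illustrative, not taken from any particular starter:

```python
# A minimal Kedro pipeline: two plain functions wired together as nodes.
# All names below are illustrative.
from kedro.pipeline import Pipeline, node


def split_data(raw_data, test_ratio: float):
    """Split raw records into train and test sets."""
    n_test = int(len(raw_data) * test_ratio)
    return raw_data[n_test:], raw_data[:n_test]


def train_model(train_set):
    """Stand-in for model training."""
    return {"trained_on": len(train_set)}


# Inputs and outputs refer to datasets declared in the project's data catalog.
pipeline = Pipeline(
    [
        node(split_data, ["raw_data", "params:test_ratio"], ["train_set", "test_set"]),
        node(train_model, "train_set", "example_model"),
    ]
)
```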
The Plugin
In close partnership with the team at Kedro, we've recently extended the kedro-airflow plugin to deliver a significantly improved developer experience. With this plugin, you can translate your Kedro pipeline into a clean, legible, and well-structured Apache Airflow DAG with one simple command:

```bash
kedro airflow create
```
This makes for a super clean experience for anyone looking to deploy their Kedro pipelines to a distributed scheduler for workflow orchestration.
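To give a feel for what that command produces, here is a trimmed sketch of the kind of DAG file it generates. The real file is templated from your project, so the names and details below are illustrative and will vary by project and plugin version:

```python
# Trimmed, illustrative sketch of a generated DAG: one Airflow task per
# Kedro node, each running its node in its own KedroSession.
from datetime import datetime

from airflow import DAG
from airflow.models import BaseOperator

from kedro.framework.project import configure_project
from kedro.framework.session import KedroSession


class KedroOperator(BaseOperator):
    """Runs a single Kedro node as an Airflow task."""

    def __init__(self, package_name, pipeline_name, node_name, project_path, env, **kwargs):
        super().__init__(**kwargs)
        self.package_name = package_name
        self.pipeline_name = pipeline_name
        self.node_name = node_name
        self.project_path = project_path
        self.env = env

    def execute(self, context):
        configure_project(self.package_name)
        with KedroSession.create(
            self.package_name, self.project_path, env=self.env
        ) as session:
            session.run(self.pipeline_name, node_names=[self.node_name])


def kedro_task(node_name):
    # Hypothetical package name and paths; the generated file fills these in.
    return KedroOperator(
        task_id=node_name,
        package_name="new_kedro_project",
        pipeline_name="__default__",
        node_name=node_name,
        project_path="/usr/local/airflow",
        env="airflow",
    )


with DAG("new-kedro-project", start_date=datetime(2023, 1, 1),
         schedule_interval="@once", catchup=False) as dag:
    split, train, predict = [kedro_task(n) for n in ("split", "train", "predict")]
    split >> train >> predict
```

Because each task runs exactly one node, Airflow handles retries, scheduling, and parallelism at node granularity, while Kedro's data catalog handles the hand-off of data between tasks.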
Prerequisites
To use the plugin, you'll need the following installed on your machine or in a fresh virtual environment:

- Kedro
- The kedro-airflow plugin
- The Astro CLI
- Docker
Try it Out
We've added some additional functionality to the plugin that makes for a great integration with Astronomer. To give it a try, we'll use the astro-airflow-iris starter that's included in the Kedro project. The steps below walk through spinning up a fresh Kedro project and running your pipelines as DAGs in a local Airflow environment.
Create an Astro-Kedro project
- Run `kedro new --starter astro-airflow-iris` to build your starter directory.
- `cd <kedro-project-directory>`
- Run `kedro install` to install the project's dependencies.
- Run `kedro package` to package the project as a Python wheel.
Prepare and run the project in Astro
- Run `cp src/dist/*.whl ./` to copy the packaged wheel into the project root.
- Run `kedro catalog create --pipeline=__default__` to generate a catalog file for the default pipeline.
- Edit your `conf/base/catalog/__default__.yml` and configure the datasets to be persisted. Because each node runs as a separate Airflow task, intermediate datasets can no longer be passed between nodes in memory and need a filepath on disk, e.g.:

  ```yaml
  example_train_x:
    type: pickle.PickleDataSet
    filepath: data/05_model_input/example_train_x.pkl
  example_train_y:
    type: pickle.PickleDataSet
    filepath: data/05_model_input/example_train_y.pkl
  example_test_x:
    type: pickle.PickleDataSet
    filepath: data/05_model_input/example_test_x.pkl
  example_test_y:
    type: pickle.PickleDataSet
    filepath: data/05_model_input/example_test_y.pkl
  example_model:
    type: pickle.PickleDataSet
    filepath: data/06_models/example_model.pkl
  example_predictions:
    type: pickle.PickleDataSet
    filepath: data/07_model_output/example_predictions.pkl
  ```
- Make sure you have the kedro-airflow plugin installed (`pip install kedro-airflow`), then run `kedro airflow create -t dags/` to generate an Airflow DAG from your pipeline into the `dags/` directory.
- Make sure you have the Astro CLI installed and Docker running on your machine, then run `astro dev start` to fire up a local Airflow instance and visualize your DAGs.
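Once the containers are up, the generated DAG should appear in the Airflow UI (the Astro CLI serves it at http://localhost:8080 by default). If you'd like to confirm that the DAG file parses before digging into the UI, a small optional check with Airflow's DagBag works too; this sketch assumes the generated file landed in `dags/`:

```python
# Optional sanity check: parse the dags/ folder outside the scheduler to
# surface import errors early. Assumes `kedro airflow create` wrote the
# DAG file into dags/.
from airflow.models import DagBag

dag_bag = DagBag(dag_folder="dags/", include_examples=False)
assert not dag_bag.import_errors, dag_bag.import_errors
print(sorted(dag_bag.dag_ids))
```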
We're proud to partner with the Kedro team on bringing this plugin into the world, and we look forward to extending it to improve the developer experience even more. Please get in touch if you'd like to talk to us about how you use Kedro and Airflow together!