Deploying Kedro Pipelines to Apache Airflow
Kedro is an open-source Python framework for creating reproducible, maintainable, and modular data science code. It borrows concepts from software engineering and applies them to machine learning code.
While Kedro is an excellent option for data engineers and data scientists who want to author their data pipelines and projects using software engineering practices, it also integrates with Apache Airflow for distributed scheduling and execution of the resulting pipelines.
In close partnership with the team at Kedro, we've recently extended the kedro-airflow plugin to offer a significantly improved developer experience. With this plugin, you can translate your Kedro pipeline into a clean, legible, and well-structured Apache Airflow DAG with one simple command:
kedro airflow create
This makes for a super clean experience for anyone looking to deploy their Kedro pipelines to a distributed scheduler for workflow orchestration.
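Conceptually, the translation is a graph mapping. In the DAG the plugin generates, each Kedro node typically becomes its own Airflow task, and an edge runs from the node that produces a dataset to every node that consumes it. The plain-Python sketch below illustrates that idea; it is not the plugin's code. The four-node pipeline and its node and dataset names are hypothetical, loosely modeled on an iris-style example, and the ordering step uses the standard library's graphlib (Python 3.9+).

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical pipeline: node name -> (input datasets, output datasets).
# Names are illustrative only, loosely modeled on an iris-style example.
NODES = {
    "split": (
        ["example_iris_data", "parameters"],
        ["example_train_x", "example_train_y", "example_test_x", "example_test_y"],
    ),
    "train": (["example_train_x", "example_train_y"], ["example_model"]),
    "predict": (["example_model", "example_test_x"], ["example_predictions"]),
    "report": (["example_predictions", "example_test_y"], []),
}


def task_dependencies(nodes):
    """Map each node to the set of upstream nodes whose outputs it reads."""
    producer = {ds: name for name, (_, outputs) in nodes.items() for ds in outputs}
    # Inputs that no node produces (raw data, parameters) create no edge.
    return {
        name: {producer[ds] for ds in inputs if ds in producer}
        for name, (inputs, _) in nodes.items()
    }


deps = task_dependencies(NODES)

# Print the edges in Airflow's `upstream >> downstream` notation.
for child, parents in deps.items():
    for parent in sorted(parents):
        print(f"{parent} >> {child}")

# One valid order in which a scheduler could run the tasks.
order = list(TopologicalSorter(deps).static_order())
print(order)  # ['split', 'train', 'predict', 'report']
```

In a real project you would not hand-write this mapping. Kedro's pipeline definition already records each node's inputs and outputs, and the plugin works from that definition when it renders the DAG file.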
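To use the plugin, you'll need Kedro and the kedro-airflow plugin installed on your machine or in a fresh virtual environment. To run the resulting DAGs locally with Astronomer, you'll also need the Astro CLI and Docker.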
Try it Out
We've added functionality to the plugin that makes for a great integration with Astronomer. To give it a try, we'll use the astro-airflow-iris starter provided by the Kedro project. The steps below walk through spinning up a fresh Kedro project and running your pipelines as DAGs in a local Airflow environment.
Create an Astro-Kedro project
Run kedro new --starter astro-airflow-iris to build your starter directory.
Prepare and run the project in Astro
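Build the project into a wheel with kedro package, then copy the wheel from src/dist/ into the project root:
cp src/dist/*.whl ./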
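Generate catalog entries for the default pipeline. This writes a MemoryDataSet entry to conf/base/catalog/__default__.yml for each dataset that is missing from your catalog:
kedro catalog create --pipeline=__default__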
Open conf/base/catalog/__default__.yml and configure the datasets to be persisted, e.g.:

example_train_x:
  type: pickle.PickleDataSet
  filepath: data/05_model_input/example_train_x.pkl
example_train_y:
  type: pickle.PickleDataSet
  filepath: data/05_model_input/example_train_y.pkl
example_test_x:
  type: pickle.PickleDataSet
  filepath: data/05_model_input/example_test_x.pkl
example_test_y:
  type: pickle.PickleDataSet
  filepath: data/05_model_input/example_test_y.pkl
example_model:
  type: pickle.PickleDataSet
  filepath: data/06_models/example_model.pkl
example_predictions:
  type: pickle.PickleDataSet
  filepath: data/07_model_output/example_predictions.pkl
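Persisting these intermediate datasets matters because each node runs as a separate Airflow task, typically in its own process. An in-memory dataset produced by one task is therefore not available to the next. A quick sanity check is to list every dataset that one node writes and another reads, then flag any that are missing from the catalog or still typed as MemoryDataSet. The sketch below does this in plain Python. The pipeline is hypothetical and the catalog is assumed to be already parsed into a dict mirroring the entries above; neither is Kedro's own API.

```python
# Hypothetical pipeline: node name -> (inputs, outputs); names are illustrative.
NODES = {
    "split": (
        ["example_iris_data", "parameters"],
        ["example_train_x", "example_train_y", "example_test_x", "example_test_y"],
    ),
    "train": (["example_train_x", "example_train_y"], ["example_model"]),
    "predict": (["example_model", "example_test_x"], ["example_predictions"]),
    "report": (["example_predictions", "example_test_y"], []),
}

# Catalog entries as a dict, mirroring the YAML above (assumed already parsed).
PERSISTED_PATHS = {
    "example_train_x": "data/05_model_input/example_train_x.pkl",
    "example_train_y": "data/05_model_input/example_train_y.pkl",
    "example_test_x": "data/05_model_input/example_test_x.pkl",
    "example_test_y": "data/05_model_input/example_test_y.pkl",
    "example_model": "data/06_models/example_model.pkl",
    "example_predictions": "data/07_model_output/example_predictions.pkl",
}
CATALOG = {
    name: {"type": "pickle.PickleDataSet", "filepath": path}
    for name, path in PERSISTED_PATHS.items()
}


def unpersisted_handoffs(nodes, catalog):
    """List datasets written by one node and read by another that are not persisted."""
    produced = {ds for _, outputs in nodes.values() for ds in outputs}
    consumed = {ds for inputs, _ in nodes.values() for ds in inputs}
    flagged = []
    for ds in sorted(produced & consumed):
        entry = catalog.get(ds)
        # Kedro treats datasets missing from the catalog as in-memory datasets.
        if entry is None or entry.get("type") == "MemoryDataSet":
            flagged.append(ds)
    return flagged


print(unpersisted_handoffs(NODES, CATALOG))  # []

# Dropping one entry shows what the check catches.
incomplete = {k: v for k, v in CATALOG.items() if k != "example_model"}
print(unpersisted_handoffs(NODES, incomplete))  # ['example_model']
```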
- Make sure you have the kedro-airflow plugin installed:
pip install kedro-airflow
Then generate the Airflow DAG into the Astro project's dags/ folder (-t sets the target directory):
kedro airflow create -t dags/
- Make sure you have the Astro CLI installed and Docker running on your machine, then run astro dev start to fire up a local Airflow instance and visualize your DAGs.
We're proud to partner with the Kedro team on bringing this plugin experience into the world and look forward to extending it to improve the developer experience even more. Please get in touch if you'd like to talk to us about how you use Kedro and Airflow together!