
transform
Astro SDK
Execute an explicit SELECT SQL statement. The data returned by this SQL is inserted into a temporary table that can be used by other downstream tasks.
Access Instructions
Install the Astro SDK provider package into your Airflow environment.
Import the module into your DAG file and instantiate it with your desired parameters.
Documentation
With your data in a SQL system, it's time to start transforming it! The @transform
SQL decorator is the transformation layer of your ELT system. Each step of the transform pipeline creates a new table from the SELECT
statement, and tasks can pass those tables downstream as if they were native Python objects.
You will notice that these functions use a custom templating system. Wrapping a value in single brackets (like {customer_table}
) indicates that the value should be rendered as a SQL table. The SQL decorator also treats values in double brackets as Airflow Jinja templates.
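To make the two templating layers concrete, here is a minimal pure-Python sketch of the idea, NOT the SDK's actual implementation; the temporary table name and logical date below are invented for illustration:

```python
# Hypothetical sketch of the two templating layers behind @transform.
query = "SELECT * FROM {customer_table} WHERE created_at <= '{{ ds }}'"

# Layer 1: a single-bracket value is rendered as a SQL table. The
# decorator substitutes the (temporary) table name it manages.
after_tables = query.replace("{customer_table}", "tmp_astro.customer_table_a1b2")

# Layer 2: a double-bracket value is ordinary Airflow Jinja, rendered
# at runtime (simulated here with a hard-coded logical date).
rendered = after_tables.replace("{{ ds }}", "2023-01-01")

print(rendered)
```

In the real SDK these substitutions happen inside the decorator and the Airflow templating engine; the sketch only shows the order in which the two kinds of brackets are resolved.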
Please note that this is NOT an f-string. Formatting SQL with f-strings risks security breaches via SQL injection.
For security, users MUST explicitly identify tables in the function parameters by annotating the value with the Table
type. Only then will the SQL decorator treat the value as a table.
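To see why f-strings are risky here, consider this sketch: when an attacker controls the interpolated value, the statement itself changes. Requiring an explicit Table annotation lets the SDK control how table identifiers are rendered instead; the malicious value below is purely illustrative:

```python
# DON'T do this: an f-string splices untrusted input straight into
# the SQL text, so a crafted value rewrites the statement.
user_supplied = "customers; DROP TABLE customers; --"
unsafe_query = f"SELECT * FROM {user_supplied}"
print(unsafe_query)  # the injected DROP TABLE is now part of the SQL
```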
- Example (the join originally aliased the tables the wrong way around; here the customer table is aliased c, since the customer columns are selected from it):

from astro.sql import transform
from astro.sql.table import Table

@transform
def get_orders():
    ...

@transform
def get_customers():
    ...

@transform
def join_orders_and_customers(orders_table: Table, customer_table: Table):
    """Join `orders_table` and `customer_table` to create a simple 'feature' dataset."""
    return """
        SELECT c.customer_id, c.source, c.region, c.member_since,
            CASE WHEN purchase_count IS NULL THEN 0 ELSE 1 END AS recent_purchase
        FROM {customer_table} c
        LEFT OUTER JOIN {orders_table} p
        ON c.customer_id = p.customer_id
    """

# `dag` is an Airflow DAG defined elsewhere in the file.
with dag:
    orders = get_orders()
    customers = get_customers()
    join_orders_and_customers(orders, customers)