
dataframe
Astro SDK
Convert a Table object into a Pandas DataFrame or persist a DataFrame result to a database table.
Access Instructions
Install the Astro SDK provider package into your Airflow environment.
Import the module into your DAG file and instantiate it with your desired params.
Documentation
Your pipeline might call for procedures that would be too complex or impossible in SQL, such as building a model from a feature set or applying a windowing function that Pandas handles better. The dataframe function moves your data into a Pandas DataFrame and back to your database as needed.
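As one illustration of the kind of step that is often easier to write in Pandas, the sketch below computes a per-group rolling mean. The data, the column names, and the add_rolling_mean helper are hypothetical and are not part of the Astro SDK; the function body is the sort of logic you might place inside a dataframe function.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
sales = pd.DataFrame(
    {
        "store": ["a", "a", "a", "b", "b", "b"],
        "day": [1, 2, 3, 1, 2, 3],
        "revenue": [10.0, 20.0, 30.0, 5.0, 15.0, 25.0],
    }
)


def add_rolling_mean(df: pd.DataFrame, window: int = 2) -> pd.DataFrame:
    """Return a sorted copy of df with a per-store rolling mean of revenue."""
    out = df.sort_values(["store", "day"]).reset_index(drop=True)
    # transform keeps the original row order and index, so the result
    # lines up with `out` row for row.
    out["rolling_revenue"] = out.groupby("store")["revenue"].transform(
        lambda s: s.rolling(window, min_periods=1).mean()
    )
    return out


result = add_rolling_mean(sales)
print(result)
```

With min_periods=1, the first row of each store falls back to its own revenue rather than producing a missing value.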
At runtime, the operator loads any Table object into a Pandas DataFrame. If the task returns a DataFrame, downstream TaskFlow API tasks can interact with it to continue using Python.
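The hand-off between tasks can be pictured with two plain functions. This is a minimal sketch, not the SDK's own mechanism: in a DAG file each function would carry the dataframe decorator used in the example on this page, and Airflow would pass the result between tasks. The decorator is left out here so the snippet runs with Pandas alone, and double_numbers is a hypothetical downstream step.

```python
import pandas as pd

# In a DAG file, each function below would be decorated with `@dataframe`
# (`from astro import dataframe`). The decorator is omitted on purpose so
# that this sketch runs without Airflow installed.


def get_dataframe() -> pd.DataFrame:
    """Upstream step: return a DataFrame (same data as the page's example)."""
    return pd.DataFrame({"numbers": [1, 2, 3], "colors": ["red", "white", "blue"]})


def double_numbers(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical downstream step: keep working on the DataFrame in Python."""
    out = df.copy()  # avoid mutating the upstream result
    out["doubled"] = out["numbers"] * 2
    return out


# Called directly here; in Airflow the downstream task would receive the
# upstream task's return value instead.
result = double_numbers(get_dataframe())
print(result)
```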
If, after running the function, you want to write the returned value to your database, pass a Table to the reserved output_table parameter. Because this parameter is reserved, you cannot use it as a parameter name in your own function definition.
- Example:
    from astro import dataframe
    from astro.sql import transform
    from astro.sql.table import Table
    import pandas as pd

    @dataframe
    def get_dataframe():
        return pd.DataFrame({"numbers": [1, 2, 3], "colors": ["red", "white", "blue"]})

    @transform
    def sample_pg(input_table: Table):
        return "SELECT * FROM {input_table}"

    with self.dag:
        my_df = get_dataframe(
            output_table=Table(
                table_name="my_df_table", conn_id="postgres_conn", database="pagila"
            )
        )
        pg_df = sample_pg(my_df)