GCSListObjectsOperator

Google

List all objects in the bucket whose names match the given string prefix and delimiter.


Last Updated: May. 27, 2021

Access Instructions

Install the Google provider package into your Airflow environment.

Import the module into your DAG file and instantiate it with your desired params.

Parameters

bucket (str, required): The Google Cloud Storage bucket in which to find the objects. (templated)
prefix (str): Prefix string; only objects whose names begin with this prefix are listed. (templated)
delimiter (str): The delimiter by which to filter the objects. (templated) For example, to list the CSV files in a directory in GCS, use delimiter='.csv'.
gcp_conn_id (str): (Optional) The connection ID used to connect to Google Cloud.
google_cloud_storage_conn_id: (Deprecated) The connection ID used to connect to Google Cloud. This parameter has been deprecated; pass the gcp_conn_id parameter instead.
delegate_to (str): The account to impersonate using domain-wide delegation of authority, if any. For this to work, the service account making the request must have domain-wide delegation enabled.
impersonation_chain (Union[str, Sequence[str]]): Optional service account to impersonate using short-term credentials, or a chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, each identity in the list must grant the Service Account Token Creator IAM role to the directly preceding identity, with the first account in the list granting this role to the originating account. (templated)
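
The grant rules described for impersonation_chain can be easier to follow when spelled out. The sketch below is a plain-Python illustration only: it does not call Google Cloud or Airflow, and the function name required_token_creator_grants and the account names are hypothetical. It simply lists which account must grant the Service Account Token Creator role to which, following the wording above.

```python
from typing import List, Sequence, Tuple, Union


def required_token_creator_grants(
    originating_account: str,
    impersonation_chain: Union[str, Sequence[str]],
) -> List[Tuple[str, str]]:
    """Return (grantor, grantee) pairs implied by an impersonation chain.

    Each pair means: `grantor` must grant the Service Account Token
    Creator IAM role to `grantee`. Hypothetical helper for illustration.
    """
    # A plain string is treated as a chain containing a single account.
    if isinstance(impersonation_chain, str):
        chain = [impersonation_chain]
    else:
        chain = list(impersonation_chain)

    grants = []
    previous = originating_account
    for account in chain:
        # Each account grants the role to the identity directly before it;
        # the first account grants it to the originating account.
        grants.append((account, previous))
        previous = account
    return grants


# Hypothetical account names.
for grantor, grantee in required_token_creator_grants(
    "origin@example.iam", ["middle@example.iam", "target@example.iam"]
):
    print(f"{grantor} grants Token Creator to {grantee}")
```

In this model the last account in the chain (target@example.iam here) is the one impersonated in the request.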

Documentation

List all objects in the bucket whose names match the given string prefix and delimiter.

param bucket: The Google Cloud Storage bucket in which to find the objects. (templated)
type bucket: str

param prefix: Prefix string; only objects whose names begin with this prefix are listed. (templated)
type prefix: str

param delimiter: The delimiter by which to filter the objects. (templated) For example, to list the CSV files in a directory in GCS, use delimiter='.csv'.
type delimiter: str

param gcp_conn_id: (Optional) The connection ID used to connect to Google Cloud.
type gcp_conn_id: str

param google_cloud_storage_conn_id: (Deprecated) The connection ID used to connect to Google Cloud. This parameter has been deprecated; pass the gcp_conn_id parameter instead.

param delegate_to: The account to impersonate using domain-wide delegation of authority, if any. For this to work, the service account making the request must have domain-wide delegation enabled.
type delegate_to: str

param impersonation_chain: Optional service account to impersonate using short-term credentials, or a chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, each identity in the list must grant the Service Account Token Creator IAM role to the directly preceding identity, with the first account in the list granting this role to the originating account. (templated)
type impersonation_chain: Union[str, Sequence[str]]

Example:

The following operator lists all the Avro files in the sales/sales-2017 folder of the data bucket.

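    from airflow.providers.google.cloud.operators.gcs import GCSListObjectsOperator

    # google_cloud_conn_id is assumed to be defined elsewhere in the DAG file.
    GCS_Files = GCSListObjectsOperator(
        task_id='GCS_Files',
        bucket='data',
        prefix='sales/sales-2017/',
        delimiter='.avro',
        gcp_conn_id=google_cloud_conn_id,
    )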
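
To make the effect of prefix and delimiter in the example concrete, here is a simplified local model in plain Python. It is a sketch under stated assumptions, not the operator's implementation: it treats the delimiter as a suffix filter, as in the '.avro' and '.csv' examples on this page, and it does not contact GCS, so real listing behavior may differ in edge cases. The helper filter_object_names and the object names are hypothetical.

```python
from typing import Iterable, List, Optional


def filter_object_names(
    names: Iterable[str],
    prefix: Optional[str] = None,
    delimiter: Optional[str] = None,
) -> List[str]:
    """Keep names that start with `prefix` and end with `delimiter`.

    Simplified stand-in for the listing filter; hypothetical helper.
    """
    selected = []
    for name in names:
        # Skip names outside the requested "folder".
        if prefix and not name.startswith(prefix):
            continue
        # Skip names that do not end with the delimiter (e.g. '.avro').
        if delimiter and not name.endswith(delimiter):
            continue
        selected.append(name)
    return selected


# Hypothetical object names in the "data" bucket.
objects = [
    "sales/sales-2017/january.avro",
    "sales/sales-2017/february.avro",
    "sales/sales-2017/notes.txt",
    "sales/sales-2018/january.avro",
]

avro_2017 = filter_object_names(
    objects, prefix="sales/sales-2017/", delimiter=".avro"
)
print(avro_2017)
# ['sales/sales-2017/january.avro', 'sales/sales-2017/february.avro']
```

Only the two 2017 Avro files survive both filters; the text file fails the delimiter check and the 2018 file fails the prefix check.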
