CloudDataCatalogSearchCatalogOperator

Google

Searches Data Catalog for multiple resources, such as entries and tags, that match a query.

Last Updated: Mar. 22, 2021

Access Instructions

Install the Google provider package into your Airflow environment.

Import the operator into your DAG file and instantiate it with the parameters you need.
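The two steps above can be sketched as follows. This is a minimal illustration, not a definitive setup: the project ID, DAG ID, task ID, and query are hypothetical placeholders, and the import path assumes the `airflow.providers.google.cloud.operators.datacatalog` module of the Google provider package. The Airflow imports sit inside `build_dag()` so that the keyword arguments can be inspected without Airflow installed.

```python
from datetime import datetime

# Hypothetical value for illustration; replace with your own project.
PROJECT_ID = "my-project"


def search_catalog_kwargs(project_id: str) -> dict:
    """Build keyword arguments for CloudDataCatalogSearchCatalogOperator."""
    return {
        "task_id": "search_catalog",  # hypothetical task name
        # Dict form of the Scope message: restrict the search to one project.
        "scope": {"include_project_ids": [project_id]},
        # Qualified query; the token has at least 3 characters.
        "query": "name:orders",
        "page_size": 100,
        "order_by": "relevance",
        "gcp_conn_id": "google_cloud_default",
    }


def build_dag():
    """Create a DAG containing a single search task."""
    # Imported lazily so the helper above works without Airflow installed.
    from airflow import DAG
    from airflow.providers.google.cloud.operators.datacatalog import (
        CloudDataCatalogSearchCatalogOperator,
    )

    with DAG(
        dag_id="example_datacatalog_search",  # hypothetical DAG name
        start_date=datetime(2021, 1, 1),
        catchup=False,
    ) as dag:
        CloudDataCatalogSearchCatalogOperator(**search_catalog_kwargs(PROJECT_ID))
    return dag


# In a real DAG file, assign the result at module level so Airflow discovers it:
# dag = build_dag()
```

Keeping the arguments in a plain function is only a convenience for this sketch; passing them directly to the operator works the same way.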

Parameters

scope (Union[Dict, google.cloud.datacatalog_v1beta1.types.SearchCatalogRequest.Scope]): Required. The scope of this search request. If a dict is provided, it must be of the same form as the protobuf message google.cloud.datacatalog_v1beta1.types.Scope.
query (str): Required. The query string in search query syntax. The query must be non-empty. Query strings can be as simple as "x" or more qualified, such as name:x, column:x, or description:y. Note: Query tokens need to have a minimum of 3 characters for substring matching to work correctly. See Data Catalog Search Syntax for more information.
page_size (int): The maximum number of resources contained in the underlying API response. If page streaming is performed per-resource, this parameter does not affect the return value. If page streaming is performed per-page, this determines the maximum number of resources in a page.
order_by (str): Specifies the ordering of results. The currently supported case-sensitive choices are: relevance (only supports descending); last_access_timestamp [asc|desc] (defaults to descending if not specified); last_modified_timestamp [asc|desc] (defaults to descending if not specified). If order_by is not specified, it defaults to relevance descending.
retry (google.api_core.retry.Retry): A retry object used to retry requests. If None is specified, requests will be retried using a default configuration.
timeout (float): The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.
metadata (Sequence[Tuple[str, str]]): Additional metadata that is provided to the method.
gcp_conn_id (str): Optional. The connection ID used to connect to Google Cloud. Defaults to 'google_cloud_default'.
impersonation_chain (Union[str, Sequence[str]]): Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant the Service Account Token Creator IAM role to the directly preceding identity, with the first account from the list granting this role to the originating account (templated).
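Because the order_by choices are case-sensitive, it can help to check the value before passing it to the operator. The helper below is a hypothetical sketch based only on the choices listed above; it is not part of the provider package. It is deliberately conservative: it accepts only the bare "relevance" form, since this page does not say whether an explicit direction is allowed for relevance.

```python
from typing import Optional

# Choices taken from the order_by description above (case-sensitive).
_ALLOWED_ORDER_BY = {
    "relevance",
    "last_access_timestamp",
    "last_access_timestamp asc",
    "last_access_timestamp desc",
    "last_modified_timestamp",
    "last_modified_timestamp asc",
    "last_modified_timestamp desc",
}


def check_order_by(order_by: Optional[str]) -> Optional[str]:
    """Return a whitespace-normalized order_by value, or raise ValueError.

    None is passed through unchanged, which leaves the ordering
    at its default (relevance descending).
    """
    if order_by is None:
        return None
    # Collapse repeated whitespace but keep the case as given.
    normalized = " ".join(order_by.split())
    if normalized not in _ALLOWED_ORDER_BY:
        raise ValueError(f"Unsupported order_by value: {order_by!r}")
    return normalized
```

For example, `check_order_by("last_modified_timestamp  asc")` returns the value with single spacing, while `check_order_by("Relevance")` raises `ValueError` because of the capital letter.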

Documentation

Searches Data Catalog for multiple resources, such as entries and tags, that match a query.

The search does not return complete resources, only each resource's identifier and high-level fields. Clients can subsequently call Get methods to retrieve the full resource.

Note that searches do not have full recall. There may be results that match your query but are not returned, even in subsequent pages of results. These missing results may vary across repeated calls to search. Do not rely on this method if you need to guarantee full recall.
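Since a search result carries only an identifier and a few high-level fields, a downstream step usually needs to pull out the identifiers before calling a Get method. The sketch below assumes the operator's results are available as a list of dicts, for example through XCom, and that each dict carries the relative resource name under either a snake_case or camelCase key depending on the provider version. Both the helper and the sample data are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def relative_resource_names(results: Iterable[Dict[str, Any]]) -> List[str]:
    """Collect the relative resource name from each search result dict.

    Both key spellings are checked because the serialization (and so the
    key casing) is an assumption that may differ between versions.
    """
    names = []
    for result in results:
        name = result.get("relative_resource_name") or result.get(
            "relativeResourceName"
        )
        if name:  # Skip results without an identifier.
            names.append(name)
    return names


# Hypothetical sample data shaped like search results.
sample_results = [
    {
        "search_result_type": "ENTRY",
        "relative_resource_name": "projects/p/locations/us/entryGroups/g/entries/e1",
    },
    {"relativeResourceName": "projects/p/locations/us/entryGroups/g/entries/e2"},
    {"search_result_type": "TAG_TEMPLATE"},
]

names = relative_resource_names(sample_results)
```

Each name collected this way can then be passed to a follow-up task that fetches the full resource. Because searches do not guarantee full recall, the list should not be treated as exhaustive.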

See also

For more information on how to use this operator, take a look at the guide: Search resources
