DataprocCreateClusterOperator

Google

Create a new cluster on Google Cloud Dataproc. The operator waits until the creation succeeds or an error occurs in the creation process. If the cluster already exists and use_if_exists is True, the operator handles it according to the cluster's current state, as described under Documentation.


Last Updated: Jul. 19, 2021

Access Instructions

Install the Google provider package into your Airflow environment.

Import the module into your DAG file and instantiate it with your desired params.
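As a rough sketch of these two steps: the provider package is normally installed with `pip install apache-airflow-providers-google`, and the operator is imported from `airflow.providers.google.cloud.operators.dataproc`. The project ID, cluster name, region, label, machine type, and disk values below are placeholders chosen for illustration, and the helper names `build_cluster_config` and `make_create_cluster_task` are hypothetical, not part of the provider. The Airflow import is kept inside the task factory so the config helper can be run on its own.

```python
from typing import Any, Dict


def _instance_group(num_instances: int, machine_type: str) -> Dict[str, Any]:
    """Build one instance-group entry in the dict form of ClusterConfig."""
    return {
        "num_instances": num_instances,
        "machine_type_uri": machine_type,
        # Placeholder disk settings; adjust to your workload.
        "disk_config": {"boot_disk_type": "pd-standard", "boot_disk_size_gb": 512},
    }


def build_cluster_config(num_workers: int = 2,
                         machine_type: str = "n1-standard-4") -> Dict[str, Any]:
    """Return a cluster_config dict shaped like the ClusterConfig protobuf message."""
    return {
        "master_config": _instance_group(1, machine_type),
        "worker_config": _instance_group(num_workers, machine_type),
    }


def make_create_cluster_task():
    """Instantiate the operator; call this inside a DAG definition."""
    # Imported here so the helpers above work without Airflow installed.
    from airflow.providers.google.cloud.operators.dataproc import (
        DataprocCreateClusterOperator,
    )

    return DataprocCreateClusterOperator(
        task_id="create_cluster",        # hypothetical task id
        project_id="my-project",         # placeholder project
        cluster_name="example-cluster",  # placeholder cluster name
        region="europe-west1",           # placeholder region
        cluster_config=build_cluster_config(),
        labels={"env": "dev"},           # placeholder label
        delete_on_error=True,
        use_if_exists=True,
    )
```

Calling `make_create_cluster_task()` inside a `with DAG(...)` block attaches the task to that DAG. Keeping the config in a separate function makes it easy to reuse for several clusters.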

Parameters

project_id (str): The ID of the Google Cloud project in which to create the cluster. (templated)
cluster_name (str, required): Name of the cluster to create.
labels (Dict[str, str]): Labels that will be assigned to the created cluster.
cluster_config (Union[Dict, google.cloud.dataproc_v1.types.ClusterConfig]): Required. The cluster config to create. If a dict is provided, it must be of the same form as the protobuf message ClusterConfig.
region (str): The region in which the Dataproc cluster is created.
delete_on_error (bool): If true, the cluster will be deleted if it is created with ERROR state. Default value is true.
use_if_exists (bool): If true, use the existing cluster.
request_id (str): Optional. A unique ID used to identify the request. If the server receives two CreateClusterRequest requests with the same ID, then the second request will be ignored and the first google.longrunning.Operation created and stored in the backend is returned.
retry (google.api_core.retry.Retry): A retry object used to retry requests. If None is specified, requests will not be retried.
timeout (float): The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.
metadata (Sequence[Tuple[str, str]]): Additional metadata that is provided to the method.
gcp_conn_id (str): The connection ID to use when connecting to Google Cloud.
impersonation_chain (Union[str, Sequence[str]]): Optional service account to impersonate using short-term credentials, or a chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, each identity in the list must grant the Service Account Token Creator IAM role to the directly preceding identity, with the first account in the list granting this role to the originating account. (templated)

Documentation

Create a new cluster on Google Cloud Dataproc. The operator will wait until the creation is successful or an error occurs in the creation process. If the cluster already exists and use_if_exists is True then the operator will:

  • if the cluster state is ERROR, delete the cluster if delete_on_error is set, then raise an error

  • if the cluster state is CREATING, wait for creation to finish, then check for ERROR state

  • if the cluster state is DELETING, wait for deletion to finish, then create a new cluster
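The three cases above can be sketched as a small decision function. This illustrates the documented behavior only and is not the operator's actual implementation. The `ClusterState` enum, the `plan_for_existing_cluster` name, and the step strings are hypothetical, and the fallback of reusing a cluster in any other state is an assumption based on the use_if_exists description.

```python
from enum import Enum
from typing import List


class ClusterState(Enum):
    """Hypothetical subset of cluster states, for illustration only."""
    RUNNING = "RUNNING"
    ERROR = "ERROR"
    CREATING = "CREATING"
    DELETING = "DELETING"


def plan_for_existing_cluster(state: ClusterState,
                              delete_on_error: bool = True) -> List[str]:
    """Return the ordered steps taken when the cluster already exists
    and use_if_exists is True."""
    if state is ClusterState.ERROR:
        steps = []
        if delete_on_error:
            steps.append("delete cluster")
        steps.append("raise error")
        return steps
    if state is ClusterState.CREATING:
        return ["wait for creation", "check for ERROR state"]
    if state is ClusterState.DELETING:
        return ["wait for deletion", "create new cluster"]
    # Assumption: any other state means the existing cluster is used as-is.
    return ["use existing cluster"]


if __name__ == "__main__":
    for state in ClusterState:
        print(state.value, "->", plan_for_existing_cluster(state))
```

Returning a list of step names keeps the sketch free of any Google Cloud calls, so the branching can be read and tested in isolation.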

Please refer to https://cloud.google.com/dataproc/docs/reference/rest/v1/projects.regions.clusters for a detailed explanation of the different parameters. Most of the configuration parameters detailed at that link are available as parameters to this operator.

See also

For more information on how to use this operator, take a look at the guide: Create a Cluster
