DataprocCreateClusterOperator

Google

Create a new cluster on Google Cloud Dataproc. The operator waits until the creation succeeds or an error occurs in the creation process. If the cluster already exists and use_if_exists is True, the operator handles the existing cluster according to its state, as described under Documentation.

Last Updated: May. 7, 2021

Access Instructions

Install the Google provider package (apache-airflow-providers-google) into your Airflow environment.

Import the module into your DAG file and instantiate it with your desired params.
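As a minimal sketch of the step above, the snippet below defines a cluster config dict and a helper that instantiates the operator. The project ID, region, cluster name, machine types, and the helper name build_create_cluster_task are all placeholder assumptions, not values from this page. The import sits inside the helper so the config can be inspected even where Airflow is not installed.

```python
# Cluster config as a plain dict, mirroring the ClusterConfig protobuf message.
# All values here are illustrative placeholders.
CLUSTER_CONFIG = {
    "master_config": {
        "num_instances": 1,
        "machine_type_uri": "n1-standard-4",
        "disk_config": {"boot_disk_type": "pd-standard", "boot_disk_size_gb": 100},
    },
    "worker_config": {
        "num_instances": 2,
        "machine_type_uri": "n1-standard-4",
        "disk_config": {"boot_disk_type": "pd-standard", "boot_disk_size_gb": 100},
    },
}


def build_create_cluster_task(dag=None):
    """Hypothetical helper: return a configured DataprocCreateClusterOperator."""
    # Imported lazily; requires the Google provider package to be installed.
    from airflow.providers.google.cloud.operators.dataproc import (
        DataprocCreateClusterOperator,
    )

    return DataprocCreateClusterOperator(
        task_id="create_cluster",
        project_id="my-gcp-project",   # placeholder project ID
        cluster_name="example-cluster",  # placeholder cluster name
        region="europe-west1",         # placeholder region
        cluster_config=CLUSTER_CONFIG,
        dag=dag,
    )
```

Inside a DAG file, build_create_cluster_task(dag) would typically be called once while the DAG is being defined.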

Parameters

project_id (str): The ID of the Google Cloud project in which to create the cluster. (templated)
cluster_name (str): Name of the cluster to create.
labels (Dict[str, str]): Labels that will be assigned to the created cluster.
cluster_config (Union[Dict, google.cloud.dataproc_v1.types.ClusterConfig]): Required. The cluster config to create. If a dict is provided, it must be of the same form as the protobuf message google.cloud.dataproc_v1.types.ClusterConfig.
region (str): The region where the Dataproc cluster is created.
request_id (str): Optional. A unique ID used to identify the request. If the server receives two CreateClusterRequest requests with the same ID, the second request is ignored and the first google.longrunning.Operation created and stored in the backend is returned.
retry (google.api_core.retry.Retry): A retry object used to retry requests. If None is specified, requests will not be retried.
timeout (float): The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.
metadata (Sequence[Tuple[str, str]]): Additional metadata that is provided to the method.
gcp_conn_id (str): The connection ID to use when connecting to Google Cloud.
impersonation_chain (Union[str, Sequence[str]]): Optional service account to impersonate using short-term credentials, or a chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, each identity in the list must grant the Service Account Token Creator IAM role to the directly preceding identity, with the first account in the list granting this role to the originating account. (templated)
delete_on_error (bool): If True, the cluster is deleted when it ends up in the ERROR state.
use_if_exists (bool): If True, use the existing cluster when one with the same name already exists.
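The optional parameters can be collected in a dict and unpacked into the operator call. The sketch below is only an illustration: the label values, timeout, and service account addresses are hypothetical, and OPTIONAL_KWARGS is a name chosen for this example.

```python
import uuid

# Illustrative optional arguments for DataprocCreateClusterOperator.
# Every value below is a placeholder, not a recommendation.
OPTIONAL_KWARGS = {
    "labels": {"team": "data-eng", "env": "dev"},
    # A unique ID so the server can recognize a repeated request.
    "request_id": str(uuid.uuid4()),
    # Seconds to wait per attempt (per attempt because a retry may be set).
    "timeout": 300.0,
    # Chained impersonation: each account must grant the Service Account
    # Token Creator role to the identity directly before it in the list.
    "impersonation_chain": [
        "first-sa@my-gcp-project.iam.gserviceaccount.com",   # hypothetical
        "second-sa@my-gcp-project.iam.gserviceaccount.com",  # hypothetical
    ],
    "delete_on_error": True,
    "use_if_exists": True,
}
```

These would be passed as DataprocCreateClusterOperator(..., **OPTIONAL_KWARGS) alongside the required arguments. Note that a request_id generated at module level changes every time the DAG file is parsed.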

Documentation

Create a new cluster on Google Cloud Dataproc. The operator will wait until the creation is successful or an error occurs in the creation process. If the cluster already exists and use_if_exists is True then the operator will:

  • if the cluster state is ERROR, delete the cluster (when deletion on error is requested) and raise an error

  • if the cluster state is CREATING, wait for creation to finish and then check for the ERROR state

  • if the cluster state is DELETING, wait for deletion to finish and then create a new cluster
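The three cases above can be sketched as a small decision function. This is only an illustration of the documented behavior, not the operator's actual source: the ClusterState enum, the plan_for_existing_cluster name, and the step strings are hypothetical, and the fallback for any other state (reuse the cluster as is) is an assumption.

```python
from enum import Enum
from typing import List


class ClusterState(Enum):
    """Hypothetical subset of cluster states used for this sketch."""
    RUNNING = "RUNNING"
    ERROR = "ERROR"
    CREATING = "CREATING"
    DELETING = "DELETING"


def plan_for_existing_cluster(
    state: ClusterState, delete_on_error: bool = True
) -> List[str]:
    """Return the ordered steps taken when the cluster already exists."""
    if state is ClusterState.ERROR:
        # Optionally clean up the broken cluster, then fail the task.
        steps = ["delete cluster"] if delete_on_error else []
        return steps + ["raise error"]
    if state is ClusterState.CREATING:
        return ["wait for creation", "check for ERROR state"]
    if state is ClusterState.DELETING:
        return ["wait for deletion", "create new cluster"]
    # Assumption: any other state means the cluster can be reused directly.
    return ["use existing cluster"]
```

For example, plan_for_existing_cluster(ClusterState.ERROR, delete_on_error=False) yields only the "raise error" step, since no cleanup was requested.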

Please refer to https://cloud.google.com/dataproc/docs/reference/rest/v1/projects.regions.clusters for a detailed explanation of the different parameters. Most of the configuration parameters detailed at that link are available as parameters to this operator.

See also

For more information on how to use this operator, take a look at the guide: Create a Cluster