01 Run the dataproc clusters describe command (Windows/macOS/Linux) using the name of the Google Cloud Dataproc cluster that you want to re-create as the identifier parameter to describe the configuration information available for the selected cluster:
gcloud dataproc clusters describe tm-prod-dataproc-cluster \
  --region=us-central1 \
  --format=json
02 The command output should return the requested configuration information:
{
  "clusterName": "tm-prod-dataproc-cluster",
  "config": {
    "configBucket": "dataproc-staging-us-central1-123456789012-abcdabcd",
    "masterConfig": {
      "diskConfig": {
        "bootDiskSizeGb": 500,
        "bootDiskType": "pd-standard"
      },
      "machineTypeUri": "https://www.googleapis.com/compute/v1/projects/cc-bigdata-project-123123/zones/us-central1-a/machineTypes/n1-standard-4",
      "minCpuPlatform": "AUTOMATIC"
    },
    ...
    "tempBucket": "dataproc-temp-us-central1-123456789012-abcdabcd"
  },
  "projectId": "cc-bigdata-project-123123",
  "status": {
    "state": "RUNNING",
    "stateStartTime": "2024-03-04T08:20:00.000Z"
  },
  "statusHistory": [
    {
      "state": "CREATING",
      "stateStartTime": "2024-03-04T08:20:00.000Z"
    }
  ]
}
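If you only need specific attributes for configuring the new cluster, you can use a --format projection instead of parsing the full JSON output. For example, the following command returns just the master node machine type shown above:
gcloud dataproc clusters describe tm-prod-dataproc-cluster \
  --region=us-central1 \
  --format="value(config.masterConfig.machineTypeUri)"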
03 Run the dataproc clusters create command (Windows/macOS/Linux) with the configuration information returned at the previous step as the configuration data for the new cluster, to create a new Google Cloud Dataproc cluster. Use the --no-address parameter together with the --network flag to create a Dataproc cluster that uses the subnetwork with the same name as the specified network, in the region where the cluster is created. With --no-address, all cluster instances receive private, internal IP addresses only (this requires Private Google Access to be enabled on that subnetwork):
gcloud dataproc clusters create tm-new-dataproc-cluster \
  --project=cc-bigdata-project-123123 \
  --region=us-central1 \
  --single-node \
  --master-machine-type=n1-standard-4 \
  --master-boot-disk-size=500GB \
  --master-boot-disk-type=pd-standard \
  --network=default \
  --no-address
04 The command output should return the information (resource URL and zone placement) available for the new Dataproc cluster:
Waiting for cluster creation operation...done.
Created [https://dataproc.googleapis.com/v1/projects/cc-bigdata-project-123123/regions/us-central1/clusters/tm-new-dataproc-cluster] Cluster placed in zone [us-central1-c].
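Alternatively, instead of re-typing the configuration flags by hand, the gcloud CLI can export the source cluster configuration to a YAML file and create the new cluster from it. A minimal sketch (the file name is illustrative):
gcloud dataproc clusters export tm-prod-dataproc-cluster \
  --region=us-central1 \
  --destination=tm-prod-dataproc-cluster.yaml

gcloud dataproc clusters import tm-new-dataproc-cluster \
  --region=us-central1 \
  --source=tm-prod-dataproc-cluster.yaml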
05 If required, migrate the source cluster data to the newly created (target) cluster.
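For cluster data stored in Cloud Storage, a recursive gsutil copy is usually sufficient; for data on the source cluster's HDFS, one common pattern is to stage it in Cloud Storage and then pull it into the new cluster with a Hadoop DistCp job. A minimal sketch, assuming illustrative bucket names and paths:
gsutil -m cp -r gs://tm-prod-dataproc-data gs://tm-new-dataproc-data

gcloud dataproc jobs submit hadoop \
  --cluster=tm-new-dataproc-cluster \
  --region=us-central1 \
  --class=org.apache.hadoop.tools.DistCp \
  -- gs://tm-new-dataproc-data/path hdfs:///data/path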
06 Update your application to reference the new Google Cloud Dataproc cluster.
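For example, if your application submits its jobs through the gcloud CLI, updating the reference can be as simple as pointing the --cluster flag at the new cluster (the job file below is illustrative):
gcloud dataproc jobs submit pyspark gs://tm-new-dataproc-data/jobs/etl_job.py \
  --cluster=tm-new-dataproc-cluster \
  --region=us-central1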
07 Once the new cluster is operating successfully, you can remove the source cluster to stop incurring charges on your Google Cloud bill. Run the dataproc clusters delete command (Windows/macOS/Linux) using the name of the resource that you want to remove as the identifier parameter, to delete the specified Dataproc cluster:
gcloud dataproc clusters delete tm-prod-dataproc-cluster --region=us-central1
08 Type Y and press Enter to confirm the resource removal. All the cluster disks will be permanently deleted, so make sure that your data has been successfully migrated to the new cluster before removal:
The cluster 'tm-prod-dataproc-cluster' and all attached disks will be deleted.
Do you want to continue (Y/n)? Y
09 The command output should return the status of the dataproc clusters delete request:
Waiting for cluster deletion operation...done.
Deleted [https://dataproc.googleapis.com/v1/projects/cc-bigdata-project-123123/regions/us-central1/clusters/tm-prod-dataproc-cluster].
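If you are scripting this removal, the confirmation prompt described at step no. 8 can be suppressed with the global --quiet flag; use it with care, as the deletion starts immediately:
gcloud dataproc clusters delete tm-prod-dataproc-cluster --region=us-central1 --quiet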
10 Repeat steps no. 1 – 9 for each Dataproc cluster that you want to re-create, available in the selected GCP project.
11 Repeat steps no. 1 – 10 for each GCP project deployed in your Google Cloud account.
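To enumerate the resources that steps no. 10 and 11 iterate over, you can list the Dataproc clusters deployed in a given region and the projects available in your account:
gcloud dataproc clusters list --region=us-central1 --format="value(clusterName)"
gcloud projects list --format="value(projectId)"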