Enable Instance Group Autohealing

Risk Level: High (not acceptable risk)

Ensure that your Google Cloud Managed Instance Groups (MIGs) are configured with the autohealing feature. Autohealing re-creates virtual machine instances when the applications running on them become unresponsive.

Reliability

Managed Instance Groups (MIGs) maintain high availability of your cloud applications by proactively keeping your VM instances available, that is, in the "RUNNING" state. A MIG automatically re-creates an instance that is not in the "RUNNING" state. However, relying only on the VM state may not be sufficient: an application can freeze, crash, or run out of memory while its VM is still "RUNNING", and in that case the instance needs to be re-created as well. Application-based autohealing improves application availability by relying on a health checking signal that detects application-specific issues such as freezing, crashing, or overloading. If a health check determines that your cloud application has failed on a virtual machine within the instance group, the group automatically re-creates that VM instance.
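To make the difference between VM-state repair and application-based autohealing concrete, the following Python sketch models the repair decision described above. It is a deliberately simplified, hypothetical model: InstanceView and should_recreate are illustrative names, not a Google Cloud API, and the defaults (an unhealthy threshold of 3 and an initial delay of 300 seconds) are assumptions borrowed from the examples in this rule. Compute Engine's real repair logic is more involved.

```python
from dataclasses import dataclass


@dataclass
class InstanceView:
    """Hypothetical snapshot of one VM as a MIG might see it (illustrative only)."""
    status: str                # Compute Engine status, e.g. "RUNNING" or "TERMINATED"
    seconds_since_start: int   # time since the VM was (re)created
    consecutive_failures: int  # consecutive failed health check probes


def should_recreate(vm: InstanceView, autohealing: bool,
                    unhealthy_threshold: int = 3, initial_delay: int = 300) -> bool:
    """Simplified model of when a MIG re-creates an instance."""
    # Every MIG repairs instances that are no longer RUNNING.
    if vm.status != "RUNNING":
        return True
    # Without autohealing, a RUNNING but frozen application goes unnoticed.
    if not autohealing:
        return False
    # Health check results are ignored while the application is still booting.
    if vm.seconds_since_start < initial_delay:
        return False
    # Application-level failure: enough consecutive failed probes.
    return vm.consecutive_failures >= unhealthy_threshold
```

For example, a VM whose web server has hung for several probe cycles (InstanceView("RUNNING", 3600, 5)) is only re-created when autohealing is enabled.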

Audit

To determine if all your Managed Instance Groups (MIGs) are using autohealing, perform the following operations:

Using GCP Console

01 Sign in to the Google Cloud Management Console.

02 Select the Google Cloud Platform (GCP) project that you want to examine from the console top navigation bar.

03 Navigate to the Google Compute Engine dashboard at https://console.cloud.google.com/compute.

04 In the navigation panel, select Instance groups to access the list with the VM instance groups created for the selected project.

05 Click inside the Filter resources box, select Group type and google.internal.cloud.console.clientapi.gce.mig.instancegrouptype.managed to list all the Managed Instance Groups (MIGs) available within the selected project.

06 Click on the name of the MIG that you want to examine.

07 Select the Details tab to access the resource configuration details.

08 On the Details panel, check for any health check entries displayed under Autohealing. If no health check is listed under Autohealing, the selected Google Cloud Managed Instance Group (MIG) is not using autohealing, and the Compute Engine service will re-create the group instances only when they stop running.

09 Repeat steps no. 6 – 8 for each MIG resource provisioned for the selected project.

10 Repeat steps no. 2 – 9 for each project deployed in your Google Cloud account.

Using GCP CLI

01 Run projects list command (Windows/macOS/Linux) with a custom output format to list the IDs of all the Google Cloud Platform (GCP) projects available in your Google Cloud account:

gcloud projects list --format="table(projectId)"

02 The command output should return the requested GCP project IDs:

PROJECT_ID
cc-web-stack-project-123123
cc-app-stack-project-112233
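If you prefer to script the audit, the project list can be consumed as JSON (--format=json) instead of a table. A minimal Python sketch, assuming the gcloud CLI is installed and authenticated; parse_project_ids and list_project_ids are hypothetical helper names:

```python
import json
import subprocess
from typing import List


def parse_project_ids(projects_json: str) -> List[str]:
    """Extract project IDs from `gcloud projects list --format=json` output."""
    return [project["projectId"] for project in json.loads(projects_json)]


def list_project_ids() -> List[str]:
    """Run gcloud (installed and authenticated) and return the project IDs."""
    output = subprocess.run(
        ["gcloud", "projects", "list", "--format=json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_project_ids(output)
```

Keeping the parsing separate from the subprocess call lets you test it against saved command output.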

03 Run compute instance-groups managed list command (Windows/macOS/Linux) with the ID of the GCP project that you want to examine as the --project parameter and a custom output format to list the name and zone of each Managed Instance Group (MIG) created for the selected project:

gcloud compute instance-groups managed list --project cc-web-stack-project-123123 --format="table(name,location)"

04 The command output should return the name(s) of the MIG(s) available in the selected GCP project:

NAME                           ZONE
cc-production-instance-group   us-central1-a
cc-internal-vm-instance-group  us-central1-a
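When scripting this step, the JSON form of the list output (--format=json) reports each MIG's location as a URL in either a zone field (zonal MIGs) or a region field (regional MIGs); for regional MIGs the describe and update commands take --region instead of --zone. The sketch below, using the hypothetical helper parse_migs, extracts the name, scope, and location so later commands can pass the right flag:

```python
import json
from typing import List, Tuple


def parse_migs(migs_json: str) -> List[Tuple[str, str, str]]:
    """Return (name, scope, location) for each MIG in the JSON output of
    `gcloud compute instance-groups managed list --format=json`.

    Zonal MIGs carry a `zone` URL and regional MIGs a `region` URL; the
    location is the last path segment of that URL.
    """
    migs = []
    for mig in json.loads(migs_json):
        scope = "zone" if "zone" in mig else "region"
        location = mig[scope].rsplit("/", 1)[-1]
        migs.append((mig["name"], scope, location))
    return migs
```

Each tuple can then be turned into a flag such as f"--{scope}={location}".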

05 Run compute instance-groups managed describe command (Windows/macOS/Linux) with the name and zone of the Managed Instance Group that you want to examine as identifier parameters and a custom output format to return the URL of the health check resource configured for autohealing:

gcloud compute instance-groups managed describe cc-production-instance-group --zone=us-central1-a --format="yaml(autoHealingPolicies[].healthCheck)"

06 The command output should return the URL of the configured health check, or null if none is configured:

null

If the compute instance-groups managed describe command output returns null, there is no health check configured for autohealing, therefore the selected Google Cloud Managed Instance Group (MIG) is not using autohealing for its virtual machine instances.
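The same check can be expressed in Python against the JSON form of the describe output (--format=json), where autohealing is recorded in the autoHealingPolicies list of the instance group manager resource. A minimal sketch; autohealing_health_checks and is_compliant are hypothetical helper names:

```python
import json
from typing import List


def autohealing_health_checks(describe_json: str) -> List[str]:
    """Health check URLs attached for autohealing, taken from the output of
    `gcloud compute instance-groups managed describe NAME --zone=ZONE --format=json`."""
    mig = json.loads(describe_json)
    # The field is absent when autohealing has never been configured.
    policies = mig.get("autoHealingPolicies") or []
    return [policy["healthCheck"] for policy in policies if policy.get("healthCheck")]


def is_compliant(describe_json: str) -> bool:
    """The rule passes when at least one autohealing health check is attached."""
    return bool(autohealing_health_checks(describe_json))
```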

07 Repeat steps no. 5 and 6 for each MIG resource available within the selected project.

08 Repeat steps no. 3 – 7 for each GCP project deployed in your Google Cloud account.
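The CLI audit steps can be chained into a single script. The following Python sketch is one way to do it, assuming an authenticated gcloud CLI and the JSON field names of the Compute Engine API (zone, region, autoHealingPolicies); the run parameter is a hypothetical hook that lets you substitute canned output when testing, and audit and gcloud_json are illustrative names:

```python
import json
import subprocess
from typing import Callable, List

# A runner takes gcloud arguments (without "gcloud") and returns JSON text.
Runner = Callable[[List[str]], str]


def gcloud_json(args: List[str]) -> str:
    """Default runner: call the gcloud CLI with JSON output."""
    return subprocess.run(["gcloud", *args, "--format=json"],
                          check=True, capture_output=True, text=True).stdout


def audit(run: Runner = gcloud_json) -> List[str]:
    """Return "project/location/name" for each MIG without an autohealing health check."""
    findings = []
    for project in json.loads(run(["projects", "list"])):
        project_id = project["projectId"]
        migs = json.loads(run(["compute", "instance-groups", "managed", "list",
                               "--project", project_id]))
        for mig in migs:
            # Zonal MIGs have a zone URL, regional MIGs a region URL.
            scope = "zone" if "zone" in mig else "region"
            location = mig[scope].rsplit("/", 1)[-1]
            detail = json.loads(run(["compute", "instance-groups", "managed", "describe",
                                     mig["name"], "--project", project_id,
                                     f"--{scope}", location]))
            policies = detail.get("autoHealingPolicies") or []
            if not any(policy.get("healthCheck") for policy in policies):
                findings.append(f"{project_id}/{location}/{mig['name']}")
    return findings
```

Calling audit() with no arguments runs against your real projects; an empty result means every MIG found has an autohealing health check attached.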

Remediation / Resolution

To enable autohealing for your existing Google Cloud Managed Instance Groups (MIGs) using health checks, perform the following operations:

Note: As an example, this conformity rule demonstrates how to enable autohealing by creating a health check that looks for a web server response on port 80 (HTTP).

Using GCP Console

01 Sign in to the Google Cloud Management Console.

02 Select the GCP project that you want to access from the console top navigation bar.

03 Navigate to the Google Compute Engine dashboard at https://console.cloud.google.com/compute.

04 In the navigation panel, select Instance groups to access the list with the VM instance groups created for the selected project.

05 Click inside the Filter resources box, select Group type and google.internal.cloud.console.clientapi.gce.mig.instancegrouptype.managed to list only the Managed Instance Groups (MIGs) available within the selected project.

06 Click on the name of the MIG resource that you want to reconfigure, and choose EDIT GROUP to access the instance group editing page.

07 Under Autohealing, select Create a health check from the Health check dropdown list to initiate the setup process.

08 On the MIG health check setup panel, perform the following actions:

- In the Name box, enter a unique name for the new health check resource.
- (Optional) For Description, provide a short description of the health check.
- For Protocol, make sure that HTTP is selected.
- For Port, enter 80.
- For Proxy protocol, make sure that NONE is selected.
- For Request path, enter the path of the HTTP health check request. The default is /.
- Under Health criteria, use the Check interval field to configure how often (in seconds) to send a health check. The default is 10 seconds.
- For Timeout, configure how long to wait (in seconds) for a response before the request is considered a failure. The default is 5 seconds.
- For Healthy threshold, set how many consecutive successful health checks must be returned before an unhealthy VM instance is marked as healthy. The default is 2.
- For Unhealthy threshold, set how many consecutive failed health checks must be returned before a healthy VM instance is marked as unhealthy. The default is 3.
- Click Save and continue to create the health check resource and associate it with the selected instance group.
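The health criteria above determine how quickly a failing application is detected. As a rough, back-of-the-envelope model (an assumption, not a documented Google Cloud formula; it ignores that probes are sent from several probers), a health state change takes at most about threshold × check interval + timeout seconds: up to one interval passes before the first probe after the change, threshold - 1 further intervals until the last required probe is sent, and that probe may take the full timeout. Google Cloud also requires the timeout to be no longer than the check interval. worst_case_transition is a hypothetical helper name:

```python
def worst_case_transition(check_interval: int, timeout: int, threshold: int) -> int:
    """Rough upper bound, in seconds, for a VM to change health state.

    Simplified model: up to one check interval passes before the first probe
    after the change, (threshold - 1) further intervals until the last required
    probe is sent, and that probe may take the full timeout to complete.
    """
    if timeout > check_interval:
        raise ValueError("timeout must be less than or equal to check_interval")
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    return threshold * check_interval + timeout


# Console defaults: 10 s interval, 5 s timeout, unhealthy threshold 3, healthy threshold 2.
to_unhealthy = worst_case_transition(10, 5, 3)  # 35
to_healthy = worst_case_transition(10, 5, 2)    # 25
```

With the CLI example used later in this rule (30 s interval, 10 s timeout, unhealthy threshold 3), the same estimate gives about 100 seconds before a failed instance is marked unhealthy.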

09 For Initial delay, set the time (in seconds) to allow an instance to boot and its applications to fully start before the first health check. This delay prevents autohealing from prematurely re-creating a virtual machine that is still starting up.

10 Click Save to apply the configuration changes.

11 Health check probes come from IPv4 addresses in the ranges 130.211.0.0/22 and 35.191.0.0/16, so make sure that your network firewall rules allow the probes to connect. If TCP port 80 (HTTP) is not already open to these ranges in the firewall of the VPC network associated with the selected instance group, perform the following actions to create the required firewall rule:

Navigate to VPC Network dashboard at https://console.cloud.google.com/networking.
In the navigation panel, select Firewall and choose CREATE FIREWALL RULE to create a new firewall rule.
On the Create a firewall rule setup page, perform the following:
- For Name, enter a unique name for the firewall rule (e.g. allow-health-check).
- For Description, provide a short description for the new rule.
- Select the VPC network associated with the reconfigured instance group from the Network dropdown list.
- From Targets dropdown list, choose All instances in the network if you don’t have target tags configured.
- Select IP ranges from the Source filter dropdown list.
- For Source IP ranges, enter 130.211.0.0/22 and 35.191.0.0/16.
- In Protocols and ports section, choose Specified protocols and ports, select tcp, and enter 80 in the tcp port box.
- Click CREATE to create the required firewall rule.
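Before creating a new rule, you can check whether an existing rule's source ranges already cover both probe ranges. A minimal Python sketch using the standard ipaddress module; uncovered_probe_ranges is a hypothetical helper, and it only compares address ranges (protocol, port, direction, and target settings still have to be checked separately):

```python
import ipaddress
from typing import List

# Source ranges of the autohealing health check probes, as listed above.
PROBE_RANGES = ["130.211.0.0/22", "35.191.0.0/16"]


def uncovered_probe_ranges(source_ranges: List[str]) -> List[str]:
    """Return the probe ranges not fully contained in any of `source_ranges`."""
    allowed = [ipaddress.ip_network(cidr, strict=False) for cidr in source_ranges]
    missing = []
    for probe in PROBE_RANGES:
        probe_net = ipaddress.ip_network(probe)
        # subnet_of() raises TypeError across IP versions, so compare versions first.
        if not any(probe_net.version == net.version and probe_net.subnet_of(net)
                   for net in allowed):
            missing.append(probe)
    return missing
```

An empty result means the rule's source ranges already include every probe address.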

12 Repeat steps no. 6 – 11 to reconfigure other Managed Instance Groups (MIGs) available in the selected GCP project.

13 Repeat steps no. 2 – 12 for each GCP project created within your Google Cloud account.

Using GCP CLI

01 Run compute health-checks create http command (Windows/macOS/Linux) to create an HTTP health check. The following example creates a health check that looks for a response on port 80 (HTTP) and tolerates some failures before it marks the associated virtual machine instances as unhealthy and causes them to be re-created. With these settings, a VM instance is marked healthy after one successful response and unhealthy after 3 consecutive failed responses:

gcloud compute health-checks create http cc-production-health-check --port 80 --check-interval 30s --healthy-threshold 1 --timeout 10s --unhealthy-threshold 3
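If you create health checks from a script, building the argument list in code lets you validate parameters before calling gcloud. The following Python sketch (health_check_create_args is a hypothetical helper) reproduces the command above, adds the --request-path flag with its default value of /, and rejects a timeout longer than the check interval, which Google Cloud does not accept:

```python
import shlex
from typing import List


def health_check_create_args(name: str, port: int = 80, check_interval: int = 30,
                             timeout: int = 10, healthy_threshold: int = 1,
                             unhealthy_threshold: int = 3,
                             request_path: str = "/") -> List[str]:
    """Build the `gcloud compute health-checks create http` argument list."""
    if timeout > check_interval:
        raise ValueError("timeout must be less than or equal to check_interval")
    return ["gcloud", "compute", "health-checks", "create", "http", name,
            "--port", str(port),
            "--request-path", request_path,
            "--check-interval", f"{check_interval}s",
            "--timeout", f"{timeout}s",
            "--healthy-threshold", str(healthy_threshold),
            "--unhealthy-threshold", str(unhealthy_threshold)]


args = health_check_create_args("cc-production-health-check")
print(shlex.join(args))  # shell-quoted preview of the command
```

Passing args to subprocess.run(args, check=True) would then create the health check; printing it first is a cheap way to review the command.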

02 The command output should return the URL of the new health check resource:

Created [https://www.googleapis.com/compute/v1/projects/cc-web-stack-project-123123/global/healthChecks/cc-production-health-check].

NAME                        PROTOCOL
cc-production-health-check  HTTP

03 Health check probes come from IP addresses within the ranges 130.211.0.0/22 and 35.191.0.0/16, so your network firewall rules must allow probe access. If TCP port 80 (HTTP) is not already open to these ranges in the firewall of the VPC network associated with your instance group, run compute firewall-rules create command (Windows/macOS/Linux) to create a new firewall rule that allows probe access:

gcloud compute firewall-rules create cc-allow-health-check --allow tcp:80 --source-ranges 130.211.0.0/22,35.191.0.0/16 --network cc-web-stack-network

04 The command output should return the new firewall rule information:

Created [https://www.googleapis.com/compute/v1/projects/cc-web-stack-project-123123/global/firewalls/cc-allow-health-check].

NAME                   NETWORK               DIRECTION  PRIORITY  ALLOW   DISABLED
cc-allow-health-check  cc-web-stack-network  INGRESS    1000      tcp:80  False
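To confirm that probe traffic is admitted on a given network, you can inspect the JSON output of gcloud compute firewall-rules list --format=json. The Python sketch below (rule_admits_probes and network_admits_probes are hypothetical helpers) looks for an enabled ingress rule that allows TCP port 80 from both probe ranges. It is intentionally simplified: it ignores target tags, service accounts, rule priority, and deny rules, any of which can change the effective result.

```python
import ipaddress
import json
from typing import List, Optional

PROBE_NETS = [ipaddress.ip_network(cidr) for cidr in ("130.211.0.0/22", "35.191.0.0/16")]


def _port_allowed(ports: Optional[List[str]], port: int) -> bool:
    """Ports are strings such as "80" or "8000-8080"; no list means all ports."""
    if not ports:
        return True
    for spec in ports:
        low, _, high = spec.partition("-")
        if int(low) <= port <= int(high or low):
            return True
    return False


def rule_admits_probes(rule: dict, port: int = 80) -> bool:
    """True if one firewall rule allows TCP `port` from both probe ranges."""
    if rule.get("direction", "INGRESS") != "INGRESS" or rule.get("disabled", False):
        return False
    if not any(entry.get("IPProtocol") in ("tcp", "all")
               and _port_allowed(entry.get("ports"), port)
               for entry in rule.get("allowed", [])):
        return False
    sources = [ipaddress.ip_network(cidr, strict=False)
               for cidr in rule.get("sourceRanges", [])]
    return all(any(probe.version == src.version and probe.subnet_of(src)
                   for src in sources)
               for probe in PROBE_NETS)


def network_admits_probes(rules_json: str, network: str, port: int = 80) -> bool:
    """Check the rules of `network` (short name) from firewall-rules list JSON."""
    return any(rule_admits_probes(rule, port)
               for rule in json.loads(rules_json)
               if rule.get("network", "").rsplit("/", 1)[-1] == network)
```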

05 Run compute instance-groups managed update command (Windows/macOS/Linux) with the name of the Managed Instance Group (MIG) that you want to reconfigure as the identifier parameter to enable autohealing, attaching the health check created in the previous steps to the selected MIG. The --initial-delay parameter gives new instances time (in seconds) to boot before health checking starts. Once the health check is attached, it can take 30 minutes before autohealing begins monitoring the instances in the selected group:

gcloud compute instance-groups managed update cc-production-instance-group --health-check cc-production-health-check --initial-delay 300 --zone us-central1-a

06 The output should return the full URL of the reconfigured instance group:

Updated [https://www.googleapis.com/compute/v1/projects/cc-web-stack-project-123123/zones/us-central1-a/instanceGroupManagers/cc-production-instance-group].
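After the update, you can re-run the describe command with --format=json and confirm that the expected health check and initial delay were applied. A minimal Python sketch; verify_remediation is a hypothetical helper, and it relies on the autoHealingPolicies[].healthCheck and initialDelaySec fields of the instance group manager resource:

```python
import json


def verify_remediation(describe_json: str, expected_health_check: str,
                       min_initial_delay: int = 0) -> bool:
    """Check `gcloud compute instance-groups managed describe ... --format=json`
    output for an autohealing policy that uses the expected health check."""
    mig = json.loads(describe_json)
    for policy in mig.get("autoHealingPolicies") or []:
        # healthCheck is a full resource URL; compare its last path segment.
        name = policy.get("healthCheck", "").rsplit("/", 1)[-1]
        if name == expected_health_check:
            return policy.get("initialDelaySec", 0) >= min_initial_delay
    return False
```

For the example above, verify_remediation(output, "cc-production-health-check", 300) should return True once the update has been applied.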

07 Repeat steps no. 1 – 6 to reconfigure other Managed Instance Groups (MIGs) provisioned for the selected project.

08 Repeat steps no. 1 – 7 for each GCP project deployed within your Google Cloud account.

References

Google Cloud Platform (GCP) Documentation
Instance groups
Setting up health checking and autohealing

GCP Command Line Interface (CLI) Documentation
gcloud projects list
gcloud compute instance-groups managed list
gcloud compute instance-groups managed describe
gcloud compute health-checks create http
gcloud compute firewall-rules create
gcloud compute instance-groups managed update

Publication date May 10, 2021
