Idle AWS ElastiCache Nodes

Risk Level: High (not acceptable risk)

Rule ID: EC-015

Identify any Amazon ElastiCache cluster nodes that appear to be idle and delete them to help lower your monthly AWS bill. By default, an Amazon ElastiCache cluster node is considered "idle" when it meets the following criterion:
- The average CPU utilization has been less than 2% for the last 7 days.
The CloudWatch metric used to detect idle ElastiCache cluster nodes is:
- CPUUtilization (host-level metric): the percentage of CPU resources used by ElastiCache cache nodes (Units: Percent).
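
The default criterion can be written as a small check. The following is a minimal Python sketch, not Conformity's own implementation: the function name `is_idle` and its parameters are hypothetical, and it reads the criterion as "the mean of the per-period Average values over the window is below the threshold". A stricter reading would require every datapoint to stay below the threshold.

```python
from statistics import fmean
from typing import Sequence


def is_idle(cpu_averages: Sequence[float], threshold_percent: float = 2.0) -> bool:
    """Return True when the mean CPU utilization is below the threshold.

    cpu_averages: per-period Average values (percent) covering the lookback
    window, e.g. 168 hourly datapoints for 7 days. (Hypothetical helper.)
    """
    if not cpu_averages:
        # No datapoints means no evidence either way; do not flag as idle.
        return False
    return fmean(cpu_averages) < threshold_percent


print(is_idle([1.04, 1.21, 1.15, 0.53, 0.23, 0.13]))  # low usage throughout
print(is_idle([1.5, 35.0, 2.0]))                      # one busy period
```

Returning `False` for an empty series is a deliberate choice in this sketch: a node with no recorded metrics should be investigated rather than deleted automatically.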

This rule resolution is part of the Conformity solution.

Sustainability

Cost optimisation

Idle Amazon ElastiCache cluster nodes are good candidates for reducing your monthly AWS costs. Regularly checking the CPU utilization of your ElastiCache cluster nodes helps you detect and remove idle ElastiCache resources from your AWS account and avoid accumulating unnecessary charges.

Note 1: Knowing the role and the owner of an Amazon ElastiCache cluster before you take the decision to remove its cache nodes is very important. For this rule Trend Cloud One™ – Conformity assumes that your ElastiCache clusters are tagged with "Role" and "Owner" tags which provide visibility into their usage profile and help you decide whether it's safe or not to terminate the cluster resources.
Note 2: You can change the default thresholds for this rule in your Conformity account by setting your own values for the CPU usage and for the monitoring time range (in days) that define node idleness.
Note 3: If the Amazon ElastiCache cluster selected for the checkup is needed within your application stack, you can suppress (disable) the conformity rule check for your cluster from the Conformity account console.

Audit

To identify any idle Amazon ElastiCache cluster nodes available within your AWS cloud account, perform the following operations:

Using AWS Console

01 Sign in to the AWS Management Console.

02 Navigate to the Amazon ElastiCache console at https://console.aws.amazon.com/elasticache/.

03 In the main navigation panel, under Resources, choose Redis caches to access the cache clusters created with Redis, or choose Memcached caches to access the cache clusters created with Memcached.

04 Click on the name (link) of the Redis/Memcached cache cluster that you want to examine.

05 Select the Metrics tab, choose the CPU Utilization graph thumbnail, click on the options button (i.e. 3-dot icon), and select Enlarge to maximize the panel with the node CPU usage. On the CPU Utilization metric panel, set the following parameters:

From the "Period" dropdown list, select 1 Hour.
From the "Statistic" dropdown list, select Average.
From the "Time range" menu, choose Custom, select the Relative tab, and choose 1 Weeks.

Once the monitoring data is loaded, check the node CPU usage recorded in the last 7 days. If the average usage (percentage) has been less than 2%, the nodes provisioned for the selected Redis/Memcached cluster can be considered idle. Choose Close to close the metric panel.

06 Determine the cache cluster role and owner within your application stack by checking the Role and Owner tags assigned to your Amazon ElastiCache cluster in order to decide whether or not it's safe to terminate the cluster. To check for the required tags, perform the following actions:

Select the Tags tab to access the tag sets configured for the selected Redis/Memcached cache cluster.
Check the Role tag value, available in the Value column, or any Role-like tag value that can provide information about the usage profile of the cache cluster in order to decide if the selected resource can be deleted or not.
Check the Owner tag value, available in the Value column, or any Owner-like tag value that can provide the contact information of the resource owner such as name, email, phone number, etc., in order to decide if the verified cache cluster can be terminated or not.

07 If all the conditions outlined in steps no. 5 and 6 are met, the cluster nodes are considered idle, and the selected Amazon ElastiCache cluster can be deleted to stop incurring charges for cache compute resources.

08 Repeat steps no. 4 – 7 for each Amazon ElastiCache cluster available within the current AWS region.

09 Change the AWS cloud region from the navigation bar and repeat the Audit process for other regions.

Using AWS CLI

01 Run the describe-cache-clusters command (OSX/Linux/UNIX) to list the identifier (name) of each Amazon ElastiCache cluster available in the selected AWS cloud region:

aws elasticache describe-cache-clusters \
  --region us-east-1 \
  --output table \
  --query 'CacheClusters[*].CacheClusterId'

02 The command output should return a table with the requested cluster names:

-------------------------------------
|       DescribeCacheClusters       |
+-----------------------------------+
|  cc-production-memcache-cluster   |
|  cc-production-redis-cluster-001  |
|  cc-production-redis-cluster-002  |
+-----------------------------------+
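
The same listing can be scripted. The sketch below assumes the boto3 SDK is installed and credentials are configured; the helper names are hypothetical. The AWS CLI follows pagination tokens automatically, whereas in boto3 a paginator is the usual way to be sure no cluster is missed when the account holds more clusters than fit in one response.

```python
from typing import Iterable, List


def cluster_ids_from_pages(pages: Iterable[dict]) -> List[str]:
    """Collect CacheClusterId values from DescribeCacheClusters response pages."""
    return [
        cluster["CacheClusterId"]
        for page in pages
        for cluster in page.get("CacheClusters", [])
    ]


def list_cluster_ids(region: str) -> List[str]:
    """List every cache cluster ID in a region (needs boto3 and credentials)."""
    import boto3  # imported lazily so the helper above works without boto3

    client = boto3.client("elasticache", region_name=region)
    paginator = client.get_paginator("describe_cache_clusters")
    return cluster_ids_from_pages(paginator.paginate())


# Offline demonstration with two hand-written response pages.
pages = [
    {"CacheClusters": [{"CacheClusterId": "cc-production-memcache-cluster"}]},
    {"CacheClusters": [{"CacheClusterId": "cc-production-redis-cluster-001"}]},
]
print(cluster_ids_from_pages(pages))
```

With credentials in place, `list_cluster_ids("us-east-1")` would return the identifiers shown in the table above.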

03 Run the get-metric-statistics command (OSX/Linux/UNIX) to get the statistics recorded by Amazon CloudWatch for the CPUUtilization metric, which represents the CPU usage of the selected ElastiCache cluster. Change the --start-time (start recording date) and --end-time (stop recording date) parameter values to choose your own time frame for the CPUUtilization metric. Set the --period parameter value to define the granularity (in seconds) of the returned datapoints, based on your monitoring requirements. A period can be as short as 1 minute (60 seconds) or as long as 1 day (86400 seconds). The following example returns the average CPU utilization recorded for a cache cluster named "cc-production-memcache-cluster" over a time period of roughly 7 days, using a 1-hour period as the granularity for the returned datapoints:

aws cloudwatch get-metric-statistics \
  --region us-east-1 \
  --metric-name CPUUtilization \
  --start-time 2024-05-18T10:32:00 \
  --end-time 2024-05-25T02:32:00 \
  --period 3600 \
  --namespace AWS/ElastiCache \
  --statistics Average \
  --dimensions Name=CacheClusterId,Value=cc-production-memcache-cluster
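
Typing the start and end timestamps by hand makes it easy to cover slightly less than the intended window. As an illustration, the hypothetical helper `cpu_query_params` below derives the start time from the end time and the number of days, and returns keyword arguments in the shape that boto3's CloudWatch `get_metric_statistics` call accepts. It is a sketch under those assumptions, not part of the rule itself.

```python
from datetime import datetime, timedelta, timezone


def cpu_query_params(cluster_id: str, end: datetime, days: int = 7,
                     period_seconds: int = 3600) -> dict:
    """Build GetMetricStatistics parameters for a window ending at `end`."""
    return {
        "Namespace": "AWS/ElastiCache",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "CacheClusterId", "Value": cluster_id}],
        "StartTime": end - timedelta(days=days),
        "EndTime": end,
        "Period": period_seconds,
        "Statistics": ["Average"],
    }


params = cpu_query_params(
    "cc-production-memcache-cluster",
    end=datetime(2024, 5, 25, 2, 32, tzinfo=timezone.utc),
)
print(params["StartTime"].isoformat())  # 2024-05-18T02:32:00+00:00
# With boto3 installed and credentials configured, the call would be:
#   boto3.client("cloudwatch", region_name="us-east-1").get_metric_statistics(**params)
```

A 7-day window at a 3600-second period yields at most 168 datapoints, comfortably within the 1,440-datapoint limit of a single GetMetricStatistics request.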

04 The command output should return the CPUUtilization metric usage information:

{
	"Datapoints": [
		{
			"Timestamp": "2024-05-18T10:32:00Z",
			"Average": 1.0380,
			"Unit": "Percent"
		},
		{
			"Timestamp": "2024-05-18T11:32:00Z",
			"Average": 1.2113,
			"Unit": "Percent"
		},
		{
			"Timestamp": "2024-05-18T12:32:00Z",
			"Average": 1.1460,
			"Unit": "Percent"
		},

		...

		{
			"Timestamp": "2024-05-25T00:32:00Z",
			"Average": 0.530999999999999993,
			"Unit": "Percent"
		},
		{
			"Timestamp": "2024-05-25T01:32:00Z",
			"Average": 0.22833333333333333,
			"Unit": "Percent"
		},
		{
			"Timestamp": "2024-05-25T02:32:00Z",
			"Average": 0.12783333333333333,
			"Unit": "Percent"
		}
	],
	"Label": "CPUUtilization"
}

If the average usage (percentage) has been less than 2%, the cache nodes provisioned for the selected Redis/Memcached cluster are eligible to be considered idle.
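
Reading a week of hourly datapoints by eye is error-prone, so the JSON output can be summarized instead. The sketch below is a hypothetical helper (`summarize_cpu` is not an AWS API) that parses output in the format shown above and reports the number of datapoints, whether the series covers the expected window, the mean, and the peak. CloudWatch does not guarantee that datapoints arrive in chronological order, which is why the summary does not rely on their order.

```python
import json
from statistics import fmean


def summarize_cpu(output_json: str, expected_points: int) -> dict:
    """Summarize a get-metric-statistics response for the idle check."""
    datapoints = json.loads(output_json)["Datapoints"]
    averages = [point["Average"] for point in datapoints]
    return {
        "points": len(averages),
        # Fewer points than expected means the node did not report for the
        # whole window (for example, it was created recently).
        "complete": len(averages) >= expected_points,
        "mean": fmean(averages) if averages else None,
        "peak": max(averages) if averages else None,
    }


sample = """
{"Datapoints": [
  {"Timestamp": "2024-05-18T10:32:00Z", "Average": 1.038, "Unit": "Percent"},
  {"Timestamp": "2024-05-18T11:32:00Z", "Average": 1.2113, "Unit": "Percent"},
  {"Timestamp": "2024-05-18T12:32:00Z", "Average": 1.146, "Unit": "Percent"}
], "Label": "CPUUtilization"}
"""
summary = summarize_cpu(sample, expected_points=3)
print(summary["points"], summary["complete"], round(summary["mean"], 4), summary["peak"])
```

For a full 7-day query at a 1-hour period, `expected_points` would be 168. An incomplete series is a reason to postpone the decision rather than to treat the node as idle.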

05 Run the list-tags-for-resource command (OSX/Linux/UNIX) to describe the tag sets configured for the selected Amazon ElastiCache cluster:

aws elasticache list-tags-for-resource \
  --region us-east-1 \
  --resource-name arn:aws:elasticache:us-east-1:123456789012:cluster:cc-production-memcache-cluster
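
The --resource-name value is the cluster ARN, which follows the pattern `arn:aws:elasticache:<region>:<account-id>:cluster:<cluster-id>`. The hypothetical helper below assembles it and rejects account IDs that are not 12 digits long, a common copy-and-paste mistake. It assumes the standard `aws` partition; other partitions use a different prefix.

```python
def cache_cluster_arn(region: str, account_id: str, cluster_id: str) -> str:
    """Build the ARN expected by --resource-name for a cache cluster."""
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("AWS account IDs are 12-digit numbers")
    return f"arn:aws:elasticache:{region}:{account_id}:cluster:{cluster_id}"


print(cache_cluster_arn("us-east-1", "123456789012", "cc-production-memcache-cluster"))
```

Recent versions of the describe-cache-clusters output also include an ARN field for each cluster, which can be used directly when it is present.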

06 The command output should return the tag sets (key-value pairs) defined for the selected cache cluster. The Role and Owner tag values can be used to determine the resource role within the application stack and to contact the cluster owner for more information in order to decide if your ElastiCache resource can be deleted or not:

{
	"TagList": [
		{
			"Value": "webapp-test-cache-cluster",
			"Key": "Role"
		},
		{
			"Value": "trendmicro.com",
			"Key": "Owner"
		}
	]
}

If the data returned in steps no. 4 and 6 satisfies the conditions set by the conformity rule (i.e. cluster role, cluster owner, and CPUUtilization metric), the selected Amazon ElastiCache cluster can be terminated in order to stop incurring charges for cache compute resources.
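
The tag check can be sketched as follows. The helper names are hypothetical, and the case-insensitive key match is an assumption made here to catch "Role-like" and "Owner-like" keys such as `role` or `OWNER`; adapt it to your own tagging convention.

```python
from typing import Dict, List, Optional


def tag_value(tag_list: List[Dict[str, str]], wanted: str) -> Optional[str]:
    """Return the value of the first tag whose key matches `wanted`, ignoring case."""
    for tag in tag_list:
        if tag.get("Key", "").lower() == wanted.lower():
            return tag.get("Value")
    return None


def ownership_known(tag_list: List[Dict[str, str]]) -> bool:
    """True only when both a Role and an Owner tag carry a non-empty value."""
    return bool(tag_value(tag_list, "Role")) and bool(tag_value(tag_list, "Owner"))


# TagList copied from the list-tags-for-resource output above.
tags = [
    {"Value": "webapp-test-cache-cluster", "Key": "Role"},
    {"Value": "trendmicro.com", "Key": "Owner"},
]
print(tag_value(tags, "role"), ownership_known(tags))
```

A cluster for which `ownership_known` returns `False` should be tagged or investigated before any deletion is considered.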

07 Repeat steps no. 3 – 6 for each Amazon ElastiCache cluster provisioned in the selected AWS region.

08 Change the AWS cloud region by updating the --region command parameter value and repeat the Audit process for other regions.

Remediation / Resolution

Option 1: Terminate idle clusters. To terminate (delete) idle Amazon ElastiCache cache clusters, perform the following operations:

Using AWS Console

01 Sign in to the AWS Management Console.

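02 Navigate to the Amazon ElastiCache console at https://console.aws.amazon.com/elasticache/.

03 In the main navigation panel, under Resources, choose Redis caches or Memcached caches, depending on the engine of the cache cluster that you want to terminate.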

04 Select the Redis/Memcached cache cluster that you want to terminate, choose Actions, and select Delete.

05 In the confirmation box, choose whether to create a final backup for your cache cluster (for Redis clusters only), type the name of the selected cluster in the required text box, then choose Delete to confirm the cluster removal.

06 Repeat steps no. 4 and 5 for each idle Amazon ElastiCache cluster that you want to delete, available within the current AWS region.

07 Change the AWS cloud region from the navigation bar and repeat the Remediation process for other regions.

Using AWS CLI

01 For Redis cache clusters:

To remove an idle Redis cache cluster (replication group) from your AWS cloud account, run the delete-replication-group command (OSX/Linux/UNIX):
```
aws elasticache delete-replication-group \
  --region us-east-1 \
  --replication-group-id cc-production-redis-cluster
```

The output should return the information available for the deleted Redis cache cluster:

{
	"ReplicationGroup": {
		"ReplicationGroupId": "cc-production-redis-cluster",
		"Description": " ",
		"GlobalReplicationGroupInfo": {},
		"Status": "deleting",
		"PendingModifiedValues": {},
		"AutomaticFailover": "disabled",

		...

		"SnapshotRetentionLimit": 0,
		"SnapshotWindow": "05:00-06:00",
		"TransitEncryptionEnabled": true,
		"AtRestEncryptionEnabled": true,
		"LogDeliveryConfigurations": [],
		"DataTiering": "disabled"
	}
}

02 For Memcached cache clusters:

To remove an idle Memcached cache cluster from your AWS cloud account, run the delete-cache-cluster command (OSX/Linux/UNIX):
```
aws elasticache delete-cache-cluster \
  --region us-east-1 \
  --cache-cluster-id cc-production-memcache-cluster
```

The output should return the information available for the deleted Memcached cache cluster:

{
	"CacheCluster": {
		"CacheClusterId": "cc-production-memcache-cluster",
		"CacheNodeType": "cache.r5.large",
		"Engine": "memcached",

		...

		"TransitEncryptionEnabled": true,
		"AtRestEncryptionEnabled": false,
		"ReplicationGroupLogDeliveryEnabled": false
	}
}

03 Repeat steps no. 1 and 2 for each idle Amazon ElastiCache cluster that you want to terminate, available in the selected AWS region.

04 Change the AWS cloud region by updating the --region command parameter value and repeat the Remediation process for other regions.
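
When the deletion is scripted, the choice between the two commands can be derived from the describe-cache-clusters output: entries that carry a `ReplicationGroupId` belong to a replication group and are removed together with it, while the others are deleted individually. The sketch below uses hypothetical helper names, assumes boto3 for the actual call, and defaults to a dry run so that nothing is deleted unless explicitly requested.

```python
from typing import Dict, Tuple


def plan_deletion(cluster: Dict) -> Tuple[str, Dict[str, str]]:
    """Choose the delete operation for one DescribeCacheClusters entry."""
    group_id = cluster.get("ReplicationGroupId")
    if group_id:
        # Nodes that belong to a replication group are removed with the group.
        return "delete_replication_group", {"ReplicationGroupId": group_id}
    return "delete_cache_cluster", {"CacheClusterId": cluster["CacheClusterId"]}


def delete_cluster(cluster: Dict, region: str,
                   dry_run: bool = True) -> Tuple[str, Dict[str, str]]:
    """Print the planned call; run it only when dry_run is False (needs boto3)."""
    operation, kwargs = plan_deletion(cluster)
    print(f"{operation}({kwargs})")
    if not dry_run:
        import boto3  # lazy import: planning works without boto3 installed

        client = boto3.client("elasticache", region_name=region)
        getattr(client, operation)(**kwargs)
    return operation, kwargs


redis_node = {
    "CacheClusterId": "cc-production-redis-cluster-001",
    "ReplicationGroupId": "cc-production-redis-cluster",
}
memcached = {"CacheClusterId": "cc-production-memcache-cluster", "Engine": "memcached"}
delete_cluster(redis_node, "us-east-1")
delete_cluster(memcached, "us-east-1")
```

Because several nodes can share one replication group, a script that loops over all clusters should deduplicate the planned operations before executing them.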

Option 2: Disable the conformity rule check. If your idle Amazon ElastiCache cluster is expected to be used soon and the resource role within your application stack is important, you can turn off the rule check for the cache cluster, from your Trend Cloud One™ – Conformity account console.


Publication date May 2, 2017
