AWS EMR Instance Type Generation

Risk Level: Medium (should be achieved)
Rule ID: EMR-001

Ensure that all the Amazon EMR cluster instances are using the latest generation of instance types in order to get the best performance with lower costs. If you are using cluster instances from the previous generation, Trend Cloud One™ – Conformity strongly recommends that you upgrade your instances with their latest generation equivalents.

This rule can help you work with the AWS Well-Architected Framework.

This rule resolution is part of the Conformity Security & Compliance tool for AWS.

Performance efficiency
Cost optimisation

Using the latest generation of Amazon EMR cluster instances instead of previous generation instances has tangible benefits, such as better hardware performance (more computing capacity, faster CPUs, memory optimization, and higher network throughput) and lower costs for memory and storage. For example, the new generation memory-optimized (R3) instances are 9% faster than the previous ones, and the compute-optimized (C3 and C4) instances are 37% faster than the old generation (C1) instances.


Audit

To determine if your Amazon EMR clusters are using instances from the previous generation, perform the following actions:

Using AWS Console

01 Sign in to the AWS Management Console.

02 Navigate to Amazon Elastic MapReduce (EMR) console at https://console.aws.amazon.com/elasticmapreduce/.

03 In the main navigation panel, under EMR on EC2, choose Clusters.

04 Click on the name (link) of the Amazon EMR cluster that you want to examine.

05 Select the Hardware tab and check the instance type for each instance provisioned within the selected cluster, listed in the Instance type column, to determine if the instance type is from the previous generation. If the instance type is from the previous generation, the instance type configured for the Amazon EMR cluster instances should be upgraded to the latest generation.

06 Repeat steps no. 4 and 5 for each Amazon EMR cluster available within the current AWS region.

07 Change the AWS cloud region from the navigation bar and repeat the Audit process for other regions.

Using AWS CLI

01 Run list-clusters command (OSX/Linux/UNIX) with custom query filters to list the ID of each active Amazon EMR cluster provisioned in the selected AWS region:

aws emr list-clusters \
  --region us-east-1 \
  --active \
  --output table \
  --query 'Clusters[*].Id'

02 The command output should return a table with the requested EMR cluster ID(s):

--------------------
|   ListClusters   |
+------------------+
|  j-ABCDABCDABCD  |
|  j-ABCD1234ABCD  |
+------------------+

03 Run describe-cluster command (OSX/Linux/UNIX) using the ID of the Amazon EMR cluster that you want to examine as the identifier parameter and custom query filters to describe the instance type configured for each instance group in the selected EMR cluster:

aws emr describe-cluster \
  --region us-east-1 \
  --cluster-id j-ABCDABCDABCD \
  --query 'Cluster.InstanceGroups[*].InstanceType'

04 The command output should return the EMR cluster instance type(s):

[
    "m1.xlarge"
]

Compare the instance type returned by the describe-cluster command output with the instance type(s) from the previous generation. If the instance type is from the previous generation, the instance type configured for the Amazon EMR cluster instances should be upgraded to the latest generation.

05 Repeat steps no. 3 and 4 for each Amazon EMR cluster available in the selected AWS region.

06 Change the AWS cloud region by updating the --region command parameter value and repeat the Audit process for other regions.
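
The comparison described in step 04 can be scripted once the instance types have been collected for each cluster. The following Python sketch is only an illustration: the helper names are hypothetical, and the set of previous generation families is an assumption that should be checked against the AWS list of previous generation instances before use.

```python
# Illustrative assumption: a small, non-exhaustive set of previous generation
# instance families. Verify against the AWS previous generation instances list.
PREVIOUS_GENERATION_FAMILIES = frozenset({"m1", "m2", "c1", "t1"})


def instance_family(instance_type):
    """Return the family prefix of an instance type, e.g. 'm1' for 'm1.xlarge'."""
    return instance_type.split(".", 1)[0].lower()


def is_previous_generation(instance_type, families=PREVIOUS_GENERATION_FAMILIES):
    """Return True when the instance type belongs to one of the given families."""
    return instance_family(instance_type) in families


def clusters_to_upgrade(cluster_instance_types, families=PREVIOUS_GENERATION_FAMILIES):
    """Map each cluster ID to its previous generation instance types.

    cluster_instance_types is a dict of cluster ID -> list of instance types,
    as collected from the describe-cluster output in step 04. Clusters that
    use no previous generation instance types are left out of the result.
    """
    flagged = {}
    for cluster_id, types in cluster_instance_types.items():
        outdated = sorted({t for t in types if is_previous_generation(t, families)})
        if outdated:
            flagged[cluster_id] = outdated
    return flagged


if __name__ == "__main__":
    # Sample data only; the cluster IDs are the placeholders used in this guide.
    audit = {
        "j-ABCDABCDABCD": ["m1.xlarge"],
        "j-ABCD1234ABCD": ["m5.xlarge", "m5.xlarge"],
    }
    print(clusters_to_upgrade(audit))
```

Any cluster that appears in the result is a candidate for the Remediation process described in the next section.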

Remediation / Resolution

To upgrade your previous generation EMR cluster instances to their latest generation equivalents, perform the following actions:

Using AWS CloudFormation

01 CloudFormation template (JSON):

{
	"AWSTemplateFormatVersion": "2010-09-09",
	"Description": "Upgrade Cluster Instance Generation by Setting the Latest Generation Instance Type Equivalent for 'ClusterInstanceType' Stack Parameter",
	"Parameters" : {
		"ReleaseLabel" : {
			"Type" : "String"
		},
		"ClusterInstanceType" : {
			"Type" : "String"
		},
		"EbsRootVolumeSize" : {
			"Type" : "String"
		},
		"SubnetId" : {
			"Type" : "String"
		}
	},
	"Resources": {
		"EMRCluster": {
			"Type": "AWS::EMR::Cluster",
			"Properties": {
				"Name": "cc-emr-production-cluster",
				"ReleaseLabel" : {"Ref" : "ReleaseLabel"},
				"Instances": {
					"MasterInstanceGroup": {
						"InstanceCount": 1,
						"InstanceType": {"Ref" : "ClusterInstanceType"},
						"Market": "ON_DEMAND",
						"Name": "cc-master-instance"
					},
					"CoreInstanceGroup": {
						"InstanceCount": 1,
						"InstanceType": {"Ref" : "ClusterInstanceType"},
						"Market": "ON_DEMAND",
						"Name": "cc-core-instance"
					},
					"TaskInstanceGroups": [
						{
							"InstanceCount": 1,
							"InstanceType": {"Ref" : "ClusterInstanceType"},
							"Market": "ON_DEMAND",
							"Name": "cc-task-instance-1"  
						},
						{
							"InstanceCount": 1,
							"InstanceType": {"Ref" : "ClusterInstanceType"},
							"Market": "ON_DEMAND",
							"Name": "cc-task-instance-2"  
						}
					],
					"Ec2SubnetId" : {"Ref" : "SubnetId"}
				},
				"EbsRootVolumeSize" : {"Ref" : "EbsRootVolumeSize"},
				"ServiceRole" : {"Ref": "EMRRole"},
				"JobFlowRole" : {"Ref": "EMREC2InstanceProfile"},
				"VisibleToAllUsers" : true
			}
		},
		"EMRRole": {
			"Type": "AWS::IAM::Role",
			"Properties": {
				"AssumeRolePolicyDocument": {
					"Version": "2008-10-17",
					"Statement": [
					{
						"Sid": "",
						"Effect": "Allow",
						"Principal": {
							"Service": "elasticmapreduce.amazonaws.com"
						},
						"Action": "sts:AssumeRole"
					}
					]
				},
				"Path": "/",
				"ManagedPolicyArns": ["arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceRole"]
			}
		},
		"EMREC2Role": {
			"Type": "AWS::IAM::Role",
			"Properties": {
				"AssumeRolePolicyDocument": {
					"Version": "2008-10-17",
					"Statement": [
						{
							"Sid": "",
							"Effect": "Allow",
							"Principal": {
								"Service": "ec2.amazonaws.com"
							},
							"Action": "sts:AssumeRole"
						}
					]
				},
				"Path": "/",
				"ManagedPolicyArns": ["arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceforEC2Role"]
			}
		},
		"EMREC2InstanceProfile": {
			"Type": "AWS::IAM::InstanceProfile",
			"Properties": {
				"Path": "/",
				"Roles": [ {
					"Ref": "EMREC2Role"
				} ]
			}
		}
	}
}

02 CloudFormation template (YAML):

AWSTemplateFormatVersion: '2010-09-09'
Description: Upgrade Cluster Instance Generation by Setting the Latest Generation
  Instance Type Equivalent for 'ClusterInstanceType' Stack Parameter
Parameters:
  ReleaseLabel:
    Type: String
  ClusterInstanceType:
    Type: String
  EbsRootVolumeSize:
    Type: String
  SubnetId:
    Type: String
Resources:
  EMRCluster:
    Type: AWS::EMR::Cluster
    Properties:
      Name: cc-emr-production-cluster
      ReleaseLabel: !Ref 'ReleaseLabel'
      Instances:
        MasterInstanceGroup:
          InstanceCount: 1
          InstanceType: !Ref 'ClusterInstanceType'
          Market: ON_DEMAND
          Name: cc-master-instance
        CoreInstanceGroup:
          InstanceCount: 1
          InstanceType: !Ref 'ClusterInstanceType'
          Market: ON_DEMAND
          Name: cc-core-instance
        TaskInstanceGroups:
          - InstanceCount: 1
            InstanceType: !Ref 'ClusterInstanceType'
            Market: ON_DEMAND
            Name: cc-task-instance-1
          - InstanceCount: 1
            InstanceType: !Ref 'ClusterInstanceType'
            Market: ON_DEMAND
            Name: cc-task-instance-2
        Ec2SubnetId: !Ref 'SubnetId'
      EbsRootVolumeSize: !Ref 'EbsRootVolumeSize'
      ServiceRole: !Ref 'EMRRole'
      JobFlowRole: !Ref 'EMREC2InstanceProfile'
      VisibleToAllUsers: true
  EMRRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2008-10-17'
        Statement:
          - Sid: ''
            Effect: Allow
            Principal:
              Service: elasticmapreduce.amazonaws.com
            Action: sts:AssumeRole
      Path: /
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceRole
  EMREC2Role:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2008-10-17'
        Statement:
          - Sid: ''
            Effect: Allow
            Principal:
              Service: ec2.amazonaws.com
            Action: sts:AssumeRole
      Path: /
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceforEC2Role
  EMREC2InstanceProfile:
    Type: AWS::IAM::InstanceProfile
    Properties:
      Path: /
      Roles:
        - !Ref 'EMREC2Role'
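
Both templates expect four stack parameters (ReleaseLabel, ClusterInstanceType, EbsRootVolumeSize, and SubnetId) and create IAM roles, so the stack has to be launched with the CAPABILITY_IAM capability. The Python sketch below shows one possible way to launch the stack with boto3. The function names, the default stack name, and the sample values are hypothetical and are not part of the original guide.

```python
REQUIRED_PARAMETERS = ("ReleaseLabel", "ClusterInstanceType", "EbsRootVolumeSize", "SubnetId")


def build_stack_parameters(values):
    """Convert a plain dict into the Parameters list accepted by CloudFormation."""
    missing = [name for name in REQUIRED_PARAMETERS if name not in values]
    if missing:
        raise ValueError("missing stack parameters: " + ", ".join(missing))
    return [
        {"ParameterKey": name, "ParameterValue": str(values[name])}
        for name in REQUIRED_PARAMETERS
    ]


def create_emr_stack(template_body, values,
                     stack_name="cc-emr-upgrade-stack",  # hypothetical name
                     region="us-east-1"):
    """Launch the template; requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the helper above works without boto3

    cloudformation = boto3.client("cloudformation", region_name=region)
    return cloudformation.create_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        Parameters=build_stack_parameters(values),
        Capabilities=["CAPABILITY_IAM"],  # the template creates IAM roles
    )


if __name__ == "__main__":
    # Sample values only; the subnet ID is a placeholder.
    print(build_stack_parameters({
        "ReleaseLabel": "emr-5.35.0",
        "ClusterInstanceType": "m5.xlarge",
        "EbsRootVolumeSize": 50,
        "SubnetId": "subnet-0abcd1234abcd1234",
    }))
```

Setting ClusterInstanceType to a latest generation instance type is the step that performs the upgrade. The rest of the template is unchanged.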

Using Terraform (AWS Provider)

01 Terraform configuration file (.tf):

terraform {
	required_providers {
		aws = {
			source  = "hashicorp/aws"
			version = "~> 4.0"
		}
	}

	required_version = ">= 0.14.9"
}

provider "aws" {
	region  = "us-east-1"
}

resource "aws_emr_cluster" "emr-cluster" {

	name          = "cc-prod-emr-cluster"
	release_label = "emr-5.35.0"
	applications  = ["Spark"]

	master_instance_group {
		# Upgrade Master Instance Generation
		instance_type = "m5.xlarge"
	}

	core_instance_group {
		# Upgrade Core Instance Generation
		instance_type  = "m5.xlarge"
		instance_count = 1

		ebs_config {
			size                 = "50"
			type                 = "gp2"
			volumes_per_instance = 1
		}
	}

	ebs_root_volume_size = 50
	service_role         = aws_iam_role.iam_emr_service_role.arn

	ec2_attributes {
		subnet_id                         = "subnet-01234123412341234"
		emr_managed_master_security_group = "sg-01234abcd1234abcd"
		emr_managed_slave_security_group  = "sg-0abcd1234abcd1234"
		instance_profile                  = aws_iam_instance_profile.emr_instance_profile.arn
	}

}

resource "aws_iam_role" "iam_emr_service_role" {
	name = "cc-emr-service-role"

	assume_role_policy = <<EOF
{
	"Version": "2008-10-17",
	"Statement": [
		{
			"Sid": "",
			"Effect": "Allow",
			"Principal": {
			"Service": "elasticmapreduce.amazonaws.com"
			},
			"Action": "sts:AssumeRole"
		}
	]
}
EOF
}

resource "aws_iam_role_policy" "iam_emr_service_policy" {
	name = "cc-emr-service-role-policy"
	role = aws_iam_role.iam_emr_service_role.id

	policy = <<EOF
{
	"Version": "2012-10-17",
	"Statement": [{
		"Effect": "Allow",
		"Resource": "*",
		"Action": [
			"ec2:AuthorizeSecurityGroupEgress",
			"ec2:AuthorizeSecurityGroupIngress",
			"ec2:CancelSpotInstanceRequests",
			"ec2:CreateNetworkInterface",
			"ec2:CreateSecurityGroup",
			"ec2:CreateTags",
			"ec2:DeleteNetworkInterface",
			"ec2:DeleteSecurityGroup",
			"ec2:DeleteTags",
			"ec2:DescribeAvailabilityZones",
			"ec2:DescribeAccountAttributes",
			"ec2:DescribeDhcpOptions",
			"ec2:DescribeInstanceStatus",
			"ec2:DescribeInstances",
			"ec2:DescribeKeyPairs",
			"ec2:DescribeNetworkAcls",
			"ec2:DescribeNetworkInterfaces",
			"ec2:DescribePrefixLists",
			"ec2:DescribeRouteTables",
			"ec2:DescribeSecurityGroups",
			"ec2:DescribeSpotInstanceRequests",
			"ec2:DescribeSpotPriceHistory",
			"ec2:DescribeSubnets",
			"ec2:DescribeVpcAttribute",
			"ec2:DescribeVpcEndpoints",
			"ec2:DescribeVpcEndpointServices",
			"ec2:DescribeVpcs",
			"ec2:DetachNetworkInterface",
			"ec2:ModifyImageAttribute",
			"ec2:ModifyInstanceAttribute",
			"ec2:RequestSpotInstances",
			"ec2:RevokeSecurityGroupEgress",
			"ec2:RunInstances",
			"ec2:TerminateInstances",
			"ec2:DeleteVolume",
			"ec2:DescribeVolumeStatus",
			"ec2:DescribeVolumes",
			"ec2:DetachVolume",
			"iam:GetRole",
			"iam:GetRolePolicy",
			"iam:ListInstanceProfiles",
			"iam:ListRolePolicies",
			"iam:PassRole",
			"s3:CreateBucket",
			"s3:Get*",
			"s3:List*",
			"sdb:BatchPutAttributes",
			"sdb:Select",
			"sqs:CreateQueue",
			"sqs:Delete*",
			"sqs:GetQueue*",
			"sqs:PurgeQueue",
			"sqs:ReceiveMessage"
		]
	}]
}
EOF
}

resource "aws_iam_role" "iam_emr_profile_role" {
	name = "emr-instance-profile-role"

	assume_role_policy = <<EOF
{
	"Version": "2008-10-17",
	"Statement": [
		{
			"Sid": "",
			"Effect": "Allow",
			"Principal": {
			"Service": "ec2.amazonaws.com"
			},
			"Action": "sts:AssumeRole"
		}
	]
}
EOF
}

resource "aws_iam_instance_profile" "emr_instance_profile" {
	name = "emr-instance-profile"
	role = aws_iam_role.iam_emr_profile_role.name
}

resource "aws_iam_role_policy" "iam_emr_profile_policy" {
	name = "emr-instance-profile-policy"
	role = aws_iam_role.iam_emr_profile_role.id

	policy = <<EOF
{
	"Version": "2012-10-17",
	"Statement": [{
		"Effect": "Allow",
		"Resource": "*",
		"Action": [
			"cloudwatch:*",
			"dynamodb:*",
			"ec2:Describe*",
			"elasticmapreduce:Describe*",
			"elasticmapreduce:ListBootstrapActions",
			"elasticmapreduce:ListClusters",
			"elasticmapreduce:ListInstanceGroups",
			"elasticmapreduce:ListInstances",
			"elasticmapreduce:ListSteps",
			"kinesis:CreateStream",
			"kinesis:DeleteStream",
			"kinesis:DescribeStream",
			"kinesis:GetRecords",
			"kinesis:GetShardIterator",
			"kinesis:MergeShards",
			"kinesis:PutRecord",
			"kinesis:SplitShard",
			"rds:Describe*",
			"s3:*",
			"sdb:*",
			"sns:*",
			"sqs:*"
		]
	}]
}
EOF
}
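
Before applying the configuration, it can be useful to confirm which instance types the plan will provision. The Python sketch below reads the JSON form of a saved plan (as produced by terraform show -json) and collects the instance types of every aws_emr_cluster resource. It assumes the usual planned_values layout, with nested blocks such as master_instance_group rendered as lists. That layout and the function name are assumptions, so verify them against your own plan output.

```python
import json


def emr_instance_types_from_plan(plan):
    """Return {resource address: [instance types]} for aws_emr_cluster resources."""
    found = {}

    def visit(module):
        for resource in module.get("resources", []):
            if resource.get("type") != "aws_emr_cluster":
                continue
            values = resource.get("values") or {}
            types = []
            for block_name in ("master_instance_group", "core_instance_group"):
                blocks = values.get(block_name) or []
                if isinstance(blocks, dict):  # tolerate a single-object rendering
                    blocks = [blocks]
                for block in blocks:
                    if block.get("instance_type"):
                        types.append(block["instance_type"])
            found[resource.get("address", "")] = types
        # Resources declared inside modules are nested under child_modules.
        for child in module.get("child_modules", []):
            visit(child)

    visit(plan.get("planned_values", {}).get("root_module", {}))
    return found


# Hand-written sample that mimics the assumed plan layout (not real output).
SAMPLE_PLAN = json.loads("""
{
  "planned_values": {
    "root_module": {
      "resources": [
        {
          "address": "aws_emr_cluster.emr-cluster",
          "type": "aws_emr_cluster",
          "values": {
            "master_instance_group": [{"instance_type": "m5.xlarge"}],
            "core_instance_group": [{"instance_type": "m5.xlarge", "instance_count": 1}]
          }
        }
      ]
    }
  }
}
""")

if __name__ == "__main__":
    print(emr_instance_types_from_plan(SAMPLE_PLAN))
```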

Using AWS Console

01 Sign in to the AWS Management Console.

02 Navigate to Amazon Elastic MapReduce (EMR) console at https://console.aws.amazon.com/elasticmapreduce/.

03 In the main navigation panel, under EMR on EC2, choose Clusters.

04 Select the Amazon EMR cluster that you want to re-create and choose Clone from the console top menu.

05 In the Cloning <emr-cluster-id> dialog box, choose Yes to include the steps from the original cluster in the cloned cluster or No to clone the original cluster's configuration without including any of the existing steps. Choose Clone to start the cloning process.

06 On the Create Cluster - Advanced Options page, perform the following operations:

  1. Choose Step 1: Software and Steps from the left navigation panel and configure the software stack that will be installed on the new cluster. Choose Next to continue the setup process.
  2. For Step 2: Hardware, select the equivalent latest generation instance type for each provisioned instance listed in the Cluster Nodes and Instances section, regardless of the instance node type (i.e. master, core, or task). Choose the VPC network and subnet where the EMR cluster instances will be deployed from the Networking section, and set the EBS volume size for the root device from the EBS Root Volume section. Choose Next to continue.
  3. For Step 3: General Cluster Settings, choose whether to enable the Termination Protection safety feature, configure the cluster logging, and create any required tag sets. Choose Next to continue.
  4. For Step 4: Security, make sure that the right permissions are applied to the new cluster, select the appropriate EC2 key pair, configure the security options, then choose Create cluster to provision your new Amazon EMR cluster.

07 (Optional) You can now terminate the source (original) cluster in order to stop incurring charges for that EMR resource. To terminate the source Amazon EMR cluster, perform the following actions:

  1. Select the EMR cluster that you want to shut down.
  2. Choose Terminate from the console top menu.
  3. Within the Terminate clusters confirmation box, review the cluster details, set the Termination protection to Off, then choose Terminate to remove the source EMR cluster from your AWS account.

08 Repeat steps no. 4 – 7 for each Amazon EMR cluster that you want to redeploy, available within the current AWS region.

09 Change the AWS cloud region from the navigation bar and repeat the Remediation process for other AWS regions.

Using AWS CLI

01 Get the configuration details from the source (original) EMR cluster. Run describe-cluster command (OSX/Linux/UNIX) using the ID of the Amazon EMR cluster that you want to re-create as the identifier parameter, to list the configuration information available for the selected cluster:

aws emr describe-cluster \
  --region us-east-1 \
  --cluster-id j-AAAABBBBCCCCD

02 The command output should return the requested cluster configuration information:

{
   "Cluster": {
     "Name": "cc-hadoop-cluster",
     "ServiceRole": "EMR_DefaultRole",
     "Tags": [],
     "TerminationProtected": false,
     "NormalizedInstanceHours": 4,

     ...

     "ScaleDownBehavior": "TERMINATE_AT_INSTANCE_HOUR",
     "VisibleToAllUsers": true,
     "BootstrapActions": [],
     "LogUri": "s3n://aws-logs-123456789012-us-east-1/elasticmapreduce/",
     "AutoTerminate": false,
     "Id": "j-AAAABBBBCCCCD"
   }
}

03 Run create-cluster command (OSX/Linux/UNIX) to re-create your Amazon EMR cluster with instances configured with the equivalent instance type(s) from the current generation. The following command example creates an EMR cluster with one m5.xlarge-type master instance and two m5.xlarge-type core instances, named "cc-emr-production-cluster":

aws emr create-cluster \
  --region us-east-1 \
  --name cc-emr-production-cluster \
  --release-label emr-5.35.0 \
  --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m5.xlarge InstanceGroupType=CORE,InstanceCount=2,InstanceType=m5.xlarge \
  --service-role EMR_DefaultRole \
  --ec2-attributes KeyName=SSHAccessKey,InstanceProfile=EMR_EC2_DefaultRole,EmrManagedMasterSecurityGroup=sg-0abcd1234abcd1234,EmrManagedSlaveSecurityGroup=sg-01234abcd1234abcd,AvailabilityZone=us-east-1a,SubnetId=subnet-0abcd1234abcd1234 \
  --visible-to-all-users \
  --no-auto-terminate

04 The command output should return the ID of your new Amazon EMR cluster:

{
  "ClusterId": "j-BBBBCCCCDDDDE"
}

05 (Optional) You can now terminate the source cluster in order to stop incurring charges for it. To terminate the source Amazon EMR cluster, run terminate-clusters command (OSX/Linux/UNIX) using the ID of the cluster that you want to delete as the identifier parameter (the command does not produce an output):

aws emr terminate-clusters \
  --region us-east-1 \
  --cluster-ids j-AAAABBBBCCCCD

06 Repeat steps no. 1 – 5 for each Amazon EMR cluster that you want to redeploy, available in the selected AWS region.

07 Change the AWS cloud region by updating the --region command parameter value and repeat the Remediation process for other regions.
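
When many clusters have to be re-created, the --instance-groups argument for create-cluster can be derived from the InstanceGroups section of the describe-cluster output. The Python sketch below is a minimal illustration under stated assumptions: the function name is hypothetical, and the upgrade map only pairs the instance types used as examples in this guide (m1.xlarge and m5.xlarge). Choose the equivalents that suit your own workloads.

```python
# Hypothetical mapping from previous generation types to their replacements.
UPGRADE_MAP = {"m1.xlarge": "m5.xlarge"}


def upgraded_instance_groups(instance_groups, upgrade_map=UPGRADE_MAP):
    """Build --instance-groups shorthand values with upgraded instance types.

    instance_groups is the Cluster.InstanceGroups list from the
    describe-cluster output. Types missing from upgrade_map are kept as-is.
    """
    args = []
    for group in instance_groups:
        old_type = group["InstanceType"]
        new_type = upgrade_map.get(old_type, old_type)
        args.append(
            "InstanceGroupType={},InstanceCount={},InstanceType={}".format(
                group["InstanceGroupType"],
                group["RequestedInstanceCount"],
                new_type,
            )
        )
    return args


if __name__ == "__main__":
    # Sample data shaped like the describe-cluster output (values are made up).
    source_groups = [
        {"InstanceGroupType": "MASTER", "RequestedInstanceCount": 1, "InstanceType": "m1.xlarge"},
        {"InstanceGroupType": "CORE", "RequestedInstanceCount": 2, "InstanceType": "m1.xlarge"},
    ]
    print(" ".join(upgraded_instance_groups(source_groups)))
```

The printed string can be passed as the value of the --instance-groups parameter in step 03.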

Publication date Feb 24, 2017