Cluster in VPC

Trend Cloud One™ – Conformity is a continuous assurance tool that provides peace of mind for your cloud infrastructure, delivering over 1000 automated best practice checks.

Risk Level: Medium (should be achieved)
Rule ID: EMR-005

Ensure that your Amazon Elastic MapReduce (EMR) clusters are provisioned using the AWS EC2-VPC platform instead of the EC2-Classic platform for better flexibility and control over security, better traffic routing, and better availability.

This rule can help you with the following compliance standards:

  • PCI
  • HIPAA
  • APRA
  • MAS

For further details on compliance standards supported by Conformity, see here.

This rule can help you work with the AWS Well-Architected Framework.

This rule resolution is part of the Conformity Security & Compliance tool for AWS.

Security

Launching and managing Amazon EMR clusters using the EC2-VPC platform brings multiple advantages, such as better networking infrastructure (network isolation, private subnets, and private IP addresses), flexible control over access security (network ACLs and security group outbound/egress traffic filtering), and access to newer, more powerful EC2 instance types for your clusters. Moreover, if you are processing sensitive data with your Amazon EMR clusters, you may want the additional access control provided by the EC2-VPC platform, which you enable by launching your clusters within a VPC.


Audit

To determine the type of the AWS platform used to launch your Amazon EMR clusters, perform the following actions:

Using AWS Console

01 Sign in to the AWS Management Console.

02 Navigate to Amazon EC2 console at https://console.aws.amazon.com/ec2/v2/.

03 In the main navigation panel, choose EC2 Dashboard.

04 In the Account attributes upper-right section, check the Supported Platforms attribute value. If the Supported Platforms value is set to VPC, your AWS account uses only the EC2-VPC platform, i.e. all EMR cluster instances are launched within a VPC network, therefore the Audit process ends here. If the Supported Platforms is set to EC2 and VPC, your AWS account supports both EC2-Classic and EC2-VPC platforms. To identify Amazon EMR clusters launched with the EC2-Classic platform, continue the Audit with the next step.

05 Navigate to Amazon Elastic MapReduce (EMR) console at https://console.aws.amazon.com/elasticmapreduce/.

06 In the main navigation panel, under EMR on EC2, choose Clusters.

07 Click on the name (link) of the Amazon EMR cluster that you want to examine.

08 Select the Summary tab and search for the Subnet ID configuration attribute listed in the Network and hardware section. The Subnet ID attribute value represents the ID of the VPC subnet where the EMR cluster instances have been provisioned. If there is no Subnet ID attribute listed in the Network and hardware section, the selected Amazon Elastic MapReduce (EMR) cluster was launched using the EC2-Classic platform and needs to be migrated to the EC2-VPC platform.

09 Repeat steps no. 7 and 8 for each Amazon EMR cluster available within the current AWS region.

10 Change the AWS cloud region from the navigation bar and repeat the Audit process for other regions.

Using AWS CLI

01 Run the describe-account-attributes command (OSX/Linux/UNIX) with custom query filters to list the platform type(s) supported by your AWS cloud account:

aws ec2 describe-account-attributes \
  --region us-east-1 \
  --attribute-names supported-platforms \
  --query 'AccountAttributes[*].AttributeValues[*].AttributeValue | []'

02 The command output should return the type(s) of the platform used for your AWS account:

[
    "EC2",
    "VPC"
]

If the describe-account-attributes command output returns only "VPC", your AWS cloud account supports only the EC2-VPC platform and all your Amazon EMR clusters were launched within a Virtual Private Cloud (VPC). If the command output returns "EC2" and "VPC", as shown in the output example above, your AWS account supports both EC2-Classic and EC2-VPC platforms. To identify Amazon EMR clusters launched with the EC2-Classic platform, continue the Audit with the next step.

03 Run the list-clusters command (OSX/Linux/UNIX) with custom query filters to list the ID of each active Amazon EMR cluster provisioned in the selected AWS region:

aws emr list-clusters \
  --region us-east-1 \
  --active \
  --output table \
  --query 'Clusters[*].Id'

04 The command output should return a table with the requested EMR cluster ID(s):

---------------------
|   ListClusters    |
+-------------------+
|  j-AAAABBBBCCCCD  |
|  j-BBBBCCCCDDDDE  |
+-------------------+

05 Run the describe-cluster command (OSX/Linux/UNIX) using the ID of the Amazon EMR cluster that you want to examine as the identifier parameter and custom query filters to return the ID of the VPC subnet where the cluster instances have been deployed:

aws emr describe-cluster \
  --region us-east-1 \
  --cluster-id j-AAAABBBBCCCCD \
  --query 'Cluster.Ec2InstanceAttributes.Ec2SubnetId'

06 The command output should return the requested subnet ID, such as "subnet-0abcd1234abcd1234", if the selected Amazon EMR cluster was provisioned within a VPC. Otherwise, the describe-cluster command returns an empty result (for example, null when the JSON output format is used). An empty result means that the cluster instances do not belong to a VPC subnet, therefore the selected Amazon Elastic MapReduce (EMR) cluster was launched using the EC2-Classic platform instead of the EC2-VPC platform.

07 Repeat steps no. 5 and 6 for each Amazon EMR cluster available in the selected AWS region.

08 Change the AWS cloud region by updating the --region command parameter value and repeat the Audit process for other regions.
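The CLI audit steps above can be strung together in a short script. The following Python sketch is illustrative only: the function names (aws_cli, supports_ec2_classic, classic_clusters) and the injectable run parameter are assumptions introduced here, not part of Conformity or the AWS CLI. It shells out to the same three commands used in steps 01, 03, and 05 and flags every active cluster for which no subnet ID is returned.

```python
import json
import subprocess
from typing import Callable, List

# A runner takes AWS CLI arguments (without the leading "aws") and returns parsed JSON.
# Hypothetical helper type, introduced only so the logic can be exercised without AWS access.
Runner = Callable[[List[str]], object]


def aws_cli(args: List[str]) -> object:
    """Default runner: call the AWS CLI and parse its JSON output."""
    result = subprocess.run(
        ["aws", *args, "--output", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout) if result.stdout.strip() else None


def supports_ec2_classic(platforms: List[str]) -> bool:
    """True when the supported-platforms attribute includes "EC2" (EC2-Classic)."""
    return "EC2" in platforms


def classic_clusters(region: str, run: Runner = aws_cli) -> List[str]:
    """Return the IDs of active EMR clusters in `region` that report no VPC subnet ID."""
    platforms = run([
        "ec2", "describe-account-attributes", "--region", region,
        "--attribute-names", "supported-platforms",
        "--query", "AccountAttributes[*].AttributeValues[*].AttributeValue | []",
    ])
    if not supports_ec2_classic(platforms or []):
        return []  # VPC-only account: the audit ends here for this region

    cluster_ids = run([
        "emr", "list-clusters", "--region", region, "--active",
        "--query", "Clusters[*].Id",
    ]) or []

    flagged = []
    for cluster_id in cluster_ids:
        subnet_id = run([
            "emr", "describe-cluster", "--region", region,
            "--cluster-id", cluster_id,
            "--query", "Cluster.Ec2InstanceAttributes.Ec2SubnetId",
        ])
        if not subnet_id:  # empty result: no VPC subnet, so treat as EC2-Classic
            flagged.append(cluster_id)
    return flagged
```

To cover several regions, call classic_clusters once per region name, for example in a loop over ["us-east-1", "us-west-2"]. Passing a custom run callable makes it possible to dry-run the logic against canned responses before pointing it at a real account.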

Remediation / Resolution

To migrate your Amazon EMR clusters from EC2-Classic to EC2-VPC platform, you must re-create the clusters within a Virtual Private Cloud (VPC). To relaunch your EMR clusters using the EC2-VPC platform, perform the following actions:

Using AWS CloudFormation

01 CloudFormation template (JSON):

{
	"AWSTemplateFormatVersion": "2010-09-09",
	"Description": "Specify the 'SubnetId' Stack Parameter to Deploy your Amazon EMR Cluster in a Virtual Private Cloud (VPC)",
	"Parameters" : {
		"ReleaseLabel" : {
			"Type" : "String"
		},
		"ClusterInstanceType" : {
			"Type" : "String"
		},
		"EbsRootVolumeSize" : {
			"Type" : "String"
		},
		"SubnetId" : {
			"Type" : "String"
		}
	},
	"Resources": {
		"EMRCluster": {
			"Type": "AWS::EMR::Cluster",
			"Properties": {
				"Name": "cc-emr-production-cluster",
				"ReleaseLabel" : {"Ref" : "ReleaseLabel"},
				"Instances": {
					"MasterInstanceGroup": {
						"InstanceCount": 1,
						"InstanceType": {"Ref" : "ClusterInstanceType"},
						"Market": "ON_DEMAND",
						"Name": "cc-master-instance"
					},
					"CoreInstanceGroup": {
						"InstanceCount": 1,
						"InstanceType": {"Ref" : "ClusterInstanceType"},
						"Market": "ON_DEMAND",
						"Name": "cc-core-instance"
					},
					"TaskInstanceGroups": [
						{
							"InstanceCount": 1,
							"InstanceType": {"Ref" : "ClusterInstanceType"},
							"Market": "ON_DEMAND",
							"Name": "cc-task-instance-1"
						},
						{
							"InstanceCount": 1,
							"InstanceType": {"Ref" : "ClusterInstanceType"},
							"Market": "ON_DEMAND",
							"Name": "cc-task-instance-2"
						}
					],
					"Ec2SubnetId" : {"Ref" : "SubnetId"}
				},
				"EbsRootVolumeSize" : {"Ref" : "EbsRootVolumeSize"},
				"ServiceRole" : {"Ref": "EMRRole"},
				"JobFlowRole" : {"Ref": "EMREC2InstanceProfile"},
				"VisibleToAllUsers" : true
			}
		},
		"EMRRole": {
			"Type": "AWS::IAM::Role",
			"Properties": {
				"AssumeRolePolicyDocument": {
					"Version": "2008-10-17",
					"Statement": [
					{
						"Sid": "",
						"Effect": "Allow",
						"Principal": {
						"Service": "elasticmapreduce.amazonaws.com"
						},
						"Action": "sts:AssumeRole"
					}
					]
				},
				"Path": "/",
				"ManagedPolicyArns": ["arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceRole"]
			}
		},
		"EMREC2Role": {
			"Type": "AWS::IAM::Role",
			"Properties": {
				"AssumeRolePolicyDocument": {
					"Version": "2008-10-17",
					"Statement": [
					{
						"Sid": "",
						"Effect": "Allow",
						"Principal": {
						"Service": "ec2.amazonaws.com"
						},
						"Action": "sts:AssumeRole"
					}
					]
				},
				"Path": "/",
				"ManagedPolicyArns": ["arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceforEC2Role"]
			}
		},
		"EMREC2InstanceProfile": {
			"Type": "AWS::IAM::InstanceProfile",
			"Properties": {
				"Path": "/",
				"Roles": [ {
					"Ref": "EMREC2Role"
				} ]
			}
		}
	}
}

02 CloudFormation template (YAML):

AWSTemplateFormatVersion: '2010-09-09'
Description: Specify the 'SubnetId' Stack Parameter to Deploy your Amazon EMR Cluster
  in a Virtual Private Cloud (VPC)
Parameters:
  ReleaseLabel:
    Type: String
  ClusterInstanceType:
    Type: String
  EbsRootVolumeSize:
    Type: String
  SubnetId:
    Type: String
Resources:
  EMRCluster:
    Type: AWS::EMR::Cluster
    Properties:
      Name: cc-emr-production-cluster
      ReleaseLabel: !Ref 'ReleaseLabel'
      Instances:
        MasterInstanceGroup:
          InstanceCount: 1
          InstanceType: !Ref 'ClusterInstanceType'
          Market: ON_DEMAND
          Name: cc-master-instance
        CoreInstanceGroup:
          InstanceCount: 1
          InstanceType: !Ref 'ClusterInstanceType'
          Market: ON_DEMAND
          Name: cc-core-instance
        TaskInstanceGroups:
          - InstanceCount: 1
            InstanceType: !Ref 'ClusterInstanceType'
            Market: ON_DEMAND
            Name: cc-task-instance-1
          - InstanceCount: 1
            InstanceType: !Ref 'ClusterInstanceType'
            Market: ON_DEMAND
            Name: cc-task-instance-2
        Ec2SubnetId: !Ref 'SubnetId'
      EbsRootVolumeSize: !Ref 'EbsRootVolumeSize'
      ServiceRole: !Ref 'EMRRole'
      JobFlowRole: !Ref 'EMREC2InstanceProfile'
      VisibleToAllUsers: true
  EMRRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2008-10-17'
        Statement:
          - Sid: ''
            Effect: Allow
            Principal:
              Service: elasticmapreduce.amazonaws.com
            Action: sts:AssumeRole
      Path: /
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceRole
  EMREC2Role:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2008-10-17'
        Statement:
          - Sid: ''
            Effect: Allow
            Principal:
              Service: ec2.amazonaws.com
            Action: sts:AssumeRole
      Path: /
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceforEC2Role
  EMREC2InstanceProfile:
    Type: AWS::IAM::InstanceProfile
    Properties:
      Path: /
      Roles:
        - !Ref 'EMREC2Role'
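Before deploying a template like the ones above, it can be useful to confirm that every EMR cluster resource actually names a subnet. The following Python sketch is a minimal pre-deployment check, not an official validator: the function name emr_clusters_without_subnet and the "LegacyCluster" resource are hypothetical, and the check for the plural Ec2SubnetIds property (used with instance fleets) is an assumption added for completeness. It works on the JSON form of a template only, because the stock JSON parser cannot read the YAML short-form tags such as !Ref.

```python
import json
from typing import List


def emr_clusters_without_subnet(template: dict) -> List[str]:
    """Return logical IDs of AWS::EMR::Cluster resources that name no VPC subnet."""
    missing = []
    for logical_id, resource in template.get("Resources", {}).items():
        if resource.get("Type") != "AWS::EMR::Cluster":
            continue
        instances = resource.get("Properties", {}).get("Instances", {})
        # Ec2SubnetId is the property used in the templates above;
        # Ec2SubnetIds is assumed here to be the instance-fleet variant.
        if not (instances.get("Ec2SubnetId") or instances.get("Ec2SubnetIds")):
            missing.append(logical_id)
    return missing


# Trimmed-down template in the same shape as the JSON example above.
# "LegacyCluster" is a made-up resource that omits the subnet on purpose.
template = json.loads("""
{
  "Resources": {
    "EMRCluster": {
      "Type": "AWS::EMR::Cluster",
      "Properties": {
        "Instances": {"Ec2SubnetId": {"Ref": "SubnetId"}}
      }
    },
    "LegacyCluster": {
      "Type": "AWS::EMR::Cluster",
      "Properties": {"Instances": {}}
    }
  }
}
""")

print(emr_clusters_without_subnet(template))
```

An empty list means every EMR cluster resource in the template references a subnet; any logical ID that is printed needs an Ec2SubnetId before the stack is created.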

Using Terraform (AWS Provider)

01 Terraform configuration file (.tf):

terraform {
	required_providers {
		aws = {
			source  = "hashicorp/aws"
			version = "~> 4.0"
		}
	}

	required_version = ">= 0.14.9"
}

provider "aws" {
	region  = "us-east-1"
}

resource "aws_emr_cluster" "emr-cluster" {

	name          = "cc-prod-emr-cluster"
	release_label = "emr-5.35.0"
	applications  = ["Spark"]

	master_instance_group {
		instance_type = "m5.xlarge"
	}

	core_instance_group {
		instance_type  = "m5.xlarge"
		instance_count = 1

		ebs_config {
			size                 = "50"
			type                 = "gp2"
			volumes_per_instance = 1
		}

	}

	ebs_root_volume_size = 50
	service_role = aws_iam_role.iam_emr_service_role.arn

	ec2_attributes {
		# Configure the VPC subnet where your Amazon EMR cluster will be deployed
		subnet_id                         = "subnet-01234123412341234"

		emr_managed_master_security_group = "sg-01234abcd1234abcd"
		emr_managed_slave_security_group  = "sg-0abcd1234abcd1234"
		instance_profile                  = aws_iam_instance_profile.emr_instance_profile.arn
	}

}

resource "aws_iam_role" "iam_emr_service_role" {
	name = "cc-emr-service-role"

	assume_role_policy = <<EOF
{
	"Version": "2008-10-17",
	"Statement": [
		{
			"Sid": "",
			"Effect": "Allow",
			"Principal": {
			"Service": "elasticmapreduce.amazonaws.com"
			},
			"Action": "sts:AssumeRole"
		}
	]
}
EOF
}

resource "aws_iam_role_policy" "iam_emr_service_policy" {
	name = "cc-emr-service-role-policy"
	role = aws_iam_role.iam_emr_service_role.id

	policy = <<EOF
{
	"Version": "2012-10-17",
	"Statement": [{
		"Effect": "Allow",
		"Resource": "*",
		"Action": [
			"ec2:AuthorizeSecurityGroupEgress",
			"ec2:AuthorizeSecurityGroupIngress",
			"ec2:CancelSpotInstanceRequests",
			"ec2:CreateNetworkInterface",
			"ec2:CreateSecurityGroup",
			"ec2:CreateTags",
			"ec2:DeleteNetworkInterface",
			"ec2:DeleteSecurityGroup",
			"ec2:DeleteTags",
			"ec2:DescribeAvailabilityZones",
			"ec2:DescribeAccountAttributes",
			"ec2:DescribeDhcpOptions",
			"ec2:DescribeInstanceStatus",
			"ec2:DescribeInstances",
			"ec2:DescribeKeyPairs",
			"ec2:DescribeNetworkAcls",
			"ec2:DescribeNetworkInterfaces",
			"ec2:DescribePrefixLists",
			"ec2:DescribeRouteTables",
			"ec2:DescribeSecurityGroups",
			"ec2:DescribeSpotInstanceRequests",
			"ec2:DescribeSpotPriceHistory",
			"ec2:DescribeSubnets",
			"ec2:DescribeVpcAttribute",
			"ec2:DescribeVpcEndpoints",
			"ec2:DescribeVpcEndpointServices",
			"ec2:DescribeVpcs",
			"ec2:DetachNetworkInterface",
			"ec2:ModifyImageAttribute",
			"ec2:ModifyInstanceAttribute",
			"ec2:RequestSpotInstances",
			"ec2:RevokeSecurityGroupEgress",
			"ec2:RunInstances",
			"ec2:TerminateInstances",
			"ec2:DeleteVolume",
			"ec2:DescribeVolumeStatus",
			"ec2:DescribeVolumes",
			"ec2:DetachVolume",
			"iam:GetRole",
			"iam:GetRolePolicy",
			"iam:ListInstanceProfiles",
			"iam:ListRolePolicies",
			"iam:PassRole",
			"s3:CreateBucket",
			"s3:Get*",
			"s3:List*",
			"sdb:BatchPutAttributes",
			"sdb:Select",
			"sqs:CreateQueue",
			"sqs:Delete*",
			"sqs:GetQueue*",
			"sqs:PurgeQueue",
			"sqs:ReceiveMessage"
		]
	}]
}
EOF
}

resource "aws_iam_role" "iam_emr_profile_role" {
	name = "emr-instance-profile-role"

	assume_role_policy = <<EOF
{
	"Version": "2008-10-17",
	"Statement": [
		{
			"Sid": "",
			"Effect": "Allow",
			"Principal": {
			"Service": "ec2.amazonaws.com"
			},
			"Action": "sts:AssumeRole"
		}
	]
}
EOF
}

resource "aws_iam_instance_profile" "emr_instance_profile" {
	name = "emr-instance-profile"
	role = aws_iam_role.iam_emr_profile_role.name
}

resource "aws_iam_role_policy" "iam_emr_profile_policy" {
	name = "emr-instance-profile-policy"
	role = aws_iam_role.iam_emr_profile_role.id

	policy = <<EOF
{
	"Version": "2012-10-17",
	"Statement": [{
		"Effect": "Allow",
		"Resource": "*",
		"Action": [
			"cloudwatch:*",
			"dynamodb:*",
			"ec2:Describe*",
			"elasticmapreduce:Describe*",
			"elasticmapreduce:ListBootstrapActions",
			"elasticmapreduce:ListClusters",
			"elasticmapreduce:ListInstanceGroups",
			"elasticmapreduce:ListInstances",
			"elasticmapreduce:ListSteps",
			"kinesis:CreateStream",
			"kinesis:DeleteStream",
			"kinesis:DescribeStream",
			"kinesis:GetRecords",
			"kinesis:GetShardIterator",
			"kinesis:MergeShards",
			"kinesis:PutRecord",
			"kinesis:SplitShard",
			"rds:Describe*",
			"s3:*",
			"sdb:*",
			"sns:*",
			"sqs:*"
		]
	}]
}
EOF
}

Using AWS Console

01 Sign in to the AWS Management Console.

02 Navigate to Amazon Elastic MapReduce (EMR) console at https://console.aws.amazon.com/elasticmapreduce/.

03 In the main navigation panel, under EMR on EC2, choose Clusters.

04 Select the Amazon EMR cluster that you want to relaunch and choose Clone from the console top menu.

05 In the Cloning <emr-cluster-id> dialog box, choose Yes to include the steps from the original cluster in the cloned cluster or No to clone the original cluster's configuration without including any of the existing steps. Choose Clone to start the cloning process.

06 On the Create Cluster - Advanced Options page, perform the following operations:

  1. Choose Step 1: Software and Steps from the left navigation panel and configure the software stack that will be installed on the new cluster. Choose Next to continue the setup process.
  2. For Step 2: Hardware, choose the VPC network and subnet where the EMR cluster instances will be deployed from the Networking section, set the EBS volume size for the root device and configure the cluster nodes (instances) as needed. Choose Next to continue.
  3. For Step 3: General Cluster Settings, choose whether to enable the Termination Protection safety feature, configure the cluster logging, and create any required tag sets. Choose Next to continue.
  4. For Step 4: Security, make sure that the right permissions are applied to the new cluster, select the EC2 key pair, configure the security options, then choose Create cluster to provision your new Amazon EMR cluster.

07 (Optional) You can now terminate the source (original) cluster in order to stop incurring charges for that resource. To terminate the source Amazon EMR cluster, perform the following actions:

  1. Select the Amazon EMR cluster that you want to shut down and choose Terminate from the console top menu.
  2. Within the Terminate clusters confirmation box, review the cluster details, set the Termination protection to Off, then choose Terminate to remove the source cluster from your AWS account.

08 Repeat steps no. 4 – 7 for each Amazon EMR cluster that you want to redeploy, available within the current AWS region.

09 Change the AWS cloud region from the navigation bar and repeat the Remediation process for other AWS regions.

Using AWS CLI

01 Get the configuration details from the source (original) EMR cluster launched using the EC2-Classic platform. Run the describe-cluster command (OSX/Linux/UNIX) using the ID of the Amazon EMR cluster that you want to re-create as the identifier parameter, to list the configuration information available for the selected cluster:

aws emr describe-cluster \
  --region us-east-1 \
  --cluster-id j-AAAABBBBCCCCD

02 The command output should return the requested cluster configuration information:

{
   "Cluster": {
     "Name": "cc-hadoop-cluster",
     "ServiceRole": "EMR_DefaultRole",
     "Tags": [],
     "TerminationProtected": false,
     "NormalizedInstanceHours": 4,

     ...

     "ScaleDownBehavior": "TERMINATE_AT_INSTANCE_HOUR",
     "VisibleToAllUsers": true,
     "BootstrapActions": [],
     "LogUri": "s3n://aws-logs-123456789012-us-east-1/elasticmapreduce/",
     "AutoTerminate": false,
     "Id": "j-AAAABBBBCCCCD"
   }
}

03 Run the create-cluster command (OSX/Linux/UNIX) to re-create the existing Amazon EMR cluster within the selected VPC network using the configuration information returned at the previous step. The following command example creates an EMR cluster named "cc-vpc-emr-cluster", with one c5.xlarge master instance and two c5.xlarge core instances, inside the VPC subnet identified by the ID "subnet-0abcd1234abcd1234":

aws emr create-cluster \
  --region us-east-1 \
  --name cc-vpc-emr-cluster \
  --release-label emr-4.0.0 \
  --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=c5.xlarge InstanceGroupType=CORE,InstanceCount=2,InstanceType=c5.xlarge \
  --service-role EMR_DefaultRole \
  --ec2-attributes KeyName=SSHAccessKey,InstanceProfile=EMR_EC2_DefaultRole,EmrManagedMasterSecurityGroup=sg-0abcd1234abcd1234,EmrManagedSlaveSecurityGroup=sg-01234abcd1234abcd,AvailabilityZone=us-east-1a,SubnetId=subnet-0abcd1234abcd1234 \
  --visible-to-all-users \
  --no-auto-terminate

04 The command output should return the ID of your new Amazon EMR cluster:

{
  "ClusterId": "j-BBBBCCCCDDDDE"
}

05 (Optional) You can now terminate the source (original) cluster in order to stop incurring charges for it. To terminate the source Amazon EMR cluster, launched using the EC2-Classic platform, run the terminate-clusters command (OSX/Linux/UNIX) using the ID of the cluster as the identifier parameter (the command does not produce an output):

aws emr terminate-clusters \
  --region us-east-1 \
  --cluster-ids j-AAAABBBBCCCCD

06 Repeat steps no. 1 – 5 for each Amazon EMR cluster that you want to redeploy, available in the selected AWS region.

07 Change the AWS cloud region by updating the --region command parameter value and repeat the Remediation process for other regions.
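When many clusters have to be re-created, the mapping from the describe-cluster output (step 02) to the create-cluster arguments (step 03) can be scripted. The following Python sketch shows one possible way to do it; the function name create_cluster_command and its parameters are assumptions made for illustration. It copies only the fields visible in the sample output above (ServiceRole, LogUri, VisibleToAllUsers, AutoTerminate), takes the release label, instance groups, and EC2 attributes as explicit inputs because they are elided in that sample, and refuses to build a command that has no SubnetId.

```python
import shlex
from typing import Dict, List


def create_cluster_command(source: dict, name: str, region: str, release_label: str,
                           instance_groups: List[str],
                           ec2_attributes: Dict[str, str]) -> List[str]:
    """Build an `aws emr create-cluster` argument list from describe-cluster output."""
    if not ec2_attributes.get("SubnetId"):
        raise ValueError("ec2_attributes must include SubnetId to launch in a VPC")
    cluster = source["Cluster"]
    command = [
        "aws", "emr", "create-cluster",
        "--region", region,
        "--name", name,
        "--release-label", release_label,
        "--instance-groups", *instance_groups,
        "--service-role", cluster["ServiceRole"],
        "--ec2-attributes", ",".join(f"{k}={v}" for k, v in ec2_attributes.items()),
    ]
    if cluster.get("LogUri"):
        command += ["--log-uri", cluster["LogUri"]]
    command.append("--visible-to-all-users" if cluster.get("VisibleToAllUsers")
                   else "--no-visible-to-all-users")
    command.append("--auto-terminate" if cluster.get("AutoTerminate")
                   else "--no-auto-terminate")
    return command


# Values taken from the sample describe-cluster output and create-cluster example above.
source = {"Cluster": {
    "Name": "cc-hadoop-cluster",
    "ServiceRole": "EMR_DefaultRole",
    "VisibleToAllUsers": True,
    "LogUri": "s3n://aws-logs-123456789012-us-east-1/elasticmapreduce/",
    "AutoTerminate": False,
    "Id": "j-AAAABBBBCCCCD",
}}
command = create_cluster_command(
    source, name="cc-vpc-emr-cluster", region="us-east-1", release_label="emr-4.0.0",
    instance_groups=[
        "InstanceGroupType=MASTER,InstanceCount=1,InstanceType=c5.xlarge",
        "InstanceGroupType=CORE,InstanceCount=2,InstanceType=c5.xlarge",
    ],
    ec2_attributes={
        "InstanceProfile": "EMR_EC2_DefaultRole",
        "SubnetId": "subnet-0abcd1234abcd1234",
    },
)
print(shlex.join(command))  # review the printed command before running it
```

The script only prints the command; it does not call AWS. Review the output and run it manually, or pass the list to subprocess.run, once the values have been checked against the source cluster.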

Publication date Dec 19, 2017