- Use Customer Master Keys for EMR Log Files Encryption
Ensure that your Amazon Elastic MapReduce (EMR) clusters are configured to encrypt log files using customer-managed Customer Master Keys (CMKs), so that you have full control over the encryption and decryption of your logging data. When logging is enabled, Amazon EMR automatically uploads log files to Amazon S3, and you can associate a customer-managed CMK from AWS Key Management Service (KMS) with those log files when you launch or clone an EMR cluster. Previously, log files written to Amazon S3 could only be encrypted using Server-Side Encryption with Amazon S3-Managed Keys (SSE-S3).
This rule can help you work with the AWS Well-Architected Framework.
When you use your own AWS KMS Customer Master Keys (CMKs) to protect your EMR log data (i.e. step logs, Hadoop logs, and instance state logs) from unauthorized users, you have full control over who can use the encryption keys to access your logs. AWS Key Management Service (KMS) allows you to easily create, rotate, disable, and audit the Customer Master Keys created for your Amazon EMR log files.
Note: This conformity rule assumes that all your Amazon EMR clusters have logging enabled.
Audit
To determine the encryption status and configuration for your Amazon EMR log files, perform the following operations:
Using AWS Console
01 Sign in to AWS Management Console.
02 Navigate to the EMR console at https://console.aws.amazon.com/elasticmapreduce/.
03 In the left navigation panel, under Amazon EMR, click Clusters to view your EMR clusters.
04 Select the EMR cluster that you want to examine, then click on the View details button from the dashboard top menu.
05 Select the Summary tab and check the Key name configuration attribute value listed in the Security and access section. If the Key name attribute value is missing (i.e. --), the log files captured for the selected Amazon EMR cluster are not encrypted at rest using a customer-managed CMK.
06 Repeat steps no. 4 and 5 to determine the logging encryption status and configuration for other Amazon EMR clusters available within the current region.
07 Change the AWS region from the navigation bar and repeat the entire audit process for other regions.
Using AWS CLI
01 Run the list-clusters command (OSX/Linux/UNIX) with custom query filters to list the identifiers (IDs) of all the active Amazon EMR clusters provisioned in the selected region:
aws emr list-clusters --region us-east-1 --active --output table --query 'Clusters[*].Id'
02 The command output should return a table with the requested cluster IDs:
--------------------
|   ListClusters   |
+------------------+
|  j-ABCDABCDABCD  |
|  j-ABCD1234ABCD  |
|  j-1234ABCD1234  |
+------------------+
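If you want to script the rest of this audit, the table format shown above is awkward to parse. A minimal sketch in Python, assuming you re-run the same command with --output json instead of --output table (which prints the IDs as a JSON array); parse_cluster_ids is a hypothetical helper name, not part of any AWS tool:

```python
from __future__ import annotations

import json


def parse_cluster_ids(cli_output: str) -> list[str]:
    """Parse the JSON array printed by (assumed invocation):
    aws emr list-clusters --active --output json --query 'Clusters[*].Id'
    """
    ids = json.loads(cli_output)
    if not isinstance(ids, list):
        raise ValueError("expected a JSON array of cluster IDs")
    for cluster_id in ids:
        # EMR cluster IDs use the "j-" prefix, as in the table output
        if not isinstance(cluster_id, str) or not cluster_id.startswith("j-"):
            raise ValueError(f"unexpected cluster ID: {cluster_id!r}")
    return ids


sample = '["j-ABCDABCDABCD", "j-ABCD1234ABCD", "j-1234ABCD1234"]'
print(parse_cluster_ids(sample))
```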
03 Run the describe-cluster command (OSX/Linux/UNIX) using the ID of the Amazon EMR cluster that you want to examine as the --cluster-id parameter value, with a custom query filter that returns the Amazon Resource Name (ARN) of the Customer Master Key (CMK) used to encrypt the log files captured for the selected cluster:
aws emr describe-cluster --region us-east-1 --cluster-id j-ABCDABCDABCD --query 'Cluster.LogEncryptionKmsKeyId'
04 The command output should return the requested key ARN, or null if no CMK is configured:
null
If the describe-cluster command output returns null, as shown in the example above, the log files captured for the selected Amazon EMR cluster are not encrypted at rest using a customer-managed Customer Master Key (CMK).
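The null check in step 04 can be folded into a small classifier. This sketch assumes you run describe-cluster without the --query filter and with --output json, and pass in the full output; since this rule assumes logging is enabled, the hypothetical log_encryption_status helper reports a missing LogUri as a separate case rather than as a CMK failure:

```python
import json


def log_encryption_status(describe_cluster_output: str) -> str:
    """Classify one cluster from the full (unfiltered) JSON printed by
    aws emr describe-cluster --cluster-id <id> --output json
    """
    cluster = json.loads(describe_cluster_output)["Cluster"]
    if not cluster.get("LogUri"):
        # No S3 log location: this rule assumes logging is enabled, so flag it separately
        return "LOGGING_DISABLED"
    if cluster.get("LogEncryptionKmsKeyId"):
        return "ENCRYPTED_WITH_CMK"
    # Key attribute absent or null: logs are not protected by a customer-managed CMK
    return "NOT_ENCRYPTED_WITH_CMK"


non_compliant = json.dumps({"Cluster": {
    "Id": "j-ABCDABCDABCD",
    "LogUri": "s3n://aws-logs-123456789012-us-east-1/elasticmapreduce/",
}})
print(log_encryption_status(non_compliant))  # NOT_ENCRYPTED_WITH_CMK
```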
05 Repeat steps no. 3 and 4 to determine the logging encryption status and configuration for other Amazon EMR clusters provisioned in the selected region.
06 Change the AWS region by updating the --region command parameter value and repeat steps no. 1 – 5 to perform the entire audit process for other regions.
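The CLI audit can be driven from Python across regions. The sketch below is an outline under stated assumptions rather than a supported tool: describe_cmd, key_from_output, and run_audit are hypothetical names, run_audit shells out to the real aws binary (so it needs the AWS CLI installed and credentials configured), and --output json is added so that the null result parses as JSON:

```python
from __future__ import annotations

import json
import subprocess


def describe_cmd(region: str, cluster_id: str) -> list[str]:
    """Argument list for the describe-cluster call from step 03, with JSON output."""
    return [
        "aws", "emr", "describe-cluster",
        "--region", region,
        "--cluster-id", cluster_id,
        "--query", "Cluster.LogEncryptionKmsKeyId",
        "--output", "json",
    ]


def key_from_output(stdout: str) -> str | None:
    """The CLI prints null when no CMK is set, otherwise a JSON string."""
    return json.loads(stdout)


def run_audit(plan: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return {region: [cluster IDs lacking a log-encryption CMK]}.

    `plan` maps each region to the cluster IDs found with list-clusters.
    This calls the real AWS CLI, so it is not executed in this sketch.
    """
    findings: dict[str, list[str]] = {}
    for region, cluster_ids in plan.items():
        for cluster_id in cluster_ids:
            result = subprocess.run(
                describe_cmd(region, cluster_id),
                capture_output=True, text=True, check=True,
            )
            if key_from_output(result.stdout) is None:
                findings.setdefault(region, []).append(cluster_id)
    return findings
```

For example, run_audit({"us-east-1": ["j-ABCDABCDABCD", "j-ABCD1234ABCD"]}) would return, keyed by region, the subset of those IDs whose log files are not encrypted with a customer-managed CMK.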
Remediation / Resolution
To encrypt the logging data captured for your Amazon EMR clusters using your own KMS Customer Master Keys (CMKs), perform the following operations:
Using AWS CloudFormation
01 CloudFormation template (JSON):
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Description": "Use Customer Managed Keys for EMR Log Files Encryption",
  "Parameters": {
    "ReleaseLabel": { "Type": "String" },
    "ClusterInstanceType": { "Type": "String" },
    "EbsRootVolumeSize": { "Type": "String" },
    "SubnetId": { "Type": "String" }
  },
  "Resources": {
    "EMRRole": {
      "Type": "AWS::IAM::Role",
      "Properties": {
        "AssumeRolePolicyDocument": {
          "Version": "2008-10-17",
          "Statement": [
            {
              "Sid": "",
              "Effect": "Allow",
              "Principal": { "Service": "elasticmapreduce.amazonaws.com" },
              "Action": "sts:AssumeRole"
            }
          ]
        },
        "Path": "/",
        "ManagedPolicyArns": [
          "arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceRole"
        ]
      }
    },
    "EMREC2Role": {
      "Type": "AWS::IAM::Role",
      "Properties": {
        "AssumeRolePolicyDocument": {
          "Version": "2008-10-17",
          "Statement": [
            {
              "Sid": "",
              "Effect": "Allow",
              "Principal": { "Service": "ec2.amazonaws.com" },
              "Action": "sts:AssumeRole"
            }
          ]
        },
        "Path": "/",
        "ManagedPolicyArns": [
          "arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceforEC2Role"
        ]
      }
    },
    "EMREC2InstanceProfile": {
      "Type": "AWS::IAM::InstanceProfile",
      "Properties": {
        "Path": "/",
        "Roles": [ { "Ref": "EMREC2Role" } ]
      }
    },
    "EMRCluster": {
      "Type": "AWS::EMR::Cluster",
      "Properties": {
        "Name": "cc-emr-production-cluster",
        "ReleaseLabel": { "Ref": "ReleaseLabel" },
        "Instances": {
          "MasterInstanceGroup": {
            "InstanceCount": 1,
            "InstanceType": { "Ref": "ClusterInstanceType" },
            "Market": "ON_DEMAND",
            "Name": "cc-master-instance"
          },
          "CoreInstanceGroup": {
            "InstanceCount": 1,
            "InstanceType": { "Ref": "ClusterInstanceType" },
            "Market": "ON_DEMAND",
            "Name": "cc-core-instance"
          },
          "TaskInstanceGroups": [
            {
              "InstanceCount": 1,
              "InstanceType": { "Ref": "ClusterInstanceType" },
              "Market": "ON_DEMAND",
              "Name": "cc-task-instance-1"
            },
            {
              "InstanceCount": 1,
              "InstanceType": { "Ref": "ClusterInstanceType" },
              "Market": "ON_DEMAND",
              "Name": "cc-task-instance-2"
            }
          ],
          "Ec2SubnetId": { "Ref": "SubnetId" }
        },
        "EbsRootVolumeSize": { "Ref": "EbsRootVolumeSize" },
        "ServiceRole": { "Ref": "EMRRole" },
        "JobFlowRole": { "Ref": "EMREC2InstanceProfile" },
        "VisibleToAllUsers": true,
        "LogUri": "s3n://aws-logs-123456789012-us-east-1/elasticmapreduce/",
        "LogEncryptionKmsKeyId": "arn:aws:kms:us-east-1:123456789012:key/abcdabcd-1234-abcd-1234-abcd1234abcd"
      }
    }
  }
}
02 CloudFormation template (YAML):
AWSTemplateFormatVersion: '2010-09-09'
Description: Use Customer Managed Keys for EMR Log Files Encryption
Parameters:
  ReleaseLabel:
    Type: String
  ClusterInstanceType:
    Type: String
  EbsRootVolumeSize:
    Type: String
  SubnetId:
    Type: String
Resources:
  EMRRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2008-10-17'
        Statement:
          - Sid: ''
            Effect: Allow
            Principal:
              Service: elasticmapreduce.amazonaws.com
            Action: sts:AssumeRole
      Path: /
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceRole
  EMREC2Role:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2008-10-17'
        Statement:
          - Sid: ''
            Effect: Allow
            Principal:
              Service: ec2.amazonaws.com
            Action: sts:AssumeRole
      Path: /
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceforEC2Role
  EMREC2InstanceProfile:
    Type: AWS::IAM::InstanceProfile
    Properties:
      Path: /
      Roles:
        - !Ref 'EMREC2Role'
  EMRCluster:
    Type: AWS::EMR::Cluster
    Properties:
      Name: cc-emr-production-cluster
      ReleaseLabel: !Ref 'ReleaseLabel'
      Instances:
        MasterInstanceGroup:
          InstanceCount: 1
          InstanceType: !Ref 'ClusterInstanceType'
          Market: ON_DEMAND
          Name: cc-master-instance
        CoreInstanceGroup:
          InstanceCount: 1
          InstanceType: !Ref 'ClusterInstanceType'
          Market: ON_DEMAND
          Name: cc-core-instance
        TaskInstanceGroups:
          - InstanceCount: 1
            InstanceType: !Ref 'ClusterInstanceType'
            Market: ON_DEMAND
            Name: cc-task-instance-1
          - InstanceCount: 1
            InstanceType: !Ref 'ClusterInstanceType'
            Market: ON_DEMAND
            Name: cc-task-instance-2
        Ec2SubnetId: !Ref 'SubnetId'
      EbsRootVolumeSize: !Ref 'EbsRootVolumeSize'
      ServiceRole: !Ref 'EMRRole'
      JobFlowRole: !Ref 'EMREC2InstanceProfile'
      VisibleToAllUsers: true
      LogUri: s3n://aws-logs-123456789012-us-east-1/elasticmapreduce/
      LogEncryptionKmsKeyId: arn:aws:kms:us-east-1:123456789012:key/abcdabcd-1234-abcd-1234-abcd1234abcd
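Before deploying either template, you can check that every EMR cluster resource in it sets both LogUri and LogEncryptionKmsKeyId. A minimal sketch for the JSON form of the template (parsing the YAML form would need a YAML library that understands tags such as !Ref, which is left out here); emr_clusters_missing_log_cmk is a hypothetical helper name:

```python
from __future__ import annotations

import json


def emr_clusters_missing_log_cmk(template_json: str) -> list[str]:
    """Logical IDs of AWS::EMR::Cluster resources without LogUri or LogEncryptionKmsKeyId."""
    template = json.loads(template_json)
    missing = []
    for logical_id, resource in template.get("Resources", {}).items():
        if resource.get("Type") != "AWS::EMR::Cluster":
            continue
        props = resource.get("Properties", {})
        # Intrinsic functions such as {"Ref": "..."} count as "set"; their value
        # is only known at deploy time.
        if not props.get("LogUri") or not props.get("LogEncryptionKmsKeyId"):
            missing.append(logical_id)
    return missing


template = json.dumps({
    "Resources": {
        "EMRCluster": {"Type": "AWS::EMR::Cluster", "Properties": {
            "LogUri": "s3n://aws-logs-123456789012-us-east-1/elasticmapreduce/"}},
        "EMRRole": {"Type": "AWS::IAM::Role", "Properties": {}},
    }
})
print(emr_clusters_missing_log_cmk(template))  # ['EMRCluster']
```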
Using Terraform (AWS Provider)
01 Terraform configuration file (.tf):
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.0"
    }
  }

  required_version = ">= 0.14.9"
}

provider "aws" {
  profile = "default"
  region  = "us-east-1"
}

resource "aws_iam_role" "iam_emr_service_role" {
  name               = "cc-emr-service-role"
  assume_role_policy = <<EOF
{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "elasticmapreduce.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

resource "aws_iam_role_policy" "iam_emr_service_policy" {
  name   = "cc-emr-service-role-policy"
  role   = aws_iam_role.iam_emr_service_role.id
  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Resource": "*",
    "Action": [
      "ec2:AuthorizeSecurityGroupEgress",
      "ec2:AuthorizeSecurityGroupIngress",
      "ec2:CancelSpotInstanceRequests",
      "ec2:CreateNetworkInterface",
      "ec2:CreateSecurityGroup",
      "ec2:CreateTags",
      "ec2:DeleteNetworkInterface",
      "ec2:DeleteSecurityGroup",
      "ec2:DeleteTags",
      "ec2:DescribeAvailabilityZones",
      "ec2:DescribeAccountAttributes",
      "ec2:DescribeDhcpOptions",
      "ec2:DescribeInstanceStatus",
      "ec2:DescribeInstances",
      "ec2:DescribeKeyPairs",
      "ec2:DescribeNetworkAcls",
      "ec2:DescribeNetworkInterfaces",
      "ec2:DescribePrefixLists",
      "ec2:DescribeRouteTables",
      "ec2:DescribeSecurityGroups",
      "ec2:DescribeSpotInstanceRequests",
      "ec2:DescribeSpotPriceHistory",
      "ec2:DescribeSubnets",
      "ec2:DescribeVpcAttribute",
      "ec2:DescribeVpcEndpoints",
      "ec2:DescribeVpcEndpointServices",
      "ec2:DescribeVpcs",
      "ec2:DetachNetworkInterface",
      "ec2:ModifyImageAttribute",
      "ec2:ModifyInstanceAttribute",
      "ec2:RequestSpotInstances",
      "ec2:RevokeSecurityGroupEgress",
      "ec2:RunInstances",
      "ec2:TerminateInstances",
      "ec2:DeleteVolume",
      "ec2:DescribeVolumeStatus",
      "ec2:DescribeVolumes",
      "ec2:DetachVolume",
      "iam:GetRole",
      "iam:GetRolePolicy",
      "iam:ListInstanceProfiles",
      "iam:ListRolePolicies",
      "iam:PassRole",
      "s3:CreateBucket",
      "s3:Get*",
      "s3:List*",
      "sdb:BatchPutAttributes",
      "sdb:Select",
      "sqs:CreateQueue",
      "sqs:Delete*",
      "sqs:GetQueue*",
      "sqs:PurgeQueue",
      "sqs:ReceiveMessage"
    ]
  }]
}
EOF
}

resource "aws_iam_role" "iam_emr_profile_role" {
  name               = "emr-instance-profile-role"
  assume_role_policy = <<EOF
{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

resource "aws_iam_instance_profile" "emr_instance_profile" {
  name = "emr-instance-profile"
  role = aws_iam_role.iam_emr_profile_role.name
}

resource "aws_iam_role_policy" "iam_emr_profile_policy" {
  name   = "emr-instance-profile-policy"
  role   = aws_iam_role.iam_emr_profile_role.id
  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Resource": "*",
    "Action": [
      "cloudwatch:*",
      "dynamodb:*",
      "ec2:Describe*",
      "elasticmapreduce:Describe*",
      "elasticmapreduce:ListBootstrapActions",
      "elasticmapreduce:ListClusters",
      "elasticmapreduce:ListInstanceGroups",
      "elasticmapreduce:ListInstances",
      "elasticmapreduce:ListSteps",
      "kinesis:CreateStream",
      "kinesis:DeleteStream",
      "kinesis:DescribeStream",
      "kinesis:GetRecords",
      "kinesis:GetShardIterator",
      "kinesis:MergeShards",
      "kinesis:PutRecord",
      "kinesis:SplitShard",
      "rds:Describe*",
      "s3:*",
      "sdb:*",
      "sns:*",
      "sqs:*"
    ]
  }]
}
EOF
}

resource "aws_emr_cluster" "emr-cluster" {
  name          = "cc-prod-emr-cluster"
  release_label = "emr-5.35.0"
  applications  = ["Spark"]

  master_instance_group {
    instance_type = "m5.xlarge"
  }

  core_instance_group {
    instance_type  = "m5.xlarge"
    instance_count = 1

    ebs_config {
      size                 = "50"
      type                 = "gp2"
      volumes_per_instance = 1
    }
  }

  ec2_attributes {
    subnet_id                         = "subnet-01234123412341234"
    emr_managed_master_security_group = "sg-01234abcd1234abcd"
    emr_managed_slave_security_group  = "sg-0abcd1234abcd1234"
    instance_profile                  = aws_iam_instance_profile.emr_instance_profile.arn
  }

  ebs_root_volume_size = 50
  service_role         = aws_iam_role.iam_emr_service_role.arn
  log_uri              = "s3n://aws-logs-123456789012-us-east-1/elasticmapreduce/"

  # Use Customer Managed Keys for EMR Log Files Encryption
  log_encryption_kms_key_id = "arn:aws:kms:us-east-1:123456789012:key/abcdabcd-1234-abcd-1234-abcd1234abcd"
}
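For Terraform, a similar pre-deployment check can run against the machine-readable plan printed by terraform show -json <planfile>. This sketch assumes that output format, walking planned_values.root_module and any child_modules for aws_emr_cluster resources; emr_resources_missing_log_cmk is a hypothetical function name:

```python
from __future__ import annotations

import json
from typing import Iterator


def _walk(module: dict) -> Iterator[dict]:
    """Yield resources from a module and, recursively, from its child modules."""
    yield from module.get("resources", [])
    for child in module.get("child_modules", []):
        yield from _walk(child)


def emr_resources_missing_log_cmk(plan_json: str) -> list[str]:
    """Addresses of planned aws_emr_cluster resources without a log URI or log CMK."""
    plan = json.loads(plan_json)
    root = plan.get("planned_values", {}).get("root_module", {})
    missing = []
    for resource in _walk(root):
        if resource.get("type") != "aws_emr_cluster":
            continue
        values = resource.get("values", {})
        # A key ARN computed from another resource may be unknown at plan time
        # and show up as absent here; treat such hits as "needs review".
        if not values.get("log_uri") or not values.get("log_encryption_kms_key_id"):
            missing.append(resource["address"])
    return missing


sample_plan = json.dumps({"planned_values": {"root_module": {"resources": [
    {"address": "aws_emr_cluster.emr-cluster", "type": "aws_emr_cluster",
     "values": {"log_uri": "s3n://aws-logs-123456789012-us-east-1/elasticmapreduce/",
                "log_encryption_kms_key_id": None}},
]}}})
print(emr_resources_missing_log_cmk(sample_plan))  # ['aws_emr_cluster.emr-cluster']
```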
Using AWS Console
01 Sign in to AWS Management Console.
02 Navigate to the KMS console at https://console.aws.amazon.com/kms/.
03 In the left navigation panel, click Customer managed keys.
04 Select the appropriate AWS region from the navigation bar (must match the region where your Amazon EMR cluster is running).
05 Click Create Key button from the dashboard top menu to initiate the setup process.
06 For Step 1 Configure key, choose Symmetric from the Key type section, and select KMS for the Key material origin, available under Advanced options. Click Next to continue.
07 For Step 2 Add labels, provide a unique name (alias) and a short description for your new KMS CMK, then use the Add tag button to create any required tag sets (optional). Click Next to continue the setup process.
08 For Step 3 Define key administrative permissions, choose which IAM users and/or roles can administer your new CMK through the KMS API. You may need to add additional permissions for the users or roles to administer the key from the AWS console. Click Next to continue.
09 For Step 4 Define key usage permissions, within This account section, select which IAM users and/or roles can use the new Customer Master Key (CMK) for cryptographic operations. (Optional) In the Other AWS accounts section, click Add another AWS account and enter an external account ID in order to specify another AWS account that can use this CMK to encrypt and decrypt your EMR log files. The owners of the external AWS accounts must also provide access to this CMK by creating appropriate policies for their IAM users. Click Next to continue the process.
10 For Step 5 Review and edit key policy, review the key policy, then click Finish to create your new KMS Customer Master Key (CMK). Once the key is successfully created, the KMS console will display the following confirmation message: "Success. Your customer master key was created with alias <key-alias> and key ID <key-id>".
11 Navigate to the EMR console at https://console.aws.amazon.com/elasticmapreduce/.
12 In the navigation panel, under Amazon EMR, click Clusters to access your Amazon EMR clusters.
13 Select the EMR cluster that you want to relaunch (see Audit section part I to identify the right resource), then click on the Clone button from the dashboard top menu.
14 Inside the Cloning <emr-cluster-id> dialog box, choose Yes to include the steps from the source cluster in the cloned (destination) cluster. Click Clone to start the cloning process.
15 On the Create Cluster - Advanced Options page, select Step 3: General Cluster Settings from the left navigation panel, select the Log encryption checkbox, and choose the name (alias) of the newly created CMK from the Choose an AWS KMS key dropdown list. Click Next to continue the setup process, then select Create cluster to launch your new Amazon EMR cluster.
16 Once your new Amazon EMR cluster has been successfully tested, you can terminate the source cluster to stop incurring charges for it. To terminate the source Amazon EMR cluster, perform the following actions:
17 Go back to the navigation panel, and under Amazon EMR, choose Clusters.
18 Select the source Amazon EMR cluster that you want to shut down.
19 Click on the Terminate button from the dashboard top menu to initiate the removal process.
20 In the Terminate cluster confirmation box, review the source cluster details, then click Terminate.
21 Repeat steps no. 13 – 20 to enable logging data encryption for other Amazon EMR clusters available in the selected region, using your own KMS Customer Master Key (CMK).
22 Change the AWS region from the navigation bar to repeat the entire process for the other regions.
Using AWS CLI
01 Define the key policy that allows the selected IAM users and/or roles to manage the new Customer Master Key (CMK) and to encrypt/decrypt your EMR cluster log files using the AWS KMS API. Create a new policy document (JSON format), name the file emr-log-cmk-policy.json, and paste the following content (replace the example details, i.e. the AWS account ID and the ARNs of the IAM users and/or roles, with your own environment details):
{
  "Id": "emr-log-cmk-policy",
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Enable IAM User Permissions",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::123456789012:root" },
      "Action": "kms:*",
      "Resource": "*"
    },
    {
      "Sid": "Allow access for Key Administrators",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::123456789012:role/AmazonEMRManager" },
      "Action": [
        "kms:Create*",
        "kms:Describe*",
        "kms:Enable*",
        "kms:List*",
        "kms:Put*",
        "kms:Update*",
        "kms:Revoke*",
        "kms:Disable*",
        "kms:Get*",
        "kms:Delete*",
        "kms:TagResource",
        "kms:UntagResource",
        "kms:ScheduleKeyDeletion",
        "kms:CancelKeyDeletion"
      ],
      "Resource": "*"
    },
    {
      "Sid": "Allow use of the key",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::123456789012:role/AmazonEMRAdmin" },
      "Action": [
        "kms:Encrypt",
        "kms:Decrypt",
        "kms:ReEncrypt*",
        "kms:GenerateDataKey*",
        "kms:DescribeKey"
      ],
      "Resource": "*"
    },
    {
      "Sid": "Allow attachment of persistent resources",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::123456789012:role/AmazonEMRAdmin" },
      "Action": [
        "kms:CreateGrant",
        "kms:ListGrants",
        "kms:RevokeGrant"
      ],
      "Resource": "*",
      "Condition": { "Bool": { "kms:GrantIsForAWSResource": "true" } }
    }
  ]
}
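If you maintain this key policy for several accounts or roles, generating it can be less error-prone than editing the JSON by hand. A sketch, assuming Python 3 and that the two role ARNs stand in for your own key administrator and key user principals; build_emr_log_key_policy is a hypothetical helper that reproduces the four statements of the policy document:

```python
import json
import re


def build_emr_log_key_policy(account_id: str, admin_arn: str, user_arn: str) -> dict:
    """Return the emr-log-cmk-policy document for the given principals."""
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("AWS account IDs are 12 digits")
    return {
        "Id": "emr-log-cmk-policy",
        "Version": "2012-10-17",
        "Statement": [
            {   # account root: lets IAM policies in this account grant key access
                "Sid": "Enable IAM User Permissions", "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": "kms:*", "Resource": "*",
            },
            {   # key administrators: management actions, no cryptographic use
                "Sid": "Allow access for Key Administrators", "Effect": "Allow",
                "Principal": {"AWS": admin_arn},
                "Action": [
                    "kms:Create*", "kms:Describe*", "kms:Enable*", "kms:List*",
                    "kms:Put*", "kms:Update*", "kms:Revoke*", "kms:Disable*",
                    "kms:Get*", "kms:Delete*", "kms:TagResource", "kms:UntagResource",
                    "kms:ScheduleKeyDeletion", "kms:CancelKeyDeletion",
                ],
                "Resource": "*",
            },
            {   # key users: cryptographic operations on the log data
                "Sid": "Allow use of the key", "Effect": "Allow",
                "Principal": {"AWS": user_arn},
                "Action": ["kms:Encrypt", "kms:Decrypt", "kms:ReEncrypt*",
                           "kms:GenerateDataKey*", "kms:DescribeKey"],
                "Resource": "*",
            },
            {   # grants, limited to AWS services acting for the key user
                "Sid": "Allow attachment of persistent resources", "Effect": "Allow",
                "Principal": {"AWS": user_arn},
                "Action": ["kms:CreateGrant", "kms:ListGrants", "kms:RevokeGrant"],
                "Resource": "*",
                "Condition": {"Bool": {"kms:GrantIsForAWSResource": "true"}},
            },
        ],
    }


policy = build_emr_log_key_policy(
    "123456789012",
    "arn:aws:iam::123456789012:role/AmazonEMRManager",
    "arn:aws:iam::123456789012:role/AmazonEMRAdmin",
)
print(json.dumps(policy, indent=2))
```

Redirecting the printed output to emr-log-cmk-policy.json gives a file you can pass to create-key with --policy file://emr-log-cmk-policy.json.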
02 Run the create-key command (OSX/Linux/UNIX) using the policy document created at the previous step (i.e. emr-log-cmk-policy.json) as the value for the --policy parameter to create your new AWS KMS Customer Master Key (CMK):
aws kms create-key --region us-east-1 --description 'AWS KMS CMK for encrypting EMR cluster log files' --policy file://emr-log-cmk-policy.json --query 'KeyMetadata.Arn'
03 The command output should return the ARN of the new KMS Customer Master Key:
"arn:aws:kms:us-east-1:123456789012:key/abcdabcd-1234-abcd-1234-abcd1234abcd"
04 Run the create-alias command (OSX/Linux/UNIX) using the key ARN returned at the previous step to attach an alias to the new CMK. The alias name must start with the prefix "alias/" (the command does not produce an output):
aws kms create-alias --region us-east-1 --alias-name alias/EMRLogFileCMK --target-key-id arn:aws:kms:us-east-1:123456789012:key/abcdabcd-1234-abcd-1234-abcd1234abcd
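The create-key output and the alias name can be sanity-checked before you run create-alias and, later, create-cluster. This sketch encodes the alias naming rules as we understand them from the KMS documentation (the alias/ prefix; only letters, digits, /, _ and -; at most 256 characters; the alias/aws/ prefix reserved for AWS managed keys), plus the requirement from the console steps that the key live in the same region as the cluster. It only recognizes single-Region key ARNs, and check_alias_inputs is a hypothetical name:

```python
import re

# Single-Region key ARN with a UUID key ID; multi-Region "mrk-" keys are not covered.
_KEY_ARN = re.compile(
    r"arn:(?P<partition>aws|aws-cn|aws-us-gov):kms:(?P<region>[a-z0-9-]+):"
    r"(?P<account>\d{12}):key/"
    r"(?P<key_id>[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})"
)
# Alias character set as we understand the KMS CreateAlias rules.
_ALIAS = re.compile(r"alias/[A-Za-z0-9/_-]+")


def check_alias_inputs(alias_name: str, key_arn: str, cluster_region: str) -> str:
    """Validate create-alias inputs; return the bare key ID on success."""
    if len(alias_name) > 256 or not _ALIAS.fullmatch(alias_name):
        raise ValueError(f"invalid alias name: {alias_name!r}")
    if alias_name.startswith("alias/aws/"):
        raise ValueError("the alias/aws/ prefix is reserved for AWS managed keys")
    # create-key prints the ARN as a quoted JSON string; tolerate the quotes
    match = _KEY_ARN.fullmatch(key_arn.strip().strip('"'))
    if match is None:
        raise ValueError(f"not a KMS key ARN: {key_arn!r}")
    if match.group("region") != cluster_region:
        raise ValueError("the key must be in the same region as the EMR cluster")
    return match.group("key_id")


arn = "arn:aws:kms:us-east-1:123456789012:key/abcdabcd-1234-abcd-1234-abcd1234abcd"
print(check_alias_inputs("alias/EMRLogFileCMK", arn, "us-east-1"))
# abcdabcd-1234-abcd-1234-abcd1234abcd
```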
05 Run the describe-cluster command (OSX/Linux/UNIX) using the ID of the Amazon EMR cluster that you want to re-create (see Audit section part II to identify the right resource) to get all the configuration details of the source EMR cluster:
aws emr describe-cluster --region us-east-1 --cluster-id j-ABCDABCDABCD
06 The command output should return the configuration information available for the selected AWS EMR cluster:
{ "Cluster": { "Name": "cc-emr-prod-cluster", "ServiceRole": "EMR_DefaultRole", "TerminationProtected": false, "ReleaseLabel": "emr-5.30.1", "LogUri": "s3n://aws-logs-123456789012-us-east-1/elasticmapreduce/", ... "AutoTerminate": false, "Configurations": [], "NormalizedInstanceHours": 5, "KerberosAttributes": {}, "VisibleToAllUsers": true, "Id": "j-ABCDABCDABCD" } }
07 Run the create-cluster command (OSX/Linux/UNIX) to re-create the selected Amazon EMR cluster with log encryption enabled. Use the configuration details returned at the previous step as values for the required parameters, and set the ARN of the newly created KMS Customer Master Key (CMK) as the value for the --log-encryption-kms-key-id parameter:
aws emr create-cluster --region us-east-1 --name cc-new-emr-cluster --release-label emr-5.30.1 --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=c5.xlarge InstanceGroupType=CORE,InstanceCount=2,InstanceType=c5.xlarge --service-role EMR_DefaultRole --ec2-attributes KeyName=SSHAccessKey,InstanceProfile=EMR_EC2_DefaultRole,EmrManagedMasterSecurityGroup=sg-abcdabcd,EmrManagedSlaveSecurityGroup=sg-aaaabbbb,AvailabilityZone=us-east-1a,SubnetId=subnet-abcd1234 --visible-to-all-users --security-configuration cc-emr-cluster-security-config --log-uri "s3n://aws-logs-123456789012-us-east-1/elasticmapreduce/" --log-encryption-kms-key-id "arn:aws:kms:us-east-1:123456789012:key/abcdabcd-1234-abcd-1234-abcd1234abcd"
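Translating the describe-cluster output into create-cluster arguments can also be scripted. The sketch below is illustrative: it maps only the fields shown in the describe-cluster output (release label, service role, log URI, visibility), expects instance groups and EC2 attributes to be passed through as extra arguments, and rejects clusters without a LogUri. The release check encodes an assumption, based on our reading of the EMR API reference, that LogEncryptionKmsKeyId is available from emr-5.30.0 onward except emr-6.0.0; confirm it against current documentation. Both function names are hypothetical:

```python
from __future__ import annotations

import re
import shlex


def release_supports_log_cmk(release_label: str) -> bool:
    """Assumed rule (verify): emr-5.30.0 and later, excluding emr-6.0.0."""
    match = re.fullmatch(r"emr-(\d+)\.(\d+)\.(\d+)", release_label)
    if match is None:
        raise ValueError(f"unrecognized release label: {release_label!r}")
    version = tuple(int(part) for part in match.groups())
    return version >= (5, 30, 0) and version != (6, 0, 0)


def build_create_cluster_cmd(cluster: dict, region: str, new_name: str,
                             kms_key_arn: str, extra_args: list[str]) -> list[str]:
    """Re-create `cluster` (the "Cluster" object from describe-cluster) with a log CMK."""
    log_uri = cluster.get("LogUri")
    if not log_uri:
        raise ValueError("log encryption needs an S3 log location (LogUri)")
    release = cluster["ReleaseLabel"]
    if not release_supports_log_cmk(release):
        raise ValueError(f"{release} is assumed not to support log encryption CMKs")
    cmd = [
        "aws", "emr", "create-cluster",
        "--region", region,
        "--name", new_name,
        "--release-label", release,
        "--service-role", cluster["ServiceRole"],
        "--log-uri", log_uri,
        "--log-encryption-kms-key-id", kms_key_arn,
    ]
    if cluster.get("VisibleToAllUsers"):
        cmd.append("--visible-to-all-users")
    # Instance groups, EC2 attributes, security configuration, etc. are not among
    # the fields used here, so the caller passes them through unchanged.
    cmd.extend(extra_args)
    return cmd


source = {
    "Name": "cc-emr-prod-cluster", "ServiceRole": "EMR_DefaultRole",
    "ReleaseLabel": "emr-5.30.1", "VisibleToAllUsers": True,
    "LogUri": "s3n://aws-logs-123456789012-us-east-1/elasticmapreduce/",
}
cmd = build_create_cluster_cmd(
    source, "us-east-1", "cc-new-emr-cluster",
    "arn:aws:kms:us-east-1:123456789012:key/abcdabcd-1234-abcd-1234-abcd1234abcd",
    ["--instance-groups",
     "InstanceGroupType=MASTER,InstanceCount=1,InstanceType=c5.xlarge",
     "InstanceGroupType=CORE,InstanceCount=2,InstanceType=c5.xlarge"],
)
print(shlex.join(cmd))
```

The resulting argument list can be passed to subprocess.run, or printed with shlex.join as above for review before you launch anything.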
08 The command output should return the ID of the new Amazon EMR cluster:
{ "ClusterId": "j-AABBCCDDAABBC" }
09 Once your new Amazon EMR cluster has been successfully tested, you can terminate the source cluster to stop incurring charges for it. To shut down the source Amazon EMR cluster, run the terminate-clusters command (OSX/Linux/UNIX) using the ID of the cluster that you want to shut down as the --cluster-ids parameter value (the command does not produce an output):
aws emr terminate-clusters --region us-east-1 --cluster-ids j-ABCDABCDABCD
10 Repeat steps no. 5 – 9 to enable logging data encryption for other AWS EMR clusters provisioned in the selected region, using your own KMS Customer Master Key (CMK).
11 Change the AWS region by updating the --region command parameter value and repeat steps no. 1 – 10 to perform the entire process for other regions.
References
- AWS Documentation
- Amazon EMR FAQs
- Configure cluster logging and debugging
- AWS Command Line Interface (CLI) Documentation
- kms
- create-key
- create-alias
- emr
- list-clusters
- describe-cluster
- create-cluster
- terminate-clusters