Build a Secure Well-Architected Cloud Environment
Explore the Security pillar of the AWS and Azure Well-Architected Framework and be guided through the fundamental security controls that should be addressed when designing, transitioning to, and operating in a cloud environment.
Related articles in the Well-Architected series:
- Overview of All 5 Pillars
- Operational Excellence Pillar
- Reliability Pillar
- Performance Efficiency Pillar
- Cost Optimization Pillar
In today’s operating environment, security is critical and businesses must be protected from accidental and malicious threats. These threats can come from any direction and at any moment in time. Like other cloud providers, Amazon Web Services (AWS) and Microsoft® Azure® have a shared security model. Each model indicates which responsibilities lie with the provider and which lie with the customer, so appropriate steps can be taken to ensure security.
To kick things off, let’s review a few design principles that will help you to build well-architected environments. While this is not a complete list, it is a great place to start!
Well-Architected Design Principles
- Implement a strong identity foundation: Ensure that core security principles are followed, such as the principle of least privilege and the principle of separation of duties. Controlling access between applications, users, devices, and resources is difficult, so it is crucial to have solid policies and processes that define centralised identity management and authentication methods that do not rely on static credentials such as passwords. You must also be diligent in watching over configurations (or have automation in place to do this for you) to ensure, for example, that Amazon Simple Storage Service (Amazon S3) bucket ACLs do not grant ‘FULL_CONTROL’ to unintended users.
- Enable traceability: When it comes to security incidents, the most important thing is knowing that an incident has occurred. Log collection and analysis, as well as metric tracking, are essential here. For example, enabling user activity logging on something like Amazon Redshift is a step in the right direction.
- Apply security at all layers: Defence in depth has been a security staple forever, as it helps to slow down attacks by detecting and preventing them earlier. To do this properly, security needs to be applied to virtual machines, operating systems, applications, virtual private clouds (VPCs), and the list goes on.
- Automate security best practices: It is human nature to make mistakes, but when it comes to security, every point that requires human interaction increases the chance of a security breach. When possible, it is always best to automate security controls to lessen the odds of an error occurring and enable more rapid expansion while controlling costs.
- Protect data in transit and at rest: Someone is always listening, watching, and waiting for data to be left in the clear. Why make it easy for hackers? Always protect data in transit and at rest, with encryption.
- Keep people away from customer data: The more access granted, the more likely an account will be compromised. With so many data regulations, direct access to customer data needs to be tightly controlled.
- Prepare for security events: It will happen. There will be a hack. There will be a compromise. Have teams and procedures in place to respond to those incidents. From detection to recovery, the more automated the tools are, the more effective the teams that rely on them will be. A good starting point is to ensure you have the right subscribers to Amazon Simple Notification Service (SNS) messages, so that the right people get the messages and the wrong people do not.
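The S3 example in the first principle can be automated with a small check. The sketch below is illustrative only: it assumes the input has the shape of an S3 GetBucketAcl response (a `Grants` list of `Grantee`/`Permission` pairs), and the function name and sample data are hypothetical.

```python
from __future__ import annotations

# Group URIs that S3 uses for "everyone" and "any authenticated AWS user".
PUBLIC_GROUP_URIS = {
    "http://acs.amazonaws.com/groups/global/AllUsers",
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers",
}


def public_full_control_grants(acl: dict) -> list[dict]:
    """Return the grants that give FULL_CONTROL to a public group.

    `acl` is assumed to look like an S3 GetBucketAcl response:
    {"Grants": [{"Grantee": {...}, "Permission": "..."}]}.
    """
    risky = []
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        if (
            grant.get("Permission") == "FULL_CONTROL"
            and grantee.get("Type") == "Group"
            and grantee.get("URI") in PUBLIC_GROUP_URIS
        ):
            risky.append(grant)
    return risky


# Hypothetical sample data, for illustration only.
sample_acl = {
    "Grants": [
        {
            "Grantee": {"Type": "CanonicalUser", "ID": "owner-id-placeholder"},
            "Permission": "FULL_CONTROL",
        },
        {
            "Grantee": {
                "Type": "Group",
                "URI": "http://acs.amazonaws.com/groups/global/AllUsers",
            },
            "Permission": "FULL_CONTROL",
        },
    ]
}

for grant in public_full_control_grants(sample_acl):
    print("Public FULL_CONTROL grant:", grant["Grantee"]["URI"])
```

In practice the dictionary would come from your own tooling or an AWS SDK call; the check itself stays the same.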
AWS defined the above security principles across five areas:
- Identity and access management
- Detection
- Infrastructure protection
- Data protection
- Incident response
Operational excellence applied to your cloud workload
Security must always be tailored to fit each individual business. To choose business-appropriate security controls, threat modelling and a risk assessment must be done. Once you have the results of those processes, you may need to revise your security decisions, especially within the cloud. Automated tools should be used to continuously scan your machine images, applications, APIs, or any other part of your infrastructure as code (IaC). You should be going through the exercise of threat modelling and assessing risk on a regular basis to ensure that you are up to date with the current threat landscape.
Cloud account management for architects
Unfortunate lessons have been learnt by companies like Code Spaces, who saw their business devastated by hackers who were able to compromise the company’s corporate AWS account. To reduce the risk of this happening to your business, it is recommended that access to the root account within AWS or Azure be protected with multi-factor authentication (MFA). Separate accounts should also be established for production, development, and testing, because if one account becomes compromised, the others can remain secure.
AWS Organizations should be used to centrally manage all of the AWS accounts within a corporation. When using AWS Organizations, it is critical to ensure all settings are appropriately chosen. To control all accounts appropriately, all features should be enabled.
Want to never have to manually check for adherence to AWS Organizations best practices again? Have your AWS and Azure cloud infrastructure scanned for adherence to 750+ cloud guidelines by signing up for our free trial.
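The two account-level recommendations above (MFA on the root account, all features enabled in AWS Organizations) can be checked mechanically. This is a minimal sketch, assuming the inputs mirror the `SummaryMap` of an IAM GetAccountSummary response and the `Organization` object of an Organizations DescribeOrganization response. The function name and messages are hypothetical.

```python
from __future__ import annotations


def account_findings(summary_map: dict, organization: dict) -> list[str]:
    """Return human-readable findings for the two account-level checks.

    Assumptions: `summary_map["AccountMFAEnabled"]` is 1 when the root
    account has MFA, and `organization["FeatureSet"]` is "ALL" when all
    features are enabled (otherwise "CONSOLIDATED_BILLING").
    """
    findings = []
    if summary_map.get("AccountMFAEnabled", 0) != 1:
        findings.append("root account has no MFA device")
    if organization.get("FeatureSet") != "ALL":
        findings.append("AWS Organizations does not have all features enabled")
    return findings


# Hypothetical inputs, for illustration only.
print(account_findings({"AccountMFAEnabled": 0}, {"FeatureSet": "CONSOLIDATED_BILLING"}))
print(account_findings({"AccountMFAEnabled": 1}, {"FeatureSet": "ALL"}))
```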
Identity and access management (IAM)
IAM is crucial to protecting your business, as it allows organisations to control who can and cannot access data, accounts, etc. If someone or something cannot access our data, then they should not be able to alter or steal it. However, this is not the only tool we need.
When controlling access, we need to control both humans and machines. When it comes to access control for humans, we are talking about users, administrators, developers, and customers. Machines are usually less obvious; AWS places all virtual machines, APIs, applications, servers, routers, and switches into this category.
There are a few critical things that we should be doing with our identities.
- Centralise control: When it is decentralised, we often have overlapping permissions, gaps in our control, inconsistent permissions, and the list goes on. Centralised control allows for greater visibility and management. In the cloud, this can be done with an identity provider (IdP); with AWS, for example, you can federate individual AWS accounts with your IdP using SAML 2.0.
- Single sign-on (SSO): If users need access to multiple accounts, then AWS SSO can be utilised. AWS SSO can connect with AWS Organizations to manage accounts with greater ease. You can also connect AWS SSO to your Microsoft Active Directory (AD) environment.
- Group users together: If there are many users with similar access needs, it is best to group them together, saving you from managing each individual user.
- Strong sign-in procedures: The days of relying on a password to authenticate a user should be long behind us, but unfortunately this is not the case. This needs to change, especially when controlling access to cloud resources. When dealing with users, at a minimum, we need to utilise multi-factor authentication (MFA). However, with machine identities, utilising temporary credentials with access keys is normal. Frequent rotation of these keys is critical. There are also scenarios where IAM is not used, such as database logins, in which case secrets are used. If using AWS, you can take advantage of AWS Secrets Manager to securely store this information, but always verify that it is securely storing that information.
- Permissions management: Once identities are provisioned, you must determine the level of access to grant. There are key security principles to apply here, such as the principle of least privilege and the need-to-know principle.
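The point about frequently rotating access keys lends itself to a simple automated check. The sketch below assumes key metadata shaped like the `AccessKeyMetadata` entries of an IAM ListAccessKeys response (`AccessKeyId`, `Status`, `CreateDate`). The 90-day threshold, the function name, and the sample keys are assumptions for illustration, not a prescribed policy.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def keys_needing_rotation(
    key_metadata: list[dict],
    max_age_days: int = 90,  # assumed threshold; choose your own policy
    now: datetime | None = None,
) -> list[str]:
    """Return the IDs of active access keys older than `max_age_days`."""
    now = now or datetime.now(timezone.utc)
    limit = timedelta(days=max_age_days)
    return [
        key["AccessKeyId"]
        for key in key_metadata
        if key.get("Status") == "Active" and now - key["CreateDate"] > limit
    ]


# Hypothetical sample data, for illustration only.
sample_keys = [
    {"AccessKeyId": "AKIAEXAMPLEOLD", "Status": "Active",
     "CreateDate": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"AccessKeyId": "AKIAEXAMPLENEW", "Status": "Active",
     "CreateDate": datetime(2024, 5, 1, tzinfo=timezone.utc)},
    {"AccessKeyId": "AKIAEXAMPLEOFF", "Status": "Inactive",
     "CreateDate": datetime(2023, 1, 1, tzinfo=timezone.utc)},
]

fixed_now = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(keys_needing_rotation(sample_keys, now=fixed_now))
```

Passing `now` explicitly keeps the check deterministic, which makes it easier to test.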
Protecting the AWS accounts you are managing access to is also critical. This starts with defining guardrails for the organisation, for example by using service control policies (SCPs) to prevent the deletion of common resources. It is critical to ensure you have the right configurations within AWS Organizations. You can see the many IAM best practices on the Trend Micro Cloud One™ – Conformity Knowledge Base, such as ensuring that access keys are rotated, that multi-factor authentication (MFA) is enabled for the AWS root account, and that AWS IAM roles cannot be used by untrusted accounts via the cross-account access feature.
As previously mentioned, it is critical to know when an incident occurs. Without knowing, it is impossible to respond, fix, or correct. If detection takes too long and your response does not mitigate damage early on, you may be faced with higher fines for violating regulations such as GDPR or HIPAA. So, how can this be prevented? With proper configurations and investigations.
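As an illustration of an SCP guardrail, the sketch below builds a policy document that denies a handful of deletion and stop actions on shared logging resources. The choice of actions, the `Sid`, and the helper name are assumptions made for this example; which resources count as "common" depends on your organisation.

```python
import json


def build_deny_scp(actions, sid="DenyChangesToSharedSecurityResources"):
    """Build a service control policy document that denies `actions`."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": sid,
                "Effect": "Deny",
                "Action": sorted(actions),
                "Resource": "*",
            }
        ],
    }


# Assumed set of actions to block; adjust to your own guardrails.
GUARDRAIL_ACTIONS = [
    "cloudtrail:DeleteTrail",
    "cloudtrail:StopLogging",
    "config:DeleteConfigurationRecorder",
    "config:StopConfigurationRecorder",
]

scp = build_deny_scp(GUARDRAIL_ACTIONS)
print(json.dumps(scp, indent=2))
```

The resulting JSON would then be attached to an organisational unit or account through AWS Organizations.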
Having the proper configuration for your systems to log and alert your network operations centre (NOC) or security operations centre (SOC) is the first step. AWS offers a variety of tools to build a comprehensive and automated detective environment. These include:
- AWS CloudTrail: Creates a record of all account activity.
- AWS Config: Provides you with a detailed inventory of your AWS resources and their current configurations. It also allows for auto-remediation if configurations are changed inappropriately.
- Amazon GuardDuty: Think of this as your guard dog. It will monitor your cloud, looking for malicious activity and unauthorised behaviour.
- AWS Security Hub: A tool that gathers, organises, and prioritises notifications, alerts, and findings from both AWS and third-party products.
The challenge is to ensure those products are properly configured. Trend Micro Cloud One™ – Conformity ingests the data from these services and products (along with 90 other AWS and Azure cloud services and resources) and automatically checks for misconfigurations against the Conformity Knowledge Base.
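To make the detection idea concrete, here is a simplified sketch of a rule that scans CloudTrail records for successful console logins made by the root user or without MFA. It assumes records shaped like CloudTrail `ConsoleLogin` events (`userIdentity.type`, `responseElements.ConsoleLogin`, `additionalEventData.MFAUsed`). The function name and sample records are hypothetical, and a real rule would need tuning (for example, for federated sign-ins where MFA happens at the identity provider).

```python
from __future__ import annotations


def suspicious_console_logins(records: list[dict]) -> list[dict]:
    """Return successful ConsoleLogin events made as root or without MFA."""
    flagged = []
    for record in records:
        if record.get("eventName") != "ConsoleLogin":
            continue
        # These fields can be null in real records, hence the `or {}`.
        response = record.get("responseElements") or {}
        identity = record.get("userIdentity") or {}
        extra = record.get("additionalEventData") or {}

        succeeded = response.get("ConsoleLogin") == "Success"
        is_root = identity.get("type") == "Root"
        no_mfa = extra.get("MFAUsed") != "Yes"
        if succeeded and (is_root or no_mfa):
            flagged.append(record)
    return flagged


# Hypothetical sample records, for illustration only.
sample_records = [
    {"eventName": "ConsoleLogin", "userIdentity": {"type": "IAMUser"},
     "responseElements": {"ConsoleLogin": "Success"},
     "additionalEventData": {"MFAUsed": "Yes"}},
    {"eventName": "ConsoleLogin", "userIdentity": {"type": "IAMUser"},
     "responseElements": {"ConsoleLogin": "Success"},
     "additionalEventData": {"MFAUsed": "No"}},
    {"eventName": "ConsoleLogin", "userIdentity": {"type": "Root"},
     "responseElements": {"ConsoleLogin": "Success"},
     "additionalEventData": {"MFAUsed": "Yes"}},
    {"eventName": "DescribeInstances", "userIdentity": {"type": "IAMUser"},
     "responseElements": None},
]

for event in suspicious_console_logins(sample_records):
    print("Review login by", event["userIdentity"]["type"])
```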
Interested in knowing how well-architected you are? See your own security posture in 15 minutes or less.
It is critical to have the ability to investigate and respond to incidents. When an incident is detected, there should be a playbook of processes for investigation. This will enable teams to respond effectively to an incident. However, this is just part one. Part two is having an automated response configured for certain events.
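One way to picture "a playbook plus automated responses" is a lookup table that maps categories of findings to response actions. The sketch below is deliberately simple: the handlers only return a description of what they would do, and the category names, handler names, and finding fields are all hypothetical placeholders for whatever your detection tooling emits.

```python
from typing import Callable, Dict


def isolate_resource(finding: dict) -> str:
    # Placeholder: a real handler might change network rules or stop an instance.
    return f"isolate {finding['resource']}"


def notify_soc(finding: dict) -> str:
    # Placeholder: a real handler might publish to a notification topic.
    return f"notify SOC about {finding['type']}"


# Playbook: finding category -> automated first response.
PLAYBOOK: Dict[str, Callable[[dict], str]] = {
    "UnauthorizedAccess": isolate_resource,
    "Recon": notify_soc,
}


def respond(finding: dict) -> str:
    """Pick a handler by the category prefix before the first colon."""
    category = finding["type"].split(":", 1)[0]
    handler = PLAYBOOK.get(category, notify_soc)  # default: tell a human
    return handler(finding)


# Hypothetical finding, for illustration only.
print(respond({"type": "UnauthorizedAccess:EC2/SSHBruteForce", "resource": "i-0example"}))
```

Keeping a human-notification default means that events without an automated response still reach the team.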
Cloud infrastructure protection
Infrastructure protection is broken down by AWS into network and compute protection mechanisms. Network protection involves traditional tools such as firewalls and access control lists. Compute protection involves tactics such as code analysis and patching.
Network protection mechanisms start with the traditional security concept of defence in depth. Having a single protection mechanism in front of a resource is not sufficient. As we construct our networks, it is necessary to ensure that logging and alerts are enabled, so responses can be initiated immediately. AWS offers Amazon Virtual Private Cloud (Amazon VPC), which allows for network segmentation and control to create a virtual network where you can specify and configure your IPv4 or IPv6 addresses, as well as decide whether or not it is accessible from the public internet, amongst other things. It is critical that you get everything set up correctly within Amazon VPC to protect your resources. Conformity has several rules that you can use to manually check your own Amazon VPC configuration, or if you start a trial, your entire environment will be automatically scanned for misconfigurations.
Compute protection is about the edge computing resources. To start, you’ll want to have tools for code analysis, as clean code is critical to ensuring there aren’t open doors that hackers or their malicious software (malware) can get through. Once applications, software, and operating systems are deployed, updates or patches need to be applied as flaws or bugs are revealed.
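A common VPC misconfiguration is an inbound rule that exposes an administrative port to the whole internet. The following sketch assumes a security group shaped like an entry in an EC2 DescribeSecurityGroups response (`IpPermissions` with `IpProtocol`, `FromPort`, `ToPort`, `IpRanges`, `Ipv6Ranges`). The function name, the choice of ports 22 and 3389, and the sample group are assumptions for illustration.

```python
from __future__ import annotations

WORLD_CIDRS = {"0.0.0.0/0", "::/0"}


def open_ingress_rules(
    security_group: dict,
    sensitive_ports: tuple[int, ...] = (22, 3389),  # SSH and RDP, as an example
) -> list[tuple[int, str]]:
    """Return (port, cidr) pairs where a sensitive port is open to the world."""
    findings = []
    for perm in security_group.get("IpPermissions", []):
        cidrs = [r.get("CidrIp") for r in perm.get("IpRanges", [])]
        cidrs += [r.get("CidrIpv6") for r in perm.get("Ipv6Ranges", [])]
        world = [c for c in cidrs if c in WORLD_CIDRS]
        if not world:
            continue

        protocol = perm.get("IpProtocol")
        from_port, to_port = perm.get("FromPort"), perm.get("ToPort")
        for port in sensitive_ports:
            all_traffic = protocol == "-1"  # "-1" means every protocol and port
            in_range = (
                protocol == "tcp"
                and from_port is not None
                and to_port is not None
                and from_port <= port <= to_port
            )
            if all_traffic or in_range:
                findings.extend((port, cidr) for cidr in world)
    return findings


# Hypothetical sample data, for illustration only.
sample_group = {
    "GroupId": "sg-0example",
    "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}], "Ipv6Ranges": []},
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}], "Ipv6Ranges": []},
        {"IpProtocol": "-1",
         "IpRanges": [], "Ipv6Ranges": [{"CidrIpv6": "::/0"}]},
    ],
}

print(open_ingress_rules(sample_group))
```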
Other infrastructure protection best practices:
- Harden the system. The Center for Internet Security (CIS) and National Institute of Standards and Technology (NIST) provide useful documentation on configurations that are product specific.
- Reduce unused components (applications, software modules, OS packages).
- Automate administrative tasks, using products like AWS Lambda, Amazon Relational Database Service (Amazon RDS), and Amazon Elastic Container Service (Amazon ECS).
- Validate software integrity by using code signing. Signatures and checksums establish the source and integrity of software.
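The checksum half of the last item can be sketched with the Python standard library. Note the limitation: a checksum only shows that an artifact matches a published digest (integrity); establishing the source additionally requires a signature verified against a trusted key, which is not shown here. The function names and sample bytes are hypothetical.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_checksum(data: bytes, expected_hex: str) -> bool:
    """Check `data` against a published SHA-256 digest.

    hmac.compare_digest performs a constant-time comparison.
    """
    return hmac.compare_digest(sha256_hex(data), expected_hex.strip().lower())


# Hypothetical artifact, for illustration only.
artifact = b"example release contents"
published_digest = sha256_hex(artifact)  # normally obtained from the publisher

print(verify_checksum(artifact, published_digest))
print(verify_checksum(artifact + b" tampered", published_digest))
```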
Protecting data at rest and in transit is essential, using methods such as encryption and classification. Data classification is essential to understanding what data we possess and what needs to be done to protect it appropriately. Data and resources can be tagged so that systems can recognise what kind of resource each one is, and access can then be controlled with service control policies (SCPs) and attribute-based access control.
Cryptography can be used to protect data at rest and in transit. Drives, folders, and buckets can be encrypted while resting on AWS servers. As always, it is critical that configurations are done correctly. For example, you must ensure that encryption is enabled for Amazon Athena query results, especially since there are multiple ways to configure this, such as server-side encryption (SSE) or client-side encryption (CSE).
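The Athena example can be turned into a small configuration check. This sketch assumes a dictionary shaped like an Athena GetWorkGroup response, where the result encryption setting sits under `WorkGroup.Configuration.ResultConfiguration.EncryptionConfiguration.EncryptionOption` and takes one of `SSE_S3`, `SSE_KMS`, or `CSE_KMS`. The function name and sample data are hypothetical.

```python
from __future__ import annotations

VALID_OPTIONS = {"SSE_S3", "SSE_KMS", "CSE_KMS"}


def result_encryption_option(workgroup_response: dict) -> str | None:
    """Return the query result encryption option, or None if not configured."""
    encryption = (
        workgroup_response.get("WorkGroup", {})
        .get("Configuration", {})
        .get("ResultConfiguration", {})
        .get("EncryptionConfiguration", {})
    )
    option = encryption.get("EncryptionOption")
    return option if option in VALID_OPTIONS else None


# Hypothetical sample data, for illustration only.
encrypted = {
    "WorkGroup": {
        "Name": "primary",
        "Configuration": {
            "ResultConfiguration": {
                "EncryptionConfiguration": {"EncryptionOption": "SSE_KMS"}
            }
        },
    }
}
unencrypted = {"WorkGroup": {"Name": "adhoc", "Configuration": {"ResultConfiguration": {}}}}

print(result_encryption_option(encrypted))
print(result_encryption_option(unencrypted))
```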
When encrypting data in transit, there are many different configuration options as well. It is not as simple as ensuring that Transport Layer Security is enabled. Controlling certificates is critical here, and there are many things to manage with AWS Certificate Manager.
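On the client side, "more than just enabling TLS" can mean pinning a minimum protocol version while keeping certificate and hostname verification on. The sketch below uses only the Python standard library; the helper name and the choice of TLS 1.2 as the floor are assumptions for illustration.

```python
import ssl


def strict_client_context() -> ssl.SSLContext:
    """Create a client-side TLS context with verification and a version floor."""
    # create_default_context() enables certificate and hostname verification.
    context = ssl.create_default_context()
    # Refuse protocol versions older than TLS 1.2 (assumed minimum).
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = strict_client_context()
print(context.minimum_version, context.verify_mode, context.check_hostname)
```

A context like this can be passed to standard-library clients, for example `urllib.request.urlopen(url, context=context)`.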
Key management is essential to cryptography for data in transit and at rest. If done properly, compliance with PCI-DSS, GDPR, and other regulations is supported. One choice to consider is a hardware security module (HSM) or tokenization.
When an adverse event or incident occurs, correct, efficient, and effective responses are essential.
Here are the AWS incident response phases:
- Educate—Education for our incident response teams and security operations staff is crucial. If they do not understand the cloud, your services, or available information, they will not be able to respond effectively.
- Prepare—Having plans and procedures is critical for effective responses. The teams must understand those plans and know what tools are available to be able to respond.
- Simulate—The saying “practice makes perfect” holds true here. While responses will never be perfect, practice will help to continually improve.
- Iterate—Deconstruct the simulations and build automated responses. This allows incident response to start immediately as an incident occurs, rather than waiting for humans to intervene.
As more security breaches hit the news and data protection becomes a key focus, meeting this pillar’s standard should always be in mind. Conformity can help you stay compliant with the Well-Architected Framework with its 750+ best practice checks. As mentioned above, if you are interested in knowing how well-architected you are, see your own security posture in 15 minutes or less. Learn more by reading the other articles in the series: 1) overview of all 5 pillars 2) operational excellence 3) performance efficiency 4) reliability 5) cost optimisation.