A Guide to the Well-Architected Framework
This article explores the five pillars of the Amazon Web Services (AWS) and Azure Well-Architected Frameworks, examining best practices and design principles for using the cloud in a more efficient, secure, and cost-effective manner.
Designing and building a secure cloud environment is critical to ensuring your business has a strong security posture, but this is easier said than done. In the past, when the data center was fully within your control and you only had a single server to be concerned about, it was already challenging to maintain a strong security posture. Fast forward to the world of cloud, where we don’t have physical control over our data centers, and the challenge grows even greater. Luckily, there is good news: just because you don’t have physical control over your data center doesn’t mean you can’t secure it.
So, the burning question is: what should we do, and what can we do, to build a secure cloud? Even if your security posture is strong today, there is no guarantee it will be tomorrow. Hackers are always looking for ways around new defenses, so you need to stay a step ahead. Otherwise, attackers will find their way in and may cause devastating consequences for the business. Preventing an attack requires careful planning, using tactics such as the Deming Cycle of Plan-Do-Check-Act, an iterative four-step management method used in business for the control and continuous improvement of processes and products.
Discerning cloud service default settings
One of the most difficult challenges with cloud is knowing the default settings for each cloud service and what you need to change to have a secure implementation. Take, for example, the simple creation of an Amazon Simple Storage Service (Amazon S3) bucket on AWS.
With the AWS Command Line Interface (CLI), it takes only a few lines to create a bucket, but is that all you need to do? Unfortunately, that isn’t enough to ensure the data stored in the bucket is secure. A well-architected bucket requires considerably more configuration.
Not so easy, huh? Luckily, Microsoft® Azure® and AWS have published several white papers on the Well-Architected Framework that explain cloud architectural design principles and can help guide you through the process. For example, in the case of an Amazon S3 bucket, you need to remember to disallow public read access, ensure logging is enabled, encrypt data with customer-managed keys, and so on.
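As a rough illustration of the bucket checks just listed (public access, logging, and encryption), here is a minimal Python sketch that audits a bucket configuration held in a dictionary. The dictionary layout is loosely modeled on the responses of the S3 GetPublicAccessBlock, GetBucketLogging, and GetBucketEncryption APIs, but it is an assumption for illustration, not live API output, and `audit_bucket` is a hypothetical helper. It does not call AWS.

```python
from typing import Any

# Hypothetical configuration layout, loosely modeled on the S3
# GetPublicAccessBlock, GetBucketLogging, and GetBucketEncryption responses.
PUBLIC_ACCESS_FLAGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)


def audit_bucket(config: dict[str, Any]) -> list[str]:
    """Return findings for a bucket configuration; an empty list means every check passed."""
    findings: list[str] = []

    # Disallow public access: all four Block Public Access settings should be on.
    block = config.get("PublicAccessBlockConfiguration", {})
    for flag in PUBLIC_ACCESS_FLAGS:
        if not block.get(flag, False):
            findings.append(f"public access not blocked: {flag} is off")

    # Ensure server access logging is enabled with a target bucket.
    logging_cfg = config.get("LoggingEnabled") or {}
    if not logging_cfg.get("TargetBucket"):
        findings.append("server access logging is disabled")

    # Ensure default encryption uses KMS with an explicit key ID. (Simplification:
    # a full check would also confirm that the key is customer managed.)
    rules = config.get("ServerSideEncryptionConfiguration", {}).get("Rules", [])
    uses_kms_key = any(
        rule.get("ApplyServerSideEncryptionByDefault", {}).get("SSEAlgorithm") == "aws:kms"
        and rule.get("ApplyServerSideEncryptionByDefault", {}).get("KMSMasterKeyID")
        for rule in rules
    )
    if not uses_kms_key:
        findings.append("default encryption does not use a specified KMS key")

    return findings


if __name__ == "__main__":
    bare_bucket: dict[str, Any] = {}  # none of the settings configured
    for finding in audit_bucket(bare_bucket):
        print(finding)
```

Each check maps to one item in the paragraph above. A real audit tool would fetch these settings through the provider's SDK and cover many more rules.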
With so many cloud services and resources, it can be a lot to remember what to do and which configurations should be in place. However, Trend Micro has published a wealth of information on what should be done to build cloud architecture to best-practice levels. The Trend Micro Cloud One™ – Conformity Knowledge Base contains 750+ best practice articles to help you understand each cloud best practice, how to audit it, and how to remediate the misconfiguration.
Cloud infrastructure misconfiguration automation
Automation is an essential step to minimize the risk of a breach, continuously scanning and providing feedback so you stay ahead of the hackers. For anyone building in the cloud, an automated tool that continuously scans your cloud infrastructure for misconfigurations is a thing of beauty, as it can help ensure you are always complying with those 750+ best practices without the heavy lifting. If you would like to be relieved from manually checking for adherence to well-architected design principles, sign up for a free trial of Conformity. Or, if you’d like to see how well-architected your infrastructure is, check out the free guided public cloud risk self-assessment to get personalized results in minutes.
The five pillars of the Well-Architected Framework
Conformity and its Knowledge Base are based on the AWS and Azure Well-Architected Frameworks, which are defined by five pillars:
- Operational excellence—focus on running and monitoring systems
- Security—focus on protecting information and systems
- Reliability—focus on ensuring a workload performs as it should
- Performance efficiency—focus on efficient use of IT
- Cost optimization—focus on avoiding unnecessary costs
Each of these pillars has its own set of design principles, which are extremely useful for evaluating your architecture and determining whether you have implemented design principles that allow you to scale over time.
Starting with the Operational Excellence pillar: creating the most effective and efficient cloud infrastructure is a natural goal, so when creating or changing the infrastructure, it is critical to follow the best practices outlined in the AWS Operational Excellence pillar.
The Operational Excellence pillar focuses on two business objectives:
- Running workloads in the most efficient way possible.
- Understanding your efficiency to be able to improve processes and procedures on an ongoing basis.
The five design principles within the Operational Excellence pillar
To achieve these objectives, five critical design principles can be used:
- Perform operations as code, so you can apply engineering principles to your entire cloud environment. Applications, infrastructure, and so on, can all be defined as code and updated as code.
- Make frequent, small, reversible changes, as opposed to large changes that make it difficult to determine the cause of the failure—if one were to occur. It also requires development and operations teams to be prepared to reverse the change that was just made in the event of a failure.
- Refine operations procedures frequently by reviewing them with the entire team to ensure everyone is familiar with them and determine if they can be updated.
- Anticipate failure to ensure that the sources of future failures are found and removed. Conduct a pre-mortem exercise to determine how things can go wrong so that you are prepared.
- Learn from all operational failures and share the lessons across all teams. This allows teams to evolve and continually improve procedures and skills.
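Two of these principles, performing operations as code and making small, reversible changes, can be sketched together. The minimal Python example below treats a resource's desired configuration as data, computes only the settings that differ from the current state, and records an undo set so the change can be reversed if it causes a failure. The setting names and the `plan`, `apply_changes`, and `rollback` functions are hypothetical; a real pipeline would use an infrastructure-as-code tool rather than plain dictionaries.

```python
from copy import deepcopy

_MISSING = object()  # sentinel: the setting did not exist before the change


def plan(current: dict, desired: dict) -> dict:
    """Return only the settings whose desired value differs from the current one."""
    return {key: value for key, value in desired.items()
            if current.get(key, _MISSING) != value}


def apply_changes(current: dict, changes: dict) -> tuple[dict, dict]:
    """Apply a change set; return the new state and an undo record of prior values."""
    new_state = deepcopy(current)
    undo = {}
    for key, value in changes.items():
        undo[key] = current.get(key, _MISSING)  # remember what was there before
        new_state[key] = value
    return new_state, undo


def rollback(state: dict, undo: dict) -> dict:
    """Reverse a change set produced by apply_changes()."""
    restored = deepcopy(state)
    for key, previous in undo.items():
        if previous is _MISSING:
            restored.pop(key, None)  # the change introduced this setting
        else:
            restored[key] = previous
    return restored


if __name__ == "__main__":
    current = {"instance_type": "t3.small", "min_size": 2}
    desired = {"instance_type": "t3.medium", "min_size": 2, "monitoring": True}
    changes = plan(current, desired)
    new_state, undo = apply_changes(current, changes)
    print(changes)
    print(rollback(new_state, undo) == current)
```

Keeping each change set small means that when something breaks, the undo record is short and the likely cause is easy to identify.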
CI/CD is good, but to ensure operational excellence, there must be proper controls on the processes and procedures for building and deploying software, including a plan for failure. It is always best to plan for the worst and hope for the best, so that if there is a failure, we are ready for it.
With data storage and processing in the cloud, especially in today's regulatory environment, it is critical to ensure we build security into our environment from the beginning.
The seven design principles within the Security pillar
Many design principles strengthen our ability to keep our data and business secure; the Security pillar recommends these seven:
- Implement a strong identity foundation to control access using core security concepts, such as the principle of least privilege and separation of duties.
- Enable traceability through logging and metrics throughout the cloud infrastructure. It is only with logs that we know what has happened.
- Apply security at all layers throughout the entire cloud infrastructure using multiple security controls with defense in depth. This applies to compute, network, and storage.
- Automate security best practices to help scale rapidly and securely in the cloud. Utilizing controls managed as code in version-controlled templates makes it easier to scale securely.
- Always protect data in transit and at rest, using appropriate controls based on sensitivity. These controls include access control, tokenization, encryption, and so on.
- Keep people away from data to reduce the chance of mishandling, modification, or human error.
- Prepare for security events by having incident response plans and teams in place. Incidents will occur and it is essential to ensure that a business is prepared.
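To illustrate the principle of least privilege, here is a minimal Python sketch that reviews a policy written in the standard IAM JSON policy format (Version, Statement, Effect, Action, Resource) and flags Allow statements that grant wildcard actions or apply to every resource. The `broad_statements` function is hypothetical and deliberately narrow: it ignores conditions, NotAction, NotResource, and how Deny statements interact with Allow statements, so it is a starting point for review rather than a policy analyzer.

```python
import json


def _as_list(value) -> list:
    """IAM allows Action and Resource to be a single string or a list of strings."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def broad_statements(policy_json: str) -> list[str]:
    """Describe Allow statements that grant wildcard actions or cover all resources."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a lone statement may be written as an object
        statements = [statements]

    findings: list[str] = []
    for index, statement in enumerate(statements):
        if statement.get("Effect") != "Allow":
            continue
        for action in _as_list(statement.get("Action")):
            if action == "*" or action.endswith(":*"):
                findings.append(f"statement {index}: wildcard action {action!r}")
        for resource in _as_list(statement.get("Resource")):
            if resource == "*":
                findings.append(f"statement {index}: applies to all resources")
    return findings


if __name__ == "__main__":
    example_policy = json.dumps({
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "s3:*", "Resource": "*"},
            {"Effect": "Allow", "Action": ["s3:GetObject"],
             "Resource": "arn:aws:s3:::example-bucket/*"},
        ],
    })
    for finding in broad_statements(example_policy):
        print(finding)
```

Because a check like this can run automatically against policies kept in version control, it also serves the "automate security best practices" principle.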
Five areas to configure in the cloud to help achieve a well-architected infrastructure
There are several security tools that enable us to fulfill the design principles above. AWS has broken security into five areas that we should configure in the cloud:
- Identity and access management (IAM), which involves the establishment of identities and permissions for humans and machines. It is critical to manage this through the life cycle of the identity.
- Detection of an attack. One of the biggest challenges most businesses face is detecting attacks. Even an event that is not malicious, such as a user making a mistake, can be costly. Enabling logging features and delivering those logs to a security information and event management (SIEM) system is essential. Once the SIEM detects that something bad has happened, it should send out alerts.
- Infrastructure protection of the network and the compute resources is critical. This is done through a variety of tools and mechanisms that are either infrastructure tools or code protection, such as virtual private clouds (VPCs), code review, vulnerability assessments, gateways, firewalls, load balancers, hardening, code signing, and more.
- Data protection in transit and at rest is critical. This is primarily done with IAM and encryption. Most discussions of encryption focus on which algorithms are used and what the key size is. The most important question for encryption, however, is where the key is stored and who controls it. It is also important to be able to verify the authenticity of public key certificates.
- Incident response is the ability to respond immediately and effectively when an adverse event occurs. As the saying goes, “failing to plan is planning to fail.” If we do not have incident responses planned and practiced, an incident could destroy the business.
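As a sketch of the detection area, the Python example below raises an alert when the same user fails to sign in a given number of times within a sliding time window. The `LoginEvent` record, the threshold of five failures, the five-minute window, and the function name are all assumptions for illustration. In practice, the events would come from the provider's audit logs, and a rule like this would typically run inside the SIEM.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class LoginEvent:
    timestamp: int  # seconds since the epoch
    user: str
    success: bool


def detect_repeated_failures(
    events: list[LoginEvent], threshold: int = 5, window_seconds: int = 300
) -> list[tuple[str, int]]:
    """Return (user, timestamp) alerts when a user reaches `threshold` failures within the window."""
    recent: dict[str, deque] = defaultdict(deque)  # user -> timestamps of recent failures
    alerts: list[tuple[str, int]] = []
    for event in sorted(events, key=lambda e: e.timestamp):
        if event.success:
            continue
        failures = recent[event.user]
        failures.append(event.timestamp)
        # Discard failures that have aged out of the sliding window.
        while event.timestamp - failures[0] > window_seconds:
            failures.popleft()
        if len(failures) >= threshold:
            alerts.append((event.user, event.timestamp))
            failures.clear()  # start over so one burst produces one alert
    return alerts


if __name__ == "__main__":
    burst = [LoginEvent(t, "alice", False) for t in range(0, 50, 10)]
    print(detect_repeated_failures(burst))
```

A rule like this catches both an attacker guessing passwords and a user who is simply locked out. Either way, someone should be notified.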
What is essential to remember is that security of a cloud ecosystem is a split responsibility. AWS and Azure have defined where responsibility lies with them versus where it lies with the consumer. It is good to review the AWS and/or Azure shared responsibility models to ensure you are upholding your end of the deal.
Reliability is important to think about for any IT-related system. IT must provide the services users and customers need, when they need them. This involves understanding the level of availability that your business requires from any given system.
The five design principles within the Reliability pillar
When it comes to the Reliability pillar, just like the others, AWS has defined critical design principles:
- Automatically recover from failure. Depending on the business needs, it might be essential that there are automated recovery controls in place, as the time it takes a human to intervene may be longer than a business can tolerate.
- Test recovery procedures. Backing up the data from an Amazon S3 bucket is a good first step, but the process is not complete until the restoration procedure is verified. If the data cannot be restored, it has not been successfully backed up.
- Scale horizontally to increase aggregate workload availability, which is an alternative way to envision a cloud infrastructure. If the business is using a virtual machine with a large amount of CPU capacity to handle all user requests, consider breaking it down into multiple, smaller virtual machines that are load balanced. That way, if one machine fails, the result is not a denial of service, and if planned well, users may never know there was a failure at all.
- Stop guessing capacity. Rather than guessing, monitor demand and utilization and scale resources to match. Careful planning and capacity management are critical to the reliability of an IT environment and can save you money otherwise spent on unneeded capacity.
- Manage change and automation so that alterations to the cloud do not interfere with the reliability of the infrastructure. Change management is core to ITIL: changes should not be made unless they are planned, documented, tested, and approved. There must also be a rollback plan for if and when a change breaks your environment.
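The principle of testing recovery procedures can be sketched in a few lines. Take a checksum manifest at backup time, and treat the backup as valid only after a restore matches every checksum. This minimal Python example keeps objects in memory as a mapping of key to bytes. The function names are hypothetical, and a real workload would copy data to separate storage and restore it from there.

```python
import hashlib


def checksum(data: bytes) -> str:
    """SHA-256 digest used to compare original and restored objects."""
    return hashlib.sha256(data).hexdigest()


def back_up(objects: dict[str, bytes]) -> tuple[dict[str, bytes], dict[str, str]]:
    """Copy every object and record a checksum manifest at backup time."""
    backup = {key: bytes(value) for key, value in objects.items()}
    manifest = {key: checksum(value) for key, value in objects.items()}
    return backup, manifest


def verify_restore(restored: dict[str, bytes], manifest: dict[str, str]) -> list[str]:
    """Return problems found after a restore; an empty list means the backup is verified."""
    problems = []
    for key, expected in manifest.items():
        if key not in restored:
            problems.append(f"missing: {key}")
        elif checksum(restored[key]) != expected:
            problems.append(f"corrupted: {key}")
    return problems


if __name__ == "__main__":
    source = {"reports/q1.csv": b"region,total\n", "reports/q2.csv": b"region,total\n"}
    backup, manifest = back_up(source)
    restored = dict(backup)  # stand-in for reading the data back from backup storage
    print(verify_restore(restored, manifest) or "restore verified")
```

Running the verification on a schedule, rather than only after an incident, turns "we have backups" into "we know we can restore."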
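With availability at the core of this pillar, it is good to understand its definition. AWS defines availability as the percentage of time that a workload is available for use, calculated as available-for-use time divided by total time.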
When availability is understood, it is possible to choose the right systems and the right configurations needed to fulfill the needs of the business. This should be in our design goals from the beginning. Redesigning a system later to match goals that we did not understand from the beginning is a very costly alternative.
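To make the definition concrete, here is a minimal Python sketch. It computes availability as available-for-use time divided by total time and converts a target such as 99.9 percent into a yearly downtime budget. It also combines components using standard reliability arithmetic, assuming failures are independent. Required (hard) dependencies multiply their availabilities. Redundant copies fail only if every copy fails. The redundant case shows why the horizontal scaling principle helps: two load-balanced copies at 99 percent each give about 99.99 percent under that independence assumption.

```python
from math import prod

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; ignores leap years


def availability(available_minutes: float, total_minutes: float) -> float:
    """Availability = available-for-use time / total time."""
    return available_minutes / total_minutes


def downtime_budget_minutes(target: float) -> float:
    """Yearly downtime allowed by a target such as 0.999 (99.9 percent)."""
    return (1 - target) * MINUTES_PER_YEAR


def serial(*components: float) -> float:
    """Hard dependencies: the workload is up only when every component is up."""
    return prod(components)


def redundant(component: float, copies: int) -> float:
    """Independent, identical copies: the workload is down only when all copies are down."""
    return 1 - (1 - component) ** copies


if __name__ == "__main__":
    print(f"{downtime_budget_minutes(0.999):.1f} min/year")  # 99.9% target
    print(f"{redundant(0.99, 2):.4f}")  # two load-balanced copies at 99% each
    print(f"{serial(0.999, 0.999):.6f}")  # two required components at 99.9% each
```

A 99.9 percent target leaves roughly 525.6 minutes (about 8.8 hours) of downtime per year. Numbers like this help you decide early whether a design can realistically meet the business requirement.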
Moving right along, the fourth pillar, Performance Efficiency, focuses on your ability to use computing resources as efficiently as possible and maintain that efficiency as the demands from the business change and technologies evolve.
The five design principles within the Performance Efficiency pillar
In order to fulfill the pillar of Performance Efficiency, the following design principles should be adhered to:
- Democratize advanced technologies. For example, when new technologies are available it might be best to leave the learning and management of these technologies to the cloud provider, consuming them as a service instead. This frees up the time and skills of your developers and operations departments to focus on their jobs.
- Go global in minutes to ensure the services accessed by users or customers are as close to them as possible. As the workforce expands to a global presence, the workload within the cloud can and should expand to regions that are closer to the end user to improve the response time for everyone.
- Use serverless architectures to leave management of physical servers to the cloud provider. While the idea of serverless sounds like it is a strange new technology that removes the actual servers, it is actually just an alternative way of running services and lowering costs. It creates the illusion that there is no server, but rather the cloud provider manages it in a completely transparent manner, allowing the customer to focus on their functions or applications.
- Experiment more often to discover the best infrastructure or configuration that serves the business most efficiently. Virtualization makes it possible to spin up a new resource quickly and if it does not work for the business, then it can be shut down just as quickly.
- Consider mechanical sympathy by looking at how your data is accessed and utilized. From a data-centric perspective, you want to find the best technology to meet the needs of the business.
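The principle of going global in minutes can be illustrated by routing each user to the deployed region with the lowest latency. In this minimal Python sketch, the region codes are real AWS region names, but the latency figures are illustrative placeholders rather than measurements, and the helper functions are hypothetical. In practice, latency-based DNS routing or measured data would drive the choice.

```python
# Illustrative placeholder latencies in milliseconds, NOT measurements:
# user location -> {region code: round-trip time}.
LATENCY_MS: dict[str, dict[str, int]] = {
    "Frankfurt": {"us-east-1": 90, "eu-central-1": 10, "ap-southeast-1": 160},
    "Singapore": {"us-east-1": 220, "eu-central-1": 160, "ap-southeast-1": 10},
}


def closest_region(user_location: str, deployed_regions: list[str]) -> str:
    """Pick the deployed region with the lowest latency for this user location."""
    latencies = LATENCY_MS[user_location]
    return min(deployed_regions, key=lambda region: latencies[region])


def mean_latency(deployed_regions: list[str]) -> float:
    """Average latency across user locations when each is routed to its closest region."""
    best = [LATENCY_MS[loc][closest_region(loc, deployed_regions)] for loc in LATENCY_MS]
    return sum(best) / len(best)


if __name__ == "__main__":
    for regions in (["us-east-1"], ["us-east-1", "eu-central-1", "ap-southeast-1"]):
        print(regions, mean_latency(regions))
```

Adding a region near a group of users lowers their latency and does not affect users who are already close to an existing region.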
Justifying business spending on any given service is no cakewalk, and cloud does not change that. In fact, it could be more difficult to justify because it changes the way businesses look at their IT costs. Traditionally, IT is a capital expenditure, meaning equipment is purchased based on the prediction that it will be used for the next three to five years. With cloud services, money is spent as an operational expenditure, which means that money is spent on the services needed and used each month. Many choices made when configuring and building cloud environments affect your bottom line; therefore, it is important to strike a balance between money spent on services and what you actually use.
The five design principles within the Cost Optimization pillar
To help you make the most informed decision, let’s take a look at the design principles for Cost Optimization:
- Implement cloud financial management by creating a cost-aware culture. It is essential that budgets and forecasts are used to predict and determine the amount of money that should be spent on any given cloud service. It is also good practice to determine any reasons that the business may incur cost overruns, using root cause analysis from existing processes.
- Adopt a consumption model that enables IT to take actions that reduce or increase usage based on your business’ needs. The cloud has a basic design principle of paying only for what you consume. For example, in a test environment, shutting down the virtual servers at the end of the day saves money.
- Measure overall efficiency by tracking usage metrics and attributing them to the specific department that uses each service. This enables better calculations of, and proof of, the efficiencies gained by increasing output and functionality.
- Stop spending money on undifferentiated heavy lifting. The cloud service providers do the heavy lifting for you—buying servers, racking them, and managing them. By having the cloud provider do this work, your business only has to spend money on the actual usage.
- Analyze and attribute expenditure using tags. Tagging allows the usage of a cloud service to be attributed to specific departments within a business and enables more accurate return on investment (ROI) calculations.
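As a sketch of tag-based attribution, the Python example below totals spend per department from billing line items. It keeps untagged spend visible so that gaps in tagging are easy to spot. The line-item shape, the `department` tag key, and the amounts are hypothetical; a real report would come from the provider's billing export. `Decimal` is used to avoid floating-point rounding in currency sums.

```python
from collections import defaultdict
from decimal import Decimal


def cost_by_tag(line_items: list[dict], tag_key: str = "department") -> dict[str, Decimal]:
    """Sum costs per tag value; items without the tag are grouped under 'untagged'."""
    totals: dict[str, Decimal] = defaultdict(Decimal)
    for item in line_items:
        owner = item.get("tags", {}).get(tag_key, "untagged")
        totals[owner] += Decimal(item["cost"])  # costs given as strings to stay exact
    return dict(totals)


if __name__ == "__main__":
    items = [
        {"resource": "vm-web-1", "cost": "120.50", "tags": {"department": "marketing"}},
        {"resource": "vm-build-1", "cost": "80.00", "tags": {"department": "engineering"}},
        {"resource": "bucket-assets", "cost": "19.50", "tags": {"department": "marketing"}},
        {"resource": "db-legacy", "cost": "42.00"},
    ]
    for owner, total in sorted(cost_by_tag(items).items()):
        print(f"{owner}: {total}")
```

A growing "untagged" total is itself a useful signal that tagging policies are not being followed.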
Growing with the cloud and remaining well-architected
Cloud services provide many advantages to many different types of businesses; however, these advantages are coupled with the fear of change. The technology available today is simply amazing, and it is hard to imagine what we will have in the future. It is possible to design and engineer a secure infrastructure that allows businesses to take advantage of the cloud and evolving technology while still protecting data, by taking into consideration the services needed, the locations of end users, data security requirements, and budget management. With attention and care, your business can use the cloud in a more efficient, secure, and cost-effective manner. So explore, evolve, and push the technology boundaries for successful business management!
As more security breaches hit the news and data protection becomes a key focus, ensuring your organization adheres to the well-architected framework’s design principles is crucial. Conformity can help you stay compliant with the well-architected framework through its 750+ best practice rules. As mentioned above, if you are interested in knowing how well-architected you are, see your own security posture in 15 minutes or less. If you liked this topic, don’t miss the other articles in this series: 1) security 2) performance efficiency 3) operational excellence 4) reliability 5) cost optimization.