Create Efficient Well-Architected Cloud Infrastructure
This article explores the Performance Efficiency pillar of the Amazon Web Services (AWS) and Microsoft Azure Well-Architected Frameworks. We will examine how to achieve performance efficiency in the compute, storage, database, and network elements of cloud infrastructures.
Related articles in the Well-Architected series:
- Overview of All 5 Pillars
- Security Pillar
- Reliability Pillar
- Operational Excellence Pillar
- Cost Optimization Pillar
The Amazon Web Services (AWS) Well-Architected Framework provides best practices that can guide you towards building a strong and secure cloud infrastructure. The framework includes five pillars, each with its own design principles to help ensure you are well-architected. In this article, we will focus on the fourth pillar, Performance Efficiency.
Achieving performance efficiency requires a lot of hard work, but the cloud can help a business achieve the performance it needs as efficiently as possible, both now and into the future. With the tools that cloud providers like AWS and Azure offer, the most efficient designs can be realized.
The five design principles for Performance Efficiency within the Well-Architected Framework
AWS has defined five core design principles to achieve the efficiency needed and wanted by businesses today:
- Democratize advanced technology by allowing your employees to work efficiently within their current knowledge. With innovative technology there is a learning curve that developers/technicians need to go through to make the most of that technology. If you allow the cloud provider to do the work to learn and build the innovative technology, then as a customer, you can just lease access to that service as you need. This leaves developers/technicians concentrating on what they do best.
- Go global in minutes by deploying your workload into regions around the world that are close to the system's users. This allows for lower latency and better performance: the shorter the distance from the user to the server, the more responsive a system can feel to the user.
- Use serverless architectures, leaving the management of physical data centers and servers to the cloud provider. As the customer of a cloud provider, you do not need to do the heavy lifting of building a physical data center or buying, racking, running, and maintaining servers. You only have to worry about the service that you need, which relieves you of capital expenditure and moves financial management to operational expenditure.
- Experiment more often by exploring what technology would work best for you now. With virtualization services, it is quick and easy to spin up a new instance of a virtual machine, use it, and then turn it back off. So, go ahead and experiment. Find the service that provides the performance that you need and turn everything else off.
- Consider mechanical sympathy by exploring technology and using the approach that aligns best with your business goals. It is hard to say exactly what the best options are until you try them, so taking the time to explore can help you find the technology that meets your business needs.
Interested in knowing how well-architected you are? Check out the free guided public cloud risk self-assessment to get your own results in minutes.
Four areas to achieve Performance Efficiency within a Cloud Infrastructure
AWS states that you should focus on four areas to adhere to the Performance Efficiency pillar: selection, review, monitoring, and trade-offs.
Selection of cloud services
The first area we will dive into is “selection”. Selecting services is an essential step that needs to be carried out thoughtfully so that you end up with the services needed to satisfy your business's requirements. A single service is unlikely to meet all of the business's needs; more likely, you will need a variety of services to build an environment that offers the performance needed to use the cloud efficiently and effectively.
Critical concepts for selecting cloud services
There are critical concepts to work through when selecting any type of technology. AWS gives us insight into this with these concepts:
- Understand the available services and resources to discover the solution that is best for your business. Research and learn about what is available; this is not a one-off task, as cloud providers continuously release new technology and offerings.
- Define a process for making the best architectural choices. These processes could include using published use cases and white papers, or drawing on internal experience from past cloud design and engineering projects. The chosen process should include both benchmarking and load testing to confirm that the chosen option performs as the business needs it to.
- Factor cost requirements into decisions to ensure you spend money wisely. Every budget is limited, which requires an understanding of what you will get for the money spent. Once you understand what a service delivers, you can compare its cost to the benefits it would bring the business. Just because something is possible in the cloud does not mean that it is a good financial choice for the business.
- Use policies or reference architectures to guide the process of technology selection. Policies and procedures need to be updated periodically to ensure that they continue to guide you to the right technology. Reference architectures are especially useful documents in this process. Examples include NIST SP 500-299 from the US National Institute of Standards and Technology, ISO/IEC 17789 from the International Organization for Standardization, and the Cloud Security Alliance document titled “Reference Architecture”. These documents collect a great deal of knowledge and experience, and they can shorten your path to the most efficient infrastructure for your business.
- Use guidance from your cloud provider. They know their technology and most vendors have departments dedicated to analyzing your decisions and making recommendations for improvements.
- Benchmark existing workloads at the beginning of a selection process. Benchmarks are easier to set up and run than load tests, which is why they are commonly used early on. They can be done using synthetic tests that generate data about how your workload would operate on that service in the cloud.
- Load test your workload to show more accurately how your workload would perform. This would be done with your actual workload, which means that data must be sanitized or synthetic data must be used. Be sure to conduct your load test across multiple platforms, monitor the tests, and utilize the knowledge learned from that test to modify designs to achieve the performance desired.
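To make the benchmarking and load-testing steps above more concrete, here is a minimal Python sketch of a load-test harness. It assumes you wrap one request to your workload in a function (the `request_fn` parameter and all function names here are hypothetical); the harness sends requests from a thread pool and reports the error count and nearest-rank latency percentiles. A real load test would more likely use a dedicated tool and run from several locations, so treat this as an illustration of what to measure rather than a complete testing tool.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


def percentile(sorted_values, pct):
    """Nearest-rank percentile of a sorted, non-empty list (pct in 1..100)."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def timed_call(request_fn):
    """Run one request and return (latency_seconds, succeeded)."""
    start = time.perf_counter()
    try:
        request_fn()
        succeeded = True
    except Exception:
        succeeded = False
    return time.perf_counter() - start, succeeded


def run_load_test(request_fn, total_requests=200, concurrency=20):
    """Send total_requests calls using `concurrency` worker threads."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda _: timed_call(request_fn), range(total_requests)))
    # Only successful calls contribute to the latency distribution.
    latencies = sorted(latency for latency, ok in results if ok)
    summary = {"requests": total_requests, "errors": len(results) - len(latencies)}
    if latencies:
        for pct in (50, 95, 99):
            summary[f"p{pct}_ms"] = percentile(latencies, pct) * 1000
    return summary


# Hypothetical stand-in for a call to your real workload.
def fake_request():
    time.sleep(0.005)  # pretend the service takes about 5 ms


print(run_load_test(fake_request, total_requests=100, concurrency=10))
```

Tail percentiles such as p95 and p99 usually say more about user experience than an average, which is why the summary reports them alongside errors.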
Selection of cloud technology falls into the basic categories of compute, storage, database, and network. There are a lot of choices in each of those categories and they must be selected with care. Let's run through the many options that AWS offers.
Within compute, there are three basic categories of services to select from: instances, containers, and functions. Choosing the wrong solution can impair your workload's performance and keep the cloud from achieving the goals of the business.
Cloud computing selection categories
- Instance: A virtual server with options that allow for the selection of solid state drives (SSDs), graphics processing units (GPUs), and a variety of other features. Amazon Elastic Compute Cloud (Amazon EC2) offers many instance types, each of which defines the characteristics of the running server. Selecting the right instance type can be difficult, so Trend Micro Cloud One™ – Conformity has defined rules to help with a variety of EC2 situations, in particular identifying over-utilized instances that would impede performance.
- Container: As described by AWS, containers are a “method of operating system virtualization that allow you to run an application and its dependencies in resource-isolated processes”. In other words, you do not have to manage a full server, just the applications that you wish to run. If you also want to manage the servers that host your containers, Amazon EC2 is the service to select. If you just want to run the applications without concern for server management, then AWS Fargate is the solution for you. Once you have decided between AWS Fargate and Amazon EC2, the next question is: which container orchestrator do you want?
- Amazon Elastic Container Service (Amazon ECS) is a fully managed container orchestration service. Amazon ECS allows for management of containers on Amazon EC2 or AWS Fargate.
- Amazon Elastic Kubernetes Service (Amazon EKS) is a fully managed Kubernetes® orchestration service. Amazon EKS allows for management of containers on Amazon EC2 or AWS Fargate.
- Function: Functions offer a further level of abstraction, so you can run code without having to worry about the execution environment. For example, AWS Lambda runs your code without you provisioning a virtual server instance. The code can be invoked by another AWS service, through Amazon API Gateway, or by a direct call.
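As a rough illustration of the function model, the following sketch shows what a Python AWS Lambda handler can look like. Lambda calls the handler with an `event` dict and a `context` object; the shape of `event` depends on the caller, and this example assumes an API Gateway proxy-style event with an optional `queryStringParameters` field. The greeting logic is hypothetical and only stands in for your own code.

```python
import json


def lambda_handler(event, context):
    """Entry point that Lambda invokes once per request.

    Assumes an API Gateway proxy-style event; queryStringParameters may be
    missing or None when the request has no query string.
    """
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    # Proxy integrations expect a status code, headers, and a string body.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


# Local invocation with a hand-built event; no Lambda runtime is needed.
print(lambda_handler({"queryStringParameters": {"name": "Ada"}}, None))
```

Because the handler is a plain function, you can exercise it locally with sample events before deploying it, which fits the experiment-often principle.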
Critical cloud computing selection options
Within the compute selection categories, there are a variety of options to choose from. Understanding these options is critical to achieving the performance that you require. Some of the critical features to choose from include:
- GPUs offer a high degree of parallelism that general-purpose computing workloads can benefit from. GPUs also provide hardware acceleration for computations and encoding.
- Field-programmable gate arrays (FPGAs) provide custom hardware acceleration for the execution of workloads. You can define your algorithms in general-purpose programming languages such as C or Go, or in hardware-oriented languages such as Verilog or VHDL.
- AWS Inferentia is a machine learning chip, available in Amazon EC2 Inf1 instances, designed for large-scale machine learning inference in uses such as image or speech recognition, natural language processing, and fraud detection.
- Burstable instance families run at a moderate baseline CPU level and can burst above it when a workload needs more CPU. They are a good selection for small databases, developer workloads, or web servers. When the workload is below the baseline, the instance accumulates CPU credits, which it spends later when it needs to burst.
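The CPU credit mechanism behind burstable instances can be modeled with simple arithmetic. In AWS's model, one CPU credit equals one vCPU running at 100% utilization for one minute, so an instance earns `baseline × vCPUs × 60` credits per hour and spends credits in proportion to its actual utilization. The sketch below assumes defaults modeled on a t3.micro (2 vCPUs and a 10% baseline, so 12 credits per hour) and caps the balance at 24 hours of earnings; check the AWS documentation for the figures that apply to your instance size. It works in whole hours and ignores details such as unlimited mode, so it is an approximation for planning, not a billing calculator.

```python
def simulate_cpu_credits(hourly_utilization, vcpus=2, baseline=0.10,
                         start_balance=0.0, max_balance=None):
    """Track a burstable instance's CPU credit balance hour by hour.

    hourly_utilization: average CPU utilization per hour, 0.0 to 1.0.
    Returns a list of (balance_after_hour, ran_out_of_credits) tuples.
    """
    earned_per_hour = baseline * vcpus * 60      # 1 credit = 1 vCPU-minute at 100%
    if max_balance is None:
        max_balance = earned_per_hour * 24       # assumption: cap of 24 hours of earnings
    balance = start_balance
    history = []
    for utilization in hourly_utilization:
        spent = utilization * vcpus * 60
        balance = min(max_balance, balance + earned_per_hour - spent)
        # A negative result means the requested utilization could not be
        # sustained for the whole hour; in standard mode the instance would
        # be held to its baseline once the balance reached zero.
        ran_out = balance < 0
        balance = max(0.0, balance)
        history.append((round(balance, 2), ran_out))
    return history


# A quiet day at 2% CPU builds credits; four busy hours at 90% then drain them.
day = [0.02] * 20 + [0.9] * 4
print(simulate_cpu_credits(day)[-4:])
```

Running a realistic daily utilization profile through a model like this is a quick way to see whether a workload's bursts fit within the credits a burstable family can accumulate.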
Whatever you choose, make use of the elasticity of cloud resources. Resources can scale up or down with demand, but it takes planning to have scaling work the way you need it to. Use metrics to understand your usage, and use test scenarios to confirm that your configuration scales up and down as needed, when needed.
Once you have selected a compute service, it is essential to monitor it. Actual utilization data can reveal a lot of information that you can use to make informed decisions to improve performance efficiency in the future.
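One common way to plan elastic scaling is proportional, target-based sizing: choose a target value for a per-instance metric (for example, average CPU at 60%) and resize the fleet so the metric returns to that target. The sketch below uses the formula documented for the Kubernetes Horizontal Pod Autoscaler, `ceil(current × currentMetric / targetMetric)`, bounded by minimum and maximum capacity. AWS target tracking scaling policies pursue the same goal but perform the calculation for you, so this illustrates the idea rather than AWS's exact algorithm; the function name and default bounds are hypothetical.

```python
import math


def desired_capacity(current_capacity, current_metric, target_metric,
                     min_capacity=1, max_capacity=10):
    """Proportional (target-tracking style) sizing, clamped to fleet bounds.

    current_metric: observed per-instance average (e.g. CPU %).
    target_metric: value you want that average to settle at.
    """
    if current_capacity <= 0 or target_metric <= 0:
        raise ValueError("current_capacity and target_metric must be positive")
    proposed = math.ceil(current_capacity * current_metric / target_metric)
    return max(min_capacity, min(max_capacity, proposed))


# 4 instances averaging 90% CPU against a 60% target -> scale out to 6.
print(desired_capacity(4, 90, 60))
```

In practice you would add a cooldown or stabilization window so that brief spikes do not make the fleet flap between sizes, and then confirm the behavior with test scenarios before relying on it.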
Storage solutions come in a variety of formats. Choosing the best solution for your business depends on what you are using it for. If you choose the wrong options, it can be very detrimental to your performance.
In order to make the right decisions for your business, it is critical to understand what the data access needs are, whether data is accessed from one source or many, if the throughput is consistent, and whether access speed or reliability is more important to the business.
Cloud storage considerations
Some of the options to consider include:
- Access method: Block, file, object
- Block has the lowest latency and is great if your storage is accessed from one instance
- File system has low and consistent latency, and is better suited when many instances access the same storage
- Object storage has low latency, and Amazon S3 offers cross-region replication, which can reduce latency for users who access the data from other regions
- Patterns of access: Random or sequential
- Throughput required: Amazon Elastic File System (Amazon EFS) supports highly parallelized access, which is great for workloads with multiple threads and multiple instances. However, load testing is necessary to select the correct performance mode
- Frequency of access: online, offline, or archival
- Frequency of update: write once, read many (WORM) or dynamic
- Availability constraints
- Durability constraints: Amazon S3 is designed for 99.999999999% (11 nines) of data durability, a level that is very difficult to achieve in traditional data centers.
- Growth rate
- Amazon S3 and Amazon EFS offer virtually unlimited storage that grows as you add data
- Amazon Elastic Block Store (Amazon EBS) volumes have a pre-determined, provisioned size
Metrics should be established and monitored with appropriate alarms/alerts when storage reaches certain thresholds.
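Alarms work best when they fire before storage is actually full. A simple linear projection from the observed growth rate tells you roughly how many days remain before usage crosses an alerting threshold. This sketch assumes steady growth, daily usage samples, and an 80% threshold; the function names and the example numbers are hypothetical values you would replace with your own metrics.

```python
def days_until_threshold(used_gib, capacity_gib, daily_growth_gib, threshold=0.8):
    """Days until usage crosses threshold * capacity, assuming linear growth.

    Returns 0.0 if the threshold is already crossed and None if usage
    is not growing (there is no crossing to predict).
    """
    limit_gib = capacity_gib * threshold
    if used_gib >= limit_gib:
        return 0.0
    if daily_growth_gib <= 0:
        return None
    return (limit_gib - used_gib) / daily_growth_gib


def daily_growth(samples_gib):
    """Average daily growth from daily usage samples (oldest first)."""
    if len(samples_gib) < 2:
        return 0.0
    return (samples_gib[-1] - samples_gib[0]) / (len(samples_gib) - 1)


# A 1,000 GiB volume that grew from 440 to 500 GiB over five days.
growth = daily_growth([440, 452, 464, 476, 488, 500])
print(days_until_threshold(500, 1000, growth))  # 25.0 days until 80% full
```

Growth is rarely perfectly linear, so a projection like this is best used to decide when to alert or review capacity, not as an exact forecast.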
Cloud database architecture
Database architecture selection can be a bit confusing, as there are many solutions to choose from. The correct solution for your business will depend on latency, availability, scalability, consistency, query capability, and partition tolerance. The first place to start is to understand the data characteristics in a workload. If you know your workload, then it is easier to match it to the purpose-built database engine types.
Eight purpose-built database engines
There are many purpose-built database engines. Let’s look at them here:
- Relational databases store data with predefined schemas and relationships between records. Common applications of relational databases include customer relationship management (CRM), e-commerce, and enterprise resource planning (ERP).
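- Key-value databases deliver fast response times, even with an extreme number of concurrent requests. Gaming systems and high-traffic web applications are candidates for this type of database.
- Document databases store semi-structured data as JSON-like documents, which suits applications such as content management, catalogs, and user profiles.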
- In-memory database engines store data in memory. Applications that require microsecond, rather than millisecond, response times use these databases, for example geospatial applications and gaming leaderboards.
- Wide column store is a type of NoSQL database. Like a relational database, it organizes data into tables with rows and columns; unlike a relational database, the names and format of the columns can vary from row to row within the same table. Wide column stores are used for fleet management, equipment management, and route optimization.
- Graph database engines are used for fraud detection or social networking, where the application must navigate millions of relationships between highly connected datasets with millisecond latency.
- Time-series database engines can look at data that changes over a period of time and collect, synthesize, and derive insights for that data. DevOps and IoT can utilize this type of database.
- Ledger database engines provide a cryptographically verifiable, immutable record of transactions owned by a centralized, trusted authority. This is useful for banking transactions or supply chain tracking.
Want to never have to manually check for adherence to AWS or Azure configuration best practices again? Have your AWS and Azure cloud infrastructure scanned for adherence to 750+ cloud configuration guidelines by signing up for our free trial.
Cloud computing network
Networking is virtualized on AWS, as are most services that they offer. Selecting and building your virtual network requires evaluating the available options, and then testing and measuring the service to ensure your performance efficiency.
It is critical to understand how your network impacts the performance of your workloads. Understanding network throughput, latency, and errors is essential for defining metrics and analyzing the data to see what needs to be changed to improve efficiency.
Once you have a handle on the data, you can analyze network features that you can use to configure or alter your network to improve performance. Product features like Amazon S3 Transfer Acceleration can significantly speed up long-distance transfers to Amazon S3. Others include AWS Global Accelerator and the Elastic Network Adapter (ENA), which enables enhanced networking on Amazon EC2; each of these options improves performance in a different way.
You can also use load balancing to take advantage of the elasticity of the cloud. Network Load Balancer is well suited to TCP traffic that needs extremely high performance: it can handle millions of requests per second at ultra-low latency, even with sudden and volatile traffic patterns. Offloading TLS encryption to Elastic Load Balancing can improve the performance of your backend instances even more.
Another way to improve performance is to choose the AWS Regions that are closest to your users. The shorter travel time to the servers improves the user experience. A content delivery network (CDN), which caches data close to users, also offers a significant performance benefit because of its global presence; Amazon CloudFront is a global CDN. There is also AWS Local Zones, which delivers single-digit millisecond latency for applications like video rendering.
Understanding business workloads allows for the optimal selection of network options that provide the best performance efficiency.
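AWS documents that, for TCP traffic, Network Load Balancer selects a target using a flow hash algorithm based on fields such as the protocol, the source and destination IP addresses and ports, and the TCP sequence number, so that packets belonging to one connection reach the same target. The sketch below illustrates the general flow-hashing idea using only the five-tuple, SHA-256, and a modulo; it is not NLB's actual hash function, and the target addresses and flow are placeholders.

```python
import hashlib


def pick_target(flow, targets):
    """Map a flow to one target by hashing its identifying fields.

    flow: (protocol, source_ip, source_port, destination_ip, destination_port).
    The same flow always hashes to the same target while the target list is
    unchanged, so a TCP connection stays on one backend.
    """
    if not targets:
        raise ValueError("no registered targets")
    key = "|".join(str(field) for field in flow).encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return targets[int.from_bytes(digest[:8], "big") % len(targets)]


targets = ["10.0.1.10", "10.0.2.10", "10.0.3.10"]  # placeholder backends
flow = ("TCP", "203.0.113.10", 51514, "198.51.100.5", 443)
print(pick_target(flow, targets))
```

A plain modulo remaps most flows whenever the target list changes; schemes such as consistent hashing reduce that churn.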
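To choose Regions based on evidence rather than geography alone, you can collect latency samples from where your users are and compare medians, which are less sensitive to a single slow sample than averages. In this sketch, `tcp_connect_ms` times a TCP handshake (including the DNS lookup); the endpoint naming in the commented example (`ec2.<region>.amazonaws.com`) and the region list are assumptions for illustration. Measurements taken from your own office only approximate what users experience, so real-user data is preferable where you have it.

```python
import socket
import statistics
import time


def tcp_connect_ms(host, port=443, timeout=2.0):
    """Time one TCP handshake (including DNS lookup) to host:port, in ms."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000


def closest_region(samples_ms):
    """Return the region with the lowest median latency.

    samples_ms maps a region name to a list of latency samples in ms;
    regions with no samples are skipped.
    """
    medians = {region: statistics.median(values)
               for region, values in samples_ms.items() if values}
    if not medians:
        raise ValueError("no latency samples")
    return min(medians, key=medians.get)


# Example (needs network access; the endpoint naming is an assumption):
# regions = ["us-east-1", "eu-west-1", "ap-southeast-2"]
# samples = {r: [tcp_connect_ms(f"ec2.{r}.amazonaws.com") for _ in range(5)]
#            for r in regions}
# print(closest_region(samples))
```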
Cloud computing environment configuration review
There are a finite number of options to choose from when designing your cloud environment for your workloads, but new services, like AWS Local Zones, appear periodically. So, it's best not to assume that you still have the best service and configuration. It is always good practice to go back and review your setup and analyze your environment using a data-driven approach. With that, there are certain things to consider:
- Use something like AWS CloudFormation templates to design your infrastructure as code.
- Use continuous integration and continuous delivery (CI/CD), which allows for consistent and repeatable iterations.
- Develop well-defined metrics, such as key performance indicators (KPIs), to measure both the technical and the business performance you are experiencing. An example of a metric to examine would be the time to first byte (TTFB) for a web application.
- Evolve your workload. AWS is constantly releasing new technology that could improve your performance, so having a process to review and improve workload performance is critical.
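Time to first byte, mentioned above as an example KPI, can be measured with Python's standard library. This minimal sketch times a single GET request from just before the connection is opened until the status line and headers arrive, so the figure includes the DNS lookup, TCP and TLS setup, and server processing time. The function name is hypothetical, and in practice you would take many samples and track percentiles over time rather than rely on one request.

```python
import http.client
import time
from urllib.parse import urlsplit


def time_to_first_byte(url, timeout=5.0):
    """Return (status, ttfb_ms) for one GET request to url.

    Timing starts before the connection is opened (http.client connects
    lazily inside request()) and stops when getresponse() has parsed the
    status line and headers.
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        start = time.perf_counter()
        conn.request("GET", path)
        response = conn.getresponse()
        ttfb_ms = (time.perf_counter() - start) * 1000
        response.read()  # drain the body so the connection closes cleanly
        return response.status, ttfb_ms
    finally:
        conn.close()
```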
Cloud environment performance monitoring
Once your workloads are up and running, the work is not complete. It is necessary to monitor the cloud environment for performance, looking at five distinct phases, as outlined by AWS:
- Generation
- Aggregation
- Real-time processing and alarming
- Storage
- Analytics
Amazon CloudWatch is a tool for monitoring cloud resources, such as your Amazon EC2 instances. It collects and tracks metrics and log files, and sends alarms as necessary. CloudWatch dashboards display graphs that make it easy to visualize your performance and any changes to it.
When monitoring, there are two approaches: active monitoring (AM) and passive monitoring (PM). PM watches actual user traffic, whereas AM generates synthetic traffic to simulate user activity. Fundamentally, you are monitoring to ensure workloads are performing as designed.
When incidents or events occur, it is critical to go back and look at the performance metrics to see where things went wrong, and then create alarms to alert you to failures right away so you can intercede.
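CloudWatch alarms are configured with a threshold, a number of evaluation periods (N), and the number of datapoints within those periods that must breach (M), which helps avoid alarming on a single spike. The following sketch mimics that “M out of N” evaluation for a greater-than comparison. It is a simplification: CloudWatch also applies settings such as how to treat missing data, which this sketch reduces to returning `INSUFFICIENT_DATA` when there are too few points. The function name and example latencies are hypothetical.

```python
def evaluate_alarm(datapoints, threshold, evaluation_periods, datapoints_to_alarm):
    """Simplified 'M out of N' alarm evaluation (greater-than comparison).

    datapoints: one aggregated value per period, oldest first.
    Returns "ALARM", "OK", or "INSUFFICIENT_DATA".
    """
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    window = datapoints[-evaluation_periods:]  # the N most recent periods
    breaching = sum(1 for value in window if value > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


# p99 latency (ms) per minute; alarm when 3 of the last 5 minutes exceed 300 ms.
latency = [120, 130, 450, 480, 510, 140]
print(evaluate_alarm(latency, threshold=300, evaluation_periods=5,
                     datapoints_to_alarm=3))  # ALARM
```

Choosing M smaller than N lets an alarm tolerate an occasional healthy datapoint during a sustained problem while still ignoring one-off spikes.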
It is always good to continue to monitor your metrics over time, because they can be your window into future performance improvements.
Cloud performance trade-offs
When selecting services, there are always going to be trade-offs. It is possible to trade off consistency for latency, for example. So, the question to ask and find an answer to is: What is the performance that your workload requires?
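Caching is a common example of trading consistency for latency: reads served from a cache are fast but may be out of date. This sketch of a read-through cache with a time-to-live (TTL) makes the trade-off explicit; the class name, the `loader` callback, and the injectable `clock` are assumptions for illustration. The TTL is the knob: a longer TTL means fewer slow reads from the source of truth, but a longer window in which callers can see stale data.

```python
import time


class TTLCache:
    """Read-through cache: serve a stored value until it is ttl_seconds old,
    then reload it from the slower source of truth."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader      # function key -> fresh value (slow path)
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so tests can control time
        self._entries = {}         # key -> (value, time loaded)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]        # fast path; may be up to ttl_seconds stale
        value = self._loader(key)
        self._entries[key] = (value, now)
        return value
```

For example, with a 30-second TTL a price updated in the database is still served at its old value for up to 30 seconds, which may be acceptable for a product listing but not for a checkout total.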
Do research. Understand what design choices exist. The Amazon Builders' Library has a lot of concepts to research that can inform your decisions. As changes are made based on your research, it is especially important to continue to monitor and analyze your metrics. The changes that you make could improve performance or they might not, and the sooner you learn what works and what doesn't, the faster you can improve.
Performance efficiency is a challenge
There are many choices to make when establishing the compute, network, database, and storage elements of a cloud infrastructure. However, when you put in the work to understand data flows within your workloads and the options available, you can choose solutions that truly optimize a business's cloud infrastructure.