Post written by Jason Dablow
Blue/green deployment (also called red/black or A/B) is a deployment methodology that eliminates downtime for your workloads by bringing up a parallel production environment and implementing the required changes there before moving traffic from one group to the other. It is an effective technique for minimizing the risk of application changes, because it gives you time to test while your users continue to be served as normal. Some security events can be handled the same way.
Let’s take a closer look at some specifics and scenarios.
In this figure, we have a set of EC2 instances running behind an Elastic Load Balancer in production, labeled Blue. These instances could be running any number of services, including but not limited to a LAMP stack or application logic.
Next, we’ll bring up a parallel architecture that mirrors the blue workloads. These instances could be in a separate region, availability zone, or subnet. You might even place them in the exact same location and distinguish them with an enumerated version in the tags or a different naming convention.
However you decide to bring up the parallel environment, your next goal is to apply the change on the green side while blue is still handling production. This change could be new application logic, a patch, an operating system hotfix, or anything else that could cause an outage for your customers and therefore requires testing.
After the change has been made, it’s important to test all aspects of the instance to ensure proper functionality. Since these are about to go into production and the whole purpose of this technique is to eliminate downtime, testing is the most critical stage.
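One way to make that testing step enforceable is to gate promotion on a set of checks that every green instance must pass. The following is a minimal sketch, not a prescribed method: the function names and the idea of passing checks as plain callables are assumptions made for illustration, and the actual checks (HTTP probes, login tests, and so on) are left for you to supply.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# A check takes an instance ID and returns True when the instance passes.
Check = Callable[[str], bool]


def failed_checks(
    instance_ids: Iterable[str], checks: Dict[str, Check]
) -> List[Tuple[str, str]]:
    """Return (instance_id, check_name) pairs for every check that failed."""
    failures = []
    for instance_id in instance_ids:
        for name, check in checks.items():
            try:
                passed = bool(check(instance_id))
            except Exception:
                # A check that raises is treated as a failure, not a pass.
                passed = False
            if not passed:
                failures.append((instance_id, name))
    return failures


def ready_to_promote(instance_ids: Iterable[str], checks: Dict[str, Check]) -> bool:
    """Green is ready only if it has instances and none of them failed a check."""
    ids = list(instance_ids)
    return bool(ids) and not failed_checks(ids, checks)
```

Treating an empty green group as not ready is a deliberate choice here, so that a misconfigured lookup cannot accidentally approve a cutover to nothing.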
Finally, when you complete your testing, you will promote the green side to production. At this stage, you’ll also want to start draining connections from the blue side so they end gracefully and all traffic eventually goes only to the green, now production, side. Elastic Load Balancers in AWS support a feature called Connection Draining, which lets in-flight requests to an instance complete before the instance is removed from the connection pool.
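As a rough illustration of the cutover, here is a sketch that assumes a Classic Load Balancer and a boto3 "elb" client passed in by the caller. The function name, the load balancer name, and the 300-second default timeout are assumptions for the example, not values from this post.

```python
from typing import Iterable


def promote_green(
    elb,
    load_balancer_name: str,
    blue_ids: Iterable[str],
    green_ids: Iterable[str],
    drain_timeout_seconds: int = 300,
) -> None:
    """Shift a Classic Load Balancer from the blue instances to the green ones.

    `elb` is assumed to be a boto3 Classic ELB client, e.g. boto3.client("elb").
    """
    # Enable Connection Draining so in-flight requests to blue can finish.
    elb.modify_load_balancer_attributes(
        LoadBalancerName=load_balancer_name,
        LoadBalancerAttributes={
            "ConnectionDraining": {
                "Enabled": True,
                "Timeout": drain_timeout_seconds,
            }
        },
    )
    # Register green first so the load balancer is never left without targets.
    elb.register_instances_with_load_balancer(
        LoadBalancerName=load_balancer_name,
        Instances=[{"InstanceId": i} for i in green_ids],
    )
    # Deregistering blue stops new requests and starts draining existing ones.
    elb.deregister_instances_from_load_balancer(
        LoadBalancerName=load_balancer_name,
        Instances=[{"InstanceId": i} for i in blue_ids],
    )


# Example call (hypothetical names and IDs):
#   import boto3
#   promote_green(boto3.client("elb"), "prod-elb", ["i-blue1"], ["i-green1"])
```

In a real cutover you would probably also wait until the load balancer reports the green instances as healthy before deregistering blue; that wait is left out here to keep the sketch short.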
Once the connections are drained from the blue instances, the instances should be terminated and removed from your inventory since they are no longer needed. Now, with green as your new production environment, new AMIs, either dynamic or static, can be created from these instances and launched as your next blue side for additional changes, testing, and promotion to production. As you move back and forth with this code deployment and application development, you can minimize the impact on your users, hopefully to zero.
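The cleanup step might look something like the sketch below, which assumes a boto3 "ec2" client supplied by the caller. The function name and the image name are hypothetical. NoReboot=True is used because create_image otherwise reboots the source instance, which you likely do not want on a live production instance; the trade-off is that file system consistency of the image is not guaranteed.

```python
from typing import Iterable


def bake_image_and_retire_blue(
    ec2,
    green_instance_id: str,
    image_name: str,
    blue_ids: Iterable[str],
) -> str:
    """Create an AMI from a green instance, then terminate the drained blue ones.

    `ec2` is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    Returns the ID of the new AMI.
    """
    # Capture the now-production green instance as the base for the next round.
    # NoReboot=True avoids restarting an instance that is serving traffic.
    image = ec2.create_image(
        InstanceId=green_instance_id,
        Name=image_name,
        NoReboot=True,
    )
    # Only terminate blue after draining has completed (not checked here).
    blue = list(blue_ids)
    if blue:
        ec2.terminate_instances(InstanceIds=blue)
    return image["ImageId"]
```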
From a security perspective, this also allows you to buy yourself time during a breach or attack. By bringing up a parallel environment, you can test new firewall or intrusion prevention rules, pull in a new security hotfix, or even just remove an attacker's foothold in the existing instances, forcing them to start their attack over.
You could also quarantine instances into a locked-down security group for forensic analysis, or switch the deployments over automatically when your security tool raises a malware or other alert.
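Quarantining could be sketched as follows, assuming a boto3 "ec2" client, an instance running in a VPC, and a pre-created security group with no permissive rules. The function name, the "Status" tag, and the group ID in the example are all hypothetical.

```python
def quarantine_instance(ec2, instance_id: str, quarantine_sg_id: str) -> None:
    """Move an instance into a locked-down security group for forensics.

    `ec2` is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    `quarantine_sg_id` is assumed to be a security group you created in
    advance with only the access your forensic tooling needs.
    """
    # Passing Groups replaces the instance's existing security groups,
    # so the quarantine group becomes the only one attached.
    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        Groups=[quarantine_sg_id],
    )
    # Tag the instance so it is easy to find and is not put back in rotation.
    ec2.create_tags(
        Resources=[instance_id],
        Tags=[{"Key": "Status", "Value": "quarantined"}],
    )


# Example call (hypothetical IDs):
#   import boto3
#   quarantine_instance(boto3.client("ec2"), "i-0abc", "sg-0quarantine")
```

The instance is kept running rather than terminated so that memory and disk state remain available for analysis while the other environment takes over production traffic.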
The ability to swap back and forth between parallel production environments allows you to deal with many situations since it effectively makes compute disposable. If you can move your workloads seamlessly without loss of user connectivity, it gives your environment resiliency and flexibility to respond to any situation (hopefully automatically).