Recently, there have been some high-profile failures of cloud computing, including the Sidekick outage, the DDoS attack on Amazon’s EC2 and disruption to Google’s hosted email. Following these debacles, some people have expressed scepticism about the cloud computing model. For example, one response to a CNET article was: “Putting all your beans in a single point of failure for users (in an enterprise or corporation) is suicide.”
Here I will use “cloud computing” broadly, covering SaaS, PaaS and IaaS. All three raise concerns for companies. Companies that find the benefits of cloud computing compelling should plan and execute their cloud computing strategy in a way that avoids the risk of catastrophic failure.
When moving systems into the cloud, a company needs to identify the catastrophic failure scenarios that could affect it and the actions it can take to mitigate them. Cloud computing vendors (particularly IaaS vendors) typically place tight limits on their responsibility (see Todd’s post on “Who Owns the Mess?”), so the company is unlikely to have any legal recourse unless the vendor was negligent.
Companies need to consider that they, other customers, or their provider of cloud computing facilities may be the target of attacks (such as the DDoS attack that hit Amazon EC2 customer Bitbucket). One way to minimize this risk is to distribute applications across cloud computing vendors. Some applications (such as storage) are well suited to such a distributed approach, while others (such as SaaS spam and virus filtering) may be difficult to distribute across vendors.
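To make the distributed-storage idea concrete, here is a minimal sketch of replicating an object across two independent providers, so a vendor-wide outage at one does not make the data unavailable. The class and method names are hypothetical; the in-memory stores stand in for real provider SDK clients.

```python
class InMemoryStore:
    """Stand-in for one vendor's object-storage client."""
    def __init__(self, name):
        self.name = name
        self._objects = {}
        self.available = True

    def put(self, key, data):
        if not self.available:
            raise ConnectionError(f"{self.name} is unreachable")
        self._objects[key] = data

    def get(self, key):
        if not self.available:
            raise ConnectionError(f"{self.name} is unreachable")
        return self._objects[key]


class ReplicatedStore:
    """Writes to every vendor; reads from the first vendor that responds."""
    def __init__(self, stores):
        self.stores = stores

    def put(self, key, data):
        for store in self.stores:
            store.put(key, data)

    def get(self, key):
        for store in self.stores:
            try:
                return store.get(key)
            except ConnectionError:
                continue  # this vendor is down; try the next one
        raise ConnectionError("all vendors unreachable")


vendor_a = InMemoryStore("vendor-a")
vendor_b = InMemoryStore("vendor-b")
replicated = ReplicatedStore([vendor_a, vendor_b])
replicated.put("backup.tar", b"important data")

vendor_a.available = False            # simulate a vendor-wide outage
data = replicated.get("backup.tar")   # still served by vendor-b
```

In practice the write path also needs to cope with one vendor being down at write time (queue and retry), which is where the real engineering effort lies.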
To mitigate system failure, companies need to evaluate how fault tolerant the systems they intend to use really are. It is important for the company to assess both its needs and the robustness of the infrastructure and systems it intends to use (for an example of a non-robust cloud infrastructure, see the discussion of Sidekick’s infrastructure in Andrew’s post on “The Sky is Falling on Cloud Computing”).
IT staff should recognise that with IaaS they are typically using a more homogeneous computing environment than is usual inside a company. This monoculture has both advantages and disadvantages. The environment can be more efficient because it is better understood, security patches can be applied systematically to all instances, and administration can be centralised. The security downside is that these very same features open the door to exploitation: intruders can hire the same computing environment and probe it for weaknesses. Some of the potential breaches stem from the virtualization techniques used and can be quite unexpected (for example, at the recent BlackHat conference, researchers Becherer, Stamos and Wilcox examined how cloud computing instances could be exploited through weak randomness in random number generation).
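A small sketch of why weak randomness matters: if a cloned VM image seeds an ordinary PRNG from a low-entropy value (here, a fixed “boot time”, a hypothetical scenario for illustration), every instance booted from that image, including one an attacker rents, can reproduce the same “secret”. A CSPRNG drawing from the OS entropy pool does not have this property.

```python
import random
import secrets

def weak_token(boot_time_seed):
    """Token derived from an ordinary PRNG with a guessable seed."""
    rng = random.Random(boot_time_seed)   # deterministic given the seed
    return rng.getrandbits(128)

# Two "instances" booted from the same image state yield identical tokens,
# so an attacker who can guess the seed can regenerate the secret.
same = weak_token(1255392000) == weak_token(1255392000)   # True

# A CSPRNG reads from the OS entropy pool and is not reproducible this way.
strong_a = secrets.randbits(128)
strong_b = secrets.randbits(128)   # equal only with negligible probability
```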
The company deploying into the public cloud needs to consider how administrative access to cloud computing resources will be granted (see “Cloud Danger #3: Reliance on Passwords”). One way to reduce the risk of relying on passwords alone is to use two-factor authentication for administrative access.
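A common second factor is a time-based one-time password (TOTP, RFC 6238), in which the server and the administrator’s token share a secret and derive a short code from the current time. A minimal standard-library sketch, checked against the RFC’s published test vector:

```python
import base64
import hashlib
import hmac
import struct
import time

def totp(secret_b32, for_time=None, step=30, digits=6):
    """Time-based one-time password (RFC 6238) using HMAC-SHA1."""
    key = base64.b32decode(secret_b32)
    now = for_time if for_time is not None else time.time()
    counter = int(now // step)                       # 30-second time window
    msg = struct.pack(">Q", counter)                 # 8-byte big-endian counter
    digest = hmac.new(key, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                       # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# RFC 6238 test vector: the ASCII secret "12345678901234567890" (base32
# below) at T=59 seconds produces the 6-digit code "287082".
code = totp("GEZDGNBVGY3TQOJQGEZDGNBVGY3TQOJQ", for_time=59)
```

Because the code changes every 30 seconds, a stolen password alone is not enough to gain administrative access.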
Another scenario that falls under the heading of catastrophic failure is data theft or data loss. Given that access to cloud computing resources will be remote, the company needs to consider measures such as encrypting data in the cloud (for example, see Amazon’s whitepaper on encrypting data in the cloud). Ideally the deploying company, rather than the IaaS provider, should hold the encryption keys.
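A sketch of that key-ownership point: data is encrypted on company-controlled systems before upload, so the provider only ever stores opaque ciphertext. This example uses the third-party `cryptography` package’s Fernet recipe (authenticated symmetric encryption); the variable names are illustrative.

```python
from cryptography.fernet import Fernet

# The key is generated and held on company-controlled systems only;
# it is never handed to the IaaS provider.
key = Fernet.generate_key()
cipher = Fernet(key)

plaintext = b"quarterly financials"
ciphertext = cipher.encrypt(plaintext)   # only this blob goes to the cloud

# The provider stores opaque ciphertext; decryption needs the local key.
recovered = cipher.decrypt(ciphertext)
```

The operational cost of this arrangement is key management: if the company loses the key, the provider cannot help recover the data.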
The use of cloud computing does not necessarily equate with “putting all your eggs in one basket”. If due care is taken to minimize the risk of catastrophic failure, then the benefits of cloud computing are available to many companies – and where due care is not taken, we will continue to see meltdowns.