AWS outage shines a light on hybrid cloud


As the dust begins to settle on yet another cloud outage, chatter will once again center on the wisdom of companies putting all their digital eggs in a single cloud provider’s basket.

Amazon’s AWS “US-East-1” cloud region went down in Northern Virginia yesterday, disrupting some of Amazon’s own applications and a slew of third-party services that rely on AWS. The cause? An “impairment of several network devices” that led to multiple API errors, which in turn affected a wide range of AWS services, including Amazon Elastic Compute Cloud (EC2), Connect, DynamoDB, Athena, Chime, and more.

This isn’t the first time AWS and its customers have suffered at the hands of technical glitches: a similar event hit the very same AWS region just last November. And while Microsoft, Google, and the other major cloud providers have all suffered similar fates at various points in the past, AWS outages often have the farthest-reaching impact because AWS is the world’s largest public cloud provider.

For several hours yesterday, services such as Disney+, Netflix, Instacart, and McDonald’s were disrupted, sometimes with humorous (if inconvenient) results.

Disaster recovery and mitigation

With more and more business dollars going toward cloud computing infrastructure, incidents such as this highlight why companies need robust disaster recovery and mitigation plans.

While such plans might include third-party data backup services, major cloud outages also strengthen the case for hybrid or multi-region cloud strategies, particularly for mission-critical services. With a hybrid approach, companies can run workloads on their own on-premises infrastructure and lean on the public cloud only to keep those in-house systems from buckling under peak traffic.

Chris Gladwin, founder and CEO of “exabyte-scale” database technology company Ocient, says that despite all the hype around cloud migration, the risks posed by major outages mean that “hybrid” will likely be the best approach for many bigger companies.

“This is not the first time AWS has experienced these issues,” Gladwin said. “For mission critical applications, we see organizations turning to on-prem and hybrid cloud deployments that ensure they have greater line-of-sight and control over their deployments, uptime and, ultimately, business results.”

Service level agreements (SLAs) also play an important part in companies’ cloud strategies. Any amount of downtime, even a few minutes, can cost a business a lot of money, but that risk has to be weighed against the cost of running on public cloud platforms. A company that needs its application available as close to 100% of the time as possible will likely host it across multiple regions, even though doing so costs considerably more. A company that can live with a few hours of downtime once or twice a year, on the other hand, might accept that risk and pay less for a single cloud region or zone with a 99% uptime guarantee.
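Multi-region hosting helps because the service goes down only when every region is down at once. As a rough sketch, and assuming region failures are independent (an optimistic assumption, since real failures are often correlated through shared dependencies), the combined availability of n regions that each offer availability a is 1 - (1 - a)^n. The Python below is illustrative only; `combined_availability` is a hypothetical helper, not part of any AWS tool.

```python
def combined_availability(per_region: float, regions: int) -> float:
    """Availability of a service that stays up while at least one of
    `regions` replicas is up.

    Assumes region failures are independent; correlated outages
    (shared control planes, common dependencies) make the real figure lower.
    """
    if not 0.0 <= per_region <= 1.0:
        raise ValueError("per_region must be between 0 and 1")
    if regions < 1:
        raise ValueError("regions must be at least 1")
    # The service is down only if every region is down at the same time.
    return 1.0 - (1.0 - per_region) ** regions


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"{n} region(s) at 99% each: {combined_availability(0.99, n):.6f}")
```

Under those assumptions, two regions at 99% each come out at roughly 99.99%, which is why the extra spend can be worth it for mission-critical services; the real-world gain shrinks whenever failures are correlated.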

“A cloud service level agreement of 99% uptime still allows almost 8 hours per month of downtime,” said John Pescatore, director of emerging security trends at the SANS Institute, a cybersecurity training and certification organization. “Businesses need to invest in redundant or backup capabilities, or pay for higher levels of guaranteed availability to preserve critical business services when running in the cloud.”
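Pescatore’s arithmetic is easy to reproduce: a 99% SLA leaves 1% of the month as permitted downtime, or about 7.2 hours in a 30-day month (7.44 hours in a 31-day one). The minimal Python sketch below assumes a fixed month length of 30 days by default; real SLAs may define their measurement window differently, and `allowed_downtime_hours` is a hypothetical helper written for illustration.

```python
def allowed_downtime_hours(uptime_percent: float, days_in_month: int = 30) -> float:
    """Return the maximum monthly downtime, in hours, that an uptime SLA permits.

    Assumes a month of `days_in_month` days (30 by default).
    """
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    if days_in_month <= 0:
        raise ValueError("days_in_month must be positive")
    hours_in_month = days_in_month * 24
    # Whatever share of the month is not guaranteed uptime may be downtime.
    return hours_in_month * (100.0 - uptime_percent) / 100.0


if __name__ == "__main__":
    for sla in (99.0, 99.9, 99.99):
        hours = allowed_downtime_hours(sla)
        print(f"{sla}% uptime: {hours:.2f} h ({hours * 60:.1f} min) per 30-day month")
```

Each extra “nine” cuts the allowance tenfold: 99.9% permits about 43 minutes of downtime in a 30-day month.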

Pescatore also highlighted the potential “concentration risk” that large companies face if too many parties in their supply chain rely on the same cloud service provider.

“Larger businesses need to look at their suppliers and see if they are subject to concentration risk — too high a percentage of suppliers on one cloud service, and even a short outage can be disastrous to business,” he said.

Originally appeared on: TheSpuzz