For as long as I can manage it, none of my interactions on the Internet, and none of my computer storage or backups, will be done in the cloud. The same stricture should, frankly, apply to businesses. Businesses should construct their own cloud (there's no doubt clouds are deucedly convenient and, when working properly, highly efficient) and isolate that cloud from the Internet.
Here’s an example of why, brought to us by Amazon.
A glitch with an obscure Amazon database disrupted life for millions of people across the US as core internet services failed to function for an array of companies.
Alexa devices couldn’t hear. Corporate Slack messages wouldn’t post. Students couldn’t turn in assignments or access materials from courses. Financial trades were impossible on certain platforms. Users of Zoom, Venmo, Instacart, and a host of other services faced prolonged outages that rippled through homes and businesses.
Some of that is poor design. Alexa needs connectivity to the Internet? To place an order to buy something, or to give the weather forecast, sure. But to set a timer for cooking? To set a reminder? An alarm clock? Really?
Back to the main subject. That obscure database? A minor update scotched the whole affair.
A minor update to what is called the Domain Name System, or DNS—the kind of software tweak that happens millions of times a day on the internet—sent the well-oiled machine that underpins the modern web careening toward a crash.
DNS acts as a kind of telephone directory for the internet, instructing machines on how to find each other. The faulty update gave the wrong information for DynamoDB, an Amazon Web Services, or AWS, product that has become one of the world’s most important databases.
Suddenly, machines on the East Coast that tried to process trillions of requests were getting the internet’s equivalent of a wrong number.
Think about the effects of time-sensitive moves being blocked for hours on end, often long enough that by the time they can be executed it's far too late, and the delay has made them far more costly. That happens in the financial world: think, in extremis, of overnight repos that need to be unwound….
That kind of foul-up could have been worse: if computers can't find each other, the humans trying to correct the glitch from other computers might not have been able to reach the machines hosting DynamoDB at all. That didn't happen this time, but next time? Or perhaps it did happen this time, and Amazon had backup systems it could switch to while it took the offending systems offline for repair.
Regardless, it took the better part of a day for Amazon to correct its minor update and for the effects of the repair to ripple across the Internet.
The article's subheadline sums up the situation, and why I avoid the cloud as much as possible:
Outage offers reminder of fragility of global internet connectivity