Amazon Web Services said it has resolved an outage affecting its US-EAST-1 Region in northern Virginia that had knock-on effects across the internet, taking down major websites including Snapchat and Reddit and continuing to roil services for much of the day on Monday.
The company said services in US-EAST-1 initially experienced increased error rates and latencies from 7:49 a.m. BST to 10 a.m. BST, which also disrupted services that rely on US-EAST-1 endpoints, such as DynamoDB Global Tables and Identity and Access Management, or IAM.
At 8:26 a.m. BST, AWS said it identified the trigger for the problem as DNS resolution issues for the region’s DynamoDB service endpoints.
Cascading glitches
The company said it resolved the DynamoDB DNS issue at 10:24 a.m. BST, but the initial problem resulted in further impairments that were not fully resolved until 11:01 p.m. BST.
The further issues included an impairment to the internal subsystem of EC2 responsible for launching EC2 instances, which depends on DynamoDB, and impairments to Network Load Balancer status checks, which caused network connectivity problems with services including Lambda, DynamoDB and CloudWatch.
The Network Load Balancer status checks had recovered by 5:38 p.m. BST, and AWS said all services had fully returned to normal operations as of 11:01 p.m. BST.
In the UK, the problems affected websites including Lloyds Bank, Bank of Scotland, Vodafone, HMRC and Gov.uk.
Many internet-connected devices also ceased to work, including Ring doorbells and Amazon Alexa-enabled smart plugs that some people use to control home electrical devices, according to users.
Connected devices offline
Outage reports globally on Downdetector peaked at more than 6.5 million on Monday morning.
Some users reported that work-related services such as the Slack messaging platform and the Zoom videoconferencing system experienced errors for several hours.
A reader named Christina, who is unable to walk without crutches, told CNN that the Alexa-enabled smart plugs she used to control the lights and music in her room via voice control had ceased working.
The outage is the biggest internet-related disruption since a faulty CrowdStrike security update in July 2024 caused millions of Microsoft Windows systems to crash, creating chaos across airlines, hospitals, banks and other businesses.
Mehdi Daoudi, chief executive of internet performance monitoring firm Catchpoint, estimated that the impact of the incident would reach into the hundreds of billions of dollars, due to lost productivity and businesses being forced to halt or delay operations.
“The incident highlights the complexity and fragility of the internet, as well as how much every aspect of our work depends on the internet to work,” Daoudi said in a statement to Silicon UK.
Fragility
Other industry watchers said the incident showed how many companies rely on a single cloud provider, increasing the chances of a major outage across the internet.
The three largest cloud providers, Amazon, Microsoft and Google, provide about 63 percent of all cloud services, according to figures from Synergy Research Group.
Amazon’s own services, including Amazon.com, AWS cloud services and the Alexa digital assistant, were affected by the problems, users said.
Amazon warehouse and delivery staff reported on social media that internal systems were offline at many sites, with some warehouse workers saying they were instructed to stand by in break rooms and loading areas during their shift to wait for systems to come back online.
Online games including Roblox and Fortnite experienced outages, as did online graphic design tool Canva and generative AI search offering Perplexity.