Amazon Web Services has officially identified the culprit behind the massive outage that disrupted millions of customers and Amazon’s own operations on October 19 and 20, 2025.
The company confirmed that a DNS resolution problem affecting regional DynamoDB service endpoints triggered the cascading failure; the core DNS issue was resolved in approximately two hours and thirty-five minutes, though full recovery took considerably longer.
A DNS Problem Cascades Across the Cloud
The outage began at 11:49 PM PDT on October 19, sending shockwaves through the AWS ecosystem. The issue wasn’t caused by widespread infrastructure collapse or hardware failures.
Instead, a specific DNS resolution failure left clients unable to resolve the regional DynamoDB service endpoints, so requests never reached the database.
DynamoDB, Amazon’s high-performance database service, powers countless applications globally.
When DNS lookups for those endpoints failed, services that depend on DynamoDB began failing in turn, triggering a domino effect throughout the broader AWS infrastructure.
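To make that failure mode concrete, the sketch below (an illustration, not AWS’s internal tooling) shows how a DNS failure on a regional DynamoDB endpoint surfaces to an application: the SDK call errors out before any request reaches the database itself. The region, table name, and key are assumptions for the example, and the snippet assumes the boto3 SDK with credentials configured.

```python
# Illustration only: how a DNS resolution failure on a regional DynamoDB
# endpoint surfaces to an application. Region, table, and key are assumed.
import socket

import boto3
from botocore.exceptions import EndpointConnectionError

ENDPOINT_HOST = "dynamodb.us-east-1.amazonaws.com"  # assumed regional endpoint


def endpoint_resolves(host: str) -> bool:
    """Return True if the endpoint hostname currently resolves in DNS."""
    try:
        socket.getaddrinfo(host, 443)
        return True
    except socket.gaierror:
        return False


def read_item():
    # When DNS for the endpoint fails, the SDK raises EndpointConnectionError
    # before the request ever reaches DynamoDB itself.
    client = boto3.client("dynamodb", region_name="us-east-1")
    try:
        return client.get_item(
            TableName="example-table",          # hypothetical table
            Key={"pk": {"S": "example-key"}},   # hypothetical key
        )
    except EndpointConnectionError as exc:
        # The database may be perfectly healthy; the name simply cannot resolve.
        print(f"DNS/endpoint failure: {exc}")
        return None


if __name__ == "__main__":
    print("endpoint resolves:", endpoint_resolves(ENDPOINT_HOST))
    read_item()
```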
Amazon.com itself went dark during the incident, along with numerous Amazon subsidiary services and even AWS customer support operations.
AWS engineers identified the DNS resolution issue at the heart of the problem at 12:26 AM PDT and began remediation.
By 2:24 AM PDT, they had successfully resolved the core DynamoDB DNS issue. However, fixing the primary problem didn’t instantly restore complete normalcy.
A small subset of internal subsystems remained impaired even after the DNS issue was resolved, requiring additional intervention.
Throttling Operations to Prevent Complete Collapse
AWS took a strategic approach to prevent further system degradation. Engineers deliberately throttled certain operations, particularly new EC2 instance launches.
This intentional slowdown might seem counterintuitive, but it helped the system recover more smoothly by preventing a flood of retries and failed requests from overwhelming components that were still recovering.
Rather than allowing the system to crash completely, controlled throttling guided it toward stability.
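On the customer side, the standard way to absorb that kind of deliberate slowdown is to retry throttled calls with exponential backoff and jitter. The sketch below illustrates this for an EC2 RunInstances call; it is not AWS’s internal throttling logic, and the AMI ID, instance type, and region are placeholders.

```python
# Sketch: retrying a throttled EC2 RunInstances call with exponential
# backoff and jitter, so retries spread out instead of hammering a
# recovering control plane. Placeholders: region, AMI ID, instance type.
import random
import time

import boto3
from botocore.exceptions import ClientError

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region


def launch_with_backoff(max_attempts: int = 6):
    for attempt in range(max_attempts):
        try:
            return ec2.run_instances(
                ImageId="ami-0123456789abcdef0",  # hypothetical AMI
                InstanceType="t3.micro",
                MinCount=1,
                MaxCount=1,
            )
        except ClientError as exc:
            code = exc.response["Error"]["Code"]
            if code != "RequestLimitExceeded":
                raise  # not a throttling error; surface it
            # Exponential backoff with jitter, capped at 60 seconds.
            delay = min(60, 2 ** attempt) * random.uniform(0.5, 1.5)
            time.sleep(delay)
    raise RuntimeError("EC2 launch still throttled after retries")
```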
By 12:28 PM PDT, significant recovery progress became visible across AWS services and customer systems.
AWS teams continued to gradually reduce the throttling on EC2 instance launch operations throughout the afternoon, carefully monitoring system health at every step.
They addressed the remaining areas of impact methodically, verifying that no further components faltered as traffic returned.
By 3:01 PM PDT on October 20, AWS announced that all services had returned to normal operations.
While the core DynamoDB DNS issue lasted only about two and a half hours, the complete recovery process stretched to approximately fifteen hours from initial detection to full restoration.
Amazon has published a detailed post-event summary explaining exactly what happened, how teams responded, and what preventative changes they’re implementing to avoid similar incidents.
The company advises customers still experiencing lingering issues to check the AWS Health Dashboard for real-time status updates and additional information about services that may still need attention.
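For teams that prefer to check service health programmatically, the AWS Health API exposes information similar to the dashboard. The sketch below is a minimal boto3 example; note that the Health API requires a Business or Enterprise Support plan, and the service filter shown is only an assumption for illustration.

```python
# Minimal sketch: listing open AWS Health events for one service via boto3.
# Requires an account with Business or Enterprise Support. The service
# filter ("DYNAMODB") is an example, not a recommendation.
import boto3

# The AWS Health API is served from the us-east-1 endpoint.
health = boto3.client("health", region_name="us-east-1")


def open_events(service: str = "DYNAMODB"):
    """Print Health events that are still open for the given service."""
    resp = health.describe_events(
        filter={
            "services": [service],
            "eventStatusCodes": ["open"],
        }
    )
    for event in resp.get("events", []):
        print(event["arn"], event.get("statusCode"), event.get("region"))


if __name__ == "__main__":
    open_events()
```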