Amazon Web Services has confirmed that a widespread service outage in its critical US-EAST-1 region has been fully resolved after more than 15 hours of disruption.
The incident, which began late on October 19, 2025, impacted over 140 AWS services and left customers around the globe unable to access compute, storage, database, and networking features until the afternoon of October 20.
DNS Resolution Failure Triggers Cascading Service Problems
The disruption began at approximately 11:49 PM PDT on October 19, when AWS engineers observed unusually high error rates and slow responses across multiple services in US-EAST-1.
Initial investigation identified a DNS resolution issue affecting regional DynamoDB service endpoints.
AWS confirmed that DNS queries for DynamoDB were failing, creating a ripple effect through dependent systems.
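A failure of this kind would surface to clients as ordinary DNS lookup errors against the regional endpoint. The short Python sketch below illustrates such a check; the hostname is the standard public DynamoDB endpoint for us-east-1, while the retry loop and timings are illustrative assumptions rather than AWS tooling or guidance.

```python
# Minimal sketch: checking whether the regional DynamoDB endpoint resolves.
# The retry count and sleep interval are illustrative, not AWS guidance.
import socket
import time

ENDPOINT = "dynamodb.us-east-1.amazonaws.com"


def resolves(hostname: str) -> bool:
    """Return True if the hostname resolves to at least one address."""
    try:
        return len(socket.getaddrinfo(hostname, 443)) > 0
    except socket.gaierror:
        return False


if __name__ == "__main__":
    for attempt in range(3):
        if resolves(ENDPOINT):
            print(f"{ENDPOINT} resolved successfully")
            break
        print(f"attempt {attempt + 1}: DNS resolution failed for {ENDPOINT}")
        time.sleep(5)  # brief pause before checking again
```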
After correcting the DNS configuration at 2:24 AM, engineers discovered that EC2’s internal subsystem responsible for provisioning new virtual machines remained impaired due to its reliance on DynamoDB for metadata retrieval.
Following the DynamoDB fix, the situation worsened when Network Load Balancer health checks began failing, leading to degraded connectivity for services such as Lambda, DynamoDB, and CloudWatch.
To stabilize the environment, AWS implemented temporary throttling of several operations, including EC2 instance launches, SQS queue processing via Lambda Event Source Mappings, and asynchronous Lambda invocations.
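For customers, this throttling showed up as elevated API error rates. A common client-side response is exponential backoff with jitter, sketched below with boto3; the error codes, delays, and the helper function are illustrative assumptions rather than AWS-published guidance, and botocore's built-in "standard" and "adaptive" retry modes cover the same ground.

```python
# Minimal sketch: client-side retry with capped exponential backoff and jitter
# for throttled AWS API calls. Error codes and timings here are illustrative.
import random
import time

import boto3
from botocore.config import Config
from botocore.exceptions import ClientError

# botocore's "adaptive" retry mode already backs off on throttling;
# the manual helper below just makes the pattern explicit.
ec2 = boto3.client(
    "ec2",
    region_name="us-east-1",
    config=Config(retries={"max_attempts": 10, "mode": "adaptive"}),
)

# Error codes commonly returned when AWS APIs are throttled (illustrative set).
THROTTLE_CODES = {"RequestLimitExceeded", "Throttling", "ThrottlingException"}


def call_with_backoff(api_call, *args, max_attempts=5, **kwargs):
    """Invoke an AWS API call, retrying on throttling with backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return api_call(*args, **kwargs)
        except ClientError as err:
            if err.response["Error"]["Code"] not in THROTTLE_CODES:
                raise  # not a throttling error; surface it immediately
            delay = min(30, 2 ** attempt) + random.uniform(0, 1)  # capped backoff
            time.sleep(delay)
    raise RuntimeError("API call was still throttled after all retries")


# Example usage (hypothetical): listing instances while the region is throttled.
# reservations = call_with_backoff(ec2.describe_instances)
```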
This approach allowed engineers to systematically restore load balancer health checks, a milestone achieved at 9:38 AM PDT on October 20.
Over the ensuing hours, operation throttling was gradually lifted as network performance improved.
By 3:01 PM PDT, all affected AWS services had returned to normal operations.
Even after the primary restoration, some globally distributed services, such as AWS Config, Redshift, and Connect, continued processing accumulated message backlogs for several hours.
The impact was felt most acutely on IAM authentication, DynamoDB Global Tables, and workloads involving EC2 instance launches and Lambda function invocations.
The outage also temporarily prevented users from creating or updating support cases, compounding operational challenges for organizations relying on AWS during the incident.
AWS has pledged to publish a detailed post-incident report outlining the root cause, resolution steps, and planned improvements to prevent recurrence.
Early recommendations for customers include configuring Auto Scaling groups across multiple Availability Zones and avoiding zone-specific instance targeting, so that workloads can shift away from an impaired zone instead of depending on capacity in a single location.
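A minimal sketch of the first recommendation, using boto3 with placeholder resource names (the group name, launch template, and subnet IDs below are hypothetical), is an Auto Scaling group whose VPCZoneIdentifier lists subnets in several Availability Zones:

```python
# Minimal sketch: an Auto Scaling group spread across three Availability Zones
# in us-east-1. All resource names and subnet IDs are placeholders.
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

# Hypothetical subnet IDs, one per Availability Zone (e.g. us-east-1a/1b/1c).
SUBNETS = ["subnet-0aaa1111", "subnet-0bbb2222", "subnet-0ccc3333"]

autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="web-fleet",                # hypothetical group name
    LaunchTemplate={
        "LaunchTemplateName": "web-fleet-template",  # hypothetical launch template
        "Version": "$Latest",
    },
    MinSize=3,
    MaxSize=9,
    DesiredCapacity=3,
    # Subnets in different AZs let the group place and rebalance capacity
    # away from an impaired zone.
    VPCZoneIdentifier=",".join(SUBNETS),
)
```

Leaving the placement decision to the group in this way, rather than hard-coding a single Availability Zone in launch requests, is the "avoid zone-specific targeting" half of the advice.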
Moving forward, AWS aims to enhance its DNS infrastructure and service dependency management to reduce the risk of cascading disruptions in one of its largest and most critical regions.