On October 20, 2025, thousands of organizations that rely on Amazon Web Services (AWS) unexpectedly found themselves offline and unable to operate. It wasn’t a cyberattack, and no data center was struck by a natural disaster. Rather, a seemingly minor domain name system (DNS) resolution failure at one of the world’s most trusted and relied-upon cloud platforms halted operations, fractured customer experiences, and put revenue at risk across industries.
From airline check-in kiosks to social media platforms and enterprise systems, the ripple effects were immediate and widespread. Events like this underscore a hard reality for any organization: even the most resilient-seeming platforms can fail. The question isn’t if disruption will occur; it’s how prepared your organization is when it does.
The False Security of Cloud Reliance
Cloud computing has often been touted as the ultimate solution: scalable, secure, and always available. But this AWS outage shatters that notion. A single internal issue in one region cascaded across services, exposing how fragile dependence on the cloud can be.
The CrowdStrike incident in July 2024—when a faulty security update crashed millions of Windows machines globally, disrupting hospitals, airports, and financial institutions—was similarly alarming. Intended to protect systems, the update triggered mass outages, proving that even trusted vendors can become points of failure.
Organizational resilience isn’t about avoiding disruption altogether. It’s about building processes, systems, and capabilities that absorb shocks, adapt quickly, and recover with minimal damage.
The Real-World Impact of Disruption
When cloud services fail, the consequences are immediate and tangible.
- Operational Paralysis: Employees can’t access critical systems, customers face delays, and services grind to a halt.
- Financial Loss: Global survey data indicates that the median cost of a high-impact outage is approximately $1.9 million per hour.¹
- Reputational Damage: Clients and customers don’t care why systems fail, only that they didn’t receive the services they expected.
During the AWS outage, some companies couldn’t process transactions, while others lost access to internal tools, communication platforms, and customer-facing applications.
In the CrowdStrike case, hospitals reverted to paper records, airlines grounded flights, and banks delayed transactions—all because of a single vendor update.
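To make the financial-loss figure above more concrete, the short sketch below multiplies the cited median hourly cost by an outage duration. It assumes cost scales linearly with time, which is a simplification, and the durations shown are hypothetical values chosen only for illustration. The function name is also an assumption, not part of any cited study.

```python
def estimated_outage_cost(hourly_cost_usd: float, duration_hours: float) -> float:
    """Rough linear estimate: hourly cost multiplied by outage duration."""
    if hourly_cost_usd < 0 or duration_hours < 0:
        raise ValueError("cost and duration must be non-negative")
    return hourly_cost_usd * duration_hours


# Median hourly cost of a high-impact outage, as cited in the text above
MEDIAN_HOURLY_COST_USD = 1_900_000

# Hypothetical outage durations (in hours), for illustration only
for hours in (0.5, 3, 8):
    cost = estimated_outage_cost(MEDIAN_HOURLY_COST_USD, hours)
    print(f"{hours:>4} h outage -> about ${cost:,.0f}")
```

An organization would likely replace the median with its own revenue and productivity figures, since actual costs vary widely by industry and by which systems are affected.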
What Does Resilience Look Like?
True resilience is proactive, not reactive. It is built on three foundational capabilities: business continuity, disaster recovery, and crisis management. Together, they foster continuity, recovery, and control during disruptions such as these.
Business Continuity
Business Continuity (BC) helps ensure critical operations can continue even when key technologies or services are unavailable.
In the case of the AWS outage, organizations with strong BC capabilities had already identified cloud service dependencies and developed workarounds or manual procedures to maintain essential functions. For example, if a payroll system like Workday or a collaboration tool like Slack went down, a resilient organization would have:
- Predefined alternative workflows or local access to critical data
- Communication protocols to keep teams aligned
- A clear understanding of which services are mission-critical and how to temporarily sustain them
BC planning is about operating through disruption, not just recovering from it.
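One way to start on the dependency mapping described above is a simple inventory that records, for each service, whether it is mission-critical and whether a workaround exists. The sketch below is only an illustration: the `ServiceDependency` structure, the `continuity_gaps` helper, and the sample entries are all hypothetical, not a prescribed BC method.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ServiceDependency:
    name: str
    mission_critical: bool
    # Documented manual procedure or alternative workflow, if one exists
    fallback: Optional[str] = None


def continuity_gaps(services: List[ServiceDependency]) -> List[str]:
    """Return names of mission-critical services with no documented fallback."""
    return [s.name for s in services if s.mission_critical and not s.fallback]


# Hypothetical inventory, for illustration only
inventory = [
    ServiceDependency("payroll", True, "process payroll from the last exported file"),
    ServiceDependency("team chat", True),
    ServiceDependency("internal wiki", False),
]

print(continuity_gaps(inventory))
```

Any service the helper returns is one that the organization depends on but could not sustain during an outage, which makes it a natural first candidate for a workaround.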
Disaster Recovery
Disaster Recovery (DR) focuses on restoring systems and data quickly and effectively.
One of the biggest lessons from the AWS outage is the risk of single-region dependence. Many services hosted exclusively in the AWS us-east-1 (Northern Virginia) region were affected. Organizations with multiregion configurations, cross-cloud failover, or hybrid infrastructure were able to reroute traffic and restore systems faster.
Proactive DR planning includes:
- Replication of workloads across multiple regions or providers
- Automated failover mechanisms
- Regular testing of recovery procedures
These strategies reduce downtime and the impact of infrastructure-level failures.
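As a rough illustration of an automated failover mechanism, the sketch below walks a priority-ordered list of endpoints and selects the first one whose health check passes. The endpoint URLs, the `choose_endpoint` function, and the simulated probe are all assumptions made for this example. In practice, failover is often handled by DNS or load-balancer health checks rather than application code.

```python
from typing import Callable, Optional, Sequence


def choose_endpoint(
    endpoints: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first endpoint, in priority order, whose health check passes."""
    for endpoint in endpoints:
        try:
            if is_healthy(endpoint):
                return endpoint
        except Exception:
            # A probe that raises is treated as a failed health check
            continue
    return None  # No healthy endpoint: escalate to manual recovery


# Hypothetical endpoints in two regions, listed in priority order
ENDPOINTS = [
    "https://primary.example.com",
    "https://secondary.example.com",
]


def simulated_probe(endpoint: str) -> bool:
    """Stand-in for a real health check; simulates a primary-region outage."""
    return "primary" not in endpoint


active = choose_endpoint(ENDPOINTS, simulated_probe)
print(active)
```

Passing the health check in as a function keeps the selection logic easy to exercise in recovery tests, in line with the practice of regularly testing recovery procedures.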
Crisis Management
Crisis Management (CM) helps ensure the organization can coordinate a response, communicate effectively, and protect its reputation during disruption.
During the AWS outage, many companies struggled to explain service interruptions to customers, partners, and internal stakeholders. Those with CM capabilities were better prepared to:
- Activate incident response teams
- Deliver clear, well-timed communications across channels
- Manage external perception and maintain trust
Crisis management empowers leadership to act decisively, communicate clearly, and maintain control when disruptions like this occur.
How Resilient Are You?
Ask yourself: If our most relied-upon technologies failed tomorrow, would we be ready?
Disruption is no longer a distant possibility; it’s a recurring reality. The organizations that thrive are those that treat resilience as a strategic priority, not a compliance checkbox.
Whether it’s a cloud outage, a vendor misstep, or a cyber incident, your ability to respond quickly and confidently can mean the difference between a temporary setback and a lasting crisis.
If you’re unsure where to start, the Organizational Resilience team at Forvis Mazars can help build resilience strategies tailored to your organization’s needs.
Reach out to us today to begin strengthening your resilience before the next disruption hits.
- ¹ “New Relic Study Reveals IT Outages Cost Businesses Up to $1.9 M Per Hour,” https://newrelic.com/press-release/20241022, October 22, 2024.