
Recent Outages in Major Cloud Services: Microsoft and Google
Last week, Microsoft 365 experienced a significant disruption on the 9th, caused by an outage in the Exchange Admin Center (EAC).Compounding the issue, just a day later, users reported being locked out of their family subscriptions due to a bug affecting the platform.
Similarly, Google Cloud is not immune to outages. At the end of last month, the platform suffered a substantial incident when its uninterruptible power supply (UPS) system failed to operate as intended, leading to a prolonged outage of nearly six and a half hours. This disruption primarily affected the “us-east5-c”zone located in Columbus, Ohio, which utilizes systems powered by AMD EPYC and Intel Xeon processors.
Understanding the Google Cloud Outage
In a detailed support article, Google clarified the timeline and cause of this incident:
On Saturday, 29 March 2025, multiple Google Cloud Services in the us-east5-c zone experienced degraded service or unavailability for a duration of 6 hours and 10 minutes.
The root cause of the service disruption stemmed from a loss of utility power in that zone, which triggered a cascading failure within the uninterruptible power supply (UPS) system. This system is designed to maintain power during utility outages but suffered a critical battery failure, rendering it unable to fulfill its essential role.
As a direct result of this UPS failure, virtual machine instances in the affected zone lost power, leading to service downtime for numerous customers. This outage also caused secondary issues, such as packet loss affecting network communication and performance, along with a limited number of storage disks becoming unavailable.
Response and Resolution
Google has since shared the corrective actions taken to address the outage:
To lessen the impact on certain services, Google engineers diverted traffic from the affected location. They successfully bypassed the failed UPS and restored power via generator by 14:49 US/Pacific on Saturday, 29 March.
Most Google Cloud services recovered shortly thereafter, though some required extended restoration time due to the necessity for manual intervention.
Commitment to Improvement
In a sincere message to its Cloud customers, Google extended an apology for the disruption and outlined proactive steps to prevent future incidents:
To our Google Cloud customers whose services were impacted during this disruption, we sincerely apologize. This is not the level of quality and reliability we strive to offer you, and we are taking immediate steps to enhance the platform’s performance and availability.
Google is determined to avoid a recurrence of this issue and has committed to the following actions:
- Strengthening the power failure recovery protocol to ensure a quicker and more reliable service restoration after power is regained.
- Conducting an audit of systems that did not failover automatically, addressing any gaps that obstructed this functionality.
- Collaborating with our UPS vendor to investigate and resolve the issues encountered within the battery backup system.
We are dedicated to continuously improving our technology and operations to prevent future service disruptions. We greatly appreciate your patience and apologize once again for the impact this incident may have had on your organization. We thank you for your continued support.
For More Information
Comprehensive details regarding the recent outage can be found in the support article here on Google’s Cloud status website.
Leave a Reply