
Amazon Cloud Outage Continues to Degrade Service for Many Apps

Customers such as Foursquare, Reddit, Quora and Hootsuite were among those impacted by the disruptions, according to published reports.

The Amazon Elastic Compute Cloud (EC2) suffered outages in some parts of the country on Thursday, degrading service for hundreds of applications, including some high-profile Web sites and social media services.

On Friday, Amazon Web Services reported on its Service Health Dashboard that it had made progress on the latency and connectivity issues and expected to recover most of the affected volumes within a matter of hours:

We continue to see progress in recovering volumes, and have heard many additional customers confirm that they're recovering. Our current estimate is that the majority of volumes will be recovered over the next 5 to 6 hours. As we mentioned in our last post, a smaller number of volumes will require a more time consuming process to recover, and we anticipate that those will take longer to recover. We will continue to keep everyone updated as we have additional information.


The trouble first occurred at Amazon's northern Virginia datacenter early Thursday morning, disrupting Amazon EC2 and the Amazon Relational Database Service (RDS). Amazon continued to report issues throughout the day and night on its Service Health Dashboard.

Customers such as Foursquare, Reddit, Quora and Hootsuite were among those impacted by the disruptions, according to published reports.

Only Reddit attributed its service disruption to the Amazon outage. According to a notice on Reddit's site: "Reddit is in 'emergency read-only mode' right now because Amazon is experiencing a degradation. They are working on it but we are still waiting for them to get to our volumes. You won't be able to log in. We're sorry and will fix the site as soon as we can."

Amazon traced the problem to a single availability zone in the Virginia datacenter. According to a status message posted at 4:48 Eastern time:

"All other Availability Zones are operating normally. Customers with snapshots of their affected volumes can re-launch their volumes and instances in another zone. We recommend customers do not target a specific Availability Zone when launching instances. We have updated our service to avoid placing any instances in the impaired zone for untargeted requests."

Service at Amazon's northern California datacenter and its facilities in Europe and the Asia Pacific appeared to be running fine, according to the dashboard. Experts say customers can avoid the impact of such problems by deploying across multiple availability zones or, even better, multiple cloud providers.

"If your business relies on a Web site to be up, why do you allow a failure in a single availability zone to shut down your business?" wrote Scott Sanchez, security and privacy officer at ScaleUp Cloud, in a blog post. "There are so many tools out there at this point to simplify deployment, scaling and resiliency across multiple availability zones or even across multiple cloud providers – frankly, you have no excuse."
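For readers who want a concrete picture of that advice, the placement idea can be sketched in a few lines of Python. This is an illustration only, not Amazon's scheduler or any vendor's tooling: the function name `spread_across_zones`, the instance names and the zone labels are all hypothetical, and the code makes no calls to a cloud API. It simply assigns each instance to a healthy zone in round-robin order and skips any zone flagged as impaired.

```python
from itertools import cycle


def spread_across_zones(instances, zones, impaired=frozenset()):
    """Assign each instance to a healthy zone in round-robin order.

    Hypothetical helper for illustration; it only plans placement and
    does not launch anything.
    """
    # Drop any zone that has been reported as impaired.
    healthy = [zone for zone in zones if zone not in impaired]
    if not healthy:
        raise ValueError("no healthy availability zone left to place instances in")
    # cycle() repeats the healthy zones so instances stay evenly spread.
    return {name: zone for name, zone in zip(instances, cycle(healthy))}


if __name__ == "__main__":
    # Zone labels and instance names below are made up for the example.
    zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
    plan = spread_across_zones(
        ["web-1", "web-2", "web-3", "web-4"],
        zones,
        impaired={"us-east-1a"},
    )
    for name, zone in plan.items():
        print(name, "->", zone)
```

With one zone marked impaired, the four example instances are split evenly between the two remaining zones, so losing either of those zones would still leave half of the instances running.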


About the Author

Jeffrey Schwartz is executive editor of Redmond magazine, an editor-at-large at Redmond Channel Partner and an editor of The Cloud Report newsletter. Follow him on Twitter @JeffreySchwartz.
