Bad week for the cloud as both Microsoft and Google suffer outages

Cloud services depend on a lot of infrastructure that evolved gradually as the Internet expanded, and that dependency caused more headaches for Office 365 this week. Whereas the August 17 outage for North American users of Exchange Online was due to failed network components, Microsoft may not be responsible for the outage that afflicted Office 365, SkyDrive, Azure, Hotmail and other cloud services on September 8/9, as the root cause appears to lie within the Domain Name System (DNS). Clients reported an inability to resolve the DNS names required to reach the Microsoft services and so were unable to connect.
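For administrators who want to tell a name-resolution failure apart from a service that is genuinely down, a minimal Python sketch along these lines might help. It is not how Microsoft or anyone else diagnosed this incident; the function names check_dns and diagnose are hypothetical, and the injectable resolver parameter is an assumption made so that the logic can be exercised without a network.

```python
import socket
from typing import Callable, List


def check_dns(hostname: str, resolver: Callable = socket.getaddrinfo) -> List[str]:
    """Return the sorted unique IP addresses for hostname, or [] if the lookup fails.

    `resolver` defaults to the standard library's socket.getaddrinfo; it is a
    parameter only so that a stand-in can be supplied (an assumption of this sketch).
    """
    try:
        results = resolver(hostname, 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # The name could not be resolved: the symptom clients reported.
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in results})


def diagnose(hostname: str, resolver: Callable = socket.getaddrinfo) -> str:
    """Summarise whether the hostname can be resolved at all."""
    addresses = check_dns(hostname, resolver)
    if not addresses:
        return f"{hostname}: DNS lookup failed; the service may be up but unreachable by name"
    return f"{hostname}: resolves to {', '.join(addresses)}"
```

Calling diagnose with the host name of the service in question reports either the addresses it resolves to or the fact that the lookup itself failed, which is the situation users found themselves in during this outage.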

The incident started at approximately 5AM in mainland Europe on September 9 (8PM in Seattle on September 8). The effects were initially felt by users in the Asia-Pacific region. As time went by, European users tried to connect too and added their voices to the clamour asking what was going on. Of course, as DNS wasn’t working, it was impossible to get status updates from the Microsoft service dashboard, so the information void had to be filled by Twitter updates from the official Office365 account. We therefore saw the best and the worst of the Internet – some web services from some vendors (such as Twitter) were available while others were not.

[Figure: Office365 tweets]

In a blog post at 7:49AM, Microsoft reported that they had to make DNS configuration changes that then had to propagate before normal operations could be resumed. Microsoft didn't offer any details about what exactly they had to do to fix DNS. The reconfiguration seemed to take effect from about 8:20AM, with users gradually being able to reconnect to all services. Based on the non-scientific measurement of Twitter reports, all users were able to reconnect within an hour. The incident lasted in the region of 140 minutes at best, 200 at worst.

It will be interesting to see whether Microsoft issues a service credit for this incident. As you might recall, they allowed a 25% credit to all users after the August 17 outage. However, in this case the root cause might be outside Microsoft’s control if the DNS issue was caused externally. On the other hand, someone may well have made a mistake in the configuration of DNS inside Microsoft's datacenters. We shall await the root cause analysis and the deliberations of Microsoft management.

The folks over in Mountain View also had their challenges this week, as Google Docs had an outage on September 7 that lasted approximately an hour (some reports state that the outage was for 30 minutes; I use the data from the Google Apps dashboard shown below). The incident highlighted the lack of offline access for Google Docs, something that Google is busily working to provide. On the plus side, the Google team seems to have been pretty efficient at getting the service back online in short order.

[Figure: Google Apps dashboard showing the Google Docs problem]

It hasn’t been a good week for cloud services, and users could be forgiven for questioning the wisdom of moving any important application plus its data into the cloud. But then again, you can argue that the memory of users is selective and that any recollection of outages of internal IT systems has been erased. And you might also comment that internal IT departments have been no more capable of fast response and resolution than the cloud providers. It’s wonderful to live in a world where access to information is available all the time – until you lose that access!

Discuss this Blog Entry 1

on Sep 12, 2011
Perhaps the cloud is better than some internal IT departments, and each organization should remember that nothing is perfect when considering how best, if at all, to leverage it. This is a reminder that no system, platform or technology is 100% immune to problems, and in an interconnected world there are always multiple points of failure, with very few exceptions. While software and systems can be improved, what's missing here is the notion that people and the policies they follow can also be improved to avoid problems. Chris Rich Product Manager NetWrix Corporation NetWrix is #1 for Change Auditing: Simple, Lightweight, Affordable


What's Tony Redmond's Exchange Unwashed Blog?

On-premises and cloud-based Microsoft Exchange Server and all the associated technology that runs alongside Microsoft's enterprise messaging server.

Contributors

Tony Redmond

Tony Redmond is a senior contributing editor for Windows IT Pro and the author of Microsoft Exchange Server 2010 Inside Out (Microsoft Press) and Microsoft Exchange Server 2013 Inside Out: Mailbox...