Impressive Office 365 uptime data means more pressure on on-premises IT managers

Microsoft has published a pretty good set of numbers for Office 365 uptime achieved in the last four quarters. Although it is difficult to compare SLAs across cloud platforms, Microsoft's performance - and that of Google - pretty well puts to bed the concern that cloud applications are unreliable. A little bit more pressure descends on the shoulders of IT managers who'd like to make the same kind of SLA numbers but don't have the same resources to throw at the problem.

I guess that we should all be dazzled by Microsoft’s proclamation about “Cloud Services that you can Trust: Office 365 availability”. The numbers for availability look good and the statement lists some impressive commitments to continued performance, all of which is good news. Then again, the suspicion might arise that Microsoft is hinting that some cloud services exist that shouldn’t really be trusted, which quite takes the gloss off the whole thing. Perhaps they are referring to the cloud services that have been penetrated by PRISM, something that Microsoft’s General Counsel Brad Smith emphatically denies with regard to their “enterprise email and document storage”, which I assume refers to Office 365.

But in any case, the news reported in the blog post is good for Office 365 (which is of course the reason why the post is there) because it describes the performance against SLA in the form of uptime numbers for Office 365 in the last four quarters:

July 2012    October 2012    January 2013    April 2013
99.98%       99.97%          99.94%          99.97%

These are very impressive numbers that will, no doubt, be compared to the data for Google’s competing cloud suite. As always with numbers, you have to be sure that you compare like with like. For instance, since January 2011, Google does not include scheduled downtime in its SLA calculation. Microsoft promises an SLA of 99.9% and says that services like Exchange Online or SharePoint Online have no scheduled downtime, so any faltering of these services immediately impacts their numbers.
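To put those percentages in more tangible terms, here is a rough Python sketch that converts each quoted uptime figure into the minutes of downtime it implies. It assumes a 91-day quarter (real quarters run 90 to 92 days) and that each percentage applies to the whole period. The name downtime_minutes is purely illustrative and is not something Microsoft publishes.

```python
# Assumption: every quarter is treated as 91 days; real quarters run 90 to 92 days.
MINUTES_PER_QUARTER = 91 * 24 * 60

# Uptime percentages exactly as quoted in the article.
reported_uptime = {
    "July 2012": 99.98,
    "October 2012": 99.97,
    "January 2013": 99.94,
    "April 2013": 99.97,
}


def downtime_minutes(uptime_percent, period_minutes=MINUTES_PER_QUARTER):
    """Return the minutes of downtime implied by an uptime percentage."""
    return (100.0 - uptime_percent) / 100.0 * period_minutes


for label, uptime in reported_uptime.items():
    print(f"{label}: {uptime}% uptime is about {downtime_minutes(uptime):.0f} minutes down")

# The 99.9% commitment expressed as a downtime allowance for the same period.
print(f"99.9% SLA allows about {downtime_minutes(99.9):.0f} minutes per quarter")
```

On those assumptions, 99.98% works out to roughly 26 minutes of downtime in a quarter, 99.94% to roughly 79 minutes, and the 99.9% commitment leaves room for about 131 minutes.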

At first glance, Google’s SLA definition is much simpler than Microsoft’s, perhaps evidence that corporate lawyers have more influence over Office 365 contracts than their Google counterparts, but more likely reflecting the more complex nature (in a good sense) of Microsoft's offerings. I was surprised that Google's SLA doesn't cover Postini, as message hygiene filtering is a pretty fundamental part of an enterprise email system.

Gmail promises 99% uptime and trumpeted its achievement of a 99.984% SLA in 2011. However, Google's definition of SLA measurement contains an odd qualification: "'Downtime' means, for a domain, if there is more than a five percent user error rate. Downtime is measured based on server side error rate". This could be construed as a get-out clause to allow Google to avoid accruing downtime if fewer than five percent of its users are affected. Five percent seems small but it can be a pretty large number in cloud terms. For instance, if four percent of Gmail's users were having a problem, then Google would register no downtime even though some 17 million users were affected (taking 4% of the 425 million Gmail users reported in 2012). How odd!
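The arithmetic behind that example can be sketched in a few lines of Python. This is a simplified reading of the quoted clause, not Google's actual measurement method: the SLA text defines downtime per domain, whereas the sketch (like the example above) applies the threshold to the whole Gmail user base. The function names are hypothetical.

```python
# Figure cited in the article: Gmail users reported in 2012.
GMAIL_USERS = 425_000_000

# "more than a five percent user error rate", per the quoted SLA wording.
DOWNTIME_THRESHOLD = 0.05


def counts_as_downtime(error_rate, threshold=DOWNTIME_THRESHOLD):
    """Simplified reading: downtime accrues only above the threshold."""
    return error_rate > threshold


def affected_users(error_rate, total_users=GMAIL_USERS):
    """Number of users hit at a given error rate (illustrative only)."""
    return round(error_rate * total_users)


for rate in (0.04, 0.05, 0.06):
    print(
        f"{rate:.0%} error rate: {affected_users(rate):,} users affected, "
        f"downtime recorded: {counts_as_downtime(rate)}"
    )
```

Under this reading, a four percent error rate touches 17 million users without registering as downtime, and even exactly five percent would not count because the clause says "more than".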

Google has been quieter about their SLA data recently, perhaps because of some recent problems such as the 40-minute outage on July 10. Perhaps it’s my inability to use Internet search tools that let me down, but I wasn’t able to track down any more recent reports of Google performance against SLA since 2011.

If only because we don’t have all the data necessary to make an apples-to-apples comparison, understanding the finer points of SLA measurement and reporting for Office 365 and Google Apps can be complicated. There’s also a big difference between a problem in a cloud datacenter that absolutely will affect the SLA and problems that arise from Internet or local network connectivity that prevent users getting to a cloud service. These problems do not count when SLAs are measured, even if the user perception is that the “service is down”. Cloud vendors cannot be blamed for excluding the Internet from their calculations as they exert no control outside the boundaries of their datacenters.

The complexity of SLA calculations doesn’t take away from the fact that the fears that many had that cloud services would be unreliable have proven to be unfounded. I’ve used Office 365 since its official launch in June 2011 and apart from some initial hiccups it’s been as reliable as I could have wished. I don’t use Gmail as much as I used to, but my perception is that it’s a reliable service too. Don't get me wrong - cloud outages happen all the time. For example, last week Outlook.com experienced problems for seven hours while Google was down for a few minutes. When cloud outages occur, they tend to affect millions of people and are awfully public. By comparison, an IT outage inside the boundaries of a single company affects just that company's users and is hardly ever revealed outside.

Given that the cloud services have delivered excellent performance against their published SLAs, the biggest problem that has arisen is the pressure created on on-premises IT managers to deliver the same kind of reliable and robust services from in-house systems. Sure, they don’t have the kind of resources that Microsoft or Google dedicate to their datacenters and a lot more of their work is likely to be manual instead of the automated processes used to deliver cloud services (automation is fundamentally necessary to achieve the economics of cloud services). However, these facts are unlikely to be given much importance when weighed by CIOs who compare the costs of in-house delivery against those for cloud services. Lower cost and better performance are a difficult duo to argue against – and that’s why successful cloud services create problems for IT managers.

Follow Tony @12Knocksinna

Discuss this Blog Entry 4

on Aug 22, 2013

Apples to Apples ? Switching camps :)
To be fair Outlook.com and soon to be renamed SkyDrive were out for more than hours ..., but Cloud is in.
I cannot imagine how at least smaller companies can justify on prem.

on Aug 30, 2013

This past week has proven this data to be massaged, manipulated or outright false.

We have seen 3 service disrupting outages since Monday totaling 8.5 HOURS. We are only 1.5 hours shy of blowing the annual SLA of 99.9%.

Where is the reporting on this? Where is the reporting on Microsoft's decision to perform upgrades and maintenance during core business hours? Where is the reporting on Microsoft Support's efforts to mask, deceive and outright lie about the scope of a service disruption?

We migrated from an On-Prem Exchange environment under the misguided belief that Microsoft could do Exchange better than we could. This has been the most painful week I have had with Exchange - ever. Why? Because with On-Prem I can report to my Exec's and Employees what is happening - with Office 365 I cannot report anything other than "Microsoft is having a Service Disruption... again."

on Aug 30, 2013

@morgannelson, I heard about these outages via a ZDNet post http://www.zdnet.com/office-365-outage-thursday-night-7000020069/ but didn't experience any problems myself (I use Office 365 all the time). It seems to have been a localized outage. I also know of some specific problems that certain companies are having with Office 365 that are forcing them to reconsider plans. I guess the net-net is that using a "pile 'em high, sell 'em cheap" one-size-fits-all service is not for everyone. If it doesn't work for you, then you can consider staying on-premises and control your own destiny or look for a hosted provider that will give you the kind of support that you want. It will cost extra, but I think you'd be happier.

on Aug 30, 2013

And BTW, the SLA might not be affected by the outages that are reported. If the problem is with your company's IT infrastructure or is due to some Internet hiccup, it doesn't count. The only hits taken by the SLA are when Microsoft has a problem that they acknowledge within their datacenters. All part of the new world of cloud services.

What's Tony Redmond's Exchange Unwashed Blog?

On-premises and cloud-based Microsoft Exchange Server and all the associated technology that runs alongside Microsoft's enterprise messaging server.

Contributors

Tony Redmond

Tony Redmond is a senior contributing editor for Windows IT Pro and the author of Microsoft Exchange Server 2010 Inside Out (Microsoft Press) and Microsoft Exchange Server 2013 Inside Out: Mailbox...