When Microsoft CEO Steve Ballmer was asked about Microsoft's commitment to cloud services and famously answered, "We're all in," it was a revealing metaphor. In poker, when you go all in, you're betting everything you have. If you lose that hand, you're done. Microsoft, of course, isn't abandoning its on-premises software, nor do I expect the company to do so as long as customers want such offerings. That's likely to be a very long time, given how many customers are still running long-dead products such as Windows 2000 and Novell GroupWise.
But the recent service outage made me think about what this all-in strategy means, both for Software as a Service (SaaS) customers and those who haven't yet embraced the cloud. The key question in my mind is this: Could you do better yourself if you kept these services on-premises? If you make an all-in bet on SaaS, you're essentially betting that your cloud service provider (CSP)—in the case of Office 365, Microsoft—can do a better job of providing services than you can. Is that bet justifiable?
Although Microsoft hasn't publicly released details of the outage, it appears to have spanned two separate data centers in two separate physical locations. That's the first interesting aspect. If you have only one physical data center (and the majority of customers I work with have only one), then right there the cloud has a significant redundancy advantage, provided that the other components of the system, such as the network, remain intact.
Because of the physical separation of facilities, it seems likely that the Office 365 outage was due to a network failure of some kind. Most of us have choices for Internet providers, and many customers have embraced that choice by providing multiple connections to the Internet or their private WAN . . . but how many of those seemingly redundant connections actually traverse the same physical path? If you have connections from two companies, and those connections are carried on the same fiber to your building, they're both equally vulnerable to interruption. (Fascinating fact: Internet provider Level3 Communications estimates that 17 percent of fiber outages nationwide are caused by squirrels.)
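To put rough numbers on the shared-path problem, here is a minimal sketch of the arithmetic. It assumes that link failures are independent of one another, and every availability figure in it is hypothetical, chosen only for illustration. The function names are mine, not from any library.

```python
def parallel_availability(*links):
    """Availability of redundant links, assuming they fail independently."""
    p_all_down = 1.0
    for availability in links:
        p_all_down *= (1.0 - availability)
    return 1.0 - p_all_down


def with_shared_segment(shared, *links):
    """Redundant links that all ride one shared segment (e.g. the same fiber)."""
    return shared * parallel_availability(*links)


# Hypothetical figures, not measurements from any real provider.
isp_a = 0.995
isp_b = 0.995
last_mile_fiber = 0.995

independent = parallel_availability(isp_a, isp_b)
shared = with_shared_segment(last_mile_fiber, isp_a, isp_b)

print(f"two independent paths:  {independent:.6f}")
print(f"two ISPs on one fiber:  {shared:.6f}")
```

Under these assumptions, two truly independent paths are far more available than either one alone, but the pair that shares a fiber can never be more available than that fiber, no matter how many providers you pay.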
Even if the cause was a network failure, that doesn't excuse Microsoft for its service failure. Microsoft has admirably offered service credits to all affected users, and no doubt there are vigorous postmortems going on right now in Redmond, seeking to prevent this particular type of outage from ever happening again. I give them high marks for transparency because their monitoring and status tools are quite a bit better than those of most of their major competitors. However, the fact remains: If you're going to go all in on the cloud, as Microsoft suggests you do, you're betting that your CSP will do a better job of maintaining availability than you can. Be sure you know what your CSP is capable of providing and how it compares to the service level you could provide on premises.
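One simple way to make that comparison concrete is to convert an availability percentage into a yearly downtime budget. The sketch below is only an illustration: the percentages are hypothetical examples rather than any provider's published commitment, and the function name is mine.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def annual_downtime_hours(availability_percent):
    """Hours of downtime per year permitted by a given availability level."""
    return HOURS_PER_YEAR * (1.0 - availability_percent / 100.0)


# Hypothetical availability levels to compare.
for pct in (99.0, 99.9, 99.99):
    hours = annual_downtime_hours(pct)
    print(f"{pct}% available -> about {hours:.2f} hours of downtime per year")
```

If your own on-premises record over the past year or two doesn't fit inside the budget your CSP commits to, the bet looks better; if it does, ask harder questions.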
Of course, Microsoft has hedged its bets: You can still buy Exchange Server 2010, Lync 2010, and SharePoint and deploy them in your own data center, optionally linking them to Office 365. This approach lets you hedge your bets as well, and I'll be very interested to see data on the percentage of Office 365 customers who maintain hybrid operations over the long term (as opposed to those who do so merely for temporary coexistence during migration).
In the meantime: Watch out for squirrels.