Microsoft really came out into the open at MEC and discussed how Exchange Online is run at massive scale inside Office 365. I thought this was great and lapped up all the details on offer. Although some might find it strange, being open about the challenges and issues involved in running so many servers at such a high SLA adds to the credibility of the service and makes Office 365 come alive. More, I say!
I have used Office 365 since its launch in June 2011 and have followed its development closely ever since. After two initial hiccups in August and September 2011, both seemingly rooted in manual intervention, the service has delivered an impressive record of reliability and availability. Despite this, some in the industry are still wary of moving utility applications such as email to the cloud.
It is true that developments such as the PRISM revelations have thrown new concerns into the cloud mix. However, I think a nagging concern for many is the perception that Office 365 is a black box: you hand over all control and receive application services in return. If you don't understand what's happening, you cannot appreciate how the service functions. In short, a black box is less likely to be trusted than something that is open and obvious.
It was therefore very interesting to me that Microsoft made a real effort at the recent Microsoft Exchange Conference (MEC) to disclose details of Office 365 operations, chiefly through a compelling presentation by Vivek Sharma on how Exchange Online functions, but also through many other statements and insights offered by presenters in other sessions.
Of course, another trend during MEC was the focus on Office 365 as the place where new stuff happens. New features appear there much sooner than they can be provided to on-premises customers because Microsoft controls the infrastructure and, if you use Outlook Web App (OWA), the client interface too. So you could argue that Microsoft was revealing more about its Office 365 operations to offset a growing realization within its customer base that Office 365 is preeminent when it comes to engineering investments and new developments.
But I don't think so. Instead, I think that talking about Office 365 operations simply reflects the growing maturity of the service. There's a lot of good stuff here too: a focus on standardization, ruthless automation, a solid workflow management system, and attention to detail. All of these are essential attributes when you deal with an ever-growing pool of servers (now more than 100,000 Exchange servers, with growth of 600% since 2012), manage petabytes of logs, and handle 10,000 or more hardware issues (mostly disk failures) a month. As Vivek reminded everyone, "a .01% error rate when dealing with a billion transactions daily can be catastrophic" (that works out to 100,000 failed transactions every day) and "every action has consequences for a vast number of users."
Most people will never visit an Office 365 datacenter. Information of the type presented at MEC increases the credibility of cloud services because it helps people understand the investment, in both money and software engineering, that Microsoft has made to create the fabric that allows the service to function. Learning about components such as the Data Insight Engine, which processes between 100 and 500 million events an hour to locate, analyze, and surface issues inside Office 365, is a rare chance to appreciate what's going on. It also helps you understand why Microsoft has invested in the Managed Availability subsystem in Exchange 2013 (a great example of technology transfer from cloud to on-premises). Automation is everything because humans are prone to error, and errors at scale are truly horrible events.
Microsoft has already released the presentation and recording of "Office 365 at scale" to MEC attendees, and this content will be available to everyone soon. If you can, get hold of both and spend 75 minutes listening to the recording. It might just open your eyes to what happens inside Office 365.
Follow Tony @12Knocksinna