Microsoft made a change to the Office 365 configuration that caused the Exchange 2013 Hybrid Configuration Wizard (HCW) to fail. Changing software can introduce new problems at the best of times, which is why we are so careful about testing updated code before introducing it into production. It seems that the testing was deficient in this situation. Hopefully Microsoft can learn from the problem and a similar one won't happen again.
The publication of KB2988229 on July 30 and an accompanying EHLO post provided formal acknowledgement that a problem exists with the Hybrid Configuration Wizard (in all supported versions of the product). The problem causes the wizard to crash when it attempts to set up the connection between an on-premises Exchange organization and an Office 365 tenant or to edit details of an existing connection.
Microsoft has individual updates (patches) available for Exchange 2013 SP1/CU5 and a permanent fix is included in CU6, which is due to be available soon. You cannot get the fix from a web site and have to contact Microsoft support to obtain the code. Be prepared for the normal interrogation about the configuration of your on-premises servers before the mists clear and the support team realizes that you need the patch.
The HCW crashes when it runs a task to check prerequisites within Office 365 and signals the error “CheckPrereqs execution failed: Check Tenant Prerequisites.” I have heard that other faults are signalled but this is the basic problem. It's interesting that the problem does not affect the Exchange 2010 variant of the HCW. Perhaps Exchange 2010 needs to know less about an Office 365 tenant before its HCW can create a connection.
As it turns out, Microsoft made a change within Office 365 that caused the HCW to fail. I know this for a fact because I have been told so, but it’s also easy to deduce based on the circumstances, the error messages, and the interaction with support personnel. The thought then turns to why a problem like this would emerge in an environment that has supported the HCW very well for a number of years.
All software is prone to unforeseen consequences when it is changed and Office 365 is no different. As we know, for competitive reasons and to provide customers with requested functionality, the service is in a state of constant change. Microsoft made a big thing about a new approach to Office 365 change management at TechEd and through the subsequent release of the Office 365 roadmap. These very good developments are designed to keep customers informed about the ever-changing vista of Office 365 so that each customer can integrate updates into their plans.
Unfortunately, as is evident from the unexpected and unannounced appearance of updates since the new plan was revealed, some of the Office 365 engineers still seem to be unaware of the need to keep customers informed. This is a pity because it undermines the essential goodness of the roadmap.
You can consider the change that has affected the HCW to be another example of an unexpected update. However, I’m not so sure that this is in the same category because I am pretty convinced that the problems we are seeing are the side-effect of some internal configuration changes that were never expected to be seen by customers.
I don’t have a problem when Microsoft tweaks configuration and other internal settings to make Office 365 run more smoothly and effectively. I do have a problem when the tweaking seems to have been done without sufficient testing. Because the HCW is so badly and obviously affected by the update, I strongly suspect that the change was not tested in any real sense. If realistic testing had been done, the problem surely would have surfaced.
The other nagging doubt I have about how this particular affair has evolved concerns Microsoft’s reaction. Given that the issue is in a component that only Microsoft controls, it seems that the best course of action would have been to reverse the change and restore Office 365 to its previous functional configuration. At a stroke, this would have eliminated the need for multiple customers to call support and engage in a costly to-and-fro exercise (both sides burn time on telephone calls) before eventually getting patches, not to mention the frustration of the many administrators who attempted to run the HCW and ran into this problem. After reversing course, Microsoft could have fixed the bug and re-introduced the changed configuration, perhaps alongside the release of Exchange 2013 CU6.
But life is never quite as simple or as straightforward as it appears. Office 365 is a massive multi-tenant environment that supports millions of users. It is entirely possible that the configuration was updated to support another component and that the effect on HCW came as a huge and unexpected shock to the product group. The triage that is done to assess the severity of all bugs then had to weigh the various courses of action that were available – from withdrawing the update to issuing patches to affected customers.
Microsoft made their choice based on the information available to them. It’s unfair to second-guess them at this remove because we do not have the same information. However, it is fair to ask how such a change could have been introduced without warning and without sufficient testing. I bet some hard questions were asked in Redmond as this episode unfolded.
Follow Tony @12Knocksinna