On July 27, 2011, the Microsoft Exchange Server development group closed out an embarrassing episode with the rerelease of Rollup Update 4 (RU4) for Exchange 2010 Service Pack 1. RU4 was first released on June 22. You can download the new version of RU4 from Microsoft Support. This version replaces both the previous version of RU4 and the interim patch that Microsoft rushed out after discovering the problem that caused them to remove RU4 on July 13. The embarrassment was compounded by the fact that Microsoft also had to remove the previous rollup update (RU3) in March after a problem was found with duplicate messages on BlackBerry devices.

About 90 minutes after Microsoft announced the rerelease of RU4, Kevin Allison, the General Manager of the Exchange development group, posted to the EHLO blog some details of the problem that caused Microsoft to withdraw RU4. Kevin explained that the root cause of the problem was an attempt to fix a bug that prevented deleted public folders from being recovered. The fix exposed some code in Outlook that caused problems when clients moved items. Microsoft's explanation given when they withdrew RU4 said:

A small number of customers have reported when the Outlook client is used to move or copy a folder that subfolders and content for the moved folder are deleted. After investigation we have determined that the folder and item contents do not appear in the destination folder as expected but may be recovered from the Recoverable Items folder (what was previously known as Dumpster in older versions of Exchange) from the original folder.

Kevin then attempted to address the concern that many customers have expressed since the recalls of RU3 and RU4. Why didn't Microsoft's extensive regression testing pick up a problem in what is, after all, a pretty fundamental operation? Kevin noted that the Exchange team uses a suite of well over 100,000 automated tests to validate code. It's understandable that such an array of automated tests is used, because otherwise it would be impossible to test a product such as Exchange that is deployed in so many different circumstances. Kevin said that the tests are supplemented with manual validation where necessary and admitted that the downside of depending on automated testing to such a degree is that the tests might not pick up a bug lurking in a scenario that falls outside their boundaries. In this case, the automated testing exercised move and copy functions but didn't emulate the code used by Outlook for the same functions.

I guess that a risk will always exist that testing won't catch bugs and that a software development group might become lulled into a false sense of protection when they look at successful results generated by over 100,000 automated tests. However, an irritating niggle in the back of my mind makes me think that this kind of problem should have been found by manual tests. Specifically, don't the Microsoft engineers use Outlook to move items and folders in their day-to-day work?

Ah, but then we realize that Microsoft lost interest in public folders long ago and that the company's engineers are extremely unlikely to use public folders in their work, so they could never find such a problem even if they ran development code for months before releasing it to customers. The same is probably true of the problem that caused Microsoft to withdraw RU3. How many Microsoft engineers use BlackBerry devices when they can use Windows Phone 7 phones? So the bugs creep through in scenarios that Microsoft cannot anticipate or test manually through normal use.

Kevin also explained that the RU4 bug is a legacy of a change made in the Exchange 2003 Information Store. That code was carried forward into Exchange 2007 but dropped in Exchange 2010 because Microsoft introduced the RPC Client Access service to provide a new MAPI endpoint for Outlook clients. This fact demonstrates just how difficult it is for engineering groups to preserve immaculate backwards compatibility when they make important strategic changes to their product. However, you still go back to the point that regression testing should have picked this issue up far earlier—maybe during the original development cycle for Exchange 2010 way back in 2008–2009.

To be fair to Microsoft and the Exchange team, they have done a good job of communicating to their customer base and acknowledged that they have to do better in the future. They bit the bullet and withdrew RU3 and RU4 in a very public manner when other software development groups might have sought to make the necessary updates behind the scenes, away from the interested gaze of public opinion. This is the way things should be done, and I wish other product groups were as honest and open.

Finally, Kevin's note says that the Exchange team has conducted a top-to-bottom review of the process it uses to triage, develop, and validate changes for rollup updates and service packs and is making some improvements. Interestingly, Kevin says that the Exchange team is now working more closely with the Outlook team to use their automated testing tools against new versions of Exchange. Given that Outlook has been the premier client for Exchange since the first release of Outlook in 1998, one wonders why it has taken 13 years for the two teams to take this step together.

The problems with RU3 and RU4 have impacted customer confidence in the testing process used to validate new versions of Exchange. No one in Redmond has been covered in glory. It's now up to the Exchange development team to earn back the confidence and trust of their user community with an increase in quality and reliability in future releases. With Exchange 2010 SP2 on the horizon, now's a good time to start.