I'm continually amazed at the number of bugs we encounter in the software that we install, configure, and maintain. Recently, I read about a Carnegie Mellon University research study that attempted to analyze why software contains so many problems. After analyzing software products for 5 years, researchers concluded that an average of 100 to 150 errors exists in every 1000 lines of code. At that rate, an OS containing 20 million lines of code presents a potential 2 million to 3 million coding errors, many of which can negatively affect form, function, security, and reliability. The study attributed this high number of errors to complexity, poor design, and insufficient testing.
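
The estimate above is plain multiplication, so it's easy to check. Here is a minimal Python sketch, assuming the study's rate is expressed as errors per 1000 lines; the function name estimated_errors is my own and doesn't come from the study.

```python
def estimated_errors(lines_of_code, low_per_1000, high_per_1000):
    """Scale a per-1000-line error rate to a whole code base.

    Returns a (low, high) tuple of estimated error counts.
    """
    # Integer arithmetic keeps the result exact for large line counts.
    low = lines_of_code * low_per_1000 // 1000
    high = lines_of_code * high_per_1000 // 1000
    return low, high


# Figures from the column: 100 to 150 errors per 1000 lines, 20 million lines.
low, high = estimated_errors(20_000_000, 100, 150)
print(f"{low:,} to {high:,}")  # prints: 2,000,000 to 3,000,000
```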

The number of bug fixes on Microsoft's Windows 2000 post-Service Pack 2 (SP2) list has reached 854, and that count doesn't include any of the security hotfixes Microsoft has released during the 13 months since SP2 went public. I checked the new/modified Server Operating Systems Technologies bug list, which covers a 5-week period from June 1, 2002, through July 7, 2002, and discovered a total of 288 post-SP2 bug fixes, allocated as follows:

Product                Bug fixes
Application Center     1
ISA Server 2000        9
Windows NT             43
Windows 2000           187
Windows XP             48
Total                  288
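
A quick tally helps put those counts in perspective. The short Python sketch below simply restates the list as a dictionary (the variable names are mine) and computes the total and each product's share of it.

```python
# Bug-fix counts copied from the list above.
fixes = {
    "Application Center": 1,
    "ISA Server 2000": 9,
    "Windows NT": 43,
    "Windows 2000": 187,
    "Windows XP": 48,
}

total = sum(fixes.values())
print(f"Total: {total}")  # prints: Total: 288

# Show each product's count and its percentage of the total.
for product, count in fixes.items():
    share = 100 * count / total
    print(f"{product:<20}{count:>4}  {share:5.1f}%")
```

Windows 2000 alone accounts for nearly two-thirds of the list.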

If you support all five of these platforms, you’ve inherited a large bucket of potential problems during the past 5 weeks. If you toss Microsoft SQL Server, Exchange Server, and Systems Management Server (SMS) bug fixes into the mix, the list becomes almost too large to contemplate. Then, add security hotfixes. Microsoft's Critical Issues page lists the 45 security hotfixes the company has released since January 2002. Some of the hotfixes on this page apply exclusively to Windows .Net Server (WinNET Server) systems and Mac products and don't appear on the Security Bulletin page. Of the 60 hotfixes Microsoft released during 2001, half apply to the post-SP2 version of Win2K, which debuted May 16, 2001.

These numbers should send three strong messages to your organization: First, you need a test facility where you can duplicate all the configurations, including legacy platforms, you support in the production environment. Second, you must thoroughly test service packs, code fixes, and security hotfixes, individually and in combination, before you deploy them. Third, you significantly reduce the potential for serious problems when you staff the test facility with highly skilled engineers who follow rigorous testing and troubleshooting methodologies.

When vendors don't identify and correct the bugs, the job falls to the consumers. What mystifies me is why we tolerate such poor quality in these products in the first place. We need these products to keep our global networks operational. So why the complacency? It’s not just job security, is it?