Automating the patch quality-assurance process
Before you deploy patches, you need to test them to ensure that they don't break existing functionality on your systems. But so many patches exist for so many products and product versions that you need to automate patch testing as much as you can. Here's some advice for setting up a lab and scripting patch testing to quickly shake out the bugs with as little manual labor as possible.
Build a Test Lab
Your test lab should be a microcosm of your Active Directory (AD) forest. To make the lab affordable, you can use VMware Workstation or Microsoft Virtual PC 2004 to simulate your computer hardware, then install fully functional OSs and applications. These virtual systems can communicate over your physical network and make a real AD forest. For a tutorial about using VMware Workstation, see "VMware Workstation 2.0," February 2001, InstantDoc ID 16446.
To get your lab going, purchase at least two Pentium 4 machines with 2GB of RAM and 120GB of disk space each. You'll need ample RAM and CPU horsepower because one physical box will be hosting multiple virtual systems simultaneously. The extra disk space is for the dozens of virtual machine (VM) image files you'll eventually create. To convince management that the lab is worthwhile, prepare a cost/benefit analysis that shows the savings of avoiding even one patch-management catastrophe. And don't forget to point out the other uses for the lab, such as training new systems administrators, testing software updates, modeling Group Policy changes, and performing other software trials.
Your lab should start with at least one instance of each OS in use on your LAN, then add more VMs as necessary for different service pack levels and configurations. Install copies of mission-critical software on the appropriate VM images, and expect to later add special images for your Microsoft Exchange Server, Microsoft IIS, and Microsoft SQL Server systems.
Ideally, the lab's AD forest should have the same domains, trusts, DNS zones, and Group Policy Objects (GPOs) as your real forest, but give the domains slightly different names from your real domains in case your virtual and real forests ever need to interact, and use a small number of accounts to keep the AD database size small. You can use the new Group Policy Management Console (GPMC) to export the GPOs from your production forest and import them into your lab. For more information about GPMC and its import function, see "Windows Server 2003's Group Policy Management Console," July 2003, InstantDoc ID 39190. You can also export your Internet Information Services (IIS) 6.0 metabase and import it into your lab's Web server, as "IIS 6.0 Features," May 2003, InstantDoc ID 38496, explains. You can even restore the system state backups of production servers to the replicas of these servers in the lab.
In addition to implementing your forest in your lab, you should install your patch-deployment method—for example, Microsoft Systems Management Server (SMS), Microsoft Software Update Services (SUS), custom scripts, or third-party products. Then use this patch-deployment method to install updates in your lab.
You'll typically want to test patches in groups and roll them out in groups. You should test and apply crucial patches for your Internet-attached servers as quickly as possible, of course, but applying not-so-crucial patches one at a time is impractical. Microsoft now releases important updates on the second Tuesday of each month, so coordinate your testing and deployment cycle with that schedule.
System and Network Tests
After your lab is up and running, you can begin testing new patches. The strategy is to run a script that performs a variety of tests, redirect the script's output to a text file, apply the patches, run the tests again, redirect their output to a second file, then compare the two files for differences. The two output files should be identical. If they differ, something changed after you applied the patches and you need to investigate.
Your scripts should first test for basic network connectivity. Run baseline.bat, which Listing 1 shows, on the test machine to be patched and redirect its output to a text file with a command such as
baseline.bat > before.txt
Then apply the patches, reboot, and run the batch file again, redirecting its output to another file:
baseline.bat > after.txt
By using the file-compare command to compare the two output files
fc.exe /l /n before.txt after.txt
you can quickly determine whether changes have occurred. Feel free to add more tools to the batch file to expand the scope of its coverage, but make sure you filter the output of these tools with a utility such as findstr.exe so that no irrelevant differences get into the output files to cause false positives. If you'd prefer to use a graphical tool instead of fc.exe to compare the output files, try windiff.exe from the Windows 2000 Server Support Tools.
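The before-and-after comparison can also be sketched in a general-purpose language. The following Python sketch is an illustration, not one of the article's listings: it drops lines that match a list of "volatile" patterns (the single clock-time pattern shown is only an assumed example) and then diffs what remains, so an empty result means nothing relevant changed.

```python
import difflib
import re

# Hypothetical patterns for lines that are expected to differ between runs.
# Tune this list to the tools in your own batch file.
VOLATILE_PATTERNS = [
    re.compile(r"\d{1,2}:\d{2}:\d{2}"),  # clock times such as 14:03:27
]


def stable_lines(text):
    """Return the lines of text that match none of the volatile patterns."""
    return [
        line.rstrip()
        for line in text.splitlines()
        if not any(pattern.search(line) for pattern in VOLATILE_PATTERNS)
    ]


def baseline_diff(before_text, after_text):
    """Return unified-diff lines for two baseline outputs; an empty list means no change."""
    return list(
        difflib.unified_diff(
            stable_lines(before_text),
            stable_lines(after_text),
            fromfile="before.txt",
            tofile="after.txt",
            lineterm="",
        )
    )


if __name__ == "__main__":
    before = "Ping to gateway: OK\nLast refresh at 09:15:02\n"
    after = "Ping to gateway: FAILED\nLast refresh at 11:42:40\n"
    for line in baseline_diff(before, after):
        print(line)
```

Filtering before comparing keeps false positives out of the diff, but any line a pattern removes is invisible to the comparison, so keep the pattern list short and leave out anything you actually want to inspect.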
The gpresult.exe command in baseline.bat is specifically intended to produce different output in the before.txt and after.txt files. This tool gives the exact time that the machine last processed GPOs. This information will be helpful when you check whether Group Policy is operating properly (more about this later).
Another tool to notice in baseline.bat is netdiag.exe, a Support Tool that automatically performs more than a dozen networking tests. Try running it in verbose mode (by using the /v switch) to show a wealth of configuration and troubleshooting information when a bad patch causes connectivity problems.
After you run baseline.bat on your lab machine, run remote.bat, which Web Listing 1 (http://www.winnetmag.com, InstantDoc ID 41979) shows, from a remote system to test the lab system's remote manageability. The remote system must be running Windows Server 2003 or Windows XP for all the tools in remote.bat to run, but the target system can be running Win2K. Remote.bat uses rpcdump.exe to verify remote access to all remote procedure call (RPC) endpoints, wmic.exe to test Distributed COM (DCOM) access to the Windows Management Instrumentation (WMI) service, schtasks.exe to interact with the Task Scheduler, net.exe to map a drive letter to the C$ administrative share, and mstsc.exe to connect to Terminal Services/Remote Desktop with RDP for a thin-client session. For the RDP thin client, create and save a file named target.rdp that contains the IP address of the target machine, a username, and a password, then pass the filename to the client as the first argument. When the batch file runs, visually confirm that you're automatically logged on to the remote desktop when the thin-client window appears. Then, as you did with baseline.bat, create before.txt and after.txt files on the remote system and use fc.exe to compare them.
Finally, if you use IP Security (IPSec) on your production LAN, you should also use it in your lab. For information about how to configure IPSec, see "IP Security in Windows 2000," March 2000, InstantDoc ID 7976. If you use IPSec for VPN access, make sure you install the Network Address Translation-Traversal (NAT-T) upgrade from Microsoft (as described in the Microsoft article "L2TP/IPSec NAT-T Update for Windows XP and Windows 2000," http://support.microsoft.com/?kbid=818043). In addition to testing patches that might affect the IPSec driver, plan to test IPSec configuration changes in the lab before deploying them on your production LAN; making a configuration mistake that leaves hundreds of machines stranded (and you without a job) is all too easy.
Crucial Applications and Services
In addition to network connectivity and remote manageability, you must test your mission-critical applications and services, including Group Policy processing. Begin by enabling on the machines in your lab all the logging of which your applications and services are capable; for example, enable all audit policies and turn on debug logging in DNS, RRAS, Certificate Services, SQL Server, and Exchange Server. Enable this logging so that if a patch breaks something, you'll have searchable audit logs that you can use to detect and diagnose the problem.
To write extensive Group Policy information to the Application log, set a REG_DWORD value named RunDiagnosticLoggingGroupPolicy to 1 under the HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Diagnostics registry subkey (you'll likely need to create both the value and the Diagnostics subkey), as the Microsoft white paper "Troubleshooting Group Policy In Windows 2000" (http://www.microsoft.com/windows2000/docs/gptshoot.doc) discusses. If you need to log even more detailed Group Policy information, you can also enable user environment logging, as the Microsoft article "How to Enable User Environment Debug Logging in Retail Builds of Windows" (http://support.microsoft.com/?kbid=221833) describes.
After enabling logging on all these services, check the logs to confirm that everything is working fine, then clear the logs and apply your patches. A quick way to erase a textual log file (not an event log) or to create a blank text file is to run the command
echo 1>nul 2>c:\logfile.txt
in which 1>nul redirects the standard output of the Echo command to nothing and 2>c:\logfile.txt redirects the standard error output to the specified file, overwriting its contents if it already exists or creating the file if it doesn't exist.
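If you drive your tests from a script rather than a batch file, the same reset is a one-liner. This Python sketch is an illustrative equivalent rather than part of the article's tooling; the function name reset_text_log is made up for the example, and the demonstration runs against a throwaway temporary file rather than a real log.

```python
import tempfile
from pathlib import Path


def reset_text_log(path):
    """Create the file if it is missing, or truncate it to zero bytes if it exists."""
    Path(path).write_text("")


if __name__ == "__main__":
    # Demonstrate on a throwaway file instead of a real log.
    with tempfile.TemporaryDirectory() as folder:
        log = Path(folder) / "logfile.txt"
        log.write_text("old entries\n")
        reset_text_log(log)
        print(log.stat().st_size)
```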
To clear the Windows event logs, use a batch file to run the commands
cscript.exe ClearEventLog.vbs 127.0.0.1 Application
cscript.exe ClearEventLog.vbs 127.0.0.1 Security
cscript.exe ClearEventLog.vbs 127.0.0.1 System
to invoke ClearEventLog.vbs, which Web Listing 2 shows. The first argument in each command is the IP address or name of the computer (local or remote) whose event log you want to clear, and the second argument is the log name. If you have event logs for DNS, AD, or other services, clear those as well.
Now that you've cleared the logs and have a clean slate, you can install the patches on the test machines and start running applications and stressing services to try to cause errors. Run all the features that users invoke as they work, with special emphasis on features that involve the network, printing, security controls, file system access, and anything else that's likely to break (perhaps because it's been fragile in the past).
For example, you could write a batch file that would execute a command
to launch Microsoft Word and have the application automatically open file.doc. In file.doc, create a macro named AutoOpen that prints the document to a remote printer, saves a copy of the file to a shared or Web Distributed Authoring and Versioning (WebDAV) folder, renders the document to a local Adobe Systems' Adobe Acrobat .pdf file, and so on. Naming the macro AutoOpen forces Word to automatically run it when Word opens the file that contains it. Creating Microsoft Office macros typically requires no programming; for example, in Word XP and Word 2000, select Macro from the Tools menu, select Record New Macro, name the macro AutoOpen, store it in the current document (not in the template), then perform the desired tasks. Word will record your keystrokes and mouse clicks for you. In Microsoft Excel XP and Excel 2000, name the macro Auto_Open to have Excel run the macro when the application opens the file that contains the macro. Check whether your non-Microsoft applications have similar automation features.
Find out what command-line arguments your applications support and leverage those arguments as much as possible in your batch files. Some applications expose a lot of their functionality in command-line switches. However, even if you need to test functionality that can be entered only by hand in a graphical interface, you're not out of luck. The keystrokes.vbs script that Web Listing 3 shows demonstrates how to use the SendKeys() method to send keystrokes to a graphical application. Pay attention to the commands you type as you test a GUI program—don't forget that the Alt key can be used to pull down menus—then have your script enter the same commands.
As you can see in keystrokes.vbs, the AppActivate("App Title") method gives the application with App Title in its title bar the input focus to ensure that the correct application receives the script's keystrokes. The Sleep(Milliseconds) method helps the application keep up with the script by causing the script to wait the specified number of milliseconds before continuing to execute. Comments in keystrokes.vbs spell out the keystrokes for special and frequently used keys and key combinations. With a little trial-and-error debugging, you can coax most graphical applications into running hands-free.
You can use other scripts to stress services remotely. To do so, run applications on remote systems that interact with the service that you want to test on the target lab machine. For alphabetical listings of available tools and their descriptions, see the Help files that accompany the Support Tools and Microsoft resource kits. To test IIS servers, investigate load simulators such as Application Center Test (ACT), which the Microsoft article "INFO: Application Center Test (ACT) Tests Your Web Applications by Simulating Load" (http://support.microsoft.com/?kbid=307492) describes.
When you run your batch files and other scripts to give your services a workout, some errors, such as a blue screen of death or a frozen mouse pointer, will be easy to spot. But some malfunctioning services will just silently record their errors in one or more of your logs. Reviewing these logs manually is too tedious and error-prone to be practical. Instead, search the logs with tools or scripts that can automatically extract errors, warnings, and other interesting tidbits.
Text-file logs are the easiest to search. The Windows findstr.exe utility extracts from files lines that match patterns you specify, including basic regular expressions. To begin, examine the text logs typically produced by the service you're testing and identify all the strings (such as Fail, Error, or 500) that indicate problems. Save these strings, one per line, in a text file named signatures.txt. (You'll have a separate signatures file for each type of log you regularly search.) Then, use the command
findstr.exe /g:signatures.txt logfile.log > output.txt
to search a text log for all the lines that have at least one of these patterns and redirect them to an output file. If the output.txt file is larger than 0 bytes, you have a hit and you need to investigate the problem. For more information about findstr.exe, see the Command Line Reference in Windows Help. For a regular-expressions tutorial and tools that are much more powerful and user-friendly than findstr.exe, go to http://www.regular-expressions.info/tutorial.html.
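The signature search is simple enough to sketch in a scripting language when you want line numbers or further processing of the hits. The Python sketch below is an illustration under simplifying assumptions: it treats every signature as a literal, case-sensitive substring (no regular expressions), and the function names and sample log lines are invented for the example.

```python
def load_signatures(text):
    """Return the non-empty signature strings, one per line of text."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def find_hits(log_lines, signatures):
    """Return (line_number, line) pairs for log lines that contain any signature."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        if any(signature in line for signature in signatures):
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    signatures = load_signatures("Fail\nError\n500\n")
    sample_log = [
        "GET /index.htm 200\n",
        "GET /report.asp 500\n",
        "Logon Failure: bob\n",
    ]
    for number, line in find_hits(sample_log, signatures):
        print(number, line)
```

An empty result from find_hits plays the same role as a zero-byte output.txt: nothing matched, so there is nothing to investigate in that log.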
Convert the binary Windows event logs to plain text and also search them with findstr.exe. The Microsoft Windows 2000 Resource Kit's dumpel.exe utility converts event-log data into tab-delimited text that you can pipe into findstr.exe. For example, dumplog.bat, which Web Listing 4 shows, contains the commands to extract only the error and warning messages from the Application and System logs and only the audit failures from the Security log. The /R switch in the script indicates that the weird-looking search string is a regular expression. You can also use XP's built-in EventQuery.vbs to dump just the errors and warnings, or just the failure audits, as Web Listing 5 shows. Be aware that if you enable Group Policy debug logging (as I mentioned earlier), you might get a lot of error events from the Userenv source in the Application log, so you might want to turn debug logging off or at least filter out those entries. Also, remember to change your default Windows Script Host (WSH) from wscript.exe to cscript.exe when working extensively in command windows; to do so, type the command
cscript.exe //h:cscript //nologo //s
As part of your testing, you should also confirm that you can remotely verify that your patches have been installed successfully. Querying the remote system's registry to see whether a patch has been applied sometime in the past (e.g., by using srvinfo.exe from the Win2K resource kit) isn't enough; your audit tool must be able to check the file versions on the hard disk.
If Microsoft's downloadable XML database of patch information happens to include the patches you're testing, this check will be easy because many popular tools (e.g., Shavlik Technologies' HFNetChk, Microsoft Baseline Security Analyzer—MBSA) use this database. But if the patches you're applying aren't in the XML database, you'll need to use SMS or custom scripts to check the versions of the updated files on the disk. For this task, you could script the use of the filever.exe program (from the Support Tools for Windows 2003, XP, or Win2K), which can display verbose file-version information. If you don't want to make this effort for noncritical patches, at least audit the important ones.
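Once a script has collected file versions, the comparison step needs a little care, because dotted version strings don't sort correctly as plain text. The Python sketch below shows only that comparison step. How the versions are gathered is left out, and the file names and version numbers are hypothetical placeholders, not data from any real patch.

```python
def parse_version(text):
    """Convert a dotted version string such as '5.2.3790.132' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def find_outdated(installed, required):
    """Return {file: (installed, required)} for files that are missing or too old."""
    outdated = {}
    for name, minimum in required.items():
        found = installed.get(name)
        if found is None or parse_version(found) < parse_version(minimum):
            outdated[name] = (found, minimum)
    return outdated


if __name__ == "__main__":
    # Hypothetical file names and versions, for illustration only.
    required = {"example1.dll": "5.0.2195.6753", "example2.dll": "1.2.0.0"}
    installed = {"example1.dll": "5.0.2195.6700"}
    for name, (found, minimum) in find_outdated(installed, required).items():
        print(name, "found:", found, "required:", minimum)
```

Comparing tuples of integers avoids the trap in which the string "5.0.2195.10000" sorts before "5.0.2195.6753" even though it is the newer build.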
Some companies give away special-purpose audit tools for the worst of the network-threatening vulnerabilities—for example, Microsoft's scanners for the Blaster and Slammer worms, described in the Microsoft articles "How to Use the KB 824146 Scanning Tool to Identify Host Computers That Do Not Have the 823980 (MS03-026) and the 824146 (MS03-039) Security Patches Installed" (http://support.microsoft.com/?kbid=827363) and "SQL Server 2000 Security Tools" (http://support.microsoft.com/?kbid=813944) and the eEye Digital Security scanners for Nimda and CodeRed (http://www.eeye.com/html/research/tools). But don't count on such tools always being available or always being released quickly. "Security Groups as PC Status Containers," April 2004, InstantDoc ID 41834, describes a method of scanning computers for a particular hotfix or security patch.
Just in case a patch is faulty and must be uninstalled, test your rollback method as well. You don't want to use the Control Panel Add/Remove Programs applet to uninstall a patch one machine at a time; thus, you should choose a patch-management solution that automates patch rollback. If your solution doesn't support patch rollback, you can at least remotely trigger the uninstallation.
When installed, most Microsoft patches create a folder named %systemroot%\$NtUninstallpatchnumber$\spuninst, where patchnumber is the number of the Microsoft article that documents the patch. In that folder is spuninst.exe, which can take the -u command-line switch to run unattended, the -q switch to run quietly with no user interaction, and the -f switch to force other applications to close when the system shuts down. You can execute this program with all the necessary switches through a variety of methods; for example, you can use a custom script pushed out through Group Policy, use schtasks.exe to remotely schedule a job to do it, use a WMI script (such as exec.vbs from the resource kit) to execute commands on remote systems, or perhaps use Sysinternals' PsExec. Sometime this year, Microsoft will start releasing .msi patches that use Windows Installer 3.0, and these will support rollback.
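Whichever remote-execution method you choose, it helps to build the uninstall command line in one place so that every machine receives the same string. The Python sketch below only assembles that string and runs nothing. It assumes the uninstaller lives in a spuninst subfolder of the patch's $NtUninstall folder, and the patch identifier KB000000 and the C:\WINDOWS system root are hypothetical placeholders.

```python
import ntpath


def uninstall_command(system_root, patch_number, switches):
    """Build the command line that runs a patch's spuninst.exe with the given switches."""
    # Assumed layout: <system_root>\$NtUninstall<patch_number>$\spuninst\spuninst.exe
    folder = ntpath.join(system_root, "$NtUninstall" + patch_number + "$", "spuninst")
    exe = ntpath.join(folder, "spuninst.exe")
    return '"{}" {}'.format(exe, " ".join(switches))


if __name__ == "__main__":
    # KB000000 is a made-up patch identifier used only for illustration.
    print(uninstall_command("C:\\WINDOWS", "KB000000", ["-u", "-q"]))
```

The ntpath module joins Windows-style paths on any platform, so the string can be generated on whatever machine runs the script and then handed to the remote-execution tool.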
After you remove a patch, you should employ the scripts and techniques I've described above to verify remote manageability. Installing a patch can cause problems, but trying to uninstall one can kill a box, and you'll definitely want to discover this truth in the lab rather than in the real world.
After all the lab work is done, the time has come to do clinical trials on human subjects—that is, to try out new patches on a subset of your production systems before rolling the patches out to everyone. Software developers and other IT-savvy people make great guinea pigs because they're more sensitive to subtle problems that occur and they might be working with new code or applications that will soon be deployed. You'll need at least a dozen machines because you'll need to have a representative sampling of the OSs, service pack levels, applications, and services on your network. Push out patches to these machines first, monitor for problems and complaints, then start rolling out the patches to the rest of your enterprise in stages if all goes well. Some will argue that you don't have time to test crucial patches in this way, but you can shorten the "live animal testing" to just a few hours if pressed for time because show-stopping problems typically surface immediately after you apply faulty patches.
You should also conduct clinical trials for your production servers, but do a full backup of the guinea pig servers first. Select one of each of your types of servers (e.g., Web, mail, domain controller—DC, print server) and apply the patches to just those machines one at a time. For 24 × 7 service availability during frequent patching, failover clustering and the Windows NT Load Balancing Service (WLBS) are a big help.
Don't forget that others are facing the same patch-management problems as you are. Browse newsgroups dedicated to the products you manage (one place to find such groups is at http://groups.google.com), and subscribe to patch-related or security mailing lists such as NTBugtraq (http://www.ntbugtraq.com) or the patch-management list at http://www.patchmanagement.org. You can learn a ton from your peers in the IT community, and when your testing discovers faulty patches, you can share your findings and advice with them too.