Recently Microsoft support has fielded inquiries from several customers asking how to troubleshoot problems that cause delays during the boot and user logon processes on a desktop or laptop. The Windows Performance Toolkit xbootmgr.exe tool works well for troubleshooting boot and startup issues on a single machine. (See "Diagnose Shutdown Problems with Xbootmgr" for more information about using xbootmgr.exe.) But what if you're a large enterprise with thousands of desktops? You need a way to identify problem machines before a user reports them to the Help desk. You need to monitor desktop startup times over time and across the enterprise. And you need a way to collect boot time measurements, similar to those collected by xbootmgr.exe, from every machine for every boot. Here, I'll explain how you can use an event log, the new Windows 7 Event Viewer, and the Wevtutil tool to do these things.

A New Event Log to Aid in Troubleshooting
Beginning with Windows Vista, Windows now includes a new category of event logs: Applications and Services logs. The infrastructure underlying event logging now conforms to an XML schema. You can easily access the XML data for any event. The new event log interface lets you construct XML-based queries against event logs. The Event Viewer gives you to access to the new XML functionality in an easy-to-use graphical interface.

One of the logs in this new category is the Diagnostics-Performance/Operational log. This log contains events that record performance measurements similar to those provided by xbootmgr.exe. In fact, the data recorded is generated by the same mechanisms that Xbootmgr uses. Event IDs 100 through 110 record boot and startup performance statistics.

Using the Event Viewer in Windows 7
The new Event Viewer in Windows 7 lets you do more powerful filtering. The new UI lets you specify ranges of events. Under the covers, it builds the XPath query to filter out the events based on the criteria you specify. Figure 1 shows how you can filter for events 100 through 110. Figure 2 shows the XML version of the same query parameters. We're interested in Event ID 100 for the purposes of measuring the boot performance. Figure 3 and Figure 4 show the normal and XML views for event 100.

The XML presentation of the event contains a lot of interesting information. The BootTime value represents the number of milliseconds that elapsed from the time the system booted to the point after the user logged in that the system reached 80 percent idle for 10 seconds. The other time values listed represent the elapsed time for various stages during this boot process. You can find more information about the stages of the boot process in the Windows On/Off Transition Performance Analysis white paper.

The other boot performance events record information about specific events that contributed to delays during the boot/startup sequence. The trick is how to know which boot instance these messages belong to. This is where the ActivityID come in handy. In Figure 4, you'll see the following in the XML data:

 


All events related to this boot instance in this Event 100 record have this same ActivityID. By performing the following steps, we can use this information to create a more complex filter:

1. Click Filter Current Log… in the Actions Pane of Event Viewer.
2. On the XML tab, check the box Edit query manually, then answer Yes when prompted to continue.
3. Enter the following XML text into the query box:


 
   
 


4. Click OK.

Figure 5 shows a total of three events for this ActivityID. We can now examine these events to understand the problems that contributed to any boot/startup delays.

Collecting Data with Wevtutil
So far we've looked at only one boot instance. How do we collect data for all boot instances? By using Wevtutil, a Windows command-line tool for querying the Event logs. Using the following example, you can extract all the Event ID 100 records from the event log on a system:

wevtutil qe Microsoft-Windows-Diagnostics-Performance/Operational /rd:true /f:xml /q:"*\\[System\\[(EventID = 100)\\]\\]" /e:Events > boot.xml

This creates an XML data file that contains all instances of the boot performance event for a machine. With each ActivityID, you could then query for the other related events. For example, the following query extracts the same three event records we saw in Figure 5:

wevtutil qe Microsoft-Windows-Diagnostics-Performance/Operational /rd:true /f:xml /q:"* \\[System\\[(Correlation\\[@ActivityID='\\{00000001-0000-0000-1020-5CA87BB1CA01\\}'\\])\\]\\]" /e:Events > bootrelated.xml

Wevtutil.exe has many more options. Run the utility without any parameters to see a list of available options. For more information, see the MSDN article "Event Queries and Event XML." And for more information about learning XPath to define event queries, see XPath Syntax and XPath Examples.

Putting It All Together
Once you have the event data in XML format, it's fairly easy to extract the most interesting data points. Figure 6 shows some sample data I collected from one machine. In this example, I converted the time values to seconds. Differences in the number of applications that start at boot time could be significant when you're investigating changes in performance. With historical data like this, you can now begin to do some trend analysis. For example, this system was built on 1/13/2010. Application installations and configuration changes continued over the next couple of days. By 1/21/2010 the configuration changes had been completed. After that, the BootTime value was averaging about 124 seconds. However, notice that on 2/4/2010 and 2/9/2010 the times were significantly longer than average.

Extending the Value
Now that we have an automated way to extract the event data in XML form, we can collect this data periodically from multiple computers and store the results in a database. Using some simple reporting, it's easy to do trend analysis. A complete enterprise solution will require more code development and data management, but it's feasible. And that's exactly what one of my largest customers did.

Using a VBScript program I wrote, the customer collects event data into a SQL Server database. They've used this data to establish some baseline statistics for their desktop image build. They can pivot this data based on the hardware (e.g., memory, CPU, model) and software configuration. Using SQL Server Reporting Services, the customer built a dashboard view that displays the boot-time health status of all desktops in the enterprise. With specific reports, they can compare this baseline to new data collected after deploying new group policies, new security tools, or a hardware upgrade. They also use this data to proactively identify machines that take longer than the average baseline. This information allows IT to address issues before users call the Help desk, reducing the time to resolution and making end users happier.

Sean Wheeler (seanw@microsoft.com) is a senior premier field engineer in Microsoft's Premier Support team, assigned to support some of the largest enterprise customers. He is one of the original creators of the MPS Reporting tool. He specializes in scripting, debugging, and performance-related issues.