Using counters to diagnose your system's health

In March, I showed you how to start Windows NT's Performance Monitor, add counters, and save settings. I also mentioned a few of the critical counters. In this article, I'll look at those and other counters, and give you more details about when to use them. The aim is to evaluate the overall health of your system and network. This article will not give you a comprehensive list of available counters. Rather, it introduces how to use counters and focuses on those counters that are most useful to the systems administrator and power user.

Objects
Any discussion of counters must start with a description of the objects that generate counters. Common objects, such as Processor, are in the Performance Monitor selection list on all NT installations. Table 1 lists those Performance Monitor objects. Optional objects appear only if you choose certain options during setup. For example, you will see the NetBEUI object only if you select NetBEUI as a protocol to install. The last group, add-on objects, consists of objects that other software, not NT, adds to Performance Monitor.

First, let's examine the optional and add-on objects that you can track. For each object, I'll discuss some specific counters. You can start Performance Monitor on your system and follow along.

Optional Objects
Some objects appear in Performance Monitor only when the associated service or process is running. I'll highlight a few of these optional objects.

The Browser object measures the various Browser Service transmissions. This object is relevant only if the computer is a browser or a potential browser. The Browser Service gives you the list of resources available on the network. When you use the Map Network Drive option from the My Computer icon, the Browser Service is responsible for displaying shared directories.

The Server Service is the complement to the Redirector. It makes resources on the local computer available to other users across the network. Therefore, the Server Service does not need to run on a workstation that acts as a client computer and doesn't share any of its local data or printers with the rest of the network.

Depending on the protocols and network services you have installed, you'll see one or more network objects on your Performance Monitor list, such as AppleTalk; Client Service for NetWare; IP, TCP, and Network Interface; NetBEUI and NetBEUI Resource; NWLink IPX, NWLink NetBIOS, and NWLink SPX; and RAS Port and RAS Total. Each network object has multiple counters; together, they monitor network throughput.

Add-On Objects
Add-on objects are associated with software other than NT. For example, Microsoft SQL Server adds several objects, which you'll see only when SQL Server is running.

Counters
As I mentioned in March, Performance Monitor has about 350 counters. You'll probably use only a few of them on a regular basis for ongoing system monitoring and keep the rest for troubleshooting and tuning. Some counters are for programmers to use only in debugging and optimizing applications.

A good place to start is the default counter for each object. When you select an object, notice that the highlighted counter is not necessarily the first one in the list. Instead, the highlighted counter is the one that the NT developers thought would be the most useful. For example, when you select the Cache object, the default counter is Data Map Hits %, as Screen 1 shows on page 159. The following paragraphs highlight a selection of counters that you might find useful.

The Data Map Hits % counter under the Cache object shows how often requested data was found in the cache. A hit means the system can retrieve the data rapidly from physical memory instead of having to read it from the disk. A consistently low value, say below 80 percent when the system is very busy, can signify that the system has insufficient memory.

The Avg. Disk Queue Length counter under the LogicalDisk object measures the average number of read and write requests that were queued for the selected disk during the sampling interval. A value greater than 1 or 2 indicates a potential bottleneck at the disk, with processes forced to wait on disk access. Further investigation is in order before you can be sure the disk is the problem. Your system might have insufficient RAM, resulting in constant paging from memory to disk and back again. Resolve memory shortages before deciding that you have a disk problem.

The LogicalDisk object's Avg. Disk sec/Transfer counter shows how long, in seconds, the average disk transfer takes. On its own, this counter might tell you whether you have a fast or slow disk, although the value will vary depending on the type of data you are processing. The actual value for short files will be in tens of milliseconds, which will show as 0.0nn seconds. If all your disk counters are zero, use the diskperf -y command to make sure that you turned on disk monitoring, as I described in March.

One powerful way to use Performance Monitor is to combine values from different counters. Suppose you have an Avg. Disk Queue Length of 3 and an Avg. Disk sec/Transfer of 0.033. With three requests (each taking about 33 milliseconds) in the queue, about 100 milliseconds (one-tenth of a second) will pass before the system can process a new request. This calculation gives you an idea of the delay that waiting on your hard disk causes.
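The same arithmetic can be written as a short Python sketch. The function name is hypothetical and not part of Performance Monitor; you would read the two values off the chart yourself. The sketch also assumes that every queued request takes the average transfer time, which is only an approximation.

```python
def estimated_disk_wait_ms(avg_queue_length, avg_sec_per_transfer):
    """Estimate how long a new request waits behind the queued ones, in ms."""
    # Each queued request is assumed to take the average transfer time.
    return avg_queue_length * avg_sec_per_transfer * 1000.0


# Values from the example above: 3 queued requests at 0.033 seconds each.
wait_ms = estimated_disk_wait_ms(3, 0.033)
print(f"Estimated wait: {wait_ms:.0f} ms")
```

With these inputs the estimate comes to roughly 99 milliseconds, which matches the one-tenth of a second worked out above.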

The Pages/sec counter under the Memory object tracks the number of pages (a page is 4KB) read from or written to the disk because of page faults. A page fault occurs when a process requests data or code that is not in physical memory and has to be read from the disk. To make space for the required information, you might need to write some of the data that is currently in memory to the page file on the hard disk. You can look at the Pages Input/sec and Pages Output/sec separately.
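To get a feel for how much disk traffic a given paging rate represents, you can convert pages to kilobytes. The following is a minimal Python sketch; the function name and the sample readings are hypothetical. It assumes the 4KB page size mentioned above and that Pages/sec is the sum of Pages Input/sec and Pages Output/sec.

```python
PAGE_SIZE_KB = 4  # page size given in the text above


def paging_traffic_kb_per_sec(pages_input_per_sec, pages_output_per_sec):
    """Convert the two paging rates into KB/sec moved between memory and disk."""
    # Assumption: Pages/sec equals Pages Input/sec plus Pages Output/sec.
    pages_per_sec = pages_input_per_sec + pages_output_per_sec
    return pages_per_sec * PAGE_SIZE_KB


# Hypothetical readings: 20 pages read in and 5 pages written out per second.
traffic = paging_traffic_kb_per_sec(20, 5)
print(f"Paging traffic: {traffic} KB/sec")
```

Looking at the input and output rates separately, as the sketch does, shows whether the disk activity comes mostly from reading pages back in or from writing them out.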

The Processor object's %Processor Time counter represents the percentage of time that the processor spends executing applications or operating-system code. Each processor has an Idle thread, which it runs when no other threads are running. The percentage of the time spent running this thread is subtracted from 100 to get the percentage of productive time spent on system and application processes. You do not want %Processor Time to stay above 80 percent to 85 percent. If it does, you need to add another processor if your system will allow it, or upgrade the CPU. Spikes or bursts of 100 percent use for a few seconds (such as when you load a program) are normal and are not cause for concern.
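The following Python sketch illustrates both ideas: deriving the busy percentage from the Idle thread's share, and telling a short spike from a sustained load. The function names, the sample values, and the choice of five consecutive samples as "sustained" are assumptions for illustration only; Performance Monitor does this calculation internally and does not expose these functions.

```python
def processor_time_pct(idle_pct):
    """Busy percentage: 100 minus the share of time spent in the Idle thread."""
    return 100.0 - idle_pct


def is_sustained_above(samples, threshold, min_consecutive):
    """Return True if at least min_consecutive samples in a row exceed threshold."""
    run = 0
    for value in samples:
        # Extend the current run of high samples, or reset it on a low one.
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False


# Hypothetical %Processor Time samples taken at regular intervals.
brief_spike = [30, 100, 100, 35, 40, 30]
heavy_load = [90, 92, 88, 95, 91, 89]

print(processor_time_pct(12.5))                 # busy share when idle is 12.5%
print(is_sustained_above(brief_spike, 85, 5))   # short burst only
print(is_sustained_above(heavy_load, 85, 5))    # stays above the threshold
```

The brief spike does not trip the check, while the second series does, which mirrors the distinction between normal bursts and a processor that is consistently overloaded.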

If you want to see how much time the CPU spends in privileged and user modes, two Processor counters--%Privileged Time and %User Time--will give you this information. Applications and any environment subsystems in which they are running will run in User mode. Usually, only operating system processes can run in privileged mode. However, NT can switch application threads to privileged mode when the threads need operating-system services.

The counters I just discussed measure the overall use of system resources. What if you want to look at particular applications or services and see what load they are putting on your system? You can employ several useful counters for just that purpose.

The Process object's %Processor Time counter acts just like the Processor object's %Processor Time. But in this case, you look at the process, not the processor. You can see the percentage of the CPU resource allocated to a specific process. And the Process object also has %Privileged Time and %User Time counters, which tell how that time is split between the two modes. Screen 2 shows several applications running: You can see which takes up the CPU resource. The process App1-5 is highlighted in the legend box at the bottom of the screen, so its statistics are shown in the boxes just below the chart. The process is grabbing about 95 percent of the CPU time. The other four applications barely register on the graph.

Screen 3 shows another way to look at the same data. To see this display, choose Chart from the Options menu and select the Histogram option. The %Processor Time counter under the Thread object shows how much time is allocated to each thread within a process or application.

If an application is running multiple threads, Performance Monitor can chart each thread separately. Screen 4 shows an example.

Another counter to monitor is the System object's Processor Queue Length, which tells you how many threads are waiting on the processor. A single queue for processor time exists even on multiprocessor computers. A sustained value greater than 2 is grounds for an upgrade.

Further Investigation
These counters are only the beginning. For in-depth coverage of Performance Monitor, check out Microsoft's NT resource kits.

Performance Monitor counter definitions are in a Help file, \Common\Perftool\Cntrtool\Counters.hlp, on the CD-ROM that comes with Microsoft Windows NT Server Resource Kit and Microsoft Windows NT Workstation Resource Kit for NT 4.0. The file is not added to the list of online documentation files under the resource kit folder, but it is copied to the hard disk and placed in \Ntreskit\PerfTool\CntrTool\Counters.hlp.
NT 3.51 users can find a list of counter definitions in Volume 4, "Optimizing Windows NT," of Microsoft Windows NT Resource Kit for NT 3.51.