Measure four key server hardware components against your baselines
When you want your servers to perform at their best, Windows Server’s built-in performance monitoring and analysis tools offer insight into potential areas for improvement by letting you monitor current performance information and log this information over time. However, to use the tools effectively, you must understand the core hardware performance factors that apply to any server, whether it runs Windows or Linux.
The four key server hardware components that can be altered to improve performance are the CPU, memory, hard disks, and network interface card (NIC). Three of these components are internal (i.e., CPU, memory, hard disks) and the fourth component is the gateway to the network. Internal server performance determines whether the full NIC capabilities can be utilized, and NIC performance determines whether a well-performing internal system matters. As you can see, all four components are important and depend on one another.
In this article, I’ll cover these four areas of system performance and explain how to monitor them in Windows Server environments. First, I’ll explore how systems thinking helps you understand how these components affect one another. Then I’ll discuss the performance counters available in Windows as they relate to the four hardware components. I’ll also provide some recommendations for improving the performance of your system based on the results of performance monitoring.
Systems Thinking and Creating a Baseline
As you monitor and analyze Windows Server performance, it’s essential to employ systems thinking, which requires you to consider the relationships among the hardware components. For example, if CPU utilization is high, the CPU isn’t automatically seen as the problem. Instead, memory and hard disk utilization should be considered. Is the system using an excessive amount of virtual memory? If that’s the case, CPU utilization might be a symptom of a memory problem rather than evidence of an insufficient CPU speed.
I’ve performed analysis on hundreds of Windows servers, and that experience has taught me one important general guideline: faster CPUs don’t always solve performance problems. It’s tempting to throw more speed at the problem, but remember the old saying: if a man is lost in a city and drives faster, he just gets lost faster. Rephrased for server performance tuning, a faster processor just loops faster while it waits for the true bottleneck to finish its work.
When analyzing the performance of a Windows server, you should analyze all four core components at the same time. Systems thinking means evaluating the system as a whole rather than a single component in isolation. Using the systems thinking process will enable you to locate the true performance bottleneck more quickly.
Before I begin exploring the performance counters, let me explain the need for a baseline. A performance baseline provides a representation of the system’s performance during acceptable operations. You can create a performance baseline by monitoring and logging performance counters during a period of normal operations. I prefer to monitor for an entire work window; for example, if the organization functions between 9 A.M. and 5 P.M., I’ll monitor during that entire time. Once you’ve created the performance log, you can open it in the Performance tool and narrow the viewing window to peak utilization times. If the server performed acceptably during peak utilization, you know that the server is well configured for your intended use.
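To make the peak-window analysis concrete, here’s a minimal sketch of narrowing a full-day log to the 9 A.M.–5 P.M. work window and averaging a counter over that window. The sample data and values are hypothetical stand-ins for what an exported performance log might contain.

```python
from datetime import datetime, time

# Hypothetical samples: (timestamp, % Processor Time) pairs, as an exported
# performance log might contain -- illustrative values only.
samples = [
    (datetime(2008, 3, 10, 8, 30), 12.0),
    (datetime(2008, 3, 10, 10, 15), 64.0),
    (datetime(2008, 3, 10, 12, 45), 71.5),
    (datetime(2008, 3, 10, 18, 5), 9.0),
]

def window_average(samples, start, end):
    """Average the counter over samples whose time of day falls in [start, end]."""
    values = [v for (ts, v) in samples if start <= ts.time() <= end]
    return sum(values) / len(values) if values else None

# Narrow the full-day log to the 9 A.M.-5 P.M. work window.
peak_avg = window_average(samples, time(9, 0), time(17, 0))
```

If the peak-window average looks acceptable, the server is well configured for its current load; the same function can later be pointed at a fresh log for comparison.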
As time goes by, the server is more heavily utilized in most implementations. Users become more familiar with the system and more productive, meaning they do things faster and place more demands on the server. Additionally, more users are often added to the system. All of these factors can result in a poor-performing system. You can create a new performance log and compare it with the original baseline to locate problem areas. As the counters are discussed in the following sections, remember to evaluate them against a baseline rather than as simple point-in-time measurements.
CPU Counters
The Reliability and Performance Monitor in Windows Server 2008 and the Performance tool (sometimes called System Monitor, but displayed simply as Performance) in Windows Server 2003 R2 and earlier provide several important counters related to the four core components. The key CPU counters are listed under the Processor and Process objects. My favorite Processor counters are the % Processor Time counter, the % User Time counter, and the % Privileged Time counter. These three counters are available in the Processor object and can be monitored for all CPUs or specific CPUs, as shown in Figure 1. They’re also available in the Process object and can be monitored for all processes or individual processes.
If you notice that the % Processor Time counter is high in the Processor object, you might want to monitor it in the Process object for each individual process. Doing so will give you insight into which processes are monopolizing the processor’s time. You might choose to offload some of the processes to a different server or you might even be able to stop running some processes. It’s amazing how many unused processes often run on Windows servers and even these unused processes can impact performance as the Windows kernel must still manage them. Examples of unused processes include startup applications that aren’t used, services that are unneeded, and optional application components that run as separate processes.
The % Processor Time counter includes both user mode and kernel mode activity. Technically, it measures the percentage of time during which the System Idle Process isn’t running; the System Idle Process runs only when no other process is seeking processor time. I usually look for average % Processor Time values greater than 65 to 70 percent before I become concerned about the processor.
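The rule of thumb above is easy to automate. This sketch averages a set of logged % Processor Time values and flags the 65 percent threshold; the sample readings are hypothetical.

```python
def cpu_bottleneck_suspected(processor_time_samples, threshold=65.0):
    """Flag a possible CPU bottleneck when the average % Processor Time
    over the logged period exceeds the rule-of-thumb threshold."""
    average = sum(processor_time_samples) / len(processor_time_samples)
    return average, average > threshold

# Hypothetical sampled values from a performance log.
avg, suspect = cpu_bottleneck_suspected([40.0, 55.0, 80.0, 90.0, 85.0])
```

Remember that a flagged average is a prompt for systems thinking, not a verdict: check memory and disk activity before concluding the CPU itself is the bottleneck.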
The % User Time and % Privileged Time counters let you monitor user mode and kernel mode activities independently. These counters can help you determine whether a bottleneck is occurring within an application or within the OS. However, it’s important to remember the architecture of the Windows OS. Most actions are performed in kernel mode, so it’s not uncommon to see 70 percent or more of the activity occurring within kernel or privileged mode.
Memory Counters
The most valuable memory counters for general server analysis are located in the Memory object, which is shown in Figure 2.
The memory counters that I find most useful are the Available KBytes counter and the Pages/sec counter. Available KBytes sits between the Available Bytes and Available MBytes counters in granularity: tracking kilobytes provides more detail than megabytes without the overwhelming detail of bytes.
The Pages/sec counter tracks the number of virtual memory pages read or written per second. Most systems use a 4KB memory page, so you can multiply the Pages/sec value by 4 to calculate the kilobytes passing to or from the paging file each second, which gives you a better understanding of just how much data moves between RAM and disk each second.
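The conversion is simple arithmetic, sketched here with a hypothetical Pages/sec reading and the common 4KB page size as an assumption:

```python
PAGE_SIZE_KB = 4  # most Windows systems use a 4KB memory page

def paging_kb_per_sec(pages_per_sec):
    """Convert a Pages/sec reading into kilobytes of paging traffic per second."""
    return pages_per_sec * PAGE_SIZE_KB

kb = paging_kb_per_sec(250)  # hypothetical reading: 250 pages/sec -> 1,000KB/sec
```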
Hard Disk Counters
The hard disk counters are divided into two objects: LogicalDisk and PhysicalDisk. The counters are very similar and the difference is in the way the disks are referenced. LogicalDisk references the disk by the drive letter and PhysicalDisk references the disk by the drive number (e.g., drive 0). Both objects show the same information for a selected counter. However, if you want to monitor disk activity for all partitions on a disk, you’ll need to use the PhysicalDisk object. The key counters to watch are Average Disk Queue Length, Disk bytes/sec, and Free Megabytes.
The Average Disk Queue Length counter can reveal whether the drive is keeping up with the demand of running processes. The most frequently cited threshold is two items in the queue. If the average is greater than 2, a drive bottleneck might be occurring. This counter should also be compared with the baseline. If the baseline shows an average of 2.3 items in the disk queue and performance was perceived as acceptable, there’s no reason to suggest that performance is unacceptable at a later time if the average is the same or lower. Remember, performance is measurable with statistics, but whether performance is “good” or “bad” is a relative issue.
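The baseline-first logic described above can be sketched as a small decision helper. The function name and verdict strings are my own illustrative choices; the threshold of 2 is the commonly cited rule of thumb from the text.

```python
def disk_queue_verdict(current_avg, baseline_avg=None, threshold=2.0):
    """Judge Average Disk Queue Length against the baseline when one exists,
    falling back to the commonly cited threshold of 2 otherwise."""
    if baseline_avg is not None:
        return "acceptable" if current_avg <= baseline_avg else "investigate"
    return "acceptable" if current_avg <= threshold else "investigate"
```

For example, an average of 2.3 exceeds the generic threshold, but against a baseline of 2.3 captured during acceptable operations it raises no alarm, which is exactly why the baseline should win when you have one.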
The Disk bytes/sec counter can reveal whether the drive is living up to expectations. Many drives are rated at a certain speed, but they perform at lower speeds. This counter can reveal such behavior. In many cases, updating drive controller drivers might resolve such performance problems.
Free Megabytes isn’t really a performance counter, but it’s very useful in predicting future needs. For example, if you measure the free megabytes for each volume once per month, you can determine consumption rates. With consumption rates documented, you can predict when you’ll need to archive old data or upgrade to larger hard disk drives.
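A minimal forecasting sketch of that idea, assuming a roughly linear consumption rate; the monthly readings are hypothetical:

```python
def months_until_full(monthly_free_mb):
    """Estimate months of headroom from a series of monthly Free Megabytes
    readings, assuming a roughly linear consumption rate."""
    consumed = [a - b for a, b in zip(monthly_free_mb, monthly_free_mb[1:])]
    rate = sum(consumed) / len(consumed)  # average MB consumed per month
    if rate <= 0:
        return None  # free space isn't shrinking; no forecast needed
    return monthly_free_mb[-1] / rate

# Hypothetical readings: 2,000MB consumed per month on average.
remaining = months_until_full([20000, 18000, 16000, 14000])
```

With roughly seven months of headroom in this example, you’d have ample time to schedule an archive job or a disk upgrade before the volume fills.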
Network Interface Counters
The final counters are the network counters. These counters are found in the Network Interface object. The two key network counters are Bytes Total/sec and Output Queue Length. The Bytes Total/sec counter should be compared to the baseline. If this amount has increased dramatically, it could mean the server is more heavily utilized than it was when the baseline was captured; however, it could also be a sign of a network attack or the need to offload some processes. The Output Queue Length counter might help you decide. If this counter is averaging more than 2, it indicates that the network card (or the data rate of the infrastructure) isn’t able to handle the capabilities provided by the server. Stated differently, the server is throwing data at the NIC faster than the NIC can transmit it out on the wire.
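To put a Bytes Total/sec reading in context, it helps to express it as a percentage of the NIC’s rated capacity. This is a rough sketch; the 100Mbps link speed and the sample reading are assumptions you’d replace with your own hardware’s figures, and real throughput ceilings are lower than the rated line speed.

```python
def nic_utilization_pct(bytes_total_per_sec, link_mbps=100):
    """Rough percentage of rated link capacity consumed, from Bytes Total/sec.
    link_mbps is an assumption -- substitute your NIC's rated speed."""
    capacity_bytes_per_sec = link_mbps * 1_000_000 / 8  # megabits/sec to bytes/sec
    return 100.0 * bytes_total_per_sec / capacity_bytes_per_sec

util = nic_utilization_pct(6_250_000)  # hypothetical reading on a 100Mbps link
```

A high utilization figure alongside a growing Output Queue Length average points at the NIC or the wire, rather than the server’s internals, as the bottleneck.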
Now that I’ve discussed the 10 most important counters for tracking the core performance factors in your server, let’s look at the process used to capture them. Use the following instructions to load these counters into the Performance tool in Windows Server 2003 R2 or Windows Server 2003:
After selecting the counters and clicking OK, you should see graphs similar to Figure 3.
By default, the counters are monitored continuously until you stop the process. You might see more or less activity on your server depending on current operations. Loading the performance counters into the Performance tool lets you monitor live activity, but that’s just one way to use this powerful tool. To create a baseline, you must also create a performance log. Use the following instructions to create a log that will capture performance data for any length of time:
You now have a performance log configuration. If you created the log configuration with the 10 counters covered in this article, you have an excellent configuration for creating baselines. Use this log to capture a baseline of your server’s performance when it’s performing well. Then, when users inform you that it’s not performing well, you can run the log again and compare the two log files. Figure 4 shows two line graphs generated in Excel 2007 from comma-separated value (CSV) log files created in the Performance tool.
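Beyond charting the CSV logs in Excel, you can compare them programmatically. This sketch uses simplified inline stand-ins for two exported logs; real Performance tool CSV files carry a timestamp column and one column per counter path, so the column name here is illustrative.

```python
import csv
import io

# Simplified stand-ins for two exported CSV log files -- real Performance
# tool CSVs have a timestamp column and one column per counter path.
baseline_csv = """Time,% Processor Time
09:00,35.0
10:00,45.0
11:00,40.0
"""
current_csv = """Time,% Processor Time
09:00,70.0
10:00,85.0
11:00,75.0
"""

def counter_average(csv_text, column):
    """Average one counter column across all rows of a CSV performance log."""
    rows = csv.DictReader(io.StringIO(csv_text))
    values = [float(row[column]) for row in rows]
    return sum(values) / len(values)

base = counter_average(baseline_csv, "% Processor Time")
curr = counter_average(current_csv, "% Processor Time")
growth = curr - base  # how far current load has drifted from the baseline
```

A large positive drift on a counter tells you where to focus the investigation, which is the whole point of keeping the baseline log around.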
Measure Hardware Performance Against a Baseline
The Performance tool provides counters that can be used to measure the performance of hardware against recommendations or baselines. Capturing the right counters is the key to success with this tool. It’s also important to know that new counters are added every time you install a major Microsoft application (e.g., Microsoft SQL Server, Microsoft Exchange Server, Microsoft IIS).