When you think of a computer system's performance, imagine a chain: The slowest component (or weakest link) affects the performance of the overall system. This weak link in the performance chain is also called a bottleneck. The best indicator that a bottleneck exists is the end user's perception of a lag in a system's or application's response time. To tune a system's performance, you need to determine where—CPU, memory, disk, network, applications, clients, or Windows NT resources—a bottleneck exists. If you add resources to an area that isn't choking your system's performance, your efforts are in vain.
You can use NT Server's native tools (or those of third-party vendors) to optimize the performance of your system and identify potential bottlenecks. NT Server's primary performance tools are Task Manager, which Figure 1, page 42, shows, and Performance Monitor, which Figure 2, page 42, shows. Task Manager can give you a quick look at what's happening in your system. Although it doesn't provide a logging mechanism, Task Manager displays specific information about your system's programs and processes. Task Manager also lets you manage the processes that might be adversely affecting your system. You can use Performance Monitor to obtain more detailed performance information (in the form of charts, alerts, and reports that specify both current activity and ongoing logging) based on system events. The Microsoft Windows NT Server 4.0 Resource Kit also contains tools that you can use for troubleshooting. (For a sampling of NT performance-monitoring tools, see Table 1, page 43.)
Before you start performance tuning, you must understand your system. You should know what server hardware you have, how NT operates, what applications you're running, who uses the system, what kind of workload the system handles, and how your system fits into the network infrastructure. You also need to establish a performance baseline that tells you how your system uses its resources during periods of typical activity. (You can use Performance Monitor to establish your baseline.) Until you know how your system performs over time, you won't be able to recognize slowdowns or improvements in your NT server's performance. Include as many objects in your baseline measurements as possible (e.g., memory, processor, system, paging file, logical disk, physical disk, server, cache, network interface). At a minimum, include all four major resources (i.e., memory, processor, disk, and network interface) when taking a server's baseline measurements—regardless of server function (e.g., file server, print server, application server, domain server).
Because all four of a server's major resources are interrelated, locating a bottleneck can be difficult. Resolving one problem can cause another. When possible, make one change at a time, then compare your results with your baseline to determine whether the change was helpful. If you make several changes before performing a comparison, you won't know precisely what works and what doesn't work. Always test your new configuration, then retest it to be sure changes haven't adversely affected your server. Additionally, always document your processes and the effects of your modifications.
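The compare-against-baseline step can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the counter names and the averaged values are hypothetical, and the sketch assumes you have already exported averages from your own Performance Monitor logs.

```python
def compare_to_baseline(baseline, current):
    """Return each counter's percentage change relative to its baseline value."""
    changes = {}
    for counter, base_value in baseline.items():
        if counter not in current or base_value == 0:
            continue  # skip counters that cannot be compared
        changes[counter] = (current[counter] - base_value) / base_value * 100.0
    return changes

# Hypothetical averages taken before and after ONE configuration change.
baseline = {"Memory: Pages/sec": 4.0, "Processor: % Processor Time": 40.0}
after_change = {"Memory: Pages/sec": 3.0, "Processor: % Processor Time": 50.0}

for counter, pct in compare_to_baseline(baseline, after_change).items():
    print(f"{counter}: {pct:+.1f}% vs. baseline")
```

In this made-up case, paging dropped while processor load rose, which is the kind of trade-off you can attribute to a single change only if you made that change in isolation.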
Memory
Insufficient memory is a common cause of bottlenecks in NT Server. A memory deficiency can disguise itself as other problems, such as an overloaded CPU or slow disk I/O. The best first indicator of a memory bottleneck is a sustained high rate of hard page faults (e.g., more than five per second). Hard page faults occur when a program can't find the data it needs in physical memory and therefore must retrieve the data from disk. You can use Performance Monitor to determine whether your system is suffering from a RAM shortage. The following counters are valuable for viewing the status of a system's memory:
You can instruct NT Server to tune the memory that you have in your system. In the Control Panel Network applet, go to the Services tab and select Server. When you click Properties, a dialog box presents four optimization choices, as Figure 3 shows: Minimize Memory Used, Balance, Maximize Throughput for File Sharing, and Maximize Throughput for Network Applications. Another parameter that you can tune—on the Performance tab of the System Properties dialog box—is the virtual memory subsystem (aka the pagefile).
If you have a multiuser server environment, you'll be particularly interested in two of these memory-optimization strategies: Maximize Throughput for File Sharing and Maximize Throughput for Network Applications. When you select Maximize Throughput for File Sharing, NT Server allocates the maximum amount of memory for the file-system cache. (This process is called dynamic disk buffer allocation.) This option is especially useful if you're using an NT Server machine as a file server. Allocating all memory for file-system buffers generally enhances disk and network I/O performance. By providing more RAM for disk buffers, you increase the likelihood that NT Server will complete I/O requests in the faster RAM cache instead of in the slower file system on the physical disk.
When you select Maximize Throughput for Network Applications, NT Server allocates less memory for the file-system cache so that applications have access to more RAM. This option optimizes server memory for distributed applications that perform memory caching. You can tune applications (e.g., Microsoft SQL Server, Exchange Server) so that they use specific amounts of RAM for buffers for disk I/O and database cache.
However, if you allocate too much memory to each application in a multiapplication environment, excessive paging can turn into thrashing. Thrashing occurs when all active processes and file-system cache requests become so large that they overwhelm the system's memory resources. When thrashing occurs, requests for RAM create hard page faults at an alarming rate, and the OS devotes most of its time to moving data in and out of virtual memory (i.e., swapping pages) rather than executing programs. Thrashing quickly consumes system resources and typically increases response times. If an application you're working with stops responding but the disk drive LED keeps blinking, your computer is probably thrashing.
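The hard-page-fault indicator mentioned earlier in this section (a sustained rate of more than five per second) depends on the word "sustained": a brief spike isn't a bottleneck. The following Python sketch shows one way to separate the two cases in logged samples. The sample values and the choice of five consecutive readings are assumptions made for illustration only.

```python
def is_sustained(samples, threshold, min_consecutive):
    """Return True if samples stay above threshold for min_consecutive readings in a row."""
    run = 0
    for value in samples:
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False

# Hypothetical hard-page-fault readings (faults per second), one per sampling interval.
spiky = [1, 12, 2, 0, 9, 1, 3, 0]
steady = [2, 6, 8, 7, 9, 11, 6, 3]

print(is_sustained(spiky, threshold=5, min_consecutive=5))
print(is_sustained(steady, threshold=5, min_consecutive=5))
```

The first series contains isolated spikes and doesn't qualify. The second stays above the threshold for several consecutive intervals and would warrant a closer look at memory.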
To ease a memory bottleneck, you can increase the size of the pagefile or spread the pagefile across multiple disks or controllers. An NT server can contain as many as 16 pagefiles at one time and can read and write to multiple pagefiles simultaneously. If disk space on your boot volume is limited, you can move the pagefile to another volume to achieve better performance. However, for the sake of recoverability, you might want to place a small pagefile on the boot volume and maintain a larger file on a different volume that offers more capacity. Alternatively, you might want to place the pagefile on a hard disk (or on multiple hard disks) that doesn't contain the NT system files or on a dedicated non-RAID FAT partition.
I also recommend that you spread memory-intensive applications across several machines and that you disable or uninstall unnecessary services, device drivers, and network protocols. In addition, through registry editing, you can enable an NT server to use more than 256KB of Level 2 cache. Start regedit.exe, go to the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management subkey, and double-click SecondLevelDataCache. Select Decimal as the base, and enter the amount of Level 2 cache that you have (e.g., 512 if you have 512KB). Then, click OK, close the registry editor, and reboot.
Processor
To determine whether an NT Server machine has a CPU bottleneck, first ensure that the system doesn't have a memory bottleneck. A CPU bottleneck occurs when the processor is so busy that it can't respond to requests promptly. Symptoms include sustained high processor utilization, sustained long processor queues, and poor application response. Common causes are CPU-bound applications and drivers and excessive interrupts (which badly designed disk or network-subsystem components generate).
You can use the following counters to view the status of your system's CPU utilization:
One way to resolve processor bottlenecks is to upgrade to a faster CPU (if your system board supports it). If you have a multiuser system that's running multithreaded applications, you can obtain more processor power by adding CPUs. (If a process is multithreaded, adding a processor improves performance. If a process is single-threaded, a faster processor improves performance.) However, if you're running the single-processor NT kernel, you might need to update the kernel to the multiprocessor version. To do so, reinstall the OS or use the resource kit's uptomp.exe utility.
Another way to tune CPU performance is to use Task Manager to identify the processes that are consuming the most CPU time, then adjust the priority of those processes. A process starts with a base priority level, and its threads can deviate two levels higher or lower than the base. If you have a busy CPU, you can boost a process's priority level to give that process a larger share of CPU time. To do so, press Ctrl+Shift+Esc (or press Ctrl+Alt+Del and click Task Manager) to open Task Manager, then go to the Processes tab. Right-click the process, choose Set Priority, and select a value of High, Normal, or Low, as Figure 4 shows. The priority change takes effect immediately, but this fix is temporary: After you reboot the system or stop and start the application, you'll lose the priority that you set. To ensure that an application always starts at a specified priority level, you can use the Start command from the command line or within a batch script. To review the Start command's options, enter
start /?
at the command prompt.
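As a rough illustration, the following Python sketch builds the kind of line you might place in a batch script to launch a program at a given priority. The program path is hypothetical, and the sketch assumes that your version of the Start command accepts the /low, /normal, and /high switches and an optional quoted window title. Confirm both against the output of start /? before you rely on the result.

```python
# Priority names mapped to the Start switches this sketch assumes are available.
PRIORITY_SWITCHES = {"low": "/low", "normal": "/normal", "high": "/high"}

def build_start_command(program, priority="normal"):
    """Return a batch-file line that launches program at the given priority."""
    switch = PRIORITY_SWITCHES[priority.lower()]
    # The empty "" is the window title; it keeps a quoted program path
    # from being interpreted as the title.
    return f'start "" {switch} "{program}"'

# Hypothetical application path.
print(build_start_command(r"C:\apps\report.exe", "high"))
```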
Disk
To determine whether your system is experiencing disk bottlenecks, first ensure that the problem isn't occurring because of insufficient memory. A disk bottleneck is easy to confuse with pagefile activity resulting from a memory shortage. To help you distinguish between disk activity related to Virtual Memory Manager's paging to disk and disk activity related to applications, keep pagefiles on separate, dedicated disks.
Before you use Performance Monitor to examine your disks, you must understand the difference between its two disk objects, LogicalDisk and PhysicalDisk. LogicalDisk counters measure activity on logical partitions and other high-level items (e.g., stripe sets, volume sets). These counters are useful for determining which partition is causing the disk activity, which can help you identify the application or service that's generating the requests. PhysicalDisk counters measure activity on an entire physical disk, regardless of how you've partitioned the disk or how you're using it.
NT doesn't enable Performance Monitor disk counters by default; you must enable them manually. Enabling these counters will result in a 2 to 5 percent performance hit on your disk subsystem. To activate Performance Monitor disk counters on the local computer, type
diskperf -y
at a command prompt. (If you're monitoring RAID, use the -ye switch instead.) Then restart the computer so that the change takes effect.
To analyze disk-subsystem performance and capacity, monitor the Performance Monitor's disk-subsystem counters. The following counters are available under both LogicalDisk and PhysicalDisk:
If you determine that your disk subsystem is experiencing a bottleneck, you can implement several solutions. You can add a faster disk controller, add more disk drives in a RAID environment (spreading the data across multiple physical disks improves performance, especially during reads), or add more memory (to increase file cache size). You also might try defragmenting the disk, changing to a different I/O bus architecture, placing multiple partitions on separate I/O buses (particularly if a disk has an I/O-intensive workload), or choosing a new disk with a low seek time (i.e., the time necessary to move the disk drive's heads from one data track to another). If your file system is FAT, remember that NTFS is best for volumes larger than 400MB.
You can also provide more disk spindles to the application. How you organize your data depends on your data-integrity requirements. Use striped volumes to process I/O requests concurrently across multiple disks, to facilitate fast reading and writing, and to improve storage capacity. When you use striped volumes, disk utilization per disk decreases and overall throughput increases because the system distributes work across the volumes.
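To see why striping spreads work across disks, consider how a striped volume without parity maps logical offsets to physical disks. The Python sketch below uses simple round-robin placement. The 64KB stripe unit and the three-disk set are assumptions chosen for illustration, not values taken from any particular configuration.

```python
def locate_in_stripe(offset, stripe_size, disk_count):
    """Map a logical byte offset to (disk index, byte offset on that disk)."""
    stripe_number = offset // stripe_size      # which stripe unit, counted across the whole volume
    disk_index = stripe_number % disk_count    # stripe units rotate across the disks in turn
    row = stripe_number // disk_count          # stripe units that precede this one on the same disk
    return disk_index, row * stripe_size + offset % stripe_size

STRIPE_SIZE = 64 * 1024  # assumed stripe unit size in bytes
DISKS = 3                # assumed number of disks in the set

for offset in (0, 64 * 1024, 128 * 1024, 192 * 1024 + 100):
    print(offset, locate_in_stripe(offset, STRIPE_SIZE, DISKS))
```

Consecutive stripe units land on different disks, so a large sequential transfer or several concurrent requests can keep multiple spindles busy at once.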
Consider matching the file system's allocation unit size to the application block size to improve the efficiency of disk transfers. However, increasing the cluster size doesn't always improve disk performance. If the partition contains many small files, a smaller cluster size might be more efficient. You can change the cluster size in two ways. At the command line, enter
format <drive>: /FS:NTFS /A:<size>
or use Disk Administrator. Select Tools, Format, and change the allocation unit size. NTFS supports a cluster size of 512 bytes, 1024 bytes, 2048 bytes, 4096 bytes, 8192 bytes, 16KB, 32KB, or 64KB. FAT supports a cluster size of 8192 bytes, 16KB, 32KB, 64KB, 128KB, or 256KB.
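The trade-off between cluster size and small files comes down to slack: the unused remainder of the last cluster that each file occupies. The following Python sketch estimates slack for a handful of files. The file sizes are hypothetical, and the model is deliberately simplified: it assumes that every file consumes whole clusters and ignores file-system metadata and any special handling of very small files.

```python
def allocated_bytes(file_sizes, cluster_size):
    """Total space consumed when every file occupies whole clusters."""
    # -(-a // b) is integer ceiling division.
    return sum(-(-size // cluster_size) * cluster_size for size in file_sizes)

small_files = [300, 1500, 2000, 700, 5000]  # hypothetical file sizes in bytes
data = sum(small_files)

for cluster in (512, 4096, 65536):
    used = allocated_bytes(small_files, cluster)
    print(f"{cluster:>6}-byte clusters: {used} bytes allocated, {used - data} bytes of slack")
```

For these small files, allocated space grows sharply as the cluster size increases, which is why a large allocation unit suits volumes that hold a few large files better than volumes that hold many small ones.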
Network Interface
After you consider a system's memory, CPU, and disk metrics, your next step is to examine the network subsystem. Client machines and other systems must be able to connect quickly to the NT Server system's network I/O subsystem so that the server can provide acceptable response times to end users. To determine where network bottlenecks reside and how to fix them, you must understand what type of workload your client systems generate, which key network architecture components are in use, and what type of network protocol (e.g., TCP/IP, NetBEUI) and physical network (e.g., Ethernet) you're on. Performance Monitor collects data for each physical network adapter. To determine how busy your adapters are, use the following counters:
If you determine that the network subsystem is experiencing a bottleneck, you can implement numerous measures to alleviate the problem. You can bind your network adapter to only those protocols that are currently in use, upgrade your network adapters to the latest drivers, upgrade to better adapters, or add adapters to segment the network (so that you can isolate traffic to appropriate segments). Check overall network throughput to confirm whether the constraint is in the network plumbing, and if it is, improve physical-layer components (e.g., switches, hubs). You might also try distributing the processing workload to additional servers.
In a TCP/IP network, you can adjust the TCP window size for a potential improvement in performance. The TCP/IP receive window size specifies the amount of receive data (in bytes) that the system can buffer at one time on a connection. In NT, the window size is fixed and defaults to 8760 bytes for Ethernet, but you can adjust the window size in the registry. You can either modify the TcpWindowSize value under the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters subkey to change the setting globally on the computer or use the setsockopt() Windows Sockets call to change the setting on a per-socket basis. The optimal size depends on your network architecture. In TCP, the maximum achievable throughput on a connection equals the window size divided by the round-trip delay (network latency).
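The relationship between window size, round-trip delay, and throughput is easy to work through numerically. The Python sketch below applies it in both directions: the first function gives the throughput ceiling for a given window, and the second gives the window needed to fill a given link. The round-trip times and the 10Mbps link speed are hypothetical inputs; only the 8760-byte window comes from the discussion above.

```python
def max_throughput_bytes_per_sec(window_bytes, rtt_ms):
    """Upper bound on one connection's throughput: window size / round-trip time."""
    return window_bytes * 1000 / rtt_ms

def window_needed_bytes(bandwidth_bits_per_sec, rtt_ms):
    """Window size required to keep a link of the given speed full."""
    return bandwidth_bits_per_sec / 8 * rtt_ms / 1000

WINDOW = 8760  # window size in bytes

for rtt_ms in (1, 10, 100):  # hypothetical round-trip times
    ceiling = max_throughput_bytes_per_sec(WINDOW, rtt_ms)
    print(f"RTT {rtt_ms:>3} ms: at most {ceiling:,.0f} bytes/sec per connection")

# Hypothetical case: a 10Mbps link with a 100 ms round-trip time.
print(f"Window to fill the link: {window_needed_bytes(10_000_000, 100):,.0f} bytes")
```

On a low-latency LAN, the window rarely limits throughput. As round-trip time grows, the same window caps each connection at a smaller and smaller fraction of the link.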
Finally, don't use Autosense mode in your network adapters. Set your NICs to the precise speed that you want. To change the setting, use the configuration program that came with your network adapter.
Understand Your Environment
Performance analysis requires logical thinking, testing, and patience. NT's primary monitoring and tuning tools can help you manage the performance of your company's systems.
The key to achieving your objective is understanding what you have, how your applications work, and how your users use your network. To resolve any performance problems and plan for future requirements, combine the knowledge you gain from these tools' output with an understanding of your applications and your environment.