Part 1

You've just purchased a new server, installed the operating system and the applications, and connected the system to the network. At first, everything works well. But as the user load increases, the system slows to a crawl. So you launch that Windows NT Server application that automatically tunes every aspect of your system, and users are happy again.

Wake up from your dream. NT Server doesn't work quite like that. As with other operating systems, you need to tune NT Server to use the server's limited resources efficiently so you can ultimately provide good service for the end users. Although you can't solve all your tuning nightmares with one stroke, you can get ahead in the tuning game with some good planning, a basic tuning methodology, configuration testing, and proactive NT Server monitoring.

Tuning Strategy
To get the most out of NT Server, you need to plan your system layout before you initially load the operating system, and you have to thoroughly understand your server hardware. Then before you unleash your server on the user community, you need to test the configuration to be sure that your server is working as you've planned.

To optimize performance, use the tools provided by NT Server and third-party vendors to monitor your system and identify potential bottlenecks. If you find a bottleneck, try to determine its cause; sometimes the cause is elusive. Make one change at a time to fix the problem; if you make several changes at once, you will have trouble determining which changes helped and which didn't. If you must make multiple changes, compare the results with your baseline to find out whether the changes were helpful. Always test your new configuration, then test it again, to ensure that the changes you've made have not adversely affected your server.

The first step in optimizing performance is tuning your hardware. Then look at how your operating environment and applications are using memory, establish baselines to identify problem areas, and systematically document the effects of modifications you make. This article will give you a good idea where to start. Next month, I'll talk about tuning disk and network I/O and CPU.

Tuning the Hardware
A good administrator has to understand both hardware and software. Before you load NT Server, think about which applications it will run, the hardware it will run on, and the goals you want NT Server to accomplish today and tomorrow.

When you're starting at ground level with your hardware, be sure that you have set the server's BIOS settings for maximum performance and stability. Have you turned on write-back cache and zero wait states for memory and set the CPU speed to fast? (OK, fast CPU speed is obvious, but I have seen what happens when a CPU is set to a lower setting.) Ask your hardware's manufacturer for the most recent BIOS release level and optimum BIOS settings for the system's architecture. Most hardware manufacturers have this information on their Web site. Microsoft keeps similar information on its Web site about NT Server patches or service packs. These patches attempt to fix known problems and occasionally include performance enhancements.

Getting to Know Your System: Baselines
Before you can remove bottlenecks, you need to understand how NT Server controls and interacts with your entire system. All the server's major resources--CPU, memory, disk I/O, network I/O, and applications--are interrelated, so solving one problem can sometimes cause another. Therefore, you need to know what effects your solution will have. Two tools included with NT Server 4.0 for monitoring your system's performance are Performance Monitor (Perfmon) and Task Manager. Although the NT Workstation and NT Server resource kits offer other tools (e.g., pviewer.exe and pmon.exe) for monitoring specific processes, I'll discuss the functionality of only the built-in tools.

Perfmon. You can use Perfmon in a variety of ways to observe NT Server's use of resources. Collecting performance data over a significant period can help you develop a performance baseline for detecting bottlenecks and for determining whether your tuning efforts are succeeding. Typically, a sampling interval of 10 minutes is sufficient. For finer-grained transaction workloads, set the sampling interval between 5 seconds and 10 seconds; for coarser-grained workloads, you can set the interval as high as 20 minutes.

To log performance data, start Perfmon from the Start, Programs, Administrative Tools menu. To enter logging mode, select View, Log, Edit, Add to log. Select all objects, then click Add and then Done. Screen 1 shows how to begin the logging session: Select Options, Log, enter the name of your log file (e.g., test1-perfmon-log), specify a sampling interval, and then click Start Log.

Perfmon uses about the same amount of resources to collect measurements from one object or many objects from NT Server's performance library DLLs, so collect them all. The objects you haven't measured inevitably are the ones you'll need in later analyses. Perfmon introduces a small amount of overhead (less than 4 percent) on your server. Perfmon's overhead is slightly greater in some cases: if your sampling period is very short (less than 3 seconds), if you run Perfmon in chart mode, or if you have the disk counters turned on.

When you are looking at disk statistics, you must first activate the disk counters by entering the diskperf -ye command at the command prompt. The y option tells the system to start the disk performance counters when you restart the system; the e option lets you measure physical drives in a striped set. One last note on Perfmon: If you open an active log for viewing, Perfmon stops logging to it. To avoid this problem, start a second copy of Perfmon and open the current log to view the data that the system is logging. "The Windows NT Performance Monitor" in the March 1997 issue gives you more information about using Perfmon.

Task Manager. The NT Task Manager lets you get a quick look at the current state of the system. Task Manager displays specific information about the processes running on the system; its Processes view is similar to that of the UNIX tool top. A brief snapshot rarely justifies major tuning decisions by itself, but it can help you identify which processes currently demand a great amount of resources. This information helps you know what to observe with Perfmon.

Start Task Manager by right-clicking on the task bar. From the Processes tab, select View, Selected columns, to choose which process information you want to observe. In Screen 2, I chose to observe process identifier (PID), CPU usage, CPU time, memory usage, page faults, and base priority. This observation tells you which processes have the highest CPU and memory usage, and therefore, which applications are bogging down the server.

Eliminating Wasted Resources
Beyond knowing which processes are currently running, you need to know what the processes do. Why run a service or application in the background if you don't need it? For example, why have Remote Access Service (RAS) turned on if your NT Server does not require RAS? By turning off unnecessary background processes, you free resources for applications and portions of NT Server that really need the resources. Knowing your system and understanding the functions you want NT Server to perform help you tune the server. Under Task Manager, Processes, you can terminate a process by right-clicking on the process and then selecting End Process (similar to the UNIX kill command).

Unfortunately, ending an unneeded process is only a temporary fix. If you started the process as a background service, the process will start again when you reboot the server. As Screen 3 shows, from Control Panel, Services, you can control whether background processes start automatically or manually or are disabled when you boot the server. Be careful which processes you kill, because some processes are essential for NT Server operation. The NT Workstation and NT Server resource kits explain background services well.

Tuning Memory Resources
Insufficient memory is one of the most common bottlenecks you'll encounter on NT Server. RAM is a limited resource, and NT Server lets you tune how it uses its memory subsystem. From Control Panel, Network, Services, select Server, then click Properties. This dialog box offers the five options shown in Screen 4, page 125. The Help button in this dialog box provides some basic information about the relationships of these options.

Maximize Throughput for File Sharing. For most multiuser environments, two options for tuning NT Server's memory strategy are of particular interest: Maximize Throughput for File Sharing and Maximize Throughput for Network Applications. When you select Maximize Throughput for File Sharing, NT Server uses all available memory for the file system cache (dynamic disk buffer allocation). This option is a good choice when you use your machine only as a file server. Allocating all available memory to file system buffers generally enhances disk and network I/O performance: the more RAM the disk buffers have, the greater the probability that NT Server can complete an I/O request from the fast RAM cache instead of touching the slower file system on disk. However, this strategy backfires if you run any other applications on the file server: When you start a client/server application such as Microsoft SQL Server or another memory-intensive application, the server might begin to page (swap information between RAM and disk) excessively.

Let me briefly explain paging. A process's working set is the set of its code and data pages that currently reside in RAM. When the process touches a page that is not in its working set, a page fault occurs. If the page is still in RAM--for example, the virtual memory manager trimmed it earlier but hasn't yet reused the page frame--the fault is a soft page fault, and the virtual memory manager simply returns the page to the working set without any disk I/O. If the page must be read from disk, the fault is a hard page fault. When RAM becomes scarce (because of other processes' working sets or RAM that the file system cache is using), the virtual memory manager trims older pages (4KB pages for Intel and 8KB pages for Alpha CPUs) from working sets and, if necessary, writes modified pages to the paging file (Pagefile.sys) on disk.

When the virtual memory manager trims another process's working set to fulfill the currently running process's request, it has, in essence, stolen some RAM, and subsequent references to the trimmed pages cause more page faults. Occasional hard page faults are acceptable, but if hard paging occurs excessively over a period of time, system resources become unbalanced and a memory bottleneck forms.
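The mechanics of working sets, trimming, and the two kinds of faults can be sketched in a toy model. This is a minimal illustration, not how NT's virtual memory manager is implemented: it assumes a fixed per-process working-set limit, a "standby" list of trimmed pages still in RAM (a soft fault retrieves from there without disk I/O), and a hard fault whenever a page must come back from disk.

```python
# Toy model of soft vs. hard page faults under a fixed RAM budget.
# Assumption: trimmed pages sit on a standby list until their frames are
# reused; touching a standby page is a soft fault, touching a page that
# was paged out to disk is a hard fault.
from collections import OrderedDict

class ToyVM:
    def __init__(self, ws_limit, standby_limit):
        self.ws_limit = ws_limit            # max resident working-set pages
        self.standby_limit = standby_limit  # trimmed pages kept in RAM
        self.ws = OrderedDict()             # working set, LRU order
        self.standby = OrderedDict()        # trimmed but still in RAM
        self.soft = 0
        self.hard = 0

    def touch(self, page):
        if page in self.ws:
            self.ws.move_to_end(page)       # recently used, no fault
            return
        if page in self.standby:            # still in RAM: soft fault
            del self.standby[page]
            self.soft += 1
        else:                               # must come from disk: hard fault
            self.hard += 1
        self.ws[page] = True
        if len(self.ws) > self.ws_limit:    # trim the LRU page to standby
            victim, _ = self.ws.popitem(last=False)
            self.standby[victim] = True
            if len(self.standby) > self.standby_limit:
                self.standby.popitem(last=False)  # frame reused; page on disk
```

With a working-set limit of 2 pages, cycling through three pages forces a trim on every touch; repeat the cycle and every fault resolves from the standby list as a soft fault. Shrink the standby list and the same workload degrades into repeated hard faults, which is exactly the imbalance the article describes.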

To determine whether your system is paging, use Perfmon to observe the relationship of the metrics shown in Table 1. One strong indicator of a memory bottleneck is that the Pages/sec counter is high (greater than 50) and growing compared with your baseline. NT Server sets aside a minimum amount of RAM for kernel resources that are never paged to disk; the average NT Server will not let its last 4MB of RAM be paged (similar to the SVR4 UNIX minfree/lotsfree memory thresholds). If the Available bytes counter is also decreasing to the minimum NT Server goal of 4MB and the disk drives that house the Pagefile.sys files are busy (marked by an increase in %Disk Time, Disk bytes/sec, and increased Average Disk Queue Length), you have undoubtedly identified a memory bottleneck.
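The rule of thumb above can be encoded as a simple check over Perfmon samples. The thresholds (Pages/sec above 50, Available Bytes near the 4MB floor, busy disks holding Pagefile.sys) come from the text; the function name, the sample format, and the exact disk-busy cutoff are my own illustrative assumptions.

```python
# Hypothetical helper: flag a memory bottleneck when paging is sustained,
# available RAM is near the 4MB floor, and the Pagefile.sys disks are busy.
FOUR_MB = 4 * 1024 * 1024

def looks_memory_bound(samples):
    """samples: list of dicts with keys 'pages_per_sec', 'available_bytes',
    and 'pagefile_disk_pct' (%Disk Time on the drive holding Pagefile.sys)."""
    if not samples:
        return False
    high_paging = all(s['pages_per_sec'] > 50 for s in samples)
    near_floor = samples[-1]['available_bytes'] <= 1.25 * FOUR_MB
    busy_disk = (sum(s['pagefile_disk_pct'] for s in samples)
                 / len(samples)) > 50
    return high_paging and near_floor and busy_disk
```

The point of requiring all three signals together is the same one the article makes: a high Pages/sec reading alone can be a transient spike, but combined with shrinking Available Bytes and busy paging disks it is a strong bottleneck indicator relative to your baseline.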

Maximize Throughput for Network Applications. The second option for tuning NT Server's memory strategy is Maximize Throughput for Network Applications. When you select this option, NT Server allocates less RAM for the file system cache so that running applications can have access to more RAM. With this option, you get into applications tuning. When you configure applications such as SQL Server or Microsoft Exchange, you can tune them to use specified amounts of RAM for areas such as buffers for disk I/O and database cache.

Knowing what is running on your system is particularly important here: If you allocate too much memory to each application in a multi-application environment, excessive paging can turn into thrashing, and you will have one slow system. Thrashing is the state in which the combined memory demands of all active processes and the file system cache overwhelm the system's RAM. When this condition occurs, requests for RAM create hard page faults at an astounding rate, and the virtual memory manager begins stealing pages from one process just to fulfill the request of another. On a busy system, the increased workload of the virtual memory manager consumes more disk resources as use of the Pagefile.sys increases. The server then wastes CPU cycles on memory management functions instead of servicing productive processes. Thrashing can quickly consume an inordinate amount of system resources and typically increases users' response times considerably.

You can accept some paging, or even continuous paging, if response times for end users remain reasonable. You can use third-party remote terminal emulation and remote workstation emulation tools to test system loading under conditions that emulate your environment. This testing provides informative measurements in areas such as user response times and overall system throughput.

Additional Memory Tuning
You can improve the system's memory/paging performance when loads are heavy by spreading the paging file across two disks. This modification improves the overall paging file read/write rates, because more disks are available to process the paging file workload.

During installation, NT Server creates one Pagefile.sys file on the root (C) drive. To spread the load, review the disk metrics you have gathered with Perfmon and select two disks that are under the lightest load. Then, as Screen 5 shows, from Control Panel, System, Performance, select Virtual Memory and create two new paging files, one on each disk. After the new paging file systems are in place, remove the default Pagefile.sys on the root disk.

As a guide to determining the Pagefile.sys sizes on the new disks, use Perfmon to monitor the % Usage and % Usage Peak counters of the Paging File object. Usually you will create pagefiles of the same size on both disks, with an initial size at least as large as the space that % Usage shows the system using and a maximum at least as large as the space that % Usage Peak implies.
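The sizing arithmetic is simple once you know the current Pagefile.sys size and the two counter values. The sketch below is illustrative only; the function name and the round-up-to-whole-megabytes convention are my own assumptions, not part of NT's configuration dialogs.

```python
# Illustrative pagefile sizing: the % Usage and % Usage Peak counters are
# percentages of the current pagefile size, so multiply them out to get
# suggested initial and maximum sizes in megabytes.
import math

def pagefile_sizes(current_mb, usage_pct, peak_pct):
    """Return a (initial_mb, maximum_mb) suggestion for a new pagefile."""
    initial = math.ceil(current_mb * usage_pct / 100)
    maximum = max(math.ceil(current_mb * peak_pct / 100), initial)
    return initial, maximum
```

For example, a 128MB pagefile showing 40 percent usage and a 75 percent peak suggests an initial size of about 52MB and a maximum of at least 96MB; split the load by creating a pagefile of roughly that size on each of the two disks.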

Sizing the Pagefile.sys correctly ensures that NT Server does not waste cycles creating larger Pagefile.sys files. If possible, dedicate two disk drives to the task of containing the paging file systems. This approach guarantees that no other application or process will contend with NT Server when the system needs the paging file system.

If the system begins to page to an unacceptable degree, use Perfmon and Task Manager to isolate the applications or processes that are draining excessive memory and tune down the memory allocated to them (if possible). If the application source code is available, you can work with the developers to improve overall memory performance. When all tuning efforts fail to improve user response times as a result of lack of memory, place more memory into the system or distribute memory-intensive applications to the appropriate number of additional servers.

Knowledge Is Power
When you know which processes are running, what the processes do, and how to reallocate memory to fit your needs, you're getting the most out of your computing power. Next month, I discuss tuning the other components of NT Server: disk I/O, network I/O, and CPU.