Part 1
You've just purchased a new server, installed the operating system and the applications, and
connected the system to the network. At first, everything works well. But as the user load
increases, the system slows to a crawl. So you launch that Windows NT Server application that
automatically tunes every aspect of your system, and users are happy again.
Wake up from your dream. NT Server doesn't work quite like that. As with other operating
systems, you need to tune NT Server to use the server's limited resources efficiently so you can
ultimately provide good service for the end users. Although you can't solve all your tuning
nightmares with one stroke, you can get ahead in the tuning game with some good planning, a basic
tuning methodology, configuration testing, and proactive NT Server monitoring.
Tuning Strategy
To get the most out of NT Server, you need to plan your system layout before you initially load
the operating system, and you have to thoroughly understand your server hardware. Then before you
unleash your server on the user community, you need to test the configuration to be sure that your
server is working as you've planned.
To optimize performance, use the tools provided by NT Server and third-party vendors to monitor
your system and identify potential bottlenecks. If you find a bottleneck, try to determine its
cause; sometimes the cause is elusive. Make one change at a time to fix the problem. If you make
several changes at once, you will have trouble determining what works and what doesn't. After each
change, compare the results with your baseline to find out whether the change was helpful. Always
test your new configuration, then test it again, to ensure that the changes you've made have not
adversely affected your server.
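Comparing against a baseline comes down to simple arithmetic. For each counter, compute the percent change from the baseline average to the average after the change. The Python sketch below assumes you have already exported average counter values from Perfmon (that step isn't shown). The function names and counter values are illustrative, not measurements.

```python
def percent_change(baseline: float, current: float) -> float:
    """Percent change of the current value relative to the baseline value."""
    if baseline == 0:
        raise ValueError("baseline value must be non-zero")
    return (current - baseline) / baseline * 100.0


def compare_to_baseline(baseline: dict, current: dict) -> dict:
    """Percent change for every counter that was measured in both runs."""
    return {
        counter: percent_change(baseline[counter], current[counter])
        for counter in baseline
        if counter in current
    }


if __name__ == "__main__":
    # Hypothetical averages from a baseline log and a log taken after one change.
    baseline = {"Memory: Pages/sec": 20.0, "Processor: % Processor Time": 40.0}
    after_change = {"Memory: Pages/sec": 35.0, "Processor: % Processor Time": 38.0}
    for counter, delta in compare_to_baseline(baseline, after_change).items():
        print(f"{counter}: {delta:+.1f}%")
```

With these made-up numbers, the sketch reports +75.0% for Pages/sec and -5.0% for % Processor Time. The paging increase would be the result to investigate.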
The first step in optimizing performance is tuning your hardware. Then look at how your
operating environment and applications are using memory, establish baselines to identify problem
areas, and systematically document the effects of the modifications you make. This article will give
you a good idea of where to start. Next month, I'll talk about tuning disk I/O, network I/O, and CPU.
Tuning the Hardware
A good administrator has to understand both hardware and software. Before you load NT Server,
think about which applications the server will run, the hardware NT Server will run on, and the goals
you want NT Server to accomplish today and tomorrow.
When you're starting at ground level with your hardware, be sure that you have set the server's
BIOS settings for maximum performance and stability. Have you turned on write-back cache and zero
wait states for memory and set the CPU speed to fast? (OK, fast CPU speed is obvious, but I have
seen what happens when a CPU is set to a lower setting.) Ask your hardware's manufacturer for the
most recent BIOS release level and the optimum BIOS settings for the system's architecture. Most
hardware manufacturers post this information on their Web sites. Similarly, Microsoft posts NT Server
patches and service packs on its Web site. These patches attempt to fix known problems and
occasionally include performance enhancements.
Getting to Know Your System: Baselines
Before you can remove bottlenecks, you need to understand how NT Server controls and interacts
with your entire system. All the server's major resources (CPU, memory, disk I/O, network I/O, and
applications) are interrelated, so solving one problem can sometimes cause another. Therefore, you
need to know what effects your solution will have. NT Server 4.0 includes two tools for monitoring
your system's performance: Performance Monitor (Perfmon) and Task Manager. Although the NT
Workstation and NT Server resource kits offer other tools for monitoring specific processes (e.g.,
pviewer.exe, pmon.exe, and sc), I'll discuss only the built-in tools.
Perfmon. You can use Perfmon in a variety of ways to observe NT Server's use of
resources. Collecting performance data over a significant period can help you develop a performance
baseline for detecting bottlenecks and for determining whether your tuning efforts are succeeding.
Typically, a sampling interval of 10 minutes is sufficient. For finer-grained transaction workloads,
set the sampling interval to between 5 and 10 seconds; for coarser-grained workloads, set the
interval as high as 20 minutes.
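To see what the interval choice means in practice, count the samples each interval produces over a logging period. The sketch below assumes an 8-hour logging period, chosen only for illustration. It does not estimate log-file size, because that size also depends on how many objects you log.

```python
def samples_collected(interval_seconds: int, logging_hours: float) -> int:
    """Number of samples recorded at a fixed interval over a logging period."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return int(logging_hours * 3600 // interval_seconds)


if __name__ == "__main__":
    # Intervals mentioned in the text, over a hypothetical 8-hour business day.
    for label, seconds in [("5 seconds", 5), ("10 seconds", 10),
                           ("10 minutes", 600), ("20 minutes", 1200)]:
        print(f"{label:>10}: {samples_collected(seconds, 8)} samples")
```

At 10-minute intervals, an 8-hour day yields only 48 data points. At 5-second intervals, it yields 5,760. This difference is why short intervals are reserved for fine-grained transaction workloads.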
To log performance data, start Perfmon from the Start, Programs, Administrative Tools menu. To
enter logging mode, select View, Log, Edit, Add to log. Select all objects, then click Add and then
Done. Screen 1 shows how to begin the logging session: Select Options, Log, enter the name of your
log file (e.g., test1-perfmon-log), specify a sampling interval, and then click Start Log.
Perfmon uses about the same amount of resources to collect measurements from one object or many
objects from NT Server's performance library DLLs, so collect them all. The objects you haven't
measured inevitably are the ones you'll need in later analyses. Perfmon introduces a small amount of
overhead (less than 4 percent) on your server. Perfmon's overhead is slightly greater in some cases:
if your sampling period is very short (less than 3 seconds), if you run Perfmon in chart mode, or if
you have the disk counters turned on.
To collect disk statistics, you must first activate the disk counters by entering the
diskperf -ye command at the command prompt. The y option sets the system to start the disk
performance counters when you restart the system; the e option lets you measure physical drives in
a striped set. One last note on Perfmon: If you view an active log, Perfmon stops logging. To avoid
this problem, start a second copy of Perfmon and open the current log to view the data that the
system is currently logging. "The Windows NT Performance Monitor" in the March 1997 issue gives you
more information about using Perfmon.
Task Manager. The NT Task Manager gives you a quick look at the current state of
the system, including specific information about the processes running on it. Task Manager's
Processes view is similar to the UNIX tool top. A quick look at such a brief sampling period rarely
justifies a considerable tuning effort, but this snapshot of your system can help you identify which
processes currently demand the most resources. This information tells you what to observe with
Perfmon.
Start Task Manager by right-clicking the taskbar and selecting Task Manager. On the Processes tab,
select View, Select Columns to choose which process information you want to observe. In Screen 2, I
chose to observe process identifier (PID), CPU usage, CPU time, memory usage, page faults, and base
priority. These columns show which processes have the highest CPU and memory usage and, therefore,
which applications are bogging down the server.
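If you copy a few rows from the Processes tab into a list, ranking them by any column takes only a few lines. In the Python sketch below, the ProcessSample record and top_consumers helper are hypothetical names. The snapshot values are invented, and the sketch does not read from Task Manager itself.

```python
from typing import NamedTuple


class ProcessSample(NamedTuple):
    """One row copied by hand from Task Manager's Processes tab."""
    name: str
    pid: int
    cpu_percent: float
    mem_usage_kb: int
    page_faults: int


def top_consumers(samples, field: str, n: int = 3):
    """Return the n processes with the largest value in the given column."""
    return sorted(samples, key=lambda s: getattr(s, field), reverse=True)[:n]


if __name__ == "__main__":
    # Hypothetical snapshot; names and numbers are illustrative only.
    snapshot = [
        ProcessSample("sqlservr.exe", 212, 38.0, 61_440, 120_500),
        ProcessSample("services.exe", 64, 2.0, 3_200, 9_800),
        ProcessSample("spoolss.exe", 90, 0.0, 2_100, 1_300),
        ProcessSample("store.exe", 301, 21.0, 48_000, 88_200),
    ]
    for field in ("cpu_percent", "mem_usage_kb", "page_faults"):
        names = [s.name for s in top_consumers(snapshot, field, n=2)]
        print(field, names)
```

The processes that top these lists are the ones to add to a Perfmon log for longer observation.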
Eliminating Wasted Resources
Beyond knowing which processes are currently running, you need to know what the processes do.
Why run a service or application in the background if you don't need it? For example, why have
Remote Access Service (RAS) turned on if your NT Server does not require RAS? By turning off
unnecessary background processes, you free resources for applications and portions of NT Server that
really need the resources. Knowing your system and understanding the functions you want NT Server to
perform help you tune the server. Under Task Manager, Processes, you can terminate a process by
right-clicking on the process and then selecting End Process (similar to the UNIX kill
command).
Unfortunately, ending an unneeded process is only a temporary fix. If the process runs as a
background service, it will start again when you reboot the server. As Screen 3 shows, from Control
Panel, Services, you can specify whether each service starts automatically, starts manually, or is
disabled when you boot the server. Be careful which processes you kill, because some processes are
essential for NT Server operation. The NT Workstation and NT Server resource kits explain background
services well.
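Deciding which services to turn off starts with a list of services that start automatically but that your server's role doesn't require. The Python sketch below builds that review list from an inventory you record by hand from Control Panel, Services. The service names and the "required" set are illustrative assumptions for one hypothetical file-and-print server. The sketch changes nothing; it only flags candidates.

```python
def review_candidates(services: dict, required: set) -> list:
    """Services set to start automatically that aren't in the required set."""
    return sorted(
        name for name, startup in services.items()
        if startup == "Automatic" and name not in required
    )


if __name__ == "__main__":
    # Hypothetical inventory: service name -> startup type (Automatic/Manual/Disabled).
    inventory = {
        "Event Log": "Automatic",
        "Server": "Automatic",
        "Workstation": "Automatic",
        "Spooler": "Automatic",
        "Remote Access Server": "Automatic",
        "Directory Replicator": "Manual",
    }
    # What this particular server needs; an assumption you must make per server.
    needed = {"Event Log", "Server", "Workstation", "Spooler"}
    print(review_candidates(inventory, needed))
```

Check each flagged service against the resource kit documentation before setting it to Manual or Disabled.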
Tuning Memory Resources
Insufficient memory is one of the most common causes of bottlenecks on NT Server. RAM is a
limited resource, and NT Server lets you tune how its memory subsystem uses it. From Control Panel,
Network, Services, select Server, then click Properties. This dialog box offers the five options
shown in Screen 4, page 125. The dialog box's Help button provides some basic information about how
these options relate to one another.
Maximize Throughput for File Sharing. For most multiuser environments, two
options for tuning NT Server's memory strategy are of particular interest: Maximize Throughput for
File Sharing and Maximize Throughput for Network Applications. When you select Maximize Throughput
for File Sharing, NT Server uses all available memory for the file system cache (dynamic disk buffer
allocation). This option is good when you use your NT Server only as a file server. Allocating all
memory for file system buffers generally enhances disk and network I/O performance: By providing more
RAM for disk buffers, you increase the probability that NT Server will complete I/O requests in the
faster RAM cache instead of going to the slower disk. However, this setting is a poor choice if you
run other applications on the same server. When you start a client/server application such as
Microsoft SQL Server or another memory-intensive application, the server might begin to page (swap
information between RAM and the disk) excessively.
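Why a larger cache helps can be shown with a weighted average. If a fraction h of I/O requests is satisfied from the RAM cache, the average service time is h × cache time + (1 − h) × disk time. The Python sketch below computes that average. The 0.1 ms cache time and 15 ms disk time are made-up round numbers, not measurements of any particular hardware.

```python
def effective_io_time_ms(hit_ratio: float, cache_ms: float, disk_ms: float) -> float:
    """Average time per I/O request when hit_ratio of requests hit the RAM cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * disk_ms


if __name__ == "__main__":
    # Hypothetical service times: 0.1 ms from the RAM cache, 15 ms from disk.
    for hit_ratio in (0.5, 0.8, 0.95):
        avg = effective_io_time_ms(hit_ratio, 0.1, 15.0)
        print(f"hit ratio {hit_ratio:.2f}: {avg:.2f} ms per request")
```

Because disk time dominates the sum, even a modest rise in the hit ratio cuts the average sharply. That is the payoff of giving the cache more RAM, as long as no other application needs that RAM.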
Let me briefly explain paging. A process's working set is the set of its code and data pages
that are physically in RAM. When a process references a page that isn't in its working set, a page
fault occurs. If the page is still somewhere in RAM (for example, because another process shares it
or because it was recently trimmed and its memory hasn't yet been reused), NT Server's virtual memory
manager simply maps it back into the process's working set without going to disk. This action is a
soft page fault. When RAM resources become scarce (because of either other processes' working sets
or RAM that the file system cache is using), the virtual memory manager trims working sets and moves
older pages (4KB pages for Intel CPUs and 8KB pages for Alpha CPUs) from RAM to the paging file
(Pagefile.sys) on the disk drive. When a process then needs a page that must be read back from disk,
the result is a hard page fault.
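A toy model illustrates the difference between the two fault types. The Python sketch below is a simplification, not NT's actual algorithm. Each working set holds a fixed number of pages and trims its least recently used page to a standby list that stays in RAM. A fault on a standby page counts as soft. Any other fault counts as hard, including the first touch of a page, which the model treats as a disk read. When the standby list overflows, its oldest page is reused for something else, so that page must come back from disk. The working-set and standby limits are hypothetical parameters.

```python
from collections import OrderedDict


def count_faults(references, working_set_limit: int, standby_limit: int):
    """Return (soft_faults, hard_faults) for a sequence of page references.

    Toy model only: a fixed-size working set with LRU trimming, and a
    standby list of trimmed pages that are still in RAM.
    """
    working_set = OrderedDict()  # resident pages, least recently used first
    standby = OrderedDict()      # trimmed pages still in RAM, oldest first
    soft = hard = 0
    for page in references:
        if page in working_set:
            working_set.move_to_end(page)  # hit: no fault, mark as most recent
            continue
        if page in standby:
            del standby[page]              # page still in RAM: soft fault
            soft += 1
        else:
            hard += 1                      # page must be read from disk: hard fault
        working_set[page] = None
        if len(working_set) > working_set_limit:
            oldest, _ = working_set.popitem(last=False)  # trim the oldest page
            standby[oldest] = None
            if len(standby) > standby_limit:
                standby.popitem(last=False)  # its RAM is reused; page now only on disk


    return soft, hard


if __name__ == "__main__":
    refs = [1, 2, 3] * 6  # cyclic access to 3 pages with room for only 2
    print("spare RAM:   ", count_faults(refs, working_set_limit=2, standby_limit=10))
    print("no spare RAM:", count_faults(refs, working_set_limit=2, standby_limit=0))
```

With spare RAM, almost every fault in this access pattern is soft. With no spare RAM, every fault becomes hard, which is the pattern that drives the disk activity described next.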
When a hard page fault occurs, the virtual memory manager has, in essence, stolen some RAM;
that is, it has trimmed the working set of another process to fulfill the currently running
process's request. Occasionally, paging that results from hard page faults is acceptable, but if
this type of paging occurs excessively over a period of time, system resources become unbalanced and
a memory bottleneck forms.
To determine whether your system is paging, use Perfmon to observe the relationship of the
metrics shown in Table 1. One strong indicator of a memory bottleneck is a Pages/sec counter that is
high (greater than 50) and growing compared with your baseline. NT Server sets aside a minimum amount
of RAM for kernel resources that are never paged to disk, and on a typical NT Server the virtual
memory manager tries to keep at least 4MB of RAM available (similar to the SVR4 UNIX
minfree/lotsfree memory thresholds). If the Available Bytes counter is also falling toward that 4MB
minimum and the disk drives that house the Pagefile.sys files are busy (marked by increases in
% Disk Time, Disk Bytes/sec, and Avg. Disk Queue Length), you have almost certainly identified a
memory bottleneck.
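The three-part test above can be written as a small decision function. The Python sketch below follows the text's thresholds: Pages/sec above 50 and above baseline, Available Bytes near 4MB, and the pagefile disk busier than baseline on all three disk counters. The 1MB "near the floor" margin is an assumption added for illustration. The function names are hypothetical, and the inputs are values you would read from a Perfmon log.

```python
FOUR_MB = 4 * 1024 * 1024
DISK_COUNTERS = ("% Disk Time", "Disk Bytes/sec", "Avg. Disk Queue Length")


def pagefile_disk_busier(current: dict, baseline: dict) -> bool:
    """True if every disk counter on the pagefile disk rose above its baseline."""
    return all(current[c] > baseline[c] for c in DISK_COUNTERS)


def memory_verdict(pages_per_sec: float, baseline_pages_per_sec: float,
                   available_bytes: int, disk_now: dict, disk_baseline: dict,
                   pages_threshold: float = 50.0,
                   available_floor: int = FOUR_MB,
                   slack: int = 1024 * 1024) -> str:
    """Classify memory pressure from Perfmon averages (hypothetical helper)."""
    paging_high = (pages_per_sec > pages_threshold
                   and pages_per_sec > baseline_pages_per_sec)
    memory_low = available_bytes <= available_floor + slack  # slack is an assumption
    disk_busy = pagefile_disk_busier(disk_now, disk_baseline)
    if paging_high and memory_low and disk_busy:
        return "memory bottleneck"
    if paging_high:
        return "watch: paging above threshold"
    return "no memory bottleneck indicated"


if __name__ == "__main__":
    # Hypothetical averages for the disk that holds Pagefile.sys.
    base_disk = {"% Disk Time": 20.0, "Disk Bytes/sec": 100_000.0,
                 "Avg. Disk Queue Length": 0.5}
    busy_disk = {"% Disk Time": 85.0, "Disk Bytes/sec": 900_000.0,
                 "Avg. Disk Queue Length": 4.0}
    print(memory_verdict(120.0, 30.0, 4_500_000, busy_disk, base_disk))
```

The middle verdict reflects the text's point that heavy paging alone is only a strong indicator. The full pattern, with low Available Bytes and a busy pagefile disk, is what confirms the bottleneck.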
Maximize Throughput for Network Applications. The second option for tuning NT
Server's memory strategy is Maximize Throughput for Network Applications. When you select this
option, NT Server allocates less RAM to the file system cache so that running applications have
access to more RAM. With this option, application tuning becomes important: When you configure
applications such as SQL Server or Microsoft Exchange, you can tune them to use specified amounts of
RAM for areas such as disk I/O buffers and database cache.
Knowing what is running on your system is particularly important here: If you allocate too much
memory to each application in a multi-application environment, excessive paging can turn into
thrashing, and you will have one slow system. Thrashing occurs when the combined memory demands
of all active processes and the file system cache overwhelm the system's RAM. When this condition
occurs, requests for RAM create hard page faults at an astounding rate, and the virtual memory manager
begins stealing pages from one process just to fulfill the request of another process. On a busy
system, the virtual memory manager's increased workload consumes more disk resources as Pagefile.sys
activity increases. The server then wastes CPU cycles on memory management functions instead of
servicing productive processes. Thrashing can quickly consume an inordinate amount of system
resources and typically increases users' response times considerably.
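A quick budget check before you set per-application memory sizes can reveal oversubscription before thrashing does. The Python sketch below subtracts an operating-system reserve, a file-cache allowance, and each application's configured memory from physical RAM. All figures are hypothetical, and the OS and cache allowances are estimates you would derive from your own Perfmon baselines.

```python
def ram_headroom_mb(total_ram_mb: int, os_reserve_mb: int, file_cache_mb: int,
                    app_allocations_mb: dict) -> int:
    """Physical RAM left over after OS, file cache, and application allocations."""
    committed = os_reserve_mb + file_cache_mb + sum(app_allocations_mb.values())
    return total_ram_mb - committed


if __name__ == "__main__":
    # Hypothetical server: 256MB RAM; SQL Server and Exchange each given a fixed share.
    headroom = ram_headroom_mb(256, 32, 32, {"SQL Server": 160, "Exchange": 64})
    if headroom < 0:
        print(f"Oversubscribed by {-headroom}MB: expect heavy paging")
    else:
        print(f"{headroom}MB of headroom")
```

A negative result means the configured allocations can't all stay in RAM at once. In that case, trim an application's allocation, move an application to another server, or add memory.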
You can accept some paging, or even continuous paging, if the response times to the end user are
reasonable. You can use third-party remote terminal emulation and remote workstation emulation tools
to test system loading under conditions that emulate your environment. This testing provides
informative measurements in areas such as user response times and overall system throughput.
Additional Memory Tuning
You can improve the system's memory/paging performance when loads are heavy by spreading the
paging file across two disks. This modification improves the overall paging file read/write
rates, because more disks are available to process the paging file workload.
During installation, NT Server creates one Pagefile.sys file on the root (C) drive. To spread
the load, review the disk metrics you have gathered with Perfmon and select the two disks that are
under the lightest load. Then, as Screen 5 shows, from Control Panel, System, Performance, select
Virtual Memory and create two new paging files, one on each disk. After the new paging files are in
place, remove the default Pagefile.sys from the root disk.
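Choosing the two least-loaded disks from your baseline amounts to a sort. The Python sketch below ranks disks by average % Disk Time only. That is a simplification, because you might also weigh Avg. Disk Queue Length. The sketch assumes that each drive letter maps to its own physical disk, and the figures are invented.

```python
def lightest_disks(avg_pct_disk_time: dict, count: int = 2) -> list:
    """Return the `count` disks with the lowest average % Disk Time."""
    if count > len(avg_pct_disk_time):
        raise ValueError("not enough disks to choose from")
    return sorted(avg_pct_disk_time, key=avg_pct_disk_time.get)[:count]


if __name__ == "__main__":
    # Hypothetical baseline averages of % Disk Time per drive.
    baseline = {"C:": 45.0, "D:": 12.5, "E:": 30.0, "F:": 8.0}
    print(lightest_disks(baseline))
```

In this made-up example, F: and D: would receive the new paging files.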
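As a guide to determining the paging file sizes on the new disks, use Perfmon to monitor the
% Usage and % Usage Peak counters of the Paging File object. Because these counters are percentages
of the current paging file's size, multiply each by that size to get megabytes. Usually you will
create paging files of the same size on both disks, with an initial size at least as large as the
space that % Usage represents and a maximum size at least as large as the space that % Usage Peak
represents.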
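Converting the two Paging File percentages into megabyte settings for the Virtual Memory dialog box takes one multiplication each. The Python sketch below rounds up to whole megabytes. The text doesn't say whether "at least" applies to each file or to the pair, so the sketch applies it to each file, which is the more conservative reading. The 128MB pagefile and the percentages in the example are hypothetical.

```python
import math


def pagefile_sizes_mb(current_pagefile_mb: int, pct_usage: float,
                      pct_usage_peak: float) -> tuple:
    """Initial and maximum size (MB) for each new paging file.

    % Usage and % Usage Peak are percentages of the current paging file's
    size, so they are converted to megabytes and rounded up.
    """
    if not 0 <= pct_usage <= pct_usage_peak <= 100:
        raise ValueError("expected 0 <= % Usage <= % Usage Peak <= 100")
    initial = math.ceil(current_pagefile_mb * pct_usage / 100)
    maximum = math.ceil(current_pagefile_mb * pct_usage_peak / 100)
    return initial, maximum


if __name__ == "__main__":
    # Hypothetical readings: 128MB pagefile, 40% average usage, 75% peak usage.
    initial, maximum = pagefile_sizes_mb(128, 40.0, 75.0)
    print(f"Each paging file: initial {initial}MB, maximum {maximum}MB")
```

For these readings, the sketch suggests an initial size of 52MB and a maximum of 96MB for each of the two files.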
Sizing the paging files correctly ensures that NT Server does not waste cycles enlarging them.
If possible, dedicate two disk drives to the paging files. This approach ensures that no other
application or process will contend with NT Server for those disks when the system needs the paging
files.
If the system begins to page to an unacceptable degree, use Perfmon and Task Manager to isolate
the applications or processes that are consuming excessive memory, and tune down the memory allocated
to them (if possible). If the application source code is available, you can work with the developers
to improve the application's memory use. When tuning can't bring user response times to an acceptable
level because the system lacks memory, add more memory or distribute memory-intensive applications
across additional servers.
Knowledge Is Power
When you know which processes are running, what the processes do, and how to reallocate memory
to fit your needs, you're getting the most out of your computing power. Next month, I'll discuss
tuning the other components of NT Server: disk I/O, network I/O, and CPU.