Keep One Eye on the Disk, the Other on the Network Interface, and Pare Down the Protocols.

When you set up your Windows NT network, you probably played around with the Performance Monitor (Perfmon), an extremely powerful--and free--tool. But once you start working with it more regularly, you realize that you're faced with an embarrassment of riches: There are a lot of things that you can monitor with Perfmon. Furthermore, what you want to monitor depends on what your Windows NT Server does. NT Servers tend to act primarily as either file servers or application servers--two very different tasks. Let's look at how you tune an NT-based file server.

The Bottleneck Battle
In a file server, you usually find bottlenecks in either the disk subsystem or the network interface. (Actually, through its indirect role in memory management--a common application server bottleneck--the disk drive has the dubious honor of being a source of bottlenecks for both file servers and application servers.)

There are a few ways you can recognize a disk bottleneck. In Perfmon, watch the "% Disk Time" counter in the PhysicalDisk object. If it stays up around 90%, you have a problem.

Realistically, though, this isn't much of a revelation: After all, most network administrators reflexively peek at the server's disk light whenever they pass by it, and, if the hard disk access light is constantly turned on, they don't need Perfmon to tell them that they're disk-bound. (By the way, none of the Perfmon counters for disk activity will work until you open a command line, type diskperf -y, press Enter, and restart the system.)

A more subtle, but just as useful, indicator is the "disk queue length," another counter in the PhysicalDisk object. The disk queue length is an artifact of NT's multitasking nature. Just as many people may want hamburgers from McDonald's at the same time, creating a line at the counter, a number of NT processes may all want data from the centralized Disk Manager process at once, leading to a line-up for the disk.

If the line becomes too long, it's an indication that you need to expand the disk bandwidth in some way. "Too long" in this case means an average disk queue length of more than two processes waiting; at that point, you've got a disk bottleneck.
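
The two rules of thumb above can be sketched as a small check. This is only an illustration in Python, assuming you have exported Perfmon samples of percent disk time and disk queue length into plain lists. The function name and the sample values are hypothetical; only the thresholds (around 90% and more than two waiting) come from the text.

```python
from statistics import mean

# Thresholds taken from the rules of thumb in the text.
DISK_TIME_LIMIT = 90.0   # percent disk time "up around 90%"
QUEUE_LIMIT = 2.0        # average queue length of more than two

def disk_bottleneck(disk_time_samples, queue_length_samples):
    """Return a list of reasons the disk looks like a bottleneck (empty if none)."""
    reasons = []
    avg_time = mean(disk_time_samples)
    avg_queue = mean(queue_length_samples)
    if avg_time >= DISK_TIME_LIMIT:
        reasons.append(f"average disk time {avg_time:.1f}% is at or above {DISK_TIME_LIMIT:.0f}%")
    if avg_queue > QUEUE_LIMIT:
        reasons.append(f"average queue length {avg_queue:.1f} exceeds {QUEUE_LIMIT:.0f}")
    return reasons

# Hypothetical samples, e.g. copied out of a Perfmon log.
print(disk_bottleneck([95, 88, 97, 92], [3, 4, 2, 5]))
print(disk_bottleneck([40, 35, 50, 45], [0, 1, 0, 1]))
```

Averaging over a stretch of samples, rather than reacting to a single reading, matches the text's emphasis on the average queue length.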

When you're trying to break a disk bottleneck, the first line of attack is usually to optimize the disk cache. However, that's not an option for Windows NT administrators: NT sets its own cache size, and there's nothing you can do to change it.

Software Remedies
You have two main software options for relieving a disk bottleneck. The first is load balancing.

If you have several file servers, compare the performance numbers for them to see if one server's disk is working a lot harder than the others. If it is, you can move some data from the busier server to another one that is less busy. For example, you could move some people's home directories, or, as one particularly cynical systems administrator said to me, you could "move the games."

The other software remedy is to adjust the relative priorities of the file server and the print server. The file-server thread in Windows NT has a lower priority than the print-server thread, so you can boost the priority of the file-server thread if you want to see quicker responses to file requests.

This is, of course, something of a zero-sum game; you'll be slowing down print-service requests when you do this. Furthermore, it doesn't really break the disk bottleneck, as the print server is somewhat disk-bound also; however, users tend to be more sensitive to file-access response times than to print-access response times.

Print-server priority, by default, is set to 2, and file-server priority, to 1; the larger the number, the higher the priority. You change the file server's priority by modifying the Registry in the current control set: in Services\LanmanServer\Parameters, add the value entry ThreadPriority of type DWORD and set it to 2.
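
If you would rather script the change than click through the Registry editor, one option is to generate an import file. The sketch below is a hedged illustration in Python: it only builds the text of such a file and touches no Registry. The helper name is hypothetical, the REGEDIT4 layout assumes the regedit.exe import format, and the full key path is an assumption about what the abbreviated path in the text expands to, so verify it on your own server first.

```python
# Hypothetical helper: renders the text of a .reg import file for one DWORD value.
def render_reg_dword(key_path, value_name, value):
    """Build REGEDIT4-style text that sets one DWORD value under key_path."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("DWORD values must fit in 32 bits")
    return (
        "REGEDIT4\n"
        "\n"
        f"[{key_path}]\n"
        f'"{value_name}"=dword:{value:08x}\n'
    )

# Assumed full path of the key the text abbreviates; check it before importing.
KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters"

reg_text = render_reg_dword(KEY, "ThreadPriority", 2)
print(reg_text)
```

Save the printed text to a file and review it before importing it; as with any Registry edit, back up first.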

The Midas Touch
If the software remedies don't work, then the only alternative is to buy some hardware. Getting faster disk hardware partially boils down to what you'd expect: buy disks with faster (smaller) seek times and better (bigger) data-transfer rates.

Cheap multigigabyte drives are widely advertised with data-transfer rates of 4MB per second (MBps) and up, so that's a beginning. And be sure to use 32-bit, bus-mastering disk adapters. I see a surprisingly large number of 16-bit disk adapters in the servers that I run across when consulting. Slow adapters are a real "no-no."

While the low price of IDE drives makes them tempting, buy a SCSI drive instead due to the greater flexibility that it offers and its ability to do asynchronous disk I/O. All IDEs and many SCSI systems that include more than one physical hard disk cannot perform simultaneous I/O on those disks, believe it or not. If you make simultaneous requests of multiple drives, they'll be handled one at a time, rather than letting the drives all seek at the same time.

On the other hand, SCSI host adapters that support asynchronous I/O can keep all your disk drives busy at the same time. You should look for that feature when buying host adapters (most SCSI adapters don't support asynchronous I/O, and no IDEs do under Windows NT).

If you have a host adapter that supports asynchronous I/O, then exploit that power with stripe sets. Use the Disk Administrator to create a stripe set across a number of drives. As a stripe set distributes a disk's data across several drives, reading the data can be quite fast, working all of the physical drives on the stripe set in parallel.
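
To see why striping lets the drives work in parallel, it helps to look at how a position in the stripe set maps onto the physical disks. The following is a minimal sketch in Python, not NT's actual implementation: the 64KB stripe unit and both function names are assumptions made for illustration.

```python
# Assumed stripe unit for illustration; the text does not state NT's actual size.
STRIPE_UNIT = 64 * 1024  # bytes

def locate(offset, num_disks, stripe_unit=STRIPE_UNIT):
    """Map a byte offset in the stripe set to (disk index, byte offset on that disk)."""
    stripe_number = offset // stripe_unit   # which stripe unit, counting from 0
    disk = stripe_number % num_disks        # units rotate round-robin across disks
    row = stripe_number // num_disks        # full rows of units before this one
    return disk, row * stripe_unit + offset % stripe_unit

def disks_touched(offset, length, num_disks, stripe_unit=STRIPE_UNIT):
    """Return the set of disks that a read of `length` bytes at `offset` touches."""
    first = offset // stripe_unit
    last = (offset + length - 1) // stripe_unit
    return {unit % num_disks for unit in range(first, last + 1)}

# A 256KB read on a four-disk stripe set touches every drive.
print(locate(0, 4))
print(locate(64 * 1024, 4))
print(sorted(disks_touched(0, 256 * 1024, 4)))
```

Because consecutive stripe units land on different drives, one large read becomes several smaller reads that an adapter with asynchronous I/O can issue at the same time.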

The Network Interface
Although most of the process of tuning file servers lies in the disks, some of it can be achieved with the network interface. As with disks, many improvements come from better hardware--32-bit bus mastering boards and faster underlying networks. But there are some software tweaks available as well.

In Perfmon, look at the "Bytes Total/sec" counter in the Server object and compare it to the rated medium speed (10 megabits per second (Mbps) or 100Mbps for Ethernet, 4Mbps or 16Mbps for Token Ring). Remember that the counter reports bytes while the media are rated in bits, so divide the rated speed by eight before comparing. If your server's total network throughput is anywhere near the network medium's speed, then you're working the network too hard, feeding it more data than it can transport. Is it possible to segment the network, add a router or two, or perhaps shift some workstations to another segment?

On an Ethernet, even a network throughput of about 0.8MBps is worrisome, as Ethernets tend to become collision-ridden at the two-thirds utilization mark. Token Rings can reach the speed of the networking medium (0.5MBps or 2MBps, depending on the type) without a loss in performance. And, in the extreme, you can always go to the faster version of Ethernet (100Mbps) or Token Ring (16Mbps).
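
The arithmetic behind these figures is easy to get wrong because Perfmon counts bytes while media are rated in bits. Here is a short sketch in Python of the comparison, assuming that a megabit means 1,000,000 bits and using hypothetical function names; the two-thirds mark for Ethernet and the full rated speed for Token Ring are the rules of thumb from the text.

```python
def utilization(bytes_per_sec, rated_mbps):
    """Fraction of the medium's rated speed used by a Bytes Total/sec reading."""
    rated_bytes_per_sec = rated_mbps * 1_000_000 / 8   # megabits/sec -> bytes/sec
    return bytes_per_sec / rated_bytes_per_sec

def looks_saturated(bytes_per_sec, rated_mbps, ethernet=True):
    """Apply the rules of thumb: about 2/3 for Ethernet, rated speed for Token Ring."""
    limit = 2 / 3 if ethernet else 1.0
    return utilization(bytes_per_sec, rated_mbps) >= limit

# 0.8MBps on 10Mbps Ethernet is 64% of the rated speed, just under two-thirds.
print(round(utilization(800_000, 10), 2))
print(looks_saturated(900_000, 10))
print(looks_saturated(900_000, 16, ethernet=False))
```

The same 0.9MBps reading that is worrisome on 10Mbps Ethernet is well within the capacity of a 16Mbps Token Ring.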

Trim a Few Protocols
Another way to streamline the network end of a server is to trim unnecessary protocols. Do you need to keep NetBEUI on your system?

Running NetBEUI in combination with NWLINK on a network that really only needs NWLINK generates extra network traffic and forces the server software to do twice as much work. It has to respond to both the meaningful NWLINK messages and the extra NetBEUI chatter. If possible, you should choose one protocol--NetBEUI, TCP/IP, or NWLINK--and use only that for the file server. I use TCP/IP on some of the file servers at my firm and get excellent results from it.

Rearranging network bindings can sometimes provide better throughput. Click on the Bindings button in the Network applet of the Control Panel, select NetBIOS, and choose the order in which to bind the transport protocols to it. NetBIOS is the network binding interface that the Windows NT file-server software sits atop.

However, Windows NT ships with three protocols--NetBEUI, TCP/IP, and NWLINK--that all have NetBIOS interfaces on them so they can provide services to the server software. Thus, the server software has "too many options," slowing down the process of serving the files (hence, the earlier recommendation to remove protocols). If you can't exclude a protocol, then you can use the binding order to establish which protocol has first crack at the server requests.

And, finally, a few file-server tuning odds and ends:

  • If you are using TCP/IP and the Windows Internet Name Service (WINS), then binding NetBIOS to TCP/IP will greatly reduce system broadcasts.
  • If you are using Windows for Workgroups (WFW) workstations and TCP/IP, be absolutely sure that you have installed the version of WFW that is on the NT CD-ROM, or at least the update files. They will radically affect TCP/IP performance.
  • Interrupt Request 10 (IRQ10) has a slightly higher system priority than the more commonly used IRQ5 does, so employ it for your boards when possible.

There is no generally accepted benchmark for network performance that is both generic--runs on all networks--and nontrivial. My tests with TCP/IP drivers on Ethernet and Token Ring cards show that if you can enable shared RAM on your network cards, then you should. Use it as much as possible. This does not apply to bus-mastering EISA, Peripheral Component Interconnect (PCI), and Micro Channel cards; shared RAM doesn't seem to improve their performance.