How to Buy the Right Windows NT System
If your experience is anything like mine, you migrated to Windows NT from a DOS/Windows environment. For the most part, you probably appreciate the robustness of the operating system with its multitasking and multithreading capabilities. Yet, when you think back to your Windows 3.1 days, you probably miss the seemingly infinite capacity for editing and tweaking SYSTEM.INI and WIN.INI to increase performance, for adding space to the environment by increasing the shell size, etc.

In Windows NT, many of these capabilities are gone, and you're faced with editing an arcane--and not well-understood--Registry that is the very heart of Windows NT. Don't feel bad. Even many experienced NT users find the Registry daunting. To overcome this lack of knowledge, you may have bought the excellent Windows NT Resource Kit (see the book review in the September issue of Windows NT Magazine) and tried to figure out how to optimize NT by adding keys and changing this and that. Your Registry grew and grew. Then, you upgraded to the next version of NT, and things didn't work right. So you had to start all over again from scratch. All that work was simply thrown away.

Well, there's a way to avoid repeating that process. You can optimize the operating system without editing the Registry if you understand how to set up your hardware in an optimized fashion.

Most approaches to optimizing NT assume that you're going to change your system after you purchase it. To the inexperienced computer user, this can be scary. This approach will help you design an NT-optimized system--one that you can order directly from a reseller--and tell you what you need to ask for. Thus, when your system arrives, you will have an optimized NT system when you open the box instead of after spending a couple of days with the Performance Monitor.

Avoiding Bottlenecks
Bottlenecks are notorious, though their causes can be hard to pin down. A particularly graphic example is the catsup bottle at a restaurant: it's the one place that hangs everything up. In a computer, the bottleneck could be the CPU, the cache, the memory, the I/O performance of the bus, the peripheral controller, the hard drive, or just about anything where demand exceeds supply.

To control the effect of bottlenecks--you can never eliminate them--you must optimize each "link" in the processing chain until the overall performance is acceptable for a given task. In simplest terms, a task is a collection of computer events to be completed. These events can involve access to system resources. (We tend to consider tasks in an application, hence the Task Manager in NT and the Taskbar in Windows 95.)
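The weakest-link idea above can be sketched in a few lines of code. The component names and throughput figures here are invented purely for illustration; the point is that the chain's effective speed is simply the minimum of its links.

```python
# Hypothetical stage names and throughput numbers, chosen for illustration only.
# Each value is how many "units of work" a component can move per second.
stages = {
    "cpu": 500,
    "cache": 450,
    "memory": 400,
    "bus": 120,
    "controller": 90,
    "disk": 40,   # the catsup bottle: the slowest link in this chain
}

def effective_throughput(chain):
    """The whole chain can go no faster than its slowest link."""
    return min(chain.values())

def bottleneck(chain):
    """Name the component that limits overall performance."""
    return min(chain, key=chain.get)

print(bottleneck(stages))            # disk
print(effective_throughput(stages))  # 40
```

Upgrading the disk in this toy model simply moves the bottleneck to the controller, which is exactly why each link has to be optimized in turn until performance is acceptable.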

The Fast Bus Ride
For reasons that baffle me, many people consider the Industry Standard Architecture (ISA) bus their bus of choice. The ISA bus was actually considered out of date in 1988. Since the ISA standard was designed for the Intel 286 chip, only 16MB of memory can be accessed directly on the bus. If you add more memory to the system, all access to addresses above 16MB must be buffered to regions below 16MB. This double buffering slows the system down.

Following the ISA standard, IBM developed the Micro Channel Architecture (MCA), and an independent group of hardware vendors developed the Extended Industry Standard Architecture (EISA) standard. Both were 32-bit buses, not 16 bits as the ISA bus was. The number of access lines was increased, and the memory limit for Direct Memory Access (DMA) increased to 4GB. Although not normally mentioned, one major advantage that EISA and MCA offered over ISA was their ability to perform DMA transfers rapidly. In fact, accomplishing an event via DMA was faster than doing it locally.

ISA refuses to die. Fortunately, the Peripheral Component Interconnect (PCI) bus and, to a lesser degree, the VESA local bus (VL) have diminished the industry's reliance on it. The DMA memory limit is still a problem, though, because neither PCI nor VL fully populates the bus; most systems retain ISA slots. Since PCI and VL are local-bus architectures, the 16MB limit does not apply to them. So, why bring it up? You may want to add an ISA card to your system for a specific function, and the card may be a bus master. If so, it will carry the 16MB legacy and may slow the system down significantly. I don't recommend an ISA bus for anything involving multitasking.

What constitutes a reasonable bus for purchase? Let's consider the attributes it needs.

Enough high-speed/access slots to enable you to work around ISA limitations. If you mirror your data or add a second bus-master controller, that will automatically take up two slots, leaving you only one or two for other bus-master or local-bus cards. Adding video and networking uses all your slots or takes you over the local-bus limit. You need three or four local-bus slots. If you choose ISA for your remaining slots, bear in mind that you have no more room for expansion, and the card you will most likely want to add will be PCI or VL. For this reason, I prefer PCI/EISA motherboards. I have a PCI/ISA system, although I consider it far from ideal.

A system using bus-mastering devices that can undergo CPU arbitration. A bus-mastering card is typically a card with a coprocessor. The card can directly access memory, and the coprocessor can finish the task so the CPU can do other things. Arbitration determines who controls the CPU and for how long. In Programmed I/O (PIO), the system allows one device to occupy the CPU until the process or thread is finished. (A process is a single running instance of an application; threads are the executable units of a process.)

In an arbitrated scheme, CPU access can be assigned a priority or serviced within defined clock times. The importance of this issue becomes obvious when you consider that one device can disrupt what another device is doing (e.g., getting data off a hard disk can disrupt any other event until the data is in memory). Don't misunderstand me. PIO cards and devices (IDE hard drives, for example) work well with NT and may actually be faster on a single process/thread than a bus-mastering device. This is not, however, the case for multitasking and thread isolation.
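The PIO-versus-bus-mastering distinction can be mimicked with ordinary threads: a minimal sketch, assuming `time.sleep` stands in for a data transfer. In the PIO case the "CPU" (main thread) is tied up until the transfer finishes; in the bus-mastering case a worker thread plays the role of the card's coprocessor, and the main thread is free almost immediately.

```python
import threading
import time

def transfer(seconds):
    # Simulated data transfer; time.sleep stands in for the I/O wait.
    time.sleep(seconds)

# PIO-style: the main thread (the "CPU") is occupied until the transfer is done.
start = time.time()
transfer(0.2)
pio_cpu_busy_for = time.time() - start      # roughly 0.2s: other work had to wait

# Bus-mastering style: a worker thread (the card's "coprocessor") handles the
# transfer while the main thread moves on to other work right away.
start = time.time()
worker = threading.Thread(target=transfer, args=(0.2,))
worker.start()
dma_cpu_busy_for = time.time() - start      # near zero: the CPU is free
worker.join()                               # transfer still completes in the background
```

The timings are illustrative only, but the shape of the result matches the article's point: a bus-mastering device frees the CPU for other threads, which matters for multitasking even when PIO wins on a single thread.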

Admit it. The near future--at least--is all PCI. You might as well ignore implementation issues and get a PCI-based--and not a VL-based--motherboard. VL may be faster on a per-card basis--no Plug-and-Play overhead--but the industry is standardizing on PCI.

EISA is still a viable option, but the cards are old and most probably won't be updated soon. Even so, PCI/EISA is the optimum motherboard. However, some of the new PCI/ISA motherboards based upon the latest incarnation of the Triton chipset also show promise for workstation use.

CPU, Cache, and Memory
Performance issues arise because processes and related threads are assigned priorities and related time slices. Those with higher priorities can disrupt the performance of those of lower priority. NT is a preemptive multitasking system; it can preempt one thread to start processing another.
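The priority scheme just described can be sketched as a toy dispatcher. The thread names, priorities, and time slices below are invented for illustration; the one real idea is that the ready thread with the highest priority always runs first, and a thread needing more than one time slice goes back in the queue.

```python
import heapq

def run(ready, quantum=1):
    """Simulate dispatch order for (priority, name, slices_needed) entries.
    Lower number = higher priority (an arbitrary convention for this sketch)."""
    order = []
    heapq.heapify(ready)
    while ready:
        prio, name, remaining = heapq.heappop(ready)
        order.append(name)                  # this thread gets the next time slice
        remaining -= quantum
        if remaining > 0:                   # not finished: requeue at same priority
            heapq.heappush(ready, (prio, name, remaining))
    return order

# A low-priority print job needs two slices; a high-priority redraw needs one.
ready = [(2, "background-print", 2), (1, "ui-redraw", 1)]
print(run(ready))  # ['ui-redraw', 'background-print', 'background-print']
```

Even in this crude model, the higher-priority thread gets the CPU ahead of lower-priority work, which is the behavior NT's preemptive scheduler enforces.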

In addition, the inherent client/server nature of NT affects performance. When a process creates a thread, that thread is a client of the Win32 subsystem server. If that client asks the server to do something, a complementary thread is created in the subsystem. The server thread and the client thread don't run at the same time; one is suspended while the other executes.

Therefore, to optimize NT, you need to provide the smoothest flow possible between threads without causing disruptions in processing. Depending upon the nature of the processes, you will run into various issues. The CPU may be too slow; there may not be enough memory available for the proper functions; or the hard drives and data transfer may be too slow. These are the issues this article addresses.

Cache: Any modern motherboard should have a minimum of 256KB of cache memory. You need to match this cache to the CPU and bus speed. For example, the 33-MHz bus Pentium systems need cache speeds below 10 nanoseconds (ns) while the 30-MHz bus can function well with standard 15-ns cache. Newer motherboards use high-speed single in-line memory module (SIMM)-like cache that can be altered according to the speed of the CPU. If the system has a significant amount of memory, it's probably worthwhile to maximize the cache on the motherboard. You'd probably want to insist on this up front.

CPU: The speed of the CPU is an interesting topic. It pays to invest more in memory than in CPU speed. A DX4/100 is a good starting point; however, it really doesn't make sense to buy a system running less than a Pentium 75. The cost differential is slight. (I'll address RISC systems in a subsequent article.)

Systems using more than one CPU--symmetric multiprocessing (SMP) machines--are of considerable interest these days. Most people will tell you that such systems are only for servers. This is far from true. A dual-processor machine can run two threads at once. Whether you need such a system depends on how much CPU-intensive work you do; for example, graphics workstations can make good use of them.

Memory: This is a much debated issue with NT. Microsoft states that 16MB is the starting point. NT can, in fact, run in as little as 8MB or 12MB. However, it "comes into its own" at 20MB to 24MB and is very responsive at 32MB. The more memory you add, the more responsive NT becomes. It's very scalable in its use of memory. I've never heard of a system with too much.

Controllers and Configuration
If you plan to use Windows NT as a low-cost desktop system for its robustness and security but you don't want to stress the system with heavy multitasking, etc., an Enhanced IDE (EIDE) system will suffice. You should be sure that it has the latest BIOS that fully supports EIDE, including logical block addressing (LBA). If, however, you want to optimize the system to maximize the use of preemptive multitasking and threading, SCSI should be your bus of choice. In fact, NT was designed around the SCSI model.

The development of 16-bit SCSI (Wide SCSI) has created a lot of confusion. Wide SCSI is very efficient in burst mode and functions best in a RAID configuration. Plugging standard SCSI-2 drives into a Wide SCSI controller won't improve the data-transfer speed. However, Wide is faster than SCSI-2 in the right circumstances.

To truly maximize performance, it pays to use two controllers. (In fact, the systems sold by DeskStation Technology come with two on the motherboard.) The controllers of choice are the bus-mastering controllers on Microsoft's Hardware Compatibility List. They include the 2940 family from Adaptec, the NCR810/825 controllers from AT&T, and numerous others. Remember, however, with controllers, you basically get what you pay for.

Hard Drives
I once spent a couple of weeks trying to determine the effect of the hard drive on Windows NT performance. I used a SQL routine that was written to specifically examine disk-/CPU-intensive performance. The platform was a 486 DX2/66 with 32MB of RAM on a nice SuperEISA motherboard. Running the same task on several different combinations of controllers, hard drives, and CPUs yielded the following results:

Adaptec 2742AT (standard SCSI-2 card)

1. Micropolis 4110: 8 minutes
2. Maxtor 1240S: 7 minutes
3. Quantum 1080: 6 minutes

Adaptec 2742W (Wide EISA SCSI card)

1. Seagate Hawk ST31200W: 6 minutes
2. Seagate Barracuda ST32550W: 4 minutes

When I switched to a DX4/100 platform, I got the following result:

3. Seagate Barracuda with 100-MHz CPU: 3 minutes, 30 seconds

It is interesting that changing the controller and adding the 5400 rpm Wide Hawk did little to improve the task. However, switching to the 7200 rpm Barracuda had a significant effect. Changing from the 66-MHz CPU to the 100-MHz chip cut 30 seconds off the task. It would seem that the bottleneck is the hard drive and related I/O.

I haven't reproduced this exact test on the Adaptec 2940 and 2940W, but individual tests have verified the trend. Taking cost and other things into consideration, you need the drive with the fastest seek and transfer rates.

The Optimal System
The right Intel-based system for Windows NT is a Pentium 90 or 100--faster CPUs are available and are even better--with at least 32MB of RAM and two SCSI controllers. The first controller has a hard drive containing Windows NT and the page file; the second controller has a hard drive with all your applications.

There are good reasons for this division of labor. If I open a 35MB .TIF file in Picture Publisher 4 NT and the page file is on the same drive as the application, it will take 25 seconds to open the file on a dual Pentium 90 with 64MB of RAM. If I move the page file to a different drive on a different controller, the file will open in 14 seconds. What's the difference? The first time the file is opened, the page file is accessed to add virtual memory. This thread has a very high priority, so the NT Kernel interrupts the open. Moving the file to another drive and controller allows the bus-mastering card to obtain the necessary space and pass it to NT. Since the I/O for each thread is independent, there is little interference between the two.

What advantage does the dual Pentium provide? Using .TIF files, it still takes 14 seconds to open the 35MB file. However, the system can open the 35MB file, as well as a 25MB file, a 12MB file, and a 5MB file in the same amount of time. This shows how threads can be distributed across two processors. With SMP-aware applications that are properly threaded, the performance increase is amazing.
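The "four files in the time of one" effect is easy to demonstrate with a thread pool. This is a minimal sketch, not a reproduction of the Picture Publisher test: `open_image` is a made-up stand-in where `time.sleep` models the per-file work, and the file sizes mirror the ones in the text.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def open_image(size_mb):
    # Stand-in for decoding a .TIF; the sleep models the transfer/decode time.
    # time.sleep releases Python's GIL, so these "opens" genuinely overlap,
    # much as real threads run in parallel on a two-processor SMP machine.
    time.sleep(size_mb / 100)
    return size_mb

sizes = [35, 25, 12, 5]

# Serial: total time is the sum of the individual opens.
start = time.time()
for s in sizes:
    open_image(s)
serial = time.time() - start

# Concurrent: with enough workers, total time approaches the largest single
# open--the 35MB file--rather than the sum of all four.
start = time.time()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(open_image, sizes))
concurrent = time.time() - start
```

The concurrent total lands near the time of the biggest file alone, which is the distribution of threads across processors that the article describes.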

Similarly, increasing the amount of RAM can make a huge performance difference--just not in opening files (in this case, the bottleneck is in transferring the file from the hard drive)--by keeping files in memory for subsequent uses. If you increase the RAM to 128MB, the system takes 14 seconds to open the files the first time, but only four seconds after that. This is a major benefit for image editing since even large files can stay in RAM for further editing.
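The RAM-as-file-cache benefit reduces to a simple caching pattern. A minimal sketch, with invented names and timings: the first read of a file pays the simulated disk cost, and every repeat is served from memory.

```python
import time

_cache = {}  # in-memory file cache, playing the role of spare RAM

def read_file(name, size_mb):
    """First access pays the (simulated) disk-transfer cost; repeats hit RAM."""
    if name in _cache:
        return _cache[name]            # fast path: file already in memory
    time.sleep(size_mb / 1000)         # simulated slow transfer from the drive
    data = b"x" * size_mb              # placeholder for the real file contents
    _cache[name] = data
    return data

start = time.time(); read_file("big.tif", 35); first = time.time() - start
start = time.time(); read_file("big.tif", 35); second = time.time() - start
# The second access is dramatically faster, like the 14-second first open
# dropping to four seconds once 128MB of RAM keeps the file resident.
```

This is the same trade the article describes for image editing: spend on RAM, and every edit after the first works against memory instead of the hard drive.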

After nearly a year of working with dual-processor SMP computers, 64MB seems to be a minimum for them. An optimal configuration seems to be more on the order of 96MB or higher.

How about SQL Server? A realistic estimate of the amount of RAM needed is 64MB if you plan to make any complex requests of the SQL routines. SQL Server 4.21 is multiprocessor-aware, and the forthcoming version 6.0 is even more so. A two-processor SMP computer would be the system of choice for serious SQL Server database applications in a moderate environment.

For really large CAD/CAM applications or maximized servers, four- to six-processor computers are available. They are very expensive, but their speed is phenomenal. Remember, the most expensive aspect of any computer installation is wasted user time.

Shopping List

  • When ordering a system, don't let a salesperson overly influence your choice of system components (see "The Bottom Six" on page 96). A system is no better than its worst component. Start with at least a Pentium 75.
  • For a standard desktop system, ask for 32MB of RAM and a reasonably fast hard drive (minimum of 1GB).
  • For systems that will run large files and do major multitasking, use two SCSI controllers. Put Windows NT and the page file on a drive on one controller, and put all your applications on a drive on the other one. Choose a very fast drive to contain the system/page file.
  • For systems running SQL Server or related databases, increase your memory to 64MB of RAM, and make certain that you use two very fast controllers and drives. You might want to insist on Wide SCSI for the database drives.
  • For large graphical editing applications and large database distributions, choose a system with multiple processors. Make certain that you have at least 48MB of RAM for each processor.
  • Try to distribute your workload. Fast hard drives suffer on poor controllers. Putting six drives on one controller will be slower than putting three on one controller and three on another. In other words, always work to minimize the slowest event--the infamous bottleneck.
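The six-drives-versus-three-and-three advice is just arithmetic, and a short sketch makes it concrete. The per-drive and per-controller figures below are hypothetical; the point is that each controller caps the aggregate of the drives behind it.

```python
# Hypothetical numbers: each drive sustains 5 MB/s, and a single controller
# tops out at about 12 MB/s of aggregate transfer.
DRIVE_RATE = 5
CONTROLLER_LIMIT = 12

def aggregate(drives_per_controller):
    """Total throughput: each controller delivers at most its own limit."""
    return sum(min(n * DRIVE_RATE, CONTROLLER_LIMIT)
               for n in drives_per_controller)

print(aggregate([6]))     # 12 -- six drives choked behind one controller
print(aggregate([3, 3]))  # 24 -- the same six drives split across two
```

Under these assumed numbers, splitting the drives doubles throughput without buying a single faster part, which is exactly what "minimize the slowest event" means in practice.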

On the Right Foot
The right system for Windows NT is one that's reasonably optimized when you buy it. It's a lot easier to start with the right system than to buy the wrong one and spend days running Performance Monitor to try to optimize it. If you get off on the right foot, your experience with Windows NT will be better from the beginning.