Use third-party tools to optimize multiprocessor configurations

Last month, I mentioned that you achieve optimal application performance from multiprocessor systems when you tune Windows NT and your applications to your hardware configuration (see "8-Way Scalability," September 1998). For example, on an 8-way system you can dedicate four processors to SQL Server, two to Internet Information Server (IIS), and two to NT. On a 4-way system you can balance your IRQ load across the processors, or you can dedicate processors for different network cards or disk interfaces.
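To make the 8-way example concrete: NT expresses processor affinity as a bitmask in which bit n stands for processor n. The following Python sketch computes the masks for that split. Which processor numbers go to which workload is an assumption made purely for illustration, and the helper name affinity_mask is hypothetical.

```python
def affinity_mask(processors):
    """Return a bitmask with bit n set for each processor number n."""
    mask = 0
    for cpu in processors:
        mask |= 1 << cpu
    return mask

# Hypothetical assignment of the eight processors (P0 to P7).
plan = {
    "SQL Server": range(0, 4),  # four processors
    "IIS": range(4, 6),         # two processors
    "NT": range(6, 8),          # two processors
}

masks = {name: affinity_mask(cpus) for name, cpus in plan.items()}

for name, mask in masks.items():
    print(f"{name:<10} mask = 0x{mask:02X} ({mask:08b})")
```

Because the three masks don't overlap, no processor serves two of the workloads, and together they cover all eight processors.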

Performing this tuning can be complicated and risky. You have to tune NT software and set processor affinities via a Registry editor because Microsoft doesn't provide any other tools for NT 4.0. NT 5.0 will include a symmetric multiprocessing (SMP) tuning utility.

If you don't want to wait for NT 5.0, you can use third-party tools to tune NT safely. The Windows NT Magazine Lab Guys tested two tuning products. MCSB Technology's AutoPilot P/SA is a tool that makes tuning configuration decisions. NCR's SMP Utilization Manager 2.5, reviewed on page 80, is a full-blown GUI tool that lets you set and reset tuning parameters.


AutoPilot P/SA
Systems administrators and IS managers must squeeze every ounce of performance from their Windows NT servers. The Windows NT Magazine Lab Guys use many tools to tweak our servers for fast I/O, few collisions, and maximum performance. The Lab is always looking for new products to improve system performance.

MCSB Technology's Performance Assistant Family consists of three products: Resource and Data Activity Repository (RADAR), Performance Tool Kit (PTK), and AutoPilot Performance/Scalability Accelerator (P/SA). RADAR is a data repository tool that accumulates and stores system performance information. PTK lets you access the data in the repository. AutoPilot P/SA uses the information to adjust how the system distributes the workload among the processors.

I tested AutoPilot, which separates facets of its functionality into parts that you can combine into loadable run-time modules. These modules let you collect data in various forms to generate a series of system thread and environmental recommendations for optimizing system performance. Collectively, the modules detect and react to conditions that impede system performance, including resource contention, high cache miss rates, and thread starvation.

AutoPilot works with the NT scheduler to make informed decisions concerning threads you select to run. AutoPilot's collection modules gather system data and pass it to algorithm and environmental modules. These modules use the data to detect potential performance problems. AutoPilot determines how to optimize the system workload.

I tested AutoPilot's performance on a name-brand server with four 200MHz Pentium Pro processors, 512KB Level 2 cache, 2GB of RAM, a 3.5" drive, a CD-ROM drive, two SCSI 4.5GB hard disks, an Adaptec 7880 SCSI controller, and an Adaptec 6944-UW four-port network card. I inserted the AutoPilot CD-ROM into my test server and accessed the setup file from the CD-ROM's root directory.

The installation wizard walks you through each step and prompts you to update system files if necessary. When you update the system files, AutoPilot reboots the system to update the appropriate run-time files. After the installation finishes, you must reboot the system again to start the AutoPilot service.

AutoPilot is simple to use. An AutoPilot icon appears in the tool tray in the lower right corner of your screen. Right-click the icon to open the AutoPilot P/SA Properties dialog box. Select the AutoPilot2 check box in the Modules tab, as Screen 1 shows, to load all the modules. To view individual modules, click the plus symbol (+) next to the AutoPilot2 check box. AutoPilot's modules are NT device drivers that load dynamically when you select their check boxes and unload when you clear them.

The AutoPilot service starts manually by default, but you can configure it to start automatically. Click the word AutoPilot2 and click Properties to open the General tab. Select the Enable automatic startup check box, and click OK. AutoPilot will then start automatically after each system boot.

To test the 4-way server's system performance, I ran the AIM Technology Domain Server WNT tests before and after I installed AutoPilot. AIM tests simulate domain server tasks, including light file transfers; network routing; packet forwarding; email; and shared applications such as spreadsheets, word processing, and network maintenance. (For more information about AIM Technology's tests, see "AIM Technology Server Benchmark Test," page 78.) I had to run the AIM tests twice after I installed AutoPilot: once to establish a profile and again to test performance.

Before I installed AutoPilot, the server's WNT Peak Performance was 1378.7 and its WNT Sustained Performance was 1313.9. After I installed AutoPilot, the WNT Peak Performance rose to 1408.2, an increase of about 2 percent. The WNT Sustained Performance rose to 1344.7, also an increase of about 2 percent. I was disappointed with this performance increase, but I attributed the results to the profile run. When I ran the tests again, performance increased only slightly.
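The arithmetic behind those percentages is easy to check. This short Python sketch uses only the four scores reported above; the function name percent_gain is my own.

```python
def percent_gain(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100.0

# AIM Domain Server WNT scores before and after installing AutoPilot.
peak = percent_gain(1378.7, 1408.2)
sustained = percent_gain(1313.9, 1344.7)

print(f"Peak gain:      {peak:.1f}%")
print(f"Sustained gain: {sustained:.1f}%")
```

To one decimal place, the gains work out to 2.1 percent for peak performance and 2.3 percent for sustained performance.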

The performance increases we measured, about 2 percent, fall well short of the substantial performance improvement MCSB Technology advertises. To evaluate AutoPilot in your environment, download a 14-day evaluation copy from MCSB Technology's Web site.

—Jonathan Cragle

AutoPilot P/SA
Contact: MCSB Technology * 800-701-2436
Web: http://www.mcsb.com
Price: $995
System Requirements: Intel Pentium or Pentium Pro processor, 16MB of RAM, VGA or higher-resolution video adapter

SMP Utilization Manager 2.5
If you invest in a multiprocessor system, you need to maximize its performance. NCR's SMP Utilization Manager 2.5 is a performance-tuning tool for Intel-based multiprocessor systems; in my tests, it improved average response times by 9.5 percent to 14.3 percent. NCR's engineers used their knowledge of symmetric multiprocessing (SMP) load balancing for UNIX when they designed SMP Utilization Manager.

When you run multiple applications on a multiprocessor system, you must control each application's CPU usage. You might want one application to dominate the system to ensure quick response time for users. Or you might want to equalize applications' CPU usage to prevent one application from dominating the system and slowing users' access to other applications.

SMP Utilization Manager lets you control which processors an application, process, thread, or interrupt uses. CPU cache hits improve when you limit and specify the processors an application can use. You can set priorities for processes and threads. NT allots more CPU time for an application with many threads than it does for an application with few threads. Setting processor affinity lets you adjust allocation of CPU resources. For an overview of NT's process dispatcher, see "How Windows NT Dispatches Processes and Threads."
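To see why thread count matters, consider a deliberately simplified model: one processor, with every thread at the same priority, always ready to run, and given equal time slices. NT's real dispatcher is more involved, so treat this Python sketch as an illustration of the proportions only. The application names and thread counts are hypothetical.

```python
def cpu_share_by_app(thread_counts):
    """Split CPU time among apps in proportion to their thread counts.

    Simplifying assumption: every thread has the same priority, is always
    ready to run, and receives equal time slices.
    """
    total = sum(thread_counts.values())
    return {app: count / total for app, count in thread_counts.items()}

# Hypothetical applications and thread counts.
shares = cpu_share_by_app({"app_many_threads": 8, "app_few_threads": 2})

for app, share in shares.items():
    print(f"{app}: {share:.0%} of CPU time")
```

Under these assumptions, the application with eight threads receives four times the CPU time of the application with two. Priorities and affinity let you correct that kind of imbalance.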

Product Features
SMP Utilization Manager has two main components. The first component is a two-window GUI you use to set CPU affinity for processes, threads, and system hardware interrupt processing. The second component is a command-line interface you use to automate standard affinity configuration changes.

At the process level, you set CPU affinity and assign a process one of four priority levels, as Screen 2 shows: Real Time, Very High, Normal, or Idle. You can also set a process' threads to one of five priority levels: Highest, Above Normal, Normal, Below Normal, or Idle.

You have several options for setting CPU affinity for a process' threads. Threads inherit their parent process' CPU affinity mask. Thus, the dispatcher can use any of the parent process' assigned processors to process the threads. SMP Utilization Manager lets you assign threads to processors in round-robin hard or round-robin preferred sequences. In a round-robin hard sequence, a thread runs on only the processor you assign to it. In a round-robin preferred sequence, you set an ideal processor for a thread, and the dispatcher chooses that processor if possible. You can also set processor affinity for individual threads. This ability is useful if a process' threads have different execution characteristics.
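The difference between the two round-robin sequences is easier to see in a small model. The Python sketch below is my reading of the behavior described above, not NCR's implementation. The function names, thread IDs, and process mask are all hypothetical.

```python
from itertools import cycle

def processors_in_mask(mask):
    """List the processor numbers whose bits are set in an affinity mask."""
    return [cpu for cpu in range(mask.bit_length()) if mask & (1 << cpu)]

def assign_round_robin(thread_ids, process_mask, hard):
    """Give each thread the next processor in the process's mask, in turn.

    hard=True:  the thread's own mask is narrowed to that one processor.
    hard=False: the thread keeps the whole process mask, and the processor
                is recorded only as its preferred (ideal) processor.
    """
    available = processors_in_mask(process_mask)
    if not available:
        raise ValueError("process affinity mask selects no processors")
    cpus = cycle(available)
    plan = {}
    for tid in thread_ids:
        cpu = next(cpus)
        thread_mask = (1 << cpu) if hard else process_mask
        plan[tid] = {"ideal": cpu, "mask": thread_mask}
    return plan

process_mask = 0x0F  # hypothetical process limited to P0 to P3
threads = ["T0", "T1", "T2", "T3", "T4", "T5"]  # hypothetical thread IDs

hard_plan = assign_round_robin(threads, process_mask, hard=True)
preferred_plan = assign_round_robin(threads, process_mask, hard=False)

for tid in threads:
    print(tid, hard_plan[tid], preferred_plan[tid])
```

In the hard plan, the fifth thread wraps around to P0 and can run nowhere else. In the preferred plan, the same thread favors P0 but remains eligible for any of P0 to P3 when P0 is busy.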

SMP Utilization Manager lets you set CPU affinity for system hardware interrupt processing, as Screen 3 shows. The Interrupts tab shows you which driver is using each interrupt, so you don't need to check your device list or run Windows NT Diagnostics (winmsd.exe) to monitor interrupt usage.

SMP Utilization Manager runs as a process and initializes at system boot time. It can set CPU affinity only for system processes that initialize at startup. You can use the setaffin.exe command-line interface to automate configuration of processes that initialize after startup. Setaffin.exe sets hard CPU affinity for processes only--not threads or interrupts. SMP Utilization Manager 2.5's setaffin.exe utility has a bug that requires the utility to run as a scheduled process rather than as a command at the command line. To work around this problem, run setaffin.exe with the Microsoft Windows NT Server 4.0 Resource Kit SOON command scheduler.

SMP Utilization Manager 3.0 will have significant improvements over SMP Utilization Manager 2.5. CPU affinity will persist more reliably throughout booting so that every service has the desired affinity mask set when it starts. A new Launch Manager tab will let you apply desired affinity settings and automatically start applications that do not run as services. The setaffin.exe command-line interface will support all GUI functions.

Typical Tuning
NCR's Optimizing Windows NT on NCR Servers provides information about tuning NT. It covers NT system services and LAN protocols, SQL Server, Exchange, SAP R/3, and Lotus Notes.

Basic tuning with SMP Utilization Manager is simple. You must determine how you are using system resources. Performance Monitor shows you the most important system resource information: CPU usage. You can use this information to assign workloads to your system's processors. You must monitor system performance after each change to be sure your changes produce the desired effect.

An easy place to start tuning is to set interrupt CPU affinity for LAN adapters and the network device interface specification (NDIS) process. You can bind the network packet processing code to one processor to improve efficiency. NCR recommends that you bind network adapter interrupts to the highest-numbered processor, which NT uses by default. Bind a different processor to your SCSI adapter interrupts for optimal performance.

For efficiency, consider demand on CPU resources when you allocate major applications to processors. Suppose you are running SQL Server, Exchange, and file services on an 8-way system. Performance monitoring shows that Exchange and file services use approximately equal CPU resources. SQL Server uses twice the CPU resources that either Exchange or file services uses. Thus, you might assign four processors to SQL Server, two processors to Exchange, and two processors to other system processes.
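You can generalize this reasoning to other workload mixes. The following Python sketch divides processors in proportion to measured CPU demand. It is one possible approach, not a method NCR documents, and the demand figures are relative units chosen to match the example. Here, file services stands in for the "other system processes" group.

```python
def allocate_processors(demand, total_cpus):
    """Split total_cpus among workloads in proportion to measured demand.

    Uses largest-remainder rounding so the counts always sum to total_cpus.
    """
    total_demand = sum(demand.values())
    exact = {name: total_cpus * d / total_demand for name, d in demand.items()}
    counts = {name: int(share) for name, share in exact.items()}
    leftover = total_cpus - sum(counts.values())
    # Hand any remaining processors to the workloads with the largest fractions.
    by_fraction = sorted(exact, key=lambda n: exact[n] - counts[n], reverse=True)
    for name in by_fraction[:leftover]:
        counts[name] += 1
    return counts

# Relative CPU demand in hypothetical units: SQL Server uses twice as much
# as each of the other two workloads.
demand = {"SQL Server": 2, "Exchange": 1, "file services": 1}
allocation = allocate_processors(demand, total_cpus=8)
print(allocation)
```

For this demand mix, the sketch reproduces the 4-2-2 split. Treat the result as a starting point, and monitor performance after you apply it.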

Test Results
I ran several benchmark tests to determine the effect of CPU affinity tuning on application performance. I used NCR's WorldMark 4380 8-way system as a test server. (For a review of the WorldMark 4380, see John Enck, "8-Way Scalability," September 1998.) I used Bluecurve's Dynameasure benchmarking tools. My test load consisted of 17 clients generating Dynameasure's file service workload (i.e., file copy to and from the test server) and 16 clients generating Dynameasure's SQL Server online transaction processing (OLTP) workload (i.e., random reads, including two- and three-table joins). In each test, I configured SQL Server to use the first four processors (P0 to P3) and other system processes to use the last four processors (P4 to P7). I configured SMP Utilization Manager to run during the untuned tests, and I set CPU affinity to make all processors eligible.

In my first test, I assigned SQL Server threads to processors in a round-robin hard sequence. I configured the remaining processes (except system and idle processes) to use processors P4 to P7. Rather than set CPU interrupt affinity, I let IRQs use any processor. Utilization metrics showed that none of the CPUs were stressed during the test, with average CPU utilization of 25 percent. Theoretically, low CPU utilization indicates that processors are available and that the system can dispatch a thread to the same processor it ran on previously. In my tuned configuration, average response time for SQL Server transactions improved by 9.5 percent, and throughput for file copy tests improved by 5 percent.

In subsequent tests I added CPU affinity settings for network and disk I/O interrupts. I selected the P7 processor for the Adaptec and SMC 100Base-TX adapters and the P4 processor for the Adaptec and Mylex disk controllers. I increased the number of client motors that Dynameasure used to expand the test workload by 50 percent. In the file server tests, throughput in KB per second for the tuned configuration was almost identical to throughput for the untuned configuration: The tuned configuration showed an increase of only 0.5 percent. However, transactions per second increased 12.2 percent for the tuned configuration. In addition, the average response time was 12.4 percent faster for the tuned configuration. The SQL Server tests showed similar improvement, with an average response time 14.3 percent faster for the tuned configuration than for the untuned configuration.

The Bottom Line
SMP Utilization Manager improves 8-way server performance in multiple application environments. You can also use it on 4-way systems. NCR bundles SMP Utilization Manager with its SMP servers and sells the product separately to use with other servers. If you need to run multiple applications on a large NT-based SMP system, consider SMP Utilization Manager to maximize your performance.

SMP Utilization Manager 2.5
Contact: NCR * 937-445-5000 or 800-225-5627
Web: http://www.ncr.com
Price: Bundled free with NCR's SMP servers; sold separately for $500
System Requirements: Windows NT 4.0, Standard Edition or Windows NT 4.0, Enterprise Edition, Intel-based multiprocessor, PCI support for interrupt-affinity feature