Is 4 + 4 > 8?

Your pen tip hovers over the signature line on a purchase order for an 8-way system. Suddenly you are overcome by indecision, and your pen quivers as your hand starts to shake. Is an 8-way system the right investment for your company and your applications? Will it give you the performance you need? A bead of sweat forms on your temple and slides down your cheek. A cold shiver runs down your spine, and you slowly place the pen on your desk. You realize that purchasing an 8-way system is no simple decision--you need answers before you can put ink on paper.


If you believed Microsoft's promise of scalability, you wouldn't hesitate to sign the purchase order. In 1997, Microsoft held its Scalability Day and promised scalability of 8-way systems with the Windows NT Server 4.0, Enterprise Edition family of products, with support for 16-way systems not far behind. I remember this promise clearly because I was in the audience.

Has Microsoft kept its promise? The answer depends on whom you ask. At a recent Windows NT Magazine Professionals Conference, a Microsoft representative proudly announced that Microsoft had achieved good scalability with 8-way systems, citing Transaction Processing Performance Council (TPC) test results that showed a 65 percent performance improvement for 8-way vs. 4-way systems. This number surprised me because the Windows NT Magazine Lab tests haven't shown such an improvement. I decided to analyze Microsoft's data.

Microsoft Math
Microsoft compared a 4-way system running NT Server and SQL Server with a 4-way system running NT Server, Enterprise Edition (NTS/E) and SQL Server, Enterprise Edition (SQL/E). As Figure 1, page 72, shows, the standard NT Server and SQL Server configuration generated a score of 9800 transactions per minute (tpm). The NTS/E and SQL/E configuration generated a score of 12,100tpm, a 23 percent improvement. The reason for this dramatic improvement is NTS/E and SQL/E's very large memory (VLM) support. The 4-way system Microsoft tested had 4GB of RAM, which NTS/E and SQL/E fully utilized.

Microsoft then pitted the NT Server and SQL Server results (9800tpm) against an 8-way system with 4GB of RAM running NTS/E and SQL/E. The 8-way NTS/E and SQL/E configuration scored much higher. Figure 2, page 72, shows the result: 16,200tpm. This number is a whopping 65 percent improvement over the 4-way system--pretty impressive.

However, Microsoft's comparison was unfair. Microsoft's results included not only the improvement of an 8-way over a 4-way system but also the improvement you get with VLM support. If you can upgrade your 4-way system to 4GB of RAM and run NTS/E and SQL/E for a 23 percent performance improvement, why buy an 8-way system? Upgrading a 4-way system is far cheaper than buying an 8-way system.

I still wondered about the true difference between a 4-way and an 8-way configuration (without VLM support). Fortunately, Microsoft provided the pertinent information, so I just needed to do some math. The 4-way system with 4GB of RAM running NTS/E and SQL/E produced 12,100tpm. The 8-way system with 4GB of RAM running NTS/E and SQL/E produced 16,200tpm. The increase between these two configurations is 34 percent.

Yes, that's right. Using Microsoft's own tests, the performance improvement between a 4-way system and an 8-way system is only 34 percent. And you can bet that Microsoft reports its best-case test results, so your performance improvement will vary depending on the applications you run.
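The arithmetic behind these three comparisons is easy to check. The following Python sketch uses only the tpm figures quoted above; the dictionary labels and the helper name percent_gain are illustrative, not anything Microsoft or the TPC publishes.

```python
# Throughput figures quoted in the article, in transactions per minute (tpm).
TPM = {
    "4-way, NT Server + SQL Server": 9_800,
    "4-way, NTS/E + SQL/E (VLM)": 12_100,
    "8-way, NTS/E + SQL/E (VLM)": 16_200,
}


def percent_gain(baseline: float, result: float) -> float:
    """Return the improvement of result over baseline, in percent."""
    return (result / baseline - 1.0) * 100.0


base = TPM["4-way, NT Server + SQL Server"]
vlm4 = TPM["4-way, NTS/E + SQL/E (VLM)"]
vlm8 = TPM["8-way, NTS/E + SQL/E (VLM)"]

print(f"VLM alone (4-way vs. 4-way): {percent_gain(base, vlm4):.1f}%")
print(f"Microsoft's comparison:      {percent_gain(base, vlm8):.1f}%")
print(f"Processors alone (4 vs. 8):  {percent_gain(vlm4, vlm8):.1f}%")
```

Only the last line isolates the effect of the extra processors, because both of its inputs come from configurations with the same software and the same 4GB of RAM.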

The Plot Thickens
Like you, I take Microsoft's benchmark numbers as gospel and don't bother running independent tests to verify the results. Yeah, right.

The Lab Guys and I ran a series of tests in the Lab to verify Microsoft's benchmarking numbers. Unlike Microsoft, we didn't use separate 4-way and 8-way systems. We tested two 8-way systems: Data General's AViiON 8600, reviewed on page 74, and NCR's WorldMark 4380, reviewed on page 78. Both systems had 4GB of RAM and ran NTS/E. To test the difference between 4-way and 8-way systems, we disabled (via software configuration) some processors in each system. We tested three configurations: 4-way, 6-way, and 8-way. (For other 8-way systems, see "Vendor List for 8-Processor Servers.")

In all fairness to Microsoft, the tests we ran on the AViiON 8600 showed performance improvement similar to Microsoft's TPC benchmarks for 4-way and 8-way systems. The AIM Domain Server Mix test showed an improvement of 31 percent for sustained performance in the 8-way configuration and a 50 percent improvement for peak performance. (For more information about AIM Technology's tests, see "AIM Technology Server Benchmark Test," page 76.) Sustained performance is the key measurement because it represents the workload a system can handle under stress. Our sustained performance result of 31 percent for the AViiON 8600 8-way configuration isn't far from Microsoft's TPC benchmark of 34 percent.

A more interesting finding was the spread of improvement between 4-way, 6-way, and 8-way configurations. We discovered that the greatest performance improvement is between a 4-way and a 6-way configuration. The AIM WNT Sustained Performance test showed a 22 percent improvement from a 4-way to a 6-way system and a 31 percent improvement from a 4-way to an 8-way system. That's only a 7 percent performance improvement between the 6-way and 8-way configurations. The AIM WNT Peak Performance test showed a 32 percent improvement from a 4-way to a 6-way system and a 50 percent improvement from a 4-way to an 8-way system. Again, only a marginal performance improvement of 13 percent between the 6-way and 8-way configurations. Based on these numbers, a 6-way system is more impressive than an 8-way system.

Our tests also showed that 8-way configurations are not uniformly scalable. Our AViiON 8600 test results were similar to Microsoft's TPC tests, but our WorldMark 4380 test results didn't measure up. The AIM WNT Sustained Performance test showed a 5 percent improvement from a 4-way to a 6-way system and an 11 percent improvement from a 4-way to an 8-way system, which is a performance gain of only 6 percent between 6-way and 8-way configurations. The AIM WNT Peak Performance test showed a 13 percent improvement from a 4-way to a 6-way system and only a 12 percent improvement from a 4-way to an 8-way system (performance dropped between the 6-way and 8-way configurations).

The WorldMark 4380 has a different multiple-processor architecture than the AViiON 8600 has. Based on our test results, the WorldMark 4380 is better suited for a multiple-application environment than for scaling up one application.

When I contacted Microsoft to discuss 4-way, 6-way, and 8-way performance, the company reported a slightly different spread than our test results showed. According to Microsoft's TPC tests, the performance difference between a 4-way and a 6-way configuration is 17 percent. The difference between a 4-way and an 8-way configuration is 34 percent, which yields a 15 percent spread between a 6-way and an 8-way system. Although 15 percent isn't phenomenal, it's better than the 7 percent difference we saw on our AViiON 8600 sustained performance tests.
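A note on how these spreads are derived: both percentages are measured against the same 4-way baseline, so the 6-way to 8-way gain is a ratio of the two results, not the difference between the percentages. The short Python sketch below illustrates the calculation; the function name spread_percent is hypothetical, and because the inputs are already rounded, the output can differ by a point from figures computed on raw scores.

```python
def spread_percent(gain_small: float, gain_large: float) -> float:
    """Gain of the larger configuration over the smaller one, in percent.

    Both arguments are percent improvements over the same baseline.
    """
    return ((1 + gain_large / 100) / (1 + gain_small / 100) - 1) * 100


# Microsoft's TPC figures as quoted in the article (4-way baseline).
print(f"Microsoft TPC, 6-way to 8-way:    {spread_percent(17, 34):.1f}%")
# The Lab's AViiON 8600 sustained-performance figures (4-way baseline).
print(f"AViiON sustained, 6-way to 8-way: {spread_percent(22, 31):.1f}%")
```

Subtracting the percentages instead (34 minus 17) would overstate the Microsoft spread as 17 percent.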

Microsoft's TPC benchmarks focus on NTS/E and SQL/E. In contrast, the AIM tests we ran focused on the operating system (OS), which was NTS/E, rather than on BackOffice, Enterprise Edition components (e.g., SQL/E and Exchange, Enterprise Edition). We ran preliminary BackOffice benchmarks to verify that they report the same performance trends we found.

Now What?
If you're considering signing a purchase order for an 8-way system, a performance improvement of 34 percent will most likely give you pause. After all, 8-way systems aren't a mere 34 percent more expensive than 4-way systems. In fact, an 8-way system currently costs at least two or three times the price of a similarly configured 4-way system.
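To put the price gap in perspective, you can divide the price multiple by the performance multiple to get the relative cost of each unit of work. The Python sketch below does this with the figures above, using normalized ratios and not actual list prices; the function name is illustrative.

```python
PERFORMANCE_RATIO = 1.34  # 8-way throughput relative to 4-way (34 percent gain)


def cost_per_unit_of_work(price_ratio: float, performance_ratio: float) -> float:
    """Relative cost of each unit of throughput, with the 4-way system at 1.0."""
    return price_ratio / performance_ratio


for price_ratio in (2.0, 3.0):  # "two or three times the price"
    ratio = cost_per_unit_of_work(price_ratio, PERFORMANCE_RATIO)
    print(f"{price_ratio:.0f}x price -> {ratio:.2f}x cost per transaction")
```

At those price multiples, each transaction on the 8-way system costs roughly one and a half to two and a quarter times what it costs on the 4-way system.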

Eight-way systems aren't economic bargains. However, they are practical in a few situations.

You might run an application that scales well to an 8-way environment. Benchmark tests don't reflect the behavior of every application. Some applications probably scale better than 34 percent in 8-way configurations. Before upgrading to an 8-way system, have your software or hardware vendor prove to you that your applications scale to an 8-way environment.

Perhaps you need every drop of performance you can squeeze out of a system. You might have a monolithic application that you can't partition or spread across multiple systems (e.g., a huge database). If you've reached your 4-way system's performance limit, a 34 percent performance increase might look good at any price.

Maybe you want to consolidate your servers. You can effectively use an 8-way system to consolidate multiple servers into one server, provided that you establish processor affinities for the applications you want to run. If you load your applications and let NT handle the processor management, you're asking for trouble. You'll get better performance if you set affinities for your applications (e.g., you assign two processors to Internet Information Server--IIS, four processors to SQL Server, and two processors to NT). You can set processor affinities through the Registry or with third-party software tools.
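Processor affinity is conventionally expressed as a bitmask with one bit per CPU. As a rough illustration of the two-four-two split described above, the following Python sketch builds the masks and checks that the assignments don't overlap. The CPU numbering is an assumption, and the sketch only computes the masks; how you apply them (a Registry setting or a third-party tool) depends on the product and isn't shown.

```python
def affinity_mask(cpus) -> int:
    """Build a bitmask with one bit set per processor index (CPU 0 = bit 0)."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


# Hypothetical CPU numbering for the article's example split of eight processors.
PLAN = {
    "IIS": range(0, 2),         # two processors
    "SQL Server": range(2, 6),  # four processors
    "NT": range(6, 8),          # two processors
}

masks = {name: affinity_mask(cpus) for name, cpus in PLAN.items()}

# Sanity checks: no processor is assigned twice, and all eight are covered.
combined = 0
for name, mask in masks.items():
    assert combined & mask == 0, f"{name} overlaps another assignment"
    combined |= mask
assert combined == 0xFF

for name, mask in masks.items():
    print(f"{name:<10} mask=0x{mask:02X} ({mask:08b})")
```

Checking for overlap before you apply the masks is worthwhile, because two busy applications pinned to the same processors defeat the purpose of partitioning the system.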

You might run applications that aren't CPU intensive. Scalability works differently with different applications. If you have an application that is disk and network bound, you can improve performance if you use an 8-way system with processor affinities tuned to balance OS performance, application performance, disk I/O, and network I/O. In this case, having more processors lets you dedicate processor resources to specific bottlenecks in your applications. To take advantage of extra processors, you must know how to optimally tune your applications.

If none of these situations applies to you, I recommend that you deploy 4-way or 6-way systems with 4GB of RAM running NTS/E. New hardware technology (i.e., new processors) will enter the 4-way and 6-way market long before it reaches 8-way systems. The performance difference between 4-way, 6-way, and 8-way systems will probably decrease when the new 4-way and 6-way systems hit the market (and increase when the same technology reaches 8-way systems).

You can't cure all your performance problems with an 8-way system. These systems have a place in the NT industry, but they might not have a place in your environment. Only you can decide whether your application environment will benefit from the increased power of eight processors or whether eight processors can fix bottlenecks in your applications. If you decide your environment needs an 8-way system, go ahead and sign that purchase order. Otherwise, consider the price and performance benefits of the new 4-way and 6-way systems.

AViiON 8600

If you need the computing power of an 8-way symmetric multiprocessing (SMP) server, consider Data General's AViiON 8600. This easy-to-use, enterprise-level server uses Adaptive Memory Crossbar architecture to scale to as many as eight 200MHz Pentium Pro processors and 8GB of RAM.

Special Delivery
The Windows NT Magazine Lab's test unit came with the maximum eight 200MHz Pentium Pro processors, 1MB Level-2 cache, 4GB of RAM, a 3.5" drive, a CD-ROM drive, an internal IDE 4GB hard disk (boot disk), an STB Nitro 3-D video graphics card, 10 Seagate Cheetah 9GB SCSI hard disks, and a Mylex DAC960 Disk Array Controller. Data General offers various configurations for its AViiON servers, such as two processors to eight processors, 128MB to 8GB of RAM, and space for multiple CLARiiON disk arrays.

AViiON 8600
Contact: Data General * 508-898-5000
Web: 8600_enterprise_server.html
Price: $125,000
System Configuration: Eight 200MHz Pentium Pro processors, 1MB Level-2 cache, 4GB of RAM, STB Nitro 3-D video graphics card with 2MB of VRAM, 10 Seagate Cheetah 9GB SCSI hard disks, Mylex DAC960 Disk Array Controller

AViiON 8600's cabinet is 73" tall and 30" deep. The unit's footprint isn't much larger than a four-processor server box, but you'll need extra room for its height. The oversized cabinet has a rigid metal rack for mounting equipment parts. The components slide out on metal rails, and the modular design provides plenty of space to let you easily change components. Each major component has an independent power supply and a cooling subsystem that plug into a built-in power strip on the inside of the cabinet.

Two rear doors provide access to the major components. The top portion of the unit's frame holds the CPU Module Assembly, a metal drawer that houses the motherboard and slides horizontally out the back of the unit. The CPU Module Assembly has eight PCI expansion slots and four P6 module slots. Each P6 slot holds a PCI-type card that contains two of the unit's CPUs. Under the CPU Module Assembly is the memory board that contains the DIMM slots filled with 4GB of RAM. The CPU Module Assembly is identical to the one in Axil Computer's Northbridge NX801. (For more information about the Northbridge NX801, see Carlos Bernal, "Northbridge NX801," April 1998.)

AViiON 8600's impressive components help Data General provide functional 8-way processing. With the CPU Module Assembly pulled out, we easily installed four Digital Equipment DE500 Fast EtherWORKS PCI 10/100 adapters to connect to the Lab's benchmarking network.

Raising the Crossbar
Scaling a system beyond four Pentium Pro processors presents special challenges. For example, a P6 system bus supports only four processors. Axil designed the Adaptive Memory Crossbar architecture to support two parallel P6 system buses. This architecture lets you build six-processor and eight-processor systems. Data General uses Adaptive Memory Crossbar technology to give the AViiON 8600 eight-processor capability.

Axil's Adaptive Memory Crossbar architecture uses standard Intel 450GX PCI bridges that connect the two P6 buses and their PCI buses. The Adaptive Memory Crossbar architecture has a high-performance Synchronous DRAM (SDRAM)-based memory subsystem, an address reorder buffer, and balanced P6 bus bridge I/O architecture to enable faster transfer speed than a 4-way configuration offers.

A high-performance SDRAM-based memory subsystem offers sustained memory bandwidth of up to 1.066GB per second (GBps). The Adaptive Memory Crossbar architecture's memory system has 16 interleaved memory banks, providing eight times as much memory as a typical Pentium Pro system. Two application-specific integrated circuit (ASIC) chips implement the Adaptive Memory Crossbar architecture. The data chip switches between two data buses and a third bus connected to the memory banks. The address chip controls data switching, checks coherence, and routes transactions between the bridged P6 buses.

A typical memory controller processes read and write requests in the order it receives them. An Adaptive Memory Crossbar controller performs read operations first and delays write instructions, thus increasing system speed. The typical memory controller often stalls during read requests because it is waiting for write request data. The Adaptive Memory Crossbar design reorders transactions and lets read transactions complete before starting the write transactions.
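The read-first policy can be sketched as a toy model in Python. This is not Axil's actual controller logic, which the article doesn't detail. The Request type, the function name, and the hazard rule (a read may not pass an earlier write to the same address) are assumptions made for illustration.

```python
from collections import deque
from typing import NamedTuple


class Request(NamedTuple):
    kind: str      # "R" for read, "W" for write
    address: int


def reorder_reads_first(requests):
    """Return requests in issue order, letting reads bypass queued writes.

    A read may not pass an earlier write to the same address, because it
    would otherwise return stale data.
    """
    issued = []
    pending_writes = deque()
    for req in requests:
        if req.kind == "W":
            pending_writes.append(req)   # defer the write
            continue
        # Hazard: flush writes up to the last one that touches this address.
        hazards = [i for i, w in enumerate(pending_writes)
                   if w.address == req.address]
        if hazards:
            for _ in range(hazards[-1] + 1):
                issued.append(pending_writes.popleft())
        issued.append(req)
    issued.extend(pending_writes)        # drain the remaining writes
    return issued


incoming = [Request("W", 0x10), Request("R", 0x20), Request("W", 0x30),
            Request("R", 0x10), Request("R", 0x40)]
issued_order = reorder_reads_first(incoming)
for req in issued_order:
    print(req.kind, hex(req.address))
```

In this example, the read of 0x20 jumps ahead of the first write, but the read of 0x10 waits for the write to 0x10 so that it sees current data. Writes always issue in their original order relative to one another.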

The Adaptive Memory Crossbar's address reorder buffer lowers the overhead on the memory system and P6 buses. Memory requests go through this buffer, and the memory controller sends the requests to free memory banks. When a bank is busy, the buffer lets the controller prioritize requests and reorder them to optimize bandwidth use, thus increasing application performance as much as 30 percent to 40 percent.
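To see why reordering around busy banks helps, consider a deliberately simplified Python simulation. The 16 banks come from the description above; everything else (the address-to-bank mapping, the four-cycle bank occupancy, one issue per cycle) is an assumption chosen to make the effect visible, and the sketch doesn't attempt to reproduce the 30 percent to 40 percent figure.

```python
NUM_BANKS = 16      # from the article
BUSY_CYCLES = 4     # assumed bank occupancy per access (illustrative)


def bank_of(address: int) -> int:
    """Assumed interleaving: consecutive addresses rotate through the banks."""
    return address % NUM_BANKS


def cycles_to_drain(addresses, reorder: bool) -> int:
    """Count cycles needed to issue every request, at most one per cycle."""
    queue = list(addresses)
    free_at = [0] * NUM_BANKS   # cycle at which each bank becomes free again
    cycle = 0
    while queue:
        # In order: only the head may issue. Reordering: any queued request may.
        candidates = queue if reorder else queue[:1]
        for i, addr in enumerate(candidates):
            if free_at[bank_of(addr)] <= cycle:
                free_at[bank_of(addr)] = cycle + BUSY_CYCLES
                del queue[i]
                break               # stop scanning right after the deletion
        cycle += 1
    return cycle


# Three requests collide on bank 0; the other three target idle banks.
workload = [0, 16, 32, 1, 2, 3]
print("in order: ", cycles_to_drain(workload, reorder=False))
print("reordered:", cycles_to_drain(workload, reorder=True))
```

With this workload, the in-order controller stalls behind the busy bank while the reordering controller fills those idle cycles with requests for free banks. When no two requests share a bank, both policies take the same number of cycles.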

Standard Intel 450GX PCI bridges connect the two P6 buses and four PCI buses. Two PCI buses are for add-on cards, offering four PCI card slots per bus. The other two PCI buses are for built-in Ultra SCSI channels. This balanced design provides a disk-to-memory speed of 100MB per second (MBps)--enough bandwidth to support most enterprise applications.

AViiON 8600 came preconfigured with Windows NT Server 4.0, Enterprise Edition (NTS/E). One of Data General's engineers helped us install and configure the system. We reinstalled NT and the AViiON 8600's configuration software to test the ease of use. For testing purposes, we divided the 10 hard drives into four logical drives and configured the drives as RAID 0.

AViiON 8600's installation and setup is impressive. Even without documentation, we easily identified the major components and power supplies (the unit requires a 220V outlet). NTS/E installation went smoothly. To connect to the Lab's domain, we used TCP/IP as the network protocol and assigned fixed IP addresses to the four network adapter cards. Complete documentation for the Mylex controller helps you set up the system quickly.

AViiON 8600 includes a five-page Technical Note for installing the hardware abstraction layer (HAL) and various drivers, and manuals for NuView ManageX, Data General's NT enterprise system management software. Data General needs to include a reference guide with detailed information such as component descriptions, diagrams, technical information, installation instructions, and troubleshooting tips.

Tortoise or Hare Performance?
To test AViiON 8600's file and print services performance, we ran the AIM Technology Domain Server Mix tests three times: with four, six, and eight processors enabled. These tests simulate domain server tasks, including light file transfers; network routing; packet forwarding; email; and shared applications, such as spreadsheets, word processing, and network maintenance. (For more information about AIM Technology's tests, see "AIM Technology Server Benchmark Test.")

AViiON 8600 had a WNT Peak Performance of 3842.5 and a WNT Sustained Performance of 2932.9 in the 4-way configuration. With six processors, performance increased to 5083.6 Peak Performance (up 32 percent) and 3584.6 Sustained Performance (up 22 percent). With eight processors, Peak Performance was 5747.4 and Sustained Performance was 3850.0. Peak Performance was 50 percent higher in the 8-way configuration than in the 4-way configuration. Sustained Performance was only 31 percent higher. After hearing Data General's advertised performance claims, we expected a huge performance jump with eight processors. We were disappointed. A 31 percent performance increase is good, but 8-way systems can cost as much as two or three times the price of 4-way systems. Not exactly a bargain.
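One way to read these scores is to compare the throughput gained with the processors added. The Python sketch below computes speedup and a simple scaling efficiency from the scores above. The efficiency definition (speedup divided by the ratio of processor counts) is a common convention, not part of the AIM benchmark, and the names are illustrative.

```python
# AIM WNT scores reported for the AViiON 8600, keyed by processor count.
AVIION = {
    "peak": {4: 3842.5, 6: 5083.6, 8: 5747.4},
    "sustained": {4: 2932.9, 6: 3584.6, 8: 3850.0},
}


def scaling_report(scores, baseline_cpus=4):
    """Return {cpus: (speedup, efficiency)} relative to the baseline.

    Efficiency is speedup divided by the ratio of processor counts, so 1.0
    would mean throughput grew in step with the number of processors.
    """
    base = scores[baseline_cpus]
    report = {}
    for cpus, score in sorted(scores.items()):
        speedup = score / base
        efficiency = speedup / (cpus / baseline_cpus)
        report[cpus] = (speedup, efficiency)
    return report


for metric, scores in AVIION.items():
    for cpus, (speedup, efficiency) in scaling_report(scores).items():
        print(f"{metric:<9} {cpus}-way: {speedup:.2f}x the 4-way score, "
              f"efficiency {efficiency:.2f}")
```

By this measure, doubling the processor count delivers about two-thirds of proportional scaling on the sustained test.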

The Verdict Is In
Data General's AViiON 8600 is the best-performing system we've tested in the Lab. However, in our tests, the system did not perform as well as advertised. AViiON 8600 is stable for a variety of applications, including the Internet, e-commerce, collaborative computing, and data warehousing. Certain situations might call for an 8-way system, but AViiON 8600's performance does not offset its cost.

If you need a high-powered system that can handle large network capacity, set realistic expectations and consider your options carefully. Data General's AViiON 8600 is expensive, but it might be the system for you.

WorldMark 4380

NCR has been involved in the multiprocessor Intel market since the early 1990s, with the System 3000 family. The WorldMark 4380 is NCR's 8-way Intel-based system. (For a review of NCR's 4-way system, see Carlos Bernal, "WorldMark 4300," February 1998.)

Out of the Box
As I unpacked the WorldMark 4380 8-way symmetric multiprocessing (SMP) system, I noticed its office-friendly design. NCR describes the WorldMark 4380 as a desk-side system. The unit is slightly larger than a two-drawer file cabinet, at 27.5" x 18" x 29.5". It requires only a standard 110V outlet. A typical office can easily accommodate the system.

WorldMark 4380's internal components are readily accessible. You can remove both sides of the cabinet to reveal the quad-processor system boards, each with a system bus, memory, PCI, and EISA slots. The primary system board includes a 1MB super VGA controller and two Adaptec Ultra SCSI channels. The secondary system board has a third Adaptec Ultra SCSI channel.

The system is expandable. It has 14 PCI slots, three EISA slots, and a shared PCI and EISA slot; twelve 3.5" hot-swappable drive bays (using the 80-pin SCA connector); and four half-height removable media drive bays (in addition to the 3.5" drive). The processor has a standard interrupt controller that supports 16 interrupts, plus two hardware interrupt controllers, for a total of 48 available interrupts. The two system processor boards support 4GB of RAM in 256MB DIMMs. The system I tested in the Windows NT Magazine Lab had eight 200MHz Intel Pentium Pro processors with 1MB cache, 4GB of Error-Correcting Code (ECC) RAM, a 3.5" drive, a CD-ROM drive, an Exabyte Eliant 820 8mm tape drive, an SMC EtherPower 10/100 dual-channel Ethernet card, and a Mylex DAC960 RAID Disk Array Controller with six Seagate Cheetah low-profile 4GB Ultra SCSI hard disks.

NCR's OctaSCALE (its Non-Uniform Memory Access--NUMA--design) is a dual system board architecture. The memory controller on each system board includes NCR's Intelligent Locality Management System (ILMS), which arbitrates memory access for local CPUs and initiates memory access on the other system board when necessary. A CPU accesses memory on the far system board more slowly than on its own system board (30 or more clock cycles vs. 10 clock cycles, per OctaSCALE's white papers). The OctaSCALE architecture's dual system buses help you run multiple applications. If you use NCR's SMP Utilization Manager (included with the WorldMark 4380), you can process an application on one system bus and keep other applications on the other system bus.
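The cost of crossing between system boards depends on how often it happens. As a back-of-the-envelope illustration, the Python sketch below computes the average memory latency for a given share of local accesses, using the 10-cycle and 30-cycle figures above. Treating 30 cycles as a fixed remote cost is an assumption (the white papers say "30 or more"), and the function name is hypothetical.

```python
LOCAL_CYCLES = 10    # local-board access, per the figures quoted above
REMOTE_CYCLES = 30   # far-board access ("30 or more"), used here as a floor


def average_latency(local_fraction: float) -> float:
    """Expected cycles per memory access for a given share of local accesses."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * LOCAL_CYCLES + (1.0 - local_fraction) * REMOTE_CYCLES


for fraction in (1.0, 0.9, 0.5):
    print(f"{fraction:.0%} local -> "
          f"{average_latency(fraction):.0f} cycles on average")
```

An application confined to one board stays near the local figure, whereas one application spread evenly across both boards averages about twice that. This arithmetic is consistent with positioning the system for multiple applications, each kept on its own system bus.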

WorldMark 4380
Contact: NCR * 937-445-5000 or 800-225-5627
Price: $109,050
System Configuration: Eight 200MHz Intel Pentium Pro processors, 4GB of RAM, Windows NT Server 4.0, Enterprise Edition, six Seagate Cheetah ST34501 4GB 10,000rpm hard disks, Mylex DAC960 Disk Array Controller, SMC EtherPower 10/100 dual-channel Ethernet adapter, SMC EtherPower II 10/100 single-channel Ethernet adapter, Toshiba 12X CD-ROM drive

Reliability, Availability, and Service
An enterprise-level server must be reliable and available. The WorldMark 4380 is both. You can configure it with two or three hot-swappable 625-watt power supplies. (The review system had two.) The primary system board has sensors for temperature, voltage, and fan failure. NCR's server management software supports these sensors. The system board supports the chassis intrusion detection switches on the side panels and on the retainer bracket that protects the hot-swappable drives. A separate Server Management Board monitors the server's condition and uses a modified version of LANDesk Server Manager to support dial-out notification of critical events and dial-up control of server operation.

NCR provides a standard service agreement: onsite service during business hours for 1 year. You can purchase expedited and 24-hour, 7-day-a-week service plans.

ValuePlus CD-ROM
Hardware is only as good as the applications it runs. NCR's ValuePlus CD-ROM includes the following utility applications to help you get the most from the WorldMark 4380.

PowerMon II. PowerMon II monitors the status of a UPS attached via a serial port. It logs power events and can initiate a system shutdown if it detects a power failure.

Server Manager. Server Manager supports NCR's entire 4300 family of servers. The Server Manager Console runs on any Windows NT system and accesses the Server Manager software on network or dial-up servers. Servers can belong to a logical cluster, which lets you use one icon to monitor a group of servers. Server Manager includes several agents to support specific hardware components, such as Intel LAN adapters, Adaptec SCSI controllers, Mylex RAID controllers, and APC UPS systems. You can configure Server Manager to send alerts to a Simple Network Management Protocol (SNMP) management console or via email or pager.

Server Manager/Remote. Server Manager/Remote includes server and client modules. You can use Server Manager/Remote to remotely configure and troubleshoot your system.

SMP Utilization Manager. If you execute tasks on multiple processors, you generate extra overhead from switching between processors and reloading the cache. If you execute related tasks on one processor, you eliminate this overhead and thus improve the SMP system's efficiency. By default, NT lets processes and threads run on any available processor. SMP Utilization Manager lets you select which processors are available to run specific processes, threads, or interrupts. NCR recommends that you assign one processor to service interrupts and the driver for LAN cards and SCSI adapters. NCR further recommends that you assign specific groups of processors to major applications running on one server, such as SQL Server and Exchange.

WAN Links for Windows NT. You can use WAN Links for Windows NT with a supported Digi adapter to support Routing and Remote Access Service (RRAS) client connections over frame-relay and X.25 networks.

NCR Enterprise Pack
NCR offers the Enterprise Pack for an additional cost. This package bundles the following support software.

LifeKeeper 2.0. LifeKeeper 2.0 is NCR's server clustering solution. (For more information about this software, see Jonathan L. Cragle, "Clustering Software for Your Network," July 1998.)

Master Minder. Master Minder is an automated systems management package. This software monitors event logs and performs predefined actions in response to specific events.

NCR provides several useful manuals. The Server Software Guide is a thorough hardware configuration reference that describes the system BIOS, the Diagnostic Partition, Adaptec SCSI BIOS, and Mylex array configuration. Other documentation includes Optimizing Windows NT on NCR Servers, Deskside Hardware Installation Guide, Installing Windows NT Server, and Installing UNIX MP-RAS.

I was impressed with Optimizing Windows NT on NCR Servers. This highly technical reference includes information about tuning the network transport and core NT components. It also covers application tuning for SQL Server, Exchange Server, Lotus Notes, and SAP R/3. Finally, the text explains how to use SMP Utilization Manager to allocate system processors to applications and processes.

NCR provides a system site log notebook with each system. This convenient reference is reminiscent of legacy systems' field engineering logs. It contains hardware and software configuration information, problem and change history logs, and other valuable information for your operations support staff.

System Performance
To evaluate the WorldMark 4380's performance, I ran the AIM Technology Domain Server Mix tests in 4-way, 6-way, and 8-way configurations. (For more information about AIM Technology's tests, see "AIM Technology Server Benchmark Test.")

The WorldMark 4380 had a WNT Peak Performance of 3483.8 and a WNT Sustained Performance of 3262.7 in the 4-way configuration. With six processors, performance increased to 3932.1 Peak Performance (up 13 percent) and 3424.5 Sustained Performance (up 5 percent). With eight processors, Peak Performance was 3904.4 and Sustained Performance was 3631.4. Peak Performance was only 12 percent higher in the 8-way configuration than in the 4-way configuration. Sustained Performance was only 11 percent higher. These results are nothing to write home about.
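For context, you can set these sustained scores beside the ones reported in the AViiON 8600 review. The Python sketch below finds the leading system at each processor count. Keep in mind that the two test units had different disk configurations, so treat the comparison as indicative only; the function name is illustrative.

```python
# AIM WNT Sustained Performance scores from the two reviews, by processor count.
SUSTAINED = {
    "AViiON 8600": {4: 2932.9, 6: 3584.6, 8: 3850.0},
    "WorldMark 4380": {4: 3262.7, 6: 3424.5, 8: 3631.4},
}


def leader_by_cpu_count(results):
    """Return {cpus: (leading system, lead over the other system in percent)}."""
    (name_a, a), (name_b, b) = results.items()
    leaders = {}
    for cpus in sorted(a):
        if a[cpus] >= b[cpus]:
            leaders[cpus] = (name_a, (a[cpus] / b[cpus] - 1) * 100)
        else:
            leaders[cpus] = (name_b, (b[cpus] / a[cpus] - 1) * 100)
    return leaders


leaders = leader_by_cpu_count(SUSTAINED)
for cpus, (name, lead) in leaders.items():
    print(f"{cpus}-way: {name} leads by {lead:.1f}%")
```

On these figures, the WorldMark 4380 starts ahead with four processors enabled, but it gains less from each additional pair and trails at six and eight.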

System performance and throughput depend on the workload you process. My tests did not simulate multiple-application (e.g., SQL Server and Exchange Server) workloads. NCR is positioning the WorldMark 4380 as a multiple-application system. For example, SMP Utilization Manager is most valuable for running multiple applications. WorldMark 4380 is a solid system that provides good performance in a small package that you can easily expand. This 8-way system is beneficial in certain situations, but many IS managers will want to keep looking, or stick with their 4-way systems.