Is an 8-way server worth the premium price?

When your organization outgrows its server, you might need to make a choice: Implement a network of smaller servers or upgrade to a system that can handle your entire workload. An 8-way server might be able to handle your applications' crucial requirements, but when your chief financial officer (CFO) sees the price, you'll need to justify the expense. The Windows 2000 Magazine Lab tested Compaq's ProLiant 8000 to help you decide whether the system is worth the expense. The system lives up to its promises of improved power, memory, and storage, and I'll hate to send the server back.

Hardware Design Features
The basic ProLiant 8000 system has four processors, 2GB of Synchronous DRAM (SDRAM), a 4250ES RAID controller, and no OS. We tested a system that had eight 550MHz Pentium III Xeon processors with a 2MB Level 2 cache, 4GB of 100MHz SDRAM, twenty-one 9GB 10,000rpm-class disk drives, and a Compaq 4250ES RAID controller.

I remember the refrigerator-sized Pentium Pro processor-based 8-way servers, so I was pleasantly surprised to receive a system that is 24" x 17.5" x 24.5", rack mountable (14U form factor), and on casters. Three disk-storage bays are in the front of the cabinet. Each bay holds seven 1" hot-swappable disk drives. The combination CD-ROM and 3.5" drive is near the bottom of the system, so you can reach the drive when you mount the system at the top of a rack. The system includes two other front-accessible, half-height non-hot-swappable device bays. An LCD status panel—the Integrated Management Display—shows the system's power-on state and displays hardware alerts that the system firmware generates. The power status light is green when the system's power is on, and yellow when the power supplies are plugged in and the system detects a hardware fault. During our tests, the power status light turned yellow when a processor card wasn't properly inserted into its slot.

The system components you use most frequently are easy to access. Three lockable hinged doors in the top of the cabinet provide access to two hot-swappable cooling fans and the system's 11 PCI slots. Slot 1 is the only 32-bit PCI slot in the system; the other slots are 64-bit PCI slots. All the slots are 33MHz except slots 10 and 11, which are 66MHz slots. In addition, slots 10 and 11 have an extended SCSI connector to support Compaq's cableless RAID controller and SCSI connections. All PCI slots are hot-swappable and include individual power switches that let you turn off power to a slot when you insert or replace a card while the system is running. Windows NT 4.0 lets you replace a failed card, and Windows 2000 (Win2K) also supports new card insertion and autodetection during system operation.

The system includes three bridged PCI buses. PCI slots 1 through 4 are assigned to the primary bus, slots 5 through 9 are assigned to the secondary bus, and slots 10 and 11 are assigned to the tertiary bus. The slots are all tool-less. A cleverly designed plastic retainer clip rotates into place to secure a card. When necessary, you can use standard screws to secure a PCI card in place. You can see three system interlock LEDs through the open PCI slot covers. You use the LEDs and troubleshooting information from the Compaq ProLiant 8000 Setup and Installation Guide to diagnose power-related component and system-board interconnect cabling failures in the system.
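If you plan card placement in a script, the slot layout described above fits in a few lines of Python. This is only a sketch: the function name and dictionary keys are my own inventions, and the peak figures are theoretical PCI maxima computed from nominal bus width and clock speed, not numbers the Lab measured.

```python
def describe_slot(slot: int) -> dict:
    """Return bus assignment, width, clock, and theoretical peak for a PCI slot."""
    if not 1 <= slot <= 11:
        raise ValueError("the ProLiant 8000 has PCI slots 1 through 11")

    # Slots 1-4 sit on the primary bus, 5-9 on the secondary, 10-11 on the tertiary.
    if slot <= 4:
        bus = "primary"
    elif slot <= 9:
        bus = "secondary"
    else:
        bus = "tertiary"

    width_bits = 32 if slot == 1 else 64      # slot 1 is the only 32-bit slot
    clock_mhz = 66 if slot >= 10 else 33      # only slots 10 and 11 run at 66MHz

    # Theoretical peak = bytes per transfer x nominal clock (an upper bound only).
    peak_mbytes_per_sec = width_bits // 8 * clock_mhz

    return {
        "slot": slot,
        "bus": bus,
        "width_bits": width_bits,
        "clock_mhz": clock_mhz,
        "peak_mbytes_per_sec": peak_mbytes_per_sec,
    }


for slot in (1, 2, 10):
    print(describe_slot(slot))
```

By this arithmetic, the two 66MHz slots offer twice the theoretical bandwidth of the other 64-bit slots, which is why they are the natural home for the array controllers.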

Thumbscrews on the front of the system hold each of the side panels in place. The panel on the right side of the unit (viewed from the front) provides access to the CPU slots and the memory board. A plastic cover channels cooling airflow through the radiators affixed to each CPU. A separate airflow path cools each group of four processors, and a sealed liquid cooling system carries heat to 24 cooling fins on the end of each processor. The cooling system's design creates a longer, thinner CPU package than those that have the cooling fan mounted directly on the side of the processor. Two system cache accelerators (Intel calls them cache coherency filters) sit between the two groups of four processors along with the Profusion chipset. More than one processor can have a copy of the same memory address in its Level 2 cache. The cache accelerator for each CPU bus retains a list of the memory addresses in each CPU's Level 2 cache and indicates whether several caches share the data and whether the data has been modified. Because of these lists, CPUs snoop the processor caches on the remote processor bus less frequently, which reduces processor bus utilization and improves system performance.
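A toy model makes the benefit of those lists easier to see. The Python sketch below is my own illustration, not Intel's or Compaq's implementation: the class, method names, and addresses are all hypothetical, and real hardware tracks cache-line tags and coherency states in silicon. The idea is simply that a request from one bus needs a remote snoop only when the other bus's list says some CPU there holds the address.

```python
class CoherencyFilter:
    """Toy directory of the cache lines held by the CPUs on one processor bus."""

    def __init__(self):
        # address -> {"holders": set of CPU ids, "modified": bool}
        self.lines = {}

    def record_fill(self, cpu, address, modified=False):
        """Note that a CPU on this bus now caches the address."""
        entry = self.lines.setdefault(address, {"holders": set(), "modified": False})
        entry["holders"].add(cpu)
        entry["modified"] = entry["modified"] or modified

    def record_evict(self, cpu, address):
        """Note that a CPU on this bus dropped the address from its cache."""
        entry = self.lines.get(address)
        if entry is None:
            return
        entry["holders"].discard(cpu)
        if not entry["holders"]:
            del self.lines[address]

    def is_shared(self, address):
        """True when more than one CPU on this bus caches the address."""
        entry = self.lines.get(address)
        return entry is not None and len(entry["holders"]) > 1

    def must_snoop(self, address):
        """A remote request needs a snoop only if some CPU here holds the line."""
        return address in self.lines


def remote_snoops(requests, remote_filter):
    """Count the requests that still require a snoop on the remote bus."""
    return sum(1 for address in requests if remote_filter.must_snoop(address))


# Hypothetical state: CPUs 4-7 live on bus B.
bus_b = CoherencyFilter()
bus_b.record_fill(cpu=4, address=0x1000)
bus_b.record_fill(cpu=5, address=0x1000)
bus_b.record_fill(cpu=6, address=0x2000, modified=True)

requests_from_bus_a = [0x1000, 0x3000, 0x4000, 0x2000, 0x5000]
needed = remote_snoops(requests_from_bus_a, bus_b)
print(f"{needed} of {len(requests_from_bus_a)} requests need a remote snoop")
```

Without the filter, every one of the five requests in this example would generate traffic on the remote bus; with it, only the two addresses that bus B actually caches do.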

The system memory board is located just below the CPUs. The board includes 16 DIMM sockets, paired to create eight banks. The board can support a maximum of 8GB of 100MHz SDRAM, and Compaq promises future support for 16GB. You can loosen a thumbscrew to easily remove the memory board's retaining bracket.

You access the two non-hot-swappable device bays through the right side panel. You use the preinstalled SCSI cable with a separate SCSI controller to support SCSI devices installed in the device bays. If you use one of the embedded SCSI channels for non-hot-swappable devices, you can't use the same SCSI channel for the corresponding hot-swappable storage bay. You don't need to remove the left side panel; it conceals only internal cables and power supply bays. The two side panels have identical diagrams detailing processor and memory slot usage.

The back of the server provides access to the three redundant power supply slots and a pair of redundant rear-mounted processor cooling fans. Connectors for the embedded VGA controller and the serial, printer, mouse, and keyboard ports are also on the rear. Cutouts for externally accessible SCSI connectors are located between PCI slots 4 and 5.

Several features support system availability. A new feature of the ProLiant 8000 is Auto-Processor Bus Recovery. The system's Profusion architecture places two sets of four processors on separate system buses. If one system bus fails, system operation can continue using only the remaining operational system bus processors. (For a brief description of Profusion architecture, see the sidebar "Profusion Raises SMP to New Levels." For a detailed description, see Tao Zhou, "Profusion Architecture," November 1999.) Redundant load-sharing power supplies, redundant hot-swappable fans, and hot-swappable disk drives have become standard features of high-end servers, and Compaq has added several other redundant components. CPU power supplies and remote-flashable system ROMs are standard. Also standard is a dual-port 100Mbps Ethernet NIC with support for failover redundancy, which is upgradeable to Gigabit Ethernet with a daughterboard. Even if the primary array controller fails, support for redundant array controllers lets you continue to access RAID-based data.

Every ProLiant 8000 includes the Compaq Smart Array controller model 4250ES. The ES in the model number indicates that the RAID controller has the Extended SCSI connector for cableless installation. The controller has no connectors that would let you use standard SCSI cables to attach disk drives. Three internal cables connect each of the three storage bays to the system I/O board instead of connecting directly to the RAID controller. The extended SCSI connectors in slots 10 and 11 carry the extra signals so you can install hard disks and RAID controllers without using any cables. A battery-backed 64MB cache is standard on the controller, and the battery and cache are on a daughterboard. You can move the daughterboard to a replacement Compaq 4250ES if the controller fails, and the system will complete write operations that were in process when the system failed.

Error-Correcting Code (ECC) cache protects your data from a complete memory-chip failure. The data format that the controller writes to disk drives is compatible with the previous- and next-generation Compaq array controllers and lets you upgrade the array controller without backing up and restoring data on the array. Two onboard processors enhance RAID performance. The array controller checks for data integrity on disk drives in the background and remaps bad sectors to enhance system availability. The controller supports RAID levels 0, 1, 4, 5, and 0+1 (mirrored stripe sets) and array capacity expansion during system operation. Three 80MBps Ultra2 SCSI channels support up to 240MBps peak throughput.
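The RAID level you choose determines how much of the raw disk space you can actually use. The following Python sketch applies the standard capacity rules for the levels the controller supports. The function name is hypothetical, and the example assumes one array of seven 9GB drives per storage bay, which is my assumption for illustration rather than a configuration the Lab tested.

```python
def usable_capacity_gb(level: str, drives: int, drive_gb: float) -> float:
    """Usable capacity of one array under the standard RAID capacity rules."""
    if level == "0":
        # Striping only: every drive holds data.
        if drives < 1:
            raise ValueError("RAID 0 needs at least one drive")
        return drives * drive_gb
    if level in ("1", "0+1"):
        # Mirroring: half the drives hold copies.
        if drives < 2 or drives % 2:
            raise ValueError("mirrored arrays need an even number of drives")
        return drives * drive_gb / 2
    if level in ("4", "5"):
        # Parity: one drive's worth of space is given up to parity data.
        if drives < 3:
            raise ValueError("RAID 4 and RAID 5 need at least three drives")
        return (drives - 1) * drive_gb
    raise ValueError(f"unsupported RAID level: {level}")


# Assumed layout: one array per seven-drive bay, 9GB drives.
print("RAID 0, 7 drives:  ", usable_capacity_gb("0", 7, 9), "GB")
print("RAID 5, 7 drives:  ", usable_capacity_gb("5", 7, 9), "GB")
print("RAID 0+1, 6 drives:", usable_capacity_gb("0+1", 6, 9), "GB")
```

In this hypothetical layout, RAID 5 gives up one drive per bay to parity, whereas RAID 0+1 gives up half the drives and leaves the seventh slot free for a spare.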

Systems Management
System components, including the array controller, detect errors and report them to the server's Integrated Management Display and Compaq Insight Manager, a systems management software package that ships with each server. Self-Monitoring, Analysis, and Reporting Technology (SMART) monitors disk drive status and reports failing components. The Compaq Remote Insight Board offers out-of-band remote management, but this tool isn't a standard feature. When system firmware reports prefailure symptoms, the ProLiant 8000's warranty covers prefailure disk drive, processor, and memory replacement. Service and support packages are available in terms covering a standard business week or a 24 x 7 week, with a 3-year parts and labor warranty offering "best effort" onsite service. The warranty can be upgraded to 3 years of standard business week coverage with a 4-hour response time for $3300, or 24 x 7 coverage and 4-hour response time for $4950. Priority NT support is also available.

System Performance
Because the ProLiant 8000 is the first Profusion architecture 8-way system the Lab has tested, we enthusiastically delved into the testing process. We looked for tests that would demonstrate the CPU scalability of the Profusion architecture under NT 4.0. For details of our testing methodology, including the performance tuning changes that we completed before testing, see "ProLiant 8000 CPU Scalability Testing Procedures," InstantDoc ID 7975.

We ran two benchmark tests, one with a Microsoft SQL Server 7.0 Enterprise Edition workload, and the other with a Microsoft Internet Information Server (IIS) 4.0 workload. Our goal for each test was to maximize CPU utilization and measure application throughput for all eight processors. We also looked for performance differences between the Profusion architecture and Intel's 450NX PCIset chipset.

SQL Server 7.0 Testing
For this test, Client/Server Solutions, working with Microsoft's SQL Server group, used Microsoft's Transaction Processing Performance Council (TPC)-C Data Generator to create an online transaction processing (OLTP) workload with a small (approximately 500MB) database. However, the test we ran isn't a TPC-C benchmark. We ran the same set of tests on two similarly configured ProLiant servers: a ProLiant 8000 with eight 550MHz Pentium III Xeon processors with a 2MB Level 2 cache, and a Compaq ProLiant 7000 with four 500MHz Pentium III Xeon processors with a 2MB Level 2 cache. For a detailed description of our testing and tuning procedures, see "ProLiant 8000 CPU Scalability Testing Procedures." Because the two systems' processing power and disk configuration were different, the throughput results aren't directly comparable. Our purpose in testing both servers was to discover whether scalability improved with the 8-way system.

Figures 1 and 2, page 121, show the throughput results for the ProLiant 8000 and ProLiant 7000, respectively. Figure 3 shows CPU scalability, relative to single-processor performance, at each processor level in the test. Figure 3 reveals that with four processors, performance scalability of the Profusion architecture and the 450NX PCIset architecture is nearly identical. Measured in transactions per second (tps), four-processor throughput was 3.4 times that of a one-processor system. However, eight-processor scalability measured 5.1, which means that the eight-processor system throughput was just over five times the throughput of the single-processor system. In other words, the eight-processor configuration yielded 50.5 percent more throughput than the 4-way configuration. Other discussions of eight-processor scalability results running different OLTP workloads are also available. You can read Compaq's test results at papers/ecg0580399.pdf.
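To see how the scaling factors relate to the percentage gain, you can divide one by the other. The short Python sketch below uses a function name of my own choosing and the rounded factors quoted above, so it lands at roughly 50 percent; the 50.5 percent figure presumably comes from the unrounded measurements.

```python
def gain_over(baseline_scaling: float, new_scaling: float) -> float:
    """Fractional throughput gain of one configuration over another.

    Both arguments are scaling factors relative to single-processor throughput.
    """
    return new_scaling / baseline_scaling - 1.0


four_way = 3.4   # 4-way throughput relative to one processor
eight_way = 5.1  # 8-way throughput relative to one processor

print(f"8-way gain over 4-way: {gain_over(four_way, eight_way):.1%}")
```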

A previous generation of eight-processor servers was based on Intel's 200MHz Pentium Pro processor. As John Enck reported in "8-way Scalability," September 1998, Microsoft's comparison of 4-way and 8-way throughput with SQL Server 6.5- and 200MHz Pentium Pro-based 8-way systems showed improved throughput of 34 percent in TPC-C results. The Lab's tests with SQL Server 6.5 on 200MHz Pentium Pro-based 8-way systems showed minimal gains in throughput beyond four processors.

Web Server Testing
We ran a second test series based on a simple Active Server Pages (ASP) workload. The transaction randomly generated 1000 characters of output to the client screen. To achieve full CPU utilization, we tuned IIS for the workload our test specified. Thus, our test didn't simulate the real world—the workload was too simple. Figures 4 and 5 show the throughput results for the ProLiant 8000 and ProLiant 7000, respectively. Figure 6 shows the CPU scalability, relative to single-processor performance, at each processor level in the test. Figure 6 shows that the performance scalability of the Profusion architecture and the 450NX PCIset architecture is virtually identical for systems using four processors, just as it was in the SQL Server tests. However, in the IIS and ASP test, the scalability was nearly linear. Four-processor scalability was more than 3.9 times the throughput of a single-processor configuration for the ProLiant 8000 and ProLiant 7000, and eight-processor scalability for the ProLiant 8000 was 7.55 times the single-processor throughput. Compared to 4-way throughput, the eight-processor system improved performance by 92 percent.
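One way to compare the two workloads is parallel efficiency: the scaling factor divided by the processor count, where 1.0 is perfectly linear scaling. The sketch below applies that to the figures reported for both test series. The function name is hypothetical, and because the 4-way IIS figure was reported as "more than 3.9," the value shown for it is a lower bound.

```python
def parallel_efficiency(scaling: float, processors: int) -> float:
    """Share of ideal linear scaling actually achieved (1.0 is perfect)."""
    return scaling / processors


# Scaling factors relative to single-processor throughput, as reported.
results = {
    ("SQL Server 7.0", 4): 3.4,
    ("SQL Server 7.0", 8): 5.1,
    ("IIS 4.0 with ASP", 4): 3.9,   # reported as "more than 3.9"
    ("IIS 4.0 with ASP", 8): 7.55,
}

for (workload, cpus), scaling in results.items():
    efficiency = parallel_efficiency(scaling, cpus)
    print(f"{workload:17} {cpus}-way: {efficiency:.0%} of linear")
```

Seen this way, the simple ASP workload holds on to most of its efficiency when it moves from four to eight processors, whereas the OLTP workload loses a good deal more.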

Is the 8-Way System for You?
Profusion-based systems are more costly than 4-way systems. Compaq estimates the price of the ProLiant 7000 we tested at $64,742. A similarly configured 8-way ProLiant with 550MHz Xeon processors, each with a 2MB Level 2 cache, and 18 hard disks is $100,458. (The ProLiant 8000 we tested had 21 disks.) The same configuration in a ProLiant 8000 with only four processors is $75,914, or about 17 percent more than the similarly configured ProLiant 7000.

In our SQL Server 7.0 tests, eight-processor performance was 50.5 percent better than in the same four-processor configuration. Depending on the application workload, many companies will reason that the improved performance makes the high purchase price worthwhile. If a large monolithic application is straining your 4-way system, the ability to support 50 percent more throughput without additional administrative overhead is worth a lot. However, if you have multiple SQL Server-based applications straining a 4-way system, you could distribute the applications to two 4-way systems. You'd probably obtain greater aggregate throughput and lower your costs if you used two 4-way systems instead of one 8-way system.
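A rough way to frame the tradeoff is to set the price premium against the throughput gain. The Python sketch below uses the two ProLiant 8000 prices quoted earlier and our SQL Server result. The function and constant names are my own, and the calculation covers hardware purchase price only, ignoring software licensing and administrative costs.

```python
PRICE_4WAY = 75_914    # ProLiant 8000 configured with four processors
PRICE_8WAY = 100_458   # similarly configured 8-way system
THROUGHPUT_GAIN = 0.505  # 8-way over 4-way in our SQL Server 7.0 tests


def relative_cost_per_throughput(price_base, price_new, throughput_gain):
    """Cost per unit of throughput for the new system, relative to the base.

    A result below 1.0 means each unit of throughput costs less on the new system.
    """
    return (price_new / price_base) / (1.0 + throughput_gain)


price_premium = PRICE_8WAY / PRICE_4WAY - 1.0
ratio = relative_cost_per_throughput(PRICE_4WAY, PRICE_8WAY, THROUGHPUT_GAIN)

print(f"price premium for eight processors: {price_premium:.1%}")
print(f"cost per unit of throughput vs. 4-way: {ratio:.2f}")
```

On these numbers, the 8-way configuration costs roughly a third more and delivers about half again as much SQL Server throughput, so the hardware cost per transaction falls. That holds only for a workload that scales the way our test workload did.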

The performance of Wintel-based eight-processor systems is improving but still needs work. More performance improvements are ahead in Win2K and the next release of SQL Server. Windows 2000 Server (Win2K Server) offers reduced lock contention and improved partitioning with SMP systems; support for more physical memory (a boon for database applications); and support for offloading some fundamental packet-oriented network functions to the NIC, which will reduce the CPU resources that network communications consume when you use NICs that support these functions. Although SQL Server 7.0 can use a maximum of 4GB of RAM (3GB for SQL Server applications), the next version of SQL Server will support Win2K's larger physical memory, which lets larger databases fit into server memory to improve performance.

In October and November 1999, Hewlett-Packard (HP), Unisys, and Compaq, with Microsoft's support, released TPC-H workload results with 8-way servers running prerelease versions of Windows 2000 Advanced Server (Win2K AS) and SQL Server 7.5. The TPC publishes the results. TPC-H testing simulates a decision support workload of ad hoc queries to a database. Although the database size varies with different TPC-H tests, the October and November 1999 tests used a 100GB database. Throughput results ranged from 1125.4 to 1253.3 queries per hour, with price/performance ratios ranging from $241 to $288 per query per hour. In this group, the ProLiant 8000 posted the greatest throughput and lowest per-query cost, beating the HP and Unisys systems.

To create a price comparison, I used Dell's Web-based server configurator to price 4-way and 8-way servers for a configuration including 550MHz 2MB Level 2 cache processors, 4GB of RAM, and eighteen 9GB 10,000rpm disk drives with NT Server 4.0 (NT Server, Enterprise Edition on the 8-way). The Dell PowerEdge 6350 4-way came in at $55,883, whereas the Dell PowerEdge 8450 8-way was $90,882.
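Putting the two vendors' quotes side by side shows what each charges to step up from four processors to eight. The sketch below is illustrative only: the function name is hypothetical, and the configurations aren't identical (the Dell quotes include the OS, and the ProLiant 7000 and ProLiant 8000 differ in more than processor count), so treat the percentages as rough.

```python
def premium(four_way_price: float, eight_way_price: float) -> float:
    """Fractional price increase when moving from the 4-way to the 8-way quote."""
    return eight_way_price / four_way_price - 1.0


# (4-way price, 8-way price) as quoted in this review.
quotes = {
    "Compaq ProLiant 7000 vs. ProLiant 8000": (64_742, 100_458),
    "Dell PowerEdge 6350 vs. PowerEdge 8450": (55_883, 90_882),
}

for label, (four_way, eight_way) in quotes.items():
    extra = eight_way - four_way
    print(f"{label}: ${extra:,} more ({premium(four_way, eight_way):.1%})")
```

The absolute step-up is similar for both vendors, about $35,000, although Dell's lower starting price makes its percentage premium larger.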

The ProLiant 8000 is a premium system with many advanced features to support the high availability and serviceability that crucial applications need. This system will appeal to organizations that don't want the additional administrative costs of multiple small systems. Such organizations might be willing to pay the higher up-front price of an 8-way system to get the greatest possible processing power. Because of this system's reported processing power and the cost per unit of application throughput, some organizations can consider replacing traditional legacy-system-based applications with the ProLiant 8000.

ProLiant 8000
Contact: Compaq * 800-345-1518
Price: Starts at $58,176; $102,297 as tested
Decision Summary:
Pros: Includes large internal storage capacity (21 hot-swappable drive slots); can be configured with redundant cableless RAID controllers with failover capability; engineered for speedy hardware problem determination with fault-indicating LEDs; provides CPU failure recovery (operation continues with CPUs on alternative bus)
Cons: High price—best fit for large database or data warehouse applications