Periodically, you'll hear a die-hard anti-Windows NT type claim that NT isn't scalable. Apparently, the National Center for Supercomputing Applications (NCSA) hasn't heard this message: NCSA at the University of Illinois at Urbana-Champaign built a 192-processor cluster of commercially available Intel-based HP and Compaq workstations running unmodified NT Server 4.0 Service Pack 3 (SP3). This NT cluster will join other supercomputers to support supercomputing applications for NCSA and the National Science Foundation (NSF), which also funds the National Computational Science Alliance.
NCSA is the lead institution in the Alliance, which is composed of researchers from more than 50 institutions. The Alliance already has other supercomputers, such as Silicon Graphics' CRAY Origin2000. However, as Robert Pennington, team leader of NCSA's NT Cluster Group, explained, "We saw the coming importance of NT and Intel, and we needed to give users a choice." Although most scientific applications that solve complex mathematical and scientific problems are UNIX-based, some users have come to NCSA with NT-based codes that NCSA has needed to support. The purpose of the NCSA's NT cluster is to support existing UNIX applications from other supercomputers and to demonstrate that NT and Intel technology are viable for high-performance computing. Currently, about 100 researchers and programmers run applications dealing with astrophysics, environmental hydrology, and numerical relativity on the NCSA NT cluster, but Pennington said he expects this number to grow rapidly as the system joins other supercomputers in the Alliance. "We expected to see more emphasis on NT from technical and scientific users—and this is what is beginning to happen."
Building with Basics
The NCSA cluster's key feature is its commercially available, unmodified hardware and software components. The computers in the cluster (64 HP Kayak XU PC Workstations with dual 300MHz Intel Pentium II processors and 32 Compaq Professional Workstation 6000s with dual 333MHz Intel Pentium II processors) run NT Server 4.0 with SP3. As Figure 1 shows, these computers work together as one virtual server with 192 processors and 50GB of RAM. Myricom's Myrinet high-speed System Area Network (SAN) provides an 80MBps hardware connection between the computers in the cluster, and a 100Mbps Fast Ethernet connection links the cluster to the storage area.
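To check how the 192-processor figure follows from the hardware list, here is a minimal Python sketch that totals the workstations and processors. The node counts and processors per node come from the description above; the data layout and function names are illustrative assumptions, not part of NCSA's tooling.

```python
# Workstation groups as described in the article. The dictionary layout and
# the function names below are illustrative only.
NODE_GROUPS = [
    {"model": "HP Kayak XU PC Workstation", "nodes": 64, "cpus_per_node": 2},
    {"model": "Compaq Professional Workstation 6000", "nodes": 32, "cpus_per_node": 2},
]


def total_nodes(groups):
    """Return the number of workstations across all groups."""
    return sum(group["nodes"] for group in groups)


def total_cpus(groups):
    """Return the number of processors across all groups."""
    return sum(group["nodes"] * group["cpus_per_node"] for group in groups)


if __name__ == "__main__":
    print("workstations:", total_nodes(NODE_GROUPS))
    print("processors:", total_cpus(NODE_GROUPS))
```

The two totals, 96 workstations and 192 processors, are the figures the rest of the article relies on.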
The only component of the NT supercluster that isn't made of off-the-shelf products is the High Performance Virtual Machines (HPVM) set. Andrew Chien, Science Applications International Corporation chair professor in the Department of Computer Science and Engineering and leader of the Concurrent Systems Architecture Group at the University of California at San Diego, developed the HPVM set. Chien designed the HPVM set of software tools to unite groups of ordinary desktop computers into one high-performance environment.
Working Out the Bugs
Although most of the network components are off-the-shelf products, building the cluster to run the applications presented some problems. Commenting on obstacles to the project's success, Pennington said, "We had a sizeable list [of problems] that we worked with Dr. Andrew Chien to resolve." (For more of Pennington's comments on the cluster project, see "An Interview with Robert Pennington.") NCSA had to determine how applications designed to run on 4, 8, or 32 processors would scale to 192 processors. Native UNIX applications required migration to the NT platform. NCSA also needed to provide users access to the cluster, ensure adequate application storage space for users, and make sure the servers had adequate processor power for the demanding applications. Finally, NCSA had concerns about upgrading 96 computers in the future.
Running Applications on the NT Cluster
Preparing applications for the NT supercluster required a combination of tweaking and migration to make them run. As Pennington expected, some applications required redesigning to provide better scalability. Migrating the applications from UNIX to NT was a challenge because the system calls that UNIX uses don't always exist in NT, and UNIX sockets and NT sockets have some differences. To resolve these problems, NCSA applied Cygnus Solutions' Cygwin toolkit, composed of GNU development tools and utilities for 32-bit Windows, to port the UNIX applications to NT.
Applications that run on a supercluster are apt to be demanding, to put it mildly. To reduce competition for network and other resources, NCSA used two components of Platform Computing's Load Sharing Facility (LSF) Suite integrated with the HPVM. LSF's batch capabilities make the cluster's 192 processors accessible to users. LSF's scheduling capabilities assign one or two user processes per dual-processor box, which limits contention for bandwidth and processing cycles and helps ensure that the applications have the memory and processor time they need.
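The placement rule described here (one or two user processes per dual-processor box) can be illustrated with a short Python sketch. This is not LSF's API or its actual scheduling algorithm; the function name, its parameters, and the simple fill-in-order policy are assumptions made purely for illustration.

```python
def place_processes(num_processes, num_nodes=96, max_per_node=2):
    """Assign process ranks to nodes, at most max_per_node ranks per node.

    Hypothetical helper: it fills nodes in order and rejects requests that
    would put more than one process on any single processor.
    """
    if max_per_node not in (1, 2):
        raise ValueError("a dual-processor box takes one or two processes")
    if num_processes < 0:
        raise ValueError("num_processes must not be negative")
    if num_processes > num_nodes * max_per_node:
        raise ValueError("request exceeds the cluster's capacity")

    placement = {node: [] for node in range(num_nodes)}
    for rank in range(num_processes):
        # Integer division packs max_per_node consecutive ranks onto a node.
        placement[rank // max_per_node].append(rank)
    return placement


if __name__ == "__main__":
    # Four processes on a three-node toy cluster, two per node.
    print(place_processes(4, num_nodes=3, max_per_node=2))
```

Setting `max_per_node` to 1 in this sketch gives each process a whole workstation, which trades half the processors for less contention over memory and the network interface.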
Providing User Access
All the computing power in the world is useless if you can't get to it, so the NT Cluster Group implemented a server running NT Server 4.0, Terminal Server Edition and Citrix MetaFrame 1.0 to provide outside researchers access to the cluster. MetaFrame includes a UNIX client, so outside users—more likely to use UNIX—don't need NT on their client computers. (A second server that NCSA reserves for internal access to the cluster runs only Terminal Server.) Most users access the supercluster from the Internet. The Terminal Server session displays the NT desktop and uses the server's computing power to run applications.
Solution Summary
The National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign has developed a 192-processor cluster of HP Kayak XU PC Workstations and Compaq Professional Workstation 6000s running unmodified Windows NT Server 4.0 with Service Pack 3 (SP3).
Myricom's Myrinet System Area Network (SAN) connects the computers in the cluster. The cluster runs a few native NT applications and many UNIX applications, so the NT Cluster Group had to port the UNIX applications to NT or support them from NT. Cygnus Solutions' Cygwin toolkit was useful for porting the UNIX applications to NT. Because users typically connect to the cluster via one of two servers running NT Server 4.0, Terminal Server Edition and Citrix MetaFrame, the clients don't require NT support. Whether a user runs UNIX or NT on the client computer, the Terminal Server session displays the NT desktop and supports all application development.
This cluster will join other supercomputers to support supercomputing applications for NCSA and the National Science Foundation (NSF). NCSA has produced a cost-effective supercomputer that can do the job of a multimillion-dollar machine for about $200,000.
Providing an NT-based high-performance file system comparable to that available in NCSA's non-NT supercomputers is quite a trick. The cluster uses each computer's hard disk space only for storing temporary files and the OS, not for file or application storage. A couple of file servers (a 128GB disk array and 9GB SCSI disks on an NT server using disk striping without parity to reduce read times) provide the storage space that connects to the cluster via 100Mbps Fast Ethernet. Inside the cluster, a Myrinet SAN connects the computers, using 32-bit PCI NICs and SAN cables to provide a physical backbone with a bandwidth of just less than 80MBps. Therefore, the computers within the cluster can communicate with one another much more quickly than they can communicate with the data storage area. Pennington said, "We can use the standard Microsoft file-sharing capabilities within the cluster for a common namespace, but the storage performance using this [configuration] over Fast Ethernet is inadequate. We are still working on the high-performance storage issues."
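The gap between the two links is easier to see once both rates use the same unit. The Python sketch below converts the 100Mbps Fast Ethernet rate to megabytes per second and compares it with the roughly 80MBps Myrinet figure. It assumes peak rates with no protocol overhead and decimal megabytes, and the function and constant names are illustrative.

```python
def megabits_to_megabytes(rate_mbps):
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return rate_mbps / 8.0


# Peak rates from the article; real throughput would be lower on both links.
MYRINET_MB_PER_S = 80.0
FAST_ETHERNET_MB_PER_S = megabits_to_megabytes(100.0)


def transfer_seconds(size_mb, rate_mb_per_s):
    """Return the idealized time to move size_mb megabytes at the given rate."""
    return size_mb / rate_mb_per_s


if __name__ == "__main__":
    size_mb = 1000.0  # hypothetical 1000MB data set, chosen for illustration
    print("Fast Ethernet:", transfer_seconds(size_mb, FAST_ETHERNET_MB_PER_S), "s")
    print("Myrinet:", transfer_seconds(size_mb, MYRINET_MB_PER_S), "s")
```

Under these assumptions, Fast Ethernet tops out at 12.5MBps, so the storage link is more than six times slower than the interconnect inside the cluster.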
Hoping to alleviate the performance problem, the NT Cluster Group is investigating Fibre Channel. Fibre Channel might provide a backbone for a viable storage solution and a supplement to the Myrinet SAN that can bring the performance level of the storage area connection up to meet the applications' needs.
Upgrading Cluster Components
A long-term problem that the NT Cluster Group faces is that of physically upgrading all the computers in the cluster. Figuring that the useful life span of the computers in the cluster is 12 to 18 months before they become obsolete, the NT Cluster Group had to make sure that it could upgrade the computers easily. Successive generations of hardware don't always fit in the same physical space, so a solution to join the computers was necessary. The NT Cluster Group found that the rack-mount kit that Compaq offers for the NCSA cluster workstations provides more flexibility than using tower units on racks with cables connecting the workstations.
Bigger, Farther, Faster
NCSA is a research facility; thus, the NT supercluster remains a work in progress. Ongoing concerns include the slow I/O bandwidth to the storage systems and the problem of integration with the predominantly UNIX supercomputing environment at NCSA (for which the NT Cluster Group is testing the Windows NT Services for UNIX Add-on Pack).
A short-term goal is to expand the cluster to 512 processors by year 2000. A longer-term goal is to add the NT cluster to an Alliancewide computational grid interconnecting all supercomputers in the Alliance. (Most supercomputers in the research community are currently isolated at individual sites.) The grid will eliminate the significance of a computer's physical location.
NCSA plans to connect the NT Cluster Group's 192-processor cluster with Chien's HPVM 128-node (256-processor) NT cluster. Connecting these clusters presents an interesting problem, because light takes 20 milliseconds to travel between the two sites and, as Pennington said, 20 milliseconds is "a long time for applications of this class." The application structure will determine how well an application runs across even a very high-speed WAN. NCSA and the Alliance are experimenting with the NSF's Very High Speed Backbone Network to determine whether its OC-12 (622Mbps) channel creates a viable connection between the two sites.
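A rough calculation shows why 20 milliseconds matters even on a fast link. The Python sketch below treats the 20-millisecond figure as a one-way delay (an assumption; the article doesn't say whether it is one-way or round-trip) and uses the nominal 622Mbps OC-12 rate with no protocol overhead. The function and constant names are illustrative.

```python
OC12_BITS_PER_SECOND = 622_000_000  # nominal OC-12 rate cited in the article
ONE_WAY_DELAY_SECONDS = 0.020       # assumed to be a one-way delay


def bytes_in_flight(link_bps, one_way_delay_s):
    """Bandwidth-delay product: bytes sent before the first one arrives."""
    return link_bps * one_way_delay_s / 8


def max_round_trips_per_second(one_way_delay_s):
    """Upper bound on strictly sequential request/reply exchanges per second."""
    return 1.0 / (2.0 * one_way_delay_s)


if __name__ == "__main__":
    in_flight = bytes_in_flight(OC12_BITS_PER_SECOND, ONE_WAY_DELAY_SECONDS)
    print("bytes in flight:", round(in_flight))
    print("round trips per second:",
          round(max_round_trips_per_second(ONE_WAY_DELAY_SECONDS)))
```

Under these assumptions, about 1.5MB of data must be in transit to keep the link full, and an application that waits for each reply before sending the next message gets only about 25 exchanges per second. This is why the application's structure, not just the link speed, determines how well it runs across the two sites.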
Although the supercluster logistics are complicated, the long-term goals are straightforward. Pennington said, "Our goal is to someday have an NT-based supercomputer as a fully supported platform at NCSA, offering the national user community a real alternative to traditional supercomputers." Most administrators won't implement clusters of 100 nodes any time soon, but we can all dream.