Will clustering help NT overcome its scalability challenge?

Is Windows NT ready for the enterprise? Windows NT Magazine continually examines this question. Rather than debate NT's readiness for business applications, in this article I'll examine scalability, a key component in making NT the right solution for enterprise applications. I'll discuss what NT clusters provide today and what the future holds for NT clusters in the areas of availability, systems management, and scalability.

Availability
How important is uptime to your enterprise environment? Do you know that the difference between 99 percent and 99.999 percent availability is the difference between roughly 4 days (about 88 hours) and about 5 minutes of downtime a year? Can you afford to be down 4 days a year, or is even 5 minutes of downtime too much? Your answers to these questions will help you determine how important availability is to your organization, and how much you are willing to spend to get it. If you have mission-critical applications (I define a mission-critical application as any application that can't be down during business hours) and your business runs 24 hours a day, 7 days a week, you must have maximum availability.
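The downtime figures quoted above follow from simple arithmetic. The following Python sketch shows the conversion; the function name and the 365-day year are assumptions made for illustration, not part of any product.

```python
# Minimal sketch: convert an availability percentage into expected
# downtime per year. Assumes a 365-day year (an illustrative choice).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the expected minutes of downtime per year."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999):
        minutes = downtime_minutes_per_year(pct)
        print(f"{pct}% available: {minutes / 60:.2f} hours "
              f"({minutes:.1f} minutes) of downtime per year")
```

At 99 percent, the result is 3.65 days a year; at 99.999 percent, it is a little over 5 minutes.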

Clustering lets you link separate computers (or nodes) to work as a single system. For example, you can configure a cluster in which one server functions as a backup for another server. If one of the two clustered servers fails, the functioning server picks up the load. Currently, Microsoft Cluster Server (MSCS) handles two-node configurations, whereas Octopus and ArcServe Replication can handle eight-node clusters. The more nodes you have in a cluster, the more sophisticated the cluster's configuration can be. In my companion article, "NT Clustering Solutions Are Here," on page 125, I detail eight examples of cluster configurations currently in use.

Future MSCS and other clustering solutions will handle up to 16 nodes, which is a practical limit for most situations. Today, generic failover lets almost any application restart on a surviving node when another node in the cluster fails. As developers engineer applications to become more cluster-aware, the availability of applications that restart on functioning nodes will increase. This restart capability is key in your consideration of NT as a platform for mission-critical applications.

Systems Management
Are you tired of waiting until after hours to upgrade your servers? Wouldn't you love to be able to take servers down in the middle of the day, when it's convenient for you? With clusters, you can. For example, say you have a clustered Web server with six nodes, and you want to add a service pack to one of the servers. You can manually fail over the node on which you intend to perform the upgrade, letting the other servers pick up the failed-over server's load. Then, you can perform the upgrade, test it, and manually fail the server back into the cluster. What's more, you can do all this work while your users access the cluster as usual.

Suppose you want to view the nodes in your cluster as a single image. With the proper clustering solution, a single-image view of multiple nodes is easy to accomplish. A single-image view will let you make changes affecting the entire cluster once, instead of making changes one node at a time.

In the future, you will be able to perform all kinds of systems and network management on clustered nodes. Systems management software such as CA-Unicenter is already cluster-aware. Eventually, all systems management packages must become cluster-aware to compete effectively.

Scalability
If you've used NT, you've undoubtedly heard that NT doesn't scale. In fact, Microsoft's fate rests on how well the company answers the scalability challenge. If Microsoft can't change the widespread perception that NT doesn't scale, customers won't deploy mission-critical applications on NT and therefore won't buy BackOffice. Fortunately for Microsoft, almost every vendor with an enterprise solution is vying for NT-enterprise business. Most vendors with existing scalability solutions on other platforms are porting them to NT. This fact makes predicting the direction of NT scalability over the next few years easier, because we know how the existing solutions work today.

In some ways, the scalability issue is like US politics. The two primary parties are large symmetric multiprocessing (SMP) and clustering. These parties have been debating for years over who has the best solution to the scalability challenge. The SMP party believes the answer is building larger SMP machines: 8 CPUs, 16 CPUs, 32 CPUs, and so on. The primary benefit of building larger SMP machines, the SMP party says, is that application software doesn't need to change--the operating system (OS) does the work. In addition, the SMP party believes maintaining one system is easier than maintaining a bunch of smaller systems. Isn't it better to throw all your applications on one machine?

Large-scale SMP
The SMP camp has many hurdles to overcome. The first is that applications written for NT Server don't behave well together. Windows NT Magazine's Web master, T.J. Harty, tried a simple experiment (see "Web Structure and Infrastructure," November 1997) that illustrates this problem. He ran Internet Information Server (IIS) and SQL Server on one 4-way SMP machine, then he ran them separately on two 2-way SMP machines. When he compared performance, T.J. found that running these applications on separate machines resulted in far better performance than running them together on a single machine did. Why? Because BackOffice and other NT Server-based applications contend for control of the system they're running on. These applications were designed to run on separate machines, not to run together on the same server. Microsoft almost always recommends running server-based applications on separate NT servers. The exception is Small Business Server (SBS), and with SBS you can accommodate a maximum of only 25 users.

A second hurdle is that large-scale SMP machines are expensive. As you increase the number of CPUs, the complexity of keeping the caching systems synchronized increases exponentially. This increased complexity will drive up the cost of manufacturing large-scale SMP machines. The newest trend in manufacturing these machines is to link two 4-way SMP motherboards with a Corollary bridge to create an 8-way motherboard. Intel has started production on units that follow this trend, and most server vendors are now shipping these systems. The goal is to get the cost of one 8-way system down to the cost of two 4-way systems, which isn't a likely possibility. Some vendors are aiming higher still: Sequent Computer Systems, for example, is working on 16- and 32-way systems using its shared-memory model, NUMA-Q. Although several manufacturers will build 16- or 32-way systems, the cost of these systems will be prohibitive for most organizations.

A third hurdle to large-scale SMP is NT's thread scheduler and its limitations. Mark Russinovich has pointed out (see "Inside the Windows NT Scheduler, Part 2," August 1997) that NT's scheduler experiences performance problems when the number of CPUs increases. This limitation might explain the challenge NT Server-based applications such as SQL Server and Exchange face in scaling past 4 CPUs. Some vendors would have you believe that an 8-way SMP machine can run SQL Server, for example, twice as fast as a 4-way machine can. But NT Server applications don't scale in a linear fashion. Instead of producing 100 percent improvement by adding 4 additional CPUs, going from 4 to 8 CPUs might represent only a 60 percent to 75 percent improvement in throughput and performance.
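One way to reason about the less-than-linear scaling described above is Amdahl's law, which I introduce here purely as an illustrative model; it is not a measurement of NT, SQL Server, or Exchange. The sketch below assumes that a fixed fraction of the work can run in parallel and computes the gain from doubling the CPU count. The function names are hypothetical.

```python
# Illustrative model only: Amdahl's law, not a benchmark of NT.
def amdahl_speedup(parallel_fraction: float, cpus: int) -> float:
    """Speedup over one CPU when only part of the work runs in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cpus)


def gain_from_adding_cpus(parallel_fraction: float,
                          cpus_before: int = 4,
                          cpus_after: int = 8) -> float:
    """Fractional throughput improvement when moving to more CPUs."""
    before = amdahl_speedup(parallel_fraction, cpus_before)
    after = amdahl_speedup(parallel_fraction, cpus_after)
    return after / before - 1.0


if __name__ == "__main__":
    for p in (1.0, 0.96, 12 / 13):
        gain = gain_from_adding_cpus(p)
        print(f"parallel fraction {p:.3f}: "
              f"{gain * 100:.0f} percent improvement from 4 to 8 CPUs")
```

Under this model, a 60 percent to 75 percent improvement corresponds to a workload in which roughly 92 percent to 96 percent of the work runs in parallel. Even a small serial portion, such as scheduler or lock contention, is enough to prevent a doubling in throughput.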

There's another challenge to building large-scale SMP systems: NT's 4GB RAM limit. When you increase the number of CPUs, you simply move the bottleneck to the overall system memory. Currently, only Digital's Alpha-based systems can overcome this problem when they're paired with the beta version of NT 5.0's 64-bit very large memory (VLM). This configuration lets an Alpha-based system address up to 32GB of RAM. Until Intel's Merced chip is available, most server hardware vendors will not achieve the desired level of scalability on their 8-way or higher SMP machines.

Here's a final problem: Having one SMP machine doesn't resolve the need for availability. If you have one large SMP machine and it fails, you're hosed.

Clusters
The cluster party maintains that clustering is the solution to increasing NT scalability. The cluster party also maintains that you can use standard off-the-shelf hardware as the building blocks for your commodity cluster solution. For example, if a 4-way SMP system offers the best price for performance on NT, then use it as the basis for your cluster strategy. Never, says the cluster party, use proprietary systems as the basic building blocks for your clustering solution. Your goal is to get scalability with availability--a requirement for any mission-critical application.

Like the SMP party, the cluster party has challenges to overcome. First, software must be written specifically to scale well on clusters. In a perfect world, a generic load-balancing program would be available for NT. Unfortunately, that program doesn't exist--on NT or any other platform. However, software vendors are working to solve the load-balancing challenge for specific applications. For example, several vendors provide load balancing and failover for IIS that let an application scale with additional IIS-based Web servers (see Jonathan Cragle, "Load Balancing Web Servers," page 68, for a review of four examples of this technology). Another example is Citrix's WinFrame Load Balancing Option Pack. The Citrix solution lets you group multiple WinFrame servers in a unified server farm to serve as many users as necessary. Citrix is adding the WinFrame Load Balancing Option Pack to Microsoft's Windows-based Terminal Server. Oracle has created a version of its database server that scales by letting you add four nodes to a cluster. These vendor solutions are just beginning to provide ways to boost NT's scalability.
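As an illustration of application-specific load balancing with failover, the following Python sketch distributes requests round-robin across a farm of Web servers and skips any server marked as down. The RoundRobinBalancer class and the server names are hypothetical, and round-robin is only one simple policy; the commercial products mentioned above may work quite differently.

```python
# Minimal round-robin load balancer with failover. Hypothetical names;
# this does not reflect how any specific vendor's product works.
class RoundRobinBalancer:
    def __init__(self, servers):
        self.servers = list(servers)
        self.down = set()
        self._next = 0  # index of the next server to try

    def mark_down(self, server):
        self.down.add(server)

    def mark_up(self, server):
        self.down.discard(server)

    def pick(self):
        """Return the next healthy server, skipping any that are down."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server not in self.down:
                return server
        raise RuntimeError("no healthy servers available")


if __name__ == "__main__":
    farm = RoundRobinBalancer(["web1", "web2", "web3"])
    print([farm.pick() for _ in range(4)])
    farm.mark_down("web2")  # simulate a failed server
    print([farm.pick() for _ in range(3)])
```

Adding a server to the list is all it takes to add capacity in this model, which is the property that lets a Web application scale by adding nodes.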

The second challenge to the cluster party is that it consists of two separate factions: shared disk and shared nothing. In the shared-disk clustering approach, software running on any of a cluster's nodes can access any disk connected to any node. In the shared-nothing clustering approach, no direct sharing of disks between nodes occurs. Figure 1 contrasts shared-disk and shared-nothing clusters.

VAX clusters make Digital a leading shared-disk vendor. Digital has begun moving VAX-like clusters to MSCS. Another leader of the shared-disk faction is Oracle, with its Parallel Server. Oracle and VAX use a distributed lock manager (DLM) to maintain consistency between data on shared disks.

Tandem is a leader of the shared-nothing faction, with its NonStop solution. Tandem contends that shared nothing is the only way to get the scalability and availability necessary for large mission-critical applications. A Tandem cluster has several key components, including NonStop software and the high-speed cluster interconnect ServerNet, which Tandem designed to be cluster-aware. In May 1997, Tandem demonstrated a 2TB database scaling on NT across a 16-node cluster in which each node had four 200MHz CPUs. Tandem used its cluster software, database, and interconnect--NT had to scale only to 4 CPUs on the individual nodes. Tandem's goal was to show that NT solutions can scale if they're designed with the right software. To prove that any NT server can duplicate Tandem's scaling feat, Unisys plans to demonstrate a similar but larger system using NonStop technology. (Now that Compaq owns Tandem, it's only a matter of time before a similar demonstration is built on Compaq servers. For more on NonStop, see "Speculation About Compaq," page 11.)

Microsoft's Course to the Future
So where is NT going? Both the SMP and clustering parties will use NT. The SMP party will demonstrate and talk about 8-, 16-, and 24-way systems over the next several years. You'll see NT clustering solutions in every shape and configuration. As in politics, the SMP and clustering parties will each claim that its scalability solution is the best one.

A more important question is, where is Microsoft heading with its NT-based solutions, such as MSCS, SQL Server, and Exchange? To reach an answer, consider the following.

First, Microsoft has stated it will move toward shared-nothing clustering. Second, Microsoft paid Tandem $30 million to port Tandem's NonStop software to NT in exchange for an inside look at Tandem's clustering technology. Third, Jim Gray, Microsoft's senior research scientist and a former Tandem employee, is in charge of the Scalable Servers Research Group at Microsoft. If you want to see where Microsoft's clustering strategy is headed, take a look at Tandem's existing technology--it's at least 3 to 5 years ahead of Microsoft's.

In the sidebar "Commodity Cluster Requirements," Jim Gray outlines Microsoft's cluster vision. Microsoft must do several things to accomplish this vision. First, Microsoft must develop its scalable components, such as Transaction Server and distributed component object model (DCOM), so that independent software vendors (ISVs) will widely adopt these components. Second, Microsoft must give developers the necessary tools to create cluster-aware applications; if Microsoft's tools don't work or are too hard to use, developers won't use them. Third, Microsoft must lead the way by rewriting its applications to take advantage of scalable components and other cluster-specific features. When SQL Server, Exchange, IIS, and other Microsoft applications scale across clusters, developers will have the proof they need to start making their applications cluster-aware.

SQL Server 7.0 has technology that will take SQL on NT to the next level of scalability. SQL Server 7.0 contains an online analytical processing (OLAP) engine, code-named Plato. OLAP, a crucial component in building a data warehouse and decision support, builds multidimensional cubes of data. Plato can distribute these data cubes among many nodes in a cluster using a technique the industry calls partitions. Plato facilitates query-building that assembles data from data cubes, regardless of the cubes' location. In other words, a developer doesn't have to know where the cubes are located; Plato puts the necessary data pieces together at runtime and delivers the results to requesting queries. Another name for this technology is parallel query, and it is essential in building database applications that can scale beyond one node. Unfortunately, there is currently no support for performing parallel queries on online transaction processing (OLTP) data. Such support is necessary to boost the scalability of SQL on NT.

The key to scaling a database beyond one node is the ability to partition data across multiple nodes. But what kinds of management challenges do you create by spreading data across dozens of nodes? Is there an automatic way to partition data and manage the partitions? Not with SQL Server. However, partition management technology exists on Tandem's clustering platform. Tandem's NonStop SQL/MX database supports partitioning transparency and parallel query decomposition. In Tandem's 2TB demonstration, the SQL/MX database was partitioned across 700 drives transparently: The application program that ran against the database wasn't aware that the data was partitioned across 700 drives because the database was smart enough to put the pieces together at runtime. The parallel query decomposition capability Tandem demonstrated with its 2TB database decomposed queries against the database into components that multiple nodes of the partitioned database executed independently. Again, the database query engine had the intelligence to use the individual nodes of the cluster as if the configuration were one server. Finally, SQL/MX database-management software can partition the database automatically, which means database administrators don't have to. Microsoft developers face a big job getting SQL Server even close to this level of cluster awareness.
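The two ideas in this discussion, transparent partitioning and parallel query decomposition, can be illustrated with a small Python sketch. It is a toy model, not Tandem's or Microsoft's implementation: the PartitionedTable class, the modulo placement rule, and the use of threads to stand in for cluster nodes are all assumptions made for illustration.

```python
# Toy model of a shared-nothing partitioned table. Every name and the
# placement rule are illustrative assumptions, not a vendor's design.
from concurrent.futures import ThreadPoolExecutor


class PartitionedTable:
    def __init__(self, node_count):
        # Each inner list stands in for the disk owned by one node.
        self.partitions = [[] for _ in range(node_count)]

    def insert(self, order_id, amount):
        """The table, not the caller, decides which node stores the row."""
        node = order_id % len(self.partitions)
        self.partitions[node].append((order_id, amount))

    def total_amount(self, min_amount=0):
        """Decompose a SUM query into per-node pieces, then combine them."""
        def partial_sum(rows):
            return sum(amount for _, amount in rows if amount >= min_amount)

        # Threads stand in for nodes that each scan only their own rows.
        with ThreadPoolExecutor(max_workers=len(self.partitions)) as pool:
            partials = list(pool.map(partial_sum, self.partitions))
        return sum(partials)


if __name__ == "__main__":
    orders = PartitionedTable(node_count=4)
    for order_id in range(100):
        orders.insert(order_id, amount=order_id * 10)
    # The caller never mentions partitions or nodes.
    print(orders.total_amount())
    print(orders.total_amount(min_amount=500))
```

The calling code sees one table and one answer. The partitioning and the assembly of partial results happen behind the interface, which is the transparency the SQL/MX demonstration relied on.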

How soon will third-party business applications become cluster-aware? A logical assumption is that applications that are currently cluster-aware on non-NT platforms will be the first to migrate to NT. Under this assumption, developers who have based their technologies on Tandem, Oracle, or Digital have a head start over developers who build cluster-aware applications on NT from scratch. Unfortunately, many ISVs are afraid to stray too far from Microsoft-built products and therefore won't begin building cluster-aware applications until Microsoft ships a high volume of MSCS, creating a substantial market for cluster-aware applications. At the current pace, we won't see a large number of cluster-aware applications on NT until after the year 2000. If you need cluster-aware applications before 2000, I recommend taking a look at Oracle or Tandem.

Cluster Pricing
Clustering technology faces a significant problem with pricing. Will you be able to buy software at a total cluster price, or must you continue to buy a copy of each application for every server in the cluster that it might run on, even if you use clustering only for failover? Currently, Microsoft's pricing of NT Server, Enterprise Edition (NTS/E) at $4000 per server positions it solely for enterprise customers. Microsoft believes NTS/E is of high value only to enterprise customers, who will use it primarily for SQL Server and Exchange. Ironically, Microsoft uses NTS/E for print servers and an internal software download (FTP) site--not for SQL Server or Exchange.

I assume that if NTS/E had a lower price, many organizations would find it useful in solving a variety of availability problems. With increased demand for NTS/E, more ISVs would consider writing cluster-aware versions of their software, thus speeding the delivery of Microsoft's vision of improving NT scalability.

What the Future Holds
Microsoft will continue to scale NT and BackOffice applications to work beyond 4-way SMP systems. This progress will fuel an interest in 8-way SMP machines and a limited interest in SMP machines with more than 8 CPUs. Microsoft must prove that SQL Server and Exchange will scale consistently on 8-way machines. To achieve NT scalability with availability, Microsoft will pursue shared-nothing commodity-cluster methodology. The commodity-cluster strategy is consistent with Microsoft's practice of building its software platforms on commodity hardware. In the future, whether NT is able to scale beyond a 4-CPU server might not matter, because rather than buying larger SMP systems (with more than 4 CPUs), you will simply add 4-CPU servers to your cluster to achieve scalability and availability.

While Microsoft pursues its scalability strategy, my advice is to take a look at all the clustering solutions in the NT market. The answer to the question, "Does NT scale?" might not come from Microsoft.