Commodity Cluster Requirements

[Editor's Note: "Commodity Cluster Requirements" is reprinted with permission from Jim Gray's personal research plan. You can read this plan in its entirety on the Microsoft Research Web site: barc/gray/clusters95.doc.]

The main requirements for a commodity cluster are summarized here. One could write a volume on each of these topics.

Commodity Components: The most common cluster has one node, the next most common has two nodes, and so on. The cluster should be built from commodity hardware and software components. You cannot buy ATM from RadioShack today, but you can buy 100Mb Ethernet, and Gigabit Ethernet is only a few years off. You can certainly buy NT and even SQL Server or Oracle. Any workgroup cluster design must scale from one node to 1,000 nodes with the same hardware and software base. It must have scaleup and scaledown. There are huge economies in having the same design work both for small workgroups and for SuperServers. This is really a price-performance requirement.

Modular Growth: The cluster's capacity can grow by a factor of a thousand by adding small components. As each component arrives, some storage, computation, and services automatically migrate to it.
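The migration requirement can be illustrated with a toy placement scheme (the function, node labels, and key names here are hypothetical illustrations, not part of the plan): with hash-based placement, a newly added node pulls roughly its fair share of the data to itself, while everything else stays where it was.

```python
import hashlib

def node_for(key: str, nodes: list[str]) -> str:
    """Pick the node with the highest hash of (node, key) -- a minimal
    rendezvous-style placement rule, for illustration only."""
    return max(nodes, key=lambda n: hashlib.md5((n + key).encode()).hexdigest())

# Placement before and after growing the cluster by one node.
keys = [f"file-{i}" for i in range(1000)]
before = {k: node_for(k, ["n1", "n2", "n3"]) for k in keys}
after = {k: node_for(k, ["n1", "n2", "n3", "n4"]) for k in keys}

moved = sum(before[k] != after[k] for k in keys)
# Roughly a quarter of the keys migrate, and every key that moves
# moves to the new node n4; no data shuffles between old nodes.
```

The point of the sketch is the growth property the text asks for: adding a component automatically attracts its share of the storage without a global reshuffle.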

Availability: Fault tolerance is the flip side of modular growth. When a component fails, the services it was providing migrate to other cluster nodes. Ideally, this migration is instant and automatic. Certain failure modes (for example, power failure, fire, flood, and insurrection) are best dealt with by replicating services at remote sites. Remote replication is not part of the core cluster design.

Location Transparency: The cluster's resources should all appear to be local to each cluster node. Boundaries among cluster nodes should be transparent to programs and users. All printers, storage, applications, and network interfaces should appear to be at the one node that the user or program currently occupies. This is sometimes called a single-system image. Transparency allows data or resources to be reorganized by moving them from busy nodes to new nodes without disturbing users or applications.

Manageability: Ideally, each hardware and software module would be plug-n-play. Even so, the administrator must set policies stating who is allowed to do what, when periodic tasks should be performed, and what the system performance goals are. Short of that fully automated ideal, the system should automate much of the design, deployment, operations, diagnosis, tuning, and reorganization of the cluster. A large cluster should be as easy to manage as a single node.

Security: A cluster appears to be one computer. It has one administrator, one security policy, and one authenticator. A process authenticated by one node of the cluster is considered authenticated by all other nodes.

Performance: Communication among cluster nodes must be fast and efficient. It must be possible to read at disk speed (10 MB/s) from anywhere to anywhere. This requires a good software and hardware IO architecture. The hardware and software cannot have any bottlenecks or centralized resources: a centralized resource that each node keeps only 0.1% busy will be saturated at a thousand nodes. The simple test of performance is that a thousand-node system should run a thousand-times-larger problem in the same time (scaleup) or run a fixed-size problem a thousand times faster (speedup).
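The arithmetic behind the centralization claim is simple enough to check: demand on a shared resource grows linearly with node count, so per-node use of 0.1% reaches 100% at a thousand nodes. A back-of-the-envelope sketch:

```python
# Back-of-the-envelope check of the centralization claim: if each node
# keeps a shared resource 0.1% busy, aggregate demand on that resource
# grows linearly with cluster size.
per_node_utilization = 0.001  # 0.1% per node, the figure used in the text

def aggregate_demand(nodes: int) -> float:
    """Fraction of the centralized resource consumed by `nodes` nodes."""
    return nodes * per_node_utilization

for n in (1, 100, 1000, 2000):
    print(f"{n:5d} nodes -> {aggregate_demand(n):.1%} of the resource")
# At 1,000 nodes demand hits 100%: the resource is saturated,
# and any further growth only builds a queue behind it.
```

This is why even a lightly used centralized service fails the thousand-node speedup and scaleup test.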

Automatic Parallelism: Architects have been building parallel computers since the 1960s. Virtually all these systems have been useless because they were difficult to program. Users expect parallel execution to be automated. Print, mail, file, and application servers, TP monitors, relational databases, and search engines have all been commercial successes by hiding concurrent execution from the application programmer. Clusters must provide tools that automate most parallel programming tasks.
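As a small illustration of what hiding concurrency looks like (the partitioned-grep example is hypothetical, not from the plan): the caller writes an ordinary map over data partitions, and the executor decides how the work actually runs in parallel, much as a parallel database hides execution behind SQL.

```python
from concurrent.futures import ThreadPoolExecutor

def grep(chunk: list[str]) -> list[str]:
    """The per-partition work: scan one partition for matching lines."""
    return [line for line in chunk if "error" in line]

def parallel_grep(chunks: list[list[str]]) -> list[str]:
    # The caller states *what* to compute; the executor hides how many
    # workers run, and on which partitions, behind an ordinary map().
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(grep, chunks))
    return [line for part in parts for line in part]

logs = [["error: disk", "ok"], ["ok", "error: net"], ["ok"]]
matches = parallel_grep(logs)  # same answer as a serial scan, in order
```

The application code contains no threads, locks, or node assignments; that division of labor is the requirement the paragraph states.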

How Does a Cluster Differ from a Distributed System? It may seem that a cluster is just a kind of distributed computing system, like the World Wide Web or a network of Solaris systems or... Certainly, a cluster is a simplified kind of distributed system. But the differences are substantial enough to make cluster algorithms significantly different.

Homogeneity: A cluster is a homogeneous system: it has one security policy, one accounting policy, one naming scheme, and it is probably running a single brand of processor and operating system. There may be different speeds and versions of software and hardware from node to node, but they are all quite similar. A distributed system is a computer zoo: lots of different kinds of computers.

Locality: All the nodes of the cluster are nearby and are connected via a high-speed local network. This network has very high bandwidth because it is built from modern hardware and software. The bandwidth is inexpensive since it does not come from the phone company. It is reliable since it runs in a controlled environment, and it is efficient since it can use specialized protocol stacks optimized for local communication. Communication in a distributed system is relatively slow, unreliable, and expensive.

Trust: The nodes of a cluster all trust one another. They do authentication for one another, they share the load, they provide failover for one another, and in general they act as a federation. A distributed system is more typically an autonomous collection of mutually suspicious nodes.