MSCS Architecture
Although the current release of MSCS implements clusters consisting of two nodes, future releases will support clusters made up of more than two nodes. So with an eye on the future, in the rest of this discussion I'll refer to a cluster as consisting of two or more nodes.
The software components that make up a node in an MSCS cluster are Cluster Service, Cluster Network Driver, Cluster Administrator, one or more Resource Monitors, and one or more Resource DLLs. Figure 3 shows how these pieces relate to one another. Cluster Service, which MSCS implements as an NT service, is the command center of the cluster. Cluster Service is in charge of managing the resources of the cluster and communicating with Cluster Services on the other nodes to coordinate resource ownership and monitor for failures. Cluster Service is made up of several discrete components that divide resource ownership coordination and failure monitoring into subtasks. For example, Cluster Service's Database Manager uses Cluster Service's Log Manager to log transactions.
Cluster Service components include Checkpoint Manager, Database Manager, Event Log Manager, Failover Manager, Global Update Manager, Log Manager, Membership Manager, Node Manager, and Resource Manager. Checkpoint Manager, Database Manager, and Event Log Manager maintain a coherent image of the central cluster database, which must be stored on the cluster's quorum resource (in the current version of MSCS, this resource is a shared SCSI drive). Whenever a change occurs in the status of a node, resource, or group, MSCS updates the cluster's database transactionally. Global Update Manager notifies other nodes in the cluster whenever a change occurs in the status of one node. Maintaining internal node and network configuration information is the job of Node Manager. Node Manager and Membership Manager work together to monitor and control nodes both as active participants in the cluster and as offline nodes.
As with standard NT services, errors that occur in any part of the cluster software are logged to the System event log; however, Cluster Service's Event Log Manager, instead of the native NT event log manager, records these errors. Event Log Manager replicates the cluster event log across the cluster's nodes, ensuring that any node can use its Event Viewer to view cluster errors. Failover Manager handles the task of failing over resources and groups (i.e., moving them from one node to another) while honoring their start order and dependencies. Resource Manager initiates the migration of resources between nodes by communicating with Failover Manager.
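To make the idea of honoring start order and dependencies concrete, here is a minimal sketch of how a dependency-respecting order could be computed. It is not how Failover Manager is implemented; the resource names, the `group` mapping, and the `online_order` helper are all hypothetical, and the sketch simply applies a topological sort from Python's standard library.

```python
from graphlib import TopologicalSorter


def online_order(dependencies):
    """Return resource names so that every resource follows the ones it depends on.

    `dependencies` maps each resource name to the set of resources it requires.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical group: the file share needs a network name and a disk,
# and the network name needs an IP address.
group = {
    "FileShare": {"NetworkName", "PhysicalDisk"},
    "NetworkName": {"IPAddress"},
    "IPAddress": set(),
    "PhysicalDisk": set(),
}

start_order = online_order(group)          # order for bringing resources online
stop_order = list(reversed(start_order))   # dependents go offline first
```

In this sketch, moving a group to another node would mean stopping resources in `stop_order` on the old node and starting them in `start_order` on the new one.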
Cluster Network Driver is the internode communications channel, and it is responsible for providing reliable communications. Cluster Services on the MSCS nodes use Cluster Network Driver to check other nodes periodically to determine that they are operational; the cluster network carries these communications over User Datagram Protocol (UDP). Cluster Network Driver notifies each node's Cluster Service when it detects a communications failure. Then, each Cluster Service determines whether its node should become the only active node in the cluster or should fail itself (I will elaborate on this function shortly).
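The periodic check described above can be pictured as a simple timeout-based failure detector. The following is only a sketch under stated assumptions: the `HeartbeatTracker` class, the node names, and the 3-second timeout are invented for illustration, and the article does not give the actual intervals or message format that Cluster Network Driver uses. Timestamps are passed in explicitly so the logic stays independent of any real network code.

```python
class HeartbeatTracker:
    """Track the last time each node was heard from and flag silent nodes."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds  # assumed value, not from MSCS
        self.last_seen = {}

    def record(self, node, now):
        """Note that a heartbeat from `node` arrived at time `now` (seconds)."""
        self.last_seen[node] = now

    def failed_nodes(self, now):
        """Return the nodes whose last heartbeat is older than the timeout."""
        return sorted(
            node
            for node, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )


tracker = HeartbeatTracker(timeout_seconds=3.0)
tracker.record("NodeA", now=0.0)
tracker.record("NodeB", now=0.0)
tracker.record("NodeA", now=4.0)   # NodeA keeps answering; NodeB goes quiet
suspects = tracker.failed_nodes(now=5.0)
```

A detector like this only reports that a node has gone silent; deciding what to do about it is left to the caller, just as the decision in MSCS belongs to each node's Cluster Service.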
The private network links between nodes in a cluster facilitate internode communications. Because the links between nodes are crucial to the stability of each node, other traffic should not share these links. However, if you don't install the private connections, or if they fail, the nodes will attempt to use their LAN connections to talk to one another. Communications between applications on a cluster and clients take place over the LAN using a remote procedure call (RPC) on top of the TCP/IP protocol.
MSCS uses resources to abstract the parts of an application that move from one node to another. Examples of resources include network shares, IP addresses, applications, physical disks, and Web server virtual roots. Every MSCS resource requires a Resource DLL, which monitors and controls resources of a particular type. MSCS comes with the handful of standard Resource DLLs shown in Table 1, but most MSCS applications can define their own Resource DLLs.
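One way to picture the role of a Resource DLL is as an implementation of a small, common interface that the cluster software can call without knowing anything about the resource type behind it. The sketch below is illustrative only: `ResourceType`, `FileShareResource`, and the method names are hypothetical and are not the actual entry points that MSCS Resource DLLs export.

```python
from abc import ABC, abstractmethod


class ResourceType(ABC):
    """Hypothetical control-and-monitor interface for one type of resource."""

    @abstractmethod
    def online(self):
        """Start the resource on this node."""

    @abstractmethod
    def offline(self):
        """Stop the resource on this node."""

    @abstractmethod
    def is_alive(self):
        """Report whether the resource is currently working."""


class FileShareResource(ResourceType):
    """Toy stand-in for a network share; it only tracks an in-memory flag."""

    def __init__(self, share_name):
        self.share_name = share_name
        self._online = False

    def online(self):
        self._online = True

    def offline(self):
        self._online = False

    def is_alive(self):
        return self._online


share = FileShareResource("Public")
share.online()
status_after_start = share.is_alive()
share.offline()
status_after_stop = share.is_alive()
```

An application that defines its own resource type would, in this picture, supply another class with the same three methods.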
Resource DLLs load into the memory space of Resource Monitor, which executes separately from Cluster Service. The separate execution of Resource Monitor insulates Cluster Service from Resource DLLs that fail, and introduces a component that can detect failures in Cluster Service. When Cluster Service queries the status of a resource or starts or stops a resource, it sends the command through Resource Monitor, which contains in its memory space the resource's Resource DLL. Cluster Service creates one Resource Monitor by default, but you can create more Resource Monitors manually through Cluster Service. You might want to create extra Resource Monitors if you have a Resource DLL that crashes consistently and you want to separate it from other Resource DLLs.
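The benefit of hosting resource code outside Cluster Service can be shown with ordinary process isolation. This sketch is an analogy, not the MSCS mechanism: the `run_in_monitor` helper is hypothetical, and a child Python process stands in for a Resource Monitor hosting resource code. A failure in the child surfaces to the parent as a nonzero exit code rather than taking the parent down.

```python
import subprocess
import sys


def run_in_monitor(resource_code):
    """Run `resource_code` in a separate process; return True if it exited cleanly.

    The child process plays the part of a monitor hosting resource code.
    A crash there cannot corrupt or terminate the calling process.
    """
    result = subprocess.run(
        [sys.executable, "-c", resource_code],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,  # hide the child's traceback
    )
    return result.returncode == 0


healthy = run_in_monitor("pass")
crashed = run_in_monitor("raise RuntimeError('resource code failed')")
parent_still_running = True  # reached only because the parent survived the crash
```

Running a misbehaving piece of resource code in its own child process, as in the second call, mirrors the reason for giving a crash-prone Resource DLL its own Resource Monitor.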
You use the Cluster Administrator program to configure and monitor clusters, as Screen 1, page 62, shows. Cluster Service exports an administrative API that Cluster Administrator uses to group cluster resources, list resources that are on the cluster or are active on a node, and take resources offline and restart them. Developers can create other cluster administrative tools that make use of this published administrative API.