Take advantage of Win2K Cluster service

With the release of Microsoft Exchange 2000 Enterprise Server, Exchange Server deployments can now take full advantage of Windows 2000 Cluster service (formerly Microsoft Cluster Server—MSCS—in Windows NT). (Although Enterprise Server is the only version that supports clustering, I'll refer simply to Exchange 2000 throughout the rest of this article.) Exchange 2000 clustering will likely be a primary building block for organizations that want to scale out their Exchange 2000 deployments: Exchange 2000's support for active/active clustering can provide increased availability and can be a cornerstone of server consolidation. Whether you're new to clustering or have already deployed clusters in Exchange Server 5.5, the best approach to deploying Exchange 2000 clusters is to first study the technology, then plan your implementation.

Why Cluster?
A common, and fatal, mistake when you employ cluster technology is to view clustering as the final answer to all Exchange Server downtime problems. Clustering technology, especially in Win2K or NT, can help solve only specific problems, typically by eliminating single points of failure. A standalone server running Exchange Server has several potential points of failure: for example, hardware components (e.g., the system board, processors, power supplies without redundancy, network cards) that might fail. In some cases, software problems such as memory leaks can also be single points of failure that clusters can help eliminate. Shared storage, however, is a single point of failure that a cluster can't protect you from. When the nodes in a cluster each attach to the shared storage through a single controller, that controller becomes a failure point that the cluster can't tolerate. Technologies such as switched fibre channel and redundant paths can compensate for this weakness by letting each node contain a redundant controller that attaches to a separate switch fabric in the fibre channel Storage Area Network (SAN).

Clustering also addresses another common availability problem: planned outages. Although you might not consider planned outages to be a factor in your Exchange Server deployment's availability, these outages still cause downtime during which clients can't access your system. Outages can occur during routine maintenance, rolling or block-point upgrades, configuration changes, and hardware or software upgrades. Clustering's ability to support planned outages might be the technology's most important—and most overlooked—benefit. Clustering Exchange Server can help mitigate planned outages by letting you fail over services running on one cluster node to another node or nodes while you perform maintenance. In most cases, you can perform comprehensive hardware or software upgrades and routine maintenance without users ever noticing an interruption. This failover ability alone might justify an organization's investment in clustering Exchange Server.

Clustering isn't a replacement for sound disaster-recovery practices. Beyond reducing single points of failure and minimizing planned outages, clustering technology might not be able to solve other problems that cause downtime for your Exchange Server deployment. Clustering can't solve problems such as poor planning or inadequate training, and can't mitigate most software problems or major catastrophes. (For information about disaster recovery planning, see Paul Robichaux, Getting Started with Exchange, "The Six Deadly Backup Sins," April 2000.) And clustering can't help prevent infrastructure failures of services (e.g., WINS, DNS, Active Directory—AD, network services) that directly support Exchange Server. Also, clustering and the technologies (e.g., Cluster Administrator) that support it add significant complexity to your environment. Before you decide to cluster Exchange Server, you need to understand and evaluate whether clustering can address your most important concerns.

MSCS to Cluster Service
Microsoft included several optional components in its 1997 release of NT Server, Enterprise Edition (NTS/E). MSCS was one of these components. The name change from Microsoft Cluster Server in NT 4.0 to Cluster service in Win2K is a small but important differentiating detail. (For information about MSCS and Cluster service, see "Clustering Resources.")

MSCS's goal is to extend NT to seamlessly include high-availability features and to support applications in a clustered environment without requiring significant application modification. Microsoft specifically excluded two features from MSCS. First, MSCS isn't lock-step fault tolerant: it doesn't move running applications to another node instantaneously, so it can't guarantee that no operations are missed during a failover. Therefore, applications running on MSCS rarely achieve 99.99999 percent availability (i.e., about 3 seconds of downtime per year). Second, MSCS can't recover a shared state between client and server. In other words, users will likely need to repeat work in progress during a failure.
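The availability figure follows from simple arithmetic: multiply the fraction of time a system may be unavailable by the number of seconds in a year. The Python sketch below assumes a 365-day year, and the function name is illustrative.

```python
# Convert an availability percentage into the downtime it allows per year.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 s; assumes a 365-day year


def downtime_per_year(availability_percent: float) -> float:
    """Return the allowed downtime, in seconds per year, for a given availability."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * SECONDS_PER_YEAR


if __name__ == "__main__":
    for label, pct in [("three nines", 99.9), ("five nines", 99.999), ("seven nines", 99.99999)]:
        seconds = downtime_per_year(pct)
        print(f"{label:>12} ({pct}%): {seconds:,.2f} s/year ({seconds / 3600:.2f} h)")
```

Seven nines works out to roughly 3.15 seconds per year, while three nines still allows several hours of downtime, which is why each additional nine is so much harder to reach.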

Key Win2K Clustering Concepts
Three Win2K Cluster service concepts are pertinent to Exchange 2000 clustering. These concepts are resources, virtual servers, and failover.

Resources. Resources are the smallest unit that you can manage within a cluster. Resources are logical or physical units that Cluster service can bring online and take offline, manage in a cluster, host on only one node at a time, and move between nodes. Examples of logical cluster resources include network names and IP addresses; examples of physical cluster resources include disk devices and network interfaces. In a cluster, the cluster designer or administrator typically groups resources into functional units, called resource groups, for the specific purpose of providing a service such as a file and print share, Web service, or Exchange Server service. The relationship among resources in a resource group is called resource dependency. For example, a network name needs to resolve to an IP address; therefore, the network name is dependent upon the IP address. Resource dependency dictates the order in which resources come online and go offline. Figure 1 shows the Exchange 2000 resources and resource dependency within an Exchange Virtual Server (EVS) resource group.
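Because dependencies determine ordering, the online sequence for a resource group amounts to a topological sort of its dependency graph, and the offline sequence is that order reversed. The Python sketch below illustrates the idea with a hypothetical, simplified set of EVS resources; the dependency map is an assumption for illustration, not a reproduction of Figure 1 or of how Cluster service computes the order internally. It uses the standard library's graphlib module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical, simplified resource group: each resource maps to the resources
# it depends on. These names and edges are illustrative, not a copy of Figure 1.
DEPENDENCIES = {
    "IP Address": set(),
    "Physical Disk": set(),
    "Network Name": {"IP Address"},
    "System Attendant": {"Network Name", "Physical Disk"},
    "Information Store": {"System Attendant"},
    "SMTP": {"Information Store"},
}


def online_order(deps: dict[str, set[str]]) -> list[str]:
    """Bring dependencies online before the resources that need them."""
    # static_order() raises graphlib.CycleError if the dependencies form a loop.
    return list(TopologicalSorter(deps).static_order())


def offline_order(deps: dict[str, set[str]]) -> list[str]:
    """Take dependent resources offline first: the reverse of the online order."""
    return list(reversed(online_order(deps)))


if __name__ == "__main__":
    print("online: ", online_order(DEPENDENCIES))
    print("offline:", offline_order(DEPENDENCIES))
```

Any valid order brings the IP address online before the network name, matching the example in the text, and takes SMTP offline first because it transitively depends on everything else in this sample group.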

Virtual servers. Virtual servers build on the concept of resources and resource groups. Don't confuse a cluster virtual server with a Microsoft IIS virtual server (although an IIS virtual server might map to a cluster virtual server). A virtual server in a cluster is the application service entity that network clients see. In Exchange 2000, a virtual server is a group of all the resources necessary to provide Exchange Server service functionality on the network. EVSs appear to the Exchange 2000 System Manager application and to users as standalone (i.e., nonclustered) Exchange 2000 servers. In Exchange 2000 Enterprise Server, multiple EVSs can exist in a cluster. In any cluster, only one cluster node at a time can own each EVS, although one node can support multiple EVSs simultaneously. The only limit on the number of virtual servers per node or per cluster is the number of Exchange 2000 storage groups that each server or node can support. In Exchange 2000 Enterprise Server's initial release version, that limit is four storage groups per clustered or standalone server; this limit probably won't change anytime soon. Understanding the EVS concept is an important step toward successfully deploying and managing clustered Exchange 2000 servers.
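One practical consequence of the storage group limit: after a failover, a single node may own several EVSs at once, so the storage groups of every EVS it could end up hosting must fit within that limit. The following Python sketch is a hypothetical planning check, not an Exchange tool; the VirtualServer type and can_host function are illustrative names, and the limit of four comes from the text above.

```python
from dataclasses import dataclass

# Per-node limit stated for Exchange 2000 Enterprise Server's initial release.
MAX_STORAGE_GROUPS_PER_NODE = 4


@dataclass
class VirtualServer:
    """Hypothetical planning record for one EVS."""
    name: str
    storage_groups: int  # number of storage groups this EVS hosts


def can_host(evss: list[VirtualServer], limit: int = MAX_STORAGE_GROUPS_PER_NODE) -> bool:
    """Return True if one node could own all of these EVSs at once."""
    return sum(evs.storage_groups for evs in evss) <= limit


if __name__ == "__main__":
    balanced = [VirtualServer("EVS1", 2), VirtualServer("EVS2", 2)]
    oversized = [VirtualServer("EVS1", 3), VirtualServer("EVS2", 2)]
    print(can_host(balanced), can_host(oversized))
```

In a two-node active/active design, checking the worst case (both EVSs on one node) shows whether a failover could push the surviving node past the limit.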

Failover. Failover is the process of moving cluster services from one node to another. Cluster service supports two types of failover: active/active (aka resource) and active/passive (aka service). Both types increase system availability.

Active/active failover, which Exchange 2000 supports, is more comprehensive than active/passive failover. Active/active mode assumes that Cluster service is running on both cluster nodes and that a specific resource (e.g., a database, a virtual server, an IP address), rather than the entire service, will fail over. Active/active mode utilizes application-specific resource DLLs, which use cluster APIs to make applications cluster-aware and to permit customizable failover of the applications. Resource DLLs provide a means for Cluster service to manage resources. These DLLs define resource abstractions, interfaces, and management.
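The division of labor between Cluster service and a resource DLL can be modeled loosely in code. In the real Resource API, a resource DLL exports C entry points such as Online, Offline, LooksAlive (a quick health check), and IsAlive (a thorough check) that the Resource Monitor calls. The Python sketch below is only a conceptual model of that polling pattern under simplifying assumptions (a fixed polling ratio and a fake service); the class and function names are hypothetical, and this is not the Windows API.

```python
from abc import ABC, abstractmethod


class ClusterResource(ABC):
    """Conceptual stand-in for a resource DLL's entry points (not the real C API)."""

    @abstractmethod
    def online(self) -> None: ...

    @abstractmethod
    def offline(self) -> None: ...

    @abstractmethod
    def looks_alive(self) -> bool:
        """Quick, cheap health check."""

    @abstractmethod
    def is_alive(self) -> bool:
        """Thorough health check."""


def poll(resource: ClusterResource, rounds: int, is_alive_every: int = 5) -> str:
    """Run the quick check every round and the thorough check periodically.

    A failed quick check is confirmed with the thorough check. Returns "failed"
    where a real cluster would restart the resource or fail its group over.
    """
    for tick in range(1, rounds + 1):
        quick_ok = resource.looks_alive()
        if not quick_ok or tick % is_alive_every == 0:
            if not resource.is_alive():
                return "failed"
    return "healthy"


class FakeService(ClusterResource):
    """Hypothetical service that stops responding after `fail_after` checks."""

    def __init__(self, fail_after: int) -> None:
        self.fail_after = fail_after
        self.checks = 0
        self.running = False

    def online(self) -> None:
        self.running = True

    def offline(self) -> None:
        self.running = False

    def looks_alive(self) -> bool:
        self.checks += 1
        return self.running and self.checks <= self.fail_after

    def is_alive(self) -> bool:
        return self.running and self.checks <= self.fail_after


if __name__ == "__main__":
    svc = FakeService(fail_after=3)
    svc.online()
    print(poll(svc, rounds=10))
```

Splitting health checks into a cheap frequent test and an expensive occasional one is the design point: the cluster detects most failures quickly without constantly paying for a deep check.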

However, many applications (from Microsoft as well as third-party software vendors) don't provide resource DLLs. To offset this deficiency, Cluster service provides a Generic Service resource DLL that offers any application basic functionality when running on Cluster service. This generic DLL provides for the active/passive failover mode and restricts applications to running on one node only.

In active/passive mode, which Exchange Server 5.5 supports, the Generic Service resource DLL defines an application as a resource to Cluster service. Then, Failover Manager ensures that the application runs on only one cluster node at any given time. The application is part of a resource group that uses a common name throughout the cluster. All applications running in the resource group appear under that name to all network clients.
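Active/passive failover can be pictured as moving an entire resource group, together with the single name clients use, from one node to another. The Python sketch below models that under stated assumptions: the ResourceGroup class, its preferred-owners list, and the node and resource names are hypothetical, and real Failover Manager behavior (restart thresholds, failback policies) is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResourceGroup:
    """A group that moves between nodes as one unit (simplified model)."""
    network_name: str            # the single name clients connect to
    resources: list[str]
    preferred_owners: list[str]  # nodes allowed to own the group, in preference order
    owner: Optional[str] = None  # only one node owns the group at a time

    def fail_over(self, healthy_nodes: set[str]) -> str:
        """Move the whole group to the first healthy preferred owner other than the current one."""
        for node in self.preferred_owners:
            if node != self.owner and node in healthy_nodes:
                self.owner = node
                return node
        raise RuntimeError(f"no healthy node can take {self.network_name}")


if __name__ == "__main__":
    group = ResourceGroup(
        "EXCH55",
        ["IP Address", "Network Name", "Physical Disk", "Exchange services"],
        ["NODE1", "NODE2"],
        owner="NODE1",
    )
    print(group.fail_over({"NODE2"}), group.network_name)
```

Clients keep using the same network name throughout; only the node that owns the group changes.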

Exchange 2000 Clustering Enhancements
Microsoft designed Exchange 2000 Enterprise Server to build on Exchange Server 5.5, Enterprise Edition's (Exchange 5.5/E's) initial clustering support, providing full application functionality in a clustered environment. Exchange 2000 supports active/active failover. From an Exchange 2000 administrator's perspective, an EVS comprises all the required components (i.e., a storage group and required protocols, at a minimum) to provide services and a unit of failover. One or more EVSs can exist in a cluster, and each EVS runs on one cluster node at a time. From Cluster service's perspective, an EVS exists as a set of resources within a resource group. If you have multiple EVSs that share the same physical disk resource (for example, EVSs that each have a storage group residing on the same disk device), those EVSs must exist within the same resource group. This requirement ensures that the resources and EVSs all fail over as one unit and maintains resource group integrity. Clients connect to EVSs the same way that they connect to standalone servers. Cluster service monitors the EVSs in the cluster. In the event of a failure, Cluster service restarts or moves the affected EVSs to a healthy node. For planned outages, you can manually move the EVSs to other nodes. In either event, clients will see an interruption of service only during the brief time that an EVS is in an online/offline pending state.
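Because disk sharing is transitive (if EVS1 shares a disk with EVS2, and EVS2 shares a different disk with EVS3, all three must move together), working out which EVSs belong in one resource group is a connected-components problem. This Python sketch uses a union-find structure; the EVS and disk names are hypothetical, and the function is a planning aid, not part of Exchange or Cluster service.

```python
def group_by_shared_disks(evs_disks: dict[str, set[str]]) -> list[set[str]]:
    """Merge EVSs so that any two EVSs sharing a physical disk land in one group.

    evs_disks maps an EVS name to the disk resources its storage groups use.
    """
    parent = {evs: evs for evs in evs_disks}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    first_user: dict[str, str] = {}  # disk -> first EVS seen using it
    for evs, disks in evs_disks.items():
        for disk in disks:
            if disk in first_user:
                union(evs, first_user[disk])
            else:
                first_user[disk] = evs

    groups: dict[str, set[str]] = {}
    for evs in evs_disks:
        groups.setdefault(find(evs), set()).add(evs)
    return list(groups.values())


if __name__ == "__main__":
    layout = {
        "EVS1": {"Disk F:"},
        "EVS2": {"Disk F:", "Disk G:"},
        "EVS3": {"Disk G:"},
        "EVS4": {"Disk H:"},
    }
    print(group_by_shared_disks(layout))
```

In this sample layout, EVS1, EVS2, and EVS3 must share one resource group, while EVS4 can fail over independently.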

Exchange 2000 Cluster Components
Currently, not every Exchange 2000 component is supported in a clustered environment. The supported components are the resources that make up an EVS's resource group. Table 1 lists Exchange 2000's primary components, the type of support (if any) they receive in a cluster environment, and the failover mode they support.

Remember that the existence of an application-specific resource DLL is the key differentiator for cluster-aware applications. Exchange Server 5.5 doesn't provide a resource DLL but instead uses Cluster service's generic resource DLL. Exchange 2000 developers took the extra time to create application-specific resource DLLs that guarantee full cluster functionality. As Figure 2 shows, Exchange 2000 provides core features and support for Cluster service primarily through two DLLs: the exres.dll resource DLL and the excluadm.dll Cluster Administrator DLL.

Exchange 2000's setup application installs exres.dll when Setup recognizes that Exchange 2000 is operating in a clustered environment. Exres.dll implements the Cluster service API set and thus acts as a direct resource-monitor interface between the cluster services and Exchange 2000. Exres.dll is necessary for resource monitoring and for restart or failover actions.

For Cluster Administrator to configure and control Exchange Server resources, Exchange Server services must communicate with Cluster Administrator, and Cluster Administrator must provide Exchange-specific configuration parameters and screens. Excluadm.dll provides the necessary wizard screens when you configure Exchange Server resources in Cluster Administrator and presents Exchange Server resources (the Exchange System Attendant, for example) that you can add as resources in the cluster. Excluadm.dll is vital in configuring and managing Exchange services in a cluster.

A Strong Foundation
Understanding the basic concepts behind Cluster service and Exchange 2000 clustering gives you a foundation from which to evaluate whether clustering your Exchange 2000 deployment is the right decision for your organization. However, you still need to consider storage design and administration before making a move toward clustering. In Part 2 of this series, I'll discuss these decisions and the best practices to follow when you add clustering to your Exchange 2000 deployment or upgrade from Exchange Server 5.5 clusters.

Clustering Resources
WINDOWS 2000 MAGAZINE ARTICLES

Adrian Ingleson, "Deploying Microsoft Cluster Server," August 2000, InstantDoc ID 9044
Greg Todd, "Microsoft Clustering Solutions," November 2000, InstantDoc ID 15701

WEB SITE
http://www.microsoft.com/windows2000/library/technologies/cluster/default.asp

WHITE PAPER
"Deploying Exchange 2000 Clusters," http://www.compaq.com/activeanswers