Simple Steps to a Better Cluster
Clustering your Exchange Server 2003 or Exchange 2000 Server systems can provide the high availability that's so important for a business-critical email application. If you're considering clustering Exchange, you can take several steps to improve your deployment, such as getting cluster-specific training, planning ahead, building extra redundancy into the cluster, and deploying a solid Windows infrastructure before building the cluster. (This article requires knowledge of clustering concepts. For clustering basics, see "Clustering in Exchange 2000," November 15, 2001, http://winnetmag.com, InstantDoc ID 22772, and the Microsoft article "Deploying Microsoft Exchange 2000 Server Clusters" at http://www.microsoft.com/downloads/details.aspx?familyid=824a63a2-f722-4bff-a223-e71b856f83c4. For a comparison of Exchange 2003 and Exchange 2000 clusters, see "Exchange Server 2003 Clusters," November 2003, InstantDoc ID 40457.)
1. Get Cluster-Specific Training
Clusters are more complex than single-server Exchange deployments, so you need training that focuses on clustering concepts and operations such as the quorum, failover/failback operations, and using Cluster Administrator. You also need to understand the requirements of cluster hardware configurations. For example, shared storage must be accessible to all nodes, so you must correctly configure any hardware that manages storage connections (e.g., array controllers, Storage Area Network—SAN—switches) to avoid contention or corruption of databases. Attention to detail is necessary to ensure that you correctly install Windows before installing Exchange and that you install and configure Exchange in the correct sequence to work on a cluster—a process that differs significantly from installing Exchange on one server. For example, to install a two-node, active/passive cluster, you need to perform the following tasks in sequence:
- Run Exchange Setup on node 1.
- Run Exchange Setup on node 2.
- Create a cluster group for the Exchange Virtual Server (EVS).
- Move disk resources that the EVS will use to the Exchange cluster group.
- Create the resources that the EVS requires (e.g., Microsoft Distributed Transaction Coordinator—MSDTC—an IP Address resource, a Network Name resource).
- Create a System Attendant resource for the EVS. As part of this step, you must supply the name of the EVS, the administrative group and routing group in which the EVS will reside, and a shared-storage folder in which Exchange will create and store its databases, transaction logs, and SMTP folders at installation.
- Verify the Exchange cluster resources that Cluster Administrator automatically creates for the EVS (e.g., the Information Store—IS; HTTP and IMAP servers for the virtual server; the required dependencies for the IP Address and Network Name resources).
- Use Exchange System Manager (ESM) to relocate the Exchange components (i.e., databases, transaction logs, and SMTP folders) to shared-storage drives or folders, according to established best practices (see "Customizing Your Exchange 2000 Server Installation," June 2002, http://www.winnetmag.com/microsoftexchangeoutlook, InstantDoc ID 24774, for suggestions). Exchange needs to be able to access these resources from each node as the EVS fails over.
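The dependency chain among these resources determines the order in which an EVS's resources come online after a failover. The following Python sketch models that ordering as a simple dependency walk; the resource names and dependency edges are simplified assumptions for illustration, not the exact set Exchange creates:

```python
# Illustrative model of EVS resource dependencies: a resource can come
# online only after every resource it depends on is online. The names
# and edges below are simplified assumptions, not Exchange's exact set.
DEPENDS_ON = {
    "IP Address": [],
    "Network Name": ["IP Address"],
    "System Attendant": ["Network Name"],
    "Information Store": ["System Attendant"],
    "HTTP Virtual Server": ["System Attendant"],
    "IMAP4 Virtual Server": ["System Attendant"],
}

def online_order(deps):
    """Return a bring-online order that respects every dependency."""
    order, done = [], set()
    while len(done) < len(deps):
        for res, needs in deps.items():
            if res not in done and all(n in done for n in needs):
                order.append(res)
                done.add(res)
    return order

print(online_order(DEPENDS_ON))
```

The same ordering runs in reverse when resources go offline, which is why a failed System Attendant takes the IS down with it.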
You need to have a firm grasp of Microsoft Cluster service clustering concepts (see the Microsoft white paper "Windows Clustering Technologies—An Overview" at http://www.microsoft.com/windows2000/techinfo/planning/clustering.asp for more information about these concepts). You also need to understand the limitations and constraints of running Exchange 2003 or Exchange 2000 on a cluster. For example, the Lotus Notes connector is unsupported on Exchange 2003 or Exchange 2000 clusters, as the Microsoft article "Status of Exchange 2000 Server and Exchange Server 2003 Components on a Server Cluster" (http://support.microsoft.com/?kbid=259197) explains. You must deploy additional standard servers to support any components that aren't supported on clusters. Be aware that many third-party products fall into this category, and I definitely recommend against installing unsupported products on a cluster, given the complexity of clustering.
As important as training is, deploying production-quality test clusters that match the specifications of your production clusters can be prohibitively expensive because of the additional hardware necessary (compared with single-server deployments). Therefore, getting the necessary experience on a cluster before deployment is often difficult. To deploy low-cost clusters as training aids, consider using virtual machine technology such as VMware or Microsoft Virtual Server. Windows Server 2003 introduces the concept of a local quorum, which lets you deploy single-node clusters. However, you can't test failover and failback operations or rolling upgrades on this type of cluster.
2. Plan Ahead
Plan your cluster deployment carefully. A poorly implemented cluster can perform erratically and can increase downtime rather than maximize uptime. When planning, consider hardware specifications, node configuration, and the limitations of Exchange clustering memory management.
Hardware specifications. All the hardware components you use in a Windows 2000 cluster (e.g., disk drives, array controllers) must appear on the Microsoft Hardware Compatibility List (HCL—http://www.microsoft.com/whdc/hcl/default.mspx). For Windows 2003 clusters, consult the Windows Catalog (http://www.microsoft.com/windows/catalog/server), which replaces the HCL. If you implement hardware that isn't on the HCL or in the Windows Catalog, Microsoft won't support your configuration. The HCL lets you view devices by category, as Figure 1 shows; the Windows Catalog does the same.
Node configuration. Within a cluster, identically configure each node that can host an EVS, with the same specifications for memory, disks, CPUs, and so forth. Although Microsoft's Operations and Technology Group (OTG) has implemented clusters whose member nodes have varying hardware configurations, I recommend identical node configurations within a cluster. Clusters are complex, and nodes with varying hardware specifications add complexity for cluster administrators. Mixed-specification nodes can also lead to inconsistent performance as EVSs move between nodes. (If you must build clusters with varying hardware configurations, see "Troubleshooter: Building Exchange Clusters with Nodes on Differing Types of Hardware," October 2003, http://www.winnetmag.com/microsoftexchangeoutlook, InstantDoc ID 39797.) The Microsoft article "The Microsoft Support Policy for Server Clusters and the Hardware Compatibility List" (http://support.microsoft.com/?kbid=309395) describes Microsoft's support policy for cluster hardware, and some hardware vendors offer Microsoft-certified and -supported packaged cluster solutions with standard hardware across nodes.
Limitations of clustering memory management. Active/active clusters on early Exchange 2000 deployments experienced virtual memory problems, as "Memory Fragmentation and Exchange Clusters," November 2001, http://www.winnetmag.com, InstantDoc ID 23097, describes. The release version of Exchange 2000 supports a maximum of 1000 connections per node, Exchange 2000 Service Pack 1 (SP1) supports a maximum of 1500 connections, and SP2 supports a maximum of 1900 connections, as does Exchange 2003. (This limitation applies to clusters running on Windows 2003 or Win2K.) However, Microsoft's recommended cluster model for Exchange 2003 and Exchange 2000 is active/passive. Active/passive clusters don't have the same constraints on connections as active/active clusters have. Virtual memory fragmentation is less of a concern with active/passive clusters because the EVS can always start on a passive node.
In Exchange 2003, built-in functionality in ESM enforces active/passive clustering guidelines on clusters with more than two nodes: The number of EVSs you can create is (N-1), where N represents the number of nodes in the cluster. ESM blocks you from creating EVSs that equal or exceed the number of nodes in the cluster. However, you still need to monitor memory fragmentation on active/passive clusters (and standalone servers) with many users. The Microsoft article "XADM: Monitoring for Exchange 2000 Memory Fragmentation" (http://support.microsoft.com/?kbid=296073) describes how to configure Performance Monitor to monitor virtual memory usage. For more information about planning Exchange 2003 clusters, see the Microsoft article "Planning an Exchange 2003 Messaging System" (http://www.microsoft.com/downloads/details.aspx?familyid=9fc3260f-787c-4567-bb71-908b8f2b980d).
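The (N-1) rule that ESM enforces is easy to state in code. The following is a toy model: the function name is mine, and the two-node case reflects the text above, which applies the rule only to clusters with more than two nodes.

```python
def max_evs(node_count):
    """Maximum EVSs ESM permits on an Exchange 2003 cluster with more
    than two nodes (the N-1 rule). Two-node clusters aren't subject to
    the rule, per the guideline this sketch models; helper name is
    illustrative, not an Exchange API.
    """
    if node_count < 1:
        raise ValueError("a cluster needs at least one node")
    if node_count <= 2:
        return node_count      # N-1 rule applies only above two nodes
    return node_count - 1      # one node must always remain passive

# A four-node cluster can host at most three EVSs; the fourth node
# stays passive so a failed-over EVS always lands on an idle node.
print(max_evs(4))
```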
3. Redundancy, Redundancy, Redundancy
The key design principle of any cluster deployment is to provide high availability. In the event of a hardware failure, a failover operation will move resources from the failed node to another node in the cluster. During Exchange failovers, users won't be able to access email folders for a brief time as resources go offline on the failed node and come online on the other node. For each node in a cluster, implement redundant hardware components to reduce the effect of hardware failures and thus avoid a failover. Examples of components in which you can implement redundancy are NICs, power supplies, and host bus adapters (HBAs) or array controllers.
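As a toy model of the failover behavior just described (node and group names are illustrative, and a real cluster weighs preferred owners and other factors when choosing the destination node):

```python
def fail_over(groups, failed_node, nodes):
    """Move every resource group owned by the failed node to a
    surviving node. `groups` maps group name -> owning node.
    Simplified sketch: the first survivor always wins.
    """
    survivors = [n for n in nodes if n != failed_node]
    if not survivors:
        raise RuntimeError("no surviving node; the cluster is down")
    for group, owner in groups.items():
        if owner == failed_node:
            # Resources go offline on the failed node, then come online
            # on the new owner; clients see a brief interruption.
            groups[group] = survivors[0]
    return groups

cluster = {"EVS1": "NODE1", "Cluster Group": "NODE2"}
fail_over(cluster, "NODE1", ["NODE1", "NODE2"])
print(cluster)   # EVS1 is now owned by NODE2
```

Redundant NICs, power supplies, and HBAs exist precisely so that a single component failure never forces this group move in the first place.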
NIC teaming lets multiple NICs act as one virtual NIC, allowing for the failure of one NIC, cable, or switch port (when you split the NICs over multiple network switches) without any interruption of service. I suggest you use NIC teaming on the public (client) network in a cluster, but Microsoft doesn't support teaming on the private (heartbeat) network in a cluster (as the Microsoft article "Network Adapter Teaming and Server Clustering" at http://support.microsoft.com/?kbid=254101 explains).
You can connect redundant power supplies to separate power distribution units; if you connect multiple power supplies to the same power distribution unit and that unit fails, power will be lost. Also connect power supplies to a UPS or use a UPS service to protect the datacenter hosting the cluster in the event of a power failure.
If you implement a SAN with your cluster, build redundancy into the connections between your nodes and the SAN so that a failure of an HBA, fibre connection, or SAN switch doesn't force a node failover. Take care when configuring your storage; many of the problems in early Exchange 2000 cluster deployments were storage related.
4. Stabilize Your Windows Infrastructure
A stable and resilient Windows infrastructure is a crucial element of any Exchange cluster deployment. For an Exchange 2003 cluster, all Active Directory (AD) domain controllers (DCs) and Global Catalog (GC) servers must run Windows 2003 or Win2K SP3 or later; the DCs and GC servers that support an Exchange 2000 cluster must run Win2K or later. Exchange 2003 and Exchange 2000 store configuration information in DCs and GC servers. Each DC holds a complete copy of all the objects in the DC's domain, plus a copy of objects replicated in the forestwide Configuration naming context (NC). A GC server holds a complete copy of all objects in the GC server's domain, plus partial copies of objects from all other domains in the forest. DSAccess is the Exchange component that locates and retrieves AD information from DCs and GC servers. From DCs, DSAccess retrieves information about Exchange entities such as administrative groups, connectors, Exchange system policies, and other servers in the Exchange organization. From GC servers, DSAccess retrieves user information such as email addresses and distribution group memberships. (For more information about DSAccess, see "Exchange 2000 SP2's DSAccess Component," July 2002, http://www.winnetmag.com, InstantDoc ID 25330.)
To ensure the stability of the infrastructure that your cluster relies on, you can build redundancy into your Windows 2003 or Win2K organization by implementing multiple GC servers, DNS servers, and WINS servers.
Multiple GC servers. Implement two GC servers in the same Windows 2003 or Win2K site and LAN in which your Exchange clusters reside. If DSAccess can't contact a GC server, the System Attendant will fail, causing a failover because the IS resource has a dependency on the System Attendant resource. Implementing two GC servers mitigates the effect of a GC server going offline. If another GC server is available in the site, Exchange will use DSAccess to locate that GC server. If no GC server is available in the site, Exchange will try to use GC servers in other sites, resulting in downtime as DSAccess attempts to locate a GC server. When DSAccess locates a GC server, the System Attendant and IS resources will come back online and service will be restored. Outlook clients also use GC servers to query and retrieve the Global Address List (GAL). If a GC server goes offline, Outlook sessions also are adversely affected. Deploying Outlook 2003 with cached mode enabled can reduce the impact and visibility to users when a GC server goes offline. When working offline in cached mode with no connection to an Exchange server or a GC server, Outlook uses the Offline Address Book (OAB) on the client to access directory information. Some deployments use separate sets of GC servers for additional resilience: Back-end GC servers support Exchange servers, and front-end GC servers provide directory information to Outlook clients.
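DSAccess's preference for local-site GC servers can be sketched as follows. This is a simplified model under my own naming; the real selection logic weighs additional factors such as server load and suitability:

```python
def pick_gc(gc_servers, local_site):
    """Prefer an available GC server in the local site; otherwise fall
    back to an available GC in another site (simplified model of
    DSAccess topology discovery, not its actual algorithm).
    """
    local = [g for g in gc_servers if g["site"] == local_site and g["up"]]
    if local:
        return local[0]["name"]
    remote = [g for g in gc_servers if g["up"]]
    if remote:
        return remote[0]["name"]   # usable, but with extra latency
    # No GC reachable: System Attendant fails, and the IS follows.
    raise RuntimeError("no GC server reachable")

gcs = [{"name": "GC1", "site": "HQ", "up": False},
       {"name": "GC2", "site": "HQ", "up": True},
       {"name": "GC3", "site": "Branch", "up": True}]
print(pick_gc(gcs, "HQ"))   # GC2: the second local GC masks GC1's outage
```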
Multiple DNS servers. Windows 2003 and Win2K use DNS to resolve server names to TCP/IP addresses and to locate resources. If a DNS server goes offline and no secondary DNS server is available, Exchange can't resolve server names to TCP/IP addresses and might experience DSAccess errors and nondelivery of mail.
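The value of a secondary DNS server can be shown with a toy resolver (this is a simplified model with illustrative names, not how the Windows resolver is implemented):

```python
def resolve(name, servers):
    """Query each configured DNS server in order; a secondary server
    answers when the primary is offline (toy model)."""
    for srv in servers:
        if srv["up"] and name in srv["zone"]:
            return srv["zone"][name]
    # With no server answering, expect DSAccess errors and
    # nondelivery of mail.
    raise RuntimeError("name resolution failed")

zone = {"exchsrv1.example.com": "192.168.10.5"}
servers = [{"up": False, "zone": zone},   # primary offline
           {"up": True,  "zone": zone}]   # secondary still answers
print(resolve("exchsrv1.example.com", servers))
```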
Multiple WINS servers. Windows 2003 and Win2K use WINS servers to resolve NetBIOS names to IP addresses, just as Windows NT networks do. The Exchange Server 2003 Deployment Guide states that WINS is necessary for deploying Exchange 2003 or Exchange 2000; Exchange Setup and ESM use WINS.
Step by Step
Planning, when combined with a stable and resilient Windows infrastructure, redundant clustering hardware, and specialized training for cluster administrators, is key to a successful Exchange cluster deployment. You can also improve your clusters by using proper configuration and adequate security measures, by knowing how to handle failovers, and by properly addressing whichever Exchange service pack your cluster uses. I'll discuss those steps in my next article.