Automated failover for multiple servers
If you have multiple Exchange Server 2007 servers, cluster continuous replication (CCR) provides server high availability and an automated failover process. Learn how to prepare for and configure CCR.
In “Setting Up Local Continuous Replication in Exchange 2007” (see Related Articles for details), I discussed the pros and cons of local continuous replication (LCR). Although LCR is cost effective and easy to implement, it requires hand-holding through the failover process; you must manually switch from the active disk to the passive one. LCR also doesn’t provide server high availability, because there’s only one server. If you’re working with more than one server, and you’re ready to move to the next level, you need to look into an automated failover process, such as the Exchange Server 2007 solution known as cluster continuous replication (CCR).
Here are a few considerations to weigh when looking into CCR: Do you need to maintain a high level of availability for your Exchange environment in the event a single server or disk becomes corrupted or fails? If disks alone are your concern, LCR is a valid choice, but to provide system failover, CCR is your next step. Do you need to provide automatic failover? CCR allows for automatic failover, but LCR does not. If you decide that high availability must be part of your system design, then my discussion of how to prepare for and configure CCR will be of interest to you.
CCR relies on two kinds of communication. The first is a private connection between the active and passive nodes that carries a heartbeat (a message that the active and passive servers of a cluster send each other to signal that they’re functioning). Sometimes the heartbeat communication stops even though both systems are still functioning properly, and each system then thinks it should be the active node. To prevent this conflict, the second kind of communication comes into play through a quorum resource. The quorum resource acts as a referee between the nodes by keeping track of which node is active and which is passive, and by stepping in to determine the active node by majority vote. In CCR, this arrangement is called a majority node set with a file share witness. The file share witness acts as a third party between two equally established servers. In the event of a communication failure or a crash, it casts the deciding vote on whether a node changes from active to passive. Figure 1 illustrates the relationship between the file share witness and the active and passive nodes.
Preparing for CCR
Before you can configure CCR, you must do a few things to prepare. Because you’re enabling clustering for Exchange 2007, you need Windows Server 2003, Enterprise Edition, which has the cluster service. Create a separate account specifically for the cluster service, and place that account in the local Administrators group on each node. Then place the account in the Exchange Server Administrators group.
Create the cluster between the active and passive nodes before you install Exchange 2007. To create the cluster you’ll need a second hard drive in each system so that the C drive isn’t the only one, because the cluster can’t manage physical disks that are on the same storage bus as the volume that contains the OS.
Each node needs two network cards configured with static IP addresses. One card should be on a private network, for the heartbeat passing between the active and passive nodes. The other card should be on the public network for users’ email access, as well as for connectivity to Active Directory (AD), other Exchange servers, and the location of your file share witness and transport dumpster (which I discuss later). Private connections don’t have to be configured with a default gateway or DNS settings; only public connections require those settings. However, you do need to alter the binding order of each of the connections.
To change the binding order of your network connections, start the Control Panel Network Connections applet. On the menu bar choose Advanced, Advanced Settings. Click the Adapters and Bindings tab, and move the public connection to the top of the binding order in the Connections section. This ensures the public connection will always be attempted first. To make these options easy to find in the binding order, you can name your connections “public” and “private” within their network connections.
Install the majority node set quorum with file share witness on both your active and passive node systems. If you have Windows 2003 R2 or SP1 you can install the majority node set quorum either by installing Windows 2003 SP2 on those systems or by following the instructions in the Microsoft article “An update is available that adds a file share witness feature and a configurable cluster heartbeats feature to Windows Server 2003 Service Pack 1-based server clusters” (www.support.microsoft.com/kb/921181) to install the Windows 2003 update. This article provides useful information about the two main features added to the cluster—the file share witness and the configurable cluster heartbeats. The file share witness lets your nodes use a standard file share to work with your quorum (which you can establish anywhere that’s accessible to both nodes, although I recommend you put the share on the server holding the Hub Transport server role). Configurable cluster heartbeats let you set a tolerance level for those heartbeats that, by default, are sent every 1.2 seconds. The Microsoft article I mentioned earlier explains how you can change this default configuration.
To ensure a minimal amount of email is lost moving from the active node to the passive node, Exchange provides a feature called the transport dumpster, located on your Hub Transport servers. Its purpose is to keep a queue of email that has been recently delivered (you can configure the amount to be held). In the event of a failover, the passive node becomes the active node and checks in with every Hub Transport server for a resubmission of mail in the transport dumpster’s queue. Duplicate email is discarded; mail lost in the failover (which might occur if the failover were an unscheduled event) is retained.
To check your transport dumpster configuration, open the Exchange Management Shell on the Hub Transport server and type
Get-TransportConfig
Two important settings to look for are MaxDumpsterSizePerStorageGroup (the maximum size per storage group that will be held in the queue) and MaxDumpsterTime (the amount of time messages will remain, the default being 7 days). If you want to make changes to these transport dumpster settings you can use the Set-TransportConfig command.
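As a minimal sketch of reviewing and then adjusting those two settings, you might run something like the following. The values shown are illustrative assumptions, not recommendations from this article, so pick a size and retention time that suit your own message limits and storage.

```powershell
# Show only the transport dumpster settings (the wildcard matches both properties).
Get-TransportConfig | Format-List MaxDumpster*

# Example values only (assumed for illustration):
# hold up to 15 MB per storage group and keep messages for 7 days (d.hh:mm:ss).
Set-TransportConfig -MaxDumpsterSizePerStorageGroup 15MB -MaxDumpsterTime 7.00:00:00
```

Because Set-TransportConfig changes the transport configuration for the whole Exchange organization, you only need to run it once rather than on each Hub Transport server.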
For the purposes of our walk-through let’s assume you have an Exchange organization in place with at least one Hub Transport server established. You have two member servers (that meet the hardware and software requirements Microsoft proposes for Exchange 2007 Mailbox servers) running on Windows 2003 Enterprise with SP2 or the Windows 2003 update installed. Each member server has additional disks and two network cards, one configured as a public network and one configured as a private network.
Create a logically named folder (e.g., MNS_FileShare) on the Hub Transport server (which is a recommended best practice) located within the same site as the clustered nodes, reachable from the public network connections. Share that folder with the account you created earlier for the cluster service, with full control on the share.
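If you prefer the command line to Windows Explorer, the folder and share can be sketched roughly as follows. The local path C:\MNS_FileShare and the account CONTOSO\svc-cluster are hypothetical placeholders for your own folder and cluster service account. The arguments are quoted so that PowerShell passes the comma through to net share unchanged.

```powershell
# Hypothetical names: substitute your own path and cluster service account.
mkdir C:\MNS_FileShare

# Share the folder and grant the cluster service account full control on the share.
net share "MNS_FileShare=C:\MNS_FileShare" "/GRANT:CONTOSO\svc-cluster,FULL"

# Set NTFS permissions on the folder. Without /E, cacls REPLACES the existing ACL
# and asks for confirmation, so review the result before relying on it.
cacls C:\MNS_FileShare /G "BUILTIN\Administrators:F" "CONTOSO\svc-cluster:F"
```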
Establishing the Cluster
Now that you have everything in place, you’re ready to establish your cluster. On the first node (the one you’ll consider your active node), start the New Server Cluster Wizard by selecting Start, Administrative Tools, Cluster Administrator. Cluster Administrator opens with an Open Connection to Cluster dialog box. Under Action select the down arrow, select Create new cluster, then click OK. Alternatively, you can go to a command prompt and type:
cluster.exe /create /wizard
On the Cluster Name and Domain page, in the Domain section, use the drop-down arrow to select the domain in which the cluster will be created. Then provide a Cluster Name (e.g., EXMBCluster). This name is important because it defines your cluster. Click Next.
On the Select Computer page, provide a computer name for the first node of the new cluster. If you’re working on the first node, as instructed, just type the name of the computer you’re working on, or browse and select it. There’s an Advanced option on this page that lets you alter the configuration of the cluster for a minimum installation. For this walk-through we need the typical (full) configuration, so you can leave the default settings.
The Analyzing Configuration page (which Figure 2 shows) checks the feasibility of the cluster. It reports warnings or errors, such as a missing secondary adapter for the private network. If you didn’t install Windows 2003 SP2 or the hotfix from the Microsoft article that I mentioned earlier, you’ll be notified that a quorum can’t be found. If you have only a single drive installed, you’ll receive the following warning: The cluster cannot manage physical disks that are on the same storage bus as the volume that contains the operating system because other nodes connected to the storage bus cannot distinguish between these volumes and volumes used for data. Review all the details of the analysis, clear up any issues, and click Next.
On the IP Address page, provide an IP address that cluster management tools will use to connect to the cluster. Make sure you use a unique address that other systems aren’t using. Click Next.
Supply the Cluster Service Account (name, password, and domain). Use the account you established after reading the “Preparing for CCR” section of this article. Click Next.
On the Proposed Cluster Configuration page, confirm the settings you chose. Click Quorum and select Majority Node Set. Click Next.
On the Creating the Cluster page, establish the cluster. Click Next, then Finish.
Confirm availability of the cluster by opening Cluster Administrator and verifying that the name of your cluster, with its state shown as Online, appears under Groups, Cluster Group. Before you celebrate, keep in mind that all you have is a cluster with one node. You need to add the second node for the cluster to have failover ability. You can start the Add Nodes Wizard from Cluster Administrator, but it’s quicker to open the wizard from the command line. At the prompt type
cluster.exe /cluster:<ClusterName> /add /wizard
In the wizard, select a computer (or enter the name) for your second node. You can choose more than one, but because this is a CCR cluster you only want one additional node. CCR clusters can have only two nodes in a cluster, although those two nodes can be in two separate geographic locations. Keep in mind that the cluster wizard doesn’t know the type of cluster you’re setting up. Click Next.
As when you set up your first node, the cluster configuration will ensure everything is ready to proceed. Click Next. As before, you need to provide the cluster account that the service will run under and it should be the same one you created and provided earlier. Click Next. On the Proposed Cluster Configuration page, confirm your settings and click Next. After the cluster configures itself, click Next, Finish.
You’ll notice that the procedure is simpler for the secondary node. Joining a cluster is easier than creating one. To ensure the cluster is up and running, open Cluster Administrator. You’ll see the two servers running as nodes, as Figure 3 shows.
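The same check can be made from a command prompt or the shell. This is a small sketch that assumes the example cluster name EXMBCluster from earlier in the walk-through, so substitute your own.

```powershell
# List each node in the cluster and its current state; both nodes should be Up.
cluster /cluster:EXMBCluster node /status

# List the cluster groups, the node that currently owns each, and their state.
cluster /cluster:EXMBCluster group /status
```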
Before installing Exchange on each system, configure the Majority Node Set (MNS) quorum to use the file share witness you created earlier. To do so, open a command prompt and type the following, making sure to use the correct Universal Naming Convention (UNC) path for the shared-out folder:
Cluster res "Majority Node Set" /priv MNSFileShare=\\<servername>\<sharename>
You’ll get a warning message that tells you the changes won’t occur until the next time the resource comes online. A quick way to accomplish this is to move the cluster group by typing the following:
cluster group "<ClusterGroupName>" /move
This command moves the group to the other node, which takes the resource offline and brings it back online again. You can then move the group back to the original node if you want.
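To confirm that the setting took effect, one option is to list the private properties of the quorum resource and look for the MNSFileShare entry. The sketch below runs against the local cluster, and the exact output layout may differ on your system.

```powershell
# List the private properties of the Majority Node Set resource.
# The MNSFileShare property should show the UNC path you configured.
cluster res "Majority Node Set" /priv
```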
Installing Exchange 2007 Mailbox Servers
You’re finally ready to install the two Exchange server roles: Active Clustered Mailbox role and Passive Clustered Mailbox role. Make sure the two systems meet Exchange Server hardware and software requirements.
First, you’ll install an Active Clustered Mailbox server role on your active node. I don’t discuss the standard installation steps here, just the custom installation. For Installation Type, select Custom Exchange Server Installation. Then, select Active Clustered Mailbox Role, as Figure 4 shows, and click Next.
Under Cluster Settings, be sure Cluster Continuous Replication is selected, as Figure 5 shows. Complete the Clustered Mailbox Server Name and Clustered Mailbox Server IP Address options. This name is what you’ll give your Outlook clients when you set up users’ profiles that have mailboxes on the clustered mailbox server. The IP address must be unique on the network. Click Next. Complete the Readiness Check page and the Completion page to finish the installation.
Now that you’ve prepared the active node, ensure that the passive node is ready for the Exchange server role installation. Then perform the installation just as for the active node, with one exception—for this custom installation you’ll select the Passive Clustered Mailbox Role.
When the passive node installation completes, you’ll have an active and passive node set of mailbox servers that are fully clustered using CCR, with an MNS quorum that uses the file share witness on the Hub Transport server. But how do you know it all works?
Verifying a Cluster
To verify the cluster you can use Cluster Administrator to ensure that everything is working. You can also check on the active and passive nodes to see if the storage groups have been properly replicated from one to the other with transaction logs being shipped and so forth, to make sure the two systems are in sync.
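For the replication check, a minimal sketch from the Exchange Management Shell might look like the following. EXMBCMS is a hypothetical clustered mailbox server name (the name you supplied during the Exchange installation), not a value from this article.

```powershell
# Report the replication state of every storage group on the clustered mailbox server.
# In the output, look at SummaryCopyStatus (ideally Healthy) and at
# CopyQueueLength and ReplayQueueLength (low numbers mean the copy is keeping up).
Get-StorageGroupCopyStatus -Server EXMBCMS | Format-List
```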
To use the command line to see the cluster, open the Exchange Management Shell and type the following PowerShell command:
Get-ClusteredMailboxServerStatus -Identity <name of cluster>
You’ll see that the cluster is online, as well as which node is currently active.
If you really want to have some fun, test your cluster’s failover ability. No, I’m not encouraging you to turn off the active node (although that will work). Instead of a crash, initiate a “hand off.” Open the Exchange Management Shell and enter the following command to move from the active to the passive node:
Move-ClusteredMailboxServer -Identity:<name of cluster> -TargetMachine:<passive node>
Keep in mind that the passive node is whichever one is not shown as active when you run Get-ClusteredMailboxServerStatus to check the status. When you’re finished testing, reset the active and passive nodes as desired.
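Put together, a test hand-off might be sketched like this. EXMBCMS and NODE2 are hypothetical names for the clustered mailbox server and the currently passive node. The -MoveComment parameter is an assumption to verify on your build: some versions of Exchange 2007 require a comment for the move.

```powershell
# 1. See which node is active before the move.
Get-ClusteredMailboxServerStatus -Identity EXMBCMS

# 2. Hand off to the passive node (hypothetical name NODE2), recording a reason.
Move-ClusteredMailboxServer -Identity:EXMBCMS -TargetMachine:NODE2 -MoveComment:"Planned failover test"

# 3. Confirm that the clustered mailbox server is online on the new node.
Get-ClusteredMailboxServerStatus -Identity EXMBCMS
```

To return to the original arrangement, repeat step 2 with the other node's name as the target.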
Rise of CCR
High availability has its price. CCR is more complicated than the LCR option I discussed in a previous article (see Related Articles for more information). But having automatic failover in the event of a crash is well worth the cost. Yes, this solution requires a bit more hardware, the Enterprise version of Windows 2003, an additional Exchange mailbox server, and lots of cluster information. But compared with earlier, expensive SAN/NAS solutions, CCR is cheaper and easier to configure—plus, it provides redundancy not only of the mailbox server but also of the data itself. CCR is the best built-in solution Exchange 2007 has to offer.