Plan, implement, and troubleshoot cluster upgrades
Many people view the ability to perform rolling upgrades as one of the main advantages of deploying Exchange Server clusters. However, if you're used to working with Exchange 2000 Server, you should be aware that changes in Exchange Server 2003 add some complexity to the procedures for upgrading Exchange 2000 clusters to Exchange 2003, upgrading Exchange 2003 clusters to Exchange 2003 Service Pack 1 (SP1) or later, or applying hotfixes that change the build number of the exsetdata.dll file. Understanding these changes, what sort of planning you should perform before the upgrade, the upgrade process itself, and a few troubleshooting ideas will send you well on your way to success. (This article requires a basic understanding of Microsoft clustering technology and how a clustered Exchange deployment differs from a standalone deployment. The sidebar "Rolling Upgrades" addresses a few of these points.)
Changes to Note
Microsoft made some changes to the rolling upgrade model with the release of Exchange 2003 and Exchange 2003 SP1. When you upgrade Exchange 2000, you only need to upgrade the binaries on each node. Upgrading Exchange 2003 to SP1 or later requires an additional task. After you've applied the upgrade to one node, you must use Cluster Administrator to take your Exchange Virtual Server (EVS) offline, move the EVS to the newly upgraded node, then right-click the System Attendant resource and select Upgrade Exchange Virtual Server. This process updates the metadata associated with the EVS in Active Directory (AD) to reflect the new Exchange version.
When you're running an Exchange cluster with more than one EVS, upgrade only one EVS at a time. This guideline applies to both active/active and active/passive clusters. The upgrade procedure uses a global variable (g_csCachedXSes) that can be modified by only one upgrade session at a time. The Microsoft article "Setup Stops Responding When You Upgrade Multiple Exchange Server Virtual Servers at the Same Time" (http://support.microsoft.com/?kbid=822582) describes what can happen if you try to run EVS upgrades in parallel.
Before you upgrade to Exchange 2003 SP1, certain elements need to be in place. (Most of the following requirements also apply to nonclustered Exchange servers.)
Schedule downtime with the user population. I've performed several SP1 upgrades and have been lucky in that all the users were running Microsoft Office Outlook 2003 in Cached Exchange Mode. (Cached Exchange Mode reduces the visibility of downtime because users can continue to work offline against the cached copy of their mailboxes while the EVS is offline.) The upgrade procedure includes two phases during which you must take the EVS offline. Outlook users experience downtime from the moment the EVS goes offline on one node until it's brought online on another node. This process is known as failover. For tips on reducing failover time and the visibility of failovers, see "8 Ways to Improve Your Exchange Cluster, Part 2," May 2004, InstantDoc ID 41943. In my testing, the combined failover downtime for each EVS averaged between 3 and 6 minutes.
If your service level agreement (SLA) allows it, try to allocate 30 minutes of downtime for a service pack upgrade. Exchange won't be offline for the entire window but might go offline for periods of 1 to 2 minutes as you perform failover testing. If you have the luxury of planning an hour of downtime, take it! If you run into a problem during the upgrade, you'll have extra time in which to resolve it.
You can also limit disruptions to users by using the downtime to apply any Windows security patches that have been released since you last performed maintenance. And while you're at it, be sure to download and install the Windows Server 2003 hotfix 831464 (http://www.microsoft.com/downloads/details.aspx?familyid=0bc9b5bc-a094-49bf-89a5-c8a2d32345a2), which is required before you can install Exchange 2003 SP1. This hotfix resolves problems rendering content to Outlook Web Access (OWA) clients.
To upgrade an Exchange 2003 cluster, you must use an account that has Exchange Full Administrator permissions on the administrative group in which the EVS resides, and your account must be a member of the Local Administrators security group on each node. If the EVS is part of a routing group that spans multiple administrative groups, your account must have Exchange Full Administrator rights on all those administrative groups. For more information about these requirements, read the Microsoft white paper "Working with Active Directory Permissions in Exchange Server 2003" (http://www.microsoft.com/technet/prodtechnol/exchange/2003/library/ex2k3ad.mspx). You can simplify permissions management by creating a security group for your Exchange cluster administrators. For example, create an ExchangeClusterAdmins security group, then delegate Exchange Full Administrator rights to that group and add the group to the Local Administrators group on each node. If you add a node to the cluster, you need only add the ExchangeClusterAdmins security group to the Local Administrators group on that node, rather than adding several individual accounts. To revoke someone's permissions for managing the cluster, simply remove the user account from the ExchangeClusterAdmins group. Doing so saves you time: you don't need to remove the individual account from the Local Administrators group on each node.
Make a full backup of the cluster before you begin the upgrade. This backup should include a file-level backup of each node, a system state backup of each node, and full backups of the Exchange databases.
The Microsoft article "How to obtain the latest service packs for Exchange Server 2003" (http://support.microsoft.com/?kbid=836993) contains up-to-date information about obtaining the most recent Exchange 2003 service pack. As you get ready to begin the upgrade, keep in mind that if you've deployed an Exchange front-end/back-end server architecture in which your cluster is on a back-end server or servers, you need to upgrade the front-end servers first.
Applying the Service Pack
The Microsoft article "How to install Exchange Server 2003 Service Pack 1 in a clustered Exchange environment" (http://support.microsoft.com/?kbid=867624) runs through the updated procedures for applying Exchange 2003 SP1 to a cluster. Let's look at the necessary steps in more detail, using a two-node (Node1 and Node2), active/passive Exchange 2003 cluster with one EVS (EVS1). Node1 is the active node and has current ownership of EVS1.
If you haven't already applied hotfix 831464 to Node2, do so, then reboot. When Node2 rejoins the cluster, apply Exchange 2003 SP1 to the node.
To complete the upgrade, take the Exchange resources for EVS1 offline but leave the Network Name, IP Address, and storage resources that are associated with the EVS online. To do so, right-click the System Attendant resource and select Take Offline, as Figure 1 shows. This action also takes offline the Exchange resources that depend on the System Attendant (e.g., the Information Store, or IS, resource and the IMAP and POP protocol resources). Then move the Exchange cluster group that's associated with EVS1 from Node1 to Node2.
Log on to Node2 and open Cluster Administrator. While the Exchange resources associated with the EVS are offline, right-click the System Attendant resource and select Upgrade Exchange Virtual Server. Note that you can't perform this process from a Cluster Administrator session running on Node1 because the files required for the upgrade procedure aren't yet installed on Node1. The requirement to run an additional Exchange service-pack upgrade procedure from Cluster Administrator is new in Exchange 2003 (it was first introduced as part of the cluster-upgrade procedure from Exchange 2000 to Exchange 2003). When the upgrade process is finished, you should see the message The Exchange Virtual Server has been upgraded successfully.
Bring the Exchange resources back online by right-clicking them and selecting Bring Online. Note that you must manually bring each Exchange resource back online: Bringing the System Attendant online still leaves the dependent resources offline. When all the resources are online, check the Application log for errors and take any necessary action.
At this stage, the binaries for SP1 have been applied to Node2 and the AD objects for the EVS have been updated to show the EVS as running SP1. You can verify these changes by running Exchange System Manager (ESM) and viewing the EVS's build version, as Figure 2 shows. Exchange 2003 SP1 has build number 7226.6.
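If you record build numbers as part of your change log, the comparison against the SP1 build can be sketched in a few lines of Python. This is only an illustration: it assumes you copy the build string (for example, 7226.6) from ESM by hand, and the function names are made up for this example rather than taken from any Exchange tool.

```python
# Build number the article gives for Exchange 2003 SP1.
SP1_BUILD = (7226, 6)


def parse_build(text):
    """Turn a build string such as '7226.6' into a tuple of integers."""
    return tuple(int(part) for part in text.strip().split("."))


def is_at_least_sp1(build_text):
    """Return True when the build is the SP1 build or a later one."""
    # Tuples compare element by element, so (7226, 6) sorts after (7000, 0).
    return parse_build(build_text) >= SP1_BUILD
```

Comparing integer tuples rather than raw strings avoids the surprise of text ordering, in which "10000.0" would sort before "7226.6".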
Node1 is still running an earlier build of Exchange, so you can't move the upgraded EVS to that node. If you attempt to do so, you'll get the error message Version of Exchange on this machine does not match the version of Exchange server NodeName. Apply hotfix 831464 to Node1 and reboot. When Node1 comes back online, apply Exchange 2003 SP1.
To confirm that the upgrade was successful, move EVS1 back to Node1 and verify that Exchange starts correctly on that node. As it is in Exchange 2000, this failover operation is optional because all nodes in the cluster are running at the same revision level, but it helps to verify that everything is installed correctly.
Back up the Exchange cluster nodes and the system state on each node, and perform a full Exchange database backup. You won't be able to restore database backups that you performed under earlier versions of Exchange. When Exchange 2003 mounts the databases for the first time, it stamps them with information to indicate that the database has been accessed by a new version of the Extensible Storage Engine (ESE), which connects the logic in the IS with the underlying Jet database.
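The order of operations matters more than any single step, so it can help to write the sequence down as a checklist before the maintenance window. The following Python sketch simply generates that checklist for the two-node example; it doesn't touch the cluster, and the function name and step wording are this example's own, not part of any Microsoft tool.

```python
def two_node_upgrade_plan(active, passive, evs, hotfix="831464"):
    """Return the ordered steps for the two-node, one-EVS case described above."""
    return [
        f"Apply hotfix {hotfix} to {passive} and reboot",
        f"Apply Exchange 2003 SP1 to {passive}",
        f"Take the {evs} System Attendant resource offline",
        f"Move the {evs} cluster group from {active} to {passive}",
        f"On {passive}, run Upgrade Exchange Virtual Server for {evs}",
        f"Bring each {evs} Exchange resource online and check the Application log",
        f"Apply hotfix {hotfix} to {active} and reboot",
        f"Apply Exchange 2003 SP1 to {active}",
        f"Optionally move {evs} back to {active} to confirm failover",
        "Back up each node, each node's system state, and the Exchange databases",
    ]


if __name__ == "__main__":
    plan = two_node_upgrade_plan("Node1", "Node2", "EVS1")
    for number, step in enumerate(plan, start=1):
        print(f"{number}. {step}")
```

Note that the EVS upgrade step sits between the two binary upgrades: it can run only on a node that already has SP1, and the remaining node can't host the EVS again until it has been upgraded too.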
The procedure I've just described is the same for clusters with more than two nodes. However, such clusters have a limitation that governs failovers. Clusters that have more than two nodes can host only one EVS per node at a time. Any attempt to move a second EVS to a node currently hosting an EVS will fail with the error message An error occurred attempting to bring the System Attendant resource online: The cluster resource could not be brought online by the resource monitor. Error ID: 5018 (0000139a). The Microsoft article "XADM: Exchange Virtual Server Limitations on Exchange 2000 Clusters and Exchange 2003 Clusters That Have More than Two Nodes" (http://support.microsoft.com/?kbid=329208) describes this failover constraint.
Let's expand our example to add two more nodes (Node3 and Node4) and a second EVS (EVS2) to our sample cluster. Node3, Node4, and EVS2 are running Exchange 2003 without any service packs. Node1, Node2, and EVS1 have been upgraded to Exchange 2003 SP1, and EVS1 is active on Node1.
Take the EVS2 Exchange cluster resources offline. Move EVS2 from Node3 to Node2. (Remember that you can't move EVS2 to Node1 because that node is currently hosting EVS1.) Right-click the Exchange System Attendant resource for EVS2 and select Upgrade Exchange Virtual Server to update the AD metadata for EVS2. When the upgrade has finished, you'll see the message The Exchange Virtual Server has been upgraded successfully.
Bring the Exchange resources back online by right-clicking them and selecting Bring Online. When all the resources are online, check the Application log for errors and take any action necessary.
EVS1 and EVS2 are now upgraded to Exchange 2003 SP1 and are running on Node1 and Node2, respectively. To complete the upgrade, you need to update the Exchange binaries on Node3 and Node4. Apply hotfix 831464 to Node3 and Node4, then reboot each node. Apply Exchange 2003 SP1 to Node3 and Node4, then reboot each node. When Node3 rejoins the cluster, move EVS2 to Node3. Check the Application log to verify that Exchange is starting correctly. Move EVS2 from Node3 to Node4 to verify that failovers are functioning correctly. Check the Application log to verify that Exchange starts correctly. Finally, perform a full database backup of EVS2 and back up Node3 and Node4.
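Choosing where to move an EVS for its upgrade step combines two rules from this example: the target node must already have the SP1 binaries, and it must not be hosting another EVS. A minimal Python sketch of that selection follows. The data structure (a dictionary of node names to the EVS each one hosts) and the function name are assumptions made for illustration; in practice you'd read this information from Cluster Administrator.

```python
def upgrade_targets(evs, hosting, upgraded_nodes):
    """List the nodes on which `evs` could run its EVS upgrade step.

    hosting maps each node name to the EVS it currently hosts (None if idle).
    A node qualifies when it already has the new binaries and is either idle
    or already hosting this EVS (one EVS per node on clusters of 3+ nodes).
    """
    return sorted(
        node
        for node, hosted in hosting.items()
        if node in upgraded_nodes and (hosted is None or hosted == evs)
    )


if __name__ == "__main__":
    hosting = {"Node1": "EVS1", "Node2": None, "Node3": "EVS2", "Node4": None}
    print(upgrade_targets("EVS2", hosting, {"Node1", "Node2"}))
```

With the four-node state described above, Node2 is the only candidate for EVS2: Node1 is upgraded but busy with EVS1, and Node3 and Node4 don't have SP1 yet.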
As "How to install Exchange Server 2003 Service Pack 1 in a clustered Exchange environment" mentions, you might run into a problem or two during the upgrade. Let's take a quick look at these potential gotchas.
If Cluster Administrator's Upgrade Exchange Virtual Server menu option is unavailable, first determine whether the EVS you're trying to upgrade is hosted on a node that hasn't been upgraded. If so, try moving the EVS to a node that has been upgraded. Another possibility is that Cluster Administrator is running on a node that hasn't been upgraded, so try logging on to and running Cluster Administrator on an upgraded node in the cluster. If neither of these actions does the trick, determine whether Cluster Administrator is running on a server that isn't part of the cluster; if it is, run it from an upgraded cluster node instead.
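These three checks amount to a short decision list. The Python sketch below encodes them so that they can sit in a runbook next to the upgrade steps. The function and its three yes/no inputs are hypothetical; you'd answer each question yourself by looking at the cluster.

```python
def diagnose_missing_upgrade_option(evs_node_upgraded, admin_node_upgraded,
                                    admin_on_cluster_member):
    """Map the three checks from the text to suggested next actions."""
    actions = []
    if not evs_node_upgraded:
        actions.append("Move the EVS to a node that has been upgraded")
    if not admin_node_upgraded:
        actions.append("Run Cluster Administrator from an upgraded node")
    if not admin_on_cluster_member:
        actions.append("Run Cluster Administrator on a server that is a cluster member")
    return actions
```

An empty list means none of the three known causes applies, which is the point at which the setup log becomes the next place to look.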
If the Upgrade Exchange Virtual Server procedure fails with error ID c1037b44, check to be sure that the EVS is running on a node that's running the most recent Exchange service pack. Also ensure that the following Exchange cluster resources for the EVS are offline:
- Exchange System Attendant resource
- Exchange HTTP Virtual Server Instance
- Exchange Information Store Instance
- Exchange MS Search Instance
- Exchange POP3 Virtual Server Instance
- Exchange Routing Service Instance
- SMTP Virtual Server Instance
The other resources in this list depend on the System Attendant resource, so taking the System Attendant offline should take them offline as well. Be sure that the following cluster resources in the cluster group belonging to the EVS are online:
- IP Address resource for the EVS
- Network Name resource
- Physical disk resources used by the EVS
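The two lists above can be turned into a simple pre-flight check. The following Python sketch assumes you've noted each resource's state from Cluster Administrator and typed it into a dictionary. The short labels used as keys are this example's own shorthand, because the actual resource names in your cluster group will differ (they typically include the EVS name).

```python
# Shorthand labels for the resources listed above (not real resource names).
MUST_BE_OFFLINE = [
    "Exchange System Attendant",
    "Exchange HTTP Virtual Server Instance",
    "Exchange Information Store Instance",
    "Exchange MS Search Instance",
    "Exchange POP3 Virtual Server Instance",
    "Exchange Routing Service Instance",
    "SMTP Virtual Server Instance",
]
MUST_BE_ONLINE = ["IP Address", "Network Name", "Physical Disk"]


def find_state_problems(states):
    """Compare observed states ('Online'/'Offline') with the required ones.

    Returns a list of human-readable problems; an empty list means the
    resource states match what the EVS upgrade step expects.
    """
    problems = []
    for name in MUST_BE_OFFLINE:
        state = states.get(name, "Unknown")
        if state != "Offline":
            problems.append(f"{name}: expected Offline, found {state}")
    for name in MUST_BE_ONLINE:
        state = states.get(name, "Unknown")
        if state != "Online":
            problems.append(f"{name}: expected Online, found {state}")
    return problems
```

A resource that's missing from the dictionary is reported as Unknown rather than silently passing, so an incomplete checklist shows up as a problem.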
Finally, take a look at the Exchange setup log file (Exchange Server Setup Progress.log, in the root of the disk that holds the \winnt or \windows folder). This log is an invaluable tool for troubleshooting installations. A lot of the information in this file will make sense only to an Exchange engineer or developer, but at least you'll have some data about what's happening during the upgrade. Figure 3 shows a sample of the log.
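Because the setup log can run to thousands of lines, a quick filter helps you find the interesting parts. The Python sketch below is a generic text search, not a parser for the log's format: it assumes only that lines of interest contain a keyword such as "error", which you should adjust after looking at your own log.

```python
def find_lines(lines, needle="error"):
    """Return (line_number, text) pairs for lines that contain `needle`.

    The match is case-insensitive and line numbers start at 1, so the
    results can be looked up again in a text editor.
    """
    wanted = needle.lower()
    matches = []
    for number, line in enumerate(lines, start=1):
        if wanted in line.lower():
            matches.append((number, line.rstrip("\r\n")))
    return matches
```

To use it, open a copy of the log and pass the file object, for example `with open(log_path, errors="replace") as log: hits = find_lines(log)`, where `log_path` is a placeholder for wherever you copied the file. The `errors="replace"` argument is a precaution in case the log's encoding doesn't match Python's default.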
When you compare the cluster upgrade models for Exchange 2000 and Exchange 2003, you can reasonably argue that Exchange 2003 doesn't support rolling upgrades for Exchange service packs: Each EVS must be taken offline to update the AD metadata for the EVS, so you have additional downtime to factor into your SLAs. And even straightforward additional tasks increase the risk that something can go wrong during the upgrade. One way to reduce this risk is to create a test EVS for your cluster. Do so a day or two before you plan to upgrade the cluster so that the EVS's AD information has time to replicate across your Exchange and AD infrastructure. After you upgrade one node in the cluster, you can test the EVS upgrade by using your test EVS before you attempt to upgrade your production EVS.