Breakthrough or heartbreak?
The history of Microsoft Exchange Server clusters is one of peaks and valleys, with cycles of enthusiasm generated by new releases followed by depression as the releases don't work out so well in practice. Exchange Server 5.5 first supported Windows clusters, but the clusters were expensive to deploy and were limited to two nodes configured in an active-passive cluster. Exchange 2000 Server promised clusters spanning as many as four active nodes. However, Microsoft has been forced to rescind that promise and now requires you to maintain a passive node because of memory-fragmentation problems. In addition, you can deploy four-node Exchange 2000 clusters on only Windows 2000 Datacenter Server, so the solution is expensive. The net result is that clusters represent a small percentage of the hundreds of thousands of Exchange servers deployed today. No one is willing to say exactly how many clusters are in production, but anecdotal evidence suggests that fewer than 2 percent of all the deployed Exchange servers are in clusters.
On the surface, clusters seem to be an extremely effective way to achieve high degrees of robustness and reliability. Indeed, UNIX and OpenVMS administrators have been deploying clusters for these reasons for years. So, why haven't Windows administrators been deploying Exchange 2000 and Exchange 5.5 clusters? The reasons include added complexity, Exchange components that can't run on clusters, a lack of third-party products that support clusters, memory fragmentation, and high costs.
For clusters to work well, the email application and the OS must join forces. So far, Microsoft hasn't been able to make those two entities work together seamlessly. Microsoft promises better clusters with Windows Server 2003 and Exchange Server 2003, but will the company keep that promise? To answer this question, you need to know how clusters work, how Exchange in general operates in a cluster, how Exchange 2000 specifically operates in a cluster, and what improvements Exchange 2003 offers.
Understanding Clusters and How Exchange Operates in Them
The process of setting up and configuring hardware and software for clusters is more complex than that for standard servers. Clusters typically have multiple NICs. You need at least one NIC for the public network and one NIC for the cluster "heartbeat," which is the network signal between nodes that lets the nodes know that the cluster is alive and well. Clusters use shared storage instead of direct-connected drives because services depend on being able to move data between nodes when problems occur, and the services can't move the data if the data is restricted to a specific server.
Managing clusters differs from managing standard servers. Instead of controlling the set of services for Exchange or other applications through the Computer Management console, you manage them through the Cluster Administrator console, which Figure 1, page 52, shows. In this example, Cluster Administrator shows a set of Exchange services running on a cluster. The console shows the additional resources that combine to form an Exchange virtual server, such as the disks, IP address, and network name.
The concept of virtual servers is crucial to clusters. Exchange runs on a cluster as one or more virtual servers. Each virtual server represents the set of resources (e.g., disks, a network name, the Store) that Exchange needs to provide services to users. Exchange virtual servers run on physical nodes within the cluster. The virtual servers manage the data in mailbox and public stores, which are gathered into storage groups (SGs). An SG is the basic unit of storage for Exchange clusters. If a physical node fails and the cluster has to move work within the cluster, the cluster distributes the SGs from the failed server to other nodes rather than moves individual stores. After the failover, the SGs come under the control of the Exchange virtual server running on that physical node.
After you understand the basic concepts of clusters and how Exchange operates in clusters, you have to face the fact that not all Exchange components can run on a cluster. The reason why is simple. In some cases, the Exchange component is old and wasn't designed to run on anything other than a standard server. Because the component is old and perhaps not needed by the majority of Exchange installations, Microsoft never upgraded the component to support clusters. In other cases, the component is used only in specific circumstances (e.g., for interoperability between Exchange 2000 and Exchange 5.5 servers), so that component doesn't need to support clusters in the long term. Table 1 lists the optional Exchange components and the degree of cluster support for those components.
In the past, some Independent Software Vendors (ISVs) didn't support clusters because these vendors preferred to concentrate on the largest market: support software for standard Exchange servers. The lack of add-on software for clustered Exchange servers caused a problem if organizations wanted to deploy the same antivirus, antispam, and backup software and messaging connectors across both standard and clustered servers. Fortunately, this situation has improved tremendously since Exchange 2000 first appeared, and you now have a reasonable choice of add-ons for clustered servers.
Using Exchange 2000 in Clusters
Exchange 2000 was the first release to support active-active clusters, meaning that every node in the cluster supports an Exchange virtual server at the same time. Unfortunately, active-active clusters ran into virtual-memory fragmentation problems within the Store, and these problems have prevented Exchange 2000 from scaling up as much as it should on a cluster.
As Exchange 2000 runs, Windows allocates and deallocates virtual memory to the Store to map mailboxes and other structures. Virtual memory is sometimes allocated in contiguous chunks, such as the approximately 10MB of memory that's necessary to mount a database. However, as time goes by, providing the Store with enough contiguous virtual memory becomes difficult because the memory becomes fragmented. In concept, this fragmentation is similar to the fragmentation that occurs on disks and usually doesn't cause too many problems, except for cluster state transitions.
During a cluster state transition, the cluster must move the SGs that were active on a failed node to one or more other nodes in the cluster. SGs consist of sets of databases, so the Store has to initialize the SGs, then mount the databases so that users can access their mailboxes. You can track this activity through event ID 1133 in the Application event log. On a heavily loaded cluster, the Store might not be able to mount the databases because not enough contiguous virtual memory is available, in which case you'll see an event such as event ID 2065. Thus, you encounter a situation in which the cluster state transition occurs but the Store is essentially brain-dead because the databases are unavailable. This kind of situation occurs only on heavily loaded systems, but consolidating servers and building big, highly resilient systems are prime driving factors for considering clusters in the first place.
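As a rough illustration of tracking these events, the following Python sketch tallies the two event IDs mentioned above per node. The input format (pairs of event ID and computer name taken from an exported Application log) and the function name are assumptions made for this example, not part of any Microsoft tool.

```python
from collections import Counter

# Event IDs and their meanings as described in this article.
EVENTS_OF_INTEREST = {
    1133: "Store activity during a cluster state transition",
    2065: "database could not mount (not enough contiguous virtual memory)",
}


def summarize_transition_events(records):
    """Count events of interest per node.

    `records` is an iterable of (event_id, computer) pairs; this layout is
    a hypothetical export format chosen for the sketch.
    """
    counts = Counter()
    for event_id, computer in records:
        if event_id in EVENTS_OF_INTEREST:
            counts[(computer, event_id)] += 1
    return counts


if __name__ == "__main__":
    sample = [(1133, "NODE2"), (2065, "NODE2"), (1000, "NODE1"), (1133, "NODE2")]
    for (computer, event_id), n in sorted(summarize_transition_events(sample).items()):
        print(f"{computer}: event {event_id} x{n} ({EVENTS_OF_INTEREST[event_id]})")
```

A node that shows event ID 2065 after a transition is the one to examine for virtual-memory fragmentation.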
After receiving problem reports, Microsoft analyzed the situation and realized that a problem existed when clusters ran in active-active mode. Microsoft began advising customers to limit cluster designs and to limit the number of concurrently supported clients to 1000 in Exchange 2000, 1500 in Exchange 2000 Service Pack 1 (SP1), and 1900 in Exchange 2000 SP2 and SP3.
The client numbers that Microsoft recommends are based on Messaging API (MAPI) loads. Because MAPI is the most functional and feature-rich protocol, MAPI clients usually generate the heaviest workload for Exchange. Microsoft Outlook Web Access (OWA) clients generate much the same type of demand as MAPI clients. However, other client protocols (e.g., IMAP4, POP3) typically generate lower system demand and can result in a lesser workload for the server. So, organizations might be able to support more client connections than the number of clients Microsoft recommends before the virtual-memory problem appears.
Exchange 2000 SP3 includes a new virtual-memory allocation scheme for the Store. This new scheme changes the way in which Windows allocates and deallocates memory. Experience to date demonstrates that servers running SP3 encounter fewer memory problems on high-end clusters. Thus, Microsoft highly recommends that organizations with large clusters upgrade to Exchange 2000 SP3 or, even better, upgrade the OS to Windows 2003 and deploy Exchange 2003, which better manages memory.
The problems with virtual-memory management have forced Microsoft to express views about how to set up active clusters. Essentially, Microsoft's advice is to keep a passive node available whenever possible, meaning that a two-node cluster should run in active-passive mode and a four-node cluster should have three active nodes and one passive node.
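The sizing guidance can be expressed as a quick planning check. The sketch below is a minimal illustration in Python. It assumes that the concurrent-client figures Microsoft quoted apply to each active node, and it always reserves one passive node. The function name, the release labels, and the four-node default are choices made for this example.

```python
import math

# Concurrent MAPI client ceilings quoted in this article.
# Applying them per active node is an assumption of this sketch.
MAPI_CLIENT_LIMIT = {
    "RTM": 1000,
    "SP1": 1500,
    "SP2": 1900,
    "SP3": 1900,
}


def plan_cluster(concurrent_clients, release, max_nodes=4):
    """Estimate active nodes for a client load and reserve one passive node."""
    limit = MAPI_CLIENT_LIMIT[release]
    active = max(1, math.ceil(concurrent_clients / limit))
    total = active + 1  # Microsoft's advice: keep one passive node available
    return {
        "active": active,
        "passive": 1,
        "total": total,
        "fits": total <= max_nodes,  # does the design fit the cluster size?
    }


if __name__ == "__main__":
    print(plan_cluster(5000, "SP3"))
    print(plan_cluster(5000, "RTM"))
```

Under these assumptions, 5000 concurrent MAPI clients fit a four-node Exchange 2000 SP3 cluster (three active nodes plus one passive node) but not a four-node cluster running the original release.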
Available virtual memory begins to decline as the load on a cluster grows. Exchange logs event ID 9582 when less than 32MB of available memory is present, then logs the same event again when no contiguous blocks of virtual memory larger than 16MB exist inside the Store. After Exchange reaches this threshold, the cluster can become unstable and stop responding to client requests, and you must reboot. You might also see event ID 9582 in two other situations:
- Event ID 9582 might appear immediately after a failover to a passive node if the passive node previously hosted the same virtual server. Each node maintains a stub store.exe process, and the structures within the process might have already been fragmented, leading to the error. If this error occurs, you can transition the virtual server to another node in the cluster, then restart the server that has the fragmented memory. If a passive node isn't available, you have to restart the active node. Exchange 2000 SP3 generates far fewer problems of this nature, so you're unlikely to see event ID 9582 triggered under anything but extreme load.
- Incorrect use of the /3GB switch in the boot.ini file can generate event ID 9582. If you're hosting Exchange 2003 on a Windows 2003 server that has more than 1GB of physical memory, you should set the /3GB switch and its associated /Userva= switch in the boot.ini file so that Windows 2003 has a better balance in its allocation of resources between kernel- and user-mode memory. For more information about these switches, see the Microsoft article "XADM: Event Viewer Log Entries Cite Virtual-Memory Fragmentation on an Exchange 2000 Server" (http://support.microsoft.com/?kbid=314736) and "XADM: Using the /Userva Switch on Windows 2003 Server-Based Exchange Servers" (http://support.microsoft.com/?kbid=810371).
Note that some third-party products, particularly virus checkers, can affect virtual-memory usage. The sidebar "Monitoring Virtual Memory," page 56, discusses how to determine the amount of virtual memory a third-party product uses as well as how to monitor the amount of available virtual memory in a cluster.
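To make the two event ID 9582 thresholds concrete, here is a minimal Python sketch that classifies a snapshot of free virtual-memory blocks inside the Store. It uses the 32MB and 16MB figures given earlier in this section. The list of free block sizes is a hypothetical input that would come from whatever monitoring tool you use, and the function name and state labels are invented for this example.

```python
WARN_AVAILABLE_MB = 32  # first threshold: less than 32MB available
ERROR_BLOCK_MB = 16     # second threshold: no contiguous block larger than 16MB


def classify_virtual_memory(free_blocks_mb):
    """Return 'ok', 'warning', or 'critical' for a list of free block sizes in MB."""
    total_free = sum(free_blocks_mb)
    largest = max(free_blocks_mb, default=0)
    if largest <= ERROR_BLOCK_MB:
        # Fragmented: memory may be free, but no single block is large enough.
        return "critical"
    if total_free < WARN_AVAILABLE_MB:
        return "warning"
    return "ok"


if __name__ == "__main__":
    print(classify_virtual_memory([64, 8]))           # one large block remains
    print(classify_virtual_memory([20, 5]))           # little memory left overall
    print(classify_virtual_memory([10, 10, 10, 10]))  # 40MB free, but fragmented
```

The last case shows why fragmentation matters more than the total: 40MB is free, yet no single block exceeds 16MB.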
What's Changed in Exchange 2003
Although you can deploy Exchange 2003 on Windows 2000 servers, Microsoft is fond of saying that Windows 2003 and Exchange 2003 are better together and deliver the optimum functionality because they're designed to work as a team. This statement is true in many respects but is especially true for clusters. I wouldn't recommend deploying an Exchange cluster on anything but Windows 2003 servers. Here are the major improvements in Windows 2003 and Exchange 2003 clusters:
- With Windows 2003 and Exchange 2003, you can configure eight-node clusters. Compared with four nodes, eight nodes provide a lot more flexibility in how you can lay out the servers within a cluster and the roles that the servers take. However, at least one node has to be passive if the cluster supports numerous clients, many connectors, or a heavy processing load. Clusters that support a small number of clients and perhaps run only one SG with a few databases on each active node can typically operate in a fully active mode because virtual-memory fragmentation is less likely to occur.
- The dependency on Datacenter is gone, so you can now deploy clusters without the additional expense that Datacenter introduces.
- Windows 2003 and Exchange 2003 better control virtual-memory fragmentation, which increases the number of MAPI clients that a cluster can support. Windows 2003 and Exchange 2003 also make better use of large amounts of memory (i.e., more than 1GB) when that memory is available to a server. No formal testing has yet established how many concurrent MAPI clients Exchange 2003 supports before it runs into the virtual-memory fragmentation problem, but the fact that Microsoft has deployed clusters that support 4000 mailboxes per node suggests that the limit is high. If a passive node is always available in an active-passive configuration, clusters can support numerous users per active node, perhaps as many as 5000 mailboxes per node. The exact figure depends on the system configuration, the load that the users generate, the types of clients used, and careful monitoring of virtual memory on the active nodes as they come under load.
- You can use drive mount points (otherwise known as NTFS mounted drives) to eliminate the Win2K and Exchange 2000 restriction on the number of available drive letters, which limits the number of available disk groups in a cluster. This improvement is important when you deploy more than 10 SGs across multiple cluster nodes.
- Because of Exchange 2003's new resource-dependency model and some tweaks in the way that Exchange 2003 manages failover, Exchange 2003 appears to be faster than Exchange 2000 at transitioning SGs from failed servers to active nodes when problems occur.
- Microsoft made tweaks to Exchange 2003's management interfaces to make life easier for administrators. For example, as Figure 2 shows, the Exchange System Manager (ESM) console now displays details of server types, so you know immediately whether a server is running on a cluster.
- Assuming that you use appropriate hardware and backup software, you can use Windows 2003's Volume Shadow Copy Service (VSS) API to take hot snapshot backups. This improvement is crucial because clusters can't attain their full potential if administrators limit the size of the databases. Limiting the size of databases limits the number of mailboxes that a cluster can host. However, vendors have been slow to ship VSS-compliant products, so don't depend too much on this feature until you see solid products appear.
- The Recovery Storage Group feature lets administrators recover from individual database failures quickly and without having to deploy dedicated recovery servers. This feature is also available when you deploy Exchange 2003 on Win2K servers.
VSS and the Recovery Storage Group aren't cluster-specific features. However, both contribute to higher levels of service availability and reassure administrators who worry that consolidating many standard servers into a large cluster might be putting all their eggs into one basket.
As you can see, these improvements address many of the reasons why Windows administrators haven't implemented clusters. Some of the other reasons why administrators haven't considered clusters are going away on their own. For example, as organizations complete their Exchange 2000 deployments or finish migrating from a legacy email system, they no longer need some of the older Exchange components that can't run on a cluster. Even some of the newer components introduced in Exchange 2000 that couldn't run in a cluster have been replaced by new products that can run in a cluster. All the core Exchange 2003 components can run in a cluster. Thus, problems with noncluster compliance are now likely to be found in only third-party products. And even that problem is going away because, as I mentioned previously, an increasing number of ISVs are offering add-ons for clustered servers.
The Microsoft Experience
Surprisingly, Microsoft never used clusters in its Exchange deployment in the past, but that situation changed dramatically with the arrival of Exchange 2003 and Microsoft's server consolidation program. Microsoft has replaced its old set of standard Exchange servers with a new set of large clusters. The most interesting configuration is the data-center design, which supports 16,000 mailboxes spread across four Exchange virtual servers. The cluster consists of seven physical nodes, four of which handle the load that the four mailbox servers generate. A fifth node is passive, waiting to spring into action should one of the mailbox servers fail. The mailbox servers (and the passive node) boast substantial power: They're HP ProLiant DL580G2 models with quad 1.9GHz Intel Xeon III processors, 4GB of memory, and a 400MHz front-side bus. Microsoft has enabled hyperthreading on these servers and reports that this feature provides about 20 percent more CPU headroom. Microsoft follows its own advice and tunes the servers by setting the /3GB switch and setting the /Userva= switch to 3030 (this value is in megabytes) in the boot.ini file.
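To show what that tuning looks like in practice, the Python sketch below appends the two switches to a single boot.ini OS entry. The sample entry (ARC path and description) is a typical illustration rather than Microsoft's actual configuration, and the helper name is hypothetical. Only the /3GB switch and the /Userva=3030 value come from the text above.

```python
def add_memory_switches(entry, userva_mb=3030):
    """Append /3GB and /Userva= to one boot.ini OS entry if they are missing."""
    # Compare case-insensitively so existing switches aren't added twice.
    switches = entry.upper().split()
    if "/3GB" not in switches:
        entry += " /3GB"
    if not any(s.startswith("/USERVA=") for s in switches):
        entry += f" /Userva={userva_mb}"
    return entry


if __name__ == "__main__":
    # Illustrative entry only; a real boot.ini entry depends on the server.
    sample = r'multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows Server 2003" /fastdetect'
    print(add_memory_switches(sample))
```

The helper leaves an entry unchanged if both switches are already present, so it's safe to run more than once against the same line.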
The two remaining servers, each of which is an HP ProLiant DL380G2 model with two CPUs and 2GB of RAM, handle backup and other administrative functions. Because they're auxiliary servers and don't host Exchange, these servers have lower-specification configurations.
Microsoft's standard mailbox quota went from 100MB to 200MB, although considerable variation exists in actual quotas based on business demand. Not surprisingly, Microsoft has a lot of mailbox data to back up daily. The best backup solutions can stream data to tape as fast as 100GB per hour, but this rate isn't satisfactory for a 16,000-mailbox cluster. For a cluster this size, you want the mailbox servers delivering the best possible response to users and not handling the load that tape backups generate. Microsoft's operations team solved the backup problem by first backing up to disk, using volumes that are temporarily available to the mailbox servers. After the disk backups are finished, the volumes are failed over to the two auxiliary servers, which copy the backup to tape. Moving disks between servers in this fashion demonstrates how to use cluster features to solve administrative problems. In the future, Microsoft plans to deploy VSS-enabled hot snapshot backups in addition to tape backups. (Hot snapshot backups are designed to complement, not replace, tape backups.)
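A quick calculation shows why 100GB per hour falls short. The Python sketch below estimates an upper bound on the time needed to stream the cluster's mailbox data to tape, assuming that every mailbox is filled to its 200MB quota and that 1GB equals 1024MB. Real mailboxes vary in size, so treat the result as a worst-case illustration, not a measurement.

```python
def full_backup_hours(mailboxes, quota_mb, tape_gb_per_hour):
    """Upper-bound hours to stream every mailbox, filled to quota, to tape."""
    total_gb = mailboxes * quota_mb / 1024  # assumes 1GB = 1024MB
    return total_gb / tape_gb_per_hour


if __name__ == "__main__":
    hours = full_backup_hours(16_000, 200, 100)
    print(f"{hours:.2f} hours")  # prints "31.25 hours"
```

Even in the best case, a single tape stream at that rate would need more than a day to cover a full 16,000-mailbox cluster, which is why moving the tape work off the mailbox servers matters.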
An HP StorageWorks Enterprise Virtual Array 5000 (EVA5000) Storage Area Network (SAN) manages all the storage. This SAN is an important contributor to Microsoft's large cluster because of its redundancy and management features. Because SGs, transaction logs, and SMTP work directories all require drives, Microsoft's large cluster has numerous drives. Microsoft heavily uses mount points to get around the drive-letter limitation that would otherwise render these drives almost impossible to configure in a satisfactory manner.
I haven't heard of similar clusters running in production, so Microsoft might take the blue ribbon for large Exchange clusters with this design. Microsoft reckons that it achieves a better service level with the cluster than it achieved with standard servers. The company says it reached the "four nines" territory (i.e., 99.99 percent availability). However, this feat hasn't been independently audited because Microsoft deployed the cluster with beta versions of Exchange 2003, then continually upgraded the software until the final release. Getting anywhere close to such an uptime record would be remarkable.
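To put "four nines" in perspective, this short Python sketch converts an availability percentage into the downtime it allows. It assumes a 365-day year. The function name is chosen for this example.

```python
def allowed_downtime_minutes(availability_percent, days=365):
    """Minutes of downtime permitted over `days` at the given availability."""
    return (1 - availability_percent / 100) * days * 24 * 60


if __name__ == "__main__":
    for target in (99.9, 99.99):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% availability allows about {minutes:.1f} minutes per year")
```

At 99.99 percent availability, the budget is roughly 53 minutes of downtime per year, compared with nearly 9 hours at 99.9 percent.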
Don't Be a Fool
Only fools rush in and deploy clusters. Administrators who plunge into cluster deployment without investing the necessary time to research, plan for, and design clusters generally encounter problems. Exchange 2003 clusters are more complex than standard Exchange servers, and experience demonstrates that you must carefully set up and manage Exchange 2003 clusters to generate the desired levels of uptime and resilience. Only those administrators who take the time up front to properly design clusters and successfully manage those clusters will likely achieve their desired results.
The improvements in Exchange 2003 and Windows 2003 are helping make Exchange clusters a viable option. The early reports of successful deployments of Exchange 2003 clusters, including Microsoft's deployment, are encouraging. The challenge for Microsoft now is to continue driving complexity out of clusters so that installing and managing clusters is as easy as installing and managing standard servers. That day is not yet here.
Still, I remain positive about clusters. Clusters do a fine job, provided that administrators carefully plan cluster configurations and then appropriately manage the clusters. However, the road to clusters has been bumpy, and Microsoft didn't keep its promise about clusters in the past. Work is continuing to improve clusters, but in the interim, if you're interested in clustering Exchange servers, you need to consider all options before making a final decision.