Microsoft Cluster Server (MSCS—aka Wolfpack) introduced the concept of server clusters that support application high availability to the Windows arena in the days of Windows NT 4.0. Clusters have come a long way since then. Today, in addition to the Microsoft Cluster service offering, many third-party vendors offer mature clustering products that make implementing a high-availability solution a much less grueling experience than it used to be. Let's look at the architectures and technologies that these third-party products employ.

High-Availability Architectures
High-availability server clusters have in common a concept of multiple servers, each communicating with one another and acting as cluster nodes. When one node's failure causes an application to fail, another cluster node takes over processing the application and end users can continue working. For this process to work, the application must run properly on both the primary node (on which the application runs before the failure) and the standby node (on which the application runs after the failure). Application data is central to any application, and the method for letting both the primary node and the standby node access application data is a key distinguishing factor among high-availability architectures.

Microsoft Cluster service and some of its competitors support a shared-storage configuration, in which the nodes of a cluster have connections to the same disk storage systems, most often a Storage Area Network (SAN). Microsoft Cluster service also supports SCSI-based two-node clusters, in which each node connects to the same SCSI bus. Whether the storage is SAN- or SCSI-based, each node can view and use the volumes defined on the attached storage as if they were volumes on a locally installed disk drive. When the primary node fails, the standby node takes control of the data volume (and other resources associated with the application, such as IP addresses, NetBIOS names, and share names) and starts the application. These systems must arbitrate disk-volume ownership to ensure that only one server is able to write to the volume at a time. Successful operation of a shared-storage cluster requires hardware components that work well together. For Microsoft Cluster service, Microsoft certifies certain hardware configurations, which it publishes in the cluster section of the Windows Catalog (for Windows Server 2003) or Hardware Compatibility List (HCL, for Windows 2000 Server). Links to these lists are available online. Some of the other clustering products discussed here also require Microsoft-certified hardware, although not always the cluster-certified configurations.

Shared-storage configurations avoid data consistency and integrity problems because all nodes work with the same copy of the data. However, shared-storage configurations are typically more expensive than replication-based configurations (which I discuss later), and a shared SAN or SCSI bus imposes strict distance limits. Unless you use high-end SAN hardware with WAN mirroring features, all the nodes of a cluster must be at the same site; thus, you're susceptible to a disaster that affects more than just one cluster node.

Replication-based high-availability solutions use data replication and mirroring to copy application data across a network to storage controlled by another server. Because the data traverses a network, transactions require time and network bandwidth; thus, data consistency and integrity can be a problem. On the plus side, replication-based solutions don't require cluster-certified hardware configurations and replicated cluster nodes can play a part in a disaster-recovery scenario.

Replication, mirroring, and synchronization all refer to the copying of data from a source to a target, but people don't always use the terms exactly the same way. Replication always means the ongoing copying of a write request to a backup volume. Mirroring and synchronization often refer to the initial instance of copying data from a source storage device to the target storage. Sometimes, though, mirroring also refers to the ongoing process of keeping the source and target in sync as normal application processing occurs. Resynchronization refers to the process that brings a mirrored copy up-to-date following some kind of break in communication between the two servers—for example, as part of a failback procedure following a failover. A partial resynchronization can occur if the server running the application keeps track of which blocks of data on a volume have changed—only the changed blocks need to be sent to the mirrored volume. Partial resynchronization can be much faster and use less WAN bandwidth than a full remirroring of a volume.
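The changed-block tracking behind a partial resynchronization can be sketched in a few lines of Python. This is a minimal illustration rather than any vendor's implementation: the TrackedVolume class and partial_resync function are hypothetical names, and a volume is modeled as a plain list of blocks.

```python
class TrackedVolume:
    """Hypothetical source volume that remembers which blocks have changed."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.dirty = set()  # indexes of blocks written since the last sync

    def write(self, index, data):
        self.blocks[index] = data
        self.dirty.add(index)


def partial_resync(source, target_blocks):
    """Copy only the changed blocks to the target; return how many were sent."""
    sent = 0
    for index in sorted(source.dirty):
        target_blocks[index] = source.blocks[index]
        sent += 1
    source.dirty.clear()
    return sent


source = TrackedVolume([b"a", b"b", b"c", b"d"])
mirror = list(source.blocks)   # stands in for the initial full mirror
source.write(1, b"B")          # writes made while the two servers are out of touch
source.write(3, b"D")
blocks_sent = partial_resync(source, mirror)
```

Only the two modified blocks cross the link here, whereas a full remirror would resend all four.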

Synchronous and asynchronous replication describe when the data is actually written to the target volume. In synchronous replication, I/O system drivers intercept write requests to the source volume and send a copy of the write requests to the target system. Only after the source system receives confirmation that the data was successfully written to storage at the target does the source system write the data to its own storage. This approach ensures that the target system always has a complete and accurate copy of the source system’s data. Synchronous replication can be slow, especially when the target system sits on the other end of a WAN link. Slow I/O leads to poor application response time, so administrators and application designers must carefully evaluate where synchronous replication is appropriate in a high-availability strategy.
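The write ordering that defines synchronous replication can be sketched as follows. The Target and SyncSource classes and the TargetUnavailable exception are hypothetical names for this illustration, and in-memory dictionaries stand in for real storage and the network.

```python
class TargetUnavailable(Exception):
    """Raised when the target cannot confirm a write."""


class Target:
    def __init__(self):
        self.blocks = {}
        self.online = True

    def write(self, index, data):
        if not self.online:
            raise TargetUnavailable("no confirmation from target")
        self.blocks[index] = data


class SyncSource:
    def __init__(self, target):
        self.blocks = {}
        self.target = target

    def write(self, index, data):
        # Send the write to the target first; commit locally only after
        # the target has confirmed it.
        self.target.write(index, data)
        self.blocks[index] = data


target = Target()
source = SyncSource(target)
source.write(0, b"x")

target.online = False
try:
    source.write(1, b"y")
    second_write_completed = True
except TargetUnavailable:
    second_write_completed = False
```

Because the local commit follows the remote confirmation, the second write never reaches the source volume. That guarantee is also the reason a slow or broken link stalls the application.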

Asynchronous replication affords greater speed at the cost of more uncertainty about the state of the data at the target volume. In asynchronous replication, the source volume completes writes immediately without waiting for confirmation of successful writes at the target system. Writes are sent to the target volume according to rules implemented by the replication engine. Many replication engines send and complete writes to the target volume as quickly as server resources and network bandwidth allow. Other replication engines can limit the network bandwidth used by replication traffic or schedule replication. VERITAS Software's VERITAS Storage Foundation for Windows supports what the vendor calls soft synchronous replication, which uses synchronous replication until the network path used for replication fails, then starts queuing the write requests so that application processing can continue.
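A rough sketch of asynchronous replication with a bandwidth cap might look like the following. The AsyncReplicator class and its max_blocks_per_cycle parameter are assumptions made for illustration; real replication engines apply their own rules for when and how fast queued writes are sent.

```python
from collections import deque


class AsyncReplicator:
    """Hypothetical engine: local writes complete at once, copies are sent later."""

    def __init__(self, max_blocks_per_cycle):
        self.source = {}
        self.target = {}
        self.pending = deque()  # writes not yet applied to the target
        self.max_blocks_per_cycle = max_blocks_per_cycle

    def write(self, index, data):
        self.source[index] = data            # completes without waiting
        self.pending.append((index, data))

    def replicate_cycle(self):
        """Send at most max_blocks_per_cycle queued writes; return the count."""
        sent = 0
        while self.pending and sent < self.max_blocks_per_cycle:
            index, data = self.pending.popleft()
            self.target[index] = data
            sent += 1
        return sent


engine = AsyncReplicator(max_blocks_per_cycle=2)
for i in range(3):
    engine.write(i, b"data")
sent = engine.replicate_cycle()
backlog = len(engine.pending)
```

The backlog left after a cycle is the data that would be missing from the target if the source failed at that moment, which is the uncertainty the paragraph above describes.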

Which data does replication send to the target volume? Disk volumes are divided into data blocks of a uniform size. The file system (e.g., NTFS, FAT) decides how these data blocks are used. Some blocks hold file data. Others hold file metadata—that is, filenames, security attributes, file data block locations. Still other blocks hold volume metadata, such as the volume name, which blocks are in use or free, or which blocks are bad and unusable. Some replication engines intercept writes to the volume at the block level. Block replication engines don’t care what's stored on the volume or even about the volume organization. They seek to ensure that block for block, the target volume is identical to the source volume. This relatively simple replication scheme ensures that all data on the volume is replicated to the target volume. However, an application often doesn’t use all the data that the volume holds, and sending it uses up valuable bandwidth.

Enter file-based replication systems, which intercept writes at the file-system level rather than at the volume block level. Replication engines that work at this level know just which parts of a file have been changed and send only those parts across the network. These engines can also evaluate filenames, so you can create lists of files to include in or exclude from replication, reducing the demands replication makes on network bandwidth.
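Include and exclude lists of this kind are commonly expressed as filename patterns. The sketch below uses Python's standard fnmatch module to show the idea; the should_replicate function, the patterns, and the paths are hypothetical and don't reflect any particular product's syntax.

```python
from fnmatch import fnmatchcase


def should_replicate(path, include, exclude):
    """Return True if the path matches an include pattern and no exclude pattern."""
    name = path.lower()  # compare case-insensitively, as Windows file systems do
    if any(fnmatchcase(name, pattern) for pattern in exclude):
        return False
    return any(fnmatchcase(name, pattern) for pattern in include)


include = ["d:/data/*"]        # assumed patterns, written in lowercase
exclude = ["*.tmp", "*.bak"]

replicate_db = should_replicate("D:/Data/orders.mdb", include, exclude)
replicate_tmp = should_replicate("D:/Data/cache.tmp", include, exclude)
replicate_ini = should_replicate("C:/Windows/win.ini", include, exclude)
```

Checking the exclude list first means an exclusion always wins over an inclusion, a design choice made here for simplicity.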

All the replication-based products support failover to cluster nodes located at a remote site (i.e., stretch clusters). For a look at the challenges that stretch clusters present and how high-availability products address them, see the sidebar "Remote Cluster Considerations."

As I've mentioned, high-availability solutions restart an application quickly after a system failure causes the application to stop processing. Another category of solutions—fault-tolerant solutions—lets an application continue after a failure without even a momentary interruption. In this review, I look at high-availability solutions from six vendors and one fault-tolerant solution.

Marathon Endurance Virtual FTserver
Marathon Technologies' Marathon Endurance Virtual FTserver is a software product that turns a pair of identical Intel-based servers into one fault-tolerant computing system. Endurance is built on the same foundation as Marathon’s Endurance 6200, which I reviewed in "Endurance 6200 3.0," July 2001, InstantDoc ID 21140.

Endurance works by running an application on both servers, which Marathon calls CoServers, in instruction lockstep—meaning that both servers execute the same machine instruction at the same time. To achieve this lockstep, the servers must have identical hardware and configurations. Currently both CoServers must run Win2K Advanced Server (AS) Service Pack 4 (SP4) in a dual-processor system. Endurance 6.0, which will be available shortly, will support Windows 2003 and CoServer configurations that have one hyperthreaded processor.

A minimum configuration requires two disk drives on each server: one to boot Windows and the other to boot the Virtual FTserver OS. Each server has a pair of Gigabit Ethernet adapters that's dedicated to communications between the two CoServers. Another Ethernet adapter in each server, called the CoServer Management Link, lets an administrator communicate directly with the CoServer. You can configure up to four additional Ethernet adapters in each CoServer for public network communication—these are the network links that user and application communication traverses.

When the two Endurance CoServers boot up, the first server to start waits for the second to establish communication over the two CoServer links, named CSLink1 and CSLink2. After communication is established, the first server starts Virtual FTserver, the protected virtual server that runs applications. When initialization of Virtual FTserver has been completed on the first server, Endurance copies the context of the first server—including the current state of virtual server memory and the next instruction scheduled to execute—to the second server, momentarily stopping the first server to complete the synchronization. When the servers are synchronized, they begin redundant, fault-tolerant operation, executing in lockstep. If a CoServer isn't available (e.g., it's down for maintenance or repair), the administrator can mark it disabled, and Endurance will run the functioning Virtual FTserver in non–fault-tolerant mode.

Endurance supports as much local disk storage as the physical server configuration supports. Mirrored pairs of disks, one in each CoServer, provide fault tolerance. The Endurance Device Redirector presents the virtual server with a single-disk view of the mirrored pair and manages the mirrored operation of the physical disks. After running in single-server (non–fault-tolerant) mode, Endurance performs a rapid remirror by copying only the disk sectors that were modified during single-server operation.

You use the Endurance Manager GUI, which Figure 1 shows, to monitor and manage Endurance. Endurance Manager runs on Windows XP and Win2K workstations that can communicate with the NICs configured as the CoServer Management Links. In addition, the MTCCONS command lets you perform routine tasks either from a CoServer or remotely.

Installing Endurance is a rather lengthy procedure but is well documented in Marathon’s Installation Guide. At its conclusion, Virtual FTserver is running in fault-tolerant mode as a virtual server that uses both CoServers. You control the virtual server by using Virtual FTserver Desktop (aka Endurance Desktop) from either one of the CoServers. Endurance Desktop operates similarly to remote control applications—you interact with the virtual server when Endurance Desktop has input focus and with the CoServer when the input focus is outside Endurance Desktop.

Marathon Technologies sells Endurance Virtual FTserver only through authorized resellers at a suggested retail price that starts at $12,000 for a uniprocessor version with 1 year of support and upgrades. Fully configured systems sell for less than $20,000.

PolyServe Matrix Server
PolyServe Matrix Server is the first Windows-oriented product from a company that has been serving the Linux market for several years. Matrix Server combines the PolyServe SAN File System (PSFS) with the features of PolyServe Matrix HA, a separate product that provides clustering functionality.

A PolyServe matrix consists of as many as 16 servers connected to SAN-based data storage. PSFS is at the heart of Matrix Server and lets you grow application processing power by adding nodes to the cluster in lieu of using larger (and more expensive) SMP servers. PSFS supports concurrent access to shared data by multiple cluster nodes with a distributed lock manager that coordinates file updates. Full journaling of data updates promotes rapid recovery from hardware failures. Because Matrix Server doesn't require a master/slave relationship between nodes, the administrator is free to configure failover of an application to any other node in the cluster. Matrix Server supports multiple host bus adapters (HBAs), redundant SAN switches, and multiple NICs for enhanced node availability. Integrated fabric management supports Brocade Communications Systems and McDATA switches and automatically adjusts cluster-node configuration in response to both data-path failures and reestablished data paths. Matrix Server supports most IA-32 servers from major manufacturers and isn't limited to hardware on Microsoft’s cluster-certified list.

The PolyServe Management Console (mxconsole.exe), a Java-based GUI, is the primary administrative interface. As Figure 2 shows, the Management Console allows management of all cluster nodes from a central management station. A command-line interface, mx.exe, supports scripted operation. Currently, no SNMP interface or Windows Management Instrumentation (WMI) provider is available.

Matrix Server lets administrators add or delete nodes from a cluster while other nodes of the cluster continue to operate normally—no halting or pausing is necessary. For enhanced security and performance, administrators can specify which network or networks Matrix Server will use for cluster management traffic. Also, access to administrative functions is password-protected. Only the primary administrative user can change the matrix configuration. Other users, created by using the Matrix Server UserManager or the Mxpasswd command, can only view the matrix configuration. A flexible event-notification system can send administrators event information through email messages, pages, the PolyServe Management Console, or another user-defined process.

Matrix File Serving Solution Pack for CIFS. The Matrix File Serving Solution Pack for CIFS supports scalable and highly available Common Internet File System (CIFS) file shares in one of two modes: connection load balancing, which uses Matrix File Shares, or failover support, which uses Virtual File Servers. Either implementation allows access to file data stored on a Matrix Server PSFS cluster file system. A Virtual File Server, as the name implies, associates the CIFS share name and associated IP address with the active node of a Matrix Cluster, allowing the name and IP address to fail over to a standby server when necessary. When you use Matrix File Shares, each node of a cluster accesses the same shared PSFS file systems, and clients access the data through each node’s name or IP address. In this mode, an administrator creates a DFS root that specifies a root share location on each of the nodes. When clients access the data by using the DFS share name, DFS balances the connections among the cluster nodes and detects when a node is no longer available. Clients must access a Virtual File Server by using the DNS host name or IP address; NetBIOS names are supported only by Matrix File Shares. Matrix Server costs $3,000 per physical CPU in the server.

LEGATO Co-StandbyServer AAdvanced and LEGATO Automated Availability Manager
LEGATO Software offers two similar clustering products. The bulk of this section describes LEGATO Co-StandbyServer AAdvanced, which provides application failover support for two-node clusters on standard server hardware (Microsoft-certified cluster hardware isn't required) running Windows 2003 or server versions of Win2K. I discuss LEGATO Automated Availability Manager (AAM), which supports clusters of as many as 100 nodes and adds other advanced features, later in this section.

Co-StandbyServer AAdvanced supports both shared storage and replicated data configurations. Nodes of a cluster belong to a cluster domain, a LEGATO concept unrelated to Windows domains. Agent software runs on each node. Primary agents maintain a copy of the replicated database containing information about the nodes, resources defined in the domain, and their current state. Both primary and secondary agents manage and monitor cluster resources on their node, with secondary agents sending the information to a primary agent. Because a functioning primary agent is necessary for a cluster to work, LEGATO recommends that both nodes of a Co-StandbyServer AAdvanced domain run the primary agent and that you reserve secondary agents for some nodes of the larger cluster domains that AAM supports.

Co-StandbyServer AAdvanced monitors the health of an application by tracking services that the application uses. Both existence monitors (which simply verify that a service is running) and response monitors (which can query a service for an expected response) are supported. An Availability Tracking feature supports service-level reporting for application uptime. Co-StandbyServer AAdvanced supports both active-passive cluster configurations (in which only one server is hosting applications and the other server is ready to accept failover) and active-active configurations (in which both servers are hosting applications that can fail over to the other server).
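The difference between the two monitor types can be illustrated with a short Python sketch. The function names and the service names are hypothetical; this isn't LEGATO's interface, only a model of the two kinds of check.

```python
def existence_monitor(running_services, service):
    """Healthy if the service is present in the set of running services."""
    return service in running_services


def response_monitor(query, expected):
    """Healthy only if querying the service returns the expected response."""
    try:
        return query() == expected
    except Exception:
        # A query that raises counts as a failed check.
        return False


running = {"MSSQLSERVER", "W3SVC"}   # assumed snapshot of running services
web_exists = existence_monitor(running, "W3SVC")
web_responds = response_monitor(lambda: "HTTP 200", "HTTP 200")
```

A response monitor can catch a service that is still running but has stopped answering requests, a condition an existence monitor would miss.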

Like Microsoft Cluster service, Co-StandbyServer AAdvanced lets you define managed resource groups that comprise resource objects. Resource objects can include NetBIOS names used as node aliases, IP addresses, storage resources, and services. Co-StandbyServer AAdvanced supports several types of storage: shared disks, Co-StandbyServer AAdvanced mirrored volumes, LEGATO RepliStor replicated volumes, Windows shares, VERITAS VxVM volumes, and EMC's SRDF mirrored volumes. You can configure how Co-StandbyServer AAdvanced should attempt to recover when a resource group fails—for example, by attempting to restart a failed service or by moving an IP address to a backup NIC connected to the same subnet. You can also associate Perl scripts with resource groups to perform additional tasks when the resource group starts up or shuts down.

Three types of communications support Co-StandbyServer AAdvanced cluster operations: a domain network, a verification network, and isolation detection. Normal cluster communications occur over cluster domain networks, which LEGATO recommends be private networks dedicated to cluster communications. Co-StandbyServer AAdvanced uses a verification network to verify the state of the nodes when communications fail on the domain network.

Administrators can also configure other IP addresses on the public network that the agent on a node can ping to determine whether that node has been isolated from the network. In a replicated data cluster configuration, if a cluster mistakenly concludes that an active server has failed and starts an application on a secondary server while the primary server is still processing the application, updates to both copies of the data can occur. This circumstance, often called split-brain processing, can have serious consequences because neither copy of the data is complete. Co-StandbyServer AAdvanced’s isolation detection features protect against this condition. When a standby node realizes that it can't communicate with an active node, it pings other IP addresses on the network. If the pings fail, the standby node knows that it—not the active server—has a problem and doesn't initiate failover. The node can then run an isolation script, which might take actions to protect the application or attempt to recover the server.
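The isolation-detection reasoning can be reduced to a small decision function. The sketch below is an assumption-laden model, not LEGATO's code: standby_decision and its return values are hypothetical, and the ping results are passed in as booleans rather than gathered from a real network.

```python
def standby_decision(active_reachable, witness_ping_results):
    """Decide what a standby node should do when it checks on the active node.

    witness_ping_results holds one boolean per additional IP address the
    administrator configured on the public network.
    """
    if active_reachable:
        return "remain-standby"
    if not any(witness_ping_results):
        # Nothing answers, so this node is the one cut off from the network.
        return "isolated-run-isolation-script"
    return "active-node-suspected-down"


normal = standby_decision(True, [True, True])
isolated = standby_decision(False, [False, False])
suspect = standby_decision(False, [True, False])
```

Treating a single successful ping as proof of connectivity is a simplification; a stricter policy could require a majority of the configured addresses to answer.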

Co-StandbyServer AAdvanced triggers failover when a primary agent fails to receive heartbeats from the active node for a user-specified length of time and the active node fails to respond to ping requests over the verification network. Administrators can optionally configure failover to occur when an unrecoverable failure occurs to a resource object.
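The two-part failover condition can be written as one boolean expression. In this sketch the should_fail_over function and its parameters are hypothetical, and times are plain numbers of seconds rather than real clock readings.

```python
def should_fail_over(last_heartbeat, now, timeout_seconds, verification_ping_ok):
    """Fail over only if heartbeats have stopped AND the verification ping fails."""
    heartbeat_lost = (now - last_heartbeat) > timeout_seconds
    return heartbeat_lost and not verification_ping_ok


# Heartbeat silent for 40 seconds against an assumed 30-second limit.
domain_network_down_only = should_fail_over(100, 140, 30, verification_ping_ok=True)
node_really_down = should_fail_over(100, 140, 30, verification_ping_ok=False)
heartbeat_still_fresh = should_fail_over(100, 110, 30, verification_ping_ok=False)
```

Requiring both signals keeps a failure of the domain network alone from triggering an unnecessary failover.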

LEGATO Availability Console (aka the management console) is a Java-based GUI that you use to monitor, configure, and manage Co-StandbyServer AAdvanced and AAM clusters. Figure 3 shows protected Microsoft Exchange Server resources in an AAM environment. Using the GUI, an administrator can grant other users access to management console functions at one of three levels: User-level access allows viewing of node status only, operator-level access authorizes a user to manage nodes and resource groups, and administrator-level access adds the ability to create and modify configurations. The administrator can also restrict the management console to run only from a particular Windows domain or workstation. A command-line interface program provides access to management console functions, with additional support for debugging and batch processing.

Co-StandbyServer AAdvanced performs block-level synchronous data mirroring between a FAT or NTFS volume on the source server and a volume of the same type and drive letter on the target server. Mirroring is bidirectional: Blocks written to either server are mirrored to the other. Co-StandbyServer AAdvanced intercepts writes to the source volume, writes the changed blocks to the target volume, and waits for the target write to finish before committing the write to the source volume. Because a write to a remote volume takes some time to finish (this lag time is known as the write latency), LEGATO recommends that the standby server be no more than 10km from the primary server. LEGATO recommends using a dedicated network link for replication traffic but doesn't require it.

AAM. AAM uses the same architecture as Co-StandbyServer AAdvanced but supports as many as 100 nodes and has many additional features. You can use a system of rules supported by triggers, sensors, and actuators to automate AAM responses to a variety of situations. A trigger can be fired according to a preset schedule, in response to a monitored event, in response to an event-log entry, or on demand. A sensor monitors some aspect of a system or application and fires a trigger. AAM includes a software development kit (SDK) that you can use to generate custom sensors and actuators. Actuators, modules invoked by rules, communicate with AAM through APIs included with the SDK and can control the clustered application.
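The relationship among sensors, triggers, rules, and actuators can be modeled in a few lines. The following is a conceptual sketch only; it doesn't use or imitate the AAM SDK, and every name in it (RuleEngine, free_space_sensor, purge_logs_actuator, the trigger name) is hypothetical.

```python
class RuleEngine:
    """Maps trigger names to the actuators that rules invoke."""

    def __init__(self):
        self.rules = {}

    def add_rule(self, trigger, actuator):
        self.rules.setdefault(trigger, []).append(actuator)

    def fire(self, trigger, context):
        """Run every actuator bound to the trigger; return their results."""
        return [actuator(context) for actuator in self.rules.get(trigger, [])]


def free_space_sensor(engine, free_mb, threshold_mb):
    """Sensor: watches one aspect of the system and fires a trigger."""
    if free_mb < threshold_mb:
        return engine.fire("low-disk-space", {"free_mb": free_mb})
    return []


def purge_logs_actuator(context):
    """Actuator: the action a rule takes in response to a trigger."""
    return f"purging logs, only {context['free_mb']} MB free"


engine = RuleEngine()
engine.add_rule("low-disk-space", purge_logs_actuator)
actions = free_space_sensor(engine, free_mb=120, threshold_mb=500)
```

The same fire method could be called from a scheduler or an event-log watcher, which corresponds to the other ways the paragraph above says a trigger can be fired.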

Co-StandbyServer AAdvanced pricing starts at $5000 for a pair of servers; additional charges depend on the applications being protected. AAM starts at $3000 per server; additional charges depend on the applications being protected and the size of the servers.

LifeKeeper for Windows 2000
SteelEye Technology’s LifeKeeper for Windows 2000 has a venerable history. AT&T's Bell Labs developed the original version to support crucial telephone-related systems. NCR further developed the product and sold it to SteelEye. SteelEye markets versions of LifeKeeper for Linux as well as for Windows 2003, Win2K, and NT, including both server and workstation versions of Win2K and NT. I describe LifeKeeper for Windows 2000 4.2 in this article, but the various versions of LifeKeeper have similar feature sets. Note that all the nodes in a cluster must run the same LifeKeeper version and the Win2K GUI can monitor, but not control, clusters running other LifeKeeper versions.

LifeKeeper supports failover for SCSI and Fibre Channel shared-storage server configurations. When combined with LifeKeeper Data Replication (also known as Extended Mirroring), LifeKeeper supports both local and remote replicated-data failover configurations. In configurations with SCSI shared storage, LifeKeeper supports two- to four-node clusters. In configurations that use a Fibre Channel SAN, LifeKeeper supports clusters with as many as 32 nodes. When using replication, LifeKeeper supports two-node clusters. SteelEye supports LifeKeeper on any hardware on the Microsoft HCL—the hardware need not be Microsoft cluster certified.

LifeKeeper consists of several components: LifeKeeper Core Software includes failover support for volume, IP address, NetBIOS name, and generic application resources. Core Software uses a configuration database, which LifeKeeper replicates to other nodes in the cluster and in which LifeKeeper records information about protected resources, dependencies, recovery scripts, and the current state of cluster and node operation. A communications manager determines the status of servers in the cluster and signals a server’s failure when the manager loses all communication with the server. The Alarm interface triggers events when LifeKeeper detects a resource failure and works with the Recovery Action and Control interface to locate and run the appropriate recovery script.

Optional licensable Application Recovery Kits (ARKs) let you configure failover support for a variety of applications. An Application Protection Agent, supplied with each ARK, monitors the health of the application and the server resources required to support it. Kits are available for Microsoft IIS, SQL Server, Exchange, IBM's DB2, Oracle9i, and many other applications. When no recovery kit exists for your application and the Generic ARK that's included in the basic LifeKeeper license doesn’t meet your needs, a LifeKeeper SDK helps you create your own recovery kit.

Each node in a LifeKeeper cluster monitors other cluster nodes by using a heartbeat mechanism. SteelEye recommends that each node in a cluster have more than one communication path to the other cluster nodes. That way, when one path fails, the next priority path can take over. When a standby server receives no heartbeat from the primary server over any path, LifeKeeper performs a safety check to ensure that the primary server is actually down. Only after verifying that the primary server is down does LifeKeeper initiate failover.
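The heartbeat logic described above, with prioritized paths and a safety check before failover, can be sketched as follows. The primary_state function, the path representation, and the return values are assumptions for illustration and aren't taken from LifeKeeper itself.

```python
def primary_state(paths, safety_check):
    """Evaluate the primary server from the standby's point of view.

    paths is a list of (priority, heartbeat_received) tuples, where a lower
    number means a higher priority. safety_check is a callable that returns
    True only when the primary is confirmed to be down.
    """
    alive = [priority for priority, received in paths if received]
    if alive:
        return ("up", min(alive))      # use the best path that still works
    if safety_check():
        return ("failover", None)      # no heartbeat and confirmed down
    return ("unverified", None)        # no heartbeat, but not confirmed down


one_path_lost = primary_state([(1, False), (2, True)], lambda: True)
confirmed_down = primary_state([(1, False), (2, False)], lambda: True)
not_confirmed = primary_state([(1, False), (2, False)], lambda: False)
```

The safety check runs only when every path is silent, so a single failed path never leads to a failover.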

A node can run multiple applications. From LifeKeeper’s perspective, each application is a group of resources ordered to form a resource hierarchy. Administrators can configure the various resource hierarchies on a node to fail over to different standby servers; that is, LifeKeeper supports a one-to-many failover architecture.
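One way to picture a resource hierarchy is as a dependency graph that fixes the order in which resources come online. The sketch below uses Python's standard graphlib module (Python 3.9 or later). The resource names, the node names, and the assumption that dependencies start before the application are all illustrative and not drawn from LifeKeeper's documentation.

```python
from graphlib import TopologicalSorter

# Each resource maps to the set of resources it depends on (assumed names).
sql_hierarchy = {
    "sql-server": {"volume-e", "ip-address"},
    "volume-e": set(),
    "ip-address": set(),
}


def start_order(hierarchy):
    """Return the resources in an order that starts dependencies first."""
    return list(TopologicalSorter(hierarchy).static_order())


# One-to-many failover: each hierarchy on a node names its own standby server.
failover_targets = {
    "sql-hierarchy": "node-b",
    "iis-hierarchy": "node-c",
}

order = start_order(sql_hierarchy)
```

Under these assumptions the volume and IP address come online before SQL Server, and the two hierarchies on the same node fail over to different standby servers.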

SteelEye touts ease of use as a key product feature. A wizard guides you through the node configuration process, and the ARKs install the resource types that the application needs. The LifeKeeper GUI client, a Java application running on the cluster node or accessed remotely by using a Web browser, lets administrators manage clusters from any workstation. Figure 4 shows the status of SQL Server resources in a two-node cluster. The LifeKeeper Configuration Database (LCD) and LifeKeeper Configuration Database Interface (LCDI) commands let administrators script management functions.

LifeKeeper’s replication engine performs block-level mirroring and replication for specified logical volumes. The volumes on the source and target servers must use the NTFS format, have the same drive letter, and reside on Basic, not Dynamic, disks. LifeKeeper monitors I/O writes to the source volume and forwards updated blocks to the target volume. LifeKeeper lets you choose synchronous or asynchronous replication, lets you schedule asynchronous replication for a particular time of day, and provides scripts that support a disk-to-disk data backup strategy.

LifeKeeper starts at $3000 per server. A complete Exchange solution costs about $5900 per server.

VERITAS Storage Foundation HA 4.1 for Windows
VERITAS Storage Foundation HA 4.1 for Windows is an enhanced and repackaged product that incorporates the features of VERITAS Cluster Server and technology drawn from other VERITAS products. VERITAS supports four cluster configurations. For a pure high-availability solution, Storage Foundation HA supports local clusters that use shared storage, the architecture supported by Microsoft Cluster service. For what VERITAS describes as a metropolitan-area disaster-recovery solution, Storage Foundation HA supports cluster nodes that include SAN-based mirroring solutions such as those offered by EMC, Hitachi, and IBM. Storage Foundation HA also supports replication-based disaster-recovery configurations when implemented with VERITAS Volume Replicator. Combining local high availability with remote disaster recovery, Storage Foundation HA and VERITAS Global Cluster Manager connect two local clusters in different locations so that an application can fail over to another local node for high availability and fail over to a remote cluster when the primary site is down. Global Cluster supports configurations in which primary site data is replicated to the cluster at the backup site by using either the Volume Replicator or SAN-based storage mirroring systems. Storage Foundation HA supports clusters of as many as 32 nodes.

VERITAS supports any server hardware found on the Microsoft HCL—it doesn't require cluster-certified server configurations. VERITAS does require that customers use SAN systems that have passed its hardware testing to ensure that Storage Foundation HA's dynamic multipathing features will function properly and to ensure that Storage Foundation HA will work reliably when customers use SAN-switch–based multipathing features.

Storage Foundation HA uses a standard InstallShield interface for the initial installation and can push code out to other cluster nodes. Storage Foundation HA's incorporation of other VERITAS products has simplified the installation process for many configurations.

You can choose from three administrative interfaces: a Java-based GUI, a Web-based interface that communicates with an Apache-based Web server installed with the product, and a command-line interface. Figure 5 shows the Web interface displaying the status of two clusters. All three interfaces offer full functionality, including the ability to remotely administer multiple clusters. You control access to administrative functions by using a new component, VERITAS Security Services, that lets you grant authority over clusters to VERITAS-defined users and groups (which you can associate with existing Windows users and security groups). Security Services also supports other Lightweight Directory Access Protocol (LDAP) security services, such as Network Information Service (NIS), in multi-OS environments.

Storage Foundation HA’s Volume Replicator performs block-based replication—intercepting and duplicating I/O to a logical volume—and includes features of VERITAS Volume Manager, now renamed VERITAS Storage Foundation. Storage Foundation HA supports both synchronous and asynchronous replication. A soft-synchronous mode writes in synchronous mode until the communication link to the remote volume fails. When remote writes are uncompleted for a user-configurable length of time, the replication engine begins to queue writes until communication to the remote volume is restored.
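The soft-synchronous behavior can be modeled as a small state machine. The sketch below is a simplified assumption of how such a mode might work, not VERITAS code: the SoftSyncReplicator class, the queue_after_seconds parameter, and the "waiting" result for writes that stall before the timeout are all hypothetical.

```python
class SoftSyncReplicator:
    """Synchronous while the link is up; queues writes after a stall timeout."""

    def __init__(self, queue_after_seconds):
        self.queue_after_seconds = queue_after_seconds
        self.source = {}
        self.target = {}
        self.queue = []
        self.link_up = True
        self.down_since = None  # time of the first write that could not complete

    def write(self, index, data, now):
        if self.link_up:
            self.target[index] = data      # remote write confirmed first
            self.source[index] = data
            return "synchronous"
        if self.down_since is None:
            self.down_since = now
        if now - self.down_since < self.queue_after_seconds:
            return "waiting"               # write not completed; caller retries
        self.source[index] = data          # let the application continue
        self.queue.append((index, data))
        return "queued"

    def link_restored(self):
        """Drain the queue to the target and return to synchronous mode."""
        for index, data in self.queue:
            self.target[index] = data
        self.queue.clear()
        self.link_up = True
        self.down_since = None


engine = SoftSyncReplicator(queue_after_seconds=5)
results = [engine.write(0, b"a", now=0)]
engine.link_up = False
results.append(engine.write(1, b"b", now=1))   # stalls, still within the limit
results.append(engine.write(1, b"b", now=7))   # limit exceeded, write is queued
engine.link_restored()
```

While writes are queued, the target lags the source, so the mode trades the synchronous guarantee for continued application processing during the outage.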

After a failover, replicating data back to the primary node so that the application can again run on that node can take a long time. Storage Foundation HA tracks which blocks of a logical volume are updated following a failover and, when the time comes to fail back, replicates only the changed blocks back to the primary node.
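To see why tracking changed blocks shortens failback, consider the following simplified Python sketch. It assumes a volume modeled as a list of fixed-size blocks and a set of "dirty" block numbers; the names `TrackedVolume` and `fail_back` are invented for illustration and don't reflect how Storage Foundation HA is implemented.

```python
class TrackedVolume:
    """Toy logical volume that remembers which blocks changed (illustrative)."""

    def __init__(self, num_blocks, block_size=4):
        self.blocks = [bytes(block_size)] * num_blocks
        self.dirty = set()      # block numbers written since tracking began
        self.tracking = False

    def start_tracking(self):
        # Called at failover, when this volume becomes the active copy.
        self.dirty.clear()
        self.tracking = True

    def write(self, index, data):
        self.blocks[index] = data
        if self.tracking:
            self.dirty.add(index)


def fail_back(standby, primary):
    """Copy only the blocks that changed on the standby; return how many."""
    copied = 0
    for index in sorted(standby.dirty):
        primary.blocks[index] = standby.blocks[index]
        copied += 1
    standby.dirty.clear()
    return copied
```

If only two blocks of an eight-block volume change while the application runs on the standby node, `fail_back` copies two blocks rather than eight. A block written several times is still copied only once.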

Storage Foundation HA includes features that help you set up replication. Given sufficient unallocated volume space on the target system, Storage Foundation HA automatically creates volumes to match the volumes on the source system. Storage Foundation HA is aware of a variety of both hardware- and OS-based data mirroring products and distinguishes between a mirror broken by administrative action and one broken by an error event.

Storage Foundation HA virtualizes a variety of cluster node resources, including IP address and host name, and assigns them to the backup node at failover. Storage Foundation HA supports workload management features, so you can specify a minimum level of free resources that a node must have before Storage Foundation HA will consider it an eligible failover node. Storage Foundation HA costs $3995 for Windows 2003 Standard Edition and Win2K Server, $4995 for Windows 2003 Enterprise Edition and Win2K AS, and $49,995 for Windows 2003 Datacenter Edition and Win2K Datacenter Server.

BrightStor High Availability
Computer Associates' (CA's) BrightStor High Availability is a replication-based application failover system designed for use with standard server hardware running Windows 2003, Win2K Server, or Win2K AS. Process-monitoring technology from CA's Unicenter, combined with a heartbeat mechanism, monitors the health of primary and standby servers at several levels. Configurable IP ping-based checking distinguishes server-based communication failures from network-related problems and verifies that the communications paths to the servers are working. Application-specific monitoring checks the health of the protected application service or process. When BrightStor detects a failure, it attempts to restart a failed service or process before declaring failure and initiating a failover. When the primary server becomes active again after a failover and sees that the backup server is active, BrightStor deactivates the primary’s resources that failed over to the backup server (e.g., IP addresses, server names). This feature helps to avoid split-brain processing.
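The following Python sketch shows one way the two monitoring ideas above could work in principle. It is a guess at the general technique, not BrightStor's actual logic: pinging a second, well-known "reference" address to tell a dead server from a dead network is an assumption, and both function names are hypothetical.

```python
def classify_ping_results(primary_replies, reference_replies):
    """Interpret one round of pings sent from the standby server.

    primary_replies:   True if the primary server answered.
    reference_replies: True if a reference address (an assumption here,
                       for example a router both servers share) answered.
    """
    if primary_replies:
        return "primary reachable"
    if reference_replies:
        # The network path works, so the primary itself is the likely problem.
        return "server failure"
    # Nothing answers, so the standby's own network path is suspect.
    return "network problem"


def respond_to_service_failure(restart_service, max_restart_attempts=1):
    """Try restarting a failed service before resorting to failover.

    restart_service is a hypothetical callable that returns True when
    the service comes back up.
    """
    for _ in range(max_restart_attempts):
        if restart_service():
            return "restarted"
    return "failover"
```

Only the "server failure" result would justify a failover in this model. Treating a "network problem" as a server failure is the kind of mistake that leads to split-brain processing.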

BrightStor uses protection tasks to define application replication and failover parameters. When creating a protection task for an application, you specify the application’s executable files and installed services, the source and destination for replicated data if any, IP addresses that will fail over to the stand-in server, and addresses that the out-of-service primary server will use. BrightStor also supports applications that replicate their own data. For some applications, including SQL Server and Exchange, BrightStor can automatically detect application parameters and configure itself for replication and failover. BrightStor supports another list of applications, including some non-Microsoft email and database management systems (DBMSs), with a set of user-customizable scripts that make setup easier.

BrightStor has three modes of operation. Full Protection causes the protected application to fail over to the stand-in server when failover conditions are met. In Data Protection Only mode, the protection task replicates data to the secondary server until a failure, but the application doesn't fail over. Failover Only performs failover when the conditions are met but doesn't perform data replication. BrightStor also supports failover to nodes residing on remote subnets. For more information about BrightStor's methods for failing over server IP addresses, see "Remote Cluster Considerations."
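The three modes differ only in whether they replicate data and whether they fail the application over, which a short Python sketch can make explicit. The enum and function names below are illustrative and aren't part of any BrightStor interface.

```python
from enum import Enum


class ProtectionMode(Enum):
    # Each value is (replicates data, fails the application over).
    FULL_PROTECTION = (True, True)
    DATA_PROTECTION_ONLY = (True, False)
    FAILOVER_ONLY = (False, True)

    @property
    def replicates_data(self):
        return self.value[0]

    @property
    def fails_over(self):
        return self.value[1]


def should_fail_over(mode, failover_conditions_met):
    """Return True if the application should move to the stand-in server."""
    return mode.fails_over and failover_conditions_met
```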

BrightStor's Alert Services component lets you configure who will receive notification of failover events and of service and process failures that the protection task monitors. Alert Services can send notifications through SNMP traps, email messages, or pages.

For a successful failover to occur, the protection task must be running on the primary server, and BrightStor High Availability Manager must be running on the secondary server. After you've corrected the problem that caused the failover, you run the BrightStor Reinstatement Wizard, which guides you through the process of replicating changed data back to the primary server before failing the application back to it. Performing a normal shutdown of the primary server won’t cause a failover—BrightStor sees a shutdown as a nonfailure event.

You use the BrightStor High Availability Manager GUI, a Win32 application that Figure 6 shows, to configure protection tasks and manage BrightStor tasks and alerts on the servers. It runs on Windows 2003, XP, and Win2K computers. The user ID that BrightStor High Availability Manager runs under must have administrative authority over the server being managed.

Replication is asynchronous. You specify the data to be replicated to the secondary server at the directory level. A file filter allows more granular control over which files in a directory structure are replicated. Drive letters and directory structures on the secondary server need not be the same as on the primary server.
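As a rough illustration of directory-level selection with a file filter and a different directory structure on the secondary server, consider the Python sketch below. The function name, the exclude-pattern approach, and the sample paths are assumptions made for the example; the article doesn't describe BrightStor's filter syntax.

```python
from fnmatch import fnmatchcase
from pathlib import PureWindowsPath


def target_path(source_file, source_root, target_root, exclude_patterns=()):
    """Map a file on the primary server to its location on the secondary.

    Returns None when the file lies outside the replicated directory or
    matches one of the (hypothetical) exclude patterns.
    """
    src = PureWindowsPath(source_file)
    try:
        relative = src.relative_to(PureWindowsPath(source_root))
    except ValueError:
        return None  # not under the replicated directory
    name = src.name.lower()
    if any(fnmatchcase(name, pattern.lower()) for pattern in exclude_patterns):
        return None  # filtered out
    return str(PureWindowsPath(target_root) / relative)
```

For example, with a source directory of D:\Data\SQL and a target directory of E:\Replica\SQL, the file D:\Data\SQL\logs\a.ldf maps to E:\Replica\SQL\logs\a.ldf, and a `*.tmp` exclude pattern keeps scratch files off the replication link.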

BrightStor keeps track of disk sectors that change while an application runs on the backup server, so when you fail the application back to the primary server, BrightStor replicates back only the changes. BrightStor licensed for one primary server and one secondary server costs $2495.

Double-Take, GeoCluster, and GeoCluster+
NSI Software offers three high-availability products. Double-Take supports data replication and failover from one node to another that has the same application installed. GeoCluster enhances Microsoft Cluster service with support for replicated-data configurations, letting customers create Microsoft Cluster service stretch clusters that extend over geographically dispersed sites. GeoCluster+ makes cluster data available outside the initial cluster on any other storage resource, including nodes that don't run Microsoft Cluster service.

Double-Take combines a replication engine with automatic or manual application failover support. Double-Take supports one-to-one active-active and active-standby configurations, as well as many-to-one configurations in which one server is configured to stand in for applications on several primary servers.

A Double-Take replication set, described in more detail below, defines a set of data to be replicated. A Double-Take connection links a replication set to a target machine and defines the location to receive replicated data. A connection can specify the NIC on the target server to receive the data and cause the initial mirroring of data to begin when the network connection is first established. Alternatively, you can schedule mirroring or start it manually.

At the target machine, Double-Take initiates monitor requests (Internet Control Message Protocol—ICMP—pings) to the monitored IP addresses on the source machine and declares failure when a user-specified timeout period passes with no response. You can configure failover to occur automatically or initiate it manually. When failover occurs, Double-Take executes an optional, administrator-supplied prefailover script on the stand-in machine. Double-Take then adds the name and IP address of the source machine to the identity of the stand-in machine, creates any file shares that are within the scope of the replication set, and updates the Address Resolution Protocol (ARP) cache of local routers to ensure that client requests are sent to the new machine. An optional administrator-defined post-failover script completes the failover process.
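The detection rule and the order of the failover steps can be summarized in a short Python sketch. This is a simplified model of the sequence described above, not Double-Take code: the class, the function, and the script file names are hypothetical, and the "steps" are simply returned as text rather than carried out.

```python
class SourceMonitor:
    """Declares the source failed after a period with no ping replies."""

    def __init__(self, timeout_seconds, started_at=0.0):
        self.timeout_seconds = timeout_seconds
        self.last_reply_at = started_at

    def reply_received(self, now):
        self.last_reply_at = now

    def source_failed(self, now):
        return now - self.last_reply_at > self.timeout_seconds


def failover_plan(source_name, source_ips, shares,
                  pre_script=None, post_script=None):
    """List the failover steps in the order the stand-in machine runs them."""
    steps = []
    if pre_script:
        steps.append(f"run prefailover script {pre_script}")
    steps.append(f"add server name {source_name}")
    steps.extend(f"add IP address {ip}" for ip in source_ips)
    steps.extend(f"create share {share}" for share in shares)
    steps.append("update ARP caches on local routers")
    if post_script:
        steps.append(f"run post-failover script {post_script}")
    return steps
```

Both scripts are optional, so a plan built without them contains only the identity, share, and ARP steps.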

You manually initiate failback on the now-active target machine after removing the original source machine from the network. The failback process removes the name, IP address, and shares added to the identity of the target at failover and optionally runs prefailback and post-failback scripts. You then place the repaired source machine back on the network and remirror replication set data before letting users connect.

The Double-Take Management Console, which Figure 7 shows, is a Win32 GUI that you can use to manage and monitor servers running Double-Take in attached networks. Within the console, the Connection Wizard lets you quickly configure basic one-to-one mirroring, replication, and failover. The Connection Manager component lets you further configure replication tasks. You use another Win32 GUI, the Failover Control Center, to configure and monitor failover activity. You can also choose to manage or script Double-Take operations by using the fully functional DTCL command-line utility, or you can use the interactive DTTEXT command-line interface. Double-Take also includes SNMP support, so it can relay statistics, data, and SNMP traps to an SNMP-based management console.

Double-Take implements asynchronous file replication by intercepting and queuing file-system writes in a manner that preserves the order of I/O operations. Supporting NTFS, FAT, and FAT32 file systems, Double-Take replicates changes to file attributes and permissions in addition to data. You define data to be replicated by creating a replication set, which specifies some combination of volumes, directories, and files. The directory structure of the target volume need not be the same as the source volume—you can specify a directory on the target server to which Double-Take will replicate the data. Thus, you can replicate from several source servers to one target server, if desired. You can also create several replication sets for one source server and send the sets to different locations at different times.
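Preserving the order of I/O operations matters because two writes to the same region of a file produce different results when applied in the wrong order. The Python sketch below models the idea with a first-in, first-out queue; the class and function names are invented for illustration, and the target is just an in-memory dictionary of file contents.

```python
from collections import deque
from itertools import count


class ReplicationQueue:
    """Queues intercepted writes and releases them in arrival order."""

    def __init__(self):
        self._sequence = count(1)
        self._pending = deque()

    def intercept(self, path, offset, data):
        # Record the write with a sequence number; the caller does not wait
        # for the target, which is what makes the replication asynchronous.
        number = next(self._sequence)
        self._pending.append((number, path, offset, data))
        return number

    def transmit(self, apply, max_operations=None):
        # Send queued writes oldest first; return how many were sent.
        sent = 0
        while self._pending and (max_operations is None or sent < max_operations):
            _, path, offset, data = self._pending.popleft()
            apply(path, offset, data)
            sent += 1
        return sent


def apply_to_target(files, path, offset, data):
    """Apply one write to the target copy (a dict of bytearrays)."""
    buffer = files.setdefault(path, bytearray())
    end = offset + len(data)
    if len(buffer) < end:
        buffer.extend(bytes(end - len(buffer)))  # pad with zero bytes
    buffer[offset:end] = data
```

If the source writes "hello" at offset 0 and then overwrites the first byte with "J", the target ends up with "Jello" only because the queue replays the two writes in the same order.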

Double-Take also supports chained replication, in which Server A replicates to Server B, and Server B replicates to Server C. You can use this configuration to create both local and offsite copies of data while limiting replication traffic from Server A to one target. Double-Take also supports single-server replication, which you can use, for example, to create a backup copy of application data on another SAN or SCSI volume.

Double-Take lets you transmit queued data as file changes occur, according to a preset schedule or manually. You can specify bandwidth limitations to restrict the amount of network bandwidth used for Double-Take data transmissions. You can also schedule a periodic replication verification that compares the source and target copies of the replication set and reports on differences.
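A replication verification of this kind can be pictured as a comparison of checksums. The Python sketch below is a generic illustration that uses SHA-256 digests over in-memory file contents. The article doesn't say how Double-Take compares the copies, so the approach and all the names here are assumptions.

```python
import hashlib


def digest(data):
    return hashlib.sha256(data).hexdigest()


def verify_replication_set(source_files, target_files):
    """Compare source and target copies and report the differences.

    Both arguments map a relative file path to that file's contents (bytes).
    """
    report = {"missing_on_target": [], "different": []}
    for path in sorted(source_files):
        if path not in target_files:
            report["missing_on_target"].append(path)
        elif digest(source_files[path]) != digest(target_files[path]):
            report["different"].append(path)
    return report
```

An empty report means the target copy matches the source for every file in the replication set.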

Double-Take, GeoCluster, and GeoCluster+ pricing is based on the OS edition on which the software is installed. At the lower end, Double-Take, server edition, is $2495; GeoCluster, advanced edition, is $4495; and GeoCluster+, advanced edition, is $7495. At the high end, the datacenter edition of Double-Take and GeoCluster is $39,995, and the datacenter edition of GeoCluster+ is $59,995.

Summing Up
Table 1 lists these high-availability and fault-tolerant products and the technologies they use. Although the products have features in common, each is unique.

BrightStor provides replication-based failover. The product incorporates file-system replication, rapid failback replication, and process-monitoring technology from CA's Unicenter enterprise-management solution.

Co-StandbyServer AAdvanced supports two-node shared-storage and replicated-data configurations and includes an application availability reporting feature. AAM supports 100 nodes and a system of rules and triggers that allows automated responses to a variety of situations.

Endurance offers the only true fault-tolerant solution described here. It turns two identically configured servers into one fault-tolerant entity. Endurance supports almost any off-the-shelf Windows application.

Double-Take has a file-oriented replication engine that can send less data across the replication network than a comparable block-oriented replication engine. Replication options include many-to-one, one-to-many, chained, and same-server configurations. GeoCluster enhances Microsoft Cluster service by adding support for replicated-data configurations to Microsoft Cluster service's native support, which covers only shared-storage configurations. GeoCluster+ enables remote availability of data on any storage resource, including nodes that don't run Microsoft Cluster service.

Matrix Server is a shared-storage cluster server that supports up to 16 nodes. The PolyServe SAN File System distinguishes this product from others by giving multiple cluster nodes concurrent read/write access to one copy of shared data.

LifeKeeper supports both shared-disk and replicated-data cluster configurations. With its wizard-based configuration procedures and the automated application setup that its application-specific recovery kits provide, SteelEye touts ease of use and broad application support as key features.

Storage Foundation HA 4.1 for Windows supports a broad range of cluster configurations from basic replicated-data and shared-storage configurations to systems that include high-end storage management products. Flexible security options support granular delegation of administrative authority.

This comparison of high-availability solutions and visits to the vendor Web sites should help you determine which solution best suits your needs.