Executive Summary:
LifeKeeper Protection Suite for Windows (LPSW) from SteelEye Technologies combines two SteelEye products: SteelEye Data Replication (SDR), which provides volume replication support, and LifeKeeper for Windows (LKW), which provides high availability features. LPSW worked well and was easy to implement. However, although the documentation for the underlying software components and recovery kits was well organized and easy to follow, it lacked the level of integration you would expect, considering the single-product image that SteelEye is marketing. The wizards made creation of resources and dependencies easy, though you really need to use the documentation to see which resources and dependencies you need to create to protect your application.
LifeKeeper Protection Suite for Windows
LifeKeeper Protection Suite for Windows (LPSW) from SteelEye Technologies combines two SteelEye products: SteelEye Data Replication (SDR), which provides volume replication support, and LifeKeeper for Windows (LKW), which provides high availability features. I tested version 6.1.2. In this version, the two products, though bundled together, install as separate services and have separate documentation and separate management interfaces. A single setup routine integrates installation of the two products. Administration is somewhat integrated, as LifeKeeper automatically configures SDR when you configure a failover scenario that requires it. Flexibility is a hallmark of LPSW and is supported by an extensive feature set. Key features include block-oriented synchronous or asynchronous volume replication, a variety of failover modes supporting shared or replicated storage on both physical and virtual servers, and a new continuous data protection (CDP) function within the recovery feature set.
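The practical difference between synchronous and asynchronous replication is when a write counts as complete. The following is a toy Python model of that distinction only, not SDR's implementation; the class name, the block-number-to-bytes layout, and the explicit `drain()` step are illustrative assumptions. In synchronous mode a write lands on both copies before returning; in asynchronous mode it is acknowledged after the local write and shipped to the mirror later, so the mirror can lag behind.

```python
from collections import deque


class ReplicatedVolume:
    """Toy model of block-level mirroring (hypothetical; not SDR's design)."""

    def __init__(self, synchronous: bool) -> None:
        self.synchronous = synchronous
        self.source = {}        # block number -> data on the primary volume
        self.target = {}        # block number -> data on the mirror volume
        self.pending = deque()  # (block, data) pairs not yet sent to the mirror

    def write(self, block: int, data: bytes) -> None:
        self.source[block] = data
        if self.synchronous:
            # Synchronous: the write is complete only once the mirror has it too.
            self.target[block] = data
        else:
            # Asynchronous: acknowledge immediately and queue the block for later.
            self.pending.append((block, data))

    def drain(self) -> None:
        """Ship queued blocks to the mirror (only matters in asynchronous mode)."""
        while self.pending:
            block, data = self.pending.popleft()
            self.target[block] = data
```

Blocks still sitting in `pending` at the moment of a failure are the data an asynchronous mirror can lose, which is the usual trade-off against the extra write latency of synchronous mode.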
A LifeKeeper cluster consists of two or more interconnected servers. A cluster can include servers that are local or remote to the primary application server, and administrators can configure failover to a standby server to occur either automatically or manually. There can be several standby servers in a cluster. The servers' hardware configurations need not be identical, as long as every server in the cluster has the capacity to handle the failover load.
LifeKeeper core components include a configuration database, a communications manager, an alarm interface used to trigger system (not user notification) events, and a control interface to locate the correct scripts used for recovery actions. LifeKeeper requires at least two communication paths between the servers—one or more for LifeKeeper heartbeat communications (a periodic message between paired nodes that detects faults), and one or more for normal server communications. SteelEye recommends a private IP network connection for the primary heartbeat path and also supports shared disk and RS232 serial connections. The LifeKeeper GUI runs as a Web client under one of several supported browsers with Java installed. LifeKeeper also installs an executable form of the GUI client, which Web Figure 1 shows, for local administration of a LifeKeeper cluster. Both clients look and work the same. LifeKeeper supports three access security levels for the GUI: Administrator, Operator and Guest.
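The review doesn't describe LifeKeeper's heartbeat protocol in detail, so the following is a hypothetical Python sketch of the general technique: record when a heartbeat last arrived on each path, treat a path as dead after a few missed intervals, and declare the peer failed only when every path has gone quiet. The class name, the one-second interval, and the three-miss limit are illustrative assumptions, not LifeKeeper settings.

```python
import time


class HeartbeatMonitor:
    """Hypothetical multi-path heartbeat tracker (not LifeKeeper's code)."""

    def __init__(self, paths, interval=1.0, missed_limit=3, clock=time.monotonic):
        self.interval = interval          # assumed seconds between heartbeats
        self.missed_limit = missed_limit  # assumed misses before a path is dead
        self.clock = clock                # injectable so the logic can be tested
        now = clock()
        # Last time a heartbeat arrived on each path, e.g. private and public LAN.
        self.last_seen = {path: now for path in paths}

    def record_heartbeat(self, path):
        self.last_seen[path] = self.clock()

    def path_alive(self, path):
        return self.clock() - self.last_seen[path] < self.interval * self.missed_limit

    def peer_failed(self):
        # A single silent path is not enough; the peer is down only if all are.
        return not any(self.path_alive(p) for p in self.last_seen)

    def degraded(self):
        # Fewer than two live paths leaves no redundancy in fault detection.
        return sum(1 for p in self.last_seen if self.path_alive(p)) < 2
```

Requiring every path to fall silent before declaring failure is what protects against a single broken network link being mistaken for a dead server.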
LPSW includes application recovery support for file server resources, including volumes and file shares, and for Microsoft IIS. LifeKeeper Recovery Kits are available for many applications and simplify configuration of application protection, including failover. A generic Application Recovery Kit lets you write scripts supporting the recovery of other applications. Other editions of LifeKeeper Protection Suite are available for Microsoft Exchange, for a variety of databases including Microsoft SQL Server, for Linux-based applications, and for protection of VMware Virtual Center. Windows editions are supported on x86 and x64 versions of Windows Server 2003 and Windows 2000 Server.
LifeKeeper supports many standard storage types, including iSCSI, Fibre Channel SAN, shared SCSI, and Windows volumes with Volume Shadow Copy Service (VSS) snapshots. Windows fault-tolerant disk sets are the exception; they are not supported.
To test LPSW, I used two Windows Server 2003 systems configured with IIS and a disk volume defined with a share. Each system also had two Ethernet cards, one for LifeKeeper's heartbeat network, the other for normal server communications.
Following the Planning and Installation Guide, I installed LPSW to both servers, applied the license keys I was given for testing to both servers, and started the executable version of the LifeKeeper GUI on the server I designated as my primary system. An Administrative QuickStart Configuration Assistant online Help page guided me through LifeKeeper’s initial configuration.
Defining a communications path for LifeKeeper heartbeat communication was the first step. A wizard guided me through the process, and I designated the appropriate network interface on the local and remote servers.
The next step was to define a protected resource hierarchy, using features of the basic recovery kit included with LifeKeeper core components. A wizard guided me through the process of creating a volume resource and configured SDR to mirror the volume’s data to the other server.
I also created an IP address resource, which designates an IP address that LifeKeeper switches to the standby server at failover. Adding a DNS host record for this IP address allows users to access the server by a virtual server name independent of the physical server names assigned to the primary and standby servers. Defining a DNS resource causes LifeKeeper to alter the IP address of the specified DNS entry to that of the active server. I tested this feature by defining a DNS resource. The DNS resource wizard asked for credentials to use to create the virtual host name on the authoritative DNS server and created the name pointing to the currently active server.
When creating a resource, LifeKeeper first defines it on the local server, then gives you the option to extend the definition to one or more standby servers. The wizard lets you specify a resource priority for each standby server; at failover, LifeKeeper reassigns the resource to the available standby server with the lowest priority number. Because each resource has its own priority on each standby server, this feature lets you fail over different resources to different standby servers. After creating the resources, I created dependencies between them, which lets LifeKeeper operate on the resources as a group.
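The priority rule is simple enough to express in a few lines. This Python sketch only illustrates the selection logic as described above; the function name, server names, and resource names are hypothetical, and it ignores everything else LifeKeeper weighs during a real failover.

```python
def pick_standby(priorities, available):
    """Return the available standby with the lowest priority number, or None.

    priorities: mapping of standby server name -> priority number for ONE resource
    available:  set of standby server names that are currently up
    """
    candidates = [(prio, name) for name, prio in priorities.items() if name in available]
    return min(candidates)[1] if candidates else None


# Hypothetical: two independent resources with opposite priorities on two standbys.
file_share_volume = {"standby-a": 1, "standby-b": 2}
web_volume = {"standby-a": 2, "standby-b": 1}
```

With both standbys up, `file_share_volume` goes to `standby-a` and `web_volume` goes to `standby-b`, spreading the failover load; if `standby-a` is down, both land on `standby-b`.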
To test failover, I configured LifeKeeper on the primary server to fail over upon shutdown, then shut down the server. In less than a minute, my file share was again accessible through the IP address I had assigned to the virtual server name I had created and defined in DNS. The share took about 5 minutes to become available through the DNS resource (the virtual host name that LifeKeeper altered to point to the active server), because my client's cached copy of the DNS record hadn't expired, even though LifeKeeper had already updated the DNS server with the new IP address. After bringing the primary server back up, I failed the resources back by bringing the top-level resource back in service on the primary server. Again, it took only a minute for the IP resource to become accessible and a few minutes more for the DNS resource.
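The delay on the DNS name comes from ordinary DNS caching: a client keeps answering from its cached record until that record's time-to-live (TTL) runs out, regardless of changes on the authoritative server. This toy Python resolver illustrates the effect; the host name, addresses, and the 300-second TTL are assumptions for illustration, since the review doesn't state the record's actual TTL.

```python
class CachingResolver:
    """Toy client-side DNS cache: serve cached answers until their TTL expires."""

    def __init__(self, authoritative, ttl):
        self.authoritative = authoritative  # name -> IP held by the DNS server
        self.ttl = ttl                      # seconds an answer may be cached
        self.cache = {}                     # name -> (ip, expiry time)

    def resolve(self, name, now):
        entry = self.cache.get(name)
        if entry and now < entry[1]:
            return entry[0]  # served from cache; may be stale after a failover
        ip = self.authoritative[name]
        self.cache[name] = (ip, now + self.ttl)
        return ip


dns = {"websrv": "10.0.0.11"}            # hypothetical virtual host name
resolver = CachingResolver(dns, ttl=300)  # assumed 5-minute TTL
```

If the client looks up `websrv` at t=0, failover changes the record at t=60, and the client asks again at t=60, it still gets `10.0.0.11`; only at t=300 does it fetch `10.0.0.12`. Connecting by IP address sidesteps the cache, which matches why the IP resource recovered faster than the DNS name. A shorter TTL on the virtual host record shrinks this window at the cost of more DNS queries.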
In spite of my successful definition of resources and failover testing, the LifeKeeper GUI showed both the primary and standby servers in a "warning" state. Through trial and error, later confirmed by the documentation, I figured out that LifeKeeper really wants you to define more than one heartbeat communication path. The warning icon changed to the OK icon after I defined an additional heartbeat path on the primary IP network.
Implementing and testing the Microsoft IIS Recovery Kit was similarly easy, although you must accommodate the kit's prerequisites. For example, the kit supports failover only to standby servers on the same logical LAN segment, because it uses the IP recovery kit to move the Web site's IP address to the standby server.
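Before planning an IIS failover, it can help to confirm that the Web site's switchable IP address actually falls within each standby server's subnet. The Python sketch below is a hypothetical pre-check an administrator might run, not a LifeKeeper feature; the function name and the example addresses are made up, and it uses only the standard `ipaddress` module.

```python
import ipaddress


def same_segment(switchable_ip, standby_interfaces):
    """Return the standby interfaces (CIDR notation) whose subnet contains the IP."""
    ip = ipaddress.ip_address(switchable_ip)
    return [iface for iface in standby_interfaces
            if ip in ipaddress.ip_interface(iface).network]
```

For instance, `same_segment("192.168.1.50", ["192.168.1.21/24", "10.1.0.5/16"])` keeps only `192.168.1.21/24`, flagging the second standby as unable to take over the address.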
Currently the IIS recovery kit supports IIS 5.0 and 6.0. For my test, I configured two virtual servers with a disk volume shared on a common SCSI bus and placed the Web site files on this volume. I selected a free IP address for the Web site, defined a virtual host name for that address in DNS, and configured IIS for the Web site on both servers to use that IP address. Using the LifeKeeper administrative GUI, I created the IP and IIS resources, making both dependent upon the volume resource. LifeKeeper added the switchable IP address to the primary server's IP configuration and showed the Web site as "protected."
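Dependencies matter because they fix the order in which resources change state: a resource can't come into service before the resources it relies on, and it must be taken out of service before them. This Python sketch models that with the standard `graphlib.TopologicalSorter` (Python 3.9+). The resource names are hypothetical, and the exact layout (IIS depending on both the IP address and the volume, the IP address depending on the volume) is an assumption based on the test setup described above, not LifeKeeper's internal model.

```python
from graphlib import TopologicalSorter

# Hypothetical hierarchy: each resource maps to the resources it depends on.
hierarchy = {
    "iis-website": {"ip-address", "volume"},
    "ip-address": {"volume"},
    "volume": set(),
}


def in_service_order(graph):
    """Bring-up order: every dependency comes before the resource that needs it."""
    return list(TopologicalSorter(graph).static_order())


def out_of_service_order(graph):
    """Shutdown order: dependents are stopped before what they rely on."""
    return list(reversed(in_service_order(graph)))
```

For this hierarchy the bring-up order is volume, then IP address, then Web site, and failover on the old server runs the reverse. `TopologicalSorter` also raises `CycleError` if dependencies ever loop, which is a useful sanity check on a hand-built hierarchy.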
I tested failover in several ways—by simulating power-off for the primary server, by bringing the Web site resource “in service” on the standby server, and by configuring the primary server to fail over upon shutdown. In all cases, the switchover proceeded smoothly, making the Web site accessible on the new server in only a minute or two.
LifeKeeper’s documentation is informative and useful. The Planning and Installation Guide thoroughly describes how to install, uninstall, configure, and troubleshoot. It also includes a nice introduction to the online documentation, where more detailed configuration and administration procedures are documented. The documentation for SteelEye Data Replication is similarly well done, though I had little need to reference it. From the perspective of the Protection Suite, the PDF documentation lacks integration, and only the online documentation based on the .chm file seems to reflect the combined feature set.
Worthy of Your Short List
Overall, I found that LPSW worked well and was easy to implement. However, although the documentation for the underlying software components and recovery kits was well organized and easy to follow, it lacked the level of integration you would expect, considering the single-product image that SteelEye is marketing. The wizards made creation of resources and dependencies easy, though you really need to use the documentation to see which resources and dependencies you need to create to protect your application. I liked LPSW’s support for both replicated and shared storage, its ease of configuration for both scenarios, and its support for more complex failover scenarios involving multiple local and remote servers. If you’re looking for an easy-to-implement high-availability solution, I recommend that you put LPSW on your short list.