Be prepared for system failure--back up your data in realtime

Defining your company's disaster recovery plan is an essential part of LAN/WAN development. The traditional approach of performing system backups offline after hours is no longer feasible because many networks now operate around the clock and because companies can't afford to lose a day's worth of data.

One solution is Double-Take 1.3 Beta for Windows NT (DT), a clustering backup solution and resource management tool that creates continuous, realtime mirroring of data, even from files in active use. Though DT for Novell NetWare has been available for some time, Network Specialists, Inc. (NSI) released DT for NT in April. DT's purpose is to protect business operations from system failures and maintain good network availability.

DT works at the network operating system (NOS) level, not the physical driver level. Instead of copying the physical hard disk as standard mirroring does, DT replicates all OS transactions that modify the contents of the server's file system. The NOS receives and validates disk writes, and then sends each write to the primary disk and to the target system for synchronized replication. Another key feature is a fault-tolerant standby server, which lets users log on directly to the backup server if the primary server fails. According to NSI, one of DT's strengths is WAN support for off-site replication of server data for disaster recovery.
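To make the idea of replicating file system operations (rather than disk blocks) concrete, here is a minimal Python sketch. It is not NSI's implementation: the FileStore and ReplicatingWriter names are hypothetical, the stores are in-memory dictionaries, and the "network" hop to the target is just a second method call.

```python
from dataclasses import dataclass, field


@dataclass
class FileStore:
    """Hypothetical in-memory stand-in for one server's file system."""
    files: dict = field(default_factory=dict)

    def write(self, path: str, offset: int, data: bytes) -> None:
        current = bytearray(self.files.get(path, b""))
        # Pad with zero bytes if the write starts past the current end.
        if offset > len(current):
            current.extend(b"\x00" * (offset - len(current)))
        current[offset:offset + len(data)] = data
        self.files[path] = bytes(current)


class ReplicatingWriter:
    """Validates each write, then applies it to the primary and the target."""

    def __init__(self, primary: FileStore, target: FileStore) -> None:
        self.primary = primary
        self.target = target

    def write(self, path: str, offset: int, data: bytes) -> None:
        # Assumed validation step: reject writes that could never succeed.
        if offset < 0:
            raise ValueError("offset must be non-negative")
        # The same logical operation goes to both machines, so only
        # changes travel to the target, not a copy of the whole disk.
        self.primary.write(path, offset, data)
        self.target.write(path, offset, data)


source = FileStore()
target = FileStore()
writer = ReplicatingWriter(source, target)
writer.write("db.dat", 0, b"hello")
writer.write("db.dat", 3, b"LOWORLD")
```

After the two writes, both stores hold identical contents for db.dat, which is the property that operation-level mirroring is meant to preserve.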

DT supports only Intel platforms and the following configurations: one file server to one backup server, many file servers to one backup server, and one file server to many backup file servers. In the many-to-one configuration, NSI cautions against using more than 50 file servers in one backup. Available storage capacity limits the size of a backup.
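If you plan a many-to-one layout, a quick check against NSI's 50-server caution can be scripted. The following Python sketch assumes a hypothetical list of (source, target) connection pairs; the server names and function names are illustrative and are not part of DT.

```python
from collections import defaultdict

# NSI's cautionary limit for file servers feeding one backup server.
MAX_SOURCES_PER_TARGET = 50


def sources_per_target(connections):
    """Group source servers by the target they replicate to."""
    by_target = defaultdict(set)
    for source, target in connections:
        by_target[target].add(source)
    return dict(by_target)


def overloaded_targets(connections, limit=MAX_SOURCES_PER_TARGET):
    """Return the targets that receive data from more than `limit` sources."""
    grouped = sources_per_target(connections)
    return sorted(t for t, srcs in grouped.items() if len(srcs) > limit)


# Hypothetical plan: 51 file servers to backup1, and fs0 also to backup2
# (a one-to-many connection for fs0).
connections = [(f"fs{i}", "backup1") for i in range(51)] + [("fs0", "backup2")]
print(overloaded_targets(connections))
```

The check covers only the server count; available storage on the target remains a separate limit.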

Configuring for Backup
DT defines a simple backup system as three network-attached platforms: a file server, a backup file server, and a workstation. DT has four parts: DTSource, the machine that contains the original data; DTTarget, the machine that contains the replicated copy of the data; DTClient, the service that configures and monitors both machines; and the Automatic Failover Utility, which lets the target assume the role of the source. The source and target machines can be either NT servers or NT workstations. DTClient can reside on DTSource or DTTarget, or on any Windows 95 or NT system that has access to the source or the target over the network. The Automatic Failover Utility runs on the target machine.

The beta DT software I tested came on three diskettes. Installation was not painless: I couldn't install the product on any NT workstation or server within the Windows NT Magazine Lab domain. The Lab brain trust could surmise only that some previously loaded management or monitoring software conflicted with DT. Dozens of types of software can be running on the Lab's network at one time, so determining the specific culprit is nearly impossible.

For expediency, I created a new NT domain from four completely clean systems that were isolated from the Lab domain. I installed DTSource and DTClient on the Primary Domain Controller (PDC) and DTTarget on the Backup Domain Controller (BDC). I was never able to install any component on a workstation. After I installed DTClient, I started it on the PDC by selecting the DT-Client icon created during installation. As Screen 1 shows, you start DTSource and DTTarget from the Services applet in Control Panel. Both DTSource and DTTarget must be running continuously for the mirroring and failover features to work.

When the three applications are running, DTClient displays the PDC as a source machine and the BDC as a target machine, as Screen 2 shows. You simply double-click the source computer to get the Replication Set Explorer feature, shown in Screen 3. From here, you select which documents, folders, or directories you want to replicate on the target machine. Then you use the Connection Manager, shown in Screen 4, to link your source and the target by selecting the available target or targets and clicking Connect. The documentation didn't explain mirroring options well, but after several phone calls to NSI, I learned that you select the computer in the Connected box before you click the traffic light.

Because I worked with the beta version of DT, I had to use a photocopy of the page proofs of the user's manual. The documentation includes example configurations, installation instructions, and operations, but the manual falls short in other areas. In the documentation provided, some references to configuration software and operation are for Novell NetWare, not NT. Installation directions are easy to follow but fail to mention key steps, such as the need to enter an authentication number before you can install the software, and how to begin mirroring. The software did not include Help files, but NSI will add them in a later release. Although NSI said that users can download Help files from its Web site, neither the Web files nor technical support via email was available during my evaluation. NSI has since restored these services.

Testing Mirroring and Failover
For testing, I created several small database files (420KB) to replicate from the PDC to the BDC and started mirroring. From DTClient, I could monitor the initial mirroring of the files. I started the Failover utility on the target by typing in the domain name of the source machine and the interval at which I wanted the program to check the status of the source.
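The interval-based status check can be pictured as a simple polling loop. The Python sketch below is an illustration only: the review establishes that the utility takes a source name and a check interval, but the missed_limit parameter, the is_alive probe, and the on_failure callback are assumptions added here.

```python
import time


def watch_source(is_alive, interval_seconds, on_failure,
                 missed_limit=1, sleep=time.sleep):
    """Poll the source until it misses `missed_limit` checks in a row.

    Returns the number of checks performed before failover was triggered.
    """
    missed = 0
    checks = 0
    while True:
        checks += 1
        if is_alive():
            missed = 0  # a healthy reply resets the counter
        else:
            missed += 1
            if missed >= missed_limit:
                on_failure()  # the target would assume the source's role here
                return checks
        sleep(interval_seconds)


# Simulated source that answers twice and then goes silent.
responses = iter([True, True, False, False])
events = []
checks_made = watch_source(
    is_alive=lambda: next(responses),
    interval_seconds=5,
    on_failure=lambda: events.append("failover"),
    missed_limit=2,
    sleep=lambda seconds: None,  # skip real waiting in this demonstration
)
```

A shorter interval detects a failure sooner but adds more status traffic to the network, so the setting is a trade-off.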

From the two other workstations, I ran a SQL program that would continuously increase the size of the database. I monitored the growth of the file size by opening the file folder on both the PDC and BDC and selecting the Details view. Process monitoring through DTClient provides only a snapshot, not cumulative information.

When the files had reached 5MB, I made the PDC fail so I could observe the failover capabilities. I immediately received network error messages on the workstations and continued to monitor failover from the BDC. After two minutes, the BDC reported that failover was complete. According to the documentation, the workstations can then log on to the BDC and access the files. But the documentation didn't explain how network users or systems administrators access the replicated data. NSI's technical support told me to use the journal functions of SQL Server or Oracle. Enterprise database programs, such as Oracle and Sybase, maintain transaction logs, and the restore process involves accessing those logs.

NSI's tech support didn't respond to the question about whether you have to run DT with SQL Server or Oracle. The documentation didn't cover what steps you follow to bring the PDC back online or whether you can resynchronize the database.

I came up with a workaround after two hours of troubleshooting. The BDC had assumed the name and role of the PDC. But even as Administrator, I couldn't manage user accounts, and the system wouldn't let any workstations log on with user accounts. I created a file share for the replicated files on the BDC and logged on to each workstation as the system administrator. This strategy let the workstations access the data, but I lost file security. The replicated data, however, was accurate to the moment the PDC failed.

I conducted another trial to see whether the BDC could be in an active state with users and still retain the role of target machine. Again, I did a clean reinstall of NT and DT to achieve the original domain configuration. I made the connection between PDC and BDC and initiated mirroring and failover. At first, mirroring from the PDC to the BDC operated well. I had configured one workstation to access a separate database on the BDC, independent of any DT software. When the mirrored databases reached 10MB, the entire network slowed down noticeably, and CPU usage on the BDC increased to maximum and stayed there. Time to complete queries to the database increased from 20 seconds to several minutes. The documentation notes that the more RAM, the faster the service operates.

When I made the PDC fail, I immediately received network errors for the workstations using the PDC. The workstation using the BDC locked up completely. Eleven minutes after I initiated failover, the BDC reported that failover was complete, and the BDC assumed the name and role of the PDC. I rebooted the workstation connected to the BDC, and the workstation logged on to the domain with its original user account. The BDC let the two original PDC workstations log on to the domain using their original user accounts, but the workstations could not access the database. As Administrator, I was able to create a file share on the BDC for the target folder, and then the workstations could access the database. Again, the data was complete up to the time of server failure.

The Report Card
DT has a 90-day warranty--including upgrades and fixes--from date of purchase, and NSI offers yearly maintenance of DT at $300 per server. NSI also offers free, customized classes and training at its Indianapolis facility.

Because I tested the first NT version of NSI's NetWare product, I guess I had to expect some growing pains. The software mirrors data to the exact moment of server failure; however, DT doesn't make data immediately available on the backup system.

Installation would have been much smoother if I hadn't had to reinstall the NOS several times for DT to work. After I installed the program, I had only to choose which files or volumes to replicate and then connect the source and target systems.

The Automatic Failover Utility worked to a degree, but it doesn't support immediate data availability. The backup file server assumes the functions of the file server; however, the administrator has to re-create shares manually after failover before users can reach the replicated data. NSI's testing has shown that both servers must run the same NOS, NOS version, application version, database version, and utilities. DT's documentation warns that synchronization of the original data and the backup data can absorb all available network resources, leaving few resources for user operations; but the documentation doesn't explain how to perform synchronization. When the system is supporting many-to-one target configurations in a heavily used network, a large volume of data can overload the target and slow down the system.

Because I tested a beta version of DT, any recommendation would be premature. NSI is venturing into the NT market with this product, which has had years of success with Novell NetWare. I suggest waiting for a later product release so NSI can work out the bugs. (NSI planned to ship Double-Take 1.4 in May.) If NSI fixes the problems I encountered with the beta version, DT has good potential as a clustering backup solution. The Lab will keep you posted.

Double-Take 1.3 Beta
Network Specialists
201-656-2121
Web: http://www.nsisw.com
Price: $1875 per source server with no client restrictions and no charge for the target agent