Virtualization has been making significant inroads as a viable technology in a variety of applications, not least of which is business continuity and disaster recovery. Virtualization's intrinsic features of encapsulation, consolidation, and independence from the hardware platform can make disaster recovery solutions more manageable, more flexible, and less costly.

Thanks to a virtual machine's (VM's) encapsulation properties—in which the complete computing environment contains the OS, BIOS, applications, data, and virtualized hardware—you can recover VMs to any supported AMD- or Intel-based server without worrying about the differences in the underlying physical hardware. Thus, the physical-world necessity to restore to an identical server (i.e., make, model, and configuration) doesn't apply: System-compatibility concerns between the hardware and OS at the recovery site are eliminated, making recovery much more reliable. Another virtualization benefit is the ability to consolidate servers at the recovery site by hosting multiple VMs on one physical server. During a failover scenario, it's often acceptable to temporarily provide somewhat lower application performance. This ability to oversubscribe hardware with multiple workloads with minimal performance impact makes this disaster recovery model economically attractive. During day-to-day operations, you can also use the servers at the recovery site to handle test and development workloads. Then, when a disaster occurs, you can repurpose those servers by shutting down those jobs and starting up the recovery VMs. In this way, resources are fully utilized and system reconfiguration is kept to a bare minimum.

So, how do you start utilizing virtualization as part of your company's disaster recovery process?

Getting Started
Let's assume that your business is running several Windows applications, each on a dedicated server with local SCSI storage. You have no shared storage devices (Fibre Channel SAN, NAS, or iSCSI) in your environment. Your applications are tiered in terms of how critical they are to your business. For this exercise, let's suppose you'd like to initially focus on your second-tier applications, which have less stringent Recovery Point Objective (RPO) and Recovery Time Objective (RTO) requirements. Your current environment and business requirements will determine the mechanisms that are viable for replicating the data from the production site to the recovery site.
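To make the RPO and RTO terms concrete, here is a minimal Python sketch of how you might check a periodic backup scheme against them. The class name, the application name, and every number are illustrative assumptions, not values from this article. The only reasoning it encodes is that, in the worst case, a failure strikes just before the next backup runs, so the data lost equals one full backup interval.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryObjective:
    """Hypothetical record of one application's recovery targets."""
    name: str
    rpo: timedelta  # maximum tolerable data loss, measured in time
    rto: timedelta  # maximum tolerable downtime


def meets_objectives(app, backup_interval, estimated_restore_time):
    """Return (rpo_ok, rto_ok) for a periodic backup scheme.

    Worst case, the failure happens just before the next backup runs,
    so the data lost equals one full backup interval.
    """
    rpo_ok = backup_interval <= app.rpo
    rto_ok = estimated_restore_time <= app.rto
    return rpo_ok, rto_ok


# Illustrative numbers only: a second-tier application that tolerates
# a day of data loss and eight hours of downtime.
tier2 = RecoveryObjective("file-server", rpo=timedelta(hours=24), rto=timedelta(hours=8))

print(meets_objectives(tier2, timedelta(hours=24), timedelta(hours=6)))   # (True, True)
print(meets_objectives(tier2, timedelta(hours=24), timedelta(hours=12)))  # (True, False)
```

A check like this is only as good as the restore-time estimate you feed it, which is one reason to time your test restores.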

Microsoft's Virtual Server and VMware's ESX Server lead the market in the server virtualization space. Both offer comparable functionality, and although this article references the ESX Server virtualization platform, you could adapt the process workflow to apply to a Virtual Server environment. Virtual Server installs on an existing Windows host OS, such as Windows Server 2003 and Small Business Server (SBS) 2003. In contrast, ESX Server installs directly on your server hardware—or "bare metal"—and inserts a virtualization layer between the hardware and the individual guest OS. ESX Server partitions a physical server into multiple secure and portable VMs that run side by side on the same physical server. The virtualization layer abstracts the underlying processor, memory, storage, and networking resources into the multiple VMs.

Once you have a virtualization platform in place, you need to consider the three general methods for moving data from a source site to a recovery site: backup/restore, host- or server-based replication, and storage array–based replication. Table 1 shows the replication options that can meet various recovery objectives. Your options for which data-protection mechanism is applicable in your environment will also depend on where you locate the ESX Server system and VM files (as you see in Table 2). Because you have only local SCSI drives in our example, the options for shared storage don't apply. Thus, in this article, we'll consider only backup/restore and host-based replication scenarios.

With virtualization as part of your disaster recovery solution, there are two general architectures that you can deploy: physical-to-virtual and virtual-to-virtual. Let's walk step by step through the implementation of both architectures. I'll start with the backup/restore scenario and follow that with host-based replication.

Physical-to-Virtual Architecture
In the physical-to-virtual architecture, the source (i.e., production) applications will continue to run on existing physical servers. The recovery platform will have the applications running in VMs on ESX Server. The replication mechanism will use the traditional backup/restore methodology.

To set up the environment, you first need to identify the applications that you want to include in the disaster recovery plan. Then, you confirm that the current file-level backup policy (e.g., full, incremental, differential, frequency) for each physical server meets your recovery objectives. For the recovery target, you select the physical server on which you want to install ESX Server. Before you take this step, be sure to check the "VMware Systems Compatibility Guide for ESX Server 3.x" (http://www.vmware.com/pdf/vi3_systems_guide.pdf) for compatibility information. Now, you're ready to install ESX Server 3.x Standard on your physical server.
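When you review a backup policy against your recovery objectives, it helps to count how many backup sets a restore would need. The following Python sketch does so under stated assumptions: one full backup on day 0 and one backup per day after that, with an incremental capturing changes since the previous backup and a differential capturing changes since the last full backup. The function name and labels are hypothetical, and real backup products may define these terms differently.

```python
def restore_sets(days_since_full, policy):
    """List the backup sets needed to restore the state of the latest backup.

    Assumes a full backup on day 0 followed by one backup per day of
    the given policy ("incremental" or "differential").
    """
    if days_since_full < 0:
        raise ValueError("days_since_full must not be negative")
    if policy not in ("incremental", "differential"):
        raise ValueError("policy must be 'incremental' or 'differential'")

    sets = ["full (day 0)"]
    if days_since_full == 0:
        return sets

    if policy == "incremental":
        # Every incremental since the full backup is required, in order.
        sets += [f"incremental (day {d})" for d in range(1, days_since_full + 1)]
    else:
        # Only the most recent differential is required.
        sets.append(f"differential (day {days_since_full})")
    return sets


print(len(restore_sets(5, "incremental")))   # 6
print(len(restore_sets(5, "differential")))  # 2
```

The trade-off is the usual one: incrementals keep each nightly backup small but lengthen the restore, whereas differentials grow through the week but need only two sets at restore time.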

Once installation is complete, you need to convert selected physical servers to VMs. To do so, you can use VMware Converter 3.0, which converts Windows-based physical machines and third-party image formats to VMware VMs. The VMware VM that the Converter creates will contain an exact copy of the disk state from your source physical machine, with the exception of some hardware-dependent drivers (and sometimes the mapped drive letters). Settings from the source computer that remain identical include OS configuration (e.g., computer name, security ID, user accounts, profiles and preferences), applications and data files, and each disk partition's volume serial number.

When you download VMware Converter, you have the option of installing it either on the machine you're converting or on a separate computer. You'll be converting several physical machines, so I recommend installing it on a separate machine so that you need perform only one installation.

In VMware Converter, click Import Machine, then select a physical computer as your source. Follow the wizard by selecting either a remote or local machine. If you select a remote machine, you'll need to enter its computer name or IP address and proper authentication credentials. Choose the disks to import and indicate the desired volume size. You can maintain the disk size, minimize it, or specify an exact size. Choose a destination for the new VM, and follow the wizard steps to select the ESX Server system. You'll need to log on to it and assign a VM name, then specify a data store to contain the VM's configuration files and disks.

Next, you customize the new VM's guest OS. You can customize the VM's identity (i.e., computer name, owner name, organization, new security ID), server license information, time zone, and properties of each network interface. After completing the import task creation, you can repeat the VMware Converter process for all the servers you need to convert.

Now, you're ready to power up the newly created VM and test the applications. You can test a sample restore from a previous backup of the source physical server to the appropriate VM. If the test is successful, you can move the ESX Server system to the desired recovery location. As far as maintenance is concerned, ensure that the OS and application versions and patches on the source servers and the corresponding VMs in the recovery system are always in sync. I recommend periodically testing sample restores.

Virtual-to-Virtual Architecture
In the virtual-to-virtual architecture, you'll be running both your production and recovery applications in VMs. Therefore, initially you'll need to convert the source servers to VMs, and subsequently both the source and recovery platforms will be on ESX Server. The obvious benefit of this approach is a completely virtualized infrastructure that offers increased flexibility and manageability. To migrate VMs, you can simply copy the necessary configuration and virtual disk files from the source to the target platform. You can apply either the backup/restore or host-based replication mechanism to this architecture. You have a couple of choices for the backup/restore scenario.

Back up the VM as a physical server. You would use this method for file-level backups of the data stored within the VM's disk image. The method requires a backup agent to be installed on each VM. You should also ensure that the backup operation performs application quiescing of the VM that's being backed up if possible. Quiescing ensures that the application is in a consistent state and doesn't lose any transactions in flight. This option has two phases: a server-consolidation phase, in which your existing physical-source servers are converted to VMs on ESX Server, and a preparatory phase for your recovery targets.

You first need to convert existing physical source servers to VMs. To do so, simply follow the aforementioned instructions for selecting the servers, installing ESX Server, and performing source conversions to VMs. Then, set up the ESX Server systems that you plan to use as recovery targets. After you use VMware Converter to import existing VMs from your source to recovery ESX Server systems, install the supported backup agent of your choice on each VM that will be backed up (i.e., the VMs on the source ESX Server systems). For a list of supported backup agents, check VMware's "Backup Software Compatibility for ESX Server 3.x" document (http://www.vmware.com/pdf/vi3_backup_guide.pdf).

On the source side, configure your backup server and device (disk or tape). To do so, follow these steps:

  1. Configure the backup server to use the backup device and install the appropriate drivers and backup server software of choice.
  2. Ensure that networking is configured for access between the backup server and VMs that will be backed up. If both the VM to be backed up and the backup server are on the same ESX Server host, you should use a private virtual switch to connect them.
  3. Schedule the backups and manage the backup media.

On the recovery side, install a backup agent on each VM on the recovery ESX servers. Test a sample restore from the source VM to the appropriate VM on the recovery ESX server. In the interest of maintenance, make sure that the OS and application versions and patches on the source and recovery VMs are always in sync. Be sure to periodically test sample restores.
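One simple way to keep an eye on version and patch drift is to record an inventory for each source VM and its recovery counterpart and compare the two. The Python sketch below assumes you already have such inventories as plain dictionaries; how you collect them is up to you, and the component names and version strings shown are made up for illustration.

```python
def find_drift(source, recovery):
    """Compare two {component: version} inventories.

    Returns a dict mapping each mismatched component to a
    (source_version, recovery_version) pair. None marks a component
    that is missing on that side.
    """
    drift = {}
    for component in sorted(set(source) | set(recovery)):
        src = source.get(component)
        rec = recovery.get(component)
        if src != rec:
            drift[component] = (src, rec)
    return drift


# Hypothetical inventories; the names and versions are placeholders.
source_vm = {"os_service_pack": "SP2", "app": "2.1", "backup_agent": "5.0"}
recovery_vm = {"os_service_pack": "SP1", "app": "2.1"}

for component, (src, rec) in find_drift(source_vm, recovery_vm).items():
    print(f"{component}: source={src} recovery={rec}")
```

An empty result means the two inventories match. Anything else is a candidate for patching before you rely on the recovery VM.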

Back up the VM as a set of files. You would use this method for image-level backups or backups of entire VMs. This method takes advantage of VMs' encapsulation characteristics, providing the capability to back up the entire VM, including the system configuration, applications, and data. You can recover VMs in their entirety by performing a restore of the individual files.

ESX Server uses the VMware File System (VMFS) to store VMs. VMFS is a simple, high-performance file system on physical SCSI disks and partitions that's capable of storing large files, such as the virtual disk images for ESX Server VMs and the memory images of suspended VMs.

In ESX Server 3.x, VMFS supports directories. Typically, there's one directory for each VM on VMFS. This directory contains all the files that comprise the VM. For a complete list of files that comprise a VM, see Table 3.

In this backup option, individual file recovery for each VM isn't possible. To recover a single file, you need to restore the entire VM. The virtual disk files could be larger than a gigabyte, which will probably limit your choice of qualified backup software. Because of the potentially large size of the virtual disk file, restore times will definitely be longer.
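To get a feel for how virtual disk size drives restore time, you can run a back-of-the-envelope estimate such as the Python sketch below. It assumes a sustained throughput that you supply and ignores media positioning, verification, and network contention, so treat the result as a lower bound. The 100GB size and 30MB-per-second rate are illustrative figures, not measurements.

```python
def restore_hours(size_gb, throughput_mb_per_s):
    """Estimate the hours needed to restore `size_gb` of data.

    Uses 1 GB = 1024 MB and assumes the throughput is sustained for
    the whole transfer.
    """
    if size_gb < 0:
        raise ValueError("size_gb must not be negative")
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput_mb_per_s must be positive")
    seconds = size_gb * 1024 / throughput_mb_per_s
    return seconds / 3600


# Illustrative only: a 100GB virtual disk restored at 30MB per second.
print(round(restore_hours(100, 30), 2))  # 0.95
```

Comparing this estimate with your RTO tells you quickly whether restoring a whole VM to retrieve one file is acceptable for a given application.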

The backup image will contain the state of the VM at a particular time. It won't include uncommitted data or memory state. Because this backup process treats the virtual disk as a whole and isn't application-aware, the backups created through this process are only file system–consistent, resulting in a backup that's in a crash-consistent state. To avoid this situation, you can either power down the VM or quiesce the application (if equipped) prior to performing the backup. Alternatively, you can use the utilities in ESX Server to perform your image-level backups. (See the Web-exclusive sidebar "A VMware-specific Backup Option," http://www.windowsitpro.com, InstantDoc ID 95596.)

To set up the environment, configure the source and recovery sites, as I discussed earlier. Install the supported backup agent on the service console of each ESX Server host (both source and recovery ESX Server systems). On the source side, configure your backup server and device (disk or tape) as follows:

  1. Configure the backup server to use the backup device and install the appropriate drivers and backup server software of choice.
  2. Ensure that networking is configured for access between the backup server and VMs that will be backed up. If both the VM to be backed up and the backup server are on the same ESX Server host, you should use a private virtual switch to connect them.
  3. Create and schedule the backup jobs based on a recommended policy, as you see in Table 4. The probability that data disks assigned to each VM will change frequently is high. Therefore, I recommend a daily incremental backup. The boot disk image of a VM might not change as frequently and should be backed up at least once a week. Changes to the ESX Server service console are minimal, so the backup policy in this case is totally up to you. If possible, power off the VM prior to the backup. Remember to back up the folder containing each VM.
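The scheduling policy just described (a daily incremental backup of the data disks and at least a weekly backup of the boot disk image) can be sketched in a few lines of Python. This is not a reproduction of Table 4. The function name, the job labels, and the choice of Sunday for the weekly job are assumptions made for illustration.

```python
from datetime import date


def jobs_due(day, weekly_backup_weekday=6):
    """Return the backup jobs to run on `day`.

    `weekly_backup_weekday` uses date.weekday() numbering, in which
    Monday is 0 and Sunday is 6. Sunday is an arbitrary default.
    """
    # Data disks change often, so they get an incremental every day.
    jobs = ["incremental backup of VM data disks"]
    # Boot disk images change less often, so once a week is the minimum.
    if day.weekday() == weekly_backup_weekday:
        jobs.append("backup of VM boot disk images")
    return jobs


print(jobs_due(date(2007, 4, 1)))  # a Sunday: both jobs
print(jobs_due(date(2007, 4, 2)))  # a Monday: data disks only
```

In practice you would express the same policy in your backup software's scheduler. The point of the sketch is only to show how the two frequencies combine.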

Test a sample restore of the virtual disk images from the source VM to the appropriate VM on the recovery ESX Server machine. Be sure to restore the set of files in the proper directory location in the recovery ESX Server system. Next, you'll need to register the VM. Using the VMware VI client, connect to the recovery ESX Server machine. Select the host, go to the Configuration tab, and select Storage. In the list, right-click the datastore and choose Browse Datastore to access the Datastore Browser dialog box. The right side of the dialog box displays the file system on the datastore. Navigate the datastore's hierarchy in the Folders tab. To register the VM, you'll need to navigate to the configuration (.vmx) file, right-click it, and choose Add to inventory. If necessary, you can modify the VM's network configuration. Now, you're ready to power up the VM.

Note that from the ESX Server service console, you can view and manipulate files in the /vmfs/volumes directories of mounted VMFS volumes by using ordinary file commands, such as ls and cp. Although mounted VMFS volumes might appear similar to any other file system (e.g., ext3), VMFS is primarily intended to store large files, such as disk images with sizes as large as 2TB. You can use ftp, scp, and cp commands to copy files to and from a VMFS volume—as long as the host file system supports these large files.

Host- or Server-Based Replication
In general, file-based replication mechanisms replicate data asynchronously over the IP network while maintaining write order. There are two basic implementations of file-based replication. The first involves loading an agent in the VM. The agent captures file-level changes within the guest OS and replicates them through the IP network to a preconfigured virtual host in the recovery site, where the duplicate files at the target server are updated. Thus, only real-time byte-level changes travel across the IP connection. The other implementation is through the replication of the actual set of files that comprise the VM (e.g., virtual disk, configuration). With either implementation, remember that if activity to disk isn't quiesced for replication, the replicated copy will be in a crash-consistent state. Products such as Double-Take Software's Double-Take and EMC's RepliStor integrate with Microsoft's Volume Shadow Copy Service (VSS), which makes the creation of application-consistent snapshots possible on a secondary server, allowing recovery of Windows data and applications in the event of corruption or disaster on the source server.

To set up the environment, configure the source and recovery sites, as I discussed earlier. Next, install the replication application and agent on both the source VM and the corresponding VM in the target ESX Server system. Create a replication job, specifying the files and directories to be replicated and the frequency. After initial replication, perform a simple test to confirm that the selected files and directories and their changes are being propagated, and that the data is correct.
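For the simple post-replication test, one approach is to compare checksums of the replicated directory tree against the source. The Python sketch below is a generic illustration and not a feature of any replication product. It assumes both trees are reachable as file system paths from the machine running the script and that replication is idle while the check runs; otherwise files still in flight will show up as differences.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(source_root, target_root):
    """Return (missing, different) lists of paths relative to the roots.

    `missing` holds files present under source_root but absent under
    target_root; `different` holds files whose contents do not match.
    """
    source_root, target_root = Path(source_root), Path(target_root)
    missing, different = [], []
    for src in sorted(source_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_root)
        dst = target_root / rel
        if not dst.is_file():
            missing.append(rel.as_posix())
        elif file_digest(src) != file_digest(dst):
            different.append(rel.as_posix())
    return missing, different
```

Calling compare_trees with the source and target directories returns two empty lists when every source file has an identical copy on the target. Any other result tells you which files to investigate.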

Testing Is a Must
Initial and regular testing is necessary to ensure that you're indeed prepared for disaster. Some best practices to keep in mind include testing your documentation and always keeping it up to date, testing your restore and failover procedures and determining whether they meet your business objectives, and testing your fail-back procedures. As requirements and technologies change, you might need to revisit your current implementation and modify your disaster recovery plan, architecture, and process.

I've walked you through the various options for applying virtualization technology to disaster recovery in an environment that uses local SCSI storage. Every environment is different, and as such your final solution can be a hybrid of the alternatives I've presented. Even though server virtualization can address a lot of the complexity and rigidity of a physical infrastructure, proper planning, architecture, and testing are vital to achieving your business continuity and disaster recovery objectives.