Virtualization has been making significant inroads as a viable technology in
a variety of applications, not least of which is business continuity and disaster
recovery. Virtualization's intrinsic features of encapsulation, consolidation,
and independence from the hardware platform can make disaster recovery solutions
more manageable, flexible, and less costly.
Thanks to a virtual machine's (VM's) encapsulation properties—in which
the complete computing environment contains the OS, BIOS, applications, data,
and virtualized hardware—you can recover VMs to any supported AMD- or
Intel-based server without worrying about the differences in the underlying
physical hardware. Thus, the physical-world necessity to restore to an identical
server (i.e., make, model, and configuration) doesn't apply: System-compatibility
concerns between the hardware and OS at the recovery site are eliminated, making
recovery much more reliable. Another virtualization benefit is the ability to
consolidate servers at the recovery site by hosting multiple VMs on one physical
server. During a failover scenario, it's often acceptable to temporarily provide
somewhat lower application performance. This ability to oversubscribe hardware
with multiple workloads with minimal performance impact makes this disaster
recovery model economically attractive. During day-to-day operations, you can
also use the servers at the recovery site to handle test and development workloads.
Then, when a disaster occurs, you can repurpose those servers by shutting down
those jobs and starting up the recovery VMs. In this way, resources are fully
utilized and system reconfiguration is kept to a bare minimum.
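To make the consolidation argument concrete, here is a minimal planning sketch in Python. It checks whether a set of recovery VMs fits on one physical host when you accept some CPU oversubscription during failover. The VM class, the fits_on_host function, and the 2.0 oversubscription ratio are illustrative assumptions, not values from ESX Server or from this article; substitute figures that match your own performance tolerance.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    vcpus: int
    memory_gb: float


def fits_on_host(vms, host_cores, host_memory_gb, cpu_oversubscription=2.0):
    """Return True if all VMs fit on a single recovery host.

    CPU may be oversubscribed by the given ratio (an assumed planning
    figure). Memory is checked without oversubscription to keep the
    estimate conservative.
    """
    total_vcpus = sum(vm.vcpus for vm in vms)
    total_memory_gb = sum(vm.memory_gb for vm in vms)
    cpu_ok = total_vcpus <= host_cores * cpu_oversubscription
    memory_ok = total_memory_gb <= host_memory_gb
    return cpu_ok and memory_ok
```

Lowering cpu_oversubscription toward 1.0 models a recovery site that must match production performance; raising it models the "somewhat lower performance is acceptable" failover case.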
So, how do you start utilizing virtualization as part of
your company's disaster recovery process?
Getting Started
Let's assume that your business is running several Windows applications, each
on a dedicated server with local SCSI storage. You have no shared storage devices—Fibre
Channel SAN, NAS, or iSCSI—in your environment. Your applications are
tiered in terms of how critical they are to your business. For this exercise,
let's suppose you'd like to initially focus on your second-tier applications,
which have less stringent Recovery Point Objective (RPO) and Recovery Time Objective
(RTO) requirements. Your current environment and business requirements will
determine the mechanisms that are viable for replicating the data from the production
site to the recovery site.
Microsoft's Virtual Server and VMware's ESX Server
lead the market in the server virtualization space. Both
offer comparable functionality, and although this article
references the ESX Server virtualization platform, you could
adapt the process workflow to apply to a Virtual Server
environment. Virtual Server installs on an existing Windows
host OS, such as Windows Server 2003 and Small Business
Server (SBS) 2003. In contrast, ESX Server installs directly
on your server hardware—or "bare metal"—and inserts
a virtualization layer between the hardware and the individual guest OS. ESX Server partitions a physical server into
multiple secure and portable VMs that run side by side on
the same physical server. The virtualization layer abstracts
the underlying processor, memory, storage, and networking
resources into the multiple VMs.
Once you have a virtualization platform in place, you need to consider the
three general methods for moving data from a source site to a recovery site:
backup/restore, host- or server-based replication, and storage
array–based replication. Table 1
shows the replication options that can meet various recovery objectives. Your
options for which data-protection mechanism is applicable in your environment
will also depend on where you locate the ESX Server system and VM files (as
you see in Table 2). Because you have only
local SCSI drives in our example, the options for shared storage don't apply.
Thus, in this article, we'll consider only backup/restore and host-based replication
scenarios.
With virtualization as part of your disaster recovery
solution, there are two general architectures that you can
deploy: physical-to-virtual and virtual-to-virtual. Let's walk
step by step through the implementation of both architectures. I'll start with the backup/restore scenario and follow
that with host-based replication.
Physical-to-Virtual Architecture
In the physical-to-virtual architecture, the source (i.e., production) applications
will continue to run on existing physical servers. The recovery platform will
have the applications running in VMs on ESX Server. The replication mechanism
will use the traditional backup/restore methodology.
To set up the environment, you first need to identify the applications that
you want to include in the disaster recovery plan. Then, you confirm that the
current file-level backup policy (e.g., full, incremental, differential, frequency)
for each physical server meets your recovery objectives. For the recovery target,
you select the physical server on which you want to install ESX Server. Before
you take this step, be sure to check the "VMware Systems Compatibility Guide
for ESX Server 3.x" (http://www.vmware.com/pdf/vi3_systems_guide.pdf) for
compatibility information. Now, you're ready to install ESX
Server 3.x Standard on your physical server.
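When you confirm that a backup policy meets your recovery objectives, two quick calculations help. The Python sketch below is only an illustration under stated assumptions: backups run at a fixed interval, incrementals and differentials run once a day after the last full backup, and the function names are hypothetical. It shows the worst-case data loss for an interval and the number of backup sets a restore has to read.

```python
from datetime import timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """A failure just before the next backup loses up to one full
    interval of changes, so the interval must not exceed the RPO."""
    return backup_interval <= rpo


def backup_sets_to_restore(policy: str, days_since_full: int) -> int:
    """Count the backup sets needed for a restore, assuming one backup
    per day since the last full backup.

    full         -> the latest full backup only
    differential -> last full plus the latest differential
    incremental  -> last full plus every incremental since then
    """
    if days_since_full < 0:
        raise ValueError("days_since_full must not be negative")
    if policy == "full":
        return 1
    if policy == "differential":
        return 1 + (1 if days_since_full > 0 else 0)
    if policy == "incremental":
        return 1 + days_since_full
    raise ValueError(f"unknown policy: {policy}")
```

The second function bears on RTO rather than RPO: the more sets a restore must apply, the longer recovery is likely to take.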
Once installation is complete, you need to convert selected physical servers
to VMs. To do so, you can use VMware Converter 3.0, which converts Windows-based
physical machines and third-party image formats to VMware VMs. The VMware VM
that the Converter creates will contain an exact copy of the disk state from
your source physical machine, with the exception of some hardware-dependent
drivers (and sometimes the mapped drive letters). Settings from the source computer
that remain identical include OS configuration (e.g., computer name, security
ID, user accounts, profiles and preferences), applications and data files, and
each disk partition's volume serial number.
When you download VMware Converter, you have the option of installing it either
on the machine you're converting or on a separate computer. You'll be converting
several physical machines, so I recommend installing it on a separate machine
so that you need perform only one installation.
In VMware Converter, click Import Machine, then select a physical computer
as your source. Follow the wizard by selecting either a remote or local machine.
If you select a remote machine, you'll need to enter its computer name or IP
address and proper authentication credentials. Choose the disks to import and
indicate the desired volume size. You can maintain the disk size, minimize it,
or specify an exact size. Choose a destination for the new VM, and follow the
wizard steps to select the ESX Server system. You'll need to log on to it and
assign a VM name, then specify a data store to contain the VM's configuration
files and disks.
Next, you customize the new VM's guest OS. You can customize the VM's identity
(i.e., computer name, owner name, organization, new security ID), server license
information, time zone, and properties of each network interface. After completing
the import task creation, you can repeat the VMware Converter process for all
the servers you need to convert.
Now, you're ready to power up the newly created VM and test the applications.
You can test a sample restore from a previous backup of the source physical
server to the appropriate VM. If the test is successful, you can move the ESX
Server system to the desired recovery location. As far as maintenance is concerned,
you should ensure that the OS and application versions and patches on
the source servers and the corresponding VMs in the recovery system are always in
sync. I recommend periodically testing sample restores.
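One simple way to spot version drift is to compare patch inventories from both sides. The following Python sketch assumes you have already exported a list of installed patch identifiers from the source server and from its recovery VM; how you collect those lists is up to you, and the patch_drift name is hypothetical.

```python
def patch_drift(source_patches, recovery_patches):
    """Compare two patch inventories and report the differences.

    Both arguments are iterables of patch identifiers (strings).
    """
    source = set(source_patches)
    recovery = set(recovery_patches)
    return {
        # Installed on the source but not yet on the recovery VM.
        "missing_on_recovery": sorted(source - recovery),
        # Present only on the recovery VM.
        "extra_on_recovery": sorted(recovery - source),
    }
```

An empty result in both lists suggests the two systems are in sync, at least for the items your inventory covers.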
Virtual-to-Virtual Architecture
In the virtual-to-virtual architecture, you'll be running both your production
and recovery applications in VMs. Therefore, initially you'll need to convert
the source servers to VMs, and subsequently both the source and recovery platforms
will be on ESX Server. The obvious benefit of this approach is a completely
virtualized infrastructure that boasts increased flexibility and manageability.
To migrate VMs, you can simply copy the necessary configuration and virtual
disk files from the source to the target platform. You can apply either the
backup/restore or host-based replication mechanism to this architecture. You
have a couple of choices for the backup/restore scenario.
Back up the VM as a physical server. You would use this method
for file-level backups of the data stored within the VM's disk image. The method
requires a backup agent to be installed on each VM. You should also ensure that
the backup operation performs application quiescing of the VM that's
being backed up if possible. Quiescing ensures that the application is in a
consistent state and doesn't lose any transactions in flight. This option has
two phases: a server-consolidation phase, in which your existing physical-source
servers are converted to VMs on ESX Server, and a preparatory phase for your
recovery targets.
You first need to convert existing physical source servers to VMs. To do so,
simply follow the aforementioned instructions for selecting the servers, installing
ESX Server, and performing source conversions to VMs. Then, set up the ESX Server
systems that you plan to use as recovery targets. After you use VMware Converter
to import existing VMs from your source to recovery ESX Server systems, install
the supported backup agent of your choice on each VM that will be backed up (i.e.,
those on the source ESX Server systems). For a list of supported backup agents, check VMware's
"Backup Software Compatibility for ESX Server 3.x" document (http://www.vmware.com/pdf/vi3_backup_guide.pdf).
On the source side, configure your backup server and device (disk or tape).
To do so, follow these steps:
- Configure the backup server to use the backup device and install the appropriate
drivers and backup server software of choice.
- Ensure that networking is configured for access between the backup server
and VMs that will be backed up. If both the VM to be backed up and the backup
server are on the same ESX Server host, you should use a private virtual switch
to connect them.
- Schedule the backups and manage the backup media.
On the recovery side, install a backup agent on each VM on the recovery ESX
servers. Test a sample restore from the source VM to the appropriate VM on the
recovery ESX server. In the interest of maintenance, make sure that the
OS and application versions and patches on the source and recovery VMs
are always in sync. Be sure to periodically test sample restores.
Back up the VM as a set of files. You would use this method for
image-level backups or backups of entire VMs. This method takes advantage of
VMs' encapsulation characteristics, providing the capability to back up the
entire VM, including the system configuration, applications, and data. You can
recover VMs in their entirety by performing a restore of the individual files.
ESX Server uses the VMware File System (VMFS) to store VMs. VMFS is a simple,
high-performance file system on physical SCSI disks and partitions that's capable
of storing large files, such as the virtual disk images for ESX Server VMs and
the memory images of suspended VMs.
In ESX Server 3.x, VMFS supports directories. Typically, there's one directory
for each VM on VMFS. This directory contains all the files that comprise the
VM. For a complete list of files that comprise a VM, see Table
3.
In this backup option, individual file recovery for each VM isn't possible.
To recover a single file, you need to restore the entire VM. The virtual disk
files could be larger than a gigabyte, which will probably limit your choice
of qualified backup software. Because of the potentially large size of the virtual
disk file, restore times will definitely be longer.
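A rough estimate shows why restoring a whole virtual disk takes so much longer than restoring one file. This Python sketch assumes a steady sustained throughput from the backup device and uses 1,024MB per gigabyte; both the function name and the sample figures are illustrative only.

```python
def estimated_restore_hours(image_size_gb: float,
                            throughput_mb_per_s: float) -> float:
    """Estimate the time to restore an image at a steady throughput."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = image_size_gb * 1024 / throughput_mb_per_s
    return seconds / 3600


# Example: a 100GB virtual disk versus a 10MB document at 50MB/s.
whole_disk_hours = estimated_restore_hours(100, 50)
single_file_hours = estimated_restore_hours(10 / 1024, 50)
```

Real restores rarely sustain peak throughput, so treat the result as a lower bound when you compare it against your RTO.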
The backup image will contain the state of the VM at a particular time. It
won't include uncommitted data or memory state. Because this backup process
treats the virtual disk as a whole and isn't application-aware, the backups
created through this process are only file system–consistent, resulting
in a backup that's in a crash-consistent state. To avoid this situation, you
can either power down the VM or quiesce the application (if equipped) prior
to performing the backup. Alternatively, you can use the utilities in ESX Server
to perform your image-level backups. (See the Web-exclusive sidebar "A VMware-specific
Backup Option," http://www.windowsitpro.com, InstantDoc ID 95596.)
To set up the environment, configure the source and recovery sites, as
I discussed earlier. Install the supported backup agent on the service console
of each ESX Server host (both source and recovery ESX Server systems). On the
source side, configure your backup server and device (disk or tape) as follows:
- Configure the backup server to use the backup device and install the appropriate
drivers and backup server software of choice.
- Ensure that networking is configured for access between the backup server
and VMs that will be backed up. If both the VM to be backed up and the backup
server are on the same ESX Server host, you should use a private virtual switch
to connect them.
- Create and schedule the backup jobs based on a recommended policy, as you
see in Table 4. The probability that
data disks assigned to each VM will change frequently is high. Therefore,
I recommend a daily incremental backup. The boot disk image of a VM might
not change as frequently and should be backed up at least once a week. Changes
to the ESX Server service console are minimal, so the backup policy in this case is
totally up to you. If possible, power off the VM prior to the backup. Remember
to back up the folder containing each VM.
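The recommended policy above (daily incremental backups of data disks, at least weekly backups of the boot disk) can be expressed as a small scheduling rule. This Python sketch is only an illustration: the backups_due function, the job labels, and the choice of Sunday for the weekly job are assumptions, and your backup software will have its own scheduler.

```python
from datetime import date


def backups_due(day: date, weekly_boot_day: int = 6):
    """Return the backup jobs due on a given day.

    weekly_boot_day uses date.weekday() numbering (Monday is 0,
    Sunday is 6). Sunday is an assumed default, not a requirement.
    """
    # Data disks change often, so they get an incremental every day.
    jobs = [("data disks", "daily incremental")]
    # The boot disk changes less often, so once a week is the minimum.
    if day.weekday() == weekly_boot_day:
        jobs.append(("boot disk", "weekly image"))
    return jobs
```

For example, backups_due(date(2024, 1, 7)) covers a Sunday, so it returns both jobs; any other day of that week returns only the data disk job.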
Test a sample restore of the virtual disk images from the source VM to the
appropriate VM on the recovery ESX Server machine. Be sure to restore the set
of files in the proper directory location in the recovery ESX Server system.
Next, you'll need to register the VM. Using the VMware VI client, connect to
the recovery ESX Server machine. Select the host, go to the Configuration tab,
and select Storage. In the list, right-click the data store and choose Browse
Datastore to access the Datastore Browser dialog box. The right side of the
dialog box displays the file system on the datastore. Navigate the datastore's
hierarchy in the Folders tab. To register the VM, you'll need to navigate to
the configuration (.vmx) file, right-click it, and choose Add to inventory.
If necessary, you can modify the VM's network configuration. Now, you're ready
to power up the VM.
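Before you register restored VMs, it can help to confirm that each VM directory actually contains a configuration (.vmx) file. The Python sketch below assumes the restored directory tree is reachable as an ordinary file system path with one subdirectory per VM; the restored_vm_configs name and the path you pass in are hypothetical, and the script doesn't call any VMware tool.

```python
from pathlib import Path


def restored_vm_configs(datastore_root):
    """Map each VM directory name to its .vmx file, or None if missing.

    Assumes one subdirectory per VM directly under datastore_root.
    """
    root = Path(datastore_root)
    configs = {}
    for vm_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        vmx_files = sorted(vm_dir.glob("*.vmx"))
        # A directory without a .vmx file can't be added to inventory.
        configs[vm_dir.name] = vmx_files[0] if vmx_files else None
    return configs
```

Any directory that maps to None points to an incomplete restore or to files restored into the wrong location.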
Note that from the ESX Server service console, you can view and manipulate
files in the /vmfs/volumes directories of mounted VMFS volumes by using ordinary
file commands, such as ls and cp. Although mounted VMFS volumes might appear
similar to any other file system (e.g., ext3), VMFS is primarily intended to
store large files, such as disk images with sizes as large as 2TB. You can use
ftp, scp, and cp commands to copy files to and from a VMFS volume—as
long as the host file system supports these large files.
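Whichever copy command you use, it's worth verifying a large disk image after the copy. The following Python sketch is one possible approach, not a VMware utility: it copies a file, then compares SHA-256 checksums computed in 1MB chunks so that a multi-gigabyte image never has to fit in memory. The function names are hypothetical, and whether Python is available where your files reside is an assumption you'll need to check.

```python
import hashlib
import shutil


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute a file's SHA-256 digest by reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src, dst):
    """Copy src to dst and raise OSError if the checksums differ."""
    shutil.copyfile(src, dst)
    if sha256_of(src) != sha256_of(dst):
        raise OSError(f"checksum mismatch after copying {src} to {dst}")
    return dst
```

For a copy made over the network with ftp or scp, you can run sha256_of on each end and compare the two digests by hand.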
Host- or Server-Based Replication
In general, file-based replication mechanisms replicate data asynchronously
over the IP network while maintaining write order. There are two basic implementations
of file-based replication. The first involves loading an agent in the VM. This
method captures file-level changes in the guest OS and replicates them
through the IP network to a preconfigured virtual host in the recovery site,
where the duplicate files at the target server are updated. Thus, only real-time
byte-level changes travel across the IP connection. The other implementation
is through the replication of the actual set of files that comprise the VM (e.g.,
virtual disk, configuration). With either implementation, remember that if activity
to disk isn't quiesced for replication, the replicated copy will be in a crash-consistent
state. Products such as Double-Take Software's Double-Take and EMC's RepliStor
integrate with Microsoft's Volume Shadow Copy Service (VSS), which makes the
creation of application-consistent snapshots possible on a secondary server—
allowing recovery of Windows data and applications in the event of corruption
or disaster on the source server.
To set up the environment, configure the source and recovery sites, as
I discussed earlier. Next, install the replication application and agent on
both the source VM and the corresponding VM in the target ESX Server system.
Create a replication job, specifying the files and directories to be replicated
and the frequency. After initial replication, perform a simple test to confirm
that the selected files and directories and changes are being propagated over,
and that the data is correct.
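The simple test described above can be automated by comparing the replicated directory against its source. This Python sketch assumes both directory trees are reachable from one machine (for example, through a network share) and that replication is idle while the check runs; the replication_diff name is hypothetical, and the script is independent of any replication product.

```python
import filecmp
from pathlib import Path


def replication_diff(source_dir, replica_dir):
    """Return (missing, different) lists of relative file paths.

    missing   -> files on the source that have no copy on the replica
    different -> files whose contents differ between the two sides
    """
    source, replica = Path(source_dir), Path(replica_dir)
    missing, different = [], []
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        relative = src_file.relative_to(source)
        dst_file = replica / relative
        if not dst_file.is_file():
            missing.append(relative.as_posix())
        # shallow=False forces a byte-by-byte content comparison.
        elif not filecmp.cmp(src_file, dst_file, shallow=False):
            different.append(relative.as_posix())
    return missing, different
```

Two empty lists indicate that every selected file reached the target intact. Because replication is asynchronous, a file that changed moments before the check can legitimately appear in the second list.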
Testing Is a Must
Initial and regular testing is necessary to ensure that you're indeed prepared
for disaster. Some best practices to keep in mind include testing your documentation
and always keeping it up to date, testing your restore and failover procedures
and determining whether they meet your business objectives, and testing your
fail-back procedures. As requirements and technologies change, you might need
to revisit your current implementation and modify your disaster recovery plan,
architecture, and process.
I've walked you through the various options for applying virtualization technology
to disaster recovery in an environment that uses local SCSI storage. Every environment
is different, and as such your final solution can be a hybrid of the alternatives
I've presented. Even though server virtualization can address a lot of the complexity
and rigidity of a physical infrastructure, proper planning, architecture, and
testing are vital to achieving your business continuity and disaster recovery
objectives.