Delve into the Exchange ESE restore process

In last week's commentary ("The ESE Backup Process: An Inside Look," http://www.exchangeadmin.com, InstantDoc ID 25350), I dove into the deep waters of Exchange Server's backup operations. Understanding that process can help you better plan and execute disaster recovery for your Exchange deployments. This week, I want to examine the rest of the story: how the Exchange database engine, the Extensible Storage Engine (ESE), recovers Exchange databases. Armed with this information and last week's discussion of backup operations, you'll be well prepared to tackle the challenges of Exchange disaster recovery.

First Things First
Obviously, a restore operation is an administrator-initiated activity. Before the backup application can restore a database, you must perform two important tasks. First, you need to use the Microsoft Management Console (MMC) Exchange System Manager (ESM) snap-in or some other means (e.g., a script that uses Windows Management Instrumentation—WMI) to dismount the database. Second, you need to use the snap-in or other method to configure the database so that the restore operation can overwrite it. (By default, the database can't be overwritten.) After you've completed these preparatory tasks, the database is ready to be recovered.

Beginning the Restore and Copying the Databases
First, the backup application reads the beginning of the backup set to get a list of available databases. After you select the database to recover, the application asks you for input such as the server to restore to, the location on the server, and a temporary directory for the log, patch, and restore.env files. The application then calls two ESE backup APIs to start the restore: it calls the HrESERestoreOpen API to provide this information to ESE, then calls the HrESERestoreAddDatabase API once for each database to be restored. ESE leaves it to the backup application to restore the needed database files to the proper locations: the application makes Win32 file-system calls directly to the OS and copies the database files to disk from the backup set. ESE stays out of this step because it already performed a checksum of the database files during backup, so if the backup set is complete, the databases should be intact. And because the databases being restored are dismounted, letting the backup application copy these files directly to disk is much simpler and faster.

Restoring the Log and Patch Files
Again, the backup application doesn't need ESE's help for a while. The application simply calls the HrESERestoreOpenFile API for each log or patch file to be restored, then copies these files to the temporary directory that the backup administrator specified at the beginning of the restore process. (As I mentioned last week, Exchange 2000 Server Service Pack 2 (SP2) and later don't use patch files.) The backup application copies the log and patch files to this temporary directory because these files must remain separate from the log files in the production log-file directory. Keeping the restored logs in the temporary directory also prevents naming conflicts or overlaps between the logs in the backup set and the logs already on disk.
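To make the reason for the separate directory concrete, here is a minimal Python sketch of the idea. It is not how any real backup application is written. The function names, the directory layout, and the log names (E0000010.log and so on) are hypothetical and chosen only for illustration. It shows that a log name can exist both in the backup set and on disk, and that staging the restored copies elsewhere leaves the production copy untouched.

```python
import shutil
import tempfile
from pathlib import Path


def conflicting_names(backup_logs, production_dir):
    """Return log names present both in the backup set and on disk."""
    on_disk = {p.name.lower() for p in Path(production_dir).glob("*.log")}
    return sorted(Path(p).name for p in backup_logs
                  if Path(p).name.lower() in on_disk)


def stage_restored_logs(backup_logs, temp_dir):
    """Copy restored logs into a separate temporary directory."""
    temp_dir = Path(temp_dir)
    temp_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for src in backup_logs:
        dest = temp_dir / Path(src).name
        shutil.copy2(src, dest)  # never written into the production directory
        staged.append(dest)
    return staged


def demo():
    """Build a throwaway layout with one overlapping (hypothetical) log name."""
    root = Path(tempfile.mkdtemp())
    backup, production, temp = root / "backup", root / "production", root / "temp"
    backup.mkdir()
    production.mkdir()
    for name in ("E0000010.log", "E0000011.log"):
        (backup / name).write_text("from backup " + name)
    for name in ("E0000011.log", "E0000012.log"):
        (production / name).write_text("from production " + name)

    logs = sorted(backup.glob("*.log"))
    clashes = conflicting_names(logs, production)
    staged = [p.name for p in stage_restored_logs(logs, temp)]
    untouched = (production / "E0000011.log").read_text()
    shutil.rmtree(root)
    return clashes, staged, untouched


if __name__ == "__main__":
    print(demo())
```

In this toy layout, copying the backup set straight into the production directory would have overwritten the on-disk E0000011.log. Staging the files in the temporary directory keeps both copies available.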

The Restore Environment
After recovering all log and patch files from the backup set, the backup application makes a call that's new in Exchange 2000. Earlier versions of Exchange create the Restore_In_Progress registry subkey during a recovery operation. This key contains information about the recovery operation in progress for the database engine (of which there is only one instance in Exchange Server 5.5 and earlier versions). In Exchange 2000, however, multiple instances of the database engine (i.e., storage groups, or SGs) and concurrent recovery capabilities exist, so one registry subkey won't suffice. This change led to the advent of the restore.env file (which stands for Restore Environment). The backup application calls the HrESERestoreSaveEnvironment API during recovery, ESE returns information similar to the information stored in the Restore_In_Progress subkey used in earlier versions of Exchange, and the application saves the information in restore.env in the temporary directory with the log and patch files. (You can use the Eseutil program with the /cm switch to view the contents of restore.env. For more information, see this week's featured Exchange XADM: How to View the Contents of the Restore.env File.)

Completing the Restore and Running Hard Recovery
After the backup application copies all the necessary backup sets for the current recovery operation, the application is ready to complete and terminate its activities and give control back to the Store process (store.exe). The backup application calls the HrESERestoreComplete and HrESERestoreClose APIs to signal ESE that it should take over. At this point, you'd think that the SG that owns the database being recovered would take over and complete the recovery operation. Instead, the Store process instantiates another ESE SG to perform the hard-recovery operation. Hard recovery is the process of applying patch files to the database, replaying log files from the backup set in the temporary directory, and replaying log files from the production log-file directory. After hard recovery is completed successfully, the database is ready to be mounted and made available to users. (Note that you can use Eseutil with the /cc switch to perform hard recovery manually if you're performing simultaneous restores or if hard recovery wasn't completed automatically for some reason.) The recovery SG instance deletes the files in the temporary directory, then is terminated, turning control over to the SG that owns the database.
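The whole sequence can be summarized in a short Python sketch. It models only the order of operations as this article describes them. It does not call the real ESE APIs, and the strict ordering it enforces is a simplification of the narrative, not a statement of the APIs' actual contract. The class, the function, and the file names are hypothetical.

```python
# Order in which the article says the backup application calls the restore APIs.
RESTORE_CALL_ORDER = [
    "HrESERestoreOpen",
    "HrESERestoreAddDatabase",
    "HrESERestoreOpenFile",
    "HrESERestoreSaveEnvironment",
    "HrESERestoreComplete",
    "HrESERestoreClose",
]

# Called once per database and once per log or patch file, respectively.
REPEATABLE = {"HrESERestoreAddDatabase", "HrESERestoreOpenFile"}


class RestoreSequence:
    """Toy model that records calls and rejects out-of-order ones."""

    def __init__(self):
        self.calls = []
        self._stage = -1

    def call(self, name):
        stage = RESTORE_CALL_ORDER.index(name)
        repeat = stage == self._stage and name in REPEATABLE
        advance = stage == self._stage + 1
        if not (repeat or advance):
            raise RuntimeError("unexpected call in this model: " + name)
        self._stage = stage
        self.calls.append(name)

    @property
    def finished(self):
        return self._stage == len(RESTORE_CALL_ORDER) - 1


def hard_recovery_plan(patch_files, backup_logs, production_logs):
    """List hard-recovery steps in the order the article gives them.

    Sorting assumes fixed-width log names, so name order matches
    generation order.
    """
    steps = [("apply patch", f) for f in patch_files]
    steps += [("replay backup log", f) for f in sorted(backup_logs)]
    steps += [("replay production log", f) for f in sorted(production_logs)]
    return steps


if __name__ == "__main__":
    seq = RestoreSequence()
    for name in ["HrESERestoreOpen", "HrESERestoreAddDatabase",
                 "HrESERestoreOpenFile", "HrESERestoreOpenFile",
                 "HrESERestoreSaveEnvironment", "HrESERestoreComplete",
                 "HrESERestoreClose"]:
        seq.call(name)
    print(seq.finished)
    for step in hard_recovery_plan([], ["E0000011.log", "E0000010.log"],
                                   ["E0000012.log"]):
        print(step)
```

The empty patch-file list in the usage example reflects the point made earlier that Exchange 2000 SP2 and later don't use patch files. In that case the plan consists only of log replay.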

Too often, Exchange administrators trivialize disaster-recovery operations. However, understanding exactly how backup and restore operations work and how they affect your ability to meet service level agreements (SLAs) for your Exchange deployments is of paramount importance. I hope that this two-part, in-depth look into the internals of Exchange's ESE and its performance of these operations will help you provide the highest levels of availability for Exchange.