Get back in production quickly and recover mail later

In the Windows 2000 Magazine article "Exchange 2000 Storage Exposed, Part 2" (August 2000), Jerry Cochran discussed the Exchange 2000 Server backup and restore process. Exchange 2000 improves on the Exchange Server 5.5 process because it lets you recover a single database while other databases in the storage group (SG) continue to service users. In a traditional recovery process, any users on a mailbox store are left without email while you recover the databases.

In some companies, however, the email system can't be unavailable to any users while you perform a restore. Let's look at a method for bringing an Exchange Server Information Store (IS) back into production quickly so that users can send and receive mail again almost immediately, which reduces downtime for new mail to nearly zero. You then recover existing mail as a background process.

Reducing Downtime
You can reduce downtime in two ways: eliminate the cause of service interruptions or reduce the duration of an interruption that has already occurred. Let's assume that you've done everything you can to eliminate service interruptions by following the guidelines on topics such as hardware fault tolerance, clustering, and best practices for mission-critical applications. Nevertheless, you have a service interruption—one of your Exchange servers crashes and refuses to mount one of the stores in an SG. In Exchange Server 5.5, the IS service won't even start, but in Exchange 2000, the IS service starts even if one of the stores in an SG won't mount. You determine from the event log errors and by running Eseutil that the reason the store won't mount is that it's corrupted, and you must restore it from the last backup.

At this point, you typically would prepare for a full store recovery by tracking down the last full backup tape and any additional backup tapes (unless you have the luxury of disk-based backup, which is becoming a popular option for those who can afford the extra hard disks). Regardless of your backup medium, the users whose mailboxes are on the affected store will have to wait while you recover the database files (the property database, or .edb, files and the streaming database, or .stm, files). I've found that outages last at least several hours, no matter how small the company, and can last all night, depending on the quality of the company's operational procedures and the experience of the people performing the restores.

However, you have another option: Restore service immediately and recover old mail and other items later. The advantage of this method is that users have immediate access to their mailboxes so that they can begin receiving and sending mail. What they won't have access to immediately is their old mail, calendar, tasks, and so forth. After you perform a background recovery, these items will show up in about the same length of time that users would ordinarily be waiting just for access to their Inbox.

This technique won't appeal to everyone, but the advantage of giving end users instant access to their mailbox might be worth the wait for the rest of the mail to reappear. In a POP3 email environment, the recovery might be totally invisible. Instead of receiving a Server not found error message while you're recovering the IS, the POP3 email client just won't receive any new mail until the phased recovery is complete. So, let's see how to perform this feat.

Prepare Your Environment
Well in advance of the disaster and recovery, you need to set up your recovery environment. At a minimum, you need two computers: an Exchange 2000 computer in the same domain name but in a second Active Directory (AD) forest, unrelated to your production AD, and a workstation—the "recovery console"—to perform the recovery operations. The workstation must be a separate computer for two reasons. First, you must change its domain membership during the recovery process. Second, you can install and test the recovery utilities on the separate machine before you begin. In some cases, the utilities don't run on the recovery Exchange server after you install all the applications and service packs.

For the recovery, you can use one server only if the server has sufficient processor power, memory, and storage capacity to run Exchange 2000 and act as the Win2K domain controller (DC). The Exchange server must have adequate disk space to hold the IS restored from backup and the mailboxes extracted to Personal Folders (.pst files). Therefore, you must allow more than twice the size of the original database files: one share for the restored databases and at least another for the .pst files, which take more space than the databases because extraction breaks the store's single-instance storage (a message that the database holds once is written to every recipient's .pst file).
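The sizing rule above can be turned into a quick estimate. The following is only a minimal sketch: the function name and the `pst_overhead` factor of 1.5 are assumptions made for illustration, not figures from Exchange documentation, so replace the factor with a ratio measured in your own practice drill.

```python
def recovery_disk_estimate(edb_bytes, stm_bytes, pst_overhead=1.5):
    """Rough lower bound, in bytes, on disk space for the recovery server.

    pst_overhead is an assumed multiplier for how much larger the extracted
    .pst files are than the databases once single-instance storage is lost.
    """
    database_bytes = edb_bytes + stm_bytes            # restored .edb + .stm files
    pst_bytes = int(database_bytes * pst_overhead)    # extracted .pst files
    return database_bytes + pst_bytes


GIB = 1024 ** 3
# Hypothetical store: a 20 GiB .edb file and a 5 GiB .stm file.
needed = recovery_disk_estimate(20 * GIB, 5 * GIB)
print(f"Plan for at least {needed / GIB:.1f} GiB of free space")
```

Treat the result as a floor rather than a target, because transaction logs and temporary files also need room.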

On the recovery servers, install Win2K and the same service pack as the production mailbox server has. You'll need a Win2K DC to hold AD and the user accounts, which the Mailbox Reconnect tool will extract. To promote the server to a DC in the new AD forest, you must install the prerequisite DNS. I prefer to configure DNS to be a secondary zone copy of the production primary DNS zone, but you can always make manual DNS entries, if needed.

The next step in setting up your environment is to install on the recovery server the Win2K components (i.e., Network News Transfer Protocol, or NNTP, and SMTP) that Exchange 2000 Setup requires. Then install Exchange, and apply the same service packs and hotfixes as the production Exchange system has. Make sure that the recovery Exchange server has the same organization name and administrative group name as the production system, or the recovery won't work. The Exchange server name, however, doesn't have to match the production mailbox server's name, so you can easily do this installation in advance. If your backup software uses an Exchange agent, add the agent's service account to the local Administrators group on the recovery mailbox server. This account typically is in the production domain, so you must type in the domain name; the recovery domain can't resolve the name for you.

On the recovery console/workstation, copy the Exmerge utility from the Exchange 2000 Server CD-ROM's support\utils\i386\exmerge directory. Exmerge requires that exchmem.dll be in the same directory or system path. Copy the Mailbox Reconnect tool (mbconn.exe) from the support\utils\i386 directory. I explain these utilities in more detail later.

Finally, make sure that you have a distribution list (DL) that contains all users on a particular IS. You'll see why this is necessary in a moment. Although you can create and manage the DL in several ways, the easiest way is to organize your users by organizational unit (OU) because you can right-click the OU and choose Add members to a group.

Activate the Plan
When you've made these preparations, I recommend that you conduct a practice drill to work out unforeseen problems. If a mailbox store goes down, you first bring back the server, then restore the databases. Here's what you do in each phase.

Bring back the server. When disaster strikes, your first priority is to get the Exchange server back in production. Follow these steps:

  1. Locate the log files for the mailbox store that you're unable to mount, and copy them to the recovery server. Remember that all stores in an Exchange 2000 SG share the same set of transaction logs, so you can look in the SG properties in case you're not sure of the exact location.
  2. Delete the database files on the production mailbox server. Right-click the store and select Mount to mount it. When you mount the store, you'll receive the warning that Figure 1 shows. Click OK. This warning illustrates the first major difference between Exchange 2000 and Exchange Server 5.5. AD maintains the list of users who can have a mailbox; think of the right to have a mailbox as a property of objects in the directory. However, the mailbox itself doesn't exist in the store until the user receives a first piece of mail, at which point Exchange creates the new mailbox.
  3. Send to the DL an email message stating that you're recovering the users' mail and that no information will be lost. This message both informs and calms the users and creates the mailbox. Verify that users are now able to reconnect to a new mailbox, either by asking some of the affected users, or by opening an email client yourself (e.g., Outlook Web Access—OWA).
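The log-file copy in step 1 is worth verifying, because the later restore depends on those logs being intact. Here is a minimal sketch of a copy with a checksum comparison. The function names are hypothetical, and the only assumption about the files is that the SG's transaction logs carry the .log extension.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_logs(source_dir, dest_dir, pattern="*.log"):
    """Copy matching files and return the names whose copies verified."""
    source, dest = Path(source_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    verified = []
    for log_file in sorted(source.glob(pattern)):
        target = dest / log_file.name
        shutil.copy2(log_file, target)  # copy2 also preserves timestamps
        if file_digest(log_file) != file_digest(target):
            raise IOError(f"Copy of {log_file.name} does not match the original")
        verified.append(log_file.name)
    return verified
```

You would call it with your own paths, for example `copy_logs(r"D:\exchsrvr\mdbdata", r"\\RECOVERY\logs")`, where both locations are placeholders rather than defaults you can rely on.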

Perform the recovery. Now that users are back in production, you can continue with the behind-the-scenes recovery process. Most of this process is similar to a traditional IS recovery.

  1. On the recovery server, configure the Exchange server's SGs and mailbox stores with the same names as the SGs and stores that you'll restore.
  2. Open the mailbox store Properties dialog box, and select the This database can be overwritten by a restore check box, as Figure 2 shows.
  3. Dismount the store you want to recover, but don't take the Exchange IS service offline, as you would have in earlier Exchange versions.
  4. Copy the log files to the same drive and folder location on the recovery server as they are on the production server.
  5. Use your backup application to restore the IS, and make sure that it is marked as the Last Restore Set. Because you're redirecting the restore to the recovery server, the Exchange server name might not match the original server name. In this case, in your backup application, select the option to allow redirection of the Exchange restore. The store is mounted when the restore is finished.
  6. In the Microsoft Management Console (MMC) Exchange System Manager (ESM) snap-in, expand the recovered store, right-click the mailbox folder, and choose to run the Cleanup Agent. A red circle and an X will mark all recovered mailboxes. These mailboxes are now orphaned objects because they don't have an associated user object in AD, but all contents are still in the mailbox store and aren't affected by the Cleanup Agent.
  7. Now, you need to create the user objects in AD and reconnect each mailbox. To perform these tasks manually—using ESM to reconnect each mailbox individually—would be quite a lengthy process, so instead you use the Mailbox Reconnect tool. Run the Mailbox Reconnect tool on the recovery console/workstation. The program will ask you for the names of the Exchange server and DC (which might be the same). On the next screen, select the store you want to recover, as Figure 3 shows.
  8. Create a file to import into AD by selecting Actions, Export Users from the menu to bring up the dialog box that Figure 4 shows. Select a container (e.g., an OU called Recovery) in AD to place the recovered users in. If you haven't created an OU already, you can leave Mailbox Reconnect running and launch the MMC Active Directory Users and Computers snap-in to create the OU. This name doesn't have to match any OU in the production directory, but I find that using a separate OU called Recovery makes cleaning up later easier. I recommend that you type in the path and name of the file, because browsing for the file location caused my Mailbox Reconnect tool to crash.
  9. Click Generate to use a tool called LDIFDE (already installed in your Windows system32 directory) to create an .ldf file for direct import into AD. The name LDIFDE stands for Lightweight Directory Access Protocol (LDAP) Data Interchange Format Directory Exchange.

  10. Remove the orphaned SMTP and system mailboxes. The easiest way to remove these mailboxes is to open the .ldf file in Notepad, search for SMTP and System, and delete the entries for those mailboxes before you import the file.
  11. Open a command prompt.
  12. Import the file into AD by running the command

    LDIFDE -i -f filename

    where filename is the name of the file you want to import.

  13. To reconnect the mailboxes to the newly created user accounts, use the Mailbox Reconnect tool and select Actions, Apply. Now, if you right-click a mailbox from ESM and choose Reconnect, a message appears stating that the mailbox has already been reconnected.
  14. By default, even the highest-level Win2K administrators are denied full access to every user's mailbox. This restriction prevents Exmerge from being able to extract the mail. Therefore, open the mailbox store Properties dialog box, as Figure 5 shows, and clear the Allow inheritable permissions from parent to propagate to this object check box. At the prompt, choose Copy to copy permissions. Clear the Deny check boxes for the Receive As and Send As permissions for the Administrator or your current logon account. To apply these permissions, you must dismount and remount the store.
  15. On the recovery console/workstation, run the Exmerge utility. The first step of the two-step Exmerge process is to extract existing mail and folders to .pst files. The Web-exclusive sidebar "More About Exmerge" on the Exchange Administrator Web site (http://www.exchangeadmin.com) explains Exmerge options you can select to reduce the risk of losing mail. Extracting large amounts of email takes a long time, and you should monitor the remaining disk space to make sure that you don't run out.
  16. Before proceeding with the final step—importing the mail—run a backup of the production mailbox store in case the merge process fails.
  17. When the extraction to .pst files is complete, change the recovery console/workstation domain membership to the production domain. I've found this step necessary to ensure Exmerge access to the production Exchange stores.
  18. Use the Exmerge utility to import the extracted .pst files to the production mailboxes. (This step is the second part of the two-step process.) If you experience a large percentage of failures in the import process, make sure that you followed the earlier step to send a mail message to all users to create the new mailboxes, because Exmerge can't connect to a mailbox that doesn't exist.
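If the store holds many mailboxes, the Notepad cleanup of the .ldf file (the step that removes the SMTP and system mailboxes) could be scripted. The sketch below rests on assumptions: that the unwanted entries can be recognized by the strings SMTP or System in their dn line, and that the dn lines are plain text rather than base64-encoded. It matches on the dn only, because ordinary user entries can carry values such as SMTP: addresses in other attributes. All function names are hypothetical.

```python
def split_ldif_records(text):
    """Split LDIF text into records; blank lines separate records."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            records.append(current)
            current = []
    if current:
        records.append(current)
    return records


def record_dn(record):
    """Return the record's dn line, joining folded continuation lines."""
    for index, line in enumerate(record):
        if line.lower().startswith("dn:"):
            dn = line
            for continuation in record[index + 1:]:
                if not continuation.startswith(" "):
                    break
                dn += continuation[1:]  # LDIF folds long lines with one space
            return dn
    return ""


def drop_system_entries(text, markers=("SMTP", "System")):
    """Return (filtered LDIF text, list of dn lines that were dropped)."""
    kept, dropped = [], []
    for record in split_ldif_records(text):
        dn = record_dn(record)
        if any(marker.lower() in dn.lower() for marker in markers):
            dropped.append(dn)
        else:
            kept.append("\n".join(record))
    return "\n\n".join(kept) + "\n", dropped
```

Review the dropped list before you import the filtered file, because a real user whose name happens to contain one of the markers would be removed as well.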

If you have sufficient hardware, you can accelerate the recovery process by having one computer use Exmerge to generate the .pst files and a second computer to merge the .pst file information back into the Exchange server. Exmerge offers several options; you can use the utility to move mailboxes and to clean up after virus attacks. Validate the results of the recovery process by making sure that the affected users have their mail back. And finally, you can celebrate!
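The advice to watch the remaining disk space during the Exmerge extraction can also be automated. This is a minimal sketch of a polling check, assuming you run it on a machine that can see the volume holding the .pst files; the function names, the polling interval, and the free-space floor are all illustrative choices.

```python
import shutil
import time


def free_bytes(path):
    """Return the free space, in bytes, on the volume that holds path."""
    return shutil.disk_usage(path).free


def watch_free_space(path, min_free_bytes, interval_seconds=60, max_checks=None):
    """Poll free space; return False as soon as it drops below the floor.

    Returns True if max_checks polls complete without crossing the floor.
    With max_checks=None the loop runs until the floor is crossed.
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        if free_bytes(path) < min_free_bytes:
            return False
        checks += 1
        time.sleep(interval_seconds)
    return True
```

A False result is the cue to pause the extraction and free up space before Exmerge fails partway through a mailbox.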

A Valuable Tool
At first glance, this recovery process might seem daunting, but when you become comfortable with it, you'll find it a valuable tool for your recovery kit. Although you might not use it every time you need to recover an Exchange IS, one day it might keep you from missing the recovery window in your service level agreement (SLA). When the traditional restore methods aren't going well (e.g., you can't locate the latest backup tape, or the data on the tape is corrupt), this method lets you get your Exchange server back up immediately and restore the mail later.