If you avoid these errors, you avoid an Exchange catastrophe
Nothing compares with the sinking feeling you experience when you need to restore data from a backup but can't for some reason. Most computer users have this experience eventually; the pain is even more acute and frequent for administrators, who are responsible for large amounts of important business data. Although backup and restore technologies have advanced in the past few years, you probably still use them only as last-ditch safety mechanisms. When all else fails, you try to restore from backup. For this alternative to be viable, you must have a degree of confidence that your data will be available and readable when you need it. However, Exchange administrators make several common mistakes that prevent their backup and recovery operations from running smoothly.
#1: Using the Wrong Backup Method
The two basic methods for backing up Exchange data are online and offline. Online backups use a Microsoft interface, such as the Extensible Storage Engine (ESE) backup APIs or the Microsoft Volume Shadow Copy Service (VSS), to copy the selected Exchange data while the Exchange services are running and the target database is mounted and active. The Exchange-provided APIs back up transaction logs and truncate the logs when necessary.
Offline backups copy the Exchange database and log files while the database isn't mounted. Some solutions purport to copy Exchange data without using Microsoft's APIs but also without dismounting the databases. The Microsoft article "XADM: Hot Split Snapshot Backups of Exchange" (http://support.microsoft.com/?kbid=311898) explains that Microsoft considers these backups to be offline.
Performing online backups is preferable for typical production operations because online backups capture a consistent copy of the Exchange databases without interrupting user access. However, offline backups are useful in some situations. For example, performing a complete offline backup of your Exchange database and logs is a good idea before installing a Windows or an Exchange service pack or performing a forklift upgrade of the database to another server. Although creating offline backups is more time consuming than generating online backups, many administrators prefer the extra safety of having a periodic offline backup in addition to routine production backups.
#2: Not Verifying Backups
If your backup fails and no one notices, does it make a sound? Maybe not, but your users will surely sound off if you can't recover their mail data. I recently worked with a company whose administrator accidentally corrupted a mailbox database. When the Exchange administrator tried to restore the database, he discovered that backups of the database had been failing for more than four months because the administrator hadn't installed the Exchange version of the company's third-party backup agent software. The installed version of the agent tried to back up the files but couldn't because the Exchange Information Store (IS) had the files open. Even a cursory review of the backup software's reports or the Application event log would have shown that the software wasn't backing up the Exchange data. Unfortunately, no one monitored the backups for success.
To prevent this problem, regularly check your backup software's logs. You need to verify
- that the backup software is backing up what you want it to. Make sure the backup type, time, and contents are correct.
- that the backup finishes. Verify that the requested data is backed up, and check for errors that might have occurred.
- that you can restore the data written during the backup. If you're using tape, verify that you can read the tape from another tape drive. Check to see whether you can restore the data to a server and extract Exchange information.
If one of these three checks fails, you should be able to determine the cause of the backup failure and therefore fix the problem. For example, during an online backup, Exchange computes a checksum for each page and compares it with the checksum stored with that page on disk. If the checksums don't match, you receive a -1018 error and the backup stops. Checking your backups would alert you to the error and give you a chance to fix the underlying problem before you need to restore from that backup.
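The page-by-page checksum comparison can be illustrated with a small sketch. This is not Exchange's actual checksum algorithm or page layout: it assumes a 4KB page whose first 4 bytes hold a CRC32 of the rest of the page, and the names make_page and find_bad_pages are hypothetical. It shows only the general idea of recomputing each page's checksum and comparing it with the stored value.

```python
import zlib

PAGE_SIZE = 4096  # assumed page size for this illustration


def make_page(payload):
    """Build a page: a 4-byte CRC32 of the body, followed by the zero-padded body."""
    body = payload.ljust(PAGE_SIZE - 4, b"\x00")
    if len(body) != PAGE_SIZE - 4:
        raise ValueError("payload too large for one page")
    return zlib.crc32(body).to_bytes(4, "big") + body


def find_bad_pages(database):
    """Return the page numbers whose stored checksum differs from the recomputed one."""
    bad = []
    for offset in range(0, len(database), PAGE_SIZE):
        page = database[offset:offset + PAGE_SIZE]
        stored = int.from_bytes(page[:4], "big")
        if zlib.crc32(page[4:]) != stored:
            bad.append(offset // PAGE_SIZE)
    return bad


good = make_page(b"mailbox data") + make_page(b"more mailbox data")
damaged = bytearray(good)
damaged[PAGE_SIZE + 100] ^= 0xFF  # simulate damage inside the second page

print(find_bad_pages(good))            # no mismatches expected
print(find_bad_pages(bytes(damaged)))  # the second page (index 1) is reported
```

A mismatch on any page is the situation in which a real online backup would stop with the checksum error.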
Even if your backups are working now, don't get complacent. Changing your environment, backup software, Windows configuration, or Exchange configuration might make your backups fail in the future, so check your backups regularly. The fastest and simplest way to confirm that your backups work is to review the Application event log and the report that your backup program generates. Check the Application event log to ensure that Exchange didn't generate any errors during the backup period. Check the backup program report to verify that the backup program didn't skip any files and that no errors occurred.
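If your backup program can write its report to a text file, a first pass over that report can be automated. The sketch below is deliberately crude and rests on assumptions: the keyword list is hypothetical, the sample lines are invented for illustration and aren't real backup-program output, and find_problems is a name made up here. Adjust the pattern to the wording your own backup software uses.

```python
import re

# Hypothetical keywords; adjust to the wording your backup program actually uses.
PROBLEM_PATTERN = re.compile(r"\b(errors?|failed|skipped)\b", re.IGNORECASE)


def find_problems(report_lines):
    """Return (line_number, text) pairs for report lines that mention a problem."""
    return [
        (number, line.strip())
        for number, line in enumerate(report_lines, start=1)
        if PROBLEM_PATTERN.search(line)
    ]


# Invented sample text for illustration only, not real backup-program output.
sample_report = [
    "Backup started",
    "Mailbox store: 1 file skipped",
    "Backup completed",
]

for number, text in find_problems(sample_report):
    print(f"line {number}: {text}")
```

A keyword match also flags harmless lines such as a summary that reports zero errors, so treat the output as a list of lines for a person to read, not as a verdict on the backup.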
#3: Mismanaging the Transaction Logs
Your ability to restore an Exchange database depends on the state of the transaction logs. If you have the correct set of log files for a database, you have a good chance of restoring the database to the point of failure. Conversely, if the logs are lost or damaged, the odds of a complete recovery drop. When you perform a restore, Exchange attempts to play back the log files, in sequence, from the first log required for the database (also known as the low anchor log) to the last log available (the high anchor log). If a log file between the low and high anchor logs is missing, log playback stops. The restore can't continue until you recover the missing log file.
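Because one missing file halts log playback, it can be worth checking a log directory for gaps before you start a restore. The following minimal sketch assumes the log files are named with a three-character prefix such as E00 followed by five hexadecimal digits and .log; confirm that convention against your own server before relying on it. The function name missing_generations is hypothetical, and the range it checks is simply the lowest to the highest log file found, standing in for the low and high anchor logs that the database actually requires.

```python
import re

# Assumed naming convention: a prefix such as "E00", five hexadecimal digits,
# and ".log" (for example, E000000A.log).
LOG_NAME = re.compile(r"^(E\d\d)([0-9A-F]{5})\.log$", re.IGNORECASE)


def missing_generations(file_names, prefix="E00"):
    """Return the generation numbers absent between the lowest and highest log found."""
    found = set()
    for name in file_names:
        match = LOG_NAME.match(name)
        if match and match.group(1).upper() == prefix.upper():
            found.add(int(match.group(2), 16))
    if not found:
        return []
    return [g for g in range(min(found), max(found) + 1) if g not in found]


names = ["E0000009.log", "E000000A.log", "E000000C.log", "E00.log", "E00.chk"]
gaps = missing_generations(names)
print([f"E00{g:05X}.log" for g in gaps])  # the one missing file: E000000B.log
```

In practice you would pass in the result of listing the log directory (for example, with os.listdir) in place of the hard-coded names.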
Online backups automatically include the log files as part of the backup data set. During normal operation, Exchange continues to create new log files as transactions occur. These log files remain on disk until you perform a full or an incremental online backup, at which point the Exchange IS process truncates or removes the files. Don't remove log files yourself. In some circumstances, you might need to copy the log files to a separate directory for safekeeping. In "Offline Backup and Restoration Procedures for Exchange" (http://support.microsoft.com/?kbid=296788), Microsoft recommends saving copies of the transaction logs in a separate location before attempting to recover data from an offline backup.
When you use NTBackup to perform a restore, the logs don't play back unless you select the Last restore set check box (or the equivalent check box in another backup program). The database you restore isn't mountable unless you select this option or use the Eseutil /cc command to start log playback (hard recovery) manually.
If your transaction logs are missing or any of your log files are damaged, Microsoft's free Exchange Server Disaster Recovery Analyzer (ExDRA) might be helpful. This tool can analyze a dismounted database, tell you which log files are present and which are missing, and give you options for fixing any problems it finds. ExDRA can be valuable if you experience an unexpected restore failure, although it's no substitute for understanding the disaster-recovery process and consulting Microsoft Customer Service and Support (CSS) or other experts when necessary.
#4: Not Allowing Enough Time
Backups take time. Each backup configuration has a throughput number that reflects how much data you can back up and restore in a given time period. A common mistake is to underestimate the amount of time a restore will take. When a restore takes longer than anticipated, you sometimes must break service level agreements (SLAs), and users are often disgruntled.
Microsoft's recommendation is to measure the length of time necessary to back up a volume of data, then allocate twice that time for a restore. You might wonder why a restore takes twice as long as a backup. Suppose you need to back up a 60GB database, using a backup system that can write 12GB per hour. Five hours seems reasonable for a backup. However, when you get ready to restore the data, remember that merely reading the data takes five hours. The restore process requires that you also do the following:
- Locate the appropriate backup media (if you're using removable media such as tape) or find the appropriate disk volume (if you're using VSS or SAN-based backups).
- Transfer the backup data to the server from which you'll perform the restore.
- Create a recovery server or Recovery Storage Group (RSG), if necessary.
- Read the data back from the backup media and correct any errors or problems.
- Replay the transaction logs.
- Move data from the recovery server or RSG to production mailboxes.
- Mount the database successfully.
- Deal with any ancillary problems that arise.
This list isn't trivial; if a problem occurs at any stage in the process, your recovery operation won't proceed through the successive steps. The more restores you perform, the more smoothly they'll go. You'll be able to accurately estimate how long a restore will take, and you'll become familiar with and learn how to solve the types of problems that are common in your environment.
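The arithmetic behind the planning rule is simple enough to put in a small helper. This sketch uses the 60GB database and 12GB-per-hour throughput from the example above and the allow-twice-the-backup-time guideline. The function names and the restore_factor parameter are hypothetical, and the result is a planning estimate, not a measurement; your own practice restores should replace it.

```python
def backup_hours(database_gb, throughput_gb_per_hour):
    """Time to move the data once at the measured throughput."""
    if throughput_gb_per_hour <= 0:
        raise ValueError("throughput must be positive")
    return database_gb / throughput_gb_per_hour


def restore_window_hours(database_gb, throughput_gb_per_hour, restore_factor=2.0):
    """Planning estimate: allow restore_factor times the backup time for a restore."""
    return restore_factor * backup_hours(database_gb, throughput_gb_per_hour)


print(backup_hours(60, 12))          # 5.0 hours to back up
print(restore_window_hours(60, 12))  # 10.0 hours to allow for the restore
```

Once you've timed a few real restores, adjust restore_factor to match what you observe in your environment.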
#5: Forgetting the Small Stuff
Exchange backup discussions often focus only on backing up and restoring Exchange data, ignoring the numerous other objects and data items that you must also back up and restore. For example, if your Exchange server has a catastrophic hardware failure that requires you to replace it, you need to install Windows and Exchange on the new server before you can use your Exchange database backups and transaction logs. Maintaining a system-state backup of your Exchange server lets you easily restore the server and Exchange data, putting you back in business much more quickly than if you need to hunt for product installation CD-ROMs, product keys, and so on. If your Exchange environment includes antivirus software, spam filters, X.509 Certificate Authorities (CAs), fax connectors, or other auxiliary services, you need to back up and restore their configuration data as well as the necessary data (e.g., private keys, filter lists) to restore these services to their original operating quality.
When you use NTBackup to perform a system-state backup, NTBackup captures all the system data on the local machine, including the registry, Active Directory (AD) Directory Information Tree (DIT) files on a domain controller (DC), Windows Certificate Services data, DHCP and DNS server databases, and other data that's crucial for recovery. Most third-party backup utilities also have this capability, but you don't need to use third-party tools; you can use NTBackup to schedule a system-state backup to an on-disk file, then include the file with every Exchange backup. This method guarantees that you always have an up-to-date system state to restore. Don't forget to periodically update the Automated System Recovery (ASR) disk. You can often use the ASR disk to repair damaged Windows installations without completely reinstalling the OS. Many third-party backup programs have a similar capability.
#6: Not Practicing
The best time to learn how to recover data in your environment is before you have a problem. Remember that practice makes perfect. Even if you have only one database on one server, you can still practice recovery. Buy a copy of Microsoft Virtual PC 2004 or VMware Workstation, build a test server, and practice restoring data to it. If you're using Exchange Server 2003, you must be thoroughly familiar with RSGs and how to use them. You need to know how to use your preferred backup program to restore data to the original server and to a different server. Keep your product installation CD-ROMs and product keys in a safe location (not in a text file on a server that you might need to recover). Regularly practice recovering items that you might need to recover during an actual outage; depending on your environment, these items might include individual mailboxes, individual messages, databases, storage groups (SGs), or entire servers. Practicing beforehand will be time well spent when a failure occurs.
Spend Time, Not Money
Many companies spend a lot of money on disaster-recovery and high-availability solutions but discover too late that just buying the best hardware and software isn't sufficient. You can use the free NTBackup utility and an inexpensive tape- or disk-based backup system to build a completely adequate disaster-recovery solution. Learn as much as you can about backup and recovery, avoid the common mistakes I've discussed, practice backup and recovery in your environment, and continually monitor your processes. Then, when you experience a failure, you'll be ready to put your skills to work.