I returned from the holidays to find that my Exchange servers had not been backed up over the past week. Now that I have both Exchange 2000 and Exchange 5.5 servers to back up, the process has become a little tricky, and I should have done more research and testing on it. It's time to revisit the topic of backup best practices for our Exchange deployments (as much for myself as for you).

First on our best practices list is to test backup operations and procedures in a lab environment before putting them into production. In my rush before the holiday season and a long-awaited vacation, I skipped this step. I had upgraded my backup server (I back up my Exchange servers over the network to DLT drives attached to the backup server) from Windows NT 4.0 to Windows 2000 and thought everything would work fine. And everything would have worked fine if not for one small problem: When I upgraded my backup server to Win2K, I also joined it to a new domain (my new Win2K Active Directory, or AD, domain that contains my Exchange 2000 servers). Unfortunately, the backup software was still logging on with an account in the old domain, and this account no longer had the necessary permissions on the backup server now that the server was a member of the Win2K AD domain. As a result, all of my backup jobs failed over the holidays. Luckily, when I returned, none of my servers had experienced any problems, and I had no need to restore. Had I lost my array or experienced database corruption, I would have been explaining to users why I lost a week's worth of their mail.

Another key best practice that I have poorly implemented is active event and error log scanning and reporting. In my small site, I haven't implemented aggressive management practices (this doesn't mean that I shouldn't). Ideally, you need a third-party application management tool such as NetIQ AppManager or BMC Patrol. Using tools such as these, you can easily monitor event and other logs for errors and events pertaining to your backup operations. Don't just monitor; you must also implement alerting. It doesn't do any good for your backup server to be the only one to know something went wrong. In my case, I did have some rudimentary monitoring that alerted me via email that my backup operations weren't successful. The best practice here is to have monitoring in place that correctly notifies system managers of any error or failure conditions in your daily backups. Most types of problems, including database corruption, will terminate the backup job. Being alerted to these events lets system managers respond proactively instead of trying to figure out what happened after it's too late and data has been lost.
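If you don't have a management tool in place yet, even a small script can cover the basics. What follows is only a minimal sketch in Python, not a substitute for a real monitoring product. It assumes you have exported your backup log to a tab-separated text file with a timestamp, a level, a source, and a message on each line. That format, the "Backup" source name, the "completed successfully" message text, and the function names are all hypothetical, so adjust them to whatever your backup software actually writes. The script reports every error or warning from the backup source and, just as important, complains when it can't find a recent successful backup, which is the condition that bit me.

```python
from datetime import datetime, timedelta

# Hypothetical log line format (tab-separated), an assumption for this sketch:
#   YYYY-MM-DD HH:MM:SS <TAB> LEVEL <TAB> SOURCE <TAB> MESSAGE


def parse_line(line):
    """Split one exported log line into (timestamp, level, source, message)."""
    stamp, level, source, message = line.rstrip("\n").split("\t", 3)
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), level, source, message


def find_backup_problems(lines, now, max_age=timedelta(hours=24), source="Backup"):
    """Return a list of human-readable problems found in the backup log.

    Reports every ERROR or WARNING from the given source, and also reports
    when no successful backup has been logged within max_age of 'now'.
    """
    problems = []
    last_success = None
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines in the export
        when, level, src, message = parse_line(line)
        if src != source:
            continue  # not a backup event
        if level in ("ERROR", "WARNING"):
            problems.append(f"{when:%Y-%m-%d %H:%M} {level}: {message}")
        elif level == "INFO" and "completed successfully" in message.lower():
            # Assumed success text; match it to your own software's wording.
            if last_success is None or when > last_success:
                last_success = when
    if last_success is None:
        problems.append("No successful backup found in the log")
    elif now - last_success > max_age:
        problems.append(
            f"Last successful backup is older than {max_age}: "
            f"{last_success:%Y-%m-%d %H:%M}"
        )
    return problems
```

You would run something like this on a schedule, pass it the lines of the exported log and the current time, and send any non-empty result to the people responsible for the servers. How the alert is delivered (email, pager, or something else) is deliberately left out, but make sure it reaches someone who is actually on duty.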

I sincerely hope your new year is off to a better start "Exchange-wise" than mine. I am also confident that you are doing a better job at disaster recovery for your Exchange servers than I am. Don't forget these and other best practices for disaster recovery of your Exchange servers. If you'd like more information about Exchange disaster recovery, check out the Microsoft TechNet Web site.