Most of the time, we think of Microsoft Exchange Server databases as being big, monolithic chunks of information. Exchange administrators usually think of a mailbox database as containing a stew of mailboxes, folders, individual mail items, and mail metadata, all lumped together in an amorphous mass. This view is understandable given that Microsoft doesn't fully document the internal structure of the .edb files that Exchange Server's Extensible Storage Engine (ESE) database creates. However, there's enough material out there (e.g., Brett Shirley's excellent presentations at TEC 2010) to give database aficionados an idea of how data items are structured and linked together.

Despite this nebulous view of the database, I've never yet met an Exchange administrator who didn't know that .edb files are divided into pages. I think that's because the term pages is so pervasive in the Exchange world; it's part of the documentation and online help for Eseutil and Isinteg (remember those?), to cite just a couple of instances. However, it's a little misleading to think of an .edb file as nothing more than a collection of pages because the information on those pages is interlinked. Losing a single page can have effects ranging from minimal to catastrophic; it all depends on which page you lose.

ESE uses a number of mechanisms to protect against page-level corruption. For example, each page contains a checksum that's generated at the time the page is written. Any time you want to know whether a page is valid, you can read the page data, compute a new checksum, and see if the new value matches the stored checksum. That's exactly what happens during streaming ESE backups in Exchange 2007, Exchange 2003, and Exchange 2000.
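The verify-on-read idea is easy to sketch. This is not ESE's actual checksum algorithm (Microsoft doesn't fully document it, and it has changed across versions); the sketch below uses CRC32 as a stand-in, and the page size and header layout are illustrative assumptions:

```python
import binascii

PAGE_SIZE = 32 * 1024   # Exchange 2010 uses 32KB pages
CHECKSUM_LEN = 8        # bytes reserved in the page header (illustrative)

def compute_checksum(page_body: bytes) -> bytes:
    # Stand-in for ESE's real checksum; CRC32 is used here for illustration only.
    return binascii.crc32(page_body).to_bytes(CHECKSUM_LEN, "little")

def write_page(body: bytes) -> bytes:
    """Prepend a freshly computed checksum, as ESE does on every page write."""
    return compute_checksum(body) + body

def page_is_valid(page: bytes) -> bool:
    """Recompute the checksum from the page body and compare to the stored one."""
    stored, body = page[:CHECKSUM_LEN], page[CHECKSUM_LEN:]
    return stored == compute_checksum(body)
```

A streaming backup does essentially this for every page it reads: if `page_is_valid` ever returns false, the backup fails rather than silently preserving a corrupt page.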

In Exchange 2010, a background maintenance task scans each page and performs the checksum check; the process is scheduled such that every page of every database, on both active and passive copies, should be scanned at least once every seven days. The checksum operation is throttled so that it doesn't read more than about 5MB of data per second, so its I/O impact is light. You can change this behavior so that checksum scans take place during the regular database maintenance window, but Microsoft recommends that you leave the default behavior in place.
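The throttling works out to a simple read budget. Here's a minimal sketch of that idea, assuming a simple sleep-based pacing loop (the real scanner's internals aren't documented; the function names are my own):

```python
import time

RATE_LIMIT = 5 * 1024 * 1024  # ~5MB/second read budget, per the default throttle

def scan_database(pages, check_page, rate_limit=RATE_LIMIT):
    """Checksum-scan every page, sleeping as needed to stay under the budget."""
    bad_pages = []
    start = time.monotonic()
    bytes_read = 0
    for page_no, page in enumerate(pages):
        if not check_page(page):
            bad_pages.append(page_no)
        bytes_read += len(page)
        # Throttle: if we're reading faster than the budget allows,
        # sleep off the difference before touching the next page.
        expected_elapsed = bytes_read / rate_limit
        actual_elapsed = time.monotonic() - start
        if expected_elapsed > actual_elapsed:
            time.sleep(expected_elapsed - actual_elapsed)
    return bad_pages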

What happens if the checksum scan indicates that a page contains an error? That's where the page patching process comes in. This name is a bit misleading because the page itself isn't patched. Instead, the damaged page is replaced with a clean copy from a replica of the database. The patching process is conceptually simple: When a page fails the checksum check, a new copy of the page is retrieved from the transaction logs on another database availability group (DAG) member that contains the same database. By replaying only that portion of the logs that contains data for the target page, the page contents can be replaced without affecting any other pages. The actual steps required to do this vary according to whether the damaged page is on an active or passive copy of the database. Ross Smith's excellent post on the Exchange Team Blog explains the steps in detail.
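At a high level, the patching flow is "find a clean copy elsewhere, swap it in." The sketch below compresses the real mechanism considerably: in an actual DAG the clean page is produced by replaying the relevant transaction log records on another copy, not by a direct page fetch, and all of the names here are my own illustrative inventions:

```python
def patch_page(database, page_no, replicas, page_is_valid):
    """Replace a damaged page with a clean copy from a replica (sketch).

    In a real DAG the clean page comes from replaying only the portion of
    the transaction logs that touches the target page on another database
    copy; here we simply ask each replica for its copy of the page and
    take the first one that passes validation.
    """
    for replica in replicas:
        candidate = replica[page_no]
        if page_is_valid(candidate):
            database[page_no] = candidate  # other pages are untouched
            return True
    return False  # no clean copy anywhere; fall back to restoring from backup
```

The key property the sketch does capture is isolation: only the damaged page is replaced, so the rest of the database stays online and unmodified.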

Note that page patching can take place only in highly available Exchange environments that use DAGs, because it depends on having another copy of the database to draw from. Otherwise, Exchange has a couple of built-in page repair mechanisms, but if the page can't be repaired, it will be marked as bad. You'll then have to restore from backup.

These maintenance tasks, along with the others Ross describes, run with the goal of keeping your databases healthy and consistent, but they're no substitute for a proper high-availability design (if your business needs warrant it) and a robust backup system. Yes, yes, I know that it's possible to run Exchange 2010 using DAGs without taking any backups, but that's a topic for another column!