Isinteg and Eseutil can conjure up diagnostic magic—if you use them properly

I often hear questions about the two diagnostic and repair tools included with Exchange Server 5.5: Isinteg and Eseutil. Misunderstandings are common about what these tools do, how they work, and when—or if—you should run them on your servers. Because these tools are key parts of Exchange Server disaster recovery, the best time to learn how to use them is before you need them (as a friend of mine says, "The time to learn how to be a firefighter is not when your house is on fire"). And like firearms or prescription medications, disaster-recovery tools can be dangerous if you don't understand how they work and when to use them. (Imagine shooting a shotgun at a gallon milk jug full of water—a graphic demonstration of what can happen when you don't properly handle a powerful tool.) But before you can dig into what these tools do, you need a good understanding of the Information Store's (IS's) databases and how they work.

The Logical Structures
Let's examine the IS's private and public databases—priv.edb and pub.edb. As far as the command shell or a tool such as Windows Explorer is concerned, priv.edb and pub.edb look like any other file type, only larger. But in reality, the IS database structures are quite complex. Inside each .edb file is a complicated set of interrelated data that the IS and the client turn into what you see as a mailbox or a message.

The Exchange Server Extensible Storage Engine (ESE) stores everything in the IS databases in special structures called B-trees. These structures allow for efficient and fast indexing and retrieval of the databases' contents, which is exactly what you need for an application such as Exchange Server 5.5. An ESE table is a collection of B-trees; each ESE table is made up of rows and columns. When a column contains data that takes up less than 1KB, ESE stores the data directly in the table. When the data is longer than 1KB, ESE stores the data in separate long-value B-trees and stores a pointer to that data in the original table. This procedure makes the non-long-value tables' size predictable and provides a performance advantage because ESE always knows how much data it must read or write.
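To make the 1KB rule concrete, here is a minimal Python sketch of the idea. It is a toy model, not ESE's actual implementation: the `Table` class, its dictionaries, and the `("LV", id)` pointer format are all invented for illustration, and treating a value of exactly 1KB as a long value is my assumption.

```python
# Toy model of inline versus long-value storage. Real ESE tables are
# collections of B-trees; plain lists and dicts stand in for them here.
LONG_VALUE_THRESHOLD = 1024  # the 1KB cutoff described above


class Table:
    def __init__(self):
        self.rows = []         # each row: column name -> bytes or ("LV", id)
        self.long_values = {}  # stand-in for the separate long-value B-tree
        self._next_lv_id = 1

    def insert(self, row):
        """Store a row, moving large column values out of the table."""
        stored = {}
        for column, data in row.items():
            if len(data) < LONG_VALUE_THRESHOLD:
                stored[column] = data              # small: kept in the table
            else:
                lv_id = self._next_lv_id
                self._next_lv_id += 1
                self.long_values[lv_id] = data     # large: kept separately
                stored[column] = ("LV", lv_id)     # table holds a pointer
        self.rows.append(stored)
        return len(self.rows) - 1

    def read(self, row_index, column):
        """Return the column's data, following a long-value pointer if any."""
        value = self.rows[row_index][column]
        if isinstance(value, tuple) and value[0] == "LV":
            return self.long_values[value[1]]
        return value
```

Because every entry in the main table is either short data or a small pointer, the row size stays bounded no matter how large the message body is.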

Single-instance storage is an important Exchange Server feature that depends on this process. With single-instance storage, if I send the same message to 25 recipients on one server, the IS stores only one copy of the message in the database—the other 24 recipients' mailboxes contain pointers to the appropriate B-tree. A logical error in a B-tree can affect multiple mailboxes or public folders.
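The following Python sketch illustrates the single-instance idea under simplifying assumptions. The `MessageStore` class and the reference count it keeps are hypothetical; the article doesn't describe how the IS tracks the number of mailboxes that point to a message.

```python
# Toy model of single-instance storage: one stored copy, many pointers.
class MessageStore:
    def __init__(self):
        self.messages = {}   # message id -> [body, reference count]
        self.mailboxes = {}  # mailbox name -> list of message ids (pointers)
        self._next_id = 1

    def deliver(self, body, recipients):
        """Store the body once and give each recipient a pointer to it."""
        msg_id = self._next_id
        self._next_id += 1
        self.messages[msg_id] = [body, 0]
        for name in recipients:
            self.mailboxes.setdefault(name, []).append(msg_id)
            self.messages[msg_id][1] += 1
        return msg_id

    def delete(self, name, msg_id):
        """Remove one mailbox's pointer; drop the body when none remain."""
        self.mailboxes[name].remove(msg_id)
        self.messages[msg_id][1] -= 1
        if self.messages[msg_id][1] == 0:
            del self.messages[msg_id]
```

Delivering one message to 25 recipients leaves a single entry in `messages` and 25 pointers in `mailboxes`, which is also why damage to that one entry would show up in all 25 mailboxes.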

The Physical Structures
An ESE database is a collection of 4KB pages. Each page can contain the same maximum amount of data, and the page's owner decides how much of that capacity to use. Depending on its size, a chunk of information (e.g., the ESE table that tracks message attachments) can span any number of pages. A 1MB Microsoft PowerPoint attachment will take up a little more than 250 pages (allowing for some overhead). ESE uses pages to efficiently read and write relatively small chunks of data on demand. Pages and the efficient indexing that the B-tree system offers provide good performance, even under heavy loads.
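The page arithmetic is easy to check. In this short Python sketch, the 40-byte per-page overhead is purely an assumed figure for illustration; the article doesn't give the real header size.

```python
PAGE_SIZE = 4096       # 4KB pages, as described above
ASSUMED_OVERHEAD = 40  # hypothetical bytes per page for header and pointers


def pages_needed(data_bytes, overhead=ASSUMED_OVERHEAD):
    """Return how many pages it takes to hold data_bytes of content."""
    usable = PAGE_SIZE - overhead
    return -(-data_bytes // usable)  # integer ceiling division


print(pages_needed(1024 * 1024, overhead=0))  # 256 pages with no overhead
print(pages_needed(1024 * 1024))              # 259 pages with the assumed overhead
```

With no overhead at all, 1MB fills exactly 256 pages; any realistic overhead pushes the count slightly higher.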

One page can contain data and pointers to other pages. An 8KB message will span at least two pages: The first page will contain almost 4KB of data and a pointer to the second page. Pages also have headers that contain several items, including an error-checking code (called a checksum) that you can use to verify pages during backup and restore operations. If the header's checksum doesn't match the checksum that ESE calculates when it reads or writes the page, ESE logs an error (event 1018) in the event log to warn you that the physical database structure might be corrupt.
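Here is a rough Python sketch of that checksum comparison. It does not use ESE's real checksum algorithm or page layout: CRC-32 from Python's zlib module and a 4-byte header are stand-ins I chose for illustration.

```python
import zlib

PAGE_SIZE = 4096
CHECKSUM_BYTES = 4  # assumed header size for this toy page format


def make_page(payload):
    """Build a page: a checksum header followed by zero-padded data."""
    if len(payload) > PAGE_SIZE - CHECKSUM_BYTES:
        raise ValueError("payload does not fit in one page")
    body = payload.ljust(PAGE_SIZE - CHECKSUM_BYTES, b"\x00")
    checksum = zlib.crc32(body)  # stand-in for ESE's own checksum
    return checksum.to_bytes(CHECKSUM_BYTES, "little") + body


def page_is_valid(page):
    """Recalculate the checksum and compare it with the stored header."""
    stored = int.from_bytes(page[:CHECKSUM_BYTES], "little")
    return stored == zlib.crc32(page[CHECKSUM_BYTES:])
```

A page that fails `page_is_valid` corresponds to the situation in which ESE would log event 1018: the data on disk no longer matches what was written.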

Isinteg: The High-Level Verifier
Isinteg looks at the mailboxes, public folders, and other parts of the IS, checking for anything amiss. Think of a proofreader: Isinteg scans the tables and B-trees that organize the ESE pages into their logical structures. In addition, the tool looks for orphaned objects, objects that have incorrect values, or impossible references. Isinteg can perform 33 tests, some of which work on only a particular database.

You can run Isinteg in two modes. In default mode, the tool runs the tests you specify and reports its findings. In fix mode (which you control using the optional -fix switch), Isinteg runs the specified tests and attempts to fix whatever it can. The repair process is akin to putting together a jigsaw puzzle: Isinteg moves information around but never throws anything away. Isinteg isn't overtly destructive the way Eseutil can be. However, whenever you run isinteg -fix, you run the risk (however small) that you'll lose some data.

Isinteg has another, probably better-known function: After restoring an offline backup, you can run the tool to fix the globally unique IDs (GUIDs) for items in the IS. If you don't fix the GUIDs, your IS won't start, and you'll see an event 1011 error in the event log. (You don't need to use Isinteg when you restore an online backup because Exchange Server-aware backup software fixes the GUIDs on the fly.)

Eseutil: Down and Dirty
If Isinteg is a proofreader, then Eseutil is a plumber. Eseutil scans the physical structures in the databases, looking for and fixing physical errors such as incorrect page checksums and bogus page-to-page pointers. The tool has six modes. Eseutil can check each page's header to ensure that it matches the page's data (integrity check mode). This mode duplicates the process that takes place when you run an online backup. Eseutil can dump the databases' physical structures in a form that humans can read (dump mode). The tool can condense the databases by combining partially full pages into a reduced number of completely full pages (defragmentation mode). And Eseutil can take a database that an earlier version of ESE has generated and upgrade that database for ESE97, which is the version that ships with Exchange Server 5.5 (upgrade mode).

Eseutil has two distinct repair modes. Eseutil can try to fix whatever it can without touching the pages' data (recovery mode). In this mode, the tool might modify table links or B-tree entries, but it won't truncate or throw away bad pages. Or Eseutil can modify or throw out whatever is necessary in a last-ditch effort to salvage your data (repair mode).

How to Use Isinteg
To run a complete suite of Isinteg tests in the optimal order for catching and fixing the maximum number of problems, use the following command:

isinteg -pri -test alltests

This command tells Isinteg to run all tests on the private IS. You can check the public IS by using -pub instead of (or in addition to) -pri in the command line. You can also specify an individual test. To view a complete list of available tests, at the command line, type

isinteg

If you want Isinteg to fix the problems it finds, add the -fix switch. Your best bet is to run Isinteg twice: once to look for errors and once to fix them. Because some problems will be more serious than others, you need to get a feel for what you're up against before attempting to restore the IS databases to their original pristine state.
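If you script these runs, the look-first, fix-second routine might be sketched in Python as follows. The helper names are hypothetical, the switches are written with plain hyphens, and placing the fix switch ahead of the test switch is my assumption. The functions only build argument lists; they don't run anything.

```python
def isinteg_command(store, tests="alltests", fix=False):
    """Build an Isinteg argument list for the private or public store."""
    if store not in ("pri", "pub"):
        raise ValueError("store must be 'pri' or 'pub'")
    args = ["isinteg", "-" + store]
    if fix:
        args.append("-fix")  # assumed position; only added on request
    args += ["-test", tests]
    return args


def two_pass_plan(store):
    """Return the report-only command first, then the fix command."""
    return [isinteg_command(store), isinteg_command(store, fix=True)]


for command in two_pass_plan("pri"):
    print(" ".join(command))
```

Keeping the two passes as separate commands gives you a chance to read the first report and decide whether the second pass is worth the risk.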

When you restore an offline backup in Exchange Server 5.5, you need to make sure that the Directory Store and System Attendant are running. Then, you need to use isinteg -patch -pri (or -pub) before the IS will start.

How to Use Eseutil
Eseutil is a little less straightforward because it performs more tasks than Isinteg. I'm going to ignore the dump and upgrade modes (they don't relate to repairs and are useful only in very limited circumstances) and concentrate on the other four modes. The first two modes that I discuss are fairly harmless; the last two are somewhat more dangerous.

Defragmentation mode (eseutil /d). Eseutil makes a copy of the target database, reads each page, and attempts to condense the copy as much as possible. After the process is complete, Eseutil replaces the original database with the copy. This mode requires roughly 110 percent of the disk space that the target database requires.
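Before you start an offline defragmentation, it's worth checking the space requirement. This small Python sketch applies the rough 110 percent figure; the function names are mine, and the figure is only the estimate quoted above, not an exact requirement.

```python
def defrag_space_needed(db_bytes):
    """Return roughly 110 percent of the database size, rounded up."""
    return (db_bytes * 110 + 99) // 100


def can_defrag(db_bytes, free_bytes):
    """Report whether the free space covers the rough 110 percent estimate."""
    return free_bytes >= defrag_space_needed(db_bytes)


# A 10GB store with 10GB free falls short; 12GB free is enough.
ten_gb = 10 * 1024 ** 3
print(can_defrag(ten_gb, ten_gb))          # False
print(can_defrag(ten_gb, 12 * 1024 ** 3))  # True
```

On a real server, you could feed `can_defrag` the free-space figure from Python's `shutil.disk_usage(path).free` for the drive that will hold the copy.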

Integrity check mode (eseutil /g). Eseutil scans the database pages, looking for errors or mismatches in the page headers and contents. The tool doesn't fix the errors it finds.

Recovery mode (eseutil /r). Eseutil tries to clean up the database without removing any data from pages. In this mode, the tool functions somewhat like Scandisk or Chkdsk, trying to fix links between database items without touching the items themselves.

Repair mode (eseutil /p). This mode should be called D-A-N-G-E-R mode. When you use repair mode, you give Eseutil license to throw away data in as many pages as necessary to return the database to a clean, consistent state. Sometimes using this mode is the only way to fix a balky store, but you run a real risk of data loss.
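One way to keep the four modes straight in a script is a lookup table that refuses to hand out the riskier switches by accident. This Python sketch is only an illustration: the risk flags reflect my grouping of the modes in this article, and the `allow_risky` guard is a hypothetical convention, not an Eseutil feature.

```python
# Mode name -> (switch, whether this article treats the mode as risky)
ESEUTIL_MODES = {
    "defragmentation": ("/d", False),
    "integrity check": ("/g", False),
    "recovery": ("/r", True),
    "repair": ("/p", True),
}


def eseutil_switch(mode, allow_risky=False):
    """Return the switch for a mode, refusing risky modes by default."""
    switch, risky = ESEUTIL_MODES[mode]
    if risky and not allow_risky:
        raise PermissionError(
            mode + " mode can change or discard data; pass allow_risky=True"
        )
    return switch


print(eseutil_switch("integrity check"))           # /g
print(eseutil_switch("repair", allow_risky=True))  # /p
```

The sketch returns only the mode switch because the rest of the command line (which database to target, for example) isn't covered here.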

When to Use These Tools
The following simple guidelines can help you determine which tool to use and when, depending on what's wrong (or what you think is wrong) with your private or public stores.

Let's start with the less dangerous tool, Isinteg. Because Isinteg only inspects the logical (as opposed to the physical) structure of the database, and because in its default mode Isinteg will only inspect (rather than fix) the database, the tool is relatively safe to use. (I've never seen Isinteg cause a problem when run against a healthy database, but use the tool with caution just in case.) Most administrators run Isinteg for the first time after restoring an offline backup, in which case they must run the tool with the -patch switch. You might want to run Isinteg in several other cases:

  • When an item count is inconsistent. For example, if a mailbox containing 100 messages reports an item count other than 100, some of the counters and pointers in your private IS might be corrupt.
  • When Exmerge or the Tools, Move Mailbox command fails on a particular mailbox. The logical structure of the mailbox (or a message inside the mailbox) might be corrupt.
  • When the IS or mail client crashes repeatedly when a user tries to access a particular message or mailbox. In most cases, a page error (which requires Eseutil to fix) will cause the crash, but running Isinteg might catch other classes of errors.

If you're running regular restore tests to confirm that you can restore your backups, running Isinteg on the databases after you restore them is a good idea. (If you aren't performing regular restore tests, start as soon as you finish reading this article. See "The Six Deadly Backup Sins," April 2000.) If Isinteg reports errors, you can run the tool on your production server.

What about Eseutil? Running Eseutil in integrity check mode is always safe, though time-consuming. If you want to set a good example for your coworkers, run isinteg -test alltests and eseutil /g on the IS every time you restore a backup to your test server. Defragmentation mode is safe, too, although it's usually unnecessary; Exchange Server 5.5 does an online defragmentation as part of its daily IS maintenance. The online defragmentation moves all the empty pages to the end of the file rather than shrinking the database file the way Eseutil does. If you perform a task that moves a lot of data out of your store, an offline defragmentation will help you recover the formerly used space.

I advise you not to run the two repair options unless one of two things happens: You call Microsoft Product Support Services (PSS) and it tells you to do so, or you're desperate to recover data (and if you're desperate, you need to call PSS anyway). I soften my stance a bit when you have a good backup that you can restore to a test server—in that case, you can restore the backup to the test box, run Eseutil, and find out whether the resulting store is still usable before you touch your production server.

Look Before You Leap
Isinteg and Eseutil aren't intrinsically dangerous, but using these tools without foreknowledge of their capabilities and possible effects is foolhardy. In particular, be aware that running these tools requires you to shut down the Exchange services (not something to do lightly), so running them on your production servers requires some downtime. Plan your use of these products carefully. Remember, the mailbox you save could be your own!