Although it’s not as obvious as in an application like Microsoft SQL Server, databases are at the heart of Microsoft Exchange Server. Databases need constant tending to remain efficient. Exchange maintenance comes in two flavors: ongoing and on-demand. This article explores the two types of maintenance, what they’re used for, and the changes Microsoft made in Exchange Server 2010—including some new cmdlets that are available in Exchange Server 2010 SP1.
The Need for Maintenance
Some people would assert that a properly designed and engineered database application should be self-maintaining. However, such Utopia has yet to be achieved in most applications—and Exchange is no different. Maintenance is needed to optimize internal database structures, remove old data that’s no longer required, and apply management policies. Most of this work occurs in the background as part of the ongoing maintenance performed within the Exchange Information Store process, whereas the Managed Folder Assistant takes care of applying the rules of retention policies to mailboxes that come under the control of these policies. (For more details about the processing performed by the Managed Folder Assistant, see the Learning Path at the end of the article.)
Exchange 2010 introduces a new database schema that marks the first overhaul of the internal structures since Exchange Server 4.0 in 1996. Previous tweaks, such as the increase in page size from 4KB to 8KB in Exchange Server 2007, helped Exchange cope with the demands of modern messaging but didn’t provide the foundation for operating in a world where a 10GB mailbox will soon be the norm, even in corporate email systems. The new schema introduced in Exchange 2010 uses a set of internal tables that belong to individual mailboxes rather than using tables that contain data for a complete database. This change doesn’t sound dramatic, but it lets the Store retrieve data much more efficiently to respond to user requests, especially as the number of mailboxes supported on a server increases to the several-thousand level commonly seen in production today. Other internal database changes, such as increasing the page size to 32KB and deferring view updates until items are requested by clients, transform the I/O profile from multiple small random I/Os to fewer and larger sequential I/Os. Essentially, Exchange 2010 processes more data in bigger chunks rather than nibbles. (Microsoft sometimes calls the use of random small I/Os “nickel and diming.”)
This approach is sensible given that the average message has swelled from about 4KB circa 1996 to well over 100KB today, and the results are seen in a radical decrease in I/O operations per second (IOPS) generated by each mailbox. As with all aspects of performance, your mileage will vary depending on the details of your deployment, especially the storage hardware you use and how the different files (system, Exchange, databases, and transaction logs) are laid out. In general, though, it’s fair to say that companies that deploy Exchange 2010 in production will experience a large reduction in I/O demand over Exchange 2007 and a massive reduction when compared with Exchange Server 2003. Microsoft’s publicity for Exchange 2010 indicates a reduction of 70 percent in I/O between Exchange 2003 and Exchange 2007 and a further improvement of about the same because of the changes made to Exchange 2010. However, such figures should be taken with a grain of salt until you verify the performance characteristics of your production servers. There’s no doubt that you’ll see improvement. The question is simply how much better Exchange 2010 performs on the type of hardware that you’ve chosen to deploy.
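To put the quoted figures in perspective, two successive reductions of roughly 70 percent compound rather than add. The arithmetic can be sketched as follows, taking Microsoft's published percentages at face value (the variable names are illustrative only, and real results depend on your hardware):

```powershell
# Fraction of I/O removed in each version step, per Microsoft's published figures
$reduction2003To2007 = 0.70
$reduction2007To2010 = 0.70   # "about the same", so an approximation

# Fraction of the original Exchange 2003 I/O that remains after both steps
$remaining = (1 - $reduction2003To2007) * (1 - $reduction2007To2010)

# Overall reduction relative to Exchange 2003, as a rounded percentage
$totalReductionPercent = [math]::Round((1 - $remaining) * 100, 0)
"Approximate overall reduction versus Exchange 2003: $totalReductionPercent percent"
```

In other words, if both claims held, a mailbox on Exchange 2010 would generate roughly a tenth of the I/O it generated on Exchange 2003.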
Operating in a 24x7 World
Exchange has always had the capacity to perform background maintenance. The difference in Exchange 2010 is that Extensible Storage Engine (ESE) maintenance, or the maintenance done for internal database structures, is done on a 24x7 basis by default rather than within a predefined time window, which is the approach used by legacy Exchange servers. (If desired, you can create a custom maintenance window for Exchange to use.) The problem with relying on a time window is that there might be too much work to get through in the available time. This problem grows in line with database sizes, so as database sizes increase, the only solution is to assign a larger time window in hopes that you keep pace with the work.
Maintenance operations are essential for an Exchange database because they do the following:
- Remove items and mailboxes from the database (a hard delete) after their retention time expires
- Discover pages that were previously occupied by deleted items and mailboxes, and free up these pages for reuse by the database
- Validate checksums on pages to ensure that they aren’t corrupt
Exchange 2010 still performs these maintenance operations, but the big difference is that ESE scanning can now occur on an ongoing 24x7 basis, unless you disable background maintenance for a database by updating its properties, as Figure 1 shows.
Figure 1: Setting the maintenance properties for a mailbox database
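The property shown in Figure 1 can also be inspected and changed from EMS. The following is a minimal sketch, assuming a mailbox database named 'DB1' (a placeholder); as far as I'm aware, a change to this setting takes effect only after the database is dismounted and remounted, so verify the behavior in a test environment first.

```powershell
# Check whether 24x7 background maintenance is currently enabled for the database
Get-MailboxDatabase -Identity 'DB1' | Format-List Name, BackgroundDatabaseMaintenance

# Disable 24x7 background maintenance ('DB1' is a placeholder name)
Set-MailboxDatabase -Identity 'DB1' -BackgroundDatabaseMaintenance $false
```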
When 24x7 ESE scanning is enabled for a database, the Store validates page checksums on an ongoing basis to ensure that the integrity of the database is continually verified. This is important because Exchange 2010 also includes the ability to patch single problem pages within a database availability group (DAG). Essentially, if the Store detects a problem page (one that fails a checksum check), it’s able to signal to servers that host other copies of the database to ask them to provide a good copy of the page. After a good copy is received, the Store is able to patch the database and restore its overall integrity. Automatic problem page detection and fixing is a tremendous advantage of running mailbox servers in a DAG because it removes the classic “-1018 page corruption” problem from the list of things that administrators have to worry about.
24x7 ESE scanning isn’t the only maintenance that proceeds on a continuous basis. Exchange 2010 performs online defragmentation to keep internal structures optimized, items are removed from the database immediately after their retention period expires instead of waiting for the next maintenance window, and deleted pages are recycled so that they can be reused to store new items immediately. Finally, the Store analyzes the effect on database contiguity as transactions occur and, if necessary, the Store launches a background thread to move data between pages to make sure Exchange can fetch large chunks of contiguous data instead of resorting to a hunt and peck to find all the pages required for a transaction in multiple parts of the database.
All of these activities are auto-throttled to ensure that background maintenance never takes away from the ability of the server to handle client requests. In other words, in times of peak demand, Exchange limits the amount of background maintenance and then increases background maintenance when user demand drops.
As you can see in Figure 2, some additional CPU cycles and I/O are necessary to perform the processing required by maintenance on a 24x7 basis, such as shuffling pages around. However, this shouldn’t be a concern for most modern multi-core servers, especially given the gains made on I/O elsewhere.
Figure 2: Evidence of background maintenance
Given all the automatic maintenance that’s going on in the background, administrators have less reason to intervene to perform on-demand maintenance on Exchange 2010 servers. However, we still don’t live in a perfect world, and administrators must be able to recognize the two basic types of database corruptions that occur: logical and physical.
Logical errors are evident in problems such as an incorrect count in a folder or a view that doesn’t include all the items that it should for some reason. Logical errors often result from a client-side bug in which a client manipulates items in a folder but fails to update Messaging API (MAPI) flags properly. These problems are usually tolerable in that you can function perfectly well even when errors are present in a folder or mailbox. Some users don’t even realize that errors exist. After all, if Microsoft Outlook reports that a folder holds 1,119 items, will anyone take the time to count all the items to verify that Outlook has correctly reported the count provided to it by Exchange?
Physical errors are far worse in terms of their effect on the smooth running of an Exchange server because they can render a database completely inaccessible to users. In the past, a physical error or corruption could be caused by a software bug or hardware failure. Today, the vast majority of physical errors are caused by hardware, such as problems in a disk controller when it attempts to write an updated page correctly back into a database. Physical corruption causes data loss if pages that hold indexes and mailbox contents can’t be fixed.
In previous versions of Exchange, on-demand maintenance is performed with two command-line utilities provided as part of the Exchange toolkit. ISINTEG (the Information Store Integrity maintenance utility) takes care of logical errors; ESEUTIL (or even EDBUTIL if you remember back that far) handles problems at a much lower physical level, in the bowels of the database. Both utilities are throwbacks to the days when it was acceptable to take databases offline for several hours to perform preventive maintenance. As such, these utilities are anathema to administrators. Given the size of mailbox databases today, it could take several hours for a utility to complete processing, creating a potentially huge effect on the ability to meet service level agreements (SLAs) and other operational requirements.
A New Approach to Fixing Logical Corruptions
ISINTEG isn’t used in Exchange 2010 because Microsoft didn’t do the work to update the utility to reflect the new database schema. In fact, the change in focus within the schema from tables that work across the entire database to those that are specific to a mailbox means that it’s increasingly rare to encounter logical issues that interfere with a database—and if you find a problem with a mailbox, a simple mailbox move from one database to another is often sufficient to sort out problems with structures, such as named properties, views, and item counts. The reason a mailbox move fixes these problems is that the move operation essentially rebuilds the new mailbox in the target database and therefore eliminates many logical problems as data is moved. (For more information about how Exchange 2010’s move operations work, see “Moving Mailboxes the Exchange 2010 Way.”)
In Exchange 2010 SP1, Microsoft completed the move away from ISINTEG by providing a new set of repair cmdlets for mailbox and public folder databases to allow administrators to create repair requests that address the most common causes of corruption for views and item counts. These include the following:
- Search folder corruptions (mailbox)
- Incorrect aggregate counts on folders (mailbox)
- Incorrect contents returned by folder views (mailbox)
- Public folder replication state
- Public folder view verification
- Public folder physical corruption
These repair cmdlets use roughly the same model as Exchange 2010 mailbox move, import, and export requests in that an administrator creates a repair request that’s queued for processing by the Store, which then performs whatever repairs are required asynchronously with the database online. There’s no need for the user to log out of his or her mailbox while the Store examines and adjusts internal mailbox structures. There’s no UI available in Exchange 2010 SP1 to allow repair requests to be generated from the Exchange Management Console (EMC) or the Exchange Control Panel (ECP), so everything has to be managed using Exchange Management Shell (EMS) commands. Also, you can’t run mailbox or public folder repair requests against legacy Exchange servers because this functionality depends on the Active Directory (AD) schema updated by Exchange 2010 SP1.
The New-MailboxRepairRequest cmdlet creates a repair request for a mailbox, whereas the New-PublicFolderDatabaseRepairRequest cmdlet creates a repair request for a public folder database. For example, this command creates a mailbox repair request to check that folder views are valid:
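A request of this kind might look like the following minimal sketch, in which the mailbox identity 'JSmith' is a placeholder rather than a value from a real deployment:

```powershell
# Ask the Store to check (and repair) folder views in a single mailbox
New-MailboxRepairRequest -Mailbox 'JSmith' -CorruptionType FolderView
```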
If you add the -DetectOnly parameter to the request, Exchange will report any corruption that it finds but won’t repair it. The other corruption types that can be fixed in a mailbox are SearchFolder, AggregateCounts, and ProvisionedFolder. These repairs fix problems with search folders, counts on folders, and provisioned folders.
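As an illustration of a report-only pass, the sketch below checks search folders without changing anything in the mailbox. Again, 'JSmith' is a placeholder identity.

```powershell
# Report search folder corruption for one mailbox without repairing it
New-MailboxRepairRequest -Mailbox 'JSmith' -CorruptionType SearchFolder -DetectOnly
```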
You can perform several repairs with one pass through a mailbox by specifying a list of the different fixes that you want to make. For example:
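A combined request could be written along these lines, with the corruption types passed as a comma-separated list ('JSmith' remains a placeholder identity):

```powershell
# Check folder views and aggregate counts in one pass through the mailbox
New-MailboxRepairRequest -Mailbox 'JSmith' -CorruptionType FolderView, AggregateCounts
```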
The Archive parameter defines whether or not the Store scans the mailbox’s personal archive. If omitted, the archive isn’t processed—so to include the archive in the repair, we need a slightly modified command:
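A sketch of the modified command, assuming the same placeholder mailbox and that the mailbox has a personal archive:

```powershell
# Include the mailbox's personal archive in the repair pass
New-MailboxRepairRequest -Mailbox 'JSmith' -CorruptionType FolderView, AggregateCounts -Archive
```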
You can also scan all the mailboxes in a database at one time to fix any corruptions that are found in any mailbox. For example:
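A database-wide request might look like this sketch, where 'DB1' is a placeholder database name and the list of corruption types is simply one possible selection:

```powershell
# Scan every mailbox in the database for the listed corruption types
New-MailboxRepairRequest -Database 'DB1' -CorruptionType SearchFolder, AggregateCounts, ProvisionedFolder, FolderView
```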
Only one type of corruption can currently be fixed for a public folder database. This is the replica list, which is repaired as follows:
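A minimal sketch of such a request, assuming a public folder database named 'PFDB1' (a placeholder) and using the ReplState corruption type that covers replication state:

```powershell
# Repair the replication state held in a public folder database
New-PublicFolderDatabaseRepairRequest -Database 'PFDB1' -CorruptionType ReplState
```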
When you submit a new mailbox or public folder repair request, Exchange responds with a task identifier and the name of the server that will handle the request, as Figure 3 shows. This is the mailbox server that currently hosts the active copy of the database or where the public folder database is mounted.
Figure 3: Submitting a mailbox repair request
The only evidence of the progress that Exchange makes with the repair exists in the application event log, which captures event 10047 when a mailbox repair request is initiated (or event 10059 when you request repairs for a complete database) and event 10048 when it’s completed successfully and no corruptions remain in the mailbox, as Figure 4 shows. These events are logged on the server that processes the request. If a corruption is detected, Exchange logs event 10062 with the details of the corruption that was found and the results of the action. Note that the Store might need to make several repairs before it can eliminate all problems from a mailbox, so you need to continue running repairs until event 10048 is logged to report a clean mailbox.
Figure 4: Viewing details of a mailbox repair request logged into the event log
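Because the event log is the only place to follow a repair, it can be convenient to filter it from the shell. The sketch below is one way to do so, run on the server that processes the request. The number of recent entries to examine (1,000) is an arbitrary choice, and the filter uses only the event IDs mentioned above.

```powershell
# Event IDs for repair requests: started, completed clean, database-wide start, corruption found
$repairEventIds = 10047, 10048, 10059, 10062

# Pull recent application log entries and keep only the repair-related events
Get-EventLog -LogName Application -Newest 1000 |
    Where-Object { $repairEventIds -contains $_.EventID } |
    Select-Object TimeGenerated, EventID, Message |
    Format-List
```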
To ensure that performance isn’t affected, you can run only a single repair against a complete database on a server at one time. However, you can run up to 100 individual mailbox repairs concurrently on a server (spread across multiple databases).
If the database has copies within a DAG, the results of any repairs made to fix problems found in the tables within the mailbox are replicated along with other transactions to the database copies and are logged as events in the application event log on the server where the repair is performed. Much the same happens when repairs are applied to a public folder database, with the exception that the repair occurs on a specified public folder database and any results are replicated using the public folder replication mechanism.
You can’t cancel or review the current status of a repair job. This functionality is likely to be added by Microsoft in a future release. For now, the only way to terminate a repair job is to dismount a database or move the database to another server. (A database crash caused by a software bug has the same effect.) These actions clear out any repair jobs that might be active within the database.
The Myth Around ESEUTIL
At times, it seems as if some commentators endowed ESEUTIL with mythical abilities to cure all known problems in Exchange databases. Furthermore, they recommended that ESEUTIL should be run regularly to compact and repair databases so that the database would be as efficient as possible. Let’s be clear: This is a myth and a fallacy that should be consigned to the wastebasket as quickly as possible. My view is that ESEUTIL is brain surgery for Exchange databases, because if ESEUTIL isn’t run by an experienced practitioner for the right reasons, it can turn a database into an incoherent lump.
There was a time when running ESEUTIL against a database was the only way to return space to the storage subsystem and fix internal problems. That time passed at the start of the present decade when Microsoft finally figured out how to make background maintenance recycle deleted pages efficiently. Many of today’s administrators were still in short pants—it’s that long ago!
There are still good reasons to run ESEUTIL, but not on an ongoing basis and certainly not to free up disk space. You might need to run ESEUTIL to make a backup copy of a database consistent before it can be mounted as a recovery database, or you might be advised by Microsoft Customer Service and Support (CSS) to run ESEUTIL to fix a low-level problem in the database that can’t be fixed with the repair cmdlets. In this instance it’s almost certain that some data loss will occur, because ESEUTIL will drop any page that it can’t repair.
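For the recovery database case, the usual pattern is to check the database header and, if the database is in a dirty shutdown state, replay the restored transaction logs. The sketch below assumes hypothetical paths under D:\Recovery and a log prefix of E00. Substitute the values from your own restore, and treat this as an outline rather than a procedure to run unmodified.

```powershell
# Dump the database header; look for "State: Dirty Shutdown" or "State: Clean Shutdown"
eseutil /mh "D:\Recovery\DB1.edb"

# Soft recovery: replay the logs with prefix E00 to bring the database to a clean shutdown
eseutil /r E00 /l "D:\Recovery\Logs" /d "D:\Recovery"
```

Soft recovery of this kind replays logs and doesn't discard pages, which is why it carries far less risk than the repair mode that CSS might ask you to use.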
Databases operating within a DAG have a major advantage over non-replicated databases in that they can patch single problem pages by requesting good data from another database copy. The requested data is replicated in the transaction log stream and replayed by the Store to patch the problem.
Aside from the cases that I outlined, I can’t think of a good reason why I would want to dismount a database and remove access from users to run ESEUTIL for several hours to pursue some ethereal improvement that might or might not be applied to the database. In a production environment, this just doesn’t make sense.
The Facts of Life
Database maintenance is a fact of life for Exchange administrators. Most of the work is automatic and progresses behind the scenes, but there are some on-demand actions that must be taken to fix problems that occur at logical and physical levels. The new repair cmdlets introduced in Exchange 2010 SP1 are a welcome advance because they allow on-demand logical repairs to be performed online. However, we’re still grappling with the command-line ESEUTIL utility—surely it must be next on the list for Microsoft to modernize and update!
To learn more about Exchange Server 2010’s retention policies: