Microsoft has always had a blind spot when it comes to getting data into or out of Exchange Server. Applications that include persistent data repositories typically provide a mechanism to let administrators load data into or export data out of the repository—but not for Exchange. Data import mechanisms are used to load user data into mailboxes, often during a migration from another email system, whereas data export mechanisms are used for the reverse purpose, usually to extract data for later examination, in scenarios such as legal discovery when companies need to provide copies of messages and attachments for legal review.
Over the past 15 years, since Microsoft first released Exchange, administrators have had to resort to client-side tools to load or export data from mailboxes. In some cases these tools have been highly functional. The ExMerge utility is a good example; this tool began as a solution created to help customers move mailbox data into Exchange from other email systems and evolved into an essential part of an Exchange administrator’s arsenal. Outlook is the most common mechanism used to access mailbox data. Outlook is an adequate solution for quickly browsing a mailbox to extract items to a PST file, but it’s not designed to import gigabytes of data—nor is it scriptable or programmable by the average administrator who simply wants to extract user data to satisfy a request for information. Skilled programmers can use the Outlook Object Model (OOM) to extract data, but the necessary code often isn’t available when you need it. Thus many administrators resort to programs such as ExMerge.
Client applications depend on MAPI libraries to access the contents of Exchange databases. The most accessible MAPI libraries in terms of documented APIs are those provided with Outlook. You must therefore install Outlook on an Exchange server before you can run a client that depends on the MAPI libraries. This dependency also exists for Exchange Server 2010’s and Exchange Server 2007’s Import-Mailbox and Export-Mailbox cmdlets. When Microsoft released Exchange 2010 in October 2009, you had to install a 64-bit version of Outlook on your Exchange 2010 mailbox servers before you could run the cmdlets to import or export data. However, the 64-bit version of Outlook wasn’t formally released until Outlook 2010 came along in April 2010—which illustrates the problems that occur when one product depends on another but development schedules aren’t aligned. In reality, Exchange’s import/export capability was a mess for years and in dire need of a redesign.
Exchange 2010 SP1 discards previous import/export approaches in favor of a new model based on import and export requests managed by the Microsoft Exchange Mailbox Replication Service (MRS). Data moves into and out of Exchange via a new data provider that’s integrated into the Client Access server role and isn’t dependent on any other product. The old Import-Mailbox and Export-Mailbox cmdlets are eliminated in SP1, which means that you must rewrite any PowerShell scripts that automate data loads or extracts and depend on those cmdlets. A new set of cmdlets exists in SP1 to create mailbox import and export requests, retrieve their status, report their disposition, and so on. This new mailbox import/export approach is similar to Exchange 2010’s mailbox move model.
Preparing for Import and Export Operations
Before we look at the details of the new cmdlets and explore some examples, let’s examine the prerequisites. First, administrators can’t import or export mailbox data unless they have explicit permission to do so. This restriction protects the integrity of user mailboxes because you obviously don’t want every administrator to be able to manipulate user data. The ideal situation is to restrict access to a limited set of administrators and to audit access regularly. Access is gained through the Role Based Access Control (RBAC) Mailbox Import Export role, which Exchange doesn’t assign to any role group by default. Users who hold the role, either directly or through membership in a role group to which the role has been assigned, can run the new cmdlets, whereas users who don’t hold the role won’t be able to queue mailbox import or export requests. In fact, they won’t even be able to run the cmdlets. Exchange won’t load the cmdlets into these users’ Exchange Management Shell (EMS) session because their accounts haven’t been assigned the Mailbox Import Export role.
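In RBAC terms, access of this kind is granted by assigning the Mailbox Import Export management role to a user or to a role group. The sketch below shows both approaches; the account name TRedmond and the group name PST Administrators are assumed placeholders rather than names from a real deployment.

```powershell
# Assign the Mailbox Import Export role directly to one administrator
# (TRedmond is a placeholder account name).
New-ManagementRoleAssignment -Role "Mailbox Import Export" -User "TRedmond"

# Alternatively, create a dedicated role group that holds the role and add
# members to it ("PST Administrators" is a hypothetical group name).
New-RoleGroup -Name "PST Administrators" -Roles "Mailbox Import Export" -Members "TRedmond"

# Review who currently holds the role, which helps with regular auditing.
Get-ManagementRoleAssignment -Role "Mailbox Import Export" |
    Format-Table Name, RoleAssigneeName, RoleAssigneeType
```

The cmdlets typically appear only in EMS sessions started after the assignment is made, so expect to open a new session before testing.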
Second, all data is imported from PSTs and exported to PSTs. No other data format is supported—which shouldn’t be an issue, because the PST format is the de facto standard for moving data into and out of Exchange. Output PSTs created by Exchange use the latest Unicode format. Exchange can import data from PSTs in both Unicode and the older ANSI format. Data can be imported from a PST into several different mailboxes, but only a single mailbox import request can access a specific PST at a time. No other client can access a PST while Exchange is using it.
Third, PST files that contain data to be imported into mailboxes must be placed in a file share that allows read/write access for the Exchange Trusted Subsystem group, as Figure 1 shows. Exchange 2010 uses the Exchange Trusted Subsystem group to let Exchange servers access secure data. You don’t need to grant any other access to the file share. Likewise, when Exchange exports data out to a PST, it writes the PST into a secured file share. The reason Exchange uses a file share is simple: All mailbox processing is performed by the MRS running on the Client Access server. It’s possible to assign a mailbox import or export request for processing by a specific MRS instance—but if you don’t, any MRS instance running on the Client Access server can process the request. Thus you can’t assume that the MRS instance will run on the same server on which the data to be imported is located. The data location must be accessible to any Client Access server in the site—hence Microsoft’s decision to use a file share.
Although Microsoft did a nice job of introducing the new import/export model in Exchange 2010 SP1, the nature of software development is such that the first version can’t address everything. In this case, you have to run all mailbox imports and exports through EMS because Microsoft wasn’t able to upgrade the previous mailbox import and export wizards that are available in the Exchange 2010 version of Exchange Management Console (EMC). Forcing administrators to use EMS isn’t a great hardship, and it’s certainly a good tradeoff to be able to use the new model’s increased level of functionality, as well as drop the silly requirement to install Outlook on an Exchange server before you can access mailbox data.
Table 1 lists the new cmdlets that Exchange 2010 SP1 provides to import and export mailbox data. The older Import-Mailbox and Export-Mailbox cmdlets are no longer available in SP1.
The Exchange 2010 SP1 Help file contains many useful examples of how to use the new cmdlets, including the format required by the different parameters. The new cmdlets’ wide range of functionality gives third-party developers and consultants plenty of scope for building future tools to automate mailbox imports.
Importing Mailbox Data
Let’s explore how the cmdlets are used to import mailbox data. First, we create a new mailbox import request, like so:
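A command along the following lines matches the description that follows. The PST file name TRedmond.pst is an assumed placeholder; the mailbox alias, share, server, request name, and parameter values are taken from the surrounding text.

```powershell
# Queue an import of a PST into the TRedmond mailbox.
# TRedmond.pst is a hypothetical file name on the \\ExServer1\Imports share.
New-MailboxImportRequest -Mailbox TRedmond `
    -FilePath "\\ExServer1\Imports\TRedmond.pst" `
    -Name "Import-TR" `
    -ConflictResolutionOption KeepLatestItem `
    -BadItemLimit 5
```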
This command creates a new import request for the mailbox with the alias TRedmond and identifies that the source PST file is located on a file share called Imports on a server called ExServer1. After it’s created, the import request is held in an Active Directory (AD) queue. The request will be processed by the first MRS instance in the site to become aware of the request.
By default, Exchange checks for duplicate items when it imports data into a mailbox; it doesn’t create a copy of an item if it already exists in the target mailbox (the message identifier is used to detect duplicates). In this case, the ConflictResolutionOption parameter is set to KeepLatestItem, which specifies that if a duplicate is detected during import, Exchange should keep the latest version of the item. The other options are KeepAll (keep all versions) and KeepSourceItem (keep the version of the item from the import PST).
In the example code, a unique name called Import-TR is provided for the import request. This parameter is optional; if you don’t provide a name, Exchange will use the default MailboxImport. If you create multiple import requests for the same mailbox, Exchange will use names such as MailboxImport1, MailboxImport2, MailboxImport3, and so on to uniquely identify each import operation. The mailbox name and the request name are combined and used to retrieve information about the import request.
Assigning a specific name to an import request becomes important if you want to run multiple concurrent imports for the same mailbox, each of which processes data from a different PST. (You can’t run concurrent imports from the same PST.) In this scenario, the default names assigned by Exchange work perfectly well, but it’s easier to track each job’s progress and troubleshoot errors if you assign more meaningful names, such as the name of the source PST.
You don’t have to import everything in a PST because the ExcludeFolders and IncludeFolders parameters let you control exactly which folders Exchange imports. For example, to import just a few named folders, we can pass their names as follows:
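As a minimal sketch, assuming you want only the Inbox and Sent Items folders and that the PST file name shown is a placeholder, the folder names are passed as a comma-separated list. The #Name# form refers to a well-known folder regardless of the language in which its display name appears.

```powershell
# Import only the Inbox and Sent Items folders from the PST.
# TRedmond.pst is a hypothetical file name.
New-MailboxImportRequest -Mailbox TRedmond `
    -FilePath "\\ExServer1\Imports\TRedmond.pst" `
    -IncludeFolders "#Inbox#", "#SentItems#"
```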
If you wanted to include all the folders under a specific root folder, you’d pass the name of the root folder as follows:
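A sketch of that form, assuming a top-level folder named Projects in the PST and a placeholder PST file name; the trailing /* asks Exchange to include the subfolders:

```powershell
# Import the Projects folder together with everything beneath it.
# TRedmond.pst is a hypothetical file name.
New-MailboxImportRequest -Mailbox TRedmond `
    -FilePath "\\ExServer1\Imports\TRedmond.pst" `
    -IncludeFolders "Projects/*"
```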
In this example, all the data in the subfolders under the Projects folder will be imported. If you need to navigate to a specific folder deep in the hierarchy, you can pass its name like so:
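For instance, assuming a hypothetical folder path Projects/Project Bluesky/Reports inside the PST, the levels of the hierarchy are separated by forward slashes:

```powershell
# Import a single folder several levels down in the PST hierarchy.
# The folder path and PST file name are illustrative placeholders.
New-MailboxImportRequest -Mailbox TRedmond `
    -FilePath "\\ExServer1\Imports\TRedmond.pst" `
    -IncludeFolders "Projects/Project Bluesky/Reports"
```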
PSTs often contain associated items, which are hidden items used by Outlook to store data such as rules, forms, and views. Exchange doesn’t import associated items unless you configure it to do so by setting the AssociatedMessagesCopyOption parameter to Copy. In most cases, you can avoid copying associated items from a PST because an equivalent associated item is likely to already exist in the mailbox. An exception would be if an application required forms that you knew didn’t already exist in the mailbox.
After you submit a job and the MRS starts to process it, you can retrieve progress information. In our sample scenario, we’d use the Get-MailboxImportRequest cmdlet, as follows:
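A minimal form of the command, using the mailbox alias and request name from the scenario:

```powershell
# Retrieve the import request by its identity (mailbox alias\request name).
Get-MailboxImportRequest -Identity "TRedmond\Import-TR"
```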
Note that the name of the mailbox (TRedmond) and the request (Import-TR) are combined to form a unique identity for the job we’re interested in.
The Get-MailboxImportRequest cmdlet supports several parameters to let you retrieve the status of different groups of jobs.
- The BatchName parameter fetches details of all requests that belong to a specific named batch.
- The Database parameter fetches details of all requests that belong to mailboxes in a specified mailbox database.
- The Status parameter fetches details of all requests with a specified status. Valid status codes include Completed, InProgress, Queued, CompletedWithWarning, Suspended, and Failed.
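The three parameters in the list above might be used as sketched here. The batch name and database name are assumed placeholders.

```powershell
# All import requests that have failed and need attention.
Get-MailboxImportRequest -Status Failed

# All requests submitted as part of one named batch
# ("Migration Batch 1" is a hypothetical batch name).
Get-MailboxImportRequest -BatchName "Migration Batch 1"

# All requests that target mailboxes in one database
# ("DB1" is a hypothetical database name).
Get-MailboxImportRequest -Database "DB1"
```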
To report the progress of an import, you can use the Get-MailboxImportRequestStatistics cmdlet to discover how much data has been transferred. Initially you’ll see that the MRS creates the folder hierarchy in the target mailbox to accept the imported data; then you’ll observe an increasing count of transferred items as the MRS moves data from the PST into the mailbox. For example:
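One way to watch progress is to pipe the in-progress requests to the statistics cmdlet. This is a sketch; the property names selected here are the ones commonly reported for import requests, and you may prefer a different set.

```powershell
# Show progress for every import request that the MRS is currently processing.
Get-MailboxImportRequest -Status InProgress |
    Get-MailboxImportRequestStatistics |
    Format-Table Name, Status, PercentComplete, ItemsTransferred, BytesTransferred -AutoSize
```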
The Get-MailboxImportRequestStatistics cmdlet reveals a lot of information. Thus it’s a good idea to limit the properties returned, to reveal only essential data about the import operation. I typically use the command that Listing 1 shows. Figure 2 illustrates the New-MailboxImportRequest, Get-MailboxImportRequest, and Get-MailboxImportRequestStatistics cmdlets in action.
When the import finishes, you can use the Get-MailboxImportRequestStatistics cmdlet to retrieve a report of everything the MRS did to populate the mailbox with data from the PST:
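A sketch of the report command for the sample request, followed by a variant that writes the report to a text file. The output path C:\Temp\Import-TR-Report.txt is an assumed placeholder.

```powershell
# Display the full report for the completed import on screen.
Get-MailboxImportRequestStatistics -Identity "TRedmond\Import-TR" -IncludeReport |
    Format-List

# Capture the same report in a text file for easier reading
# (the output path is a hypothetical location).
Get-MailboxImportRequestStatistics -Identity "TRedmond\Import-TR" -IncludeReport |
    Format-List |
    Out-File -FilePath "C:\Temp\Import-TR-Report.txt"
```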
The report is dumped to screen by default. However, piping the output to a text file results in a more convenient report that’s easier to read and that contains a lot more information. The mailbox import report is divided into summary information at the beginning of the report, followed by detailed information about the processing of each folder from the source PST into the target mailbox. Important information includes:
- The name of the source PST
- The name of the target mailbox and the database where it’s located
- The current status of the job (e.g., completed, with no warnings)
- The number of bad items encountered during processing (three, which is less than the five-item limit specified in the BadItemLimit parameter in the sample command)
- The start and end time for the job and the name of the MRS that processed the job
- The total number of items and their size transferred from the PST into the mailbox
- Whether any folders were explicitly excluded or included
Exporting Mailbox Data
Much the same approach is followed to export data from mailboxes. However, a different set of cmdlets is used. You run the New-MailboxExportRequest cmdlet to create a new export request. For example, to export a complete mailbox to a PST, you might use a command like this one:
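A command of roughly this shape fits the description that follows. The share name Exports and the PST file name are assumed placeholders, and the ExcludeDumpster switch is included on the assumption that it is what keeps the dumpster folders out of the export.

```powershell
# Export the complete TRedmond mailbox to a PST on a file share.
# \\ExServer1\Exports\TRedmond.pst is a hypothetical target path.
New-MailboxExportRequest -Mailbox TRedmond `
    -FilePath "\\ExServer1\Exports\TRedmond.pst" `
    -ExcludeDumpster
```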
This command takes all the content from the nominated mailbox and writes it out to a PST in the file share location. The contents of the dumpster folders are excluded from the operation. If the PST isn’t present, Exchange will create a new file; if you pass the name of an existing PST, the MRS will write the exported data into it.
Experience shows that qualifiers to filter or restrict information are applied more often when you export data from a mailbox than when you import a PST into a mailbox. For example, if you respond to a legal discovery action, you’re probably only required to provide copies of specific relevant information rather than a complete dump of a user’s mailbox. Exchange uses several parameters to control the data that’s exported.
- The SourceRootFolder parameter specifies a folder in the mailbox to use as the base of the export. If this parameter isn’t passed, Exchange exports the complete contents of the mailbox. For example, the following command exports only items that are stored in the Project Bluesky folder and any of its subfolders.
In this instance, the Project Bluesky folder is a subfolder of the Projects folder.
- The TargetRootFolder parameter specifies a root folder in the target PST under which Exchange creates the folders exported from the mailbox.
- The IncludeFolders parameter specifies one or more folders that are to be exported. For example:
- The ExcludeFolders parameter specifies one or more folders that are excluded from the export.
- The IsArchive parameter specifies that the export should be done for the user’s personal archive rather than the primary mailbox.
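The parameters in the list above might be used as sketched below. The share name, PST file names, and the Archive Export folder name are assumed placeholders; each request writes to its own PST because only one request can access a PST at a time.

```powershell
# Export only the Project Bluesky folder (a subfolder of Projects) and its subfolders.
New-MailboxExportRequest -Mailbox TRedmond `
    -SourceRootFolder "Projects/Project Bluesky" `
    -FilePath "\\ExServer1\Exports\TRedmond-Bluesky.pst"

# Export only the Inbox and Sent Items folders.
New-MailboxExportRequest -Mailbox TRedmond `
    -IncludeFolders "#Inbox#", "#SentItems#" `
    -FilePath "\\ExServer1\Exports\TRedmond-Mail.pst"

# Export the personal archive, skip Deleted Items, and place the exported
# folders under a root folder named "Archive Export" in the PST.
New-MailboxExportRequest -Mailbox TRedmond -IsArchive `
    -ExcludeFolders "#DeletedItems#" `
    -TargetRootFolder "Archive Export" `
    -FilePath "\\ExServer1\Exports\TRedmond-Archive.pst"
```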
Once again, the Exchange 2010 SP1 Help file contains all the necessary instructions for using these parameters.
During a migration project, you might have to import many gigabytes of information into Exchange. Even with the latest release of Exchange 2010, this isn’t a quick task. Some planning is necessary to import information into user mailboxes.
You might need to increase destination mailbox quotas to accommodate the imported PST data. A quota increase might be permanent, or you might be able to adjust it down again after the user has a chance to review the imported material and decide whether it should be kept in the mailbox or moved to an online archive. Increasing the quota for a set of destination mailboxes before an import is a relatively simple procedure to script, but having to adjust quotas is indicative of the manual nature of the processing that PST imports can require.
One way to determine the size of the mailbox quota is by referencing the size of the PST to be imported. Although this approach is reasonably effective, the size of a PST on disk doesn’t directly equate to the amount of data that will be imported into the mailbox. The PST file structure imposes a “tax” or overhead of approximately 20 percent over what’s required to store items in an online mailbox or archive. You need to ensure that sufficient quota is available in target mailboxes before you begin to import data—or even better, temporarily increase the quota by an excessive amount rather than run the risk that an import will fail because of quota exhaustion.
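The arithmetic can be sketched as a pair of helper functions. Both function names are hypothetical, the 1.2 divisor reflects the approximate 20 percent PST overhead mentioned above, and the 1.5 headroom factor is an arbitrary assumption chosen to err on the generous side.

```powershell
# Estimate how much mailbox space a PST will consume, assuming the PST
# carries roughly 20 percent overhead (an approximation, not a guarantee).
function Get-EstimatedImportSizeGB {
    param(
        [Parameter(Mandatory = $true)][double]$PstSizeGB,
        [double]$OverheadFactor = 1.2
    )
    [math]::Round($PstSizeGB / $OverheadFactor, 2)
}

# Suggest a temporary quota: current usage plus the estimated import,
# multiplied by a headroom factor and rounded up to a whole gigabyte.
function Get-SuggestedQuotaGB {
    param(
        [Parameter(Mandatory = $true)][double]$CurrentMailboxGB,
        [Parameter(Mandatory = $true)][double]$PstSizeGB,
        [double]$HeadroomFactor = 1.5
    )
    $estimate = Get-EstimatedImportSizeGB -PstSizeGB $PstSizeGB
    [math]::Ceiling(($CurrentMailboxGB + $estimate) * $HeadroomFactor)
}

# A 6GB PST going into a mailbox that already holds 2GB:
# 6 / 1.2 = 5; (2 + 5) * 1.5 = 10.5; rounded up to 11.
$quotaGB = Get-SuggestedQuotaGB -CurrentMailboxGB 2 -PstSizeGB 6
"Suggested temporary quota: $quotaGB GB"
```

The result could then be applied with Set-Mailbox, for example by setting UseDatabaseQuotaDefaults to $false and ProhibitSendReceiveQuota to the suggested value for the target mailbox.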
Optionally, you can import data into archive mailboxes rather than primary user mailboxes. (Use the IsArchive parameter to instruct Exchange to import into the archive.) SP1 lets you place archive mailboxes in databases other than primary mailboxes. You can also use this feature to create databases that are dedicated archive repositories. Remember to check and perhaps adjust archive quotas for target mailboxes before importing data.
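A sketch of an import that targets the archive, assuming the mailbox already has a personal archive enabled and using a placeholder PST file name:

```powershell
# Import a PST into the personal archive rather than the primary mailbox.
# TRedmond-Archive.pst is a hypothetical file name.
New-MailboxImportRequest -Mailbox TRedmond -IsArchive `
    -FilePath "\\ExServer1\Imports\TRedmond-Archive.pst"
```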
Every item that’s imported into a mailbox creates a transaction that the Exchange store must capture in a transaction log. I/O demand spikes during the import as the databases that host the target mailboxes commit transactions to accommodate the incoming PST data. Further I/O and CPU activity occurs to add items to the content indexes maintained for the target databases. More I/O is generated if users move information from their primary mailboxes into personal archives after the import operations are complete.
If the import occurs inside a database availability group (DAG), similar I/O spikes are experienced on servers that host copies of the databases that host target mailboxes because of replication, replay, and indexing activities. This situation has the potential to create a tsunami of I/O activity across the DAG.
The MRS uses the MaxSendSize transport configuration setting to control the maximum size of items it can import into a mailbox. The default size is 10MB. If you want to import larger items, you need to run the Set-TransportConfig cmdlet to increase the setting. For example:
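A sketch of the change, assuming 20MB is a suitable limit for your organization (the value is illustrative only). Note that this is an organization-wide transport setting, so it affects more than imports.

```powershell
# Raise the organization-wide maximum send size to 20MB (an assumed value).
Set-TransportConfig -MaxSendSize 20MB

# Confirm the new setting.
Get-TransportConfig | Format-List MaxSendSize
```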
Make sure you remove any passwords from the PSTs that you want to import data from because there’s no way to provide a password to the cmdlets. Likewise, you can’t place a password on a PST when you create it during a mailbox export operation. You must therefore take steps to protect the PST information held in the file share to ensure that the files can’t be opened by unauthorized clients. Remember that unlike an OST file that can be opened only when a client has knowledge of the mailbox that owns the replica folders inside the OST, any MAPI client can open a PST file. In addition, the presence of a password on a PST doesn’t guarantee its security because many utilities are available on the Internet to crack open a PST in a matter of seconds.
Because Exchange 2010 SP1 is still very new in production environments, it’s difficult to characterize how efficiently an individual Exchange server will be able to handle multiple concurrent import or export operations. For this reason it’s best to schedule these operations at times of low user demand to avoid competition with the I/O and processing demand created by normal user activity. The ability of an Exchange server to process mailbox data varies from server to server and is highly dependent on current load and system capability. Disk capacity and I/O throughput are obviously important. As an example, importing a 1.35GB PST containing 7,450 items took a server under moderate load 15 minutes to process. Using a more powerful server or scheduling the work to occur at times of low user demand will increase the throughput; most servers should be able to import 8GB per hour.
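To turn figures like these into a planning estimate, the arithmetic can be sketched as follows. The function names and the 200GB migration total are hypothetical; the 1.35GB and 15-minute figures come from the example above and work out to 5.4GB per hour.

```powershell
# Convert an observed import (size and duration) into a rate in GB per hour.
function Get-ImportRateGBPerHour {
    param(
        [Parameter(Mandatory = $true)][double]$SizeGB,
        [Parameter(Mandatory = $true)][double]$Minutes
    )
    [math]::Round($SizeGB / $Minutes * 60, 2)
}

# Estimate how many hours a given volume of PST data needs at that rate.
function Get-EstimatedImportHours {
    param(
        [Parameter(Mandatory = $true)][double]$TotalGB,
        [Parameter(Mandatory = $true)][double]$RateGBPerHour
    )
    [math]::Round($TotalGB / $RateGBPerHour, 1)
}

# 1.35GB in 15 minutes is 5.4GB per hour.
$observedRate = Get-ImportRateGBPerHour -SizeGB 1.35 -Minutes 15

# Hypothetical migration of 200GB of PST data at the observed rate.
$hours = Get-EstimatedImportHours -TotalGB 200 -RateGBPerHour $observedRate
"Observed rate: $observedRate GB/hour; estimated time for 200GB: $hours hours"
```

A single-server rate is only a starting point, because concurrent requests, other load, and DAG replication all change the real figure.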
Look to the Future
You shouldn’t underestimate the huge progress Microsoft made by introducing the new mailbox import/export model in Exchange 2010 SP1. A cynic would say that Microsoft merely cleaned up a festering sore that has existed in the product since it was first released, and there’s some truth in this statement. The history of depending on software such as Outlook and ExMerge certainly isn’t one of the high points in Exchange functionality. However, Exchange 2010 SP1 provides such an elegant solution to mailbox import and export that it’s pointless to dwell on past limitations.