One of the most important new features in Microsoft Exchange Server 2010 is the introduction of Personal Archive mailboxes. Many third-party software vendors also provide archive solutions for Exchange.

These solutions have worked for many years, and they often provide more developed features than Exchange 2010 in areas such as data ingestion and compliance. In addition, third-party solutions typically don't focus only on Exchange but also offer the ability to archive data from other sources, such as Microsoft SharePoint, websites, and file shares.

Finally, because they've stood the test of time, third-party solutions can point to real-life case studies and customer references that show how to manage the growth of Exchange-related information effectively over a sustained period. It's simply too early for Exchange 2010 archiving to offer the same degree of evidence.

Because archive mailboxes are built in to Exchange 2010 and the UI to access archives is available in the latest Microsoft clients, many companies—especially those that have never deployed archiving before—are attracted to the prospect of offloading information from primary mailboxes (the mailboxes used to send and receive messages) into an archive mailbox that's accessible online and can be managed through other Exchange 2010 compliance features, such as retention policies and multi-mailbox discovery searches.

I can't think of any technology that can simply be introduced to a large group of users without some degree of planning—and deploying archive mailboxes without planning ahead is an especially bad idea.

How Archives Work

When you enable a mailbox for an archive, you instruct Exchange to create a second mailbox that's marked for use as an archive. This mailbox can be held in the same mailbox database as the primary mailbox, or it can be assigned to a completely different database (in Exchange 2010 SP1 and later). It can also be situated in the cloud within a Microsoft Office 365 domain if you configure the hybrid connection required to link on-premises Exchange 2010 and Office 365.

The separation of primary and archive mailboxes opens up design possibilities such as creating specific databases to store nothing but archive mailboxes (an archive database) or placing a set of archive databases on a dedicated mailbox server (an archive server). Both approaches have pros and cons, which are beyond the scope of this article.

An archive mailbox is simply another form of mailbox that's stored in a mailbox database. You can work with folders and items in the archive mailbox in exactly the same manner as you can with the primary mailbox.

The primary and archive mailboxes for a user are linked through a globally unique identifier (GUID) held in the ArchiveGuid property of the user's mailbox. GUIDs are 128-bit numbers that mean nothing to human beings. In this case, however, they allow Exchange to locate a mailbox's archive no matter which database it's stored in.
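
Exchange's own logic can't be run outside a live organization, so the following is a simplified Python model instead. The Mailbox and Organization classes are hypothetical illustrations, not an Exchange API. The model shows that the archive is looked up by GUID, so moving it to another database doesn't break the link.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Mailbox:
    """Hypothetical stand-in for a mailbox object; not Exchange's object model."""
    owner: str
    database: str
    is_archive: bool = False
    guid: uuid.UUID = field(default_factory=uuid.uuid4)  # random 128-bit GUID
    archive_guid: Optional[uuid.UUID] = None  # populated only when an archive exists


class Organization:
    """Hypothetical directory that indexes every mailbox by its GUID."""

    def __init__(self) -> None:
        self.by_guid: Dict[uuid.UUID, Mailbox] = {}

    def add(self, mbx: Mailbox) -> Mailbox:
        self.by_guid[mbx.guid] = mbx
        return mbx

    def enable_archive(self, primary: Mailbox, archive_db: str) -> Mailbox:
        # Create a second mailbox marked as an archive and record its GUID
        # on the primary mailbox; that GUID is the only link between them.
        archive = self.add(Mailbox(primary.owner, archive_db, is_archive=True))
        primary.archive_guid = archive.guid
        return archive

    def find_archive(self, primary: Mailbox) -> Optional[Mailbox]:
        # Look the archive up by GUID; its database name never matters.
        if primary.archive_guid is None:
            return None
        return self.by_guid[primary.archive_guid]

    def mailboxes_with_archives(self) -> List[Mailbox]:
        # Rough analogue of: Get-Mailbox -Filter {ArchiveGUID -ne $Null}
        return [m for m in self.by_guid.values()
                if not m.is_archive and m.archive_guid is not None]


org = Organization()
alice = org.add(Mailbox("Alice", "DB1"))
alice_archive = org.enable_archive(alice, "ArchiveDB1")
alice_archive.database = "ArchiveDB2"     # the archive moves to another database...
print(org.find_archive(alice).database)   # ...and is still found: ArchiveDB2
```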

To discover which mailboxes in an organization have archives, you can use the fact that mailboxes with archives have the ArchiveGUID property populated. Enter the following command, which looks for any mailbox where the link to an archive is not null:

Get-Mailbox -Filter {ArchiveGUID -ne $Null}

As Figure 1 shows, if you examine the archive-related properties that Exchange 2010 SP1 maintains for a mailbox, you'll see the database that holds the archive (ArchiveDatabase), its GUID, the name of the archive as displayed by clients, and the quotas that are used to control the point at which Exchange flags warnings about an approaching limit (ArchiveWarningQuota) and the point at which it's no longer possible to store more data in the archive (ArchiveQuota).

Figure 1: Archive properties on a mailbox

The ArchiveDomain and ArchiveStatus properties are used only when the archive is hosted in the cloud on Office 365.

Clients and Archives

The introduction of archive mailboxes is a major upgrade for the Exchange Store. It therefore shouldn't come as a surprise that not all clients can reveal the presence of an archive. Nor do all clients include the UI that lets a user interact with data held in the archive, including the UI that reveals retention policies and tags. In fact, at press time, full functionality is limited to Microsoft Outlook 2010 and Outlook Web App (OWA).

In December 2010, Microsoft released an update for Outlook 2007 SP2 to let the Outlook 2007 client access archive mailboxes. This code works, and the archive mailbox is displayed like any other repository, such as a personal folder store (PST).

However, Outlook 2007 doesn't include any of the UI features necessary to display data such as retention policies and tags—so some important functionality is invisible to users. In addition, Outlook 2007 doesn't perform searches automatically across primary and archive mailboxes in the same transparent manner as Outlook 2010 does.

(For more information, see the Exchange team's blog post "Yes Virginia, there is Exchange 2010 archive support in Outlook 2007.")

Note that Autodiscover is the component that tells both Outlook 2010 and Outlook 2007 that an archive mailbox exists. After you enable an archive mailbox, Autodiscover detects it the next time Outlook starts, and the archive is automatically listed in the set of available repositories.

Neither Outlook for Mac 2011 nor any ActiveSync client supports access to archive mailboxes, probably because the underlying APIs (Exchange Web Services and ActiveSync) haven't yet incorporated the necessary API calls to open and manipulate archive mailboxes. You also can't work with archive mailboxes using a BlackBerry device because BlackBerry Enterprise Server doesn't support the necessary access.

Older versions of Outlook that support Exchange 2010, such as Outlook 2003, can access mailboxes on an Exchange 2010 server but remain blissfully unaware of the archive. POP3 and IMAP4 clients don't care about archives because these protocols weren't designed to support a division of storage between primary and archive mailboxes.

Figure 2 shows how OWA presents an archive to a user. In this case, the user has opened a folder in the archive and is reading an item. Right-clicking the item reveals a menu that includes Retention Policy, which lets the user select a retention tag to apply to the item. If the item were open in the primary mailbox, the menu might also list Archive Policy, which lets the user apply tags that control when Exchange moves items into the archive mailbox.

Figure 2: Accessing documents in an archive with Outlook Web App

Exchange's Default Retention Policy

When you enable a mailbox to have an archive, Exchange automatically assigns a retention policy to the mailbox unless a retention policy is already assigned. The logic here is that the user who owns the mailbox can now move items into the archive. Exchange wants to make this process as easy as possible, so it makes sense that the user should be provided with some retention and archive tags to help manage the movement of items into the archive. Figure 2 provides a good example of how the tags in the policy are exposed by OWA.
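
The rule in this paragraph is that the default policy is assigned only when no policy is present. It can be modeled in a few lines. The following is a hypothetical Python sketch, not Exchange code; the Mailbox class and enable_archive function are illustrative names. It also shows why assigning your own retention policy before enabling an archive keeps the default policy out of the picture.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_POLICY = "Default Archive and Retention Policy"


@dataclass
class Mailbox:
    """Hypothetical mailbox record with only the fields this sketch needs."""
    name: str
    retention_policy: Optional[str] = None
    has_archive: bool = False


def enable_archive(mbx: Mailbox) -> Mailbox:
    """Model of the behavior described in the text: enabling an archive
    assigns the default policy only if no policy is assigned yet."""
    mbx.has_archive = True
    if mbx.retention_policy is None:
        mbx.retention_policy = DEFAULT_POLICY
    return mbx


# A mailbox with no policy picks up the default one...
print(enable_archive(Mailbox("kim")).retention_policy)
# ...while an explicitly assigned policy survives untouched.
print(enable_archive(Mailbox("lee", "Management Retention Policy")).retention_policy)
```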

Exchange's developers created a set of tags and gathered them in the Default Archive and Retention Policy that's created as part of the Exchange 2010 SP1 installation. This policy contains personal tags, which users can apply to items to keep them for a predetermined period of time before the Managed Folder Assistant moves the items into the Recoverable Items folder. The policy also contains archive tags, which dictate when items are moved into the archive mailbox.

Most of the tags in the Default Archive and Retention Policy are personal (shown as Personal Tag in Figure 3). In Exchange, this means that a tag must be explicitly applied by the user before the Managed Folder Assistant processes the action determined in the tag.

Figure 3: Tags in the Default Archive and Retention Policy

Tags contain a property called TriggerForRetention that determines the date used to calculate the age of an item. Typically, age is measured from the date the item was first delivered to a mailbox; you can't alter or extend this date.

For example, suppose you select the 6 Month Delete tag and assign it to an item. The Managed Folder Assistant moves the item to the Recoverable Items folder (the action, DeleteAndAllowRecovery) 6 months (the retention period) after the item was first delivered to the mailbox (TriggerForRetention = WhenDelivered). The same is true for an archive tag such as Personal 5 year move to archive. In this case, the Managed Folder Assistant moves the item into the equivalent folder in the archive (the action) after 5 years (the retention period). Retention periods are always stated in days, so 5 years is 1,825 days.
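
Because age limits are stored in days, the date on which a tag's action becomes due is simple arithmetic on the delivery date. The Python sketch below assumes TriggerForRetention = WhenDelivered. The 730-day and 1,825-day values come from this article; the 180 days used for the 6 Month Delete tag is an assumption for illustration. The function names are hypothetical.

```python
from datetime import date, timedelta

# Age limits in days. 730 and 1,825 come from the article; 180 for
# "6 Month Delete" is an assumed value used only for illustration.
AGE_LIMIT_DAYS = {
    "6 Month Delete": 180,
    "Personal 5 year move to archive": 5 * 365,  # 1,825 days
    "Default 2 year move to archive": 2 * 365,   # 730 days
}


def action_due(delivered: date, tag: str) -> date:
    """Date on which the tag's action (delete or move) becomes due,
    counting from first delivery (TriggerForRetention = WhenDelivered)."""
    return delivered + timedelta(days=AGE_LIMIT_DAYS[tag])


def is_due(delivered: date, tag: str, today: date) -> bool:
    """True once the item's age has reached the tag's age limit."""
    return today >= action_due(delivered, tag)


# Day counts ignore leap years, so "5 years" lands a day short of the
# calendar anniversary when the span includes 29 February.
print(action_due(date(2011, 1, 1), "Personal 5 year move to archive"))  # 2015-12-31
```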

The highlighted tag shown in Figure 3 is named Default 2 year move to archive, which you can see applies to All other folders in the mailbox. This is a default policy tag (DPT), which means that the Managed Folder Assistant applies this tag to every item in the mailbox unless it comes under the control of another (more explicit) tag. We'll return to the effect that the DPT has on a mailbox shortly; it's sufficient for now to say that a DPT typically exerts a very powerful influence over a mailbox. Note that a retention policy can include two DPTs—one that influences when items are moved into the archive and the other that determines when items are moved into the Recoverable Items folder.
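
The DPT's precedence rule can be sketched as a lookup that falls through from the most explicit tag to the least. In this hypothetical Python model (not Exchange's implementation), a tag set on the item wins over one set on its folder. A folder tag in turn wins over the policy's DPT. Archive tags and delete tags are resolved separately, mirroring the two DPTs a policy may contain.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

ARCHIVE, DELETE = "archive", "delete"  # the two kinds of tag a policy can default


@dataclass
class Policy:
    """Hypothetical retention policy: at most one DPT per kind."""
    default_tags: Dict[str, str]


@dataclass
class Item:
    subject: str
    folder: str
    tags: Dict[str, str] = field(default_factory=dict)  # tags the user set on the item


def effective_tag(item: Item, folder_tags: Dict[str, Dict[str, str]],
                  policy: Policy, kind: str) -> Optional[str]:
    """Return the tag that controls `item` for `kind`; the most explicit wins."""
    if kind in item.tags:                     # 1. tag applied to the item itself
        return item.tags[kind]
    folder = folder_tags.get(item.folder, {})
    if kind in folder:                        # 2. tag applied to the item's folder
        return folder[kind]
    return policy.default_tags.get(kind)      # 3. the policy's DPT, if any


policy = Policy({ARCHIVE: "Default 2 year move to archive"})
folders = {"Projects": {ARCHIVE: "Personal 5 year move to archive"}}
print(effective_tag(Item("Lunch?", "Inbox"), folders, policy, ARCHIVE))
print(effective_tag(Item("Spec", "Projects"), folders, policy, ARCHIVE))
print(effective_tag(Item("Spec", "Projects"), folders, policy, DELETE))  # None: no delete DPT
```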

In Exchange Server 2010 RTM, the Get-RetentionPolicy cmdlet lists a similar policy called Default Retention Policy. The difference between this policy and its SP1 equivalent is that the SP1 policy includes archive tags. The upgrade to SP1 leaves the old retention policy in place because it might have been applied to mailboxes. Removing it would invalidate the retention settings for those mailboxes, which would be a bad idea. However, this version of the default retention policy isn't much good because it contains a limited set of tags. It's probably best to replace it either with a purpose-built retention policy designed to meet the business needs of the organization or with the new default policy supplied with SP1.

Office 365 domains also use a default archive and retention policy called Default MRM Policy that includes a tag called Default 2 year move to archive that's applied to user mailboxes that have archives. Office 365 mailboxes are assigned archive mailboxes by default, so archiving is active immediately. Note that different Office 365 plans include different archive quotas. For example, the entry-level Office 365 Plan P1 for professionals and small businesses includes an archive quota of 25GB.

The Role of the Managed Folder Assistant

The Managed Folder Assistant must process a mailbox before a user sees the effects of a retention policy, including the appearance of retention policies and tags in the Outlook 2010 and OWA UIs. It's important to tell users well ahead of time that the Managed Folder Assistant will change the contents of their mailboxes, because users will inevitably panic if they think items have been lost.

The Managed Folder Assistant can only do what it's told to do, and its instructions come from the retention tags that are placed on items. Users will understand that the Managed Folder Assistant processes an item according to a tag that the user explicitly places on it. For example, if you select an item and tag it to be retained for a year, you expect nothing to happen to that item during the year.

Things get a little more dicey when the Managed Folder Assistant processes items according to the DPT, if one exists in a retention policy. Remember, a DPT dictates what happens to any item in a mailbox that isn't already under the control of another tag—so if the default tag for the retention policy applied to a mailbox dictates that items are deleted or archived after a set time period, that's what will happen.

The DPT in the Default Archive and Retention Policy specifies that items are moved into the archive mailbox after they are 2 years old (730 days), as Figure 4 shows.

Figure 4: Properties of the Default 2 year move to archive tag

Most mailboxes hold items that are older than 2 years, so the Managed Folder Assistant has some work to do immediately after the policy is applied to a mailbox. In this case, the Managed Folder Assistant scans for any item that's over 2 years old and isn't stamped with another tag, and it moves that item into the equivalent folder in the archive mailbox. This is extremely logical from a computer science perspective but incomprehensible to most users.
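
The first pass can be pictured as a sweep over the primary mailbox. The hypothetical Python model below moves every item that has reached the 730-day limit of the Default 2 year move to archive tag and carries no personal archive tag of its own. Each moved item keeps its folder name, which stands in for the equivalent folder in the archive. This is a simplification, not the Managed Folder Assistant's actual logic.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

DPT_AGE_LIMIT_DAYS = 730  # Default 2 year move to archive


@dataclass
class Item:
    subject: str
    folder: str
    delivered: date
    personal_limit_days: Optional[int] = None  # set if the user applied a personal archive tag


def first_pass(primary: List[Item], archive: List[Item], today: date) -> int:
    """Move due items from primary to archive in place; return how many moved."""
    kept: List[Item] = []
    moved = 0
    for item in primary:
        # A personal archive tag overrides the DPT; otherwise the DPT applies.
        limit = (item.personal_limit_days
                 if item.personal_limit_days is not None else DPT_AGE_LIMIT_DAYS)
        if (today - item.delivered).days >= limit:
            archive.append(item)  # same folder name = equivalent archive folder
            moved += 1
        else:
            kept.append(item)
    primary[:] = kept  # the primary mailbox is now "slimmed down"
    return moved


inbox = [Item("2008 budget", "Inbox", date(2008, 3, 1)),
         Item("Yesterday", "Inbox", date(2011, 5, 31)),
         Item("Contract", "Legal", date(2008, 3, 1), personal_limit_days=3650)]
archive: List[Item] = []
print(first_pass(inbox, archive, today=date(2011, 6, 1)))  # 1
```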

Unless you tell users up front what will happen, they will perceive that items have disappeared from their mailbox. They won't think to look in the archive, where the Managed Folder Assistant has faithfully moved the missing data. Cue frantic calls to the Help desk and panic all around as every user who has been given a new archive mailbox suddenly discovers that he or she now has a slimmed-down mailbox.

Of course, there are other consequences too. Anyone who uses Outlook configured in Cached Exchange Mode is accustomed to having replicas of all their server mailbox folders available in the OST and therefore accessible even when a network connection is unavailable to the Exchange server. However, items in archive folders aren't synchronized down to the OST and are therefore unavailable when the user is working offline. Again, this is logical because an archive is intended as a repository for information that's infrequently accessed, but that's probably not how the user who has just lost access to his data sees the situation.

It's also important to understand that the DPT exerts an ongoing influence over the mailbox and will be applied by Exchange to new items as they enter the mailbox, whether they arrive as new messages or are imported into the mailbox from a PST.

Stressing Mailbox Servers

Applying retention policies to mailboxes can force the Managed Folder Assistant to do a lot of work as it processes items, especially if it has to move items from one database to another—as in the situation when the primary mailbox is in one database and the archive is in another. It's easy for an administrator to run a command to apply a retention policy to a group of mailboxes. For example, the following command creates a list of mailboxes in a specific database and then applies a retention policy to the mailboxes:

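 # Apply one retention policy to every mailbox in database DB1.
 # -ResultSize Unlimited lifts Get-Mailbox's default 1,000-object limit.
 Get-Mailbox -Database 'DB1' -ResultSize Unlimited |
   Set-Mailbox -RetentionPolicy 'Management Retention Policy'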

Running this command takes a matter of seconds, even if it has to process hundreds of mailboxes. By comparison, the Managed Folder Assistant has to crank into action and do a lot of work to process all those mailboxes. The Managed Folder Assistant in Exchange 2010 SP1 is more efficient at processing mailboxes because it schedules work automatically across the entire day rather than being constrained to a limited time window, as in Exchange 2010 RTM and Exchange Server 2007.

Formerly, the Managed Folder Assistant was given a fixed time period within which it could process mailboxes. Sometimes not all mailboxes could be processed within the set time period, leading to inconsistent results for users.

Exchange 2010 SP1 includes the concept of work cycles, which is a way to assign workloads to Exchange components that must be processed within a certain time period. It's then up to Exchange to figure out how best to perform the necessary work within the allotted time period.

In the case of the Managed Folder Assistant, its work cycle is 1 day, meaning that it's expected to process every mailbox on a server at least once daily. In effect, this means that the Managed Folder Assistant might be active throughout the day but will automatically throttle back its processing to respect the current workload of the server so that it doesn't interfere with the ability of the server to satisfy user demand. Interestingly, Microsoft uses a different work cycle for the Managed Folder Assistant on Office 365 servers to process mailboxes on a weekly basis.
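
A work cycle can be pictured as a scheduler that must visit every mailbox once per cycle, but only during hours when the server has capacity to spare. The Python sketch below is a deliberately crude, hypothetical model of that idea, using hourly load samples, a fixed busy threshold, and round-robin assignment. Exchange's real throttling reacts to server health continuously and is not reproduced here.

```python
from typing import Dict, List


def plan_work_cycle(mailboxes: List[str], hourly_load: List[float],
                    busy_threshold: float = 0.7) -> Dict[int, List[str]]:
    """Spread one pass over all mailboxes across a 24-hour work cycle.

    Hours whose load is at or above busy_threshold get no assistant work
    (the "throttle back" part); the mailboxes are dealt round-robin over the
    remaining hours so the whole set is still covered within the cycle."""
    if len(hourly_load) != 24:
        raise ValueError("expected one load sample per hour of the cycle")
    quiet_hours = [h for h, load in enumerate(hourly_load) if load < busy_threshold]
    if not quiet_hours:
        raise RuntimeError("no spare capacity: the cycle cannot complete")
    plan: Dict[int, List[str]] = {h: [] for h in quiet_hours}
    for i, mbx in enumerate(mailboxes):
        plan[quiet_hours[i % len(quiet_hours)]].append(mbx)
    return plan


# Busy from 08:00 to 17:59, quiet otherwise (made-up sample loads).
load = [0.2] * 8 + [0.9] * 10 + [0.3] * 6
plan = plan_work_cycle([f"mbx{n}" for n in range(28)], load)
print(sorted(plan))  # [0, 1, 2, 3, 4, 5, 6, 7, 18, 19, 20, 21, 22, 23]
```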

Even with the more efficient work cycle, a lot of work must still be done to locate items that are now under the control of the policy and to apply the actions determined in the retention tags. As an example, let's assume that you apply the Default Archive and Retention Policy to 500 mailboxes and that each mailbox has 2,000 items that are over 2 years old. The Managed Folder Assistant must now process 100,000 items and move them into an archive.

Processing 100,000 items won't happen quickly, and it places considerable demand on CPU and I/O on the mailbox server that hosts the user mailboxes. The same applies to any other server involved in the activity, such as the servers that host the databases containing the archive mailboxes.

Given the multi-core, high-end servers typically used to run Exchange 2010 and the automatic throttling used by the Managed Folder Assistant, it's difficult to be more precise about the workload generated on any particular server. Still, it's fair to say that enabling archives can cause a spike in demand if the Managed Folder Assistant has many mailboxes to process for the first time.

That said, you might not notice the extra demand if this work occurs at night when the servers are otherwise idle. If you run replicated database copies within a database availability group (DAG), additional resources are consumed on the servers that host the database copies. Those servers must receive and replay the transaction logs generated by Managed Folder Assistant activity.

Processing Older Items

Another fact that users won't be aware of is that the Managed Folder Assistant applies the retention policy to all items in the mailbox. Again, this is totally logical because there's no point in assigning a retention policy to a mailbox unless Exchange ensures that its directives are respected.

In most cases, because the DPT typically specifies a reasonably long retention period (2 years or more), the DPT doesn't have an immediate effect on items such as new messages that arrive in the user's Inbox. The retention countdown clock starts as soon as Exchange creates items in the mailbox, and the DPT will eventually move items into the archive or Recoverable Items folder—but only after their retention period expires. The situation becomes more interesting when you introduce older items into the equation.

Older items might be stored today in PSTs. I have items in PSTs that go back to the original Exchange Server 4.0 version shipped in 1996. Apart from laziness, I don't have a good reason why these items are still around. However, the point is that the typical user isn't good at cleaning out old items (which is why the need exists for the kind of automatic mailbox cleanup that you can implement with retention policies). Because users are human packrats, a lot of the items in their PSTs are probably not required and certainly should never occupy valuable space in an online database.

After you import data from a PST into a primary or archive mailbox, the Managed Folder Assistant examines the imported items the next time it processes the mailbox. It's probable that the Managed Folder Assistant will discover that a high percentage of items exceed the retention period specified in the DPT and will therefore apply whatever action is stated in the DPT. Thus, you can imagine a situation in which the following occurs:

1. The company decides that PSTs are difficult to manage and creates a project to import data from PSTs into Exchange 2010 mailboxes (primary or archive) so that all data in the company is accessible to multi-mailbox discovery searches. A similar approach might be taken if the company decides to migrate data from a third-party archive system used with a previous version of Exchange. Exchange 2010 has no built-in features to ingest data from third-party archives, so you'll probably have to take a two-phase approach: first move data out of the third-party archive into PSTs, and then import the data from the PSTs into Exchange 2010. Third-party products that can move items from different archives into Exchange are available from companies such as TransVault Software and Sherpa Software. Because archiving products sometimes compress data, careful calculation will probably be required to estimate the storage needed to hold information as it passes from the third-party archive into Exchange 2010 SP1.

2. Administrators proceed to gather PSTs from users and import the data using the New-MailboxImportRequest cmdlet available in Exchange 2010 SP1. Note that this isn't an exercise you should perform without careful planning, because of the strain the imports will exert on mailbox servers. It's also worth noting that New-MailboxImportRequest can import data directly into an archive mailbox (using its IsArchive switch); no interim movement through the user's primary mailbox is necessary. Again, third-party software vendors have tools that can help you locate and ingest PST contents into Exchange 2010. Microsoft has promised to provide a PST ingestion tool (see the Exchange team's blog post "Coming Soon: PST Capture Tool"). However, that software hasn't yet appeared, even in beta. Even when it does, the third-party vendors' greater experience in this space might mean that their tools remain more functional and sophisticated. Useful abilities here include PST discovery on laptop disks, automatic removal and blocking of PSTs after their contents are imported, and policy-driven imports (e.g., import information only from the past 3 years and delete everything else that's found in the PST except in certain folders).

3. Users are happy to see all their PST data in their online mailboxes.

4. The Managed Folder Assistant runs to process user mailboxes and discovers all the items that have been imported from PSTs. The items are deemed to exceed the retention period specified in the DPT, and depending on where the item is stored (primary or archive mailbox) and the retention action determined by the DPT, the Managed Folder Assistant either moves the items into the Recoverable Items folder or into the archive mailbox.

5. Users now discover that some of their PST data is no longer where it was after the initial import and ask the Help desk what's going on.
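
The five steps above can be replayed in a small simulation. In this hypothetical Python model, imported items keep their original delivery dates, which is why they exceed the DPT's age limit. A single DPT with either a move-to-archive or a delete action is then applied to them. The model illustrates the flow only; it is not how New-MailboxImportRequest or the Managed Folder Assistant is implemented.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

PRIMARY, ARCHIVE, RECOVERABLE = "primary", "archive", "Recoverable Items"


@dataclass
class Item:
    subject: str
    delivered: date   # original date, assumed to be preserved by the import
    location: str = ""


def import_pst(items: List[Item], into_archive: bool) -> List[Item]:
    """Step 2: place PST items in the primary or the archive mailbox."""
    for item in items:
        item.location = ARCHIVE if into_archive else PRIMARY
    return items


def apply_dpt(items: List[Item], today: date, limit_days: int, action: str) -> None:
    """Step 4: apply one DPT ("archive" or "delete") to untagged items."""
    for item in items:
        if (today - item.delivered).days < limit_days:
            continue                      # not old enough yet
        if action == "delete":
            item.location = RECOVERABLE   # DeleteAndAllowRecovery
        elif action == "archive" and item.location == PRIMARY:
            item.location = ARCHIVE       # MoveToArchive; archive items stay put


pst = import_pst([Item("1999 memo", date(1999, 3, 1)),
                  Item("Last month", date(2011, 5, 1))], into_archive=False)
apply_dpt(pst, today=date(2011, 6, 1), limit_days=730, action="archive")
print([(i.subject, i.location) for i in pst])
# [('1999 memo', 'archive'), ('Last month', 'primary')]
```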

Careful planning is necessary to understand exactly what happens to items after they're imported and how users might perceive the consequences. After you understand the flow, you can translate it into terms that users will understand and present what will happen in a positive and proactive manner.

Successful Deployment

To successfully deploy archive mailboxes in Exchange 2010 SP1, you need to consider several factors. The following list is a good place to start.

1. Determine whether archive mailboxes satisfy the business needs of your organization. There's no point in deploying technology if it doesn't satisfy a business requirement. Remember that archives require enterprise CALs (eCALs), so factor this cost into the evaluation unless you already use eCALs for other purposes.

2. Determine the mailboxes that will be enabled with archives.

3. Determine whether the archive mailboxes will be in the same databases as their primary counterparts.

4. Determine what retention policies will be used within the organization and what policy will be assigned to different user groups. It's a good idea to assign the retention policies to mailboxes before you enable them for archives because this means that Exchange won't automatically assign the Default Archive and Retention Policy to the mailboxes.

5. Well before any retention policies are assigned, tell the people who own the mailboxes what the effect of the policies will be (e.g., when items will be moved to the Recoverable Items folder). Inform users about personal tags and how they can avoid the effect of the DPT by applying personal tags to individual items, complete folders, or conversations.

6. Make sure that the affected users have clients that reveal retention and archive tags. Outlook 2010 is best, OWA is acceptable, and Outlook 2007 (with the December 2010 update) is acceptable but has limitations.

7. A week before implementation day, remind users about the retention policies and their effect on mailboxes.

8. The day before implementation, send an email message to inform users that they might notice that some items have been moved into folders in the archive after the Managed Folder Assistant runs and processes their mailboxes. (Don't tell users about the Managed Folder Assistant; tell them what will happen in terms they will understand.) Explain how users can retrieve items from the archive and how easy it is to move items between the primary and archive mailboxes. Explain that items in the archive aren't accessible offline.

9. The day after implementation, make sure the Help desk is ready to handle calls from users who think they've lost items. Give Help desk staff a simple 1-2-3 cheat sheet to work from with users. It should explain how to find items in the archive and move them back into the primary mailbox. It should also explain how to stamp those items with an appropriate personal tag so that the Managed Folder Assistant doesn't move them into the archive again.

10. Take a well-earned rest and prepare for the next group of users.
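
Steps 7 through 9 are tied to the implementation date, so each wave of users gets the same calendar. A trivial Python helper can generate that calendar for any go-live date. It is hypothetical and intended for planning only.

```python
from datetime import date, timedelta
from typing import List, Tuple


def communication_plan(go_live: date) -> List[Tuple[date, str]]:
    """Dated reminders for one wave of users, per steps 7-9 of the checklist."""
    return [
        (go_live - timedelta(days=7), "Remind users about retention policies and their effect"),
        (go_live - timedelta(days=1), "Email: items may move to the archive; how to find them; "
                                      "archive items are not available offline"),
        (go_live, "Implementation day: policies assigned, archives enabled"),
        (go_live + timedelta(days=1), "Help desk on standby with the 1-2-3 cheat sheet"),
    ]


for when, task in communication_plan(date(2011, 9, 12)):
    print(when.isoformat(), task)
```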

Be Prepared

Archive mailboxes are a great new feature in Exchange 2010—but like so many new features, the mere fact that technology is available doesn't mean you can simply deploy it with your brain in neutral. Some thought and careful planning will ensure that both you and your users survive the introduction of archive mailboxes with just a few scars.