Effectively managing Microsoft Exchange Server data is no easy job. Striking a balance between user demands and Exchange performance and stability has never been fun, but these days, it's a must. Email has become a critical business application, current regulatory demands are putting administrators in the hot seat, and you've gone from spinning plates to juggling knives. To get a grip on your Exchange data, you need a multidisciplinary approach that combines clearly defined policies and appropriate technologies (e.g., storage hardware, monitoring and reporting tools, data-management applications). Where do you begin?

First, let me clarify what I mean by effective data management. I define it as the practice of securely handling stored Exchange data in a way that optimizes the data's storage while providing adequate access to the data. That said, the best way to get started is to examine the financial, technical, and regulatory constraints that apply to your organization. These factors will influence your options for managing the Exchange data that resides in storage groups (SGs), databases, and user mailboxes (including offline folder stores—OSTs—and personal folder stores—PSTs) while providing efficient backup, recovery, and archiving capabilities.

Juggling Constraints
Anyone who manages Exchange data is used to balancing sometimes contradictory demands. When you're looking for data-management solutions, three considerations will come into play. You'll need to look at the costs of your various options, the technical limitations of the options, and the types of regulatory requirements that apply to your company.

Financial constraints. As messaging traffic continues to grow (in terms of both volume and size) and as more businesses decide to retain messages in online Exchange databases or verifiable offline stores, data storage requirements—and the associated costs—increase. Financial considerations involve more than the cost of extra disks to support larger databases. You must also pay for the required storage infrastructure (e.g., additional storage arrays, backup devices, Storage Area Networks—SANs) and personnel to manage increased data volumes. These costs vary based on the size and nature of your organization. Small organizations of several hundred users might get by with simply increasing storage to satisfy demand, but larger organizations of several thousand users might incur significant costs.

If demand for storage outstrips your capacity to pay for additional resources, you might consider implementing stricter backup policies that retain less unnecessary data, or archiving solutions that help optimize your retention policies. The costs associated with these approaches can often be significantly less than the costs incurred to add more storage in a frantic attempt to satisfy demand.

Technical constraints. Even when your company is able and willing to throw money at data storage, unchecked data growth impairs your ability to maintain effective backup and recovery solutions. Even though tape technology (for example) continues to improve, increased data volumes take longer to back up and restore. Thus, the attempt to meet one need (e.g., easy access to data) can reduce your ability to meet other needs (e.g., quick recovery).

Look for a technical solution that balances these requirements. Such balanced solutions combine the management of online Exchange data with nearline or offline archiving solutions. This approach is appealing because it lets you cap the growth of your primary Exchange storage subsystems and archive critical data according to agreed policy limits, but still keeps that data within easy reach.

Regulatory constraints. Many organizations have implemented policies that mandate the archiving of email communications. In companies that implement archiving solutions to meet internal standards (rather than external requirements), adherence to these policies is largely a matter of internal corporate governance. Organizations that are regulated by external agencies have much stricter requirements for archiving—or more accurately, compliance (a topic that I will discuss in more detail in an upcoming article). A solid regulatory-compliance system intercepts all email that enters, leaves, or circulates within your organization. When you implement such a system, you can guarantee the archiving of all the data in your environment, wherever it may ultimately reside (e.g., in PSTs, on mobile devices).

Once you know which constraints apply to your organization, you can begin to determine the types of guidelines that you need to place on the data that resides in your Exchange server database files, Microsoft Outlook cache files (i.e., OSTs), and PSTs. You'll also be able to decide which backup, recovery, and archiving solutions will work best in your environment.

Dealing with Server-Based Data
Exchange stores email data in databases on the Exchange server (or servers). These databases are arguably the best repository for email content, not least because of the single-instance storage mechanism that exists within each database (though not between databases). In general, data located on the server is more accessible than data in PSTs, at least from a management perspective. Shared information is best stored in Exchange public-folder databases. An Exchange Server 2003 or Exchange 2000 Server machine can contain as many as five databases within an SG and can hold as many as four SGs; thus, one server can contain as many as 20 databases. Established best practice advises against letting your databases exceed 40GB each so that backups—and more importantly, restores—can occur within acceptable time limits.

Storage capabilities determine the maximum number of active users that can be served by a single Exchange system. Exchange storage subsystems must be capable of dealing with the I/O load that the user population will place on the system. The Microsoft guide "Optimizing Storage for Exchange 2003" (http://www.microsoft.com/technet/prodtechnol/exchange/2003/library/optimizestorage.mspx) suggests that you implement a subsystem that can provide an average of about 0.75 I/Os per second per active user. For most subsystems—even those on high-end SAN platforms—this guideline dictates a practical maximum of just less than 4000 active users per server.
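The guideline above reduces to simple division. The sketch below shows the arithmetic; the 2,950 IOPS figure is an assumed example of a high-end subsystem's sustained random I/O capacity, not a number from the Microsoft guide.

```python
# Rough Exchange user-capacity sketch based on the ~0.75 IOPS-per-active-user
# guideline from Microsoft's "Optimizing Storage for Exchange 2003".
IOPS_PER_ACTIVE_USER = 0.75

def max_active_users(subsystem_iops: float) -> int:
    """Practical ceiling on active users for a storage subsystem
    that can sustain the given random I/O rate."""
    return int(subsystem_iops / IOPS_PER_ACTIVE_USER)

# An assumed high-end subsystem sustaining ~2,950 IOPS supports
# just under 4,000 active users.
print(max_active_users(2950))
```

Run the calculation in the other direction (users × 0.75) when you already know the target population and need to specify the subsystem.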

You have to keep the guidelines for database size and user limits, along with other performance factors (e.g., transaction log volumes), in balance when sizing your servers, allocating storage, and setting mailbox limits. Figure 1 shows a typical spreadsheet tool that I use to calculate storage requirements. For example, a disk quota of 200MB is achievable for a server that hosts approximately 4000 mailboxes.
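A minimal sketch of that spreadsheet arithmetic follows, using the figures already cited (a 200MB quota, roughly 4,000 mailboxes, and the 20-database/40GB-per-database ceilings). It deliberately ignores single-instance storage savings, deleted-item retention, and transaction log overhead, so treat it as a first-pass estimate only.

```python
# Back-of-the-envelope per-server mailbox storage calculation.
MAILBOXES = 4000
QUOTA_MB = 200
MAX_DB_GB = 40   # best-practice cap per database
MAX_DBS = 20     # 4 storage groups x 5 databases

raw_gb = MAILBOXES * QUOTA_MB / 1024   # quota'd mailbox data: ~781GB
capacity_gb = MAX_DBS * MAX_DB_GB      # 20 x 40GB databases: 800GB

print(f"{raw_gb:.0f}GB of quota'd data vs. {capacity_gb}GB of database headroom")
```

The two figures landing within a few percent of each other is exactly the balance the text describes: quotas, database counts, and database size limits constrain one another.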

Aside from using mailbox quotas (which you can apply on a per-database as well as on a per-user basis) to manage Exchange server–based data, you can use recipient policies to configure the Exchange Mailbox Manager to detect and delete old or large messages from users' mailboxes. This approach can help keep mailbox sizes in check before users run up against the dreaded "mailbox quota exceeded" message. If you're worried about accidental deletions, Exchange's Deleted Items Recovery feature, when enabled, lets users recover messages even after the Deleted Items folder has been emptied. You can use this great administrative feature to combat user errors that otherwise would result in costly restore operations, but be careful—it can increase the size of your databases. I've seen empirical evidence suggesting that a deleted-items retention period of just 7 days can cause database bloat of 10 to 30 percent.
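To see what that bloat range means in practice, the sketch below applies the 10-to-30-percent figure to a database already at the 40GB best-practice limit. The percentages are the empirical range cited above, not a formula.

```python
# Estimated database growth from a 7-day deleted-items retention window,
# using the 10-30 percent bloat range cited in the text.
def retained_size_gb(db_gb: float, bloat_fraction: float) -> float:
    """Database size after accounting for deleted-item retention bloat."""
    return db_gb * (1 + bloat_fraction)

for bloat in (0.10, 0.30):
    print(f"40GB database -> {retained_size_gb(40, bloat):.0f}GB")
```

In other words, enabling retention can quietly push a database past its size target, so factor it into quota and database-size planning rather than discovering it at backup time.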

Managing User-Maintained Data
The data that users store in OST or PST files on desktop or laptop PCs is the most troublesome type of Exchange data to manage because it's distributed and often inaccessible (from an administrative perspective). OST files are a lesser problem because they're simply slave copies of Exchange server mailbox content. In Microsoft Office Outlook 2003 Cached Exchange Mode, the OST is a complete replica of the online Exchange mailbox, whereas in non–Cached Exchange Mode (or in earlier Outlook versions), the local OST contains a subset of the server mailbox data.

PSTs are a different story. Mailbox quota restrictions often force users to store important email data in PST files, but these files are usually large (several hundred megabytes or greater) and—when stored locally—typically are excluded from local hard-disk backup procedures, if any even exist. Users often place PST files on network shares, which is certainly better than keeping them on hard disks. But though backups are simpler when dealing with network share–based PST files, unchecked PST growth can still be problematic. As a PST's size increases, so does its chance of corruption, which can be irreparable. In reality, little benefit is to be had from moving data to a network share–based PST versus keeping the content in a user's mailbox. Furthermore, PSTs are inherently insecure. You can encrypt PST files, but decrypting utilities are well known and widely available. If users are storing sensitive corporate data in PSTs on, say, laptops, that data is at risk if the laptop is lost or stolen. Even data held in PSTs on network shares must be adequately protected from unauthorized access. Furthermore, if you have legal requirements for archiving or retention, unmanaged PST files can get you in a lot of trouble.

Better Backup and Restore
Your choice of backup—and more importantly, restore—solutions will depend on the amount of data that you need to process and the speed with which this processing must occur. For server-based data, many enterprise deployments implement procedures that allow for the databases to be restored within 1 hour (your specific Service Level Agreements—SLAs—might provide for variance from this figure). For example, to meet the goal of a 1-hour restoration of 40GB of server data, a tape device must provide restore rates (not just backup rates) of roughly 11MBps or better. Many backup solutions now stream data to an intermediate disk stage before eventually writing it off to tape, so initial backup rates (i.e., the rate of the backup-to-disk portion) and restore rates can often be significantly higher than the rates traditionally associated with tape alone.
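The required rate falls out of the SLA directly, as this quick sanity check shows (40GB expressed as 40 × 1,024MB, spread over 3,600 seconds):

```python
# Minimum sustained restore rate needed to meet a restore-time SLA.
def required_mbps(data_gb: float, sla_hours: float) -> float:
    """MBps needed to restore data_gb within sla_hours."""
    return data_gb * 1024 / (sla_hours * 3600)

# 40GB within a 1-hour SLA needs roughly 11.4MBps sustained.
print(f"{required_mbps(40, 1):.1f} MBps")
```

Remember that this is a *sustained* restore rate; tape-drive spec sheets usually quote peak streaming rates, so benchmark your own restores before committing to an SLA.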

SAN-based solutions often offer high tape-restore transfer rates; figures in the region of 100GB to 140GB per hour aren't uncommon. Such capability might influence the size limits that you assign to databases. The ability to back up and restore larger volumes of data faster means that you can implement larger databases, which in turn can mean either increased mailbox quotas for users or more users per server.
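Turning that around gives you a database size ceiling for a given restore window. The sketch below uses the conservative end of the SAN throughput range quoted above:

```python
# Largest database restorable within a given SLA window at a given
# sustained restore throughput (GB per hour).
def max_restorable_gb(throughput_gb_per_hour: float, sla_hours: float) -> float:
    return throughput_gb_per_hour * sla_hours

# At SAN-class rates of 100-140GB per hour, a 1-hour SLA permits
# databases of up to 100GB if you plan against the low end of the range.
print(max_restorable_gb(100, 1))
```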

Windows Server 2003 provides support for the Volume Shadow Copy Service (VSS), which, in conjunction with Exchange 2003, offers the capability to take a consistent snapshot of an Exchange database in a matter of seconds. Note that the snapshot is merely a point-in-time view of the disk map for the original database file, so if the physical volumes on which the database resides become unavailable, the snapshot is effectively useless (although many vendors attempt to insulate systems from this problem). Therefore, even though databases can be "snapped" in seconds, the snapped volume must still be streamed off to some storage medium, typically tape. By the same token, a snapped volume can be restored in a matter of seconds as well. VSS-aware storage subsystems and backup and restore solutions can dramatically influence your data-management framework, but be sure you carefully research and test them before putting them into production.

Exchange 2003 (especially Service Pack 1—SP1) introduces new functionality in the form of the Recovery Storage Group (RSG). The concept is straightforward: If a database from a particular SG becomes unavailable to users and must be restored from backup media, an empty recovery database is made available to users homed in the affected database while that database is being restored from backup. Although none of the users' existing messages will be available during this restore period, the ability to send and receive email is maintained. When the restore is complete, the recovery database (which is now populated with new content) can be merged with the restored database. When properly worked into disaster recovery and restore plans, the RSG concept can positively influence SLAs and maximum database sizes. And SP1's Recover Mailbox Data Wizard simplifies the merging of the restored data with newly created data.

Backing up user-maintained data, such as PSTs, presents greater challenges, as I mentioned earlier. Backups of PSTs on local hard disks are almost impossible to enforce or control because they rely almost solely on the user. PSTs on network shares can be backed up centrally but still seem to offer little advantage over large mailboxes in the Exchange database.

All About Archiving
Strictly speaking, archiving solutions differ from regulatory-compliance solutions in the following ways:

  • Archiving is often user-initiated, in that a user arbitrarily decides to archive an object from his or her Exchange mailbox to an archive store.
  • Arbitrary archiving is often complemented by policy-based archiving of expired content to archive stores.
  • Archiving solutions typically don't guarantee that all messages that are created or sent within a system or that pass through an ingress or egress point will be written to an archive store.

You might be aware that Outlook provides a rudimentary form of archiving whereby the user can configure Outlook to move messages older than a defined age to a PST file. However, this approach just moves the data around rather than delivering it to dedicated, protected archive stores, so Outlook archiving isn't a serious contender.

More sophisticated solutions, such as VERITAS Software's KVS Enterprise Vault, can provide user-initiated and policy-based archiving to a second-tier (or higher) data location. Solutions such as these are effective because they can retain a message stub in the user's Exchange mailbox while moving sizeable attachments or message content to the archive store. If a user wants to review archived content, it's often accessible merely by clicking the message stub, at which point the archived content is retrieved. Thus, Exchange storage consumption is optimized while large content is offloaded to a system more suitable for bulk storage.

This type of archiving solution is often integrated with Exchange's Journaling feature to intercept and trap all messages circulating within an Exchange environment. But when large volumes of traffic are expected or when regulatory-compliance issues dominate, even archiving systems that integrate with Exchange Journaling (which might not provide the non-rewritable, non-erasable storage environment that most regulations stipulate) must integrate with or be replaced by more advanced technologies.

Examples of this form of technology include EMC Centera as well as HP's Reference Information Storage System (RISS). These types of solutions let you store static content, in a non-modifiable format, on disk and usually implement RAID-like technologies to guarantee data integrity and content authentication by means of digital signatures and time stamping. Typically, these solutions implement sophisticated Hierarchical Storage Management (HSM) systems, in addition to providing content indexing and retrieval. When you're dealing with regulatory compliance, HSM functionality is important because of the huge volume of email that can quickly mount up, especially in larger organizations. The average user sends 20 emails per day at an average size of 25KB per message. In an organization of 10,000 users, this estimate translates to a total of 200,000 messages per day—4.7GB of content per day, or 1.7TB per year. If you also need to archive inbound messages, storage requirements can grow significantly. Of course, these are average figures, but I'm aware of one organization with 9400 users that receives between 120GB and 150GB of email per month.
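The averages above multiply out as follows; the per-user and per-message figures are the estimates cited in the text, and your own numbers will vary.

```python
# Daily and yearly outbound journaling volume for the averages cited:
# 20 messages per user per day at 25KB each, across 10,000 users.
USERS = 10_000
MSGS_PER_USER_PER_DAY = 20
AVG_MSG_KB = 25

msgs_per_day = USERS * MSGS_PER_USER_PER_DAY         # 200,000 messages
gb_per_day = msgs_per_day * AVG_MSG_KB / 1024 / 1024 # ~4.77GB
tb_per_year = gb_per_day * 365 / 1024                # ~1.70TB

print(f"{msgs_per_day} msgs/day, {gb_per_day:.2f}GB/day, {tb_per_year:.2f}TB/year")
```

Double the daily figure (at minimum) if inbound mail must be archived too, and the case for HSM tiering becomes obvious.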

Many organizations choose to implement an archiving solution as a first step when migrating from one Exchange version or organization to another. This technique reduces the amount of data that must be migrated and can speed up the migration process.

Get Your Act Together
You can no longer ignore the importance of managing Exchange data, especially as email traffic and message size continue to grow and as regulatory-compliance requirements become more commonplace. Users will continue to demand that you retain more data—yet leave it at their disposal—and that you maintain fast recovery times and as little downtime as possible. As an administrator, you must try to meet these demands while operating under your organization's financial, technical, and regulatory constraints. Fortunately, you have many options at your disposal: mailbox quotas, storage technologies, and archiving solutions. For more information and ideas about your options and how to evaluate them, see the "Learning Path" on page 62, as well as the Web-exclusive sidebars "Putting Exchange Data Management in Context" (http://www.windowsitpro.com, InstantDoc ID 45625) and "Data Management Challenge: How Did We Get Here?" (InstantDoc ID 45624).