Splitting the Information Store

GartnerGroup recently published a research note that discussed weaknesses in Microsoft Exchange Server's scalability to high-end systems. In GartnerGroup's opinion, managing the databases that form the Exchange Server 5.5 Information Store (IS)—especially when the databases grow very large—poses difficult operational problems. GartnerGroup's words carry a lot of weight in the industry, and the company's analysis of the situation is cogent.

Platinum, the next major functionality release of Exchange Server (due to appear in early 2000), contains many new advances that will address GartnerGroup's concerns. Platinum is dependent on Windows 2000 (Win2K) and will debut shortly after Win2K ships. This article reviews some of the design constraints that limit the number of mailboxes you might place on one Exchange server and discusses some changes that Platinum makes to help scale the product in a highly available manner. I base this article's discussion on Platinum's first public beta release.

Hardware Sprints Past Exchange Server
Recent hardware improvements have outstripped Exchange Server's capacity to scale beyond a certain point. Even major vendors' smallest servers can easily support hundreds of clients, and if you upgrade the server with an extra CPU, more memory, and an uprated disk I/O subsystem, the system can support thousands of clients. You can continue adding CPUs, memory, and disks—and even cluster two servers together—but most experienced Exchange Server designers don't recommend placing more than 3000 mailboxes on one server (or cluster). This general rule runs counter to the LoadSim test results that vendors report for hardware. Some of the latest 4-way and 8-way servers seem to support as many as 26,000 concurrent clients. What's to stop someone from building a production server to host a similar population?

The answer is simple: data. Three thousand mailboxes create a lot of data. Regardless of how well equipped the server is, the more mailboxes the server hosts, the more messages and attachments the server generates, and the larger the IS will be.

Exchange Server 5.5 divides the IS into a private store (priv.edb) and a public store (pub.edb). The private store holds all the user mailboxes and tends to be larger than the public store, especially on very large servers. (For information about how large a private store can grow, see the sidebar "How Large Will the Store Be?") To scale Exchange Server so that it can support tens of thousands of mailboxes on one server, Platinum moves away from the single large database model.

Platinum Does the Splits
Like Exchange Server 5.5, Platinum uses the Extensible Storage Engine (ESE) as the underlying database technology for the IS. The ESE operates at a very low (i.e., page) level. The code in the IS process (store.exe) performs all the data processing. Platinum extends this storage architecture through the concept of a Storage Group (SG). An SG is a set of databases that operates as a defined instance within the IS process. You can manage an SG as one entity, or you can deal with each database individually. By default, Platinum installs with one SG that contains the public and private ISs, just as you'd expect to see on an Exchange Server 5.5 system. The IS process controls all of a server's active SGs. Each SG has a separate set of transaction logs, so you can back up or restore each SG individually. This capability is the key feature that will provide Platinum with more resilience than Exchange Server 5.5 provides. Today, an IS problem such as a corrupt disk renders every user unable to work until you resolve the problem. In Platinum, you can divide mailboxes across SGs so that a problem affects only the users associated with a specific SG.

Figure 1 illustrates how SGs make Platinum more resilient than Exchange Server 5.5. Two SGs are active on the sample server. If a problem affects the disk holding the Fourth Private Store database, you must take only SG 2 offline to fix the problem. SG 1 remains active, and users whose mailboxes reside in any of the databases in SG 1 won't be aware that a problem has occurred. However, you need to take the complete SG offline even if a hardware problem affects only one database in the SG, because Platinum writes all the SG's transactions into one set of log files.
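
The scenario in Figure 1 can be expressed as a small model. The following Python sketch is only an illustration, not Platinum code: the StorageGroup class, the affected_mailboxes function, the mailbox aliases, and every database name other than the Fourth Private Store are hypothetical. It assumes, as described above, that a fault in one database takes the whole SG offline because the SG's databases share one set of logs.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StorageGroup:
    """Hypothetical model of an SG: a name plus its databases."""
    name: str
    # Maps a database name to the mailbox aliases it hosts.
    databases: Dict[str, List[str]] = field(default_factory=dict)


def affected_mailboxes(groups: List[StorageGroup], failed_db: str) -> List[str]:
    """Return every mailbox that goes offline when failed_db has a problem."""
    for sg in groups:
        if failed_db in sg.databases:
            # The whole SG goes offline, so every database in it is affected.
            return sorted(m for boxes in sg.databases.values() for m in boxes)
    return []  # Unknown database: nobody is affected.


# Sample server with two SGs (names and aliases are made up).
sg1 = StorageGroup("SG 1", {"First Private Store": ["alice"],
                            "Second Private Store": ["bob"]})
sg2 = StorageGroup("SG 2", {"Third Private Store": ["carol"],
                            "Fourth Private Store": ["dave"]})

print(affected_mailboxes([sg1, sg2], "Fourth Private Store"))  # ['carol', 'dave']
```

Users in SG 1 (alice and bob in this example) never appear in the result, which is the resilience gain the figure describes.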

Because each SG has a separate set of transaction logs, Exchange Server administrators need to understand how to roll forward the transactions in log files to recover data. Experience shows that most administrators today err in recovery situations by unnecessarily deleting log files or by attempting to apply the wrong set of logs. You need to get up to speed with all the common operational procedures for dealing with the new store structure (e.g., performing backups, mounting and dismounting individual databases). You also need to thoroughly review and update your disaster-recovery procedures as part of your Platinum planning exercise.

Splitting the IS improves Exchange Server's flexibility. Exchange Server 5.5's IS service won't start if it detects a problem, even a minor one, with the private store or the public store. An important service that won't start until everything is perfect is frustrating; imagine how much worse the situation would be if that service had to deal with 20 or 30 databases on a server. By contrast, if Platinum detects a problem with a database, the system marks that database offline and the IS service moves on to load the next database. The problem affects only the mailboxes or public folders that reside in the offline database, so the entire system doesn't go down. After you fix the problem, you can bring the repaired database back online to restore full service.
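
The difference between the two startup behaviors can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not how store.exe is implemented: the function names and the health dictionary (database name mapped to True when the database passes its checks) are invented for illustration.

```python
from typing import Dict, List, Tuple


def start_store_55(health: Dict[str, bool]) -> bool:
    """Exchange Server 5.5 model: the IS starts only if every database is healthy."""
    return all(health.values())


def start_store_platinum(health: Dict[str, bool]) -> Tuple[List[str], List[str]]:
    """Platinum model as described above: skip problem databases, load the rest."""
    online = [db for db, ok in health.items() if ok]
    offline = [db for db, ok in health.items() if not ok]
    return online, offline


# Hypothetical server with one damaged database.
health = {"priv1": True, "priv2": False, "pub1": True}

print(start_store_55(health))        # False
print(start_store_platinum(health))  # (['priv1', 'pub1'], ['priv2'])
```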

Start Planning
Platinum supports a maximum of 15 SGs on one server. Each SG can control as many as 6 databases, so splitting the IS across 90 separate databases on one server is technically conceivable. However, until we know how to balance the demands of CPU, memory, and I/O to obtain maximum performance and resilience, I don't expect to see anyone rushing to create more than two SGs or more than three or four databases within each SG. Certainly, each loaded database requires memory to cache internal structures (e.g., indexes), and systems that try to load too many databases risk exhausting virtual memory.
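
The limits quoted above lend themselves to a quick sanity check of a planned layout. The Python sketch below uses only the two figures from the text (15 SGs per server, 6 databases per SG); the function names and the layout dictionary are hypothetical, and the limits themselves come from the beta and could change.

```python
from typing import Dict, List

MAX_STORAGE_GROUPS = 15    # per server, as stated for the Platinum beta
MAX_DATABASES_PER_SG = 6   # per SG, as stated for the Platinum beta


def max_databases_per_server() -> int:
    """Theoretical ceiling on databases for one server."""
    return MAX_STORAGE_GROUPS * MAX_DATABASES_PER_SG


def layout_problems(layout: Dict[str, int]) -> List[str]:
    """layout maps a (hypothetical) SG name to its planned number of databases."""
    problems = []
    if len(layout) > MAX_STORAGE_GROUPS:
        problems.append(
            f"{len(layout)} SGs exceeds the limit of {MAX_STORAGE_GROUPS}")
    for name, count in layout.items():
        if count > MAX_DATABASES_PER_SG:
            problems.append(
                f"{name}: {count} databases exceeds the limit of {MAX_DATABASES_PER_SG}")
    return problems


print(max_databases_per_server())                 # 90
print(layout_problems({"SG 1": 4, "SG 2": 3}))    # []
```

A layout that passes this check is merely permitted; it says nothing about whether the server has the CPU, memory, and I/O capacity to run it well.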

Such performance data will take time to accumulate. To determine where Platinum's theoretical limits lie, hardware vendors and Microsoft need to test the product in laboratory conditions. Then, users must validate those results with real-world experience.

In the near future, even the largest server is unlikely to use more than 20 databases, but we still need to come up with rules for planning Platinum deployments. You might want to pick a storage limit as the point at which you'll split the IS or consider dividing the user communities across separate SGs (e.g., place all marketing people together). In my experience, the average size of databases in production today on midsized to large servers (e.g., more than 500 mailboxes) is less than 20GB, so perhaps 20GB is a good starting point for creating a new private store. A new store that you create at 20GB can still live in the default (first) SG; creating a new SG at that point instead would give you maximum resilience. However, each SG increases the administrative complexity of the system, especially in terms of backup and restore operations, so I recommend 50GB as the threshold for creating a new SG.
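
These rules of thumb can be written down as a simple decision function. The sketch below is one possible reading, not a definitive policy: it assumes that the 20GB figure applies to the largest private store and the 50GB figure to the total size of the SG, which the text does not spell out. The function and parameter names are hypothetical.

```python
NEW_STORE_THRESHOLD_GB = 20  # suggested point for adding a private store
NEW_SG_THRESHOLD_GB = 50     # suggested point for adding an SG


def planning_action(largest_store_gb: float, sg_total_gb: float) -> str:
    """Suggest a next step from the two thresholds (assumed interpretation)."""
    if sg_total_gb >= NEW_SG_THRESHOLD_GB:
        return "create a new SG"
    if largest_store_gb >= NEW_STORE_THRESHOLD_GB:
        return "create a new private store in the existing SG"
    return "no change needed"


print(planning_action(12, 12))   # no change needed
print(planning_action(22, 30))   # create a new private store in the existing SG
print(planning_action(22, 55))   # create a new SG
```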

Clusters are a special case. An SG becomes a cluster resource that you can switch between servers in a cluster. SGs will definitely be part of Platinum cluster administrators' day-to-day life.

Creating New Stores
Platinum replaces the Microsoft Exchange Administrator program with a set of Microsoft Management Console (MMC) snap-ins. The Exchange System Manager console, which Screen 1 shows, provides a packaged set of these snap-ins. This console is where you perform pure administrative Exchange Server activities such as creating a new database within an existing SG or setting up a new SG. You'll perform many common administrative activities that you currently perform through the Exchange Server 5.5 Administrator program (e.g., creating new mailboxes or distribution lists—DLs) through other MMC consoles that work with Win2K's Active Directory (AD), such as AD Users and Computers.

Creating a new SG is straightforward. In fact, Microsoft has simplified the operation to such an extent that unwary systems administrators can create new SGs or stores without going through the necessary planning process. You need to have a good reason to create a new store. For example, you might want to isolate important users into a store where they have increased mailbox quotas. Earlier, I discussed how to set a threshold at which you'll consider creating a new store. If you create a new store after you reach a threshold, you need to move some mailboxes to the new store to prevent the old store from growing past the threshold. Additionally, administrators of Exchange servers that host several companies might want to create multiple SGs. In Exchange Server 5.5, all the companies share a common IS, but Platinum's ability to provide each company with a separate SG creates an effective firewall between each company's information.

Each new SG and store makes demands on system resources. Databases take up hard disk space, and each SG creates a new set of transaction logs. Win2K uses memory to load each database when the IS service starts, and initial indications are that each database requires 10MB of physical RAM.
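
To see what the memory figure means in practice, the following Python sketch multiplies it out. It relies on the early 10MB-per-database indication quoted above, which may well change before Platinum ships; the function name is hypothetical, and the estimate ignores everything else that consumes memory on the server.

```python
RAM_PER_DATABASE_MB = 10  # early indication from the beta; treat as provisional


def estimated_database_ram_mb(database_count: int) -> int:
    """Rough RAM needed just to load the given number of databases."""
    if database_count < 0:
        raise ValueError("database_count must not be negative")
    return database_count * RAM_PER_DATABASE_MB


print(estimated_database_ram_mb(8))    # 80
print(estimated_database_ram_mb(90))   # 900
```

At the theoretical maximum of 90 databases, the load-time cost alone would approach 900MB by this estimate.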

When you create a new SG, Platinum asks you to name the new SG and provide a location for the SG's transaction logs, as Screen 2 shows. Experienced administrators are already accustomed to identifying and isolating I/O sources on a server to obtain optimum performance. Best practice for Exchange Server 5.5 keeps the IS databases and their transaction logs on separate physical devices. In the same way, if you run multiple SGs on a Platinum server, you need to keep each SG's databases and logs well apart, and you need to keep each set of transaction logs on a separate volume. Servers that support several thousand mailboxes are probably equipped with multiple controllers and many physical disks, so you need to place files in such a manner that a failure on any individual disk won't affect multiple databases or sets of transaction logs. For best performance on high-end systems, you also need to place each SG on a separate array. These procedures might seem expensive and difficult to configure, but they're easier and cheaper than struggling to control the I/O that multiple SGs can generate or dealing with a disk failure that affects both a database and its associated transaction log.
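
A placement plan can be checked against the rule that no single disk failure should affect more than one database or log set. The Python sketch below is a minimal illustration: the shared_disks function, the item names, and the disk labels are all hypothetical, and a real plan would also need to account for controllers and arrays.

```python
from collections import defaultdict
from typing import Dict, List


def shared_disks(placement: Dict[str, str]) -> Dict[str, List[str]]:
    """Return the disks that hold more than one database or log set.

    placement maps an item (a database or a set of transaction logs)
    to the physical disk on which it lives.
    """
    by_disk = defaultdict(list)
    for item, disk in placement.items():
        by_disk[disk].append(item)
    return {disk: sorted(items)
            for disk, items in by_disk.items() if len(items) > 1}


# Made-up plan in which two databases share disk1.
plan = {
    "SG1 logs": "disk0",
    "SG1 priv1.edb": "disk1",
    "SG2 logs": "disk2",
    "SG2 priv2.edb": "disk1",
}

print(shared_disks(plan))  # {'disk1': ['SG1 priv1.edb', 'SG2 priv2.edb']}
```

An empty result means that, at the level of detail in the plan, each database and log set has a disk to itself.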

After you set up the new SG, you can create the stores that the SG will manage. Screen 3 shows how to set the properties of a newly created private store. Note that each database has a separate maintenance schedule for activities such as background defragmentation. This scheduling ensures that a server isn't swamped with simultaneous maintenance tasks for multiple databases.

Screen 3 specifies two files for the new store. The first is the familiar .edb file that Exchange Server has used since version 4.0. The streaming database (i.e., vip mailboxes.stm) is a new Platinum feature that holds native MIME content that Internet clients such as Microsoft Outlook Express or Web browsers generate. Outlook Express can use IMAP4 or POP3 to access the IS, whereas Web browsers will use a new version of the Outlook Web Access (OWA) Microsoft Internet Information Server (IIS) application and access data over the HTTP-DAV protocol. The new streaming database replaces part of the work that the IS performs in Exchange Server 5.5, which uses the IMAIL engine (embedded in the IS) to handle protocol access for Internet clients and provide content to those clients in the format they require. Splitting up or partitioning the functionality of the IS is an important evolution for Exchange Server because the process lays the foundation for front-end/back-end configurations that can support tens of thousands of mailboxes on one virtual server.

Messaging API (MAPI) clients such as Outlook 98 or Outlook 2000 ignore the streaming database and use the .edb file. Exchange Server handles all the necessary processing to hide the interaction between the two databases when MAPI clients need to retrieve content that Internet clients store in the streaming database. The streaming database stores only content and provides fast access to data that isn't suitable for storage in the Exchange database (EDB), which the database engine organizes into 4KB pages. In terms of processing, retrieving attachments such as the 25MB Phantom Menace trailer is costly because the server has to fetch the video from multiple individual pages that the database might not store contiguously. To solve this problem, the streaming database lets clients access data in a continuous stream. Like Exchange Server 5.5, Platinum always stores message header data in the EDB.
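
The cost of paging a large attachment is easy to quantify. The Python sketch below computes how many 4KB pages a file of a given size spans. It assumes that 1MB means 1024 x 1024 bytes and that every byte of a page is available for content, which overstates real capacity because pages also carry internal overhead; the function name is hypothetical.

```python
PAGE_SIZE_BYTES = 4 * 1024  # EDB page size given in the text


def pages_needed(size_bytes: int) -> int:
    """Minimum number of pages that a blob of size_bytes occupies."""
    if size_bytes < 0:
        raise ValueError("size_bytes must not be negative")
    # Ceiling division without floating point.
    return -(-size_bytes // PAGE_SIZE_BYTES)


trailer_bytes = 25 * 1024 * 1024  # the 25MB attachment from the example
print(pages_needed(trailer_bytes))  # 6400
```

Fetching thousands of pages that might be scattered across the database is the overhead that a continuous stream avoids.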

Managing Users and Stores
When you first install Platinum, the system allocates all mailboxes to the default SG and single private store. After you create new private stores, either in the default SG or a new SG, you can move mailboxes between stores. You can place new mailboxes in any available private store, as Screen 4 shows.

To move a mailbox from one store to another, select the desired user account from the MMC console that manages AD users and computers. Right-clicking the user object reveals a set of operations that you can perform on the user object, and if the user has an Exchange Server mailbox, you'll see a number of Exchange-specific operations in the list, as Screen 5 shows.

The ability to move mailboxes so easily is great, but—as with creating new SGs and stores—the ease of the operation disguises a background complexity. You still need to think about mailbox placement and avoid scattering mailboxes across all available stores.

Multiple databases affect the single-instance storage model. If you send a message to three recipients and each recipient's mailbox resides in a different store, Exchange Server must create a separate copy of the message in each store. To retain the benefit of single-instance storage, consider minimizing the impact of multiple databases as much as possible. For example, you can keep people who exchange a lot of email in the same store. Assuming that everyone in a workgroup or department sends a lot of mail to one another, you can allocate complete workgroups or departments to a store. However, you could argue that this approach exposes people to a single point of failure because a store that experiences a failure affects everyone in the workgroup. If you split users across different stores, a failure won't affect everyone in a workgroup, but it will drive down the single-instance storage ratio. After Platinum is in production for a while, expect much debate as system designers try to determine the most effective approach to take.
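
The effect on single-instance storage can be modeled directly: a message is stored once per distinct store that hosts at least one recipient. The following Python sketch is a simplified model under that assumption; the function names, user aliases, and store names are hypothetical.

```python
from typing import Dict, List


def copies_stored(recipients: List[str], mailbox_store: Dict[str, str]) -> int:
    """Number of copies kept: one per distinct store among the recipients."""
    return len({mailbox_store[r] for r in recipients})


def single_instance_ratio(recipients: List[str],
                          mailbox_store: Dict[str, str]) -> float:
    """Recipients served per stored copy (higher means less duplication)."""
    return len(recipients) / copies_stored(recipients, mailbox_store)


team = ["ann", "ben", "cat"]

# Whole workgroup in one store versus scattered across three stores.
together = {"ann": "Store A", "ben": "Store A", "cat": "Store A"}
scattered = {"ann": "Store A", "ben": "Store B", "cat": "Store C"}

print(copies_stored(team, together), single_instance_ratio(team, together))    # 1 3.0
print(copies_stored(team, scattered), single_instance_ratio(team, scattered))  # 3 1.0
```

The same model shows the trade-off from the other side: in the grouped layout, a failure of Store A affects all three users, whereas in the scattered layout it affects only one.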

Multiple Public Stores
Platinum permits multiple public stores. Public stores hold public folders, and until now, public folders have been less of a priority than mailboxes for most deployments. People tend to focus on smoothing email flow and workflow and sorting out associated add-on products (e.g., fax connectors, antivirus software) before they focus on how to effectively use public folders. Public folders are useful repositories for documents, and when you combine them with electronic forms, they can provide the basis for powerful applications. However, not many deployments have made use of these facilities, so you probably won't see people rushing to implement multiple public stores. Still, the facility exists for those who need it.

The Web Store
Platinum introduces a URL namespace throughout the IS. You can reference every object and folder with a URL. For example, you can reference a public folder using the format http://Exchange server name/exchange/foldername, and you can reference a mailbox using the format http://Exchange server name/.exchange/alias. Microsoft calls this feature the Web Store, a name that confirms the company's belief that the IS can act as a Web site repository—effectively replacing the NTFS directories that typically store Web content. Public folders are logical places to store Web content, but you probably don't want to encourage public Web access to folders in mailboxes. Interestingly, some early indications are that the IS performs better than NTFS when responding to HTTP requests for Web content. The IS uses advanced caching techniques to stream content and deliver it to clients. NTFS can't stream data, so it suffers when it needs to provide data formats that perform well when streamed (e.g., video, audio).

Third-Party Extensions
Most Exchange servers don't run only Exchange Server. An array of third-party products can extend Exchange Server's capabilities and add features. You need to update, validate, and test these products against Platinum to ensure that they'll work successfully after an upgrade and after you split the IS.

Splitting Isn't Simple
Considering the development of previous versions of Exchange Server, Platinum has substantial room for change before Microsoft delivers the final product in early 2000. But in its beta incarnation, Platinum marks a massive paradigm shift in Exchange Server technology. With this product, Microsoft wants to build one code base that can scale from the smallest email server to the largest, such as those that ISPs use. Splitting the IS is an important part of Microsoft's Exchange Server strategy and is one of the most fundamental changes that the Exchange Server community faces with Platinum's introduction. You'll need to devote substantial effort to careful consideration and planning before you split the store, but the fact that the capability now exists is exciting.