Message stats give you a snapshot of email usage
To aid in migration planning and Exchange Server infrastructure design, you'll find it helpful to know something about how your users use Exchange. In "Sizing an Exchange 2003 System," July 2005, InstantDoc ID 46333, I discuss techniques and concepts related to classifying users—specifically, categorizing their email usage as light, medium, or heavy. One way to start classifying users is to determine per-user statistics, such as how many messages have been sent and received or the total number of bytes sent and received. Exchange's message-tracking log files are an excellent source of such information. Let's examine the basic format of the tracking logs, what to look for in the logs, and how to interpret their information to classify users' email usage.
Tracking Log Basics
After you've enabled Exchange's message-tracking feature, whenever a message is sent or received by an Exchange component such as the Information Store, Message Transfer Agent (MTA), a gateway, or a connector, Exchange records an event in the tracking log for the day the event occurred. A tracking log incorporates 15 to 20 fields of information (depending on the version of Exchange you're using) that include an event code to identify what action occurred as well as information such as the sender, the recipients, message size, and a message ID. The log fields and event codes have evolved with each version of Exchange. TechNet provides several articles that explain the log format and event codes for each version. For a list of these articles, see Additional Resources at the end of this article.
The tracking logs are tab-delimited text files that you can easily import into Microsoft Excel or open in WordPad for viewing. However, the logs are usually too large for you to evaluate them completely in either application. When you load a large log in Excel, for example, it usually warns you that the entire log couldn't be loaded (Excel 2003 and earlier versions hold at most 65,536 rows per worksheet). Without the complete log, you can't perform a full analysis. Such applications are still useful for deciphering the log and field formatting and for performing cursory evaluations. To assess a large amount of data, you'll need to write a script to extract the information, or you can load the files into a database and use queries to extract what you need. Alternatively, you could use a third-party reporting suite, such as those by Quest Software or PROMODAG, to help you assess the tracking logs.
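If you go the script route, the first step is splitting each line on tabs. The following Python sketch shows one way to do that for Exchange 2000 and Exchange 2003 logs. It assumes that header lines begin with a # character and that every other non-blank line is one record. The function name read_tracking_log is hypothetical. Because it treats each line as a complete record, it doesn't suit the multiline Exchange 5.5 format.

```python
import csv
from typing import Iterable, Iterator, List


def read_tracking_log(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield the tab-delimited fields of each record in an Exchange 2000/2003 log.

    Assumption: header lines start with '#'; every other non-blank line is one record.
    """
    records = (line for line in lines if line.strip() and not line.startswith("#"))
    # QUOTE_NONE treats quote characters (in subjects, say) as ordinary text.
    yield from csv.reader(records, delimiter="\t", quoting=csv.QUOTE_NONE)
```

You'd typically pass it an open file, as in: with open(path, newline="") as log: rows = list(read_tracking_log(log)). Add an encoding argument to open() if your logs need one.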
The sample Exchange Server 5.5 and Exchange 2000 Server tracking-log entries in Figures 1 and 2, respectively, have tab-delimited data fields. I've added arrows to show where the tabs are, to help you distinguish where one field ends and another starts. Additionally, I've wrapped the log entries to make them easier to read.
Multiple Exchange components can generate log entries. What you might consider a single action, such as Ben sending Andrew a message, is actually a series of separately logged events. You'll need to use data in each log entry, such as the Message ID (Exchange 5.5) or Linked Message ID (Exchange 2000), to correlate the entries with one another. Also keep in mind that the logs are server-specific and record only the events that occurred on the local server. If Ben and Andrew are on different servers, the log on Ben's server will record the message submission and an SMTP or MTA "transfer-out" event. Andrew's server will record the SMTP or MTA "transfer-in" and message-delivery events.
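To see the series of events behind each message in an Exchange 2000 or Exchange 2003 log, you can group the entries by their Linked Message ID (field 18, Linked-MSGID). The Python sketch below does this. It assumes the event code is in field 9 (Event-ID), so check that position against the field list for your version. The function name group_by_message is hypothetical. Because each log is server-specific, the result shows only the events recorded on that one server.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# 1-based Exchange 2000/2003 field numbers. Field 18 (Linked-MSGID) is from this
# article; field 9 (Event-ID) is an assumption to verify against the KB field list.
EVENT_ID = 9
LINKED_MSGID = 18


def group_by_message(rows: Iterable[List[str]]) -> Dict[str, List[int]]:
    """Map each Linked-MSGID to the event codes logged for it, in log order."""
    timeline: Dict[str, List[int]] = defaultdict(list)
    for fields in rows:
        if len(fields) < LINKED_MSGID:
            continue  # skip short or malformed records
        timeline[fields[LINKED_MSGID - 1]].append(int(fields[EVENT_ID - 1]))
    return dict(timeline)
```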
A final key point about tracking-log formats is how recipients are recorded in each event. Exchange 5.5 uses a rather messy and somewhat hard-to-understand format in which each log entry spans multiple lines. The first line of each event records fields 1 through 12 (as described in the Microsoft article "XADM: Tracking Log Field Descriptions" at http://www.support.microsoft.com/?kbid=173280). Fields 13 to 15 are recorded on one or more successive lines—one for each recipient. For the event in Figure 1, the message had three recipients, so four lines are recorded in the log file. Notice the lines in Figure 1 that start with /o=HP. Individual Exchange Server 2003 and Exchange 2000 events don't span multiple lines in the tracking log, but each recipient has an entry. In Figure 2, you can see two complete entries for the two recipients. The advantage of the Exchange 5.5 log format is that data such as sender and message size is recorded only once; the disadvantage is that record by record, the log is much harder to parse than an Exchange 2003 or Exchange 2000 log.
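For Exchange 5.5, a script first has to stitch each event's lines back together. The Python sketch below assumes that a line with at least 12 tab-delimited fields starts a new event (fields 1 through 12) and that any shorter line is a recipient line carrying fields 13 to 15 for the current event. That rule is an assumption, so verify it against your own logs. The names Ex55Event and read_ex55_events are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Optional

HEADER_FIELDS = 12  # assumption: an event's first line carries fields 1-12


@dataclass
class Ex55Event:
    header: List[str]  # fields 1-12 from the event's first line
    recipients: List[List[str]] = field(default_factory=list)  # fields 13-15, one list per recipient


def read_ex55_events(lines: Iterable[str]) -> Iterator[Ex55Event]:
    """Group an Exchange 5.5 tracking log's lines into one Ex55Event per event."""
    current: Optional[Ex55Event] = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue
        parts = line.split("\t")
        if len(parts) >= HEADER_FIELDS:
            if current is not None:
                yield current
            current = Ex55Event(header=parts[:HEADER_FIELDS])
        elif current is not None:
            # Recipient line: drop any leading tabs, keep fields 13-15.
            current.recipients.append(line.lstrip("\t").split("\t"))
    if current is not None:
        yield current
```

Storing sender and size once per event, with a list of recipients, mirrors the single-record layout of the Exchange 5.5 format.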
Depending on your Exchange version, between 30 and 50 possible events can be logged. Most of the events are useful for diagnostics; I usually use only two or three to classify usage. For Exchange 5.5, use event 4 (message submission), event 9 (message delivery), and event 1000 (local delivery). For Exchange 2003 and Exchange 2000, you need to evaluate only event 1027 (Message submitted to Store Driver) and event 1028 (Message Delivery to Local Store).
When a message is submitted, you have two basic routing choices: Transfer the message to another (remote) server, or deliver it locally. For Exchange 5.5, the event that's recorded in the log file will vary depending on the mix of local and remote recipients. If all recipients are local, only event 1000 is logged. This single event represents both the message's submission and its delivery. If any recipients are remote, event 4 is logged. If a message has both local and remote recipients, events 4 and 1000 are logged and each event records the recipients associated with its respective delivery type. For example, if a message has nine recipients, with four local and five remote, four recipients are logged with event 1000 and five are logged with event 4.
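The following toy model of this Exchange 5.5 rule can help you check your understanding against real logs. It is not Exchange code. The function name and its inputs (a message's recipients and the set of recipients homed on the local server) are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def ex55_submission_events(recipients: Iterable[str], local: Set[str]) -> Dict[int, List[str]]:
    """Model which Exchange 5.5 submission events a message produces.

    Local recipients go under event 1000 and remote ones under event 4;
    an event with no recipients isn't logged at all.
    """
    events: Dict[int, List[str]] = {1000: [], 4: []}
    for rcpt in recipients:
        events[1000 if rcpt in local else 4].append(rcpt)
    return {code: rcpts for code, rcpts in events.items() if rcpts}
```

For the nine-recipient example above, with four local recipients, the model yields four recipients under event 1000 and five under event 4.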
To determine the number of message submissions, you must locate all instances of events 4 and 1000. However, simply counting all the event 4 and 1000 log entries doesn't reflect the true number of messages submitted. Because some messages have both local and remote recipients, you'll need to pair the events that correspond to the same message so that you don't count it twice. You do this by looking at the message ID in field 1, which uniquely identifies each message. For example, Figure 3 shows four log entries but only three unique message IDs (the message IDs are highlighted), because one message ID appears twice. Therefore, only three messages were sent. (The figure shows only the first line of each log entry.)
To gather statistics on a per-user basis, which you need to do to classify individual users, you must also look at field 7 (highlighted in bold), which records the message originator. Figure 3 shows that there are two unique senders, Ben and Andrew. The message IDs are unique for Andrew's messages, so his sent-message count is 2. Because we see events 4 and 1000 but only a single message ID for the message Ben sent, his sent count is 1, not 2.
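The de-duplication described above can be sketched in Python as follows. The sketch assumes you've already pulled the event code, the message ID (field 1), and the originator (field 7) from each event's first line into (event, message_id, originator) tuples. The function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

SUBMIT_EVENTS = {4, 1000}  # Exchange 5.5 submission events


def ex55_sent_counts(events: Iterable[Tuple[int, str, str]]) -> Dict[str, int]:
    """Count messages sent per originator, counting each message ID only once."""
    seen = set()
    sent: Counter = Counter()
    for event, msg_id, originator in events:
        if event in SUBMIT_EVENTS and msg_id not in seen:
            seen.add(msg_id)
            sent[originator] += 1
    return dict(sent)
```

Fed entries shaped like those in Figure 3, it credits Andrew with two messages and Ben with one, for three messages in total.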
When logging a message submission, Exchange 2003 and Exchange 2000 don't distinguish between local and remote recipients. These Exchange versions log an event 1027 entry for each recipient, as Figure 4 shows. This makes counting message submissions much easier because you need to look for only one event. Because an event is logged for each recipient, however, you still have to eliminate duplicates by using the unique message ID to link the events that represent the same message. In Exchange 2003 and Exchange 2000, tracking logs have two message ID fields: field 10, MSGID, and field 18, Linked-MSGID. The field 10 value is generated by the component that's writing to the log and varies from entry to entry. For example, MSGID has one value when the Information Store writes to the log and another when the Categorizer writes to the log, although both are processing the same message. Use field 10 when you need to associate all the events for a particular component. The field 18 Linked-MSGID values are the ones you'll especially want to use, because they link all the corresponding entries in the log. The field 18 entry is the same as the Message ID that you can view in Outlook on a message's general Properties page.
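The following Python sketch counts Exchange 2000 and Exchange 2003 sent messages per sender. It collapses the per-recipient 1027 entries by Linked-MSGID (field 18). Two field positions are assumptions rather than facts from this article: the event code in field 9 (Event-ID) and the sender in field 20 (Sender-Address). Check both against the field descriptions for your version. The sketch also lowercases addresses so that case variants of the same sender are merged, which is a design choice rather than an Exchange behavior.

```python
from collections import Counter
from typing import Dict, Iterable, List

# 1-based field numbers. 18 (Linked-MSGID) is from this article; 9 (Event-ID)
# and 20 (Sender-Address) are assumptions to check against the KB field list.
EVENT_ID, LINKED_MSGID, SENDER = 9, 18, 20


def sent_counts(rows: Iterable[List[str]]) -> Dict[str, int]:
    """Count messages per sender, collapsing per-recipient 1027 entries by Linked-MSGID."""
    seen = set()
    sent: Counter = Counter()
    for f in rows:
        if len(f) < SENDER or f[EVENT_ID - 1].strip() != "1027":
            continue
        linked = f[LINKED_MSGID - 1]
        if linked not in seen:
            seen.add(linked)
            sent[f[SENDER - 1].lower()] += 1
    return dict(sent)
```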
The process for determining the number of received messages is similar to determining the number of sent messages. As I mentioned earlier, Exchange 5.5 uses event 1000 to signify both message submission and delivery when the sender and recipient are on the same server. When the sender and recipient are on different servers, event 9 is logged on the recipient server. To determine the received-message counts, locate all the 9 and 1000 events. You don't need to use the message ID to eliminate duplicates because you'll have only one of these two events on a server for any single message. To gather per-user statistics, use field 13, which logs the recipient.
Each of these events could record a delivery to multiple recipients, so you need to apply the event to the received-message count for each user listed. For example, the event in Figure 1 records a message delivery and in field 13 lists three recipients: Emily, Andrew, and Jake. The received count for each recipient is incremented by one, although only one delivery event is recorded. Determining the per-user received counts in Exchange 2003 and Exchange 2000 is much easier because each log entry applies only to one recipient. Locate all the 1028 events in the log and examine field 8, Recipient-Address, to determine which recipient's count to increment.
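Both received-count procedures are sketched in Python below. For Exchange 5.5, the sketch assumes you have already reduced each event to an (event, recipients) pair, where the recipient names come from field 13 on the event's recipient lines. For Exchange 2000 and Exchange 2000, it reads Recipient-Address from field 8, as described above, and assumes the event code is in field 9. Both function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def ex55_received_counts(events: Iterable[Tuple[int, List[str]]]) -> Dict[str, int]:
    """Exchange 5.5: each event 9 or 1000 adds one to every recipient it lists."""
    received: Counter = Counter()
    for event, recipients in events:
        if event in (9, 1000):
            for rcpt in recipients:
                received[rcpt] += 1
    return dict(received)


# Exchange 2000/2003, 1-based: field 8 (Recipient-Address) is from this article;
# field 9 (Event-ID) is an assumption to verify against the KB field list.
RECIPIENT, EVENT_ID = 8, 9


def received_counts(rows: Iterable[List[str]]) -> Dict[str, int]:
    """Exchange 2000/2003: each 1028 entry is one delivery to one recipient."""
    received: Counter = Counter()
    for f in rows:
        if len(f) >= EVENT_ID and f[EVENT_ID - 1].strip() == "1028":
            received[f[RECIPIENT - 1].lower()] += 1
    return dict(received)
```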
Other statistics that can be useful in classifying usage are the total bytes of messages sent and received. The message size is found in field 9 in Exchange 5.5 and field 13 in Exchange 2003 and Exchange 2000. After you've identified a sent or received message, you can add the message size from the event to a running total and calculate averages. When a message generates multiple events, it doesn't matter which event's message size you use, because each one records the same value. But be sure to apply only one of the events toward the total; don't count them all!
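The following Python sketch computes per-sender byte totals and averages while applying each message's size only once. It assumes you've extracted (message_id, sender, size) tuples from the submission events. The message ID comes from field 1 in Exchange 5.5 or field 18 in Exchange 2000 and Exchange 2003. The size comes from field 9 or field 13, respectively. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def sent_bytes(records: Iterable[Tuple[str, str, int]]) -> Dict[str, Tuple[int, float]]:
    """Return {sender: (total_bytes, average_bytes_per_message)}.

    A message logged more than once contributes its size only once.
    """
    seen = set()
    totals: Dict[str, int] = defaultdict(int)
    counts: Dict[str, int] = defaultdict(int)
    for msg_id, sender, size in records:
        if msg_id in seen:
            continue  # same message, same size: apply it only once
        seen.add(msg_id)
        totals[sender] += size
        counts[sender] += 1
    return {s: (totals[s], totals[s] / counts[s]) for s in totals}
```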
In Exchange 2003 and Exchange 2000, the message size reported in the message-tracking logs is exactly the same as the message size reported by the Information Store for a particular message. For example, if the Information Store reports that a message is 4233 bytes, the tracking log records 4233 bytes in field 13. Unfortunately, the size of a message in the Exchange 5.5 Information Store and its size as recorded in the tracking log are different. For example, if the Information Store reports a message's size as 2316 bytes, the tracking logs might record 3397 as the size in field 9. The difference is because of the overhead associated with transporting the message and is affected by the message's formatting and content. By my estimates, the Exchange 5.5 tracking log reports sizes that are 30 to 46 percent larger than the message's size in the Information Store. Bear this discrepancy in mind if you're using an Exchange 5.5 tracking log to estimate message size.
Assessing the Results
After you've gathered per-user statistics, you can start to classify an individual's usage. Start with the number of messages and number of bytes sent, because they correlate directly with the user's own actions. A person chooses to send a message but typically has little or no control over how many messages he or she receives. A person who receives a large number of messages might not be a power user but simply someone who gets a lot of spam. You can also compare the sent-message numbers with published metrics such as the Messaging API (MAPI) Messaging Benchmark (MMB3). I haven't found any good statistics that correlate the number of messages received, or the size of those messages, with a classification such as high, medium, or low. Even so, you shouldn't discount the received counts in an evaluation, because these messages have an impact on system resources.
The end goal of your analysis is to determine how many accounts a server can host. You don't want to merely assume that all your users are high-demand, then order enough servers to host all high-demand users. You'll end up spending too much money on hardware that will be extremely underutilized. If you underestimate usage, the servers will be overloaded and users will complain constantly about poor performance. Estimating the perfect mix is difficult to do because what's considered an average usage is highly variable and subjective. To compound the problem, the average usually changes over time as system capabilities change and users change the way they work and use those systems.
According to the MMB3 benchmarks, which I use as a starting point, a medium-usage user sends about 9MB of data and 48 to 78 messages per day. I recently evaluated a system that had around 5400 users. Based on the per-day sent-message counts, about 3 percent of those users were in the high-usage category, 8 percent were medium-usage, and the remaining 89 percent were low-usage. This particular site received about three times as much mail as it sent, so in this case the sent count alone didn't produce an accurate usage classification. Assuming that sending and receiving messages place about the same load on system resources, I combined the sent and received counts into a normalized total N, the root mean square of the two counts: $N = \sqrt{(S^2 + R^2)/2}$, where S is the sent count and R is the received count. I kept 48 and 78 as the classification boundaries. With the normalized count, the percentages of high, medium, and low usage shifted to 12 percent, 19 percent, and 69 percent, respectively.
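The normalization and classification can be sketched in Python as follows. The 48 and 78 boundaries come from the text. The assumption that values below 48 are low, values from 48 through 78 are medium, and values above 78 are high is mine, because the article doesn't say which class each boundary belongs to. The input to usage_mix, a hypothetical mapping of each user to average per-day (sent, received) counts, is also an assumption.

```python
import math
from collections import Counter
from typing import Dict, Tuple

MEDIUM_MIN, MEDIUM_MAX = 48, 78  # per-day boundaries; inclusivity is an assumption


def normalized_total(sent: float, received: float) -> float:
    """Root mean square of the per-day sent (S) and received (R) counts."""
    return math.sqrt((sent ** 2 + received ** 2) / 2)


def classify(messages_per_day: float) -> str:
    if messages_per_day < MEDIUM_MIN:
        return "low"
    if messages_per_day <= MEDIUM_MAX:
        return "medium"
    return "high"


def usage_mix(daily: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Percentage of users in each class, from {user: (sent, received)} per-day averages."""
    classes = Counter(classify(normalized_total(s, r)) for s, r in daily.values())
    total = len(daily) or 1  # avoid dividing by zero for an empty input
    return {c: 100 * classes[c] / total for c in ("high", "medium", "low")}
```

Because the root mean square is weighted toward the larger of the two counts, a user who sends little but receives a lot moves up a class more readily than with a simple average.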
Get the Facts
As I discuss in "Sizing an Exchange 2003 System," when scaling a system you must consider more issues than just how many messages are sent and received. You also need to determine how people are using the system. If the Exchange system will serve as a filing cabinet for frequently accessed files and data, the load on the system will be greater than simply the number of messages or bytes sent and received.
You can gather some of these metrics from logs and performance counters, but others will require you to survey and observe the people who use the system. Classification is part science and part subjectivity, but gathering real data gets you a little closer to understanding actual usage, eliminates guesswork, and lets you build better systems.
Additional Resources
"XADM: Tracking Log Field Descriptions" http://www.support.microsoft.com/?kbid=173280
"XADM: Exchange 5.0 and 5.5 Tracking Log Event Numbers" http://www.support.microsoft.com/?kbid=173364
"XADM: Message Tracking Logs Field Descriptions in Exchange 2000 Server" http://www.support.microsoft.com/?kbid=246965
"Tracking log event numbers for Exchange 2000 Server" http://www.support.microsoft.com/?kbid=311739
"Tracking Log Event Numbers for Exchange Server 2003" http://www.support.microsoft.com/?kbid=822930