Understanding and Using LoadSim 5.0

Over the past several months, I've examined many aspects of LoadSim. In the first installment of this series, I presented basic LoadSim topics, such as the theory behind LoadSim and the approach you will use to test Exchange servers with it. In part 2, I discussed LoadSim installation and configuration. In part 3, I showed you how to customize LoadSim settings for your specific user profile requirements. This month, I'll show you how to collect and analyze data, and how to make sense of the work you're going to do with LoadSim.

In part 2, I proposed a sample scenario for walking through LoadSim installation and configuration, and I'll continue with that scenario in this article. I assumed two major requirements: First, I wanted to ensure that one server sufficiently handles 300 users that closely match LoadSim's Medium user profile. Second, I wanted response times of 1 second or less.

Saving Your Work and LoadSim Settings
When you finish configuring LoadSim, as I did at the end of part 3, be sure to save your work. From the File menu, select Save, and give the file a name. LoadSim has a few different support files, which I describe in Table 1, page 172. For this example, name the file lstest.sim, and save it in your LoadSim home directory for later use.

Initializing the Exchange Database
Now that you have fully configured LoadSim and have names in the Exchange Server Directory, you need to initialize the Exchange database for the test. In part 2, when I created the LoadSim topology, the Exchange Directory was populated with LoadSim users in a recipient container called LoadSim. At this point, these imported users have nothing in their mailboxes; they need to have some data before you start a test run. Select Run, Initialize Test from the LoadSim main menu to begin populating Microsoft Exchange's Information Store with data. This action simulates having mail in each mailbox as though the LoadSim users had been using Exchange all along.

For the Medium user profile, shown in Table 2, page 174, the LoadSim users read existing mail 15 times a day; the Initialize Test process creates that existing mail. Populating the database also ensures that the Exchange database engine doesn't gain an unfair performance advantage from working with an empty database.

Running Initialize Test adds data to the private store (priv.edb). If you plan to run LoadSim on a live database, note that this approach adds data to your live database for the LoadSim users in the LoadSim recipients container. You can delete the data when you finish your test. If you are running multiple tests so you can compare their results, I suggest you start with the database in the same state each time. In other words, don't let messages pile up in the LoadSim users' mailboxes; always start your test with the same number of messages in the mailboxes.

When you run Initialize Test, the LoadSim client begins methodically stuffing messages into mailboxes on the server according to the parameters specified in the Initialization tab in the Customize Test property sheet, which Screen 1 shows. This property sheet appears when you select Configuration, Test Properties from the LoadSim main menu and click Customize.

You also use Initialize Test to populate the public store (pub.edb) with messages, just as you do the private store. When you run Initialize Test, LoadSim asks whether you also want to initialize public folders. This procedure worked differently in LoadSim 4.0, which had a separate Public Folder Initialization process. In LoadSim 5.0, initializing public folders is part of Initialize Test.

Now you're ready to run Initialize Test. Open the Run menu, and click Initialize Test. LoadSim starts initializing the database. You can run this task on one LoadSim client, walk away, and come back when it's finished. You'll then be ready to run the real tests and start gathering data.

At the Medium user setting, Initialize Test increases the priv.edb database by about 2MB for each user. So for 300 users, expect to end up with a 600MB database. The Light user setting is about 500KB for each user. The amount of time Initialize Test takes to complete depends on the user profile you've chosen, the performance of your LoadSim client and Exchange Server machines, and the number of mailboxes LoadSim has to populate.
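The sizing arithmetic above is easy to script when you plan disk space for several test configurations. The following Python sketch uses only the two per-user figures quoted in this article (with 500KB rounded to 0.5MB); the function name is hypothetical, and actual growth will vary with your Initialization settings.

```python
# Approximate priv.edb growth per user after Initialize Test, using the
# figures quoted in this article (500KB rounded to 0.5MB).
PER_USER_MB = {
    "Medium": 2.0,
    "Light": 0.5,
}


def estimate_priv_edb_growth_mb(users: int, profile: str) -> float:
    """Hypothetical helper: approximate priv.edb growth in MB for `users` mailboxes."""
    if users < 0:
        raise ValueError("users must be non-negative")
    return users * PER_USER_MB[profile]


if __name__ == "__main__":
    for name in ("Medium", "Light"):
        print(f"{name}: about {estimate_priv_edb_growth_mb(300, name):.0f}MB for 300 users")
```

For 300 users, this prints about 600MB for the Medium profile and about 150MB for the Light profile.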

Running the Test
Now you're ready to run the test. Open the Run menu and select Run Simulation. LoadSim will start churning away as processes start up and LoadSim users log on. Don't worry if LoadSim appears to be doing nothing. Other than an increasing number of users, you won't see any activity on the screen unless LoadSim needs to create Exchange profiles for its users. After all the LoadSim users are logged on, LoadSim has created all the profiles, and LoadSim has started its work, you'll see a screen like Screen 2, page 175.

You'll want to let the test run for a fixed amount of time. If you want reliable scores, let LoadSim run for at least 4 to 5 hours: 1 hour to warm up (reach steady state) and 3 to 4 hours to collect data. Then you can use the LSLOG utility to parse the LoadSim logs.

Analyzing the LoadSim Score with LSLOG
When your LoadSim run finishes, you need to analyze the data that LoadSim gathered to see how the Exchange server performed. The metric you will be looking at is the score, or response time in milliseconds (ms).

LSLOG is a command-line utility to parse the LoadSim logs (.log files) and give you the performance scores for the run. The utility, lslog.exe, comes with LoadSim as one of the program executables. See the Sidebar, "Command Syntax for LSLOG 5.0," page 176, for details.

By default, when you run LSLOG on a LoadSim log, you'll see the 95th percentile score. So, if you are simulating 100 users and the LSLOG score is 1100, 95 of the simulated users are experiencing a response time of 1.1 seconds or less. You must use LSLOG to derive the score you use for any data analysis.
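If the percentile idea is unfamiliar, the following Python sketch shows one common definition (nearest rank) applied to a list of response times. It illustrates the concept only and is not LSLOG's actual algorithm; LSLOG also applies task weights, which this sketch ignores.

```python
def percentile_nearest_rank(samples_ms, pct=95):
    """Return the smallest sample such that at least pct percent of samples are <= it.

    pct is an integer percentage (1-100). Integer arithmetic avoids
    floating-point rounding surprises when computing the rank.
    """
    if not samples_ms:
        raise ValueError("no samples")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(samples_ms)
    rank = -(-pct * len(ordered) // 100)  # ceiling division gives a 1-based rank
    return ordered[rank - 1]


if __name__ == "__main__":
    # 95 responses at 1100ms and 5 slow responses at 2000ms
    responses = [1100] * 95 + [2000] * 5
    print(percentile_nearest_rank(responses))
```

With 95 of 100 responses at 1100ms, the 95th percentile is 1100, even though the five slow responses would pull a simple average higher.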

Obtaining a set of scores is useful if you want to see how quickly Exchange Server services client requests. Furthermore, if you impose a performance goal of subsecond client response time, as I did in my example scenario, the score must be 1000 or less during a LoadSim run. Screen 3 shows typical LSLOG output.

To get this output I performed two steps: First, I ran my LoadSim log file through LSLOG's truncate process. From the LoadSim home directory command prompt (assuming lslog.log is in the same directory), I typed

LSLOG truncate lslog.log > lslog1-4.log

LoadSim takes about an hour to reach steady state, so I want to remove the first hour of data and only analyze data from the second through the fourth hours. Conveniently, the truncate parameter defaults to the values I want to use (i.e., 1:00 and 4:00).
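Conceptually, the truncate step discards records that fall outside a time window. The following Python sketch assumes a hypothetical record format of (elapsed time, task, response in ms); it does not parse real LoadSim .log files, so use LSLOG itself for actual runs.

```python
from datetime import timedelta


def parse_elapsed(hms: str) -> timedelta:
    """Parse an elapsed time such as '1:00:00' (hours:minutes:seconds)."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def truncate(records, start="1:00:00", end="4:00:00"):
    """Keep records whose elapsed time falls in [start, end).

    Each record is a hypothetical (elapsed, task, response_ms) tuple.
    The defaults mirror the window described above: hours 1 through 3.
    """
    lo, hi = parse_elapsed(start), parse_elapsed(end)
    return [rec for rec in records if lo <= parse_elapsed(rec[0]) < hi]


if __name__ == "__main__":
    sample = [
        ("0:30:00", "Read", 300),   # warm-up hour: discarded
        ("1:00:00", "Send", 400),   # start of window: kept
        ("3:59:59", "Read", 250),   # kept
        ("4:00:00", "Read", 900),   # end of window: discarded
    ]
    print(truncate(sample))
```

The half-open window [1:00, 4:00) matches the article's description of keeping times 01:00 through 03:59.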

Second, I ran the resulting file, lslog1-4.log, through LSLOG's answer process, as shown with the command

LSLOG answer lslog1-4.log

Screen 3 represents the output for my test, which the LSLOG answer command provided. You can see from the Weighted Avg line at the bottom of the screen (identified by <--"score") that the 95th percentile score is 270. This score tells me that 95 percent of the users can expect to experience a response time of 0.27 seconds or less. Realistically, this result predicts that users will experience half-second or less response time for most operations.

This result is well below the subsecond requirement I imposed in the example criteria, so I can conclude that the Exchange server under test provides adequate response time for the users. Assuming you don't find any anomalous NT Performance Monitor (Perfmon) data on the Exchange Server computer, you are ready to deploy 300 users on the server.

Screen 3 lists other scores, broken out for each activity type in LoadSim. In addition to each activity's 95th percentile score, you can see the weight LoadSim gives each task and the number of times the task occurred (Hits). Note that the final average is a weighted average based on the Weight column in Screen 3, not a simple (unweighted) mean.
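To see why a weighted average differs from a simple mean, consider the following Python sketch. The task names, weights, and scores are hypothetical placeholders, not LoadSim's real weights, and the formula sum(weight * score) / sum(weight) is an assumption about how such a column is typically combined; LSLOG's exact computation may differ.

```python
def weighted_score(rows):
    """Combine (task, weight, score_ms) rows as sum(weight * score) / sum(weight)."""
    rows = list(rows)
    total_weight = sum(weight for _, weight, _ in rows)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weight * score for _, weight, score in rows) / total_weight


def simple_mean(rows):
    """Unweighted mean of the score column, for comparison."""
    rows = list(rows)
    return sum(score for _, _, score in rows) / len(rows)


if __name__ == "__main__":
    # Hypothetical tasks, weights, and scores for illustration only.
    rows = [("Read", 10, 200), ("Send", 2, 500), ("Delete", 1, 100)]
    print(f"weighted: {weighted_score(rows):.1f}ms, mean: {simple_mean(rows):.1f}ms")
```

With these placeholder rows, the heavily weighted Read task pulls the weighted result (about 238ms) below the simple mean (about 267ms).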

About LoadSim Data
As I mentioned earlier, LoadSim takes about an hour to reach steady state. If you throw away the first hour (time 00:00 through 00:59, also referred to as hour 0) and use the second through fourth hours (time 01:00 through 03:59), the scores are more realistic. LSLOG, the LoadSim log analysis program, makes this process easy. It provides options for preparing the data in the .log files. Otherwise, you'd have to analyze the .log file by hand.

Even if you follow this advice, the score will have some margin of error. Typically, the longer you let LoadSim run, the more accurate the score is. But if you are considering very long runs (more than 12 hours) to solve all your problems, you need to know that as LoadSim runs for long periods, conditions can degrade. For example, if you let a test run for days, you will see fluctuations in the response-time trends. Longer is not necessarily better when it comes to LoadSim runs; take the first 4 to 5 hours of data to get a good idea of how the system will run.

Identical runs don't always produce identical scores, but they are usually within 100ms. Often, they are within 50ms. The theoretical minimum score (depending on the Exchange Server hardware) is about 150ms. Any score less than 150 is meaningless.
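The rules of thumb above (identical runs agreeing within about 100ms, and a floor of about 150ms) can be encoded as simple checks when you compare repeated runs. The function names are hypothetical, and the default thresholds come directly from those rules of thumb; adjust them for your hardware.

```python
THEORETICAL_MIN_MS = 150   # rough floor quoted above; hardware-dependent
RUN_TOLERANCE_MS = 100     # identical runs usually agree within this


def is_meaningful(score_ms: float) -> bool:
    """Scores below the theoretical minimum carry no useful information."""
    return score_ms >= THEORETICAL_MIN_MS


def runs_agree(score_a: float, score_b: float,
               tolerance_ms: float = RUN_TOLERANCE_MS) -> bool:
    """True if two supposedly identical runs fall within the expected spread."""
    return abs(score_a - score_b) <= tolerance_ms


if __name__ == "__main__":
    print(runs_agree(270, 320), runs_agree(270, 400), is_meaningful(140))
```

If two identical runs disagree by more than the tolerance, treat that as a signal to rerun the test rather than to average the results.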

Comparing Results from LoadSim 4.0 and LoadSim 5.0
I don't recommend that you compare the results obtained from LoadSim 4.0 with LoadSim 5.0. If you perform identical runs on identical LoadSim clients on identical Exchange servers, the scores produced by LoadSim 4.0 and LoadSim 5.0 will probably be different. Usually, the LoadSim 5.0 scores will be lower.

This difference in score does not invalidate your LoadSim 4.0 tests, because you're looking at response-time trends, not absolute response times. The difference reflects changes in the LoadSim 5.0 engine that make it more efficient and more accurate in the load it places on Exchange. Compare LoadSim 4.0 scores only with other LoadSim 4.0 scores, and LoadSim 5.0 scores only with other LoadSim 5.0 scores.

Gathering Useful Data
You can gather useful data with LoadSim in different ways. I'll talk about two methods: using scores from multiple clients and using a dedicated monitor client.

You can run multiple, identically configured LoadSim clients to ensure that each LoadSim client is producing accurate scores. You can then average the scores for accuracy.

Using the example of testing 300 users and based on the guidelines in part 2 of this series (February 1998), you'll need three LoadSim clients. Each machine should be a 100MHz Pentium with at least 48MB of RAM and a fast disk. Each LoadSim client can easily handle 100 Medium users, and because the client isn't bottlenecked, the score it reports is reliable. Run LSLOG on each client's log, and average the LoadSim clients' scores to produce the final score for the run. If the tests have been run correctly, each score will be within 50 to 100 points of the average.
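Averaging the per-client scores and flagging any client that strays too far from the average might look like the following Python sketch. The 100-point default tolerance comes from the guideline above; the function name is hypothetical.

```python
def combine_client_scores(scores, tolerance=100):
    """Average per-client LSLOG scores and list any that stray from the average.

    Returns (average, outliers). A non-empty outlier list suggests that a
    client was bottlenecked or configured differently from the others.
    """
    if not scores:
        raise ValueError("no scores")
    average = sum(scores) / len(scores)
    outliers = [s for s in scores if abs(s - average) > tolerance]
    return average, outliers
```

For example, combine_client_scores([260, 280, 270]) returns (270.0, []), so all three clients agree and 270 is the score for the run.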

This technique is probably the best, but it requires a lot of hardware resources, especially if you intend to simulate many clients. Keeping track of all the .log files as the number of LoadSim clients increases is difficult. However, the scores this technique produces are probably as accurate as possible.

Another technique for gathering accurate LoadSim scores is to dedicate one LoadSim client as a score machine. The key is to ensure that this dedicated client is lightly loaded. Because it is never overloaded, it produces a fair score regardless of whether the other LoadSim clients are overloaded.

Although a heavily loaded LoadSim client might not produce an accurate score, it still produces an accurate load on the server. You probably can't simulate 500 Medium users on one LoadSim client, but you can squeeze more users out of a LoadSim client if you don't care about the score the client produces.

For example, in the scenario with 300 Medium users, simulate 250 of the 300 users on two slightly less powerful machines, such as 66MHz Pentium machines with at least 32MB of RAM and a fast disk. You don't have to configure these machines identically. Simulate 125 users on each of these LoadSim clients rather than 100. Then, on a separate LoadSim client (for example, a 100MHz Pentium with 64MB of RAM and a fast disk), simulate the remaining 50 users. This machine is the dedicated monitor client. Use the score from this monitor client, and disregard the scores from the other two machines.
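The monitor-client arrangement can be captured as a small plan that is checked before a run. In the following Python sketch, the class, function, and machine names are hypothetical; the user counts are the ones from this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadSimClient:
    name: str              # hypothetical label for the machine
    users: int             # simulated users assigned to this client
    is_monitor: bool = False


def check_plan(clients, total_users):
    """Verify that user counts add up and exactly one client is the monitor."""
    assigned = sum(c.users for c in clients)
    if assigned != total_users:
        raise ValueError(f"plan assigns {assigned} users, expected {total_users}")
    monitors = [c for c in clients if c.is_monitor]
    if len(monitors) != 1:
        raise ValueError("exactly one monitor client is required")
    return monitors[0]


def run_score(clients, scores_by_name):
    """Report only the monitor client's score; load clients' scores are ignored."""
    monitor = next(c for c in clients if c.is_monitor)
    return scores_by_name[monitor.name]


PLAN = [
    LoadSimClient("load-1", 125),
    LoadSimClient("load-2", 125),
    LoadSimClient("monitor", 50, is_monitor=True),
]
```

Keeping the plan explicit makes it harder to report a score from an overloaded load client by mistake.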

Using a dedicated monitor client is helpful if you don't have enough hardware to configure multiple identical high-performance LoadSim clients. The LoadSim clients that produce the bulk of the load don't have to be identical, and you need only one high-performance machine to get the score from. You won't get the benefit of averaging scores, but scores compared from identical runs should be within 100 points of each other.

Analyzing Data
After you've gathered your LoadSim data, you will need to analyze it to see what was going on inside Exchange Server. Let me offer some suggestions for analyzing the data.

Plot a users-per-server graph. The most useful piece of information that you can glean from this simulation exercise is the number of users a server supports. You can use the techniques I outlined above to plot a graph with this information.

If you take the sample scenario one step further, you can see how much headroom the server has. You already know the example Exchange server yields a score of 270 for 300 Medium users. But how does it handle up to 800 Medium users? Reconfigure the test for 100 users, and repeat the process to get another data point. Repeat the test for 200 users, 400 users, and so on, until you test 800 users. Then plot the data points. The resulting graph might look like Graph 1.

You can get useful information at a glance from a users-per-server graph. In Graph 1, the current hardware on the Exchange server takes you to 600 users before it starts encroaching on the subsecond response time requirement. This information will help you anticipate the need for a server upgrade. And you have concrete data to characterize the performance capabilities of your server for this particular load.

Correlate scores with Perfmon data. The data points for a users-per-server graph take time to generate, but the resulting graph is an excellent picture of how the server performs. To augment this picture, I recommend gathering Perfmon data during each LoadSim run. However, don't run the Perfmon application on the Exchange server itself. Run Perfmon on another machine, and collect the metrics from the Exchange server remotely. I recommend a collection interval of 10 seconds or longer to keep the overhead on the server to a minimum. Remember, part of the goal is to disturb the system under test as little as possible.

Observe the Perfmon data from the LoadSim run to see what's going on in your server. I recommend you capture all Perfmon objects except Thread, because collecting that object imposes significant overhead on the server. That way, you'll have all relevant counters at your disposal. Look at counters such as CPU utilization, paging, available memory, Exchange message queues, and disk utilization.

You can easily view a graph of the data in Perfmon, or you can export the data into Excel for detailed analysis. Correlate the scores LoadSim produces with the Exchange server's Perfmon data from each run. Very quickly, you'll begin to spot trends showing which server subsystems are approaching their limits. At that point, you have excellent insight into what's going on with Exchange, and you're well on your way to mastering the performance of your server.
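If you export per-run values to a spreadsheet or CSV file, a correlation coefficient gives a quick numeric check of how closely a counter tracks the score. The following Python sketch computes Pearson's r between per-run LoadSim scores and, for example, the per-run average of the Processor object's % Processor Time counter; the choice of counter and the input layout are assumptions for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length (at least 2 values)")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)
```

A value near 1 means the counter rises along with the score as you add users, which points to that subsystem as a likely bottleneck; correlation alone does not prove cause, so confirm with the raw Perfmon graphs.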

Stay consistent. When you're conducting performance tests, choose one method of gathering data, producing scores, and running tests, and stay with it. Consistency is the most important aspect of getting meaningful performance data. Any variables you introduce into the process can create havoc and produce puzzling results. Remember, performance is not an exact science; some subjectivity is involved. If test results do not make sense, run the test again. Investing time to be sure of your results is better than proceeding on a false premise. If the results look good and they're repeatable, you are on the right track.

LoadSim Complete
This series covered a lot of ground. You learned about the philosophy behind LoadSim, how to use LoadSim, and how to collect and analyze data in a useful fashion. LoadSim veterans learned what's new in LoadSim 5.0. And I offered tips and advice about how to proceed with your own performance testing.

LoadSim is a powerful tool, but with that power comes a degree of complexity. If you are willing to invest the time and effort, the results can be useful.

Now you are ready to set up LoadSim and start running simulations. For information about where to find LoadSim, current releases, and protocol support, see the sidebar, "Updated Information on LoadSim." If you're looking for additional resources, refer to Microsoft's white paper that covers LoadSim and the concept of users per server, "MS Exchange Performance: Concurrent Users Per Server." This white paper was available with the May 1996 edition of Microsoft TechNet, and you can find it at http://www.microsoft.com/syspro/technet/boes/bo/mailexch/exch/technote/upswpfnl.htm. Another Microsoft white paper from Tech Ed '96, "How Many Users Can I Put On a Single Exchange Server?" is available from MSDN. It covers general Exchange Server performance issues. You can also refer to the Microsoft Knowledge Base article Q155417, "XADM: LoadSim, Microsoft Exchange Server Load Simulation Tool," which was revised April 29, 1997 (http://support.microsoft.com/support/kb/articles/q155/4/17.asp). Finally, LoadSim 5.5 comes with a LoadSim User Guide that explains concepts and methodology for using LoadSim 5.5.