Does NT Scale?

NT scaleability is one of the biggest questions in the PC industry today. As PC business desktops and PC servers move into spaces formerly reserved for large-scale mainframes and UNIX boxes, answering whether or not NT can fill those extremely large shoes gains paramount importance.

Not only does Microsoft need to know if its brainchild fits the bill for enterprise deployment, but the customers need to know. In this issue of Windows NT Magazine you will find the first answers to the big question: Does NT scale?

Our initial data says: Yes, NT scales. What does this data mean? It means that as you add resources to an NT client/server system, such as CPUs, memory, faster network components, and more disk space, system performance increases. It means that as you add users and application load to your systems, the operating system doesn't choke.

Yes, your system configuration should match your user load. Now, you might say, "You just said that I can only get better performance if I throw a bigger, more expensive machine at my users!" But isn't that the definition of scaleability: that if you need better performance, or need to support more users or heavier user loads, you can build a bigger system and the OS will handle it?

NT is on the road to big things. It isn't all the way there yet: our tests show that it doesn't scale completely linearly, and it is bounded (response times do increase with user load, and you can, to a point, combat this by adding system resources). But NT is moving up fast. So climb on board for the first big ride of the 21st century as we look at the operating system and the machines that will make it all happen.

ARTICLE
The burning question of the day is, "Does NT scale?" This is one of the most difficult questions to answer, because it calls a huge variety of issues into play, not the least of which is the nature of client/server computing itself.

Windows NT Magazine is going to answer these questions for you in a series of articles over the next six months (at least), in which we will review client/server issues such as networking, disk performance, system configuration, application configuration, user load, end-to-end performance testing, and more. We will fold in reviews of servers from companies such as Compaq, HP, IBM, NEC, Tricord, and many others to bring you performance data about server hardware scaleability, upgradeability, overall performance, and even clustering. Along the way we will discuss the tests we use, the metrics we record, our goals, and our findings, and then relate this information in a real-world fashion to what you can expect from the same boxes in your environment. You can then use this information to aid in your buying process, to tune your existing setups, or to make fundamental decisions about migrating to NT from other operating systems and what applications to use in your enterprise. The obvious answer is to buy a system that matches your needs, but when you need to scale and grow, it is a far more complicated issue to know what to buy.

Each of the above issues in client/server computing contains a myriad of others, each of which can affect your server's performance, network throughput, and user "happiness." Networking involves the wire you use, the NICs installed in the client systems and the servers, the protocols you run, the configuration you lay out (domains, workgroups, multiple segments, connection hardware, etc.), and the I/O capabilities of the system/NIC relationship. Disk performance covers how many and what kind of drives you use, what your data set size is, what the disk transaction mix is (reads vs. writes, random vs. sequential), what controllers are installed and how many there are, disk caching, RAID configurations, and system/disk subsystem relationships. The system configuration includes things such as the number of CPUs and amount of memory, PCI and system bus architectures, amounts and types of CPU cache (Level 1, Level 2, and Level 3), types of components used, and so forth. You also need to find a happy medium for your situation, because what is optimal for one application is not necessarily applicable to another, even on the same physical box. Application configuration encompasses everything from SQL Server optimization to application serving to what and how many applications can be run from the same system. User load may seem obvious (how many users can be supported on a server), but it also involves what the users are doing, how the client systems are configured, and what type of network they are on. The last piece we'll look at, end-to-end performance testing, is perhaps the most complicated of them all. It calls into question everything I've listed here, and adds complexities such as: What is a real-world test? What numbers do people care about? It also demands reality checks on user simulation, system/network configurations, transaction mixes, and much, much more.

Client/server computing is the most complicated environment ever to exist in the PC world. You can't look at just one component in your enterprise architecture and assume that optimizing it will improve everything. Not only do you have to remember that your system is only as fast as its slowest component, but also that the highly complex nature of this new paradigm called client/server computing means there are emergent properties causing problems in places you may not even be aware of. In other words, it is almost completely unpredictable-knowing the input to your system doesn't necessarily mean that you'll know the output.

Exchange
For our first round, we decided to use Microsoft Exchange 4.0 for testing systems as messaging platforms. We also took the opportunity to test Exchange itself, and see how well it scaled with varying levels of CPU and memory configurations-we'll look at disk, network, and other factors at a later date. We wanted to answer the question of resource utilization and whether or not throwing more at it made it perform better. The results were pretty startling, as you'll see in a bit.

Windows NT 4.0 claims to offer scaleability enhancements over NT 3.51 in overall performance and multiprocessing, and Exchange purports to scale up to 10,000 users on a single server system. While we have no question that-for the most part-these claims are true, what price do you have to pay for them? Yes, Exchange can support 10,000 users, but if it takes you ten minutes to download your mail and connections are lost left and right, it may not be worth the effort.

To find this out, we used LoadSim, a tool that Microsoft provides with Exchange Server, for testing system response times under varying user loads (see the sidebar, "LoadSim Revealed"). It is an end-to-end testing tool, meaning that it measures total system response time, from the moment the user request leaves the client to when the result returns (the simple explanation in this case is how long it takes for an email message to be sent and an acknowledgment to come back from the server, allowing you to move to your next task).

We did our first run of tests on a Tricord Systems PowerFrame (see the review, "Tricord: A Mainframe's Little Sibling" on page 55 of the magazine) with four 166MHz Pentiums (2MB of Level 2 cache for each), 1GB of RAM, and seventeen 2.1GB Seagate fast and wide differential SCSI-2 drives. Our intent was to remove as many bottlenecks as possible so that we were testing CPU and memory performance, rather than choking on other factors. Our client testing environment consisted of fifteen 60MHz Pentiums with 32MB of RAM on each, running NT 3.51 Server, all connected to the PowerFrame server via two 100 Base TX Ethernet segments (with 100Mbit going to each client, and two NICs in the server). Each physical client system simulated 100 users, giving us a total of 1500 users.

We set up the environment as a domain, since Exchange is a domain-oriented enterprise messaging platform and not a workgroup email system (although, Exchange functions just fine in small workgroups-the workgroups just have to be set up as domains). The target server (the Tricord) acted as the Primary Domain Controller (PDC), as well as the Exchange server and performance monitoring station. At the same time, there was an auxiliary workstation attached to the server's management card passively recording hardware-level data on CPU, bus, memory, and disk utilization.

In the Thick of It
I'm going to delve into a number of aspects of this test, laying the groundwork for future articles that you'll see in the months to come. There will be brief discussions of each aspect as it relates to our findings on the Tricord box, along with more general thoughts about benchmarking strategies, errors, etc.

We watched many different system/performance factors while running the LoadSim tests. These factors included CPU utilization, memory utilization, and disk performance, along with the metrics collected by LoadSim itself: average response time and transactions per day. With this data, we can analyze a number of performance characteristics of Exchange on NT, as well as the performance of the Tricord hardware platform.

The Environment
The first step in testing any platform is determining a hardware/software configuration that will best bring out the characteristics that you are trying to measure and that simulates a real-world environment (system price, amount of resources, etc.). At the same time, you don't want to introduce new problems by setting the system up improperly to run the application/OS you are testing-you need to know where the bottlenecks are.

To test scaleability of an Exchange/NT platform, we decided to look at CPU and memory. This decision required removing all other system aspects from the equation by throwing as many resources at the test as possible, such as disks and network bandwidth. This setup accomplished a number of things.

First, we set the system up with enough disk space to do three things: make the memory-to-disk ratio more closely emulate a real client environment, remove the disk I/O bottleneck from the system, and be optimal for the application under test. In a client environment, an ideal setup is to have far more disk than memory, so that the slower disk access can be spread out over multiple controllers and multiple drives to keep up with fast memory accesses. This setup also has the effect of mimicking IS shops where large data sets are in use.

Also, most database applications eat up as much memory and disk as you can throw at them. Since Exchange is based on a client/server database architecture (although it is optimized for email and semistructured groupware data storage, and not for structured transactional processes like SQL), we assumed its behavior would be fairly similar to that of an application like SQL Server, and that it would run well in a database-optimized environment. While it is true that the maximum configuration we used was pricey (more than $100,000), we decided that in this first go-around, we would test Exchange itself, and remove the hardware factors as much as we could.

To that end, the Tricord system was configured with 17 drives on a single multichannel fast and wide differential SCSI-2 RAID controller, as seen in Figure 1. This controller has four independent buses, capable of supporting up to 60 devices total. We set up a single OS volume on one drive with just NT 4.0 Server and the Exchange executables on it, which resided on bus 1. Since the system was running with 1GB of RAM, NT needed a large pagefile; also, NT operates best if the pagefile is split across multiple physical drives, so we put it by itself on a RAID 10 volume made of four drives, two on bus 1 and two on bus 2 (a RAID 10 volume is a mirrored set of RAID 0 disks-see the sidebar, "RAID Performance and NT" for a discussion of why we chose this). The logfiles we were creating from both LoadSim and NT Performance Monitor (Perfmon) were recorded to another RAID 10 volume, also on four drives and split between buses 1 and 2. All of Exchange's data files were stored on a completely separate RAID 10 data volume composed of eight drives, with four on bus 3 and four on bus 4. Splitting the disk accesses across more than one SCSI channel or bus further improves performance.
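To make the drive layout easier to check, here is a minimal Python sketch of the volumes described above. The class and function names are ours, purely for illustration, and the 2+2 bus split of the logfile volume is an assumption (the text says only that it was split between buses 1 and 2). The usable-capacity figure relies on the general fact that a RAID 10 volume gives up half of its raw capacity to mirroring.

```python
from dataclasses import dataclass, field

DRIVE_GB = 2.1  # capacity of each drive, as stated in the article


@dataclass
class Volume:
    name: str
    raid10: bool  # True for the mirrored-stripe (RAID 10) volumes
    drives_per_bus: dict = field(default_factory=dict)  # bus number -> drive count

    @property
    def drives(self) -> int:
        return sum(self.drives_per_bus.values())

    @property
    def usable_gb(self) -> float:
        # RAID 10 mirrors every stripe, so only half the raw capacity is usable.
        factor = 0.5 if self.raid10 else 1.0
        return round(self.drives * DRIVE_GB * factor, 1)


# Assumption: the logfile volume is split 2+2 across buses 1 and 2,
# mirroring the pagefile volume. The article does not give the exact split.
LAYOUT = [
    Volume("system (NT 4.0 + Exchange executables)", False, {1: 1}),
    Volume("pagefile", True, {1: 2, 2: 2}),
    Volume("logfiles (LoadSim + Perfmon)", True, {1: 2, 2: 2}),
    Volume("Exchange data", True, {3: 4, 4: 4}),
]


def drives_on_bus(layout, bus):
    """Count the drives that a layout places on one SCSI bus."""
    return sum(v.drives_per_bus.get(bus, 0) for v in layout)


if __name__ == "__main__":
    for v in LAYOUT:
        print(f"{v.name}: {v.drives} drives, {v.usable_gb}GB usable")
    print("total drives:", sum(v.drives for v in LAYOUT))
    for bus in (1, 2, 3, 4):
        print(f"bus {bus}: {drives_on_bus(LAYOUT, bus)} drives")
```

Under those assumptions the sketch accounts for all 17 drives and shows that the eight-drive data volume has buses 3 and 4 to itself, which is the point of the layout.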

Next, we didn't want network I/O to be a significant limiting factor on our testbed. Ideally, 1Gbit fiber optic links between client workstations and the server would give the best possible performance, but it isn't terribly realistic. A reasonable compromise was to use two full-duplex 100 Base TX (100Mbit) Ethernet segments, with eight client systems on one, and seven on the other. Each network segment had a 12-port 3Com LinkBuilder 100 Base TX hub (a 3C250-TX/1). To maximize server throughput, both segments ran through a 100TX switch to minimize network collisions and provide the fastest (and fattest) possible data pipe from the clients to the server.

On your network you might only have a single 10Mbit NIC in your server, and in future articles we will be covering the impacts of changing your network configuration (upping the data transfer speed with faster cards, adding NICs, changing network layout, tweaking protocol settings on the server, etc.) on overall system performance. However, it was important for us to know that any changes in measured response times were not due to network timeouts and lost packets while the server chewed on user authentication and other operations. A machine this big has more than enough processing power to handle a vast number of users, but it will only work properly if the server is not wasting time waiting on the network-this is the only way to test true server throughput. We still experienced network problems at the server (see the sidebar, "LoadSim Revealed", and "Final Analysis" later in this article) which could have been due to the LoadSim tool itself, Exchange, or any number of factors-we were apparently unable to eliminate all network dependencies.

Ideally, you want to simulate only one user per physical client system, but that gets a trifle unwieldy when you are emulating 1500 users. This is another reason for using a 100Mbit instead of 10Mbit network: when a single system is running operations from 100 simulated users, the network traffic on that system is obviously far higher than if only one normal person is using it. We needed to provide enough bandwidth so that this was not a problem.

The Client
Client dependencies in LoadSim are actually pretty significant, so our testing strategy required the use of some fairly heavy duty systems. A fat client can simulate a lot of users without bottlenecking; a thin client couldn't do as many. Also, we didn't want to encounter the same bottlenecks at the client side that we were measuring at the server, so we needed a combination of system and user load that behaved as linearly as possible. In other words (and this is somewhat oversimplified), we couldn't simulate only one user per system, but to simulate 100 real-world users on the same box, the client system needs to be 100 times as powerful as a normal end user would have. This way, it can run more than 100 user operations at a time, and still be just as fast as if it were only running one (within a certain tolerance, of course). Remember, too, that most average users are not fully taxing their computers, so there is a lot of unused headroom. Your users may have 200MHz Pentiums on their desks, but they may only be using a tenth (or less) of their total throughput capabilities.

To accomplish this test, we used 15 Tricord clone servers (the PowerFrame DS1000, built on an Intel Extended Express motherboard) as the client/user simulation machines, each with a 60MHz Pentium, 32MB of parity RAM, an Adaptec 2940 SCSI controller, a 425MB (or 1GB) Fujitsu drive, a 3Com EtherLink III (3C597) EISA 100TX NIC, and integrated video. Each system was running NT 3.51 Server (with Service Pack 4).

After some capacity testing against the maximum server configuration (four CPUs, 1GB of RAM) with a single client, we settled on 100 simulated users per physical system. At this load, response times leveled off to an acceptable value on an unloaded server (about one second on average), and still represented a good target for actual users. At the same time, we decided that 100 users per system was a good break point for the simulation, without going beyond what is a reasonable representation of the real world.

At 100 users per physical client system, CPU utilization on the client hovered around 30%, with occasional spikes to 100%. This suggests that LoadSim is far more memory constrained than CPU constrained: response times didn't become unreasonable until higher user counts were measured, and even then the CPU was still not fully utilized, while memory usage had peaked.

Server Optimization Factors
We tested the Tricord in a number of different CPU and memory configurations as seen in Table 1A, to find how Exchange/NT scaled with resource allocation, and where Exchange breaks down and how it is optimized (at a set load of 1500 medium usage users).

Table 1A: Tricord Server and Exchange Configurations
# CPUs    Total System RAM    IS Buffers    Directory Buffers    Available RAM
1, 2, 4   128MB               8612          2289                 10MB
1, 2, 4   256MB               23796         6325                 90MB
1, 2, 4   512MB               58403         10000                200MB
1, 2, 4   1024MB              134967        10000                600MB

Exchange Server 4.0 seems to be more memory constrained than it is CPU constrained: it scales fine with additional CPU resources (since it is a multithreaded SMP-aware app), but memory utilization is not always optimal.

The Tricord box was optimized for network applications (which you set with the Control Panel/Network applet under the Services/Server tab) and for background applications (by turning down foreground optimization in the Control Panel/System applet, under the Performance tab) in order to maximize the server's throughput-at least, as far as NT was concerned.

Each time we changed the configuration of the server (varying the number of CPUs or the amount of memory with the /NUMPROC=XX or /MAXMEM=XX switches in the BOOT.INI file, rather than physically adding or removing hardware), we reoptimized the system with the Exchange Optimizer tool that ships with Exchange Server (see Table 1A). You tell the optimizer what your expected user load is, and it analyzes your hardware configuration to find the best combination of memory, disk, and CPU usage, and adjusts software settings to match. As we changed available memory, the optimizer changed how much memory Exchange used by increasing or reducing the number of Information Store buffers. It still left considerable memory unused, as indicated by the Perfmon memory counters (see the Available RAM column in Table 1A), even when there was much to spare. Exchange appears to be designed to run as a background application on a server that is doing other things at the same time, so it doesn't hog all of the available resources. On the plus side, this meant that Exchange was operating totally within available memory, so paging (swapping) activity was negligible. Changing the number of CPUs didn't affect any of the Exchange settings.
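For readers who want to reproduce this kind of software-only reconfiguration, the following Python sketch generates one BOOT.INI entry per CPU/memory combination we tested. It is an illustration, not the file we actually used: the ARC path and the menu labels are hypothetical, and the CPU switch is written the way NT spells it, /NUMPROC. /MAXMEM takes a value in megabytes.

```python
from itertools import product

CPU_COUNTS = (1, 2, 4)
MEMORY_MB = (128, 256, 512, 1024)

# Hypothetical ARC path; the real value depends on the machine's disk layout.
ARC_PATH = r"multi(0)disk(0)rdisk(0)partition(1)\WINNT"


def boot_entry(cpus: int, mem_mb: int) -> str:
    """Build one [operating systems] line that limits NT to the given resources."""
    label = f"NT Server 4.0 [{cpus} CPU, {mem_mb}MB]"  # hypothetical menu text
    return f'{ARC_PATH}="{label}" /NUMPROC={cpus} /MAXMEM={mem_mb}'


def build_matrix():
    """One entry for every CPU-count and memory-size combination."""
    return [boot_entry(c, m) for c, m in product(CPU_COUNTS, MEMORY_MB)]


if __name__ == "__main__":
    for line in build_matrix():
        print(line)
```

With three CPU counts and four memory sizes this yields twelve entries, so each test configuration becomes a boot-menu choice rather than a hardware change.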

Table 1B: Exchange Server Configuration
Item                                                        Location or value
Microsoft Exchange Server information store log files       Logfile drive
Microsoft Exchange Server Private information store file    Data drive
Microsoft Exchange Server Public information store file     Data drive
Microsoft Exchange Server Directory service database file   Data drive
Microsoft Exchange Server Directory service log files       Logfile drive
Microsoft Exchange Server Message Transfer Agent            System drive
Microsoft Exchange Server Internet Mail Connector Files     System drive
Minimum # of information store threads                      10
Maximum # of information store threads                      100
# of directory threads                                      50
Maximum # of cached categorizations                         200
Maximum # of cached restrictions                            200
# of private information store send threads                 2
# of public information store send threads                  2
# of information store gateway in threads                   2
# of information store gateway out threads                  2
# of information store users                                1000
# of XAPI MT threads                                        2
# of XAPI MT queue threads                                  2
# of dispatcher threads                                     2
# of transfer threads                                       2
# of kernel threads                                         3
# of database data buffers per object                       6
# of RTS threads                                            3

Disk configuration was a major issue in this test, because the Tricord server's disk subsystem is almost as fast as its memory. The four-bus fast and wide SCSI-2 RAID controller (fast and wide delivers a data transfer rate of about 20MB per second) with 17 drives offered outstanding performance (see the sidebar, "RAID Performance and NT"), and disk utilization never maxed out. Not only were the data sets smaller than the available space, but according to bus utilization data gathered by the monitoring console, there was plenty of disk controller bus bandwidth to spare. Disk writes far outweighed disk read activity, so the controller cache was configured accordingly. The cache module was assigned to all RAID volumes equally: the sequential log devices benefited from cache because all writes were immediately acknowledged upon receipt in the cache. The random I/O devices, such as the data volume, did not directly benefit from extra cache. Since the test was write intensive, potential read data was flushed from cache before it was of any use.

Metrics
There was a great deal of data we could have gathered during test runs-in fact, we did a couple with every Perfmon variable turned on as a control data set, but this seriously impacts system performance. We opted for only CPU and memory utilization data from Perfmon, and all other data (bus utilization, disk activity, etc.) was gathered passively by the Intelligent Management Subsystem (see the review, "Tricord: A Mainframe's Little Sibling," on page 55 of the magazine) monitoring console.

LoadSim records response times for all user transactions, and you run a statistics utility called lslog, which calculates a final score for the whole test run. We weren't collecting data on transactions per second (TPS), because this is only meaningful if you are looking for the absolute maximum scaleability of a server or total environment, which we weren't. You can, if you want, calculate these values from transactions per user per day. In our test, each medium-load user generated about 67 messages per day plus other activities, or around 80 transactions in total; 1500 users over a two-day simulated run, compressed into an actual four-hour period, works out to around 17 transactions per second. But this doesn't tell us anything, since we never varied the user load. When finding the breakpoint of a server, and trying to compare the performance of one server against another, knowing the maximum number of TPS the machines are capable of can be a useful metric. However, since we never completely maxed out the server (except at one CPU at 128MB of RAM), the TPS values for every run would be the same regardless of the number of CPUs or available memory, because the same number of users were simulated doing the same things during the same amount of time. The Windows NT Magazine Lab will be looking at these values in the future, since they are important for knowing how a system will scale under varying user loads.
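The back-of-the-envelope TPS figure quoted above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic; the function name is ours, and the inputs are the approximate values given in the text.

```python
def transactions_per_second(users, tx_per_user_per_day, simulated_days, wall_clock_hours):
    """Average transaction rate over the real (wall-clock) length of the run."""
    total_transactions = users * tx_per_user_per_day * simulated_days
    return total_transactions / (wall_clock_hours * 3600)


# 1500 users, about 80 transactions per user per simulated day,
# two simulated days compressed into a four-hour run.
tps = transactions_per_second(1500, 80, 2, 4)
print(f"{tps:.1f} transactions per second")  # prints "16.7 transactions per second"
```

Because every run simulated the same users doing the same work in the same four hours, this number comes out identical for every CPU and memory configuration, which is why it says nothing about scaling here.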

Load and Scaleability Results
Graph 1 shows our final response time data. Based on comparable reports by Microsoft and Compaq, who used the LoadSim tool for similar capacity testing, we believe that these values are realistic and that they reveal some interesting points about Exchange scaleability.

However, there are a few mitigating factors I should bring up before continuing. First, there were errors during all test runs (timeouts, etc.), but the error pattern was completely consistent across test runs: as memory went up, errors went down, no matter how many CPUs were in use. Second, the one data point for a one-CPU/512MB run would seem to be aberrant; however, a straight-line interpolation between the 1024MB and 256MB runs shows that it is within 10% (or 300 milliseconds) of the expected value. The error is just exaggerated on our graph. Third, because we turned the think time way down (by using a very short simulated day in order to make faster test runs), the simulated load is artificially high, and probably represents a user load closer to 6000 users than 1500. We used all of the default LoadSim settings for a Medium Usage user, except that we limited the test run to four hours, with a two-hour day. This meant that all of the simulated users' work was compressed into a two-hour period per day, rather than letting it take a full eight hours. The CPU and memory utilization for test runs with 8-hour days and a 10-hour run were more than 20% lower than those of a 2-hour day and 4-hour run, with no lingering message queue after the test completed. So, what is the ideal test density and think time? That depends on how long you have to run your tests and what you are trying to prove. We were looking for scaleability information about CPU and memory resources, so dumping as much load on the server as we could was the only way to get resource utilization high enough to believe the numbers we came up with; otherwise, we weren't testing the CPU/memory engine.
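Two of these caveats are simple arithmetic, sketched below in Python. The first function scales the simulated user count by the ratio of a normal working day to the compressed day. The second is ordinary straight-line interpolation between two measured points. The response times fed to it are hypothetical placeholders chosen only to show the calculation; they are not our measured data.

```python
def effective_users(simulated_users, normal_day_hours, compressed_day_hours):
    """Approximate real-world user count represented by a compressed workday."""
    return simulated_users * normal_day_hours / compressed_day_hours


def interpolate(x, x0, y0, x1, y1):
    """Straight-line estimate of y at x, given the points (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


print(effective_users(1500, 8, 2))  # prints 6000.0

# Hypothetical response times in milliseconds, for illustration only.
RT_256MB_MS = 3000.0
RT_1024MB_MS = 3600.0
MEASURED_512MB_MS = 3500.0

expected_ms = interpolate(512, 256, RT_256MB_MS, 1024, RT_1024MB_MS)
deviation = abs(MEASURED_512MB_MS - expected_ms) / expected_ms
print(f"expected {expected_ms:.0f} ms, deviation {deviation:.1%}")
```

Note that 512MB sits one third of the way from 256MB to 1024MB, not halfway, so the expected value lies closer to the 256MB result than a glance at evenly spaced graph labels might suggest.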

But, even with these few considerations, we can draw some conclusions about Exchange/NT as a messaging platform for your enterprise. We found that more than 128MB of system RAM isn't necessarily a good thing: response times went up with each step in memory on the two- and four-CPU runs, although response time did drop with additional memory on the one-CPU runs.

We can also conclude that throwing more CPUs at Exchange is a worthwhile expenditure, more so than memory, because response times dropped significantly with the extra processors. There is a point of diminishing returns (the jump in performance from two to four CPUs was not as great as the jump from one to two), but it depends on how much you are willing to pay for that little performance boost.

Looking at the big picture from the tests we ran, an optimal configuration for a single-function application server (in other words, your server is only running Exchange, or SQL Server, or another BackOffice application) is two CPUs with 256MB of RAM. This configuration offers the best combination of performance (almost twice that of one CPU) and price. However, remember that much of your server's capacity will be taken up by Exchange if you are running this many users, so if you want to add users or add an application, more CPUs will probably be necessary. More memory can also improve server capacity for additional applications.

Memory and CPU Utilization
Our test results bring up questions about how Exchange is optimized, and what is actually going on behind the scenes on your server. Does Exchange take advantage of all available memory? Are all of your CPU resources being used, or is there hidden bandwidth that you can take advantage of?

The LoadSim tool is not a complete answer when analyzing client/server performance. You also need to look at what your server is doing, rather than just at transaction processing times. NT's Perfmon is an ideal tool for looking into specific aspects of your network, client, and server performance. While our tests were running, we used Perfmon to record CPU and memory utilization information to find clues about why the server was performing the way it was (see Table 2, and Screens 1 and 2).

As you may know, NT does not scale linearly: 3.51 didn't, and neither does 4.0. Now, this isn't NT's fault. If you are running a uniprocessor system at 90% CPU utilization, adding another CPU doesn't double system performance; you'll have two CPUs running at 50-60%, not two CPUs running at 90% under the same load. Additional CPUs add capacity, and can improve performance, but it is not a linear relationship (you can't multiply system performance by the number of CPUs you add), and it is bounded, as you can see in Graph 1.

Table 2: Total CPU Utilization (averaged over all CPUs in use)
          Memory (MB)
# CPUs    128     256     512     1024
1         100%    100%    100%    100%
2         92%     72%     88%     80%
4         69%     66%     60%     63%
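One way to read Table 2 is to convert the per-CPU averages into the number of "busy CPU equivalents" in each configuration. The short Python sketch below does that with the table's own values; the dictionary and function names are ours. Treat the result as a restatement of the table, not as a throughput measurement.

```python
# Average utilization (%) from Table 2, keyed by CPU count, then memory in MB.
TABLE_2 = {
    1: {128: 100, 256: 100, 512: 100, 1024: 100},
    2: {128: 92, 256: 72, 512: 88, 1024: 80},
    4: {128: 69, 256: 66, 512: 60, 1024: 63},
}


def busy_cpus(cpus, memory_mb):
    """Total CPU consumed, expressed as a number of fully busy processors."""
    return cpus * TABLE_2[cpus][memory_mb] / 100


if __name__ == "__main__":
    for memory_mb in (128, 256, 512, 1024):
        row = ", ".join(
            f"{cpus} CPU: {busy_cpus(cpus, memory_mb):.2f}" for cpus in (1, 2, 4)
        )
        print(f"{memory_mb}MB -> {row}")
```

At 128MB, for example, the same 1500-user load keeps the equivalent of 1.00, 1.84, and 2.76 processors busy on one, two, and four CPUs. Each added processor does get used, but the per-CPU average falls, which is the bounded, less-than-linear behavior described above.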

Memory does not have a huge effect on CPU performance in Exchange, and only caused a minor variance in utilization levels. One possible reason for this variance (and for the increasing response times with additional memory), other than optimization characteristics of Exchange, is that as you throw more memory at NT, the system needs more CPU resources to manage it-although, SQL Server has not exhibited this in other tests we have run so far. Also, there seems to be a direct correlation between CPU utilization and the response times-as response time goes up, so does utilization, but CPU use behaves unpredictably relative to total system memory (Table 2 shows how the utilization values jump around at different memory levels, but always within a close range-processor interrupts also increase as memory decreases).

Memory usage is a different story. Exchange does not release memory once it has used it, probably because it has been assigned permanently (until a system restart) to the Exchange buffer pool (4KB buffers)-but, strangely, Exchange does not make profitable use of large amounts of extra memory when it's available (such as when we had 1GB of RAM on the system). I would expect that on a system with only 128MB of RAM, turning the number of assigned buffers up to the level they are on a 1GB system would offer very similar performance, but would leave very little headroom for adding users or other applications.

Disk Utilization and Performance
The Tricord PowerFrame has a very fast disk I/O subsystem, but it can't change the way NT and Exchange behave. You can minimize certain performance degrading effects by choosing the right disk configuration, such as an appropriate RAID level, number of drives, and so forth (see the sidebar, "RAID Performance and NT"), but you also have to know what is going on to deal with the problems and tune around them.

Once again, Perfmon (in combination with the Diskperf counters, which you can enable with the command DISKPERF -Y from a DOS command prompt) can be invaluable. The Tricord management software gathered this disk data for us without affecting system performance, because the software records data passively off the system bus and passes the data via a serial link to a secondary workstation.

The mix of disk I/O activity changed according to the amount of system memory installed (see Table 3), less so according to available CPUs. As memory increased, disk writes (vs. reads) changed from 60% (at 128MB of RAM) to 97% (at 1024MB of RAM). Looking at the activity as system memory is reduced, we find that the remaining message queue at the end of the test reaches unreasonable levels-this result means that if your Exchange server doesn't have enough RAM, you end up with many messages not being sent until after the current load decreases (this process can take up to an hour on a heavily loaded server!), which can affect people who rely on quick delivery. Also, if you don't have enough memory, a high-usage server will end up queueing more and more messages to disk, and rereading them into memory-instead of merely logging the activity to disk-when they come to the top of the stack again. So, without a fast disk subsystem, as this read activity goes up, system performance will drop, whereas with more memory, more of this activity is cached directly from memory. One mitigating factor to all of this is that disk activity is not really that heavy-only on the order of 200KB per second, not MB per second as you would expect on a heavily loaded database server. The Tricord system had plenty of disk bandwidth left over for heavier I/O activity, such as public folders and groupware applications inside Exchange.

Paging activity on all tests was low and so did not significantly affect system performance, but it minimally increased as memory decreased.

Table 3: Resource/Performance Relationships
Factor                      Increases                  Decreases
More total system memory    Response time              Processor interrupts
                            Exchange buffers           Disk I/O read activity
                            Disk I/O write activity    EOT* message queue length
Additional CPUs             (none)                     Response time
                                                       CPU utilization
*EOT = End of test

Conclusions
We've opened up a big can of worms here, and I think I may have inadvertently stepped on a couple of them. As far as Exchange Server is concerned, our test results raise the questions: What is the Exchange marketplace? Will it displace existing enterprise UNIX and mainframe messaging systems, or is it aimed merely at migration from MS Mail or other smaller-scale applications? There are corporations that are currently engaged in rolling out 250,000 Exchange clients, which would seem to indicate that Exchange is indeed an enterprise-scale application. However, the scaleability issues we uncovered, such as memory usage, ultimate user response time, and user authentication (logon) considerations, show that work remains to be done on the server-side software components. NT and Exchange do scale, as do the hardware platforms they run on, but it comes at a price.

The moral of the story: Know your users, know your workload, know your hardware. Analyze your system performance with all available tools, such as LoadSim, Perfmon, Network Monitor, and any others you can find.

The issues of client/server computing are fabulously complex and frequently expensive to deal with, and they aren't going to get any simpler. User-side issues may get easier, but the administration, the capacity planning, the performance testing, and other MIS issues are only going to become more difficult to grasp as time goes on. As technologies such as clustering, faster server hardware, larger computers, new operating systems, and stuff we haven't even thought of yet come into play, testing will become more complex.

Bear with us. It's gonna be a wild ride.

For more information, read:
Performance: Concurrent Users Per Server (www.microsoft.com/exchange/evalgd/upswpfnl.doc)

LoadSim: Tool usage documentation (on MS Exchange Server CD)

Compaq: Performance of MS Exchange Server 4.0 on Compaq ProLiant Servers (www.compaq.com/support/techpubs/whitepapers/444a0696.html)

Microsoft: MS Exchange Deployment Conference Notes 1996 (contact Microsoft at 206-882-8080 or on the Web, www.microsoft.com)