The right Performance tool counters help you get the job done
The Performance tool that ships with Windows 2000 is a good means for monitoring Microsoft IIS's traffic, stress, and workload. The Performance tool (known as the Performance Monitor in Windows NT) is an open interface that Microsoft and third-party services and products use to report statistical and trend information about their status. Many vendors have added counters to the Performance tool that don't necessarily relate well to one another. Consequently, the information that Performance tool counters collect about IIS can seem inconsistent. To make matters worse, precise definitions for the counters aren't well documented. I attempt to clarify which performance counters are important for monitoring Web services and what those counters really tell you. The information I provide here is based on the Win2K Performance tool; the NT version differs slightly.
The Performance Tool
As I've already mentioned, the Performance tool is a Windows application that provides a GUI for tracking application performance data and statistical information. Each service or application object can expose one or more counters for one or more instances of the object. For example, you can track the Current Connections counter for the www.myserver.com instance of the Web Service object.
The Performance tool uses a standard API call to a service or application to get the counter information. Because the Performance tool requests information only for counters that the user has selected, it doesn't place much load on the application that's reporting information. However, if the application doesn't reply in a timely manner, the Performance tool shows a timeout error. You might be able to decrease such errors by increasing the interval between samples.
To open the Performance tool in Win2K, select Start, Settings, Control Panel, Administrative Tools, Performance. The Performance tool provides three views of the information it gathers: Chart (line graph), Histogram (bar graph), and Report. Icons on the toolbar let you move from one view to another. Most of the time, I use the Chart view, which Figure 1 shows.
By default, the Performance tool has no counters running. To add counters, click the plus-sign icon on the toolbar to open the Add Counters dialog box, which Figure 2 shows. After you add one or more counters, you can select a counter from the list at the bottom of the screen in the Chart or Histogram views to see the Last (i.e., most recent), Minimum, Maximum, and Average samples and the Duration (i.e., length of the sampling period). By default, the Performance tool samples every second; however, you can set the monitor's properties so that it samples less frequently or you can click the camera icon on the toolbar to force an immediate sample. To highlight a counter's line or bar in the Chart or Histogram view, click the lightbulb icon on the toolbar. Now that you're familiar with some Performance tool basics, all you have to know is which 10 or so counters to add to monitor IIS.
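To make the Last, Minimum, Maximum, Average, and Duration fields concrete, here is a minimal Python sketch that derives them from a list of samples. The function name summarize, the sample values, and the way Duration is computed (sample count times interval) are my own assumptions for illustration; the Performance tool's internal calculation might differ.

```python
from statistics import mean


def summarize(samples, interval_seconds=1.0):
    """Compute the five summary fields for one counter's samples."""
    if not samples:
        raise ValueError("need at least one sample")
    return {
        "last": samples[-1],
        "minimum": min(samples),
        "maximum": max(samples),
        "average": mean(samples),
        # Assumption: duration is simply sample count times the interval.
        "duration_seconds": len(samples) * interval_seconds,
    }


# Hypothetical Current Connections readings taken one second apart.
stats = summarize([12, 48, 30, 22])
print(stats)
```

Note how far the Last value can sit from the Average even in a four-sample series.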
If you have a Web server with multiple sites that call third-party and custom COM objects, you need to keep track of how much of your system resources those Web sites and objects are using so that you'll know when you need a new box or whether you have to make your current box work harder. Watching memory and CPU usage is a great way to tell whether your machine's essential resources are getting tapped out.
If you think that one of your custom objects (a third-party COM object, an Internet Server API (ISAPI) filter, or an ISAPI extension) is leaking memory on your Web server, add the Memory object's Available MBytes counter. Then, click the Properties icon on the toolbar to open the System Monitor Properties dialog box. On the General tab, select the Graph view and set the monitor to update every 60 seconds. Monitor the graph for a 24-hour period when the production server is handling typical traffic or in a test environment that's running a stress tool such as Microsoft Application Center Test, which comes with Visual Studio .NET Enterprise Developer. If the counter decreases steadily, something on the machine is probably leaking memory. Make sure that you watch the graph over a long period at a low refresh rate because memory use fluctuates a lot.
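If you log the Available MBytes samples, you can also check for a steady decrease numerically rather than by eye. The following Python sketch fits a least-squares line to the samples and flags a sustained downward slope. The function names, the 1MB-per-hour cutoff, and the sample data are assumptions I chose for illustration, not values from the Performance tool.

```python
def slope_per_sample(values):
    """Least-squares slope of values against their sample index."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def looks_like_leak(available_mbytes, samples_per_hour=60, min_drop_mb_per_hour=1.0):
    """True if available memory trends down faster than the cutoff (an assumed value)."""
    drop_per_hour = slope_per_sample(available_mbytes) * samples_per_hour
    return drop_per_hour <= -min_drop_mb_per_hour


# Hypothetical data: one sample per minute for two hours.
steady_decline = [500 - 0.5 * i for i in range(120)]
noisy_but_flat = [500 + (5 if i % 2 else -5) for i in range(120)]

print(looks_like_leak(steady_decline))
print(looks_like_leak(noisy_but_flat))
```

Fitting a trend line is one way to ignore the short-term fluctuation; a simple comparison of the first and last samples would be more easily fooled by a momentary spike.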
Note that IIS performance slows down when available memory drops below 60 percent of the RAM on the machine, not including the system swap file. Thus, you can encounter performance problems well before you run out of RAM.
To track the system's CPU usage, use the Processor object's % Processor Time counter and _Total instance. With the _Total instance, the % Processor Time counter tells you what percentage of CPU time all the work on the machine, including the Web server's applications and services, is using. If you don't have threading problems (i.e., too many threads or thread locking), you can use this counter to judge how hard your machine is working.
Look at the Average field of this counter because the Last field fluctuates a lot. If the average is higher than 60 percent, some latency might be occurring. (If your CPU usage is greater than 60 percent, it probably spikes to 100 percent. When the processor runs at 100 percent, some operations are delayed until the usage drops below 100 percent.) If you have a high CPU load, tune or upgrade your machine until it runs below 60 percent average CPU usage.
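As a rough illustration of why the Average field matters more than the Last field, this Python sketch summarizes a series of % Processor Time samples. The function name cpu_load_report, the field names, and the sample values are hypothetical; only the 60 percent guideline comes from the text above.

```python
from statistics import mean, pstdev


def cpu_load_report(samples, average_limit=60.0):
    """Summarize % Processor Time samples against the 60 percent guideline."""
    average = mean(samples)
    return {
        "last": samples[-1],
        "average": average,
        # Population standard deviation as a simple measure of how uneven the load is.
        "spread": pstdev(samples),
        # Share of samples in which the processor was fully busy.
        "saturated_fraction": sum(1 for s in samples if s >= 100) / len(samples),
        "above_limit": average > average_limit,
    }


# Hypothetical samples: the last reading looks calm, but the average does not.
report = cpu_load_report([70, 100, 85, 100, 95, 30])
print(report)
```

In this made-up series the Last value is 30 percent, yet the average is 80 percent and a third of the samples are pegged at 100 percent.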
If your Web server gets a lot of requests (i.e., more than 50 per second) and the processor peaks a lot, the server probably has thread-lock or bottleneck problems. Thread locking can occur when multiple resources try to access a single-threaded resource—for example, when many users access an Active Server Pages (ASP) page that calls a single-threaded Visual Basic (VB) COM object. The multiple ASP threads lock up the COM object. Other bottlenecks occur when users or processes try to access a database or other shared resource that doesn't respond quickly.
If your Web server is handling only as many ASP or ASP.NET requests per second as the machine has processors and your CPU is working at less than 10 percent capacity, you probably have a thread-lock problem. The COM object can work with only one thread at a time, so the number of processors limits the number of requests the Web server can handle per second.
To get a good picture of your CPU load and Web server response time, you should monitor your system while it's receiving at least 50 requests per second. If your Web server can't reach 50 requests per second no matter how hard you apply stress, you probably have a thread-lock problem.
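The two thread-lock rules of thumb above can be written down as a small check. This is a minimal Python sketch under my own assumptions: the function name, the parameter names, and the decision to ignore a server with no traffic are illustrative, and the result is a hint to investigate, not a diagnosis.

```python
def thread_lock_hints(avg_requests_per_sec, avg_cpu_percent, processor_count,
                      max_requests_per_sec_under_stress=None):
    """Return the rules of thumb that point to a possible thread-lock problem."""
    hints = []
    # Rule 1: throughput is capped at about one request per processor
    # while the CPU sits nearly idle (below 10 percent).
    if 0 < avg_requests_per_sec <= processor_count and avg_cpu_percent < 10:
        hints.append("throughput capped at processor count with idle CPU")
    # Rule 2: the server can't reach 50 requests per second under stress.
    if (max_requests_per_sec_under_stress is not None
            and max_requests_per_sec_under_stress < 50):
        hints.append("cannot reach 50 requests per second under stress")
    return hints


# Hypothetical two-processor server that stalls at 2 requests per second.
suspect = thread_lock_hints(2, 6, processor_count=2,
                            max_requests_per_sec_under_stress=2)
healthy = thread_lock_hints(120, 45, processor_count=2,
                            max_requests_per_sec_under_stress=300)
print(suspect)
print(healthy)
```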
In summary, the CPU load should average below 60 percent, and the more requests your Web server receives, the more even the CPU load should appear. A server under high load that can't handle lots of requests or has peaks in CPU usage probably has thread-lock or bottleneck problems. Threading problems typically require coding changes to custom COM objects or ASP scripts, or tuning of Microsoft Transaction Server (MTS) deployments and the IIS server.
Another good way to tell whether you have a threading problem is to look at the Thread object's Context Switches/sec counter and _Total/_Total instance. The Context Switches/sec counter tells you how many times per second the processor is switching from one thread to another (i.e., from one context, or thread state, to another) to handle all the threads on the box. Switching from one thread state to another increases processor overhead, so you don't want this counter value to be too high. If the value averages more than 10,000, your processor is doing a lot of work managing threads—and not enough work running them.
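If you export the Context Switches/sec samples, the 10,000 guideline is easy to apply to the average. The Python sketch below assumes a plain list of samples; the function name and the sample values are hypothetical.

```python
def context_switch_check(samples_per_sec, limit=10_000):
    """Return the average Context Switches/sec and whether it exceeds the limit."""
    average = sum(samples_per_sec) / len(samples_per_sec)
    return average, average > limit


# Hypothetical samples from a busy server.
average, too_high = context_switch_check([8_000, 14_000, 12_500, 13_500])
print(average, too_high)
```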
Frequent context switching can be the result of large ASP thread pools, mixing COM objects with different threading models, running COM objects that use the apartment-threaded model in MTS, running COM objects in separate processes, or large IIS I/O thread pools. Errant applications that spin up too many threads can also cause excessive context switching.
Web Server Traffic
The amount of traffic your Web server handles is probably the best indicator of how hard that server is working. Although you can define traffic in different ways, I define it as the number of requests per second the Web server handles—not the number of connections to the Web server. Because this definition is controversial, I want to take a minute to explain why the number of requests per second is a better indicator of server load.
A single connection can sit idle or carry many requests, so a connection count says little about how much work the server is doing. Thus, I recommend watching the Web Service object's Get Requests/sec counter. It's a better traffic indicator than the object's Current Connections counter. Get Requests/sec tells you how many GET requests per second the server is handling. If your site is POST heavy, you can also look at the Post Requests/sec counter. The Last field is interesting, but to gauge your traffic, you should look at the Average field. As a reference, most Web servers aren't really working until they average more than 50 requests per second. If your Web server can't handle this number of requests, you need to look for threading or program bottlenecks. Well-tuned servers can process more than 250 requests per second on average. Continuous traffic at 50 requests per second amounts to 4.32 million requests, or hits, per day.
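The hits-per-day figure is simple arithmetic: the average request rate multiplied by the 86,400 seconds in a day. Here is that calculation as a short Python sketch; the function name is mine.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def requests_per_day(avg_requests_per_sec):
    """Convert a sustained average request rate into requests per day."""
    return avg_requests_per_sec * SECONDS_PER_DAY


print(requests_per_day(50))
print(requests_per_day(250))
```

The same conversion puts a well-tuned server that sustains 250 requests per second at 21.6 million requests per day.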
You can also monitor the number of bytes the Web server sends and receives. A GET request is typically 100 bytes to 500 bytes; a POST request can be much bigger. Use the Web Service object's Bytes Received/sec and Bytes Sent/sec counters and the Average field. Keep in mind that a 10Base-T LAN's maximum throughput is about 9Mbps, or 1.125MBps; don't expect your Web server to exceed your network limits.
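To compare the byte counters with the network limit, convert the link speed from megabits to bytes per second (divide by 8) and then divide the observed traffic by that figure. The Python sketch below assumes 1Mbps means 1,000,000 bits per second and that sent and received traffic share the same link capacity; the function names and sample values are hypothetical.

```python
def link_capacity_bytes_per_sec(link_mbps):
    """Convert megabits per second to bytes per second (assumes 1Mbps = 1,000,000 bps)."""
    return link_mbps * 1_000_000 / 8


def link_utilization(bytes_sent_per_sec, bytes_received_per_sec, link_mbps=9.0):
    """Fraction of the usable link that Web traffic is consuming."""
    total = bytes_sent_per_sec + bytes_received_per_sec
    return total / link_capacity_bytes_per_sec(link_mbps)


print(link_capacity_bytes_per_sec(9.0))    # 9Mbps expressed in bytes per second
print(link_utilization(540_000, 22_500))   # hypothetical average counter values
```

A utilization that stays near 1.0 suggests the network, not the Web server, is the limit.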
Another useful Web Service object counter is Service Uptime, which you can use with a Web site instance that you specify to report the number of seconds that a Web site has been up. You can use the Service Uptime counter to check whether IISreset (which ships with IIS 5.1 and IIS 5.0) or some watchdog utility restarted the Web site in the middle of the night.
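Because Service Uptime only grows while a site stays up, a sample that is lower than the one before it means the site restarted in between. This Python sketch applies that idea to a logged series; the function name and the sample values are hypothetical.

```python
def find_restarts(uptime_samples):
    """Return the indexes of samples where the uptime went backward."""
    return [
        i
        for i in range(1, len(uptime_samples))
        if uptime_samples[i] < uptime_samples[i - 1]
    ]


# Hypothetical Service Uptime readings (seconds), one per minute.
restarts = find_restarts([100, 160, 220, 15, 75])
print(restarts)
```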
IIS has a RAM cache that stores the most recently used static files. (You might not even know this cache exists because it doesn't require any administration.) New in Win2K are performance counters for monitoring this static-file cache. You can use registry settings to store more files, make the cache bigger, or lengthen files' Time to Live (TTL) value.
If you think you might want to change one or more of the registry settings, monitor the cache for a while first to determine how it works. You can use the Internet Information Services Global object's Total Files Cached counter to find out how many files are in the cache. Although you can't force files into the cache, you can tune your Web server to allow more files in the cache. IIS serves cached files faster, so the more files in the cache, the better.
The Internet Information Services Global object's Maximum File Cache Memory Usage counter tells you how much RAM is devoted to the file cache. On machines that have a lot of RAM, assign more RAM to the cache so that it can handle more files.
The Performance tool is a great way to get a handle on what your Web services are doing. The Web Service, Active Server Pages, and Internet Information Services Global objects have a lot of great counters to track your Web server. As you run these counters on your servers and gain experience with them, you'll be able to tell at a glance how well your systems are performing.