Find out what's really behind server crashes

Many people who run small and large IIS Web farms believe that IIS is an unstable platform. They're annoyed because they need to restart their IIS server every hour, every 4 hours, or every 24 hours, so they naturally blame the software. In reality, however, IIS is an extremely stable platform that doesn't leak memory, crash unexpectedly, or require frequent reboots to remain stable. The real culprits of IIS downtime are probably your custom Common Gateway Interface (CGI) applications, Internet Server API (ISAPI) extensions, ISAPI filters, Active Server Pages (.asp) files, and COM objects, which can cause IIS to crash if they run inside the same process space. To stabilize your IIS server, you need to look at the IIS code paths and learn how the code in these file types affects OS memory and threading.

Code Paths
To understand the causes of stability problems, you need to understand the different code paths that requests can take. Essentially, IIS handles five file types: static files, CGI applications, ISAPI extensions, ISAPI filters, and .asp files.

Static files. Static files are files (e.g., plain .html files, images) that don't need to be executed. They are the easiest of all the files for the Web server to deliver because the server needs only to read the file from disk and use the HTTP protocol to stream the file. If you have only static files on your Web server and a lot of traffic, the server could run for months and perhaps years without experiencing an IIS-related failure. However, you're probably using your server for more than just static files, so you can only imagine such a perfect uptime scenario.

CGI applications. Developers typically write CGI applications in C or C++ and deploy them on the server as executable files. Of the five file types, CGI applications are the least stable because they're often poorly programmed. Developers who use CGI applications frequently don't understand IIS and overlook the advantages of ISAPI applications and built-in scripting languages.

ISAPI extensions. ISAPI extensions are .dll files that expose APIs that IIS calls. You can map these DLLs to a file extension or call them directly. An example of an ISAPI extension is Macromedia's ColdFusion. ISAPI extensions are only as stable as their programmers design them to be. Before you deploy ISAPI extensions, you need to perform extensive testing, including stress testing, to ensure their stability. ISAPI extensions running in process will crash your IIS server if they overrun the stack or their buffers, divide by zero, or raise other unhandled exceptions. Because of the amount of testing involved, programming a 100-percent stable ISAPI extension is extremely difficult.

ISAPI filters. ISAPI filters, which are different from ISAPI extensions, are DLLs that examine all requests entering the Web server. For example, ISAPI filters in IIS handle Secure Sockets Layer (SSL) encryption and compression. However, third-party ISAPI filters have the power to modify any of the IIS code paths and cause crashes.

.Asp files. Although you might think .asp is just another ISAPI extension, .asp files use a different code path in IIS. The asp.dll and the code in IIS are quite stable. An .asp file with simple ASP code or no ASP code will run without failure for months. However, bad code or—more likely—poorly written COM objects can cause .asp files to crash your server. If you aren't running any CGI applications, ISAPI extensions, or ISAPI filters—but you are running ASP with COM objects—then COM objects are most likely crashing your server.

Divide and Conquer
To track down a stability problem for a given file type, you must examine and test the Web server to categorize the problem into one of six areas:

  • Thread locking—the server can't handle many requests simultaneously.
  • High thread count—the CPU reaches 100 percent usage, but the server isn't doing any work.
  • Memory leak—the Web server is slowly leaking memory. When the available memory diminishes to a certain point, the machine crashes.
  • Runaway thread—the Web server CPU reaches 100 percent usage and never returns to normal.
  • Bad request—a request (e.g., URL, query string, host, language, browser combination) is causing the server to crash.
  • Miscellaneous—other types of stability problems that exist outside the scope of this article.

To determine the problem, you can use Performance Monitor to view the %Processor Time (under Processor), Get Requests/sec (under Web Service), Context Switches/sec (under Thread), and Available Bytes (under Memory) counters. In the event of a server crash, keep an eye out for the following indicators, which can point you toward the type of problem your IIS server is experiencing.

  • If your %Processor Time counter is at 100 percent, you have a runaway thread or high thread count. If the value is low (e.g., less than 40 percent) and your Get Requests/sec counter also shows a low value (e.g., less than 10), you have a thread-locking problem.
  • If your Context Switches/sec counter value averages less than 10,000 and the %Processor Time counter is at 100 percent, you have a runaway thread. If the Context Switches/sec counter value averages more than 10,000, you have a high thread count.
  • If your Available Bytes counter shows a value that's slowly diminishing toward zero, you have a memory leak.
  • To determine whether you have a bad-request problem, you need to first determine that you have none of the above-mentioned problems.
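
The rules above can be condensed into a small decision function. The following Python sketch is only an illustration: the function name, the parameter names, and the idea of summarizing Available Bytes as a single trend value are assumptions, and the 99 percent cutoff is an assumed stand-in for "at 100 percent." The other thresholds are the example values given above.

```python
def classify_symptoms(cpu_percent, get_requests_per_sec,
                      avg_context_switches_per_sec,
                      available_bytes_trend):
    """Map averaged Performance Monitor readings to a likely problem type.

    available_bytes_trend is a hypothetical summary value: negative when
    Available Bytes keeps shrinking over the observation window.
    """
    if available_bytes_trend < 0:
        return "memory leak"
    if cpu_percent >= 99:  # treated as "at 100 percent" (assumed cutoff)
        if avg_context_switches_per_sec > 10_000:
            return "high thread count"
        return "runaway thread"
    if cpu_percent < 40 and get_requests_per_sec < 10:
        return "thread locking"
    # None of the counter-based rules matched, so look for a bad request.
    return "possible bad request"


if __name__ == "__main__":
    print(classify_symptoms(100, 50, 20_000, 0))
    print(classify_symptoms(20, 3, 2_000, 0))
```

Checking the memory trend first is a design choice made for this sketch; a real server can show more than one symptom at a time, so treat the result as a starting point for investigation.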

Thread Locking
Your machine probably has a thread-locking problem if the machine is handling only a few requests per second (e.g., one or two for every processor), the machine's CPU isn't working hard, and the pages are returning slowly. The primary reason for a thread-locking problem is that ASP has called a single-threaded COM object. Developers can choose to make COM objects multithreaded or single-threaded. Only multithreaded objects should be called from ASP because single-threaded objects keep threads and users waiting and can tie up your machine. This problem is common with Visual Basic (VB) COM objects, which tend to be single-threaded. VB components can also tie up your machine in other ways, for example, when they aren't compiled with the Unattended Execution and Retained In Memory options. Microsoft offers the vbchkw2k.exe tool, which scans the DLLHost processes for incorrectly compiled VB components.

ADO ships single-threaded (its registry entries specify the Apartment threading model) to protect the single-threaded Microsoft Access database used by the sample site that ships with IIS. If your Web application is using ADO against Microsoft SQL Server or Oracle, you need to make ADO multithreaded. To do so, double-click the adofre15.reg file, which is part of the default ADO installation and resides in the same directory as the ADO files (e.g., \Program Files\Common Files\System\ADO on your server). The adofre15.reg file is a registry file that rewrites the threading properties of the ADO objects in the registry.
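
To see why a component that admits only one caller at a time produces the thread-locking symptoms (low CPU, few requests per second, slow pages), consider the following Python sketch. It is an analogy, not COM code: a lock stands in for the single-threaded object, a sleep stands in for the work done inside it, and the constants and function names are assumptions chosen for illustration.

```python
import threading
import time

WORK_SECONDS = 0.05   # assumed time one call spends inside the component
REQUESTS = 8          # assumed number of simultaneous requests


def serve_all(component_lock=None):
    """Run REQUESTS simulated requests at once and return elapsed seconds.

    When component_lock is given, every request must hold it while it
    works, which mimics a component that only one thread can enter.
    """
    def handle_request():
        if component_lock is None:
            time.sleep(WORK_SECONDS)          # calls overlap freely
        else:
            with component_lock:              # callers queue up here
                time.sleep(WORK_SECONDS)

    workers = [threading.Thread(target=handle_request)
               for _ in range(REQUESTS)]
    start = time.perf_counter()
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"multithreaded component:   {serve_all():.2f} s")
    print(f"single-threaded component: {serve_all(threading.Lock()):.2f} s")
```

In the locked case the requests finish one after another, so total time grows with the number of waiting requests even though the processor is nearly idle the whole time.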

High Thread Counts
Generally, a high thread count means that too many threads are running on your machine. Your processor is spending too much time managing threads and doesn't have time to serve pages. One cause of high thread counts is setting PoolThreadLimit, ASP Processor Threads, or MaxPoolThreads too high for your hardware/software configuration in an attempt to increase performance. A high thread count can also occur when administrators attempt to combat threading problems by deploying a single-threaded COM object within Microsoft Transaction Server (MTS), which creates a separate thread for every instance of the object.

By reducing the number of running threads on the machine, you can keep the Context Switches/sec counter below an average value of 10,000. The counter can go above 10,000, but the average needs to stay below 10,000.
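
The distinction between individual peaks and the average can be checked with a few lines of Python. This is a minimal sketch: the function name and the sample values are hypothetical, and only the 10,000 figure comes from the text above.

```python
from statistics import mean

CONTEXT_SWITCH_LIMIT = 10_000  # average per second, the figure used above


def average_within_limit(samples, limit=CONTEXT_SWITCH_LIMIT):
    """Return (average, ok) for a list of Context Switches/sec samples."""
    average = mean(samples)
    return average, average < limit


# Hypothetical samples: one spike above the limit, average still below it.
samples = [6_000, 7_500, 14_000, 8_000, 6_500]
average, ok = average_within_limit(samples)
print(f"average={average:.0f} ok={ok}")
```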

Leaks and Runaways
To fix problems involving memory leaks and runaway threads, you need to determine which Web site is causing problems. Doing so is relatively easy in IIS 5.0 because you can run any Web site inside or outside the inetinfo.exe process. To identify a problematic Web site, set all your Web sites to run outside the process: In IIS Manager, right-click each Web site, choose Properties, and select the Home Directory tab, which Figure 1 shows. In the Application Protection drop-down list, select High (Isolated). Doing so isolates your Web sites into separate processes. Then let each Web site run until Performance Monitor signals a memory leak or 100 percent CPU utilization.

To determine which process is leaking memory or overusing the CPU, open Task Manager on the Web server and go to the Processes tab, which Figure 2 shows. Sort the processes by Image Name and find all instances of DLLHOST.EXE. Determine which instance is using all the memory or the highest percentage of CPU time and record the process's process identifier (PID). Now that you know which process is experiencing problems, you need to figure out which Web site this process represents. Open Component Services manager from the Administrative Tools menu, then navigate the folder tree to Computers, My Computer, COM+ Applications. Right-click the node and choose Details to see all the applications' PIDs. Find your Web site in the list.
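
If you copy the Task Manager readings into a list, picking out the offending DLLHOST.EXE instance is a simple maximum search. The Python sketch below assumes a hypothetical record layout and made-up PIDs and values; it doesn't query the OS.

```python
from collections import namedtuple

# Hypothetical record for one row of the Task Manager Processes tab.
ProcessRow = namedtuple("ProcessRow", "image_name pid cpu_percent mem_kb")


def worst_dllhost(rows, by="mem_kb"):
    """Return the DLLHOST.EXE row with the highest value in one column."""
    hosts = [r for r in rows if r.image_name.upper() == "DLLHOST.EXE"]
    if not hosts:
        return None
    return max(hosts, key=lambda r: getattr(r, by))


# Made-up readings for illustration only.
rows = [
    ProcessRow("inetinfo.exe", 812, 2, 14_200),
    ProcessRow("DLLHOST.EXE", 1140, 1, 9_800),
    ProcessRow("DLLHOST.EXE", 1388, 3, 212_400),
    ProcessRow("DLLHOST.EXE", 1502, 97, 11_300),
]
print(worst_dllhost(rows).pid)                     # memory-leak candidate
print(worst_dllhost(rows, by="cpu_percent").pid)   # runaway-thread candidate
```

The PID you get back is the value to look up in the COM+ Applications list.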

Now that you know the Web site, you can track down the component, object, or executable that's causing the problem. The cause of memory leaks and runaway-thread problems will be a COM object, ISAPI extension, or ISAPI filter. Your first step is to remove all third-party ISAPI filters and run your site for a fixed duration. If the site remains stable, add the ISAPI filters back one at a time until you discover the problematic filter. If none of the ISAPI filters are problematic, make a list of all third-party COM objects and the pages from which they're called. Use a stress-testing tool to call those pages repeatedly and determine whether the problem occurs more quickly with the tool than under typical Web traffic. If you have two COM objects on the same page, you can separate them into two testing pages and use the stress-testing method to identify the problematic page. A good free stress-testing utility is the Microsoft Web Application Stress (WAS) tool, which you can download from http://webtool.rte.microsoft.com. After you identify the problematic page, you know which COM object is causing trouble.
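
The add-back procedure for ISAPI filters can be written down as a short loop. In the Python sketch below, is_stable is a hypothetical callback that you would supply: it stands for "run the site with these filters enabled for a fixed duration and report whether it stayed up." The filter names in the usage example are invented.

```python
def find_problem_filter(filters, is_stable):
    """Re-enable filters one at a time; return the first that breaks the site.

    is_stable is a caller-supplied (hypothetical) check that runs the site
    with the given list of enabled filters for a fixed duration and
    returns True if the site stayed up.
    """
    enabled = []
    if not is_stable(enabled):
        raise RuntimeError("site is unstable even with no filters enabled")
    for candidate in filters:
        enabled.append(candidate)
        if not is_stable(enabled):
            return candidate
    return None  # no filter reproduced the problem; test COM objects next


if __name__ == "__main__":
    # Invented filter names; the lambda fakes a site that "auth.dll" breaks.
    culprit = find_problem_filter(
        ["compress.dll", "auth.dll", "logger.dll"],
        lambda enabled: "auth.dll" not in enabled,
    )
    print(culprit)
```

The same loop works for COM objects if you replace the filter list with a list of test pages and let the callback run a stress test against each page.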

Bad Requests
If you don't have a thread-locking, high-thread-count, memory-leak, or runaway-thread problem, you probably have a bad incoming request. When your server crashes, take a look at your IIS log files to determine which request caused the problem. For example, did the requested page call a COM object and were the parameters in the request's query string invalid, too long, not present, or outside the design scope? Poorly written COM objects can cause your Web server to crash, and a common bad habit of developers is failing to evaluate the input to the COM methods or properties before processing that input.
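
A short script can help you sift the log for suspect requests. The Python sketch below assumes the log uses the W3C extended format, in which a #Fields: directive names the space-separated columns; which fields appear depends on how logging is configured on your server. The function names, the 256-character cutoff, and the sample lines are all assumptions made for illustration.

```python
def parse_w3c_log(lines):
    """Yield one dict per request from W3C extended log lines."""
    fields = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#") and fields:
            yield dict(zip(fields, line.split()))


def long_query_requests(lines, max_query_length=256):
    """Return requests whose query string exceeds an illustrative cutoff."""
    return [entry for entry in parse_w3c_log(lines)
            if len(entry.get("cs-uri-query", "-")) > max_query_length]


# Made-up log excerpt; real logs usually carry more fields.
sample_log = [
    "#Software: Microsoft Internet Information Services 5.0",
    "#Fields: time cs-method cs-uri-stem cs-uri-query sc-status",
    "10:15:01 GET /default.asp - 200",
    "10:15:02 GET /order.asp id=42 200",
    "10:15:03 GET /order.asp id=" + "9" * 400 + " 500",
]
for entry in long_query_requests(sample_log):
    print(entry["time"], entry["cs-uri-stem"], entry["sc-status"])
```

Query-string length is only one signal; you can apply the same pattern to filter on status codes or on the requests logged just before the crash.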

Another tool you can use to find bad incoming requests is IISTracer (available at http://iis-asp-script-real-time-monitor-tracer.pstruh.cz/help/iistrace/iis-monitor.asp), a simple ISAPI filter that displays the requests that the IIS server is handling. IISTracer uses a Web interface to display statistics about the incoming requests. Determine which request is bad, then use code, an ISAPI filter, or a service pack to prevent requests of the same type.
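
When you choose to block bad requests in code, the idea is to check every parameter against its design scope before the value reaches a component. The Python sketch below shows the pattern for a single hypothetical id parameter; the parameter name, the limit of nine digits, and the function name are assumptions, and in an ASP application you would apply the same checks in your page script.

```python
import re
from urllib.parse import parse_qs

ID_PATTERN = re.compile(r"[0-9]{1,9}")  # assumed design scope: 1 to 9 digits


def validated_order_id(query_string):
    """Return the id parameter as an int, or raise ValueError if unusable."""
    params = parse_qs(query_string, keep_blank_values=True)
    values = params.get("id", [])
    if len(values) != 1:
        raise ValueError("expected exactly one id parameter")
    if not ID_PATTERN.fullmatch(values[0]):
        raise ValueError("id is empty, too long, or not numeric")
    return int(values[0])


if __name__ == "__main__":
    print(validated_order_id("id=42"))
    for bad in ["", "id=", "id=abc", "id=" + "9" * 400]:
        try:
            validated_order_id(bad)
        except ValueError as error:
            print(f"rejected: {error}")
```

This covers the four cases named earlier: a parameter that is invalid, too long, not present, or outside the design scope.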

Still Have Problems?
If you still have stability problems after removing all third-party COM objects, ISAPI filters, and ISAPI extensions and are just calling files with ASP code, search the Microsoft Knowledge Base for articles that discuss the problems you're experiencing. Also, be sure to upgrade to the most recent service pack and examine other software (e.g., SQL Server, load-balancing software) that's not directly part of your site but that supports or affects your site. For example, antivirus and replication software can cause stability problems, as can out-of-date Oracle ODBC drivers. For information about a valuable troubleshooting tool that's built in to IIS, see the Web-exclusive sidebar "IIS Reset," http://www.windowswebsolutions.com, InstantDoc ID 25621.

Although Windows .NET Server (Win.NET Server), DNA, or n-tier COM object-dependent architectures are easy to implement, they're difficult to implement correctly. Continue to stress test and improve your COM objects, ASP code, and ISAPI extensions, and keep watching for memory leaks and for better thread-handling methods.