Downloads
19880.zip

Move Web site log data to a SQL Server database for easier analysis

Microsoft Internet Information Services (IIS) 5.0 logs Web-server events by default. You can choose to turn logging off or ignore log data, but more likely, you want to collect and analyze the data for insight into how users use your Web site. Like Internet Information Server (IIS) 4.0, IIS 5.0 gives you a choice of log-file formats and lets you log to a file or directly to a database. My preferred method of collecting data is to log to World Wide Web Consortium (W3C) Extended Log File Format files, then import the files' data into a Microsoft SQL Server database for analysis. In the paragraphs that follow, I show you how to set up such a logging process and point out some minor differences between IIS 5.0 and IIS 4.0.

Logging to the W3C Format
To configure logging (which is enabled by default), you use Internet Services Manager (ISM) 5.0, which functions similarly to Internet Service Manager (ISM) 4.0. IIS 5.0 logging operations are also quite similar to IIS 4.0, and IIS 5.0 supports the same log formats as IIS 4.0:

  • W3C Extended Log File Format (exyymmdd.log)
  • Microsoft IIS Log File Format (inyymmdd.log)
  • National Center for Supercomputing
  • Applications (NCSA) Common Log
  • File Format (ncyymmdd.log)
  • ODBC Logging format

The filename format for logs closed on a daily basis is in parentheses after the log format. According to the Microsoft article "W3SVC and IIS Log File Names Are Listed with NCSA Format in HTMLA" (http://support.microsoft.com/support/ kb/articles/q240/0/27.asp), the IIS 5.0 documentation shows an incorrect prefix for the W3C and Microsoft IIS filename formats. However, the names are correct in my system's documentation.

One difference between IIS 5.0 and IIS 4.0 is that IIS 5.0 supports the ODBC logging option only in Windows 2000 Server products, not in Win2K Professional. For more information about IIS 4.0 logging and the four logging formats, see "IIS 4.0 Event Logging," March 2001.

The W3C Extended Log File Format is probably the most-used log format because it lets you log more information than the other formats and because it's flexible (i.e., you can specify which information to log). By default, IIS 5.0 W3C logging uses Universal Time Coordinate (UTC) time, which is a new name but is the same time as IIS 4.0's default Greenwich Mean Time (GMT). Thus, IIS 5.0 uses UTC rather than local time to determine when to create a new log file, and all the times in the log file are based on UTC. You might want to use UTC if you have many servers spread across several states or countries and you want to synchronize them on one time.

However, if all your servers are in one location, you probably prefer to use local time. IIS 5.0 gives you an easy way to specify that you want to use local system time for naming and rolling over log files. Open ISM, click the appropriate server, right-click the appropriate Web site, and select Properties. Click the Web Site tab, ensure that the W3C log format is selected, and click Properties. On the General Properties tab, which Figure 1 shows, select the Use local time for file naming and rollover check box. You can use local system time in IIS 4.0 Service Pack 4 (SP4) or later, but to do so, you must edit the registry.

Figure 2 shows the window that you use to choose which of the W3C format's extended properties to log. This IIS 5.0 window displays the properties in a tree format instead of the fixed list that IIS 4.0 uses. The IIS 5.0 format also displays in parentheses the property names as they appear in log files. This cross-reference is really handy when you need to compare log entries with the properties you chose to log.

IIS 5.0's extended properties are almost the same as IIS 4.0's, with a couple of exceptions. IIS 4.0's HTTP Status property is Protocol Status in IIS 5.0, although the IIS 5.0 documentation shows the property as HTTP Status. The IIS 5.0 Host property is new and contains the server name. If you turn on Process Accounting for an IIS 5.0 Web site, the Web server displays an additional set of W3C log entries.

The W3C "Logging Properties Reference" section of the IIS 5.0 online documentation is cryptic and somewhat confusing. I've reproduced its tables here, fixing a few errors, modifying some text, and changing the column headings to make the tables more readable. Table 1 defines the prefixes for the W3C property names as they appear in the logs. Each prefix identifies the type of action the property relates to. Table 2 describes the W3C properties and corrects the Protocol Status property information. Table 3, page 60, shows the new W3C Process Accounting properties, and Table 4, page 60, describes the values for Process Accounting's Process Event property.

When IIS 5.0 generates a W3C log file, it stamps at the start of the file a header like the one that Figure 3, page 61, shows. The header's first line identifies the server type, the second line identifies the log-file version, the third line is the date and time IIS 5.0 created the log file, and the last line is a space-delimited header line containing all the fields in a file row. Be aware that if you change the fields that are logged, IIS 5.0 stamps a new header in the log file, then continues adding rows consisting of the new fields to the file. If you're importing logs into a database, you'll need to make sure the database accommodates all the new log fields.

Importing Logs into SQL Server
If you want to be able to analyze log data, you need to import the data into a database where you can perform queries against it. If you choose to work directly with the log data that IIS 5.0 generates, several options are available to you. You could use the ODBC log format (instead of the W3C format) to log Web site data directly to SQL Server or another ODBC-compliant database. However, this method writes log entries to the database as events happen, so it can be resource-intensive. I prefer to set up a Data Transformation Services (DTS) package in SQL Server to load W3C log files. The package reads from a text file that contains the log data that IIS 5.0 generates and imports the data into SQL Server. Because the package runs only after IIS 5.0 has closed a log, the update can occur in batch mode and not bog down SQL Server or IIS 5.0 with updates each time IIS 5.0 logs an entry.

To routinely import W3C logs into a SQL Server database by using a DTS package, you need to set up a task within SQL Server that performs several jobs. You can schedule IIS 5.0 to close the log file at a certain time each day and schedule the task to run just after the log file is closed. The task must first copy the closed log file to a set input file or rename the log file to the input file.

Next, the task should strip the header information from the input file, deleting completely the first three header lines and deleting "#Fields:" from the fourth header line. The file now has only one header row, which contains the name of each field. Figure 4, page 61, shows the first two lines—the header line and one record line—of an input file with six fields. Finally, the task should run the DTS package to load the input file data into the database.

Before your task can run a DTS package to load data into a SQL Server database, you need to create the database and the DTS package. Actually, you could let the DTS package create the database, but I recommend that you create the table yourself. That way, you can create the columns with the correct data types, making it easier to query on numeric or date fields. You can also name the columns with your own names and, using DTS, map the names from the incoming file to the correct columns. I created a new SQL Server 2000 database named Iislogs for log data. Listing 1, page 61, shows the Data Definition Language (DDL) specifications I wrote to create the Iislogs table Inlogdata.

To create the DTS package, I started the SQL Server 2000 Enterprise Manager utility, expanded Data Transformation Services, right-clicked Local Packages, and selected New Package. These steps opened the DTS Designer. Next, I clicked Connection on the top toolbar, selected the Text File connection type, and selected the input file from the displayed directory structure. Then, I clicked the Properties button to open the connection's properties. On the first property page, I clicked the Delimited button and selected the First row has column names check box, leaving the default values for the other options.

I then clicked Next, and on the second property page I clicked the Other option and entered a space character in the box to the right of Other to specify that DTS should use a space as the column delimiter. At this point, I could see the columns correctly defined at the bottom of the property page. I clicked Finish, then clicked OK to close the connection's properties.

Next, I added a second connection. I selected Microsoft OLE DB Provider for SQL Server as the Data Source, then selected the server from the Server list. I entered a username and password for the database. I selected the Iislogs database and clicked OK.

Finally, I dragged a Transform Data task from the toolbar (on the left side of the interface) and selected the first connection I created as the source and the second connection as the destination. Then, I saved the package. To execute the package immediately, I clicked the Execute button (a right-pointing arrow) on the top toolbar. Figure 5 shows the log data in Enterprise Manager.

Analyzing Log Data
After you've loaded a log into SQL Server, you can use SQL to access and analyze the data. For example, the following SQL statement selects all of the data in the log table:

select * from inlogdata

You can order the log information by server with this SQL statement:

select * from inlogdata
        order by 's-computername'

You don't need to be a database or SQL expert to work with SQL Server data. You can use programs such as Microsoft Excel, Microsoft Access, and Seagate Software's Seagate Crystal Reports to perform queries and analyses of your data.

When using the W3C log-file format, the Win32 Status field is useful for identifying the errors that an application has reported. For example, to determine what a Win32 Status value of 5 means, issue the following command at the command prompt:

net helpmsg 5

The command will return the error message Access is denied.

The Microsoft article "IIS 'Bytes Sent' (Sc-bytes) Logging Property Is 0 for ASP Files" (http://support.microsoft.com/support/kb/articles/q254/7/18.asp) notes that W3C's Bytes Sent field might return 0 bytes when Active Server Pages (ASP) applications have buffering turned on. The article suggests turning buffering off. Doing so might cause the Bytes Sent field to report the correct data, but it almost certainly will significantly slow down your application.

As with other operations, managing and analyzing logs is a chore if you try to do it manually. Combining IIS 5.0 logging with SQL Server's DTS lets you automate the process of loading log data into SQL Server for easier analysis.

You can also use other tools such as Commerce Server 2000 Business Analytics to analyze log data. Another approach is to purchase a tool to automate analysis of your log data. For example, WebTrends Log Analyzer has some great reports that you can use to review what's happening with a site.

Yet another approach to gathering Web site usage information is to turn off logging completely and write Internet Server API (ISAPI) filters to capture from the HTTP header the data that interests you. As you can see, you have many choices in the area of collecting and analyzing Web site information.