Is meticulous Web site analysis worth the effort?

Microsoft Site Server 2.0 is a suite of tools for creating and managing Web sites. This product might appeal to large corporations, but companies that specialize in creating Web sites will find it especially valuable.

Site Server's four basic components are the Site Analyst, Usage Analyst, Personalization System, and Commerce Server. Site Analyst helps map, manage, and maintain Web sites. Usage Analyst imports HTTP log files to a SQL Server database and generates numerous canned reports of site activity. The Personalization System tracks user-specific Web browsing and helps customize the information a user sees on subsequent visits. Commerce Server extends Internet Information Server (IIS) with commerce-specific features to define a storefront, display products, and take orders electronically. These components have predefined views that Web masters can customize. I'll discuss only the Site Analyst and Usage Analyst modules in this article.

Site Analyst
Site Analyst is an impressive tool that you can use to examine your own or others' Web sites. It can map Internet sites and sites you've stored on a local or network drive. You provide the path or URL of the Web site to analyze, and the tool explores and maps every page, link, graphic, sound file, and video in the site's pages. As you watch, Site Analyst slowly creates a site map on your monitor and can copy the map to your local disk. I felt like a voyeur watching the IRS Web site unfold, and I understand why common Internet etiquette views copying Web sites as unethical.

Site map and site summary. After installing Site Server's sample local site, I decided that a better test of the product would be to point it to a live site on the Internet. I typed in the URL for the Windows NT Magazine Web site and checked the options to verify offsite links and create a summary report.

As I watched Site Analyst create a map of the magazine's Web site, I developed a new respect for Web masters. Screen 1 shows the hypertext Web site summary report. The Windows NT Magazine Web site is 9 levels deep and contains 1244 pages, 4015 links, and 1948 images. Keeping this data (not to mention the Java applets, Lab Cam, and other dynamic applications) current and correct is a demanding job.

The summary report includes links to other detailed reports, such as Pages, Hierarchy, Images, Media, Gateways, InLinks, and Index. On the index report, you can click a button to remove the page icons from an alphabetized list of the contents and thus create the publishable version of a site index. This feature might be valuable for some sites, but as the five Back to the Future entries show in Screen 2, page 88, a ready-to-go index requires each link to have a unique description. For the Windows NT Magazine Web site, the summary report created 202 HTML files and consumed 9.5MB of disk space.

Cyberbolic view. Site Analyst has two display modes: a tree view and a cyberbolic view. Screen 3, page 88, shows both views for the Windows NT Magazine Web site. The left pane presents a traditional folder (tree) view, and the right pane shows the cyberbolic view, which probably looks familiar to FrontPage users. In the cyberbolic view, the magazine's home page is the W in pink at the bottom of the screen. A new feature in Site Analyst is the ability to dynamically move the view to uncover objects at the outer edge of the display (pages that are many levels deep). When you select an object, you can drag it to display links that are otherwise invisible. You can also orient the view so that the home page is in the center or at the left of the screen. Double-clicking a page launches your default browser to display the page.

You might want to view only the pages on a Web site, or you might want to see the graphics, sound files, and video. To fine-tune the display, select View, Display Options. Select Enable or Disable for the nine display options (Applications, Audio, Gateways, Images, Internet Services, Text, Unrecognized, Video, and Webmaps). Options are color-coded, so you can easily decipher a complex Web site if you enable all options. Broken links appear in red.

In the cyberbolic view, you can control the type and size of the labeling font. If you want to print this view, you need to increase the font size because the default 4-point type is difficult to read.

Viewing and correcting links. You can easily determine which links a page includes. In the tree or cyberbolic view, right-click an object and choose Links to see all the URLs on the page, or choose InLinks to see links on other pages that point to the selected page. To customize the data that Site Analyst displays for links, click the Columns button and include or exclude fields as needed. When the link view is up, you can highlight other pages in the tree or cyberbolic view to have Site Analyst automatically update information for the new object.
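The relationship between the Links and InLinks views can be sketched in a few lines of Python. This is only an illustration of the idea, not Site Analyst's implementation; the page names and the `build_inlinks` helper are hypothetical.

```python
from collections import defaultdict


def build_inlinks(links):
    """Invert a page -> outgoing-links map into a page -> incoming-links map."""
    inlinks = defaultdict(set)
    for source, targets in links.items():
        for target in targets:
            inlinks[target].add(source)
    # Sort the sources so the result is stable and easy to read.
    return {page: sorted(sources) for page, sources in inlinks.items()}


# Hypothetical three-page site: each key lists the URLs that page links to.
links = {
    "/index.html": ["/issues.html", "/forums.html"],
    "/issues.html": ["/index.html", "/forums.html"],
    "/forums.html": ["/index.html"],
}

inlinks = build_inlinks(links)
print(inlinks["/forums.html"])  # pages that point at the forums page
```

The Links view corresponds to reading one entry of `links`; the InLinks view corresponds to reading one entry of the inverted map.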

Site Analyst has built-in search options to locate objects by various criteria (e.g., title, URL, hidden objects, load size). The custom search option lets you search by three sort criteria, including numerous HTML tags. You can search the whole site or just displayed pages.

Web masters will appreciate Site Analyst's ability to build a site map and verify links. The mapping option to verify and correct broken onsite links is typical. The option to verify offsite links ensures that your reference sites are correct.
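To make the onsite/offsite distinction concrete, here is a minimal Python sketch that collects the links on one page and sorts them by host. It is an assumption-laden illustration rather than a description of how Site Analyst works: the `LinkCollector` class, the `classify_links` helper, and the example.com URL are all hypothetical, and the sketch only classifies links. A real checker would go on to request each URL and flag the ones that fail.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href> and <img src> attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = {"a": "href", "img": "src"}.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative references against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def classify_links(base_url, html):
    """Split a page's links into onsite and offsite lists by host name."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    site_host = urlparse(base_url).netloc
    onsite = [u for u in collector.links if urlparse(u).netloc == site_host]
    offsite = [u for u in collector.links if urlparse(u).netloc != site_host]
    return onsite, offsite


# Hypothetical page content for illustration.
page = (
    '<a href="/issues/">Issues</a> '
    '<img src="logo.gif"> '
    '<a href="http://www.microsoft.com/">Microsoft</a>'
)
onsite, offsite = classify_links("http://www.example.com/index.html", page)
print(onsite)
print(offsite)
```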

Other Site Analyst features. The Site Analyst User's Guide provides concise information about additional Site Server features. For example, Chapter 8, "Site Management Tips and Techniques," clearly documents the procedures for many tasks that are common to Web management, maintenance, and analysis, including

  • Remapping a site to graphically view customizations you made to the previous map

  • Providing information about objects that are new, changed, or orphaned

  • Comparing maps for two versions of a site to see changes

  • Refining helper applications to launch your favorite HTML editor or to view pages in different browsers

  • Checking image ALT strings or page load sizes

  • Exporting the tree view of the site map to create a hyperlink table of contents

  • Creating a hyperlink site index from the Site Summary Report
Site Server 2.0
Contact: Microsoft, 800-426-9400, Web: http://www.microsoft.com
Price: $1499
System Requirements: Windows NT Server 4.0 with Service Pack 2, Internet Information Server 3.0, SQL Server 6.5 with Service Pack 3, 64MB of RAM, 400MB of hard disk space, CD-ROM drive, VGA monitor

Usage Analyst
Usage Analyst imports usage logs to a SQL Server database. It supports 25 log file formats and provides many canned reports of Web activity in summary and detail form (e.g., the most popular pages, bandwidth utilization, and geographic breakdown of visitors and organizations). You can easily customize these reports to match your Web site construction or to satisfy your curiosity.

I wanted to use a real-world example, so I asked the Windows NT Magazine's Web master for some usage logs of the magazine's Web site. I received a 17MB log file of visits to the magazine's forums and a 204MB log of visits to the magazine's main Web site.

Defining a log file, server, and site. The first time you start Usage Analyst, the Server Manager prompts you to define a log data source, servers in the log file, and Web sites to analyze. When you define the log data source, you must identify the log file format (e.g., IIS, IIS extended, Apache). To analyze a log format on multiple HTTP servers, enter the servers' domain names to define the servers under one log data source. Each server can host multiple Web sites, so under the server icon you need to define each site you want to analyze. Usage Analyst identifies sites by URL. The Server Manager is a powerful tool for companies that host numerous subscriber Web sites on one server.

For instance, suppose you have three Web sites on one server running IIS and you want to analyze each of them. All the log data is in one IIS log file, so you define one log data source (e.g., d:\logs\webs.log), one server (e.g., winntmag.com), and the Web sites to analyze (e.g., http://www.winntmag.com) in Server Manager. Screen 4 shows Server Manager's description for the Windows NT Magazine log, with details of two imported log files.

To analyze a log file, start the Import Module and enter the usage log's filename. The Import Module prompts you for the sites you want to analyze. You must choose from the sites you defined in Server Manager. Click the green right-arrow button on the toolbar to start the import operation.

In my test, the 17MB log of 1 week's visits to the Windows NT Magazine forums took 7 minutes and 32 seconds to import. The 204MB log of 1 week's visits to the main Web site took 31 minutes and 36 seconds to import. Site Server's sample log file was hardly a reflection of the real world. It was only 102KB and imported in 6 seconds.

After you import the log file into the SQL Server database, you can generate usage reports. If you want reports based on page titles, domain names, and geographic summaries, run the following options on the Tools menu before you start the Analysis Module: Lookup HTML Titles, Resolve IP Addresses, and Who Is Organizations. You need access to an Internet-based Domain Name System (DNS) server to resolve IP addresses and perform WhoIs queries.

Reporting on Web visitors. The Analysis Module's built-in catalog of reports works well for beginners and reviewers. Click File, Open Analysis Catalog, and expand each category to see options for Advertising, Auditing, Detail, and Summary reports. When you highlight a report, an explanation of its contents appears in another window.

To start a report, select Executive Report in the Summary category, and click Next. From the usage log, select the Web site you want to analyze, click Next again, and select the report time period from the options you see on the screen. The final dialog box asks if you want to restrict the report to a portion of the Web site by path name, domain name, or country. Clicking Finish pops up a screen that shows the report fields and details. Right-clicking a field lets you edit the field, filter data values, add a description, and set the font size of the headers. Screen 5 shows the fields in four built-in reports: Executive Summary, Hit Detail, Geography Detail, and Advertisement Detail.

To finalize the report, specify the name and output format. For report output, you can choose HTML, Microsoft Word, Microsoft Excel, or fixed-width text. Click the green right-arrow button on the toolbar to create the report. When the report is complete, Site Server automatically opens Word, Excel, or a browser window to display the results.

After I imported my 204MB log file, the Analysis Module took slightly longer than 11 minutes to generate the Executive Summary report. Thus, if your monthly log file is larger than 1GB, I recommend that you import data and generate reports during off-hours. The scheduler utility in the Tools menu automates import and analysis jobs with an interface similar to NT's command scheduler, winat.exe.
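The arithmetic behind that recommendation can be sketched as follows. The sketch assumes that import and report times scale linearly with log size, which the article's timings do not guarantee (the 17MB log imported at a much lower rate than the 204MB log), and it treats "slightly longer than 11 minutes" as 11 minutes, so the result is a rough lower-bound estimate. The `minutes` helper and variable names are illustrative only.

```python
def minutes(mins, secs=0):
    """Convert a minutes-and-seconds duration to fractional minutes."""
    return mins + secs / 60


# Measured timings for the 204MB log of the main Web site.
import_rate = 204 / minutes(31, 36)  # MB imported per minute
report_rate = 204 / minutes(11)      # MB analyzed per minute (Executive Summary)

# Extrapolate to a 1GB (1024MB) monthly log, assuming linear scaling.
log_mb = 1024
import_minutes = log_mb / import_rate
report_minutes = log_mb / report_rate

print(f"import: about {import_minutes:.0f} minutes")
print(f"report: about {report_minutes:.0f} minutes")
```

Under these assumptions, a 1GB log ties up the machine for roughly three and a half hours for a single import and a single report, which is why scheduling the work for off-hours makes sense.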

The last page of each report includes definitions of terms in the report. You need to thoroughly review these terms to fully understand the data in the report. For example, Site Server records a hit whenever someone touches a page. A hit is any type of connection to an Internet site. The number of hits is relatively meaningless information, because it includes accidental visits and errors. Site Server records a request when the site responds to a user's query for information. The number of requests is useful information, because it includes only visitors searching for content. Site Server records a visit when a user makes consecutive requests from a Web site.
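The difference between requests and visits is easier to see in a small example. The following Python sketch groups a visitor's consecutive requests into visits. The 30-minute inactivity cutoff, the `count_visits` helper, and the sample addresses are assumptions for illustration; the reports' definitions do not say what threshold Site Server actually uses.

```python
from datetime import datetime, timedelta

# Assumed cutoff: a gap longer than this starts a new visit.
VISIT_TIMEOUT = timedelta(minutes=30)


def count_visits(requests):
    """Count visits per visitor from (visitor, timestamp) request records."""
    last_seen = {}
    visits = {}
    for visitor, when in sorted(requests, key=lambda r: r[1]):
        previous = last_seen.get(visitor)
        # First request, or first request after a long gap, opens a new visit.
        if previous is None or when - previous > VISIT_TIMEOUT:
            visits[visitor] = visits.get(visitor, 0) + 1
        last_seen[visitor] = when
    return visits


# Hypothetical request records: four requests from two visitors.
requests = [
    ("10.0.0.1", datetime(1997, 12, 3, 9, 0)),
    ("10.0.0.1", datetime(1997, 12, 3, 9, 5)),
    ("10.0.0.2", datetime(1997, 12, 3, 9, 6)),
    ("10.0.0.1", datetime(1997, 12, 3, 14, 0)),
]

visits = count_visits(requests)
print(len(requests), "requests,", sum(visits.values()), "visits,", len(visits), "visitors")
```

In this sample, four requests collapse into three visits from two visitors, which is the same kind of roll-up the usage reports perform on a much larger scale.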

I recommend creating usage reports in Word format. The HTML version creates graphs that are too large to view easily, and it produces an empty weekly trend graph if you analyze only 1 week's data. If you plan to use the HTML report frequently, you'll want to create a version that omits the weekly trend summary.

My test logs for the Windows NT Magazine Web site were from December 3 to December 13, 1997. The Executive Summary and Hit Detail reports produced the following usage statistics:

  • Daily Web hits average 6900, evenly spread between work and nonwork hours.

  • The number of daily visits ranges from 3000 to 8000.

  • The number of daily visitors varies from 2500 to 6200.

  • Requests for content pages total more than 1 million per week.

  • Visitors request an average of 27 pages.

  • Visitors browse the Web site 24 hours a day; daytime traffic is twice as high as nighttime traffic.

  • Visitors spend an average of 7 minutes on the Web site.

Another interesting summary ranked the top 20 organizations to visit the site. Analysis revealed that Microsoft is the number one visitor to the magazine's Web site, followed by UU.NET, America Online (AOL), and CompuServe. Windows NT Magazine employees and associates rank eleventh, and a large percentage of visitors are employees of Digital Equipment, IBM, HP, MindSpring, and Boeing.

The usage by country summary report showed that visitors from the US account for 83 percent of visits. Visitors from the UK, Australia, and Canada represent 3 percent each. The report showed visitors from all over the world, including Austria, Indonesia, South Korea, Malaysia, Thailand, South Africa, Japan, Brazil, Israel, New Zealand, Taiwan, and Poland. International visits explain why the Web site has activity 24 hours a day.

But Wait--There's More!
Site Server is a complex product with a steep learning curve and numerous customization features. You need to use the software repeatedly to fully understand its options. After using the software for about 60 hours, I grasped the basics and some of Site Analyst's and Usage Analyst's more sophisticated features. I didn't fully explore Site Analyst or Usage Analyst, but I noted some needed improvements for the next version.

The cyberbolic view for large maps needs some fine-tuning. You need a delicate touch to properly line up the dynamic view to expose objects that are many levels deep. Viewing multiple links is difficult, even on a 17" monitor after tweaking the node distance to fit information on the screen. When the node distance is close, the labels overlap and are unreadable. Increasing the node distance reduces the number of objects Site Server displays. The software doesn't display flyover labels long enough for you to read them, especially when an object is many levels deep in the hierarchy.

Site Server also needs an option to concurrently map multiple sites or import multiple log files. Currently, a site map or log import ties up the window until the operation completes. Importing my 204MB log file tied up the screen for 31 minutes.

Another problem is that when you delete a site from Server Manager in the Import Module, you can still select the site in the Analysis Module. You obviously can't analyze a site you've deleted, so you shouldn't be able to select it.

Finally, I couldn't figure out how to report on each visitor's unique address. I suspect Site Server offers this feature, but I didn't spend a lot of time digging for it.

Overall, Site Server is a useful product that helps you maintain good onsite and offsite links. In the cyberbolic view, the software automatically updates the link dialog box as you move from object to object. This feature saves you time because you don't have to continually open new dialog boxes. Different sites typically want different information, and Site Server's canned reports give you a convenient template for generating new reports. If you want nonstandard usage information or simply like treasure hunting on Web sites, you'll want to consider Site Server.