Adding a search engine

Having great content on your Web site is of no value if users can't find it. I've always wanted to have a great search engine on the Windows NT Magazine Web site, but providing this feature is easier said than done. Less than a year ago, we were hard pressed to find a retail search engine product. Some Web servers had built-in search engines, but they didn't offer much. Today, we have our choice of several Web-based search engines.

The trick to successfully implementing the right search engine is to find one that runs on your Web server. I tried the beta release of Microsoft's Index Server (code-named Tripoli) in May 1996. At the time, I passed on using Index Server for our Web site because it required Windows NT 4.0, which was still in beta. I didn't want to go that route on our production machine. I also had problems getting Index Server to stop indexing and serving every document on the Web machine. This behavior explains why Microsoft was promoting it as an intranet search engine. I'm not as concerned about people behind our firewall accessing the company's information, but I'm just not ready to share everything with outside readers and visitors. I also looked at Excite's search engine (Excite for Web Servers--EWS), but it occasionally locked up for no apparent reason and brought down the entire Web machine.

Index Server Revisited
I've always had good luck with the speed and stability of Microsoft's Internet Information Server (IIS). Because Index Server runs on IIS, I decided to look at Microsoft's search engine again, and I downloaded version 1.1 from Microsoft's Web site.

Installing Index Server is a breeze. Just a point-and-click here and name a directory there, and you're finished. I think downloading the program took longer than the installation.

The installer asks you for three pieces of information: Where do you want to store your indexes? Where do you want to store your scripts? Where do you want to store your sample files? The installer then goes off and starts indexing. When you run your first query, you realize just how many document types this server indexes. The types of documents you can query include HTML, plain text, and Microsoft Word, Excel, and PowerPoint documents. However, this ability is a double-edged sword. Index Server makes all your documents available for searching, including documents other than the HTML files in your Web directories.
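Before you let any indexer loose on a Web machine, it helps to know what it is likely to pick up. The following is a minimal Python sketch, not part of Index Server, that walks a directory tree and lists the files whose extensions match the document types named above. The extension list and the function name find_indexable_files are assumptions made for illustration; the real set of types Index Server handles depends on the content filters installed.

```python
from pathlib import Path

# Assumed extensions for the document types mentioned in the article.
# This is an illustrative list, not Index Server's actual filter table.
INDEXABLE_EXTENSIONS = {".htm", ".html", ".txt", ".doc", ".xls", ".ppt"}


def find_indexable_files(root):
    """Return sorted relative paths (POSIX style) of files under root
    whose extension, compared case-insensitively, is in the list above."""
    root = Path(root)
    matches = []
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in INDEXABLE_EXTENSIONS:
            matches.append(path.relative_to(root).as_posix())
    return sorted(matches)
```

Running find_indexable_files against your Web root gives you a quick list of spreadsheets, memos, and other stray files you might not want outside readers to find.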

Because I create most of our Web site dynamically with Cold Fusion (as you know from my December 1996 WebDev, "The FAQs of Web Forums"), most of my pages are templates. So when Index Server pulls up a page in the query summary, the page looks like a text document full of garbage. This situation means I have to be careful about what directories I index.

Index Server 1.1 Features
Index Server 1.1 has several nice features. The query results page, as you see in Screen 1, shows you the number of matches Index Server finds based on your query and links each match to the original document. The results page also provides an abstract of the document to let you decide whether the document is even close to what you wanted. For each match, you can also view the file size and the date the document was last updated.

Index Server 1.1 includes new highlighting features. After you post a query, you can view the contents of each document that the query returns, with every occurrence of your search terms highlighted so that you can quickly scan for each instance. Each highlighted word has a backward and forward symbol next to it so you can click through each occurrence of your queried word. This output is just text--no images.
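To make the idea of hit highlighting concrete, here is a rough Python sketch of the general technique. It is not how Index Server implements the feature. The function name highlight and the << >> markers are hypothetical; the hit numbers stand in for the backward and forward symbols, because a numbered hit is something a viewer could step through in order.

```python
import re


def highlight(text, term):
    """Mark each whole-word, case-insensitive occurrence of term.

    Returns (marked_text, hit_count). Each hit is wrapped as <<n:word>>,
    where n is the hit's position in reading order.
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    count = 0

    def mark(match):
        nonlocal count
        count += 1
        return f"<<{count}:{match.group(0)}>>"

    marked = pattern.sub(mark, text)
    return marked, count
```

For example, highlight("SQL Server runs sql queries.", "sql") marks both the uppercase and lowercase hits and reports a count of 2.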

One area where Index Server 1.1 beats most other packages is in the number of ways you can query an index. Index Server lets you search according to file size, modification time, and author property. The modification-time option is especially handy. Suppose you want to see all the documents about SQL we've put online in the last month. You enter SQL in the query box, select the in the last month option, and start your search. Index Server returns only the documents that match your query and were modified within the given timeframe.
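The logic behind a query such as "SQL, in the last month" is simple to sketch. The Python below is an illustration under stated assumptions, not Index Server's query engine: the function name search, the (path, text, modified) tuple layout, and the plain substring match are all hypothetical stand-ins for a real index lookup.

```python
from datetime import datetime, timedelta


def search(documents, term, modified_within=None, now=None):
    """Return paths of documents that contain term (case-insensitive)
    and, if modified_within is given, were modified no earlier than
    now - modified_within.

    documents is an iterable of (path, text, modified_datetime) tuples.
    """
    now = now or datetime.now()
    cutoff = now - modified_within if modified_within is not None else None
    hits = []
    for path, text, modified in documents:
        if term.lower() not in text.lower():
            continue  # the term does not appear in this document
        if cutoff is not None and modified < cutoff:
            continue  # the document is older than the requested window
        hits.append(path)
    return hits
```

Calling search(docs, "sql", timedelta(days=30)) approximates the last-month query; leaving out the third argument returns every document that mentions the term, regardless of age.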

Configuring Index Server
Although installing the software is easy, configuring Index Server can be tricky. Index Server can easily share every piece of information on your Web server with your intranet. I had problems restricting which directories Index Server indexes and makes available for query.

Here's the short version of how to limit what information Index Server can access and share. From the Administration page, you can select View/Update indexing of virtual roots to select which directories Index Server can access. The trick is to have the right virtual directories set up in IIS. As you can see in Screen 2, I set up some virtual directories (i.e., roots) to speed up HTML coding: I don't have to hard-code every link to every page. Index Server uses these paths to determine what directories it indexes.

By default, Index Server selects all the directories on your Web server. I wanted only the online magazine articles to be searchable, so I deselected all the directories except /issues and clicked the Submit changes button.

After you establish your virtual roots, you go back to the Administration page and select Force scan virtual roots. You then tell Index Server to do a full scan so that it can index all the documents in those directories. This step limits Index Server's scope.
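The effect of selecting only certain virtual roots can be pictured as a simple prefix test on each document's URL path. The Python sketch below shows that idea only; the function name in_scope is hypothetical, and Index Server's real scope handling is configured through the Administration page, not through code like this.

```python
def in_scope(url_path, selected_roots):
    """Return True if url_path is a selected virtual root or lies
    beneath one. Comparison is case-insensitive, and a root matches
    only on whole path segments, so /issues does not match /issuesold.
    """
    path = url_path.lower().rstrip("/")
    for root in selected_roots:
        root = root.lower().rstrip("/")
        if path == root or path.startswith(root + "/"):
            return True
    return False
```

With only /issues selected, in_scope("/issues/1997/feb.html", ["/issues"]) is true, while a template under /scripts falls outside the scope and would never show up in a query.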

A more complex way of limiting Index Server's scope is to set the NT permissions for every directory. You just remove the IIS user account's rights to a directory, and Index Server will skip right over it.

I have NT 4.0 Server and IIS's Internet Service Manager running on my laptop. To make the changes, I connected to the LAN with Remote Access Service (RAS). First, I set the permissions on the search administration directory so that I could access it via the Web. Next, I fired up Internet Service Manager and created all the virtual directories I needed. Then I accessed the Index Server Administrator page with Internet Explorer 3.01 (funny how I can't get Netscape Navigator to work when I do this) and started configuring Index Server.

Now that Index Server 1.1 is set up the way I want, I really like it. Be aware that Index Server's documentation is weak. (At least the guys on the Index Server team are quick to fess up to the quality of the documentation.) For example, when I went to find out about changing the query options, the Help file said that I needed to change the .idq file. Unfortunately, Index Server has about a dozen of these files, and each one complains rather quickly when you change it.
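Given how touchy these files are, a sensible habit is to copy an .idq file before you edit it and to look at what it currently contains. The Python sketch below assumes the file uses an INI-style layout with a [Query] section, which is an assumption about the format, not something the Help file spells out here. The function name backup_and_read_idq is hypothetical.

```python
import configparser
import shutil


def backup_and_read_idq(path):
    """Copy the file at path to path + '.bak', then return the settings
    in its [Query] section as a dict of name -> raw string value.

    Assumes an INI-style file with a [Query] section (an assumption).
    """
    path = str(path)
    shutil.copy2(path, path + ".bak")  # keep a copy to restore if an edit fails
    # interpolation=None keeps values such as %Name% exactly as written.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # preserve the original case of parameter names
    parser.read(path)
    return dict(parser["Query"])
```

If an edited file starts producing errors, restoring the .bak copy gets you back to a working query form while you figure out which change caused the complaint.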

Index Server makes managing new directories and updating existing ones easy. Between Index Server's ease of administration and all that I saw IIS 3.0 do at the Microsoft Site Builder conference last October, I've decided to switch from our current setup back to IIS.

Index Server 1.1
Microsoft
206-882-8080
Web: http://www.microsoft.com/windows/common/contentNTSIAC03.htm
Price: Free