Downloads
21278.zip

Ever since the Internet spawned Web traffic, systems administrators have been trying to find creative ways to decrease network traffic. With the flurry of graphics, JavaScript, and other assorted Web-based goodies that make up the Web, administrators have been pulling out their hair in frustration.

Modern-day applications use cookies and Internet caching to optimize network resources. However, cookies and caching can be burdensome. Users who surf the Web might be inadvertently providing more information about your company to the outside world than you'd like through the use of cookies. Similarly, users can inadvertently set the cache size too large, which can cause hard disks to fill up with potentially unneeded content. In this column, I look at cookies and how you can view and change them with Perl scripts. In my next column, I'll discuss caching.

Cookies and the Library
In its heyday, Netscape introduced a new concept called cookies. Here's an overly simplistic look at the cookie process. When a Web browser contacts a Web server for the first time, the server provides the browser with a special phrase. Thereafter, whenever the browser connects to the server, the browser says, "Hey, last time I was here you told me to mention the special phrase xxxxxx." When the server hears this special phrase, it responds by saying "Ahhh, now I remember you. You're contacting me for user X." Cookies are the special phrases that servers use to identify users. A cookie can be literally any combination of characters, from just one word in plaintext to a chunk of encrypted binary data. Typically, the server has a database in which it stores cookies and their related user information. Then when browsers with cookies connect to the server, the server looks in the database for the related user information.
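The server-side half of that exchange can be pictured as a simple lookup table. The following is only a toy sketch in plain Perl: the %cookie_db hash, the issue_cookie() and lookup_user() names, and the ID=... cookie format are all made up for illustration and don't come from any real server.

```perl
use strict;
use warnings;

# Hypothetical server-side "database": cookie value => user name.
my %cookie_db;

# First visit: hand out a new special phrase and remember who owns it.
sub issue_cookie {
    my ($user) = @_;
    my $cookie = sprintf 'ID=%08X', scalar( keys %cookie_db ) + 1;
    $cookie_db{$cookie} = $user;
    return $cookie;
}

# Later visits: map the phrase the browser sends back to a user.
# Returns undef when the server has never seen the cookie.
sub lookup_user {
    my ($cookie) = @_;
    return $cookie_db{$cookie};
}

my $phrase = issue_cookie('user X');
print "Server sets:       $phrase\n";
print "Server recognizes: ", lookup_user($phrase), "\n";
```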

In the mid-1990s, Microsoft decided that the Internet was important enough to warrant a general-purpose library that would provide functions to access the Internet. Microsoft developed this library and called it WinInet (wininet.dll), short for Windows Internet. WinInet provides a way for developers and administrators to add Internet capabilities to any application. Soon after the release of this library, applications that had Internet capabilities flourished. Implementations included the obvious (e.g., Web browsers, FTP clients) and the not-so-obvious (e.g., word processors, spreadsheet applications). However, the latest build of Mozilla (i.e., the open-source Netscape browser) doesn't appear to use the WinInet library.

WinInet includes functions that support cookies and the Internet cache. Although Perl's Win32::Internet extension provides access to WinInet, this extension doesn't expose the cookie and cache functions. As a result, you need to use the Win32::API extension to interact with the library directly. For many non-C programmers, Win32::API is daunting to use. Thus, in this column's code, I use the Win32::API::Prototype module to work with Win32::API. You can find the Win32::API::Prototype module at http://www.cpan.org or http://www.roth.net/perl/packages.

Fetching a Cookie
Let's start with the script GetCookie.pl, which Listing 1 shows. If your eyes roll to the back of your head when you look at this script, pay close attention: This code isn't difficult—it just looks that way. After you read how the code works, it'll make sense to you.

You can use GetCookie.pl to view the cookie that your Web browser uses when connecting to a particular Web site. When you launch the script, you just specify that site's URL. For example, when I ran GetCookie.pl with MSN's main URL (http://www.msn.com) with the command

perl GetCookie.pl http://www.msn.com

I obtained the cookie value that Figure 1 shows. My Web browser sends this value to MSN's Web servers every time I go to the MSN site. I don't have the faintest idea what this value means, but the MSN servers do. My guess is that the value is a globally unique ID (GUID) that MSN uses to identify me.

The value that Figure 1 shows actually consists of two cookies delimited by a semicolon. The msn.com server might have set one cookie, and the www.msn.com server might have set the other cookie. Cookie values are cumulative: when users connect to MSN, their browsers send the cookies for both the msn.com and www.msn.com servers.
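If you want to look at the individual cookies rather than the combined string, you can split the value on the semicolons. Here is a minimal sketch in plain Perl. The split_cookies() name and the sample string are invented for illustration (the sample isn't the value from Figure 1), and the code assumes each cookie has the usual name=value shape.

```perl
use strict;
use warnings;

# Break a semicolon-delimited cookie string into [name, value] pairs.
sub split_cookies {
    my ($string) = @_;
    my @pairs;
    for my $part ( split /;/, $string ) {
        $part =~ s/^\s+|\s+$//g;              # trim surrounding whitespace
        next unless length $part;             # skip empty segments
        # Limit of 2 keeps any '=' inside the value intact.
        my ( $name, $value ) = split /=/, $part, 2;
        push @pairs, [ $name, defined $value ? $value : '' ];
    }
    return @pairs;
}

# Made-up sample with two cookies.
my $sample = 'ID=0123456789ABCDEF; lang=en-us';
printf "%-5s => %s\n", @$_ for split_cookies($sample);
```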

Now that you've seen the results, let's see how GetCookie.pl works. This script interacts with the WinInet library to obtain cookies. Callouts A and B in Listing 1 contain key code.

The code at callout A begins by calling the Win32::API::Prototype's exported ApiLink() function to create a link to WinInet. I specified the name of the library followed by a string that represents the library's C-language function for obtaining a cookie. I simply copied this InternetGetCookie function from the Microsoft Developer Network (MSDN) Online Web Workshop (http://msdn.microsoft.com/workshop/networking/wininet/reference/functions/internetgetcookie.asp) and pasted it into the ApiLink() call.

The code at callout B in Listing 1 begins by packing an unsigned long value (i.e., a 32-bit number) with a 0 value. The next line calls the InternetGetCookie function; the last parameter in this call is the packed value. InternetGetCookie will replace this 0 value with another value that specifies how large the cookie value is, thereby enabling the script to create a string large enough to hold the cookie.
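You can try the packing step on its own, without WinInet. The sketch below isn't taken from Listing 1; fake_api_sets_size() is a hypothetical stand-in for the real API call, and 57 is an arbitrary number. It only demonstrates that pack creates a 4-byte buffer that a called function can overwrite and that unpack can read back.

```perl
use strict;
use warnings;

# A DWORD-sized buffer that holds the number 0.
my $pdwSize = pack 'L', 0;
printf "buffer is %d bytes\n", length $pdwSize;

# Stand-in for the API call: write a size into the caller's buffer.
sub fake_api_sets_size {
    $_[0] = pack 'L', 57;    # @_ aliases the caller's variable
    return 0;                # FALSE, like the first InternetGetCookie call
}

fake_api_sets_size($pdwSize);

# Recover the number that the "API" left in the buffer.
my $size = unpack 'L', $pdwSize;
print "required size: $size\n";
```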

Let's take a closer look at how this process works. In the InternetGetCookie call, note that the two middle parameters are undef values. Specifying undef tells Win32::API to use NULL pointers for these particular parameters. The NULL pointers, in turn, tell InternetGetCookie that I'm only interested in learning how large the cookie value will be.

If all goes well, InternetGetCookie returns a value of FALSE, which indicates that the function failed to fetch the cookie value. (This result is desired; you'll see why shortly.) Next, the code checks whether the last generated Win32 error was ERROR_INSUFFICIENT_BUFFER, which has a value of 122. This error indicates that the function failed because the string buffer wasn't large enough.

Although the InternetGetCookie function failed, the $pdwSize variable now has the required buffer size. The script determines the buffer size by unpacking the unsigned long value that the function had set before it failed. Now that the script has the required buffer size, it creates a new string buffer by calling the Win32::API::Prototype's NewString() function. This result is why you want the first InternetGetCookie call to fail—its failure lets you discover how large the passed-in buffer must be.

Finally, the script calls the InternetGetCookie function again. However, this time the value of the third parameter is the cookie string buffer ($Cookie) instead of undef. (The second parameter still needs to be undef because WinInet doesn't support that parameter yet.) InternetGetCookie then obtains the requested cookie value and stores it in $Cookie.
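Put together, the two-call sequence might look something like the following sketch. It isn't Listing 1; it's a reconstruction from the description above, and it assumes that Win32::API::Prototype exports ApiLink() and NewString() and that ApiLink() makes InternetGetCookie callable by name. The get_cookie() wrapper is my own invention. Some WinInet documentation indicates that the sizing call can succeed rather than fail when the data buffer is NULL, so the sketch accepts either outcome. On a non-Windows machine, it simply reports that it retrieved nothing.

```perl
use strict;
use warnings;

# Value of ERROR_INSUFFICIENT_BUFFER, as given in the article.
use constant ERROR_INSUFFICIENT_BUFFER => 122;

# Returns the cookie string for $url, or undef if it can't be fetched.
sub get_cookie {
    my ($url) = @_;
    return undef unless $^O eq 'MSWin32';

    # Load at run time so the file still compiles on other platforms.
    return undef
        unless eval { require Win32; require Win32::API::Prototype; 1 };
    Win32::API::Prototype->import();

    # Assumption: ApiLink() installs InternetGetCookie() in this package.
    ApiLink( 'wininet.dll',
        'BOOL InternetGetCookie( LPCTSTR lpszUrl, LPCTSTR lpszCookieName, '
      . 'LPTSTR lpszCookieData, LPDWORD lpdwSize )' )
        or return undef;

    # First call: undef (NULL) data buffer, so only the size is reported.
    my $pdwSize = pack 'L', 0;
    my $ok      = InternetGetCookie( $url, undef, undef, $pdwSize );
    my $error   = Win32::GetLastError();
    return undef unless $ok || $error == ERROR_INSUFFICIENT_BUFFER;

    my $size = unpack 'L', $pdwSize;
    return undef unless $size;

    # Second call: same arguments, but with a real buffer in the third slot.
    my $cookie = NewString($size);
    InternetGetCookie( $url, undef, $cookie, $pdwSize ) or return undef;

    $cookie =~ s/\0.*\z//s;    # drop the NUL terminator and any padding
    return $cookie;
}

my $target = @ARGV ? $ARGV[0] : 'http://www.msn.com';
my $found  = get_cookie($target);
print defined $found ? "$found\n" : "No cookie retrieved for $target\n";
```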

Setting a Cookie
You can use Perl code to not only fetch but also set cookies, as the script SetCookie.pl in Listing 2 shows. To use this script, you use the command

perl SetCookie.pl Url NewCookie Expire

where Url is the URL for which you want to change the cookie, NewCookie is the cookie value you want to set, and Expire is a string that specifies the expiration date for the cookie value in the format

"Wed, 09-May-2001 01:23:45 GMT"

Notice that the day and month are abbreviated so that they're both only three characters long. (The script's source code contains a list of the abbreviations.) Also notice that the time and date follow Greenwich Mean Time (GMT).
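If you'd rather generate the Expire string than type it, Perl's built-in gmtime function gives you all the pieces. The following sketch uses only core Perl; the cookie_date() name is my own, and the function isn't part of SetCookie.pl.

```perl
use strict;
use warnings;

# Three-character day and month abbreviations used in the date format.
my @DAYS   = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MONTHS = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Format an epoch time (seconds since 1970) as a cookie expiration date.
sub cookie_date {
    my ($epoch) = @_;
    my ( $sec, $min, $hour, $mday, $mon, $year, $wday ) = gmtime $epoch;
    return sprintf '%s, %02d-%s-%04d %02d:%02d:%02d GMT',
        $DAYS[$wday], $mday, $MONTHS[$mon], $year + 1900,
        $hour, $min, $sec;
}

# One week from now, always expressed in GMT.
print cookie_date( time + 7 * 24 * 60 * 60 ), "\n";
```

For example, cookie_date(989371425) returns the sample string shown above, "Wed, 09-May-2001 01:23:45 GMT".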

The NewCookie and Expire parameters are optional. SetCookie.pl performs different tasks based on whether you include none, one, or both of these parameters (more on this shortly).

Like GetCookie.pl, SetCookie.pl uses the ApiLink() function to link to WinInet. SetCookie.pl, however, links to the library's InternetSetCookie function, as the code at callout A in Listing 2 shows. I copied this function from the MSDN Online Web Workshop (http://msdn.microsoft.com/workshop/networking/wininet/reference/functions/internetsetcookie.asp).

The code at callout B in Listing 2 highlights the real meat in this script. This code determines whether you passed in an expiration date and a new cookie when you launched the script. Typically, you don't need to supply an expiration date. You supply an expiration date when you want the cookie to live for only a certain time period. If you pass in an expiration date, the code assigns that date to $ExpireString and calls the InternetSetCookie function to set the cookie value. If you don't pass in an expiration date, the code decides what to do based on whether you passed in a cookie value.

If you passed in a cookie value but no expiration date, the code uses an arbitrary expiration date that's far in the future. This expiration date effectively guarantees that the cookie will be valid for quite some time. If you don't pass in a cookie value or an expiration date, the code assigns an expiration date of the end of the year 1969. This expiration date forces the current cookie to automatically expire—the equivalent of deleting the cookie.

Interestingly, if you pass in a cookie value and an expiration date of an empty string (""), the script sets the cookie as a session cookie. Session cookies are like temporary variables: after the script finishes, the need for them no longer exists, so the system removes them. If you were to write a script that connects to a Web server several times, the script could set a new session cookie so that the server receives it each time a request occurs. But after the script ends, the system removes the cookie from WinInet's cookie storage.
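The four cases boil down to a short decision function. The sketch below is plain Perl and isn't the code from Listing 2. The choose_expiry() name is hypothetical, and the two constant dates are illustrative stand-ins because the script's actual far-future and 1969 values aren't shown here.

```perl
use strict;
use warnings;

# Illustrative dates only; the script's real values may differ.
use constant FAR_FUTURE => 'Tue, 01-Jan-2030 00:00:00 GMT';
use constant LONG_PAST  => 'Wed, 31-Dec-1969 23:59:59 GMT';

# Returns the expiration string to use, or '' for a session cookie.
sub choose_expiry {
    my ( $cookie, $expire ) = @_;
    return $expire    if defined $expire;    # your date, or '' for session
    return FAR_FUTURE if defined $cookie;    # value only: keep a long time
    return LONG_PAST;                        # neither: expire (delete) it
}

# Walk through the four combinations of command-line arguments.
my @cases = (
    [ 'ID=abc', 'Wed, 09-May-2001 01:23:45 GMT' ],
    [ 'ID=abc', '' ],
    [ 'ID=abc' ],
    [],
);
for my $case (@cases) {
    my ( $cookie, $expire ) = @$case;
    my $when = choose_expiry( $cookie, $expire );
    print length($when) ? "expires $when\n" : "session cookie\n";
}
```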

What's Next?
You might be wondering what cookies have to do with the Internet cache. After all, I did mention the cache earlier. Internet-cached Web pages, cookies, and a user's URL history are all maintained by the same WinInet database. Next month, I'll show you how to use Perl and WinInet to interact with this database so that you can clear the cache, prestuff it, or even track URL histories to determine whether users are visiting sites they shouldn't be accessing.