An often-overlooked aspect of IIS administration is Web page caching. On a dynamic Web site, cached pages can become a liability, affecting both page-load performance and the content that users see. Caching can cause hard-to-isolate problems when clients try to read recent database updates or use shopping-cart applications: selected items might disappear from a client's cart, or items purchased during another session might suddenly appear in it. Site log files might also understate a server's actual traffic, because some or all page visits are served from a cache and never reach the server.
You have some control over what the system caches and how long content remains in the cache. (For information about basic cache settings in Microsoft Internet Explorer, or IE, see the Web sidebar "Caching in IE," which you can access at http://www.windowswebsolutions.com, InstantDoc ID 24271.) However, although forcing users into a specific cache setting is useful, you can do so only in corporate installations, and even there users might change the settings that the browser distribution package configured (although companies with Windows 2000 deployments can use Group Policy to enforce some browser settings). Corporate installations also complicate matters with proxy servers and firewalls. And on the Internet, you can't control which browsers your audience uses or how those browsers are configured.
You need to go beyond such basic caching configurations. To gain more control over what your system caches and what your users view, you can use cache metatags, HTTP headers, Active Server Pages (ASP) code, and cache-buster routines.
Metatags are optional, informational HTML tags that reside in the head section at the beginning of an HTML document. Typically, a metatag describes a Web page's contents (e.g., page description, keywords for search engines, copyright data). But you can also use metatags to control caching behavior.
Although metatags are easy to implement, they're the least effective method of controlling caching. Typically, only the browser cache recognizes and honors metatags. Proxy servers rarely recognize them because the servers don't usually read a page's HTML content but simply cache the page for subsequent client hits.
Figure 1 shows some sample metatags. The "Pragma no-cache" tag prevents caching over Secure Sockets Layer (SSL) connections but doesn't prevent caching on a public, non-SSL connection. On a nonsecure connection, the "Expires -1" tag behaves the same way as the no-cache tag: it doesn't prevent caching. The browser caches the page but marks it to expire immediately.
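Figure 1 isn't reproduced here, but the following minimal Python sketch shows where such metatags sit: inside the page's head section, ahead of the body. The tag spellings are the common forms of the two tags described above; the helper name html_page and the sample page content are hypothetical.

```python
# Sketch: build a minimal HTML page whose head section carries the two cache
# metatags discussed above. Only browser caches typically read these tags;
# proxies generally cache the page without parsing its HTML.

CACHE_METATAGS = [
    '<META HTTP-EQUIV="Pragma" CONTENT="no-cache">',  # effective mainly over SSL
    '<META HTTP-EQUIV="Expires" CONTENT="-1">',       # page is cached but marked expired
]


def html_page(title: str, body: str) -> str:
    """Return a page with the cache metatags inside its <HEAD> section."""
    head = "\n".join(CACHE_METATAGS + [f"<TITLE>{title}</TITLE>"])
    return f"<HTML>\n<HEAD>\n{head}\n</HEAD>\n<BODY>\n{body}\n</BODY>\n</HTML>\n"


if __name__ == "__main__":
    print(html_page("Shopping Cart", "<P>Your items</P>"))
```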
In addition, the "Pragma no-cache" tag is unreliable in IE 4.0, even over SSL connections, because IE 4.0 doesn't cache a page until the client's 64KB buffer is half full. Because the metatag appears at the top of an HTML page and the browser parses the page from top to bottom, the buffer isn't yet half full when the browser checks for a cached version of the page. The Microsoft article "'Pragma: No-cache' Tag May Not Prevent Page from Being Cached" (http://support.microsoft.com/default.aspx?scid=kb;en-us;q222064) describes a fix: place a second header section, containing a "Pragma no-cache" metatag, at the end of the HTML page.
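The workaround described above can be sketched as a page template in which the no-cache metatag is repeated in a second head section placed after the closing BODY tag. This is a minimal illustration; the function name is hypothetical, and the page layout is an assumption based on the description of the fix rather than a copy of Microsoft's sample.

```python
# Sketch of the workaround: repeat the no-cache metatag in a second <HEAD>
# section placed after </BODY>, so the browser reads it at the end of the page.

def page_with_trailing_nocache(title: str, body: str) -> str:
    """Return a page whose no-cache metatag appears at the end of the document."""
    return (
        "<HTML>\n"
        f"<HEAD>\n<TITLE>{title}</TITLE>\n</HEAD>\n"
        f"<BODY>\n{body}\n</BODY>\n"
        # Second head section: IE 4.0 reaches it only after parsing the body.
        '<HEAD>\n<META HTTP-EQUIV="Pragma" CONTENT="no-cache">\n</HEAD>\n'
        "</HTML>\n"
    )
```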
In general, however, using the "Pragma no-cache" tag is discouraged because the HTTP 1.0 and HTTP 1.1 specifications define this header only for requests, not responses. The "Pragma no-cache" header is primarily intended for proxy servers and might prevent important requests from reaching the destination Web server. For more information about using HTTP headers to control caching, see the Microsoft article "HOWTO: Prevent Caching in Internet Explorer" (http://support.microsoft.com/default.aspx?scid=kb;en-us;q234067).
HTTP headers are more effective and offer more control than metatags. HTTP headers don't appear in the page's HTML <HEAD> or <BODY> code; instead, the Web server generates them automatically (according to its configured settings) and sends them ahead of the HTML page. The client browser and every intermediate cache can read the HTTP headers, but the browser doesn't display them. Figure 2 shows an example of a typical HTTP server-response header sent from a server to a client. Figure 3 shows an example of a client-request header sent to the server.
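To make the "headers before the page" point concrete, here is a small Python sketch that splits a raw HTTP response into its status line, headers, and body. The sample response is hypothetical (it is not Figure 2); it only shows that headers travel ahead of the HTML and end at the first blank line.

```python
# Sketch: split a raw HTTP response into its status line, headers, and body.
# SAMPLE is a hypothetical response used only for illustration.

SAMPLE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Server: Microsoft-IIS/5.0\r\n"
    b"Content-Type: text/html\r\n"
    b"Cache-Control: private\r\n"
    b"\r\n"
    b"<HTML><BODY>Hello</BODY></HTML>"
)


def split_response(raw: bytes):
    """Return (status_line, headers, body); header names are lowercased."""
    head, _, body = raw.partition(b"\r\n\r\n")  # blank line ends the headers
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Simplification: a repeated header name overwrites the earlier value.
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers, body


if __name__ == "__main__":
    status, headers, body = split_response(SAMPLE)
    print(status, headers, body, sep="\n")
```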
To control caching, you can use Pragma, Expires, or Cache-Control HTTP server-response headers. The HTTP specification for the Pragma header offers only loose guidelines, so most caches don't recognize this header type.
The Expires header is a more reliable means of controlling caching, and nearly all proxy servers and client browsers support it. The only value in an Expires header is a date and time in GMT, as in the sample that the HTTP 1.1 specification uses: Expires: Thu, 01 Dec 1994 16:00:00 GMT.
The time can be a relative time based on the client's most recent access time or a fixed date and hour. To configure an Expires header in IIS, right-click the Default Web Site in the Computer Management Console, select Properties, then click the HTTP Headers tab, which Figure 4 shows. Select the Enable Content Expiration check box and choose your expiration preferences. You can perform this configuration at the directory or file level.
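The following Python sketch shows how the two kinds of expiration settings translate into an Expires value: a fixed date is formatted directly, and a relative period is added to the access time first. The function names are hypothetical; the date formatting uses the standard library's email.utils.format_datetime, which produces the GMT date form that HTTP expects.

```python
# Sketch: format Expires header values for a fixed date or a relative period.
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from typing import Optional


def expires_at(when: datetime) -> str:
    """Fixed expiration: HTTP dates are always expressed in GMT."""
    utc = when.astimezone(timezone.utc)  # naive datetimes are treated as local time
    return "Expires: " + format_datetime(utc, usegmt=True)


def expires_after(seconds: int, now: Optional[datetime] = None) -> str:
    """Relative expiration: a period counted from the time the page is accessed."""
    now = now or datetime.now(timezone.utc)
    return expires_at(now + timedelta(seconds=seconds))
```

For example, expires_after(3600) issued at 15:00 GMT yields an Expires value of 16:00 GMT on the same day.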
Cache-Control server-response headers offer many more options than simply an absolute or relative time until expiration. Some of the primary attributes that you can use in a Cache-Control server-response header are
- max-age=seconds: The number of seconds for which a cache considers the object fresh.
- s-maxage=seconds: The same as max-age but applies only to shared (proxy) caches, where it overrides max-age.
- private: The response can be stored only in a nonshared cache, such as the browser's, not in a public or shared cache.
- public: Any cache may store the response, even one that would normally be noncacheable, such as a response delivered over a secure connection.
- no-cache: Forces a proxy server or client to send the request to the server for validation before using a cached copy.
- no-store: Caches won't store any portion of the request or response.
- must-revalidate: Once a cached object becomes stale, the cache must revalidate it with the server before using it, honoring the header's freshness information.
- proxy-revalidate: The same as must-revalidate but applies only to proxy caches.
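Because a Cache-Control header is just a comma-separated list of the attributes above, a small helper can assemble one. The following Python sketch is hypothetical (the function name and the keyword-to-attribute convention are illustrative choices, not part of IIS); its output is the kind of value you might enter as a custom header.

```python
# Sketch: assemble a Cache-Control header from keyword arguments.
# Underscores become hyphens (max_age -> max-age); True adds a bare attribute,
# a number adds name=value, and False or None omits the attribute.

def cache_control(**directives) -> str:
    parts = []
    for name, value in directives.items():
        token = name.replace("_", "-")
        if value is True:
            parts.append(token)
        elif value is False or value is None:
            continue
        else:
            parts.append(f"{token}={int(value)}")
    return "Cache-Control: " + ", ".join(parts)


if __name__ == "__main__":
    # Per-user page: the browser may keep it 10 minutes, then must check back.
    print(cache_control(private=True, max_age=600, must_revalidate=True))
    # Page that no cache should keep at all.
    print(cache_control(no_store=True))
```

The first call prints Cache-Control: private, max-age=600, must-revalidate; the text after the colon is what would go into a custom header value.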
A Cache-Control server-response header can combine several of these attributes in a comma-separated list.
The must-revalidate attribute can speed up caching because it makes repeated downloads of entire pages or objects unnecessary. When a cached copy's freshness is unknown or has expired, the browser contacts the server to determine whether the object has changed. If it hasn't, the server says so without resending the object (typically with a 304 Not Modified response), and the browser uses the cached copy.
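The revalidation flow can be sketched as a small simulation. This is a simplified model under stated assumptions: times are plain integer seconds, the server is a hypothetical function comparing modification stamps, and all names are illustrative. Real HTTP carries this exchange in If-Modified-Since or ETag headers.

```python
# Simplified simulation of must-revalidate. Times are plain integers (seconds);
# real HTTP uses If-Modified-Since dates or ETags. All names are hypothetical.
from dataclasses import dataclass


@dataclass
class CachedCopy:
    body: str
    last_modified: int  # server's modification stamp when the copy was stored
    stored_at: int      # when the cache stored (or last revalidated) the copy
    max_age: int        # freshness lifetime in seconds


def origin(if_modified_since: int, modified: int, body: str):
    """Hypothetical server: 304 with no body if unchanged, else 200 with the body."""
    if modified <= if_modified_since:
        return 304, None
    return 200, body


def get(copy: CachedCopy, now: int, server_modified: int, server_body: str):
    """Return (source, body) for a request handled under must-revalidate."""
    if now - copy.stored_at < copy.max_age:
        return "cache", copy.body                  # still fresh: no server contact
    status, body = origin(copy.last_modified, server_modified, server_body)
    if status == 304:
        copy.stored_at = now                       # confirmed current; restart freshness
        return "cache after 304", copy.body        # only headers crossed the network
    copy.body, copy.last_modified, copy.stored_at = body, server_modified, now
    return "server", body                          # object changed: full download
```

Only the last branch downloads the object again; the 304 branch is where the time savings come from.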
To implement Cache-Control headers, open the Default Web Site's Properties dialog box. Then, click the HTTP Headers tab and click Add to open the Add/Edit Custom HTTP Header dialog box. In the Custom Header Name field, enter Cache-Control. In the Custom Header Value field, enter one or more Cache-Control attributes.
ASP Response Headers
You can place response headers at the top of an ASP page to control whether the system caches an object or how long the object remains fresh. The only shortcoming of these headers is that Netscape browsers display them on screen as text. Table 1 shows some examples of ASP response headers and their functions. When you use ASP response headers, be sure to place the code before any HTML or content-generating ASP code.
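Classic ASP sets these headers through its Response object (for example, Response.Expires, Response.ExpiresAbsolute, Response.CacheControl, and Response.AddHeader). The Python sketch below doesn't reproduce that object model. It's a hypothetical Response class that only illustrates why placement matters: once content has been written, as in an unbuffered response, the headers count as already sent, so adding another one fails.

```python
# Sketch of the "headers before content" rule. This hypothetical Response class
# treats headers as sent once any content is written; it is not the ASP object model.


class HeadersAlreadySent(Exception):
    """Raised when code tries to add a header after content has been written."""


class Response:
    def __init__(self) -> None:
        self.headers = []   # (name, value) pairs, emitted first
        self.body = []      # content chunks, emitted after the blank line

    def add_header(self, name: str, value: str) -> None:
        if self.body:
            raise HeadersAlreadySent(f"cannot add {name!r}: content already written")
        self.headers.append((name, value))

    def write(self, text: str) -> None:
        self.body.append(text)

    def render(self) -> str:
        head = "".join(f"{name}: {value}\r\n" for name, value in self.headers)
        return head + "\r\n" + "".join(self.body)
```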
A cache buster is a piece of HTML or script code that prevents caching by appending a random number to a URL. This number fools the browser into believing it's accessing different URLs when it's actually accessing the same page. Cache busters are common on sites that have banner advertising because revenue comes from client page hits, and any caching would make visits undetectable.
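The following Python sketch shows the core of a cache buster: it appends a random number to a URL's query string, so each generated URL looks new to the browser. The parameter name cb and the sample ad URL are hypothetical choices; existing query parameters are preserved.

```python
# Sketch: append a random number to a URL so the browser treats each
# request as a new address. The parameter name "cb" is a hypothetical choice.
import random
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def bust(url: str, param: str = "cb", rng=None) -> str:
    """Return url with a fresh random value in the query parameter `param`."""
    rng = rng or random
    parts = urlsplit(url)
    # Keep existing query items, dropping any earlier buster value.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != param]
    query.append((param, str(rng.randrange(10**9))))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    print(bust("http://ads.example.com/banner.gif?size=468"))
```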
Cache-buster code most often takes the form of a Common Gateway Interface (CGI) script, a Perl script, or another server-side script. A server-side cache buster is effective only if the client is already contacting the server for page updates, because the random number is generated when the server builds the page. You can also implement client-side cache-busting code. If a client has already read a page that lacks client-side code, that page contains static links and is cached as is. A page that has a client-side cache-buster script, by contrast, generates new links each time the browser displays it, so those links can't be served from the cache, regardless of whether the server is recontacted.
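The difference between the two approaches comes down to when the random number is generated. The following Python sketch simulates it under simplifying assumptions: a variable stands in for the cached page, and calling the generator on each view stands in for client-side script running in the browser. All names are hypothetical.

```python
# Simulation contrasting where the random number is generated. All names are
# hypothetical; "cached_page" stands in for a page the browser has cached.
import random


def banner_tag(rng: random.Random) -> str:
    """Return an image tag whose URL ends in a fresh random number."""
    return f'<IMG SRC="/banner.gif?rand={rng.randrange(10**9)}">'


rng = random.Random(42)

# Server-side: the tag is generated once, when the server builds the page.
# Later views come from the cache, so every view shows the same URL.
cached_page = banner_tag(rng)
server_side_views = [cached_page for _ in range(3)]

# Client-side: the cached page contains script that calls the generator on
# every display, so each view requests a different URL.
client_side_views = [banner_tag(rng) for _ in range(3)]

print(len(set(server_side_views)), len(set(client_side_views)))
```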
Phil Paxton's article "Cache No More" (http://www.learnasp.com/learn/cachenomore.asp) offers two examples of simple server-side scripts that produce random numbers and append them to a URL. Figure 5 shows a sample cache-busting script.
I don't encourage using server-side cache busters because they don't guarantee the delivery of a fresh page to the client. Further, they'll fill up the client's cache with multiple copies of the same page, possibly exceeding the cache disk-space limit and busting the caching mechanism.
Scratching the Surface
I've covered only some of the caching factors that you need to consider. (For additional caching tips, see the Web sidebar "More Caching Tips," InstantDoc ID 24272.) I also haven't scratched the surface of application cache tuning. Remember that the better you understand your users, their average access speeds, the pages they view, and the browsers they use, the better cache-tuning decisions you can make.