ROUND ROBIN can boost a hard-working Web server's performance while EASING its burden

A commercial I saw on TV the other day amused me. In an IBM e-Business commercial, two network engineer-types become overwhelmed with flame email because their Web server is too slow. Their conversation goes something like this:
Engineer #1: We should have upgraded the server.
Engineer #2: We can't. It's not scalable.
Engineer #1: Well, what should we do?
[Dramatic pause]
Engineer #2: Lock the door.

Although I'm sure IBM delivers good solutions, Big Blue is playing on people's fears in this commercial. Even if you buy a Web server that isn't scalable, you can boost its responsiveness by distributing the load across several systems. If you use basic technologies available in Windows NT, load distribution is easy.

Build a Better Mousetrap
Somebody once said, "If you build a better mousetrap, the world will beat a path to your door." Well, let's say the world consists of more than 30 million Internet users and your door is a single server on one IP address. With that setup, the world will blow your door off its hinges if everyone visits your site at once. However, if I must have a problem, I'd like it to be too much Web traffic.

What do you do if your Web server is overloaded? If you're not following the KISS (Keep It Simple, Sysadmin) principle with your Web server, start there: Pull off all non-Web-related services and put them on other servers. I am amazed at the number of newsgroup postings from people who are running a Primary Domain Controller (PDC) with Exchange Server, Proxy Server, SQL Server, Internet Information Server (IIS), Remote Access Service (RAS), Dynamic Host Configuration Protocol (DHCP), Windows Internet Naming Service (WINS), and Domain Name System (DNS) on one server with 64MB of RAM and wondering why they get lackluster performance. One great thing about NT and its domain-model infrastructure is that you can spread out tasks among different servers in the domain. By all means, take advantage of this capability.

What if you've stripped your Web server to the bare bones and you're still having performance problems? You must either scale your server or distribute your traffic load to lighten the burden on the server (for an example of how Windows NT Magazine's Web master handles load distribution on the magazine's Web server, see T.J. Harty, "Web Structure and Infrastructure," November 1997).

You don't need to consider high-end solutions such as clustering to accomplish basic load sharing. In this article I'll show you how to build load sharing into your system for just the cost of an additional server. Purchase a configuration that is identical to your existing Web server so that your second server has about the same power and capacity as your first, and then distribute the load of your Web traffic evenly between the two.

Round Robin DNS
If your Web site is configured the way most Web sites are, it probably looks like the setup in Figure 1, page 192, with one server handling all the site visitors. This common configuration works well initially, but as the site grows, the server starts to strain under the load of increased traffic.

You can't simply build another Web server that is identical to your first server because your two machines can't share the same IP address. Suppose your company has a DNS entry for www.yourcompany.com that is pointing to an IP address of 10.0.0.2. Making a new entry called www1.yourcompany.com and then pointing it to your new server at IP address 10.0.0.4 won't do you much good. What you need is one DNS name that points to two (or more) systems.

NT's DNS service can give you this capability. When you use DNS's round-robin entry function, you can list multiple IP addresses for any given site. The round-robin system lets groups of similarly configured systems act with clusterlike behavior without requiring significant technological investment or system retooling. DNS makes this functionality possible by keeping records on multiple systems that exist under the same host name but carry different IP addresses.

Here's how round robin works. When the DNS server receives a query for a given host name (e.g., www), it responds with all the available A (signifying Address) record addresses in sequence. When the server receives a second query for the same host name, the server shuffles the addresses before it replies: The first address from the first response moves to the end of the list, the second address from the first response becomes the first address in the second response, and all the other addresses move down the list in order. Suppose you have three Web servers, such as those shown in Figure 2, with the IP addresses 10.0.0.2, 10.0.0.4, and 10.0.0.6. If you define all these systems as A records with a host name of www, your DNS server responds to an initial query with all three addresses in the order listed in Figure 3. When the DNS server receives a second query for the same system, it shuffles the addresses, responding as shown in Figure 4.
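The rotation described above can be sketched in a few lines of Python. This is only a toy model of the behavior, not NT's DNS code; the class name RoundRobinZone and its methods are my own inventions for illustration.

```python
from collections import deque


class RoundRobinZone:
    """Toy model of a DNS server that rotates A records on every query."""

    def __init__(self):
        # host name -> deque of IP addresses, held in the order of the next reply
        self._records = {}

    def add_a_record(self, host, address):
        self._records.setdefault(host, deque()).append(address)

    def query(self, host):
        """Return every address for the host, then rotate for the next query."""
        addresses = self._records[host]
        reply = list(addresses)
        # Move the first address to the end; the others move up one place.
        addresses.rotate(-1)
        return reply


zone = RoundRobinZone()
for ip in ("10.0.0.2", "10.0.0.4", "10.0.0.6"):
    zone.add_a_record("www", ip)

print(zone.query("www"))  # ['10.0.0.2', '10.0.0.4', '10.0.0.6']
print(zone.query("www"))  # ['10.0.0.4', '10.0.0.6', '10.0.0.2']
```

After as many queries as there are addresses, the list returns to its starting order, so each server leads the reply about equally often.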

Microsoft's white paper, "DNS and Microsoft Windows NT 4.0," states: "If you make a [DNS] query via some mechanism ... the DNS server will send both IP addresses back, but the client will always use the first one." I performed some informal testing with a packet sniffer and a few bogus DNS entries, and I discovered that the statement in Microsoft's white paper doesn't always hold true. True, some applications, such as PING, will use only the first IP address a DNS server gives to them. However, Internet Explorer (IE, and presumably Netscape) attempts to use the first address it gets but will cycle through the remaining addresses if the first system isn't online or doesn't respond.

Although the ability of a client machine to cycle through IP addresses is a great fault-tolerant feature, keep in mind that this ability is a function of the client, not the server. Making round-robin DNS entries doesn't mean your system is fault tolerant. NT's DNS server doesn't know whether one of the systems assigned to a specific IP address is down--it will continue to hand out all the A record addresses it has and shuffle them with each additional request. If your client application (such as NT's command line-based FTP client) doesn't know how to deal with multiple addresses, and if the host listed in the first IP address of the DNS response isn't available, the client won't connect.
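To make the client-side nature of this behavior concrete, here is a rough Python sketch of a client that walks the address list the way the browsers in my test appeared to. I don't know how any browser implements this internally; the function name connect_with_fallback and the injectable connect parameter are assumptions for illustration.

```python
import socket


def connect_with_fallback(addresses, port, timeout=3.0,
                          connect=socket.create_connection):
    """Try each address in order; return (address, connection) for the first success.

    `connect` is injectable so the logic can be exercised without a network.
    """
    last_error = None
    for address in addresses:
        try:
            return address, connect((address, port), timeout)
        except OSError as exc:
            # This host is down or unreachable; move on to the next address.
            last_error = exc
    raise ConnectionError(
        f"none of {list(addresses)} accepted a connection"
    ) from last_error
```

A caller could obtain the address list with socket.gethostbyname_ex(host)[2] and pass it in. A client that uses only the first address, as PING does, amounts to the same loop cut off after one attempt.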

To set up a round-robin DNS entry for your Web site, enter two or more A record entries for the same host name within your site, and give them different IP addresses. Screen 1 shows an example of two Web servers, one at IP address 10.0.0.2 and another at address 10.0.0.4, that are both listed with the same www DNS entry.

If you're not using NT's DNS, you can probably still set up a round-robin configuration. Because round robin is defined in Request for Comments (RFC) 1794, it is a widely accepted function for most DNS servers. You can also use round robin if you use Berkeley Internet Name Domain (BIND) 4.9.3 or a later BIND version. Or, if your Internet Service Provider (ISP) provides name service for you, ask your ISP to set up a round-robin entry for your Web server.

What's in a CNAME?
Defining Canonical Name (CNAME) entries for Web servers is a common practice on the Internet. CNAMEs are often referred to as alias records, because they don't point directly to an IP address but to another fully qualified domain name (FQDN). It's not unusual to find configurations with Web servers defined with host names like www-server1, www-server2, and www-server3 and referenced by a CNAME record of www (Screen 2 shows this configuration). Microsoft recommends this configuration in "DNS and Microsoft Windows NT 4.0."

Although NT's DNS server uses round robin to shuffle CNAME records, it does so differently than it does for A records. NT's DNS server rotates through the available CNAME entries when it receives a query but transmits only one record at a time. Figure 5 and Figure 6, page 194, show this approach. The DNS server transmits the CNAME to the requester with the associated A record for the same system. In response to the first query, the server submits the record for www-server1 (as Figure 5 shows), and in response to the second query, it submits the record for www-server2 (as Figure 6 shows).

Through additional informal testing, I discovered that browsers don't behave in a fault-tolerant manner with CNAME records because they receive only one address in response to a request. For my test, I created a bogus www entry of 10.0.0.1 and then fired up a browser to search for it. When the browser detected that no Web server existed, it displayed the error message "Internet Explorer cannot open the Internet site http://www.netarchitect.com." But when I used A records to create round-robin entries, the browser cycled through all the addresses in the list to resolve the Web page request. Because browsers do not react to CNAME records with fault tolerance, define your Web servers using A records.

Implementing Round Robin
Before you try the round-robin solution for your Web server, note the following considerations. When you implement round robin, you'll change the behavior of your system, and such a change brings subtle side effects.

Time to live (TTL). First, consider how DNS servers cache host entries. When a name server tries to resolve a client's DNS request, the server might have to send out several requests to find the DNS server that is authoritative for the host specified. To help keep Internet traffic down, DNS servers cache the answers to their requests for a host. Thus, every DNS record of a host carries a TTL value that dictates how long a DNS server can keep that record in the cache. When the TTL expires, the DNS server must flush the record and submit a new request the next time it receives a query for the host information the discarded record contained.

If the DNS server that caches your information supports round-robin entries, TTL won't be a problem. The caching DNS server will rotate through the entries for one address, just as your DNS server does. But if the caching DNS server doesn't support round-robin entries, it will give out the same single cached address for all requests until the TTL expires. After the TTL expires, the DNS server will resolve and cache the address again when it receives another request for it.
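The second case is easy to picture with a small Python model of a caching server that does not rotate. It is a sketch under my own assumptions (the class name NonRotatingCache, the upstream callable, and the injectable clock are all hypothetical), not a description of any real DNS server.

```python
import time


class NonRotatingCache:
    """Toy caching resolver that keeps one address per host until the TTL expires."""

    def __init__(self, upstream, ttl_seconds, clock=time.monotonic):
        self._upstream = upstream   # callable: host -> list of addresses
        self._ttl = ttl_seconds
        self._clock = clock         # injectable so time can be simulated
        self._cache = {}            # host -> (address, expiry time)

    def resolve(self, host):
        now = self._clock()
        entry = self._cache.get(host)
        if entry is not None and now < entry[1]:
            # Still fresh: every client gets the same cached address.
            return entry[0]
        # Expired or never seen: ask the authoritative server again.
        address = self._upstream(host)[0]
        self._cache[host] = (address, now + self._ttl)
        return address
```

With a 60-minute TTL, every client behind such a cache lands on the same Web server for an hour, and only the next lookup after expiry picks up the rotated list.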

Microsoft sets a default TTL of 60 minutes for DNS servers. However, the DNS bible, DNS and BIND, by Paul Albitz and Cricket Liu (O'Reilly & Associates), does not recommend a TTL value as low as 60 minutes because so short a TTL increases DNS traffic. You probably won't want to set a TTL value any smaller than 60 minutes. However, if your company has already implemented DNS and has set a very large TTL value (a week is the largest value DNS and BIND recommends), consider decreasing the value if you're going to implement round-robin entries. If you don't, and one of your systems goes offline, a DNS server that doesn't support round robin could cache the dead IP address and keep feeding it to clients for up to a week. Similarly, if you use CNAMEs, nonresponsive CNAME address records could sit in a DNS server's cache for a week.

Web logs. The second consideration is your log files. Because most Web servers store log files locally on the machine running the HTTP service process, your log files could end up distributed among multiple systems. Obviously, that situation can make analysis more difficult. Fortunately, most client systems stick with one IP address after they resolve the name of your site, so you don't have to worry about individual user sessions being fragmented throughout your log files. However, to get an accurate picture of your site's overall traffic, you must either point all your Web servers to a centralized log file, find out whether your statistics program can compensate for the multiple log files, or manually join the files before you analyze them. Modern log analysis tools, such as the products available from WebTrends (http://www.webtrends.com), suggest handling this problem by storing logs in an Open Database Connectivity (ODBC) data source that you can sort before the analysis.
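If you choose to join the files manually, a short script can do the work. The Python sketch below assumes each log line begins with a sortable timestamp such as 1998-03-01 12:00:05 and that each file is already in time order; your server's log format may differ, and the function name merge_logs is hypothetical.

```python
import heapq


def merge_logs(paths, output_path):
    """Merge several time-sorted log files into one time-sorted file.

    Assumes every line starts with a sortable timestamp, so plain string
    comparison puts the lines in chronological order.
    """
    files = [open(path, "r", encoding="utf-8") for path in paths]
    try:
        with open(output_path, "w", encoding="utf-8") as out:
            # heapq.merge streams its inputs, so large logs need not fit in memory.
            out.writelines(heapq.merge(*files))
    finally:
        for handle in files:
            handle.close()
```

Because the merge is streamed, the script reads each log only once, and the combined file can go straight into whatever analysis tool you already use.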

Consistent data. A final consideration is the consistency of your data. You need consistent data across all the systems you define in a round-robin configuration: Users will be confused if they get varying results from your Web site every time they visit. How you organize that data on your systems also matters when you are working with Web servers. For example, file dates and times can be important in the interaction between your Web servers and your visitors, depending on how visitors' browsers are configured. Say you have two index.html files on two Web servers, and the files are similar except that the last modified dates don't match. If a Web visitor finds your first server, which has the index.html file with the older date on it, the visitor's browser will load this file as normal. However, if this visitor comes back later and tries to load the index.html page from your second server, the more recent date on the second index.html file will cause the browser to conclude that the entire page is new, and it will download the whole file again. Although this situation isn't critical in terms of the delivery of correct data, it can eat up bandwidth unnecessarily.

You can solve the problem of data inconsistency by creating a central HTML location and then copying data to all your other Web servers from that location. When you do so, the last modified dates for all your HTML, .jpg, and .gif files will be identical, and browsers that cache locally will help reduce your bandwidth requirements.
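One way to do the copying is with a small script that preserves file dates. The Python sketch below is a minimal example under my own assumptions: the function name publish is hypothetical, and the server directories would in practice be shares or mapped drives on your Web servers.

```python
import shutil
from pathlib import Path


def publish(master_dir, server_dirs):
    """Copy every file under master_dir to each server directory, keeping mtimes."""
    master = Path(master_dir)
    for source in master.rglob("*"):
        if not source.is_file():
            continue
        relative = source.relative_to(master)
        for server_dir in server_dirs:
            target = Path(server_dir) / relative
            target.parent.mkdir(parents=True, exist_ok=True)
            # copy2 copies the contents and the metadata, including the
            # last-modified time, so every server reports the same date.
            shutil.copy2(source, target)
```

The important detail is that the copy carries the last modified time along with the contents; a copy method that stamps each file with the time of the copy would reintroduce the mismatch.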

If It's Good Enough for Microsoft...
Overall, round-robin DNS entries are a great way to enhance Web server functionality at little or no cost. Microsoft uses round robin with its Web servers. The last time I checked, the company was running about 16 servers (including a server in Europe and a server in Japan), all of which responded to http://www.microsoft.com. When you consider that Microsoft's Web site gets over 55 million hits a day, it's surprising and impressive that a mere 16 systems can handle all that traffic.