As the e-commerce industry continues to grow, more businesses rely on their Web sites to communicate with customers. A high-performance Web site that quickly and reliably delivers content gains and retains customers and is crucial to a successful and competitive e-business. Few customers will return to a Web site after they experience significant delays or failures. Thus, as part of your organization's Web infrastructure planning and implementation, you need to seriously consider how to improve your Web site's performance.
You can use several methods to improve Web performance: expand Internet bandwidth, use fast network equipment, design efficient Web applications, optimize and upgrade Web server software and hardware, and use Web-caching technology. (For information about Web caching, see "Surfing Web-Caching Technology, Part 1," September 1999 and "Surfing Web-Caching Technology, Part 2," October 1999.) In addition to these options, you can improve your Web site's performance by adding Web servers and sites and mirroring content across all servers and sites. This method lets you share the overall load among servers and sites and reduce the information turnaround time involved in a server's internal processing of client requests. In addition, you can preserve your existing servers rather than retire them to make way for new servers.
Load sharing, or balancing, on multiple servers ensures that Web traffic doesn't overload one server while other servers sit idle. To load balance Web servers, traditionally you use the DNS round-robin feature to evenly distribute Web server IP addresses to clients; thus, your Web servers are equally accessible. However, this mechanism doesn't provide load balancing in an environment in which the Web servers have different hardware and software capacities. For example, a Windows 2000 Server (Win2K Server) system with two 450MHz Pentium III processors and 1GB of memory should handle more load in a load-balanced environment than a Windows NT Server system with one 300MHz Pentium II processor and 256MB of memory. However, DNS's round-robin feature treats these two systems equally. The feature also knows nothing about a Web server's availability because it doesn't detect whether the server is up or down. (For more information about the DNS round-robin feature's limitations, see Douglas Toombs, "Load Sharing for Your NT Web Server," April 1998.)
Recently, vendors developed load balancers, which are products that balance load evenly across multiple servers. In addition, a load balancer ensures Web servers' fault tolerance by redirecting traffic and clients to another server or site in case of failure. Therefore, clients experience fewer delays and no failures. You can use load balancers in single Web site and multiple Web site scenarios. Knowing what a load balancer is and how it works will help you identify important features to consider and evaluate when choosing a load balancer.
What Is a Load Balancer?
A Web server load balancer is a tool that directs a client to the least busy or most appropriate Web server among several servers that contain mirrored contents. The client transparently accesses this set of servers as one virtual server. For example, suppose you have two Web servers in a single Web site scenario: web1.acme.com with the IP address 188.8.131.52 and web2.acme.com with the IP address 184.108.40.206, as Figure 1 shows. The load balancer uses a virtual host name (e.g., www.acme.com) and virtual IP (VIP) address (e.g., 220.127.116.11) to represent the Web site. You associate the virtual host name and the corresponding VIP address with the two Web servers by publishing the virtual host name and its VIP address in your DNS server. The load balancer constantly monitors the load and availability of each Web server. When a client accesses www.acme.com, the request goes to the load balancer instead of a Web server. Based on the load of each monitored server and conditions and policies you've defined, the load balancer decides which server should receive the request. The load balancer then redirects the request from the client to the server and usually redirects the reply from the server to the client depending on the implementation.
Load balancers can also support load balancing across multiple Web sites. Implementing multiple sites places mirrored Web servers close to customers and reduces delay between your Web site and customers. In addition, multiple Web sites provide load balancing, high availability, and fault tolerance in case of a complete site failure (e.g., a power or Internet connection outage at a data center). In a multisite scenario, which Figure 2 shows, every load balancer at every site has the same virtual host name but a different VIP address. For example, load balancer 1 at site 1 in New York has the virtual host name www.acme.com and VIP address 18.104.22.168. Load balancer 2 at site 2 in Los Angeles has the same virtual host name but a different VIP address—22.214.171.124. You associate each load balancer with its local Web servers using the same method you use in a single-site scenario. In addition to monitoring the load of local servers, the load balancers exchange site configuration and load information and check site availability with load balancers at other sites. Thus, each load balancer has the global load and availability information locally. Load balancers in multisite scenarios also often work as the DNS server for the virtual host name. When a load balancer receives a DNS lookup from a client for the virtual host name, the load balancer returns to the client the VIP address of the best site according to the current site load, client proximity, and other conditions. The client then transparently accesses that site.
Three types of load balancers exist: hardware appliances, network switches, and software. A hardware appliance-based load balancer is a closed box that is usually an Intel-based machine running a vendor's load-balancing software and a UNIX or proprietary OS. Hardware appliance-based load balancers provide a Plug and Play (PnP) solution for Web administrators. A network switch-based load balancer uses a Layer 2 or Layer 3 switch to integrate the load-balancing service. Unlike an appliance-based load balancer, this device doesn't require an add-on box between the switch and the Web servers. A software-based load balancer doesn't require you to modify network connectivity or equipment when you introduce the load-balancing service to your Web server farm. You can install the software on existing Web servers or dedicated load-balancing servers. (For a list of Web server load balancer vendors and products, see "Web Server Load Balancer Resources," page 70. The sidebar "Microsoft's Load-Balancing Services" explores Microsoft's load-balancing solutions.)
Regardless of which product category a load balancer belongs to, it fulfills the following three functions: monitoring server load and health, selecting the right server for a client, and redirecting traffic between the client and server. Let's look at each of these functions and learn how load balancers implement them.
A load balancer constantly monitors the load and health of managed Web servers so that it can use the load and health information to select the best available server to respond to a client request. Load balancers use two methods to monitor servers: external monitoring and internal monitoring.
To externally monitor a server, the load balancer calculates a server's response time by sending a request to the server and waiting for a response. Using an Internet Control Message Protocol (ICMP) ping is the simplest way for a load balancer to externally monitor a server. An ICMP ping tests a server's availability and the round-trip time between the server and load balancer. If the load balancer doesn't receive a response from the server after several consecutive pings, the load balancer assumes that the server isn't available. Administrators usually connect Web servers directly to the load balancer, so network delay is negligible; if the round-trip response time is long, the load balancer infers that the server is very busy.
However, an ICMP ping tests only a server's IP stack; it can't monitor the health of the TCP stack that HTTP uses. To verify that a server's TCP stack works, the load balancer attempts to establish a TCP connection, which requires a three-way handshake, with the server. In a three-way handshake, the load balancer sends the server a TCP packet with the SYN bit set to 1. If the load balancer receives back from the server a TCP packet with the SYN bit set to 1 and the ACK bit set to 1, the load balancer sends another TCP packet with the SYN bit set to 0 and the ACK bit set to 1. A completed handshake means that the server's TCP stack is healthy. After completing the handshake, the load balancer immediately drops the connection to save server resources. The load balancer can estimate a server's TCP connection performance based on the time the three-way handshake takes to complete.
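The TCP check described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the function name tcp_check and the 2-second default timeout are assumptions, and the operating system performs the SYN, SYN/ACK, ACK exchange on the script's behalf.

```python
import socket
import time
from typing import Optional


def tcp_check(host: str, port: int = 80, timeout: float = 2.0) -> Optional[float]:
    """Return the TCP connect time in seconds, or None if the connect fails.

    Hypothetical helper: a successful connect means the three-way
    handshake completed, so the server's TCP stack answered.
    """
    start = time.monotonic()
    try:
        # create_connection returns only after the handshake completes.
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.monotonic() - start
        # Leaving the "with" block closes the connection right away,
        # mirroring the "drop the connection immediately" step.
    except OSError:
        return None
    return elapsed
```

A monitor would typically call such a function on a schedule and treat several consecutive None results as a failed server, in the same way as the consecutive-ping rule for ICMP.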
In addition to testing the protocol stacks, a sophisticated load balancer can monitor the response time and availability of a Web server and its applications by making an HTTP request for content or a URL. For example, suppose web1.acme.com's home page is index.htm. The load balancer in Figure 1 can initiate an HTTP Get command asking for the content of index.htm on web1.acme.com. If the load balancer receives from the Web server a return code of 200, the home page on web1.acme.com is available. The load balancer measures the response time by measuring the time between sending the content request and receiving the return code.
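As a rough sketch of this kind of content check, the following Python function requests a page and reports whether the return code was 200 and how long the exchange took. The function name http_check, its parameters, and the 3-second timeout are illustrative assumptions; real load balancers expose this check through their own configuration.

```python
import http.client
import time
from typing import Optional, Tuple


def http_check(host: str, port: int = 80, path: str = "/index.htm",
               timeout: float = 3.0) -> Tuple[bool, Optional[float]]:
    """Issue an HTTP GET and return (healthy, response_time_seconds).

    Hypothetical helper: healthy is True only for a 200 return code;
    response_time is None when no HTTP response arrived at all.
    """
    start = time.monotonic()
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        status = conn.getresponse().status
    except (OSError, http.client.HTTPException):
        return False, None
    finally:
        conn.close()
    return status == 200, time.monotonic() - start
```

For the example in the text, a call such as http_check("web1.acme.com") would request index.htm from the first Web server.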
Although external monitoring lets you ascertain useful information, it provides limited or no information about several important aspects of a server's status, including CPU, memory, system bus, I/O bus, NIC, and other system and application resources. Only internal monitoring can provide such detailed server load information. To internally monitor a server, the load balancer uses an internal-monitoring agent, which physically resides in each server, monitors a server's status, and reports the status to the load balancer. Some vendors provide scripting tools that let you write internal monitoring utilities for your Web applications. Internal monitoring is common for software-based load balancers, but few appliance- and switch-based load balancers use internal monitoring.
A load balancer can use the information from externally and internally monitoring a server to select which server is best for handling a client request. If all servers have the same hardware and software capacity, you can configure the load balancer to use a round-robin system to select a server based on the servers' status. However, if a load balancer manages a server with a Pentium III processor and a server with a Pentium Pro processor, you can configure the load balancer to redirect more traffic to the more powerful server. This setup is a weighted round-robin configuration.
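One simple way to picture a weighted round-robin configuration is a repeating schedule in which each server appears as many times as its weight. The sketch below assumes hypothetical server names and weights (3 for the faster machine, 1 for the slower one); a production scheduler would also skip servers that monitoring has marked as down and would usually interleave the picks more smoothly.

```python
from itertools import cycle
from typing import Dict, Iterator


def weighted_round_robin(weights: Dict[str, int]) -> Iterator[str]:
    """Yield server names forever, each in proportion to its weight."""
    # Expand {"web1": 3, "web2": 1} into ["web1", "web1", "web1", "web2"].
    schedule = [name for name, weight in weights.items() for _ in range(weight)]
    return cycle(schedule)


# Assumed weights: the Pentium III server takes three requests
# for every one request sent to the Pentium Pro server.
picker = weighted_round_robin({"web1": 3, "web2": 1})
first_eight = [next(picker) for _ in range(8)]
```

With equal weights the same function degenerates to plain round-robin.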
A sophisticated load balancer lets you specify a custom policy of server selection. For example, you can configure the policy to include CPU utilization, memory utilization, number of open TCP connections, and number of packets transferred on a server's NIC. Your load balancer's load formula might look like (10 × CPU utilization) + (3 × memory utilization) + (6 × the number of open TCP connections) + (3 × the number of transferred packets) = a server's load. When it receives a client request, the load balancer calculates the load for each server according to the formula and redirects the request to the server with the lightest load.
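The example formula translates directly into code. In the sketch below, the weights 10, 3, 6, and 3 come from the formula above; the ServerStats fields, their units, and the sample numbers are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ServerStats:
    cpu_util: float       # percent, assumed unit
    mem_util: float       # percent, assumed unit
    open_tcp_conns: int
    packets: int          # packets transferred on the NIC in the sample window


def load_score(s: ServerStats) -> float:
    """Apply the article's example weights: 10, 3, 6, and 3."""
    return 10 * s.cpu_util + 3 * s.mem_util + 6 * s.open_tcp_conns + 3 * s.packets


def pick_server(stats: Dict[str, ServerStats]) -> str:
    """Return the name of the server with the lightest computed load."""
    return min(stats, key=lambda name: load_score(stats[name]))


# Hypothetical snapshot reported by the monitoring agents.
snapshot = {
    "web1": ServerStats(cpu_util=80, mem_util=50, open_tcp_conns=10, packets=100),
    "web2": ServerStats(cpu_util=20, mem_util=40, open_tcp_conns=30, packets=100),
}
chosen = pick_server(snapshot)  # web1 scores 1310, web2 scores 800
```

Because the metrics use different scales, the weights in a real policy need tuning so that one term (here, the packet count) doesn't dominate the score.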
In some cases, after the load balancer assigns a server to a client and the server and client make an initial connection, an application requires the load balancer to persistently send the client's traffic to that server. This connection is a persistent or sticky connection. For example, a user is shopping in an online bookstore and puts three books in the shopping cart. If the server that processes that client's request caches the shopping cart information locally, the load balancer can't switch the client's new traffic to another server even if the load becomes unbalanced across a site's servers. Otherwise, the three books in the client's shopping cart will be lost because the new server doesn't have the client's shopping cart information. Therefore, the load balancer must remember which client is accessing which server for a certain amount of time that you define based on your customers' behavior and applications. If you enable a load balancer's persistent feature, this feature will always override other load-balancing policies.
The key to maintaining a persistent connection is to find out a client's identity and bind this identity to a destination server. The load balancer usually uses the client's source IP address as the client's identity. However, the client's source address might not be the client's real IP address. Many companies and ISPs use proxy servers to control Web traffic and hide their users' IP addresses. Thus, if 500 clients access your Web site from America Online (AOL) and 10 clients access your Web site from another company, the server load will be unbalanced because the load balancer will bind all 500 AOL clients that have the same source address to one server and the other 10 clients to another server. To overcome this disadvantage, a load balancer that supports source IP address and TCP port number binding can distinguish clients even if the clients are using the same proxy server. The load balancer can make this distinction because each TCP connection has a unique source IP address and TCP port number. Another way to identify a client if the client is using a secure HTTP session is to monitor a Secure Sockets Layer (SSL) session ID. The SSL protocol assigns an ID to an established SSL session, and online shopping applications often use SSL. The most recent innovation to support a persistent connection is the Web cookie, which contains a client's identity and other information, such as which server the client last accessed. By examining Web cookies' content, a load balancer can better identify clients and select the appropriate server for them. Cookie-aware load balancer vendors include Alteon WebSystems, ArrowPoint Communications, F5 Networks, and Resonate.
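A persistence table of this kind can be sketched as a dictionary that maps a client identity to a server and a timestamp. Everything named here (StickyTable, server_for, the injectable clock) is hypothetical. The client key can be whatever identity the load balancer supports: a source IP address, a (source IP, TCP port) pair, an SSL session ID, or a cookie value.

```python
import time
from typing import Callable, Dict, Hashable, Tuple


class StickyTable:
    """Remember which server each client identity is bound to."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds      # how long a binding survives without traffic
        self.clock = clock          # injectable so the timeout can be tested
        self._bindings: Dict[Hashable, Tuple[str, float]] = {}

    def server_for(self, client_key: Hashable, choose: Callable[[], str]) -> str:
        """Return the bound server, or bind a new one chosen by choose()."""
        now = self.clock()
        entry = self._bindings.get(client_key)
        if entry is not None and now - entry[1] < self.ttl:
            server = entry[0]       # binding still valid: persistence wins
        else:
            server = choose()       # no binding or expired: normal policy applies
        self._bindings[client_key] = (server, now)  # refresh the idle timer
        return server
```

Keying the table on the source IP address alone reproduces the proxy problem described above, because every client behind one proxy shares a key; keying on (source IP, TCP port) separates the connections.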
In another server-selection method, immediate binding, load balancers can choose a server for a client and send the client to the server as soon as the load balancer receives the client's TCP SYN packet. A load balancer bases the server selection on server load-balancing policies and the IP address and TCP port numbers in the client's TCP SYN packet. Although this method is fast, a load balancer doesn't have time to ascertain other information, such as the SSL session ID, cookie, URL, or application data. To learn more about the client and make a better decision, the load balancer needs time to peek into application-layer information. In the delayed-binding method of server selection, the load balancer waits to make a server selection until the TCP three-way handshake is complete and the load balancer and client establish a connection. The load balancer becomes content-aware by examining the application-layer information before selecting a server.
A load balancer can use several methods to redirect client traffic to the chosen server: media access control (MAC) address translation (MAT), Network Address Translation (NAT), or, for delayed binding, a TCP gateway mechanism. Let's explore how load balancers use each method to redirect traffic.
MAT. A load balancer that uses this method requires each Web server to use the load balancer's VIP address as a loopback interface address, in addition to the Web server's physical IP address. When the load balancer receives a client packet and makes a server selection, the load balancer replaces the destination MAC address in the client packet with the chosen server's MAC address and sends the packet to the server. The packet contains the client's IP address, so to directly reply to the client, the server uses the original client IP address as the destination IP address. However, the server uses the load balancer's VIP address as the source IP address, as if the traffic to the client is from the load balancer. In this way, the client's next packet goes to the load balancer rather than to the server that replied to the client.
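The MAT behavior can be modeled with a toy frame that carries only the fields the description mentions. This is a conceptual sketch: the Frame type, the function names, and the sample addresses are assumptions, and real devices rewrite Ethernet headers in hardware or in the kernel rather than in Python.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Frame:
    dst_mac: str
    src_ip: str
    dst_ip: str


def mat_forward(frame: Frame, server_mac: str) -> Frame:
    """Load balancer step: change only the destination MAC address."""
    return replace(frame, dst_mac=server_mac)


def server_reply(request: Frame, next_hop_mac: str) -> Frame:
    """Server step: reply directly to the client, sourced from the VIP.

    The request still carries the VIP as its destination IP address,
    which the server accepts because the VIP is on its loopback interface.
    """
    return Frame(dst_mac=next_hop_mac,
                 src_ip=request.dst_ip,   # the VIP
                 dst_ip=request.src_ip)   # the original client


# Hypothetical addresses: client 203.0.113.7, VIP 192.0.2.10.
request = Frame(dst_mac="lb-mac", src_ip="203.0.113.7", dst_ip="192.0.2.10")
forwarded = mat_forward(request, server_mac="web1-mac")
reply = server_reply(forwarded, next_hop_mac="router-mac")
```

Note that the IP addresses never change on the way to the server, which is why the reply can bypass the load balancer.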
NAT. Using the NAT method, a load balancer replaces a received client packet's destination address (i.e., the load balancer's VIP address) with the chosen server's IP address and replaces the source IP address with the load balancer's VIP address before the load balancer redirects the packet to the chosen server. When the load balancer redirects a server packet to the client, the load balancer replaces the destination IP address with the client's IP address and the source IP address with the load balancer's VIP address. This method hides the Web servers' IP addresses from clients, so the Web servers can use any IP addresses, including private addresses. The Web servers don't need to directly connect to the load balancer (i.e., use the same LAN segment) as long as the servers and the load balancer can reach one another through a static-routing or network-routing protocol.
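The two NAT rewrites can be illustrated with a toy packet that holds only source and destination IP addresses. The Packet type, the function names, and the sample addresses are assumptions; the sketch also omits the connection table a real load balancer needs (typically keyed on TCP port numbers) to remember which client a server's reply belongs to.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str


def nat_to_server(pkt: Packet, vip: str, server_ip: str) -> Packet:
    """Inbound: destination VIP -> server address, source -> VIP."""
    return replace(pkt, src_ip=vip, dst_ip=server_ip)


def nat_to_client(pkt: Packet, vip: str, client_ip: str) -> Packet:
    """Outbound: destination -> client address, source -> VIP."""
    return replace(pkt, src_ip=vip, dst_ip=client_ip)


# Hypothetical addresses: client 203.0.113.7, VIP 192.0.2.10,
# Web server on the private address 10.0.0.11.
inbound = nat_to_server(Packet("203.0.113.7", "192.0.2.10"),
                        vip="192.0.2.10", server_ip="10.0.0.11")
outbound = nat_to_client(Packet("10.0.0.11", "192.0.2.10"),
                         vip="192.0.2.10", client_ip="203.0.113.7")
```

The client only ever sees the VIP, which is what lets the servers sit on private addresses.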
TCP gateway. For immediate binding, load balancers can use the MAT or NAT method to redirect traffic at Layer 2 or Layer 3. However, for delayed binding, load balancers have to redirect traffic at the TCP layer and above. For delayed binding, the load balancer and client establish a TCP connection so that the load balancer can receive application data before it makes a server selection. Next, the load balancer sets up a TCP connection with the chosen server and passes the client request to the server through this connection. The load balancer then passes the server's response to the client through the load balancer and client TCP connection. This function is referred to as a TCP gateway. Resonate implements this function in its load balancer product through an agent on the server that permits a direct TCP connection between the client and the server that is acting as the load balancer. The vendor calls this implementation TCP connection hop.
WEB SERVER LOAD BALANCER RESOURCES
HARDWARE APPLIANCES:
BIG/ip and 3DNS
F5 Networks * 206-505-0800 or 888-882-4447
LocalDirector and DistributedDirector
Cisco Systems * 800-553-6387
Web Server Director product family
RADWARE * 888-234-5763
ACEdirector, Alteon 180, and Alteon 700 Series
Alteon WebSystems * 408-360-5500 or 888-258-3661
CS-100 and CS-800
ArrowPoint Communications * 978-206-3000
Foundry Networks * 408-586-1700
Central Dispatch and Global Dispatch
Resonate * 408-548-5500
Lightspeed Systems * 661-324-4291
Windows NT Load Balancing Service, Network Load Balancing, Application Center Server
Microsoft * 425-882-8080
"The 2000 Internet Traffic Management Report"
Internet Research Group
"Virtual Resource Management: Key Technologies, Tricks of the Trade, and Application Requirements" and "Virtual Resource Management: Which Vendor is Right For You?"
Acuitive * 925-456-3210
In a multiple-mirrored site scenario, the load balancer (aka the global load balancer) uses the same server-selection mechanisms as in a single-site scenario to choose the best site for a client. In addition, a global load balancer can use client proximity (i.e., network hops and network latency) between the site and the client as an element in site selection. To make this selection, the load balancer often uses an intelligent DNS function to redirect the client traffic to the appropriate site.
For example, www.acme.com has two sites, one in New York and one in Los Angeles, and the load balancer at each site works as a DNS server for www.acme.com. The authoritative DNS server for the Internet domain acme.com provides name resolution for FTP, mail, and other Internet servers and hosts. You can delegate the subdomain www.acme.com of the acme.com Internet domain to each load balancer; these load balancers become name servers for www.acme.com. To set up this configuration, define a DNS entry of www.acme.com in each load balancer and map the entry to the load balancer's local VIP address. The two global load balancers exchange configuration and load information, so both load balancers are aware that two VIP addresses (i.e., two sites) exist for www.acme.com. Thus, they know the load and availability of each site.
As Figure 3 shows, when a client at AOL tries to access www.acme.com, the client requests that AOL's local DNS server look up the IP address of the host name www.acme.com. If AOL's local DNS server doesn't have cached information about the requested host IP address, the server sends the request to acme.com's authoritative DNS server. Acme.com's DNS server delegated www.acme.com to two load balancers, so acme.com returns to AOL's local DNS server the two load balancers' IP addresses as www.acme.com's name servers. (In Figure 3, I used a separate box to highlight the intelligent DNS server service. Some vendors implement this technology in a separate server.) AOL's local DNS server then sends the DNS lookup request to one of the two load balancers. The two load balancers are name servers, so AOL's local DNS server will resend the request to the other load balancer if the first one doesn't respond. The load balancer returns to AOL's local DNS server a VIP address based on the site load-balancing criteria. After the client receives a VIP address for www.acme.com from AOL's local DNS server, the client sends the HTTP traffic to the load balancer of the chosen site (e.g., New York). The load balancer in New York then selects the local server for the client. Because the local DNS server caches a resolved DNS record according to the record's Time to Live (TTL) value, most vendors suggest that you keep the TTL value of a VIP low so that clients can quickly receive a new VIP address and switch to another available site.
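The decision a global load balancer makes when it answers a DNS lookup can be sketched as a scoring function over the sites it knows about. The Site fields, the scoring rule (latency plus a weighted load term), the load_weight default, and the sample VIP addresses are all assumptions for illustration; vendors combine load, proximity, and availability in their own ways.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Site:
    name: str
    vip: str
    available: bool
    load: float         # 0.0 (idle) to 1.0 (saturated); assumed metric
    latency_ms: float   # assumed proximity measure toward the requester


def resolve_vip(sites: List[Site], load_weight: float = 100.0) -> Optional[str]:
    """Return the VIP of the best available site, or None if all are down."""
    candidates = [s for s in sites if s.available]
    if not candidates:
        return None
    # Lower score is better: nearby and lightly loaded sites win.
    best = min(candidates, key=lambda s: s.latency_ms + load_weight * s.load)
    return best.vip


# Hypothetical state exchanged between the two load balancers.
sites = [
    Site("New York", "192.0.2.10", available=True, load=0.9, latency_ms=20),
    Site("Los Angeles", "198.51.100.10", available=True, load=0.2, latency_ms=70),
]
answer = resolve_vip(sites)  # New York scores 110, Los Angeles scores 90
```

The answer would be returned with a low TTL so that a later lookup can pick a different site if conditions change.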
Alternatively, load balancers can use HTTP redirection for global site selection and traffic redirection. This method doesn't use the load balancer's DNS function. Instead, following the www.acme.com example, you define in your authoritative acme.com DNS server the www.acme.com DNS record and its VIP addresses. When a client resolves www.acme.com and sends the HTTP request to a load balancer, the load balancer chooses the best site for the client. If the chosen site is a remote site rather than the load balancer's own site, the load balancer sends an HTTP redirection command to the client's browser, which then accesses the chosen site. This method lets the load balancer learn more about the client (e.g., the client's IP address) before the load balancer makes a site selection. However, because the authoritative DNS server doesn't track site availability, the client might receive the VIP address of a failed site from the DNS server and try to access that site.
In addition to dynamically assigning a site to a client, load balancers can use a static mapping method to bind a specific client to a specific site. For example, suppose you have a mirrored Web site in Europe. You want European clients to access only the European site unless the site is down and the load balancer fails over the European traffic to your US site. In the load balancer, you can statically define that a request from a European IP address goes to the European site first. (To configure this setup, you must manually enter the European IP address blocks in the load balancer.) When the load balancer sees a European address, it redirects the traffic to the European site before it applies other rules.
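A static mapping rule of this kind amounts to a lookup of the client's address in a table of address blocks, checked before any dynamic policy. The sketch below uses Python's standard ipaddress module; the function name, the site names, and the sample block (a documentation range standing in for real European address blocks) are assumptions.

```python
import ipaddress
from typing import Dict, List, Optional


def static_site_for(client_ip: str,
                    static_map: Dict[str, List[str]],
                    site_up: Dict[str, bool]) -> Optional[str]:
    """Return the statically mapped site for a client, if it is up.

    Returns None when no rule matches or the mapped site is down,
    in which case the caller falls back to its other rules.
    """
    addr = ipaddress.ip_address(client_ip)
    for site, blocks in static_map.items():
        if not site_up.get(site, False):
            continue  # mapped site is down: let traffic fail over
        if any(addr in ipaddress.ip_network(block) for block in blocks):
            return site
    return None


# Hypothetical table: 203.0.113.0/24 stands in for the manually
# entered European address blocks.
static_map = {"europe": ["203.0.113.0/24"]}
site = static_site_for("203.0.113.25", static_map, {"europe": True})
```

When the function returns None, the load balancer would continue with its normal site-selection rules, which covers the failover to the US site.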
Load Balancer Redundancy
A load balancer has the potential to become a single point of failure in a Web site because it serves as a front end for the back-end Web servers. When you design and implement a load-balancing solution, consider the load balancer's fault tolerance and choose a fast load balancer for good performance. You can choose between the two types of load-balancer redundancy: active-and-standby and active-and-active. Both methods use two load balancers at one site.
In the active-and-standby method, a backup load balancer constantly monitors the primary load balancer. When the primary load balancer is unavailable, the backup load balancer takes over the function of the primary load balancer (i.e., the backup load balancer handles traffic). When the primary load balancer comes back online, the backup load balancer transfers traffic to the primary load balancer and returns to standby mode.
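The standby unit's behavior can be sketched as a small state machine driven by periodic checks of the primary. The class name, the method name, and the rule of waiting for three consecutive missed checks before taking over are assumptions; the text above says only that the backup takes over when the primary is unavailable.

```python
class StandbyMonitor:
    """Track whether the backup load balancer should be handling traffic."""

    def __init__(self, max_missed: int = 3) -> None:
        self.max_missed = max_missed  # assumed threshold before takeover
        self.missed = 0
        self.active = False           # False means standby mode

    def on_check(self, primary_alive: bool) -> bool:
        """Record one health check of the primary; return True if active."""
        if primary_alive:
            self.missed = 0
            self.active = False       # primary is back: return to standby
        else:
            self.missed += 1
            if self.missed >= self.max_missed:
                self.active = True    # take over the primary's function
        return self.active


monitor = StandbyMonitor()
states = [monitor.on_check(alive) for alive in (True, False, False, False, True)]
```

In this run the backup becomes active on the third consecutive missed check and returns to standby as soon as the primary answers again.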
In the active-and-active setup, both load balancers serve traffic and back each other up. For example, suppose you have four Web servers at a site. The first load balancer serves two Web servers, and the second load balancer serves the other two servers. When one load balancer is down, the other load balancer serves all four Web servers. This method fully utilizes load balancer resources and improves performance.
Balance Your Environment
Web hosting and e-services companies are not the only organizations that are using load balancers to direct traffic and maintain order. Many companies have adopted load balancers for their Web sites to improve Web performance and availability. Through their ability to monitor server load and health, select the best available server for clients, and redirect traffic in local sites and global environments, load balancers have become an important avenue to meet the demands of the competitive e-business market.