HTTP 1.1 Support
The most recent version of HTTP improves performance significantly over
previous versions by supporting persistent connections. In the widely
implemented HTTP 1.0, retrieving multiple objects on a Web page (e.g., text,
graphics, audio clips) requires a separate TCP connection for each object. HTTP
1.1 removes this requirement through persistent connections, which let you
retrieve multiple objects over one TCP connection. This reduced
overhead can improve end-to-end performance dramatically (assuming that HTTP was
a bottleneck). A proxy has two connections, one from the Web client to the proxy
and another from the proxy to the Web server. Because the proxy is an active
participant in mapping between the two connections, Proxy Server must support
the new version of the protocol to reap the benefits.
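
To make the persistent-connection idea concrete, here is a minimal Python sketch rather than anything specific to Proxy Server. It starts a throwaway local HTTP 1.1 server and fetches three objects over one TCP connection. The handler class, the `fetch_all` helper, and the object paths are illustrative assumptions; the server records the client's source port for each request so you can confirm that all three requests arrived on the same connection.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Handler(BaseHTTPRequestHandler):
    # Declaring HTTP/1.1 tells http.server to keep the connection open
    # between requests (it then requires a Content-Length on each response).
    protocol_version = "HTTP/1.1"
    ports_seen = []  # client source port observed for each request

    def do_GET(self):
        body = f"object {self.path}".encode()
        Handler.ports_seen.append(self.client_address[1])
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_all(host, port, paths):
    """Fetch every path over ONE TCP connection (hypothetical helper)."""
    conn = http.client.HTTPConnection(host, port)
    bodies = []
    try:
        for path in paths:
            conn.request("GET", path)
            # Read the whole response before reusing the connection.
            bodies.append(conn.getresponse().read())
    finally:
        conn.close()
    return bodies


# Port 0 asks the OS for any free port; the paths are made-up page objects.
server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
bodies = fetch_all("127.0.0.1", server.server_address[1],
                   ["/page.html", "/logo.gif", "/clip.au"])
server.shutdown()
server.server_close()

print(len(bodies), "objects fetched over",
      len(set(Handler.ports_seen)), "TCP connection(s)")
```

An HTTP 1.0-style client would open a new connection, and so show a new source port, for each of the three objects.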
Multiserver Configurations
Caching improves the performance the client sees, primarily by cutting down
on the number of requests the client needs to generate to servers on the
external network. Proxy Server 1.0 lets you run multiple proxies in an
enterprise, but it has no mechanism for coordinating the caching between them.
With multiple proxies, you often end up, over time, with multiple versions of
the same cached objects on different proxies. Also, Proxy Server 1.0 offers no
way to intelligently share the load among the proxies. One proxy can be
scrambling to keep up with client requests while others stand idle.
Microsoft has responded to these problems with the Cache Array Routing
Protocol (CARP). CARP uses two types of intelligent routing--distributed and
hierarchical--between proxy servers. Distributed routing occurs between members
of a proxy server array; hierarchical routing occurs between proxy servers
configured in a chain.
An array is a group of proxy servers that you administer as one logical
entity. All members of the array keep an array membership list. Each proxy
server updates the list regularly to account for proxies coming online or going
down. Array members are peers and communicate with one another to cooperatively
service requests from clients. Proxy Server uses a hash function, a common
technique in searching and sorting, to determine which member of the array
services the request. (For a discussion of hash functions, see Mark Minasi, "Windows
NT Logons," June 1997.)
Array members feed each combination of proxy server name and URL name into
the hash algorithm to generate a score. The highest score determines which proxy
server will service requests for that specific URL. Each proxy server runs the
algorithm and keeps scores in a hash table. The algorithm is deterministic--the
hash table entries are the same in all proxy servers, without their
communicating with one another. This scheme addresses a drawback of an earlier
cache routing scheme called the Internet Cache Protocol (ICP), which used a
query protocol between proxies to find a specific URL. Besides minimizing
protocol chatter between proxies, the hash scheme balances load well because
each URL maps to exactly one array member, and it scales positively: the more
members in the array list, the more evenly the load is distributed.
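
The highest-score selection described above can be sketched in a few lines of Python. This is an illustration of the idea, not the CARP hash itself: SHA-256 stands in for the protocol's own hash function, and the member names and example URLs are assumptions. Because every member computes the same scores from the same array list, they all agree on the owner of a URL without exchanging queries.

```python
import hashlib


def score(proxy_name: str, url: str) -> int:
    """Combine a proxy name and a URL into a deterministic score.

    SHA-256 is a stand-in here; the real protocol defines its own hash.
    """
    digest = hashlib.sha256(f"{proxy_name}|{url}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_proxy(members, url):
    """Return the array member with the highest score for this URL."""
    return max(members, key=lambda name: score(name, url))


# Hypothetical array list and URLs, for illustration only.
members = ["ProxyA", "ProxyB", "ProxyC"]
urls = [f"http://example.com/page{i}.html" for i in range(1000)]

before = {u: pick_proxy(members, u) for u in urls}
after = {u: pick_proxy(members[:2], u) for u in urls}  # ProxyC goes down

# Only URLs that ProxyC owned need a new home; the rest stay put.
moved = [u for u in urls if before[u] != after[u]]
print(f"{len(moved)} of {len(urls)} URLs changed owner")
```

A useful side effect of this design is that removing a member reassigns only the URLs that member owned, so the caches on the surviving members stay valid.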
A chain is a hierarchical grouping of proxies. A proxy server that is a
member of a chain forwards client requests that it can't service to the next
higher-level proxy in the chain. The downstream proxy in the chain is closest to
the client; the furthest upstream proxy is closest to the Internet. Requests
flow only upstream or among members of an array.
You can combine chains and arrays. In a chain, the upstream entity can be
one proxy server or an array. Downstream proxies can obtain a copy of the
upstream array list by polling. With the array list, downstream proxies can
create a hash table for an array to determine which member of the array needs to
respond to a request for a URL.
Figure 1 shows an example of proxies distributed between a branch office
and a corporate site. Clients in the branch access the Internet through Proxy
Server Z, over a leased line to the corporate net, then through the Proxy Array,
following these steps:
1. The client requests a URL from Proxy Z.
2. Proxy Z does not find the URL in its cache, so it uses the hash function
to forward the request to Proxy A in the array CORP.
3. Proxy A receives the URL request from Proxy Z. Proxy A checks its cache
and does not find the URL.
4. Proxy A runs the hash function and determines that Proxy C is the proper
location for the URL. Proxy A then forwards the request to Proxy C.
5. Proxy C finds the URL cached and returns a response to Proxy Z.
6. Proxy Z returns the response to the client and caches the URL locally
for future use.
I have two observations about this example. First, in step 4, if Proxy C does
not have the URL cached, Proxy C retrieves the URL from the Internet. Second, because
Proxy Z caches the URL, two copies are cached. You gain a performance advantage
because users in the branch now have a local copy cached, and they don't have to
chew up any more leased-line bandwidth to retrieve it from the corporate net. If
you implement chains properly, they can put the cache close to the users who
need it.
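
The branch-office example can be modeled as a toy simulation. The sketch below is a simplification under stated assumptions: the `Proxy` class, the `fake_origin` function, and the URL are hypothetical; SHA-256 stands in for the real hash; and the downstream proxy hashes against the upstream array list and goes straight to the owning member, skipping the intermediate hop through Proxy A. It shows the two cached copies and the fact that a repeat request from the branch never leaves Proxy Z.

```python
import hashlib


def score(member_name, url):
    # Stand-in scoring function; not the hash the product actually uses.
    digest = hashlib.sha256(f"{member_name}|{url}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


class Proxy:
    """Toy caching proxy: upstream is a list of Proxy objects (an array),
    or empty when this proxy connects directly to the Internet."""

    def __init__(self, name, upstream=None, fetch_origin=None):
        self.name = name
        self.cache = {}
        self.upstream = upstream or []
        self.fetch_origin = fetch_origin

    def get(self, url):
        if url in self.cache:  # cache hit: answer locally
            return self.cache[url]
        if self.upstream:      # miss: forward to the owning array member
            owner = max(self.upstream, key=lambda p: score(p.name, url))
            body = owner.get(url)
        else:                  # miss at the top of the chain: go to the origin
            body = self.fetch_origin(url)
        self.cache[url] = body  # keep a copy for future requests
        return body


origin_hits = []


def fake_origin(url):
    """Pretend Web server that records each request it receives."""
    origin_hits.append(url)
    return f"<content of {url}>"


corp = [Proxy(n, fetch_origin=fake_origin) for n in ("ProxyA", "ProxyB", "ProxyC")]
branch = Proxy("ProxyZ", upstream=corp)

url = "http://example.com/index.html"
first = branch.get(url)   # travels up the chain and out to the origin
second = branch.get(url)  # served from Proxy Z's local cache

holders = [p.name for p in corp if url in p.cache]
print("origin requests:", len(origin_hits))
print("cached on", branch.name, "and on array member(s):", holders)
```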
Multiserver Administration
You can add or remove array members via the Array property screen. Screen 3
shows two members of the CORP array, MSCPDC and WEBSTER. Although this example
uses NetBIOS names, you can also use fully qualified domain names and DNS.
The system will propagate changes made here to other array members to keep them
in sync.
To configure chains, use the Routing tab on the Web Proxy Service
Properties screen, as shown in Screen 4. In the Upstream Routing section, select
Use direct connection on the last upstream proxy server. On downstream
proxies, choose Use Web Proxy or array. Select Modify to get to Advanced
routing options (shown in Screen 5), where you can add the name of the next
upstream proxy. Note that an upstream proxy can be running Proxy Server 1.0 or a
third-party proxy gateway, because the downstream proxy is acting as a client
with respect to the upstream proxy. If the upstream proxy is an array, you can
automatically poll for the array configuration. From this dialog box, you can
also select proxy-to-proxy authentication for the chain; this choice requires an
account with Administrator privileges on the upstream machine.
You can configure a backup route from the Enable backup route
section of Screen 4. The fault-tolerance process is dynamic: the system uses the
backup if the primary route is down, but it periodically polls the primary and
uses it again when it comes back up.
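
The failover behavior amounts to a small piece of state that each poll refreshes. Here is a hedged Python sketch of that logic; the `RouteSelector` class, the route names, and the `status` table are illustrative assumptions, and a real proxy would probe the upstream server on a timer rather than read a dictionary.

```python
class RouteSelector:
    """Choose between a primary and a backup upstream route."""

    def __init__(self, primary, backup, is_up):
        self.primary = primary
        self.backup = backup
        self.is_up = is_up  # callable(route_name) -> bool, supplied by caller
        self.active = primary

    def poll(self):
        """Periodic check: prefer the primary whenever it responds."""
        self.active = self.primary if self.is_up(self.primary) else self.backup
        return self.active


# Simulated health table standing in for real reachability probes.
status = {"corp-array": True, "isp-proxy": True}
selector = RouteSelector("corp-array", "isp-proxy", lambda route: status[route])

history = [selector.poll()]      # primary is healthy
status["corp-array"] = False
history.append(selector.poll())  # primary down: switch to the backup
status["corp-array"] = True
history.append(selector.poll())  # primary back: switch to it again

print(history)  # ['corp-array', 'isp-proxy', 'corp-array']
```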
And That's Not All...
Proxy Server 2.0 has many improvements over the previous release. In
addition to the features I've discussed, Proxy Server 2.0 includes client
configuration scripts, server proxying, and domain filtering. You can also
extend the product via third-party applications that use the Internet Server API
(ISAPI). Some third-party enhancements already available are Trend Micro
InterScan Web Protect for virus scanning, Cyber Patrol Proxy for content
filtering, and Market Wave Hit List and TELEMATE.Net for reporting. Depending on
your situation, Proxy Server could fulfill a significant part of your needs for
secure Internet access.