Are hidden name-resolution problems threatening your network clients' stability?

If you asked me, as a network consultant, to choose the most problematic area of Microsoft Windows networking, my unblinking answer would be name resolution. You can directly relate ailments such as sluggish performance, client computers' or applications' inability to connect to servers, incomplete browse lists, and obscure error messages to problems resolving computers' names to their IP addresses. To help you overcome these maladies, I offer a name-resolution primer, discuss some of the lesser-known causes behind name-resolution problems, and resolve a common name-resolution misunderstanding.

An Equal Opportunity Annoyer
In the Internet's DNS-driven namespace, name resolution is fairly straightforward. To resolve names, clients query DNS servers, which either provide the IP address of the host in question or inform the client that no such name exists—a generally transparent process. Not so with Windows-based networks' name-resolution process, which involves additional layers and services, including WINS servers and clients, broadcasting, and static name-to-IP address mapping files (i.e., HOSTS and LMHOSTS files). These additional name-resolution methods complicate the process and make diagnosing and resolving problems trickier. Before I jump into the complicated mechanics, I want to cover the fundamentals of name resolution in a TCP/IP environment.

Resolution Rudiments
Unlike protocols that rely exclusively on broadcasting (e.g., IPX/SPX, NetBEUI) to resolve names, TCP/IP in a Windows environment can also use point-to-point name server queries. Point-to-point queries are preferable because they're more reliable, can easily cross routers, and use less network bandwidth than broadcast queries. (A point-to-point query is like a friend calling you on the telephone to ask a question, whereas broadcasting is like a friend standing in front of your house yelling into a megaphone to ask a question. In the former case, the communication involves only your friend and you; in the latter case, all your neighbors become involved.)

The Four Node Types
Clients of all Windows OSs—except Windows 2000 (Win2K)—require the presence of the NetBIOS session layer with a network transport protocol. (Win2K doesn't rely exclusively on NetBIOS to support networking activities: Win2K's TCP/IP implementation can use DNS to locate network servers and services.) So, the OSs' TCP/IP protocol stacks support a modified form of TCP/IP called NetBIOS over TCP/IP (NetBT). Windows clients that use NetBT (i.e., WINS clients) can attempt name resolution by using one of four NetBT node types: b-node, p-node, m-node, and h-node.

B-node. B-node clients use broadcasts to register their names and resolve the names of other machines on the network. B-node is undesirable for most networks because IP routers typically can't forward broadcasts, so b-node limits a client's scope of resolution to one IP subnet (i.e., network segment). And because b-node relies on network-saturating broadcasts, it uses more bandwidth than point-to-point methods. B-node is the default for Windows clients that don't have a configured NetBIOS Name Server (NBNS—the generic name for a WINS server, which is typically a Win2K- or Windows NT-based server). NT also supports a slightly altered form of b-node called modified b-node. A modified-b-node client first checks its local name cache for a name-to-address mapping. If the client doesn't find the address in memory, it resorts to broadcasting. If neither of those methods works, the client checks the local LMHOSTS file (if it exists) in the \%systemroot%\system32\drivers\etc folder on Win2K or NT systems or in the Windows folder on Windows 9x systems.

P-node. P-node clients register their names with a WINS server and use that server to resolve names. P-node has a major drawback: If the server can't resolve a name or if you've configured the client with incorrect name-server addresses, the client can't log on to the network because it can't discover the name of a domain controller to process the logon request.

M-node. M-node is one of two hybrid node types. When attempting to resolve names, m-node clients use b-node-style broadcasts first, then point-to-point WINS server (i.e., p-node) queries. M-node is optimal for WAN subnets that have no local name server and connect to the WAN by low-bandwidth links. In these cases, using broadcasts to resolve names helps minimize traffic on the WAN links.

H-node. H-node resolution is the default for Windows clients that you've configured (either manually or with information from a DHCP server) to register with one or more WINS servers. H-node is essentially the opposite of m-node: Clients register their names directly with WINS servers and use these servers first when attempting to resolve names. If the name server queries fail, the client then uses b-node-style broadcasts.
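
The four node types differ only in which methods they use and in what order. The following minimal Python sketch captures that mapping; the names NODE_METHODS and methods_for are illustrative choices for this example and aren't part of any Windows API.

```python
from typing import Dict, List

# Illustrative mapping of NetBT node types to the order in which each type
# tries its name-resolution methods. Identifiers are hypothetical.
NODE_METHODS: Dict[str, List[str]] = {
    "b-node": ["broadcast"],
    "p-node": ["wins"],
    "m-node": ["broadcast", "wins"],
    "h-node": ["wins", "broadcast"],
}


def methods_for(node_type: str) -> List[str]:
    """Return the ordered resolution methods for the given node type."""
    return list(NODE_METHODS[node_type.lower()])


if __name__ == "__main__":
    for node in NODE_METHODS:
        print(node, "->", " then ".join(methods_for(node)))
```

Seen this way, h-node and m-node are the same pair of methods in opposite order, which is why the choice between them comes down to whether a WINS server or a broadcast is the cheaper first attempt on a given subnet.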

How do you determine a WINS client's node type? By default, a client uses b-node unless you configure it to register with at least one WINS server, in which case the client will use h-node. However, you can override this behavior. For example, you can use a DHCP scope option (i.e., a configuration parameter that passes to a DHCP client) to configure clients to use a specific node type. (You can also use a Registry entry to manually configure the WINS node type setting on most Windows versions. However, I don't recommend this method because the manual Registry configuration overrides even a dynamic DHCP-assigned node type, making this method less flexible.)
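
As a rough sketch of the precedence just described, the following Python function picks a client's effective node type. The numeric codes are the ones that DHCP option 46 (NetBIOS over TCP/IP Node Type) defines in RFC 2132. The function name and parameters are hypothetical, and the sketch assumes that a manual Registry override uses the same numeric encoding.

```python
from typing import Dict, Optional

# Node type codes defined for DHCP option 46 in RFC 2132.
NODE_TYPE_CODES: Dict[int, str] = {
    0x1: "b-node",
    0x2: "p-node",
    0x4: "m-node",
    0x8: "h-node",
}


def effective_node_type(wins_configured: bool,
                        dhcp_option_46: Optional[int] = None,
                        registry_override: Optional[int] = None) -> str:
    """Pick the node type using the precedence described in the text."""
    # A manual Registry setting overrides even a DHCP-assigned node type.
    if registry_override is not None:
        return NODE_TYPE_CODES[registry_override]
    # Otherwise a DHCP scope option overrides the built-in default.
    if dhcp_option_46 is not None:
        return NODE_TYPE_CODES[dhcp_option_46]
    # Default: h-node when a WINS server is configured, b-node when not.
    return "h-node" if wins_configured else "b-node"
```

For example, effective_node_type(True, dhcp_option_46=0x4) returns "m-node", but adding registry_override=0x1 to the same call returns "b-node", which illustrates why the manual Registry method is the less flexible one.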

Because a majority of NT networks use WINS servers, a majority of Windows-based LAN clients on these networks use h-node resolution. If neither point-to-point nor broadcasting is successful, h-node WINS clients then use other methods. H-node uses the following progression to attempt name resolution:

  1. Check whether the queried name belongs to the local machine.
  2. Check the local name cache, which by default retains resolved names for 10 minutes.
  3. Direct a point-to-point query to the configured WINS server or servers.
  4. Use a broadcast query.
  5. Use a local LMHOSTS file, if one exists.
  6. Use a local HOSTS file, if one exists. (A HOSTS file resides in the \%systemroot%\system32\drivers\etc folder on Win2K or NT systems or in the Windows folder on Win9x systems.)
  7. Issue a point-to-point query to any configured DNS server or servers. In NT, h-node WINS clients will attempt this step only when you select the Use DNS for Windows name resolution check box in the TCP/IP Configuration dialog box. This setting maps to the EnableDNS Registry entry of type REG_DWORD under the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NetBT\Parameters subkey in NT or the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\VXD\MSTCP subkey in Win9x. Despite its name, the EnableDNS entry controls only whether a client uses DNS servers during NetBT name resolution and doesn't affect the use of DNS for other purposes (e.g., resolving Web site names in a browser). By default, NT disables the setting (i.e., sets the value to 0), whereas Win9x enables the setting (i.e., sets the value to 1).
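
The progression above is a first-match chain: the client tries each source in turn and stops at the first one that returns an address. The following Python sketch simulates that chain with in-memory tables standing in for the real name sources. Every name, address, and identifier in it is hypothetical, and it makes no attempt to reproduce actual NetBT traffic.

```python
from typing import Callable, Dict, List, Optional, Tuple

Lookup = Callable[[str], Optional[str]]


def table_lookup(table: Dict[str, str]) -> Lookup:
    """Build a case-insensitive lookup over an in-memory name table."""
    folded = {name.upper(): addr for name, addr in table.items()}
    return lambda name: folded.get(name.upper())


def resolve_h_node(name: str,
                   steps: List[Tuple[str, Lookup]]) -> Optional[Tuple[str, str]]:
    """Try each step in order; return (step label, address) for the first hit."""
    for label, lookup in steps:
        address = lookup(name)
        if address is not None:
            return label, address
    return None  # every method failed


# Simulated sources, listed in the h-node order given above.
EXAMPLE_STEPS: List[Tuple[str, Lookup]] = [
    ("local name", table_lookup({"WKSTN1": "10.0.0.5"})),
    ("name cache", table_lookup({})),
    ("WINS", table_lookup({"FILESRV": "10.0.0.20"})),
    ("broadcast", table_lookup({})),
    ("LMHOSTS", table_lookup({"PRINTSRV": "10.0.0.30"})),
    ("HOSTS", table_lookup({})),
    ("DNS (if enabled)", table_lookup({})),
]
```

With these tables, resolve_h_node("filesrv", EXAMPLE_STEPS) stops at the WINS step, whereas a lookup for "printsrv" falls through the cache, WINS, and broadcast steps before the LMHOSTS table answers it.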

Now you understand some of the basics. But you still need to examine why name resolution can cause networking problems.

DNS Where You Least Expect It
DNS plays a much larger role in name resolution than the previous method order implies. H-node WINS clients query DNS servers only after all other name-resolution attempts have failed and only if you've enabled the client to use DNS for Windows name resolution. However, h-node WINS clients attempt to use a DNS server as if it were a WINS server: The client uses a WINS-style NetBIOS name query, rather than a DNS-client name query, to send a name-resolution request to the DNS server. As a result, unless your DNS server happens to be your WINS server and has a registration in its database for the queried name, this h-node attempt is likely to fail.

The NetBT h-node resolution process isn't the only time that clients query DNS servers during a name-resolution attempt. If you've enabled DNS on a client and configured the client to use one or more DNS servers, the client will use DNS client (aka Domain Name Resolver—DNR) queries through the standard UDP port 53 to send name queries to DNS servers. The client will send these DNS-centric queries after checking for a local HOSTS file and before any NetBT name-resolution attempts. To create a Fully Qualified Domain Name (FQDN—for example, myserver.mybiz.com) for the query, the client will append any default names that you've configured in the DNS section of its TCP/IP configuration dialog box to the queried name. (I used a network-monitoring tool to observe this behavior. For information about how to view client name resolution and other network traffic, see the sidebar "Getting Under the Hood with a Network Monitor.") The following is a complete and accurate order in which Windows LAN clients attempt to resolve names:

  1. Check whether the queried name belongs to the local machine.
  2. Use a local HOSTS file if one exists.
  3. Query any configured DNS servers using DNR queries on UDP port 53.
  4. Use the appropriate NetBT node type resolution method, according to the client's NetBT node type (as I described earlier).
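
To make the DNS-first order concrete, here is a small Python simulation of the four steps. It's a sketch under stated assumptions rather than a description of the actual resolver code: the function names are hypothetical, dictionaries keyed by lowercase names stand in for the HOSTS file and the DNS server, and the sketch assumes that a name that already contains a dot is queried as is instead of having suffixes appended.

```python
from typing import Callable, Dict, List, Optional, Tuple


def dns_candidates(name: str, suffixes: List[str]) -> List[str]:
    """Build the FQDNs to query by appending each configured default name."""
    if "." in name:  # simplifying assumption: already qualified
        return [name]
    return [f"{name}.{suffix}" for suffix in suffixes]


def resolve(name: str,
            local_name: str,
            local_address: str,
            hosts: Dict[str, str],
            dns_records: Dict[str, str],
            suffixes: List[str],
            netbt: Callable[[str], Optional[str]]) -> Optional[Tuple[str, str]]:
    """Follow the four-step order; return (method, address) or None."""
    key = name.lower()
    if key == local_name.lower():               # 1. local machine name
        return "local", local_address
    if key in hosts:                            # 2. HOSTS file
        return "HOSTS", hosts[key]
    for fqdn in dns_candidates(key, suffixes):  # 3. DNS (DNR) queries
        if fqdn in dns_records:
            return "DNS", dns_records[fqdn]
    address = netbt(name)                       # 4. NetBT node-type methods
    if address is not None:
        return "NetBT", address
    return None
```

In this model, a query for myserver with the configured default name mybiz.com is sent to DNS as myserver.mybiz.com, and the NetBT methods (including WINS) run only if that query produces no answer.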

This list paints quite a different picture from the previous list. Most administrators I talk to are surprised to discover that Windows LAN clients consistently query DNS servers before they query WINS servers. (This misunderstanding is common because most Microsoft documentation and training materials that discuss NetBT resolution order neglect to mention these preliminary DNS queries.) Your WINS-enabled LAN clients are probably making an exorbitant number of failed DNS name-resolution queries on a regular basis. A DNS server that doesn't have the information simply responds that it can't resolve the queried name; this acknowledgment doesn't take much time, so users seldom notice any delay. However, if the client can't reach the DNS server or servers for some reason (e.g., you've misconfigured the server addresses, the servers aren't reachable from the client's IP address, the servers are down), the client must wait for successive DNS server queries to time out before it can attempt a NetBT method. Depending on the client OS and service pack revision level, this wait can range from 5 seconds to a whopping 2 minutes.

PPTP and Other Problems
DNS-first behavior can prove problematic in several situations. For example, remote laptop clients that have Ethernet connections and configured DNS server addresses might experience long delays when they try to reach the Registry-cached DNS server addresses that were configured through the primary connection (e.g., the Ethernet connection or the initial RAS modem connection). This situation can occur when a client accesses a corporate LAN over a dial-up connection but the corporate LAN's configuration doesn't permit access to the DNS servers configured on the primary connection. As a result, the client has to wait for each DNS server query to time out. These timeout periods are longer than WINS-associated timeouts because they allow for the possibility that the client is connecting over the Internet, an unpredictable network compared with the private LAN or WAN that houses a WINS server. The problem occurs even when a RAS client inherits reachable DNS servers from a RAS server, because the RAS client places these additions at the end of its DNS server list and tries them only after it tries the original (i.e., unreachable) DNS servers.

RAS clients that use PPTP to dial in to a corporate VPN also experience the DNS-delay problem. (For more information about RAS name resolution, see "Related Articles in Previous Issues," page 99.) By default, RAS uses the most recent connection as the client's default gateway. (RAS makes the most recent connection's default gateway-route metric lower than the previous connection's metric, as Screen 1 shows.) But PPTP connections are by nature secondary connections. As a result, the PPTP connection effectively orphans the primary (i.e., RAS) connection because all nonlocal traffic travels over the secondary (i.e., PPTP) connection by default. Unless the PPTP connection serves a network that permits the client to connect to the DNS servers that you configured through the primary connection, the client can't access those servers. However, the client doesn't know about this limitation, and the result is a significant slowdown on the client system as the client tries each DNS server and times out. Most DNS-enabled clients try to reach each configured DNS server four times before giving up. In a worst-case scenario, this timeout can cause a 2-minute delay.
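
A back-of-the-envelope calculation shows how retries against unreachable servers add up. The Python sketch below multiplies the number of servers by the four attempts per server mentioned above and by a per-attempt timeout. The 10-second default is purely an assumption chosen for illustration; real clients don't necessarily use a fixed timeout, and the function name is hypothetical.

```python
def worst_case_dns_delay(server_count: int,
                         attempts_per_server: int = 4,
                         seconds_per_attempt: float = 10.0) -> float:
    """Estimate total seconds spent waiting when no DNS server answers.

    seconds_per_attempt is an illustrative assumption, not a documented value.
    """
    if server_count < 0 or attempts_per_server < 0 or seconds_per_attempt < 0:
        raise ValueError("arguments must be non-negative")
    return server_count * attempts_per_server * seconds_per_attempt


if __name__ == "__main__":
    for servers in (1, 2, 3):
        print(servers, "unreachable server(s):",
              worst_case_dns_delay(servers), "seconds")
```

Under these assumed numbers, three unreachable servers account for 120 seconds, which is in line with the 2-minute worst case. The point of the arithmetic is that each additional unreachable server in the list lengthens the wait before the client falls back to a NetBT method.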

A potential workaround for this problem is to clear the Use default gateway on remote network check box for the PPTP connection in the DUN client's phonebook entry. Only traffic destined for the PPTP-served network will pass over the PPTP DUN connection, and other traffic, such as DNS queries to Internet-based DNS servers, will pass successfully over the primary connection. This approach works well when the client connects to a single-LAN network. However, if the client uses a PPTP connection to connect to a WAN and needs to contact a machine that is across a router from the WAN's RAS server, this method won't work because the client uses the default gateway on its primary connection instead of the default gateway on its PPTP connection.
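
The effect of that check box is easier to see with a toy routing table. The following Python sketch picks a route by longest matching prefix and then by lowest metric, which is the general rule IP stacks follow. The route tables, interface names, and addresses are all hypothetical, and the sketch models only route selection, not the DUN client itself.

```python
import ipaddress
from typing import List, NamedTuple


class Route(NamedTuple):
    network: str    # destination in CIDR notation
    interface: str  # connection that carries the traffic
    metric: int     # lower metric wins among equally specific routes


def pick_route(destination: str, routes: List[Route]) -> Route:
    """Choose the most specific matching route; break ties by lowest metric."""
    dest = ipaddress.ip_address(destination)
    matches = [r for r in routes if dest in ipaddress.ip_network(r.network)]
    if not matches:
        raise LookupError(f"no route to {destination}")
    return min(matches,
               key=lambda r: (-ipaddress.ip_network(r.network).prefixlen,
                              r.metric))


# Check box selected: PPTP adds a default route with the lower metric.
GATEWAY_ON_PPTP = [
    Route("0.0.0.0/0", "primary", 2),
    Route("0.0.0.0/0", "pptp", 1),
    Route("10.1.0.0/16", "pptp", 1),
]

# Check box cleared: only the PPTP-served network routes over PPTP.
GATEWAY_ON_PRIMARY = [
    Route("0.0.0.0/0", "primary", 1),
    Route("10.1.0.0/16", "pptp", 1),
]
```

With GATEWAY_ON_PPTP, a query to an Internet DNS server at the example address 192.0.2.53 leaves through the pptp interface and never reaches the server. With GATEWAY_ON_PRIMARY, the same query leaves through the primary connection, but a host at 10.2.0.7 on another corporate subnet also routes through the primary connection, which is the limitation described above.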

Related Articles in Previous Issues
You can obtain the following articles from Windows 2000 Magazine's Web site at http://www.win2000mag.com/articles.

DOMAIN NAME RESOLUTION: MICHAEL D. REILLY
Getting Started with NT, "Domain Name Resolution with DNS," June 1999, InstantDoc ID 5408
Getting Started with NT, "Implementing WINS," May 1999, InstantDoc ID 5212

RAS NAME RESOLUTION: SEAN DAILY
Watch Your RAS, "WINS Weirdness Strikes Again," April 2000, InstantDoc ID 8317
Watch Your RAS, "Need a Name-Resolution Solution?" September 1999, InstantDoc ID 7115
Watch Your RAS, "The Case of the Nonbrowsing RAS Client," May 1999, InstantDoc ID 5215
Watch Your RAS, "WINS and RAS Foibles," April 1999, InstantDoc ID 5079

Exceptions to the Rule
While researching the DNS-delay problem, I discovered that the delays didn't occur consistently on all Windows clients. To find out why, I conducted an extensive battery of tests, using every version of Windows from Win95 to Win2K. I found that the long DNS timeouts were a problem with all versions of Windows except Win2K, NT 4.0 Service Pack 4 (SP4) and later, and Windows 98 Second Edition (Win98SE). As it turns out, Microsoft made significant changes in these versions to the way that clients perform DNS queries in the TCP/IP stacks. Specifically, Microsoft changed the algorithm that the DNR component uses to query DNS servers, thus effectively shortening the maximum possible timeout in worst-case scenarios. This change reduces client name-resolution times from 120 seconds to 19 seconds, with an average of 5 seconds in most cases (the exact number of seconds depends on the number of configured DNS servers and several other factors). Although the client still queries DNS servers first, the modifications effectively render the problem moot in most situations.

But Wait—There's More
Understanding name-resolution methods and how DNS queries can cause problems for your clients is the first step on the road to solving name-resolution problems. But if neither of the earlier workarounds applies to your situation, you'll need to consider other options. In part 2, I'll show you how to take control of name resolution and improve the stability and performance of your network.