With great DNS wisdom comes great troubleshooting capability
My network recently developed an intermittent DNS name-resolution problem. I'd rather do about a dozen other things with my time than hunt down name-resolution bugs, and unfortunately my DNS troubleshooting skills had grown rusty over time. DNS is easy to forget about when it's working like it's supposed to: Everything just works—from your browser, to your email client, to your mail server, to your domain controllers (DCs). It had been years since I'd even needed to think about DNS troubleshooting, so I looked at my current problem as an opportunity to brush up on my skills.
Because DNS has become the cornerstone of a properly functioning Active Directory (AD) environment, and because DNS is the glue that holds the Internet together, the ability to quickly spot and solve DNS problems on your network is essential. Let's take a look at the intricacies of DNS troubleshooting outside of AD, then at the complexities that AD adds to the mix.
The entire DNS hierarchy is held together by a root domain, and this root domain is maintained by 13 separate servers around the world, managed by commercial, governmental, and educational organizations. Ultimately, these root servers are involved in the process of resolving all public Internet names. Suppose a workstation on your network is attempting to resolve the host name download.beta.example.com to an IP address. Resolving this name could take as many as 10 separate DNS messages, starting with the query from the workstation to the server configured as its DNS server. Figure 1 depicts a standard DNS name-resolution process.
As you can see in Figure 1, the workstation hands the job to the local DNS server with a recursive query, one of the two types of DNS query messages (the other being iterative). The local DNS server then works through the DNS hierarchy on the workstation's behalf. Each public DNS server queried along the way either gives the final answer (if it knows it) or sends a referral to the next best server to ask. Because a root DNS server obviously wouldn't know about an individual host within the beta.example.com domain, it responds that it doesn't know the answer to the query but advises checking with the server responsible for handling .com for a better answer. As the process continues, resolution will occur one way or another: Either an IP address or a negative response will be returned.
Given this resolution process, you might think that the root domain servers must either be the largest computers known to man or must often crash because of the sheer load placed on them. In reality, the root servers are spared such torture thanks to the second key component in the name-resolution process: caching.
Consider Figure 1's resolution process again, but this time suppose the local DNS server had already looked up the mail server information for the example.com domain a few minutes before getting the query for download.beta.example.com. In this case, the local DNS server already knows where to find the authoritative DNS server for the example.com domain—at least, as of a few minutes earlier—so it can go directly to that server to attempt to resolve the name download.beta.example.com instead of starting at the root domain. Therefore, steps 2, 3, 4, and 5 of the resolution process are unnecessary, providing a 40 percent decrease in communication traffic.
Caching takes place throughout the entire hierarchy of the DNS infrastructure. Going a step further, if anyone else on the local network happens to query for the same host (download.beta.example.com), the local DNS server can serve up a response out of its local cache because it recently found that host. That leaves only steps 1 and 10 in Figure 1's communication process, an 80 percent reduction in communication traffic.
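The two percentages above come from simple counting. Here is a minimal sketch of that arithmetic, assuming the 10-message walk described for Figure 1 and treating each skipped step as one message saved; the function name is illustrative only.

```python
TOTAL_MESSAGES = 10  # the full resolution walk described for Figure 1


def traffic_reduction(skipped_steps, total=TOTAL_MESSAGES):
    """Return the percentage of DNS messages avoided when some steps are skipped."""
    return 100 * len(skipped_steps) / total


# The local server already knows the example.com name servers: steps 2-5 are skipped.
partial_cache_hit = traffic_reduction([2, 3, 4, 5])

# The local server has the host record itself cached: only steps 1 and 10 remain.
full_cache_hit = traffic_reduction(range(2, 10))

print(partial_cache_hit, full_cache_hit)
```

The first value corresponds to the 40 percent case and the second to the 80 percent case.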
DNS servers aren't the only systems that cache records. Clients also perform caching, so any workstation that has recently cached a record for a host name will keep that record in its cache for a period of time. If an application (e.g., Web browser, email client) on the host has occasion to request that DNS record again, Windows uses its locally cached copy instead of initiating a DNS query, resulting in zero network communication.
This caching hierarchy, which takes place on every server and client involved with DNS, keeps DNS alive on the Internet. However, caching can also throw a wrench into your troubleshooting techniques.
Understanding how DNS communication and caching work when they're operating properly can reduce the time you spend troubleshooting problems. Let's look at how Windows' DNS resolver works when it attempts to resolve a DNS name to an IP address. As you can see in Figure 2, when tasked to resolve a host name into an IP address, the DNS resolver first checks its local cache to see if it already knows the answer to the query. If it has the answer in its cache, it returns the response and generates no traffic on the network; otherwise, it continues through the rest of the name-resolution process. That process sounds simple, but you need to understand a few things about what's actually going on with the cache.
First, the cache is populated by two main types of entries: entries that have been cached because they were resolved by querying the DNS server for the information, and entries that have been preloaded from the %systemroot%\system32\drivers\etc\hosts file. Entries of the first type expire at an interval defined by the Time To Live (TTL) value that came embedded in the DNS response the first time the query occurred.
To view the contents of the cache and the time left before the records expire, you can use the Ipconfig /displaydns command at a command prompt. As an example, I issued a DNS query for www.google.com, then checked Ipconfig /displaydns. As you can see in Figure 3, the record has 248 seconds remaining on its TTL. At the time of the DNS query, the google.com zone had a TTL of 5 minutes configured for the "www" record, which isn't particularly surprising for an organization with a large and dynamically changing Web presence. However, more static organizations will typically use longer values, such as 1 day (86,400 seconds). Either way, it's important to understand that during the 5 minutes that this record is cached, if I query for www.google.com again, Windows won't send a query to my DNS server; it will simply resolve the name from the cache.
In addition to caching positive responses, Windows caches negative responses. Negative responses are responses from a DNS server that sees itself as authoritative for a given domain but has no host record matching the query. Although this type of response has no TTL value attached to it, Windows caches negative responses by default for a period of 5 to 15 minutes, depending on which version of Windows you're using and how it's configured. To learn how to control this caching behavior through the registry, see the Web-exclusive sidebar "Controlling Positive and Negative Caching," http://www.windowsitpro.com, InstantDoc ID 48528.
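The positive and negative caching behavior described above can be sketched as a small class. This is a toy model, not the Windows resolver: the class name, the 300-second negative-cache period, and the injectable clock are all assumptions made for illustration.

```python
import time


class ResolverCache:
    """Toy resolver cache: positive answers live for their TTL, negative ones for a fixed period."""

    def __init__(self, negative_ttl=300, clock=time.monotonic):
        self.negative_ttl = negative_ttl  # assumed value in seconds, not a Windows default
        self.clock = clock                # injectable so expiry can be demonstrated without waiting
        self._entries = {}                # name -> (address or None, expiry time)

    def store_positive(self, name, address, ttl):
        """Cache an answer for the TTL that came with the DNS response."""
        self._entries[name.lower()] = (address, self.clock() + ttl)

    def store_negative(self, name):
        """Cache a 'no such host' response for the fixed negative period."""
        self._entries[name.lower()] = (None, self.clock() + self.negative_ttl)

    def lookup(self, name):
        """Return (hit, address); address is None for a cached negative response."""
        key = name.lower()
        entry = self._entries.get(key)
        if entry is None:
            return (False, None)
        address, expires_at = entry
        if self.clock() >= expires_at:
            del self._entries[key]  # expired: the next query must go to the DNS server
            return (False, None)
        return (True, address)


# Demonstration with a fake clock and a documentation-range address.
now = [0.0]
cache = ResolverCache(clock=lambda: now[0])
cache.store_positive("www.example.com", "192.0.2.10", ttl=300)
now[0] = 120.0
print(cache.lookup("www.example.com"))  # still cached, so no network traffic
now[0] = 300.0
print(cache.lookup("www.example.com"))  # TTL has run out, so this is a miss
```

A cached negative response is reported as a hit with no address, which is why a host that was added to DNS moments ago can still appear not to exist.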
If you're troubleshooting a name-resolution problem on your network, you can flush the DNS cache by using the Ipconfig /flushdns command at the command prompt, instead of modifying the registry. Flushing the DNS cache is a one-time operation that dumps the entire cache from memory and starts over from scratch. You can repeat the procedure as often as you want. Also, keep in mind that if you're using your local DNS server on your network as a resolution point for client workstations, it will likely be caching everything it handles as well. If you need to clear the cache on your DNS servers, you can do so through the Microsoft Management Console (MMC) DNS snap-in at Start, Programs, Administrative Tools, DNS. Right-click the DNS server's name and select Clear Cache.
Flushing the DNS cache is always smart if you're testing anything on your network that involves name resolution and if changes might have occurred in the past 5 to 15 minutes. However, as part of the process of clearing the cache, Windows will immediately preload the %systemroot%\system32\drivers\etc\hosts file from your system into the cache.
Although the cache can sometimes be a hindrance, it can also be quite helpful. Remember that the cache contains both cached copies of records that have been resolved and static entries defined in the hosts file on your local system. I've found the hosts file to be a useful troubleshooting tool when I want to control the behavior of the DNS resolver.
For example, when I'm working on a problem that involves multiple servers responding to one name, and I want to make sure that my system connects to a specific one, I turn to the hosts file. Consider the case of multiple front-end Microsoft Outlook Web Access (OWA) servers that all resolve to the same common URL, as defined in DNS. If your users are complaining about intermittent OWA problems, how would you know which front-end server to investigate? The hosts file lets you preempt the response that your DNS resolver would have normally returned and put your own answer in place. You can force the DNS resolver to always return a specific value by placing that value in the hosts file, which will be loaded into the cache and remain there permanently. The format is simple: You define the address and the name on one line. The DNS resolver cache updates automatically whenever you save any changes to the hosts file, so its effects are immediate.
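To make the one-line format concrete, here is a minimal sketch of how hosts-file text maps names to addresses. The parser is an illustration rather than the logic Windows uses, and the OWA host name and the 192.0.2.x addresses are made-up examples.

```python
def parse_hosts(text):
    """Map each host name in hosts-file text to the address on its line."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            # Keep the first entry seen for a name (a simplifying assumption for this sketch).
            mapping.setdefault(name.lower(), address)
    return mapping


sample = """
# Pin the OWA URL to one front-end server while troubleshooting
192.0.2.21   owa.example.com
192.0.2.10   www.example.com www   # one address, two names
"""

print(parse_hosts(sample))
```

Pointing the shared OWA name at one front-end server's address in this way lets you test each server in turn; remember to remove the entry when you're done.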
Multi-Server/Multi-Adapter Situations
Let's refer back to Figure 2's flowchart. We've discussed what happens when the local cache can resolve a query. But if the local cache can't resolve a query, how does the resolution process continue? Windows continues the name-resolution process by issuing a recursive DNS query to the server specified as the Preferred DNS server in the preferred network adapter's Internet Protocol (TCP/IP) Properties, which Figure 4 shows. If Windows receives no response (positive or negative) from the preferred server within 1 second, the OS issues the same query to the same DNS server, but through all the remaining eligible network adapters in the system, and waits 2 seconds for a response. If there's still no response, Windows issues three more query attempts to get an answer. These queries have timeouts of 2 seconds, 4 seconds, and 8 seconds, respectively, and go to all the defined DNS servers through all the eligible adapters. The total time for a DNS-resolution process should be no more than 17 seconds (1 + 2 + 2 + 4 + 8).
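The retry schedule is easier to follow as a timeline. The sketch below simply lays out the five attempts and timeouts described above and adds them up; the data structure and function name are illustrative.

```python
# (who is queried, seconds to wait) for each attempt, in the order described above
ATTEMPTS = [
    ("preferred server, preferred adapter", 1),
    ("preferred server, all remaining eligible adapters", 2),
    ("all servers, all eligible adapters", 2),
    ("all servers, all eligible adapters", 4),
    ("all servers, all eligible adapters", 8),
]


def retry_timeline(attempts=ATTEMPTS):
    """Return (start_second, target, timeout) for each attempt, plus the total wait."""
    timeline = []
    elapsed = 0
    for target, timeout in attempts:
        timeline.append((elapsed, target, timeout))
        elapsed += timeout
    return timeline, elapsed


timeline, total = retry_timeline()
for start, target, timeout in timeline:
    print(f"t={start:>2}s  wait {timeout}s  {target}")
print(f"worst case: {total}s")  # 1 + 2 + 2 + 4 + 8 = 17
```

If a lookup on your network takes much longer than this worst case to fail, the delay is probably coming from somewhere other than the DNS resolver's own timeouts.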
As far as Windows is concerned, what makes an adapter "preferred" or "eligible"? (The Microsoft term is "under consideration.") In some of its technical documentation, Microsoft has been vague about this aspect of the name-resolution process. For example, if all the DNS servers on a specific adapter are queried and none of them reply, that adapter is taken out of consideration for a period of 30 seconds. It's safe to assume that the adapter is now removed from the "eligible" category for any future queries during that time period, although the documentation doesn't specifically state that. Also, Microsoft's documentation states that "the [DNS] resolver keeps track of which servers answer queries more quickly, and might move servers up or down on the list based on how quickly they reply to queries," which is likely a strong determiner for preferred adapters.
Microsoft's assertion that the resolver can change the order of the DNS servers it queries based on its own formulas contradicts the settings you'll find in a network adapter's advanced DNS configuration interface, which lets you choose DNS server addresses in order of use. In much of its documentation, Microsoft clearly has other ideas. Therefore, I no longer trust the order in which the DNS resolver will attempt to look up IP addresses. When I'm troubleshooting, I typically use simple command-line sniffers such as Network Grep (Ngrep) and WinDump to see the DNS queries leaving my system, as well as the DNS servers they're destined for.
In an upcoming article, I'll dive more deeply into these tools, as well as a few others that might be new to you. Also, for another indispensable resource of DNS-related tools, see the Web-exclusive sidebar "An Invaluable DNS Troubleshooting Resource," InstantDoc ID 48529.
Once you understand how a DNS query works and how the DNS resolver sends DNS queries out of its various network adapters, you're ready to start working with the command-line utility Nslookup. This utility is, without a doubt, the Swiss Army knife of DNS resolution and troubleshooting.
You can run Nslookup as a non-interactive command to look up hosts through the standard resolution process that the Windows DNS resolver would normally perform. For example,
nslookup www.windowsitpro.com
Alternatively, you can tweak the resolution process and direct your DNS query to a specific server (instead of the servers configured locally) by adding the specific DNS server's IP address to the end of the command line. For example,
nslookup www.windowsitpro.com 10.0.0.1
This option is helpful if you want to make sure you're getting responses from a specific DNS server that might be problematic.
If you want to get even deeper into the resolution process, you can simply use Nslookup by itself and go to interactive mode, which lets you control much more of the resolution process, such as the server to use, the query type (recursive vs. iterative), and the level of debugging information to provide. Let's take a look at a few troubleshooting scenarios.
As I mentioned earlier, in some circumstances, the DNS-resolution process might need to go all the way to the root domain servers on the Internet, should no other servers along the way have an answer that's cached and still available within the record's defined TTL. You might also want to do so yourself (and check the responses each step of the way) to determine where the resolution process is breaking down. To simulate this process with Nslookup, you can issue iterative (not recursive) lookup queries for a target domain, starting with any of the root domain servers listed in Table 1 as the target DNS server, then manually following each referral that you receive until you get a final answer.
I performed a lookup for the fully qualified domain name (FQDN) www.windowsitpro.com by configuring Nslookup to use iterative queries. I used the Set Norecurse option at the prompt, then started my query at the root servers by using the Server option to tell Nslookup where to send the query. By following the referrals I received down the line, changing my target server each time, I could reach the answer by iterating through the entire process manually. This troubleshooting technique provides significantly more detail about the resolution process than simply issuing a query to your local DNS server and accepting whatever comes back as a response.
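The referral-following loop that you perform by hand in Nslookup can be sketched in a few lines. This is a simulation over made-up delegation data, not a real DNS client: every server name and address in the table is hypothetical, and the matching logic is simplified.

```python
# Toy delegation data: every server name and address here is made up for illustration.
SERVERS = {
    "root-server": {"referrals": {"com": "com-server"}},
    "com-server": {"referrals": {"example.com": "ns.example.com"}},
    "ns.example.com": {"answers": {"www.example.com": "192.0.2.10"}},
}


def query(server, name):
    """Answer like a non-recursive server: a final answer, a referral, or nothing."""
    data = SERVERS[server]
    if name in data.get("answers", {}):
        return ("answer", data["answers"][name])
    # Refer to the most specific zone that the name falls under.
    matches = [zone for zone in data.get("referrals", {})
               if name == zone or name.endswith("." + zone)]
    if matches:
        return ("referral", data["referrals"][max(matches, key=len)])
    return ("no-answer", None)


def resolve_iteratively(name, start="root-server", max_hops=10):
    """Follow referrals one server at a time, recording each server that was asked."""
    server, path = start, []
    for _ in range(max_hops):
        path.append(server)
        kind, value = query(server, name)
        if kind != "referral":
            return value, path  # either a final answer or a dead end (value is None)
        server = value
    return None, path


print(resolve_iteratively("www.example.com"))
```

The recorded path is the useful part for troubleshooting: The last server in the list is the one that either answered or failed to, which tells you where in the chain to look.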
If you don't need to go through the entire iterative querying process but would simply like to see more detail about the queries going out of your system and the answers coming back, you can use the Set Debug or Set D2 options to get debug-level detail about the DNS query process. Figure 5 shows a sample query for www.windowsitpro.com. Also, by using Nslookup with the Set Type option (and specifying a domain name), you can quickly search for certain types of records within a domain by specifying their type—for example, MX (mail exchange) and NS (name servers) records.
For more information about Nslookup, see the Learning Path.
What AD Adds to the Mix
Once you've mastered the concepts of caching, iterative and recursive lookups, and troubleshooting and diagnosing DNS resolution problems across the Internet, you'll be able to tackle everything that AD adds to the mix without too much difficulty. Integration between DNS and AD occurs on two levels: First, DNS is the primary mechanism with which systems on your network will find other hosts within the AD environment; second, DNS data (the listing of hosts that exist in a given domain, and their IP addresses) is replicated between DNS servers in your organization through AD multi-master replication. We've covered AD replication at length in these pages, so let's discuss the additional records that you'll typically find in DNS in an AD environment.
The records you'll find in an AD environment are dynamic registration records, which are automatically created by a client system (server or workstation) within AD and contain the system's host name and IP address. The DHCP client service on a workstation or server performs the registration process when the service starts up, even if you're using a manually assigned address in the adapter's IP Properties. The DHCP client service will register the address with the DNS servers that the system is configured to use. If you have certain network interfaces that are designed for specific purposes (e.g., a dedicated tape backup network) and the system doesn't respond to client requests coming in on those interfaces, this automatic registration process can cause DNS name-resolution problems on the network.
For example, these specific adapters could be registered with IP addresses within DNS when you don't want those IP addresses handed out as possible response answers. If you find yourself in this situation, you can disable the registration of an interface by editing the advanced DNS properties for that interface and clearing the Register this connection's addresses in DNS check box, which Figure 6 shows. Otherwise, Windows will generally attempt to register every interface it can with DNS.
In addition to registering these host records (i.e., "A" records) automatically, Windows registers an additional record type, service records ("SRV" records), for DCs. SRV records determine how systems participate within AD to handle authentication tasks. SRV records aren't specific to AD; rather, they're a standard DNS record type that defines the services available within a domain, the hosts on which to find those services, and the ports and protocols to use. Much as mail-exchange records ("MX" records) specify the server on which a domain's SMTP services can be found (on the well-known port 25), SRV records can provide a referral to any type of service on any system. For example, an SRV record that would define the example.com Web site might look like
_http._tcp.example.com SRV 0 0 80 www.example.com
We can intuit a few things from this example—namely, that a TCP service known as HTTP is available for the example.com domain and that it can be found on port 80 of the host named www.example.com. In an AD environment, a DC registers four types of SRV records with the DNS servers it's configured to use:
_ldap._tcp.example.com SRV 0 0 389 dc.example.com
_kerberos._tcp.example.com SRV 0 0 88 dc.example.com
_ldap._tcp.dc._msdcs.example.com SRV 0 0 389 dc.example.com
_kerberos._tcp.dc._msdcs.example.com SRV 0 0 88 dc.example.com
These records let AD-enabled clients know where to find the LDAP and Kerberos services that they need within the example.com domain to find other AD resources and authenticate to those resources. These four sample SRV records effectively point the AD-aware clients to the dc.example.com system (a hypothetical DC for example.com) for all of their authentication needs.
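To show how the fields in these records break down, here is a minimal parsing sketch. It assumes the simplified one-line layout used in the samples above (owner name, SRV, two numeric fields, port, target); the two numeric fields are the record's priority and weight, and the function and field names are illustrative.

```python
from collections import namedtuple

SrvRecord = namedtuple("SrvRecord", "service protocol domain priority weight port target")


def parse_srv(line):
    """Split a line such as '_ldap._tcp.example.com SRV 0 0 389 dc.example.com'."""
    owner, rtype, priority, weight, port, target = line.split()
    if rtype.upper() != "SRV":
        raise ValueError(f"not an SRV record: {line!r}")
    # The owner name is _service._protocol.domain; everything after the second dot is the domain.
    service, protocol, domain = owner.split(".", 2)
    return SrvRecord(service.lstrip("_"), protocol.lstrip("_"), domain,
                     int(priority), int(weight), int(port), target)


records = [
    "_ldap._tcp.example.com SRV 0 0 389 dc.example.com",
    "_kerberos._tcp.example.com SRV 0 0 88 dc.example.com",
    "_ldap._tcp.dc._msdcs.example.com SRV 0 0 389 dc.example.com",
    "_kerberos._tcp.dc._msdcs.example.com SRV 0 0 88 dc.example.com",
]

for record in map(parse_srv, records):
    print(f"{record.service}/{record.protocol} for {record.domain}: "
          f"{record.target} port {record.port}")
```

Reading the records this way makes a misregistration easy to spot, such as an LDAP record that points at the wrong host or port.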
As part of any AD- and DNS-related diagnosis process, you should make sure that these records are available through the MMC DNS snap-in for the DNS servers in your organization. You should also be able to look them up from client systems by using the Nslookup utility.
Time for a Brush-up?
After brushing up on Nslookup, I quickly solved the problem on my network: a cache error involving my ISP and the data that should have been returned for a specific DNS query. After I determined the root cause of the problem by performing the complete iterative lookup process myself, I was able to quickly implement an alternative solution for resolving public DNS names while my ISP worked on solving its problems. The problem was solved quickly, and my clients once again had fully functional name resolution. With a good understanding of how DNS works, as well as a strong set of tools with which to troubleshoot it when it's misbehaving, you can quickly resolve many DNS problems.
Douglas Toombs (firstname.lastname@example.org) is a contributing editor for Windows IT Pro and the author of Keeping Your Business Safe from Attack: Monitoring and Managing Your Network Security (Windows IT Pro eBooks).