How the DNS lookup process can fail

I get many reader questions that go something like this: "I set up a test computer called DC1 at 192.168.1.4 to play with Active Directory (AD). This computer runs Windows 2000 Server and acts as both the DNS server and the first AD domain controller (DC) on my test domain, which I call acme.com. Everything works fine with that DC—I can create users and run all the AD administrative tools. Then, I put Win2K Server on another computer named DC2 at 192.168.1.5 and run Dcpromo to make that computer the second DC in the domain. But Dcpromo says that it's unable to contact the domain and can't make DC2 into a DC for acme.com. Yet I can ping DC1 from DC2 and vice versa, and the DCs are on the same subnet, so why can't they see each other?"

What causes this problem? To see how things go wrong, let's reconstruct what happens.

Finding a DNS Server
When you run Dcpromo and instruct it to make your computer into an additional DC, Dcpromo displays a Network Credentials panel that requests the name, domain, and password of an account that has the authority to add a DC to the existing domain (e.g., acme.com). Dcpromo then looks for an acme.com DC and uses the name and password to try to log on. But first, Dcpromo must be able to find a DC.

Remember that DNS acts as the central naming service for an AD domain. An essential DNS function in a Win2K network is to help computers find DCs. DNS relies on two pieces of software: the server software, which runs on the DNS server, and the DNS client software, which runs on your workstation. Your workstation uses the DNS client software to resolve names. But the client can't help unless it can find a DNS server. To find a DC that will authenticate you, Dcpromo says to the DNS client software, "Find a DNS server for acme.com, and ask it for the names and IP addresses of the DCs in the domain." The client does so by querying the DNS server for an SRV record.

To determine which DNS server your system queries, you can open a command line and type

ipconfig /all

Within the output, you'll see a list of the IP addresses of the DNS servers that your system is configured to use. When the DNS client software needs to resolve a DNS query, the client tries to contact the first machine on the list. If a DNS server is at that location, the client will address all DNS queries to that machine. The client doesn't query another DNS server from the list unless the client gets no response from the preferred server.
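The server-selection rule just described can be sketched in a few lines of Python. This is a simplified model, not the actual Win2K resolver: the ask callback, the function name, and the sample replies are hypothetical, and a return value of None stands in for "no response from a DNS server." Note that a negative answer still counts as a response, so the client doesn't move on to the next server.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical callback: takes a server IP and returns that server's answer,
# or None if no DNS server responded at that address.
AskFn = Callable[[str], Optional[str]]


def resolve_with_fallback(
    servers: Sequence[str], ask: AskFn
) -> Tuple[Optional[str], Optional[str]]:
    """Try the servers in list order; stop at the first one that responds."""
    for server in servers:
        answer = ask(server)
        if answer is not None:
            # Any response, even a negative one, ends the search.
            return server, answer
    return None, None


configured = ["192.168.1.4", "200.200.10.10"]  # illustrative list order

# Case 1: the preferred server responds, so the second is never asked.
replies_a = {"192.168.1.4": "NXDOMAIN", "200.200.10.10": "63.88.172.66"}
first_case = resolve_with_fallback(configured, replies_a.get)

# Case 2: the preferred server is silent, so the client falls back.
replies_b = {"200.200.10.10": "63.88.172.66"}
second_case = resolve_with_fallback(configured, replies_b.get)
```

In the first case, the client is stuck with the preferred server's negative answer even though the second server could have resolved the name.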

When a Win2K system needs to find an AD DC to log the system on, the system prefers to use a local DC. So, a Win2K system often does multiple DNS queries when looking for a DC: first, "Tell me about the local DCs for acme.com," then if that query fails, "Tell me about all the DCs for acme.com."

Troubleshooting AD Authentication
A powerful utility for troubleshooting AD authentication problems is Nslookup, which lets you mimic the behavior of a Win2K system that's trying to log on to an AD domain. On a command line, type

nslookup

The response gives the default server and its IP address and tells you that Nslookup did two things: It successfully contacted a DNS server, and it asked the server to reverse-resolve the server's IP address into a DNS name. Reverse resolution is unnecessary for AD logons, but if a functional DNS server doesn't reverse-resolve, Nslookup returns a message that makes you think something is seriously wrong:

DNS request timed out.
  timeout was 2 seconds.
*** Can't find server name for address 200.200.10.10: Timed out
*** Default servers are not available
Default Server:  UnKnown
Address:  200.200.10.20

However, if you type an Internet address (e.g., www.win2000mag.com) at the Nslookup prompt, the response reveals that Nslookup easily resolves that name:

www.win2000mag.com
Server:  ns1.yourisp.com
Address:  200.200.10.10

Name:    www.win2000mag.com
Address:  63.88.172.66
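For reference, the reverse resolution that Nslookup attempts at startup is an ordinary DNS query for a PTR record whose name is built from the server's IP address. The Python sketch below uses the standard ipaddress module to show that name for the sample server address used above. The helper name is hypothetical.

```python
import ipaddress


def reverse_lookup_name(ip: str) -> str:
    """Return the in-addr.arpa name that a PTR query for this address uses."""
    return ipaddress.ip_address(ip).reverse_pointer


print(reverse_lookup_name("200.200.10.10"))  # prints 10.10.200.200.in-addr.arpa
```

If no DNS server holds a reverse-lookup zone that contains that name, Nslookup can't display the server's name, even though the server answers forward queries normally.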

Nslookup also complains if the DNS server that your DNS client is supposed to use is dead or has a problem:

*** Can't find server name for address 200.200.10.10: No response from server
*** Default servers are not available
Default Server:  UnKnown
Address:  200.200.10.10

Do you see a significant difference between this response and the one that indicates a failed reverse resolution? No? Neither do I. So, how can you determine whether the server has a real problem or Nslookup is complaining about reverse resolution? Your best bet is to simply try to resolve an Internet address. If the problem exists at the server, the response will instead resemble

Server:  UnKnown
Address:  200.200.10.20

*** UnKnown can't find www.win2000mag.com: No response from server
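If you capture Nslookup's output in a script, the same test can be automated. The following Python sketch classifies the text of a forward lookup using only the two sample responses shown in this article. The function name and the three result labels are hypothetical, and real Nslookup output can vary, so treat the matching rules as assumptions.

```python
def classify_forward_lookup(output: str) -> str:
    """Classify the text of a forward lookup captured from Nslookup."""
    # Collapse whitespace so wrapped lines still match.
    text = " ".join(output.split()).lower()
    if "no response from server" in text:
        return "server not responding"
    # A line that starts with "Name:" means the name resolved.
    for line in output.splitlines():
        if line.strip().lower().startswith("name:"):
            return "resolved"
    return "unclear"


# The two sample responses from this article.
GOOD = """Server:  ns1.yourisp.com
Address:  200.200.10.10

Name:    www.win2000mag.com
Address:  63.88.172.66
"""

BAD = """Server:  UnKnown
Address:  200.200.10.20

*** UnKnown can't find www.win2000mag.com: No response from server
"""

good_result = classify_forward_lookup(GOOD)
bad_result = classify_forward_lookup(BAD)
```

A "resolved" result after a startup complaint tells you that only reverse resolution failed and that the DNS server itself is working.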

Now let's ask Nslookup to simulate using a local DC to log on. You'll need to know the name of your domain and your AD site. You'll always have at least one site: When you create the first DC in a forest, Dcpromo creates a site called, by default, Default-First-Site-Name. Type the Nslookup command on a command line. At the prompt, enter two commands, using the syntax

set type=srv
_kerberos._tcp.<sitename>._sites.dc._msdcs.<domainname>

Be sure to include the underscores, and type the second command on one line without spaces. For example, for a site named HQ in the acme.com domain, you'd type

set type=srv
_kerberos._tcp.hq._sites.dc._msdcs.acme.com

If you type the commands correctly, you might get a response like the one Figure 1 shows (don't worry if you don't—that's why we're troubleshooting). But if those commands don't work for you, then Dcpromo won't be able to log on either. Dcpromo would then say to DNS, "Tell me about all the DCs in the world for acme.com." You can also use Nslookup to simulate that query:

_kerberos._tcp.dc._msdcs.acme.com

But if your first query failed, this one likely will too, leading to a message resembling ns1.yourisp.com can't find _kerberos._tcp.dc._msdcs.acme.com: Non-existent domain.
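The two queries used in this section follow a fixed naming pattern, so they are easy to generate. The Python sketch below builds both names and tries them in the order described earlier: the site-specific query first, then the domain-wide one. The function names and the lookup_srv callback are hypothetical, the sample zone data is invented for illustration, and the code models only the order of the queries, not the actual Win2K logon code.

```python
from typing import Callable, List, Optional


def dc_srv_names(domain: str, site: Optional[str] = None) -> List[str]:
    """Build the SRV names to query: site-specific first, then domain-wide."""
    names = []
    if site:
        # DNS names are case-insensitive; lowercase matches the HQ example.
        names.append(f"_kerberos._tcp.{site.lower()}._sites.dc._msdcs.{domain}")
    names.append(f"_kerberos._tcp.dc._msdcs.{domain}")
    return names


def locate_dcs(
    domain: str, site: Optional[str], lookup_srv: Callable[[str], List[str]]
) -> List[str]:
    """Return the records from the first query that finds any."""
    for name in dc_srv_names(domain, site):
        records = lookup_srv(name)
        if records:
            return records
    return []


# Invented zone data: only the domain-wide record exists.
zone = {"_kerberos._tcp.dc._msdcs.acme.com": ["dc1.acme.com"]}
found = locate_dcs("acme.com", "HQ", lambda name: zone.get(name, []))
```

Here the site-specific query finds nothing, so the domain-wide query supplies the answer. If the DNS server knows neither name, the result is empty, which corresponds to the failure that Dcpromo reports.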

The reason for this problem in most test networks is that your AD domain (e.g., acme.com) has the same name as a registered Internet domain, and that name conflict poses problems. To see why, let's roll back the clock to when you created the first DC in your test forest.

The Origin of the Problem
When you use Dcpromo to create a new forest with a first domain named acme.com, Dcpromo needs to write several records into the writable copy of the acme.com zone, which lives on the domain's primary DNS server. So, Dcpromo asks the DNS server that your system is configured to use for the address of the primary DNS server for acme.com. Because you're just playing around from home, your Win2K server likely connects to the Internet through DSL or a cable modem and thus points to some DNS server on the Internet. That DNS server happens to know about a registered Internet domain named acme.com and responds that the primary acme.com server is some UNIX box on the Internet.

Dcpromo then says to that distant server, "Hi. I'm about to write a bunch of new SRV records to you with dynamic DNS (DDNS). That's all right, isn't it?" The UNIX server responds, "No! I don't know you, and I'm not about to let you write records into my zone." That response, of course, leaves Dcpromo in a quandary.

Instead of telling you that it has found the acme.com DNS server but the server doesn't accept updates, Dcpromo fibs by reporting that it couldn't find the DNS server for acme.com and offers an alternative: "Would you like me to configure a DNS server for this domain?" You say "Yes," grateful that Dcpromo is such a can-do kind of program. So, Dcpromo sets up the soon-to-be DC as a DNS server, creates an acme.com zone on that server, uses that zone to set up acme.com, then reboots.

That's when the trouble starts, as I'll explain in my next column. See you then.