A complicated—but improved—process

The connection between a client computer and the domain controller that handles the client’s authentication is a crucial infrastructure component in Windows 2000 and Windows NT environments. If the connection isn’t reliable and fast, the client might experience delays or be unable to access network resources. The domain controller selection process decides which domain controller a client will use to handle Win2K or NT authentication. Connection-based problems can occur in NT 4.0 because the NT 4.0 client/server architecture can’t account for a physical network’s complexities. In NT 4.0, all domain controllers have the same opportunity to establish trusts with all clients regardless of how close the domain controllers are to a client. For example, clients and member servers in an NT 4.0 domain that spans the company’s WAN might establish secure channel trusts with domain controllers located far away across the corporate network—an undesirable circumstance. To address the shortcomings of NT 4.0’s domain controller selection process, Microsoft made Win2K Professional’s process considerably more sophisticated than NT 4.0’s process. Understanding Win2K Pro’s domain controller selection process can help you to predict your Win2K domain design’s consequences at every network location and to troubleshoot client logon problems.

Process Changes
In NT 4.0, an NT Workstation client in a domain depends on WINS to find a suitable domain controller with which the client can authenticate. (In routed networks, Microsoft recommends WINS for NetBIOS name resolution. In a single subnet, the client can use broadcasting to resolve names. For more information about name resolution, see Sean Daily, "Navigating Name Resolution, Part 1," June 2000, and "Navigating Name Resolution, Part 2," July 2000.) At boot, the client receives its primary WINS server’s IP address from DHCP, transmits a registration request to the WINS server to register the client’s NetBIOS name, then requests a <1C> list from the WINS server. (<1C> is a NetBIOS suffix. The NetBIOS naming convention permits 16 characters; Microsoft uses the sixteenth character as a suffix that identifies network services and other functions. For more information about NetBIOS name suffixes, see Mark Minasi, Inside Out, "Knowing the Angles of NetBIOS Suffixes," February 1997.) The list contains the IP addresses of the PDC and as many as 24 BDCs in the client’s domain.

After receiving the <1C> list, the NT 4.0 client sends a directed logon request to each listed domain controller IP address, in listed order. The client usually chooses the domain controller that is physically closest to the client because such a controller responds more quickly than controllers that are farther away, but domain controllers higher in the list are often used more frequently than ones lower in the list because they’re always contacted first. If network or router congestion exists, however, NT 4.0 clients might authenticate with distant domain controllers because nearby controllers can’t respond quickly enough to the logon request. The problem is that NT 4.0 doesn’t understand the network’s physical topology: WINS bases the list of domain controllers on the logical domain structure rather than the physical network structure. NT Service Pack 4 (SP4) fixes, such as the Setprfdc utility or the Randomize1clist feature for WINS servers, might work around the problem by forcing a preferred domain controller or by ensuring that the returned list of domain controllers is in a different order every time. However, the underlying problem still exists. (For more information about Setprfdc, see Mark Minasi, This Old Resource Kit, "SETPRFDC," February 1999. For more information about Randomize1clist, see the Microsoft article "WINS Randomize1cList Feature Aids Load-Balancing Between DCs".)

Two big changes take place in the Win2K domain controller selection process: Win2K Pro uses DNS instead of WINS to find domain controllers, and Win2K uses Active Directory (AD) sites to understand the physical network. (For more information about sites, see "AD Sites, Part 1," June 2000 and "AD Sites, Part 2," July 2000.) The dynamic DNS (DDNS) standard, as available in Win2K (through DNS) and UNIX (through the Internet Software Consortium’s, or ISC’s, Berkeley Internet Name Domain, or BIND, 8.2), provides the flexibility that the selection process needs. DDNS lets domain controllers and clients automatically register their names, IP addresses, and services. AD sites let Win2K clients find the closest domain controllers in the network topology, whereas NT 4.0 clients require manual intervention to accomplish the same result.

Win2K Pro’s Selection Process
NT and Windows 9x clients in a Win2K domain still use WINS to choose a domain controller; you need a native-mode Win2K server infrastructure to gain the full benefit of Win2K Professional’s selection process. Therefore, the process I describe applies specifically to Win2K clients that are members of a Win2K domain. For a graphical step-by-step view of the process, see the sidebar "Stepping Through the Process." For an easy analogy of the process, see the sidebar "Let’s Make a Selection."

Before a user logs on, the Win2K domain controller infrastructure has already performed a lot of work. When a Win2K domain controller boots, it dynamically registers many records (e.g., address records, name server records, service resource records—SRV RRs) in its DNS zone (i.e., a section of the overall DNS namespace) and the forest root domain. Request for Comments (RFC) 2052 defines an SRV RR as a new DNS resource record (RR) that specifies the location of various services for a given protocol and domain. SRV RRs contain information about domain controllers, Global Catalog (GC) servers, and Kerberos Key Distribution Centers (KDCs); the domains and sites in which these servers are located; and the ports that these services use. The registration of SRV RRs lets the Win2K client query DNS and locate resources and services that the client needs for authentication. SRV RRs can have other uses, such as locating other services or selecting a subset of domain controllers with special characteristics.

Some examples of important SRV RRs are _gc, which contains information about the GC service; _kdc, which contains information about the Kerberos KDC service; and _ldap, which contains information about Lightweight Directory Access Protocol (LDAP—AD’s primary directory access protocol). As Figure 1 shows, these SRV RRs reside under the _tcp folder in the Microsoft Management Console (MMC) DNS snap-in. Figure 1 also shows that plshome001 and plshome003 have KDC and LDAP protocols, which means these servers are domain controllers; plshome001 has a GC protocol, which means that the server is a GC server. Site membership is located under the _sites container and will be used in the selection process to determine the closest domain controller to the client.

If the network uses DHCP, the client goes through a Discover/Offer/Request/Acknowledgement sequence with the DHCP server when the client boots. The DHCP server assigns the client an IP address, subnet mask, default gateway, primary and secondary WINS servers, and a list of DNS servers for its TCP/IP configuration. For a step-by-step illustration of the process, see the sidebar "Stepping Through the Process."

The first time you boot a new or upgraded Win2K client, the client has no idea what site it is in. The client issues a DNS query (i.e., it calls the DsGetDcName API to enumerate the SRV RRs for _ldap._tcp.domain.name.com records in the DNS database; domain.name is the fully qualified name of the client’s domain). In other words, the client says, "Give me a list of domain controllers that are in my domain," and the DNS server replies with a list of SRV RRs for the client’s domain. Unlike WINS, which restricts the list to 25 servers, DNS replies with the entire list. DNS sends the reply over UDP unless the list of servers won’t fit into the 512-byte packet that DNS over UDP requires; in that case, the DNS server uses multiple TCP packets to send the entire list. Earlier BIND versions can’t gracefully handle an extended SRV RR list (i.e., a packet larger than 512 bytes); these versions cut off the list at 512 bytes. If the SRV RRs are weighted or prioritized by DNS, the client will sort the list by priority, then by weight, and then randomize within these rankings.
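
The ordering step can be sketched in a few lines of Python. This is only an illustration, not Win2K's actual Netlogon code: it assumes the usual SRV semantics (a lower priority number is tried first, and weight biases a random draw within one priority level), and the record type, host names, and values are hypothetical.

```python
import random
from collections import namedtuple

# Hypothetical, simplified SRV record: only the fields needed for ordering.
SrvRecord = namedtuple("SrvRecord", "target priority weight port")

def order_srv_records(records, rng=random):
    """Order records by ascending priority; within one priority level,
    repeatedly draw a record at random, biased by weight (assumed semantics)."""
    ordered = []
    for priority in sorted({r.priority for r in records}):
        group = [r for r in records if r.priority == priority]
        while group:
            total = sum(r.weight for r in group)
            if total == 0:
                pick = rng.choice(group)  # no weights set: plain random choice
            else:
                threshold = rng.uniform(0, total)
                running = 0
                pick = group[-1]
                for r in group:
                    running += r.weight
                    if running >= threshold:
                        pick = r
                        break
            group.remove(pick)
            ordered.append(pick)
    return ordered
```

Calling order_srv_records on two records with priority 0 and one with priority 10 always places the priority-10 record last; the order of the first two varies from call to call.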

Upon receiving the SRV RR list, the client begins domain controller selection. As Figure B (Step 4) shows, the client issues LDAP-over-UDP pings to each server on the list, waiting up to 100 milliseconds (ms) for a response before it moves on to the next domain controller on the list. (This wait is a very long time and differs from the resource kit’s published value of 10 ms.) This wait has a dramatic effect on the selection process. In a network with even moderately good connectivity, most domain controllers will respond to a client’s query in well under 100 ms. As a result, a client will rarely get past the first or even the second domain controller in the SRV RR list, essentially wiping out the "closest domain controller in the list responds first" mechanism. A wait time of 10 ms or a tunable parameter would be more appropriate. (For simplicity’s sake, the figures assume the client and servers are all in the same domain.) The DsGetDcName API, which issues the pings and the original DNS query, has 16 flags that you can choose among to specify a domain controller (e.g., a writeable domain controller, any domain controller, a domain controller that is masquerading as an NT 4.0 PDC). The client caches the IP address of the first domain controller that responds and uses that domain controller for the next step in the selection process. In a large site, these steps increase the possibility that the client will use a nearby domain controller. The client ignores subsequent responses from other pinged domain controllers.
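
To see why the length of the wait matters, consider a small timing model in Python. It is a simplified sketch, not the real client: it assumes the client sends ping i at time i times the wait, stops sending as soon as any reply has arrived, and accepts the earliest reply. The controller names and round-trip times are hypothetical.

```python
def first_responder(latencies_ms, wait_ms):
    """Return (name, arrival_ms) for the first reply the client receives,
    or None if no pinged domain controller answers.

    latencies_ms: ordered list of (dc_name, round_trip_ms), with None as the
    round-trip time of a domain controller that is down.
    """
    best = None  # (arrival_ms, name) of the earliest reply among pings sent so far
    for i, (name, rtt) in enumerate(latencies_ms):
        sent_at = i * wait_ms
        if best is not None and best[0] <= sent_at:
            break  # a reply arrived before this ping would have gone out
        if rtt is not None:
            arrival = sent_at + rtt
            if best is None or arrival < best[0]:
                best = (arrival, name)
    return None if best is None else (best[1], best[0])
```

With the list [("far-dc", 80), ("near-dc", 5)], a 100 ms wait selects far-dc (its reply arrives at 80 ms, before the second ping is ever sent), whereas a 10 ms wait selects near-dc (pinged at 10 ms, reply at 15 ms).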

The selected domain controller returns three pieces of information to the client: the domain controller’s site, the client’s site, and the closest bit value. To determine the domain controller’s site, the domain controller compares its subnet (which it derives from its IP address and subnet mask) with the contents of the Subnets container that resides under the Sites container in the Active Directory Sites & Services Manager snap-in. To determine the client’s site, the domain controller uses the same process it used to determine its own site. (When the domain controller informs the client of the client’s site, the client stores this site in its local Registry at HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters\DynamicSiteName.) If the domain controller is in the client’s site, the domain controller returns a closest bit value of 1; otherwise, the closest bit value is 0. For example, as Figure B (Step 5) shows, the domain controller Site1-DC responded first but isn’t in the client’s site, so the closest bit value is 0.
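
The subnet comparison and the closest bit can be illustrated with Python's standard ipaddress module. This is a minimal sketch under stated assumptions: the subnet-to-site table is hypothetical, the function names are invented for illustration, and choosing the most specific matching subnet when subnets overlap is an assumption rather than something the process description above confirms.

```python
import ipaddress

# Hypothetical contents of the Subnets container: subnet -> site name.
SUBNET_TO_SITE = {
    "10.1.0.0/16": "Site1",
    "10.2.0.0/16": "Site2",
    "10.2.5.0/24": "Site2-Branch",
}

def site_for_address(ip, subnet_to_site=SUBNET_TO_SITE):
    """Return the site of the most specific subnet that contains ip, or None."""
    addr = ipaddress.ip_address(ip)
    matches = []
    for subnet, site in subnet_to_site.items():
        net = ipaddress.ip_network(subnet)
        if addr in net:
            matches.append((net.prefixlen, site))
    # Assumption: the longest prefix (most specific subnet) wins.
    return max(matches)[1] if matches else None

def ldap_ping_reply(dc_ip, client_ip):
    """Build the three values the domain controller returns to the client."""
    dc_site = site_for_address(dc_ip)
    client_site = site_for_address(client_ip)
    closest = 1 if client_site is not None and dc_site == client_site else 0
    return {"dc_site": dc_site, "client_site": client_site, "closest": closest}
```

For example, ldap_ping_reply("10.1.0.10", "10.2.0.20") reports a closest bit of 0 and a client site of Site2, which mirrors the Site1-DC case in Figure B (Step 5).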

If the domain controller returns a closest bit with a value of 1, the client caches the domain controller’s IP address and uses that domain controller for authentication. If the domain controller returns a closest bit value of 0 (i.e., the domain controller tells the client, "I’m not in your site"), the client immediately queries DNS again, ignoring the first domain controller’s responses, for a more appropriate domain controller, as Figure C (Step 6) shows. This time the client knows its site, so it can be more specific. The client asks DNS to enumerate the SRV RRs for _ldap._tcp.sitename._sites.dc._msdcs.domain.name.com records in the DNS database; sitename is the client’s site, and domain.name is the client’s domain. In other words, the client says, "Give me a list of domain controllers in my domain and site." The client’s DNS server returns a list of SRV RRs in the client’s domain and site, and the client repeats the pinging process and closest bit retrieval described earlier.
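
The two-pass lookup described above can be summarized in a short Python sketch. It is not the DsGetDcName implementation: lookup and ping are hypothetical callables that stand in for the DNS query and the LDAP ping, the reply is assumed to be a dictionary with client_site and closest keys, and the cap on rounds is an assumption added to keep the loop bounded.

```python
def srv_query_name(domain, site=None):
    """Build the SRV query name, domain-wide or restricted to one site."""
    if site is None:
        return "_ldap._tcp." + domain
    return "_ldap._tcp.{}._sites.dc._msdcs.{}".format(site, domain)

def first_reply(candidates, ping):
    """Ping candidates in order; return (dc, reply) for the first that answers."""
    for dc in candidates:
        reply = ping(dc)
        if reply is not None:
            return dc, reply
    return None, None

def select_domain_controller(domain, lookup, ping, max_rounds=3):
    """Return (dc, client_site); dc is None if no domain controller answered."""
    site = None      # unknown on the first boot
    fallback = None  # last domain controller that answered from another site
    for _ in range(max_rounds):
        dc, reply = first_reply(lookup(srv_query_name(domain, site)), ping)
        if dc is None:
            break
        if reply["closest"] == 1:
            return dc, reply["client_site"]
        fallback = dc
        if reply["client_site"] == site:
            break  # already asked for this site; nothing closer is available
        site = reply["client_site"]  # query DNS again, this time by site
    return fallback, site
```

With a domain-wide list of site1-dc and site2-dc and a client located in Site2, the first round gets a closest bit of 0 from site1-dc, and the second round queries the Site2 records and settles on site2-dc.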

Unlike NT 4.0’s WINS <1C> list, Win2K’s DNS domain controller selection process ensures that the client queries only network-proximate domain controllers, assuming you’ve correctly configured your sites. After the client selects a domain controller in the client’s site (or in the closest site that the client can find), the Kerberos authentication process begins...but that’s another story. (For more information about Kerberos, see Jan De Clercq, "Kerberos in Win2K," October 1999.)

Additional Notes
Several factors, such as DNS’s method of prioritizing and weighting the servers in the SRV RR list, affect Win2K Pro’s selection process. You might find the following information useful when you design your sites.

Clients in a new site. Because a Win2K client stores its site value in its Registry, a client that changes sites (e.g., a traveling laptop client) still thinks it’s in the old site upon first booting in a new location. The client asks the DNS server for a list of domain controllers in its domain at the old site and pings those domain controllers; the domain controller that responds uses the subnet comparison process to determine the client’s current site. The client then updates its site in its Registry and queries the DNS server again, this time asking for a list of domain controllers in the new site. Because the client subsequently queries domain controllers in its site only, the first domain controller to respond to the client’s query returns a closest bit value of 0 or 1, and the selection procedure continues.

Server priority and weight. Unlike the WINS <1C> list of domain controllers, the DNS list lets you prioritize and weight some domain controllers more favorably than others, but first you must carefully think through these changes’ implications because DNS controls apply across the entire domain. For example, you can use priority in SRV RRs to control which domain controllers in another site a client will fail over to if the client’s on-site domain controller fails. Imagine a domain with site A (a large site with multiple domain controllers, centrally located in the network topology) and site B (a small site with one domain controller). You can give site A’s domain controllers a higher priority than site B’s domain controllers. When a client can’t find a domain controller active in its site, it asks for a list of domain controllers in its domain, and the returned list will place site A’s domain controllers at the top. However, you must be careful to set all the site A domain controllers to the same (i.e., higher) priority. If you set only the most powerful site A domain controllers with higher priority, site A’s clients will also tend to pick those domain controllers over other perfectly good choices in site A.

When the client’s site information is correct, DNS returns a list of domain controllers within that site only. The Win2K client randomly shuffles the list, then sorts the list according to the priority and weight that DNS assigns to each server; the client randomly orders the domain controller selection within a site, without regard to how close the domain controller is to the client. Therefore, given an equal DNS priority and weight between domain controllers, you can’t predict which domain controller within a site the client will choose.

Server rotation. If the primary DNS server in the client’s DNS list doesn't respond to the client’s queries, that server rotates to the bottom of the list. (DNS rotates the SRV RRs in the list that it returns so that clients don’t hit the same servers every time.)

Time limits. If a client can’t find a domain controller in the client’s site within 5 seconds, the client again queries DNS and searches for a domain controller in the client’s domain. If the client can’t find a domain controller in the client’s domain after 10 seconds, the client returns a "Can’t find a domain controller" error message to the user.

Server list and SRV RR view. You can use DNS Manager to see the list of servers that DNS will return in its SRV RR list, as Figure 2 shows. Double-click each _ldap object in the DNS server’s /forward lookup zones/domain.name/_msdcs/dc/_sites/sitename/_tcp. You can also use the diagnostic tool Nslookup to examine an SRV RR. To obtain the list of domain controllers within a domain, run Nslookup from a command prompt, then type

set type=srv

then

_ldap._tcp.domain

where domain is the DNS name of your domain.

In the file %systemroot%\system32\config\netlogon.dns, you can find every record (i.e., not only SRV RRs) that the domain controller registers with DNS.

Site coverage. If no domain controller is registered for the client’s site, the site coverage process takes place. Site coverage ensures that every site has a domain controller so that the domain controller selection process can proceed. In site coverage, a domain controller that determines it is closest to an empty site adds SRV RRs to DNS to advertise itself as a domain controller for that site. How does a domain controller determine that it is closest to an empty site? The domain controller looks at site-link costs and determines whether the site in which the domain controller exists has the lowest-cost connection to the empty site. (For more information about site-link costs, see "AD Sites, Part 2.")
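
The lowest-cost comparison behind site coverage can be sketched as follows. This is a deliberately simplified Python illustration, not AD's algorithm: it assumes only direct site links count (no multi-hop paths), it breaks cost ties alphabetically by site name, and the site names, costs, and function name are hypothetical.

```python
def covering_site(empty_site, site_links, sites_with_dcs):
    """Return the site whose domain controllers should cover empty_site.

    site_links: dict mapping frozenset({site_a, site_b}) -> link cost.
    sites_with_dcs: set of site names that contain a domain controller.
    """
    candidates = []
    for link, cost in site_links.items():
        if empty_site in link:
            (other,) = link - {empty_site}  # the site at the other end of the link
            if other in sites_with_dcs:
                candidates.append((cost, other))
    # The lowest cost wins; ties fall back to alphabetical order (assumption).
    return min(candidates)[1] if candidates else None
```

For a Branch site linked to HQ at cost 100 and to Regional at cost 50, with domain controllers in both HQ and Regional, the function returns Regional; a domain controller there would then register SRV RRs for Branch.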

The Win2K domain controller selection process is complicated, but when you combine it with intelligent site design, the process provides a more reliable method than NT does for establishing good communications between a Win2K client and a domain controller.