I've found that the best foundation for troubleshooting technology problems is an understanding of what happens behind the scenes. Such a foundation is especially useful if you manage a distributed infrastructure with many different kinds of services interacting across wide-ranging networks and disparate platforms. In the Windows world, many services, such as Active Directory (AD), require study if you want to truly understand the technology behind them.

This article looks in depth at two important network interactions that involve AD. First, I look at what happens when a workstation or server that's a member of an AD domain boots up, and I describe the process that a Windows 2000 device uses to authenticate to the domain and present a logon dialog to the user. Second, I examine what occurs on the network when two AD domain controllers (DCs) synchronize with each other and exchange directory update information. To collect all the data, I used Network Monitor in Win2K Service Pack 3 (SP3) on both server and client.

The Protocols
The key to understanding AD network interactions is to understand the protocols involved in facilitating those interactions. Table 1 lists the protocols we'll consider and the ports that they use.

Notice that Table 1 doesn't list any NetBIOS-based protocols. I assume that NetBIOS over TCP/IP (NetBT) is disabled on your clients and servers. If NetBIOS is enabled in your environment, you'll see some differences in your network interactions; where appropriate, I point out those differences.

A Win2K Client in an AD Domain
Let's look at what happens when you start a workstation or server that's a member of an AD domain. The process begins with DNS requests for SRV-type DNS records.

  1. The workstation looks for a suitable AD DC to authenticate it to the domain. Computer accounts are just a special type of user account; they authenticate to AD the same way users do.
  2. The workstation performs a DNS request (through UDP port 53) for all SRV records that are registered in the zone _ldap._tcp.<site>._sites.dc._msdcs.<domain>, where site corresponds to the AD site the workstation belongs to and domain is the domain the workstation belongs to. The workstation caches site information in the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters\DynamicSiteName subkey. If the site's name changes between machine reboots or the site no longer exists, the query will fail, and the workstation will then query the DNS zone _ldap._tcp.dc._msdcs.<domain>. This zone typically lists all the available DCs in a particular domain, any one of which could authenticate the machine.
  3. The DNS server returns to the workstation a list of DCs for either the site (if known) or the domain.
  4. The client makes a Lightweight Directory Access Protocol (LDAP—UDP port 389) search request, which asks one of the listed DCs to validate that the client exists in AD. The request asks AD to match the workstation's name, the name of the domain to which the workstation belongs, the domain's globally unique identifier (GUID), and the workstation's machine account name. (The machine account name is the system name followed by a dollar sign—$; e.g., a workstation called ws1 has the machine account name ws1$.) This LDAP request is unauthenticated (i.e., anonymous).
  5. When the DC finds a match to the workstation's query, it sends a "success" response to the LDAP request.
  6. The workstation then pings each DC in the list that the DNS server returned in Step 3. The first DC to respond handles the workstation's authentication requests.
  7. From this point on, the interaction between the workstation (or member server) and DC involves authenticating the workstation to the domain and performing the tasks that are required to present a logon dialog to the user.

  8. The workstation begins a series of TCP connections with the DC for specific services, such as remote procedure call (RPC) and Server Message Block (SMB). First, the client opens a TCP connection to the RPC port mapper service (TCP port 135) running on the DC. This action makes a request to the AD logon service. (RPC communications require this two-part process: The client first asks the server on port 135 which port a particular RPC server service is listening on. After the client gets that information, it opens a TCP connection on that port so that it can perform RPC communications.) The workstation then opens an SMB connection (TCP port 445) to the DC, an action that sets up a connection to a file share. The source and destination ports for RPC traffic are in the range above 1023. Therefore, if you're passing RPC communication through a firewall, you'll need to open these higher-numbered ports unless you can restrict which ports the RPC service in question uses.
  9. After the workstation establishes these connections, the client uses Kerberos over UDP (port 88) to send an authentication request to the DC. If you want, you can force Kerberos to use TCP. (For details, see the Microsoft article "How to Force Kerberos to Use TCP Instead of UDP," http://support.microsoft.com/?kbid=244474.)
  10. The workstation uses the information it received from the RPC port mapper in Step 8 to open a TCP RPC connection to the AD Logon RPC service, which is identified by its GUID (also referred to as a universally unique identifier, or UUID): 12345678-1234-ABCD-EF00-01234567CFFB. You can see this process in the Network Monitor packet trace, as Figure 1 shows. In this trace, SERVER222 is the workstation's (or member server's) name and SPLEXORA-DC1 is the DC's name. All RPC traffic during this phase of communication between the workstation and DC is encrypted, so you can't see the data that the client and server are exchanging; you see only the RPC header information shown in the Network Monitor trace.
  11. The DC responds to the workstation's Kerberos authentication request in Step 9 with the workstation's ticket-granting ticket (TGT), which carries the session key the workstation will use for the life of its authentication session with the DC.
  12. Using the TGT and the RPC connection that it opened in Step 10, the workstation authenticates to AD by issuing a series of RPC R_Logon requests to the DC. The DC satisfies the requests when the passed Kerberos session information is correct for the workstation.
  13. The workstation uses the SMB connection it created in Step 8 to connect to the IPC$ share on the DC. The workstation also asks the DC whether it knows of any Microsoft Dfs referrals for the share it's connecting to (in this case, IPC$). This step occurs automatically whenever a client connects to a server share; later, we'll see how this step comes in handy.
  14. After the SMB connection is open, the workstation performs a slow-link test to determine whether the DC it's authenticating to meets the criteria for a slow network link. This test determines future behavior, such as whether certain Group Policy extensions are processed or how (or whether) a roaming profile is downloaded when a user logs on. The slow-link test consists of a 60-byte ping that is sent and timed, followed by a 1514-byte ping that is sent and timed. The workstation repeats this series three times and uses the latency of each ping to determine the link speed and whether the speed falls below the workstation's preset slow-link threshold. This threshold is set to 500Kbps by default, but you can use Group Policy Administrative Template settings to modify the threshold at the client.
  15. The workstation then performs a TCP-based LDAP request to get information about the domain, including the list of naming contexts (NCs) that the DC supports and each NC's LDAP distinguished name (DN).
  16. The workstation then performs an authenticated LDAP bind, again over TCP to the DC. The workstation sends a Generic Security Service (GSS)-SPNEGO request to the DC to authenticate. This GSS-SPNEGO request indicates that the workstation is negotiating the authentication protocol with the DC. When both client and server support it, Kerberos is the preferred authentication mechanism for binding the workstation to LDAP. Figure 2 shows the packet trace for the LDAP negotiation process.
  17. Then, the workstation figures out what Group Policy Objects (GPOs) it must process. It uses LDAP to determine which container objects (sites, domains, and organizational units—OUs) it's a member of. More specifically, after the workstation knows its place in the AD hierarchy, it sends a request to the DC to return the gpLink and gpOptions attributes for each container object.
  18. The DC then returns the DN of each GPO that the workstation must process. These DNs take the form of a path to the GPO's Group Policy Container (GPC). A sample DN might look like LDAP://CN={31B2F340-016D-11D2-945F-00C04FB984F9},CN=Policies,CN=System,DC=mycompany,DC=org.
  19. The workstation next performs an SMB request to the SYSVOL share on the domain (e.g., \\DC.mycompany.org\sysvol) to read the GPC portion of each GPO. Again, the workstation asks for any Dfs referral information before asking for a connection to this share. In fact, the original request the workstation makes is for \\mycompany.org\sysvol, and the DC determines the closest Dfs replica before sending its return information.
  20. The workstation uses SMB to read the contents of each GPO's Group Policy Template (GPT) and processes the required policy.
  21. The workstation then uses Network Time Protocol (NTP) to perform a time synchronization with its partner DC.
  22. Finally, the workstation determines whether it needs to know about any public key certificate services. Using LDAP, it queries the domain for information in the CN=Public Key Services,CN=Services,CN=Configuration,DC=mycompany,DC=org container, asking for objects of type NTAuthCertificates and caCertificate.
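
The DNS lookups in Steps 2 and 3 can be sketched in a few lines of Python. This is only an illustration of the fallback order, not Netlogon's actual locator code: the function names and the resolve_srv callback are hypothetical, and the callback stands in for a real DNS query so that the example runs without a network. Note that SRV names begin with an underscore (_ldap._tcp...).

```python
def srv_query_names(domain, site=None):
    """Return the SRV names to try, most specific first."""
    names = []
    if site:
        # Site-specific name, built from the cached site name.
        names.append(f"_ldap._tcp.{site}._sites.dc._msdcs.{domain}")
    # Domain-wide fallback that lists every DC in the domain.
    names.append(f"_ldap._tcp.dc._msdcs.{domain}")
    return names


def machine_account_name(hostname):
    """A machine account is the system name followed by a dollar sign."""
    return hostname + "$"


def locate_dcs(domain, site, resolve_srv):
    """Try each SRV name in order and return the first non-empty answer.

    resolve_srv is a hypothetical callback: name -> list of DC host names.
    """
    for name in srv_query_names(domain, site):
        dcs = resolve_srv(name)
        if dcs:
            return name, dcs
    return None, []


# Simulated DNS data: the cached site no longer exists, so only the
# domain-wide name has records.
fake_dns = {
    "_ldap._tcp.dc._msdcs.mycompany.org": ["dc1.mycompany.org", "dc2.mycompany.org"],
}
used_name, dcs = locate_dcs("mycompany.org", "OldSite", lambda n: fake_dns.get(n, []))
print(used_name, dcs, machine_account_name("ws1"))
```

Passing the resolver in as a callback keeps the fallback logic separate from the DNS library you happen to use, which makes it easy to compare against what you see in a trace.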

Not all interactions between two machines within an AD infrastructure are as complicated or lengthy as the process I've described. Although this process seems fairly involved, the interaction results in a small number of network packets. On my network, the workstation processed three to four GPOs; a typical startup sequence involved about 300 packets and took approximately 35 seconds to complete. If I enabled NetBIOS on the client (but not the server), the packet count increased by approximately 80. This count was the result of additional NetBIOS-related services, including WINS lookups, machine registrations in WINS and registration with the browser service, and SMB sessions that used TCP port 139 in addition to port 445. Nevertheless, the overall delta between pure IP and NetBIOS/TCP was less than I would have expected.
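
The GPO steps turn directory data into file paths: each gpLink value names one or more GPCs, and each GPC has a matching GPT folder under SYSVOL. The sketch below shows that mapping under stated assumptions. The bracketed [LDAP://<DN>;<options>] layout of gpLink and the Policies\{GUID} folder layout in SYSVOL reflect my understanding of AD's conventions rather than anything the trace shows directly, and the helper names are hypothetical.

```python
import re

# Assumed gpLink layout: one "[LDAP://<DN>;<options>]" group per linked GPO.
GPLINK_RE = re.compile(r"\[LDAP://([^;\]]+);(\d+)\]", re.IGNORECASE)
# The GPO's GUID is the first CN component of the GPC's DN.
GUID_RE = re.compile(r"CN=(\{[0-9A-Fa-f-]{36}\})", re.IGNORECASE)


def parse_gplink(gplink):
    """Split a gpLink value into (gpo_dn, options) pairs."""
    return [(dn, int(opt)) for dn, opt in GPLINK_RE.findall(gplink)]


def gpt_path(gpo_dn, domain):
    """Build the SYSVOL path of the GPT that matches a GPC's DN."""
    match = GUID_RE.search(gpo_dn)
    if match is None:
        raise ValueError(f"no GPO GUID found in {gpo_dn!r}")
    parts = [domain, "sysvol", domain, "Policies", match.group(1)]
    return "\\\\" + "\\".join(parts)


link = ("[LDAP://CN={31B2F340-016D-11D2-945F-00C04FB984F9},"
        "CN=Policies,CN=System,DC=mycompany,DC=org;0]")
for dn, options in parse_gplink(link):
    print(options, gpt_path(dn, "mycompany.org"))
```

The path starts with the domain name rather than a DC name, which matches the domain-based Dfs request for \\mycompany.org\sysvol described in Step 19.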

We've covered the process a workstation or member server uses to boot and authenticate to an AD domain, and you can use this information for troubleshooting. For example, suppose you find that your workstations are registering in the wrong AD site. Using the list of DCs obtained in Step 3, you can verify entries in the DNS server, your AD sites, and the registry to figure out what's happening. Similarly, suppose some member servers on a network that has a secure demilitarized zone (DMZ) are having trouble authenticating with an AD domain in the trusted network. You can use the information in the trace to determine which ports and protocols need to pass through your firewalls so that authentication can be successful.
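
For the firewall scenario, it can help to write the flows from the boot sequence down as data and compare them with what a firewall allows. The following is a minimal sketch, not a complete rule set. The ports for DNS, LDAP, the RPC port mapper, SMB, and Kerberos come from the steps above; UDP port 123 for NTP is that protocol's standard port rather than something the trace discussion states; and the dynamically assigned RPC ports above 1023 and the ICMP pings are deliberately left out. The Flow type and blocked_flows helper are hypothetical names.

```python
from typing import NamedTuple


class Flow(NamedTuple):
    service: str
    transport: str
    port: int


# Fixed-port flows from the boot sequence, in the order they appear.
BOOT_FLOWS = [
    Flow("DNS", "udp", 53),
    Flow("LDAP", "udp", 389),
    Flow("RPC port mapper", "tcp", 135),
    Flow("SMB", "tcp", 445),
    Flow("Kerberos", "udp", 88),
    Flow("LDAP", "tcp", 389),
    Flow("NTP", "udp", 123),  # assumption: standard NTP port
]


def blocked_flows(allowed):
    """Return the boot flows that a set of (transport, port) rules misses."""
    return [f for f in BOOT_FLOWS if (f.transport, f.port) not in allowed]


# Example rule set for a DMZ firewall that forgot Kerberos and NTP.
dmz_rules = {("udp", 53), ("udp", 389), ("tcp", 389), ("tcp", 135), ("tcp", 445)}
for flow in blocked_flows(dmz_rules):
    print(f"blocked: {flow.service} ({flow.transport.upper()} {flow.port})")
```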

DC-to-DC Synchronization
Now, let's look at the replication process between two AD DCs. This process is less complicated than the logon process but still provides some interesting observations. Again, I disabled all NetBIOS traffic, so the process I describe is pure IP-based DC synchronization. Also, I used DCs on different sites, and I used the ReplMon utility from the Win2K Support Tools to force intersite replication.

First, let's review a couple of concepts. Win2K uses two protocols to support intersite AD synchronization. You can use RPC over TCP/IP for both intra- and intersite replication, or you can use SMTP for intersite replication. My network uses RPC over TCP/IP; I didn't enable SMTP replication. Also, keep in mind that one AD connection object performs one-way pull replication. For example, if a connection object goes from DC1 to DC2, DC2 will pull data from DC1 during replication.

With that groundwork, let's look at what a replication event looks like on the network. Let's assume that DC2 has a one-way replication connection from DC1. So, if I add a new AD object on DC1, during the next intersite replication, DC2 will initiate the replication event and pull the new information from DC1.

  1. First, DC2 gets the appropriate Kerberos tickets so that it can authenticate to domain resources, specifically DC1. Because DC2 is a DC and runs the Kerberos Key Distribution Center (KDC) service, it can get a valid Kerberos ticket for the domain without venturing onto the network.
  2. DC2 then makes a DNS request to its DNS server to locate a Service Principal Name (SPN) entry for DC1. The entry, which looks like a GUID followed by a DNS name (e.g., db4dd6e7-9b40-4c97-83f5-6e68b1ebcf47._msdcs.mycompany.com), is unique for each server and isn't the same as the object GUID by which AD identifies the server. An SPN is what the name implies—it identifies a service running on a system. The DNS server returns the name of the server referred to by this SPN (the SPN is a DNS alias to the corresponding server), in this case DC1.
  3. Next, DC2 makes an RPC port mapper request (TCP port 135) to DC1 to start an RPC communication.
  4. DC2 gets a response from DC1 to the requested RPC endpoint, which in this case is for the interface GUID e3514235-4b06-11d1-ab04-00c04fc2dcd2. This GUID corresponds to the AD replicator service. DC1 returns a port (in the range above 1023) to use, and DC2 then initiates an RPC connection to that port, passing the Kerberos session data it retrieved in Step 1 to DC1.
  5. DC1 and DC2 then exchange a set of RPC requests and responses to replicate the directory data from DC1 to DC2. This traffic is encrypted, so you can't directly view the directory data as it's updated. Figure 3 shows proof of a replication event in a directory services event-log entry. This log entry shows a new user object called Jane Finance being created on the destination server. To view such verbose AD replication logging, you need to enable it in your DC's registry. (For instructions, see the Microsoft article "How to Enable Diagnostic Event Logging for Active Directory Services," http://support.microsoft.com/?kbid=220940.)
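
The direction of each exchange in the steps above is easy to mix up, so here is a toy model of the pull. It performs no real RPC, Kerberos, or directory work: ToyDC and pull_replicate are hypothetical names, port 1026 is an arbitrary example of a dynamically assigned port, and replication is reduced to copying missing object names. Only the interface GUID is taken from the trace.

```python
from dataclasses import dataclass, field

# Interface GUID of the AD replicator service, as seen in the trace.
REPLICATION_IFACE = "e3514235-4b06-11d1-ab04-00c04fc2dcd2"


@dataclass
class ToyDC:
    name: str
    endpoints: dict = field(default_factory=dict)  # interface GUID -> port
    objects: set = field(default_factory=set)      # stand-in for directory data

    def map_endpoint(self, iface):
        """Model of the port mapper (TCP 135): answer with the service's port."""
        return self.endpoints[iface.lower()]


def pull_replicate(dest, source):
    """dest asks source's port mapper for the port, then pulls new objects."""
    port = source.map_endpoint(REPLICATION_IFACE)
    if port <= 1023:
        raise ValueError("expected a dynamically assigned port above 1023")
    new_objects = source.objects - dest.objects
    dest.objects |= new_objects  # data flows from source to dest
    return port, sorted(new_objects)


dc1 = ToyDC("DC1", endpoints={REPLICATION_IFACE: 1026}, objects={"CN=Jane Finance"})
dc2 = ToyDC("DC2")
port, pulled = pull_replicate(dest=dc2, source=dc1)
print(f"{dc2.name} connected to {dc1.name} on port {port} and pulled {pulled}")
```

The destination DC makes every request in this model, which mirrors the one-way pull behavior of a connection object.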

DC replication is simple and to the point. For this article, I created new users and groups and added a new user to a new group. I also ran several password-reset scenarios. In all cases, the interchange between DCs involved no more than 40 packets and about 15KB of data on the network.

This article looked at two AD network interactions that systems administrators are likely to encounter on a daily basis. Armed with this information, you can troubleshoot problems with DNS name resolution and protocol transport through firewalls and ensure that your infrastructure has everything it needs for fast and efficient data interchange.