How replication works and what to do when it doesn't

Ahead of the pack, your company has already deployed Windows 2000 and Active Directory (AD). Everything worked perfectly for a while, but you're beginning to realize that Win2K doesn't perform exactly as Microsoft promised it would. Many administrators are unprepared to troubleshoot the problems that arise when good directories go bad, and AD replication is one of the least understood features of Win2K. Now's the time to develop a better understanding of replication's inner workings and discover the troubleshooting tools that are available to you.

Replication Overview
AD is a database. By default, each domain controller (DC) stores a copy of this database as ntds.dit in its \winnt\ntds folder. The database is logically divided into three directory partitions, or naming contexts (NCs)—the Schema NC, the Configuration NC, and the Domain NC. All DCs in the forest contain the same Schema NC and Configuration NC because this information is defined forestwide. Each DC in an AD domain holds the same copy of the domain's Domain NC. If the DC is designated as a Global Catalog (GC) server, then that DC also holds a partial copy of every other domain's Domain NC. This partial copy includes all the objects from the respective domains, but only a subset of the attributes.
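
The partition rules above can be summarized in a short Python sketch. The function name, the second domain name, and the data shapes are hypothetical and exist only for illustration; the code models the rules in the text and doesn't call any AD API.

```python
def naming_contexts(dc_domain, forest_domains, is_gc=False):
    """Return {NC name: 'full' or 'partial'} for one DC, per the rules in the text."""
    # Every DC in the forest holds the forestwide Schema and Configuration NCs.
    held = {"Schema": "full", "Configuration": "full"}
    # Every DC holds a full copy of its own domain's Domain NC.
    held[dc_domain] = "full"
    if is_gc:
        # A GC server also holds a partial copy (all objects, a subset of
        # attributes) of every other domain's Domain NC.
        for domain in forest_domains:
            if domain != dc_domain:
                held[domain] = "partial"
    return held


# Hypothetical two-domain forest.
forest = ["testdomain.com", "child.testdomain.com"]
print(naming_contexts("testdomain.com", forest))
print(naming_contexts("testdomain.com", forest, is_gc=True))
```

The second call shows the extra partial Domain NC that a GC server carries for the other domain in the forest.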

Replication is the mechanism that AD uses to synchronize all this information across all the DCs in the domain or forest that hold the information. AD uses the Knowledge Consistency Checker (KCC), sites, site links, and connection objects to accomplish this replication.

The KCC, a built-in process that runs on all DCs, creates the forest's replication topology. You use sites to group well-connected DCs that are within close network proximity. Your network and your AD architects determine whether a DC is well connected; many companies consider DCs that are connected at 10Mbps or faster to be well connected. To create a site, you configure AD with your network's IP subnet addresses. If one or more subnets are well connected, you can group them into a site. Replication between DCs in one site is called intrasite replication. To establish an intrasite replication topology, the KCC automatically creates connection objects between the DCs in a site. Connection objects are one-way connectors that link DCs within a site. Each of these links, like a traffic lane, represents an inbound connection from the source DC to the destination DC. Therefore, before two DCs in a site can replicate directory data to each other, two separate connection objects must exist, one in each direction.
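
To make the one-way nature of connection objects concrete, here's a minimal Python sketch. It assumes a simplified topology in which each DC is linked to its neighbor in a ring; the class and function names are hypothetical, and the real KCC weighs more factors than this model does.

```python
from typing import NamedTuple


class ConnectionObject(NamedTuple):
    source: str       # DC that the changes come from
    destination: str  # DC that receives the changes over this inbound link


def ring_topology(dcs):
    """Link each DC to its neighbor with two one-way connection objects."""
    connections = set()
    if len(dcs) < 2:
        return connections
    for i, dc in enumerate(dcs):
        neighbor = dcs[(i + 1) % len(dcs)]
        connections.add(ConnectionObject(dc, neighbor))
        connections.add(ConnectionObject(neighbor, dc))
    return connections


def replicates_both_ways(connections, a, b):
    """Two DCs exchange data only if a connection object exists in each direction."""
    return (ConnectionObject(a, b) in connections
            and ConnectionObject(b, a) in connections)


site = ring_topology(["testdc01", "testdc02", "testdc03"])
print(len(site))                                           # 6 one-way links
print(replicates_both_ways(site, "testdc01", "testdc02"))  # True
```

A single ConnectionObject("testdc01", "testdc02") on its own would let testdc02 receive changes from testdc01 but not the reverse, which is why the pair is needed.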

If some of your DCs aren't well connected, you need to create multiple sites. Replication between separate sites is called intersite replication. Your AD administrator uses the Microsoft Management Console (MMC) Active Directory Sites and Services snap-in to create site links, which provide roadways between sites. After the AD administrator establishes these pathways, the KCC creates connection objects between the linked sites. Typically, not all DCs share the same information. (For example, DCs in separate domains might maintain different data.) Therefore, the KCC might need to establish multiple connection objects to ensure that each NC replicates completely throughout the enterprise. In Figure 1, which shows an example of intersite connection objects, Site A and Site B are connected by a manually created site link. In this example, the KCC has created two one-way inbound connection objects to replicate the three NCs between two DCs from the same domain.

The bridgehead server is another component of the replication topology. If you've worked with Microsoft Exchange Server, you're familiar with this server role. To increase the efficiency of replication, the KCC doesn't create individual connection objects between all the DCs in one site with all the DCs in another site. Instead, the KCC uses a store-and-forward mechanism that replicates information between two bridgehead servers—one in each site. The bridgehead server then uses intrasite replication to replicate the information to the rest of the DCs in its site. For more information about bridgehead servers, see the sidebar "Bridgehead Servers," page 48.
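
A quick calculation shows why the bridgehead approach is more efficient. This Python sketch assumes one domain, one bridgehead server per site, and one pair of one-way connection objects per replicating DC pair; the function names and DC counts are hypothetical.

```python
def full_mesh_connections(dcs_in_site_a, dcs_in_site_b):
    """Connection objects needed if every DC in one site paired with every DC in the other."""
    # Each cross-site pair needs one connection object per direction.
    return dcs_in_site_a * dcs_in_site_b * 2


def bridgehead_connections():
    """Connection objects needed when only the two bridgehead servers are linked."""
    # One inbound connection object on each bridgehead server.
    return 2


print(full_mesh_connections(5, 4))   # 40
print(bridgehead_connections())      # 2
```

With five DCs in one site and four in the other, linking every pair would take 40 connection objects across the site link, compared with two between the bridgehead servers.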

What Gets Replicated?
Because changes can originate on any DC, AD needs an efficient way to determine which objects have changed and whether to replicate those changes to the DC's replication partners. AD uses update sequence numbers (USNs) to track when changes occur in the directory. USNs are 64-bit counters that each DC maintains locally. When AD, users, administrators, or applications update an attribute, the DC takes its current USN value, increments it, and assigns the new value to the updated attribute and its object as the local USN.
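
The following Python sketch models the USN bookkeeping just described. It's a toy model under stated assumptions: the class, its fields, and the sample object name are hypothetical, and a plain Python integer stands in for the 64-bit counter.

```python
class DomainController:
    """Toy model of one DC's local USN counter."""

    def __init__(self, name):
        self.name = name
        self.current_usn = 0   # local counter, never shared with other DCs
        self.attributes = {}   # (object DN, attribute) -> (value, local USN)

    def update(self, dn, attribute, value):
        """Apply a change and stamp it with the next local USN."""
        self.current_usn += 1
        self.attributes[(dn, attribute)] = (value, self.current_usn)
        return self.current_usn


dc = DomainController("testdc01")
dc.update("CN=jsmith", "description", "Sales")
dc.update("CN=jsmith", "telephoneNumber", "555-0100")
dc.update("CN=jsmith", "description", "Marketing")

# The attribute changed most recently carries the highest local USN.
print(dc.attributes[("CN=jsmith", "description")])      # ('Marketing', 3)
print(dc.attributes[("CN=jsmith", "telephoneNumber")])  # ('555-0100', 2)
```

Because the counter is local, the same change can carry a different local USN on every DC that stores it.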

Within the AD replication topology, replication partners use a high-watermark value to keep track of the most recent changes they receive from source DCs. When a destination DC requests changes from the source DC, the destination DC sends its high-watermark value to the source DC as a benchmark for sending back changes. As a result, the source DC will send only directory-object changes that have a value higher than the high-watermark value, thereby eliminating any unnecessary flow of replication data across the wire.
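
Here's a minimal Python sketch of the high-watermark exchange. The tuple layout, function names, and sample changes are hypothetical; the sketch shows only the filtering idea, not the actual replication protocol.

```python
def changes_above(source_changes, high_watermark):
    """Source DC side: keep only changes with a USN above the watermark."""
    return [change for change in source_changes if change[0] > high_watermark]


def replicate(source_changes, high_watermark):
    """Destination DC side: pull new changes and advance the watermark."""
    batch = changes_above(source_changes, high_watermark)
    new_watermark = max((change[0] for change in batch), default=high_watermark)
    return batch, new_watermark


# Each change is (local USN on the source DC, object, attribute, value).
source_changes = [
    (101, "CN=jsmith", "description", "Sales"),
    (102, "CN=jsmith", "telephoneNumber", "555-0100"),
    (103, "CN=mjones", "title", "Manager"),
    (104, "CN=jsmith", "description", "Marketing"),
]

batch, watermark = replicate(source_changes, 102)
print(len(batch), watermark)   # 2 104

# A second request with the new watermark returns nothing.
print(replicate(source_changes, watermark))   # ([], 104)
```

The destination DC stores the returned watermark and sends it with its next request, so changes it has already received don't cross the wire again.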

The up-to-dateness vector works in conjunction with the high-watermark value to minimize the amount of replicated data. Whereas the high-watermark value concerns objects, the up-to-dateness vector concerns attributes. Otherwise, the two values have similar functionality. During an exchange of replication data, a destination DC sends its up-to-dateness vector to the source DC, which uses this value to determine whether the destination DC has an up-to-date value for a particular attribute. If the value is up-to-date, the source DC filters the value from the data it sends to the requesting DC.
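
The filtering step can be sketched in Python as follows. This sketch assumes that each change records the DC on which it originated and that DC's USN for the change, and that the up-to-dateness vector maps each originating DC to the highest such USN the destination has already applied. The dictionary layout, function name, and sample values are hypothetical.

```python
def filter_changes(changes, up_to_dateness):
    """Source DC side: drop changes the destination DC already has."""
    needed = []
    for change in changes:
        already_seen = up_to_dateness.get(change["originating_dc"], 0)
        if change["originating_usn"] > already_seen:
            needed.append(change)
    return needed


# Changes that the source DC is prepared to send.
changes = [
    {"originating_dc": "testdc01", "originating_usn": 48, "attribute": "description"},
    {"originating_dc": "testdc01", "originating_usn": 52, "attribute": "title"},
    {"originating_dc": "testdc02", "originating_usn": 17, "attribute": "mail"},
]

# The destination DC has already applied testdc01's changes through USN 50
# (perhaps by way of another partner) and none of testdc02's changes.
vector = {"testdc01": 50}

for change in filter_changes(changes, vector):
    print(change["originating_dc"], change["originating_usn"], change["attribute"])
```

In this example, only the testdc01 change with USN 52 and the testdc02 change are sent; the destination DC already holds the first change, so the source DC filters it out.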

6 Essential Tools
After you deploy AD, you need to load your toolbelt with the necessary utilities to solve any problems that occur. The Microsoft Windows 2000 Server Resource Kit contains many such tools, about 50 of which are also available in the Win2K Server CD-ROM's \support\tools folder. To tackle any replication problems that might arise, you might need to use some of these tools simultaneously.

Event Viewer. Windows' default event-log viewer is available under Start, Programs, Administrative Tools. Event Viewer typically gives you the first indication that something has gone wrong. In an AD deployment, DCs have a new log called Directory Services. To keep an eye on most replication-related occurrences, you should monitor your DCs' Directory Services logs. Although you can connect to other DCs to view their logs, Event Viewer can display logs for only one server at a time. To obtain a report that contains replication-related event entries from all your DCs, I recommend that you use the Replication Monitor utility.

Replication Monitor. Replmon.exe is a GUI utility that's part of the Win2K Server CD-ROM's Support Tools. You can use Replmon to obtain an overview of replication in your enterprise. The information that it gathers from your DCs lets you determine whether replication is occurring, when it's occurring, between which DCs traffic is flowing, and more.

Replmon provides one source for collecting replication-related event-log entries from all your DCs. To launch Replmon, click Start, Run and enter

replmon

The utility opens without populating the interface with any servers. You need to manually add the servers that you want to monitor. A quick way to add servers is to create an .ini file that contains a list of your DCs, one on each line. Then, from the Replmon menu bar, click File, Open Script, navigate to the .ini file you created, and click Open. The interface will display your DCs and the NCs they hold. In the example that Figure 2 shows, I've added the testdc01 server. Underneath the entry for testdc01, you can see the three NCs that this DC holds. To see the tasks that Replmon can perform, right-click the server name to open the menu that Figure 2 shows. Familiarize yourself with each of these tasks and the information it can provide so that you're prepared when you need to troubleshoot NC replication problems.

One particularly useful Replmon feature is the ability to quickly gather Directory Services event-log entries that are related to replication failures. To launch a separate window from which you can collect this data, go to the Replmon menu bar and click Action, Domain, Search Domain Controllers for Replication Errors. Click Run Search. Replmon prompts you to enter the domain for which you want to perform the search. Enter the DNS domain name, then click OK. The utility queries the domain's DCs and collects errors related to replication failures, displaying the DC on which the error event originated, the affected directory partition or NC, the replication partner involved, and the failure code and reason. Replmon queries each DC individually rather than gathering all the directory information from one DC.

Domain Controller Diagnostics. Dcdiag.exe, a command-line tool that you'll find on the Win2K Server CD-ROM, is a powerful diagnostics tool that provides an enormous amount of data about the DC that you run it on. Dcdiag runs tests against the DC to determine the status of connectivity, replication, topology integrity, user permissions, locator functionality, intersite health, trust verification, the File Replication Service (FRS), and critical services running on the DC.

If you run Dcdiag, be prepared to sift through lots of information. I recommend that you use the syntax

dcdiag > .txt

to push the output to a text file. You can then open the text file in Notepad or Microsoft Word and browse the information more easily. When you're troubleshooting a problem, scour the log for any failures or error notices that might provide clues about the nature of the problem.
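
If the text file is long, a short script can do the first pass of scouring for you. The following Python sketch simply lists every line that mentions a failure or an error. The keyword pattern and function names are my own assumptions rather than anything Dcdiag defines, so adjust the pattern to match the wording you see in your own output.

```python
import re
import sys

# Assumed keywords; tune these to the wording in your Dcdiag output.
KEYWORDS = re.compile(r"fail|error", re.IGNORECASE)


def find_problem_lines(lines):
    """Return (line number, text) for each line that mentions a failure or error."""
    return [(number, line.rstrip())
            for number, line in enumerate(lines, start=1)
            if KEYWORDS.search(line)]


def scan_file(path):
    """Scan a saved output file, tolerating any odd characters in it."""
    with open(path, errors="replace") as log:
        return find_problem_lines(log)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the name of the text file that holds the Dcdiag output.
    for number, text in scan_file(sys.argv[1]):
        print(f"{number}: {text}")
```

The line numbers in the report let you jump to the surrounding entries in Notepad, where the lines before and after a failure often explain it.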

Repadmin. Repadmin.exe, a command-line tool that you'll find on the Win2K Server CD-ROM, will probably be the hammer in your replication toolbelt. This tool's commands can reveal replication's inner workings and help you troubleshoot and repair problems. To view all your Repadmin command-line options, type

repadmin /?

One particularly helpful use of Repadmin is to show a DC's replication partners. To do so, type

repadmin /showreps

Figure 3 shows the results of running this command against a DC called testdc01. You can see that testdc01 has inbound connection objects from testdc02 for the Schema, Configuration, and testdomain.com NCs. Figure 3 also shows that the most recent attempts to replicate each of these NCs occurred on June 3 at various times and that each attempt was successful. The bottom half of the screen shows the outbound replication neighbors to which testdc01 will send replication notifications when it has directory data to replicate.

You can also use Repadmin to view detailed information about a particular object in AD. Suppose you want to see an object's local USN and the source of the most recent update to the object's properties. To do so, you can use Repadmin to reveal that object's metadata:

repadmin /showmeta

For example, to show the metadata for the administrator account in testdomain.com, you would type

repadmin /showmeta "CN=administrator,OU=users,dc=testdomain,dc=com" testdc01.testdomain.com

Figure 4 shows the results of this query. In the far-right column, you can see each of the attributes that make up the Administrator user. In the far-left column, you can see each attribute's local USN. Notice that the nTSecurityDe and adminCount attributes have a higher local USN than the other attributes do. This higher number indicates that these attributes' values have changed more recently than the others.

Group Policy Verification. Gpotool.exe, a command-line resource kit tool, can help you troubleshoot the replication of Group Policy Objects (GPOs) between DCs. For more information about GPO replication, see the sidebar "Win2K's File Replication Service."

Directory Service Agent Statistics. Dsastat.exe, a command-line tool included on the Win2K Server CD-ROM, compares and reports differences between directory NCs on DCs. This tool is handy when replication appears to be working but you see different views of directory data on different DCs. Such an occurrence might mean that the database is corrupted. Dsastat provides high-level data-consistency information by comparing directory statistics such as objects per server, bytes per server, and bytes per object between two DCs.

Patience Is a Virtue
Even in the smallest environments, AD requires attention and expertise. If you take the time to understand how replication works, you'll have a significant advantage when you need to address AD problems. If you work for a large company that has many sites, the best troubleshooting advice I can give you is this: Be patient. Remember that AD operates in a multimaster fashion. When you make a change on one DC, that change might take hours to propagate to the other DCs in the forest. When you focus on a problem, try to make one change at a time, document that change, and give AD enough time to replicate and respond to the modification.