Consider these recommendations before attempting your grand design

You can find no end of articles and white papers and even books emphasizing the importance of proper planning before implementing Windows 2000 Active Directory (AD) in your infrastructure. Indeed, if you think AD is just an incremental change from the way you do things in your existing Windows NT 4.0 domain environment, you're in for an unpleasant surprise. A directory service such as AD significantly increases the manageability and complexity of your network infrastructure. Far from being just an extension of NT 4.0 domains, AD provides features such as delegated administration and Group Policy-based desktop management and could even serve as a critical business platform for developing directory-enabled applications. Proper planning of this infrastructure is not only crucial, it's required. Let's look at some of the technical considerations and challenges involved in planning an AD implementation—from laying out your namespace to designing a replication topology.

The Logical Namespace
Planning an AD infrastructure starts with deciding how to lay out your namespace (i.e., how to organize your network resources within AD). In NT 4.0, your namespace choices are simple and few. Domains support only two levels of hierarchy, no delegation boundaries exist within a domain, and NetBIOS doesn't support hierarchical naming. With the advent of AD—a hierarchical directory service based on X.500 concepts and using DNS for its name service—your choices are much more complex. The AD namespace has three main tiers: domains, domain trees, and forests.

Domains. A domain is the security boundary in AD, just as it is in NT 4.0. An AD domain shares a common security policy and the same security groups, such as domain local and global groups. A domain is also a replication boundary—AD replicates Domain A objects only to Domain A domain controllers.

Domain trees. Win2K introduces a new concept: the domain tree. A domain tree is a hierarchy of domains that are part of a contiguous DNS namespace. For example, the top-level domain mycompany.com might have two child domains: east.mycompany.com and west.mycompany.com. The three domains form an AD domain tree. Mycompany.com might also create the subsidiary yourcompany.com and build a separate domain tree with the DNS namespace yourcompany.com.
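To make the "contiguous DNS namespace" rule concrete, here is a minimal Python sketch that tests whether a domain name falls within a given tree. The function name same_domain_tree is hypothetical and the check is purely a comparison of DNS labels; it illustrates the naming rule and isn't anything AD itself runs.

```python
def same_domain_tree(tree_root: str, candidate: str) -> bool:
    """Return True if candidate is the tree root or sits beneath it in DNS."""
    root_labels = tree_root.lower().rstrip(".").split(".")
    cand_labels = candidate.lower().rstrip(".").split(".")
    # A member of the tree must end with every label of the root, in order.
    # Comparing whole labels keeps "notmycompany.com" out of "mycompany.com".
    if len(cand_labels) < len(root_labels):
        return False
    return cand_labels[-len(root_labels):] == root_labels


if __name__ == "__main__":
    for name in ("east.mycompany.com", "west.mycompany.com", "yourcompany.com"):
        print(name, same_domain_tree("mycompany.com", name))
```

With the example names above, east.mycompany.com and west.mycompany.com belong to the mycompany.com tree, whereas yourcompany.com has to start a tree of its own.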

Forests. Forests are another new AD feature. A forest is a collection of one or more domain trees that share a schema and a Kerberos security boundary. Each forest can have only one schema, which defines AD's objects and properties. Transitive Kerberos trusts connect all domains within a forest. A forest treats domains outside itself the same way NT 4.0 domains treat one another with respect to trusts. So, if you build two forests in your enterprise and want to share resources between them, you must use old-style NT 4.0 nontransitive trust relationships to do so. In addition, you currently can't merge two forests.

Figure 1 shows the relationships between domains, domain trees, and forests in AD. Note the two-way Kerberos transitive trust in place between mycompany.com and yourcompany.com. A distinguishing feature of AD is that transitive trusts connect all domains within a forest.

Designing the logical namespace is an exercise in deciding how many domains, domain trees, and forests you need and how to name them. If you have an existing NT 4.0 infrastructure, you must also decide whether to reproduce or improve that domain structure in the new namespace. Given AD's ability to delegate administration through organizational units (OUs), you should need far fewer domains in Win2K than you do in NT 4.0. In addition, the need for a new domain is driven less by the need to delegate administration and more by replication and security concerns (I discuss these concerns shortly).

Factors other than your existing NT 4.0 domain model will influence your namespace design. As you go through the process of deciding how many domains your AD implementation requires and whether you need one or more domain trees or forests, you must also consider political and organizational factors, geographic factors, and technical factors.

Political and organizational factors. Will your namespace design respect and preserve your organization's existing political boundaries? If not, you might quickly learn that the fewer domains you want to have, the greater your diplomatic skills must be. Don't underestimate the political ramifications of collapsing several existing domains into one.

Your AD namespace design should attempt to "abstract" the organization so that the namespace can weather the vagaries of frequent organizational reshuffling. For example, if much of your East Coast sales department becomes part of the West Coast sales department, you shouldn't need to move OUs or users across domains. Rather, you should be able to simply switch users from one user group to another. Another factor to consider is that Win2K makes it difficult, if not impossible, to rename domains and absolutely impossible to rename the forest root domain. So, if your namespace depends on the ability to change domain names, you'll need to reconsider your approach.

The technical support model that your company uses—centralized or decentralized—affects your OU design. To create more granular delegation, you can either build more OUs or use security groups within an OU. If you choose to build more OUs, you potentially increase your effort each time you need to make a change that applies to all OUs and you increase the complexity of your AD namespace. Using security groups to control delegation requires you to thoroughly understand the AD security model and doesn't give you as clear a picture onscreen of where delegation lines are drawn as separate OUs do.

Geographical factors. If you work for a large multinational company or a company with multinational aspirations, try to design your namespace with an eye toward how your AD might grow across national borders. How will you handle new acquisitions or separate support organizations?

Technical factors. Microsoft has done a reasonable job of implementing a full-featured directory service in Win2K, but some technical challenges remain that will point your AD namespace design in one direction or another. I will detail some of these shortly, but for now, be aware that you should have a good working knowledge of AD's limitations before designing your namespace.

You might also find yourself designing around certain Win2K features. For example, the way you use Group Policy Objects (GPOs) might influence how you implement your AD namespace. At the very least, before you finalize your namespace decisions, you should know how Group Policy functions and how it might affect your design.

Domains and Forests—One or Many?
I've heard people talk about the nirvana of one AD domain for an entire enterprise. Some might indeed achieve this dream in Win2K, but if you don't anticipate getting there anytime soon, don't worry. Just keep the following multiple-domain design considerations in mind.

Keep the forest root empty. If you plan to have multiple domains, you might want to keep the forest root (the first domain you build) as an infrastructure container, free of production users. Because this domain has a special role in your infrastructure (it houses the Enterprise Admins and Schema Admins groups), reserving it for a few trusted administrators is common.

Limit trips to other trusts. When users access resources in a domain other than the one they belong to, they traverse a trust relationship (even in Win2K) and incur a performance penalty. Thus, a multiple-domain design that requires users to make frequent trips across trust relationships can slow response time, especially if you create GPOs in one domain that users must link to from another domain. To mitigate performance concerns in a large forest, you can create "shortcut" trusts between frequently accessed domains to reduce the number of hops a cross-domain communication needs to make.

Define a security policy for each domain. The domain is still the security boundary in Win2K. Thus, you must explicitly define any security policy (e.g., account lockout behavior, password length) in each domain—don't define the policy for one domain and expect that policy to protect all domains.

Limit replication between domain controllers. Because the domain is a replication boundary, you might also need to consider creating new domains to limit the amount of data replicated between domain controllers—especially across slow WAN links. You can control data replication to some degree through your site design (I describe how later), but you might need to partition really large domains in any case because of network bandwidth constraints.

How many forests you should have is an easy question to answer. In almost every case, you should be driving toward one forest in your production AD infrastructure. (You might also have a test or development forest that you use to test changes to AD.) The reasons behind striving for one forest are pretty straightforward. Win2K today offers no easy way to integrate multiple forests within an environment. Remember that a Win2K AD forest shares a common schema, a common Global Catalog (GC), and common Kerberos security trusts. If you build a second forest, it's a completely foreign environment. You must build explicit, nontransitive, NT 4.0-style trusts to allow sharing between the two forests.

Sizing Domain Controllers
After you settle on the namespace's logical layout, the next step is to figure out how to physically implement the design. This task isn't as trivial as it might sound because at the physical level, you need to consider factors ranging from your domain controllers' hardware requirements to the site topology for AD replication. Along the way, you must ensure that your physical implementation takes into account AD's current limitations. Last, you need to consider your DNS implementation. DNS server availability is crucial to proper functioning of AD replication and client logons.

When sizing your AD domain controllers, you might ask yourself: How big is big enough? If you decide to collapse 10 NT 4.0 account domains comprising 100,000 users into one AD domain, chances are that each domain controller in the new domain needs more horsepower and disk space than it previously required. But how much more? Microsoft provides a tool for quickly determining your domain controller requirements; see the sidebar "Active Directory Sizer," page 60, for more information.

Each new AD object you create requires some disk space on the domain controller. (The main AD database file, ntds.dit, resides on each domain controller in a domain.) The more objects you have, and the more attributes those objects carry, the larger your AD database will be. AD is much more scalable than an NT 4.0 domain. However, an AD database also consumes much more disk space than the NT 4.0 SAM because AD includes many more object types (e.g., volumes, Group Policies) and because AD objects can have many attributes (e.g., user objects can contain phone numbers, addresses, and email addresses). I upgraded an NT 4.0 domain containing roughly 3000 user accounts and machine accounts and several hundred user groups, and the resulting ntds.dit file was approximately 38MB. Using the AD Sizer tool, I estimated that a 20,000-user NT 4.0 domain with multiple extended attributes (and that included the use of Microsoft Exchange 2000 Server) would translate to a 500MB ntds.dit file. Multigigabyte ntds.dit files are common for large or complex AD domains.

Table 1, page 61, shows the typical size of a few different types of AD objects. A user object with the minimum number of attributes is 3.7KB. If you use the AD Users and Computers Microsoft Management Console (MMC) snap-in to create a user, the snap-in sets the minimum attributes, but you'll probably use many other attributes in your AD objects (especially if you plan to implement Exchange 2000).
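As a rough illustration of how the per-object figure scales, the following Python sketch multiplies a user count by the 3.7KB minimum quoted above. The function name is hypothetical, and the result is only a floor for user objects: it ignores extra attributes, other object types, and database overhead, so treat it as a back-of-the-envelope check rather than a substitute for AD Sizer.

```python
MIN_USER_OBJECT_KB = 3.7  # minimum-attribute user object, per Table 1


def min_user_storage_kb(user_count: int, kb_per_user: float = MIN_USER_OBJECT_KB) -> float:
    """Lower bound, in KB, on the space the user objects alone will need."""
    if user_count < 0:
        raise ValueError("user_count must not be negative")
    return user_count * kb_per_user


if __name__ == "__main__":
    for users in (3_000, 20_000, 100_000):
        kb = min_user_storage_kb(users)
        print(f"{users:>7} users: at least {kb / 1024:.1f} MB")
```

A real ntds.dit file will be considerably larger than this floor once you populate additional attributes and add computers, groups, and other objects.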

Some less-obvious practices also affect AD size. For example, each access control entry (ACE) on an AD object's security ACL consumes about 70 bytes. Given the inheritance model that AD uses when applying security, you can easily make an ACL change that automatically adds new ACEs to thousands of objects. Be careful as you delegate control of AD objects in your infrastructure. When possible, delegate to groups of users rather than to individual users. This approach minimizes the number of ACEs a particular object or attribute requires.
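The arithmetic behind that advice is easy to sketch. The Python snippet below assumes the roughly 70 bytes per ACE cited above and a hypothetical OU in which every object inherits each new ACE; the function name and the object counts are illustrative only.

```python
BYTES_PER_ACE = 70  # approximate figure quoted in the text


def inherited_ace_bytes(object_count: int, aces_added: int,
                        bytes_per_ace: int = BYTES_PER_ACE) -> int:
    """Approximate growth when each object inherits every new ACE."""
    return object_count * aces_added * bytes_per_ace


if __name__ == "__main__":
    objects = 5_000  # hypothetical number of objects under the OU
    per_user = inherited_ace_bytes(objects, aces_added=10)  # 10 individual users
    per_group = inherited_ace_bytes(objects, aces_added=1)  # one group instead
    print(f"10 user ACEs: {per_user:,} bytes")
    print(f"1 group ACE:  {per_group:,} bytes")
```

Under these assumptions, delegating to one group rather than to ten individuals cuts the added ACL data by a factor of ten.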

Sites and Site Links
Designing a site topology is perhaps the most challenging part of planning an AD implementation. Before you begin designing, you need to be familiar with the AD concepts of sites and site links, naming contexts, the GC, and connection objects. Sites are AD objects that determine how AD replicates data across your network. Sites are associated with subnet objects that you define, and subnet objects correspond to the TCP/IP subnets in your physical network.

Sites control when and how often domain controllers replicate with one another. Domain controllers within a site (i.e., intrasite) replicate with one another on a fixed schedule—every 5 minutes by default. Domain controllers across sites (i.e., intersite) replicate on a schedule that you decide upon—but no more frequently than every 15 minutes.

In addition to grouping domain controllers for the purpose of scheduling AD-information replication, sites help workstations and member servers locate resources that are physically close on the network. For example, a workstation authenticating to an AD domain first examines its own IP address and subnet mask to determine which subnet it's on. Having determined its subnet and its site (through a query to AD), the workstation queries DNS to find a domain controller in the same site.
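The subnet-to-site mapping can be pictured with a short Python sketch built on the standard ipaddress module. The subnets and site names are hypothetical, and choosing the most specific matching subnet is my assumption about how overlapping subnet objects should resolve; the sketch models only the lookup, not the client's actual domain controller locator process.

```python
import ipaddress
from typing import Mapping, Optional

# Hypothetical subnet objects and the sites they are associated with.
SUBNET_TO_SITE = {
    "10.1.0.0/16": "Headquarters",
    "10.2.0.0/16": "Branch",
    "10.2.5.0/24": "BranchLab",
}


def find_site(client_ip: str, subnet_to_site: Mapping[str, str]) -> Optional[str]:
    """Return the site whose subnet contains client_ip, or None if none does."""
    addr = ipaddress.ip_address(client_ip)
    best_net = None
    best_site = None
    for cidr, site in subnet_to_site.items():
        net = ipaddress.ip_network(cidr)
        # Prefer the longest prefix (most specific subnet) that contains the address.
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_site = net, site
    return best_site


if __name__ == "__main__":
    for ip in ("10.2.5.7", "10.2.9.1", "192.168.1.1"):
        print(ip, find_site(ip, SUBNET_TO_SITE))
```

An address that matches no subnet object returns None here, which corresponds to a workstation that can't be placed in any site until you define the missing subnet.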

Sites belong to site links—groups of sites in which network connections of roughly equal bandwidth link the sites. For example, Figure 2 shows a company with four regional distribution centers. Each of the centers has a high-speed LAN connecting multiple workstations and domain controllers. A T1 line connects each center to each of the other centers. Each center is a site, but the AD administrator has placed them in the same site link in this case because the connections between the sites have the same bandwidth. When you define a site link, you can specify a schedule for the sites within that site link and a cost. The schedule controls how frequently (and at what time) replication occurs between sites in the site link, and the cost is an arbitrary value you assign to the connections between the sites.

All sites in a site link replicate with one another on the same schedule. A site can belong to multiple site links, which is where the cost metric comes into play. In the distribution center example, you might add "dial-on-demand" links between two of the centers as a failover precaution. You could build a new site link that includes the sites these two distribution centers represent and give it a higher cost than the T1 site link that includes all the distribution centers. This action gives AD replication two paths between the two distribution centers: the lower cost T1 paths that the sites usually use and the higher cost dial-up paths that the sites use when the T1 lines are out of order.
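The following Python sketch models the cost comparison in the distribution center example. The SiteLink class, the site names, the cost values, and the available flag are all hypothetical stand-ins; the code shows only the idea of preferring the lowest-cost usable site link between two sites, not how AD computes its replication routes internally.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class SiteLink:
    name: str
    sites: FrozenSet[str]
    cost: int               # arbitrary value; lower is preferred
    available: bool = True  # hypothetical flag: is the underlying network up?


def pick_link(links: Iterable[SiteLink], site_a: str, site_b: str) -> Optional[SiteLink]:
    """Return the cheapest available link joining both sites, or None."""
    candidates = [
        link for link in links
        if link.available and site_a in link.sites and site_b in link.sites
    ]
    return min(candidates, key=lambda link: link.cost, default=None)


if __name__ == "__main__":
    t1 = SiteLink("T1-All-Centers", frozenset({"North", "South", "East", "West"}), cost=100)
    dial = SiteLink("Dial-East-West", frozenset({"East", "West"}), cost=500)
    print(pick_link([t1, dial], "East", "West").name)  # T1 path while it is up

    t1_down = SiteLink(t1.name, t1.sites, t1.cost, available=False)
    print(pick_link([t1_down, dial], "East", "West").name)  # falls back to dial-up
```

Because the numbers are arbitrary, only their relative order matters: any cost for the dial-up link that is higher than the T1 link's cost produces the same preference.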

Naming Contexts
In "Active Directory in Windows 2000," Winter 1999/2000, I introduced the concept of AD naming contexts (I've also seen them referred to as partitions), which are different paths Win2K uses to replicate different types of information between domain controllers in a forest. For each domain in the forest, Win2K replicates the domain naming context to all the domain controllers within that domain.

Win2K replicates the schema naming context (i.e., the AD schema) and the configuration naming context (i.e., site and subnet configuration information and other replication metadata) to all domain controllers in the forest. In addition, the GC, which is a partial replica of all objects in a forest, replicates to domain controllers that you designate as GC servers. The schema and configuration naming contexts contain information that isn't likely to change often for most enterprises, so these two naming contexts don't affect your site topology as much as the domain naming context and GC do.

Microsoft supports two replication protocols in AD. Standard RPC, the more common protocol by far, supports replicating the three naming contexts and the GC and can compress data for intersite replication. Standard SMTP is only for intersite replication of the schema and configuration naming contexts and the GC. SMTP is useful for intersite connections that are slow, unreliable, or even unavailable for large parts of the day. However, SMTP replication sends at least twice as many bytes across your network as RPC replication does. You use the AD Sites and Services MMC snap-in to define site links and specify which protocol to use for a given site link.

Connection Objects
After you decide on a site topology, Win2K creates the replication connections for you. AD provides a service called the Knowledge Consistency Checker (KCC) that runs on all domain controllers and builds connection objects between all domain controllers in your forest. Connection objects handle replication traffic between domain controllers. The KCC builds intrasite connection objects such that no more than three replication hops exist between any two domain controllers. Figure 3, page 62, shows the AD Sites and Services snap-in window, with a connection object (in the right pane) that the KCC generated on SERVERA in the Branch site.
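To get a feel for the three-hop limit, consider a simple bidirectional ring, in which each domain controller has connection objects to its two neighbors. The Python sketch below measures the worst-case hop count in such a ring with a breadth-first search. The ring is my simplifying assumption for illustration and the function names are hypothetical; the code doesn't reproduce the KCC's actual algorithm.

```python
from collections import deque
from typing import Dict, Set


def ring_connections(dc_count: int) -> Dict[int, Set[int]]:
    """Give every domain controller one-way connections to both ring neighbors."""
    return {i: {(i - 1) % dc_count, (i + 1) % dc_count} for i in range(dc_count)}


def max_hops(connections: Dict[int, Set[int]]) -> int:
    """Worst-case number of hops between any two reachable domain controllers."""
    worst = 0
    for start in connections:
        hops = {start: 0}
        queue = deque([start])
        while queue:  # breadth-first search from this domain controller
            node = queue.popleft()
            for neighbor in connections[node]:
                if neighbor not in hops:
                    hops[neighbor] = hops[node] + 1
                    queue.append(neighbor)
        worst = max(worst, max(hops.values()))
    return worst


if __name__ == "__main__":
    for count in (3, 7, 8, 12):
        print(count, "domain controllers:", max_hops(ring_connections(count)), "hops")
```

In this model a plain ring stays within three hops only up to seven domain controllers, so a larger site needs additional connection objects to honor the limit.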

Connection objects are one-way paths; if you have two domain controllers, each has a separate connection object to the other. However, if you have large numbers of domain controllers replicating with one another, not all the domain controllers might have two one-way connections with every other domain controller. A server initiating an intrasite replication event notifies the server at the other end of its connection object that the initiating server has changes. The target server then pulls the changed data from the initiating server.

The KCC also builds connection objects for intersite replication. When you create a site, the KCC picks a server to act as the bridgehead server for communication between the new site and remote sites. The bridgehead server uses its connection objects to replicate to remote sites at the times you specify in the site link object. Another server takes over the bridgehead responsibility if the regular bridgehead fails.

Designing a Site Topology
To design an effective AD site topology, you need to know how your company will use AD and how the different features in the Win2K infrastructure affect replication traffic. You can examine how you use your NT 4.0 domains to estimate your AD use. For example, look at the number of user accounts you create per day, the frequency of password changes, the average number of users in your user groups, and the frequency of user logons. Your unique traffic requirements should drive your site topology design.

Microsoft can tell you how many bytes of data are generated on the network when you create a new user or when users change their passwords (this information is available from resources such as AD Sizer and the Microsoft Windows 2000 Resource Kit). However, only you know how often you create new users and how often they change passwords. AD replication can take place at the attribute level (e.g., when a user changes his or her password, AD replicates only the password attribute, not the entire user object). But AD provides many more attributes per object than the NT 4.0 SAM provides, so depending on how you use objects, AD might generate more or less traffic than the SAM.

Your site topology design must answer the following questions:

  • At which points do I need to establish site boundaries?
  • How much bandwidth do I need to keep AD replication latency low?
  • At what point do I need to deploy a local domain controller instead of having users authenticate remotely?

Earlier, I stated that sites control when and how often AD replication takes place. (Intrasite replication takes place at 5-minute intervals; intersite replication takes place at 15-minute or longer intervals.) People often ask how slow a network link between two domain controllers can be before each controller needs to have its own site. No hard and fast rule exists. When a link to a remote location becomes saturated with intrasite AD replication and other traffic during typical operations, build a new site for one of the domain controllers and schedule replication at a longer interval. Intersite replication uses compression when a transaction is larger than 32KB. This compression can be very efficient, resulting in as much as a 90 percent decrease in data size (password changes don't benefit much from compression because they're encrypted).

After you roughly calculate how much data AD will replicate across your network, you can set replication frequency for your site links. Remember that site links are collections of sites in which the sites are connected by network links of roughly equal bandwidth. So, if sites A, B, and C are in one site link, they all replicate on the schedule you set for that site link. Set your replication schedule to take advantage of intersite compression while minimizing latency between domain controllers. For example, you might be tempted to set all of your site links to replicate at the minimum interval—every 15 minutes—to keep latency low. However, if you don't generate many changes, the amount of data transmitted in each replication might not be sufficient to trigger compression, and you might end up sending more data per replication than if you had spaced your replications further apart.
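The trade-off between interval and compression can be sketched in a few lines of Python. The sketch assumes that changes accumulate at a steady hourly rate and that each replication cycle's accumulated changes are judged as a whole against the 32KB threshold mentioned earlier; both are simplifications, and the function names and traffic figures are hypothetical.

```python
COMPRESSION_THRESHOLD_BYTES = 32 * 1024  # intersite compression threshold from the text
MIN_INTERSITE_INTERVAL_MINUTES = 15      # shortest intersite interval from the text


def payload_per_cycle(change_bytes_per_hour: float, interval_minutes: int) -> float:
    """Bytes that accumulate between two replication cycles at a steady change rate."""
    if interval_minutes < MIN_INTERSITE_INTERVAL_MINUTES:
        raise ValueError("intersite replication can't run more often than every 15 minutes")
    return change_bytes_per_hour * interval_minutes / 60


def triggers_compression(change_bytes_per_hour: float, interval_minutes: int) -> bool:
    """True if one cycle's accumulated changes exceed the compression threshold."""
    return payload_per_cycle(change_bytes_per_hour, interval_minutes) > COMPRESSION_THRESHOLD_BYTES


if __name__ == "__main__":
    hourly = 60 * 1024  # hypothetical: 60KB of changes per hour
    for minutes in (15, 30, 60):
        print(f"{minutes:>2} min: compressed = {triggers_compression(hourly, minutes)}")
```

With 60KB of changes per hour, a 15-minute interval accumulates only 15KB per cycle and stays under the threshold, whereas a 60-minute interval accumulates 60KB and qualifies for compression.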

In addition to replication traffic that AD domain naming context changes cause, you must consider other bandwidth consumers in your site topology design. Here are some of the more obvious ones.

GC servers. Servers you designate as GC servers receive changes from every domain in your forest. The amount of data these servers receive is slightly less than the sum of changes from each individual domain because the GC holds only a partial replica of each domain's domain naming context.

SYSVOL shares. The NT File Replication Service (NTFRS) replicates the SYSVOL share, which also holds Group Policy Object data, between all domain controllers in a domain. NTFRS uses the existing site and replication topology to propagate these changes.

DNS zone records. If you're using AD-integrated DNS, DNS keeps zone records in the domain naming context for the domain in which the DNS servers are running. DNS zone data can change frequently—for example, if you have a large population of mobile users whose workstations change IP addresses on a regular basis.

Group members. AD group objects store their members as one multivalue attribute. Thus, when you change one user in a 500-user list, AD must replicate the whole attribute. In fact, Microsoft recommends keeping group membership below 5000 because above that number, replicating the entire attribute in a single replication event becomes difficult. If you need to support more users or computers in a group, use nested groups.

Finally, you need to consider, on a case-by-case basis, whether you want to place a domain controller physically close to client workstations in a remote location. In general, if the traffic that AD and related services generate to a local domain controller is greater than the traffic remote workstations generate to authenticate across the network link, you probably should not place a domain controller close to the remote workstations. Consider also that if you build a site around a remote set of users and put a domain controller in that site, you'll likely need to make that domain controller a GC server as well. This action will immediately increase your bandwidth requirements to that site. (For more information about designing AD sites, see Sean Deuby, "AD Sites, Part 1," June 2000, and "AD Sites, Part 2," July 2000.)
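The placement rule reduces to a comparison that you can sketch in Python. The function name and every traffic figure below are hypothetical; you would substitute estimates from your own network, and the optional GC parameter reflects the likelihood that a domain controller in a remote site will also need to be a GC server.

```python
def local_dc_saves_bandwidth(replication_bytes_per_day: int,
                             logon_bytes_per_day: int,
                             gc_bytes_per_day: int = 0) -> bool:
    """True if a local domain controller would put less traffic on the WAN link.

    replication_bytes_per_day: AD and related replication sent to the local DC
    logon_bytes_per_day: authentication traffic the remote workstations would
        otherwise send across the link
    gc_bytes_per_day: extra replication if the DC must also be a GC server
    """
    inbound_to_dc = replication_bytes_per_day + gc_bytes_per_day
    return inbound_to_dc < logon_bytes_per_day


if __name__ == "__main__":
    # Hypothetical daily estimates, in bytes.
    print(local_dc_saves_bandwidth(50_000_000, 200_000_000))
    print(local_dc_saves_bandwidth(50_000_000, 200_000_000, gc_bytes_per_day=300_000_000))
```

In the second call, the added GC replication tips the balance against a local domain controller, which is the effect the GC caveat above describes.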

As you can see, moving AD from theory to practice requires careful planning to produce a thoughtful design. You'll need to base many of your design decisions on factors specific to your environment and your network. But with a little help from the resource kit and tools such as AD Sizer, implementing AD isn't completely black magic. The key is to understand thoroughly how and how much you'll use AD—then double those numbers and design accordingly, just to be safe.