It's like turning back the hands of time
It's Friday afternoon, and 5 o'clock is fast approaching. You're just about ready to head home for the weekend, and your phone rings. You glare at the phone with a sense of foreboding, and reluctantly you pick it up. The Help desk has just accidentally deleted an organizational unit (OU) containing several top executives' user accounts. None of the execs can log on to the domain to access resources, including email and calendars. You feel a knot in your stomach because although you've tested the restoration of some Active Directory (AD) objects, you haven't had to do it under fire and you haven't had to use tapes controlled by the backup group in a remote datacenter location. After 2 hours of wrangling with the backup group and finally getting the correct tapes in the tape loader, you're ready to perform the restore.
If you've done a good job of testing and documenting your backup and restore procedures, you'll need just another 2 hours before you can finally restore the directory information tree file on the domain controller (DC) and proceed with the authoritative restore of the objects. And all along, you're thinking, "There must be a faster and easier way."
Many companies rely on AD not only as the domain-authentication mechanism that permits access to resources on the network but also as the email directory and in some cases the company's authoritative directory. Needless to say, the integrity of the data in AD needs to be heavily guarded and disaster recovery must be a priority. Every hour necessary to restore a deleted object can translate into thousands of dollars of lost productivity. Enter the delayed-replication recovery site.
The basic concept of delayed replication is simple: Imagine a pair of DCs that replicate with the rest of the forest only once per week on a staggered schedule. This lengthy replication cycle, in a multimaster directory, lets an administrator turn back the hands of time in the event of a disaster. For example, one recovery DC might replicate every Tuesday at 11:00 am, and the other every Friday at 11:00 pm. This staggered replication schedule ensures that you always have a minimum of 3.5 days to recover an item (or items) after it's deleted.
If you implemented only one recovery DC per domain, the timing might be such that the deletion would immediately precede the replication to the recovery DC, in which case you'd lose your opportunity to recover the item. Consider a scenario in which an object gets deleted on Tuesday at 10:00 am, but the deletion isn't noticed until Tuesday afternoon. If you had only one recovery DC replicating at 11:00 am on Tuesday, the DC would have replicated with the rest of the domain and received the deletion of the object. You would have missed the opportunity to recover the object. If you have a second DC replicating on a staggered schedule (e.g., Friday at 11:00 pm), you can still recover the object from that DC.
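The scenario above can be checked with a little date arithmetic. The following is a minimal sketch, not part of any Microsoft tooling: the DC names and the `SCHEDULE` table are hypothetical and simply encode the Tuesday 11:00 am and Friday 11:00 pm example.

```python
from datetime import datetime, timedelta

# Hypothetical weekly schedule from the example: (weekday, hour), Monday = 0.
SCHEDULE = {
    "recovery-dc-1": (1, 11),  # Tuesday 11:00 am
    "recovery-dc-2": (4, 23),  # Friday 11:00 pm
}

def next_replication(after, weekday, hour):
    """Return the first weekly replication time strictly after `after`."""
    candidate = after.replace(hour=hour, minute=0, second=0, microsecond=0)
    candidate += timedelta(days=(weekday - after.weekday()) % 7)
    if candidate <= after:
        candidate += timedelta(days=7)
    return candidate

def dcs_still_holding(deleted_at, noticed_at, schedule=SCHEDULE):
    """Names of recovery DCs that have not yet pulled in the deletion."""
    return sorted(
        name for name, (weekday, hour) in schedule.items()
        if next_replication(deleted_at, weekday, hour) > noticed_at
    )

# Object deleted Tuesday 10:00 am, noticed Tuesday 3:00 pm (2024-01-02 is a Tuesday).
deleted = datetime(2024, 1, 2, 10, 0)
noticed = datetime(2024, 1, 2, 15, 0)
print(dcs_still_holding(deleted, noticed))
```

With only the Tuesday DC in the schedule, the same call returns an empty list, which is the missed-opportunity case described above.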
You could implement more than two delayed-recovery DCs and establish any number of replication timing scenarios. You could have seven delayed-replication DCs that replicate on different days, permitting you to restore objects with greater precision. In this article's 3.5-day scenario, it's possible that you could restore an object that is more than 3 days old. Any changes made to the object in that 3-day period, however, would be lost on restoration.
In the past, recovering a deleted object took as long as 6 to 8 hours and involved several folks from several support areas. With delayed replication, you can recover a deleted user account in less than an hour, using only one support person. In my company, the first time we used our delayed-replication DCs to recover a deleted account, even the user was surprised by how quickly we restored the account to working order.
Another advantage of putting DCs on a delayed-replication schedule is that you can query the directory on the delayed DC to find information (e.g., the distinguished name—DN—which is required to restore the object) about the deleted item. You might need this functionality, for example, if a user account was deleted but nobody knows the user account's OU path location. In a typical restore scenario, you would need to access a restored directory offline (not on the network) so that you could query for the object and gather the DN for use in the authoritative restore process.
Building a Delayed-Replication Site
The mechanism that controls the schedule a DC uses to replicate is the placement of DCs in separate AD replication sites. If you want to have two DCs for each domain that replicate on different schedules, you'll need to create two AD sites and configure site links from those sites to another well-connected AD site. You'll need to configure these site links to replicate at the times you desire. In the previous example, one site replicates only on Tuesday at 11:00 am and another replicates only on Friday at 11:00 pm. Ensure that your new AD sites are configured to use only the delayed-schedule links you specify and not the default site link.
You also should ensure that the new delayed-site links are configured to have a higher cost than any other site links in your forest. Doing so will ensure that Microsoft Exchange Server's DSAccess component doesn't choose the delayed-replication DCs for directory lookups. This procedure is important even if no Exchange servers exist in the delayed-replication sites, because Exchange will sometimes choose DCs in other sites. You'll need to use the Active Directory Sites and Services snap-in to perform this work. Navigate to the Inter-Site Transports\IP container to create your site links. After you create a link, go to the link's properties and set the cost to the value you want. Next, click the Change Schedule button and choose the times you want replication to occur. Figure 1 shows an example of setting the link schedule.
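If you keep an inventory of your site links and their costs, the cost rule is easy to verify. The sketch below assumes such an inventory exists as a simple mapping; the delayed link names and cost values are hypothetical, and nothing here reads from AD.

```python
def links_violating_cost_rule(link_costs, delayed_links):
    """Return delayed links whose cost is not higher than every other link's cost."""
    other_costs = [cost for name, cost in link_costs.items() if name not in delayed_links]
    ceiling = max(other_costs, default=0)
    return sorted(name for name in delayed_links if link_costs[name] <= ceiling)

# Hypothetical inventory: link name -> cost.
costs = {
    "DEFAULTIPSITELINK": 100,
    "HQ-Branch": 200,
    "HQ-DelayedTuesday": 500,
    "HQ-DelayedFriday": 150,
}
print(links_violating_cost_rule(costs, {"HQ-DelayedTuesday", "HQ-DelayedFriday"}))
```

Any link name the function returns needs its cost raised above the highest cost used elsewhere in the forest.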
To permit each DC to join a desired site when it's promoted to DC status, you'll want to add the appropriate subnets to each delayed-replication site. I suggest precreating a 32-bit subnet (one node) for each system's IP address before promoting the DCs into their respective sites. Again, you'll need to use the Active Directory Sites and Services snap-in to perform this task. Navigate to the Subnets container and select New Subnet from the Action menu. Enter the correct subnet information and associate the subnet with the delayed-replication site you desire. Figure 2 illustrates this process for a system with IP address 10.1.1.5.
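A 32-bit subnet is simply the DC's own address with a 255.255.255.255 mask. As a small illustration, Python's standard `ipaddress` module can produce the values to type into the New Subnet dialog box; the helper name `host_subnet` is made up for this sketch.

```python
import ipaddress

def host_subnet(ip):
    """Return (CIDR notation, subnet mask) for the one-node subnet of an IPv4 address."""
    network = ipaddress.ip_network(f"{ipaddress.IPv4Address(ip)}/32")
    return str(network), str(network.netmask)

print(host_subnet("10.1.1.5"))
```

For the system in Figure 2, this yields the subnet 10.1.1.5/32 with mask 255.255.255.255.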
Beware of Stale Data
Bear in mind that because these recovery DCs replicate on a delayed schedule, you must take measures to prevent them from servicing user authentication or directory lookups. Because replication lags behind by several days, you should consider the data on the recovery DCs stale. A changed phone number attribute on a user object on a production DC, for example, won't change on the recovery DC until replication occurs. To prevent user authentication and directory lookups, you can apply a special Group Policy setting to the AD sites that host the recovery DCs. This Group Policy setting essentially hides the DC from the rest of the environment and allows for replication only with partner DCs. The site-based "DC Locator DNS Records not registered by the DCs" Group Policy Object (GPO), which Figure 3 shows, prevents the delayed-replication DCs from registering SRV and other DNS records. You'll find this GPO in Group Policy Editor (GPE) under \administrative templates\system\netlogon\DC Locator DNS Records.
Your goal is to permit the registration of only the GUID CNAME record in DNS, along with the A record for the DC's node name (because the GUID CNAME points to this A record). It's important that you don't let Netlogon register any other DNS records, including the domain A record, any SRV records, and the Global Catalog (GC) A record. Each record classification is represented by a mnemonic to help make policy application easier. For each record type that you don't want to register in DNS, you must enter that mnemonic into a space-delimited list in Group Policy. DsaCname is the only mnemonic that should be missing from the space-delimited list that you enter into Group Policy. Table 1 shows the complete list of mnemonics for Group Policy.
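Building the space-delimited value by hand invites typos, so a short script can help. The sketch below is only an illustration: the `sample` list is a deliberately partial set of mnemonics, and the full set should come from Table 1.

```python
def blocked_records_value(all_mnemonics, keep=("DsaCname",)):
    """Join every mnemonic except those in `keep` into a space-delimited string."""
    kept = {name.lower() for name in keep}
    return " ".join(m for m in all_mnemonics if m.lower() not in kept)

# Partial, illustrative subset only; use the complete list from Table 1.
sample = ["Ldap", "LdapIpAddress", "Gc", "GcIpAddress", "DsaCname", "Kdc", "Dc", "Pdc"]
print(blocked_records_value(sample))
```

The resulting string lists every mnemonic except DsaCname and is what goes into the policy setting.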
If you're running a Win2K forest, you'll need to manually enter these settings into the registry of each delayed-replication DC, as the Microsoft article "How to Optimize the Location of a Domain Controller or Global Catalog That Resides Outside of a Client's Site" (http://support.microsoft.com/?kbid=306602) describes.
It's also important to prevent these DCs from registering in WINS, where down-level clients might be attempting to resolve the 1C record to find a suitable DC in the domain. Each DC in the domain registers a 1C record in WINS. This record maps a domain name to an IP address, permitting client systems to find an appropriate DC according to the domain name. To prevent the registration of the 1C record, don't specify WINS servers in the IP configuration.
The Recovery Process
To be able to restore an object (or subtree of objects) in AD, you must first know the object's specific DN. The DN is the object's directory path that Ntdsutil will use to find and restore the object. Many times, you won't know the exact DN of an object you need to restore, so you might need to search for the object to garner that information. After you determine the DN, you'll use Ntdsutil in Directory Services Restore Mode on the delayed-replication DC to restore the object. You'll then need to replicate the recovered object back into the production environment.
Find the deleted object's DN. To find an object's DN, log on to the delayed-replication DC you want to use to restore the deleted item and perform a search for the object. Use the Support Tools utility ADSIedit.msc to perform the query.
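A query in ADSI Edit takes a standard LDAP filter string. As a hedged sketch, the helper below builds a filter that looks up a user by logon name and escapes special characters as RFC 4515 requires. The function names and the logon name `jdoe` are hypothetical, and the choice of `sAMAccountName` as the search attribute is an assumption; search on whatever attribute you actually know.

```python
# RFC 4515 escapes for characters that are special inside an LDAP filter value.
_ESCAPES = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\0": r"\00"}

def escape_filter_value(value):
    """Escape a literal value for safe use inside an LDAP search filter."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)

def user_filter(logon_name):
    """Build a filter matching user objects with the given sAMAccountName."""
    return ("(&(objectCategory=person)(objectClass=user)"
            f"(sAMAccountName={escape_filter_value(logon_name)}))")

print(user_filter("jdoe"))  # hypothetical logon name
```

The object the query returns exposes its `distinguishedName` attribute, which is the DN the restore step needs.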
Restore the object. Because the lag DC's copy of the directory still contains the object, you can restore it without the necessity of retrieving tape backups or restoring an old directory tree file. You can use Ntdsutil to increase the object's update sequence number (USN) by an increment of 100,000, thereby ensuring that the restored object will win the replication conflict.
Type the following command on one line, wrapping the DN in quotes if it contains any spaces, and then press Enter.
restore object CNemail@example.com,
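Because the quoting rule is easy to get wrong under pressure, you might generate the command text rather than type it. The sketch below only formats the line you would enter at Ntdsutil's authoritative restore prompt; it doesn't run Ntdsutil, and the DNs shown are hypothetical placeholders rather than the ones from this article.

```python
def restore_command(dn, subtree=False):
    """Format an Ntdsutil 'restore object' or 'restore subtree' line for a DN."""
    dn = dn.strip()
    if " " in dn:
        dn = f'"{dn}"'  # DNs containing spaces must be wrapped in quotes
    verb = "restore subtree" if subtree else "restore object"
    return f"{verb} {dn}"

# Hypothetical DNs for illustration only.
print(restore_command("CN=Jane Doe,OU=Executives,DC=example,DC=com"))
print(restore_command("OU=Executives,DC=example,DC=com", subtree=True))
```

The `subtree` option covers the case in which an entire OU, rather than a single object, was deleted.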
Replicate the restored object into the rest of the domain. Determine which production DC in the domain is pulling updates from the delayed-replication DC by looking in the Active Directory Sites and Services snap-in. After you find the production DC that has a connection object from the delayed-replication DC you want, right-click the connection object and select Replicate Now to force the production DC to pull updates from the delayed-replication DC. The restored object should now replicate back to the production DC.
Recovering Crucial Information about the Deleted Object
If a user object has been deleted, restoring the object won't necessarily restore everything about that user. For example, when you restore a user object in Win2K, group memberships are lost. Therefore, you might also want to look at the user's properties in the Active Directory Users and Computers snap-in. You can gather the group memberships for the user on the Member Of tab of the account's Properties sheet. Windows 2003, in contrast, does a good job of fixing the domain group memberships after a restore. However, in either OS, membership in local groups of trusting domains will still be lost.
Keeping close track of local group memberships and logging that information will let you repopulate local groups after a user restore. This task might be tedious if you don't use some form of scripted automation. For more information about restoring groups, see "Resources," below.
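The scripted automation can be quite small once the memberships are logged. The following sketch assumes you already have two lists of group names for the account, one logged before the deletion and one read after the restore. How you collect them is up to you, and the group names are hypothetical.

```python
def missing_memberships(logged, current):
    """Return groups the account belonged to before deletion but not after restore."""
    current_lower = {group.lower() for group in current}
    return sorted({g for g in logged if g.lower() not in current_lower}, key=str.lower)

# Hypothetical data: memberships logged earlier vs. memberships after the restore.
logged = ["Executives", "TRUSTING\\File Server Admins", "Domain Users"]
after_restore = ["Domain Users", "executives"]
print(missing_memberships(logged, after_restore))
```

The comparison ignores case because group names aren't case-sensitive. Whatever the function returns is the list of groups you still need to repopulate.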
Of course, other types of objects in AD might require restoration. One example is DNS data. Be mindful that DNS data might be stored within an application partition. Windows 2003 lets you move DNS data out of the default naming context and into an application partition. By default, application partitions aren't replicated to all DCs. For more information about how to ensure that your disaster-recovery plan includes application partitions, see the sidebar "Including Application Partitions."
You might think delayed replication sounds great, but the cost of having several extra servers sitting around, doing very little other than replicating once per week, will make the solution a hard sell to those in control of the IT budget. Bear in mind that a recovery site reduces the number of personnel necessary to recover a deleted object and decreases the amount of lost productivity for the affected user.
Besides using the justification that delayed replication is an insurance investment, you can further mitigate the up-front costs through the use of virtual servers. Assuming you have sufficient memory and processing power, all your recovery DCs could reside as virtual-server instances on one virtual-server host.
Turn Back Time
Recovery of deleted AD objects can be a lengthy process that involves more than one support group, particularly in midsized to large companies. Coordination of efforts and backup-tape location can lead to lengthy downtimes for users. In the event that a user account or entire subtree of objects is deleted, rapid recovery is crucial to keeping your business running smoothly. Using a delayed-replication site to facilitate the recovery of deleted objects is like turning back the hands of time.
Resources
Microsoft Articles
"How to restore deleted user accounts and their group memberships in Active Directory"
"Authoritative restore of groups can result in inconsistent membership information across domain controllers"
"HOW TO: Perform an Authoritative Restore to a Domain Controller in Windows 2000"
"HOW TO: Manage the Application Directory Partition and Replicas in Windows Server 2003"
"How to Optimize the Location of a Domain Controller or Global Catalog That Resides Outside of a Client's Site"