Implement virtual domain controllers while maintaining fault tolerance and security
Virtualization is all the rage because of the cost savings and flexibility it can bring to your data center. The first step companies usually take is to consolidate their physical servers onto host machines as virtual machines (VMs). Company management naturally wants to maximize savings by virtualizing as many servers as possible. When companies go through this process, the policy is often "virtual by default": Applications will be virtualized unless you can provide a good reason they shouldn't be virtualized. Can you virtualize Active Directory (AD)? Should you virtualize your AD forest, or part of it?
Virtual vs. Physical
The first and most important question is: "Does Microsoft support virtual domain controllers (VDCs)?" Moving a chunk of your critical infrastructure to an unsupported configuration is definitely a career-limiting move. Fortunately, Microsoft does support VDCs as part of Microsoft server software on both Microsoft and third-party virtualization products; you can find complete details of the company's support policies in the Microsoft article "Microsoft server software and supported virtualization environments." However, there are some important best practices you must pay attention to. Just because a configuration is supported doesn't mean you can't get yourself in trouble with it. Microsoft's Problem Resolution Services will be happy to help you—at a price—but if you follow the recommendations in this article, you won't need their help.
The next decision is when to virtualize a domain controller (DC) and when to leave it physical. Performance isn't really a factor anymore: the 64-bit hypervisors available from VMware and Microsoft provide excellent performance compared with physical hardware. For instance, the Microsoft article "Performance and capacity requirements for Hyper-V" reports results of running Microsoft Office SharePoint Server 2007 in a virtual environment. Virtualization host clusters let you use features such as VMware VMotion or Hyper-V Live Migration to create highly available DCs more easily than ever. Still, I think there are two compelling reasons to keep at least some physical DCs in a forest: fault tolerance and security.
AD is fault tolerant because it's a distributed system. A company might have anywhere from the recommended minimum of two up to hundreds of DCs providing AD services. The domain or forest will survive the loss of one or more DCs because no single DC contains unique information that can't be recovered or otherwise reset. In a purely physical AD installation, there's an implied fault tolerance provided because each DC is a different physical box, and they're spread across physical locations. In a virtual infrastructure, you can't make these assumptions. For example, you could have several DCs on a single host, putting them all at risk if the host fails. Or your company's standard virtualization plan might call for all servers to use a SAN instead of local disks, which exposes much or all of your AD to a SAN failure. (For more information about AD storage, see the sidebar “For DCs, Simple Storage Is Better Storage.”) Therefore, when you're designing a virtualization plan for your AD forest, look closely at the supporting infrastructure and work with the virtualization team to eliminate any single points of failure. I'll talk about security reasons to not virtualize your DCs later in this article.
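To make the single-point-of-failure review concrete, here is a minimal Python sketch. It assumes you can export a simple inventory that lists, for each DC, the infrastructure components it depends on. The inventory format, the field names, and the host and SAN names are hypothetical and for illustration only.

```python
from collections import defaultdict


def single_points_of_failure(dcs):
    """Return {domain: [components]} where one component failing takes out every DC in the domain."""
    by_domain = defaultdict(list)
    for dc in dcs:
        by_domain[dc["domain"]].append(dc)

    findings = {}
    for domain, members in by_domain.items():
        # A component is a single point of failure if every DC in the domain depends on it.
        shared = set(members[0]["depends_on"])
        for dc in members[1:]:
            shared &= set(dc["depends_on"])
        if shared:
            findings[domain] = sorted(shared)
    return findings


# Hypothetical inventory: two domains, four DCs.
inventory = [
    {"name": "DC1", "domain": "corp.example", "depends_on": ["host-a", "san-1"]},
    {"name": "DC2", "domain": "corp.example", "depends_on": ["host-b", "san-1"]},
    {"name": "DC3", "domain": "emea.example", "depends_on": ["host-c", "local-disk-c"]},
    {"name": "DC4", "domain": "emea.example", "depends_on": ["host-d", "local-disk-d"]},
]

print(single_points_of_failure(inventory))
```

In this made-up inventory, both corp.example DCs sit on different hosts but share one SAN, so the SAN is flagged. A domain with only one DC would have every one of its dependencies flagged, which is the intended result.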
I recommend leaving at least two physical DCs in each domain, one of which should be the PDC Flexible Single-Master Operation (FSMO) role holder. This architecture ensures that if your entire virtual infrastructure becomes unavailable, you'll still have a fully functional domain with distributed fault tolerance. It's up to you to provide a sense of perspective: The cost of keeping two servers on physical hardware is dwarfed by the potential cost to your company of losing an entire domain.
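The recommendation above is easy to check mechanically. The following Python sketch assumes a hypothetical inventory in which each DC record carries a domain name, a virtual flag, and an optional PDC emulator flag; none of these field names come from a real tool.

```python
def check_physical_dc_rule(dcs, min_physical=2):
    """List the domains that miss the 'two physical DCs, one holding the PDC role' recommendation."""
    problems = []
    for domain in sorted({dc["domain"] for dc in dcs}):
        physical = [dc for dc in dcs if dc["domain"] == domain and not dc["virtual"]]
        if len(physical) < min_physical:
            problems.append(
                f"{domain}: only {len(physical)} physical DC(s), recommended minimum is {min_physical}"
            )
        if not any(dc.get("pdc_emulator") for dc in physical):
            problems.append(f"{domain}: PDC emulator role is not on a physical DC")
    return problems


# Hypothetical inventory for illustration.
dc_inventory = [
    {"name": "DC1", "domain": "corp.example", "virtual": False, "pdc_emulator": True},
    {"name": "DC2", "domain": "corp.example", "virtual": False},
    {"name": "DC3", "domain": "corp.example", "virtual": True},
    {"name": "DC4", "domain": "emea.example", "virtual": False},
    {"name": "DC5", "domain": "emea.example", "virtual": True, "pdc_emulator": True},
]

for problem in check_physical_dc_rule(dc_inventory):
    print(problem)
```

Here corp.example passes, while emea.example is reported twice: it has only one physical DC, and its PDC emulator runs on a VDC.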
Building and Deploying VDCs
After you've decided what to virtualize, it's time to configure your VDCs. From a purely technical viewpoint, this is a straightforward process. If your DCs run Windows Server 2008 or Server 2008 R2, consider using Server Core for the OS because of its reduced attack surface. Choose processor and memory requirements to emulate your current configuration—or what you'd like your current configuration to be if you could have afforded it. Ensure that the virtual machine enhancement for your virtualization solution (e.g., VMware Tools) is installed on the VDC. If it's a Hyper-V installation, be sure the VDC is using the synthetic network adapter rather than the legacy emulated adapter; the synthetic NIC is much faster.
You can use either fixed or dynamically expanding disks for the hard disk configuration; Microsoft now claims that Hyper-V R2's dynamic disk performance is nearly identical to that of fixed disks. However, a DC's disk requirements are fairly static, so after you've determined the optimal disk size for your DC (by looking at your physical DC's disk usage), I recommend creating a fixed disk of the same size. Write caching on volumes that contain the AD database and log files is disabled by default to ensure that any interruption in the I/O process doesn't corrupt data.
You should also evaluate deploying read-only domain controllers (RODCs) in your forest. Because an RODC has only a read-only copy of AD, with no passwords by default, it helps mitigate some of the security concerns associated with VDCs. RODCs require at least Server 2008.
Disable the Synchronize time with host setting for your VDC; DCs have their own time-synchronization architecture and don't need or expect any other synchronization. If you're using Hyper-V, be sure that the virus scanner in the parent partition is excluding the VHD files of the child partitions or you might encounter performance problems and error messages when trying to start up VMs.
A VDC can be deployed in the same manner as other VMs—typically, with a management product such as Microsoft System Center Virtual Machine Manager (VMM) or VMware vCenter. If you need to run a highly automated DC deployment, the Dcpromo process can be scripted to run as a post-deployment option; see the Microsoft articles "Configuring the Automatic Installation of Active Directory" and "How to Configure Guest Operating System Profile Scripts."
The most important technical principle to remember when administering VDCs is that you don't want to pull any virtualization tricks on a VDC that the directory service isn't aware of. What does this mean? Virtualization lets you do interesting and useful things with a VM that you can't do with a physical machine, such as take snapshots that let you quickly roll a system back to a previous state, or restore the entire VM from a backup of the image file, or make copies of the image file for safe keeping or reuse. Don't do these things with a VDC, or you'll be setting yourself up for that Microsoft support phone call.
Why? Remember, AD is a distributed system. If AD resided on only one DC, these operations might be safe. But because the multiple DCs in a domain or forest must communicate with each other, each DC must have a correct understanding of every other DC's state. Virtualization capabilities such as snapshots, image-based restores (with one exception), and cloning don't pass their state changes to the directory service on the target VM; the directory service has no idea what's been done to it, and therefore neither do its replication partners. This condition can wreak havoc in your domain or forest. Let's review which virtualization operations are supported for DCs, and which aren't.
Image-based (aka host-based) backups. Restoration from image-based backups, in which you copy or otherwise back up the virtual hard disk files that contain the VDC, isn't supported (with one exception). In this kind of operation, the OS and AD database are returned to a previous state without resetting the invocation ID (the version of the local database) so the other DCs don't know the target DC has been restored. This situation violates AD's data integrity and can create lingering objects or an update sequence number (USN) rollback scenario; you can find out more about this problem in the Microsoft article "How to detect and recover from a USN rollback in Windows Server 2003."
The exception is when the guest OS is running Windows Server 2003 or later and the backup utility on the host, such as Windows Server Backup, calls the guest's Volume Shadow Copy Service (VSS) writer to ensure the guest is backed up properly; Windows Server 2003 was the first Windows Server release to include this service. The guest VSS writer takes a volume snapshot of the guest, which ensures data integrity of the backup. In the event of a restore, the VSS-aware restore program notifies the guest's directory service that a restore has taken place. This process resets the AD database's invocation ID, which causes the DC's replication partners to recognize that a restore has been performed, so replication coming from the DC is valid.
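The difference between the two kinds of restore is easier to see in a toy model. The Python sketch below is a deliberate simplification, not how AD replication is actually implemented: each write gets the DC's current invocation ID and the next USN, and a partner skips any change whose USN isn't higher than what it has already seen from that invocation ID. All class and function names are made up for illustration.

```python
class DomainController:
    """Toy DC: stamps each originating write with (invocation_id, usn)."""

    def __init__(self, invocation_id):
        self.invocation_id = invocation_id
        self.usn = 0
        self.changes = []

    def write(self, description):
        self.usn += 1
        self.changes.append((self.invocation_id, self.usn, description))

    def snapshot(self):
        # Image-level copy: captures the database exactly as it is now.
        return (self.invocation_id, self.usn, list(self.changes))

    def restore(self, snapshot, new_invocation_id=None):
        self.invocation_id, self.usn, changes = snapshot
        self.changes = list(changes)
        if new_invocation_id is not None:
            # Models the VSS-aware restore: the database gets a fresh identity.
            self.invocation_id = new_invocation_id


class ReplicationPartner:
    """Toy partner: remembers the highest USN seen per invocation ID."""

    def __init__(self):
        self.high_water = {}
        self.received = []

    def pull_from(self, dc):
        for invocation_id, usn, description in dc.changes:
            if usn > self.high_water.get(invocation_id, 0):
                self.received.append(description)
                self.high_water[invocation_id] = usn


def simulate(reset_invocation_id):
    dc = DomainController("ID-A")
    partner = ReplicationPartner()
    dc.write("create user1")
    dc.write("create user2")
    image = dc.snapshot()        # image backup taken at USN 2
    dc.write("create user3")     # USN 3
    partner.pull_from(dc)        # partner has now seen ID-A up to USN 3
    dc.restore(image, "ID-B" if reset_invocation_id else None)
    dc.write("create user4")     # reuses USN 3 after the restore
    partner.pull_from(dc)
    return partner.received


print("plain image restore:", simulate(reset_invocation_id=False))
print("VSS-aware restore:  ", simulate(reset_invocation_id=True))
```

In the plain image restore, the new write reuses a USN the partner has already seen from the same invocation ID, so "create user4" never replicates. With the invocation ID reset, the partner treats the restored DC as a new source and accepts the change. The model leaves out inbound replication, which in a real forest would also bring "create user3" back to the restored DC.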
Client backups. The other supported method of backing up a VDC is by running client backups, just as if it were a physical DC. This process isn't as speedy as a host-based backup that uses the VSS writer, but it has an advantage over many current host-based backup applications because you can restore individual files on the guest. Most host-based backup applications don't support file-level restore, but as they become more sophisticated (for example, Microsoft System Center Data Protection Manager 2010), they, too, can restore individual files from guest OSs that support VSS. Microsoft has documented its best practices for backing up and restoring VDCs in the article "Backup and Restore Considerations for Virtualized Domain Controllers."
Should you even back up every VDC? I'd argue that for small forests, you should take system-state backups of two DCs in every domain, period. Larger forests with large (over 5GB) AD databases (ntds.dit) or geographically dispersed DCs should have more, following the principle of keeping a backup on the same LAN as the DCs, to speed the process of performing a Dcpromo from media. If you should lose a VDC for some reason, there are faster options for recovery than restoring one from backup. (For other options, see the DC Recovery page of my Active Directory Recovery Flowchart.)
VM snapshots. Restoring a VDC using VM snapshots isn't supported. These snapshots (not to be confused with directory snapshots taken with Ntdsutil or volume snapshots taken by VSS) are a point-in-time capture of a VM's state. Restoring a VDC to its previous state by using a saved snapshot causes the same inconsistency problems in your directory as an image-based backup.
Cloning. Cloning a DC by duplicating a VDC's hard disk file isn't supported. If the cloned VDC comes online in the same forest as the original, and you resolve the immediate problems with identical server names and IP addresses, you'll encounter problems with duplicate directory service agent (DSA) GUIDs, duplicate SIDs, duplicate Relative Identifier (RID) pools—and worse if the cloned VDC is a RID master—secure channel problems, machine account password updates . . . you just don't want to go there.
Physical to virtual (P2V) conversion. P2V conversion is supported, but only if the source physical DC is offline; VMM 2008 enforces this requirement. DC P2V conversion with the source DC online creates a problem similar to cloning. Frankly, I believe provisioning and promoting a new VDC is safer and just about as fast as performing a P2V conversion on an existing DC.
Pausing. Pausing a VDC (i.e., putting it in suspended animation) is actually OK, just "do not pause the domain controller for long periods," to quote the Microsoft article "Considerations when hosting Active Directory domain controller in virtual hosting environments" (support.microsoft.com/kb/888794). What happens when you pause a DC? To its replication partners, it suddenly falls off the network—the equivalent of pulling out the network cable. When the paused DC comes back online, time has suddenly jumped forward. Its Kerberos tickets have expired, its machine passwords might need to be updated, and if it's been paused longer than the tombstone lifetime, it can no longer replicate and must be rebuilt. I'd suggest pausing be used sparingly and not for extended periods of time.
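As a rough illustration of how the consequences pile up with pause length, the Python sketch below compares a pause duration against three thresholds. The default values (a 180-day tombstone lifetime, a 10-hour Kerberos ticket lifetime, and a 30-day machine account password age) are assumptions for illustration; check the actual values configured in your forest before relying on them.

```python
from datetime import timedelta


def pause_consequences(paused_for,
                       tombstone_lifetime=timedelta(days=180),   # assumed default
                       ticket_lifetime=timedelta(hours=10),      # assumed default
                       machine_password_age=timedelta(days=30)): # assumed default
    """Return the consequences a paused DC is likely to face when it resumes."""
    consequences = []
    if paused_for > ticket_lifetime:
        consequences.append("Kerberos tickets expired")
    if paused_for > machine_password_age:
        consequences.append("machine account password update overdue")
    if paused_for > tombstone_lifetime:
        consequences.append("past tombstone lifetime: rebuild the DC")
    return consequences


for pause in (timedelta(hours=2), timedelta(days=45), timedelta(days=200)):
    print(pause, "->", pause_consequences(pause) or "no expected problems")
```

A two-hour pause trips none of the thresholds, a 45-day pause trips the first two, and a 200-day pause exceeds the assumed tombstone lifetime, at which point the DC can no longer replicate and has to be rebuilt.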
Standardized configuration. Because a VM requires a different hardware abstraction layer (HAL) and a different device driver set than what you're using for your physical DCs, VDCs require a separate OS build standard. Most companies have at least two standard build configurations, one for widely deployed hardware nearing its end of life, and one for new hardware beginning a broader adoption. VMs, because of their HAL and device driver set, will require a third build configuration.
Security best practices for VDCs are a combination of the established best practices for DC security, such as physical security, and virtualization security, such as isolated networks. One hazard of virtualizing DCs is that your directory services team and virtualization team probably aren't familiar with each other's security practices. These teams must sit down together and review how to accomplish both teams' requirements. Here are a few examples of important security considerations.
Virtual disk security. Granting access to a VDC's virtual disks is the same as granting physical access to a physical DC: once you grant access, you can't guarantee security. Access to these virtual disk files must be carefully protected, especially because virtual host administration means more people will require access to them. Therefore, host admins, enclosure admins, SAN storage admins, and data center admins are all groups that might need to be added to the list of personnel who are flagged as having access to corporate directory information.
Console access. DC administrators should be granted console access to VDCs in the same way they have access to physical DCs through an out-of-band console utility that doesn't require an installed OS. In a VMware shop, you can use vCenter Server to manage console access, and in a Hyper-V installation you can use Authorization Manager (AzMan) or VMM's Self-Service Portal.
DC awareness. Full VDCs hold the "keys to the kingdom," and personnel with administrative access to the host have the ability to access and possibly disrupt activity of the VDC on that host. It's essential that all personnel with host access be trained to understand the implications of having a DC on their host servers.
RODCs. You can reduce some of the security risks associated with VDCs by deploying RODCs instead of full DCs wherever possible. RODCs don't perform any writes to AD, and by default user and machine account passwords aren't replicated to them. So, for example, if a virtual RODC's hard disk file is stolen, the attacker can't crack passwords out of it. A corrupted RODC hard disk file can't harm the rest of the forest, nor will any changes made to it be replicated to the rest of the forest. This situation doesn't mean a compromised RODC is harmless; possession will reveal organization structures, DNS records—in general, lots of information you don't want to share.
Do Your Homework
Virtualizing some of your AD infrastructure might yield corporate benefits, but there's practically no benefit to the AD administrator. Still, it can be done, and Microsoft supports it, provided you do your homework before you begin. The key Microsoft VDC documentation can be found in the TechNet article "Running Domain Controllers in Hyper-V." Don't do anything to your VDCs that their directory services can't comprehend, and be aware that the very advantages virtualization brings to VDCs also mean that their security is more complicated.