As I first mentioned in "The AD Backup Bug: Microsoft Comes Clean," September 2001, InstantDoc ID 21853, a major flaw exists in Windows 2000's backup and restore APIs in versions earlier than Service Pack 2 (SP2). In certain circumstances, this bug corrupts Active Directory (AD) backups, and the system provides no indication or warning of the corruption. When you restore a corrupted backup, the domain controller (DC) can't start and displays a "Directory Service cannot start" error message. The system also records several errors in the System log. If you then run Ntdsutil with the SEMANTIC DATABASE ANALYSIS command (to run the database semantic checker), you receive error message 550: "Database is inconsistent." The problem affects all applications that use the backup APIs to perform AD backups, including system-state backups performed with Win2K's built-in Backup utility and most third-party backup applications. The problem occurs even when you use a backup program's verify option.

The following conditions must occur for the bug to manifest itself:

  • While the system performs an initial AD backup on a Win2K DC, several objects change because of local changes or replication. The changes to these objects generate additional transaction logs that in turn advance the Jet database checkpoint. The Jet checkpoint maintains a record of unflushed data. The system stores two copies of the checkpoint data: one in the database header of the ntds.dit file and a second in-memory copy written to the backup media.
  • The system performs a second backup on another, relatively inactive DC, during which the aforementioned log-file generation and Jet-checkpoint advancement don't occur. This second backup completes before the log-file generation and checkpoint advancement on the first system that's backed up.
  • The second backup is then restored.

The core of the problem is that the system writes an outdated record of required transaction log files and checkpoint data to the backup media, then later restores that record as part of the second backup. After the restore, the database header references logs that aren't required for AD recovery, and some of these log files aren't included in the backup, which explains the log entries that state "Log files are missing from system state." This information can be misleading because the log files aren't missing; instead, the number of log files referenced in the restored database header is incorrect. A relatively large first backup is more likely to produce the problem because of the commensurately larger window of time in which the second backup can finish (and the Jet checkpoint can advance on the first DC). The problem is also more likely when the backup media has a slow backup rate, because the backup process takes longer and provides more opportunity for the checkpoint to advance. DCs in busy production environments are less likely to experience this condition during typical activity (creations, deletions, and modifications to objects) because in AD these activities result in a steady advancement of the Jet checkpoint.
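
To make the mismatch concrete, here is a minimal Python sketch of a toy model, not of the real Jet database format or the Win2K backup APIs. The BackupSet class, its field names, and the log generation numbers are all hypothetical. It only illustrates how a header that carries a stale log range can reference log files that were never written to the backup media.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupSet:
    """Toy model of a system-state backup (all field names are hypothetical)."""
    header_first_log: int     # lowest log generation the header says recovery needs
    header_last_log: int      # highest log generation the header says recovery needs
    logs_on_media: frozenset  # log generations actually written to the backup media


def missing_logs(backup: BackupSet) -> list:
    """Return the log generations the header references but the media lacks."""
    required = range(backup.header_first_log, backup.header_last_log + 1)
    return [gen for gen in required if gen not in backup.logs_on_media]


# Consistent backup: the header and the media agree.
good = BackupSet(header_first_log=10, header_last_log=12,
                 logs_on_media=frozenset({10, 11, 12}))

# Affected backup: the header carries a stale range (7 through 12),
# but only generations 10 through 12 were written to the media.
stale = BackupSet(header_first_log=7, header_last_log=12,
                  logs_on_media=frozenset({10, 11, 12}))

print(missing_logs(good))   # []
print(missing_logs(stale))  # [7, 8, 9]
```

In this model the "missing" generations 7 through 9 were never needed for recovery; the header's range is simply wrong, which mirrors why the "missing log files" message is misleading.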

You can avoid these problems by installing SP2 and making new backups of the system state. The fix is preventive in nature—it doesn't resolve errors that already occurred when you restored system-state backups with incorrect header information. If you use backups as a recovery method for Win2K-based DCs, consider these suggestions:

  • Install SP2 on production DCs. (To be consistent with good change-management practices, first install SP2 on DCs in a lab environment that represents your production configuration. Make multiple backups and initiate restore tests before installing SP2 on your production DCs.) Create new system-state backups and clearly label them as post-SP2 backups.
  • Inventory and clearly label existing backup media that you used before installing SP2. Place pre-SP2 backup media in locked storage. Also consider backups that you've stored on the local disks of computers in your organization. Better yet, destroy pre-SP2 backups. (Backup media that contains AD replicas has a limited lifespan determined by the tombstone lifetime, so existing backups will eventually become obsolete.)
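
If you keep a simple list of backup media and dates, the inventory step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the BackupRecord class, the classify function, the labels, and the dates are hypothetical, and the 60-day value assumes the default Win2K tombstone lifetime, so substitute the value configured in your forest.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: 60 days, the default Win2K tombstone lifetime. Adjust as needed.
TOMBSTONE_LIFETIME = timedelta(days=60)


@dataclass(frozen=True)
class BackupRecord:
    """One entry in a hypothetical backup-media inventory."""
    label: str
    taken_on: date


def classify(backup: BackupRecord, sp2_installed_on: date, today: date) -> str:
    """Sort a backup into 'expired', 'pre-SP2', or 'post-SP2'."""
    if today - backup.taken_on > TOMBSTONE_LIFETIME:
        return "expired"    # older than the tombstone lifetime
    if backup.taken_on <= sp2_installed_on:
        # A backup made on the installation day is treated as pre-SP2 to be safe.
        return "pre-SP2"    # lock away or destroy
    return "post-SP2"       # made after the fix was installed


sp2_day = date(2001, 9, 1)
today = date(2001, 10, 15)
inventory = [
    BackupRecord("DC1 tape A", date(2001, 7, 1)),
    BackupRecord("DC1 tape B", date(2001, 8, 25)),
    BackupRecord("DC1 tape C", date(2001, 9, 10)),
]

for record in inventory:
    print(record.label, "->", classify(record, sp2_day, today))
```

With these sample dates, tape A is reported as expired, tape B as pre-SP2, and tape C as post-SP2.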