You can learn a lot when software goes bad. Some of that learning is provided by understanding the context that provoked the failure; more comes from thinking through why the circumstances might have caused the software to malfunction. The case described here revolves around the ownership of a database: is it "owned" by an individual standalone Exchange server, or is it "shared" between the members of a Database Availability Group (DAG) that host a copy that can be activated? Exchange 2013 got confused when a server was removed from a DAG. Fortunately the confusion is relieved by Exchange 2013 SP1. Or so we hear...
The bug reported by Paul Cunningham in his article “Exchange Server 2013 Error: An Inconsistency in the Active Directory Was Detected” describes a scenario in which an attempt to create a new mailbox database ends with Exchange reporting an inconsistency in Active Directory. The error text indicates that the MsExchMasterServerOrAvailabilityGroup property for the new database was invalid. Because of the inconsistency, Exchange throws its hands up in the air, throws its toys out of the cot, and generally misbehaves, and the database is not fully created.
In fact, the necessary entry for the new database is created in Active Directory but the directories for the database and its transaction logs are not. The database therefore exists only in the Exchange configuration data stored in Active Directory and cannot be mounted, which is not really the kind of situation you’d like to encounter. Fortunately, as Paul reports, Exchange is pacified if you manually create the directories and mount the database. The server possesses enough intelligence to recognize that all is well and then creates the database file, transaction logs, Search Foundation metadata, reserved files, and all the other paraphernalia that surrounds a mailbox database.
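The workaround described above might look something like the following in the Exchange Management Shell. This is only a sketch: "NewDB" is a hypothetical database name and the two folder paths are placeholders, so substitute whatever Get-MailboxDatabase actually reports for your database.

```powershell
# "NewDB" is a hypothetical name for the half-created database.
# Show where Exchange expects the database file and transaction logs to live.
Get-MailboxDatabase -Identity "NewDB" | Format-List Name, EdbFilePath, LogFolderPath

# Create the missing folders. These paths are placeholders; use the folder
# that holds EdbFilePath and the LogFolderPath value reported above.
New-Item -ItemType Directory -Path "D:\Databases\NewDB"
New-Item -ItemType Directory -Path "D:\Logs\NewDB"

# Mount the database so that Exchange can create the files it needs.
Mount-Database -Identity "NewDB"
```

Run the New-Item commands on the server that hosts the database, because the paths are local to that server.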
But the question then remains as to what caused Exchange to get upset. “A bug” is the normal reply, one that is absolutely correct. Some code misfired and the net result was that a bad value was created in Active Directory. As it turns out, the bug seems to be that the cluster service was still enabled on the server (because it was previously part of a DAG) and the clean-up that should have occurred when the server was removed from the DAG didn’t happen. Thus, when Exchange next came to create a new mailbox database, it was uncertain whether the server was standalone or a member of a DAG. Good housekeeping mandates efficient clean-ups and it just didn’t happen in this instance.
For many years, an Exchange database was tightly associated with the server to which it “belonged.” The notion of database portability did not exist and was not challenged until the advent of cluster continuous replication (CCR) in Exchange 2007, the forerunner of today’s DAG. CCR allows for two copies of a database whereas a DAG can accommodate up to sixteen, the point being that the active copy of the database can be hosted on any server that has a copy. Thus, a database does not have the same kind of tied relationship to a server that existed in the past.
Unless, of course, a database belongs to a standalone server. Much as Microsoft refers to the DAG as a fundamental building block for Exchange, there are still plenty of standalone Exchange 2010 and Exchange 2013 servers in production, and the databases running on these servers have no copies and are not going anywhere. They are in effect as tied to their servers as the databases running on an Exchange 2000 server.
Which brings us back to the MsExchMasterServerOrAvailabilityGroup property because it is the way that Exchange “knows” whether a mailbox database belongs to a standalone server or is part of a DAG. When you create a database on a standalone server, Exchange puts the distinguished name (DN) of the server into MsExchMasterServerOrAvailabilityGroup. However, if you create a new database on a DAG member, Exchange puts the DN of the DAG object into the property.
Another way of looking at the data is to run the Get-MailboxDatabase cmdlet and examine the “MasterType” property. In this excerpt, we see that the first four databases belong to a DAG while DB5 is owned by the ExServer3 server. The server name has no real meaning when a database belongs to a DAG because the active copy can run on any server that holds a copy. In this instance, the server name merely indicates the server on which the database was originally created.
Name  MasterType                 Server
----  ----------                 ------
DB2   DatabaseAvailabilityGroup  EXSERVER1
VIP   DatabaseAvailabilityGroup  EXSERVER2
DB3   DatabaseAvailabilityGroup  EXSERVER2
DB1   DatabaseAvailabilityGroup  EXSERVER2
DB5   Server                     EXSERVER3
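A listing like the one above can be produced with something along these lines in the Exchange Management Shell. The second command is a sketch that assumes the cmdlet surfaces the Active Directory attribute under the name MasterServerOrAvailabilityGroup; DB5 is simply the standalone database from the excerpt.

```powershell
# List every mailbox database with its owner type and originating server.
Get-MailboxDatabase | Format-Table Name, MasterType, Server -AutoSize

# Look at the owner recorded for one database. For a standalone database
# the value should point at the server; for a DAG database, at the DAG.
Get-MailboxDatabase -Identity "DB5" |
    Format-List Name, MasterType, MasterServerOrAvailabilityGroup
```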
When you add a server to a DAG, the process of transforming the standalone server to a DAG member involves an enumeration of the mailbox databases that belong to that server and transfer of control to the DAG. That process is reversed if you then remove the server from the DAG and the MsExchMasterServerOrAvailabilityGroup property is populated with the DN of the server again. Lots of other things happen when a server joins or leaves a DAG but this description suffices for the purpose of this discussion.
The Active Manager process, part of the Microsoft Exchange Replication service, is responsible for managing the databases in the DAG, understanding what copies exist and where the current active copy is located, and generally making sure that clients are always able to connect to their mailbox. Active Manager can only orchestrate database activations if the databases are under its control rather than belonging to a particular server.
Getting back to the problem, my hypothesis is that the lingering indication of DAG membership provided by the enabled cluster service convinced Exchange that the new mailbox database should be a DAG resource. However, when Exchange checked with Active Manager to validate that the server was part of the DAG, it received an error which surfaced as an inconsistency in the configuration data reported by Active Directory and what was known to Exchange. After a mailbox database is created, Exchange normally creates the necessary files and then mounts the database. These steps didn’t happen because of the error condition.
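If you suspect a server is in this half-removed state, a quick comparison of the cluster service against what Exchange believes about DAG membership might help. The commands below are a sketch, not a documented diagnostic for this bug; EXSERVER3 is borrowed from the earlier listing purely as an example server name.

```powershell
# Check whether the Windows cluster service (ClusSvc) is still present and
# how it is configured to start on the server.
Get-WmiObject -Class Win32_Service -ComputerName "EXSERVER3" -Filter "Name='ClusSvc'" |
    Select-Object Name, State, StartMode

# List the DAGs that Exchange knows about and their member servers.
Get-DatabaseAvailabilityGroup | Format-List Name, Servers

# Show which DAG, if any, Exchange records for this mailbox server.
Get-MailboxServer -Identity "EXSERVER3" | Format-List Name, DatabaseAvailabilityGroup
```

A cluster service that is still set to start on a server that Exchange no longer lists as a DAG member would be consistent with the hypothesis above, although it does not prove it.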
According to well-placed sources inside Microsoft (code for “I know who told me this but can’t name them”), the bug is fixed in Exchange 2013 SP1. Cue for spontaneous sighs of relief all round.
The curious thing that I can’t figure out is why the bug was not picked up sooner in the eighteen months or so that Exchange 2013 has been in the wild. Or why it was never detected by Microsoft’s testing for that matter. Perhaps people always add servers to DAGs and never remove them. But I think not.
All of which goes to prove that making sure that everything is done right all the time in software is difficult, especially with the kind of complex software we take for granted today.
Follow Tony @12Knocksinna