Q: What is a Windows Azure Availability Set?

A: It's useful to remember that, behind the scenes, the public cloud service offered by Windows Azure is actually racks and racks of servers (more precisely, many fault domains, each of which includes its own power and network equipment in addition to a rack of servers, to avoid single points of failure).

Although Microsoft takes every precaution, a rack can still fail, which will cause a brief interruption to the virtual machines (VMs) and services running on that rack.

To avoid a single point of failure, you might deploy two instances of a service, for example, two domain controllers (DCs) in Windows Azure. However, you have no guarantee that those two instances aren't running in the same rack, in which case a rack failure would affect both instances!

When you place VMs in an Availability Set, the VMs are placed into separate fault domains, and therefore separate racks, so that a single failure doesn't affect all the instances in the Availability Set. You can create an Availability Set when you create the VM. You can also add an existing VM to an Availability Set from the VM's Configure tab, where you can select an existing Availability Set or create a new one.

The number of fault domains that the VMs in an Availability Set will be split over isn't guaranteed. What the Availability Set does guarantee is that not all of its VMs will go down at the same time.

This means that if there were three VMs in an Availability Set, two of them might be in the same fault domain. To see which fault domain each VM is in, open the Cloud Services view for the cloud service that contains the VMs and look at the Instances tab.

The VMs should show different fault domain values. For example, if there were two VMs in an Availability Set, one VM would have a fault domain of 0 and the other a fault domain of 1.
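To make the fault domain numbering concrete, here is a minimal Python sketch. It assumes two fault domains and a simple round-robin assignment; both are my assumptions for illustration, not a description of how Windows Azure actually places VMs. The function name and the VM names are hypothetical.

```python
from collections import Counter


def assign_fault_domains(vm_names, fault_domain_count=2):
    """Give each VM a fault domain index in round-robin order.

    Round-robin over two fault domains is an assumption made for
    illustration only; the real placement logic is not documented here.
    """
    return {name: i % fault_domain_count for i, name in enumerate(vm_names)}


# Two VMs: each lands in its own fault domain.
two_vms = assign_fault_domains(["DC1", "DC2"])
print(two_vms)  # {'DC1': 0, 'DC2': 1}

# Three VMs over two fault domains: two of them must share one.
three_vms = assign_fault_domains(["DC1", "DC2", "DC3"])
print(three_vms)  # {'DC1': 0, 'DC2': 1, 'DC3': 0}

# Number of VMs in each fault domain.
per_domain = Counter(three_vms.values())
print(per_domain[0], per_domain[1])  # 2 1
```

Under these assumptions, the set of three VMs still never sits entirely in one fault domain, which is the guarantee that matters.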

Notice in my example below that I have three VMs in the Availability Set and two of them are in the same fault domain. It's therefore very important to make sure that an Availability Set contains only VMs that perform exactly the same function. If you mix VMs with different functions in a single Availability Set, the VMs performing the same function could end up in the same fault domain, and a single rack failure could then take down every instance of that function.
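The risk of mixing functions can be sketched in a few lines of Python. This is a hypothetical model, not Windows Azure's real placement algorithm: it assumes two fault domains and round-robin assignment in the order the VMs were added, and all VM names, roles, and function names are made up for illustration.

```python
def place(vms, fault_domain_count=2):
    """Return (name, role, fault_domain) tuples, assigned round-robin.

    Round-robin placement is an illustrative assumption only.
    """
    return [(name, role, i % fault_domain_count)
            for i, (name, role) in enumerate(vms)]


def roles_lost_entirely(placement, failed_domain):
    """List the roles that have no surviving VM if one fault domain fails."""
    all_roles = {role for _, role, _ in placement}
    surviving = {role for _, role, fd in placement if fd != failed_domain}
    return sorted(all_roles - surviving)


# One Availability Set that mixes domain controllers and web servers.
mixed = place([("DC1", "dc"), ("WEB1", "web"), ("DC2", "dc"), ("WEB2", "web")])
print(roles_lost_entirely(mixed, failed_domain=0))  # ['dc']
print(roles_lost_entirely(mixed, failed_domain=1))  # ['web']

# One Availability Set per function: each role spans both fault domains.
dc_set = place([("DC1", "dc"), ("DC2", "dc")])
web_set = place([("WEB1", "web"), ("WEB2", "web")])
print(roles_lost_entirely(dc_set, failed_domain=0))   # []
print(roles_lost_entirely(web_set, failed_domain=1))  # []
```

In the mixed set, the Availability Set as a whole survives either failure, yet one function is wiped out each time. With one set per function, every function keeps at least one running VM.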