Q. Are Update Domains used by Azure IaaS?

A. Two concepts are key to availability for Azure services: fault domains and update domains. Behind the scenes you can think of Azure as a large number of servers in racks in datacenters. A rack is a potential point of failure because its servers share a top-of-rack switch, power and so on, so a fault domain can be thought of as a rack of servers.

An update domain is a group of servers that are updated together, and a workload can be split over multiple update domains. When you deploy a new version of that workload in a Platform as a Service (PaaS) model, only one update domain is updated at a time, which minimizes the reduction in running instances of your service. Update domains are also used when Azure performs planned maintenance, such as updating the host OS of each server, again to minimize the reduction in your running instances. It is this second use case that applies to Azure IaaS.

IaaS VMs in an Availability Set are split over two fault domains and five update domains, and these numbers cannot be changed. The VMs are assigned in a round-robin pattern. This protects the Availability Set from a rack failure and also limits the impact of planned maintenance, because only one update domain is taken down at any one time. If you look at the instances within a cloud service, the fault domain and update domain of each instance are shown.

In this example the three instances are split over two fault domains (the maximum) and three update domains (because there are only three instances). If there were more than five instances, update domains would be reused. For example, consider an Availability Set with eight VMs. The distribution would look something like the following:

VM     Fault Domain   Update Domain
VM1    0              0
VM2    1              1
VM3    0              2
VM4    1              3
VM5    0              4
VM6    1              0
VM7    0              1
VM8    1              2

Notice that the VMs are distributed over two fault domains and five update domains, with some VMs sharing an update domain. During planned maintenance only one update domain is taken down at any one time, so no more than two VMs would ever be down together. For example, if update domain 0 is taken down for maintenance, then VM1 and VM6 are taken down. Once maintenance of update domain 0 is complete and VM1 and VM6 are running again, update domain 1 is updated, making VM2 and VM7 unavailable, and so on.
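The round-robin placement and the maintenance walk described above can be sketched in a few lines of Python. This is only an illustrative model, not how Azure actually places VMs: the function names `assign_domains` and `maintenance_order` are hypothetical, the simple modulo rule is an assumption that happens to reproduce the table above, and servicing the update domains in ascending order is also an assumption.

```python
from collections import defaultdict

# Domain counts taken from the text above.
FAULT_DOMAINS = 2
UPDATE_DOMAINS = 5


def assign_domains(vm_count, fault_domains=FAULT_DOMAINS,
                   update_domains=UPDATE_DOMAINS):
    """Return (vm_name, fault_domain, update_domain) tuples.

    Assumption: round-robin is modelled as a plain modulo on the
    zero-based VM index; this is a simplification for illustration.
    """
    return [
        (f"VM{i + 1}", i % fault_domains, i % update_domains)
        for i in range(vm_count)
    ]


def maintenance_order(placement):
    """Group VM names by update domain.

    Returns (update_domain, [vm_names]) pairs in ascending domain
    order, which is an assumed servicing order.
    """
    groups = defaultdict(list)
    for name, _fault_domain, update_domain in placement:
        groups[update_domain].append(name)
    return [(ud, groups[ud]) for ud in sorted(groups)]


if __name__ == "__main__":
    placement = assign_domains(8)

    print(f"{'VM':<7}{'Fault Domain':<15}Update Domain")
    for name, fd, ud in placement:
        print(f"{name:<7}{fd:<15}{ud}")

    print()
    for ud, vms in maintenance_order(placement):
        print(f"Update domain {ud} down: {', '.join(vms)}")
```

With eight VMs, no group returned by `maintenance_order` holds more than two VMs, which matches the claim that no more than two VMs are down at any one time.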