The Failover Clustering feature has gone through a huge amount of change in the recent versions of Windows Server. Windows Server 2008 saw great improvements to the simplicity of its setup and management, a new validation process for configurations, support for geographically distributed clusters, and more. Windows Server 2008 R2 added Cluster Shared Volumes (CSV) support and other improvements. Windows Server 2012 increased the scalability of clusters to 64 nodes, introduced support for scale-out file servers, completely changed how quorum works with dynamic quorum, added the ability to monitor processes running within virtual machines (VMs), and more. Windows Server 2012 R2 continues the tradition of adding improvements. Let's dive into the key new capabilities and changes in Server 2012 R2 Failover Clustering.

Quorum Improvements

Quorum used to be fairly complicated. The cluster architect had to decide whether to use a witness resource and, if so, whether that witness should be a file share or disk. If the number of nodes changed, this decision needed to be revisited because the witness is needed only if there's an even number of nodes in the cluster. Server 2012 introduced dynamic quorum, which automatically adjusts the votes assigned to nodes as their availability changes. Although dynamic quorum solved the problem of overzealous administrators taking down too many nodes, thereby causing the entire cluster to shut down, it didn't eliminate the need to manually configure a witness resource when node counts changed.

Server 2012 R2 introduces dynamic witness, which dynamically assigns the witness resource a vote, depending on the number of nodes in the cluster. If there's an even number of nodes, the witness is required and is given a vote. If there's an odd number of nodes, the witness isn't required and its vote is removed. This greatly simplifies the cluster configuration process and is the final change to essentially remove any "quorum mode" decisions.

When creating a Server 2012 R2 failover cluster, you always configure a witness resource. The Failover Clustering feature determines when the witness should have a vote. To check whether the witness resource currently has a vote, you can run the Windows PowerShell command:

(Get-Cluster).WitnessDynamicWeight

A value of 1 means the witness resource has a vote. A value of 0 means it doesn't have a vote.
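
To see the witness vote alongside the rest of the quorum configuration, a short sketch such as the following can help. It assumes you're running it on a cluster node with the FailoverClusters module available, and the file share path in the last comment is a hypothetical placeholder:

```powershell
# Sketch: inspect the quorum configuration on a Server 2012 R2 cluster.
Import-Module FailoverClusters

# Show which witness resource (if any) is configured and the quorum type.
Get-ClusterQuorum | Format-List Cluster, QuorumResource, QuorumType

# 1 = dynamic quorum is enabled; 0 = it's disabled.
(Get-Cluster).DynamicQuorum

# 1 = the witness currently has a vote; 0 = it doesn't.
(Get-Cluster).WitnessDynamicWeight

# Example of configuring a file share witness
# (\\fileserver\witness is a hypothetical share, so the line is commented out):
# Set-ClusterQuorum -NodeAndFileShareMajority "\\fileserver\witness"
```

Running the read-only lines before and after adding or removing a node is a simple way to watch the witness vote change.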

There are other quorum improvements in Server 2012 R2. In Server 2012, dynamic quorum allows nodes' votes to be removed and added as the nodes' availability changes. However, there could still be a situation in which a cluster splits in half, with each partition getting 50 percent of the votes. This means no partition could make quorum, as more than 50 percent of the votes is required. Typically, the witness resource would handle this situation by making the number of votes odd, ensuring one partition would always be able to acquire more than 50 percent of the votes, but there are scenarios in which the witness resource fails and an even number of nodes remains.

In Server 2012 R2, the Failover Clustering feature has new functionality that serves as a tie breaker when there's a 50 percent node split. If there's an even number of voting nodes, this functionality randomly picks one node in the cluster and removes its vote. Thus, there will be an odd number of votes in the cluster again and one partition of the cluster will be able to make quorum in the event of a communications breakdown between locations. If a node is subsequently shut down, the node that lost its vote has the vote restored to ensure there's still an odd number of votes in the cluster. This functionality works all the way down to two remaining nodes in a cluster.

If the idea of a node randomly losing its vote isn't desirable, you can make one location the primary location (i.e., the location in which you want to make quorum). You can then manually specify the node that should lose its vote in a tie breaker scenario. For example, if you want a node in the secondary location to lose its vote, you'd run a command such as:

(Get-Cluster).LowerQuorumPriorityNodeId = `
  (Get-ClusterNode -Name "<node name>").Id

In this command, you need to replace <node name> with the name of the node that should lose its vote if there's a tie.

As Figure 1 shows, you can view the vote status of each node in a cluster in the Nodes view of Failover Cluster Manager. This makes it easy to see the vote status within the cluster, which can be very useful given all the dynamic quorum capabilities.

Figure 1: Viewing Each Node's Vote Status
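
The same vote information is available from PowerShell. The following is a minimal sketch, assuming it's run on a cluster node. It relies on the NodeWeight and DynamicWeight properties of each node and on the LowerQuorumPriorityNodeId property discussed earlier:

```powershell
# Sketch: list each node's assigned and current vote.
Import-Module FailoverClusters

# NodeWeight    = the vote assigned by the administrator (1 = allowed to vote).
# DynamicWeight = the vote the cluster currently grants the node (1 = has a vote now).
Get-ClusterNode | Format-Table Name, State, NodeWeight, DynamicWeight -AutoSize

# Show which node, if any, is designated to lose its vote in a tie.
# This should return nothing when no node has been designated.
$tieBreakerId = (Get-Cluster).LowerQuorumPriorityNodeId
Get-ClusterNode | Where-Object { $_.Id -eq $tieBreakerId } | Select-Object Name, Id
```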

It's actually very hard to make a cluster lose quorum in Server 2012 R2, but there might be scenarios in which you have to force quorum for a cluster partition to enable it to provide services. To do so, you use the /fq switch when starting the cluster service:

net start clussvc /fq

In Server 2012 R2, the partition that starts with a forced quorum is deemed the authoritative partition. Afterward, if any other partition starts, it automatically starts in a prevent quorum mode. That partition then reconciles with the authoritative partition and resumes normal cluster mode without any other administrator intervention. Prior to Server 2012 R2, a number of manual steps had to be performed to achieve this.
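
If you prefer PowerShell to net start, the Start-ClusterNode cmdlet offers a comparable option. The sketch below assumes a node named Node1 (a hypothetical name) and uses the -FixQuorum parameter name from earlier releases; as far as I know, Server 2012 R2 still accepts it as an alias of -ForceQuorum, but check Get-Help Start-ClusterNode on your system:

```powershell
# Sketch: force quorum from PowerShell on the node that should become
# the authoritative partition. "Node1" is a hypothetical node name.
Import-Module FailoverClusters

Start-ClusterNode -Name "Node1" -FixQuorum

# Afterward, confirm which nodes are up and which currently hold a vote.
Get-ClusterNode | Format-Table Name, State, DynamicWeight -AutoSize
```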

CSV Improvements

CSV was initially introduced in Server 2008 R2. It allows NTFS-formatted LUNs to be concurrently accessed by all nodes in a cluster, eliminating the need to move LUNs between nodes as VMs are migrated. CSV also allows VMs that are stored on the same CSV volume to run on different nodes, which shows the simultaneous-access nature of CSV across the cluster.

In Server 2008 R2, CSV worked only with Hyper-V VMs. In Server 2012, the CSV role expanded because of its use by Server Message Block (SMB) for scale-out file services, which can be used for purposes such as Hyper-V VM storage and SQL Server database storage. In addition, CSV works very well with clustered Storage Spaces, another new feature in Server 2012.

Server 2012 R2 enhances CSV support in a number of key areas:

  • When using CSV with clustered Storage Spaces, CSV ownership is now dynamically distributed between the nodes in the cluster. The distribution of ownership helps to better balance the disk I/O between the nodes. As nodes leave and rejoin the cluster, the CSV ownership is dynamically rebalanced to ensure optimal distribution.
  • CSV resiliency has been increased through the use of multiple Server service instances for each node. This separates the different types of CSV traffic and minimizes the scale of impact in the event of a service failure.
  • The ability to diagnose CSV problems has improved. Failover Cluster Manager now shows the I/O mode for each CSV instance and the reason it's in that mode. 
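
The ownership and I/O mode information is also exposed through PowerShell. Here's a minimal sketch, assuming it's run on a Server 2012 R2 cluster node, that uses Get-ClusterSharedVolume to show each CSV's owner and Get-ClusterSharedVolumeState to show the I/O mode on each node:

```powershell
# Sketch: review CSV ownership and I/O mode across the cluster.
Import-Module FailoverClusters

# Which node currently owns (coordinates) each CSV.
Get-ClusterSharedVolume | Format-Table Name, OwnerNode, State -AutoSize

# The I/O mode of each CSV on each node (StateInfo), plus the reason
# if I/O is being redirected at the file-system or block level.
Get-ClusterSharedVolumeState |
  Format-Table Name, Node, StateInfo, FileSystemRedirectedIOReason,
    BlockRedirectedIOReason -AutoSize
```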

CSV uses unbuffered I/O for read and write operations, which means no caching is ever used at the disk level. However, Server 2012 introduced the ability to use a portion of the system memory as a read cache for CSV at the OS level, which improves read performance. In Server 2012, the maximum amount of memory that you can assign to the CSV cache is 20 percent of the system's memory. In Server 2012 R2, this limit has been increased to 80 percent of the system's memory.

There's only one step to enable the CSV cache in Server 2012 R2. You just need to configure the amount of memory that can be used by the host for the CSV cache. The value is specified in megabytes. For example, the following command sets the CSV cache size to 4GB:

(Get-Cluster).BlockCacheSize = 4096

There are two steps to enable the CSV cache in Server 2012. First, you need to configure the amount of memory that can be used by the host for the CSV cache, using a command such as:

(Get-Cluster).SharedVolumeBlockCacheSizeInMB = 4096

Second, you need to enable the CSV cache on a per-disk basis. To enable a disk for the CSV cache, you can run the command:

Get-ClusterSharedVolume "Cluster Disk 1" |
  Set-ClusterParameter CsvEnableBlockCache 1

You don't need to run this command for Server 2012 R2 because the CSV cache is enabled by default. If you need to disable the CSV cache for a specific disk, it's important to know that the CsvEnableBlockCache property in Server 2012 has been renamed to EnableBlockCache in Server 2012 R2.
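
As a rough sketch of what that looks like on Server 2012 R2, the following commands read the current cache size and then check and disable the cache for one disk. The disk name "Cluster Disk 1" is only an example, as in the earlier command:

```powershell
# Sketch: review and adjust the CSV cache on Server 2012 R2.
Import-Module FailoverClusters

# Current CSV cache size, in MB.
(Get-Cluster).BlockCacheSize

# Check whether the cache is enabled for a specific CSV (1 = enabled).
Get-ClusterSharedVolume "Cluster Disk 1" |
  Get-ClusterParameter EnableBlockCache

# Disable the cache for that CSV only.
Get-ClusterSharedVolume "Cluster Disk 1" |
  Set-ClusterParameter EnableBlockCache 0
```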

In Server 2012 R2, CSV supports more OS features, including:

  • Resilient File System (ReFS), which is primarily aimed at archive-type data and shouldn't be used for running VMs
  • Storage Spaces using tiering and write-back cache
  • Storage Spaces using parity (including dual-parity)
  • Data deduplication
  • SQL Server 2014 running on Server 2012 R2 CSV

Virtualization Changes

In "Windows Server 2012 R2 Hyper-V" and "Windows Server 2012 R2 Hyper-V: What's New, Part 2," I touched on some of the virtualization changes enabled by Failover Clustering. However, I want to cover them in detail here because Hyper-V is Failover Clustering's biggest client.

Creating guest clusters has been possible with Hyper-V VMs for many years. However, if the guest clusters required shared storage, technologies such as iSCSI and virtual Fibre Channel had to be used. Although this approach is effective, it directly exposes the physical storage fabric to the VMs, which isn't ideal in many environments.

Server 2012 R2 adds support for virtual hard disk sharing, which allows a virtual hard disk file (a VHDX file) to be simultaneously attached to multiple VMs through the SCSI controller. The shared VHDX file is exposed as a shared Serial Attached SCSI (SAS) device to the VMs and is therefore usable as shared cluster storage. The VHDX file must be stored on a CSV, which means it can be accessed directly by Hyper-V hosts on a local CSV or through a scale-out file server using SMB 3.0. (Scale-out file servers use CSV for the back-end storage.)

To share a VHDX file, you need to select the Enable virtual hard disk sharing option on the disk. You can find this option under Advanced Features in Hyper-V Manager, as shown in Figure 2.

Figure 2: Enabling a VHDX File to Be Shared

You can also use PowerShell to configure virtual hard disk sharing. You just need to use the Add-VMHardDiskDrive cmdlet with the -ShareVirtualDisk switch, like this:

Add-VMHardDiskDrive -VMName savdalfc01 -Path `
 C:\ClusterStorage\Volume1\SharedVHDX\Savdalfcshared1.vhdx `
 -ShareVirtualDisk
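
A guest cluster needs the same file attached to every VM in the cluster. As a sketch of that second step, the following attaches the shared VHDX file to a second VM and then lists the disks on both VMs. The VM name savdalfc02 is a hypothetical second guest-cluster node, not one from the original example:

```powershell
# Sketch: attach the same shared VHDX file to a second VM.
# "savdalfc02" is a hypothetical VM name used for illustration.
Add-VMHardDiskDrive -VMName savdalfc02 -Path `
 C:\ClusterStorage\Volume1\SharedVHDX\Savdalfcshared1.vhdx `
 -ShareVirtualDisk

# Confirm that both VMs now reference the same file.
Get-VMHardDiskDrive -VMName savdalfc01, savdalfc02 |
  Format-Table VMName, ControllerType, Path -AutoSize
```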

Note that some features don't work with a shared VHDX file because multiple OSs are using it. For example, a shared VHDX file:

  • Can't be part of Hyper-V Replica replicated data.
  • Can't be moved using live storage migration. (You need to shut down the VMs before moving a shared VHDX file.)
  • Can't be dynamically resized.
  • Can't be part of host-level backups. (You can't back up the shared VHDX from the Hyper-V host.)

I expect that these limitations will be resolved in future versions of Windows Server, but currently it's important to be aware of them.

When you put a node into maintenance mode in Windows Server, workloads are drained in an organized manner. For VMs in Server 2012, draining means live migrating them to another node, so there's no impact to the VMs' availability as long as the node is put in maintenance mode prior to shutdown. If a node is shut down without first being placed in maintenance mode, there's an interruption to the VMs' availability because the VMs' states need to be saved, the VMs need to be moved, and the VMs then need to be resumed on another node.

Server 2012 R2 fixes this behavior. It performs a live migration of VMs when a node is shut down, even if it hasn't been placed in maintenance mode. Note that this doesn't mean you shouldn't place nodes in maintenance mode prior to shutdown. However, this change in behavior will protect virtual environments from periods of unavailability.
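
For planned maintenance, draining a node explicitly is still the cleaner approach. The following is a minimal sketch, assuming a node named Node1 (a hypothetical name) in a Server 2012 R2 cluster. It also reads the DrainOnShutdown cluster property, which controls the shutdown behavior just described:

```powershell
# Sketch: drain a node for maintenance and bring it back afterward.
# "Node1" is a hypothetical node name.
Import-Module FailoverClusters

# 1 = VMs are drained automatically when a node shuts down.
(Get-Cluster).DrainOnShutdown

# Pause the node and live migrate its workloads to other nodes.
Suspend-ClusterNode -Name "Node1" -Drain

# ...perform the maintenance, then resume the node and
# move the workloads back to it right away.
Resume-ClusterNode -Name "Node1" -Failback Immediate
```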

The final virtualization-related change continues the trend of ensuring virtual environments stay available as much as possible. A common problem with Failover Clustering logic is failure detection. If the node is running and the VM is running, the virtual environment is deemed healthy. However, a network adapter might fail or become disconnected, which means the VMs can no longer communicate outside the host, which in effect makes them useless.

Server 2012 R2 introduces the protected network feature, which enables a VM's important network connections to be configured as requiring protection. As Figure 3 shows, you need to select the Protected network option. You can find this option under Advanced Features in Hyper-V Manager.

Figure 3: Configuring a Protected Network

In the event that the protected network isn't available, Failover Clustering will automatically live migrate the VM to another host that has connectivity to the required network. 
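
The protected network setting can also be inspected and changed from PowerShell. This sketch assumes the VM name savdalfc01 from the earlier example. It uses the ClusterMonitored property and the -NotMonitoredInCluster parameter of Set-VMNetworkAdapter, so note the inverted logic: $false means the adapter is protected.

```powershell
# Sketch: check and enable network protection for a VM's adapters.
# "savdalfc01" is reused from the earlier example as an illustration.

# ClusterMonitored shows whether each adapter's network is protected.
Get-VMNetworkAdapter -VMName savdalfc01 |
  Format-Table VMName, Name, SwitchName, ClusterMonitored -AutoSize

# Protect all of the VM's network adapters.
# ($true would turn protection off.)
Set-VMNetworkAdapter -VMName savdalfc01 -NotMonitoredInCluster $false
```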

The New Active Directory-Detached Cluster

The last feature I want to tell you about is the new Active Directory-detached cluster, which is targeted at a very specific scenario. When you hear the name "Active Directory-detached," you might think that the nodes in the cluster don't have to be domain joined, but that's not the case. Every node in the cluster must be joined to the same domain. To understand what the new detached cluster feature has to offer, you need to have a little background information. When you create a cluster in Server 2008 or later, a Cluster Name Object is automatically created in Active Directory (AD) for the cluster. The Cluster Name Object is primarily used for Kerberos purposes, but it can cause some problems, as it requires creating objects in AD.

The Active Directory-detached cluster feature enables a cluster to be created without creating the Cluster Name Object. This option has to be set when creating the cluster and can't be changed after the cluster has been created. To set it, you need to create the cluster using the New-Cluster cmdlet with the -AdministrativeAccessPoint parameter set to Dns. (For information about how to use the New-Cluster cmdlet with this parameter, see TechNet's New-Cluster web page.)
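
As a rough illustration, creating such a cluster might look like the following. The cluster name, node names, and IP address are all hypothetical placeholders that you'd replace with your own values:

```powershell
# Sketch: create an Active Directory-detached cluster.
# "SQLCluster1", "Node1", "Node2", and the IP address are hypothetical.
Import-Module FailoverClusters

New-Cluster -Name "SQLCluster1" -Node "Node1", "Node2" `
  -StaticAddress 192.168.1.50 -NoStorage `
  -AdministrativeAccessPoint Dns

# Confirm how the cluster's administrative access point is registered.
(Get-Cluster).AdministrativeAccessPoint
```

The -NoStorage switch is used here only to keep the example small. Storage can be added to the cluster afterward.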

The only real workload this feature is a good fit for is SQL Server clusters that don't leverage Kerberos. It's not a good fit for other workloads for various reasons:

  • It's not a good fit for Hyper-V clusters because Kerberos is required to perform live migrations, and a Hyper-V cluster that can't live migrate is useless.
  • It's not a good fit for file-server clusters because SMB prefers to use Kerberos.
  • You can't use it if you want to use BitLocker to encrypt CSV because BitLocker requires the Cluster Name Object.
  • It's not a good fit if you're using Microsoft Message Queuing (MSMQ) because MSMQ writes to AD.

This list really does stress that Active Directory-detached clusters are only for SQL Server environments that use SQL Server Authentication and aren't recommended for anything else. However, this new feature is very useful for those SQL Server environments that use SQL Server Authentication and don't have permission to create objects in AD.

Closing Thoughts

Great enhancements have been made to both Failover Clustering and Hyper-V. Even more notable is that with each new release of Windows Server, Hyper-V and Failover Clustering have become increasingly intertwined. When used together, they provide a killer virtualization platform. In addition, Failover Clustering continues to be the high-availability backbone for many other services.