New high-availability and migration features bring the "wow"
Microsoft has been clear in its message that Windows Server 8 is the OS and virtualization platform, for both private environments and the public cloud. Hyper-V provides functionality that allows Windows Server 8 to be a true cloud solution. This typically means enough scalability, flexibility, and security or isolation capabilities to handle all the possible scenarios in a cloud solution that's shared by different business units or even different organizations.
A Bit of Background
Hyper-V was added soon after the release of Windows Server 2008. The original solution offered solid performance, but mobility was limited to quick migration, which required saving virtual machine (VM) memory and state to disk. Because there was no true shared storage, the disk was dismounted from the current host and mounted on the new host; memory and state were then loaded from disk, and the new VM was started. Clients were disconnected during this quick migration -- not a popular action. Server 2008 Hyper-V offered no support for NIC teaming.
Server 2008 R2 added live migration, giving a zero-downtime planned-migration capability, and the Cluster Shared Volumes feature, which allows all nodes in a cluster to concurrently read and write to a shared NTFS volume. Server 2008 R2 Hyper-V supports NIC teaming, although implementations vary by vendor. Support for more-advanced networking features, such as jumbo frames and virtual machine queue (VMQ), was also added. Server 2008 R2 Service Pack 1 (SP1) added Dynamic Memory for powerful memory-optimization capabilities and Microsoft RemoteFX for server-side graphics rendering in Microsoft Virtual Desktop Infrastructure (VDI) implementations. However, VMs were still limited to four virtual CPUs (vCPUs) and 64GB of memory, and clusters were limited to 16 nodes.
For most organizations, Server 2008 R2 SP1 and Microsoft System Center Virtual Machine Manager (SCVMM) meet the requirements for machine virtualization and provide a great experience. But some companies still want to see improvements in certain areas. As I speak to clients, these wished-for capabilities are the ones I hear about the most:
- scalability of VMs (or more than four vCPUs in a VM)
- ability to migrate VM storage without downtime
- ability to merge snapshots while the VM is online
- more nodes in a cluster
- ability to use live migration to migrate more than one VM at a time and to migrate between unclustered nodes
- fully supported, native NIC teaming solution that can include NICs from different vendors
- network virtualization and isolation
- native Hyper-V cluster-patching solution
- ability to use non-local and non-SAN options, such as file shares, for VM storage
- larger Microsoft Virtual Hard Disk (VHD) support
- storage deduplication and VHDs larger than 2TB
Windows Server 8 Hyper-V promises to deliver all this and a lot more, with features such as 32 vCPUs per VM, 512GB of memory per VM, 63 nodes in a cluster, 16TB VHDX format, and a native NIC teaming solution that can be managed through the new Metro-style Server Manager and Windows PowerShell. In future articles, I'll dive into these improvements. For this article, I want to look into the high-availability and migration improvements in Hyper-V. Quite frankly, they're awesome.
Server Message Block Share Support
Before Windows Server 8, a zero-downtime migration solution required shared storage between the two nodes in the VM migration. Both nodes needed to see the storage so that they could access the VM configuration and storage. The only type of storage that multiple nodes could see was external, such as a SAN to which all the nodes in a cluster connected (through a medium such as iSCSI or Fibre Channel) and that was made concurrently accessible through Cluster Shared Volumes.
This external storage requirement can be a major issue for organizations that don't want to invest in that type of infrastructure or that prefer to use a NAS solution. For such organizations, Windows Server 8 introduces Server Message Block (SMB) 2.2, which features key new capabilities that allow VMs to be stored on an SMB file share, with confidence in the integrity of and connectivity to VM assets.
To use a file share, the file server and Hyper-V servers need to run Windows Server 8. VMs can then be stored on a file share for standalone Hyper-V servers and Hyper-V servers that are part of a highly available failover cluster, as Figure 1 shows. The use of an SMB file share for VMs can be compared to other virtualization solutions that use NFS for file-level remote storage. For great flexibility, multiple file shares can be used within a host or failover cluster.
Other enhancements to SMB in Windows Server 8, such as continuously available file shares, a Microsoft Volume Shadow Copy Service (VSS) provider for remote file shares, and multichannel connectivity for great bandwidth and failover, make the use of SMB for VM storage a first-class solution.
Live Migration
The introduction of live migration in Server 2008 R2 Hyper-V -- to enable the movement of VMs between hosts in a failover cluster, with zero downtime and no dropped connections from clients -- was a huge benefit. Live migration enabled many scenarios, such as evacuating all VMs from a host to other nodes in a cluster for patching, rebooting, and hardware maintenance, with no loss of service.
Live migration also enabled Performance and Resource Optimization (PRO) and Dynamic Optimization in SCVMM. PRO and Dynamic Optimization perform automatic rebalancing and movement of VMs, based on resource utilization, to ensure optimal performance for the VMs by redistributing them across hosts as resource demands change. The SCVMM Power Optimization feature can also use live migration to consolidate VMs onto fewer hosts during quiet times and temporarily power down unneeded hosts to save power (and then wake them when needed again).
Windows Server 8 builds on the success of live migration, broadening its use and scalability for the new scenarios and infrastructure changes that we see in modern data centers. Specifically, it adds the ability to perform concurrent live migrations, SMB live migration, live storage migration, and migrations between hosts that have nothing in common but a shared Ethernet connection.
Concurrent live migrations. Live migration in Server 2008 R2 was restricted to one concurrent live migration operation between the two nodes in the cluster that were involved in the migration: the current VM owner and the target owner. The reason for this enforcement was fairly simple. Most data centers leverage a 1Gbps network infrastructure. A live migration action is highly network-intensive; all the memory is copied over the network between the hosts, and because the VM is running during the memory copy, several copy passes are required to recopy memory that changed during the previous memory-copy action. (These copy actions get faster with each pass because the amount of data to copy shrinks with each pass.)
Hyper-V was efficient in its use of the network and would saturate a 1Gbps link. If you performed multiple, simultaneous live migrations between two hosts, the network speed would be split between multiple moves. With the bandwidth split, the copy would take longer for each VM, and more memory would change during the copy, increasing the total time to move the VM.
Think of pouring a bottle of soda through a funnel. Pouring four bottles down the same funnel will take four times as long because the funnel is a limiting factor. Now imagine that as you're pouring out one of those bottles, someone is dripping more soda back into it until it's empty. As a result, the longer you take to empty the bottle, the more soda you actually need to pour, increasing the total pouring time. In this scenario, pouring one bottle's worth at a time actually results in a faster total emptying time for the four bottles.
The funnel is the network, the bottle of soda is a live migration, and the extra soda that's being dripped in is the memory change during the live migration. SCVMM helped handle multiple live migration actions by queuing them and performing them in a series, allowing administrators to queue bulk live migrations in the management interface, and then walk away.
Fast forward: 10Gbps networks in the data center are more prevalent and a single live migration is unlikely to saturate a 10Gbps network link (think "really big funnel"). To accommodate these changes, Windows Server 8 allows multiple concurrent live migrations between hosts. There is no fixed maximum number of concurrent live migrations, although you can specify a maximum number as part of the Hyper-V host configuration. Hyper-V will examine the network capability and the amount of available bandwidth and will tune the number of concurrent live migrations, based on current conditions, to ensure the best performance.
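The funnel reasoning can be put into rough numbers. The following is a minimal sketch, not Hyper-V's actual scheduling logic: it models each migration as repeated copy passes in which the memory dirtied during one pass becomes the workload of the next. The function names, the 32-gigabit VM size, the dirty rate, and the 3Gbps ceiling for a single migration stream are all illustrative assumptions.

```python
def precopy_seconds(memory_gbit, dirty_gbit_per_s, rate_gbit_per_s,
                    stop_gbit=0.4, max_passes=50):
    """Estimate seconds to live-migrate one VM using iterative memory copy.

    Hypothetical model: each pass copies what is outstanding, and the memory
    dirtied while that pass ran becomes the next pass's workload.
    """
    outstanding = memory_gbit
    elapsed = 0.0
    for _ in range(max_passes):
        pass_time = outstanding / rate_gbit_per_s
        elapsed += pass_time
        outstanding = dirty_gbit_per_s * pass_time
        if outstanding <= stop_gbit:
            break
    # Final copy of the small remainder while the VM is briefly paused.
    return elapsed + outstanding / rate_gbit_per_s


def total_seconds(vm_count, concurrent, link_gbit_per_s, stream_cap_gbit_per_s,
                  memory_gbit=32.0, dirty_gbit_per_s=0.2):
    """Total time to move vm_count identical VMs, one at a time or all at once."""
    if concurrent:
        # All migrations share the link equally, up to the per-stream ceiling.
        rate = min(stream_cap_gbit_per_s, link_gbit_per_s / vm_count)
        return precopy_seconds(memory_gbit, dirty_gbit_per_s, rate)
    # One migration at a time gets the whole link, up to the per-stream ceiling.
    rate = min(stream_cap_gbit_per_s, link_gbit_per_s)
    return vm_count * precopy_seconds(memory_gbit, dirty_gbit_per_s, rate)


if __name__ == "__main__":
    for link in (1.0, 10.0):
        one_by_one = total_seconds(4, False, link, stream_cap_gbit_per_s=3.0)
        all_at_once = total_seconds(4, True, link, stream_cap_gbit_per_s=3.0)
        print(f"{link:>4} Gbps link: sequential {one_by_one:.0f}s, "
              f"concurrent {all_at_once:.0f}s")
```

Under these assumptions, queuing the four moves wins on the 1Gbps link because every split stream has to recopy far more changed memory, whereas on the 10Gbps link the concurrent moves win because no single stream can fill the link on its own.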
SMB live migration. The enhancements to live migration in a highly available environment are great. But one of the biggest changes is the ability to perform a live migration of a VM outside of a cluster configuration. This capability lets you migrate a VM between two Hyper-V hosts that aren't part of a failover cluster. The two types of live migration outside of a cluster environment are SMB live migration and a shared-nothing live migration.
In an SMB live migration, two Hyper-V hosts both connect to the same SMB file share, which contains the VHDs and configuration files of the VM that is being migrated. The only requirement for the SMB file share is that both Hyper-V host computer accounts must have Full Control rights on the folder and share. The basic mechanics for the live migration are the same as for a live migration within a failover cluster. However, because the VM isn't a shared resource, there are some extra steps:
1. A TCP connection is created between the host that is running the VM (the source host) and the destination host. The VM configuration data is sent to the destination, which allows a skeleton VM to be created on the destination host and a reservation for the required memory to be made.
2. The memory of the VM is copied from the source to the destination. Like a typical live migration, this process consists of an initial complete memory transfer, followed by several iterations to copy over changed memory pages.
3. When the amount of memory that is left to transfer is small enough to be copied without significantly affecting the freeze time of the VM, the VM is temporarily stunned. The remaining memory, plus the CPU and device state, is copied to the destination host.
4. The handles to the files on the SMB file share are transferred from the source host to the destination host, along with any physical storage that might be attached through the new virtual Fibre Channel adapter (another great feature in Windows Server 8 Hyper-V).
5. The destination VM is unstunned. The source VM is deleted.
6. A reverse Address Resolution Protocol (ARP) packet is sent out, forcing network switches to update their mapping of the VM IP address to the MAC of the new host.
The VM is typically unresponsive for just milliseconds -- way below TCP connection timeouts, so there's no effect on VM users. If you sat and watched a ping to the VM that was being migrated, you might see a longer than usual response time for one ping packet (or even a dropped packet), but nothing that a user would notice or that would cause a disconnection.
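The ordering of the six steps can be traced with a toy model. This is only an illustrative sketch under stated assumptions: hosts and VMs are plain Python objects, memory is a small dictionary of pages, and `Host`, `smb_live_migrate`, `guest_writes`, and `freeze_budget_pages` are hypothetical names, not part of any Hyper-V interface.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    vms: dict = field(default_factory=dict)         # VM name -> VM state dict
    open_handles: set = field(default_factory=set)  # files held open on the SMB share


def smb_live_migrate(vm_name, source, dest, guest_writes, freeze_budget_pages=1):
    """Move a VM between two hosts that both see the same file share (toy model).

    guest_writes is a list of batches; each batch holds the (page, value)
    writes the running guest makes while one copy pass is in flight.
    """
    vm = source.vms[vm_name]

    # Step 1: send the configuration so the destination can build a skeleton VM.
    skeleton = {"config": dict(vm["config"]), "memory": {}, "cpu_state": None,
                "files": list(vm["files"]), "running": False}
    dest.vms[vm_name] = skeleton

    # Step 2: copy everything once, then recopy pages dirtied during each pass.
    dirty = set(vm["memory"])
    writes = iter(guest_writes)
    while len(dirty) > freeze_budget_pages:
        for page in dirty:
            skeleton["memory"][page] = vm["memory"][page]
        dirty = set()
        for page, value in next(writes, []):  # the guest keeps running
            vm["memory"][page] = value
            dirty.add(page)

    # Step 3: stun the VM, then copy the last dirty pages and CPU/device state.
    vm["running"] = False
    for page in dirty:
        skeleton["memory"][page] = vm["memory"][page]
    skeleton["cpu_state"] = vm["cpu_state"]

    # Step 4: hand the file handles on the share over to the destination.
    for path in vm["files"]:
        source.open_handles.discard(path)
        dest.open_handles.add(path)

    # Step 5: unstun the VM on the destination and delete the source copy.
    skeleton["running"] = True
    del source.vms[vm_name]

    # Step 6: stand-in for the network announcement of the VM's new location.
    return f"{vm_name} is now served by {dest.name}"
```

The loop condition mirrors step 3 of the list: the VM is stunned only once the number of outstanding dirty pages fits inside the freeze budget, so the pause stays short no matter how large the VM's memory is.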
Live storage migration. Before I talk about the shared-nothing live migration capability, I want to introduce another type of live migration. Live storage migration allows a VM's storage (such as its VHDs) and configuration files to be moved while the VM is running. Moving VM storage can be useful in a variety of scenarios:
- moving a VM from local storage to a SAN
- moving VMs between SANs for rebalancing I/O or performing maintenance on a SAN
- moving VMs to an SMB share
- emptying an NTFS volume of VMs so that you can run a chkdsk operation
Whatever the reason, live storage migration allows you to move VM files between all supported storage media -- DAS, SAN, and file-based (e.g., SMB) -- with no downtime for the VM.
Live storage migration works differently from live migration. Live migration works with memory, which can be read from and written to quickly. Therefore, performing iterations of changes since the last copy works. But for storage, those many iterations might never catch up for busy disks.
As Figure 2 shows, the live storage migration process involves several steps that solve the problem of the comparative slowness of physical disks. In step 1, an initial copy is made of the storage, which includes the VHD files, configuration files, snapshots, and everything related to the VM. During this time, the VM reads and writes to the source storage. In step 2, after the initial copy is complete, the VHD stack mirrors all writes to both the source and destination storage locations, while making a single pass of copying the blocks that changed during the initial copy. Finally, in step 3, both the source and target are synchronized. The VHD stack switches the VM to read and write to the target storage only and then deletes the data on the source storage. The result is a complete move of the storage that's associated with the VM, with no downtime.
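The three steps can be illustrated with a small simulation. This is a sketch under simplifying assumptions, not the VHD stack's real implementation: disks are Python lists of blocks, guest writes are supplied as (block index, data) pairs, and the function and parameter names are hypothetical.

```python
def live_storage_move(source, writes_during_copy, writes_during_mirror):
    """Simulate moving a running VM's disk from source to a new target list."""
    target = [None] * len(source)
    changed = set()  # blocks written on the source after the copy began

    # Step 1: initial copy. The VM still reads and writes the source only,
    # so every guest write is recorded as a changed block.
    pending = list(writes_during_copy)
    for block in range(len(source)):
        target[block] = source[block]
        if pending:  # interleave one guest write per copied block
            index, data = pending.pop(0)
            source[index] = data
            changed.add(index)

    # Step 2: new writes are mirrored to both locations, and a single pass
    # copies the blocks that changed during the initial copy.
    for index, data in writes_during_mirror:
        source[index] = data
        target[index] = data
        changed.discard(index)  # already identical on both sides
    for index in sorted(changed):
        target[index] = source[index]

    # Step 3: both sides are synchronized; switch to the target and
    # delete the data on the source.
    assert source == target
    source.clear()
    return target
```

Because writes are mirrored during step 2, the set of changed blocks can only shrink, which is why one catch-up pass is enough and the iterations cannot fall behind a busy disk.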
Shared-nothing live migration. If I had to pick one feature that's behind my "wow" reaction, this would be it: the ability to migrate a VM from one Hyper-V server to another Hyper-V server that isn't part of the same cluster, shares no storage, and has only a Gigabit Ethernet connection to the first server -- all with zero downtime!
Shared-nothing live migration looks very much like SMB live migration. But this time, we also need to move the storage, using the technology I just discussed for live storage migration. We're essentially performing everything in the SMB live migration scenario, plus a live storage migration, and maintaining the mirroring of writes to both the source and destination storage while performing a live migration of the memory and state, before finally switching the host that's running the VM.
With shared-nothing live migration, we can move VMs between any Windows Server 8 Hyper-V hosts, even when they have nothing in common but a shared Ethernet cable. I've seen a great demonstration of the shared-nothing live migration between two laptops, each with only local storage. The running VM that uses local storage moves to the second laptop, without any downtime for the VM. Now, imagine the same capability being used to move VMs between clusters, hosts, or even private and public cloud Infrastructure as a Service (IaaS) providers.
Hyper-V Replica
All the live migration technologies, including live storage migration, are used in planned migrations in typical day-to-day scenarios. Organizations face a completely different set of challenges when thinking of disaster recovery, which requires a different set of solutions.
Numerous solutions provide disaster-recovery capabilities for Hyper-V environments. However, these solutions typically involve expensive storage solutions that might be unavailable to small and midsized organizations. Hyper-V Replica, another new feature in Windows Server 8, allows asynchronous replication of a VM's storage from one Hyper-V host to another, even if the hosts are using completely different types of storage. An initial replication of the source VM's storage is performed over the network (by using a direct connection between the primary and replica server) or by saving the VM's storage to a network location from which the replica server reads. This approach is known as off-the-network seeding.
If there is insufficient bandwidth for the initial storage seeding on the replica to occur over the network -- for example, when creating a replica at a remote location from a VM with large amounts of storage -- then a backup can be taken of the VM on the primary server. This backup is shipped to the replica server and restored.
When the initial replication of the VM is completed, a delta replication of the VM storage is performed every 5 minutes. Because this replication is asynchronous and periodic, it isn't a real-time replication solution. In the event of an unplanned outage, a few minutes' worth of storage data could be lost when failing over to the replica -- a factor that should be considered when developing a solution that uses Hyper-V Replica. However, the benefit of this asynchronous replication is that there are no limitations on the scale of the VM to be replicated or high requirements for the network between the primary Hyper-V server and the replica. The exact bandwidth that's needed between the servers depends on the amount of storage change. However, some of the planned scenarios of Hyper-V Replica include a replica in a secondary site, connected via a WAN link.
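As a rough way to size the link, the storage change rate can be turned into the size of each 5-minute delta and the time needed to send it. The sketch below is back-of-the-envelope arithmetic only: it assumes a steady change rate, decimal units, and no compression or protocol overhead, and the function names are hypothetical.

```python
REPLICATION_INTERVAL_S = 5 * 60  # one delta every 5 minutes


def delta_transfer_seconds(change_gb_per_hour, link_mbps):
    """Seconds needed to send one interval's worth of changed storage."""
    delta_gb = change_gb_per_hour * REPLICATION_INTERVAL_S / 3600
    delta_megabits = delta_gb * 8000  # 1 GB = 8,000 megabits (decimal units)
    return delta_megabits / link_mbps


def keeps_up(change_gb_per_hour, link_mbps):
    """True if each delta can be sent before the next one is due."""
    return delta_transfer_seconds(change_gb_per_hour, link_mbps) <= REPLICATION_INTERVAL_S


if __name__ == "__main__":
    for mbps in (10, 20, 100):
        seconds = delta_transfer_seconds(9, mbps)
        print(f"9 GB/hour over {mbps} Mbps: {seconds:.0f}s per delta, "
              f"keeps up: {keeps_up(9, mbps)}")
```

For example, a VM that changes 9GB of storage per hour produces a 0.75GB delta every 5 minutes, which a 20Mbps WAN link can only just send within the interval.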
Hyper-V Replica is not an automated failover solution. During a disaster, an administrator must manually activate the replica. Options exist to test the failover process by running the replica VM connected to a separate test network so that it doesn't interfere with the production primary VM. In planned scenarios, the primary VM is shut down manually, and a final delta is sent to the replica, which applies the delta and then starts a reverse replication to what was the primary VM.
The actual configuration of Hyper-V Replica is a fairly simple process. Replication configuration is now a setting on all Hyper-V servers. That setting can be enabled with the option to use integrated Windows Authentication, with replication of the changes over port 80 (HTTP), or certificate-based authentication, with replication over port 443 (HTTPS). The latter type of authentication also provides encryption of the VM update data when it's sent over the network. The ports that are used can be changed for the replication configuration. Also, a server can be configured to accept replication from any Hyper-V server or specific Hyper-V servers, as Figure 3 shows. The only additional step on the replication target is to enable firewall rules to allow communication over the selected ports.
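The choices described above can be summarized as a small settings model. This is a hypothetical sketch for illustration only, not the Hyper-V management interface: the class, field, and method names are invented, and only the two authentication types and their default ports come from the description above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Default ports for each authentication type, as described in the text.
DEFAULT_PORTS = {"integrated": 80, "certificate": 443}


@dataclass
class ReplicaServerSettings:
    """Hypothetical model of a replica server's replication settings."""
    auth: str = "integrated"          # "integrated" (HTTP) or "certificate" (HTTPS)
    port: Optional[int] = None        # None means use the default for the auth type
    allowed_primaries: List[str] = field(default_factory=list)  # empty = any server

    def __post_init__(self):
        if self.auth not in DEFAULT_PORTS:
            raise ValueError(f"unknown authentication type: {self.auth}")
        if self.port is None:
            self.port = DEFAULT_PORTS[self.auth]

    @property
    def encrypted(self) -> bool:
        # Only certificate-based authentication encrypts the update data.
        return self.auth == "certificate"

    def accepts(self, primary: str) -> bool:
        # An empty allow list means replication is accepted from any server.
        return not self.allowed_primaries or primary in self.allowed_primaries

    def firewall_rule(self) -> str:
        # The replication target must allow inbound traffic on the chosen port.
        return f"allow inbound TCP {self.port}"
```

A call such as `ReplicaServerSettings(auth="certificate", allowed_primaries=["hv01"])` would describe an encrypted configuration on port 443 that accepts replication only from a server named hv01 (a made-up host name).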
When a Hyper-V server that has replication enabled is available, a VM can be configured to have a replica through an Enable Replication action. When replication is enabled, the administrator is prompted to specify the target replica server and how the initial replication should be performed. The replication also allows you to create a certain number of optional recovery points, which are hourly VSS snapshots that ensure the integrity of the specific replica recovery point. The VM replicates, and the replication health can be checked at any time through a health report option that shows the number of replication cycles, the average size of the replication, and the time that the replication has been happening, as Figure 4 shows. You can also configure an alternate TCP/IP configuration for the replica VM when it's activated. This alternate configuration must be injected into the VM if the replica is hosted in a different network and network virtualization (another great feature of Windows Server 8) isn't used.
Understanding Hyper-V Replica is important. The feature is intended for small or midsized businesses that want secondary-site disaster-recovery capability. Hyper-V Replica works by periodically sending VM storage updates to the second location. During a disaster, the replica is activated and the OS starts in a crash-consistent state at the point of the last storage delta from the primary. If this crash-consistent state isn't good enough, and if the recovery point feature is enabled, the administrator can select a recovery point. This point starts the replica at a VSS snapshot point, which ensures that the VM is in an application-consistent state. This out-of-the-box feature gives a good level of disaster-recovery protection without requiring high network speeds, and supports any type of storage that Hyper-V supports. However, the feature isn't real-time or automated, so if you need a higher level of functionality, you should look at third-party solutions that build on Hyper-V.
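The choice an administrator faces at failover time can be sketched as a simple selection rule. This is an illustrative model with hypothetical names and data, not how Hyper-V exposes recovery points: the newest point is the last storage delta (crash-consistent), and older VSS-based points are application-consistent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RecoveryPoint:
    minutes_ago: int
    application_consistent: bool  # True for VSS snapshot points


def choose_recovery_point(points: List[RecoveryPoint],
                          need_app_consistency: bool) -> Optional[RecoveryPoint]:
    """Return the most recent point that meets the consistency requirement."""
    candidates = [p for p in points
                  if p.application_consistent or not need_app_consistency]
    if not candidates:
        return None  # for example, recovery points were never enabled
    return min(candidates, key=lambda p: p.minutes_ago)
```

The trade-off is visible in the data: asking for application consistency means starting from an older point, so more recent changes are given up in exchange for a cleaner application state.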
Great New Capabilities
The new Windows Server 8 features for VM migration and replication give organizations a great new capability for keeping VMs available and mobile throughout their IT infrastructure -- without needing complex and expensive infrastructure changes. The Hyper-V live migration and Replica capabilities are just a few of the enhancements, and this discussion is based on the beta of Windows Server 8, so functionality could change. But the features give us an idea of the level of advancements that we're going to enjoy in the next version of Hyper-V and Windows Server.