How to virtualize Windows storage
With new versions of Windows hitting the shelves, we’re seeing lots of exciting new storage features. Both Windows Server 2012 and Windows 8 deliver a new feature called Storage Spaces and Pools, which provides users with a number of new capabilities, including the following:
- A method of virtualizing storage
- RAID functionality that would otherwise be available only through expensive storage hardware
- Support for thin provisioning
- Scripted management via PowerShell
- Redundant data copies that can be used to repair file system problems
- Integration with Cluster Shared Volumes (CSV)
You’ll find the UI for Storage Spaces and Pools in the Control Panel Storage Spaces applet (Windows 8) and in Server Manager (Server 2012); you can also use PowerShell cmdlets (both OSs). For the most part, this article will refer to the Server Manager interface. The Windows 8 client version is simplified and differs greatly in appearance. However, the underlying technology is the same.
You can set up Storage Spaces and Pools on a wide variety of storage hardware. The supported bus types are Universal Serial Bus (USB), Serial ATA (SATA), and Serial Attached SCSI (SAS).
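Before building a pool, it can help to check which attached disks are candidates. The following is a small sketch that assumes the Storage module cmdlets that ship with Windows Server 2012 and Windows 8; the columns selected are just one reasonable choice.

```powershell
# List physical disks with their bus type and whether they can be pooled.
# Disks that already belong to a pool report CanPool = False.
Get-PhysicalDisk |
    Sort-Object BusType, FriendlyName |
    Format-Table FriendlyName, BusType, CanPool, Size -AutoSize
```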
Although you can use Storage Spaces and Pools in conjunction with LUNs through either Fibre Channel or iSCSI, it isn’t a supported configuration. Users with such high-end storage solutions should look to their respective storage vendors to make best use of the functionality that they provide. Storage Spaces and Pools is geared toward less expensive storage solutions, to introduce functionality that would otherwise be unavailable.
Creating a Pool and a Storage Space
A pool is simply a logical grouping of physical disks, whereas a storage space is a virtualized disk that can be used like a physical disk. For this reason, using Storage Spaces and Pools to create a storage space is a two-step process: First, you create the pool; second, you carve out a storage space—called a virtual disk in Windows Server. Be sure not to confuse Storage Spaces and Pools virtual disks with Virtual Hard Disk (VHD) or VHDX files. The terms are similar but they don’t have anything to do with each other.
You can use the Server Manager interface to create your functional pool. You start out with a default pool called the Primordial Pool, which is merely a list of physical disks attached to the computer that can be pooled. The Primordial Pool doesn’t count as a functional pool. The wizard will prompt you for the name of the pool and the physical disks to be added. Once created, the new pool will show up in the Server Manager interface. (Note that although Windows allows you to create a multitude of pools, it’s recommended that you not create more than four.) The following three-line PowerShell script performs the same operation:
$stsubsys = (Get-StorageSubsystem)
$physd = (Get-PhysicalDisk PhysicalDisk1, PhysicalDisk2, PhysicalDisk3, PhysicalDisk4)
New-StoragePool -FriendlyName MyPool1 -StorageSubsystemFriendlyName $stsubsys.FriendlyName -PhysicalDisks $physd
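If you would rather not type disk names by hand, the same pool can probably be built from whatever disks report as poolable. This is a minimal sketch under two assumptions: that every poolable disk should go into the pool, and that the first storage subsystem returned is the right one (which may not hold on a machine with more than one subsystem).

```powershell
# Assumption: every disk that reports CanPool = True belongs in the new pool.
$poolable = Get-PhysicalDisk -CanPool $true

# Assumption: the first subsystem returned is the one that owns these disks.
$subsystem = Get-StorageSubSystem | Select-Object -First 1

New-StoragePool -FriendlyName MyPool1 `
    -StorageSubSystemFriendlyName $subsystem.FriendlyName `
    -PhysicalDisks $poolable
```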
Now that you have a pool, you can create a virtual disk (called a storage space in Windows 8). The wizard will prompt you for the name of the storage pool used, the name of the virtual disk, the type of storage layout, the provisioning type (thin or fixed), and the virtual disk’s size. I’ll review the choices in the next section, but when the wizard is complete, you’ll see the virtual disk that Figure 1 shows. The following PowerShell command performs the same operation:
New-VirtualDisk -StoragePoolFriendlyName MyPool1 -FriendlyName MyVirtualDisk `
    -ResiliencySettingName Mirror -UseMaximumSize
You can use this virtual disk just as if you were using a physical disk. You can configure it to either Master Boot Record (MBR) or GUID Partition Table (GPT) partition style.
Figure 1: Creating a Virtual Disk
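Once the virtual disk exists, it still has to be initialized, partitioned, and formatted like any new disk. One way to do that from PowerShell is sketched below. It assumes MyVirtualDisk is brand new and holds no data, and the GPT style, NTFS file system, and "Data" label are illustrative choices only.

```powershell
# Assumption: MyVirtualDisk is new and uninitialized. Do not run this against a disk with data.
Get-VirtualDisk -FriendlyName MyVirtualDisk |
    Get-Disk |
    Initialize-Disk -PartitionStyle GPT -PassThru |
    New-Partition -AssignDriveLetter -UseMaximumSize |
    Format-Volume -FileSystem NTFS -NewFileSystemLabel "Data" -Confirm:$false
```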
Understanding the Choices
When you’re creating a virtual disk, you have three basic choices: the type of storage layout (i.e., simple, mirror, parity), provisioning type (thin or fixed), and virtual disk size. Other choices, such as pool name and virtual disk name, are more arbitrary in nature.
Layout. The storage layout is simply the type of RAID you want to use. You can choose Simple (RAID 0 or stripe set without parity), Mirror (RAID 1), or Parity (RAID 5 or stripe set with parity). You can create a simple set with one or more physical disks from the pool. Parity sets require three or more physical disks to be available in the pool. Finally, mirror sets can be created using either two or more physical disks for a two-way mirror, or five or more physical disks for a three-way mirror.
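The minimum disk counts above can be captured in a small helper, which is handy if you script pool checks. Get-MinimumDiskCount is a hypothetical function written for this article, not a Windows cmdlet; it simply encodes the numbers stated in the previous paragraph.

```powershell
# Hypothetical helper: minimum number of physical disks per layout, as stated in the text.
function Get-MinimumDiskCount {
    param(
        [Parameter(Mandatory)]
        [ValidateSet('Simple', 'Mirror', 'Parity')]
        [string]$Layout,

        # Only meaningful for Mirror: 2 = two-way, 3 = three-way.
        [ValidateSet(2, 3)]
        [int]$NumberOfDataCopies = 2
    )

    switch ($Layout) {
        'Simple' { 1 }
        'Parity' { 3 }
        'Mirror' { if ($NumberOfDataCopies -eq 3) { 5 } else { 2 } }
    }
}

Get-MinimumDiskCount -Layout Parity                           # 3
Get-MinimumDiskCount -Layout Mirror -NumberOfDataCopies 3     # 5
```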
Provisioning type. The provisioning type is a choice between thin provisioning and fixed (aka thick) provisioning. This choice determines whether all the sectors in your virtual disk are pre-allocated (fixed) or mapped to physical sectors on a “just in time” basis (thin). If you select fixed provisioning, you’ll be limited to a size based on the available physical disks in the pool. However, if you select thin provisioning, you can enter a size that’s much greater than the physically available space and add physical disks to the pool as you need them.
Virtual disk size. The size of the virtual disk depends on the provisioning type, the storage layout, and the size of the physical disks that were used. If you plan to create just one virtual disk in your pool, you can simply select the Maximum size option. Note that the Maximum size option will be grayed out if you select thin provisioning.
More on Thin Provisioning
Thin provisioning is a technology that allocates blocks of storage on an as-needed, just-in-time basis. In fixed provisioning, physical blocks are allocated to the virtual disk whether they’re in use or not. In thin provisioning, only the used blocks are mapped to physical blocks. This lets you provision a much larger virtual disk than what would be possible with fixed provisioning. If the virtual disk starts to push toward the boundary of what can be mapped to a physical block, you can add more physical disks.
The benefit of thin provisioning is that storage space isn’t stranded. That is, if you want to have a 10TB virtual disk, you don’t need to provide the physical space for it up front. You can provision a thin virtual disk that is 10TB and add physical disks as needed. To make this even more efficient, NTFS has been enhanced to work with the storage subsystem to reclaim space after files are deleted or optimized. Windows has also been optimized to work more efficiently with high-end storage solutions that include thin provisioning functionality. This includes the ability to reclaim unused sectors, much as Storage Spaces and Pools does.
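The 10TB scenario might look like the following in PowerShell. Treat it as a sketch: ThinDisk is a made-up name, MyPool1 is the example pool from earlier, and the last command assumes that any disk reporting as poolable is one you want to add.

```powershell
# Create a 10TB thin-provisioned simple space; the pool itself can be much smaller than 10TB.
New-VirtualDisk -StoragePoolFriendlyName MyPool1 -FriendlyName ThinDisk `
    -ResiliencySettingName Simple -ProvisioningType Thin -Size 10TB

# Watch how much of the pool is actually allocated.
Get-StoragePool -FriendlyName MyPool1 |
    Format-Table FriendlyName, Size, AllocatedSize -AutoSize

# When the pool runs low, add whatever poolable disks are attached (assumption: all of them).
Add-PhysicalDisk -StoragePoolFriendlyName MyPool1 `
    -PhysicalDisks (Get-PhysicalDisk -CanPool $true)
```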
Understanding the Architecture
Now, let’s review what’s going on under the hood to make all this happen. Figure 2 shows the Windows storage stack. The SSP driver (SpacePort.sys) plugs into the stack just above Partition Manager (Partmgr.sys). When a physical disk is brought into a pool, a partition is created on it and the physical disk is hidden from the UI. In the next step, when a virtual disk is carved out of the pool, that virtual disk is presented back to the UI as a logical disk. The physical disks are still observable in Device Manager, but a new Microsoft Storage Space Device is also listed for each virtual disk that’s created.
Figure 2: Windows Storage Stack
Figure 3 depicts how the partitions would look on the physical disks. This covers both legacy MBR disks and disks using the GPT scheme. The partition will have a small area dedicated to storing metadata for Storage Spaces and Pools. The bulk of the partition will be used for actually storing file data. Once a virtual disk is created, it can be configured as either MBR or GPT, then utilized as a physical disk normally would be. It can be formatted with either NTFS or Microsoft's new Resilient File System (ReFS).
Figure 3: How Partitions Look on Physical Disks
Deep Dive to Understand Additional Options
Storage Spaces and Pools can be configured with additional granularity to help increase performance. It’s helpful to understand this granularity when you’re adding physical disks to a preexisting virtual disk. Particularly in Windows 8, Storage Spaces and Pools is simple to use, but if you would like to have more control over your storage options, Storage Spaces and Pools can provide that too.
For the most part, you can experience this granularity when you use the PowerShell cmdlet, New-VirtualDisk. The elements we’re concerned with are NumberOfColumns (specifies the number of columns to create), NumberOfDataCopies (specifies the number of data copies to create), and ResiliencySettingName (specifies the name of the desired resiliency setting—for example, Simple, Mirror, or Parity).
Number of columns. Figure 4 shows a diagram consisting of three disks. The disks are divided into units. As you stripe across the disks, you’re able to write simultaneously to each spindle. In the RAID world, this is known as a stripe set without parity. Roughly, this is what you’re doing with a virtual disk with a “simple” layout.
Figure 4: Simple Layout
Each physical disk is a column in your virtual disk. The more physical disks that are available when the virtual disk is created, the more columns it will have—and thus, the more simultaneous writes can occur. This works similarly with parity sets. The more physical disks you start out with, the more columns will be in your virtual disk. The only difference is that some of the space is lost to the parity bits. Windows will scale to use as many as eight columns when a new virtual disk is created (even more if they’re created using PowerShell).
The element that is used to control the columns is NumberOfColumns. The following is an example of how a user can manually control this element and the ResiliencySettingName element. (This command would create a virtual disk with three columns.)
New-VirtualDisk -FriendlyName NewVDisk -StoragePoolFriendlyName MyPool `
    -NumberOfColumns 3 -ResiliencySettingName simple -UseMaximumSize
Mixing columns with data copies. A data copy is just that: a copy of the data. If you have redundancy in the form of a completely standalone instance, you’ll have more than one copy of the data. Otherwise, you’ll have just one copy.
- A simple space will have just one copy.
- Mirror spaces will have either two or three copies.
- Parity spaces have just one copy.
Only the mirror space has a complete copy of the data instance, as you see in Figure 5. Although the parity space is fault-tolerant, it doesn’t achieve that by using a completely separate instance of the data. Therefore, it still has only a single data copy. A three-way mirror would have three data copies. The downside to the extra data copies is that every write has to be carried out multiple times, which makes mirror spaces slower on writes.
Figure 5: Differences Between Simple, Mirror, and Parity
With enough physical disks available, Windows can mitigate some of the slower write speeds by striping within each data copy. In the example that Figure 6 shows, four physical disks were used to create a mirror space. So, within each data copy, you can write to two disks simultaneously. Mirror spaces created using the GUI can have as many as four columns (per data copy), but mirror spaces created using PowerShell can have more than four columns. (Note that the column count applies to each data copy separately.)
Figure 6: Four Physical Disks Used to Create a Mirror Space
You can use the New-VirtualDisk element, NumberOfDataCopies, to state the number of data copies. As an example, look at the following PowerShell command, which will create a two-way mirror space that has six columns, similar to Figure 7.
New-VirtualDisk -FriendlyName NewVDisk -StoragePoolFriendlyName MyPool `
    -NumberOfColumns 6 -NumberOfDataCopies 2 -ResiliencySettingName mirror -UseMaximumSize
Figure 7: A Two-Way Mirror Space with Six Columns
More on Columns
In Storage Spaces, the number of columns typically goes hand in hand with the number of physical disks available when the virtual disk was created. The number of columns can be less than the number of disks, but not greater. Columns are important because they represent how many disks you can access simultaneously. For example, in Figure 8, there are two simple spaces. They both use two disks, but the one on the left is using one column whereas the one on the right is using two columns. For the simple space on the right, you can carry out I/O on both disks at the same time, making the speed theoretically twice as fast.
Figure 8: Two Simple Spaces
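If you want to see the difference for yourself, the two spaces in Figure 8 could be created along these lines. This is only a sketch: the names OneColumnSpace and TwoColumnSpace and the 100GB size are placeholders, and it assumes MyPool contains at least two physical disks with enough free space.

```powershell
# Assumption: MyPool has at least two physical disks with room for both spaces.
# One column: data is written to a single disk at a time.
New-VirtualDisk -StoragePoolFriendlyName MyPool -FriendlyName OneColumnSpace `
    -ResiliencySettingName Simple -NumberOfColumns 1 -Size 100GB

# Two columns: each stripe is split across two disks.
New-VirtualDisk -StoragePoolFriendlyName MyPool -FriendlyName TwoColumnSpace `
    -ResiliencySettingName Simple -NumberOfColumns 2 -Size 100GB

# Confirm what each space ended up with.
Get-VirtualDisk |
    Where-Object { $_.FriendlyName -like '*ColumnSpace' } |
    Format-Table FriendlyName, NumberOfColumns, NumberOfDataCopies -AutoSize
```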
The number of columns used by a storage space is set when the space is created. If you use the GUI, the highest number of possible columns will be configured. The following logic applies:
- If using the GUI to create a space, the highest column setting that it will use is eight.
- Using the PowerShell cmdlet New-VirtualDisk will allow you to configure a NumberOfColumns setting higher than eight.
- Parity spaces can’t have more than eight columns (even if created with PowerShell).
Adding Space to Spaces
Adding disk space to a preexisting storage space can be tricky. Adding to a storage space is all about understanding columns and data copies. In Figure 9, a simple space was created using two physical disks. If you wanted to extend the virtual disk, you would first need to add a new physical disk to the storage pool, if one wasn’t available. However, if an attempt is made to extend the virtual disk after the disk is added, the task would still fail. The error indicates that physical resources don’t exist to support adding more space to the virtual disk, even though you just added a new blank disk to the pool.
Figure 9: One Simple Space Created with Two Physical Disks
The problem is in the number of columns. Windows must follow the same striping model that was used when the space was created. You can’t simply add an additional column. If this were allowed, you would lose all benefit of striping when the original two disks became full. In addition, you can’t tack the new disk onto the bottom of one of the current columns (for much the same reason). To extend a virtual disk, you need to add a number of disks equal to or greater than the number of columns in that virtual disk. Doing so will allow striping to continue in the fashion for which it was originally configured. This rule applies to both simple and parity spaces.
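For the two-column simple space in Figure 9, a working extension might look like the sketch below. MySimpleSpace and the 4TB target are placeholders, and the script assumes that at least two poolable disks are attached and that all of them should join MyPool1.

```powershell
# Assumption: MySimpleSpace is a fixed-provisioned simple space with two columns,
# so at least two poolable disks are required before it can grow.
$newDisks = @(Get-PhysicalDisk -CanPool $true)
if ($newDisks.Count -lt 2) {
    throw "Need at least 2 poolable disks; found $($newDisks.Count)."
}

Add-PhysicalDisk -StoragePoolFriendlyName MyPool1 -PhysicalDisks $newDisks

# Grow the virtual disk; 4TB is a placeholder target size.
Resize-VirtualDisk -FriendlyName MySimpleSpace -Size 4TB
```

Growing the virtual disk doesn’t grow the partition on it; extending the partition (for example, with Resize-Partition) would be a separate step.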
When it comes to mirror spaces, you have to take into account both the number of columns and the number of data copies. For example, a two-way mirror created with four physical disks would look like Figure 10. NumberOfDataCopies equals 2, and NumberOfColumns equals 2. The number of disks needed to extend this virtual disk can be found using the following formula:
NumberOfDataCopies * NumberOfColumns
2 * 2 = 4
Figure 10: A Two-Way Mirror Created with Four Physical Disks
Four physical disks are needed to extend the example space, as Figure 11 shows. The same formula can be used for simple and parity spaces. However, NumberOfDataCopies will always equal 1 for both of those layouts.
Figure 11: Four Physical Disks Extending the Example Space
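The formula is simple enough to wrap in a helper for use in scripts. Get-DisksNeededToExtend is a hypothetical function name, not a built-in cmdlet; it does nothing more than multiply the two values described above.

```powershell
# Hypothetical helper: disks required to extend a space = data copies * columns.
function Get-DisksNeededToExtend {
    param(
        [Parameter(Mandatory)][int]$NumberOfColumns,
        # Simple and parity spaces always have one data copy.
        [int]$NumberOfDataCopies = 1
    )
    $NumberOfDataCopies * $NumberOfColumns
}

Get-DisksNeededToExtend -NumberOfColumns 2 -NumberOfDataCopies 2   # 4, the Figure 10 mirror
Get-DisksNeededToExtend -NumberOfColumns 3                         # 3, a three-column simple space
```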
Discovering the Number of Data Copies and Columns
If you don’t know how many data copies and/or columns your virtual disk has, it’s easy enough to discover the answer by using the GUI to find the NumberOfColumns and NumberOfDataCopies values. The following PowerShell command reveals the same information:
Get-VirtualDisk -FriendlyName MyVirtualDisk | ft FriendlyName, NumberOfColumns, NumberOfDataCopies
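The same two properties can feed the extension formula directly. The following sketch reports how many disks MyVirtualDisk would need and how many poolable disks are attached. Note the assumption that only disks not yet in any pool are counted; free space on disks already in the pool is ignored here.

```powershell
# Read the layout of the existing virtual disk.
$vdisk = Get-VirtualDisk -FriendlyName MyVirtualDisk
$needed = $vdisk.NumberOfDataCopies * $vdisk.NumberOfColumns

# Assumption: only disks that are not yet in any pool are counted.
$available = @(Get-PhysicalDisk -CanPool $true).Count

"{0} needs {1} disk(s) to extend; {2} poolable disk(s) found." -f `
    $vdisk.FriendlyName, $needed, $available
```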
ReFS on a Mirror
I want to mention an additional benefit to using Storage Spaces and Pools mirrors. Earlier, I referred to Microsoft's new file system, ReFS. If files or metadata become corrupt on ReFS, Windows can use the redundant copy on the other side of the mirror to repair the damage. This is made possible, in part, by the checksums that both the data and metadata have in ReFS.
Powerful Storage Features
Storage Spaces and Pools brings functionality to people using low- to mid-range storage that they otherwise would not have access to. It’s easy to configure, can be configured at a granular level for those who want to utilize additional options, and brings additional resiliency to ReFS. Storage Spaces and Pools supports thin provisioning, and like most things in Server 2012 and Windows 8, it can be scripted using PowerShell. Out of all the new storage goodies in Windows, I think this will be the one that people will use the most.