Storage virtualization gained significant attention during 2001, as both new and well-established vendors attempted to market their virtualization offerings as elixirs for all sorts of storage-related ills. Virtualization of storage would, depending on the vendor's enthusiasm, dramatically reduce or completely eliminate problems of storage administrative costs, storage-capacity allocation, and data integrity. Although most vendors have since retreated somewhat from their exaggerated claims, virtualization still maintains the glow of tremendous promise, awaiting fulfillment in real, workable products.

The curious reader might ask, "Why virtualize storage at all? What's wrong with ordinary physical storage?" Physical storage would be less problematic if you could simply store data on a high-performance, reliable, and ever-expanding disk. In reality, even new disk drives have finite capacity, fixed access speeds, and limited life spans. Storing corporate data, therefore, requires hundreds or thousands of individual disks, often with front-end logic to accelerate access time or provide redundancy.

The first step toward virtualization is consolidation of individual disks into RAID arrays, which improves performance and reliability. If one storage volume is actually eight individual disks with data striped across the disks for enhanced performance, virtualization has turned the physical storage into a logical presentation. The administration of the eight individual disks generally occurs only once, when you initially set up the RAID configuration (either host-based or array-based). Thereafter, you can manage the physical storage devices as one entity, with immediate savings in administrative overhead. Further, if the RAID logic can automatically reconstruct data in the event of an individual disk failure, data remains available without additional human intervention or oversight.
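To make the striping and reconstruction ideas concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not any vendor's implementation: the function names, the stripe-unit size of 16 blocks, and the single-parity (XOR) scheme are all chosen for the example. The first function shows how one logical block address can be mapped onto one of eight disks; the second shows how a lost block can be rebuilt from the surviving blocks plus parity.

```python
from functools import reduce


def locate_block(logical_block, num_disks=8, stripe_unit_blocks=16):
    """Map a logical block to (disk index, block offset on that disk).

    Assumes simple round-robin striping with no parity rotation;
    the stripe-unit size is an arbitrary value for illustration.
    """
    stripe_unit = logical_block // stripe_unit_blocks    # which stripe unit overall
    offset_in_unit = logical_block % stripe_unit_blocks  # position inside that unit
    disk = stripe_unit % num_disks                       # units rotate across disks
    row = stripe_unit // num_disks                       # full stripes before this one
    return disk, row * stripe_unit_blocks + offset_in_unit


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Logical block 130 falls in the second stripe, back on disk 0.
print(locate_block(130))  # (0, 18)

# Single-parity reconstruction: parity is the XOR of the data blocks,
# so any one missing block is the XOR of everything that survives.
data = [b"abcd", b"efgh", b"ijkl"]
parity = xor_blocks(data)
rebuilt = xor_blocks([data[0], data[2], parity])  # pretend data[1] was lost
print(rebuilt == data[1])  # True
```

The host sees only logical block numbers. The mapping and the rebuild both happen below that presentation, which is why a disk failure need not involve the administrator.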

RAID is a virtualization primitive, a granular unit that additional layers of virtualization can manipulate in turn. You can virtualize two or more RAID arrays, each representing a low level of physical storage abstraction, into one logical resource as a disk pool. Depending on the virtualization engine used, you can present disparate storage arrays and disks in the most appropriate logical entities to support a diversity of storage applications. For example, you can use a more economical Just a Bunch of Disks (JBOD) as a secondary mirror for a much more expensive primary RAID storage array. With additional intelligence, storage virtualization solutions can be application-aware so that you can automatically enforce policies for requisite access speed, security levels, and backup.
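The pooling idea can be sketched in a few lines of Python. This is a hypothetical model, not a real product's API: the class names `Array` and `DiskPool`, the gigabyte granularity, and the first-fit allocation order are assumptions made for illustration. It shows two arrays presented as one logical resource, with a volume carved out of the pool that happens to span both.

```python
from dataclasses import dataclass, field


@dataclass
class Array:
    """One RAID array (hypothetical model): a name and a capacity in GB."""
    name: str
    capacity_gb: int
    used_gb: int = 0

    @property
    def free_gb(self):
        return self.capacity_gb - self.used_gb


@dataclass
class DiskPool:
    """Several arrays presented as a single logical resource."""
    arrays: list
    volumes: dict = field(default_factory=dict)

    def free_gb(self):
        return sum(array.free_gb for array in self.arrays)

    def create_volume(self, name, size_gb):
        """Allocate a volume first-fit; it may span more than one array."""
        if size_gb <= 0 or name in self.volumes:
            raise ValueError("invalid size or duplicate volume name")
        if size_gb > self.free_gb():
            raise ValueError("not enough free capacity in the pool")
        extents, remaining = [], size_gb
        for array in self.arrays:
            take = min(array.free_gb, remaining)
            if take > 0:
                array.used_gb += take
                extents.append((array.name, take))
                remaining -= take
            if remaining == 0:
                break
        self.volumes[name] = extents
        return extents


pool = DiskPool([Array("raid-a", 500), Array("raid-b", 300)])
print(pool.create_volume("vol1", 600))  # [('raid-a', 500), ('raid-b', 100)]
print(pool.free_gb())                   # 200
```

The caller asks only for capacity. Which array supplies each extent is the pool's decision, and that is the point at which placement policies (speed, security, backup) could be applied.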

Ideally, administrators should be able to simply attach shared storage to their networks and let virtualization intelligence determine the appropriate allocation and placement of server data. With most currently available products, however, the configuration of a virtualization platform can be as labor-intensive as, if not more labor-intensive than, manual administration of multiple physical-storage devices. In addition, vendors have yet to deal with many of the inherent availability concerns associated with shared storage. If you hide physical storage from view, how do you deal with failures of individual physical components without loss of data?

Hiding the complexity of physical storage from the user doesn't make that complexity go away; it simply shifts responsibility for managing that complexity onto the virtualization engine. For storage virtualization to live up fully to its promise, vendors need to produce robust, sophisticated platforms that can make storage self-managing and self-healing. In combination with the ubiquitous access that IP storage networking provides, virtualization products will let you focus more energy on providing users with applications and advanced services and spend less energy on storage plumbing and capacity management.