Windows Future Storage

Cataloging, sharing, and finding documents have always been difficult, and these tasks become harder as the number of stored documents grows. Microsoft is working on a solution by making significant changes to the Windows file system. The new file system, called Windows Future Storage (WinFS), will debut with Longhorn, the next major release of Windows, which is expected to arrive in late 2005. WinFS will have a big effect on how we store and work with documents.

WinFS is a database layer that sits on top of NTFS. The database will contain information about each document, which will let WinFS-aware applications find, relate, and act on those documents in ways that aren't possible with NTFS alone. To use the WinFS capabilities, application developers will need to move from the Win32 API to the new WinFX API set, which contains the calls necessary to manipulate the WinFS database.
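To make the idea of a database layer over ordinary files more concrete, here's a minimal Python sketch of a metadata catalog that describes a set of documents. It isn't the WinFS or WinFX API. SQLite stands in for the WinFS store, and the table layout, the function names (build_catalog, find_documents), and the sample documents are all assumptions made up for illustration.

```python
import sqlite3

# Hypothetical metadata columns; the real WinFS schema is not described here.
SEARCHABLE = {"author", "subject", "kind"}

def build_catalog(records):
    """Create an in-memory catalog with one row of metadata per document."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE documents ("
        "path TEXT PRIMARY KEY, author TEXT, subject TEXT, kind TEXT)"
    )
    db.executemany("INSERT INTO documents VALUES (?, ?, ?, ?)", records)
    db.commit()
    return db

def find_documents(db, **criteria):
    """Return the paths whose metadata matches every column=value pair."""
    unknown = set(criteria) - SEARCHABLE
    if unknown:
        raise ValueError(f"unknown metadata fields: {sorted(unknown)}")
    # Column names come from the fixed SEARCHABLE set; values are bound
    # as parameters rather than pasted into the SQL text.
    where = " AND ".join(f"{column} = ?" for column in criteria) or "1 = 1"
    sql = f"SELECT path FROM documents WHERE {where} ORDER BY path"
    return [row[0] for row in db.execute(sql, tuple(criteria.values()))]

catalog = build_catalog([
    ("reports/san-sizing.doc", "Mark Smith", "storage", "word"),
    ("reports/budget.xls", "Ann Lee", "finance", "excel"),
    ("slides/nas-overview.ppt", "Mark Smith", "storage", "powerpoint"),
])
print(find_documents(catalog, author="Mark Smith", subject="storage"))
```

The point of the sketch is that the search runs against the catalog rather than against file contents or folder names, which is roughly the kind of lookup a plain file system can't do on its own.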

With WinFS, you'll be able to use a natural-language query to find data. For example, you might type "Show me all the documents written by Mark Smith on the subject of storage," and the WinFS query engine would retrieve those documents. You'll also be able to expand your search to include users who work on documents as a project team or department. These formal and informal relationships provide key information about how documents are shared and used. For example, you might type "Find all PowerPoint documents for the road show project" to produce a list of documents created by anybody who's part of the road show project group. WinFS will also let you act on those files and relationships. For example, you might type "Send all Excel files that are part of the road show project that have been updated this week to my management team," and WinFS will be able to understand the various parts of the query and act on it.
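One way to picture what "understanding the various parts of the query" might mean is to imagine the sentence already broken into a filter and an action. The Python sketch below skips the natural-language parsing entirely and starts from that structured form. The Document and Request classes, the plan_actions function, and the sample data are hypothetical, and nothing here reflects how the WinFS query engine is actually built.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Document:
    name: str
    kind: str        # for example "excel" or "powerpoint"
    project: str
    modified: date

@dataclass(frozen=True)
class Request:
    """Hypothetical structured form of an already-parsed request."""
    kind: str
    project: str
    modified_since: date
    action: str
    recipients: str

def plan_actions(request, documents):
    """Return one (action, document name, recipients) tuple per match."""
    matches = [
        doc.name
        for doc in documents
        if doc.kind == request.kind
        and doc.project == request.project
        and doc.modified >= request.modified_since
    ]
    return [(request.action, name, request.recipients) for name in sorted(matches)]

documents = [
    Document("forecast.xls", "excel", "road show", date(2003, 11, 12)),
    Document("venues.xls", "excel", "road show", date(2003, 10, 1)),
    Document("pitch.ppt", "powerpoint", "road show", date(2003, 11, 11)),
    Document("payroll.xls", "excel", "finance", date(2003, 11, 13)),
]

# "Send all Excel files that are part of the road show project that have
# been updated this week to my management team," with "this week" assumed
# to begin on Monday, November 10, 2003.
request = Request("excel", "road show", date(2003, 11, 10), "send", "management team")
print(plan_actions(request, documents))
```

Only forecast.xls satisfies all three parts of the filter, so it's the only file the sketch would hand to the "send" step.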

WinFS will have implications for how you store documents. Today, we think of Microsoft Office documents as flat files that are stored in shares. We typically store such documents on Direct Attached Storage (DAS) or Network Attached Storage (NAS) devices. Database files are much larger files that are stored on DAS or Storage Area Network (SAN) devices. Under WinFS, the flat file documents and the database that describes them will be tightly coupled. A record in the WinFS database will exist for each underlying file, and whenever a document is created, modified, deleted, backed up, or restored, WinFS will manipulate the document and the WinFS database together. As a result of this document/database marriage, storage management applications will have to be optimized for WinFS.
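The coupling between a file and its record can be sketched in a few lines of Python. This is only an illustration under my own assumptions: SQLite plays the part of the WinFS database, and the CoupledStore class and its methods are invented names. Every save or delete touches the file and its record in the same step, and the record change is rolled back if the file operation raises an error.

```python
import os
import sqlite3
import tempfile

class CoupledStore:
    """Hypothetical store that keeps each file and its record in step."""

    def __init__(self, root):
        self.root = root
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE records (name TEXT PRIMARY KEY, size INTEGER)"
        )

    def save(self, name, data):
        # "with self.db" commits on success and rolls back the record
        # change if writing the file raises an exception.
        with self.db:
            self.db.execute(
                "INSERT OR REPLACE INTO records VALUES (?, ?)", (name, len(data))
            )
            with open(os.path.join(self.root, name), "wb") as handle:
                handle.write(data)

    def delete(self, name):
        with self.db:
            self.db.execute("DELETE FROM records WHERE name = ?", (name,))
            os.remove(os.path.join(self.root, name))

    def size(self, name):
        row = self.db.execute(
            "SELECT size FROM records WHERE name = ?", (name,)
        ).fetchone()
        return row[0] if row else None

    def out_of_sync(self):
        """Names present on disk or in the database, but not in both."""
        on_disk = set(os.listdir(self.root))
        recorded = {row[0] for row in self.db.execute("SELECT name FROM records")}
        return sorted(on_disk ^ recorded)

with tempfile.TemporaryDirectory() as root:
    store = CoupledStore(root)
    store.save("plan.doc", b"draft")
    store.save("plan.doc", b"final draft")   # modifying updates the record too
    store.save("old.doc", b"obsolete")
    store.delete("old.doc")                  # removes the file and its record
    recorded_size = store.size("plan.doc")
    drift = store.out_of_sync()
    store.db.close()

print(recorded_size, drift)
```

A storage management tool that copied or removed files behind the store's back would show up in out_of_sync, which is the kind of drift a WinFS-aware tool would have to avoid.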

Because the WinFS specification is still under development, I can only speculate on its effects on storage management, but here's my early take on the subject. First, WinFS will require a WinFS-database-aware backup process. Most likely, a company will have one WinFS database that describes all documents across the enterprise. When the database is backed up, the backup program will need to look at individual WinFS records to make sure that the underlying documents are also being backed up. Because WinFS will support document version control, all versions of the underlying documents will need to be backed up.
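The record-by-record check I have in mind might look something like the following Python sketch. It's speculative in the same way the paragraph above is: the catalog is reduced to a dictionary that maps each document to its version numbers, the backup set is a plain set of (document, version) pairs, and the missing_from_backup function is a made-up name rather than anything from a real backup product.

```python
def missing_from_backup(catalog_versions, backup_set):
    """List every (document, version) in the catalog that the backup lacks."""
    missing = []
    for path in sorted(catalog_versions):
        for version in catalog_versions[path]:
            if (path, version) not in backup_set:
                missing.append((path, version))
    return missing

# Hypothetical catalog: each document maps to the versions on record.
catalog_versions = {
    "legal/contract.doc": [1, 2, 3],
    "sales/forecast.xls": [1],
}

# Hypothetical contents of last night's backup.
backup_set = {
    ("legal/contract.doc", 1),
    ("legal/contract.doc", 3),
    ("sales/forecast.xls", 1),
}

print(missing_from_backup(catalog_versions, backup_set))
```

Here the backup holds the first and third versions of the contract but not the second, so the check reports that one gap.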

Second, to maximize performance in this database/file scenario, backup architectures will need to be optimized for both flat files and large database files. Using today's storage technology, the ideal WinFS storage device would be a SAN with a NAS head. This combination device would provide the optimization for large database files as well as the performance necessary for small document files. And you can easily expand the SAN as the company's document storage needs grow.

Finally, a WinFS-aware backup will need to understand the nature of the underlying files. This understanding will let backup and recovery vendors provide much more granular recovery schemes because restore applications will understand the underlying file type and structure. For example, you'll be able to put the WinFS query engine into an end-user recovery application, which will let the user type an instruction such as, "Restore the previous version of the building project Word files that were created by the legal department into a directory called 'restored legal docs.'" The WinFS query engine will be able to deconstruct the components of this instruction and provide the recovery application with a list of the files that need to be recovered. And because most data will be backed up to disk, these restores will take only seconds to perform.
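As a rough illustration, once the instruction has been broken into its components, the list handed to the recovery application could be built along these lines. The Python sketch below assumes the parsing has already happened. The record layout, the plan_restore function, and the sample files are all hypothetical, and "previous version" is taken to mean the version just before the newest one.

```python
import posixpath

def plan_restore(records, department, project, kind, target_dir):
    """Return (name, version to restore, destination path) for each match."""
    plan = []
    for name in sorted(records):
        info = records[name]
        wanted = (department, project, kind)
        if (info["department"], info["project"], info["kind"]) != wanted:
            continue
        versions = info["versions"]          # ordered oldest to newest
        if len(versions) < 2:
            continue                         # no previous version exists
        plan.append((name, versions[-2], posixpath.join(target_dir, name)))
    return plan

# Hypothetical catalog records for three documents.
records = {
    "lease.doc": {"department": "legal", "project": "building",
                  "kind": "word", "versions": [1, 2, 3]},
    "permit.doc": {"department": "legal", "project": "building",
                   "kind": "word", "versions": [1]},
    "budget.xls": {"department": "finance", "project": "building",
                   "kind": "excel", "versions": [1, 2]},
}

print(plan_restore(records, "legal", "building", "word", "restored legal docs"))
```

In this example only lease.doc qualifies: permit.doc has no earlier version to fall back to, and budget.xls belongs to a different department and file type.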

The goal of WinFS is to make the collection of knowledge located in an enterprise's documents more accessible to everyone in the organization. Pulling this off will require a storage infrastructure that's optimized for WinFS database queries, real-time backup and recovery, easy expandability, and data security. Administrators who have experience running highly active database servers in a SAN environment will have an advantage when planning for a future WinFS storage infrastructure.