Learn about NTFS5's Distributed Link Tracking, sparse-file support, volume change tracking, encryption, and alternate data streams

In "Inside Win2K NTFS, Part 1," November 2000, I began this two-part series by describing general indexing, consolidated security, reparse points, and quota management, all of which are features new to NTFS 5.0 (NTFS5), the Windows 2000 version of NTFS. In this issue, I look at how NTFS5 implements Distributed Link Tracking (DLT), sparse-file support, volume change tracking, and encryption. I conclude with a look at alternate data streams, an NTFS feature rarely used before NTFS5.

Distributed Link Tracking
Windows Explorer supports a type of symbolic link called a shell link or shell shortcut. Shell links, which often appear on the Windows desktop and in the Start menu, let you easily access programs and files without having to navigate to their original location. Another type of Windows link is an OLE link. OLE links are links in one application's files that store data belonging to another application. For example, if you embed a Microsoft Excel spreadsheet in a Microsoft Word document, an OLE link in the Word document refers to the Excel document.

If you've ever moved a link source (the file or directory that a shell or OLE link refers to) in a pre-Win2K version of Windows and then clicked the link that referenced the source, you've witnessed the heuristic-based search that Windows Explorer performs in its attempt to find the source's new location. If the search is unsuccessful, Windows Explorer gives up and requires you to tell it where you moved the source. In Win2K, DLT automatically updates shell links to point at moved link sources. The only requirement is that the link source's original and final locations are both on NTFS5 volumes and in the same domain.

NTFS5 is required for automatic link updating because NTFS5 lets an application assign files and directories a 16-byte (128-bit) object ID. Not coincidentally, 16 bytes is the length of a Windows globally unique ID (GUID). Windows uses GUIDs as a general-purpose identification mechanism because a GUID's length and the algorithm that Windows uses to generate a GUID virtually guarantee that every GUID is statistically unique. The DLT service, which Win2K implements as a Win32 service, assigns a GUID to every link source that resides on an NTFS volume and directs NTFS to record the GUID as the source's object ID. When you click on a shell link or open a document with an embedded OLE link and Windows Explorer fails to find the link source at the path the link specifies, Windows Explorer queries the DLT service for the source's new location. DLT uses the object ID that Windows Explorer supplied to attempt to open the link source on each volume in the domain, starting with local volumes. When DLT finds the file (or directory), DLT asks NTFS for the name of the file and returns the result to Windows Explorer, which updates the link and follows it to the tracked location. Thus, even if the source moves to a volume on another computer in the domain, the link continues to work.

When the DLT service (or another application) assigns an object ID to a file or directory, NTFS creates an attribute of type $OBJECT_ID in the file's Master File Table (MFT) entry, as Figure 1, page 46, shows. (For information about the MFT and its entries and attributes, see "Inside NTFS," January 1998.) This 16-byte attribute stores the GUID. At the same time, NTFS adds an entry to the \$Extend\$ObjId metadata file that maps the GUID to the file's MFT entry. Win2K stores the metadata file's entries in an index named $O, which NTFS sorts by object ID in a B+ tree. When DLT finds a moved link and attempts to use the link's object ID to open it, NTFS can look up the link's MFT entry number in the $ObjId file's $O index. After DLT uses the file's MFT entry number to open the file, DLT asks NTFS for the file's name, updates the link, and opens the link source. The $ObjId file's use is an example of general indexing, a new NTFS feature I describe in "Inside Win2K NTFS, Part 1."
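The lookup path described above can be modeled in a few lines of Python. This is only a sketch: the ObjectIdIndex class, the mft dictionary, and the sample paths are hypothetical names invented for illustration, and a sorted list searched with bisect stands in for the B+ tree that NTFS actually uses for the $O index.

```python
import bisect
import uuid


class ObjectIdIndex:
    """Toy stand-in for the $O index: object IDs kept in sorted order."""

    def __init__(self):
        self._keys = []      # sorted 16-byte object IDs
        self._entries = []   # MFT entry numbers, parallel to _keys

    def assign(self, mft_entry):
        """Generate a GUID, record it for the file, and return it."""
        object_id = uuid.uuid4().bytes          # 16 bytes, the length of a GUID
        pos = bisect.bisect_left(self._keys, object_id)
        self._keys.insert(pos, object_id)
        self._entries.insert(pos, mft_entry)
        return object_id

    def lookup(self, object_id):
        """Return the MFT entry number for an object ID, or None."""
        pos = bisect.bisect_left(self._keys, object_id)
        if pos < len(self._keys) and self._keys[pos] == object_id:
            return self._entries[pos]
        return None


# Hypothetical volume: MFT entry number -> current path of the file.
mft = {1200: r"\docs\budget.xls"}
index = ObjectIdIndex()
link_object_id = index.assign(1200)      # the link remembers this ID

# The file is moved elsewhere on the volume; its object ID does not change.
mft[1200] = r"\archive\2000\budget.xls"

# Resolving the stale link: object ID -> MFT entry -> current name.
entry = index.lookup(link_object_id)
print(mft[entry])
```

The point of the model is that the link never needs the old path to be valid; the object ID alone is enough to reach the file's MFT entry and, from there, its current name.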

Sparse Files
Many applications consist of a server that logs data to a file and a client that reads the data that the server writes. This architecture usually requires the use of a technique called circular logging, in which the log file is a fixed size and the server rolls over, or returns to the beginning of the file, after reaching the file's end. A rollover can lead to overwriting data before the client has a chance to read it. Other applications, such as databases, allocate extremely large files, of which valid data fills only small portions, resulting in disk space that is allocated but unused. NTFS5 includes a feature called sparse-file support that Win2K applies to both these situations to minimize unnecessary disk utilization and avoid the rollover problems inherent in circular logging.

Sparse-file support lets an application designate unused portions of a file as empty, freeing the disk space allocated to the empty regions. In a logging application, the server appends log information to the file without needing to roll over, and the client marks file areas empty as it reads them. A log file might therefore appear to be very large but use only a small amount of disk space. When you view a file's properties, Windows Explorer shows both sizes and calls the amount of space the file takes on disk the Disk size. If an application reads from part of a sparse file that is designated as empty, the application receives zero-filled data. In the case of a database, the database application marks the unused parts of the database as empty, releasing the disk space. Figure 2 illustrates a sparse file.
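To make the two sizes concrete, here is a small in-memory model of a sparse file, written in Python. It is a sketch under stated assumptions rather than a description of NTFS internals: the SparseFileModel class and its method names are hypothetical, the 4KB cluster size is assumed, and the model releases only whole clusters when a range is marked empty.

```python
CLUSTER = 4096  # assumed cluster size in bytes


class SparseFileModel:
    def __init__(self):
        self.size = 0          # virtual (apparent) size in bytes
        self.clusters = {}     # virtual cluster number -> bytearray

    def write(self, offset, data):
        for i, byte in enumerate(data):
            vcn, within = divmod(offset + i, CLUSTER)
            cluster = self.clusters.setdefault(vcn, bytearray(CLUSTER))
            cluster[within] = byte
        self.size = max(self.size, offset + len(data))

    def read(self, offset, length):
        out = bytearray()
        for pos in range(offset, min(offset + length, self.size)):
            vcn, within = divmod(pos, CLUSTER)
            cluster = self.clusters.get(vcn)
            out.append(cluster[within] if cluster else 0)   # empty -> zeros
        return bytes(out)

    def mark_empty(self, offset, length):
        """Release every cluster that lies wholly inside the range.

        Simplification: partial clusters at the edges are left untouched.
        """
        first = -(-offset // CLUSTER)               # round up
        last = (offset + length) // CLUSTER         # round down (exclusive)
        for vcn in range(first, last):
            self.clusters.pop(vcn, None)

    def size_on_disk(self):
        return len(self.clusters) * CLUSTER


log = SparseFileModel()
log.write(0, b"A" * CLUSTER * 3)     # server appends three clusters of records
log.mark_empty(0, CLUSTER * 2)       # client has finished reading the first two
print(log.size, log.size_on_disk())
print(log.read(0, 4))
```

After the client marks the first two clusters empty, the file still reports its full length, only one cluster remains allocated, and reads from the released region return zeros.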

The NTFS implementation of compressed files has always had a type of sparse-file support. NTFS's compression algorithm compresses files in blocks of 16 virtual clusters, typically allocating fewer than 16 logical clusters on disk to store that data. A virtual cluster is a cluster within a file, and a logical cluster is a cluster on a volume. For an uncompressed file, virtual clusters correspond to logical clusters one-to-one, but in a compressed file, multiple virtual clusters might map to fewer logical clusters. When an application reads from a compressed file, NTFS decompresses the logical clusters that make up the 16-cluster block that the application is reading, recreating the 16 uncompressed virtual clusters in memory. If a 16-cluster block is filled with zeros, NTFS optimizes disk utilization by not allocating any logical clusters for the block. When an application reads from such a region, which is called a sparse region, NTFS returns zero-filled data.
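The per-block decision can be sketched as follows. The function name logical_clusters_needed and the 4KB cluster size are assumptions made for this example, and zlib is used purely as a stand-in compressor; it is not the algorithm NTFS uses.

```python
import os
import zlib

CLUSTER = 4096          # assumed cluster size in bytes
UNIT_CLUSTERS = 16      # virtual clusters per compression block, per the text


def logical_clusters_needed(unit):
    """Return how many on-disk clusters one 16-cluster block would occupy."""
    assert len(unit) == CLUSTER * UNIT_CLUSTERS
    if unit.count(0) == len(unit):
        return 0                                  # sparse region: nothing stored
    packed = zlib.compress(unit)                  # stand-in for NTFS's compressor
    needed = -(-len(packed) // CLUSTER)           # round up to whole clusters
    return min(needed, UNIT_CLUSTERS)             # never worse than uncompressed


zeros = bytes(CLUSTER * UNIT_CLUSTERS)
text = (b"log entry\n" * 7000)[:CLUSTER * UNIT_CLUSTERS]
noise = os.urandom(CLUSTER * UNIT_CLUSTERS)

for label, unit in (("zeros", zeros), ("text", text), ("noise", noise)):
    print(label, logical_clusters_needed(unit))
```

A zero-filled block needs no logical clusters at all, highly repetitive data needs far fewer than 16, and incompressible data gains nothing.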

NTFS5's sparse-file support relies on the same driver subroutines that NTFS uses for sparse regions of compressed files. However, two differences exist between sparse files and compressed files. One difference is that the nonempty portions of a sparse file aren't compressed. The second difference is that an application can indicate to NTFS that a region of a sparse file has become empty, freeing the logical clusters that were previously allocated to it. To make a sparse region of a compressed file, an application must write zero-filled data to the file.

You might think that users could use sparse files to exceed their disk quota by allocating large empty files and then filling them over time. However, NTFS counts a sparse file's virtual size, not the on-disk size, against a user's quota limit.

Volume Change Tracking
Before Win2K, an application that needed to monitor a volume for changes had limited options. The Win32 API exports functions that notify an application when the contents of a directory or directory tree change, but the application must either scan directories to determine the nature of changes as they occur or take the risk that the list of changes that Windows returns will overflow the application's buffer. Scanning directories after every change causes an unacceptable performance hit, yet many applications, such as incremental backup solutions, can't tolerate missed changes. NTFS5's new volume change tracking facility lets applications easily monitor changes to a volume and provides an effective alternative to earlier approaches for file-replication, incremental-backup, and virus-scanning applications.

Interestingly, volume change tracking relies on sparse-file support. NTFS disables volume change tracking by default, so applications that need to use it must enable it. When an application turns on volume change tracking, NTFS creates a change journal, which is a sparse file named \$Extend\$UsnJrnl. Figure 3 depicts the change journal. Every change to a volume, such as creating, resizing, or deleting a file, logs an entry to the $J alternate data stream (I describe alternate data streams in more detail later) in the $UsnJrnl file. A change entry records the name of the file that changed, the type and time of the change, and several other pieces of useful information. Applications can request that NTFS notify them when it adds new entries to the journal and can read entries as the system logs them. The Win2K File Replication Service (FRS), which Win2K uses to replicate Dfs shares and to propagate group policies, is one of the change journal's clients.

NTFS monitors the on-disk space that the change journal consumes and limits this space to the amount the application specifies when it enables change tracking. When a journal exceeds the specified size, NTFS marks the change journal's oldest valid entries as empty. For even a moderately sized change journal, only an extremely high volume of file activity would cause an application to fall so far behind that it couldn't read entries before NTFS deletes them.
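The size-capped behavior can be modeled with a simple queue. Everything named here is hypothetical: the ChangeJournalModel class, the record layout, and the 64-byte header size are assumptions made for the sketch. The one detail borrowed from outside this article is that a record's update sequence number (USN) acts like an offset into the journal stream, so it keeps growing even as old records are discarded.

```python
from collections import deque
from dataclasses import dataclass
import time


@dataclass
class ChangeRecord:
    usn: int            # position of the record in the journal stream
    name: str
    reason: str
    timestamp: float
    length: int         # bytes this record occupies


class ChangeJournalModel:
    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.records = deque()
        self.next_usn = 0           # only ever grows, like a file offset
        self.bytes_on_disk = 0

    def log(self, name, reason):
        length = 64 + 2 * len(name)             # assumed record layout
        rec = ChangeRecord(self.next_usn, name, reason, time.time(), length)
        self.records.append(rec)
        self.next_usn += length
        self.bytes_on_disk += length
        # Discard the oldest records once the cap is exceeded.
        while self.bytes_on_disk > self.max_bytes:
            self.bytes_on_disk -= self.records.popleft().length
        return rec.usn

    def read_from(self, usn):
        """Return surviving records at or after usn, plus the next USN to ask for."""
        return [r for r in self.records if r.usn >= usn], self.next_usn

    def oldest_usn(self):
        return self.records[0].usn if self.records else self.next_usn


journal = ChangeJournalModel(max_bytes=256)
for n in range(5):
    journal.log(f"file{n}.txt", "create")

records, cursor = journal.read_from(0)
print([r.name for r in records])
print(journal.oldest_usn(), cursor)
```

A client that remembers the cursor it last received can tell it has fallen behind when that value is lower than the oldest USN still in the journal, which is the situation the paragraph above describes as unlikely for a reasonably sized journal.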

Encryption
Support for Encrypting File System (EFS) is an important part of NTFS5. Although I cover that support extensively in the series "Inside Encrypting File System" (June and July 1999), I provide a summary here. (For background information about EFS, see "Related Articles in Previous Issues." For an administrator's view of EFS management, see Mark Minasi, Inside Out, "Decrypting EFS," page 139.)

EFS isn't built in to NTFS but is an add-on driver (\winnt\system32\drivers\efs.sys). EFS provides transparent file-based encryption so that users can protect sensitive data that might fall into unauthorized hands. Although NTFS security prevents unauthorized access to a file while a Win2K system is online, a malicious user could bypass this security by booting a computer to another installation or by using NTFSDOS (available from www.sysinternals.com/ntfs30.htm) from a DOS boot floppy disk.

When someone uses the Advanced Attributes panel of a file's properties dialog (which Figure 4 shows) to initially encrypt a file, the EFS service, which runs in the Local Security Authority Subsystem (LSASS, \winnt\system32\lsass.exe), uses CryptoAPI services to assign the user a private/public key pair that EFS can use for file encryption. EFS randomly generates a file encryption key (FEK) for an encrypted file and uses the FEK and the Data Encryption Standard X (DESX) encryption algorithm to encrypt the file's data. Because DESX is a symmetric algorithm, decryption also uses the FEK, so EFS must protect the FEK from unauthorized access. To protect the FEK, EFS uses the RSA asymmetric encryption algorithm to encrypt the FEK with the user's EFS public key and stores the encrypted FEK with the file. When a user opens and reads an encrypted file, EFS uses the user's EFS private key to decrypt the FEK, then uses the decrypted FEK to decrypt the file data.

Because EFS uses the user's password to encrypt the user's EFS private key, an intruder would have to obtain the user's password to compromise encrypted file data. Further, although the initial release of Win2K stores encrypted EFS private keys on disk within a user's profile directory, subsequent releases will let users store their keys on smart cards.

EFS stores encrypted FEKs in a file's $LOGGED_UTILITY_STREAM attribute (which is new to NTFS5), as Figure 5, page 50, shows. The stored data consists of a Data Decryption Field (DDF) and a Data Recovery Field (DRF). The DDF contains a copy of the FEK encrypted with the user's EFS public key, and the DRF contains a copy of the FEK encrypted with the system's Recovery Agent EFS public key. On a standalone system, the local administrator is the default system Recovery Agent; on a domain-based system, the default Recovery Agent is the domain administrator. Because EFS includes in every file an FEK encrypted with a Recovery Agent's EFS public key, a Recovery Agent can decrypt the file and recover its contents if the user who encrypted the file is unable to do so.
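The relationship between the FEK, the DDF, and the DRF is easier to see in a toy model. The Python sketch below is emphatically not EFS's cryptography and is not secure: textbook RSA built from small, well-known Mersenne primes stands in for the real RSA key pairs, a SHA-256 keystream XOR stands in for DESX, and every function name is hypothetical. What it does preserve is the structure: one random key encrypts the data, and that key is stored twice, wrapped once for the user and once for the Recovery Agent.

```python
import hashlib
import secrets

E = 65537


def toy_keypair(p, q):
    """Textbook RSA key pair from two primes. Not secure; illustration only."""
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))
    return (n, E), (n, d)           # (public key, private key)


def keystream_xor(key, data):
    """Symmetric stand-in for DESX: XOR with a SHA-256 counter keystream."""
    out = bytearray()
    for block in range(0, len(data), 32):
        pad = hashlib.sha256(key + block.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[block:block + 32], pad))
    return bytes(out)


def wrap(fek, public):
    n, e = public
    return pow(int.from_bytes(fek, "big"), e, n)


def unwrap(wrapped, private):
    n, d = private
    return pow(wrapped, d, n).to_bytes(16, "big")


def encrypt_file(plaintext, user_public, agent_public):
    fek = secrets.token_bytes(16)                     # random per-file key
    return {
        "data": keystream_xor(fek, plaintext),
        "ddf": wrap(fek, user_public),                # FEK for the user
        "drf": wrap(fek, agent_public),               # FEK for the Recovery Agent
    }


def decrypt_file(stored, field, private):
    return keystream_xor(unwrap(stored[field], private), stored["data"])


user_pub, user_priv = toy_keypair(2**127 - 1, 2**89 - 1)
agent_pub, agent_priv = toy_keypair(2**107 - 1, 2**61 - 1)

stored = encrypt_file(b"quarterly results", user_pub, agent_pub)
print(decrypt_file(stored, "ddf", user_priv))     # the user's path
print(decrypt_file(stored, "drf", agent_priv))    # the recovery path
```

Either private key recovers the same FEK and therefore the same file data, which is why losing the user's key doesn't have to mean losing the file.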

When the system boots, NTFS reads the Registry value HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem\NtfsEncryptionService to obtain the EFS driver's name, then starts the driver. The EFS driver registers callbacks (i.e., routines that NTFS directly invokes) so that NTFS can hand off encrypted-file-related operations to the EFS driver. Internally, NTFS handles encrypted files much as it handles compressed files. NTFS decompresses a compressed file's data into the Win2K file-system cache and recompresses the data when it's written to the on-disk file. Similarly, NTFS decrypts an encrypted file's data into the cache and re-encrypts the data when writing it to disk.

Alternate Data Streams
An alternate data stream is a way to embed files within other files. Technically, every NTFS file contains an embedded file that has no name. This file, which is called the default data stream or unnamed data stream, is the file you see when you use Notepad to open a file and is the file that applications see when they use a file's standard name to open the file. Applications also can add embedded files, which are called alternate data streams, and give them different names. Applications use the Win32 API syntax File:AlternateStream (in which AlternateStream is the name of an alternate stream) to access data within alternate streams.
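As a small illustration of the naming syntax, the following Python helper splits a name into its file part and its stream part. The function split_stream_name is a hypothetical helper written for this example, not part of any Windows API, and it relies on Python's ntpath module only to keep a drive letter's colon from being mistaken for the stream separator.

```python
import ntpath


def split_stream_name(name):
    """Split 'file:stream' into (file path, stream name).

    An empty stream name means the unnamed (default) data stream.
    Caveat: a one-letter file name such as 'a:stream' is read as drive 'a:'.
    """
    drive, rest = ntpath.splitdrive(name)     # keep 'C:' out of the way
    path, _, stream = rest.partition(":")
    return drive + path, stream


print(split_stream_name("file.txt:alternatestream"))
print(split_stream_name(r"C:\logs\app.log:audit"))
print(split_stream_name("readme.txt"))
```

On a Windows system with an NTFS volume, passing the full File:AlternateStream name to an ordinary file-open call should reach the named stream directly, so a helper like this is useful mainly for tools that need to report the file and the stream separately.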

Microsoft originally included alternate-stream functionality in NTFS to support Apple Macintosh (Mac) file system resource forks. Many Mac files use the Mac's Hierarchical File System (HFS) to store icon and other information in an alternate data stream. Because Windows NT Server comes with Services for Macintosh (SFM—a service that lets NT share files with Mac clients), NT must support alternate streams so that Mac clients can store files with resource forks on NT servers without losing the resource fork information. This relatively obscure role of alternate data streams has meant that few NT users have used alternate data streams. In Win2K, alternate data streams have a more prominent role.

When you edit the properties of an NTFS file, the Win2K version of Windows Explorer displays the Summary tab, which Figure 6 shows. This tab lets you associate arbitrary textual information, such as a title, keywords, and a revision number, with the file. Windows Explorer stores that information in an alternate data stream named ?SummaryInformation (the question mark represents an unprintable character).

Because alternate data streams are the exception, most applications and console commands aren't aware of alternate streams. For example, Windows Explorer and the Dir command show the size of only the file's unnamed data stream.

You can use the Echo and More commands, which are aware of alternate streams, to experiment with streams. First, create a file with an alternate stream:

echo hello > file.txt:alternatestream

Get a directory listing of the file to verify that Windows reports the file's size as zero, then use the More command to display the data in the alternate stream:

more < file.txt:alternatestream

(You can't view Windows Explorer summary information streams in this manner because the name of the summary information stream begins with an unprintable character.)

If you're curious about whether your files and directories contain alternate data streams, you might want to use the free Streams utility from www.sysinternals.com/misc.htm. This utility takes a file or directory name as a command-line parameter and reports the names of all alternate data streams that the file or directory contains. Streams accepts the wildcard character *, and you can use the /s switch to cause Streams to examine subdirectories, so you can easily see all the files and directories that contain alternate streams on a volume or in a directory.

NTFS Metadata Files
With the addition of new features, NTFS has acquired additional metadata files to store data related to those features. As I've discussed the new NTFS5 features, I've described the metadata file that each feature uses. Table 1 summarizes all the NTFS5 metadata files.

In "Inside NTFS," January 1998, I present a utility named NtfsInfo that shows you the size of the various NTFS metadata files. NtfsInfo relies on an undocumented feature of NTFS that lets applications explicitly specify the name of a metadata file to obtain a directory listing of the file. However, NTFS5 doesn't allow access to metadata files, so NtfsInfo doesn't work on Win2K. Instead, you can use NFI, the NTFS File Sector Information Utility that I introduced in "Inside Win2K NTFS, Part 1," to list information about all the files on a volume, including metadata files.

More to Come
NTFS5's powerful enhancements, including support for large files, encryption, and change tracking, will help extend Win2K's reach into the enterprise. However, you can be sure we haven't seen the end of NTFS's evolution.

Related Articles in Previous Issues
You can obtain the following articles from Windows 2000 Magazine's Web site at http://www.win2000mag.com/.

"Windows 2000 EFS," March 2000, InstantDoc ID 7977
NT Internals, "Inside Encrypting File System, Part 2," July 1999, InstantDoc ID 5592
NT Internals, "Inside Encrypting File System, Part 1," June 1999, InstantDoc ID 5387
NT Internals, "Inside NTFS," January 1998, InstantDoc ID 3455