NT's native file systempast, present, and future
When Microsoft first released Windows NT, it came with support for three file systems: FAT, HPFS, and NTFS. Microsoft originally designed FAT for DOS, IBM introduced HPFS for OS/2, and Microsoft created NTFS for NT. NTFS is NT's native format, and NT supports the FAT and HPFS formats to provide migration paths from the other operating systems (although Microsoft didn't include support for HPFS in NT 4.0).
Microsoft's implementation goal for NTFS was to overcome the limitations of the other two NT-compatible file systems and provide the advanced features that an enterprise-level operating system requires. For example, NTFS supports file and directory-granular security, whereas FAT and HPFS have no security capability. In addition, NTFS's allocation scheme can efficiently address huge disksFAT and HPFS both are limited to disk sizes that are challenged by today's commodity hardware. Finally, of the three file systems, NTFS is the only one that's Unicode-enabled for international environments and has built-in data reliability features to prevent file and file system corruption if the system fails.
In this article, I'll compare NTFS's capabilities with FAT's and explain the design of NTFS on-disk data structures. I'll wrap up with a preview of NTFS changes in NT 5.0 and mention several resources you can turn to if you end up with a corrupt NTFS drive. (For information about choosing a file system in NT 4.0, see Sean Daily, "NTFS vs. FAT," October 1996.)
NTFS Capabilities
In the late 1980s, Microsoft designed NTFS in parallel with the initial development of NT. After NTFS's basic infrastructure was in place and functional, Microsoft directed the NT programming team to use NTFS as the file system for its NT development systems. This arrangement created an effective feedback loop for the NTFS developers.
Because NTFS was to be a new file system from the ground up, its design could incorporate features that would overcome the limitations of existing PC file systems and anticipate the demands of NT's enterprise users down the road. The most obvious way to focus attention on NTFS was to make it scale with ever-expanding disks. All Windows file systems divide a disk partition into logical allocation units known as clusters. The FAT file system uses 16-bit entries to reference clusters, so at most FAT can address 216 or 65,536 clusters. Clusters can vary in size according to the size of a disk, but large clusters result in internal fragmentation, or wasted space inside clusters. For example, if a file has only 250 bytes of data, it still requires an entire cluster of allocated disk space, which results in more than 15KB of wasted space on a disk with 16KB clusters.
With only 65,536 clusters of addressing capability, a FAT drive with 1KB clusters can cover a 65MB disk. Today's 4GB or larger disks require FAT clusters of 64KBa size that typically yields large amounts of wasted space. In contrast, NTFS references clusters with 64-bit addresses. Thus, even with 512-byte clusters, NTFS can map disks up to sizes that we won't likely see even in the next few decades.
The developers of FAT and HPFS overlooked file system security, but NTFS uses the same security model as NT. Discretionary access control lists (DACLs) and system access control lists (SACLs), which control who can perform operations on a file and what events will trigger logging of actions performed on a file, are stored in their native NT form within NTFS. This approach makes managing NTFS security a natural fit with general NT security administration.
The FAT file system uses the ASCII 8-bit character set in its file and directory name scheme. Employing the ASCII character set constrains FAT names to English (and some symbolic) characters. NT and NTFS both use the Unicode 16-bit character set for names. This attribute of NTFS lets NT users around the world manage their files using their native language.
FAT's model of data stored within a file is a one-unit approach. In contrast, NTFS lets several data streams exist within a file. The NTFS unnamed data stream is equivalent to the traditional FAT view of file data, but NTFS named streams can refer to alternative units of data. The named-stream model has not come into widespread use on NTFS (there isn't even a Win32 programming API that can access this capability), but Services for Macintosh (SFM) relies on the named-stream model to transparently let Macintosh clients use named streams on NTFS network drives as they do on their native Hierarchical File System (HFS) drives.
Finally, FAT has no provision for fault-tolerance. If a system crashes while you are creating or updating files and directories, FAT's on-disk structures can become inconsistent. This situation can result in the loss of the data being modified, as well as a general corruption of the drive and the loss of much of the disk's data. This risk is unnecessary and clearly unacceptable for NT's target market. NTFS has built-in transactional logging so that whenever a modification is about to take place, NTFS makes a note of the modification in a special log file. If the system crashes, NTFS can examine the log file and use it to restore the disk to a consistent state with minimal data loss.
NTFS Disk Management
Disk Administrator (windisk.exe) and the command-line Format utility are two tools you can use to create NTFS disk partitions and logical drives. Both tools let you specify exactly what size you want the clusters to be. However, if you choose not to specify an override, both tools will select default sizes based on the size of the disk. The tools make choices based on the amount of wasted space that can result from large clusters versus the amount of overhead that comes with managing small clusters. Table 1 summarizes the default cluster sizes.
In addition to storing data belonging to your files and directories, NTFS uses several files to store data associated with disk management. The data stored within these files and any NTFS-related data stored in user files and directories is known as metadata. When you initialize an NTFS disk, NTFS creates 11 metadata files. Table 2 lists the metadata files and describes their purpose. These files are typically invisible to you when you browse an NTFS drive from the command prompt or from Explorer, but you can see them if you know what to look for. You can reveal information about a metadata file from the command prompt by typing
dir /ah <filename>
Screen 1 displays the results of using this technique for several metadata files.
I'll describe a few of the files in depth later, but I'll mention some files now. In addition to the logging that NTFS performs to prevent catastrophic loss of a disk, NTFS protects its on-disk data structures with signatures. When errors occur in reading NTFS data, NTFS identifies the clusters as bad clusters, remaps the data to another location of the disk, and updates the $BADCLUS file so that NTFS will avoid the faulty clusters in the future. If fault-tolerant disk drivers are present, they return information to NTFS that augments this hotfix capability to protect other data.
The $BITMAP file is a large array of bits in which each bit corresponds to a cluster on the drive. If the bit is off, the cluster is free; otherwise, the cluster is in use. NTFS maintains this file to track free clusters on the drive for allocating to new data.
The Master File Table
NTFS's command central is the Master File Table (MFT). The MFT is analogous to the FAT file system's file allocation table because the MFT maps all the files and directories on the drive, including NTFS's metadata files. The MFT is divided into discrete units known as records. In one or more MFT records, NTFS stores the metadata that describes a file or directory's characteristics (security settings and other attributes such as read-only or hidden) and its location on the disk. Surprisingly, the MFT is also a file that NTFS maps using records within the MFT. This structure lets the MFT grow and shrink, which makes the space NTFS uses for metadata efficiently track the amount of metadata that exists.
NTFS internally identifies files and directories using the position in the MFT of the record describing the start of their metadata. For example, the metadata files that Table 2 lists have pre-assigned starting (base) records in the MFT, which appear in the second column of Table 2. The records are usually 1KB in size on an NTFS 4.0-formatted drive, but they can be larger.
The $MFTMIRR file is another NTFS data-loss prevention measure. $MFTMIRR contains a copy of the first 16 records of the MFT, and NTFS stores the file in the middle of the disk (the MFT starts near the beginning of the disk). If NTFS has trouble reading the MFT, it refers to the duplicate. An NTFS disk's boot record (512 bytes of data at the beginning of the disk) contains entries that locate the position of the MFT and the MFT mirror.
MFT-access performance plays a critical role in the overall performance of an NTFS drive, so NTFS takes a step to keep access fast. Because the MFT is a file on an NTFS drive, it can become fragmented as it grows and shrinks. This fragmentation results because NTFS cannot contiguously allocate the entire MFT in advance. When the MFT grows and other files use the clusters just beyond its end, the MFT must look elsewhere on the disk for available space, which causes discontinuous sequences of clusters in its file map.
The fastest file access occurs when a sequential disk operation can read entire chunks of a file, but a fragmented MFT means that NTFS may require multiple disk operations to read a record, which can lead to lower performance. In an effort to prevent MFT fragmentation, NTFS makes a region of clusters around the MFT off-limits to other files and directories. This area, the MFT-Zone, increases the chance that the MFT will find contiguous clusters or at least keep its data close to other MFT data when it needs to grow. When the disk space outside the MFT-Zone becomes scarce, NTFS relaxes its restriction and clusters from the zone can be allocated for other uses. Thus, the MFT runs a higher risk of becoming fragmented on disks that are near capacity. Unfortunately, NTFS does not let defragmentation tools defragment the MFT.
You can access the manual version of the Diskeeper defragmenter in Windows NT 5.0 Beta 1 by going to the Preview Directory on the CD-ROM, selecting x86 or Alpha, selecting the Defrag directory, and clicking Install. The defragmenter then becomes an operational part of the system. The defragmentation engine exists in beta 1, but Microsoft hasn’t yet included the GUI. The Defrag GUI will be included in beta 2.<br>
--Jobee Knight<br>
Director of Public Relations<br>
Executive Software
Jobee Knight August 10, 1999