Caching file system data is an important performance optimization that virtually every modern operating system (OS) performs. The premise behind caching is that most applications access data that is primarily localized within a few files. Bringing those files into memory and keeping them there for the duration of the application's accesses minimizes the number of disk reads and writes the system must perform. Without caching, applications require relatively expensive disk operations every time they access a file's data.
This month, I'll look inside the Cache Manager, an Executive subsystem that works closely with the Memory Manager and file systems to cache file-system data in memory. In my last two columns about the Memory Manager, you can find details about many of the concepts I refer to this month (such as mapping a view of a file).
Logical Block and Virtual Block Caching
OSs use two types of file-system data caching: logical block caching and virtual block caching. The two types store data at different levels of abstraction, as Figure 1, page 68, shows. A logical drive resides on a disk partition that's composed of physical storage units called sectors. When an application accesses data in a particular file, the file system responsible for the drive (e.g., FAT, NTFS) determines which sectors of the disk hold the data in the file. The file system then issues disk I/O requests to read from or write to those sectors. In logical block caching, the OS caches sector data in memory so that the memory associated with the target sectors, rather than disk operations, can satisfy disk I/O requests. Most older variants of the UNIX OS, including BSD 4.3; every Microsoft OS except Windows NT (Windows 98, Win95, Windows 3.x, and DOS); and Novell NetWare cache file-system data at the logical block level.
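As a rough illustration of logical block caching, the following Python sketch places a cache keyed by sector number beneath a toy file system. The names `FakeDisk`, `LogicalBlockCache`, and `read_file`, along with the 512-byte sector size, are assumptions made for this example and don't correspond to any real OS interface.

```python
SECTOR_SIZE = 512  # bytes per sector; an assumed, typical value


class FakeDisk:
    """Hypothetical stand-in for a disk driver that counts physical reads."""

    def __init__(self, sector_count):
        self.sectors = [bytes([n % 256]) * SECTOR_SIZE for n in range(sector_count)]
        self.reads = 0

    def read_sector(self, number):
        self.reads += 1
        return self.sectors[number]


class LogicalBlockCache:
    """Caches data by sector number, below the file system."""

    def __init__(self, disk):
        self.disk = disk
        self.cached = {}  # sector number -> sector contents

    def read_sector(self, number):
        # Only a miss reaches the disk; a hit is served from memory.
        if number not in self.cached:
            self.cached[number] = self.disk.read_sector(number)
        return self.cached[number]


def read_file(cache, file_sectors):
    """The file system still maps the file to sectors on every read."""
    return b"".join(cache.read_sector(n) for n in file_sectors)


disk = FakeDisk(sector_count=16)
cache = LogicalBlockCache(disk)
first = read_file(cache, [3, 4, 9])   # three physical reads
second = read_file(cache, [3, 4, 9])  # served entirely from the cache
print("physical reads:", disk.reads)
```

Note that even on the second pass, `read_file` has to supply the sector list: the cache saves the disk operations but not the file-to-sector translation.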
Virtual block caching caches data at the file system level rather than the disk level. When an application accesses data in a file, the file system checks to see whether the data resides in the cache. If the data is in the cache, the file system doesn't need to determine which sectors of the disk store the data and issue disk I/O requests. The file system simply operates on the data in the cache. NT relies on virtual block caching (implementing it in the Cache Manager), as do newer versions of UNIX, including Linux, Solaris, System V, and BSD 4.4.
Virtual block caching has a couple of advantages over logical block caching. First, when file data the application is reading is in a virtual block cache, the file system performs no file-to-sector translations. In fact, in some cases, the I/O system can bypass the file system altogether and retrieve requested data directly from the cache. Second, the cache subsystem knows which files and which offsets within the files an application is asking for. The cache subsystem can monitor the access patterns of each file and make intelligent guesses about which data an application is going to ask for next. Using its guesses as guidelines, the cache subsystem reads the data from disk in anticipation of future requests. This slick process is known as read-ahead, and when the cache subsystem's predictions are accurate, read-ahead boosts system performance. Although read-ahead is possible with logical block caching, virtual block caching makes read-ahead simple to implement.
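To make the read-ahead idea concrete, here is a minimal Python sketch of a cache keyed by file name and block index that prefetches the next block when it sees two consecutive requests. The classes `FakeFileSystem` and `VirtualBlockCache`, the 4096-byte block size, and the simple "previous block plus one" heuristic are all assumptions for illustration; they aren't a description of how NT's Cache Manager actually predicts accesses.

```python
BLOCK_SIZE = 4096  # assumed cache granularity, for illustration only


class FakeFileSystem:
    """Hypothetical stand-in for a file system; counts reads that reach it."""

    def __init__(self, files):
        self.files = files  # file name -> file contents
        self.reads = 0

    def block_count(self, name):
        return -(-len(self.files[name]) // BLOCK_SIZE)  # ceiling division

    def read_block(self, name, index):
        self.reads += 1
        start = index * BLOCK_SIZE
        return self.files[name][start:start + BLOCK_SIZE]


class VirtualBlockCache:
    """Caches data by (file, block index), above the file system."""

    def __init__(self, fs):
        self.fs = fs
        self.blocks = {}      # (file name, block index) -> data
        self.last_index = {}  # file name -> last block index requested
        self.hits = 0

    def _load(self, name, index):
        key = (name, index)
        if key not in self.blocks:
            self.blocks[key] = self.fs.read_block(name, index)
        return self.blocks[key]

    def read(self, name, index):
        if (name, index) in self.blocks:
            self.hits += 1
        data = self._load(name, index)
        # Two consecutive indexes suggest a sequential scan: read ahead,
        # unless the next block would lie past the end of the file.
        sequential = self.last_index.get(name) == index - 1
        if sequential and index + 1 < self.fs.block_count(name):
            self._load(name, index + 1)
        self.last_index[name] = index
        return data


fs = FakeFileSystem({"report.txt": bytes(range(256)) * 64})  # four blocks
cache = VirtualBlockCache(fs)
for i in range(4):
    cache.read("report.txt", i)
print("file system reads:", fs.reads, "cache hits:", cache.hits)
```

Because the cache knows the file and the offset, it can spot the sequential pattern on the second request; from then on, each block is already in memory by the time the application asks for it.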
The Virtual Address Control Block Array
During system initialization, NT assigns the Cache Manager a fixed amount of the system's virtual address space (system virtual memory lies between 2GB and 4GB on most systems, and optionally between 3GB and 4GB on Enterprise Edition). This virtual memory is where the Cache Manager maps data in disk files, and the amount NT assigns to the cache depends on the size of physical memory. By default, NT gives the cache 128MB of virtual memory, but for each 4MB of physical memory above 16MB, NT gives the cache an additional 64MB of virtual memory. NT caps the cache's virtual memory at 512MB on x86 systems and 416MB on Alphas. (On x86 systems running NT 5.0 with Terminal Server support not enabled and with the Registry value HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\LargeSystemCache set to 1, NT can assign the cache as much as 960MB of virtual memory.) Thus, if an x86 computer has 64MB of physical memory, NT sizes the cache at 512MB. The distinction between physical memory and virtual memory is important to remember, especially with respect to the cache: Although the Cache Manager might be tracking 512MB of virtual file-system data, the cache's working set is usually smaller. The file data present in the cache's working set, not the file data mapped into the cache's virtual memory, determines what data the Cache Manager caches.
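The sizing rule described above reduces to a few lines of arithmetic. The following Python sketch is only an approximation of that rule: the function name `cache_virtual_size_mb` is hypothetical, and the use of floor division (a partial 4MB step above 16MB earns nothing) is an assumption the text doesn't confirm.

```python
def cache_virtual_size_mb(physical_mb, cap_mb=512):
    """Estimate the cache's virtual size in MB from the rule in the text.

    Assumption: a partial 4MB step above 16MB earns nothing (floor division).
    """
    base_mb = 128                                 # default cache virtual size
    extra_steps = max(0, physical_mb - 16) // 4   # whole 4MB steps above 16MB
    return min(cap_mb, base_mb + 64 * extra_steps)


for physical in (16, 24, 32, 64):
    size = cache_virtual_size_mb(physical)
    print(physical, "MB physical ->", size, "MB of cache virtual memory")

# The Alpha cap from the text can be passed in explicitly.
alpha_size = cache_virtual_size_mb(64, cap_mb=416)
```

For a 64MB x86 system, the uncapped figure would be 128 + 12 x 64 = 896MB, so the 512MB cap applies, which matches the example in the text.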