Not long ago, hardware storage vendors were manufacturing hardware-only file compression add-ons that provided high-speed compression with no performance loss. In fact, using these products typically improved performance. But what about using Windows NT's built-in file compression? I set out to determine whether users can reliably deploy the file compression utility that ships with NT 3.51 and NT 4.0 with little or no performance cost. The obvious benefit is confidence in deploying file compression on data drives and achieving a bigger bang for your online storage buck.
Throughout the testing, I used a variety of desktop PCs to read and write data to file servers that would be typical of any modern company’s server infrastructure. These servers, which included single- and dual-processor Pentium systems with 64MB to 196MB of RAM and processor speeds ranging from 66MHz to 333MHz, each produced different results.
In the end, NTFS file compression provides anywhere from 20 percent to 40 percent more disk space for no extra dollar cost. With high-end server hardware (i.e., multiple processors and significant amounts of RAM), you can easily get more data onto existing hardware without asking for more cash. Based on the test results, enabling file compression on NTFS volumes that primarily share read-only data involves little risk; however, take care when enabling file compression on NTFS volumes that have significant write activity because compressing data that you're writing to disk significantly increases processor utilization.
Note, however, that establishing a server performance baseline before you enable any compression tools will likely save your IT department lots of complaints about performance. If you enable NTFS file compression on substandard hardware, you'll bring your clients to a crawl.
Microsoft NTFS File Compression
Microsoft introduced NTFS file compression with little fanfare. The technical types, like myself, all tried it on our local NTFS drives, said "that’s neat," and then fell into the same old trap of thinking that 150GB of storage would be plenty as hard disk prices dropped and storage sizes increased. As the old adage goes, "You grow into the size of the home you purchase."
To activate NTFS file compression (not available for FAT or HPFS disk partitions), you select the properties of the drive, directory, or file desired and set the compression attribute, as Screen 1 shows. Because NTFS file compression is a software solution, I considered the following factors before beginning the performance testing:
- If NTFS file compression operates as a background or foreground application, it must use CPU cycles.
- If NTFS file compression manipulates data, it must use memory.
- Memory is physical.
- Lack of physical memory translates to page swapping.
- Page swapping increases disk utilization.
Developing a Hypothesis
By simple deduction, a system can read a compressed file from a disk array faster than its uncompressed counterpart: fewer bytes, less time. Less time spent on disk access, which is slow compared with memory access, speeds retrieval. Even with some added processor cycles for expanding the file before sending it to the client, the server can, in theory, equal or improve on the performance of retrieving and sending the original uncompressed file.
If this deduction is true, file writes from a client should follow the same pattern, provided you plan ahead and factor in the additional memory and processor usage. By taking this approach, you should have a low-cost solution (at least for the interim) for the immediate disk real estate that resource-hungry clients are demanding.
The relationship between uncompressed and compressed data access is
F / T > (F * C * P) / T
where
F = Size of the sample file in megabytes
T = Disk read/write throughput in MBps (constant)
C = The compression achieved on the sample file type, expressed as compressed size divided by original size
P = Processor overhead factor to compress or uncompress data
Based on this simple calculation, the NT file server should be able to write a file that compresses from 1MB to 500KB to the physical disk faster and read that file from disk faster than the original 1MB file with no compression. T is constant on each side of the equation, and the compression ratio, C, more than counters the assumed adverse processor effect, P. The assumption is that software-based file compression depends on a fast processor (microsecond speeds) compared to hardware-based disk I/O, which is physical and slower (millisecond speeds).
Table 1 presents the characteristics of the test data I used. Based on this data and the formula created, the following equation should hold true:
108.01MB / T > (108.01MB * .6785 * 110%) / T
By removing the constant T from each side and factoring in an assumed 10 percent processor overhead, the formula holds true:
108.01MB > 80.62MB
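The comparison above can be reproduced with a few lines of Python. This is only a sketch of the hypothesis, not a measurement: the function name and the 5MBps throughput are assumptions made for illustration, while the file size, compression figure, and 10 percent overhead come from the text.

```python
# Sketch of the hypothesis F / T > (F * C * P) / T.
# transfer_time() and the value of T are illustrative assumptions.

def transfer_time(size_mb, throughput_mbps, compressed_fraction=1.0, cpu_factor=1.0):
    """Modeled seconds to move size_mb of data at throughput_mbps."""
    return size_mb * compressed_fraction * cpu_factor / throughput_mbps

F = 108.01   # sample size in MB (from the article)
C = 0.6785   # compressed size / original size (from the article)
P = 1.10     # assumed 10 percent processor overhead (from the article)
T = 5.0      # ASSUMED throughput in MBps; it cancels out of the comparison

uncompressed = transfer_time(F, T)
compressed = transfer_time(F, T, C, P)

# Effective data volume on the compressed side once T is removed.
effective_mb = F * C * P

# The model breaks even when C * P = 1, so P may grow to 1 / C.
break_even_p = 1 / C

print(f"uncompressed: {uncompressed:.2f} s, compressed: {compressed:.2f} s")
print(f"effective size with compression: {effective_mb:.1f} MB")
print(f"break-even processor factor: {break_even_p:.2f}")
```

Under this model, compression stops paying off only when the processor overhead factor exceeds 1 / C, or roughly 1.47 for this sample. The model ignores memory pressure and paging, which the tests later show to matter.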
Putting Theory into Practice
Once I had established the theory, it was time to head to the lab. I used two Pentium II workstations, one with Windows 95 and one with NT 4.0 Service Pack 3 (SP3). The detailed configuration of these workstations was insignificant because they were removed from the compression work (i.e., the workstations always received the same data regardless of whether file compression was on or off because all the file compression took place at the server). Network performance was also of little significance because it was consistent (I performed all tests in a controlled environment with consistent peripheral traffic).
The tests consisted of copying the test data from the servers to local disks at the workstations and then copying the data back to the server. After each copy, I removed the destination directory structure and emptied the Recycle Bin to ensure a complete write of data to the destination disk. I used Microsoft Windows NT Resource Kit tools such as the Now utility in batch files to provide precise time stamps during the copy process, and I used Performance Monitor to determine processor and paging file usage.
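The copy-and-time procedure can be approximated on any platform with a short Python script. This is a minimal sketch and not the batch files used in the tests: it substitutes Python's timer for the Resource Kit's Now utility, and the function names and the tiny temporary data set are assumptions for illustration.

```python
import shutil
import tempfile
import time
from pathlib import Path

def timed_copy(src, dst):
    """Copy the directory tree at src to dst; return elapsed seconds."""
    start = time.perf_counter()
    shutil.copytree(src, dst)   # dst must not exist yet
    return time.perf_counter() - start

def run_iteration(src, dst):
    """One pass: copy, then remove the destination so the next pass
    has to write every file again."""
    elapsed = timed_copy(src, dst)
    shutil.rmtree(dst)
    return elapsed

def demo(iterations=3):
    """Run the procedure against a small throwaway data set."""
    with tempfile.TemporaryDirectory() as root:
        src = Path(root) / "source"
        dst = Path(root) / "destination"
        src.mkdir()
        (src / "sample.dat").write_bytes(b"x" * 1_000_000)
        return [run_iteration(src, dst) for _ in range(iterations)]

times = demo()
for i, seconds in enumerate(times, start=1):
    print(f"iteration {i}: {seconds:.3f} s")
```

To mirror the tests, you would point the source and destination at a server share and a local disk and run the script once with compression off and once with it on.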
I used three servers for the tests to provide significant differences between configurations and allow some sort of trend analysis. The first server (Server A) had a 66MHz Pentium CPU, 64MB of RAM, and a RAID 5 disk array. The second server (Server B) had a 200MHz dual Pentium II CPU, 128MB of RAM, and a RAID 5 disk array. The final server (Server C) had a 333MHz dual Pentium II CPU, 196MB of RAM, and a RAID 5 disk array. All three servers were running NT Server 4.0 SP4. I performed three iterations of the tests against these servers and performed all tests in duplicate to verify results.
Table 2 shows the results of Test 1 for Server A without file compression enabled, and Table 3 shows the results of Test 1 for Server A with file compression enabled. Enabling file compression during the first test made relatively little difference in read times (at most 13 seconds faster with compression), as Table 4 shows. However, during disk writes, the performance hit was staggering. With the Win95 client, writing 108MB of executable files without compression took 176 seconds, while writing the same data with compression took 307 seconds, a whopping 75 percent longer. Similarly, the increases in write time for a sample of user data and image files were 67 percent and 60 percent, respectively. Additionally, file swapping and processor utilization increased significantly, with sustained 100 percent processor usage during testing.
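The write penalty quoted for the executable files is a simple percentage increase over the uncompressed time. As a quick check, using only the two timings given in the text (the function name is an assumption for illustration):

```python
def percent_increase(baseline_s, measured_s):
    """Percentage by which measured_s exceeds baseline_s."""
    return (measured_s - baseline_s) / baseline_s * 100

# Win95 client, 108MB of executable files on Server A
penalty = percent_increase(176, 307)
print(f"{penalty:.1f} percent longer with compression")  # 74.4
```

The result, about 74 percent, is in line with the rounded figure of 75 percent.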
Table 5 shows the results of Test 2 for Server B without file compression enabled, and Table 6 shows the results of Test 2 for Server B with file compression enabled. Enabling file compression during the second test made relatively little difference in read times (at most 5 seconds slower with compression), as Table 7 shows. During disk writes, the performance hit was less noticeable from the client perspective. However, the processor and file-swapping statistics show a significant increase in both values, especially under disk write conditions. While testing, I noticed processor usage peaks of 65 percent.
Table 8 shows the results of Test 3 for Server C without file compression enabled, and Table 9 shows the results of Test 3 for Server C with file compression enabled. Enabling file compression during the third test made relatively little difference in read times (at most 9 seconds faster with compression), as Table 10 shows. During disk writes, the performance hit was less noticeable from the client perspective. The 11-second difference in writing 108MB of data equates to 5 percent. Once again, the processor and file-swapping statistics show a significant increase in both values, especially under disk write conditions. While testing, I noticed processor usage peaks of 50 percent. An interesting peculiarity of the dual-processor servers was that the second processor always took most of the load, as Screen 2 shows.
I would encourage those NT network administrators or engineers looking desperately for disk space to use caution. Enabling NTFS file compression on your old single-processor servers might be a decision you'll regret. If you anticipate using NTFS file compression, keep the following suggestions in mind:
- You can freely compress read-only shares to slightly improve performance; however, the client won't notice this improvement. Examples of read-only shares include shared application directories, static documentation drives, and online archives that users seldom write to.
- You'll typically need to use at least two processors when enabling NTFS file compression. Because most processor use associated with file compression seems to shift to one processor, the other processor remains free for other server services.
- Adding RAM will likely improve performance, but the tests did not prove it (most likely because the servers were not performing any other functions during the tests). In fact, both the single-processor 64MB server and the dual-processor 196MB server had the pages/sec counters churning wildly while doing disk writes.
- Always baseline your servers’ performance before enabling NTFS file compression. Processor usage and memory page swapping both rise when you enable file compression, and you don’t want to max out either one.
- If you're searching for candidates for file compression, consider balancing the load across your servers: plan or change your environment to place static data on less performance-driven servers, and put your active data on powerful boxes. To check current servers, monitor network interface reads and writes or disk I/O reads and writes to get an idea of what the clients hitting the server are doing.
- Ultimately, NTFS file compression can give you anywhere from 20 percent to 40 percent more disk space without adding another disk. If you have the power, there's little penalty. If you have to add the power in memory or processors, you might want to weigh the price of another disk array.