One of my consulting client’s main file servers started freezing about two months ago. At first it froze once every few weeks, but eventually it was freezing every few days. Intermittent problems are the most difficult to solve because they can be so elusive. The server is an HP DL740 with four 2.0GHz processors and 32GB of memory, connected to two external disk subsystems. Its 146GB hard disks are configured as two RAID 10 arrays attached to a 6404 RAID controller. This is the client’s main file and print server, and it has been in service for roughly two years.

When the server freezes, there are no error messages, and users can't access data on any of the server shares. The server has two arrays: the first holds the C and D drives, and the second holds the E drive. Sometimes you can still log on at the console with the attached keyboard, but as soon as you access the D or E drive, the console session hangs. After a reboot, users can access the shares again until the server next freezes.

Initially, I suspected a hardware or hard disk problem because one of the disks had recently failed and been replaced. I ran the latest HP online diagnostics tool, but it reported that everything was OK. Over a weekend, I ran the latest HP offline diagnostics with the hardware tests in a continuous loop, but still found nothing wrong with the server. I also upgraded all the firmware and drivers on the server, but the problem persisted. Neither the HP Event Viewer nor the Windows Server 2003 Event Viewer revealed any clue to why the server kept freezing, although the freezes seemed to happen when the server was under heavier load.

I opened a case with HP Technical Support, but they couldn't find anything wrong with the server and suggested the problem could be OS-related. To make matters worse, the server was running out of disk space, and I had to decide quickly whether to expand an unstable server or evaluate alternatives for additional storage. The D drive is 540GB and had 30GB free; the E drive is 280GB and had 10GB free. This client was eating through disk space very rapidly, at roughly 3GB per day. This was a serious problem: servers should have no less than 14 percent (ideally 20 percent) free disk space, and both drives were well below that.

NTFS does an OK job of limiting fragmentation as long as there is enough contiguous free space to write new files, but as a disk fills up, fragmentation grows rapidly. I used the Windows 2003 disk tools to analyze the disks and found that both the D and E drives were heavily fragmented. Unfortunately, I was stuck between a rock and a hard place: I needed to move files off the server, but it was becoming unstable, and I didn’t want to risk another crash.
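The disk-space figures above can be checked with a little arithmetic. The following Python sketch computes each drive's free-space percentage, how much would have to be freed to reach the 14 percent floor, and roughly how many days the combined free space would last. It assumes the 3GB per day of growth is spread across D and E and stays constant; the function and variable names are illustrative only.

```python
# Figures from the narrative; constant growth spread across D and E is an assumption.
MIN_FREE_PCT = 14.0       # minimum free-space guideline from the text
GROWTH_GB_PER_DAY = 3.0   # observed client-wide growth rate

# drive letter -> (capacity in GB, free space in GB)
DRIVES = {"D": (540.0, 30.0), "E": (280.0, 10.0)}


def free_pct(size_gb: float, free_gb: float) -> float:
    """Free space as a percentage of total capacity."""
    return 100.0 * free_gb / size_gb


def gb_to_reach(size_gb: float, free_gb: float, target_pct: float) -> float:
    """GB that must be freed to reach target_pct free space; 0 if already there."""
    return max(0.0, size_gb * target_pct / 100.0 - free_gb)


def days_until_full(drives: dict, growth_gb_per_day: float) -> float:
    """Days until the combined free space is consumed at a constant growth rate."""
    total_free = sum(free for _, free in drives.values())
    return total_free / growth_gb_per_day


if __name__ == "__main__":
    for name, (size, free) in DRIVES.items():
        pct = free_pct(size, free)
        short = gb_to_reach(size, free, MIN_FREE_PCT)
        print(f"{name}: {pct:.1f}% free, {short:.1f}GB short of {MIN_FREE_PCT:.0f}%")
    days = days_until_full(DRIVES, GROWTH_GB_PER_DAY)
    print(f"Combined free space lasts about {days:.1f} days")
```

With these figures, D is about 5.6 percent free and 45.6GB short of the 14 percent floor, E is about 3.6 percent free and 29.2GB short, and the combined 40GB of free space lasts only about 13 days at 3GB per day.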

Was disk fragmentation causing the server to freeze? Everyone knows that fragmentation hurts server performance, but could it freeze a server? I decided to download an evaluation copy of Diskeeper Enterprise (http://www.diskeeper.com/profile/submit-select.aspx?a=l&PId=102) because I expected it to defragment the disks faster and more thoroughly than the Windows 2003 Disk Defragmenter. Before running Diskeeper, the D drive averaged 5.5 fragments per file and the E drive averaged 2.5. After defragmentation, the D drive averaged 2.9 fragments per file and the E drive 1.8. Not perfect, but a significant improvement.
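The "fragments per file" numbers are averages. Assuming the metric is simply the total number of fragments divided by the number of files (so a contiguous file counts as one fragment), the Python sketch below shows how such an average is computed and how much Diskeeper's pass reduced it on each drive. The per-file counts in the first call are made-up sample data, not measurements from this server, and the helper names are hypothetical.

```python
def avg_fragments_per_file(fragment_counts: list[int]) -> float:
    """Average fragments per file; a contiguous file contributes 1."""
    if not fragment_counts:
        raise ValueError("need at least one file")
    if any(c < 1 for c in fragment_counts):
        raise ValueError("every file has at least one fragment")
    return sum(fragment_counts) / len(fragment_counts)


def pct_reduction(before: float, after: float) -> float:
    """Percentage drop from the before value to the after value."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    # Hypothetical sample: four files, one of them split into 7 pieces.
    print(avg_fragments_per_file([1, 1, 3, 7]))
    # Averages reported for this server before and after Diskeeper ran.
    for drive, before, after in [("D", 5.5, 2.9), ("E", 2.5, 1.8)]:
        print(f"{drive}: {pct_reduction(before, after):.0f}% fewer fragments per file")
```

On the reported averages, that works out to roughly a 47 percent reduction on D and a 28 percent reduction on E.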

After a week without a single hang (the server had been freezing every few days), we concluded that disk fragmentation was indeed the cause of the problem. With the server stable, I installed eight additional 146GB drives in each disk subsystem and created a 1.2TB RAID 10 array. The additional space let me move a couple of large folders off the D and E drives. With more than 50 percent free space on each drive, Diskeeper was significantly more effective at reducing fragmentation. On heavily used servers, consider a product like Diskeeper to keep fragmentation to a minimum. Not only will it improve the performance of your disk subsystem, it can also help prevent instability on busy servers.
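The 1.2TB figure follows from how RAID 10 works: drives are mirrored in pairs and data is striped across the pairs, so usable capacity is half the raw capacity. A minimal Python sketch, assuming the new array uses all 16 added drives (eight per subsystem) and ignoring formatting overhead and decimal-versus-binary unit differences; the function name is illustrative:

```python
def raid10_usable_gb(drive_count: int, drive_gb: float) -> float:
    """Usable capacity of a RAID 10 array: every drive has a mirror,
    so only half of the raw capacity holds unique data."""
    if drive_count < 2 or drive_count % 2:
        raise ValueError("RAID 10 needs an even number of drives")
    return drive_count * drive_gb / 2


if __name__ == "__main__":
    new_drives = 8 * 2  # eight drives in each of the two disk subsystems
    usable = raid10_usable_gb(new_drives, 146)
    print(f"{new_drives} x 146GB in RAID 10 = {usable:.0f}GB (~{usable / 1000:.1f}TB)")
```

That gives 1,168GB, which rounds to the 1.2TB quoted in the text; the formatted capacity Windows reports will be somewhat lower.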

Tip

Windows Server 2003 R2 ships on two CD-ROMs. As with Windows 2003, you must run Adprep /forestprep and Adprep /domainprep on your Windows 2003 or Windows 2000 forest before you can introduce the first R2 domain controller (DC). Unfortunately, R2 ships with two different versions of Adprep: one on the first CD-ROM under \i386 and one on the second CD-ROM in the \Cmpnents\r2\adprep folder. Make sure to run Adprep from the second CD-ROM, not the first; otherwise, when you try to introduce the first R2 DC into the Active Directory (AD) forest, you'll receive an error message stating that Adprep didn't complete successfully.
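If you script the preparation steps, a small guard can keep the wrong Adprep from running. The Python sketch below builds the two command lines from the second CD-ROM's \Cmpnents\r2\adprep folder, forest preparation first and domain preparation second, and rejects the \i386 copy. The drive letter (E:\ in the example), the adprep.exe file name, and the helper names are assumptions for illustration; by default the sketch only prints the commands instead of running them.

```python
import ntpath  # Windows path rules; lets the sketch run on any OS
import subprocess


def r2_adprep_path(cd2_root: str) -> str:
    """Path to the R2 Adprep on the second CD-ROM (adprep.exe name assumed)."""
    return ntpath.join(cd2_root, "Cmpnents", "r2", "adprep", "adprep.exe")


def check_r2_adprep(path: str) -> None:
    """Refuse any Adprep that is not under \\Cmpnents\\r2\\adprep (e.g. \\i386)."""
    parts = [p.lower() for p in ntpath.normpath(path).split("\\")]
    if parts[-4:-1] != ["cmpnents", "r2", "adprep"]:
        raise ValueError(f"not the R2 Adprep from CD-ROM 2: {path}")


def adprep_commands(adprep: str) -> list[list[str]]:
    """Forest preparation first, then domain preparation."""
    check_r2_adprep(adprep)
    return [[adprep, "/forestprep"], [adprep, "/domainprep"]]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop if a step fails


if __name__ == "__main__":
    run(adprep_commands(r2_adprep_path("E:\\")))
```

Passing D:\i386\adprep.exe (the first CD-ROM's copy) to adprep_commands raises an error before anything runs, which is the mistake the tip warns about.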