Crouching Server, Hidden Memory Leak

How I rescued an SMB's server and restored its missing memory

Monday, May 16, 9:30 a.m.: Customer's server crashes for the umpteenth time.
Accusations hurtled through the air, and angry email messages and phone calls flew furiously between the small-to-midsized business (SMB) customer and the Value Added Reseller (VAR) that supported the customer's financial application. What spawned this IT battle scene? It all started when a Windows 2000 server that hosted the customer's application started crashing intermittently. I work for a Microsoft Business Solutions Gold Partner, and customers who use Microsoft Business Solutions for Financial Management—Great Plains software are an important part of our practice. My boss dispatched me to the client's site to assess the problem.

By the time the client called us, the server was crashing every few days. Before the crash, ODBC connections from Great Plains clients would become sluggish and finally disconnect. The client's accounting managers, IT people, and Great Plains implementers hurled epithets at each other over the fallen server.

The Great Plains implementer on this project is a capable technician, but his training and experience hadn't prepared him to handle the problem at hand: resolving server lockups and crashes. In desperation, he emailed the client/server coordinator and copied me on the message.

Our Microsoft Customer Relationship Management (CRM) system contains our clients' histories for contacts, product purchases, licensing keys, trouble tickets, and other relevant customer information. I located the client's resident IT support person in the CRM database and phoned him.

10:00 a.m.: I begin problem resolution by calling the client's onsite IT person.
I introduced myself to the IT support person and explained why I was calling. Quickly, I reassured him that I—the VAR—was on his side and that I wanted to help him resolve the problem. I won his trust, and he gave me his full cooperation.

He told me that the server was downed like a badly wounded soldier, bleeding memory slowly but continuously. He also told me that his company's security policies prohibited using remote management software, which would have let me examine the injured system. I'd have to find another way to investigate the problem.

10:20 a.m.: I examine the server event logs for clues.
I asked the IT person whether he could send me the server's System and Application event logs, SQL Server event logs, and perhaps a snapshot of Task Manager. He emailed them to me at 10:40 a.m.

I opened the logs and looked at the System log first. The first thing I saw was a bright red streak of Event ID 2019 errors flashing on my laptop screen: The server was unable to allocate from the system nonpaged pool because the pool was empty. Then, in the Application log, I saw Event ID 208. This error fingered the Great Plains application as part of the problem.

In the SQL Server event log, I saw the Event ID 17052 error. And finally, in the Task Manager snapshot, I got a little more information about the Event ID 2019 error, as Figure 1 shows.

I looked in the Microsoft Help and Support Knowledge Base and found an article at http://support.microsoft.com/?kbid=888928 that showed that the Event ID 2019 error might be related to having McAfee VirusScan installed on the server. McAfee VirusScan was, in fact, on the server, and the vendor had a hotfix for the problem. I notified the local IT support person, who downloaded and quickly applied the hotfix and rebooted the server. Alas, the hotfix failed to stop the resource bleeding.

11:30 a.m.: En route to the client's site, I find a fruitful lead.
Finally I persuaded the client to let me investigate the problem on site. To pass time during my drive to the client's site, I listened to a CD; no, not Pink Floyd or Willie Nelson, but Mark Minasi's Tuning Your Windows 2000 Servers. While perusing the event logs, I'd been mulling over memory leaks and how to find them. On the CD, Mark talks about memory and mentions "leakers"—programs that allocate a file handle every few seconds. By itself, the file handle doesn't use much memory, but the repeated allocations gradually use up a great deal of it.
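The "leaker" pattern Mark describes can be reproduced in a few lines. The sketch below is only an illustration and is not taken from the CD or from the client's server: the class names `LeakyPoller` and `TidyPoller` are hypothetical, and `os.devnull` stands in for whatever resource a real program would open. It shows how a program that opens a handle on every poll and never closes it accumulates handles without bound, while a version that closes each handle stays flat.

```python
import os


class LeakyPoller:
    """Opens a new OS handle on every poll and never closes it (the bug)."""

    def __init__(self):
        self.open_handles = []

    def poll(self):
        # Each call leaks one file descriptor/handle.
        self.open_handles.append(os.open(os.devnull, os.O_RDONLY))

    def cleanup(self):
        # Release everything that was leaked (a real leaker never does this).
        for fd in self.open_handles:
            os.close(fd)
        self.open_handles.clear()


class TidyPoller:
    """Does the same work but closes the handle before returning."""

    def __init__(self):
        self.polls = 0

    def poll(self):
        fd = os.open(os.devnull, os.O_RDONLY)
        try:
            self.polls += 1
        finally:
            os.close(fd)


if __name__ == "__main__":
    leaky, tidy = LeakyPoller(), TidyPoller()
    for _ in range(100):
        leaky.poll()
        tidy.poll()
    print("handles still held by the leaky poller:", len(leaky.open_handles))
    leaky.cleanup()
```

Each leaked handle is small on its own, which is why the damage only becomes visible after many allocations.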

1:15 p.m.: I find the source of the problem.
When I arrived at the site, I met the IT support person, who ushered me into the server room. I opened Task Manager on the server and customized the Processes view by adding the User Name, Paged Pool, Non-paged Pool, Handle Count, and Thread Count fields. I clicked OK, then maximized the Task Manager window and sorted the process list by handle count.

For comparison, on my Windows XP laptop, svchost.exe uses 1424 handles and outlook.exe uses 1333 handles. On the client's server, however, I found an applet associated with sending messages from the onboard SCSI card. That program had opened 700,000 file handles since the server had been rebooted 10 minutes before, and its handle count was still climbing.
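The same check can be expressed as a small calculation over handle counts noted at intervals. The sketch below is an assumption-laden illustration, not the procedure used on site: the process name `scsi_applet.exe`, the intermediate samples, and the `min_rate` threshold are all hypothetical, and it assumes the applet's count started near zero at reboot. Only the 700,000-handles-in-10-minutes figure and the two laptop counts come from the article.

```python
def handle_growth_per_second(samples):
    """samples: time-ordered list of (seconds_since_reboot, handle_count)."""
    (t0, h0), (t1, h1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time interval")
    return (h1 - h0) / (t1 - t0)


def suspected_leakers(history, min_rate=1.0):
    """Return (name, handles_per_second) for processes whose handle count
    never drops between samples and grows at least min_rate per second,
    fastest first."""
    suspects = []
    for name, samples in history.items():
        counts = [count for _, count in samples]
        never_drops = all(b >= a for a, b in zip(counts, counts[1:]))
        rate = handle_growth_per_second(samples)
        if never_drops and rate >= min_rate:
            suspects.append((name, rate))
    suspects.sort(key=lambda item: item[1], reverse=True)
    return suspects


# Hypothetical samples; only the 10-minute totals echo the article.
history = {
    "scsi_applet.exe": [(0, 0), (300, 350_000), (600, 700_000)],
    "svchost.exe": [(0, 1420), (300, 1424), (600, 1424)],
    "outlook.exe": [(0, 1333), (300, 1340), (600, 1333)],
}

if __name__ == "__main__":
    for name, rate in suspected_leakers(history):
        print(f"{name}: about {rate:.0f} handles per second")
```

At 700,000 handles in 600 seconds, the applet was opening well over a thousand handles per second, whereas a healthy process hovers around a stable count.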

I did a quick Google search on the filename of the errant program, and my results showed that many people were having problems with this file and certain motherboards. This added further evidence that we'd found the problem. Earlier, I'd told the Great Plains consultant that I suspected a memory leak. As I stared intently at Task Manager, I exclaimed, "Well, I guess we found our 'leaker'!"

1:45 p.m.: I bring the "crouching server" back to life.
The final step was to fix the rogue program so that it no longer created file handles ad infinitum. Although the server hardware was under warranty, its service level agreement (SLA) didn't cover onsite support. The server housed sensitive financial information, so moving it off site for service wasn't an option.

My alternative (and easier) solution was to modify the registry entries for the applet. I ran regedit, found the applet's launch entries in the registry, and changed the entries related to the applet under the HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Run subkey to prevent the applet from running when the server was rebooted.
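For readers who would rather inspect the Run subkey from a script than from regedit, the following is a minimal, read-only sketch using Python's standard `winreg` module. It is not what was done on site, and the search string `scsimon` is a hypothetical placeholder, because the article does not name the applet's file. The script only lists matching startup entries and changes nothing; removing or editing an entry remains a deliberate manual step.

```python
import sys

RUN_KEY = r"SOFTWARE\Microsoft\Windows\CurrentVersion\Run"


def matching_entries(entries, needle):
    """entries: {value_name: command_line}. Return the sorted value names
    whose name or command line mentions needle (case-insensitive)."""
    needle = needle.lower()
    return sorted(
        name
        for name, command in entries.items()
        if needle in name.lower() or needle in str(command).lower()
    )


def read_run_entries():
    """Read every value under HKEY_LOCAL_MACHINE\\...\\Run (Windows only)."""
    import winreg  # imported here so the helper above works on any platform

    entries = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, RUN_KEY) as key:
        value_count = winreg.QueryInfoKey(key)[1]  # number of values in key
        for index in range(value_count):
            name, data, _value_type = winreg.EnumValue(key, index)
            entries[name] = data
    return entries


if __name__ == "__main__" and sys.platform == "win32":
    # "scsimon" is a placeholder; substitute the real applet's filename.
    for name in matching_entries(read_run_entries(), "scsimon"):
        print("startup entry to review:", name)
```

Keeping the script read-only is a deliberate choice: on a production server that holds financial data, it is safer to review the matches first and then export the key as a backup before changing anything.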

Finally, I rebooted the server, and the problem vanished. The administrator signed my time sheet and wished me well. As I drove back to the office, I put my trusty Windows technical CD back in the player. For me, it was just another day of tracking down technical problems, dispelling customer qualms, and relearning something interesting about Windows.

Discuss this Article

KALYAN (not verified)
on May 28, 2008
Very useful information
SCG
on Oct 27, 2005
Nice bit of troubleshooting, Curt. Although it is easy to suspect a resource leak of some kind, it is not always trivial to find it, especially if it is not just 'memory' but handles or even more esoteric stuff. Good job.
pietrzak
on Oct 26, 2005
Great article Curt. Many times those small little applications that these companies add on to the servers are more trouble than they are worth. I make it a point to completely format and rebuild from scratch any server that is for production, just for the sole purpose to clean all the bloat off of there. Nice troubleshooting as well. Michael Pietrzak
SCG
on Oct 27, 2005
Nice article Curt. Persistence is the key to success in gaining the trusted advisor approach of a VAR. Nice job....
ASMB-Support (not verified)
on Nov 4, 2005
Another great article from Curt dealing with “Real World” IT issues… Very informative article, definitely going into our Tips and Tricks collection. Another reminder why we also have the Minasi collection… Tim Bolton
wgalanis
on Nov 10, 2005
Excellent article, it contributed to my "Learn something new every day" plan. Where do I get the CD you mentioned, "Tuning Your Windows 2000 Servers"? Is this an audio book? I can find nothing on Amazon.
David (not verified)
on Oct 27, 2005
Great Article. Even better Troubleshooting. Thanks for sharing the thought process as I am sure it will help me one day in the wee a.m. hours. David B
