Executive Summary: You probably can’t avoid tech support problems entirely, but by using tools that Microsoft’s Global Escalation Services support team uses, you can obtain detailed Windows system and application information that can shorten a support call or even avoid it. In this first What Would Microsoft Support Do? column, Microsoft Escalation Engineer Michael Morales shows you how to use the user-mode dump heap (UMDH) tool (umdh.exe, one of the Debugging Tools for Windows) to diagnose a process memory leak on a system and use UMDH’s output to solve the problem more quickly.
If you manage a Windows environment, you know that a
call to tech support is an inevitable part of your job. But
there are things you can do to help resolve support issues
faster—and perhaps avoid the dreaded support call entirely.
In nine years as an escalation engineer for Microsoft’s Global
Escalation Services support team, I’ve found a number of
Microsoft tools especially helpful for resolving tech support issues. In
this new column, What Would Microsoft Support Do?, I’ll show you
how you can use these tools to obtain valuable information that will
help you either facilitate your tech support call or research your own
solution. We’ll start our exploration of Microsoft’s troubleshooting
tools by walking through using the user-mode dump heap (UMDH)
tool to identify and solve a memory-leak problem.
Troubleshooting a Memory Leak
UMDH (umdh.exe) is part of the Debugging Tools for Windows, which
you can download at
www.microsoft.com/whdc/devtools/debugging/installx86.mspx#a.
UMDH aids in troubleshooting
process memory leaks by revealing the components responsible
for allocating the most memory. You can use UMDH with Windows
Server 2008, Windows Server 2003, Windows 2000 Server, Windows
Vista, and Windows XP systems.
I recently used UMDH to solve a customer’s memory-leak
problem. The customer’s Performance Monitor logs indicated that
the svchost.exe process was leaking enough memory to cause the
entire system to crawl. However, the information didn’t pinpoint
what components were involved in the leak or the functions those
components executed—information that UMDH could provide.
UMDH Steps
Using UMDH to troubleshoot a memory leak involves a sequence of
straightforward steps. Here’s the process:
1. Use the gflags.exe tool to enable the registry setting Create user
mode stack trace database. This setting lets the system store the process’s
function calls and module listing in a database during execution;
UMDH then dumps the database into an output file. Gflags is installed
when you install the Debugging Tools for Windows. This sample
gflags.exe command enables the setting for the notepad.exe process:
gflags.exe -i notepad.exe +ust
The command sets a registry value that’s read by the system during
process startup and lets the system keep track of the threads that
allocate memory inside the process. After running gflags.exe to
enable the setting, you’ll need to restart the process before you can
perform step 3. Also, remember to turn off the setting after you’ve
completed the necessary tracing for a leaking process. The following
command disables the setting for the notepad.exe process:
gflags.exe -i notepad.exe -ust
2. Set up the Microsoft symbol path to point to the Internet for
symbols. Enabling symbols lets UMDH output the process trace
information in a readable format. Without symbols, each line in
the trace output will show the word “module” instead of an actual
.dll name and numbers instead of the function name (more about
the trace output shortly).
To enable symbols, right-click My Computer, click Properties
and the Advanced Tab, then click the Environment Variables button.
Under System Variables, click the New button, and in the Variable
name box, enter
_NT_SYMBOL_PATH
In the Variable value box, enter the symbol path
srv*c:\symbols*http://msdl.microsoft.com/download/symbols
UMDH will use the symbol path to display the components responsible
for leaking memory. (This symbol path is valid for Server 2008,
Windows 2003, Win2K, Vista, and XP.)
3. Now you can take your first UMDH snapshot. To do so, from
the command line, navigate to the location where you’ve installed
the debugging tools. Then enter a command like this:
C:\debug>umdh -p:260 -f:Notepad1.txt
(Here, I installed the tools in the C:\debug directory.) The -p: value
is the process ID of the leaking process (which you can obtain from
Performance Monitor or Task Manager), and the -f: value is the name
you’ve chosen for the first snapshot file.
4. Allow enough time between the first and second snapshots
to ensure that the process leaks memory. While you’re waiting
between snapshots, you can use Performance Monitor to see how
much memory is being leaked.
5. Take your second snapshot, for example:
C:\debug>umdh -p:260 -f:Notepad2.txt
6. Now compare the two snapshots by running a command
like this:
C:\debug>umdh -v Notepad1.txt Notepad2.txt >c:\comparefiles.txt
The -v parameter tells UMDH to include
in its output summary information that
describes how much memory each thread
has consumed between the first and
second snapshots (more about threads
shortly). You need to specify a file to contain
the output for the snapshot comparison;
here, the filename is comparefiles.txt.
The previous command’s output lists the
components and function calls that allocated
the most memory within the process.
Having this detailed information about the
process will make the problem easier for
tech support to pinpoint and resolve—
or will give your systems administrator
adequately specific information to research
the problem further and possibly update the
binaries involved in the leak.
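If you repeat these steps often, you might script them. The following Python sketch is only an illustration, not part of the Debugging Tools: the function names are hypothetical, and it assumes umdh and gflags.exe are on the PATH, that the +ust setting is already enabled, and that the process has been restarted. It builds the same command lines shown in the steps and passes the symbol path from step 2 through the _NT_SYMBOL_PATH environment variable.

```python
import os
import subprocess
import time

# Symbol path from step 2; c:\symbols is the local cache folder used in the article.
SYMBOL_PATH = r"srv*c:\symbols*http://msdl.microsoft.com/download/symbols"


def gflags_command(image_name, enable=True):
    """Step 1: build the gflags.exe command that turns the +ust setting on or off."""
    return ["gflags.exe", "-i", image_name, "+ust" if enable else "-ust"]


def snapshot_command(pid, snapshot_file):
    """Steps 3 and 5: build the umdh command that writes one snapshot file."""
    return ["umdh", f"-p:{pid}", f"-f:{snapshot_file}"]


def compare_command(first_snapshot, second_snapshot):
    """Step 6: build the umdh command that compares two snapshot files."""
    return ["umdh", "-v", first_snapshot, second_snapshot]


def capture_leak(pid, wait_seconds, first="Notepad1.txt", second="Notepad2.txt",
                 report="comparefiles.txt"):
    """Run steps 3 through 6 in order (hypothetical helper, Windows only)."""
    env = dict(os.environ, _NT_SYMBOL_PATH=SYMBOL_PATH)
    subprocess.run(snapshot_command(pid, first), env=env, check=True)
    time.sleep(wait_seconds)  # step 4: give the process time to leak
    subprocess.run(snapshot_command(pid, second), env=env, check=True)
    with open(report, "w") as output:
        # Redirect the comparison output to a file, as the > redirect does in step 6.
        subprocess.run(compare_command(first, second), env=env,
                       stdout=output, check=True)
```

Calling capture_leak(260, 600), for example, would take two snapshots of process 260 ten minutes apart and write the comparison to comparefiles.txt in the current directory. Keeping the command builders separate from the code that runs them lets you check the command lines without touching a live process.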
A note about using UMDH: You can trace
both Microsoft and non-Microsoft related
processes and services by running UMDH
commands; however, to actually capture the
component name involved in the leak, you’ll
need the corresponding symbol file for that
component. Some vendors don’t make their
symbol files public; if you don’t have access
to the symbol file, the information in the
UMDH output file will be limited to only the
component’s load address and exclude the
component name and function being executed.
So, to get any meaningful output from
UMDH, you should specify
at least the Microsoft symbol
path, as explained earlier.
Interpreting UMDH Output
When you open the output file (comparefiles.txt in Figure 1), at the
top you’ll see the first thread of execution (thread for short): two
summary lines followed by a group of lines. A thread is a running
task inside a process, and every process must have at least one
thread to be able to load and run. The top two lines are the thread’s
summary information, and the group of lines under the summary shows
the components and functions that allocated memory on that thread,
one entry per line. Let’s look more closely at the output and what
it means.
The first two lines of the thread stack
show comparative memory-usage information
from the two snapshots. The first hexadecimal
number, +113faf000, represents the
change in memory consumption from
the first snapshot to the second. So in our
example, you can see an increase of more than
4.6GB of memory (113faf000 hex minus 0 is
4,630,179,840 bytes). To read the value in
decimal, you can convert the hex value by
using the Windows calculator’s
Scientific view (you can access the
calculator either through the Start menu or
by running calc.exe).
The next number, 81df8, represents the
number of actual allocations that occurred
to consume the memory. 81df8 hex represents
531,960 allocations. This high number
of allocations is normal, considering this
thread is responsible for more than 4GB of
memory. The next part, BackTrace8117, is
the internal ID with which the system has
tagged this thread.
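If you'd rather not use the calculator, the same conversions take a few lines of Python. This is a small sketch using only the two hex values quoted above; the helper name hex_field is hypothetical, and the last line (average bytes per allocation) is a derived figure that doesn't appear in the UMDH output itself.

```python
def hex_field(text):
    """Convert a hexadecimal field from the UMDH summary lines to an integer."""
    return int(text.lstrip("+"), 16)


delta_bytes = hex_field("+113faf000")   # change in memory between snapshots
allocations = hex_field("81df8")        # number of allocations

print(delta_bytes)                      # 4630179840
print(round(delta_bytes / 10**9, 2))    # 4.63 (GB, counting 1GB as 10**9 bytes)
print(allocations)                      # 531960
print(delta_bytes // allocations)       # 8704 bytes per allocation on average
```

A large number of same-sized allocations that are never freed is a common signature of a leak, so the average size can be a useful extra clue when you research the problem.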
The thread at the top of the UMDH output
is the thread that consumed the greatest
amount of memory, so that’s where you’ll
start investigating the memory-leak problem.
Each entry under a thread in the output file consists
of the component’s load address (e.g., in
the first entry, 77EDCA76), the component
filename (or DLL name; ntdll in the first
entry), and just after the ! sign, the function
within the DLL that was executed (e.g.,
$$VProc_ImageExportDirectory).
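To make the three parts of an entry concrete, here is a minimal Python sketch that splits one entry into its load address, component name, and function. It assumes the entry is laid out as the address, whitespace, then component!function; the exact spacing in a real UMDH file may differ, and the names StackEntry and parse_stack_entry are hypothetical.

```python
from typing import NamedTuple


class StackEntry(NamedTuple):
    address: int    # component's load address
    module: str     # component (DLL) name, before the ! sign
    function: str   # function name, after the ! sign


def parse_stack_entry(line):
    """Split an entry shaped like 'ADDRESS module!function' into its parts."""
    address_text, symbol = line.split(None, 1)
    module, _, function = symbol.strip().partition("!")
    return StackEntry(int(address_text, 16), module, function)


entry = parse_stack_entry("77EDCA76 ntdll!$$VProc_ImageExportDirectory")
print(entry.module)     # ntdll
print(entry.function)   # $$VProc_ImageExportDirectory
```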
You can use the UMDH output to start
your troubleshooting investigation by
reviewing the components in the output file
and, if necessary, updating them to their latest
versions. If the components involved in
the process are up-to-date and the leak still
occurs, your next step is to call tech support
or research the problem further.
Using UMDH Information
You can further narrow down and possibly
solve the problem by researching it online.
For example, I searched on information
from the sample UMDH snapshots—the
string “repdrvfs wmi leak,” including “wmi”
because the leak occurred in a Windows
Management Instrumentation (WMI) process
and “repdrvfs” because that component
name was high on the thread stack (i.e.,
the thread that was consuming the most
memory) and repeated several times (indicating
that the repdrvfs DLL was involved
in the consumption of memory). My search
found a Microsoft Knowledge Base article that provided the fix
for the problem, at support.microsoft.com/kb/838884. Thus, when you
select components to search, you’ll probably be most successful
searching those that are both high on
the thread stack and repeated.
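The "high on the stack and repeated" rule of thumb can be sketched in Python as follows. This is only an illustration: the function name rank_modules is hypothetical, the sample entries use placeholder function names rather than real UMDH output, and it assumes each entry ends with component!function.

```python
from collections import Counter


def rank_modules(stack_lines):
    """Order component names by how often they repeat, then by how high
    on the stack they first appear."""
    counts = Counter()
    first_seen = {}
    for position, line in enumerate(stack_lines):
        if not line.strip():
            continue  # ignore blank lines between groups
        module = line.split()[-1].partition("!")[0]
        counts[module] += 1
        first_seen.setdefault(module, position)
    return sorted(counts, key=lambda name: (-counts[name], first_seen[name]))


# Placeholder stack for illustration; the function names are made up.
sample_stack = [
    "ntdll!FunctionA",
    "repdrvfs!FunctionB",
    "repdrvfs!FunctionC",
    "examplemod!FunctionD",
    "repdrvfs!FunctionE",
]
print(rank_modules(sample_stack))  # ['repdrvfs', 'ntdll', 'examplemod']
```

The component at the front of the list is the first candidate to include in a search string, alongside the name of the leaking service or process.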
Of course, you won’t solve all leaky-application
problems by using UMDH. However,
using UMDH for troubleshooting leaky
processes will provide key information that
can significantly reduce
the time needed to resolve
a technical support issue.
Check out the Microsoft
Advanced Windows Debugging
and Troubleshooting
blog (blogs.msdn.com/ntdebugging) for further guidance
in identifying and resolving
Windows technical issues.