Friday, May 30, 2008

Debugging reference count leaks

One of the hardest things to debug is a reference count leak. A COM object's lifetime depends on its reference count (read here for more...). Each client of a COM object must call AddRef on the IUnknown interface when it starts using the object, and it must call Release when done. If any client (and a single object may have many, many clients) violates this rule, you get into severe trouble.
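To illustrate the lifetime rule, here is a minimal C++ sketch of the counting contract. It is not real COM (no IUnknown, no QueryInterface, no HRESULTs): `RefCountedObject` and its `liveInstances` counter are hypothetical names used only for illustration, and the count starting at 1 (standing for the creator's reference) is an assumption of this sketch.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a COM server object. It only models the
// AddRef/Release contract; it is not a real IUnknown implementation.
class RefCountedObject {
public:
    // Number of RefCountedObject instances currently alive (demo only).
    static inline int liveInstances = 0;

    RefCountedObject() { ++liveInstances; }

    // Increments the count and returns the new value, like IUnknown::AddRef.
    unsigned long AddRef() { return ++refCount_; }

    // Decrements the count and returns the new value; the object deletes
    // itself when the last reference is released.
    unsigned long Release() {
        const unsigned long remaining = --refCount_;
        if (remaining == 0) {
            delete this;  // lifetime ends here; any other pointer now dangles
        }
        return remaining;
    }

private:
    // Private destructor: only Release may destroy the object.
    ~RefCountedObject() { --liveInstances; }

    // Starts at 1 in this sketch, standing for the creator's reference.
    std::atomic<unsigned long> refCount_{1};
};
```

For example, `new RefCountedObject()` followed by one `AddRef()` needs two `Release()` calls before `liveInstances` drops back to 0. One missing Release and the object lives forever; one extra Release and somebody is left holding a dangling pointer.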

Scenarios

1.) Number of Release calls = Number of AddRef calls

This is the normal scenario: as soon as no client needs the server object anymore, it gets destroyed.

2.) Number of Release calls > Number of AddRef calls

If Release is called one time too often, another client might crash because the server gets destroyed too early. The bad thing here is that you see the crash in one place, but that does not tell you where the root cause is located. All you know is which object's reference count has been corrupted.

3.) Number of AddRef calls > Number of Release calls

If AddRef is called one time too often, the reference count never reaches 0 and hence the server object never gets destroyed. This causes memory leaks and might also cause resource leaks. The effect of this scenario is much less obvious: you might see memory usage increase over time, performance degrade, and/or resources stay locked when they should have been unlocked.
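The three scenarios can be modeled by replaying a sequence of calls on a single object and watching the count. The sketch below is a model only (no real objects are created); `Outcome` and `classify` are hypothetical names, and it assumes the count starts at 0 with every client, including the creator, calling AddRef.

```cpp
#include <cassert>
#include <vector>

// The three scenarios from the list above (hypothetical names).
enum class Outcome { Balanced, OverReleased, Leaked };

// Replays AddRef (+1) and Release (-1) calls on one object whose count
// starts at 0, and reports which scenario the sequence falls into.
Outcome classify(const std::vector<int>& ops) {
    long count = 0;
    bool destroyed = false;
    for (int op : ops) {
        if (destroyed) {
            // Any call after destruction touches freed memory (scenario 2).
            return Outcome::OverReleased;
        }
        count += op;
        if (count < 0) {
            return Outcome::OverReleased;  // Release without a matching AddRef
        }
        if (op < 0 && count == 0) {
            destroyed = true;  // the last reference was released
        }
    }
    // A count that never returned to 0 means the object lives forever (scenario 3).
    return destroyed ? Outcome::Balanced : Outcome::Leaked;
}
```

For instance, `classify({+1, +1, -1, -1})` is balanced, while `classify({+1, +1, +1, -1, -1})` leaks. The order matters as well as the totals: in `{+1, -1, +1, -1}` the counts are equal, but the second AddRef already hits a destroyed object.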

Finding the place where the unbalanced AddRef/Release occurred can be like finding a needle in a haystack. I searched the web with Google but didn't find a tool that really assists in this task. Luckily, Sara Ford described in this post the first step you need to take in order to get the data necessary to drill down into the problem.

Somehow I didn't manage to set the tracepoints in Visual Studio 2005 (can anybody tell me how to set a breakpoint on a single object's AddRef and Release methods?), so I launched my beloved WinDbg.

First I created a script that writes an XML snippet for each event that alters the ref count (I didn't find a better name, so I called it ToXml.txt and placed it into my scripts folder):

.printf "-->\n<Event><Ref>%d</Ref><![CDATA[",poi(${$arg1})
k100
.printf "]]></Event>\n<!--\n"
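Each event the script writes has the shape `<Event><Ref>N</Ref><![CDATA[...stack...]]></Event>`; the `-->` and `<!--` around it are there so that whatever the debugger prints between events ends up inside XML comments. If you want to post-process the raw log without an XML parser, a plain string-search sketch like the following extracts the events. `RefEvent` and `parseEvents` are hypothetical names, and the code assumes exactly the layout printed by ToXml.txt above.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// One ref-count event as written by ToXml.txt (hypothetical struct).
struct RefEvent {
    long ref = 0;                     // value of m_dwRef printed by the script
    std::vector<std::string> frames;  // non-empty lines of the "k" output
};

// Extracts all events from the log text, assuming the layout
// <Event><Ref>N</Ref><![CDATA[ ...stack... ]]></Event>
std::vector<RefEvent> parseEvents(const std::string& log) {
    const std::string refOpen = "<Event><Ref>";
    const std::string refClose = "</Ref><![CDATA[";
    const std::string eventClose = "]]></Event>";
    std::vector<RefEvent> events;
    std::size_t pos = 0;
    while ((pos = log.find(refOpen, pos)) != std::string::npos) {
        const std::size_t refStart = pos + refOpen.size();
        const std::size_t refEnd = log.find(refClose, refStart);
        if (refEnd == std::string::npos) break;  // truncated log: stop here
        const std::size_t stackStart = refEnd + refClose.size();
        const std::size_t stackEnd = log.find(eventClose, stackStart);
        if (stackEnd == std::string::npos) break;

        RefEvent ev;
        ev.ref = std::stol(log.substr(refStart, refEnd - refStart));
        std::istringstream stack(log.substr(stackStart, stackEnd - stackStart));
        std::string line;
        while (std::getline(stack, line)) {
            if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
            if (!line.empty()) ev.frames.push_back(line);
        }
        events.push_back(std::move(ev));
        pos = stackEnd + eventClose.size();
    }
    return events;
}
```

Because the search is purely textual, it also works on the log before the XML prolog and epilog have been added.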

Then I placed a breakpoint on the server object's constructor:

bp MyServer!CMyClass::CMyClass

When the breakpoint hit, I stepped out (<Shift>+<F11>) into CComCreator::CreateInstance and then stepped over the p->SetVoid(pv); call in that method.
(I think it should be possible to set a breakpoint directly at MyServer!ATL::CComCreator<ATL::CComObject<CMyClass> >::CreateInstance+0xb1, but I didn't try...)

Now I obtained the address of m_dwRef with:

0:000> ?? &(p->m_dwRef)
long * 0x110d724c

The next thing to do is to set the data breakpoint:

ba w4 0x110d724c "$$>a<C:/windbg/scripts/ToXml.txt 0x110d724c;gc"

(you might need to change the path 'C:/windbg/scripts/')

With .logopen we make sure that all events are written directly into a log file:

.logopen c:\temp\Events.xml

Now let the application run with 'g' or <F5> and do whatever reproduces your ref counting problem.

When done, break into the debugger and close the log with .logclose.

At this point we are halfway through. The Events.xml we created is not valid XML yet. You need to add

<?xml version="1.0" encoding="UTF-8"?>
<Events>
<!--

at the beginning and

--></Events>

at the end.
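Adding the prolog and epilog by hand is fine for a single run; if you repeat the capture often, a small C++ sketch like this one can do it for you. `wrapLog` and `wrapLogFile` are hypothetical helpers, and the only assumption is that the input is the raw log written during the session above. The leading `<!--` comments out the text before the first event (such as the "Opened log file" line), and the trailing `-->` closes the comment opened after the last event.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Surrounds the raw WinDbg log with the prolog and epilog described above.
std::string wrapLog(const std::string& rawLog) {
    return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Events>\n<!--\n" +
           rawLog + "\n--></Events>\n";
}

// Reads inputPath, wraps its contents and writes the result to outputPath.
// Returns false if either file cannot be opened or the write fails.
bool wrapLogFile(const std::string& inputPath, const std::string& outputPath) {
    std::ifstream in(inputPath, std::ios::binary);
    if (!in) return false;
    std::ostringstream buffer;
    buffer << in.rdbuf();  // slurp the whole log
    std::ofstream out(outputPath, std::ios::binary);
    if (!out) return false;
    out << wrapLog(buffer.str());
    return static_cast<bool>(out);
}
```

One caveat that follows from the XML rules rather than from WinDbg: a comment must not contain `--`, so if the debugger prints that sequence between events, an XML parser will reject the file until the offending text is removed.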

Now comes the tooling. In my scenario I had around 1400 events, a little tedious to analyze all by hand.

So I created "Volkers RefCount Buster" which does the following:

1.) After loading the file (enter the path in the first text box and press Start), each event is identified as either AddRef or Release.

2.) Then the call stack is taken to group the events:

First, stack frames that match the pattern entered in the second text box (exclude pattern) are filtered out:

var includeQuery = from frame in this.StackFrames.Frames
                   where String.IsNullOrEmpty(ExcludePattern) || !excludePattern.IsMatch(frame)
                   select frame;

Then the remaining frames are searched for the selection pattern:

var selectionQuery = from frame in includeQuery
                     let match = selectionPattern.Match(frame)
                     where match.Success
                     select match.Value;

and the topmost match is taken:

string sourceGroup = selectionQuery.FirstOrDefault();

Then all events are grouped by the source group found:

var ResultQuery = from refCountEvent in refCountEvents
                  group refCountEvent by refCountEvent.SourceGroup into g
                  select g;

Then the number of AddRefs and Releases is calculated for each group and accumulated:

foreach (var ResultSet in ResultQuery)
{
    long numOfAddRefs = ResultSet.Count(rce => rce.RefCountType == EventType.AddRef);
    long numOfReleases = ResultSet.Count(rce => rce.RefCountType == EventType.Release);
    long balance = numOfAddRefs - numOfReleases;
    ...
}
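The post does not say how the tool decides whether an event is an AddRef or a Release. The following C++ sketch assumes it compares each event's Ref value with the previous one, and then reproduces the exclude, select, group and balance steps of the C# snippets in plain C++. All names (`RefCountEvent`, `classifyEvents`, `findSourceGroup`, `balanceByGroup`) are hypothetical and not taken from the VolkersRefCountBuster sources; `initialRef` is assumed to be the count at the moment the data breakpoint was set.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <string>
#include <vector>

enum class EventType { Unknown, AddRef, Release };

// Hypothetical event model: ref count after the change plus the "k" output,
// topmost frame first.
struct RefCountEvent {
    long ref = 0;
    std::vector<std::string> frames;
    EventType type = EventType::Unknown;
    std::string sourceGroup;
};

struct GroupBalance {
    long addRefs = 0;
    long releases = 0;
    long balance() const { return addRefs - releases; }
};

// Assumption: the count rising means AddRef, falling means Release.
void classifyEvents(std::vector<RefCountEvent>& events, long initialRef) {
    long previous = initialRef;
    for (auto& e : events) {
        if (e.ref > previous) e.type = EventType::AddRef;
        else if (e.ref < previous) e.type = EventType::Release;
        previous = e.ref;
    }
}

// Skip frames matching the exclude pattern (if one is given), then return the
// first selection match, searching from the top of the stack.
std::string findSourceGroup(const std::vector<std::string>& frames,
                            const std::string& excludePattern,
                            const std::regex& selectionPattern) {
    const bool useExclude = !excludePattern.empty();
    const std::regex exclude = useExclude ? std::regex(excludePattern) : std::regex();
    for (const auto& frame : frames) {
        if (useExclude && std::regex_search(frame, exclude)) continue;
        std::smatch match;
        if (std::regex_search(frame, match, selectionPattern)) return match.str();
    }
    return "";  // no frame matched, like FirstOrDefault() returning null
}

// Assigns each event to its source group and counts AddRefs/Releases per group.
std::map<std::string, GroupBalance> balanceByGroup(std::vector<RefCountEvent>& events,
                                                   const std::string& excludePattern,
                                                   const std::regex& selectionPattern) {
    std::map<std::string, GroupBalance> groups;
    for (auto& e : events) {
        e.sourceGroup = findSourceGroup(e.frames, excludePattern, selectionPattern);
        GroupBalance& g = groups[e.sourceGroup];
        if (e.type == EventType::AddRef) ++g.addRefs;
        else if (e.type == EventType::Release) ++g.releases;
    }
    return groups;
}
```

With an exclude pattern like `MyServer!` and a selection pattern like `Client!\w+::\w+`, each event lands in the group of the first client method on its stack. A group whose balance stays positive after the scenario has finished is a candidate for scenario 3, a negative one for scenario 2. Note that `std::regex` throws `std::regex_error` for an invalid pattern, and the exclude regex is rebuilt for every event, which is acceptable for a log of a few thousand events.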

Now it's up to you to find the exclude pattern and selection pattern that point you directly to the component or file that is the culprit. Then you just need to look at the stacks that belong to that bad guy, and you will also be able to see the source line that caused the problem.

VolkersRefCountBuster

You can download the sources and binaries here...

Have fun,

Volker

Thursday, May 08, 2008

Windbg help is online...

Please correct me if I'm wrong, but until some months ago there was no online help for Debugging Tools for Windows, neither on MSDN nor anywhere else.

Now Microsoft has put the (very good and helpful!) help online.

Have fun...

Tuesday, May 06, 2008

Windbg version 6.9.3.113 is out (I'm still keeping 6.7.5.0)

Microsoft has released a new version of Debugging Tools for Windows. As usual, the installation bits can be grabbed here...

Unfortunately, the integrated managed debugging that accidentally came in with version 6.7.5.0 is still missing :-(

Anyway, here's the list of changes:

Highlights in Version 6.9.3.113

In this release, you will find better performance on systems with greater memory as well as those with large CPU counts. Better performance and reliability on transport initialization. Enhancements to dt, sx, z, !defwrites, !sysinfo, !gflags, Symsrv, as well as several others. For further details, please read the RELNOTES.TXT provided in the package.

Taken from the installation relnotes.txt:

* Fix kd to function properly when debugging 256 processor machine.
* Fix windbg window dragging performance problems when running under AERO
on Vista.
* Alert the user when a debug transport is already opened by another
instance of the debugger.
* Only attempt driver install after opening the transport fails with file
not found.
* Add /LARGEADDRESSAWARE to debugger executables (cdb/kd/ntsd/windbg).
* Update vmdemux tool
* Fix pdbcopy.exe tool
* "dt" would display enumerant symbolic names for enumeration-typed bit-
field members.
* Make "dt" member field match case-insensitive.
* Support wildcard module name in "dt" command. For example, "dt
adv!*RegQuery*".
* "dt/dv" would output more information indicating that this is an empty
string (the default display "") or this is memory read failure (new output
"--- memory read error at address ...").
* "sx? ud" commands can now use an image name (for example, ntdll.dll) as
well as a module name (ntdll).
* Fix ".dbgdbg" command failure when debugger is installed in a directory
that contains spaces (for example, "c:\Program Files\Debuggers").
* Fix "z" command loop counter reset problem.
* Debugger extension would be loaded using LOAD_WITH_ALTERED_SEARCH_PATH
so that dependent binaries could be loaded from the same directory where
loaded extension resides (and the directory is not part of search path).
* "!defwrites" (in kdexts.dll) will not query nt!MmThrottleTop and
nt!MmThrottleBottom values in Windows Vista.
* Fix for "!sysinfo cpuinfo".
* Fix for "!sysinfo gbl" infinite loop problem.
* Fix for DisplayFlags() has output string buffer overrun when using
"!handle" (in ntsdexts.dll).
* Fix for "!gflags" command.
* Fix so that "fltkd" and "boot" debugger extensions run on pre-Win7 OS.
* Fix Symstore/SymChk improved detection for resource-only binaries
* Continued on-going improvements to !analyze

Good to know that you can have multiple versions of WinDbg on the same machine by simply xcopying it. I hope that someday there's no need to write: "I'm still keeping 6.7.5.0".