Skip to content

Fix DefaultCOMImpl reference counting bugs in DAC#125231

Open
max-charlamb wants to merge 3 commits intodotnet:mainfrom
max-charlamb:dev/maxcharlamb/runtime-dac-ref-leak-check
Open

Fix DefaultCOMImpl reference counting bugs in DAC#125231
max-charlamb wants to merge 3 commits intodotnet:mainfrom
max-charlamb:dev/maxcharlamb/runtime-dac-ref-leak-check

Conversation

@max-charlamb
Copy link
Member

@max-charlamb max-charlamb commented Mar 5, 2026

Summary

Fix two reference counting bugs in the legacy DAC's DefaultCOMImpl<T> template class, and remove the corresponding compat workaround in the cDAC.

Sibling PRs:

dotnet/diagnostics#5746
microsoft/clrmd#1398

Bug 1: Release() uses post-decrement (dacimpl.h)

DefaultCOMImpl::Release() used post-decrement (mRef--) instead of pre-decrement (--mRef):

// Before (bug):
ULONG res = mRef--;  // captures value BEFORE decrement
if (res == 0)        // never true when mRef was 1
    delete this;     // object is never freed

Per the IUnknown::Release contract, Release must return the new reference count and free the object when it reaches 0. The post-decrement meant objects were never freed — a memory leak affecting all DefaultCOMImpl-derived classes (DacHandleWalker, DacStackReferenceWalker, DacMemoryEnumerator subclasses, DacMethodTableSlotEnumerator, DacStackReferenceErrorEnum).

Bug 2: DacMethodTableSlotEnumerator missing QueryInterface (request.cpp)

GetMethodTableSlotEnumerator returned the object via raw pointer assignment without calling QueryInterface/AddRef, leaving mRef at 0:

// Before (bug):
*enumerator = methodTableSlotEnumerator;  // mRef stays 0

Every other DefaultCOMImpl subclass correctly uses QueryInterface before returning, which calls AddRef to give the caller an owning reference. Fixed to match that pattern.

cDAC compat removal (SOSDacImpl.cs)

The cDAC's GetHandleEnum and GetHandleEnumForTypes previously called ComInterfaceMarshaller.ConvertToUnmanaged to intentionally leak a ref count, matching the legacy DAC's broken behavior. Now that the legacy bug is fixed, this compat workaround is removed.

Version bump

Bumps CLRDATA_REQUEST_REVISION from 9 to 10 so that ClrMD can detect the fixed ref counting behavior via IXCLRDataProcess::Request.

Copilot AI review requested due to automatic review settings March 5, 2026 18:00
@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @steveisok, @tommcdon, @dotnet/dotnet-diag
See info in area-owners.md if you want to be subscribed.

@rcj1
Copy link
Contributor

rcj1 commented Mar 5, 2026

if we're ok with making this a breaking change then lgtm.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@max-charlamb max-charlamb force-pushed the dev/maxcharlamb/runtime-dac-ref-leak-check branch from 6e1f88e to ca6e0fa Compare March 5, 2026 19:30
max-charlamb pushed a commit to max-charlamb/diagnostics that referenced this pull request Mar 5, 2026
- Fix SOSMethodEnum simulator to use standard COM ref counting:
  initialize refCount to 0 and use QueryInterface to return the
  interface pointer with a proper AddRef.
- Fix ISOSMethodEnum leak in DumpMT by using ToRelease<>.

Matching PR: dotnet/runtime#125231

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
max-charlamb pushed a commit to max-charlamb/diagnostics that referenced this pull request Mar 5, 2026
…_CHANGE_VERSION

- Bump SOS_BREAKING_CHANGE_VERSION from 5 to 6 to match runtime.
- Fix SOSMethodEnum simulator to use standard COM ref counting:
  initialize refCount to 0 and use QueryInterface to return the
  interface pointer with a proper AddRef.
- Fix ISOSMethodEnum leak in DumpMT by using ToRelease<>.

Matching PR: dotnet/runtime#125231

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 5, 2026 19:44
@max-charlamb max-charlamb force-pushed the dev/maxcharlamb/runtime-dac-ref-leak-check branch from ca6e0fa to 86b1a68 Compare March 5, 2026 19:44
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@max-charlamb max-charlamb requested a review from Copilot March 5, 2026 20:01
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

max-charlamb pushed a commit to max-charlamb/diagnostics that referenced this pull request Mar 5, 2026
…_CHANGE_VERSION

- Bump SOS_BREAKING_CHANGE_VERSION from 5 to 6 to match runtime.
- Fix SOSMethodEnum simulator to use standard COM ref counting:
  initialize refCount to 0 and use QueryInterface to return the
  interface pointer with a proper AddRef.
- Fix ISOSMethodEnum leak in DumpMT by using ToRelease<>.

Matching PR: dotnet/runtime#125231

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
max-charlamb added a commit to microsoft/clrmd that referenced this pull request Mar 5, 2026
Adapts ClrMD to handle the DAC ref-counting fix in dotnet/runtime#125231.

Adds SosDac9 wrapper for ISOSDacInterface9::GetBreakingChangeVersion(), and extracts a ReleaseLeakedDacRef() helper that skips the extra Release() when BreakingChangeVersion >= 6
Copy link
Member

@jkoritzinsky jkoritzinsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM

hoyosjs pushed a commit to dotnet/diagnostics that referenced this pull request Mar 5, 2026
…_CHANGE_VERSION (#5746)

## Summary

Matching PR for dotnet/runtime#125231, which fixes
`DefaultCOMImpl::Release()` reference counting bugs in the DAC.

### Changes

**1. Bump `SOS_BREAKING_CHANGE_VERSION` from 5 to 6** (`sospriv.h`,
`sospriv.idl`)

The runtime PR bumps this version. Without updating SOS's copy, users
would see a spurious "SOS needs to be upgraded" warning when debugging
newer runtimes.

**2. Fix `SOSMethodEnum` simulator COM ref counting** (`util.cpp`)

The `SOSDacInterface15` simulator's `GetMethodTableSlotEnumerator` had
the same bug pattern fixed in the runtime PR — raw pointer assignment
without `QueryInterface`, and the null check occurred after the
dereference. Fixed to:
- Initialize `refCount` to 0 (matching `DefaultCOMImpl` convention)
- Use `QueryInterface` to return the interface, properly calling
`AddRef`
- Check for null before dereferencing the out parameter
- Use `delete` on failure (since `QueryInterface` wasn't called at
refCount 0)

**3. Fix `ISOSMethodEnum` leak in `strike.cpp`**

Changed raw `ISOSMethodEnum*` to `ToRelease<ISOSMethodEnum>` so
`Release()` is called when the pointer goes out of scope, matching the
pattern used everywhere else (e.g., `ToRelease<ISOSHandleEnum>`).

Co-authored-by: Max Charlamb <maxcharlamb@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
// stress log structs (StressMsg, StressLogChunck, ThreadStressLog, etc), exception
// stack traces (StackTraceElement), the PredefinedTlsSlots enums, etc.
cpp_quote("#define SOS_BREAKING_CHANGE_VERSION 5")
cpp_quote("#define SOS_BREAKING_CHANGE_VERSION 6")
Copy link
Member

@jkotas jkotas Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SOS_BREAKING_CHANGE_VERSION is meant to track what the comment says. It is not meant to track fixes for memory leaks, etc

What is going to happen if we do not bump SOS_BREAKING_CHANGE_VERSION as part of this change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a mechanism to determine if this was fixed as ClrMD double frees this COM object to close the memory leak.

When ClrMD sees the new version, it should only free the object once.

Copy link
Member

@jkotas jkotas Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can ClrMD check the runtime version (e.g. DAC file version) instead?

Copy link
Member

@jkotas jkotas Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering whether it is worth the pain to fix this at all. These legacy DAC interfaces are broken in so many ways, I hope we will just get rid of them (archive them) at some point in future. Trying to fix them is a distraction.

Copy link
Member Author

@max-charlamb max-charlamb Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm only fixing this because it impacts the cDAC. Due to ClrMd's patch we need to leak a ref on a number of APIs which causes a memory leak for other users of the API.

I'd prefer to fix this it simplifies the cDAC and fixes existing memory leaks in SOS.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be acceptable revert the breaking change bump and instead gate this behind checking net11.0 in ClrMD? @leculver

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SOS breaking change version comes from user provided dump in cDAC. The user provided dump should not be able to influence whether the DAC and its consumers end up leaking or double-freeing memory. It is broken security-wise.

This breaking change will require everybody to update ClrMd to the fixed version for .NET 11. I think the cleanest way to fix this would be to rev the GUIDs on the affected interfaces. The existing ClrMD versions are going to fail cleanly in that case (vs. likely inscrutable crash with the current fix). The new ClrMD version that knows about the fixed interfaces can do the right thing for both old and new.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does rev the GUID mean? Change the COM interface GUID (effectively making it a different API)? The alternative existing versioning mechanism is CLRDATA_REQUEST_REVISION on the ClrDataAccess::Request API.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The alternative existing versioning mechanism is CLRDATA_REQUEST_REVISION on the ClrDataAccess::Request API.

I did not know about this. Yes, this would work.

Copilot AI review requested due to automatic review settings March 6, 2026 16:11
@max-charlamb max-charlamb force-pushed the dev/maxcharlamb/runtime-dac-ref-leak-check branch from 0a7fa6f to d70f388 Compare March 6, 2026 16:11
@max-charlamb max-charlamb force-pushed the dev/maxcharlamb/runtime-dac-ref-leak-check branch from d70f388 to 54114f4 Compare March 6, 2026 16:12
max-charlamb pushed a commit to max-charlamb/clrmd that referenced this pull request Mar 6, 2026
The DAC's DefaultCOMImpl::Release() used post-decrement (mRef--)
instead of pre-decrement (--mRef), leaking all DefaultCOMImpl-derived
COM objects. ClrMD compensated with an extra Release() after wrapping
enumerators. With the fix in dotnet/runtime#125231
(CLRDATA_REQUEST_REVISION bumped to 10), the extra Release() would
prematurely free the object.

Changes:
- DacRuntime: Accept DAC version 9 and 10, store version on SOSDac
- SOSDac: Add DacVersion property and ReleaseLeakedDacRef() helper
  that only releases the leaked ref when DacVersion < 10

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
max-charlamb pushed a commit to max-charlamb/diagnostics that referenced this pull request Mar 6, 2026
- Fix SOSMethodEnum simulator to use standard COM ref counting:
  initialize refCount to 0 and use QueryInterface to return the
  interface pointer with a proper AddRef.
- Fix ISOSMethodEnum leak in DumpMT by using ToRelease<>.

Matching PR: dotnet/runtime#125231

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

max-charlamb pushed a commit to max-charlamb/clrmd that referenced this pull request Mar 6, 2026
The DAC's DefaultCOMImpl::Release() used post-decrement (mRef--)
instead of pre-decrement (--mRef), leaking all DefaultCOMImpl-derived
COM objects. ClrMD compensated with an extra Release() after wrapping
enumerators. With the fix in dotnet/runtime#125231
(CLRDATA_REQUEST_REVISION bumped to 10), the extra Release() would
prematurely free the object.

Changes:
- DacRuntime: Accept DAC version 9 and 10, store version on SOSDac
- SOSDac: Add DacVersion property and ReleaseLeakedDacRef() helper
  that only releases the leaked ref when DacVersion < 10

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
max-charlamb pushed a commit to max-charlamb/clrmd that referenced this pull request Mar 6, 2026
The DAC's DefaultCOMImpl::Release() used post-decrement (mRef--)
instead of pre-decrement (--mRef), leaking all DefaultCOMImpl-derived
COM objects. ClrMD compensated with an extra Release() after wrapping
enumerators. With the fix in dotnet/runtime#125231
(CLRDATA_REQUEST_REVISION bumped to 10), the extra Release() would
prematurely free the object.

Changes:
- DacRuntime: Accept DAC version 9 and 10, store version on SOSDac
- SOSDac: Add DacVersion property and ReleaseLeakedDacRef() helper
  that only releases the leaked ref when DacVersion < 10

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Max Charlamb and others added 3 commits March 6, 2026 13:07
Fix two bugs in the legacy DAC's DefaultCOMImpl<T> template class:

1. DefaultCOMImpl::Release() used post-decrement (mRef--) instead of
   pre-decrement (--mRef), causing it to return the old ref count and
   never delete the object when the count reaches zero. This violates
   the IUnknown::Release contract and leaks all DefaultCOMImpl-derived
   objects (DacHandleWalker, DacStackReferenceWalker,
   DacMemoryEnumerator, DacMethodTableSlotEnumerator, etc.).

2. DacMethodTableSlotEnumerator was returned to callers via raw pointer
   assignment without calling QueryInterface/AddRef, leaving mRef at 0.
   All other DefaultCOMImpl subclasses correctly use QueryInterface
   before returning, which calls AddRef to give the caller an owning
   reference.

Bump SOS_BREAKING_CHANGE_VERSION from 5 to 6.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The cDAC's GetHandleEnum and GetHandleEnumForTypes implementations
previously called ComInterfaceMarshaller.ConvertToUnmanaged to
intentionally leak a ref count, matching a bug in the legacy DAC's
DefaultCOMImpl::Release (post-decrement instead of pre-decrement).

Now that the legacy DAC bug is fixed, remove the compat workaround
so the cDAC returns handle enumerators with the correct ref count.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Revert SOS_BREAKING_CHANGE_VERSION back to 5 — per review feedback,
it is meant for structural changes (struct layouts, enums), not for
ref count fixes.

Instead, bump ClrDataAccess::Request CLRDATA_REQUEST_REVISION from
9 to 10. ClrMD queries this via IXCLRDataProcess::Request and can
use it to detect the fixed ref counting behavior.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@max-charlamb max-charlamb force-pushed the dev/maxcharlamb/runtime-dac-ref-leak-check branch from 54114f4 to 2e0b30b Compare March 6, 2026 18:10
hoyosjs pushed a commit to dotnet/diagnostics that referenced this pull request Mar 6, 2026
## Summary

Matching PR for dotnet/runtime#125231, which fixes
`DefaultCOMImpl::Release()` reference counting bugs in the DAC.

### Changes

**1. Fix `SOSMethodEnum` simulator COM ref counting** (`util.cpp`)

The `SOSDacInterface15` simulator's `GetMethodTableSlotEnumerator` had
the same bug pattern fixed in the runtime PR — raw pointer assignment
without `QueryInterface`, and the null check occurred after the
dereference. Fixed to:
- Initialize `refCount` to 0 (matching `DefaultCOMImpl` convention)
- Use `QueryInterface` to return the interface, properly calling
`AddRef`
- Check for null before dereferencing the out parameter
- Use `delete` on failure (since `QueryInterface` wasn't called at
refCount 0)

**2. Fix `ISOSMethodEnum` leak in `strike.cpp`**

Changed raw `ISOSMethodEnum*` to `ToRelease<ISOSMethodEnum>` so
`Release()` is called when the pointer goes out of scope, matching the
pattern used everywhere else (e.g., `ToRelease<ISOSHandleEnum>`).

Co-authored-by: Max Charlamb <maxcharlamb@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
max-charlamb pushed a commit to max-charlamb/clrmd that referenced this pull request Mar 6, 2026
The DAC's DefaultCOMImpl::Release() used post-decrement (mRef--)
instead of pre-decrement (--mRef), leaking all DefaultCOMImpl-derived
COM objects. ClrMD compensated with an extra Release() after wrapping
enumerators. With the fix in dotnet/runtime#125231
(CLRDATA_REQUEST_REVISION bumped to 10), the extra Release() would
prematurely free the object.

Changes:
- DacRuntime: Accept DAC version 9 and 10, store version on SOSDac
- SOSDac: Add DacVersion property and ReleaseLeakedDacRef() helper
  that only releases the leaked ref when DacVersion < 10

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jkotas
Copy link
Member

jkotas commented Mar 7, 2026

I would like to see microsoft/clrmd#1401 (comment) and https://github.com/dotnet/diagnostics/pull/5749/files#r2898744570 resolved to make sure that the end-to-end holds together. Otherwise looks good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants