Fix DefaultCOMImpl reference counting bugs in DAC#125231
Fix DefaultCOMImpl reference counting bugs in DAC#125231max-charlamb wants to merge 3 commits intodotnet:mainfrom
Conversation
|
Tagging subscribers to this area: @steveisok, @tommcdon, @dotnet/dotnet-diag |
|
if we're ok with making this a breaking change then lgtm. |
6e1f88e to
ca6e0fa
Compare
- Fix SOSMethodEnum simulator to use standard COM ref counting: initialize refCount to 0 and use QueryInterface to return the interface pointer with a proper AddRef. - Fix ISOSMethodEnum leak in DumpMT by using ToRelease<>. Matching PR: dotnet/runtime#125231 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…_CHANGE_VERSION - Bump SOS_BREAKING_CHANGE_VERSION from 5 to 6 to match runtime. - Fix SOSMethodEnum simulator to use standard COM ref counting: initialize refCount to 0 and use QueryInterface to return the interface pointer with a proper AddRef. - Fix ISOSMethodEnum leak in DumpMT by using ToRelease<>. Matching PR: dotnet/runtime#125231 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ca6e0fa to
86b1a68
Compare
…_CHANGE_VERSION - Bump SOS_BREAKING_CHANGE_VERSION from 5 to 6 to match runtime. - Fix SOSMethodEnum simulator to use standard COM ref counting: initialize refCount to 0 and use QueryInterface to return the interface pointer with a proper AddRef. - Fix ISOSMethodEnum leak in DumpMT by using ToRelease<>. Matching PR: dotnet/runtime#125231 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adapts ClrMD to handle the DAC ref-counting fix in dotnet/runtime#125231. Adds SosDac9 wrapper for ISOSDacInterface9::GetBreakingChangeVersion(), and extracts a ReleaseLeakedDacRef() helper that skips the extra Release() when BreakingChangeVersion >= 6
…_CHANGE_VERSION (#5746) ## Summary Matching PR for dotnet/runtime#125231, which fixes `DefaultCOMImpl::Release()` reference counting bugs in the DAC. ### Changes **1. Bump `SOS_BREAKING_CHANGE_VERSION` from 5 to 6** (`sospriv.h`, `sospriv.idl`) The runtime PR bumps this version. Without updating SOS's copy, users would see a spurious "SOS needs to be upgraded" warning when debugging newer runtimes. **2. Fix `SOSMethodEnum` simulator COM ref counting** (`util.cpp`) The `SOSDacInterface15` simulator's `GetMethodTableSlotEnumerator` had the same bug pattern fixed in the runtime PR — raw pointer assignment without `QueryInterface`, and the null check occurred after the dereference. Fixed to: - Initialize `refCount` to 0 (matching `DefaultCOMImpl` convention) - Use `QueryInterface` to return the interface, properly calling `AddRef` - Check for null before dereferencing the out parameter - Use `delete` on failure (since `QueryInterface` wasn't called at refCount 0) **3. Fix `ISOSMethodEnum` leak in `strike.cpp`** Changed raw `ISOSMethodEnum*` to `ToRelease<ISOSMethodEnum>` so `Release()` is called when the pointer goes out of scope, matching the pattern used everywhere else (e.g., `ToRelease<ISOSHandleEnum>`). Co-authored-by: Max Charlamb <maxcharlamb@microsoft.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
src/coreclr/inc/sospriv.idl
Outdated
| // stress log structs (StressMsg, StressLogChunck, ThreadStressLog, etc), exception | ||
| // stack traces (StackTraceElement), the PredefinedTlsSlots enums, etc. | ||
| cpp_quote("#define SOS_BREAKING_CHANGE_VERSION 5") | ||
| cpp_quote("#define SOS_BREAKING_CHANGE_VERSION 6") |
There was a problem hiding this comment.
SOS_BREAKING_CHANGE_VERSION is meant to track what the comment says. It is not meant to track fixes for memory leaks, etc
What is going to happen if we do not bump SOS_BREAKING_CHANGE_VERSION as part of this change?
There was a problem hiding this comment.
We need a mechanism to determine if this was fixed as ClrMD double frees this COM object to close the memory leak.
When ClrMD sees the new version, it should only free the object once.
There was a problem hiding this comment.
Can ClrMD check the runtime version (e.g. DAC file version) instead?
There was a problem hiding this comment.
I am wondering whether it is worth the pain to fix this at all. These legacy DAC interfaces are broken in so many ways, I hope we will just get rid of them (archive them) at some point in future. Trying to fix them is a distraction.
There was a problem hiding this comment.
I'm only fixing this because it impacts the cDAC. Due to ClrMd's patch we need to leak a ref on a number of APIs which causes a memory leak for other users of the API.
I'd prefer to fix this it simplifies the cDAC and fixes existing memory leaks in SOS.
There was a problem hiding this comment.
Would it be acceptable revert the breaking change bump and instead gate this behind checking net11.0 in ClrMD? @leculver
There was a problem hiding this comment.
SOS breaking change version comes from user provided dump in cDAC. The user provided dump should not be able to influence whether the DAC and its consumers end up leaking or double-freeing memory. It is broken security-wise.
This breaking change will require everybody to update ClrMd to the fixed version for .NET 11. I think the cleanest way to fix this would be to rev the GUIDs on the affected interfaces. The existing ClrMD versions are going to fail cleanly in that case (vs. likely inscrutable crash with the current fix). The new ClrMD version that knows about the fixed interfaces can do the right thing for both old and new.
There was a problem hiding this comment.
What does rev the GUID mean? Change the COM interface GUID (effectively making it a different API)? The alternative existing versioning mechanism is CLRDATA_REQUEST_REVISION on the ClrDataAccess::Request API.
There was a problem hiding this comment.
The alternative existing versioning mechanism is CLRDATA_REQUEST_REVISION on the ClrDataAccess::Request API.
I did not know about this. Yes, this would work.
0a7fa6f to
d70f388
Compare
d70f388 to
54114f4
Compare
The DAC's DefaultCOMImpl::Release() used post-decrement (mRef--) instead of pre-decrement (--mRef), leaking all DefaultCOMImpl-derived COM objects. ClrMD compensated with an extra Release() after wrapping enumerators. With the fix in dotnet/runtime#125231 (CLRDATA_REQUEST_REVISION bumped to 10), the extra Release() would prematurely free the object. Changes: - DacRuntime: Accept DAC version 9 and 10, store version on SOSDac - SOSDac: Add DacVersion property and ReleaseLeakedDacRef() helper that only releases the leaked ref when DacVersion < 10 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Fix SOSMethodEnum simulator to use standard COM ref counting: initialize refCount to 0 and use QueryInterface to return the interface pointer with a proper AddRef. - Fix ISOSMethodEnum leak in DumpMT by using ToRelease<>. Matching PR: dotnet/runtime#125231 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The DAC's DefaultCOMImpl::Release() used post-decrement (mRef--) instead of pre-decrement (--mRef), leaking all DefaultCOMImpl-derived COM objects. ClrMD compensated with an extra Release() after wrapping enumerators. With the fix in dotnet/runtime#125231 (CLRDATA_REQUEST_REVISION bumped to 10), the extra Release() would prematurely free the object. Changes: - DacRuntime: Accept DAC version 9 and 10, store version on SOSDac - SOSDac: Add DacVersion property and ReleaseLeakedDacRef() helper that only releases the leaked ref when DacVersion < 10 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The DAC's DefaultCOMImpl::Release() used post-decrement (mRef--) instead of pre-decrement (--mRef), leaking all DefaultCOMImpl-derived COM objects. ClrMD compensated with an extra Release() after wrapping enumerators. With the fix in dotnet/runtime#125231 (CLRDATA_REQUEST_REVISION bumped to 10), the extra Release() would prematurely free the object. Changes: - DacRuntime: Accept DAC version 9 and 10, store version on SOSDac - SOSDac: Add DacVersion property and ReleaseLeakedDacRef() helper that only releases the leaked ref when DacVersion < 10 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Fix two bugs in the legacy DAC's DefaultCOMImpl<T> template class: 1. DefaultCOMImpl::Release() used post-decrement (mRef--) instead of pre-decrement (--mRef), causing it to return the old ref count and never delete the object when the count reaches zero. This violates the IUnknown::Release contract and leaks all DefaultCOMImpl-derived objects (DacHandleWalker, DacStackReferenceWalker, DacMemoryEnumerator, DacMethodTableSlotEnumerator, etc.). 2. DacMethodTableSlotEnumerator was returned to callers via raw pointer assignment without calling QueryInterface/AddRef, leaving mRef at 0. All other DefaultCOMImpl subclasses correctly use QueryInterface before returning, which calls AddRef to give the caller an owning reference. Bump SOS_BREAKING_CHANGE_VERSION from 5 to 6. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The cDAC's GetHandleEnum and GetHandleEnumForTypes implementations previously called ComInterfaceMarshaller.ConvertToUnmanaged to intentionally leak a ref count, matching a bug in the legacy DAC's DefaultCOMImpl::Release (post-decrement instead of pre-decrement). Now that the legacy DAC bug is fixed, remove the compat workaround so the cDAC returns handle enumerators with the correct ref count. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Revert SOS_BREAKING_CHANGE_VERSION back to 5 — per review feedback, it is meant for structural changes (struct layouts, enums), not for ref count fixes. Instead, bump ClrDataAccess::Request CLRDATA_REQUEST_REVISION from 9 to 10. ClrMD queries this via IXCLRDataProcess::Request and can use it to detect the fixed ref counting behavior. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
54114f4 to
2e0b30b
Compare
## Summary Matching PR for dotnet/runtime#125231, which fixes `DefaultCOMImpl::Release()` reference counting bugs in the DAC. ### Changes **1. Fix `SOSMethodEnum` simulator COM ref counting** (`util.cpp`) The `SOSDacInterface15` simulator's `GetMethodTableSlotEnumerator` had the same bug pattern fixed in the runtime PR — raw pointer assignment without `QueryInterface`, and the null check occurred after the dereference. Fixed to: - Initialize `refCount` to 0 (matching `DefaultCOMImpl` convention) - Use `QueryInterface` to return the interface, properly calling `AddRef` - Check for null before dereferencing the out parameter - Use `delete` on failure (since `QueryInterface` wasn't called at refCount 0) **2. Fix `ISOSMethodEnum` leak in `strike.cpp`** Changed raw `ISOSMethodEnum*` to `ToRelease<ISOSMethodEnum>` so `Release()` is called when the pointer goes out of scope, matching the pattern used everywhere else (e.g., `ToRelease<ISOSHandleEnum>`). Co-authored-by: Max Charlamb <maxcharlamb@microsoft.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The DAC's DefaultCOMImpl::Release() used post-decrement (mRef--) instead of pre-decrement (--mRef), leaking all DefaultCOMImpl-derived COM objects. ClrMD compensated with an extra Release() after wrapping enumerators. With the fix in dotnet/runtime#125231 (CLRDATA_REQUEST_REVISION bumped to 10), the extra Release() would prematurely free the object. Changes: - DacRuntime: Accept DAC version 9 and 10, store version on SOSDac - SOSDac: Add DacVersion property and ReleaseLeakedDacRef() helper that only releases the leaked ref when DacVersion < 10 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
I would like to see microsoft/clrmd#1401 (comment) and https://github.com/dotnet/diagnostics/pull/5749/files#r2898744570 resolved to make sure that the end-to-end holds together. Otherwise looks good. |
Summary
Fix two reference counting bugs in the legacy DAC's
DefaultCOMImpl<T>template class, and remove the corresponding compat workaround in the cDAC.Sibling PRs:
dotnet/diagnostics#5746
microsoft/clrmd#1398
Bug 1:
Release()uses post-decrement (dacimpl.h)DefaultCOMImpl::Release()used post-decrement (mRef--) instead of pre-decrement (--mRef):Per the IUnknown::Release contract,
Releasemust return the new reference count and free the object when it reaches 0. The post-decrement meant objects were never freed — a memory leak affecting allDefaultCOMImpl-derived classes (DacHandleWalker,DacStackReferenceWalker,DacMemoryEnumeratorsubclasses,DacMethodTableSlotEnumerator,DacStackReferenceErrorEnum).Bug 2:
DacMethodTableSlotEnumeratormissingQueryInterface(request.cpp)GetMethodTableSlotEnumeratorreturned the object via raw pointer assignment without callingQueryInterface/AddRef, leavingmRefat 0:Every other
DefaultCOMImplsubclass correctly usesQueryInterfacebefore returning, which callsAddRefto give the caller an owning reference. Fixed to match that pattern.cDAC compat removal (SOSDacImpl.cs)
The cDAC's
GetHandleEnumandGetHandleEnumForTypespreviously calledComInterfaceMarshaller.ConvertToUnmanagedto intentionally leak a ref count, matching the legacy DAC's broken behavior. Now that the legacy bug is fixed, this compat workaround is removed.Version bump
Bumps
CLRDATA_REQUEST_REVISIONfrom 9 to 10 so that ClrMD can detect the fixed ref counting behavior viaIXCLRDataProcess::Request.