
[cDAC] Stack walk GC reference scanning and bug fixes (1/5) #127395

Open
max-charlamb wants to merge 1 commit into dotnet:main from max-charlamb:cdac-stackrefs-pr1

Conversation

@max-charlamb
Member

Summary

Part 1 of 5 stacked PRs splitting #126408 into reviewable pieces.

What this PR contains

  • Stack walker fixes (IsFirst preserved for skipped frames, UpdateState refactor)
  • PromoteCallerStack / PromoteCallerStackUsingGCRefMap for transition frames
  • GCRefMapDecoder + FindGCRefMap with ReadyToRun import section resolution
  • SOSDacImpl.GetStackReferences using cDAC contract
  • GCInfoDecoder.EnumerateLiveSlots goto removal
  • GcSignatureTypeProvider for GC type classification
  • Data descriptor additions (TransitionBlock, ExceptionInfo, frame types)
  • RequiresInstArg, IsAsyncMethod, HasRetBuffArg on IRuntimeTypeSystem

Stack overview

| PR | Content | Status |
| --- | --- | --- |
| → This PR | Stack walk fixes + GC scanning | 🔄 |
| PR 2 | RuntimeSignatureDecoder (ELEMENT_TYPE_INTERNAL) | Pending |
| PR 3 | ArgIterator port from crossgen2 | Pending |
| PR 4 | Native stress framework (cdacstress.cpp) | Pending |
| PR 5 | Managed stress tests + CI pipeline | Pending |

Testing

  • 1707/1707 unit tests pass ✅
  • Dump tests (StackWalkDumpTests, StackReferenceDumpTests) validate end-to-end

Note

This PR description was created with AI assistance from Copilot.

Implement GC reference scanning for stub/transition frames and fix
stack walker state machine bugs:

- PromoteCallerStack/PromoteCallerStackUsingGCRefMap for transition frames
- GCRefMap decoder for ReadyToRun import section resolution
- FindGCRefMap with FindReadyToRunModule fallback
- SOSDacImpl.GetStackReferences using cDAC contract
- Fix IsFirst preserved for skipped frames
- Fix skipped frame handling moved to UpdateState
- GCInfoDecoder goto removal (ReportUntrackedAndSucceed local function)
- RequiresInstArg, IsAsyncMethod, HasRetBuffArg on IRuntimeTypeSystem
- ExceptionInfo ClauseForCatch fields for catch handler detection
- Data descriptor additions for frame types and TransitionBlock layout

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
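
The goto removal listed in the commit message can be pictured with a small, self-contained sketch. Only the local function name `ReportUntrackedAndSucceed` comes from the commit message; the `LiveSlotSketch` class, its parameters, and the slot lists are hypothetical stand-ins, not the actual `GCInfoDecoder` code. It shows the general refactor: a shared "report untracked slots, then succeed" tail that would otherwise be reached through a `goto` label becomes a local function returned from each exit.

```csharp
using System;
using System.Collections.Generic;

public static class LiveSlotSketch
{
    // Hypothetical enumerator: tracked slots are reported only when the
    // offset is interruptible; untracked slots are reported on every
    // successful exit.
    public static bool EnumerateLiveSlots(
        bool isInterruptible,
        IReadOnlyList<int> trackedSlots,
        IReadOnlyList<int> untrackedSlots,
        Action<int> report)
    {
        if (!isInterruptible)
        {
            // A goto-based version would jump to a shared label here.
            return ReportUntrackedAndSucceed();
        }

        foreach (int slot in trackedSlots)
        {
            report(slot);
        }

        return ReportUntrackedAndSucceed();

        // Local function holding the shared tail of the method.
        bool ReportUntrackedAndSucceed()
        {
            foreach (int slot in untrackedSlots)
            {
                report(slot);
            }
            return true;
        }
    }
}
```

Because the local function captures `untrackedSlots` and `report`, each early exit stays a single `return` statement and no label is needed.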
@dotnet-policy-service
Contributor

Tagging subscribers to this area: @steveisok, @tommcdon, @dotnet/dotnet-diag
See info in area-owners.md if you want to be subscribed.

Contributor

Copilot AI left a comment


Pull request overview

Implements the first slice of cDAC stack-walk GC reference scanning, extending data descriptors/contracts and wiring SOSDacImpl.GetStackReferences to use the cDAC StackWalk contract (including transition-frame root scanning via GCRefMap and signature-based fallback).

Changes:

  • Add/extend cDAC data descriptors and contract data types for transition frames, ReadyToRun import sections, and exception clause ranges needed for stack GC root enumeration.
  • Refactor StackWalk filtering/state transitions and implement managed/live-slot + frame-based GC root scanning paths.
  • Extend contract APIs (ExecutionManager/GCInfo/RuntimeTypeSystem) and update solution/test project structure to include StressTests.

Reviewed changes

Copilot reviewed 31 out of 31 changed files in this pull request and generated 4 comments.

Show a summary per file
| File | Description |
| --- | --- |
| src/native/managed/cdac/tests/MockDescriptors/MockDescriptors.ExecutionManager.cs | Updates mock ReadyToRunInfo descriptor layout with import section fields. |
| src/native/managed/cdac/tests/Microsoft.Diagnostics.DataContractReader.Tests.csproj | Treats StressTests as a separate test project and excludes it from unit-test compilation. |
| src/native/managed/cdac/cdac.slnx | Adds StressTests project to the cdac solution filter. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Legacy/SOSDacImpl.cs | Implements GetStackReferences using cDAC StackWalk contract, with DEBUG cross-check. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Data/ReadyToRunInfo.cs | Reads/imports R2R import section pointer + count into the contract data. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Data/Frames/TransitionBlock.cs | Exposes descriptor-provided layout offsets used for GCRefMap/arg-reg scanning. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Data/Frames/StubDispatchFrame.cs | Adds GCRefMap + lazy-resolution fields for stub dispatch frame root scanning. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Data/Frames/ExternalMethodFrame.cs | New contract data type for ExternalMethodFrame GCRefMap-based scanning. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Data/Frames/DynamicHelperFrame.cs | New contract data type for DynamicHelperFrame flag-based scanning. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Data/ExceptionInfo.cs | Adds catch-handler clause range fields for interruptible-offset override logic. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StackWalk/StackWalk_1.cs | Refactors stack-walk filtering/Next integration and adds GC root enumeration behaviors. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StackWalk/GC/GcSignatureTypeProvider.cs | Adds signature-type classifier used by signature decoding for GC scanning. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StackWalk/GC/GCRefMapDecoder.cs | Adds managed decoder for the compact GCRefMap bitstream. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StackWalk/FrameHandling/FrameIterator.cs | Adds return-address retrieval and frame-type-specific GC root scanning (GCRefMap/MetaSig paths). |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/RuntimeTypeSystem_1.cs | Implements RequiresInstArg and IsAsyncMethod for TransitionFrame argument layout decisions. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/IGCInfoDecoder.cs | Extends decoder surface to expose interruptible ranges. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/GCInfo_1.cs | Plumbs GetInterruptibleRanges through the GCInfo contract. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/GCInfoDecoder.cs | Removes goto, fixes untracked-slot reporting behavior, and keeps decoding accessible. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/ExecutionManager/ExecutionManager_2.cs | Exposes FindReadyToRunModule in ExecutionManager v2 contract. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/ExecutionManager/ExecutionManager_1.cs | Exposes FindReadyToRunModule in ExecutionManager v1 contract. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/ExecutionManager/ExecutionManagerCore.cs | Implements R2R module resolution via RangeSection lookup. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/DataType.cs | Adds DataType entries for new frame data types. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/Contracts/IRuntimeTypeSystem.cs | Adds RequiresInstArg and IsAsyncMethod to abstractions interface. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/Contracts/IGCInfo.cs | Adds InterruptibleRange and GetInterruptibleRanges to the public abstraction contract. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/Contracts/IExecutionManager.cs | Adds FindReadyToRunModule to the abstraction contract. |
| src/coreclr/vm/readytoruninfo.h | Adds cdac_data offsets for import sections in ReadyToRunInfo. |
| src/coreclr/vm/frames.h | Adds cdac_data for ExternalMethodFrame/DynamicHelperFrame and fields for StubDispatchFrame GCRefMap resolution. |
| src/coreclr/vm/datadescriptor/datadescriptor.inc | Adds/extends descriptors for new frame fields, transition block offsets, and catch clause ranges. |
| docs/design/datacontracts/StackWalk.md | Documents new descriptors used by stack walking + GC reference scanning. |
| docs/design/datacontracts/RuntimeTypeSystem.md | Documents new RuntimeTypeSystem contract methods. |
| docs/design/datacontracts/GCInfo.md | Documents new GCInfo contract API for interruptible ranges. |

Comment on lines +21 to +25

// These are offsets relative to the TransitionBlock pointer, stored as field "offsets"
// in the data descriptor. They represent computed layout positions, not actual memory reads.
FirstGCRefMapSlot = (uint)type.Fields[nameof(FirstGCRefMapSlot)].Offset;
ArgumentRegistersOffset = (uint)type.Fields[nameof(ArgumentRegistersOffset)].Offset;

Copilot AI Apr 24, 2026


TransitionBlock data descriptor now defines OffsetOfArgs, but the managed TransitionBlock contract type doesn’t expose or initialize a corresponding property. This leaves the contract/data-descriptor/documentation out of sync and makes it harder for callers to consume the new layout information consistently.

Consider adding an OffsetOfArgs property (similar to FirstGCRefMapSlot/ArgumentRegistersOffset) or removing the unused descriptor/doc entry until it’s actually consumed.

Member Author


This will be used in a later PR
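
The code comment quoted at the top of this thread says the TransitionBlock "fields" are really computed layout offsets rather than values read from target memory. A minimal sketch of how such an offset might be consumed follows. The names `TransitionBlockLayoutSketch` and `SlotAddress`, and the dictionary standing in for the data descriptor, are hypothetical. The formula (first slot offset plus slot index times pointer size) is an assumption that ignores platform-specific layouts such as x86.

```csharp
using System;
using System.Collections.Generic;

public static class TransitionBlockLayoutSketch
{
    // fieldOffsets is a hypothetical stand-in for the data descriptor:
    // each "field" maps to a layout offset relative to the TransitionBlock
    // pointer, so nothing is read from target memory here.
    public static ulong SlotAddress(
        IReadOnlyDictionary<string, int> fieldOffsets,
        ulong transitionBlock,
        uint slotIndex,
        int pointerSize)
    {
        uint firstSlot = (uint)fieldOffsets["FirstGCRefMapSlot"];

        // Assumed layout: slots are contiguous, pointer-sized, and start
        // at FirstGCRefMapSlot.
        return transitionBlock + firstSlot + (ulong)slotIndex * (ulong)pointerSize;
    }
}
```

With a hypothetical descriptor of `FirstGCRefMapSlot = 0x10`, slot 2 of a block at `0x1000` on a 64-bit target would be at `0x1020` under these assumptions.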

Comment on lines +59 to +60
public GcTypeKind GetGenericMethodParameter(object? genericContext, int index) => GcTypeKind.Ref;
public GcTypeKind GetGenericTypeParameter(object? genericContext, int index) => GcTypeKind.Ref;

Copilot AI Apr 24, 2026


GetGenericTypeParameter/GetGenericMethodParameter currently classify all generic parameters as GcTypeKind.Ref. In the native implementation (MetaSig + SigTypeContext), generic parameters are resolved against the actual instantiation and can be non-GC (e.g., int, float) or value types with embedded refs. Treating them unconditionally as object refs risks reporting bogus GC roots and producing incorrect/unstable stack reference enumeration.

Consider resolving generic parameters using the method/type instantiation from IRuntimeTypeSystem (or returning GcTypeKind.Other and deferring reporting) until proper instantiation-based classification is implemented.

Suggested change
- public GcTypeKind GetGenericMethodParameter(object? genericContext, int index) => GcTypeKind.Ref;
- public GcTypeKind GetGenericTypeParameter(object? genericContext, int index) => GcTypeKind.Ref;
+ public GcTypeKind GetGenericMethodParameter(object? genericContext, int index) => GcTypeKind.Other;
+ public GcTypeKind GetGenericTypeParameter(object? genericContext, int index) => GcTypeKind.Other;
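
The longer-term option mentioned in this comment, resolving generic parameters against the actual instantiation, could look roughly like the sketch below. Everything here is hypothetical scaffolding: `InstantiationSketch` stands in for whatever instantiation data IRuntimeTypeSystem would provide, and only the `GcTypeKind.Ref` and `GcTypeKind.Other` members are named in the thread (`Interior` is an invented extra). The conservative fallback to `Other` when no instantiation is available mirrors the suggested change.

```csharp
#nullable enable
using System.Collections.Generic;

// Ref and Other appear in the review thread; Interior is a hypothetical member.
public enum GcTypeKind { Other, Ref, Interior }

// Hypothetical stand-in for an already-classified generic instantiation.
public sealed class InstantiationSketch
{
    public InstantiationSketch(
        IReadOnlyList<GcTypeKind> typeArgs,
        IReadOnlyList<GcTypeKind> methodArgs)
    {
        TypeArgs = typeArgs;
        MethodArgs = methodArgs;
    }

    public IReadOnlyList<GcTypeKind> TypeArgs { get; }
    public IReadOnlyList<GcTypeKind> MethodArgs { get; }
}

public static class GenericParamClassifier
{
    public static GcTypeKind GetGenericMethodParameter(InstantiationSketch? context, int index)
        => Resolve(context?.MethodArgs, index);

    public static GcTypeKind GetGenericTypeParameter(InstantiationSketch? context, int index)
        => Resolve(context?.TypeArgs, index);

    // Falls back to Other (report nothing) when the instantiation is
    // missing or the index is out of range, rather than guessing Ref.
    private static GcTypeKind Resolve(IReadOnlyList<GcTypeKind>? args, int index)
        => args is not null && (uint)index < (uint)args.Count
            ? args[index]
            : GcTypeKind.Other;
}
```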

Comment on lines +4085 to +4087
catch (System.Exception)
{
    hr = HResults.E_FAIL;

Copilot AI Apr 24, 2026


The catch block sets hr = E_FAIL for all exceptions, which drops the more specific HRESULTs typically returned elsewhere in this file (most methods use hr = ex.HResult). This can change observable behavior for callers (e.g., E_INVALIDARG, E_POINTER, E_NOTIMPL) and makes DEBUG validation against the legacy DAC less meaningful.

Consider setting hr = ex.HResult (and optionally mapping NotImplementedException/missing-contract cases to E_NOTIMPL) to keep HRESULT semantics consistent with the rest of SOSDacImpl.

Suggested change
- catch (System.Exception)
- {
-     hr = HResults.E_FAIL;
+ catch (System.Exception ex)
+ {
+     hr = ex.HResult;
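
As a standalone illustration of the behavior this comment asks for, the sketch below wraps a body in a try/catch and returns the exception's own HRESULT instead of a blanket E_FAIL. `HResultSketch` and `Run` are hypothetical names, not part of SOSDacImpl. The explicit `NotImplementedException` branch reflects the optional mapping mentioned above, and the E_FAIL fallback for a non-failure `HResult` value is an added assumption.

```csharp
using System;

public static class HResultSketch
{
    public const int S_OK = 0;
    public const int E_NOTIMPL = unchecked((int)0x80004001);
    public const int E_FAIL = unchecked((int)0x80004005);

    // Runs the body and converts any exception into an HRESULT,
    // keeping the specific code carried by the exception.
    public static int Run(Action body)
    {
        try
        {
            body();
            return S_OK;
        }
        catch (NotImplementedException)
        {
            // Optional explicit mapping for missing functionality.
            return E_NOTIMPL;
        }
        catch (Exception ex)
        {
            // Assumption: a non-negative HResult would look like success,
            // so fall back to E_FAIL in that case.
            return ex.HResult < 0 ? ex.HResult : E_FAIL;
        }
    }
}
```

For example, `HResultSketch.Run(() => throw new ArgumentException("bad"))` yields the E_INVALIDARG value that `ArgumentException` carries, which a hard-coded E_FAIL would have hidden.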


/// <summary>
/// Finds the R2R module that contains the given address.
/// Used by FindGCRefMap to resolve m_pZapModule when it's null.
Member

@jkotas jkotas Apr 25, 2026


Can we do this unconditionally and drop ZapModule from the contract? ZapModule seems to be a nice-to-have cache.

(Also, ZapModule can be renamed to Module or ReadyToRunModule. Zap is a very old codename for crossgen/ngen that we have almost eradicated from the codebase.)
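
To make the "do this unconditionally" idea concrete, here is a minimal sketch of resolving the containing module purely from an address, with no cached module pointer involved. It uses a binary search over sorted, non-overlapping ranges as a simple stand-in; the PR itself describes a RangeSection lookup, which this does not reproduce. `ModuleRange`, `R2RModuleLookupSketch`, and `FindModule` are hypothetical names.

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical description of one module's code range.
public readonly record struct ModuleRange(ulong Start, ulong Size, string ModuleName);

public static class R2RModuleLookupSketch
{
    // ranges must be sorted by Start and must not overlap.
    // Returns the owning module name, or null when no range contains the address.
    public static string? FindModule(IReadOnlyList<ModuleRange> ranges, ulong address)
    {
        int lo = 0;
        int hi = ranges.Count - 1;

        while (lo <= hi)
        {
            int mid = lo + (hi - lo) / 2;
            ModuleRange range = ranges[mid];

            if (address < range.Start)
            {
                hi = mid - 1;
            }
            else if (address - range.Start >= range.Size)
            {
                lo = mid + 1;
            }
            else
            {
                return range.ModuleName;
            }
        }

        return null;
    }
}
```

If the lookup is always performed this way, a cached module field on the frame becomes an optimization only, which is the point of the question above.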

3 participants