Fix SIGILL crash on ARM64 platforms with SME but no SVE #127398

Open
AndyAyersMS wants to merge 3 commits into dotnet:main from AndyAyersMS:fix/122608-sve-sigill

@AndyAyersMS
Member

Replace CONTEXT_GetSveLengthFromOS() calls in signal context handling with direct reads of sve->vl from the kernel-provided signal frame. The CONTEXT_GetSveLengthFromOS function executes the SVE 'rdvl' instruction, which causes SIGILL on platforms that have SME (streaming SVE) but not standalone SVE — such as Apple M4 under macOS Virtualization.Framework with Podman/Colima.

On these platforms, the Linux kernel includes an SVE_MAGIC record in signal frames (with vl=0 and minimal size) due to SME's streaming SVE mode, but the CPU does not support SVE instructions. When a signal fires (e.g. SIGUSR1 for activation injection), CONTEXTFromNativeContext sees the SVE record and calls rdvl, which triggers SIGILL. The SIGILL handler then tries to capture context again, hitting rdvl recursively.

The fix uses sve->vl from the signal frame directly, which is always available when an SVE context record is present. On real SVE hardware, sve->vl equals what rdvl would return. On SME-only platforms, sve->vl is 0, so the SVE register save/restore is correctly skipped.
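A minimal sketch of the idea, assuming simplified mock structs modeled on the Linux arm64 signal-frame records. `MockCtxHeader`, `MockSveContext`, and `FindSveVectorLength` are hypothetical names for illustration and are not the PAL code. The vector length is read from the record the kernel wrote, so no SVE instruction is executed:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Mock layouts modeled on the Linux arm64 signal-frame records (illustrative only).
struct MockCtxHeader { uint32_t magic; uint32_t size; };
struct MockSveContext { MockCtxHeader head; uint16_t vl; uint16_t flags; uint16_t reserved[2]; };

constexpr uint32_t kSveMagic = 0x53564501; // SVE_MAGIC in the Linux uapi headers

// Walk the record list and return the SVE vector length in bytes.
// Returns 0 when there is no SVE record or the record reports vl == 0
// (the SME-without-SVE case described above).
uint16_t FindSveVectorLength(const unsigned char* reserved, size_t len)
{
    size_t offset = 0;
    while (offset + sizeof(MockCtxHeader) <= len)
    {
        MockCtxHeader head;
        std::memcpy(&head, reserved + offset, sizeof(head));
        if (head.magic == 0 || head.size < sizeof(MockCtxHeader))
        {
            break; // terminator record or malformed size
        }
        if (head.magic == kSveMagic && offset + sizeof(MockSveContext) <= len)
        {
            MockSveContext sve;
            std::memcpy(&sve, reserved + offset, sizeof(sve));
            return sve.vl; // read from the frame; no rdvl executed
        }
        offset += head.size;
    }
    return 0;
}
```

A caller would then gate the SVE save/restore on the returned value, so a record with vl of 0 skips the SVE path entirely.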

Fixes #122608

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 24, 2026 19:31
@AndyAyersMS
Member Author

@janvorli PTAL

Pre-fix, dotnet new classlib run in a linux container on an M4 would fail about 5% of the time. Post-fix no failures were observed.

Contributor

Copilot AI left a comment

Pull request overview

This PR fixes an ARM64 SIGILL crash in CoreCLR PAL signal-context handling on platforms that provide an SVE signal-frame record but don’t support executing SVE instructions (e.g., SME-only environments). It does so by avoiding the SVE rdvl instruction and instead using the kernel-provided sve_context::vl value from the signal frame.

Changes:

  • Replace CONTEXT_GetSveLengthFromOS() (uses rdvl) with direct reads of sve->vl when handling SVE state in signal contexts.
  • Add explanatory comments documenting the SME-without-SVE scenario and why rdvl must be avoided in this path.

Comment on lines +903 to 906
if (sve->vl == 16)
{
_ASSERT((lpContext->XStateFeaturesMask & XSTATE_MASK_ARM64_SVE) == XSTATE_MASK_ARM64_SVE);

Copilot AI Apr 24, 2026

In CONTEXTToNativeContext, the SVE block later derives vq from lpContext->Vl and relies only on an _ASSERTE to ensure lpContext->Vl == sve->vl. Since _ASSERTE is compiled out in non-debug builds, a mismatch in release could make the subsequent offset calculations write to the wrong locations in the signal frame. Consider computing vq from sve->vl (the actual frame layout) and using a non-assert runtime check for equality before writing, otherwise skip updating the SVE region.
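One way to read this suggestion, as a sketch only: derive the quadword count from the frame's own vector length and refuse to write when the two lengths disagree. `VqForSveWrite` is a hypothetical helper, not a function in the PAL, and the parameter names stand in for `sve->vl` and `lpContext->Vl`:

```cpp
#include <cstdint>

constexpr uint16_t kSveVqBytes = 16; // one quadword; vq = vl / 16

// Returns the quadword count to use for offset math in the signal frame,
// or 0 to signal that the SVE region should be left untouched.
// frameVl stands in for sve->vl, contextVl for lpContext->Vl.
uint16_t VqForSveWrite(uint16_t frameVl, uint16_t contextVl)
{
    if (frameVl == 0 || frameVl % kSveVqBytes != 0)
    {
        return 0; // no usable SVE state in the frame
    }
    if (frameVl != contextVl)
    {
        return 0; // runtime check: still present in release builds, unlike an assert
    }
    return frameVl / kSveVqBytes; // layout follows the frame, not the CONTEXT
}
```

Because the check is an ordinary branch, a mismatch in a release build skips the write instead of computing offsets from the wrong length.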

Comment thread src/coreclr/pal/src/thread/context.cpp Outdated
// (e.g. Apple M4 with SME streaming SVE under Virtualization.Framework).
if (sve->vl == 16)
{
_ASSERTE((sve->vl > 0) && (sve->vl % 16 == 0));

Copilot AI Apr 24, 2026

Inside if (sve->vl == 16), the _ASSERTE((sve->vl > 0) && (sve->vl % 16 == 0)) becomes redundant (it can only ever be true). Consider removing it or changing it to assert the condition that actually matters here (e.g., that the context record size matches the expected minimum for VL=16).

Suggested change
- _ASSERTE((sve->vl > 0) && (sve->vl % 16 == 0));
+ _ASSERTE(sve->head.size >= SVE_SIG_CONTEXT_SIZE(sve_vq_from_vl(16)));
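To illustrate what such a size check compares against, here is a rough sketch of the arithmetic. The constants and function names are assumptions modeled on my reading of the Linux uapi SVE signal-context macros (32 Z registers, 16 P registers, one FFR, and a 16-byte record header), so verify them against the kernel headers before relying on them:

```cpp
#include <cstddef>

constexpr size_t kVqBytes = 16;          // bytes per quadword
constexpr size_t kNumZRegs = 32;         // Z0-Z31
constexpr size_t kNumPRegs = 16;         // P0-P15
constexpr size_t kSveRecordHeader = 16;  // assumed sizeof(struct sve_context)

// Bytes of register state for a given quadword count (assumed layout).
constexpr size_t SveSigRegsSize(size_t vq)
{
    return kNumZRegs * vq * kVqBytes       // Z registers: vq * 16 bytes each
         + kNumPRegs * vq * kVqBytes / 8   // P registers: one bit per Z byte
         + vq * kVqBytes / 8;              // FFR: same size as one P register
}

// Total record size: header plus register state.
constexpr size_t SveSigContextSize(size_t vq)
{
    return kSveRecordHeader + SveSigRegsSize(vq);
}

// True when a record of recordSize bytes can hold full register state for vl.
constexpr bool RecordHoldsRegisters(size_t recordSize, size_t vl)
{
    return vl != 0 && vl % kVqBytes == 0
        && recordSize >= SveSigContextSize(vl / kVqBytes);
}
```

Under these assumed constants, a header-only record (16 bytes) fails the check for VL=16, which is the distinction the suggested assert is meant to capture.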

Member

@janvorli janvorli left a comment

LGTM, thank you!

Comment thread src/coreclr/pal/src/thread/context.cpp Outdated
// (e.g. Apple M4 with SME streaming SVE under Virtualization.Framework).
if (sve->vl == 16)
{
_ASSERTE((sve->vl > 0) && (sve->vl % 16 == 0));
Member

A nit - this assert is now useless

Member Author

Do you want to replace this with the other assert suggested above?

- Remove the now-tautological _ASSERTE((sve->vl > 0) && (sve->vl % 16 == 0))
  inside the 'if (sve->vl == 16)' block (janvorli).
- In CONTEXTToNativeContext, derive vq from sve->vl (the signal frame's
  authoritative layout) instead of lpContext->Vl (copilot-reviewer).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@am11
Member

am11 commented Apr 24, 2026

These were the only two usages, so we can delete its definition and declaration:

src/coreclr/pal/src/arch/arm64/context2.S:303:    LEAF_ENTRY CONTEXT_GetSveLengthFromOS, _TEXT
src/coreclr/pal/src/arch/arm64/context2.S:306:    LEAF_END CONTEXT_GetSveLengthFromOS, _TEXT
src/coreclr/pal/src/include/pal/context.h:1628:    CONTEXT_GetSveLengthFromOS
src/coreclr/pal/src/include/pal/context.h:1637:CONTEXT_GetSveLengthFromOS(

Also, the comment can be reworded to avoid mentions of CONTEXT_GetSveLengthFromOS (after the definitions are deleted).

@am11
Member

am11 commented Apr 24, 2026

Other places using rdvl:

src/coreclr/jit/codegenarm64test.cpp:6016:    theEmitter->emitIns_R_I(INS_sve_rdvl, EA_8BYTE, REG_R0, -32); // RDVL <Xd>, #<imm>
src/coreclr/jit/codegenarm64test.cpp:6017:    theEmitter->emitIns_R_I(INS_sve_rdvl, EA_8BYTE, REG_R5, 0);   // RDVL <Xd>, #<imm>
src/coreclr/jit/codegenarm64test.cpp:6018:    theEmitter->emitIns_R_I(INS_sve_rdvl, EA_8BYTE, REG_R10, 5);  // RDVL <Xd>, #<imm>
src/coreclr/jit/codegenarm64test.cpp:6019:    theEmitter->emitIns_R_I(INS_sve_rdvl, EA_8BYTE, REG_R15, 31); // RDVL <Xd>, #<imm>
src/coreclr/jit/emitarm64.cpp:1134:        case IF_SVE_BC_1A: // rdvl
src/coreclr/jit/emitarm64sve.cpp:1627:        case INS_sve_rdvl:
src/coreclr/jit/instrsarm64sve.h:2103:INST1(rdvl,              "rdvl",                  0,                       IF_SVE_BC_1A,            0x04BF5000                                   )
src/coreclr/vm/arm64/asmhelpers.S:29:        rdvl    x0, 1
src/coreclr/vm/arm64/asmhelpers.asm:92:        rdvl    x0, 1

- Remove unused CONTEXT_GetSveLengthFromOS definition (context2.S) and
  declaration (context.h) since all callers now use sve->vl (am11).
- Reword comments to not reference the deleted function (am11).
- Replace redundant assert with meaningful size check (janvorli/AndyAyersMS):
  _ASSERTE(sve->head.size >= SVE_SIG_CONTEXT_SIZE(sve_vq_from_vl(16)))

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@AndyAyersMS
Member Author

AndyAyersMS commented Apr 24, 2026

> Other places using rdvl:

JIT usage should be ok. Let me check the helpers. Looks like they're only invoked after we have verified SVE support. So they seem ok too.

Development

Successfully merging this pull request may close these issues.

.NET 10 SDK ARM64: Illegal instruction (SIGILL) on Apple M4 with macOS Virtualization.Framework

4 participants