[Wasm RyuJit] Enable native wasm fast tail calls #129134

Open
AndyAyersMS wants to merge 1 commit into dotnet:main from AndyAyersMS:wasm-fast-tailcalls

Conversation

@AndyAyersMS

Member

Set FEATURE_FASTTAILCALL=1 and FEATURE_TAILCALL_OPT=1. Fast tail calls lower to return_call / return_call_indirect. Tag the SP arg so codegen adds compLclFrameSize to undo the prolog adjustment, so the callee receives the incoming shadow-stack pointer.

Copilot AI review requested due to automatic review settings June 8, 2026 18:59
@github-actions (bot) added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) on Jun 8, 2026
@AndyAyersMS

Member Author

@kg PTAL
fyi @dotnet/wasm-contrib @dotnet/jit-contrib

Passes various Pri-0 tail call tests. We emit ~4K tail calls in SPC.

Using an LIR flag may raise some hackles; happy to consider alternatives.

@dotnet-policy-service

Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Copilot AI left a comment

Contributor

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Enables WebAssembly fast tail calls in CoreCLR RyuJIT and wires up shadow-stack/SP handling so wasm return_call / return_call_indirect can be emitted correctly.

Changes:

  • Turn on FEATURE_FASTTAILCALL and FEATURE_TAILCALL_OPT for TARGET_WASM.
  • Tag the wasm shadow-stack/SP argument for fast tail calls in RA and adjust it in codegen to undo the prolog’s SP delta.
  • Relax a fast-tailcall eligibility check that is stack-based and not applicable to wasm’s local-based argument passing.
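The SP handling in the second bullet can be illustrated with a small standalone model. This is only a sketch, not JIT code: the `Frame` type and its member names are hypothetical, and it assumes the prolog subtracts the local frame size from the incoming shadow-stack pointer, which is what "undo the prolog's SP delta" implies. `lclFrameSize` stands in for `compLclFrameSize`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one method's shadow-stack frame (not actual JIT types).
// Assumption: the prolog computes sp = incomingSp - lclFrameSize.
struct Frame
{
    uint32_t incomingSp;   // shadow-stack pointer received from the caller
    uint32_t lclFrameSize; // stands in for compLclFrameSize
    uint32_t sp;           // shadow-stack pointer after the prolog

    Frame(uint32_t incoming, uint32_t frameSize)
        : incomingSp(incoming)
        , lclFrameSize(frameSize)
        , sp(incoming - frameSize)
    {
    }

    // Ordinary call: the callee's frame is placed below this frame,
    // so the adjusted SP is passed as-is.
    uint32_t SpForNormalCall() const
    {
        return sp;
    }

    // Fast tail call: this frame goes away, so add the frame size back
    // and hand the callee the SP this method originally received.
    uint32_t SpForFastTailCall() const
    {
        return sp + lclFrameSize;
    }
};
```

Under these assumptions a tail-called callee sees the same shadow-stack pointer its caller was given, so a chain of tail calls does not consume additional shadow-stack space. When the frame size is zero, both values coincide, which matches the `compLclFrameSize != 0` guard in the codegen change.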

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/coreclr/jit/targetwasm.h | Enables fast tail calls + opportunistic tail calls for wasm. |
| src/coreclr/jit/regallocwasm.cpp | Tags the well-known wasm shadow-stack pointer arg for fast tail calls. |
| src/coreclr/jit/morph.cpp | Skips an arg-stack-space constraint that doesn’t apply to wasm. |
| src/coreclr/jit/lir.h | Adds a wasm-specific LIR flag to mark the fast-tailcall SP arg. |
| src/coreclr/jit/codegenwasm.cpp | Emits INS_end for tailcall “jmp epilog” blocks and adjusts SP arg / return type handling for tail calls. |

 #define FEATURE_MULTIREG_STRUCT_PROMOTE 1 // True when we want to promote fields of a multireg struct into registers
-#define FEATURE_FASTTAILCALL 0 // Tail calls made as epilog+jmp
-#define FEATURE_TAILCALL_OPT 0 // opportunistic Tail calls (i.e. without ".tail" prefix) made as fast tail calls.
+#define FEATURE_FASTTAILCALL 1 // Tail calls made as epilog+jmp. On wasm the "jmp" is the native return_call / return_call_indirect opcode.
Comment on lines +577 to +589
if (callNode->IsFastTailCall())
{
    CallArg* const spArg = callNode->gtArgs.FindWellKnownArg(WellKnownArg::WasmShadowStackPointer);
    if (spArg != nullptr)
    {
        GenTree* const argNode = spArg->GetNode();
        assert(argNode != nullptr);
        assert(argNode->OperIs(GT_PHYSREG));
        assert(argNode->AsPhysReg()->gtSrcReg == m_perFuncletData[m_currentFunclet]->m_spReg);

        argNode->gtLIRFlags |= LIR::Flags::WasmFastTailCallSp;
    }
}
Comment on lines +2418 to +2428
if ((tree->gtLIRFlags & LIR::Flags::WasmFastTailCallSp) != 0)
{
    // Fast tail call SP arg: undo the prolog SP adjustment (asserts funclet tail calls don't happen).
    assert(m_compiler->funCurrentFuncIdx() == ROOT_FUNC_IDX);
    assert(tree->gtSrcReg == GetStackPointerReg(m_compiler->funCurrentFuncIdx()));
    if (m_compiler->compLclFrameSize != 0)
    {
        GetEmitter()->emitIns_I(INS_I_const, EA_PTRSIZE, m_compiler->compLclFrameSize);
        GetEmitter()->emitIns(INS_I_add);
    }
}
Comment on lines +2593 to +2618
// For a fast tail call wasm requires the callee's result type to match the enclosing
// function's, so derive it from the caller's signature (call->gtType is TYP_VOID).
if (params.isJump)
{
    if (m_compiler->info.compRetBuffArg != BAD_VAR_NUM)
    {
        // The enclosing method returns its struct via a retbuf arg, so the wasm-level
        // return is empty.
        typeStack.Push(CORINFO_WASM_TYPE_VOID);
    }
    else if (m_compiler->info.compRetType == TYP_VOID)
    {
        typeStack.Push(CORINFO_WASM_TYPE_VOID);
    }
    else if (m_compiler->info.compRetType == TYP_STRUCT)
    {
        typeStack.Push(
            m_compiler->info.compCompHnd->getWasmLowering(m_compiler->info.compMethodInfo->args.retTypeClass));
    }
    else
    {
        // Normalize small ints (bool/byte/short/...).
        typeStack.Push((CorInfoWasmType)emitter::GetWasmValueTypeCode(
            ActualTypeToWasmValueType(m_compiler->info.compRetType)));
    }
}
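
The decision order in the snippet above can be modeled in isolation. The following is a simplified sketch with hypothetical enums and a hypothetical `CallerSig` struct, none of which are JIT types; the real code queries the EE for struct lowering, which is reduced here to a precomputed field. It only illustrates how the result type for a tail call might be picked from the caller's signature.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for the JIT's var_types and CorInfoWasmType.
enum class VarType { Void, Bool, Byte, Short, Int, Long, Float, Double, Struct };
enum class WasmType { Void, I32, I64, F32, F64 };

// Hypothetical summary of the enclosing (caller) method's return convention.
struct CallerSig
{
    VarType  retType;        // declared return type of the caller
    bool     hasRetBuf;      // true if a struct is returned via a hidden buffer arg
    WasmType structLowering; // assumed precomputed lowering for by-value structs
};

// Picks the wasm result type a return_call site must declare so that it
// matches the enclosing function's result type.
WasmType TailCallResultType(const CallerSig& sig)
{
    // Return-buffer and void methods have an empty wasm-level result.
    if (sig.hasRetBuf || (sig.retType == VarType::Void))
    {
        return WasmType::Void;
    }

    // Structs returned by value use whatever lowering was chosen for them.
    if (sig.retType == VarType::Struct)
    {
        return sig.structLowering;
    }

    // Primitives: small ints (bool/byte/short) widen to i32.
    switch (sig.retType)
    {
        case VarType::Long:   return WasmType::I64;
        case VarType::Float:  return WasmType::F32;
        case VarType::Double: return WasmType::F64;
        default:              return WasmType::I32;
    }
}
```

For example, under this model a caller declared to return `bool` yields `I32`, while a caller that returns a struct through a return buffer yields `Void` regardless of the struct's shape.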

@SingleAccretion SingleAccretion left a comment

Contributor

What's the benefit of using guaranteed-tailcall return_call for implicit tailcalls?

I was imagining we could use implicit tailcalls for shadow stack only since it has some benefits w.r.t. zero-sized shadow frames.

@AndyAyersMS

Member Author

> What's the benefit of using guaranteed-tailcall return_call for implicit tailcalls?
>
> I was imagining we could use implicit tailcalls for shadow stack only since it has some benefits w.r.t. zero-sized shadow frames.

Not sure what to make of this ... are you saying the underlying engine can do this instead in most cases? Or that's not worth doing in general?

@SingleAccretion

SingleAccretion commented Jun 8, 2026

Contributor

> Or that's not worth doing in general?

return_call is the WASM equivalent of .NET .tail. It constrains the final code generator for the benefit of predictable semantics. Fast tailcalls are about performance, so this kind of change should come with some (measured) performance benefit. I don't know whether in the current engines there will be such a benefit. In principle, constraining the code generator should be strictly worse.
