Design discussion raised by the gale team in #178 (separate from the optimizer bug fixed in PR #179). PulseEngine owns both sides (synth output + the gale/Zephyr host), so we can co-design the contract.
Problem
synth addresses linear memory as [fp + offset], where fp = __linear_memory_base. A wasm function that dereferences a native host pointer passed in r0 (e.g. z_impl_k_sem_give(struct k_sem *sem)) therefore computes [fp + sem], which is not the address of the real sem unless __linear_memory_base == 0.
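A minimal sketch of the arithmetic above, as a host-runnable C model rather than actual synth output. The function names and the two addresses are illustrative assumptions (a plausible SRAM pointer and a plausible linmem base), not values taken from gale or synth.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the addressing described above: every wasm load/store resolves to
 * fp + offset, where fp holds __linear_memory_base. 32-bit arithmetic mirrors
 * a Cortex-M address space. */
static uint32_t effective_address(uint32_t linmem_base, uint32_t offset)
{
    return linmem_base + offset;
}

/* Returns 1 when a native pointer that the module treats as a linmem offset
 * ends up dereferenced at its real address, 0 otherwise. */
static int native_deref_is_correct(uint32_t linmem_base, uint32_t native_ptr)
{
    return effective_address(linmem_base, native_ptr) == native_ptr;
}

/* Illustrative values only. */
static void self_check(void)
{
    const uint32_t sem = 0x20001000u;        /* hypothetical native pointer */
    const uint32_t base = 0x20008000u;       /* hypothetical nonzero base   */

    assert(native_deref_is_correct(0u, sem));    /* base 0: [fp + r0] = [r0] */
    assert(!native_deref_is_correct(base, sem)); /* nonzero base: wrong addr */
}
```

With a nonzero base the access lands at base + sem, which is the mismatch the options below try to remove.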
Options (from #178)
- __linear_memory_base = 0 / native-pointer mode for --relocatable. [fp + r0] = [r0] = correct native deref. Catch: module-internal statics at low linmem offsets would land at base 0 (flash). The host can force r11 = 0 (callee-saved) around the call.
- Marshalling trampoline (host copies fields in/out). Correct today, but the marshalling cost is in the measurement (≈ GALE_USE_SYNTH class, not LTO-parity).
- Per-param "native pointer" attribute: selected pointer params deref at base 0 / directly; module-internal data keeps the linmem base. True native drop-in and optimized codegen. (Preferred long-term.)
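The marshalling-trampoline option can be sketched as a host-runnable C model. Everything here is hypothetical: struct sem_fields is a stand-in and not the real struct k_sem layout, linmem stands in for the module's linear memory, SCRATCH_OFFSET is an assumed scratch slot, and wasm_sem_give models a synth-compiled function that only ever sees a linmem offset.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the fields the module touches;
 * not the real struct k_sem layout. */
struct sem_fields {
    uint32_t count;
    uint32_t limit;
};

/* Stand-in for the module's linear memory; on target this region would
 * start at __linear_memory_base. */
static uint8_t linmem[256];

/* Assumed scratch slot inside linear memory reserved for marshalling. */
#define SCRATCH_OFFSET 64u

/* Models the compiled wasm function: it receives a linmem offset and only
 * accesses linmem + offset, mirroring [fp + offset]. */
static void wasm_sem_give(uint32_t offset)
{
    struct sem_fields s;

    assert(offset + sizeof s <= sizeof linmem);
    memcpy(&s, &linmem[offset], sizeof s);
    if (s.count < s.limit) {
        s.count++;
    }
    memcpy(&linmem[offset], &s, sizeof s);
}

/* Host-side trampoline: copy fields in, call with an offset, copy back. */
static void sem_give_trampoline(struct sem_fields *native)
{
    memcpy(&linmem[SCRATCH_OFFSET], native, sizeof *native);
    wasm_sem_give(SCRATCH_OFFSET);
    memcpy(native, &linmem[SCRATCH_OFFSET], sizeof *native);
}
```

The two memcpy calls in the trampoline are the per-call cost that ends up in the measurement; the other two options avoid them by letting the module dereference the native pointer directly.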
Decisions to make
- Should --relocatable (or a new --native-abi) zero the linmem base, or expose __linear_memory_base as a documented linker symbol the host sets?
- Native-pointer-param annotation: source (WIT/attribute) vs. a synth flag vs. inferred?
- &extern_data lowering: does synth emit R_ARM_MOVW_ABS_NC + R_ARM_MOVT_ABS (or R_ARM_ABS32) for an address-of-extern? Needs verification; required for option 1 (the extern gale_wasm_shim_lock path).
Related
#34 (Meld host model), #170 (standalone --cortex-m resolution + $t symbols), #171 (i64 regalloc — resurfaces on z_impl once the lock is extern), #180 (re-enable optimized memory path).
cc the gale team (#178 thread) — what shape works best on the kernel side?