runtime/rp2: handle RP2350 shared FIFO IRQ for GC #5482

Open
rdon-key wants to merge 1 commit into tinygo-org:dev from rdon-key:fix-rp2350-gc-deadlock-min

Fixes #5151.

This PR fixes an RP2350 -scheduler=cores deadlock observed when repeatedly calling runtime.GC().

On Pico 2, the test program below stops at the first GC cycle, while the identical program completes on RP2040 and on RP2350 with -scheduler=tasks.

Before this change:

pico   -scheduler=cores   PASS
pico   -scheduler=tasks   PASS
pico2  -scheduler=tasks   PASS
pico2  -scheduler=cores   FAILS, stops at ok: 0

This points to an RP2350-specific issue in the multicore GC path, not a general spinlock problem.

Root cause

The deadlock is in the SIO FIFO IRQ handling used during the GC stop-the-world phase.

RP2040 and RP2350 have different SIO FIFO IRQ topology:

  • RP2040 has separate SIO FIFO IRQs for each core.
  • RP2350 uses a shared SIO FIFO IRQ for both cores.

The previous common RP2 runtime registered FIFO IRQ handlers per core, each with a fixed core ID. That model works on RP2040, but it does not behave correctly on RP2350 during stop-the-world, where both cores share a single FIFO IRQ.

Separating the FIFO IRQ setup per chip resolves the deadlock.

Change

  • RP2040: keeps separate per-core FIFO IRQ handlers with fixed core IDs. This preserves the existing behavior.
  • RP2350: registers one shared FIFO IRQ handler and determines the current core at interrupt time with currentCPU().
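To illustrate the difference between the two registration models, here is a minimal host-side sketch in plain Go. It is not the runtime code from this PR: `gcState`, `fixedCoreHandler`, `sharedHandler`, and `pausedAfterIRQ` are hypothetical names, and the `runningOn` argument only stands in for what `currentCPU()` returns at interrupt time. The "fixed ID on a shared line" case is one plausible way such a handler could misattribute an interrupt, not a confirmed description of the exact failure.

```go
package main

import "fmt"

// gcState is a toy stand-in for stop-the-world bookkeeping.
// paused[i] becomes true once core i has acknowledged the pause request.
type gcState struct {
	paused [2]bool
}

// handler is what gets registered for a FIFO IRQ. Its argument models the
// core the interrupt actually fires on (the role currentCPU() plays in the PR).
type handler func(runningOn int)

// fixedCoreHandler binds the core ID at registration time (RP2040 model).
func fixedCoreHandler(s *gcState, core int) handler {
	return func(_ int) { s.paused[core] = true }
}

// sharedHandler resolves the core at interrupt time (RP2350 model).
func sharedHandler(s *gcState) handler {
	return func(runningOn int) { s.paused[runningOn] = true }
}

// pausedAfterIRQ fires one FIFO interrupt on firingCore through a single
// shared IRQ line and reports whether that core ended up marked as paused.
func pausedAfterIRQ(useShared bool, firingCore int) bool {
	s := &gcState{}
	var h handler
	if useShared {
		h = sharedHandler(s)
	} else {
		// Hypothetical misconfiguration: the only handler on the shared
		// line was registered with a fixed core ID of 0.
		h = fixedCoreHandler(s, 0)
	}
	h(firingCore)
	return s.paused[firingCore]
}

func main() {
	fmt.Println("fixed ID, IRQ on core 1:", pausedAfterIRQ(false, 1))
	fmt.Println("shared,   IRQ on core 1:", pausedAfterIRQ(true, 1))
}
```

In this model the fixed-ID handler never marks core 1 as paused, so a collector waiting for that acknowledgement would wait forever, while the shared handler attributes the interrupt to the core it runs on.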

Hardware spinlock note

The existing hardware spinlock implementation is intentionally left unchanged.

I checked whether fixing the reproduced failure requires switching the runtime to software spinlocks. That does not appear to be necessary for this bug.

The runtime currently uses the following hardware spinlock IDs:

printLock      20
schedulerLock  21
atomicsLock    22
futexLock      23

This PR adds no new hardware spinlock usage, does not change these lock IDs, and does not add writes to the newer SIO registers involved in the RP2350-E2 aliasing issue.

With the hardware spinlock implementation left unchanged, fixing the SIO FIFO IRQ setup is enough for the test below to pass repeatedly on Pico 2. Therefore, spinlock changes are out of scope for this PR.

Test program

package main

import (
	"runtime"
	"time"
)

func main() {
	time.Sleep(2 * time.Second)
	println("Start")
	const N = 10000
	for i := 0; i < N; i++ {
		// Yield to the scheduler, then force a full collection.
		runtime.Gosched()
		runtime.GC()
		if i%1000 == 0 {
			println("ok:", i)
		}
	}
	println("PASS: survived", N, "cycles")
	for {
		time.Sleep(time.Second)
	}
}

Before this change

With the previous implementation, the test stopped on Pico 2 with -scheduler=cores.

pico2 -scheduler=cores

Connected to COM3. Press Ctrl-C to exit.
Start
ok: 0

The program did not reach PASS.

After this change

All four tested combinations pass:

pico   -scheduler=cores   PASS
pico   -scheduler=tasks   PASS
pico2  -scheduler=cores   PASS
pico2  -scheduler=tasks   PASS

pico2 -scheduler=cores was run multiple times with N = 10000 and completed successfully each time.

Start
ok: 0
ok: 1000
ok: 2000
ok: 3000
ok: 4000
ok: 5000
ok: 6000
ok: 7000
ok: 8000
ok: 9000
PASS: survived 10000 cycles

Successfully merging this pull request may close these issues.

GC-caused scheduler deadlock on RP2350
