fix: prevent data races and panics in SDK stop/heartbeat handling #457
Two concurrency bugs in action_kit_sdk could crash any extension:

- The shared stopEvents slice was appended to, resliced, and ranged over from the HTTP stop and status handlers, the heartbeat-timeout goroutine, and the signal handler with no lock: a data race that could panic or drop stop events. Guard it with a mutex.
- heartbeat.Monitor.Stop closed the pulse channel non-idempotently and RecordHeartbeat did a blocking send, so two racing stop paths (HTTP stop handler vs. heartbeat-timeout goroutine) could double-close the channel, and a concurrent RecordHeartbeat could send on a closed channel. Both panic in a non-recovered goroutine, crashing the process. Make Stop idempotent and RecordHeartbeat a non-blocking, closed-safe send (guarded by the monitor's mutex), and use LoadAndDelete in stopMonitorHeartbeat so only one caller stops a given monitor.

Adds race tests for both paths.
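A minimal sketch of the first fix, assuming a simplified shape for the stop-event data. The `stopEvent` struct, the `stopEventsMu` name, the function signatures, and the `maxStopEvents` cap are hypothetical stand-ins, not the SDK's actual definitions; the point is only that the append, the reslice, and the range all happen under one package-level `sync.Mutex`.

```go
package main

import (
	"fmt"
	"sync"
)

// stopEvent is a hypothetical stand-in for the SDK's stop-event record.
type stopEvent struct {
	executionID string
	reason      string
}

// maxStopEvents is an assumed cap; the oldest entry is dropped once it is exceeded.
const maxStopEvents = 100

var (
	stopEventsMu sync.Mutex // guards stopEvents (name is an assumption)
	stopEvents   []stopEvent
)

// markAsStopped appends an event and trims the oldest one, all under the lock.
func markAsStopped(executionID, reason string) {
	stopEventsMu.Lock()
	defer stopEventsMu.Unlock()
	stopEvents = append(stopEvents, stopEvent{executionID: executionID, reason: reason})
	if len(stopEvents) > maxStopEvents {
		stopEvents = stopEvents[1:]
	}
}

// getStopEvent ranges over the slice under the same lock and returns a copy.
func getStopEvent(executionID string) (stopEvent, bool) {
	stopEventsMu.Lock()
	defer stopEventsMu.Unlock()
	for _, e := range stopEvents {
		if e.executionID == executionID {
			return e, true
		}
	}
	return stopEvent{}, false
}

func main() {
	// Simulate handlers, the heartbeat-timeout goroutine, and the signal
	// handler all touching the slice at once.
	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			id := fmt.Sprintf("exec-%d", i)
			markAsStopped(id, "heartbeat timeout")
			getStopEvent(id)
		}(i)
	}
	wg.Wait()
	_, found := getStopEvent("exec-7")
	fmt.Println("found exec-7:", found)
}
```

Running a program like this with `go run -race` is one way to check that no unsynchronized access remains; without the mutex the race detector would be expected to flag the concurrent append and range.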
Claude finished @joshiste's task in 1m 54s. Review complete ✅
Summary

This is a clean, well-scoped concurrency fix. I traced all call sites for both bugs and the fix closes them correctly:
Heartbeat double-close / send-on-closed (
Tests — No bugs, security issues, or leftover race conditions found. No inline comments needed — the changes are small, correct, and match the stated intent. Note: I was not able to run |



Problem
A security/code audit of the shared kit libraries found two CRITICAL concurrency bugs in `action_kit_sdk`. Both can crash any of the ~33 extensions that use the SDK, and neither is caught by `exthttp.PanicRecovery` (they occur in non-recovered goroutines or are data races).

- `stopEvents` slice data race. `markAsStopped` (`append` + `stopEvents = stopEvents[1:]`) and `getStopEvent` (`range`) run with no lock from the HTTP `stop`/`status` handler goroutines, the heartbeat-timeout goroutine (`StopAction`) and the signal handler. Concurrent append/reslice/range on a plain slice → data race → torn reads, dropped/duplicated stop events, or `slice bounds out of range`.
- Heartbeat channel double-close / send-on-closed panic. `Monitor.Stop` closed `pulse` non-idempotently and `RecordHeartbeat` did a blocking send. Two stop paths (the HTTP `stop` handler and the heartbeat-timeout goroutine) can `close(pulse)` twice → `panic: close of closed channel`; a concurrent `RecordHeartbeat` can send on a closed channel → panic. These run in a non-recovered goroutine → the whole extension process dies.

Fix

- Guard `stopEvents` with a package `sync.Mutex` (both accessors).
- Make `heartbeat.Monitor` self-guarding: an internal `mu` + `closed` flag makes `Stop` idempotent (close once) and `RecordHeartbeat` a non-blocking, closed-safe send (never panics, never blocks the HTTP status handler; a beat is dropped only if the buffer is already full).
- `stopMonitorHeartbeat` uses `sync.Map.LoadAndDelete` so only one caller stops a given monitor.

Adds race tests for both paths (`go test -race`).

Notes

The `action_kit_commons` robustness findings (ociruntime nil-derefs, netfault divide-by-zero, process-wrapper nil-derefs, tc/iptables newline-injection hardening) will follow as a separate PR.

Verification

`go build ./...`, `go vet ./...`, and `go test -race ./...` pass (module `go/action_kit_sdk`); existing heartbeat tests still pass.
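The heartbeat part of the fix can be illustrated with a rough sketch. This is a simplified, hypothetical `Monitor`, not the SDK's actual `heartbeat.Monitor`: the constructor, the buffer size of 1, the boolean return values, and the package-level `monitors` map are assumptions made for illustration. It shows the three ideas named above: a `closed` flag under a mutex so the channel is closed once, a `select` with `default` so the send never blocks or hits a closed channel, and `sync.Map.LoadAndDelete` so only one caller stops a given monitor.

```go
package main

import (
	"fmt"
	"sync"
)

// Monitor is a simplified, hypothetical stand-in for heartbeat.Monitor.
type Monitor struct {
	mu     sync.Mutex
	closed bool
	pulse  chan struct{}
}

// NewMonitor uses an assumed buffer size of 1.
func NewMonitor() *Monitor {
	return &Monitor{pulse: make(chan struct{}, 1)}
}

// RecordHeartbeat queues a beat without blocking. It reports false when the
// monitor is already stopped or the buffer is full (the beat is dropped).
func (m *Monitor) RecordHeartbeat() bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.closed {
		return false
	}
	select {
	case m.pulse <- struct{}{}:
		return true
	default:
		return false
	}
}

// Stop closes the pulse channel at most once; later calls are no-ops.
func (m *Monitor) Stop() {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.closed {
		return
	}
	m.closed = true
	close(m.pulse)
}

// monitors maps execution IDs to monitors (hypothetical registry).
var monitors sync.Map

// stopMonitorHeartbeat reports true only for the caller that removed the entry.
func stopMonitorHeartbeat(executionID string) bool {
	v, loaded := monitors.LoadAndDelete(executionID)
	if !loaded {
		return false
	}
	v.(*Monitor).Stop()
	return true
}

func main() {
	m := NewMonitor()
	monitors.Store("exec-1", m)

	var (
		wg      sync.WaitGroup
		countMu sync.Mutex
		stops   int
	)
	// Racing stop paths and heartbeats, as in the HTTP stop handler
	// versus the heartbeat-timeout goroutine.
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			m.RecordHeartbeat()
			if stopMonitorHeartbeat("exec-1") {
				countMu.Lock()
				stops++
				countMu.Unlock()
			}
		}()
	}
	wg.Wait()
	m.Stop() // a further direct Stop is harmless
	fmt.Println("callers that stopped the monitor:", stops)
	fmt.Println("beat accepted after stop:", m.RecordHeartbeat())
}
```

Holding the mutex across the send is what makes the closed check and the send atomic with respect to `Stop`; because the send is non-blocking, the lock is never held while waiting on a reader.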