fix(platform-wallet): spv error propagation#3810
Conversation
|
Warning Review limit reached
More reviews will be available in 1 minute and 57 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (5)
📝 WalkthroughWalkthroughThis PR refactors SPV startup to separate client initialization from background loop spawning. It removes the ChangesSPV Startup and Run Loop Lifecycle
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
00bc228 to
db78106
Compare
|
✅ Review complete (commit 416faba) |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/rs-platform-wallet/src/spv/runtime.rs (1)
196-215:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMake the task slot reusable after a finished run loop and guard it atomically.
existing.is_some()treats a completedJoinHandleas “still running”, so afterrun()exits the next successful start can return through FFI without ever scheduling sync again. The unlocked check/store split also lets two callers race past the guard and spawn duplicate run loops on the same client.Suggested fix
pub fn spawn_run_loop(self: &Arc<Self>) { - { - let existing = self.task.lock().expect("spv task mutex poisoned"); - if existing.is_some() { - tracing::warn!( - "spawn_in_background called while a task is already running; ignoring" - ); - return; - } - } - let this = Arc::clone(self); + let mut task = self.task.lock().expect("spv task mutex poisoned"); + if matches!(task.as_ref(), Some(handle) if !handle.is_finished()) { + tracing::warn!( + "spawn_run_loop called while a task is already running; ignoring" + ); + return; + } + task.take(); let handle = tokio::spawn(async move { if let Err(e) = this.run().await { tracing::warn!("SpvRuntime background run loop exited with error: {}", e); } }); - *self.task.lock().expect("spv task mutex poisoned") = Some(handle); + *task = Some(handle); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/rs-platform-wallet/src/spv/runtime.rs` around lines 196 - 215, spawn_run_loop currently treats any Some(JoinHandle) as "running" and performs the check/store outside a single mutex hold, allowing races and preventing reuse of finished handles; change it so you hold the task mutex across the check-and-set: lock self.task, if Some(h) && !h.is_finished() then warn+return, otherwise create the tokio::spawn handle while still holding the lock and store it into *self.task; additionally wrap the spawned future so when this.run().await finishes you re-lock self.task and set it to None (so finished JoinHandles are removable); reference symbols: spawn_run_loop, task, run, JoinHandle::is_finished.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@packages/rs-platform-wallet/src/spv/runtime.rs`:
- Around line 196-215: spawn_run_loop currently treats any Some(JoinHandle) as
"running" and performs the check/store outside a single mutex hold, allowing
races and preventing reuse of finished handles; change it so you hold the task
mutex across the check-and-set: lock self.task, if Some(h) && !h.is_finished()
then warn+return, otherwise create the tokio::spawn handle while still holding
the lock and store it into *self.task; additionally wrap the spawned future so
when this.run().await finishes you re-lock self.task and set it to None (so
finished JoinHandles are removable); reference symbols: spawn_run_loop, task,
run, JoinHandle::is_finished.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2407b569-6c40-458a-a34d-74e5f4f4aac6
📒 Files selected for processing (4)
packages/rs-platform-wallet-ffi/src/spv.rspackages/rs-platform-wallet/src/error.rspackages/rs-platform-wallet/src/manager/accessors.rspackages/rs-platform-wallet/src/spv/runtime.rs
💤 Files with no reviewable changes (1)
- packages/rs-platform-wallet/src/error.rs
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
PR correctly propagates SPV start errors to the FFI caller, but breaks the SPV integration test which still uses the now-private run(config) signature — cargo test -p platform-wallet will fail to compile. The new split-lifecycle also opens a half-success window where spawn_run_loop silently no-ops on a stale JoinHandle, and the synchronous start() is now polled on the (small-stack) FFI caller thread instead of via block_on_worker. Several stale references to the old spawn_in_background name remain.
🔴 1 blocking | 🟡 2 suggestion(s) | 💬 3 nitpick(s)
4 additional finding(s) omitted (not in diff).
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `packages/rs-platform-wallet/tests/spv_sync.rs`:
- [BLOCKING] packages/rs-platform-wallet/tests/spv_sync.rs:223-227: Integration test breaks compilation: `run(config)` no longer exists
`SpvRuntime::run` is now a private, zero-argument helper, but this `#[ignore]`d test still calls `manager_for_spv.spv().run(config).await`. `#[ignore]` only skips execution — cargo still typechecks the file, so `cargo test -p platform-wallet --test spv_sync` will fail to compile after this PR. Update the test to drive the new lifecycle: `start(config).await` followed by `spv_arc().spawn_run_loop()` (and await `stop()` plus the join handle on teardown).
In `packages/rs-platform-wallet/src/spv/runtime.rs`:
- [SUGGESTION] packages/rs-platform-wallet/src/spv/runtime.rs:196-216: `spv_start` can return Ok with the client started but no run loop driving it
The new split lifecycle reintroduces the silent-success failure mode the PR is trying to eliminate. `run()` clears `self.client` when it exits naturally, but never clears `self.task`, so the `JoinHandle` of a previously error-exited run loop stays parked in `self.task`. On the next FFI call:
1. `start()` succeeds and stores a fresh client.
2. `spawn_run_loop()` observes `self.task.is_some()`, emits a `warn!`, and returns.
3. The FFI returns `PlatformWalletFFIResult::ok()` — but no future is driving `client.run()`.
Additionally, the check-and-set on `self.task` releases the mutex between the `is_some()` check and the `*self.task = Some(handle)` write, so two concurrent spawners can both observe `None` and both spawn run loops on the same client (one handle is then dropped and orphaned).
Fix both at once by: (a) clearing `self.task` (or replacing with `JoinHandle::is_finished()` + take) at the end of `run()`, and (b) holding the `self.task` mutex across the `tokio::spawn` and store so the check-and-set is atomic. Returning a `Result` from `spawn_run_loop` and surfacing it through the FFI would also let the caller observe the failure instead of relying on logs.
In `packages/rs-platform-wallet-ffi/src/spv.rs`:
- [SUGGESTION] packages/rs-platform-wallet-ffi/src/spv.rs:361-366: Poll `spv().start(config)` on the worker runtime, not the FFI caller thread
`runtime().block_on(manager.spv().start(config))` polls the new (substantial) startup future — `PeerNetworkManager::new`, `DiskStorageManager::new`, `DashSpvClient::new` — directly on the foreign caller's thread. `packages/rs-platform-wallet-ffi/src/runtime.rs` explicitly documents that iOS dispatch/concurrency threads have ~512 KB stacks and that async FFI work should use `block_on_worker`, which parks the caller on a oneshot and runs the future on the 8 MB-stack worker. The rest of this crate consistently follows that pattern. Previously this work happened inside `tokio::spawn` on the runtime; the PR's new ordering reintroduces the exact stack-sensitive pattern the runtime module was built to avoid. Switch to `block_on_worker` using the `Arc<SpvRuntime>` so the future is `Send + 'static`.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## v3.1-dev #3810 +/- ##
=========================================
Coverage 71.20% 71.20%
=========================================
Files 20 20
Lines 2837 2837
=========================================
Hits 2020 2020
Misses 817 817
🚀 New features to boost your workflow:
|
db78106 to
5045e85
Compare
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
Verified PR head 5045e85 against prior findings. Prior #1 (broken spv_sync test) and #5 (malformed run() rustdoc orphan link) are FIXED. Prior #2 (spawn_run_loop check/set race + stale finished JoinHandle blocking restart), #3 (start polled on FFI caller thread instead of block_on_worker), #4 (stale spawn_in_background warning log), and #6 (stale FFI comment) are still present at the listed line numbers. One additional latest-delta concern surfaced by codex agents — dash-spv's own internal startup is performed in DashSpvClient::run(), so some startup-class errors will still only surface from the spawned task — is recorded as a suggestion because the PR does correctly propagate errors from PeerNetworkManager/DiskStorageManager/DashSpvClient::new. No in-scope blocking issues.
🟡 3 suggestion(s) | 💬 2 nitpick(s)
4 additional finding(s) omitted (not in diff).
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `packages/rs-platform-wallet/src/spv/runtime.rs`:
- [SUGGESTION] packages/rs-platform-wallet/src/spv/runtime.rs:195-215: spawn_run_loop can silently skip spawning after a prior run-loop exit, and races check-and-set
Two issues remain at the head:
1. `run()` (137-156) clears `self.client` when the background loop exits but never clears `self.task`. If the prior run loop terminated (naturally or via the `tracing::warn!` error path at 210), `self.task` still holds the finished `JoinHandle`. A subsequent `platform_wallet_manager_spv_start` then: `start(config)` succeeds (client field is `None`), but `spawn_run_loop()` (197-203) sees `self.task.is_some()`, logs a warning, and returns. The FFI returns `ok()` to Swift but no task is driving the just-installed `SpvClient` — the exact silent-success failure mode this PR set out to remove, just shifted one stage later.
2. `spawn_run_loop` holds the `task` mutex only inside the inner block (197-204), then re-acquires it at 214 to install the new `JoinHandle`. Two concurrent callers can both observe `None`, both `tokio::spawn` a run loop, and the second store overwrites the first handle — leaking a `JoinHandle` (the orphan can no longer be awaited or aborted by `stop()`) and double-driving the same `SpvClient`.
Hold the `task` mutex across the entire check-and-set, treat `existing.as_ref().map(JoinHandle::is_finished)` as `None` so a finished handle is replaced, and consider making `spawn_run_loop` return a `Result` so FFI can surface the refusal rather than swallowing it.
- [SUGGESTION] packages/rs-platform-wallet/src/spv/runtime.rs:51-87: Some upstream dash-spv startup still happens inside run() and bypasses FFI propagation
`SpvRuntime::start` returns Ok once `PeerNetworkManager::new`, `DiskStorageManager::new`, and `DashSpvClient::new` succeed — these errors are now properly propagated, which is the PR's stated win. However, `DashSpvClient::run()` (driven from the spawned task at 147-150) is what brings the client to the running state — connecting peers, starting the sync coordinator, etc. Failures in that internal start path still only surface via `tracing::warn!` at 210 and are not visible to the FFI caller, who has already been told `spv_start` succeeded. If the PR's intent is end-to-end startup error propagation, the upstream client's internal start needs to be invoked synchronously from `SpvRuntime::start` (before spawning the run loop) rather than from `run()`.
In `packages/rs-platform-wallet-ffi/src/spv.rs`:
- [SUGGESTION] packages/rs-platform-wallet-ffi/src/spv.rs:361-366: SPV startup is polled on the FFI caller thread instead of block_on_worker
`runtime().block_on(manager.spv().start(config))` polls `PeerNetworkManager::new`, `DiskStorageManager::new`, and `DashSpvClient::new` (storage + crypto + network setup) directly on the foreign caller thread. The crate's `runtime` module documents that iOS dispatch/concurrency threads have small (~512 KB) stacks and that substantial async FFI work should use `block_on_worker`, which polls on the 8 MB-stacked Tokio worker; the rest of `rs-platform-wallet-ffi` follows that pattern uniformly. Pre-PR, this work ran inside the `tokio::spawn`ed run loop and so was already on a worker stack — moving startup ahead of the spawn re-introduces the stack-sensitive path on the caller thread.
Clone the `Arc<SpvRuntime>` outside the closure, `block_on_worker(async move { spv.start(config).await })`, then on success `runtime().enter()` and call `spv.spawn_run_loop()`.
5045e85 to
416faba
Compare
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
Verified all five prior findings against the current head. Prior finding #2 (FFI start polling on the caller thread) is FIXED — packages/rs-platform-wallet-ffi/src/spv.rs:362-365 now wraps spv.start in block_on_worker, with the runtime entered separately before spawn_run_loop. The other four prior findings are STILL VALID at this head and are carried forward: the spawn_run_loop TOCTOU race + never-cleared stale JoinHandle, DashSpvClient::run() startup phase failures still bypassing FFI propagation, and two stale references to the removed spawn_in_background API in a log message and a comment. No new latest-delta defects beyond the carried-forward issues.
🟡 2 suggestion(s) | 💬 2 nitpick(s)
4 additional finding(s) omitted (not in diff).
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `packages/rs-platform-wallet/src/spv/runtime.rs`:
- [SUGGESTION] packages/rs-platform-wallet/src/spv/runtime.rs:195-215: spawn_run_loop has check-and-set race and never recycles a finished JoinHandle
Two related lifecycle problems remain at this head and undermine the PR's goal that `ok` from `platform_wallet_manager_spv_start` means SPV is actually running:
1. Stale handle blocks restart silently. When the background loop exits, `run()` clears `self.client` at line 153 but never clears `self.task`. A subsequent FFI call to `platform_wallet_manager_spv_start` will: succeed at recreating the client via `spv.start(config)`, then call `spawn_run_loop`, which observes `existing.is_some()` (the finished JoinHandle from the previous run), logs a warning, and returns without spawning. FFI reports `ok`, `is_started()` returns true, but nothing drives sync.
2. TOCTOU race between the two `self.task.lock()` acquisitions. The check at line 197 drops the lock before `tokio::spawn`, and the store at line 214 reacquires it. Two concurrent callers can both observe `None`, both spawn, and the second store orphans the first handle so `stop()` will never await or abort it.
Fix: hold the task mutex across the entire check/cleanup/spawn/store window, and treat `JoinHandle::is_finished()` (or a `take()`) as stale so a finished handle is replaced rather than refused. Returning a `Result` would let FFI distinguish 'already running' from 'silently refused'.
- [SUGGESTION] packages/rs-platform-wallet/src/spv/runtime.rs:137-156: DashSpvClient::run startup failures still bypass FFI propagation
`SpvRuntime::start` correctly propagates failures from `PeerNetworkManager::new`, `DiskStorageManager::new`, and `DashSpvClient::new` constructor errors up to the FFI caller. But `DashSpvClient::run()` (line 148) is what brings the client to the running state — connecting peers, starting the sync coordinator, and starting masternode/ChainLock/InstantSend managers — and its failure is observed inside the spawned task at lines 209-211 and only logged via `tracing::warn!`. By then `platform_wallet_manager_spv_start` has already returned `ok`.
The PR description states 'start errors are no longer discarded', and callers may reasonably assume `ok` means the SPV client is genuinely running. Currently a `client.run()` startup failure leaves `self.client = Some(...)` and `self.task = Some(handle)` populated, so `is_started()` returns true while sync is dead — recreating the silent-failure mode the PR title says it is fixing. To close the gap, extract the deterministic upstream start phase into `SpvRuntime::start` before `spawn_run_loop`, or expose run-loop startup failure via a `oneshot` that FFI awaits before declaring success.
Because start was being called inside the run method, and spawn in background was calling this run method without returning any start error, the spv client could fail to start and nobody was reciting the error. With this PR I force the start method to be called independently, this way start errors are no longer discarded.
Regression tests in PR #3712
Checklist:
For repository code-owners and collaborators only
Summary by CodeRabbit
Bug Fixes
Refactor