Skip to content

fix(tool): surface MessagePool service task errors in api_cmd test ctx#6966

Open
0xDevNinja wants to merge 2 commits into
ChainSafe:mainfrom
0xDevNinja:0xdevninja/issue-6505-joinset-errors
Open

fix(tool): surface MessagePool service task errors in api_cmd test ctx#6966
0xDevNinja wants to merge 2 commits into
ChainSafe:mainfrom
0xDevNinja:0xdevninja/issue-6505-joinset-errors

Conversation

@0xDevNinja
Copy link
Copy Markdown
Contributor

@0xDevNinja 0xDevNinja commented Apr 24, 2026

Summary of changes

Changes introduced in this pull request:

  • Own the JoinSet in test_snapshot::ctx and generate_test_snapshot::ctx instead of passing &mut JoinSet::new() into MessagePool::new. The inline temporary was dropped on return, aborting every spawned mpool service task and silently swallowing any errors they emitted pre-cancellation.
  • Move the owned JoinSet into a detached tokio::spawn(drain_mpool_services(...)) that drains it with the canonical while let Some(result) = services.join_next().await loop from daemon/mod.rs::propagate_error, logging errors and non-cancel panics via tracing::warn!.
  • Add a shared drain_mpool_services helper in test_snapshot.rs (visibility pub(super)) and call it from both ctx() sites.

Detaching (vs. awaiting inline) is deliberate: the mpool services are long-lived, so an inline while let Some(...) would never return — that is precisely what caused codecov CI to time out and led to the earlier attempt being reverted from #6493. A detached drainer surfaces errors without blocking the caller.

Reference issue to close (if applicable)

Closes #6505

Other information and links

  • Canonical drain pattern: src/daemon/mod.rs:842-851 (propagate_error).
  • MessagePool::new signature confirmed at src/message_pool/msgpool/msg_pool.rs:~721 — takes services: &mut JoinSet<anyhow::Result<()>>, which matches the generic used here.
  • Prior related PR (different file, same pattern): test: unit test for start_rpc #6493.
  • No CHANGELOG.md entry — this is an internal test utility change and is not user-facing per the changelog guide at the top of that file.
  • No new test: the fix is observability-only (previously-silent warnings now surface via tracing); asserting on log output would be flaky. Existing rpc_regression_tests_print_uncovered passes locally; the heavier regression harness defers to CI.

Change checklist

  • I have performed a self-review of my own code,
  • I have made corresponding changes to the documentation. All new code adheres to the team's documentation standards,
  • I have added tests that prove my fix is effective or that my feature works (if possible),
  • I have made sure the CHANGELOG is up-to-date. All user-facing changes should be reflected in this document.

Outside contributions

  • I have read and agree to the CONTRIBUTING document.
  • I have read and agree to the AI Policy document. I understand that failure to comply with the guidelines will lead to rejection of the pull request.

AI Usage Disclosure

This PR was prepared with assistance from Claude Code (Anthropic). Extent:

  • Issue analysis, implementation, self-review, and PR drafting were AI-assisted.
  • All changes were compiled and tested locally by me before pushing — cargo fmt --all -- --check, cargo clippy --all-targets --no-deps -- -D warnings (with FOREST_F3_SIDECAR_FFI_BUILD_OPT_OUT=1), cargo check --lib --tests, and cargo test --lib rpc_regression_tests_print_uncovered (1/1 pass) all green.
  • I have reviewed every diff line and take full responsibility for correctness.

Summary by CodeRabbit

  • Tests
    • Track and drain background message-pool tasks created during snapshot tests.
    • Immediately abort these background tasks after setup, then await their completion so errors or panics are surfaced rather than silently cancelled.
    • Add warning-level logging for task errors and panics to improve test diagnostics and make failures clearer.

Review Change Stack

@0xDevNinja 0xDevNinja requested a review from a team as a code owner April 24, 2026 06:33
@0xDevNinja 0xDevNinja requested review from LesnyRumcajs and hanabi1224 and removed request for a team April 24, 2026 06:33
@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Apr 24, 2026

CLA assistant check
All committers have signed the CLA.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 24, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 23bfb6dc-7ce5-48db-9342-98212c4e2b96

📥 Commits

Reviewing files that changed from the base of the PR and between 85a3703 and 860ec51.

📒 Files selected for processing (1)
  • src/tool/subcommands/api_cmd/test_snapshot.rs

Walkthrough

A shared JoinSet<anyhow::Result<()>> is created and passed into MessagePool::new, immediately aborted, and then drained by a new drain_mpool_services async function that logs task errors and panics while ignoring cancellations.

Changes

JoinSet Error Handling in Test Utilities

Layer / File(s) Summary
Create shared JoinSet and abort after MessagePool construction
src/tool/subcommands/api_cmd/test_snapshot.rs, src/tool/subcommands/api_cmd/generate_test_snapshot.rs
Instantiates a shared JoinSet<anyhow::Result<()>>, passes it into MessagePool::new, and calls abort_all() immediately after construction; spawns a background task to drain the join set.
Drain JoinSet and log outcomes
src/tool/subcommands/api_cmd/test_snapshot.rs
Adds pub(super) async fn drain_mpool_services(mut services: JoinSet<anyhow::Result<()>>) which loops on join_next(), logs warnings for task errors and panics, and ignores cancelled joins.

Sequence Diagram

sequenceDiagram
  participant ctx
  participant MessagePool
  participant JoinSet
  participant drain_task

  ctx->>JoinSet: create JoinSet<anyhow::Result<()>>
  ctx->>MessagePool: MessagePool::new(&mut JoinSet)
  ctx->>JoinSet: abort_all()
  ctx->>drain_task: spawn drain_mpool_services(JoinSet)
  drain_task->>JoinSet: join_next() loop
  JoinSet-->>drain_task: Result (Ok / Err / Cancelled)
  drain_task->>drain_task: log warnings for Err and panic, ignore Cancelled
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • LesnyRumcajs
  • hanabi1224
  • sudo-shashank
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: surfacing MessagePool service task errors in the api_cmd test context, which aligns with the primary objective of handling JoinSet task errors.
Linked Issues check ✅ Passed The PR fully addresses all coding requirements from issue #6505: adopting the canonical draining pattern with proper error/panic logging, applying fixes to both target files, and implementing services.abort_all() to prevent test blocking while surfacing pre-abort spawn errors.
Out of Scope Changes check ✅ Passed All changes are directly scoped to the linked issue #6505: modifying only the two specified test utility files to handle JoinSet task errors properly with detached draining and abort_all() calls.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
✨ Simplify code
  • Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@akaladarshi akaladarshi added the RPC requires calibnet RPC checks to run on CI label Apr 28, 2026
@LesnyRumcajs
Copy link
Copy Markdown
Member

Seems to timeout the tests. Please first assert locally that it works.

@LesnyRumcajs LesnyRumcajs marked this pull request as draft April 29, 2026 11:03
@0xDevNinja 0xDevNinja force-pushed the 0xdevninja/issue-6505-joinset-errors branch from c0581cb to 63bad6a Compare May 13, 2026 14:18
@0xDevNinja
Copy link
Copy Markdown
Contributor Author

Hey, thanks for catching that — local repro confirmed your concern. With just the detached drain_mpool_services spawn, the mpool background reactors stayed alive for the duration of each snapshot test. Running the full snapshot suite (370 tests) with --test-threads=8 had 5 tests stall on the parallel runtime under contention, exactly matching the codecov pattern from #6493.

Pushed a follow-up: services.abort_all() right after MessagePool::new(...) in both test_snapshot::ctx and generate_test_snapshot::ctx. The mpool services were never meant to run in this snapshot ctx (the inherited &mut JoinSet::new() pattern was a temporary that dropped immediately — same end state). The detached drain still polls the aborted JoinSet so any pre-abort spawn error or panic is surfaced via tracing::warn! instead of being silently dropped.

Locally:

$ FOREST_TEST_SKIP_PROOF_PARAM_CHECK=1 cargo test --lib --features cargo-test cargo_test_rpc_snapshot_test_filecoin_ -- --test-threads=8
test result: ok. 370 passed; 0 failed; 0 ignored; 0 measured; 2045 filtered out; finished in 31.87s

No over 60 seconds warnings, no panics. The single-commit branch is rebased on main and CI should pick it up shortly — happy to undraft once you take a look.

@0xDevNinja 0xDevNinja force-pushed the 0xdevninja/issue-6505-joinset-errors branch 2 times, most recently from 63a18e4 to 6309c54 Compare May 14, 2026 10:39
@0xDevNinja 0xDevNinja marked this pull request as ready for review May 14, 2026 12:45
@0xDevNinja 0xDevNinja force-pushed the 0xdevninja/issue-6505-joinset-errors branch from 6309c54 to 70c7855 Compare May 18, 2026 06:41
Both `test_snapshot::ctx` and `generate_test_snapshot::ctx` passed
`&mut JoinSet::new()` to `MessagePool::new`. The temporary was dropped
on return, aborting every service task spawned by the pool and silently
swallowing any error they emitted pre-cancellation.

Own the `JoinSet` locally and move it into a detached `tokio::spawn`
that drains it with the canonical `while let Some(result) =
services.join_next().await` loop from `daemon/mod.rs::propagate_error`,
logging errors and non-cancel panics via `tracing::warn!`. Detaching
(vs. awaiting inline) avoids the codecov-timeout regression that caused
the earlier attempt to be reverted in ChainSafe#6493: the mpool services are
long-lived and would never complete the inline await.

Refs ChainSafe#6505
@0xDevNinja 0xDevNinja force-pushed the 0xdevninja/issue-6505-joinset-errors branch from 70c7855 to 85a3703 Compare May 19, 2026 06:46
@codecov
Copy link
Copy Markdown

codecov Bot commented May 26, 2026

Codecov Report

❌ Patch coverage is 50.00000% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.22%. Comparing base (e0bca2a) to head (85a3703).
⚠️ Report is 16 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
...tool/subcommands/api_cmd/generate_test_snapshot.rs 0.00% 4 Missing ⚠️
src/tool/subcommands/api_cmd/test_snapshot.rs 66.66% 4 Missing ⚠️
Additional details and impacted files
Files with missing lines Coverage Δ
...tool/subcommands/api_cmd/generate_test_snapshot.rs 7.00% <0.00%> (-0.10%) ⬇️
src/tool/subcommands/api_cmd/test_snapshot.rs 82.09% <66.66%> (-1.35%) ⬇️

... and 5 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0f4957f...85a3703. Read the comment docs.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown
Contributor

@hanabi1224 hanabi1224 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@hanabi1224 hanabi1224 enabled auto-merge May 26, 2026 11:18
@hanabi1224 hanabi1224 disabled auto-merge May 26, 2026 11:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

RPC requires calibnet RPC checks to run on CI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Handle JoinSet task errors in test utility functions

5 participants