
fix(consensus): Enforce Verifier L1 Confs In Derivation#2259

Open
refcell wants to merge 8 commits into main from rf/fix-consensus/verifier-l1-confs-safe-head

Conversation

@refcell
Contributor

@refcell refcell commented Apr 17, 2026

Summary

The verifier_l1_confs (aka BASE_NODE_VERIFIER_L1_CONFS) configuration was not actually constraining the derivation pipeline's view of L1. The L1 watcher delayed the head signal sent to the derivation actor, but the pipeline's AlloyChainProvider fetched L1 blocks directly from the RPC with no upper bound, so the safe head advanced as if no confirmation depth were set.

This adds a ConfDepthProvider wrapper (matching op-node's ConfDepth pattern) that intercepts block_info_by_number calls and returns a temporary error when the requested block exceeds l1_head - conf_depth, causing the pipeline to yield. The L1 watcher now updates a shared atomic with the real L1 head on every poll, and the pipeline reads it to enforce the cutoff. The fix also wires BASE_NODE_VERIFIER_L1_CONFS into the docker-compose devnet so it can be tested locally with `just devnet up` and the included `compare-heads.sh` script.

The verifier_l1_confs config delayed the L1 head signal sent to the
derivation actor, but the pipeline's AlloyChainProvider fetched L1
blocks directly with no upper bound. The safe head was unaffected
because the signal was only used as a wake-up trigger — the pipeline
could still derive from the latest L1 data.

Add a ConfDepthProvider wrapper (matching op-node's ConfDepth pattern)
that intercepts block_info_by_number calls and returns a temporary
BlockNotFound error when the requested block is beyond l1_head minus
conf_depth. The L1 watcher now updates a shared Arc<AtomicU64> with
the real L1 head on every poll, and the pipeline reads it to enforce
the cutoff.
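The cutoff check described above can be sketched as follows. This is a simplified, self-contained illustration, not the PR's actual code: `L1HeadNumber` is named after the PR, but `BlockNumberProvider`, `ProviderError`, and the field layout are stand-ins for the real kona/alloy types.

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicU64, Ordering};

/// Shared view of the unfiltered L1 head; the L1 watcher stores into it
/// on every poll. (Name taken from the PR; other types are stand-ins.)
pub type L1HeadNumber = Arc<AtomicU64>;

#[derive(Debug, PartialEq)]
pub enum ProviderError {
    /// Temporary error: the requested block is past the confirmation
    /// cutoff, so the pipeline should yield and retry later.
    BlockNotFound(u64),
}

/// Stand-in for the inner chain provider, exposing only the one method
/// the wrapper gates.
pub trait BlockNumberProvider {
    fn block_info_by_number(&self, number: u64) -> Result<u64, ProviderError>;
}

/// Refuses number-based lookups beyond `l1_head - conf_depth`,
/// mirroring op-node's ConfDepth pattern.
pub struct ConfDepthProvider<P> {
    pub inner: P,
    pub l1_head: L1HeadNumber,
    pub conf_depth: u64,
}

impl<P: BlockNumberProvider> ConfDepthProvider<P> {
    pub fn block_info_by_number(&self, number: u64) -> Result<u64, ProviderError> {
        let head = self.l1_head.load(Ordering::Relaxed);
        // saturating_sub so a small head with a large depth can't underflow.
        let cutoff = head.saturating_sub(self.conf_depth);
        if self.conf_depth > 0 && number > cutoff {
            return Err(ProviderError::BlockNotFound(number));
        }
        self.inner.block_info_by_number(number)
    }
}
```

With conf_depth = 0 the wrapper is a pass-through, so an unset BASE_NODE_VERIFIER_L1_CONFS changes nothing.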
@refcell refcell added bug Flag: Something isn't working consensus Area: consensus labels Apr 17, 2026
@refcell refcell self-assigned this Apr 17, 2026
@cb-heimdall
Collaborator

cb-heimdall commented Apr 17, 2026

🟡 Heimdall Review Status

Reviews: 🟡 0/1

@vercel

vercel bot commented Apr 17, 2026

The latest updates on your projects.

1 Skipped Deployment

Project | Deployment | Updated (UTC)
base | Ignored (Preview) | Apr 17, 2026 5:44pm


Comment threads:
- crates/consensus/providers/src/conf_depth.rs (outdated)
- crates/consensus/service/src/service/node.rs (outdated)
- crates/consensus/service/src/service/node.rs (outdated)
- etc/docker/devnet-env
- crates/consensus/service/src/service/follow.rs
@refcell refcell requested a review from danyalprout April 17, 2026 15:28
@refcell refcell marked this pull request as ready for review April 17, 2026 15:44
refcell added a commit that referenced this pull request Apr 17, 2026
Backport of #2259 (includes prerequisite feat from #2226).
Comment thread crates/consensus/providers/src/pipeline.rs
…test

The test was sampling safe/finalized block tags immediately after
waiting for gossip-synced latest blocks. The safe head requires the
batcher → L1 mining → derivation pipeline cycle, which takes longer.
Both nodes showed safe=0 (genesis) because derivation hadn't confirmed
any blocks yet when sampling started.

- Add explicit 120s wait for builder safe head > 0 before sampling
- Extend sampling from 5 to 10 rounds (20s window) so the client's
  delayed derivation also has time to advance and show the lag
- Fix BUG annotation to only fire when b > 0 && b == c (not when
  both are 0, which is just 'not ready yet')
refcell added 3 commits April 17, 2026 11:32
…for CI

Lower VERIFIER_L1_CONFS from 4 to 2 (4s lag instead of 8s) to reduce the
time CI needs for the client to show a lagging safe head on slow L1.

Increase the builder safe-head wait from 120s to 300s so the
batcher→L1→derivation cycle has room to complete on resource-constrained
CI runners.
@github-actions
Contributor

Review Summary

The fix correctly identifies and addresses the root cause: the derivation pipeline's AlloyChainProvider was fetching L1 blocks directly from the RPC without respecting the verifier_l1_confs cutoff, bypassing the L1 watcher's head-delay signal.

The ConfDepthProvider wrapper is well-designed — gating only block_info_by_number is sufficient because the pipeline can only discover new L1 origins through number-based lookups (PollingTraversal.advance_origin()). All hash-based methods (header_by_hash, receipts_by_hash, block_info_and_transactions_by_hash) are downstream consumers that require a hash obtained from a previously discovered block.

Ordering::Relaxed is appropriate for the L1HeadNumber atomic: there's no other shared state that needs happens-before ordering, and a stale read is conservatively safe (the pipeline yields and retries).
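The watcher/pipeline split the review describes might look like the following sketch (function names here are illustrative, not the PR's actual APIs): the only shared state is the single counter, so Relaxed suffices, and a stale load only makes the cutoff more conservative.

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicU64, Ordering};

// Watcher side: publish the unfiltered L1 head on every poll.
fn on_l1_poll(head: &Arc<AtomicU64>, latest: u64) {
    head.store(latest, Ordering::Relaxed);
}

// Pipeline side: the highest block number derivation may read. A stale
// load is harmless; the pipeline simply yields and retries later.
fn current_cutoff(head: &Arc<AtomicU64>, conf_depth: u64) -> u64 {
    head.load(Ordering::Relaxed).saturating_sub(conf_depth)
}
```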

Issues raised in inline comments (prior run, still applicable)

  1. etc/docker/devnet-env: The comment says "Try 4" but the value is 15. A non-zero default will surprise anyone running just devnet up. Recommend defaulting to 0.

  2. crates/consensus/providers/src/pipeline.rs: The #[allow(clippy::too_many_arguments)] on both new and new_polled could be avoided by accepting a pre-built ConfDepthProvider instead of (AlloyChainProvider, L1HeadNumber, u64).

  3. crates/consensus/service/src/service/follow.rs: The Arc::new(AtomicU64::new(0)) passed to the L1 watcher is a dead write — FollowNode has no ConfDepthProvider consuming it. Worth a doc comment noting this.
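The refactor suggested in item 2 could look like the following sketch. All types here are stubs named after the review comment, not the actual crate's definitions; the point is only the constructor shape.

```rust
// Stub types standing in for the real ones (names assumed from the review).
pub struct AlloyChainProvider;
pub type L1HeadNumber = std::sync::Arc<std::sync::atomic::AtomicU64>;

pub struct ConfDepthProvider {
    pub inner: AlloyChainProvider,
    pub l1_head: L1HeadNumber,
    pub conf_depth: u64,
}

pub struct Pipeline {
    provider: ConfDepthProvider,
    // ...other fields elided...
}

impl Pipeline {
    // Instead of `new(inner, l1_head, conf_depth, ...)` tripping
    // clippy::too_many_arguments, accept the pre-built wrapper.
    pub fn new(provider: ConfDepthProvider) -> Self {
        Pipeline { provider }
    }

    pub fn conf_depth(&self) -> u64 {
        self.provider.conf_depth
    }
}
```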

Architecture note

The L1 watcher's existing head-delay mechanism (fetching block at head - conf_depth and sending that to derivation) and the new ConfDepthProvider both enforce the same constraint. The PR description explains why both are needed — the old mechanism alone was insufficient. With the ConfDepthProvider now in place, the old mechanism provides defense-in-depth: the watcher delays the L1 origin signal while the provider prevents the pipeline from reading ahead. This belt-and-suspenders approach is reasonable.

Test coverage is thorough — unit tests for the ConfDepthProvider boundary conditions, integration tests wiring the L1 watcher atomic to the provider, and a full devnet E2E test verifying safe head lag.

