feat(eth-sparse-mpt): add ~3x faster v3 state-root implementation and harness tools#887

Open
bertmiller wants to merge 5 commits into develop from v3-stateroot-work

Conversation

@bertmiller
Member

@bertmiller bertmiller commented Feb 28, 2026

📝 Summary

I will state upfront that this code was almost entirely written with Codex. However, the results seem promising, and the implementation overlaps heavily with the existing v2 state root anyway.

I asked Codex to set up a harness that measured performance across different depths of cache warmth and compared the state-root implementations against each other and against Reth's for correctness. Then I asked it to optimize v2 and implement a v3 state root. This was effective because the agent could simply iterate whenever it noticed it had the wrong root or was slower. Finally, I ran the harness for 1000+ blocks, and the new v3 always produced the same root as v2 and reth, so I have some confidence in the implementation. You can find the testing scripts in this PR, specifically correct_and_bench.rs. Note that these require unwinding a reth node block by block (thank you Vitaly for this trick).
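The per-block check the harness performs can be sketched roughly as below; all names are hypothetical and this is not the actual correct_and_bench.rs code.

```rust
/// Hypothetical shape of the per-block correctness check: run every
/// implementation on the same block and require identical roots.
fn check_block(
    block: u64,
    impls: &[(&str, fn(u64) -> [u8; 32])],
    reference: fn(u64) -> [u8; 32],
) -> Result<(), String> {
    let expected = reference(block);
    for (name, calc) in impls {
        if calc(block) != expected {
            return Err(format!("{name} diverged from reference at block {block}"));
        }
    }
    Ok(())
}

/// Stand-in "state root" so the sketch runs: derive 32 bytes from the
/// block number. Real implementations would compute a trie root.
fn demo_root(block: u64) -> [u8; 32] {
    let mut root = [0u8; 32];
    root[..8].copy_from_slice(&block.to_be_bytes());
    root
}

fn main() {
    let impls: [(&str, fn(u64) -> [u8; 32]); 2] = [("v2", demo_root), ("v3", demo_root)];
    for block in 0..1000 {
        check_block(block, &impls, demo_root).expect("roots must match");
    }
    println!("all blocks matched");
}
```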

The results are a significant speedup relative to v2. Over 1,000 blocks (all times in ms):

| Scope | v2 p50 | v3 p50 | v2 p99 | v3 p99 | v2 mean | v3 mean |
| --- | --- | --- | --- | --- | --- | --- |
| Overall (excluding warm=100, n=3213) | 3.01 | 0.88 | 4.71 | 1.87 | 3.19 | 0.97 |

| Warm % | v2 p50 | v3 p50 | p50 speedup | v2 p99 | v3 p99 | p99 speedup | v2 mean | v3 mean | mean speedup |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 70 | 4.16 | 1.09 | 3.82x | 5.96 | 2.22 | 2.68x | 4.32 | 1.21 | 3.58x |
| 80 | 3.05 | 0.94 | 3.26x | 4.75 | 1.93 | 2.46x | 3.23 | 1.03 | 3.14x |
| 90 | 1.83 | 0.61 | 3.00x | 3.41 | 1.45 | 2.35x | 2.00 | 0.68 | 2.95x |

Here's a moving average:
[image: moving-average chart of per-block timings]

The one notable regression is that in ~2.5% of blocks, v3's p99 was worse than v2's. On average these regressions added 1.89 ms, and the worst observed was 23.9 ms for a single block.

💡 Motivation and Context

Calculating the state root is on the hot path for a new block, so making it faster lowers latency for all our users.


✅ I have completed the following steps:

  • Run make lint
  • Run make test
  • Added tests (if applicable)

@claude
Contributor

claude bot commented Feb 28, 2026

Claude finished @bertmiller's task in 3m 26s


PR Review: v3 state-root implementation

  • Read repository guidelines
  • Gather context and understand the changes
  • Review v3 core implementation (v3/mod.rs, v3/trie/mod.rs)
  • Review v3 supporting files (v3/fetch.rs, v3/trie/proof_store.rs, v3/trie/tests.rs)
  • Review harness/bin files
  • Review lib.rs and Cargo changes
  • Post inline comments for issues found
  • Post final review summary

Summary

The v3 implementation is structurally a copy of v2 with the same overall architecture (arena-based trie, proof store, incremental revert/apply pattern). The 3x speedup likely comes from the optimized data layout (SoA with Range<usize> indices into flat Vec<u8> buffers instead of heap-allocated Vec<u8> per node). The harness tools for correctness and benchmarking are thorough.
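A minimal sketch of the data-layout difference described above, with hypothetical names: node keys stored as Range<usize> views into one flat buffer rather than one heap-allocated Vec<u8> per node.

```rust
use std::ops::Range;

/// Structure-of-arrays layout: all key bytes live in a single flat
/// buffer, and each node stores only a Range into it. This avoids one
/// heap allocation per node and keeps traversal cache-friendly.
#[derive(Default)]
struct TrieKeys {
    bytes: Vec<u8>,            // one shared backing buffer
    ranges: Vec<Range<usize>>, // per-node view into `bytes`
}

impl TrieKeys {
    /// Append a key, returning the new node's index.
    fn push(&mut self, key: &[u8]) -> usize {
        let start = self.bytes.len();
        self.bytes.extend_from_slice(key);
        self.ranges.push(start..self.bytes.len());
        self.ranges.len() - 1
    }

    /// Borrow a node's key without any extra allocation.
    fn key(&self, node: usize) -> &[u8] {
        &self.bytes[self.ranges[node].clone()]
    }
}

fn main() {
    let mut keys = TrieKeys::default();
    let a = keys.push(b"abc");
    let b = keys.push(b"de");
    println!("{:?} {:?}", keys.key(a), keys.key(b));
}
```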

Key Issues

Production stability concerns:

  1. Panicking assertions on the hot path — The retry loops in calculate_root_hash_with_sparse_trie (lines 996, 1018, 1036, 1061 of v3/mod.rs) use assert!(loop_break, ...) which will crash the builder if convergence doesn't happen within 10 iterations. These should return errors instead. See inline comment

  2. Potential index-out-of-bounds panic — insert_nibble_slice accesses ins_key[path_walked] without checking that path_walked < ins_key.len() when encountering a branch node. See inline comment

Correctness/concurrency:
3. TOCTOU race in ProofStore::add_proof — The contains_key/insert pair on the DashMap is not atomic. Concurrent callers with the same key can both pass the check and both insert, appending duplicate entries. See inline comment

  4. Unsafe aliasing in parallel hashing — calculate_rlp_pointer_node creates &mut references from &self via raw pointer casts. While the tree structure ensures disjoint indices, this technically violates Rust's aliasing rules and is UB under Miri. See inline comment

Design/maintenance:
5. Confusing type aliases — SharedCacheV3 = SharedCacheV2 and RootHashCalculatorV3 = RootHashCalculator, where SharedCacheV2 is a local struct (not the v2 module's type), are misleading. See inline comment

  6. Unbounded arena growth — the Trie's append-only arenas (values, keys, branch_node_children) never reclaim memory from deleted/updated nodes. Over long-running sessions this is a slow memory leak. See inline comment

Minor:
7. Typos: "hahsing", "fethed", "addres" — see inline comments

Other Observations

  • The v2 changes (silencing KeyNotFound errors during delete/revert) are a behavior change that could mask bugs — previously these were hard errors. The PR description doesn't call this out. Worth confirming these changes are intentional and tested.
  • The v3 module is ~4700 new lines, most of which is structurally identical to v2. This creates a significant maintenance burden — any bug fix in v2's trie logic will need to be applied to v3 as well. Consider whether the v3 Trie could share more code with v2 (e.g., via generics or a shared trie implementation with different node layouts).
  • The three new binaries (correctness-harness, cache-warm-compare, correct_and_bench) add heavy dependencies to the crate (reth-db, reth-evm, reth-node-ethereum, etc.). These are useful development tools but significantly increase compile times. Consider gating them behind a feature flag or moving them to a separate crate.

Comment on lines +59 to +61
if self.proofs.contains_key(&key) {
return Ok(());
}

TOCTOU race: contains_key + insert are not atomic on the DashMap. Two threads with the same key can both pass the contains_key check, both decode the proof, and both insert — doing redundant work and appending duplicate entries to keys/values/rlp_ptrs. Consider using self.proofs.entry(key).or_try_insert_with(|| { ... }) instead, which makes the check-and-insert a single atomic operation. This same pattern exists in v2, but worth fixing here since this is new code.
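The suggested fix can be sketched with std's HashMap, whose entry API has the same shape as DashMap's (DashMap additionally holds the shard lock across the closure, which is what makes the pattern race-free there). Function names are illustrative.

```rust
use std::collections::HashMap;

/// Racy shape: check-then-insert is two separate operations, so on a
/// concurrent map two threads can both pass the check and both insert.
fn add_proof_racy(proofs: &mut HashMap<String, Vec<u8>>, key: &str, decode: impl FnOnce() -> Vec<u8>) {
    if proofs.contains_key(key) {
        return;
    }
    proofs.insert(key.to_string(), decode());
}

/// Atomic shape: a single entry() call decides insert-or-skip, and the
/// decode closure runs only when the key is actually vacant.
fn add_proof_once(proofs: &mut HashMap<String, Vec<u8>>, key: &str, decode: impl FnOnce() -> Vec<u8>) {
    proofs.entry(key.to_string()).or_insert_with(decode);
}

fn main() {
    let mut proofs = HashMap::new();
    add_proof_once(&mut proofs, "k", || vec![1, 2, 3]);
    // Second call with the same key: the closure is never invoked.
    add_proof_once(&mut proofs, "k", || panic!("decode must not run twice"));
    println!("entries: {}", proofs.len());
}
```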

Comment on lines +786 to +806
fn calculate_rlp_pointer_node(
&self,
node_idx: usize,
rlp: &mut Vec<u8>,
proof_store: &ProofStore,
) {
self.rlp_encode_node(node_idx, rlp, proof_store);
let result =
unsafe { &mut *(self.rlp_ptrs_local.as_ptr().add(node_idx) as *mut ArrayVec<u8, 33>) };
result.clear();
if rlp.len() < 32 {
result.try_extend_from_slice(rlp).unwrap();
} else {
let hash = keccak256(rlp);
result.push(EMPTY_STRING_CODE + 32);
result.try_extend_from_slice(hash.as_slice()).unwrap();
}

let hashed_node = unsafe { &mut *(self.hashed_nodes.as_ptr().add(node_idx) as *mut bool) };
*hashed_node = true;
}

unsafe block creates mutable references from shared &self to update rlp_ptrs_local and hashed_nodes. The comment says this is safe because root_hash takes &mut self, but root_hash actually takes &self (line 811), and root_hash_node spawns rayon tasks that can mutate concurrently. This is undefined behavior when two rayon tasks attempt to hash different children of the same branch — they both call calculate_rlp_pointer_node through &self, creating aliasing &mut references to different indices.

The tree structure guarantees that different spawned tasks hash disjoint subtrees (so different indices), which makes this safe in practice, but the Rust aliasing model is not index-aware — two &mut from the same Vec are UB even if they target different indices. This should use UnsafeCell or Cell wrappers to be sound according to Rust's aliasing rules, or at minimum add // SAFETY: comments documenting why concurrent access to distinct indices is safe.

(I note this pattern is inherited from v2, but worth flagging.)
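One way to make the pattern sound under Rust's aliasing rules, sketched with hypothetical names: route all writes through UnsafeCell so no aliasing &mut references are ever materialized, leaving the disjoint-index invariant as the sole documented safety condition.

```rust
use std::cell::UnsafeCell;

/// Per-index writable slots behind a shared reference. Interior
/// mutability goes through UnsafeCell, so writes via &self never create
/// aliasing `&mut`; the only remaining obligation is that callers never
/// touch the same index from two threads at once (the disjoint-subtree
/// invariant described above).
struct DisjointSlots<T> {
    slots: Vec<UnsafeCell<T>>,
}

// SAFETY: sound only because `write` requires callers to guarantee
// disjoint index access across threads.
unsafe impl<T: Send> Sync for DisjointSlots<T> {}

impl<T> DisjointSlots<T> {
    fn new(init: Vec<T>) -> Self {
        Self { slots: init.into_iter().map(UnsafeCell::new).collect() }
    }

    /// SAFETY: no other thread may access `idx` concurrently.
    unsafe fn write(&self, idx: usize, value: T) {
        *self.slots[idx].get() = value;
    }

    fn into_inner(self) -> Vec<T> {
        self.slots.into_iter().map(UnsafeCell::into_inner).collect()
    }
}

fn main() {
    let slots = DisjointSlots::new(vec![0u32; 4]);
    // Single-threaded demo; in the trie, each rayon task would own a
    // disjoint set of node indices.
    for i in 0..4 {
        unsafe { slots.write(i, (i as u32) * 10) };
    }
    println!("{:?}", slots.into_inner()); // [0, 10, 20, 30]
}
```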

Comment on lines +226 to +246
fn insert_nibble_slice(
&mut self,
ins_key: &[u8],
insert_value: InsertValue<'_>,
) -> Result<Option<Range<usize>>, NodeNotFound> {
let mut current_node = 0;
let mut path_walked = 0;

let mut old_value = None;

loop {
let node = self
.nodes
.get(current_node)
.ok_or_else(|| NodeNotFound::new(ins_key))?;
self.hashed_nodes[current_node] = false;
match node {
DiffTrieNode::Branch { children } => {
let children = *children;

let n = ins_key[path_walked] as usize;

Potential panic: ins_key[path_walked] on line 246 will panic with index-out-of-bounds if path_walked == ins_key.len() when we reach a Branch node. This can happen when the key has been fully consumed but we hit a branch (i.e., the key is a prefix of existing keys). Consider adding a bounds check or returning an error instead of panicking.
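A guarded version of the access can be sketched as follows (function and error shapes are hypothetical); slice::get turns the would-be panic into a recoverable error.

```rust
/// Look up the child nibble at the current walk position, returning an
/// error instead of panicking when the key is already fully consumed
/// (i.e. the key is a prefix of existing keys and we reached a branch).
fn branch_child_index(ins_key: &[u8], path_walked: usize) -> Result<usize, String> {
    match ins_key.get(path_walked) {
        Some(&nibble) => Ok(nibble as usize),
        None => Err(format!(
            "key of {} nibbles exhausted before reaching a leaf (branch at depth {})",
            ins_key.len(),
            path_walked
        )),
    }
}

fn main() {
    // In bounds: returns the nibble as a child index.
    assert_eq!(branch_child_index(&[0x1, 0x2], 1), Ok(2));
    // Key fully consumed at a branch: error instead of a panic.
    assert!(branch_child_index(&[0x1, 0x2], 2).is_err());
    println!("bounds check ok");
}
```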

Comment on lines 996 to 1012
for _ in 0..10 {
stats.start();
let ok = self.process_storage_tries_updates()?;
stats.measure_insert(true);
if !ok {
stats.start();
self.fetch_missing_storage_nodes(&consistent_db_view, &mut stats)?;
stats.measure_proof_fetch(true);
continue;
}
stats.start();
self.hash_storage_tries();
stats.measure_hash(true);
loop_break = true;
break;
}
assert!(loop_break, "storage trie are not processed after 10 iters");

The magic number 10 for the retry loop upper bound is used in multiple places (storage tries, account tries, account trie proofs) with assert!(loop_break, ...) that will panic in production if the loop doesn't converge. Consider:

  1. Making this a named constant (e.g., const MAX_FETCH_ITERATIONS: usize = 10;)
  2. Returning an error instead of panicking on exceeding max iterations — a panic in a hot path will bring down the builder
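Both suggestions together can be sketched like this; the constant and error-variant names are hypothetical, not the crate's actual API.

```rust
const MAX_FETCH_ITERATIONS: usize = 10;

#[derive(Debug, PartialEq)]
enum SparseTrieError {
    MaxIterationsExceeded(&'static str),
}

/// Run one processing step up to MAX_FETCH_ITERATIONS times; surface
/// non-convergence as an error the caller can handle, instead of a
/// panic that takes down the builder.
fn run_until_converged(
    what: &'static str,
    mut step: impl FnMut() -> bool,
) -> Result<(), SparseTrieError> {
    for _ in 0..MAX_FETCH_ITERATIONS {
        if step() {
            return Ok(());
        }
    }
    Err(SparseTrieError::MaxIterationsExceeded(what))
}

fn main() {
    let mut tries = 0;
    let res = run_until_converged("storage tries", || {
        tries += 1;
        tries == 3 // converges on the third attempt
    });
    println!("{:?} after {} tries", res, tries);
}
```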

Comment on lines +30 to +31
pub type SharedCacheV3 = SharedCacheV2;
pub type RootHashCalculatorV3 = RootHashCalculator;

These type aliases (SharedCacheV3 = SharedCacheV2, RootHashCalculatorV3 = RootHashCalculator) are confusing. The V3 module defines its own SharedCacheV2 and RootHashCalculator structs locally (not re-exported from v2), so these aliases just rename the local types. This makes it look like V3 shares V2's cache at the integration level (in lib.rs), when in fact each module has independent copies of identically-structured types. Consider renaming the local structs to SharedCacheV3 / RootHashCalculatorV3 directly to avoid the confusing indirection.

let hash = storage_calc
.trie
.root_hash(PARALLEL_HASHING_STORAGE_NODES, &storage_calc.proof_store)
.expect("missing node while hahsing storage trie");

Typo: "missing node while hahsing storage trie""missing node while hashing storage trie"

Comment on lines +1076 to +1077
"Proof was not fethed correctly"
)));

Typo: "Proof was not fethed correctly""Proof was not fetched correctly"

}

impl StorageCalculator {
fn new(addres: Address, shared_cache: &SharedCacheV2) -> Self {

Typo: addres → address

Comment on lines +47 to +60
#[derive(Debug, Clone, Default)]
pub struct Trie {
// 3 arrays below are of the same length
hashed_nodes: Vec<bool>,
rlp_ptrs_local: Vec<ArrayVec<u8, 33>>,
nodes: Vec<DiffTrieNode>,

values: Vec<u8>,
keys: Vec<u8>,
branch_node_children: Vec<[Option<NodePtr>; 16]>,

// scratchpad
walk_path: Vec<(usize, u8)>, // node index, nibble
}

The Trie struct uses an append-only arena pattern for values, keys, and branch_node_children. As nodes are inserted, updated, and deleted, old data in these vectors becomes unreachable but is never freed (only clear() resets everything). Over many blocks, this means unbounded memory growth within a single Trie instance.

Given this is on a hot path that persists across blocks (the StorageCalculator.trie and AccountTrieCalculator.trie survive between calculate_root_hash_with_sparse_trie calls), this could be a slow memory leak for long-running builders. Consider periodically compacting or tracking waste ratio. (This is also inherited from v2.)
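Tracking a waste ratio is cheap to sketch; all names are hypothetical and this is not the crate's actual API.

```rust
/// Append-only byte arena that counts how many of its bytes are still
/// reachable, so the owner can decide when a compaction is worth it.
#[derive(Default)]
struct TrackedArena {
    buf: Vec<u8>,
    live_bytes: usize, // bytes still referenced by some node
}

impl TrackedArena {
    fn append(&mut self, data: &[u8]) -> std::ops::Range<usize> {
        let start = self.buf.len();
        self.buf.extend_from_slice(data);
        self.live_bytes += data.len();
        start..self.buf.len()
    }

    /// Called when a node's slice is deleted or superseded.
    fn release(&mut self, len: usize) {
        self.live_bytes -= len;
    }

    /// Fraction of the buffer that is dead weight.
    fn waste_ratio(&self) -> f64 {
        if self.buf.is_empty() {
            return 0.0;
        }
        1.0 - self.live_bytes as f64 / self.buf.len() as f64
    }

    /// Compact once waste passes a threshold and the buffer is non-trivial.
    fn should_compact(&self) -> bool {
        self.buf.len() >= 4096 && self.waste_ratio() > 0.5
    }
}

fn main() {
    let mut arena = TrackedArena::default();
    let first = arena.append(&[0u8; 1000]);
    arena.append(&[1u8; 1000]);
    arena.release(first.len()); // first slice superseded by an update
    println!("waste = {:.2}", arena.waste_ratio()); // waste = 0.50
}
```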

Comment on lines +600 to +620
fn process_storage_tries_updates(&mut self) -> eyre::Result<bool> {
let all_changed_processed = Arc::new(Mutex::new(true));
self.changed_account
.write()
.par_iter_mut()
.map(|(address, status)| -> eyre::Result<()> {
// if account is done, just return
if !status.needs_processing() {
return Ok(());
}
if self.process_storage_tries_update(*address)? {
*status = StorageTrieStatus::InsertsProcessed;
} else {
*all_changed_processed.lock() = false;
}
Ok(())
})
.collect::<Result<(), _>>()?;
let res = *all_changed_processed.lock();
Ok(res)
}

process_storage_tries_updates takes a write lock on changed_account and then calls par_iter_mut(). This holds a write lock for the duration of the parallel iteration — no other method can read changed_account until all parallel tasks complete. While this works functionally (no other thread accesses it concurrently at this point), it means the locking granularity is coarser than necessary. If the design changes to allow concurrent phases, this could become a bottleneck. Consider whether a lock-free approach (e.g., collecting the results and writing back) would be cleaner.
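The collect-and-write-back alternative can be sketched with std primitives (std::thread::scope standing in for rayon; the map and status types are simplified stand-ins, not the crate's actual types):

```rust
use std::{collections::HashMap, sync::Mutex, thread};

/// Stand-in for process_storage_tries_update; always succeeds here.
fn expensive_update(_account: u64) -> bool {
    true
}

/// Snapshot under a short lock, do the parallel work with no lock held,
/// then write statuses back under another short lock. Readers are only
/// blocked during the two brief critical sections.
fn process_all(changed: &Mutex<HashMap<u64, bool>>) -> bool {
    // 1. Snapshot the accounts that still need processing.
    let pending: Vec<u64> = changed
        .lock()
        .unwrap()
        .iter()
        .filter(|(_, done)| !**done)
        .map(|(account, _)| *account)
        .collect();

    // 2. Process them in parallel with no lock held.
    let results: Vec<(u64, bool)> = thread::scope(|s| {
        pending
            .iter()
            .map(|&account| s.spawn(move || (account, expensive_update(account))))
            .collect::<Vec<_>>() // spawn all before joining any
            .into_iter()
            .map(|handle| handle.join().unwrap())
            .collect()
    });

    // 3. Write statuses back under a second short lock.
    let mut map = changed.lock().unwrap();
    let mut all_done = true;
    for (account, ok) in results {
        if ok {
            map.insert(account, true);
        } else {
            all_done = false;
        }
    }
    all_done
}

fn main() {
    let changed = Mutex::new(HashMap::from([(1u64, false), (2, false)]));
    println!("all done: {}", process_all(&changed));
}
```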

@bertmiller
Member Author

@dvush if you would like me to address Claude's input I'll pass it to Codex, let me know either way.

@claude
Contributor

claude bot commented Feb 28, 2026

Claude finished @bertmiller's task in 2m 41s


PR Review: v3 state-root implementation (updated)

  • Read repository guidelines
  • Gather context on latest changes (2 new commits since last review)
  • Review v3 core implementation (v3/mod.rs)
  • Review v3 trie implementation (v3/trie/mod.rs)
  • Review v3 supporting files (v3/fetch.rs, v3/trie/proof_store.rs, v3/trie/tests.rs)
  • Review harness/bin files and Cargo changes
  • Review lib.rs changes
  • Post inline comments for issues found
  • Post final review summary

What changed since last review

Two commits address feedback from the previous review:

  • bd9e3cf — Retry-loop exhaustion now returns SparseTrieError instead of panicking via assert!. Uses a named MAX_PROCESS_ITERS constant. This addresses issue 1 from the prior review.
  • 6d9baf3 — Dev binaries gated behind dev-tools feature flag. This addresses the compile-time concern about heavy reth dependencies.

Remaining issues

Production stability — remaining panics on hot path:
Several panic!/assert!/.expect() calls remain inside fetch_missing_storage_nodes and fetch_missing_account_trie_nodes (lines 643-672, 925-947 of v3/mod.rs), as well as .expect() during hashing (lines 698, 959). These will crash the builder on unexpected state. See inline comment

Potential index-out-of-bounds panic (unchanged):
insert_nibble_slice at v3/trie/mod.rs:246 — ins_key[path_walked] has no bounds check when encountering a Branch node. See inline comment

Unsafe aliasing in parallel hashing (unchanged):
calculate_rlp_pointer_node creates &mut references from &self via raw pointer casts (v3/trie/mod.rs:794,804). While safe in practice due to disjoint indices, this is technically UB. Needs UnsafeCell or at minimum // SAFETY: documentation. See inline comment

TOCTOU race in ProofStore::add_proof (unchanged):
contains_key/insert pair on DashMap at v3/trie/proof_store.rs:59-97 is not atomic. See inline comment

Confusing type aliases (unchanged):
SharedCacheV3 = SharedCacheV2 at v3/mod.rs:31 where SharedCacheV2 is a local struct. See inline comment

V2 behavior change not documented:
The v2 changes silencing KeyNotFound errors during delete/revert (4 locations in v2/mod.rs) change previously-hard-error behavior to silent success. The PR description doesn't mention this. Worth confirming these are intentional — silently succeeding could mask trie state desync bugs.

Minor typos: "hahsing" (line 698) and "fethed" (line 1086) in v3/mod.rs.

Summary

The two new commits address the most critical prior feedback (panic on retry exhaustion, compile-time bloat from dev binaries). The remaining issues are lower severity but still worth addressing before merging to production, particularly the remaining panics/expects on the hot path and the unsafe aliasing pattern.

let ok = storage_calc
    .trie
    .try_add_proof_from_proof_store(&missing_node, &storage_calc.proof_store)
    .expect("should be able to insert proofs from proof store when they are found (storage trie)");
assert!(ok, "proof is not added (storage trie)");
} else {
panic!("Missing node that was just fetched is not there (storage trie)");

Several panic! / assert! / .expect() calls remain on the hot path after the retry-loop fix. Specifically:

  • Lines 643–644, 669–670: expect()+assert! in fetch_missing_storage_nodes
  • Line 672: panic!("Missing node that was just fetched is not there (storage trie)")
  • Lines 925–926, 944–945: same pattern in fetch_missing_account_trie_nodes
  • Line 947: panic!("Missing node that was just fetched is not there (account trie)")
  • Line 698: .expect("missing node while hashing storage trie")
  • Line 959: .expect("failed to hash account trie")

These will crash the builder on any unexpected state. Consider converting them to return SparseTrieError instead.

@bertmiller (Member Author)

Same behavior as v2

DiffTrieNode::Branch { children } => {
let children = *children;

let n = ins_key[path_walked] as usize;
Potential panic: ins_key[path_walked] will panic with index-out-of-bounds if path_walked == ins_key.len() when encountering a Branch node (i.e., the key has been fully consumed but we hit a branch). This was flagged in the previous review and still applies. Consider adding a bounds check or returning an error.

@bertmiller (Member Author)

Same behavior as v2

) {
self.rlp_encode_node(node_idx, rlp, proof_store);
let result =
unsafe { &mut *(self.rlp_ptrs_local.as_ptr().add(node_idx) as *mut ArrayVec<u8, 33>) };

The unsafe aliasing concern from the previous review still applies. calculate_rlp_pointer_node creates &mut references from &self via raw pointer casts (lines 794, 804). While the tree structure ensures disjoint index access in practice, this violates Rust's aliasing rules. Consider using UnsafeCell wrappers, or at minimum add // SAFETY: comments documenting the disjointness invariant.

@bertmiller (Member Author)

Same behavior as v2

key: Nibbles,
proof: Vec<(Nibbles, P)>,
) -> Result<(), alloy_rlp::Error> {
if self.proofs.contains_key(&key) {

TOCTOU race still present: contains_key (line 59) and insert (line 97) are not atomic on the DashMap. Two threads with the same key can both pass the check and both decode/insert, appending duplicate entries to keys/values/rlp_ptrs. Consider using self.proofs.entry(key).or_try_insert_with(|| { ... }) to make this atomic.

@bertmiller (Member Author)

Same behavior as v2

const PARALLEL_HASHING_STORAGE_NODES: bool = true;
const MAX_PROCESS_ITERS: usize = 10;

pub type SharedCacheV3 = SharedCacheV2;

These aliases are still confusing. SharedCacheV3 = SharedCacheV2 where SharedCacheV2 is a local struct in the v3 module (not the v2 module's type). Consider renaming the local struct directly to SharedCacheV3 / RootHashCalculatorV3 to avoid the indirection.

let hash = storage_calc
.trie
.root_hash(PARALLEL_HASHING_STORAGE_NODES, &storage_calc.proof_store)
.expect("missing node while hahsing storage trie");

Typo: "hahsing""hashing"

@bertmiller (Member Author)

Same as v2, but happy to fix if we want

for proof_target in proof_targets {
if !proofs.contains_key(proof_target) {
return Err(SparseTrieError::Other(eyre::eyre!(
"Proof was not fethed correctly"

Typo: "fethed""fetched"

@bertmiller (Member Author)

Same as v2, but happy to fix if we want

@bertmiller
Member Author

bertmiller commented Mar 1, 2026

I think the remaining Claude comments are things that were copied over from v2 (except the small changes in v2, which I can revert if you want — not sure why it did that).
