@mariusmihaic (Contributor) commented on Jan 14, 2026:

cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz

## Final Benchmark Comparison: TxListForSender Optimization

This report compares three implementations of txListForSender (a layout sketch follows this list):

1. **Old (Linear List):** the original container/list implementation.
2. **RBT (Red-Black Tree):** the intermediate gods/trees/redblacktree implementation from #7525 (Improve AddTx in list for sender).
3. **New (Ordered List):** the optimized slice-based implementation (orderedTransactionsList.go).
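For orientation, here is a minimal sketch of the three per-sender layouts under comparison. Only container/list and gods/trees/redblacktree are the real libraries named above; the type and field names are illustrative assumptions, not the actual repository code.

```go
package txlist

import (
	"container/list" // 1. Old: doubly linked list

	rbt "github.com/emirpasic/gods/trees/redblacktree" // 2. RBT
)

// wrappedTransaction is a hypothetical stand-in for the pool's tx record.
type wrappedTransaction struct {
	Nonce uint64
	// ... hash, gas price, payload, etc.
}

// 1. Old: one heap-allocated list element per transaction, kept in nonce order.
type linearListForSender struct {
	items *list.List // each element holds a *wrappedTransaction
}

// 2. RBT: one heap-allocated tree node per transaction, keyed by nonce.
type rbtListForSender struct {
	items *rbt.Tree // nonce -> *wrappedTransaction
}

// 3. New: a single contiguous slice kept sorted by nonce.
type orderedListForSender struct {
	items []*wrappedTransaction
}
```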

### 1. Execution Time (N = 1000 transactions per 1000 accounts)

| Operation | Scenario | Old (Linear List) | RBT (Red-Black Tree) | New (Ordered List) | New vs Old | New vs RBT |
|---|---|---|---|---|---|---|
| Add Transaction | Ordered (common case) | ~3,000 µs | ~460 µs | ~180 µs | ~16x faster | ~2.5x faster |
| Add Transaction | Reverse (worst case) | ~2,800 µs | ~460 µs | ~144 µs | ~19x faster | ~3.2x faster |
| Add Transaction | Random | ~1,600 µs | ~500 µs | ~178 µs | ~9x faster | ~2.8x faster |
| Remove | Block processing | ~14 µs | ~58 µs | ~9.6 µs | ~1.5x faster | ~6x faster |
| Eviction | Size constraint | ~30 µs | ~64 µs | ~21 µs | ~1.4x faster | ~3x faster |

Note: The New implementation is the fastest across all metrics.

- The first three rows include both AddTx and removals (e.g. BenchmarkTxList_removeTransactionsWithHigherOrEqualNonce); a removal sketch follows these notes.
- The last two rows include only removals/evictions (no transactions are added).
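To illustrate why removals are cheap on a sorted slice, here is a hedged sketch in the spirit of removeTransactionsWithHigherOrEqualNonce (the method named by the benchmark above). It assumes the orderedListForSender type from the earlier sketch plus the standard sort package; the body is a sketch of the technique, not the PR's actual code.

```go
// removeTransactionsWithHigherOrEqualNonce drops every transaction whose
// nonce is >= the given nonce. On a nonce-sorted slice this is one binary
// search plus a truncation: no per-node unlinking, no tree rebalancing.
func (l *orderedListForSender) removeTransactionsWithHigherOrEqualNonce(nonce uint64) {
	i := sort.Search(len(l.items), func(k int) bool {
		return l.items[k].Nonce >= nonce
	})
	// Nil out the tail so the GC can reclaim the dropped transactions.
	for k := i; k < len(l.items); k++ {
		l.items[k] = nil
	}
	l.items = l.items[:i]
}
```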

### 2. Memory & Allocation Comparison (N = 1000)

| Metric | Old (Linear List) | RBT (Red-Black Tree) | New (Ordered List) | Conclusion |
|---|---|---|---|---|
| Allocations/op | ~1,003 allocs | ~1,002,005+ allocs | ~14 allocs | ~98% reduction in allocations |
| Bytes/op | ~48 KB | ~1+ MB | ~17 KB | ~65% reduction in memory churn |
- **Old/RBT:** both allocate a separate node object on the heap for every single transaction inserted.
- **New:** uses a contiguous slice and allocates only when the slice capacity needs to grow (amortized growth), which drastically reduces garbage collector (GC) pressure; see the insertion sketch below.
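To make the allocation story concrete, here is a minimal insertion sketch, again assuming the orderedListForSender type and the standard sort package (an assumption, not the actual orderedTransactionsList.go code):

```go
// AddTx inserts tx so that items stays sorted by nonce, ascending.
func (l *orderedListForSender) AddTx(tx *wrappedTransaction) {
	// Binary search for the insertion point: O(log N) comparisons.
	i := sort.Search(len(l.items), func(k int) bool {
		return l.items[k].Nonce >= tx.Nonce
	})

	// Common case: nonces arrive in order, so this is a plain append.
	// It allocates only when the backing array must grow (amortized O(1)).
	if i == len(l.items) {
		l.items = append(l.items, tx)
		return
	}

	// Otherwise shift the tail right by one slot with copy(): O(N) element
	// moves over contiguous memory, with no per-transaction allocation.
	l.items = append(l.items, nil)
	copy(l.items[i+1:], l.items[i:])
	l.items[i] = tx
}
```

Note that this sketch does not handle duplicate nonces (e.g. replace-by-higher-gas-price), which a real pool would need to decide on.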

### 3. Algorithmic Complexity Comparison

| Implementation | Best Case (Ordered) | Worst Case (Reverse) | Random Insert | Memory Overhead |
|---|---|---|---|---|
| Old (Linear List) | O(1) (append) | O(N²) (scan from back) | O(N²) | High (node pointers) |
| RBT | O(log N) | O(log N) | O(log N) | High (tree nodes) |
| New (Ordered List) | O(1) (append) | O(N) (shift/copy) | O(N) | Low (contiguous slice) |

### Conclusion

The New Ordered List (slice-based) implementation is superior because:

1. **Complexity:** it avoids the O(N²) worst case of the linked list while keeping O(1) performance for the most common case (ordered append). Random insertion is O(N) due to shifting, but Go's copy() is assembly-optimized and extremely fast, so keeping lists of N = 5000 sorted in practice beats the per-node overhead of the RBT's O(log N) operations.
2. **Memory:** it performs ~50x fewer allocations than both the linked-list and RBT approaches, significantly reducing GC pause times (a benchmark sketch follows this list).
3. **Cache locality:** contiguous memory access (a slice) is much friendlier to the CPU cache than pointer-chasing (linked list/tree).
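For completeness, a hedged sketch of how allocs/op and B/op figures like those in section 2 are typically obtained with Go's testing package; the benchmark name and setup here are hypothetical, not the PR's actual benchmarks:

```go
func BenchmarkOrderedList_AddTx_Ordered(b *testing.B) {
	b.ReportAllocs() // emit allocs/op and B/op, as in section 2

	for i := 0; i < b.N; i++ {
		lst := &orderedListForSender{}
		for nonce := uint64(0); nonce < 1000; nonce++ {
			lst.AddTx(&wrappedTransaction{Nonce: nonce})
		}
	}
}
```

Running `go test -bench . -benchmem` reports the same allocation columns for every benchmark without a per-benchmark b.ReportAllocs() call.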

@mariusmihaic changed the title from "Mx 17423 optimize tx pool" to "Optimize tx pool" on Jan 14, 2026
@mariusmihaic self-assigned this on Jan 14, 2026
@mariusmihaic marked this pull request as ready for review on Jan 14, 2026 at 15:35