
perf: avoid Rc refcount overhead in DomSlot chain traversal #4088

Open
Madoshakalaka wants to merge 2 commits into master from perf/domslot-path-compression

@Madoshakalaka
Member

This addresses a TODO left by @WorldSEnder.

with_next_sibling is called on every DomSlot::insert, which backs every DOM node insertion.
For a BList of N component children processed right-to-left, the total number of chain hops across all
insertions is 1 + 2 + ... + N = N(N+1)/2, i.e. O(N^2). Reducing the constant factor per hop directly improves
large-list creation and reconciliation.
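To make the quadratic hop count concrete, here is a toy model. The Slot enum with its Forward/Node variants is hypothetical and much simpler than Yew's DomSlot types: each of N slots forwards to the next one and only the final slot holds a node, so resolving from slot i costs N - i hops, and resolving from every slot costs N(N+1)/2 hops in total.

```rust
// Hypothetical toy model, not Yew's types: a slot either holds a node id
// or forwards to another slot by index.
enum Slot {
    Node(u32),
    Forward(usize),
}

/// Follows Forward links from slot `i` until a Node is found.
/// Returns the node id and the number of hops taken.
fn resolve(slots: &[Slot], mut i: usize) -> (u32, usize) {
    let mut hops = 0;
    loop {
        match &slots[i] {
            Slot::Node(n) => return (*n, hops),
            Slot::Forward(next) => {
                i = *next;
                hops += 1;
            }
        }
    }
}

/// Builds a chain of `n` forwarding slots ending in one resolved slot and
/// sums the hops needed to resolve from each forwarding slot.
fn total_hops(n: usize) -> usize {
    let mut slots: Vec<Slot> = (0..n).map(|i| Slot::Forward(i + 1)).collect();
    slots.push(Slot::Node(0));
    (0..n).map(|i| resolve(&slots, i).1).sum()
}

fn main() {
    for n in [10, 100, 1000] {
        println!("N = {n}: {} hops (N(N+1)/2 = {})", total_hops(n), n * (n + 1) / 2);
    }
}
```

Making each hop cheaper (this PR) lowers the constant in front of N(N+1)/2 but leaves the growth rate unchanged.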

Traverse the DynamicDomSlot chain using raw pointers instead of
Rc::clone/drop per hop. The chain is transitively kept alive by
the borrowed &self, so raw pointer access is sound.
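A minimal sketch of the two traversal strategies on a simplified model, assuming a slot is an Rc<RefCell<Slot>> that either holds a node or links to the slot it tracks. Variant, Slot and chain are hypothetical stand-ins, not Yew's actual types. The first function clones and drops an Rc on every hop; the second follows raw pointers obtained with Rc::as_ptr. The sketch returns an owned String instead of passing a borrowed node to a caller-supplied closure, and its SAFETY comment holds only under the assumptions written there.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical stand-ins for Yew's slot types: a slot either holds a resolved
// node (a String here) or links to the slot it tracks.
enum Variant {
    Node(String),
    Chained(Rc<RefCell<Slot>>),
}

struct Slot {
    variant: Variant,
}

/// Builds `links` chained slots in front of one slot that holds `node`.
fn chain(links: usize, node: &str) -> Rc<RefCell<Slot>> {
    let mut head = Rc::new(RefCell::new(Slot { variant: Variant::Node(node.to_string()) }));
    for _ in 0..links {
        head = Rc::new(RefCell::new(Slot { variant: Variant::Chained(head) }));
    }
    head
}

/// Clone-per-hop walk: each hop increments one strong count and, when `cur`
/// is overwritten, decrements another.
fn resolve_with_clones(start: &Rc<RefCell<Slot>>) -> String {
    let mut cur = Rc::clone(start);
    loop {
        let next = match &cur.borrow().variant {
            Variant::Node(n) => return n.clone(),
            Variant::Chained(next) => Rc::clone(next),
        };
        cur = next;
    }
}

/// Raw-pointer walk: no refcount traffic per hop.
fn resolve_with_raw_ptrs(start: &Rc<RefCell<Slot>>) -> String {
    let mut ptr: *const RefCell<Slot> = Rc::as_ptr(start);
    loop {
        // SAFETY (assumed for this sketch): every slot is owned by the Rc held in
        // its predecessor, `start` is borrowed for the whole call, and nothing in
        // this loop can drop or retarget a slot on the chain.
        let cell = unsafe { &*ptr };
        let slot = cell.borrow();
        let next = match &slot.variant {
            Variant::Node(n) => return n.clone(),
            Variant::Chained(next) => Rc::as_ptr(next),
        };
        ptr = next;
    }
}

fn main() {
    let head = chain(1_000, "anchor");
    assert_eq!(resolve_with_clones(&head), resolve_with_raw_ptrs(&head));
}
```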
Madoshakalaka added the performance and A-yew (Area: The main yew crate) labels Mar 29, 2026

github-actions bot commented Mar 29, 2026

Visit the preview URL for this PR (updated for commit c2baa85):

https://yew-rs-api--pr4088-perf-domslot-path-co-hpigyof4.web.app

(expires Mon, 06 Apr 2026 08:36:23 GMT)

🔥 via Firebase Hosting GitHub Action 🌎


github-actions bot commented Mar 29, 2026

Benchmark - core

Yew Master

vnode           fastest       │ slowest       │ median        │ mean          │ samples │ iters
╰─ vnode_clone  2.127 ns      │ 3.386 ns      │ 2.131 ns      │ 2.407 ns      │ 100     │ 1000000000

Pull Request

vnode           fastest       │ slowest       │ median        │ mean          │ samples │ iters
╰─ vnode_clone  2.094 ns      │ 3.342 ns      │ 2.097 ns      │ 2.126 ns      │ 100     │ 1000000000


github-actions bot commented Mar 29, 2026

Size Comparison

| Example | master (KB) | pull request (KB) | diff (KB) | diff (%) |
|---|---|---|---|---|
| async_clock | 100.818 | 100.731 | -0.087 | -0.086% |
| boids | 168.448 | 168.388 | -0.061 | -0.036% |
| communication_child_to_parent | 94.076 | 94.019 | -0.058 | -0.061% |
| communication_grandchild_with_grandparent | 105.914 | 105.854 | -0.061 | -0.057% |
| communication_grandparent_to_grandchild | 102.257 | 102.196 | -0.061 | -0.059% |
| communication_parent_to_child | 91.486 | 91.429 | -0.058 | -0.063% |
| contexts | 105.977 | 105.916 | -0.061 | -0.057% |
| counter | 86.798 | 86.711 | -0.087 | -0.100% |
| counter_functional | 88.834 | 88.753 | -0.081 | -0.091% |
| dyn_create_destroy_apps | 90.713 | 90.626 | -0.087 | -0.096% |
| file_upload | 99.812 | 99.725 | -0.087 | -0.087% |
| function_delayed_input | 94.807 | 94.720 | -0.087 | -0.092% |
| function_memory_game | 173.664 | 173.604 | -0.061 | -0.035% |
| function_router | 395.375 | 395.314 | -0.061 | -0.015% |
| function_todomvc | 164.955 | 164.895 | -0.061 | -0.037% |
| futures | 235.551 | 235.464 | -0.087 | -0.037% |
| game_of_life | 105.099 | 105.012 | -0.087 | -0.083% |
| immutable | 259.314 | 259.255 | -0.060 | -0.023% |
| inner_html | 81.341 | 81.260 | -0.081 | -0.100% |
| js_callback | 109.957 | 109.896 | -0.061 | -0.055% |
| keyed_list | 180.407 | 180.350 | -0.058 | -0.032% |
| mount_point | 84.714 | 84.627 | -0.087 | -0.103% |
| nested_list | 113.660 | 113.600 | -0.061 | -0.053% |
| node_refs | 92.087 | 92.029 | -0.058 | -0.063% |
| password_strength | 1719.252 | 1719.191 | -0.061 | -0.004% |
| portals | 93.560 | 93.502 | -0.058 | -0.062% |
| router | 366.017 | 365.956 | -0.061 | -0.017% |
| suspense | 113.962 | 113.901 | -0.061 | -0.053% |
| timer | 88.943 | 88.856 | -0.087 | -0.098% |
| timer_functional | 99.368 | 99.308 | -0.061 | -0.061% |
| todomvc | 142.661 | 142.574 | -0.087 | -0.061% |
| two_apps | 86.711 | 86.624 | -0.087 | -0.100% |
| web_worker_fib | 136.457 | 136.396 | -0.061 | -0.044% |
| web_worker_prime | 187.640 | 187.579 | -0.061 | -0.032% |
| webgl | 83.486 | 83.399 | -0.087 | -0.104% |

✅ None of the examples has changed their size significantly.


github-actions bot commented Mar 29, 2026

Benchmark - SSR

Yew Master

| Benchmark | Round | Min (ms) | Max (ms) | Mean (ms) | Standard Deviation |
|---|---|---|---|---|---|
| Baseline | 10 | 311.040 | 311.247 | 311.130 | 0.070 |
| Hello World | 10 | 482.181 | 508.529 | 490.923 | 11.292 |
| Function Router | 10 | 33026.191 | 33405.725 | 33224.100 | 118.691 |
| Concurrent Task | 10 | 1005.324 | 1008.040 | 1007.043 | 0.871 |
| Many Providers | 10 | 1056.733 | 1103.552 | 1083.722 | 13.642 |

Pull Request

| Benchmark | Round | Min (ms) | Max (ms) | Mean (ms) | Standard Deviation |
|---|---|---|---|---|---|
| Baseline | 10 | 310.955 | 311.758 | 311.248 | 0.274 |
| Hello World | 10 | 476.781 | 487.145 | 479.195 | 2.959 |
| Function Router | 10 | 37761.665 | 45142.578 | 42239.342 | 2428.105 |
| Concurrent Task | 10 | 1006.418 | 1007.650 | 1007.174 | 0.413 |
| Many Providers | 10 | 1071.065 | 1106.065 | 1089.853 | 10.287 |

@Madoshakalaka
Member Author

benchmarks looking good 0454fc7#commitcomment-180899917

Madoshakalaka marked this pull request as ready for review March 29, 2026 13:14
github-actions bot previously approved these changes Mar 29, 2026
@Madoshakalaka
Member Author

Madoshakalaka commented Mar 29, 2026

To clarify: this helps performance but doesn't actually address the original TODO comment.

The O(N^2) comes from the chain structure itself: child[0] chains through N-1 hops, child[1] through N-2, and so on. To eliminate this, BList would need to resolve insertion positions without chaining, for example by maintaining a side structure that maps each child index to its resolved DOM node reference, updated eagerly when positions change. That would be a larger refactor of how BList and NodeWriter interact with DomSlot.
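One possible shape for such a side structure, sketched under assumptions: each child is reduced to the id of its first DOM node (None if it currently renders nothing), and a single right-to-left pass computes, for every child index, the node it should be inserted before. The function name and the u32 node ids are hypothetical. This shows only the O(N) bulk fill, not the eager incremental updates a real implementation would need when positions change.

```rust
/// Hypothetical side structure: `first_nodes[i]` is the id of the first DOM node
/// rendered by child `i` (None if it currently renders nothing), and `after_list`
/// is whatever follows the whole list. Returns, for each child index, the node
/// that child should be inserted before, in a single right-to-left O(N) pass.
fn resolve_next_siblings(first_nodes: &[Option<u32>], after_list: Option<u32>) -> Vec<Option<u32>> {
    let mut out = vec![None; first_nodes.len()];
    // `next` is the first node to the right of the current child.
    let mut next = after_list;
    for i in (0..first_nodes.len()).rev() {
        out[i] = next;
        if let Some(node) = first_nodes[i] {
            next = Some(node);
        }
    }
    out
}

fn main() {
    // Children 1 and 3 currently render nothing; 99 stands for the node after the list.
    let resolved = resolve_next_siblings(&[Some(10), None, Some(30), None], Some(99));
    assert_eq!(resolved, vec![Some(30), Some(30), Some(99), Some(99)]);
    println!("{resolved:?}");
}
```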

I briefly explored an alternative in #4090 but couldn't make it work correctly.

@WorldSEnder
Member

WorldSEnder commented Mar 29, 2026

What you would need is a (self-balancing) tree structure, such as an AVL or splay tree (I'm not sure which is best here; it depends on usage, I suppose). You need to be able to split and merge the tree (when you reassign a node in the middle) and find the rightmost node starting from any node in the middle. Worst case could be O(log n), and most likely even close to O(1) if you splay correctly. I never got it to work either when I tried, though. The sticking points were the intrusive nature of parent pointers in trees and the balancing operations they require, and I didn't find an existing implementation that fit well enough to avoid inventing one here.

EDIT: "worst case" in the above should probably be "amortized"; a true online worst-case structure would be more complicated (see, e.g., scapegoat trees or multi-splay trees). It really does get hairy!
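A minimal sketch of the split/merge and "rightmost node from any node" operations described above. It swaps in an implicit treap (a randomized balanced tree whose split and merge are short to write) for the AVL or splay trees mentioned, stores nodes in an arena with parent indices, and draws priorities from a deterministic xorshift generator. The bounds are expected O(log n), not worst case, and every name here is hypothetical rather than anything in Yew.

```rust
// Hypothetical sketch: an implicit treap in an arena, ordered by sequence
// position, with parent links so the rightmost element is reachable from any node.
const NIL: usize = usize::MAX;
struct Node { left: usize, right: usize, parent: usize, prio: u64, size: usize }
struct Treap { nodes: Vec<Node>, rng: u64 }
impl Treap {
    fn new() -> Self {
        Treap { nodes: Vec::new(), rng: 0x9E37_79B9_7F4A_7C15 }
    }
    /// Adds a detached one-node tree with an xorshift64 priority; its index is its value.
    fn new_node(&mut self) -> usize {
        self.rng ^= self.rng << 13;
        self.rng ^= self.rng >> 7;
        self.rng ^= self.rng << 17;
        let prio = self.rng;
        self.nodes.push(Node { left: NIL, right: NIL, parent: NIL, prio, size: 1 });
        self.nodes.len() - 1
    }
    fn size(&self, t: usize) -> usize {
        if t == NIL { 0 } else { self.nodes[t].size }
    }
    /// Recomputes `t`'s size and points its children's parent links back at `t`.
    fn pull(&mut self, t: usize) {
        let (l, r) = (self.nodes[t].left, self.nodes[t].right);
        self.nodes[t].size = 1 + self.size(l) + self.size(r);
        for c in [l, r] {
            if c != NIL { self.nodes[c].parent = t; }
        }
    }
    /// Concatenates `a` and `b` (all of `a` comes first); expected O(log n).
    fn merge(&mut self, a: usize, b: usize) -> usize {
        if a == NIL { return b; }
        if b == NIL { return a; }
        if self.nodes[a].prio > self.nodes[b].prio {
            let r = self.nodes[a].right;
            let m = self.merge(r, b);
            self.nodes[a].right = m;
            self.pull(a);
            a
        } else {
            let l = self.nodes[b].left;
            let m = self.merge(a, l);
            self.nodes[b].left = m;
            self.pull(b);
            b
        }
    }
    /// Splits into (first `k` elements, the rest); expected O(log n).
    fn split(&mut self, t: usize, k: usize) -> (usize, usize) {
        if t == NIL { return (NIL, NIL); }
        let (l, r) = (self.nodes[t].left, self.nodes[t].right);
        let ls = self.size(l);
        if k <= ls {
            let (a, b) = self.split(l, k);
            self.nodes[t].left = b;
            self.pull(t);
            (a, t)
        } else {
            let (a, b) = self.split(r, k - ls - 1);
            self.nodes[t].right = a;
            self.pull(t);
            (t, b)
        }
    }
    /// Top-level wrappers: the roots they return must not keep stale parent links.
    fn make_root(&mut self, t: usize) -> usize {
        if t != NIL { self.nodes[t].parent = NIL; }
        t
    }
    fn join(&mut self, a: usize, b: usize) -> usize {
        let m = self.merge(a, b);
        self.make_root(m)
    }
    fn cut(&mut self, t: usize, k: usize) -> (usize, usize) {
        let (a, b) = self.split(t, k);
        (self.make_root(a), self.make_root(b))
    }
    /// Climbs to the root, then follows right children: O(depth), expected O(log n).
    fn rightmost_from(&self, mut x: usize) -> usize {
        while self.nodes[x].parent != NIL { x = self.nodes[x].parent; }
        while self.nodes[x].right != NIL { x = self.nodes[x].right; }
        x
    }
}

fn main() {
    let mut t = Treap::new();
    let mut root = NIL;
    for _ in 0..10 {
        let n = t.new_node();
        root = t.join(root, n);
    }
    // "Reassign a node in the middle": cut after the first 5 elements.
    let (left, right) = t.cut(root, 5);
    assert_eq!((t.rightmost_from(0), t.rightmost_from(7)), (4, 9));
    let root = t.join(left, right);
    assert_eq!((t.size(root), t.rightmost_from(2)), (10, 9));
}
```

The parent links are the intrusive part the comment mentions: every split or merge has to repair them (here in pull and make_root), which is easy in an arena but much harder with Rc-owned, self-referential nodes.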

let cell = unsafe { &*ptr };
let slot_ref = cell.borrow();
match &slot_ref.variant {
DomSlotVariant::Node(ref n) => break f(n.as_ref()),

@WorldSEnder (Member) commented on this code Mar 30, 2026:

Calling the arbitrary function f invalidates the safety claim made above, since f can, in theory, access any node on the chain by capturing a reference to any of the Rcs therein.

Don't get me wrong: I don't immediately see any code in this file doing this. In practice, though, we do call gloo::console::error (which can call out to JavaScript), we run a tracing subscriber's code, we might panic, and we might run drop glue that I'm unaware of. If not now, then perhaps after a refactor. In any case, the current API here is NOT safe.

The traversal is fine, but I think the refcount increment on the last Rc (which must exist and hold a borrowed cell during the call to f) still needs to happen. Alternatively, you could clone the Node and call f by value.
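A sketch of the second option suggested above (clone the node and call f by value), on a simplified model where DomNode, Variant and Slot are hypothetical stand-ins for Yew's types and DomNode is assumed to be cheap to clone. The traversal still uses raw pointers, but every RefCell borrow ends before f runs, so f may borrow or mutate any slot on the chain.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for a cheaply clonable DOM node handle.
#[derive(Clone, Debug, PartialEq)]
struct DomNode(String);

/// Hypothetical simplified slot: a resolved node or a link to another slot.
enum Variant {
    Node(DomNode),
    Chained(Rc<RefCell<Slot>>),
}

struct Slot {
    variant: Variant,
}

fn slot(variant: Variant) -> Rc<RefCell<Slot>> {
    Rc::new(RefCell::new(Slot { variant }))
}

/// Resolves the chain, clones the node, and only then calls `f` by value.
fn with_resolved<R>(start: &Rc<RefCell<Slot>>, f: impl FnOnce(DomNode) -> R) -> R {
    let mut ptr: *const RefCell<Slot> = Rc::as_ptr(start);
    let node = loop {
        // SAFETY (assumed): `start` keeps the whole chain alive and no code other
        // than this loop runs while the raw pointers are in use.
        let cell = unsafe { &*ptr };
        let slot_ref = cell.borrow();
        let next = match &slot_ref.variant {
            Variant::Node(n) => break n.clone(),
            Variant::Chained(next) => Rc::as_ptr(next),
        };
        ptr = next;
    };
    // No RefCell borrow and no raw pointer is live here, so `f` may touch any slot.
    f(node)
}

fn main() {
    let tail = slot(Variant::Node(DomNode("a".into())));
    let head = slot(Variant::Chained(Rc::clone(&tail)));
    // `f` overwrites the slot that was just resolved; if a `Ref` were still held
    // during the call, this `borrow_mut` would panic instead.
    let seen = with_resolved(&head, |n| {
        tail.borrow_mut().variant = Variant::Node(DomNode("b".into()));
        n
    });
    assert_eq!(seen, DomNode("a".into()));
    assert_eq!(with_resolved(&head, |n| n), DomNode("b".into()));
}
```

The trade-off is one clone per call instead of one per hop; the other option (keeping the last Rc alive across f) would avoid the clone but still has to end the RefCell borrow before f can safely mutate that slot.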

@WorldSEnder
Member

WorldSEnder commented Mar 30, 2026

Actually, the current structure might not form a simple chain at all, but a tree. You can tell multiple DynamicDomSlots to track the same DomSlot, which can itself be another DynamicDomSlot, and so on, forming a tree (although in code the links go from child to parent). I'm not sure whether this capability is used, or whether it's simply accidental because it's not easy to break the assignment of any other DynamicDomSlot that might be assigned to it. In this general form, one would need a link/cut tree, which I don't fully understand.
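A small model of the tree shape described above, using hypothetical Slot/Variant types rather than Yew's: two slots a and b both track the same shared slot, so the child-to-parent links form a tree, and retargeting shared changes what both resolve to. Any shortcut that cached the resolved node at a or b would therefore have to be invalidated whenever shared changes.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical model (not Yew's API): a slot holds a node id or tracks another slot.
enum Variant {
    Node(u32),
    Chained(Rc<RefCell<Slot>>),
}

struct Slot {
    variant: Variant,
}

fn slot(variant: Variant) -> Rc<RefCell<Slot>> {
    Rc::new(RefCell::new(Slot { variant }))
}

/// Follows the child-to-parent links until a node is found.
fn resolve(s: &Rc<RefCell<Slot>>) -> u32 {
    match &s.borrow().variant {
        Variant::Node(n) => *n,
        Variant::Chained(next) => resolve(next),
    }
}

fn main() {
    let anchor = slot(Variant::Node(1));
    let shared = slot(Variant::Chained(Rc::clone(&anchor)));
    // Two slots track the same `shared` slot, so the links form a tree.
    let a = slot(Variant::Chained(Rc::clone(&shared)));
    let b = slot(Variant::Chained(Rc::clone(&shared)));
    assert_eq!((resolve(&a), resolve(&b)), (1, 1));
    assert_eq!(Rc::strong_count(&shared), 3);
    // Retargeting `shared` changes what both `a` and `b` resolve to.
    shared.borrow_mut().variant = Variant::Node(2);
    assert_eq!((resolve(&a), resolve(&b)), (2, 2));
}
```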
