DAOS-0000 placement: Introduce O(1) fast path for massive GX objects #17667

Draft
wangshilong wants to merge 1 commit into master from shilongw/optimize_gx

@wangshilong
Contributor

Problem:
The jump hash placement algorithm degrades severely when generating layouts for object classes with a very large number of groups (e.g., the GX classes). For instance, in a 500-node cluster using EC16P3GX (~16K targets), the group count (grp_nr) exceeds 800. During layout generation, the first ~30 groups exhaust the domain usage bitsets (dom_used). The remaining ~770 groups then collide consistently in d_hash_jump(), forcing the algorithm into the slow O(D) fallback loops (dom_isset_2ranges). The result is tens of millions of inner-loop bitmap checks (O(G * S * D)), which keep the CPU busy and cause unacceptable latency spikes during object creation and rebuild layout mapping.
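
The group counts quoted above can be reproduced with a little arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and it assumes exactly 16000 targets, one fault domain per node, GX using as many groups as fit into the pool's targets, and one shard per domain within a group.

```c
#include <assert.h>
#include <stdint.h>

/* Shards per group for an EC class with k data and p parity cells. */
uint32_t
ec_grp_size(uint32_t k, uint32_t p)
{
	return k + p;
}

/* Assumption: a GX class uses as many whole groups as the targets allow. */
uint32_t
gx_grp_nr(uint32_t tgt_nr, uint32_t grp_size)
{
	assert(grp_size > 0);
	return tgt_nr / grp_size;
}

/* Groups that can be placed before every domain has been used once. */
uint32_t
groups_before_exhaustion(uint32_t dom_nr, uint32_t grp_size)
{
	assert(grp_size > 0);
	return dom_nr / grp_size;
}
```

With EC16P3 the group size is 19, so 16000 targets give 842 groups, and 500 domains are used up after 26 groups. Under these assumptions that is consistent with the "exceeds 800" and "first ~30 groups" figures above.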

Solution:
Introduce a fast path for objects with very large group counts, guarded by a new pool layout version to guarantee strict backward compatibility.

  1. Bump DAOS_POOL_OBJ_VERSION to 3 (DAOS_POOL_OBJ_VERSION_3).
  2. In get_object_layout(), detect if layout_ver >= 3 and group counts are sufficiently large (jmop_grp_nr >= jmop_dom_nr * 1.5).
  3. Under these conditions, bypass the standard non-leaf hash-collision routines and call get_object_layout_gx_fast() instead.
  4. The GX fast path uses an OID-seeded Fisher-Yates algorithm to pre-shuffle domains, dealing out targets to shards sequentially.
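
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `gx_fast_eligible()`, `gx_shuffle_domains()`, `gx_deal()` and the splitmix64-style generator `sketch_rand()` are hypothetical names, and the sketch only assigns a domain to each shard. Target selection inside a domain and handling of failed targets are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical PRNG step (splitmix64); the PR does not say which one is used. */
uint64_t
sketch_rand(uint64_t *state)
{
	uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);

	z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
	z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
	return z ^ (z >> 31);
}

/* Step 2: layout version >= 3 and grp_nr >= dom_nr * 1.5 (integer form). */
bool
gx_fast_eligible(uint32_t layout_ver, uint32_t grp_nr, uint32_t dom_nr)
{
	return layout_ver >= 3 && 2 * (uint64_t)grp_nr >= 3 * (uint64_t)dom_nr;
}

/* Step 4, part 1: OID-seeded Fisher-Yates shuffle of domain indices, O(D). */
void
gx_shuffle_domains(uint32_t *doms, uint32_t dom_nr, uint64_t oid_seed)
{
	uint64_t state = oid_seed;
	uint32_t i;

	for (i = 0; i < dom_nr; i++)
		doms[i] = i;
	for (i = dom_nr; i > 1; i--) {
		/* Modulo bias is ignored here for brevity. */
		uint32_t j   = (uint32_t)(sketch_rand(&state) % i);
		uint32_t tmp = doms[i - 1];

		doms[i - 1] = doms[j];
		doms[j]     = tmp;
	}
}

/* Step 4, part 2: deal shuffled domains to shards in order, O(G * S). */
void
gx_deal(const uint32_t *doms, uint32_t dom_nr, uint32_t grp_nr, uint32_t grp_size,
	uint32_t *shard_dom)
{
	uint64_t shard_nr = (uint64_t)grp_nr * grp_size;
	uint64_t s;

	assert(dom_nr > 0);
	for (s = 0; s < shard_nr; s++)
		shard_dom[s] = doms[s % dom_nr];
}
```

Because the shards walk the shuffled array cyclically, the shards of one group land in distinct domains whenever grp_size <= dom_nr, without consulting a usage bitmap. The same OID seed always yields the same permutation, so the layout stays deterministic.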

Because the fast path avoids the collision fallback, the time complexity drops from O(G * S * D) to roughly O(D + G * S), eliminating the CPU stalls and providing an estimated 500x-3000x speedup for GX placement on large-scale clusters. Legacy pools (layout version <= 2) never take the fast path, so their existing layouts do not change.
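
To get a feel for the two complexity terms, the back-of-envelope model below counts one unit of work per (group, shard, domain) triple on the slow path and one per domain plus one per shard on the fast path. The function names are hypothetical and real constant factors will differ, so this is an order-of-magnitude sketch rather than a measurement.

```c
#include <assert.h>
#include <stdint.h>

/* Slow-path model: O(G * S * D) inner-loop checks. */
uint64_t
slow_path_checks(uint64_t grp_nr, uint64_t grp_size, uint64_t dom_nr)
{
	return grp_nr * grp_size * dom_nr;
}

/* Fast-path model: one shuffle pass plus one step per shard, O(D + G * S). */
uint64_t
fast_path_steps(uint64_t grp_nr, uint64_t grp_size, uint64_t dom_nr)
{
	return dom_nr + grp_nr * grp_size;
}

/* Integer ratio of the two models. */
uint64_t
speedup_estimate(uint64_t grp_nr, uint64_t grp_size, uint64_t dom_nr)
{
	uint64_t fast = fast_path_steps(grp_nr, grp_size, dom_nr);

	assert(fast > 0);
	return slow_path_checks(grp_nr, grp_size, dom_nr) / fast;
}
```

For an assumed G = 842, S = 19 and D = 500, the model gives 7,999,000 checks against 16,498 steps, a ratio of about 484.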

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

Signed-off-by: Wang Shilong <shilong.wang@hpe.com>

github-actions bot commented Mar 9, 2026

Errors are Unable to load ticket data
https://daosio.atlassian.net/browse/DAOS-0000
