DAOS-18541 rebuild: accumulate more OIDs per migrate RPC to reduce RPC count #17704

Open
wangshilong wants to merge 1 commit into master from shilongw/DAOS-18541

@wangshilong wangshilong commented Mar 14, 2026

Fix yield-count accounting in the scanner: rebuild_object() is a pure in-memory btree insert and does not need to contribute yield pressure. A send-side batching policy is also introduced: the send ULT defers flushing until at least REBUILD_SEND_BATCH_MIN OIDs are queued or REBUILD_SEND_BATCH_TIMEOUT_SEC seconds have elapsed.

Without batching, a fast scanner floods the destination rank with many small RPCs, exhausting IB receive buffers and triggering timeouts. This is especially severe during reintegration, where all OIDs are concentrated on a single target rank.
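
The flush policy above can be sketched as a small predicate. This is a minimal illustration, not the code in this PR: the `sketch_` names, the `SKETCH_SEND_LIMIT` value of 4096, and the rule that a finished scan drains the remaining tail are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value for illustration only; the real REBUILD_SEND_LIMIT is not shown here. */
#define SKETCH_SEND_LIMIT             4096
/* Mirrors REBUILD_SEND_BATCH_MIN: 25% of the maximum batch. */
#define SKETCH_SEND_BATCH_MIN         (SKETCH_SEND_LIMIT / 4)
/* Mirrors REBUILD_SEND_BATCH_TIMEOUT_SEC. */
#define SKETCH_SEND_BATCH_TIMEOUT_SEC 2

/*
 * Hypothetical helper: decide whether the send ULT should flush now.
 * pending    - number of OIDs currently queued
 * wait_start - coarse timestamp (seconds) when the current wait began
 * now        - current coarse timestamp (seconds)
 * scan_done  - true once the scanner has finished (assumed drain rule)
 */
static bool
sketch_should_flush(uint64_t pending, uint64_t wait_start, uint64_t now, bool scan_done)
{
	assert(now >= wait_start);

	if (pending == 0)
		return false; /* nothing queued, nothing to send */
	if (scan_done)
		return true; /* no more OIDs will arrive, so drain the tail */
	if (pending >= SKETCH_SEND_BATCH_MIN)
		return true; /* batch is large enough */

	/* Otherwise flush only once the batch has waited long enough. */
	return now - wait_start >= SKETCH_SEND_BATCH_TIMEOUT_SEC;
}
```

With these assumed values, a single queued OID would be held for up to 2 seconds before being sent, while 1024 queued OIDs would be flushed immediately.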

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

Fix yield-count accounting in the scanner: rebuild_object() is a pure in-memory
btree insert and does not need to contribute yield pressure. A send-side batching
policy is also introduced: the send ULT defers flushing until at least REBUILD_SEND_BATCH_MIN
OIDs are queued or REBUILD_SEND_BATCH_TIMEOUT_SEC seconds have elapsed, preventing a flood
of small migrate RPCs when the scanner runs faster than the sender — particularly
under reintegration workloads.

Signed-off-by: Wang Shilong <shilong.wang@hpe.com>
@wangshilong wangshilong requested review from a team as code owners March 14, 2026 15:27
@github-actions

Ticket title is 'Rebuild stuck on Bear cluster'
Status is 'In Progress'
Labels: 'test_2.8'
https://daosio.atlassian.net/browse/DAOS-18541

@wangshilong wangshilong changed the title from "DAOS-18541 rebuild: reduce redundant migration OID RPCs" to "DAOS-18541 rebuild: batch migration OID send RPCs" on Mar 14, 2026
@wangshilong wangshilong changed the title from "DAOS-18541 rebuild: batch migration OID send RPCs" to "DAOS-18541 rebuild: increase migration OID batch size to reduce RPC flood" on Mar 15, 2026
@wangshilong wangshilong changed the title from "DAOS-18541 rebuild: increase migration OID batch size to reduce RPC flood" to "DAOS-18541 rebuild: accumulate more OIDs per migrate RPC to reduce RPC count" on Mar 15, 2026

if (rc)
	D_GOTO(out, rc);

arg->yield_cnt--;
Contributor

Why is this removed?

Contributor Author

I thought that since rebuild_object() is just a btree insert (a pure in-memory operation), it is probably OK not to account for it in the yield count. I could add it back.

if (dbtree_is_empty(tls->rebuild_tree_hdl)) {
tree_empty = dbtree_is_empty(tls->rebuild_tree_hdl);
scan_done = tls->rebuild_pool_scan_done;

Contributor

How about adding "now = daos_gettime_coarse()" here?

Contributor Author

Sure.

if (tree_empty) {
	/* Reset wait clock and yield to let scan make progress. */
	tls->rebuild_send_wait_start = daos_gettime_coarse();
	dss_sleep(0);
Contributor

I'd suggest changing this to dss_sleep(10) as well, to make it consistent with the code below.

Contributor Author

Sure

/* Minimum pending objects before the send ULT flushes a batch (25% of max). */
#define REBUILD_SEND_BATCH_MIN (REBUILD_SEND_LIMIT / 4)
/* Maximum seconds to wait for a batch to fill before flushing anyway. */
#define REBUILD_SEND_BATCH_TIMEOUT_SEC 2
Contributor

Changing this to 1 second is probably OK.
