
downstreamadapter: shard dispatcher orchestrator queue#5052

Draft
hongyunyan wants to merge 8 commits into
pingcap:master from
hongyunyan:codex/dispatcher-orchestrator-shards

Conversation

@hongyunyan
Collaborator

What problem does this PR solve?

During TiCDC bootstrap recovery, DispatcherOrchestrator currently processes all changefeed control messages through one global pending queue and one worker. When a bootstrap request blocks on sink initialization, unrelated changefeeds on the same node queue behind it and recovery lag grows quickly.

This change keeps the existing bootstrap and retry semantics intact, but removes the global head-of-line blocking at the orchestrator queue layer.

Issue Number: ref #0

What is changed and how it works?

  • Split the single global pending queue into 8 fixed shards keyed by changefeedID.Id.Hash(...).
  • Keep one worker and the existing FIFO plus (changefeedID, msgType) de-duplication semantics inside each shard.
  • Keep the existing handler implementations and shutdown ordering, so the behavioral change is limited to cross-shard concurrency.
  • Add unit tests for shard shutdown behavior and for routing different changefeeds to different shards concurrently.
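
The routing and per-shard semantics above can be sketched roughly as follows. This is a simplified, illustrative model rather than the actual PR code: the real orchestrator hashes with changefeedID.Id.Hash(...), while here a plain FNV-1a hash over a string ID stands in, and the shard is reduced to an in-memory FIFO with (changefeedID, msgType) de-duplication.

```go
package main

import "fmt"

// numShards mirrors the 8 fixed shards described in the PR.
const numShards = 8

// msgKey is the de-duplication key: at most one pending message
// per (changefeedID, msgType) pair inside a shard.
type msgKey struct {
	changefeedID string
	msgType      int
}

// shard keeps FIFO order plus per-key de-duplication, independently
// of the other shards.
type shard struct {
	fifo    []msgKey
	pending map[msgKey]bool
}

// fnv1a is a stand-in for changefeedID.Id.Hash(...).
func fnv1a(s string) uint64 {
	var h uint64 = 14695981039346656037
	for i := 0; i < len(s); i++ {
		h ^= uint64(s[i])
		h *= 1099511628211
	}
	return h
}

// shardFor routes a changefeed to a fixed shard, so all messages of
// one changefeed stay ordered while different changefeeds can proceed
// on different shards.
func shardFor(changefeedID string) int {
	return int(fnv1a(changefeedID) % numShards)
}

func (s *shard) enqueue(k msgKey) {
	if s.pending[k] {
		return // duplicate (changefeedID, msgType): drop, keep original position
	}
	s.pending[k] = true
	s.fifo = append(s.fifo, k)
}

func main() {
	shards := make([]*shard, numShards)
	for i := range shards {
		shards[i] = &shard{pending: map[msgKey]bool{}}
	}
	for _, k := range []msgKey{
		{"cf-a", 1}, {"cf-a", 1}, {"cf-b", 1}, // second cf-a/1 is de-duplicated
	} {
		shards[shardFor(k.changefeedID)].enqueue(k)
	}
	total := 0
	for _, s := range shards {
		total += len(s.fifo)
	}
	fmt.Println(total) // 2 distinct (changefeed, msgType) pairs remain queued
}
```

Because the shard index is a pure function of the changefeed ID, a changefeed's messages never migrate between shards, which is what preserves the existing per-changefeed FIFO guarantee while removing cross-changefeed head-of-line blocking.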

Check List

Tests

  • Unit test
    • SDKROOT=$(xcrun --show-sdk-path) CC=$(xcrun --find clang) CXX=$(xcrun --find clang++) CGO_CFLAGS="-isysroot $(xcrun --show-sdk-path)" CGO_LDFLAGS="-isysroot $(xcrun --show-sdk-path) -L$(xcrun --show-sdk-path)/usr/lib -F$(xcrun --show-sdk-path)/System/Library/Frameworks" go test ./downstreamadapter/dispatcherorchestrator

Questions

Will it cause performance regression or break compatibility?

No compatibility change is expected. The change only increases concurrency across shards; per-shard processing order and message semantics stay the same.

Do you need to update user documentation, design documentation or monitoring documentation?

No.

Release note

Reduce TiCDC dispatcher orchestrator bootstrap head-of-line blocking by sharding its control-message queue.

@ti-chi-bot

ti-chi-bot Bot commented May 15, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot Bot added do-not-merge/needs-linked-issue do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels May 15, 2026
@coderabbitai
Contributor

coderabbitai Bot commented May 15, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3f3adae9-2e58-4b55-8096-4e799c5eed64

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


@ti-chi-bot

ti-chi-bot Bot commented May 15, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign lidezhu for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label May 15, 2026

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request implements sharding within the DispatcherOrchestrator to allow concurrent processing of changefeed control messages, partitioned by changefeed ID. It also enhances the MySQL sink's metadata management by introducing robust error handling and recovery logic for the ddl_ts_v1 table. Feedback suggests optimizing the shutdown process by closing all shard queues before waiting for workers to terminate and ensuring the shard Run method is idempotent to avoid starting multiple worker goroutines.

Comment on lines +414 to +416
for _, shard := range m.shards {
shard.Close()
}


medium

Closing shards sequentially in a loop can lead to a cumulative shutdown delay if multiple shards are processing long-running bootstrap or sink initialization requests. Consider closing all shard queues first to signal termination, and then waiting for all shard workers to finish in parallel.

Suggested change

Original:
    for _, shard := range m.shards {
        shard.Close()
    }

Replacement:
    for _, shard := range m.shards {
        shard.queue.Close()
    }
    for _, shard := range m.shards {
        shard.wg.Wait()
    }

}

// Run starts the shard worker loop.
func (s *orchestratorShard) Run() {

medium

The Run method is not idempotent. If called multiple times, it will start multiple worker goroutines for the same shard and incorrectly increment the WaitGroup counter. Consider adding a guard (e.g., using sync.Once or an atomic.Bool) to ensure the worker loop starts only once.
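
The suggested guard could look like the sketch below, which uses sync.Once so that repeated Run calls start the worker goroutine at most once. Type and field names here are illustrative, not the PR's actual definitions, and the worker body is reduced to draining a channel.

```go
package main

import (
	"fmt"
	"sync"
)

type orchestratorShard struct {
	runOnce sync.Once
	wg      sync.WaitGroup
	queue   chan struct{}
}

// Run is idempotent: only the first call increments the WaitGroup
// and launches the worker goroutine; later calls are no-ops.
func (s *orchestratorShard) Run() {
	s.runOnce.Do(func() {
		s.wg.Add(1)
		go func() {
			defer s.wg.Done()
			for range s.queue { // drain until the queue is closed
			}
		}()
	})
}

func main() {
	s := &orchestratorShard{queue: make(chan struct{})}
	s.Run()
	s.Run() // second call does not start another worker
	close(s.queue)
	s.wg.Wait() // returns: exactly one Done matches the one Add
	fmt.Println("worker exited once")
}
```

An atomic.Bool with CompareAndSwap would work equally well; sync.Once is simply the most idiomatic way to express "run this initialization exactly once" in Go.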

@ti-chi-bot ti-chi-bot Bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels May 15, 2026
@hongyunyan hongyunyan changed the title [codex] downstreamadapter: shard dispatcher orchestrator queue downstreamadapter: shard dispatcher orchestrator queue May 15, 2026
@ti-chi-bot

ti-chi-bot Bot commented May 15, 2026

[FORMAT CHECKER NOTIFICATION]

Notice: To remove the do-not-merge/needs-linked-issue label, please provide the linked issue number on one line in the PR body, for example: Issue Number: close #123 or Issue Number: ref #456.

📖 For more info, you can check the "Contribute Code" section in the development guide.

