eventservice,schemastore: avoid stalls after dispatcher reset #5554
asddongmen wants to merge 1 commit into
Skipping CI for Draft Pull Request.
[APPROVALNOTIFIER] This PR is NOT APPROVED.
Code Review
This pull request cleans up stale schema-store GC services during initialization to prevent start-ts safety check failures, and triggers a scan task upon resetting a dispatcher in the event broker. Feedback on the event broker changes highlights a potential stall risk where calling c.pushTask(newStat, false) synchronously could drop the task if the channel is full; running this in a goroutine with force = true is recommended to guarantee execution without blocking.
```go
if c.scanReady(newStat) {
	c.pushTask(newStat, false)
}
```
Potential Stall / Deadlock Risk: Calling c.pushTask(newStat, false) synchronously on the message-processing goroutine uses force = false to avoid blocking. However, if taskChan is full, the task is dropped after a 10ms timeout.

If the task is dropped and the event store sends no subsequent notifications (e.g., because the upstream is quiet), the dispatcher stalls permanently because the initial scan is never triggered.

Solution: Run c.pushTask in a lightweight goroutine with force = true (i.e., go c.pushTask(newStat, true)). This avoids blocking the main message-processing loop (preventing deadlocks) while guaranteeing that the scan task is eventually queued and executed, eliminating the stall risk.
Suggested change:

```diff
 if c.scanReady(newStat) {
-	c.pushTask(newStat, false)
+	go c.pushTask(newStat, true)
 }
```
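To make the trade-off in this review thread concrete, here is a minimal Go model of the two push modes. It assumes, as the reviewer describes, that pushTask with force = false waits at most 10ms on a full taskChan and then drops the task, and that force = true blocks until the send succeeds. The broker and scanTask types and the demo function are hypothetical simplifications, not the event broker's actual code.

```go
package main

import (
	"fmt"
	"time"
)

// scanTask is a hypothetical stand-in for the dispatcher stat that the real
// event broker pushes to its scan workers.
type scanTask struct{ dispatcherID int }

// broker is a hypothetical, heavily simplified model of the broker's queue.
type broker struct {
	taskChan chan scanTask
}

// pushTask models the semantics described in the review (assumed, not the
// real implementation): with force == false the send gives up after 10ms if
// taskChan is full; with force == true it blocks until there is room.
func (b *broker) pushTask(t scanTask, force bool) bool {
	if force {
		b.taskChan <- t
		return true
	}
	timer := time.NewTimer(10 * time.Millisecond)
	defer timer.Stop()
	select {
	case b.taskChan <- t:
		return true
	case <-timer.C:
		return false // the task is silently dropped
	}
}

// demo fills the queue, then compares a synchronous non-forced push with a
// forced push running in its own goroutine.
func demo() (syncQueued bool, first, second int) {
	b := &broker{taskChan: make(chan scanTask, 1)}
	b.taskChan <- scanTask{dispatcherID: 0} // the queue is already full

	// Synchronous, non-forced push: dropped after the 10ms timeout.
	syncQueued = b.pushTask(scanTask{dispatcherID: 1}, false)

	// Forced push in a goroutine: the caller is not blocked, and the task is
	// queued as soon as a worker drains the channel.
	done := make(chan struct{})
	go func() {
		b.pushTask(scanTask{dispatcherID: 2}, true)
		close(done)
	}()

	first = (<-b.taskChan).dispatcherID // a worker drains the old task
	<-done                              // the forced push has now landed
	second = (<-b.taskChan).dispatcherID
	return syncQueued, first, second
}

func main() {
	syncQueued, first, second := demo()
	fmt.Println("sync push queued:", syncQueued)
	fmt.Println("drained:", first, second)
}
```

With a full queue, the synchronous non-forced push returns false and its task is lost, while the forced goroutine delivers its task once a worker drains the channel. The cost of the suggested fix is that each such goroutine stays blocked for as long as the queue remains full, and tasks pushed this way are not ordered relative to other pushes.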
What problem does this PR solve?
Issue Number: close #5553
This PR addresses two stalls observed in the field. First, schema store bootstrap can be blocked by a stale schema-store GC keeper service left behind by a failed CDC process that used the same advertise address. Second, a reset dispatcher can wait for the next eventstore notification before sending its handshake.
What is changed and how it works?
The schema store now closes any stale keeper service before reading the initial GC safe point and installing its fresh barrier.
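The ordering described above can be sketched in Go as follows. This is a minimal illustration under stated assumptions: the gcKeeper interface, its method names, the in-memory fakeKeeper, and the "schema-store-" + address service-ID scheme are all hypothetical and are not TiCDC's or PD's actual API. The sketch only shows the sequence: remove the stale keeper for the current advertise address, then read the initial GC safe point, then install the fresh barrier.

```go
package main

import "fmt"

// gcKeeper is a hypothetical abstraction over a GC service-safe-point API;
// it is not TiCDC's real interface.
type gcKeeper interface {
	RemoveService(serviceID string) error
	GetGCSafePoint() (uint64, error)
	SetServiceBarrier(serviceID string, ts uint64) error
}

// schemaStoreServiceID derives a per-node service ID from the advertise
// address (hypothetical naming scheme).
func schemaStoreServiceID(advertiseAddr string) string {
	return "schema-store-" + advertiseAddr
}

// bootstrapSchemaStore follows the order described in the PR: drop any stale
// keeper left by a previous process with the same address, read the initial
// GC safe point, then install a fresh barrier at that point.
func bootstrapSchemaStore(k gcKeeper, advertiseAddr string) (uint64, error) {
	id := schemaStoreServiceID(advertiseAddr)
	if err := k.RemoveService(id); err != nil {
		return 0, fmt.Errorf("remove stale keeper %s: %w", id, err)
	}
	safePoint, err := k.GetGCSafePoint()
	if err != nil {
		return 0, fmt.Errorf("read GC safe point: %w", err)
	}
	if err := k.SetServiceBarrier(id, safePoint); err != nil {
		return 0, fmt.Errorf("install barrier %s: %w", id, err)
	}
	return safePoint, nil
}

// fakeKeeper is an in-memory test double that records the call order.
type fakeKeeper struct {
	calls     []string
	safePoint uint64
	barriers  map[string]uint64
}

func (f *fakeKeeper) RemoveService(id string) error {
	f.calls = append(f.calls, "remove")
	delete(f.barriers, id) // deleting a missing key is a no-op
	return nil
}

func (f *fakeKeeper) GetGCSafePoint() (uint64, error) {
	f.calls = append(f.calls, "get")
	return f.safePoint, nil
}

func (f *fakeKeeper) SetServiceBarrier(id string, ts uint64) error {
	f.calls = append(f.calls, "set")
	f.barriers[id] = ts
	return nil
}

func main() {
	k := &fakeKeeper{
		safePoint: 100,
		// A stale barrier left by a failed process on the same address.
		barriers: map[string]uint64{"schema-store-10.0.0.1:8300": 42},
	}
	ts, err := bootstrapSchemaStore(k, "10.0.0.1:8300")
	fmt.Println(ts, err, k.calls, k.barriers)
}
```

Because removal happens first and is a no-op when nothing is stale, the same bootstrap path works for both a clean start and a restart after a crashed process.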
After eventBroker.resetDispatcher replaces the dispatcher state, it immediately checks scan readiness and pushes one scan task, so ready, handshake, and resolved messages no longer depend on a later eventstore notification.

Check List
Tests
Questions
Will it cause performance regression or break compatibility?
No. The cleanup only affects the schema-store keeper service for the current advertise address, and the reset path only schedules the existing scan earlier.
Do you need to update user documentation, design documentation or monitoring documentation?
No.
Release note