logservice: add baseline scan task logs #5512
Signed-off-by: hongyunyan <649330952@qq.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED. It needs approval from an approver in each of the changed files.
📝 Walkthrough
Subscription registration now passes changefeed ID strings into the log puller subscription client, which stores the identifier on subscribed spans and includes it in subscription and region-scan logs. Event-store and schema-store callers, plus related tests and mocks, were updated for the new Subscribe signature.
Changes: Changefeed ID propagation
Sequence Diagram(s)
sequenceDiagram
participant eventStore
participant subClient
participant subscriptionClient
participant newSubscribedSpan
eventStore->>subClient: Subscribe(changefeedID.String(), subID, ...)
subClient->>subscriptionClient: Subscribe(changefeedID, subID, span, ...)
subscriptionClient->>newSubscribedSpan: newSubscribedSpan(changefeedID, subID, span, ...)
newSubscribedSpan-->>subscriptionClient: subscribedSpan.changefeedID
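A minimal Go sketch of the propagation shown in the sequence diagram, assuming heavily simplified stand-in types: `subscriptionID`, `tableSpan`, and the reduced `subscriptionClient` below are hypothetical, and only the `changefeedID` field and the argument order of `Subscribe` and `newSubscribedSpan` follow the walkthrough. It illustrates that the ID travels as a plain string and is stored on the span for later logs, without touching scheduling.

```go
package main

import "fmt"

// subscriptionID is a hypothetical stand-in for the logpuller subscription ID type.
type subscriptionID uint64

// tableSpan is a hypothetical, simplified stand-in for the real table span type.
type tableSpan struct {
	TableID  int64
	StartKey []byte
	EndKey   []byte
}

// subscribedSpan keeps the changefeed ID next to the span purely so that
// later logs (for example, the region scan enqueue log) can include it.
type subscribedSpan struct {
	changefeedID string
	subID        subscriptionID
	span         tableSpan
	startTs      uint64
}

func newSubscribedSpan(changefeedID string, subID subscriptionID, span tableSpan, startTs uint64) *subscribedSpan {
	return &subscribedSpan{changefeedID: changefeedID, subID: subID, span: span, startTs: startTs}
}

// subscriptionClient is a hypothetical, heavily reduced client.
type subscriptionClient struct {
	spans map[subscriptionID]*subscribedSpan
}

// Subscribe takes the changefeed ID as its first argument and records it on
// the subscribed span; nothing else about the subscription changes.
func (c *subscriptionClient) Subscribe(changefeedID string, subID subscriptionID, span tableSpan, startTs uint64) {
	c.spans[subID] = newSubscribedSpan(changefeedID, subID, span, startTs)
}

func main() {
	c := &subscriptionClient{spans: make(map[subscriptionID]*subscribedSpan)}
	// A caller such as the event store would pass changefeedID.String() here.
	c.Subscribe("example-changefeed", 7, tableSpan{TableID: 42}, 100)
	s := c.spans[7]
	fmt.Println(s.changefeedID, s.subID, s.span.TableID) // prints: example-changefeed 7 42
}
```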
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 3 | ❌ 2
❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
Code Review
This pull request introduces changefeedID tracking across the event store and subscription client to improve traceability. It also adds logging for enqueued region scan tasks. The reviewer pointed out that logging these tasks at the Info level could cause severe log flooding and disk I/O bottlenecks in large-scale environments, and suggested changing the log level to Debug.
log.Info("cdc region scan task enqueued",
	zap.String("changefeedID", region.subscribedSpan.changefeedID),
	zap.Uint64("subscriptionID", uint64(region.subscribedSpan.subID)),
	zap.Int64("tableID", region.subscribedSpan.span.TableID),
	zap.Uint64("startTs", region.subscribedSpan.startTs),
	zap.Uint64("regionID", region.verID.GetID()),
	zap.Uint64("regionEpochVersion", region.verID.GetVer()),
	zap.Uint64("regionEpochConfVer", region.verID.GetConfVer()),
	zap.String("priority", taskTypeLogName(priority)),
	zap.String("span", common.FormatTableSpan(&region.span)))
Logging at Info level for every region scan task enqueued can lead to severe log flooding in large-scale production environments with hundreds of thousands of regions. This can cause significant disk I/O bottlenecks, high CPU overhead, and rapid disk space consumption.
Since this log is intended for baseline experiments and debugging, it should be logged at Debug level instead of Info level.
Suggested change:
- log.Info("cdc region scan task enqueued",
-     zap.String("changefeedID", region.subscribedSpan.changefeedID),
-     zap.Uint64("subscriptionID", uint64(region.subscribedSpan.subID)),
-     zap.Int64("tableID", region.subscribedSpan.span.TableID),
-     zap.Uint64("startTs", region.subscribedSpan.startTs),
-     zap.Uint64("regionID", region.verID.GetID()),
-     zap.Uint64("regionEpochVersion", region.verID.GetVer()),
-     zap.Uint64("regionEpochConfVer", region.verID.GetConfVer()),
-     zap.String("priority", taskTypeLogName(priority)),
-     zap.String("span", common.FormatTableSpan(&region.span)))
+ log.Debug("cdc region scan task enqueued",
+     zap.String("changefeedID", region.subscribedSpan.changefeedID),
+     zap.Uint64("subscriptionID", uint64(region.subscribedSpan.subID)),
+     zap.Int64("tableID", region.subscribedSpan.span.TableID),
+     zap.Uint64("startTs", region.subscribedSpan.startTs),
+     zap.Uint64("regionID", region.verID.GetID()),
+     zap.Uint64("regionEpochVersion", region.verID.GetVer()),
+     zap.Uint64("regionEpochConfVer", region.verID.GetConfVer()),
+     zap.String("priority", taskTypeLogName(priority)),
+     zap.String("span", common.FormatTableSpan(&region.span)))
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@logservice/logpuller/subscription_client.go`:
- Around lines 832-841: The hot-path enqueue log in subscription_client.go should
not use Info for every region scan task because it creates high-volume,
high-cardinality noise on the success path. Update the logging around the region
enqueue code near the cdc region scan task message to use debug-level logging,
sampling, or an aggregated counter instead of unconditional Info, and keep the
existing context fields only if they are still needed at that lower verbosity.
Use the region enqueue logic and the log.Info call in the subscription client as
the place to make the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c4f7ae10-a91c-46cc-8e09-acb21fc7000b
📒 Files selected for processing (5)
logservice/eventstore/event_store.go
logservice/eventstore/event_store_test.go
logservice/logpuller/subscription_client.go
logservice/logpuller/subscription_client_test.go
logservice/schemastore/ddl_job_fetcher.go
🚀 Performance & Scalability | 🟠 Major | ⚡ Quick win
Avoid Info logging on every region scan enqueue.
Line 832 is on the success path for every region task enqueue. A baseline scan can enqueue one task per region, and retries/splits can enqueue more, so this adds very high-volume Info logs with high-cardinality fields (changefeedID, regionID, span). That turns an observability-only change into a hot-path CPU/disk cost and can drown out more useful signals. Please gate this behind debug/sampling or aggregate it instead. As per coding guidelines, "Logs are operational signals; see docs/agents/logging.md before adding, removing, or rewriting logs."
Source: Coding guidelines
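CodeRabbit also lists an aggregated counter as an alternative to per-task logs. A minimal sketch, assuming a hypothetical `enqueueStats` type: the hot path does a single atomic add per task, and a periodic reporter would log one summary line per interval instead of one line per region. In the actual codebase a Prometheus counter labeled by priority would likely be the more natural home; standard-library atomics are used here only to keep the example self-contained.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// enqueueStats is a hypothetical aggregate replacing one log line per
// region scan task with one counter per priority.
type enqueueStats struct {
	high atomic.Uint64
	low  atomic.Uint64
}

// record is cheap enough for the hot path: one atomic add, no allocation.
func (s *enqueueStats) record(highPriority bool) {
	if highPriority {
		s.high.Add(1)
	} else {
		s.low.Add(1)
	}
}

// snapshotAndReset returns the counts since the previous call and zeroes them,
// so a reporter (for example, a ticker loop) can log one summary per interval.
// The two swaps are not atomic as a pair; a task recorded between them is
// counted in the next interval, which is fine for an approximate rate.
func (s *enqueueStats) snapshotAndReset() (high, low uint64) {
	return s.high.Swap(0), s.low.Swap(0)
}

func main() {
	var stats enqueueStats
	for i := 0; i < 1000; i++ {
		stats.record(i%10 == 0) // pretend every tenth task is high priority
	}
	high, low := stats.snapshotAndReset()
	fmt.Printf("region scan tasks enqueued: high=%d low=%d\n", high, low) // prints high=100 low=900
}
```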
What problem does this PR solve?
Issue Number: None
This PR provides a baseline-only build to verify whether an existing changefeed creates new incremental scan tasks while another old-start-ts changefeed is catching up. It is intentionally separate from the scan-priority implementation PR so the original behavior can be compared against the experiment.
What is changed and how it works?
- Pass `changefeedID` into `SubscriptionClient.Subscribe` and store it in `subscribedSpan` for logging only.
- Add `changefeedID` to the existing `subscribes span done` log.
- Add a `cdc region scan task enqueued` log when a region scan task is pushed into the logpuller region task queue. The log includes `changefeedID`, `subscriptionID`, `tableID`, `startTs`, `regionID`, the region epoch, the task `priority` (`high` or `low`), and the span.

This PR does not change task priority calculation, retry behavior, kvproto fields, or TiKV/CSE request behavior.
Check List
Tests
- make fmt
- go test ./logservice/logpuller ./logservice/eventstore ./logservice/schemastore -run '^$'
Questions
Will it cause performance regression or break compatibility?
No compatibility impact. The change is observability-only. The added log is intentionally used for baseline experiments and should not be enabled for high-volume production diagnosis without considering log volume.
Do you need to update user documentation, design documentation or monitoring documentation?
No.
Release note