

@tenfyzhong (Collaborator) commented Dec 24, 2025

What problem does this PR solve?

This PR addresses a timing issue in the Kafka integration test where the changefeed is paused too quickly after creation, potentially before it has fully initialized and started processing data. This could lead to flaky test behavior where the changefeed might not capture all expected events before being paused.

Issue Number: close #3781

What is changed and how it works?

The change increases the sleep duration from 5 seconds to 20 seconds between creating the changefeed and pausing it in the Kafka integration test.

The longer wait time helps prevent race conditions where the test might pause the changefeed before it has properly started, which could cause inconsistent test results.
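The race can be reproduced in miniature. The sketch below is a toy model under stated assumptions, not the actual Kafka test. A background job stands in for changefeed initialization by creating a marker file after a delay; the marker file and the 2-second delay are arbitrary stand-ins. The script checks readiness once after a fixed sleep that is too short, then again with a polling loop.

```shell
#!/usr/bin/env bash
# Toy model of the pause-before-ready race (hypothetical; not the real Kafka test).
# A background job plays the role of changefeed initialization.

marker=$(mktemp -u)            # a temp path that does not exist yet
( sleep 2; touch "$marker" ) & # "initialization" finishes after ~2 s

# Fixed wait that is shorter than the initialization time.
sleep 1
if [ -e "$marker" ]; then fixed_ready=1; else fixed_ready=0; fi
echo "fixed sleep: ready=$fixed_ready"

# Polling wait: check every 0.2 s and give up after 50 attempts (~10 s).
ready=0
for _ in $(seq 1 50); do
    if [ -e "$marker" ]; then
        ready=1
        break
    fi
    sleep 0.2
done
echo "polling: ready=$ready"

wait              # reap the background job
rm -f "$marker"   # clean up the marker file
```

On a normally loaded machine, the fixed sleep reports `ready=0` and the polling loop reports `ready=1`. A longer fixed sleep only moves the threshold. The polling loop adapts to however long initialization actually takes, up to its own limit.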

Check List

Tests

  • Integration test (modified existing test script)

Questions

  • Will it cause performance regression or break compatibility?
    No, this change only affects test timing and does not impact production code or performance.

  • Do you need to update user documentation, design documentation or monitoring documentation?
    No, this is an internal test modification only.

Release note

None

- Changed sleep from 5 to 20 seconds before pausing changefeed
- Provides more time for data synchronization in integration test

Signed-off-by: tenfyzhong <tenfy@tenfy.cn>
@ti-chi-bot (bot) added the do-not-merge/needs-linked-issue and release-note-none labels Dec 24, 2025
@gemini-code-assist

Summary of Changes

Hello @tenfyzhong, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses persistent flakiness in the Kafka integration test by adjusting the timing of operations. Specifically, it extends the waiting period between creating a changefeed and pausing it. The longer wait gives the changefeed more time to initialize and begin processing data, which reduces the chance that it is paused prematurely and that the test produces inconsistent or failed results. The change is isolated to the test environment and has no impact on the production system.

Highlights

  • Test Stability Improvement: Increased the sleep duration in the Kafka integration test to prevent a timing issue where the changefeed was paused too quickly after creation.
  • Race Condition Mitigation: The extended wait makes the race less likely by giving the changefeed more time to fully initialize and process data before it is paused, which reduces test flakiness.
  • No Production Impact: This change is confined to test timing and does not affect production code, performance, or compatibility.


@ti-chi-bot (bot) added the size/XS label (the PR changes 0-9 lines, ignoring generated files) Dec 24, 2025

@gemini-code-assist (bot) left a comment


Code Review

This pull request addresses a flaky test by increasing a sleep duration. While this might solve the immediate problem, a more robust solution is to poll for the expected state instead of using a fixed wait time. My review includes a suggestion to implement such a polling mechanism, which would make the test more reliable and potentially faster.

```diff
 run_sql_file $CUR/data/ddl.sql ${UP_TIDB_HOST} ${UP_TIDB_PORT}
 
-sleep 5
+sleep 20
```


Priority: medium

Using a fixed sleep can make tests brittle and slow. A more robust approach is to poll for the changefeed to reach the desired state (normal) before proceeding. This ensures the test waits only as long as necessary and provides clearer feedback if the changefeed fails to start in a reasonable time.

Consider replacing the sleep with a loop that checks the changefeed's status.

Suggested change
```diff
-sleep 20
+# Wait for changefeed to be ready
+for i in $(seq 1 20); do
+    # The command may fail if the changefeed is not yet initialized in the owner.
+    # Grep for `"state": "normal"` to ensure the changefeed is running.
+    if cdc_cli_changefeed query -c "${changefeed_id}" 2>/dev/null | grep -q '"state": "normal"'; then
+        echo "Changefeed is normal."
+        break
+    fi
+    if [ "$i" -eq 20 ]; then
+        echo "Timed out waiting for changefeed to be normal."
+        cdc_cli_changefeed query -c "${changefeed_id}"
+        exit 1
+    fi
+    echo "Waiting for changefeed to be normal... ($i/20)"
+    sleep 1
+done
```
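The same poll-with-timeout pattern can be factored into a reusable function, so that any step in the script can wait on a condition instead of sleeping. This is a minimal sketch that assumes bash. `wait_until` and `check_ready` are hypothetical names, not helpers from the existing test framework. The deadline is enforced with bash's built-in `SECONDS` counter (whole-second resolution) rather than a fixed attempt count.

```shell
#!/usr/bin/env bash
# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS CMD [ARGS...]
# Hypothetical helper (not part of the existing test framework): runs CMD
# until it exits 0 or the wall-clock deadline passes.
# Returns 0 on success, 1 on timeout.
wait_until() {
    local timeout=$1 interval=$2
    shift 2
    local deadline=$((SECONDS + timeout))
    while true; do
        if "$@"; then
            return 0
        fi
        if ((SECONDS >= deadline)); then
            echo "wait_until: timed out after ${timeout}s: $*" >&2
            return 1
        fi
        sleep "$interval"
    done
}

# Stand-in readiness check (hypothetical): succeeds on the third call,
# the way a real check would succeed once the changefeed reports "normal".
attempts=0
check_ready() {
    attempts=$((attempts + 1))
    [ "$attempts" -ge 3 ]
}

if wait_until 10 0.1 check_ready; then result=ready; else result=timeout; fi
echo "result=$result attempts=$attempts"

# A check that never succeeds hits the deadline instead of looping forever.
if wait_until 1 0.2 false; then never=ready; else never=timeout; fi
echo "never=$never"
```

In the test script, the reviewer's pipeline (`cdc_cli_changefeed query -c "${changefeed_id}" 2>/dev/null | grep -q '"state": "normal"'`) could be wrapped in a small function and passed to `wait_until`. The timeout budget would then be a choice for the test author. A wall-clock deadline keeps the total wait bounded even when each individual query is slow, which an attempt count alone does not guarantee.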

@tenfyzhong
Collaborator Author

/test all

@tenfyzhong
Collaborator Author

/test next-gen

@tenfyzhong
Collaborator Author

/retest

@tenfyzhong
Collaborator Author

/hold

@ti-chi-bot (bot) added the do-not-merge/hold label Dec 24, 2025
@ti-chi-bot (bot) added the needs-1-more-lgtm and approved labels Dec 24, 2025
@ti-chi-bot (bot) added the lgtm label Dec 25, 2025
@ti-chi-bot

ti-chi-bot bot commented Dec 25, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hongyunyan, wk989898


Needs approval from an approver in each of these files:
  • OWNERS [hongyunyan,wk989898]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot (bot) removed the needs-1-more-lgtm label Dec 25, 2025
@ti-chi-bot

ti-chi-bot bot commented Dec 25, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-12-24 08:50:17.472493747 +0000 UTC: ☑️ agreed by wk989898.
  • 2025-12-25 03:31:44.197451019 +0000 UTC: ☑️ agreed by hongyunyan.

@wk989898 removed the do-not-merge/hold label Dec 25, 2025
@wk989898
Collaborator

/retest

@tenfyzhong
Collaborator Author

/hold

@ti-chi-bot (bot) added the do-not-merge/hold label Dec 25, 2025
@tenfyzhong
Collaborator Author

/retest

@tenfyzhong
Collaborator Author

/unhold

@ti-chi-bot (bot) removed the do-not-merge/hold label Dec 25, 2025
@tenfyzhong
Collaborator Author

/tide

@ti-chi-bot (bot) merged commit 2030532 into pingcap:master Dec 25, 2025
25 checks passed

Labels

approved, lgtm, release-note-none, size/XS


Development

Successfully merging this pull request may close these issues.

unstable integration test: kafka_simple_handle_key_only

3 participants