Skip to content

Conversation

@wk989898
Copy link
Collaborator

@wk989898 wk989898 commented Dec 24, 2025

What problem does this PR solve?

Issue Number: close #3784

What is changed and how it works?

Index File Write Order: The process of writing the index file has been reordered to occur after the corresponding data file is successfully written, ensuring better consistency in cloud storage.
writeDataFile Signature Update: The writeDataFile function now accepts both the data file path and the index file path as distinct arguments, improving clarity and enabling the reordered write logic.

After this PR, the data file will be written before the index file. If the CDC restarts:

  1. If the index file is written, the data may be duplicated to write a new data file.
  2. If the index file is not written, the data file will be overwritten.

Check List

Tests

  • Unit test
  • Integration test

Questions

Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?

Release note

 `None`.

@ti-chi-bot
Copy link

ti-chi-bot bot commented Dec 24, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-linked-issue do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Dec 24, 2025
@wk989898
Copy link
Collaborator Author

/test all

@ti-chi-bot ti-chi-bot bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Dec 24, 2025
@gemini-code-assist
Copy link

Summary of Changes

Hello @wk989898, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the cloud storage writing mechanism by changing the order in which data and index files are persisted. The primary goal is to enhance the consistency of data written to cloud storage by ensuring that the index file, which references a data file, is only created or updated after the data file itself has been successfully written. This prevents potential issues with orphaned or invalid index entries.

Highlights

  • Index File Write Order: The process of writing the index file has been reordered to occur after the corresponding data file is successfully written, ensuring better consistency in cloud storage.
  • writeDataFile Signature Update: The writeDataFile function now accepts both the data file path and the index file path as distinct arguments, improving clarity and enabling the reordered write logic.
  • Enhanced Write Consistency: By ensuring the index file is only written after the data file, the change prevents scenarios where an index might point to a non-existent or incomplete data file in cloud storage.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the file writing logic in the cloud storage sink by changing the order of operations to write the data file before the index file. This is a good improvement for atomicity and reliability, as it prevents a state where an index file points to a non-existent data file. The change also correctly handles errors from writing the index file, which was a bug in the previous implementation. I have one suggestion to further improve error handling by cleaning up orphaned data files on failure, making the process even more robust.

@wk989898
Copy link
Collaborator Author

/test pull-cdc-storage-integration-heavy

@wk989898
Copy link
Collaborator Author

pull-cdc-storage-integration-light

@wk989898 wk989898 changed the title [WIP] sink: modify the write order of the index file and data file in storage writer Dec 24, 2025
@wk989898 wk989898 marked this pull request as ready for review December 24, 2025 13:22
@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Dec 24, 2025
zap.Error(err))
}

// then write the data file to external storage.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also update the log

d.metricFileCount.Add(1)

// write the index file to external storage in the end.
// the file content is simply the last element of the data file path
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// the file content is simply the last element of the data file path
// the file content is simply the last data file path

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also add some comment to describe how the file name determined, and why do so.

@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Dec 25, 2025
@ti-chi-bot
Copy link

ti-chi-bot bot commented Dec 25, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-12-25 05:23:17.120086604 +0000 UTC m=+2314541.933864166: ☑️ agreed by hongyunyan.

@ti-chi-bot
Copy link

ti-chi-bot bot commented Dec 25, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hongyunyan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the approved label Dec 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved needs-1-more-lgtm Indicates a PR needs 1 more LGTM. release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

storage consumer lost some dml events

4 participants