-
Notifications
You must be signed in to change notification settings - Fork 31
sink: modify the write order of the index file and data file in storage writer #3785
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Skipping CI for Draft Pull Request. |
|
/test all |
Summary of ChangesHello @wk989898, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request refactors the cloud storage writing mechanism by changing the order in which data and index files are persisted. The primary goal is to enhance the consistency of data written to cloud storage by ensuring that the index file, which references a data file, is only created or updated after the data file itself has been successfully written. This prevents potential issues with orphaned or invalid index entries. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request refactors the file writing logic in the cloud storage sink by changing the order of operations to write the data file before the index file. This is a good improvement for atomicity and reliability, as it prevents a state where an index file points to a non-existent data file. The change also correctly handles errors from writing the index file, which was a bug in the previous implementation. I have one suggestion to further improve error handling by cleaning up orphaned data files on failure, making the process even more robust.
|
/test pull-cdc-storage-integration-heavy |
|
pull-cdc-storage-integration-light |
| zap.Error(err)) | ||
| } | ||
|
|
||
| // then write the data file to external storage. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also update the log
| d.metricFileCount.Add(1) | ||
|
|
||
| // write the index file to external storage in the end. | ||
| // the file content is simply the last element of the data file path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // the file content is simply the last element of the data file path | |
| // the file content is simply the last data file path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can also add some comment to describe how the file name determined, and why do so.
[LGTM Timeline notifier]Timeline:
|
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: hongyunyan The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What problem does this PR solve?
Issue Number: close #3784
What is changed and how it works?
Index File Write Order: The process of writing the index file has been reordered to occur after the corresponding data file is successfully written, ensuring better consistency in cloud storage.
writeDataFile Signature Update: The writeDataFile function now accepts both the data file path and the index file path as distinct arguments, improving clarity and enabling the reordered write logic.
After this PR, the data file will be written before the index file. If the CDC restarts:
Check List
Tests
Questions
Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?
Release note