-
Notifications
You must be signed in to change notification settings - Fork 31
[WIP]: support batch dml with virtual column #3787
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Skipping CI for Draft Pull Request. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Summary of ChangesHello @wk989898, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces crucial enhancements to support batch DML operations on tables containing virtual columns. The changes primarily involve adjusting the internal logic that determines when batch DML can be utilized, specifically by removing the restriction imposed by virtual columns. Furthermore, the handling of generated columns during the construction of SQL statements has been refined to correctly exclude them, ensuring compatibility and preventing errors, thereby enabling more efficient data synchronization. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
|
/test pull-cdc-mysql-integration-heavy |
|
/test pull-cdc-mysql-integration-light |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request aims to add support for batch DML with virtual columns. The main changes involve removing the restriction on batching for tables with virtual columns, and ensuring that generated columns are skipped when preparing DML arguments. My review has found a couple of critical issues in pkg/sink/sqlmodel/row_change.go that could lead to panics. There's an inconsistency in how generated columns are handled between ColumnCount and getArgsWithGeneratedColumn, and a bug in whereColumnsAndValues where column names are filtered but their corresponding values are not. I've also pointed out an outdated comment and a misleading test case name that should be addressed for clarity. Overall, the direction is good, but the implementation needs to be more careful about consistency and correctness when handling generated columns.
pkg/sink/sqlmodel/row_change.go
Outdated
| c := 0 | ||
| for _, col := range r.sourceTableInfo.GetColumns() { | ||
| if !col.Hidden { | ||
| if !col.Hidden && !col.IsVirtualGenerated() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method IsVirtualGenerated does not seem to exist on *model.ColumnInfo. This will likely cause a compilation error.
Also, there's a potential inconsistency. This function uses IsVirtualGenerated (assuming it checks for virtual generated columns), but getArgsWithGeneratedColumn in pkg/sink/mysql/sql_builder.go uses IsGenerated(), which checks for both virtual and stored generated columns. This can lead to a mismatch between the number of columns counted here and the number of arguments prepared, causing a panic in NewRowChange if the table has stored generated columns.
To handle all generated columns consistently, you should probably use IsGenerated() here.
| if !col.Hidden && !col.IsVirtualGenerated() { | |
| if !col.Hidden && !col.IsGenerated() { |
pkg/sink/sqlmodel/row_change.go
Outdated
| if !column.IsGenerated() { | ||
| columnNames = append(columnNames, column.Name.O) | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seems to be a bug here. You are filtering out generated columns when building columnNames, but the corresponding values in the values slice are not filtered. If columns contains any generated columns (for example, if a unique index used for the WHERE clause contains a generated column), len(columnNames) will be different from len(values). This will trigger the panic you've added on line 248. You should filter values in sync with columnNames.
| // 4. There's more than one row in the group | ||
| // 5. All events have the same safe mode status | ||
| func (w *Writer) shouldGenBatchSQL(hasPKOrNotNullUK bool, hasVirtualCols bool, events []*commonEvent.DMLEvent) bool { | ||
| func (w *Writer) shouldGenBatchSQL(hasPKOrNotNullUK bool, events []*commonEvent.DMLEvent) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| t.Run(tt.name, func(t *testing.T) { | ||
| t.Parallel() | ||
| got := writer.shouldGenBatchSQL(tt.hasPK, tt.hasVirtualCols, tt.events) | ||
| got := writer.shouldGenBatchSQL(tt.hasPK, tt.events) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test case "table with virtual columns should not use batch SQL" (lines 52-58) is now misleading. The shouldGenBatchSQL function no longer checks for virtual columns. This test case passes only because it uses a single-row event.
Please consider updating this test case to either:
- Rename it to reflect that it's testing single-row events.
- Create a new test case that specifically verifies that tables with virtual columns can be batched when they have multiple rows.
Also, the hasVirtualCols field in the test struct (line 37) is no longer used and can be removed.
|
[FORMAT CHECKER NOTIFICATION] Notice: To remove the 📖 For more info, you can check the "Contribute Code" section in the development guide. |
|
/test all |
|
/retest |
|
@wk989898: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
What problem does this PR solve?
Issue Number: close #xxx
What is changed and how it works?
Check List
Tests
Questions
Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?
Release note