sink: add debezium-avro protocol (#5475)#5551
Conversation
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
|
@wk989898 This PR has conflicts, I have hold it. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
@ti-chi-bot: ## If you want to know how to resolve it, please read the guide in TiDB Dev Guide. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository. |
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces the debezium-avro protocol to support Confluent Avro encoding for Debezium, adding the corresponding encoder, decoder, configuration options, and integration tests. However, the changes contain numerous unresolved merge conflict markers across several files, including api/v2/changefeed.go, cmd/kafka-consumer/writer.go, pkg/sink/codec/common/config.go, and pkg/sink/codec/debezium/codec.go. Additionally, the reviewer noted a logic bug in writer.go that could lead to dropped fallback rows and missing variable definitions in codec.go that will cause compilation failures.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| <<<<<<< HEAD | ||
| ======= | ||
| func verifyTable4MQ( | ||
| replicaConfig *config.ReplicaConfig, | ||
| scheme string, | ||
| topic string, | ||
| protocol config.Protocol, | ||
| tableInfos []*common.TableInfo, | ||
| ) error { | ||
| if !config.IsMQScheme(scheme) { | ||
| return nil | ||
| } | ||
|
|
||
| isAvroLike := protocol == config.ProtocolAvro || protocol == config.ProtocolDebeziumAvro | ||
| eventRouter, err := eventrouter.NewEventRouter(replicaConfig.Sink, topic, config.IsPulsarScheme(scheme), isAvroLike) | ||
| if err != nil { | ||
| return err | ||
| } | ||
| if err = eventRouter.VerifyTables(tableInfos); err != nil { | ||
| return err | ||
| } | ||
|
|
||
| selectors, err := columnselector.New(replicaConfig.Sink) | ||
| if err != nil { | ||
| return err | ||
| } | ||
| return selectors.VerifyTables(tableInfos, eventRouter) | ||
| } | ||
|
|
||
| func verifyRouteConflict( | ||
| changefeedID common.ChangeFeedID, | ||
| eligibleTables []common.TableName, | ||
| ineligibleTables []common.TableName, | ||
| replicaCfg *config.ReplicaConfig, | ||
| ) error { | ||
| if len(eligibleTables)+len(ineligibleTables) == 0 || replicaCfg == nil || | ||
| replicaCfg.Sink == nil || len(replicaCfg.Sink.DispatchRules) == 0 { | ||
| return nil | ||
| } | ||
| if util.GetOrZero(replicaCfg.ForceReplicate) { | ||
| return routing.ValidateNoStaticRouteConflict( | ||
| changefeedID, | ||
| util.GetOrZero(replicaCfg.CaseSensitive), | ||
| replicaCfg.Sink.DispatchRules, | ||
| eligibleTables, | ||
| ineligibleTables, | ||
| ) | ||
| } | ||
| return routing.ValidateNoStaticRouteConflict( | ||
| changefeedID, | ||
| util.GetOrZero(replicaCfg.CaseSensitive), | ||
| replicaCfg.Sink.DispatchRules, | ||
| eligibleTables, | ||
| ) | ||
| } | ||
|
|
||
| >>>>>>> d220ee9b8 (sink: add debezium-avro protocol (#5475)) |
There was a problem hiding this comment.
Unresolved merge conflict markers (<<<<<<< HEAD, =======, >>>>>>>) are present in this file. Please resolve the conflict by removing the markers and keeping the correct functions.
func verifyTable4MQ(
replicaConfig *config.ReplicaConfig,
scheme string,
topic string,
protocol config.Protocol,
tableInfos []*common.TableInfo,
) error {
if !config.IsMQScheme(scheme) {
return nil
}
isAvroLike := protocol == config.ProtocolAvro || protocol == config.ProtocolDebeziumAvro
eventRouter, err := eventrouter.NewEventRouter(replicaConfig.Sink, topic, config.IsPulsarScheme(scheme), isAvroLike)
if err != nil {
return err
}
if err = eventRouter.VerifyTables(tableInfos); err != nil {
return err
}
selectors, err := columnselector.New(replicaConfig.Sink)
if err != nil {
return err
}
return selectors.VerifyTables(tableInfos, eventRouter)
}
func verifyRouteConflict(
changefeedID common.ChangeFeedID,
eligibleTables []common.TableName,
ineligibleTables []common.TableName,
replicaCfg *config.ReplicaConfig,
) error {
if len(eligibleTables)+len(ineligibleTables) == 0 || replicaCfg == nil ||
replicaCfg.Sink == nil || len(replicaCfg.Sink.DispatchRules) == 0 {
return nil
}
if util.GetOrZero(replicaCfg.ForceReplicate) {
return routing.ValidateNoStaticRouteConflict(
changefeedID,
util.GetOrZero(replicaCfg.CaseSensitive),
replicaCfg.Sink.DispatchRules,
eligibleTables,
ineligibleTables,
)
}
return routing.ValidateNoStaticRouteConflict(
changefeedID,
util.GetOrZero(replicaCfg.CaseSensitive),
replicaCfg.Sink.DispatchRules,
eligibleTables,
)
}| <<<<<<< HEAD | ||
| case config.ProtocolCanalJSON, config.ProtocolOpen, config.ProtocolAvro: | ||
| ======= | ||
| case config.ProtocolCanalJSON, config.ProtocolOpen, config.ProtocolAvro, config.ProtocolSimple, | ||
| config.ProtocolDebezium, config.ProtocolDebeziumAvro: | ||
| >>>>>>> d220ee9b8 (sink: add debezium-avro protocol (#5475)) |
There was a problem hiding this comment.
| <<<<<<< HEAD | ||
| group.Append(dml, false) | ||
| log.Info("DML event append to the group", | ||
| zap.Int32("partition", group.Partition), zap.Any("offset", offset), | ||
| zap.Uint64("commitTs", commitTs), zap.Uint64("highWatermark", group.HighWatermark), | ||
| zap.Uint64("appliedWatermark", group.AppliedWatermark), | ||
| zap.String("schema", schema), zap.String("table", table), zap.Int64("tableID", tableID), | ||
| zap.Stringer("eventType", dml.RowTypes[0])) | ||
| ======= | ||
| switch w.protocol { | ||
| case config.ProtocolSimple: | ||
| // simple protocol set the table id for all row message, it can be known which table the row message belongs to, | ||
| // also consider the table partition. | ||
| // open protocol set the partition table id if the table is partitioned. | ||
| // for normal table, the table id is generated by the fake table id generator by using schema and table name. | ||
| // so one event group for one normal table or one table partition, replayed messages can be ignored. | ||
| log.Warn("DML event fallback row, since less than the group high watermark, ignore it", | ||
| zap.Int32("partition", progress.partition), zap.Any("offset", offset), | ||
| zap.Uint64("commitTs", commitTs), zap.Uint64("highWatermark", group.HighWatermark), | ||
| zap.Any("partitionWatermark", progress.watermark), zap.Any("watermarkOffset", progress.watermarkOffset), | ||
| zap.String("schema", schema), zap.String("table", table), zap.Int64("tableID", tableID), | ||
| zap.Stringer("eventType", dml.RowTypes[0]), | ||
| // zap.Any("columns", row.Columns), zap.Any("preColumns", row.PreColumns), | ||
| zap.Any("protocol", w.protocol), zap.Bool("IsPartition", dml.TableInfo.TableName.IsPartition)) | ||
| case config.ProtocolCanalJSON, config.ProtocolOpen, config.ProtocolAvro, | ||
| config.ProtocolDebezium, config.ProtocolDebeziumAvro: | ||
| // for partition table, these protocols cannot assign physical table id to each dml message, | ||
| // we cannot distinguish whether it's a real fallback event or not, still append it. | ||
| if w.partitionTableAccessor.IsPartitionTable(schema, table) { | ||
| log.Warn("DML events fallback, but the table is a partition table, still append it", | ||
| zap.Int32("partition", group.Partition), zap.Any("offset", offset), | ||
| zap.Uint64("commitTs", commitTs), zap.Uint64("highWatermark", group.HighWatermark), | ||
| zap.String("schema", schema), zap.String("table", table), zap.Int64("tableID", tableID), | ||
| zap.Stringer("eventType", dml.RowTypes[0]), zap.Any("protocol", w.protocol)) | ||
| group.Append(dml, true) | ||
| return | ||
| } | ||
| log.Warn("DML event fallback row, since less than the group high watermark, ignore it", | ||
| zap.Int32("partition", progress.partition), zap.Any("offset", offset), | ||
| zap.Uint64("commitTs", commitTs), zap.Uint64("HighWatermark", group.HighWatermark), | ||
| zap.Any("partitionWatermark", progress.watermark), zap.Any("watermarkOffset", progress.watermarkOffset), | ||
| zap.String("schema", schema), zap.String("table", table), zap.Int64("tableID", tableID), | ||
| zap.Stringer("eventType", dml.RowTypes[0]), | ||
| // zap.Any("columns", row.Columns), zap.Any("preColumns", row.PreColumns), | ||
| zap.Any("protocol", w.protocol), zap.Bool("IsPartition", dml.TableInfo.TableName.IsPartition)) | ||
| default: | ||
| log.Panic("unknown protocol", zap.Any("protocol", w.protocol)) | ||
| } | ||
| >>>>>>> d220ee9b8 (sink: add debezium-avro protocol (#5475)) |
There was a problem hiding this comment.
Unresolved merge conflict markers are present here. Additionally, the switch statement from the cherry-picked commit re-introduces a bug where fallback rows are dropped for non-partition tables. HEAD's logic (which always appends fallback rows) should be kept to prevent data loss.
group.Append(dml, false)
log.Info("DML event append to the group",
zap.Int32("partition", group.Partition), zap.Any("offset", offset),
zap.Uint64("commitTs", commitTs), zap.Uint64("highWatermark", group.HighWatermark),
zap.Uint64("appliedWatermark", group.AppliedWatermark),
zap.String("schema", schema), zap.String("table", table), zap.Int64("tableID", tableID),
zap.Stringer("eventType", dml.RowTypes[0]))| <<<<<<< HEAD | ||
| !(c.Protocol == config.ProtocolCanalJSON || c.Protocol == config.ProtocolAvro || c.Protocol == config.ProtocolDebezium) { | ||
| ======= | ||
| (c.Protocol != config.ProtocolCanalJSON && c.Protocol != config.ProtocolAvro && | ||
| c.Protocol != config.ProtocolDebezium && c.Protocol != config.ProtocolDebeziumAvro) { | ||
| >>>>>>> d220ee9b8 (sink: add debezium-avro protocol (#5475)) |
There was a problem hiding this comment.
Unresolved merge conflict markers are present here. Please resolve the conflict by keeping the incoming side which includes ProtocolDebeziumAvro.
(c.Protocol != config.ProtocolCanalJSON && c.Protocol != config.ProtocolAvro &&
c.Protocol != config.ProtocolDebezium && c.Protocol != config.ProtocolDebeziumAvro) {| <<<<<<< HEAD | ||
| jWriter.WriteStringField("name", | ||
| fmt.Sprintf("%s.Key", getSchemaTopicName(c.clusterID, e.TableInfo.GetSchemaName(), e.TableInfo.GetTableName()))) | ||
| ======= | ||
| jWriter.WriteStringField("name", c.keySchemaName(schemaName, tableName)) | ||
| >>>>>>> d220ee9b8 (sink: add debezium-avro protocol (#5475)) |
There was a problem hiding this comment.
Unresolved merge conflict markers are present here. Please resolve the conflict by keeping the incoming side. Note that schemaName and tableName must be defined in this scope to avoid compilation errors.
schemaName := e.TableInfo.GetTargetSchemaName()
tableName := e.TableInfo.GetTargetTableName()
jWriter.WriteStringField("name", c.keySchemaName(schemaName, tableName))| <<<<<<< HEAD | ||
| jWriter.WriteStringField("snapshot", "false") | ||
| jWriter.WriteStringField("db", e.TableInfo.GetSchemaName()) | ||
| jWriter.WriteStringField("table", e.TableInfo.GetTableName()) | ||
| ======= | ||
| if c.isDebeziumAvro() { | ||
| jWriter.WriteNullField("snapshot") | ||
| } else { | ||
| jWriter.WriteStringField("snapshot", "false") | ||
| } | ||
| jWriter.WriteStringField("db", schemaName) | ||
| jWriter.WriteStringField("table", tableName) | ||
| >>>>>>> d220ee9b8 (sink: add debezium-avro protocol (#5475)) |
There was a problem hiding this comment.
Unresolved merge conflict markers are present here. Please resolve the conflict by keeping the incoming side.
if c.isDebeziumAvro() {
jWriter.WriteNullField("snapshot")
} else {
jWriter.WriteStringField("snapshot", "false")
}
jWriter.WriteStringField("db", schemaName)
jWriter.WriteStringField("table", tableName)| <<<<<<< HEAD | ||
| jWriter.WriteStringField("name", | ||
| fmt.Sprintf("%s.Envelope", getSchemaTopicName(c.clusterID, e.TableInfo.GetSchemaName(), e.TableInfo.GetTableName()))) | ||
| ======= | ||
| jWriter.WriteStringField("name", c.envelopeSchemaName(schemaName, tableName)) | ||
| >>>>>>> d220ee9b8 (sink: add debezium-avro protocol (#5475)) |
| <<<<<<< HEAD | ||
| jWriter.WriteStringField("name", | ||
| fmt.Sprintf("%s.Value", getSchemaTopicName(c.clusterID, e.TableInfo.GetSchemaName(), e.TableInfo.GetTableName()))) | ||
| ======= | ||
| jWriter.WriteStringField("name", c.valueSchemaName(schemaName, tableName)) | ||
| >>>>>>> d220ee9b8 (sink: add debezium-avro protocol (#5475)) |
| <<<<<<< HEAD | ||
| jWriter.WriteStringField("name", | ||
| fmt.Sprintf("%s.Value", getSchemaTopicName(c.clusterID, e.TableInfo.GetSchemaName(), e.TableInfo.GetTableName()))) | ||
| ======= | ||
| jWriter.WriteStringField("name", c.valueSchemaName(schemaName, tableName)) | ||
| >>>>>>> d220ee9b8 (sink: add debezium-avro protocol (#5475)) |
This is an automated cherry-pick of #5475
What problem does this PR solve?
Issue Number: close #5476
What is changed and how it works?
This PR introduces a new Kafka sink protocol:
protocol=debezium-avro.The new protocol is intended for users who need both:
Before this change, TiCDC had two separate protocols that each solved only
one side of the requirement:
protocol=debeziumemits Debezium-style key/value messages, but themessages are JSON and the schema is embedded in each message.
protocol=avroemits Confluent Avro messages and registers schemas inSchema Registry, but the payload is TiCDC's flat row format rather than a
Debezium Envelope.
protocol=debezium-avrocombines these two capabilities. It reuses theDebezium codec's event model and field extraction logic, and reuses the
existing Avro Schema Registry registration and Confluent wire-format flow.
Users can enable it with a sink URI like:
Protocol and configuration
This PR adds ProtocolDebeziumAvro to the sink protocol enum and protocol
parser, so protocol=debezium-avro can be used in Kafka sink URIs and
changefeed configs.
The protocol is treated as an Avro-like protocol where needed, including:
For the MVP, Debezium Avro supports Confluent Schema Registry through
schema-registry and AWS Glue Schema Registry.
The protocol also keeps the existing Avro type-mode options where
applicable:
Check List
Tests
Questions
Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?
Release note
Summary by CodeRabbit
Summary
debezium-avrosink protocol support with Confluent Schema Registry–based encoding/decoding.debezium-avro.debezium-avro.debezium-avro, including protocol parsing and schema/encoding behaviors.