Kafka Connect: Enable Parquet variant shredding for generic Record writes by soumilshah1995 · Pull Request #16370 · apache/iceberg

soumilshah1995 · 2026-05-16T21:58:32Z

Summary

Kafka Connect uses Iceberg’s generic Record model with a Void engine schema. Parquet variant shredding had no effect on that path for two reasons: the generic ParquetFormatModel was not registered with a variant shredding analyzer or row copier, and RecordVariantShreddingAnalyzer could not resolve VARIANT columns, because resolving a column index through the engine schema (resolveColumnIndex) cannot succeed when the engine schema is Void.

This PR wires RecordVariantShreddingAnalyzer and Record::copy into GenericFormatModels and locates VARIANT columns by their position in Iceberg Schema#columns(), so buffered inference and shredded Parquet columns work for Connect.
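To illustrate the positional approach, here is a minimal, self-contained Java sketch. It deliberately avoids Iceberg's classes: Field is a hypothetical stand-in for an Iceberg top-level column, and the Object[] row stands in for a generic Record. The one assumption it encodes is the one the PR relies on, namely that a column's index in Schema#columns() is the same index Record#get(int) expects, so no engine-schema lookup is needed. It is not the PR's RecordVariantShreddingAnalyzer implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class VariantColumnPositions {
  // Hypothetical stand-in for an Iceberg top-level field: a name plus a type label.
  record Field(String name, String typeName) {}

  // Returns the positions of VARIANT columns in schema order. Because the row
  // (here an Object[], in Iceberg a generic Record) is indexed in the same field
  // order, these positions can be used directly to read each variant value.
  static List<Integer> variantPositions(List<Field> columns) {
    List<Integer> positions = new ArrayList<>();
    for (int pos = 0; pos < columns.size(); pos++) {
      if ("variant".equals(columns.get(pos).typeName())) {
        positions.add(pos);
      }
    }
    return positions;
  }

  public static void main(String[] args) {
    List<Field> columns = List.of(
        new Field("id", "long"),
        new Field("payload", "variant"),
        new Field("ts", "timestamptz"),
        new Field("attrs", "variant"));
    // One row whose values are laid out in the same order as the columns above.
    Object[] row = {42L, "{\"a\":1}", "2026-05-16T21:58:32Z", "{\"b\":true}"};
    for (int pos : variantPositions(columns)) {
      System.out.println(columns.get(pos).name() + " -> " + row[pos]);
    }
  }
}
```

The design point is that the lookup depends only on the Iceberg schema, so it behaves the same whether the engine schema is Void or not.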

Changes

GenericFormatModels: register ParquetFormatModel with RecordVariantShreddingAnalyzer + Record::copy.
RecordVariantShreddingAnalyzer: implement analyzeVariantColumns using positional indices aligned with Record#get.
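The row copier matters because the writer buffers rows for inference before it knows the shredded schema. The sketch below assumes, without confirmation from the PR text, that the copier exists so that buffered rows stay stable if the caller reuses or mutates a record after handing it to the writer. MutableRow and InferenceBuffer are hypothetical names: in the PR, Record::copy plays the role that MutableRow::copy plays here, and the actual inference step is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

public class BufferedInferenceSketch {
  // Hypothetical mutable row, standing in for a record object a caller may reuse.
  static final class MutableRow {
    Object value;

    MutableRow(Object value) {
      this.value = value;
    }

    MutableRow copy() {
      return new MutableRow(value);
    }
  }

  // Buffers up to bufferSize rows ahead of a (not shown) shredding-schema
  // inference step. Every row goes through the copier, so later changes to the
  // caller's object cannot alter what was buffered.
  static final class InferenceBuffer<T> {
    private final int bufferSize;
    private final UnaryOperator<T> copier;
    private final List<T> buffered = new ArrayList<>();

    InferenceBuffer(int bufferSize, UnaryOperator<T> copier) {
      this.bufferSize = bufferSize;
      this.copier = copier;
    }

    // Returns true once the buffer is full, i.e. when inference could run.
    boolean add(T row) {
      buffered.add(copier.apply(row));
      return buffered.size() >= bufferSize;
    }

    List<T> rows() {
      return List.copyOf(buffered);
    }
  }

  public static void main(String[] args) {
    InferenceBuffer<MutableRow> buffer = new InferenceBuffer<>(2, MutableRow::copy);
    MutableRow reused = new MutableRow("{\"a\":1}");
    buffer.add(reused);
    reused.value = "{\"a\":2}"; // simulate the caller reusing the same object
    boolean full = buffer.add(reused);
    System.out.println(full + " " + buffer.rows().get(0).value + " " + buffer.rows().get(1).value);
  }
}
```

Without the copier, both buffered entries would be the same object and the first row's value would be lost before inference ran.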

Config (Connect)

Table write properties (e.g. via iceberg.tables.write-props):

write.parquet.shred-variants=true
write.parquet.variant-inference-buffer-size=<rows>
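For Connect, the two table properties are typically supplied through the sink configuration. This sketch assumes the Iceberg Kafka Connect sink's iceberg.tables.write-props.* prefix, whose keys are passed through to the Iceberg writer. The class and method names are hypothetical, and the buffer size is left to the caller because the PR does not suggest a value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ShreddingConnectorConfig {
  // Assumed prefix: keys under it are forwarded by the sink as writer properties.
  static final String WRITE_PROPS_PREFIX = "iceberg.tables.write-props.";

  // Builds the connector config entries that enable (or disable) variant shredding.
  static Map<String, String> shreddingProps(boolean shred, int inferenceBufferRows) {
    Map<String, String> props = new LinkedHashMap<>();
    props.put(WRITE_PROPS_PREFIX + "write.parquet.shred-variants", Boolean.toString(shred));
    props.put(
        WRITE_PROPS_PREFIX + "write.parquet.variant-inference-buffer-size",
        Integer.toString(inferenceBufferRows));
    return props;
  }

  public static void main(String[] args) {
    // 1000 is an illustrative value only, not a recommended setting.
    shreddingProps(true, 1000).forEach((k, v) -> System.out.println(k + "=" + v));
  }
}
```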

Test plan

./gradlew :iceberg-data:check :iceberg-kafka-connect:iceberg-kafka-connect:check (or a green CI run).
Run a Connect sink that writes VARIANT data with write.parquet.shred-variants=true, then inspect the Parquet files for typed_value paths and a higher physical column count than with the property set to false.
Regression: Connect append with shredding disabled still succeeds.
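The Parquet inspection step can be reduced to counting leaf columns. This sketch assumes the leaf column paths have already been read from a file footer, for example from parquet-java's MessageType#getColumns() by joining each ColumnDescriptor path with dots. The example paths follow the Parquet variant shredding layout (metadata, value, typed_value), but the column names id, payload, and a are made up.

```java
import java.util.Arrays;
import java.util.List;

public class ShreddedColumnCheck {
  // Counts leaf column paths that pass through a "typed_value" group, i.e. the
  // physical columns that exist only because a variant was shredded.
  static long typedValueColumns(List<String> leafPaths) {
    return leafPaths.stream()
        .filter(path -> Arrays.asList(path.split("\\.")).contains("typed_value"))
        .count();
  }

  public static void main(String[] args) {
    // Illustrative footers: the same table written with shredding off and on.
    List<String> unshredded = List.of("id", "payload.metadata", "payload.value");
    List<String> shredded = List.of(
        "id",
        "payload.metadata",
        "payload.value",
        "payload.typed_value.a.value",
        "payload.typed_value.a.typed_value");
    System.out.println("leaf columns: " + unshredded.size() + " vs " + shredded.size());
    System.out.println("typed_value columns: "
        + typedValueColumns(unshredded) + " vs " + typedValueColumns(shredded));
  }
}
```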

Register ParquetFormatModel with RecordVariantShreddingAnalyzer and Record::copy, and analyze VARIANT columns using Iceberg schema column order so shredding works with Void engine schemas (Kafka Connect).

github-actions Bot added the data label May 16, 2026

soumilshah1995 force-pushed the kafka-connect-shredding branch 2 times, most recently from 8ce1a9d to ccb48f1, May 16, 2026 22:00

Commit 67bfc1a: Kafka Connect: Enable Parquet variant shredding for generic Record

soumilshah1995 force-pushed the kafka-connect-shredding branch from ccb48f1 to 67bfc1a, May 16, 2026 22:05

This was referenced May 17, 2026:

Kafka Connect: Enable Parquet variant shredding for generic Record writes#16370 #16387 (Open)
Kafka Connect: Support VARIANT when record convert #15283 (Merged)

soumilshah1995 wants to merge 1 commit into apache:main from soumilshah1995:kafka-connect-shredding
