Background
Apache Doris already supports reading Apache Paimon tables through External Catalog. To complete the read/write experience for lakehouse workloads, we plan to support writing data from Doris into Paimon tables, so users can write query results, load results, and computed data into Paimon tables using Doris SQL.
This issue tracks the overall plan and follow-up PRs for Paimon write support in Doris.
Goals
- Support INSERT INTO for Paimon tables.
- Support INSERT OVERWRITE for Paimon tables.
- Support writing append-only tables and primary-key tables.
- Support writing partitioned tables, fixed bucket tables, and dynamic bucket tables.
- Support writing both primitive types and complex types.
- Support concurrent writes from multiple Doris backends and fragments without degrading the write path into a single-writer bottleneck.
- Align Doris transaction semantics with Paimon commit and abort semantics.
- Preserve correctness for failure, retry, and rollback scenarios.
- Continuously improve write throughput, file size control, and conflict reduction.
Scope
Basic Write Path
- Support Paimon tables as Doris write targets.
- Support INSERT INTO <Paimon table> SELECT ... statements.
- Commit Paimon write results only after the Doris transaction succeeds.
- Clean up uncommitted write results when the Doris transaction fails.
- Preserve consistency for commit retry and fragment retry scenarios.
- Support primitive data type writes.
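The commit, cleanup, and retry requirements above can be modelled in a few lines. The Python sketch below is not Doris or Paimon code: `FragmentResult` and `WriteTransaction` are hypothetical names. It assumes each fragment reports the files it wrote, tagged with a stable fragment id and an attempt number, to a coordinator that publishes them only when the Doris transaction commits.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FragmentResult:
    """Files one Doris fragment wrote but has not published (hypothetical)."""
    fragment_id: str        # stable across retries of the same fragment
    attempt: int            # increases each time the fragment is retried
    files: tuple[str, ...]  # data files written under the Paimon table location


@dataclass
class WriteTransaction:
    """Publish fragment output only when the Doris transaction commits."""
    results: dict[str, FragmentResult] = field(default_factory=dict)
    orphans: list[str] = field(default_factory=list)  # files to delete later
    committed: list[str] = field(default_factory=list)
    state: str = "open"

    def report(self, result: FragmentResult) -> None:
        prev = self.results.get(result.fragment_id)
        if prev is not None and prev.attempt >= result.attempt:
            # Late report from a superseded attempt: never publish it.
            self.orphans.extend(result.files)
            return
        if prev is not None:
            # A retry replaces the earlier attempt, whose files become garbage.
            self.orphans.extend(prev.files)
        self.results[result.fragment_id] = result

    def commit(self) -> list[str]:
        if self.state == "committed":
            # Commit retry after success: publish nothing new.
            return self.committed
        if self.state != "open":
            raise RuntimeError(f"cannot commit a {self.state} transaction")
        self.committed = [f for r in self.results.values() for f in r.files]
        self.state = "committed"
        return self.committed

    def abort(self) -> list[str]:
        if self.state != "open":
            raise RuntimeError(f"cannot abort a {self.state} transaction")
        # Nothing was published, so every reported file must be cleaned up.
        self.orphans.extend(f for r in self.results.values() for f in r.files)
        self.results.clear()
        self.state = "aborted"
        return self.orphans


txn = WriteTransaction()
txn.report(FragmentResult("f1", 0, ("file-a",)))
txn.report(FragmentResult("f1", 1, ("file-b",)))  # f1 was retried
txn.report(FragmentResult("f2", 0, ("file-c",)))
print(txn.commit())  # ['file-b', 'file-c']
print(txn.orphans)   # ['file-a']
```

Keying results by fragment id is what makes fragment retries safe in this model. Only the newest attempt is published, and files from superseded attempts land in the cleanup list rather than in a Paimon snapshot. Making `commit` idempotent covers the commit-retry case.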
Concurrent Writes and Table Layout
- Support concurrent writes to non-partitioned tables.
- Support writes to partitioned tables.
- Support writes to fixed bucket tables.
- Support writes to dynamic bucket tables.
- Organize Doris write distribution according to Paimon table layout where possible, reducing small files and write conflicts.
- Continue improving control over how writers are assigned to the same partition and bucket.
Note: strictly guaranteeing that the same (partition, bucket) is written by only one Doris writer requires Doris distribution to be aware of Paimon bucket semantics. This can be tracked as a dedicated enhancement.
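In a fixed bucket table, each row's bucket is derived from a hash of its bucket key modulo the bucket count. Single-writer ownership of a (partition, bucket) pair can therefore be obtained by routing on that pair. The Python sketch below assumes a simple hash-based routing step. `route`, `bucket_of`, and the CRC32 hash are illustrative stand-ins. In particular, they do not reproduce Paimon's own bucket function, which a real implementation would need to match exactly.

```python
import zlib
from collections import defaultdict


def stable_hash(text: str) -> int:
    # Python's built-in hash() of str is salted per process, so separate
    # backends would disagree; CRC32 gives the same value everywhere.
    return zlib.crc32(text.encode("utf-8"))


def bucket_of(bucket_key: str, num_buckets: int) -> int:
    # Stand-in only: Paimon hashes the binary-encoded bucket key with its own
    # function, so these ids will not match the buckets Paimon would pick.
    return stable_hash(bucket_key) % num_buckets


def route(rows, num_buckets: int, num_writers: int):
    """Group (partition, bucket_key, payload) rows by writer.

    Every (partition, bucket) pair maps to exactly one writer, so no two
    writers produce files for the same bucket within one write.
    """
    plan = defaultdict(lambda: defaultdict(list))
    for partition, bucket_key, payload in rows:
        bucket = bucket_of(bucket_key, num_buckets)
        writer = stable_hash(f"{partition}/{bucket}") % num_writers
        plan[writer][(partition, bucket)].append(payload)
    return plan


rows = [(f"dt={d}", f"user{i}", i) for d in (1, 2) for i in range(6)]
for writer, groups in sorted(route(rows, num_buckets=4, num_writers=3).items()):
    print(writer, sorted(groups))
```

Hashing the (partition, bucket) pair rather than the bucket alone spreads the buckets of different partitions across writers. The trade-off is that a write touching only a few buckets cannot use more writers than there are (partition, bucket) pairs.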
Table Types and Write Semantics
- Support append-only table writes.
- Support primary-key table writes.
- Support full-row writes for primary-key tables.
- Plan follow-up support for partial update, delete, update, merge, and Paimon merge-engine related semantics.
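Paimon's default merge engine for primary-key tables is deduplicate, so a full-row write behaves like an upsert. The in-memory Python model below contrasts that with append-only tables and with partial-update, which the plan lists as follow-up work. It is a sketch under stated assumptions: rows are tuples, `key` extracts the primary key, and list order stands in for the sequence ordering that real Paimon uses when it merges LSM files. The function names are hypothetical.

```python
def append_only(existing, batch):
    # Append-only table: every row is kept, duplicates included.
    return existing + batch


def deduplicate(existing, batch, key):
    # Primary-key table, full-row write: the latest row per key replaces the
    # old row entirely, including columns the new row leaves as None.
    merged = {key(r): r for r in existing}
    for row in batch:  # list order stands in for Paimon's sequence order
        merged[key(row)] = row
    return list(merged.values())


def partial_update(existing, batch, key):
    # Follow-up semantics, shown for contrast: None means "keep old value".
    merged = {key(r): r for r in existing}
    for row in batch:
        old = merged.get(key(row))
        merged[key(row)] = row if old is None else tuple(
            new if new is not None else prev for new, prev in zip(row, old))
    return list(merged.values())


def pk(row):
    return row[0]


table = [(1, "alice", 10), (2, "bob", 20)]
batch = [(1, "alice2", None), (3, "carol", 30)]
print(len(append_only(table, batch)))    # 4
print(deduplicate(table, batch, pk))     # [(1, 'alice2', None), (2, 'bob', 20), (3, 'carol', 30)]
print(partial_update(table, batch, pk))  # [(1, 'alice2', 10), (2, 'bob', 20), (3, 'carol', 30)]
```

The difference in the first row (`None` versus `10`) is the semantic boundary the plan asks to make explicit: a full-row write replaces every column, while partial update keeps columns the input does not provide.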
INSERT OVERWRITE
- Support INSERT OVERWRITE for Paimon tables.
- Support overwrite for non-partitioned tables.
- Support static partition overwrite.
- Support dynamic partition overwrite.
- Preserve overwrite correctness for failure, retry, rollback, and empty-input scenarios.
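The overwrite modes differ mainly in which existing partitions they discard, and the empty-input case is where they diverge most. The Python model below shows one plausible set of semantics, assumed here to follow the common Hive-style distinction between static and dynamic partition overwrite. Which exact behavior Doris adopts for Paimon, especially for empty input, is for the implementation PRs to settle. A table is modelled as a dict from partition value to rows, with empty partitions omitted. All function names are hypothetical.

```python
def group_by_partition(rows, part_of):
    out = {}
    for row in rows:
        out.setdefault(part_of(row), []).append(row)
    return out


def overwrite_all(rows, part_of):
    # Non-partitioned table (or whole-table overwrite): every existing row is
    # replaced, so empty input leaves the table empty.
    return group_by_partition(rows, part_of)


def overwrite_static(table, partition, rows):
    # INSERT OVERWRITE ... PARTITION (...): exactly one named partition is
    # replaced. Empty input clears that partition and nothing else.
    result = {p: rs for p, rs in table.items() if p != partition}
    if rows:
        result[partition] = list(rows)
    return result


def overwrite_dynamic(table, rows, part_of):
    # Only partitions present in the input are replaced; empty input is a no-op.
    result = dict(table)
    result.update(group_by_partition(rows, part_of))
    return result


def dt(row):
    return row[0]


table = {"d1": [("d1", "a")], "d2": [("d2", "b")]}
print(overwrite_dynamic(table, [("d2", "c")], dt))  # d1 kept, d2 replaced
print(overwrite_dynamic(table, [], dt) == table)    # True
print(overwrite_static(table, "d1", []))            # {'d2': [('d2', 'b')]}
```

None of the functions mutate the input table, which mirrors the requirement that a failed or rolled-back overwrite must leave the previous data visible.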
Types and Storage Environments
- Support primitive Doris types mapped to Paimon types.
- Support complex types such as array, map, and struct.
- Validate decimal, timestamp, timezone, and binary semantics.
- Support writes on HDFS and object storage environments such as S3 and OSS.
- Keep writes compatible with tables whose schema has evolved.
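Timezone handling is a likely source of silent errors. A Doris DATETIME is a zone-less wall-clock value, while Paimon distinguishes TIMESTAMP (no zone) from TIMESTAMP_LTZ (TIMESTAMP WITH LOCAL TIME ZONE, an instant). The Python sketch below assumes that the writer interprets DATETIME values in the Doris session time zone when targeting a TIMESTAMP_LTZ column. That rule, the function names, and the microsecond encoding are assumptions for illustration, not confirmed Doris behavior.

```python
from datetime import datetime, timedelta, timezone, tzinfo

EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)
EPOCH_NAIVE = datetime(1970, 1, 1)
MICROS = timedelta(microseconds=1)


def to_timestamp_micros(wall_clock: datetime) -> int:
    # TIMESTAMP (no zone): store the wall-clock value unchanged.
    if wall_clock.tzinfo is not None:
        raise ValueError("expected a zone-less DATETIME value")
    return (wall_clock - EPOCH_NAIVE) // MICROS


def to_timestamp_ltz_micros(wall_clock: datetime, session_tz: tzinfo) -> int:
    # TIMESTAMP_LTZ: interpret the wall clock in the session time zone and
    # store the corresponding UTC instant.
    if wall_clock.tzinfo is not None:
        raise ValueError("expected a zone-less DATETIME value")
    return (wall_clock.replace(tzinfo=session_tz) - EPOCH_UTC) // MICROS


utc8 = timezone(timedelta(hours=8))  # a fixed offset; zoneinfo.ZoneInfo also works
value = datetime(2024, 1, 1, 8, 0)
print(to_timestamp_micros(value))            # 1704096000000000
print(to_timestamp_ltz_micros(value, utc8))  # 1704067200000000
```

The same input yields values eight hours apart depending on the target type, which is why the plan calls for dedicated validation. With a named zone that observes daylight saving time, wall-clock times that fall in a gap or overlap also need an explicit rule (Python resolves them with the `fold` attribute).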
PR Tracking
Risks and Notes
- Bucketed table writes need careful handling of concurrent writers, small files, and commit conflicts.
- Dynamic bucket support requires additional planning around data distribution and bucket management.
- Primary-key table writes need clear semantic boundaries for full-row writes, partial updates, delete, and update.
- INSERT OVERWRITE needs strict correctness for failure, retry, rollback, and empty-input cases.
- Doris transaction semantics and Paimon commit/abort semantics must remain consistent under failures and retries.
- Complex types, decimal, timestamp, timezone, and binary values need dedicated validation.
- Writes, commits, and cleanup on object storage need dedicated validation.
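Dynamic bucket tables are harder to distribute because a key's bucket is not a pure function of the key: it depends on which keys were seen before. The Python sketch below models this with a key index and a per-bucket size limit. It is loosely based on Paimon's dynamic bucket mode and its `dynamic-bucket.target-row-num` option, and the class and its fields are hypothetical. The model shows why concurrent Doris writers cannot each keep a private index: two writers could place the same primary key in different buckets.

```python
class DynamicBucketAssigner:
    """Assign primary keys to buckets that grow on demand (hypothetical)."""

    def __init__(self, target_keys_per_bucket: int):
        self.target = target_keys_per_bucket
        self.index = {}         # (partition, key) -> bucket; must be shared
        self.bucket_sizes = {}  # partition -> number of keys in each bucket

    def assign(self, partition: str, key) -> int:
        if (partition, key) in self.index:
            # A key seen before must return to the same bucket; otherwise two
            # buckets would hold different versions of the same row.
            return self.index[(partition, key)]
        sizes = self.bucket_sizes.setdefault(partition, [])
        if not sizes or sizes[-1] >= self.target:
            sizes.append(0)  # current bucket is full: open a new one
        bucket = len(sizes) - 1
        sizes[bucket] += 1
        self.index[(partition, key)] = bucket
        return bucket


assigner = DynamicBucketAssigner(target_keys_per_bucket=2)
print([assigner.assign("dt=1", k) for k in ["a", "b", "c", "a"]])  # [0, 0, 1, 0]
```

One possible direction, stated here only as an assumption, is to route each key to a single assigner by key hash before bucket assignment, so that the index is never split across writers.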
Expected Benefits
With Paimon write support, Doris users will be able to write query results, load results, and computed data directly into Paimon tables using Doris SQL. This completes the read/write loop for lakehouse workloads and lays the foundation for more complete Paimon data write, update, and table management capabilities in Doris.