Support writing to Apache Paimon tables #65086

Description

@suxiaogang223

Background

Apache Doris already supports reading Apache Paimon tables through an External Catalog. To complete the read/write experience for lakehouse workloads, we plan to support writing data from Doris into Paimon tables, so that users can write query results, load results, and computed data into Paimon tables using Doris SQL.

This issue tracks the overall plan and follow-up PRs for Paimon write support in Doris.

Goals

  • Support INSERT INTO for Paimon tables.
  • Support INSERT OVERWRITE for Paimon tables.
  • Support writing append-only tables and primary-key tables.
  • Support writing partitioned tables, fixed bucket tables, and dynamic bucket tables.
  • Support writing both primitive types and complex types.
  • Support concurrent writes from multiple Doris backends and fragments without degrading the write path into a single-writer bottleneck.
  • Align Doris transaction semantics with Paimon commit and abort semantics.
  • Preserve correctness for failure, retry, and rollback scenarios.
  • Continuously improve write throughput, file size control, and conflict reduction.

Scope

Basic Write Path

  • Support Paimon tables as Doris write targets.
  • Support INSERT INTO ... SELECT ... statements that target a Paimon table.
  • Commit Paimon write results only after the Doris transaction succeeds.
  • Clean up uncommitted write results when the Doris transaction fails.
  • Preserve consistency for commit retry and fragment retry scenarios.
  • Support primitive data type writes.
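
The commit and cleanup rules above can be sketched as a small in-memory model. This is only an illustration under stated assumptions, not the Doris or Paimon implementation: `ToyTable`, `CommitMessage`, and `run_transaction` are hypothetical names, and "files" are plain strings. Writers stage files, the coordinator publishes them only when every writer succeeded, an abort discards staged files, and a repeated commit for the same transaction id is a no-op.

```python
from dataclasses import dataclass, field


@dataclass
class CommitMessage:
    """Hypothetical record of the files staged by one writer."""
    writer_id: str
    files: list


@dataclass
class ToyTable:
    """In-memory stand-in for a table: visible files plus staged files."""
    visible: set = field(default_factory=set)
    staged: set = field(default_factory=set)
    committed_txns: set = field(default_factory=set)

    def stage(self, writer_id, files):
        # Writers only stage files; nothing is visible to readers yet.
        self.staged.update(files)
        return CommitMessage(writer_id, list(files))

    def commit(self, txn_id, messages):
        # Idempotent: retrying a commit for the same txn_id changes nothing.
        if txn_id in self.committed_txns:
            return False
        for msg in messages:
            self.visible.update(msg.files)
            self.staged.difference_update(msg.files)
        self.committed_txns.add(txn_id)
        return True

    def abort(self, messages):
        # Drop staged files that were never committed.
        for msg in messages:
            self.staged.difference_update(msg.files)


def run_transaction(table, txn_id, writer_outputs):
    """writer_outputs maps writer_id -> list of files, or None if that writer failed."""
    messages = []
    failed = False
    for writer_id, files in writer_outputs.items():
        if files is None:
            failed = True
            continue
        messages.append(table.stage(writer_id, files))
    if failed:
        table.abort(messages)
        return "aborted"
    table.commit(txn_id, messages)
    return "committed"
```

The sketch does not model fragment retries that re-stage files for the same writer; a real implementation would also need to decide which attempt's output is kept.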

Concurrent Writes and Table Layout

  • Support concurrent writes to non-partitioned tables.
  • Support writes to partitioned tables.
  • Support writes to fixed bucket tables.
  • Support writes to dynamic bucket tables.
  • Organize Doris write distribution according to Paimon table layout where possible, reducing small files and write conflicts.
  • Continue improving control over which writers handle the same (partition, bucket) pair.

Note: strictly guaranteeing that the same (partition, bucket) is written by only one Doris writer requires Doris distribution to be aware of Paimon bucket semantics. This can be tracked as a dedicated enhancement.
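
To make the note concrete, here is a minimal sketch of bucket-aware routing. It assumes a fixed bucket count and uses CRC32 as a stand-in hash; Paimon's actual bucket function is different, and `bucket_of`, `writer_of`, and `route` are hypothetical helpers rather than Doris APIs. The point is only that the writer is derived from the (partition, bucket) pair, so every row of a pair reaches the same writer.

```python
import zlib
from collections import defaultdict


def bucket_of(key: str, num_buckets: int) -> int:
    # Stand-in hash for illustration; not Paimon's real bucket function.
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def writer_of(partition: str, bucket: int, num_writers: int) -> int:
    # The writer depends only on the (partition, bucket) pair, so all rows
    # of one pair are sent to the same writer.
    return zlib.crc32(f"{partition}/{bucket}".encode("utf-8")) % num_writers


def route(rows, num_buckets, num_writers):
    """rows: iterable of (partition, bucket_key, payload). Returns writer -> rows."""
    assignment = defaultdict(list)
    for partition, key, payload in rows:
        bucket = bucket_of(key, num_buckets)
        writer = writer_of(partition, bucket, num_writers)
        assignment[writer].append((partition, bucket, payload))
    return assignment
```

With this kind of routing, one writer may own several (partition, bucket) pairs, but no pair is split across writers, which is the property that limits small files and commit conflicts.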

Table Types and Write Semantics

  • Support append-only table writes.
  • Support primary-key table writes.
  • Support full-row writes for primary-key tables.
  • Plan follow-up support for partial update, delete, update, merge, and semantics tied to Paimon merge engines.

INSERT OVERWRITE

  • Support INSERT OVERWRITE for Paimon tables.
  • Support overwrite for non-partitioned tables.
  • Support static partition overwrite.
  • Support dynamic partition overwrite.
  • Preserve overwrite correctness for failure, retry, rollback, and empty-input scenarios.
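
The difference between static and dynamic partition overwrite, including the empty-input case, can be illustrated with a toy model in which a table is a dict from partition name to rows. The semantics shown are an assumption for illustration (static overwrite replaces the named partition even with no rows; dynamic overwrite replaces only partitions present in the input); the behavior Doris finally adopts for empty input is not specified in this issue. `overwrite_static` and `overwrite_dynamic` are hypothetical names.

```python
def overwrite_static(table, partition, values):
    """Replace exactly one named partition, even if values is empty."""
    # Work on a copy so a failed statement can discard the result.
    result = {p: list(v) for p, v in table.items()}
    result[partition] = list(values)
    return result


def overwrite_dynamic(table, rows):
    """Replace only the partitions that appear in rows: list of (partition, value)."""
    result = {p: list(v) for p, v in table.items()}
    touched = {}
    for partition, value in rows:
        touched.setdefault(partition, []).append(value)
    # Partitions absent from the input keep their existing rows.
    result.update(touched)
    return result
```

Under these assumed semantics, an empty input leaves the table unchanged in the dynamic case and empties the target partition in the static case, which is why the empty-input scenario needs an explicit decision and tests.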

Types and Storage Environments

  • Support primitive Doris types mapped to Paimon types.
  • Support complex types such as array, map, and struct.
  • Validate decimal, timestamp, timezone, and binary semantics.
  • Support writes on HDFS and object storage environments such as S3 and OSS.
  • Support write compatibility with schema evolution scenarios.

PR Tracking

  • Support the basic Paimon write path
  • Align Doris transactions with Paimon commit and abort
  • Support append-only table writes
  • Support primary-key table writes
  • Support partitioned table writes
  • Support fixed bucket table writes
  • Support dynamic bucket table writes
  • Support INSERT OVERWRITE
  • Support primitive type writes
  • Support complex type writes
  • Support object storage write scenarios
  • Improve partition and bucket write distribution
  • Improve small file control and write throughput
  • Add tests for append-only table writes
  • Add tests for primary-key table writes
  • Add tests for partitioned and bucketed table writes
  • Add tests for dynamic bucket table writes
  • Add tests for INSERT OVERWRITE
  • Add tests for complex type writes
  • Add tests for transaction commit, retry, and rollback
  • Add tests for object storage write scenarios

Risks and Notes

  • Bucketed table writes need careful handling of concurrent writers, small files, and commit conflicts.
  • Dynamic bucket support requires additional planning around data distribution and bucket management.
  • Primary-key table writes need clear semantic boundaries for full-row writes, partial updates, delete, and update.
  • INSERT OVERWRITE needs strict correctness for failure, retry, rollback, and empty-input cases.
  • Doris transaction semantics and Paimon commit/abort semantics must remain consistent under failures and retries.
  • Complex types, decimal, timestamp, timezone, and binary values need dedicated validation.
  • Writes, commits, and cleanup on object storage need dedicated validation.

Expected Benefits

With Paimon write support, Doris users will be able to write query results, load results, and computed data directly into Paimon tables using Doris SQL. This completes the read/write loop for lakehouse workloads and lays the foundation for more complete Paimon data write, update, and table management capabilities in Doris.
