From 4f7c62538d5c3bb79bc8f7b1210b6ac2e2cb8f87 Mon Sep 17 00:00:00 2001
From: luoyuxia
Date: Fri, 17 Apr 2026 20:33:06 +0800
Subject: [PATCH 1/4] docs: add Paimon Rust 0.1.0 release note

Co-Authored-By: Claude Opus 4.6
---
 .../releases/paimon-rust-release-0.1.0.md | 227 ++++++++++++++++++
 1 file changed, 227 insertions(+)
 create mode 100644 community/docs/releases/paimon-rust-release-0.1.0.md

diff --git a/community/docs/releases/paimon-rust-release-0.1.0.md b/community/docs/releases/paimon-rust-release-0.1.0.md
new file mode 100644
index 0000000000000..d7ecd76a6264c
--- /dev/null
+++ b/community/docs/releases/paimon-rust-release-0.1.0.md
@@ -0,0 +1,227 @@
+---
+title: "Paimon Rust Release 0.1.0"
+type: release
+version: Paimon Rust 0.1.0
+---
+
+# Paimon Rust 0.1.0 Available
+
+Apr 2025 - Yuxia Luo (luoyuxia@apache.org)
+
+The Apache Paimon PMC officially announces the first release of Paimon Rust 0.1.0. This release includes
+133 commits from 21 contributors. We would like to express our sincere gratitude to all the developers
+who have participated in the contribution!
+
+## What is Paimon Rust?
+
+[Paimon Rust](https://github.com/apache/paimon-rust) is a native Rust implementation of the Apache Paimon
+lake format. It provides a high-performance, cross-platform entry point to the Paimon ecosystem with
+multi-language bindings and query engine integration.
+
+## Version Overview
+
+The first version of Paimon Rust supports the following features:
+
+1. Full Paimon table format specification implementation.
+2. Table Read: ReadBuilder with column projection, predicate pushdown, schema evolution, deletion vectors, and multi-format readers (Parquet, ORC, Avro).
+3. Table Write (Initial): INSERT INTO/OVERWRITE, SnapshotCommit, and MERGE & UPDATE support.
+4. Table Scan: partition/bucket pruning, stats pruning, deletion vector filtering, and limit pushdown.
+5. Index: File Index, BTree global index, and Tantivy full-text search.
+6. Time Travel: by snapshot ID, timestamp, tag name, and DataFusion `VERSION AS OF`.
+7. Catalog: Filesystem Catalog and REST Catalog.
+8. Storage: Local filesystem, S3, OSS, and HDFS.
+9. Apache DataFusion integration with DDL, DML, predicate pushdown, and parallelized split execution.
+10. Language bindings: Python (pypaimon_rust), Go.
+
+### Supported Table Types
+
+| Table Type                               | Scan      | Read      | Write                      |
+|------------------------------------------|-----------|-----------|----------------------------|
+| Append-Only Table                        | Supported | Supported | Supported                  |
+| Primary Key Table (with Deletion Vector) | Supported | Supported | Not yet supported          |
+| Data Evolution Table (Append-Only)       | Supported | Supported | Supported (MERGE & UPDATE) |
+
+- **Append-Only Table**: Fully supported for scan, read, and write. Supports partitioned and bucketed tables.
+- **Primary Key Table**: Scan and read are supported when deletion vectors are enabled. Write is not yet supported
+  in this version.
+- **Data Evolution Table**: Supports MERGE INTO operations on append-only tables with `data-evolution.enabled`
+  and `row-tracking.enabled`, enabling row-ID-based column updates.
+
+### Table Format Specification
+
+Implements the complete Paimon spec layer, including Snapshot, Schema, DataFileMeta, ManifestList, ManifestFile,
+IndexManifest, and all primitive & complex data types with full serialization/deserialization support.
+
+### Table Read
+
+Paimon Rust provides a full-featured read path:
+
+- **ReadBuilder** with column projection, partition-level and data-level predicate pushdown, and filter
+  pushdown into the Parquet read path.
+- **Deletion Vector** support — plan, read, and apply deletion vectors with row-level filtering and
+  cardinality tracking.
+- **Schema Evolution** read with `SchemaManager`, supporting data evolution table mode and row-id-based
+  evolution filtering.
+- **Multi-format readers** — abstracted `FormatFileReader` with Parquet (primary), ORC, and Avro reader
+  implementations.
+
+### Table Scan
+
+- **TableScan** with split planning.
+- **Partition and bucket pruning** to skip irrelevant data.
+- **Stats-based pruning** at both partition and data-file levels.
+- **Deletion vector and postpone-bucket filtering** with data-evolution group pruning.
+- **Limit pushdown** support.
+
+### Table Write
+
+The initial write path includes:
+
+- **Write Pipeline** with DataFusion `INSERT INTO` / `INSERT OVERWRITE` support for append-only tables.
+- **Commit Pipeline** with `SnapshotCommit` abstraction for atomic commits.
+- **MERGE & UPDATE** support via `DataEvolutionWriter` for data evolution tables.
+
+### Index
+
+- **File Index** format read and write.
+- **BTree Global Index** reader with async on-demand block loading.
+- **Tantivy Full-Text Search** with on-demand archive reading.
+
+### Time Travel
+
+- Time travel by **snapshot ID**, **timestamp**, and **tag name**.
+- DataFusion **`VERSION AS OF`** syntax support.
+
+### Catalog
+
+- **Filesystem Catalog** for local and cloud storage.
+- **REST Catalog** with full database and table CRUD operations.
+
+### Storage Backends
+
+Paimon Rust supports multiple storage backends via [Apache OpenDAL](https://github.com/apache/opendal):
+
+- **Local filesystem** (default)
+- **Amazon S3**
+- **Alibaba Cloud OSS**
+- **HDFS** (via hdfs-native)
+
+### Apache DataFusion Integration
+
+The `paimon-datafusion` crate provides deep integration with Apache DataFusion:
+
+- Full **CatalogProvider** integration — register Paimon catalogs as DataFusion catalogs.
+- **DDL** support with `CREATE TABLE` and `PRIMARY KEY` constraint syntax.
+- **INSERT INTO / INSERT OVERWRITE** write support.
+- **Partition predicate pushdown** and **statistics** for scan optimization.
+- **Parallelized split execution** for high throughput.
+- **Limit pushdown**.
+- **Time travel** with `VERSION AS OF`.
+- **`$options` system table** for table options inspection.
+
+### Python Binding
+
+Introduced `pypaimon` core with DataFusion catalog integration, allowing Python users to query Paimon
+tables via DataFusion's Python interface.
+
+### Go Binding
+
+- **C FFI binding** layer for cross-language interop.
+- **Go binding** with table read, column projection, and filter pushdown.
+- **Predicate API** with compound predicates (And/Or/Not) support.
+
+## Getting Started
+
+Add `paimon` to your `Cargo.toml`:
+
+```toml
+[dependencies]
+paimon = "0.1.0"
+```
+
+### Read a table with the Paimon native API
+
+```rust
+use futures::TryStreamExt;
+use paimon::{Catalog, CatalogOptions, FileSystemCatalog, Options};
+use paimon::catalog::Identifier;
+
+#[tokio::main]
+async fn main() {
+    // 1. Create a FileSystemCatalog
+    let mut options = Options::new();
+    options.set(CatalogOptions::WAREHOUSE, "/path/to/warehouse");
+    let catalog = FileSystemCatalog::new(options).unwrap();
+
+    // 2. Get a table
+    let identifier = Identifier::new("my_db", "my_table");
+    let table = catalog.get_table(&identifier).await.unwrap();
+
+    // 3. Build a scan with column projection
+    let mut read_builder = table.new_read_builder();
+    read_builder.with_projection(&["id", "name"]);
+
+    let scan = read_builder.new_scan();
+    let plan = scan.plan().await.unwrap();
+
+    // 4. Read data as Arrow RecordBatches
+    let read = read_builder.new_read().unwrap();
+    let batches: Vec<_> = read
+        .to_arrow(plan.splits())
+        .unwrap()
+        .try_collect()
+        .await
+        .unwrap();
+
+    for batch in &batches {
+        println!("{:?}", batch);
+    }
+}
+```
+
+### Query with DataFusion SQL
+
+Add `paimon-datafusion` to your `Cargo.toml`:
+
+```toml
+[dependencies]
+paimon-datafusion = "0.1.0"
+```
+
+```rust
+use std::sync::Arc;
+use datafusion::prelude::SessionContext;
+use paimon::{Catalog, CatalogOptions, FileSystemCatalog, Options};
+use paimon_datafusion::PaimonCatalogProvider;
+
+#[tokio::main]
+async fn main() {
+    // 1. Create a catalog and register it with DataFusion
+    let mut options = Options::new();
+    options.set(CatalogOptions::WAREHOUSE, "/path/to/warehouse");
+    let catalog = FileSystemCatalog::new(options).unwrap();
+
+    let provider = PaimonCatalogProvider::new(Arc::new(catalog));
+    let ctx = SessionContext::new();
+    ctx.register_catalog("paimon", Arc::new(provider));
+
+    // 2. Query Paimon tables with SQL
+    let df = ctx
+        .sql("SELECT id, name FROM paimon.my_db.my_table WHERE id > 100")
+        .await
+        .unwrap();
+
+    df.show().await.unwrap();
+}
+```
+
+For more information, visit the [documentation](https://paimon.apache.org/docs/rust/) and the
+[GitHub repository](https://github.com/apache/paimon-rust).
+
+## Contributors
+
+Thanks to all the contributors who made this release possible:
+
+Aitozi, Asura7969, Cancai Cai, DogeKing, ErXi, ForwardXu, Huanbing, HunterXHunter, Jiajia Li,
+Jingsong Lee, QuakeWang, Ryan Tan, SeungMin, Song Chuanqi, WenjunMin, XiaoHongbo, Xuanwo,
+Yuxia Luo, Zach, umi, zmlcc

From 545a91ad16cbfdcbebab00a3a28aef68fe2572aa Mon Sep 17 00:00:00 2001
From: luoyuxia
Date: Fri, 17 Apr 2026 20:37:57 +0800
Subject: [PATCH 2/4] fix: adjust frontmatter version to avoid duplicate title

Co-Authored-By: Claude Opus 4.6
---
 community/docs/releases/paimon-rust-release-0.1.0.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/community/docs/releases/paimon-rust-release-0.1.0.md b/community/docs/releases/paimon-rust-release-0.1.0.md
index d7ecd76a6264c..32c17cb506009 100644
--- a/community/docs/releases/paimon-rust-release-0.1.0.md
+++ b/community/docs/releases/paimon-rust-release-0.1.0.md
@@ -1,7 +1,7 @@
 ---
-title: "Paimon Rust Release 0.1.0"
+title: "Rust Release 0.1.0"
 type: release
-version: Paimon Rust 0.1.0
+version: Rust 0.1.0
 ---
 
 # Paimon Rust 0.1.0 Available

From 3f4a3fd61f17b8b3ff4068286bcdbdf59cbdf0bf Mon Sep 17 00:00:00 2001
From: luoyuxia
Date: Fri, 17 Apr 2026 20:40:16 +0800
Subject: [PATCH 3/4] fix: restore full project name in frontmatter title

Co-Authored-By: Claude Opus 4.6
---
 community/docs/releases/paimon-rust-release-0.1.0.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/community/docs/releases/paimon-rust-release-0.1.0.md b/community/docs/releases/paimon-rust-release-0.1.0.md
index 32c17cb506009..0d544596891d4 100644
--- a/community/docs/releases/paimon-rust-release-0.1.0.md
+++ b/community/docs/releases/paimon-rust-release-0.1.0.md
@@ -1,12 +1,12 @@
 ---
-title: "Rust Release 0.1.0"
+title: "Paimon Rust Release 0.1.0"
 type: release
 version: Rust 0.1.0
 ---
 
 # Paimon Rust 0.1.0 Available
 
-Apr 2025 - Yuxia Luo (luoyuxia@apache.org)
+Apr 2026 - Yuxia Luo (luoyuxia@apache.org)
 
 The Apache Paimon PMC officially announces the first release of Paimon Rust 0.1.0. This release includes
 133 commits from 21 contributors. We would like to express our sincere gratitude to all the developers

From d6f0c15ae0ad53bb08fbd94a327d0424537a7b04 Mon Sep 17 00:00:00 2001
From: luoyuxia
Date: Wed, 29 Apr 2026 14:43:00 +0800
Subject: [PATCH 4/4] docs: add future roadmap and fix contributor list in Paimon Rust 0.1.0 release note

Co-Authored-By: Claude Opus 4.6
---
 .../releases/paimon-rust-release-0.1.0.md | 43 ++++++++++++++++++-
 1 file changed, 41 insertions(+), 2 deletions(-)

diff --git a/community/docs/releases/paimon-rust-release-0.1.0.md b/community/docs/releases/paimon-rust-release-0.1.0.md
index 0d544596891d4..fee03f6907486 100644
--- a/community/docs/releases/paimon-rust-release-0.1.0.md
+++ b/community/docs/releases/paimon-rust-release-0.1.0.md
@@ -218,10 +218,49 @@ async fn main() {
 For more information, visit the [documentation](https://paimon.apache.org/docs/rust/) and the
 [GitHub repository](https://github.com/apache/paimon-rust).
 
+## Future
+
+### Feature Parity with Paimon Java
+
+Continue aligning Paimon Rust with the Java implementation to cover core lakehouse capabilities:
+
+- **Primary Key Table** — full read/write support with sort-merge deduplication, dynamic bucket
+  assignment, and additional merge engines (partial-update, aggregation).
+- **DML & SQL** — copy-on-write DML (DELETE, UPDATE, MERGE INTO), `TRUNCATE TABLE`,
+  `DROP PARTITION`, `INSERT OVERWRITE PARTITION`, `CALL` procedures, and session-scoped
+  dynamic options (`SET`/`RESET`).
+- **System Tables** — `$schemas`, `$snapshots`, `$tags`, `$manifests` for metadata inspection
+  via DataFusion SQL.
+
+### Multimodal Support
+
+Build multimodal data capabilities on top of the Paimon lake format:
+
+- **Lumina Vector Index** — vector similarity search via DataFusion SQL.
+- **Vortex File Format** — [Vortex](https://github.com/spiraldb/vortex) columnar format support
+  as an alternative to Parquet.
+- **Variant Type** — semi-structured data support.
+- **Blob Type** — large binary object storage and DDL semantics.
+
+### TypeScript Language Binding
+
+As AI agents become a primary way to interact with data infrastructure, a native TypeScript binding would let agent frameworks directly read and
+query Paimon tables without JVM overhead, making Paimon a first-class data source in the AI agent ecosystem.
+
+### Fluss Integration
+
+[Fluss](https://fluss.apache.org/) is a streaming storage system built for real-time analytics, where
+fresh data lands in Fluss and is continuously tiered into the Paimon lake. By integrating
+Fluss as a DataFusion data source alongside Paimon,
+we can build a unified query layer that transparently merges real-time data in Fluss with
+historical data in Paimon, delivering data freshness within seconds for analytical queries — all
+within a pure-Rust, JVM-free deployment.
+
+
 ## Contributors
 
 Thanks to all the contributors who made this release possible:
 
-Aitozi, Asura7969, Cancai Cai, DogeKing, ErXi, ForwardXu, Huanbing, HunterXHunter, Jiajia Li,
+Aitozi, Asura7969, Cancai Cai, DogeKing, ForwardXu, Huanbing, HunterXHunter, Jiajia Li,
 Jingsong Lee, QuakeWang, Ryan Tan, SeungMin, Song Chuanqi, WenjunMin, XiaoHongbo, Xuanwo,
-Yuxia Luo, Zach, umi, zmlcc
+Yuxia Luo, umi, zmlcc