
branch-4.1: [feat](maxcompute) Support INSERT INTO for MaxCompute external catalog tables (#60769) #61443

Open
morningman wants to merge 2 commits into apache:branch-4.1 from morningman:41_bp60769

@morningman
Contributor

bp #60769

Related apache#60768

Add end-to-end write support for MaxCompute external tables, enabling
users to export data from Doris to MaxCompute via standard INSERT INTO
syntax. This builds on the JNI writer framework introduced in apache#60756.

Key changes:

BE:
- Add MCTableSinkOperatorX pipeline sink operator and
  MCTableSinkLocalState
- Add VMCTableWriter (async) and VMCPartitionWriter for partition-aware
  writes
- Extend VJniFormatTransformer with get_statistics() for retrieving
  write metrics from the Java-side writer
- Track TMCCommitData in RuntimeState and report it back to the
  coordinator via FragmentMgr

FE:
- Add MaxComputeJniWriter using MC Tunnel SDK for data upload
- Add MCTransaction for upload session lifecycle management and commit
- Add MCTransactionManager and MCInsertExecutor/MCInsertCommandContext
- Add Nereids planner support: UnboundMaxComputeTableSink,
  LogicalMaxComputeTableSink, PhysicalMaxComputeTableSink with
  corresponding bind and implementation rules
- Add MaxComputeTableSink planner node

Thrift:
- Define TMCCommitData, TMaxComputeTableSink, and MAXCOMPUTE_TABLE_SINK
  data sink type

```sql
-- 1. Create MaxCompute catalog
CREATE CATALOG mc PROPERTIES (
  "type" = "max_compute",
  "mc.default.project" = "doris_test_schema",
  "mc.access_key" = "ak",
  "mc.secret_key" = "sk",
  "mc.endpoint" = "http://service.cn-beijing-vpc.maxcompute.aliyun-inc.com/api"
);

-- 2. Switch to the MaxCompute catalog and create a database
SWITCH mc;
CREATE DATABASE mc_db;

-- 3. Create table & INSERT INTO VALUES
CREATE TABLE mc_db.t1 (id INT, name STRING, value DOUBLE);
INSERT INTO mc_db.t1 VALUES (1, 'Alice', 10.5), (2, 'Bob', 20.3);

-- 4. INSERT INTO SELECT
CREATE TABLE mc_db.t2 (id INT, name STRING, value DOUBLE);
INSERT INTO mc_db.t2 SELECT * FROM mc_db.t1;

-- 5. CREATE TABLE AS SELECT (CTAS)
CREATE TABLE mc_db.t3 AS SELECT * FROM mc_db.t1;

-- 6. Partition table write
CREATE TABLE mc_db.t4 (id INT, name STRING, ds STRING)
PARTITION BY (ds)();
INSERT INTO mc_db.t4 VALUES (1, 'a', '20250101'), (2, 'b', '20250102');

-- 7. Multi-level partition table write
CREATE TABLE mc_db.t5 (id INT, val STRING, ds STRING, region STRING)
PARTITION BY (ds, region)();
INSERT INTO mc_db.t5 VALUES (1, 'v1', '20250101', 'bj'), (2, 'v2', '20250102', 'sh');

-- 8. Complex types (array, map, struct and nested)
CREATE TABLE mc_db.t6 (
  id INT,
  arr ARRAY<STRUCT<name:STRING, val:INT>>,
  m MAP<STRING, ARRAY<INT>>,
  s STRUCT<outer_f:STRING, inner_f:STRUCT<a:INT, b:STRING>>
);
INSERT INTO mc_db.t6 VALUES (
  1,
  array(named_struct('name','a','val',1), named_struct('name','b','val',2)),
  map('k1', array(1,2,3), 'k2', array(4,5)),
  named_struct('outer_f','hello','inner_f',named_struct('a',10,'b','world'))
);

-- 9. Static partition insert (into the multi-level partition table t5)
INSERT INTO mc_db.t5 PARTITION(ds='20250101', region='bj') VALUES (1, 'v1'), (2, 'v2');

-- 10. INSERT OVERWRITE (non-partitioned table)
INSERT OVERWRITE TABLE mc_db.t2 VALUES (2, 'new', 0.0);
```

Capabilities covered:
- CREATE/DROP DATABASE
- CREATE/DROP TABLE (including partitioned and multi-level partitioned
  tables)
- INSERT INTO VALUES / INSERT INTO SELECT
- CREATE TABLE AS SELECT (CTAS)
- Full type support (primitive types + nested complex types)
- Cross-catalog write
- Large data volume write (2 million rows)
- INSERT OVERWRITE
- INSERT INTO a specified partition (static partition insertion)

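Two of the capabilities listed above, cross-catalog write and DROP, are not exercised in the walkthrough. The statements below are a hedged sketch, not taken from the PR: they assume the `mc` catalog and `mc_db` tables created in the walkthrough, a hypothetical internal-catalog table `internal.demo_db.src` with the same schema as `mc_db.t1`, and that the catalog-qualified `catalog.db.table` names Doris accepts for other external catalogs also work for MaxCompute.

```sql
-- Cross-catalog write: read from Doris's internal catalog, write to MaxCompute.
-- internal.demo_db.src is a hypothetical table (id INT, name STRING, value DOUBLE).
INSERT INTO mc.mc_db.t1
SELECT id, name, value FROM internal.demo_db.src;

-- Clean up: drop the example tables first, then the now-empty database.
DROP TABLE IF EXISTS mc.mc_db.t1;
DROP TABLE IF EXISTS mc.mc_db.t2;
DROP TABLE IF EXISTS mc.mc_db.t3;
DROP TABLE IF EXISTS mc.mc_db.t4;
DROP TABLE IF EXISTS mc.mc_db.t5;
DROP TABLE IF EXISTS mc.mc_db.t6;
DROP DATABASE IF EXISTS mc.mc_db;
```

Dropping the tables before the database avoids relying on any cascade behavior of DROP DATABASE, which the PR description does not specify.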
@Thearas
Contributor

Thearas commented Mar 17, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@morningman
Contributor Author

run buildall

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
|---|---|
| Function Coverage | 79.15% (1788/2259) |
| Line Coverage | 64.40% (31932/49580) |
| Region Coverage | 65.24% (15981/24496) |
| Branch Coverage | 55.79% (8499/15234) |

@doris-robot

BE UT Coverage Report

Increment line coverage 0.00% (0/269) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
|---|---|
| Function Coverage | 52.84% (19358/36636) |
| Line Coverage | 36.18% (180806/499679) |
| Region Coverage | 32.66% (139755/427971) |
| Branch Coverage | 33.66% (60825/180719) |

@morningman changed the title from "[feat](maxcompute) Support INSERT INTO for MaxCompute external catalog tables (#60769)" to "branch-4.1: [feat](maxcompute) Support INSERT INTO for MaxCompute external catalog tables (#60769)" on Mar 17, 2026
@morningman requested a review from yiguolei as a code owner on March 18, 2026 00:17
@morningman
Contributor Author

run buildall

@doris-robot
Copy link

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
|---|---|
| Function Coverage | 79.15% (1788/2259) |
| Line Coverage | 64.43% (31942/49580) |
| Region Coverage | 65.22% (15977/24496) |
| Branch Coverage | 55.80% (8501/15234) |

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 4.76% (22/462) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 0.00% (0/269) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
|---|---|
| Function Coverage | 52.80% (19345/36636) |
| Line Coverage | 36.16% (180679/499679) |
| Region Coverage | 32.64% (139669/427971) |
| Branch Coverage | 33.64% (60790/180719) |


Labels

None yet

Projects

None yet

4 participants