branch-4.1: [feat](maxcompute) Support INSERT INTO for MaxCompute external catalog tables (#60769) #61443
Open
morningman wants to merge 2 commits into apache:branch-4.1 from
…og tables (apache#60769)

Related apache#60768

Add end-to-end write support for MaxCompute external tables, enabling users to export data from Doris to MaxCompute via standard INSERT INTO syntax. This builds on the JNI writer framework introduced in apache#60756.

Key changes:

BE:
- Add MCTableSinkOperatorX pipeline sink operator and MCTableSinkLocalState
- Add VMCTableWriter (async) and VMCPartitionWriter for partition-aware writes
- Extend VJniFormatTransformer with get_statistics() for retrieving write metrics from Java-side writer
- Track TMCCommitData in RuntimeState and report it back to coordinator via FragmentMgr

FE:
- Add MaxComputeJniWriter using MC Tunnel SDK for data upload
- Add MCTransaction for upload session lifecycle management and commit
- Add MCTransactionManager and MCInsertExecutor/MCInsertCommandContext
- Add Nereids planner support: UnboundMaxComputeTableSink, LogicalMaxComputeTableSink, PhysicalMaxComputeTableSink with corresponding bind and implementation rules
- Add MaxComputeTableSink planner node

Thrift:
- Define TMCCommitData, TMaxComputeTableSink, and MAXCOMPUTE_TABLE_SINK data sink type

```
-- 1. Create MaxCompute catalog
CREATE CATALOG mc PROPERTIES (
    "type" = "max_compute",
    "mc.default.project" = "doris_test_schema",
    "mc.access_key" = "ak",
    "mc.secret_key" = "sk",
    "mc.endpoint" = "http://service.cn-beijing-vpc.maxcompute.aliyun-inc.com/api"
);

-- 2. Create database
CREATE DATABASE mc_db;

-- 3. Create table & INSERT INTO VALUES
CREATE TABLE mc_db.t1 (id INT, name STRING, value DOUBLE);
INSERT INTO mc_db.t1 VALUES (1, 'Alice', 10.5), (2, 'Bob', 20.3);

-- 4. INSERT INTO SELECT
CREATE TABLE mc_db.t2 (id INT, name STRING, value DOUBLE);
INSERT INTO mc_db.t2 SELECT * FROM mc_db.t1;

-- 5. CREATE TABLE AS SELECT (CTAS)
CREATE TABLE mc_db.t3 AS SELECT * FROM mc_db.t1;

-- 6. Partition table write
CREATE TABLE mc_db.t4 (id INT, name STRING, ds STRING) PARTITION BY (ds)();
INSERT INTO mc_db.t4 VALUES (1, 'a', '20250101'), (2, 'b', '20250102');

-- 7. Multi-level partition table write
CREATE TABLE mc_db.t5 (id INT, val STRING, ds STRING, region STRING) PARTITION BY (ds, region)();
INSERT INTO mc_db.t5 VALUES (1, 'v1', '20250101', 'bj'), (2, 'v2', '20250102', 'sh');

-- 8. Complex types (array, map, struct and nested)
CREATE TABLE mc_db.t6 (
    id INT,
    arr ARRAY<STRUCT<name:STRING, val:INT>>,
    m MAP<STRING, ARRAY<INT>>,
    s STRUCT<outer_f:STRING, inner_f:STRUCT<a:INT, b:STRING>>
);
INSERT INTO mc_db.t6 VALUES (
    1,
    array(named_struct('name','a','val',1), named_struct('name','b','val',2)),
    map('k1', array(1,2,3), 'k2', array(4,5)),
    named_struct('outer_f','hello','inner_f',named_struct('a',10,'b','world'))
);

-- 9. static partition
INSERT INTO static_multi_ecd6860d PARTITION(ds='20250101', region='bj', ds='20250102') VALUES (1, 'v1'), (2, 'v2');

-- 10. insert overwrite
INSERT OVERWRITE TABLE overwrite_nopart_d3a90945 VALUES (2, 'new')
```

Capabilities covered:
- CREATE/DROP DATABASE
- CREATE/DROP TABLE (including partitioned and multi-level partitioned tables)
- INSERT INTO VALUES / INSERT INTO SELECT
- CREATE TABLE AS SELECT (CTAS)
- Full type support (primitive types + nested complex types)
- Cross-catalog write
- Large data volume write (2 million rows)
- INSERT OVERWRITE
- INSERT into specified partition (static partition insertion)
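The description says that writers report commit data back to the coordinator and that a transaction object commits the upload sessions, but it does not show how those pieces fit together. The following is only a rough sketch of that collect-then-commit pattern, under stated assumptions: `CommitData` and `SketchTransaction` are hypothetical stand-ins (not the PR's `TMCCommitData` or `MCTransaction`), the fields shown are invented for illustration, and no MaxCompute Tunnel SDK call is made.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class McCommitSketch {

    /** Hypothetical per-writer report; the real TMCCommitData fields are not shown in the PR text. */
    static final class CommitData {
        final String partitionSpec; // assumed format, e.g. "ds=20250101"; "" for an unpartitioned table
        final long rowCount;

        CommitData(String partitionSpec, long rowCount) {
            this.partitionSpec = partitionSpec;
            this.rowCount = rowCount;
        }
    }

    /** Hypothetical coordinator-side transaction: gathers reports, then finishes exactly once. */
    static final class SketchTransaction {
        private final List<CommitData> pending = new ArrayList<>();
        private boolean finished = false;

        /** Called once per writer report; synchronized because reports may arrive concurrently. */
        synchronized void addCommitData(CommitData data) {
            if (finished) {
                throw new IllegalStateException("transaction already finished");
            }
            pending.add(data);
        }

        /** Merges all reports into rows-per-partition; a real commit would finalize upload sessions here. */
        synchronized Map<String, Long> commit() {
            if (finished) {
                throw new IllegalStateException("transaction already finished");
            }
            finished = true;
            Map<String, Long> rowsPerPartition = new LinkedHashMap<>();
            for (CommitData d : pending) {
                rowsPerPartition.merge(d.partitionSpec, d.rowCount, Long::sum);
            }
            return rowsPerPartition;
        }

        /** Discards everything collected so far; nothing becomes visible. */
        synchronized void rollback() {
            finished = true;
            pending.clear();
        }
    }

    public static void main(String[] args) {
        SketchTransaction txn = new SketchTransaction();
        // Two writers touched the same partition, one touched another.
        txn.addCommitData(new CommitData("ds=20250101", 1));
        txn.addCommitData(new CommitData("ds=20250102", 1));
        txn.addCommitData(new CommitData("ds=20250101", 3));
        System.out.println(txn.commit());
    }
}
```

The point of the single `commit()` step is that partial results from individual writers are never published on their own; whether the actual implementation groups by partition in this way is an assumption of the sketch.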
run buildall
run buildall
bp #60769