[fix](fe) Preserve external table column name case#65094

Open
Gabriel39 wants to merge 5 commits into apache:master from Gabriel39:fix_0701

Conversation

@Gabriel39

Contributor

What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Creating Iceberg or Paimon external tables with mixed-case partition columns could fail because Doris converted top-level external column names to lower case while building external schemas and partition specs. Reading external table schemas and partition metadata also normalized some Paimon and Iceberg column names to lower case, so SHOW CREATE and partition helpers could lose the original external column spelling. This change preserves the original top-level external field names when converting Doris columns to Iceberg/Paimon schemas, resolves partition and primary key names case-insensitively back to the external canonical names, and stops schema/partition parsing paths from lowercasing external column names.

Release note

Fix Iceberg and Paimon external table column name casing for mixed-case partition columns.

Check List (For Author)

  • Test: Unit Test
    • Maven focused FE test: MAVEN_ARGS=-o JDK_17=/usr/local/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home JAVA_HOME=/usr/local/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home mvn test -pl fe-core -am -Dcheckstyle.skip=true -DfailIfNoTests=false -Dmaven.build.cache.enabled=false -Dtest=CreateIcebergTableTest,PaimonMetadataOpsTest,IcebergUtilsTest#testParseSchemaPreservesNonLowercaseColumnNames,PaimonUtilTest#testParseSchemaPreservesNonLowercaseColumnNames
    • git diff --check
    • A broader focused run including two existing Mockito-based IcebergUtilsTest methods compiled successfully but those two methods failed locally because Mockito inline Byte Buddy could not self-attach to the Homebrew JDK 17 VM.
  • Behavior changed: Yes. Iceberg and Paimon external schemas, partition specs, and partition metadata now preserve external column name casing.
  • Does this need documentation: No
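
The resolution step described in the Problem Summary can be illustrated with a minimal sketch. This is not the code from the PR: the class `CanonicalNameResolver`, its methods, and its error messages are hypothetical names chosen for illustration. It assumes the external schema is available as a list of top-level field names in their original spelling, and it maps a user-supplied partition or primary key name back to that spelling without lowercasing anything.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Doris: resolves user-supplied column names
// to the canonical (original-case) names of an external schema.
public class CanonicalNameResolver {
    // Keys compare case-insensitively; values keep the original external spelling.
    private final Map<String, String> canonicalNames =
            new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    public CanonicalNameResolver(List<String> externalFieldNames) {
        for (String name : externalFieldNames) {
            // Two fields that differ only by case would make the lookup ambiguous.
            if (canonicalNames.putIfAbsent(name, name) != null) {
                throw new IllegalArgumentException("Duplicate column name: " + name);
            }
        }
    }

    // Returns the external spelling for a name given in any case.
    public String resolve(String requestedName) {
        String canonical = canonicalNames.get(requestedName);
        if (canonical == null) {
            throw new IllegalArgumentException("Unknown column: " + requestedName);
        }
        return canonical;
    }

    // Resolves a whole partition or primary key column list, preserving order.
    public List<String> resolveAll(List<String> requestedNames) {
        List<String> resolved = new ArrayList<>(requestedNames.size());
        for (String name : requestedNames) {
            resolved.add(resolve(name));
        }
        return resolved;
    }
}
```

With a schema of `ID`, `eventDate`, and `Region`, a partition clause written as `eventdate` would resolve to `eventDate`, so the partition spec is built against the name the external catalog actually stores.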

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen

Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@Gabriel39

Contributor Author

/review

@github-actions github-actions Bot left a comment

Contributor

I reviewed the PR for external table column-name case preservation across Iceberg and Paimon schema conversion, create-table DDL, partition metadata, predicate pushdown, scan projection, JNI reader handoff, and the added unit tests.

I found three issues that should be fixed before merge: Paimon mixed-case columns can be dropped or fail in scan projection/JNI required-field matching, existing Paimon scan-node unit tests still assert the old lower-case partition-key contract, and Iceberg sort-order creation still binds column names case-sensitively after the schema now preserves original case.

Critical checkpoints: the goal is only partially achieved; new tests cover schema and partition-name preservation but not Paimon scan projection or Iceberg sort order. The changes are focused, and I did not find concurrency, lifecycle, persistence, or FE/BE protocol versioning concerns. There is a parallel-path gap in Paimon projection/JNI handling and an Iceberg sort-order DDL gap. No new config or observability change is involved.

Validation: I ran git diff --check on the scoped PR diff and it was clean. I did not run FE unit tests because this checkout is missing thirdparty/installed and thirdparty/installed/bin/protoc, which fe/AGENTS.md requires before FE build/test execution.

Subagent conclusions: tests-session-config proposed TEST-1, accepted as M-1. optimizer-rewrite proposed OPT-1, merged into M-2 with additional main verification. M-3 was found and verified by the main pass. Convergence round 1 ended with both live subagents replying NO_NEW_VALUABLE_FINDINGS for the same final ledger/comment set.

### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Paimon scan projection still matched Doris slot names against Paimon field names with lower-case or exact string comparisons, so mixed-case columns could be dropped from the FE projection or rejected by the JNI scanner as missing. Iceberg table creation also passed sort-order column names to the Iceberg builder without resolving them through the case-insensitive schema lookup, so ORDER BY clauses that used normalized column spelling could fail to bind to mixed-case Iceberg fields. This change resolves Paimon field indexes case-insensitively in both FE and JNI scan paths and resolves Iceberg sort-order names to the canonical schema field name before building the sort order.

### Release note

Fix Paimon scan projection and Iceberg sort-order handling for mixed-case external column names.

### Check List (For Author)

- Test: Unit Test
    - Remote FE UT: ssh gabriel@10.26.20.3, /mnt/disk3/gabriel/Workspace/dev3/doris, MAVEN_ARGS=-o ./run-fe-ut.sh --run org.apache.doris.datasource.paimon.source.PaimonScanNodeTest#testGetPathPartitionKeysReturnsTablePartitionKeys+testSetPaimonParamsUsesOrderedPartitionKeys+testGetFieldIndexMatchesMixedCaseColumns
    - Maven focused Paimon JNI test: MAVEN_ARGS=-o JDK_17=/usr/local/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home JAVA_HOME=/usr/local/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home mvn test -pl be-java-extensions/paimon-scanner -am -Dcheckstyle.skip=true -DfailIfNoTests=false -Dmaven.build.cache.enabled=false -Dtest=PaimonJniScannerTest#testGetFieldIndexMatchesMixedCaseColumns
    - Maven focused Iceberg FE test: MAVEN_ARGS=-o JDK_17=/usr/local/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home JAVA_HOME=/usr/local/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home mvn test -pl fe-core -am -Dcheckstyle.skip=true -DfailIfNoTests=false -Dmaven.build.cache.enabled=false -Dtest=CreateIcebergTableTest#testSortOrderResolvesNonLowercaseColumnNamesCaseInsensitively
    - git diff --check
- Behavior changed: Yes. Paimon scan projection and Iceberg sort-order creation now resolve mixed-case external column names case-insensitively while preserving canonical schema names.
- Does this need documentation: No
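
The projection fix described in this commit message can be sketched as follows. This is an illustrative approximation, not the PR's code: the name `getFieldIndex` is taken from the test names listed above, but its signature here, the enclosing class `FieldIndexLookup`, and the `project` helper are assumptions. The sketch prefers an exact match and falls back to a case-insensitive one, so a Doris slot named `username` still finds a Paimon field named `userName`.

```java
import java.util.List;

// Hypothetical sketch of case-insensitive field lookup for scan projection.
public class FieldIndexLookup {
    // Returns the index of slotName in fieldNames, or -1 if no field matches.
    public static int getFieldIndex(List<String> fieldNames, String slotName) {
        // Exact match first, so the canonical spelling always wins.
        int exact = fieldNames.indexOf(slotName);
        if (exact >= 0) {
            return exact;
        }
        // Fall back to a case-insensitive comparison.
        for (int i = 0; i < fieldNames.size(); i++) {
            if (fieldNames.get(i).equalsIgnoreCase(slotName)) {
                return i;
            }
        }
        return -1;
    }

    // Maps every required slot to a field index, failing loudly on a miss
    // instead of silently dropping the column from the projection.
    public static int[] project(List<String> fieldNames, List<String> requiredSlots) {
        int[] indexes = new int[requiredSlots.size()];
        for (int i = 0; i < requiredSlots.size(); i++) {
            int index = getFieldIndex(fieldNames, requiredSlots.get(i));
            if (index < 0) {
                throw new IllegalArgumentException(
                        "Required field not found: " + requiredSlots.get(i));
            }
            indexes[i] = index;
        }
        return indexes;
    }
}
```

The same resolve-then-bind pattern applies to the Iceberg sort-order case: the name from the ORDER BY clause would first be mapped to the canonical schema field name, and only that canonical name would be handed to the sort-order builder.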
@Gabriel39

Contributor Author

/review

@github-actions github-actions Bot left a comment

Contributor

Automated review complete. I did not find any additional blocking issue to raise as a new inline comment beyond the existing review context.

Critical checkpoint conclusions:

  • Goal/test proof: The current head preserves mixed-case external Iceberg/Paimon root column names and resolves the previously risky partition, sort-order, FE projection, and JNI required-field paths case-insensitively. The PR adds focused unit coverage for mixed-case Iceberg schema/partition/sort-order handling, Paimon schema/partition handling, FE scan projection lookup, and JNI field lookup.
  • Scope/focus: The changes are focused on external Iceberg/Paimon name preservation and the directly affected tests.
  • Concurrency/lifecycle/config: I found no new concurrency path, lifecycle ownership change, or configuration item introduced by this PR.
  • Compatibility/parallel paths: I checked the parallel Paimon scan paths across FE split planning, path partition materialization, native raw-file splits, JNI field projection, and datetime precision lookup; I also checked Iceberg create-table partition/sort binding and SHOW CREATE partition/sort display helpers. No additional issue remained after the current fixes.
  • Tests/results: The existing inline comments around Paimon partition-key tests, Paimon FE/JNI projection lookup, and Iceberg sort-order binding are addressed by the current head, so I did not resubmit them. I could not run the FE unit tests locally because this runner is missing thirdparty/installed and thirdparty/installed/bin/protoc, and .worktree_initialized is absent; I did run git diff --check on the authoritative PR file list and it passed.
  • Observability/transactions/persistence: No new transaction, edit-log, metric, or logging requirement appears applicable.
  • User focus: No additional user-provided review focus was present.

Subagent conclusions:

  • optimizer-rewrite found no new valuable candidates in the initial pass and returned NO_NEW_VALUABLE_FINDINGS in convergence round 1 for the empty inline comment set.
  • tests-session-config found no new valuable candidates in the initial pass and returned NO_NEW_VALUABLE_FINDINGS in convergence round 1 for the same empty inline comment set.
  • No subagent candidate became an inline comment; no new duplicates were merged beyond the existing GitHub threads already documented in the ledger.

@Gabriel39

Contributor Author

run buildall

Gabriel39 added 2 commits July 1, 2026 20:29
### What problem does this PR solve?

Issue Number: None

Related PR: apache#65094

Problem Summary: Regenerated the Iceberg invalid Avro column name regression output after rebuilding FE and BE and rerunning the target external Iceberg case against the initialized REST catalog.

### Release note

None

### Check List (For Author)

- Test: Regression test
    - Ran test_iceberg_invaild_avro_name on the remote validation host with FE and BE rebuilt.
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?

Issue Number: None

Related PR: apache#65094

Problem Summary: Regenerated the Iceberg invalid Avro column name regression output with a FE rebuilt from the PR changes. The expected DESC output now preserves the original mixed-case external column name.

### Release note

None

### Check List (For Author)

- Test: Regression test
    - Rebuilt FE on the remote validation host, started a temporary FE/BE cluster from the rebuilt output, and ran test_iceberg_invaild_avro_name against it.
- Behavior changed: No
- Does this need documentation: No
@Gabriel39

Contributor Author

run buildall

@hello-stephen

Contributor
TPC-H: Total hot run time: 30018 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 39bfde149fe686fc2009bb6c38c5a3ce8cc25b5e, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17612	4115	4072	4072
q2	2016	326	197	197
q3	10288	1466	841	841
q4	4680	471	351	351
q5	7526	845	580	580
q6	184	172	138	138
q7	797	864	652	652
q8	9664	1663	1567	1567
q9	5689	4450	4423	4423
q10	6800	1819	1551	1551
q11	511	363	314	314
q12	721	549	447	447
q13	18138	3383	2777	2777
q14	272	270	249	249
q15	q16	797	789	709	709
q17	1077	979	1093	979
q18	7080	5844	5671	5671
q19	1165	1230	1098	1098
q20	762	666	535	535
q21	5574	2743	2566	2566
q22	440	366	301	301
Total cold run time: 101793 ms
Total hot run time: 30018 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4383	4285	4297	4285
q2	291	319	207	207
q3	4636	5026	4455	4455
q4	2106	2188	1399	1399
q5	4428	4326	4324	4324
q6	237	180	131	131
q7	1923	2035	1678	1678
q8	2596	2254	2228	2228
q9	8080	8117	7810	7810
q10	4892	4779	4295	4295
q11	594	421	380	380
q12	759	803	568	568
q13	3266	3552	3025	3025
q14	307	292	270	270
q15	q16	726	719	651	651
q17	1367	1358	1476	1358
q18	7818	7666	7387	7387
q19	1204	1133	1117	1117
q20	2221	2210	1961	1961
q21	5353	4601	4473	4473
q22	516	467	399	399
Total cold run time: 57703 ms
Total hot run time: 52401 ms

@hello-stephen

Contributor
TPC-DS: Total hot run time: 173173 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 39bfde149fe686fc2009bb6c38c5a3ce8cc25b5e, data reload: false

query5	4316	642	485	485
query6	459	243	200	200
query7	4837	611	341	341
query8	335	197	178	178
query9	8768	4071	4058	4058
query10	453	349	305	305
query11	5965	2390	2162	2162
query12	160	104	99	99
query13	1272	601	457	457
query14	6258	5311	4970	4970
query14_1	4288	4267	4287	4267
query15	221	205	182	182
query16	1033	502	452	452
query17	919	698	569	569
query18	2442	471	341	341
query19	204	193	148	148
query20	109	109	104	104
query21	230	160	133	133
query22	13571	13566	13365	13365
query23	17449	16557	16096	16096
query23_1	16363	16524	16248	16248
query24	7854	1786	1308	1308
query24_1	1336	1370	1304	1304
query25	581	459	392	392
query26	1350	362	215	215
query27	2589	597	373	373
query28	4528	2066	2015	2015
query29	1104	627	491	491
query30	345	261	227	227
query31	1123	1098	979	979
query32	109	62	61	61
query33	557	334	272	272
query34	1186	1140	665	665
query35	761	796	692	692
query36	1412	1395	1206	1206
query37	169	120	93	93
query38	1892	1703	1664	1664
query39	936	910	907	907
query39_1	902	900	875	875
query40	244	171	146	146
query41	71	76	69	69
query42	100	100	98	98
query43	327	330	290	290
query44	1439	787	808	787
query45	207	206	185	185
query46	1090	1184	743	743
query47	2424	2341	2217	2217
query48	417	427	303	303
query49	610	429	325	325
query50	1027	437	344	344
query51	4453	4435	4382	4382
query52	86	87	79	79
query53	278	276	213	213
query54	316	249	232	232
query55	78	75	70	70
query56	324	320	319	319
query57	1447	1401	1316	1316
query58	302	284	269	269
query59	1591	1669	1407	1407
query60	321	280	284	280
query61	216	144	153	144
query62	699	648	587	587
query63	249	206	213	206
query64	2534	763	606	606
query65	4854	4766	4727	4727
query66	1817	513	393	393
query67	29658	29554	28832	28832
query68	3263	1582	1051	1051
query69	426	312	264	264
query70	1039	991	970	970
query71	365	330	332	330
query72	3058	2693	2358	2358
query73	891	799	446	446
query74	5121	4951	4757	4757
query75	2641	2581	2244	2244
query76	2333	1183	800	800
query77	363	372	279	279
query78	12549	12431	11783	11783
query79	1391	1158	793	793
query80	1112	552	447	447
query81	511	329	279	279
query82	554	156	121	121
query83	405	319	290	290
query84	330	172	134	134
query85	969	620	515	515
query86	414	315	287	287
query87	1836	1839	1766	1766
query88	3705	2871	2796	2796
query89	447	416	351	351
query90	1756	209	199	199
query91	203	195	164	164
query92	67	60	60	60
query93	1559	1486	1003	1003
query94	628	357	331	331
query95	800	503	528	503
query96	1052	806	342	342
query97	2683	2714	2544	2544
query98	223	204	232	204
query99	1188	1156	1029	1029
Total cold run time: 259737 ms
Total hot run time: 173173 ms

@hello-stephen

Contributor
ClickBench: Total hot run time: 25.25 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 39bfde149fe686fc2009bb6c38c5a3ce8cc25b5e, data reload: false

query1	0.01	0.01	0.01
query2	0.09	0.05	0.05
query3	0.25	0.13	0.14
query4	1.60	0.14	0.14
query5	0.24	0.26	0.23
query6	1.30	1.05	1.07
query7	0.04	0.01	0.00
query8	0.06	0.04	0.04
query9	0.38	0.31	0.31
query10	0.59	0.55	0.56
query11	0.20	0.15	0.14
query12	0.20	0.15	0.15
query13	0.47	0.48	0.47
query14	1.02	1.01	0.99
query15	0.60	0.59	0.61
query16	0.32	0.31	0.32
query17	1.12	1.10	1.09
query18	0.23	0.21	0.22
query19	2.03	1.93	1.98
query20	0.02	0.01	0.02
query21	15.46	0.22	0.13
query22	4.80	0.05	0.06
query23	16.08	0.31	0.12
query24	3.03	0.45	0.32
query25	0.11	0.06	0.05
query26	0.75	0.20	0.15
query27	0.04	0.04	0.03
query28	3.51	0.98	0.54
query29	12.47	4.30	3.46
query30	0.27	0.14	0.14
query31	2.78	0.61	0.32
query32	3.23	0.60	0.51
query33	3.28	3.20	3.16
query34	15.68	4.21	3.56
query35	3.53	3.51	3.54
query36	0.54	0.43	0.42
query37	0.09	0.07	0.07
query38	0.05	0.04	0.04
query39	0.04	0.03	0.04
query40	0.18	0.16	0.15
query41	0.08	0.03	0.03
query42	0.04	0.03	0.03
query43	0.04	0.03	0.04
Total cold run time: 96.85 s
Total hot run time: 25.25 s

### What problem does this PR solve?

Issue Number: None

Related PR: apache#65094

Problem Summary: The Paimon catalog regression expected the duplicate-column diagnostic to use a lower-case column name, but the FE now preserves the original external column case and reports the duplicated column as ID. The Paimon JDBC catalog regression also treated output from a failed optional docker probe as a container name, which caused a malformed docker cp command when the spark-iceberg container was unavailable or the current user lacked docker permission. Update the expected duplicate-column message and make optional command failures return an empty result so the existing spark-iceberg availability check can skip the environment-dependent JDBC portion correctly.

### Release note

None

### Check List (For Author)

- Test: Regression test
    - On gabriel@10.26.20.3 under /mnt/disk3/gabriel/Workspace/dev3/doris, ran test_paimon_catalog against the rebuilt PR FE/BE with jdbcUrl pointing to 127.0.0.1:49230 and hive2HdfsPort=8320.
    - On gabriel@10.26.20.3 under /mnt/disk3/gabriel/Workspace/dev3/doris, ran test_paimon_jdbc_catalog against the rebuilt PR FE/BE with jdbcUrl pointing to 127.0.0.1:49230 and enableJdbcTest=true; the case detected docker permission denial and skipped the spark-iceberg-dependent section as intended.
- Behavior changed: No
- Does this need documentation: No
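
The "optional command failures return an empty result" behavior can be sketched as below. The actual regression suite is not shown in this conversation, so this is a Java stand-in rather than the test's own code; `OptionalCommand` and `runOptional` are hypothetical names. The point it illustrates is that error output from a failed probe is never returned, so a caller cannot mistake it for a container name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical sketch: run an optional probe command and treat any failure as "no result".
public class OptionalCommand {
    // Returns trimmed stdout when the command exits with 0, otherwise an empty string.
    public static String runOptional(List<String> command) {
        try {
            Process process = new ProcessBuilder(command)
                    // Discard stderr so error text can never leak into the result.
                    .redirectError(ProcessBuilder.Redirect.DISCARD)
                    .start();
            String stdout = new String(
                    process.getInputStream().readAllBytes(), StandardCharsets.UTF_8).trim();
            int exitCode = process.waitFor();
            return exitCode == 0 ? stdout : "";
        } catch (IOException e) {
            // Command missing or not executable: report "nothing found".
            return "";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "";
        }
    }
}
```

A caller would then check for an empty result and skip the environment-dependent steps, rather than splicing the probe output into a follow-up command such as `docker cp`.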

2 participants