
[fix](binlog) Support hidden key columns in row binlog #65076

Open
seawinde wants to merge 1 commit into apache:master from seawinde:fix-row-binlog-hidden-key-master

seawinde (Member) commented Jul 1, 2026

What problem does this PR solve?

Issue Number: N/A

Related PR: #63110

Problem Summary:

Row binlog schema and the BE row-binlog writer previously treated only visible source columns as normal row-binlog columns. As a result, hidden key columns were missing from row binlog. Hidden non-key internal columns, such as the sequence, delete, version, and skip-bitmap columns, should still be excluded.

Root cause: In OlapTable.generateTableRowBinlogSchema() and SchemaChangeHandler.addColumnRowBinlog(), FE generated and maintained row-binlog schema from visible columns only. In RowBinlogSegmentWriter / RowBinlogSourceDataWriter, BE also used visible-column counts to map source columns to row-binlog normal columns.

This PR makes row-binlog normal columns follow a simple source-schema prefix contract: visible columns plus hidden key columns are written, and trailing hidden non-key columns are skipped. FE schema generation and add-column schema-change sync now use this contract. BE row-binlog writing uses the same normal-column count for full writes, partial update filtering, key column materialization, and BEFORE value columns.
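
The prefix contract described above can be sketched roughly as follows. This is an illustrative Python model only, not the FE code in `OlapTable.java`: the `Column` class, the function names, and the `__hidden_key__` column name are hypothetical, and the error handling is an assumption about how the "reject visible/key columns after hidden non-key columns" rule might behave.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Column:
    # Hypothetical stand-in for a source schema column (not the Doris Column class).
    name: str
    is_key: bool
    is_visible: bool


def is_row_binlog_normal(col: Column) -> bool:
    # Visible columns and hidden key columns are written to row binlog.
    return col.is_visible or col.is_key


def row_binlog_normal_columns(schema: List[Column]) -> List[Column]:
    """Return the leading columns of `schema` that map to row-binlog normal columns.

    Raises ValueError if a visible or key column appears after a hidden
    non-key column, because the normal columns would then not form a prefix.
    """
    normal: List[Column] = []
    seen_hidden_non_key = False
    for col in schema:
        if is_row_binlog_normal(col):
            if seen_hidden_non_key:
                raise ValueError(
                    f"column {col.name!r} follows a hidden non-key column"
                )
            normal.append(col)
        else:
            # Trailing hidden non-key columns are skipped.
            seen_hidden_non_key = True
    return normal


# Example schema; "__hidden_key__" is a made-up name for illustration.
EXAMPLE_SCHEMA = [
    Column("k1", is_key=True, is_visible=True),
    Column("__hidden_key__", is_key=True, is_visible=False),
    Column("v1", is_key=False, is_visible=True),
    Column("__DORIS_DELETE_SIGN__", is_key=False, is_visible=False),
    Column("__DORIS_VERSION_COL__", is_key=False, is_visible=False),
]

NORMAL_COLUMN_NAMES = [c.name for c in row_binlog_normal_columns(EXAMPLE_SCHEMA)]
```

In this model the normal-column count is simply the length of the returned list, which is the single number that both schema generation and the writer would need to agree on.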

| File | Change Description |
| --- | --- |
| `OlapTable.java` | Generate row-binlog schema from full base schema, include hidden key columns, skip hidden non-key columns, and reject visible/key columns after hidden non-key columns. |
| `SchemaChangeHandler.java` | Keep hidden key columns when syncing ADD COLUMN changes to row-binlog schema, while skipping hidden non-key columns. |
| `row_binlog_segment_writer.*` | Use row-binlog normal column count instead of visible column count, filter partial-update source cids to normal columns, and collect all key columns in the normal prefix. |
| FE/BE/regression tests | Cover hidden key schema generation/writing and hidden non-key exclusion. |
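
As a rough illustration of the BE-side behavior listed for the row-binlog segment writer, the sketch below filters partial-update column ids to the normal prefix and collects key column ids inside that prefix. It is a simplified Python model under stated assumptions, not the C++ writer: the function names are hypothetical, and it assumes column ids are zero-based positions in the source schema.

```python
from typing import List, Sequence


def filter_partial_update_cids(
    update_cids: Sequence[int], normal_column_count: int
) -> List[int]:
    # Keep only source column ids that fall inside the row-binlog normal
    # prefix; ids of trailing hidden non-key columns are dropped.
    return [cid for cid in update_cids if 0 <= cid < normal_column_count]


def key_cids_in_normal_prefix(
    is_key_flags: Sequence[bool], normal_column_count: int
) -> List[int]:
    # Collect every key column in the normal prefix, hidden or visible.
    return [cid for cid in range(normal_column_count) if is_key_flags[cid]]


# Hypothetical source schema: k1, hidden key, v1, then two hidden non-key columns.
IS_KEY = [True, True, False, False, False]
NORMAL_COLUMN_COUNT = 3

FILTERED = filter_partial_update_cids([0, 2, 3, 4], NORMAL_COLUMN_COUNT)
KEY_CIDS = key_cids_in_normal_prefix(IS_KEY, NORMAL_COLUMN_COUNT)
```

With these inputs, a partial update touching columns 0, 2, 3, and 4 would contribute only columns 0 and 2 to row binlog, while both key columns (0 and 1) are materialized.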
graph TD
  A[Source tablet schema] --> B[FE row-binlog schema]
  B -->|visible columns + hidden key columns| C[Row-binlog normal columns]
  B -->|skip trailing hidden non-key columns| D[Internal hidden columns]
  C --> E[BE RowBinlogSourceDataWriter]
  E --> F[RowBinlogSegmentWriter writes row-binlog segment]

Release note

Fixed an issue where row binlog did not include hidden key columns in the row-binlog schema and write path.

Check List (For Author)

  • Test

    • Regression test
      • ./run-regression-test.sh --run -d row_binlog_p0 -s test_row_binlog_hidden_column_schema -forceGenOut
    • Unit Test
      • ./run-fe-ut.sh --run org.apache.doris.catalog.OlapTableRowBinlogSchemaTest,org.apache.doris.alter.SchemaChangeHandlerTest
      • PATH=/home/seawinde/apache-maven-3.9.12/bin:$PATH ./run-be-ut.sh --run --filter=RowBinlogSourceDataWriterTest.* -j20
    • Manual test
      • ./build-support/check-format.sh
      • git diff --check
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes. Row-binlog now includes hidden key columns, while hidden non-key internal columns remain excluded.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary: Row binlog schema and BE writer previously treated normal row-binlog columns as visible source columns only. Tables that contain hidden key columns need those key columns to be included in row-binlog normal columns while still skipping hidden non-key internal columns. This change makes FE row-binlog schema generation and schema-change sync include visible columns plus hidden key columns, and updates BE row-binlog writing to use the same source-schema prefix contract.

### Release note

None

### Check List (For Author)

- Test: Unit Test
    - ./run-fe-ut.sh --run org.apache.doris.catalog.OlapTableRowBinlogSchemaTest,org.apache.doris.alter.SchemaChangeHandlerTest
    - ./build-support/check-format.sh
    - git diff --check
- Behavior changed: No
- Does this need documentation: No

seawinde (Member, Author) commented Jul 1, 2026

run buildall

@hello-stephen (Contributor)

TPC-H: Total hot run time: 29883 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 79e8ec50c0afe041a16176653ef9e3976d73bcf4, data reload: false

------ Round 1 ----------------------------------
============================================
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17750	4130	4075	4075
q2	2051	314	206	206
q3	10244	1438	850	850
q4	4679	471	340	340
q5	7519	875	584	584
q6	195	181	143	143
q7	782	850	626	626
q8	9741	1696	1632	1632
q9	6047	4441	4379	4379
q10	6827	1787	1534	1534
q11	525	348	316	316
q12	741	575	449	449
q13	18139	3318	2767	2767
q14	277	263	241	241
q15	q16	794	772	712	712
q17	1121	1109	987	987
q18	6986	5741	5617	5617
q19	1497	1314	1066	1066
q20	780	665	536	536
q21	5952	2690	2519	2519
q22	452	382	304	304
Total cold run time: 103099 ms
Total hot run time: 29883 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4422	4328	4336	4328
q2	302	314	224	224
q3	4603	4928	4442	4442
q4	2093	2200	1438	1438
q5	4450	4353	4373	4353
q6	236	180	133	133
q7	2270	1877	1625	1625
q8	2585	2180	2141	2141
q9	7882	7805	7999	7805
q10	4810	4875	4398	4398
q11	630	439	396	396
q12	759	754	554	554
q13	3227	3520	3021	3021
q14	310	307	310	307
q15	q16	710	726	661	661
q17	1412	1380	1394	1380
q18	7862	7417	6967	6967
q19	1115	1097	1127	1097
q20	2225	2212	1947	1947
q21	5371	4661	4518	4518
q22	523	459	406	406
Total cold run time: 57797 ms
Total hot run time: 52141 ms

@hello-stephen (Contributor)

TPC-DS: Total hot run time: 174988 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 79e8ec50c0afe041a16176653ef9e3976d73bcf4, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query5	4301	648	495	495
query6	476	213	205	205
query7	4862	624	339	339
query8	349	191	196	191
query9	8766	4148	4116	4116
query10	460	353	302	302
query11	5947	2335	2170	2170
query12	158	102	109	102
query13	1261	637	442	442
query14	6602	5323	5012	5012
query14_1	4339	4338	4350	4338
query15	234	210	182	182
query16	1063	481	413	413
query17	1146	727	602	602
query18	2724	488	362	362
query19	216	197	156	156
query20	118	109	109	109
query21	234	167	139	139
query22	13699	13615	13387	13387
query23	17501	16719	16384	16384
query23_1	16416	16236	16220	16220
query24	7501	1812	1325	1325
query24_1	1350	1340	1330	1330
query25	575	477	402	402
query26	1336	343	209	209
query27	2612	627	393	393
query28	4435	2084	2049	2049
query29	1079	639	515	515
query30	349	270	226	226
query31	1142	1095	998	998
query32	102	65	64	64
query33	540	339	278	278
query34	1198	1147	672	672
query35	779	788	679	679
query36	1370	1400	1217	1217
query37	160	111	96	96
query38	1883	1724	1675	1675
query39	948	918	903	903
query39_1	900	871	874	871
query40	240	174	145	145
query41	67	63	63	63
query42	96	94	95	94
query43	328	330	282	282
query44	1478	795	787	787
query45	203	182	179	179
query46	1094	1220	774	774
query47	2296	2295	2190	2190
query48	407	435	291	291
query49	596	423	321	321
query50	1152	454	338	338
query51	4443	4389	4282	4282
query52	86	89	74	74
query53	271	294	204	204
query54	278	236	238	236
query55	74	72	69	69
query56	298	279	286	279
query57	1454	1417	1339	1339
query58	268	275	266	266
query59	1535	1604	1446	1446
query60	298	273	266	266
query61	155	143	150	143
query62	708	652	587	587
query63	249	210	212	210
query64	2509	783	614	614
query65	4868	4825	4810	4810
query66	1783	512	432	432
query67	29911	29750	29679	29679
query68	3548	1546	1013	1013
query69	418	317	275	275
query70	1041	991	989	989
query71	369	325	318	318
query72	2971	2684	2308	2308
query73	847	753	450	450
query74	5126	4991	4758	4758
query75	2626	2612	2246	2246
query76	2349	1217	819	819
query77	366	390	299	299
query78	12415	12581	11910	11910
query79	1449	1208	814	814
query80	1298	559	498	498
query81	521	327	282	282
query82	615	158	120	120
query83	373	325	299	299
query84	286	159	140	140
query85	971	620	525	525
query86	442	297	296	296
query87	1876	1841	1781	1781
query88	3843	2847	2838	2838
query89	462	417	354	354
query90	1919	212	205	205
query91	208	192	161	161
query92	65	60	61	60
query93	1798	1652	1093	1093
query94	763	367	322	322
query95	816	511	479	479
query96	1104	847	346	346
query97	2710	2706	2563	2563
query98	221	211	203	203
query99	1157	1160	1043	1043
Total cold run time: 261744 ms
Total hot run time: 174988 ms

@hello-stephen (Contributor)

ClickBench: Total hot run time: 25.39 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 79e8ec50c0afe041a16176653ef9e3976d73bcf4, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.01	0.01	0.01
query2	0.09	0.06	0.05
query3	0.26	0.14	0.14
query4	1.61	0.14	0.14
query5	0.27	0.22	0.23
query6	1.23	1.07	1.07
query7	0.04	0.01	0.01
query8	0.06	0.04	0.04
query9	0.38	0.33	0.32
query10	0.58	0.57	0.62
query11	0.19	0.15	0.14
query12	0.19	0.15	0.15
query13	0.49	0.49	0.48
query14	1.03	1.02	1.01
query15	0.62	0.59	0.61
query16	0.32	0.32	0.32
query17	1.08	1.13	1.09
query18	0.24	0.21	0.22
query19	2.08	1.97	1.99
query20	0.02	0.01	0.01
query21	15.45	0.20	0.13
query22	4.95	0.05	0.05
query23	16.16	0.32	0.12
query24	2.88	0.43	0.34
query25	0.12	0.05	0.03
query26	0.74	0.21	0.15
query27	0.05	0.03	0.04
query28	3.52	0.94	0.53
query29	12.50	4.42	3.48
query30	0.27	0.16	0.16
query31	2.77	0.58	0.31
query32	3.23	0.60	0.49
query33	3.31	3.25	3.18
query34	15.65	4.20	3.54
query35	3.54	3.55	3.52
query36	0.55	0.44	0.43
query37	0.08	0.07	0.07
query38	0.05	0.04	0.03
query39	0.04	0.03	0.03
query40	0.18	0.16	0.14
query41	0.09	0.03	0.03
query42	0.04	0.03	0.03
query43	0.05	0.03	0.03
Total cold run time: 97.01 s
Total hot run time: 25.39 s
