Skip to content

[refine](column) Make filter_by_selector const#65047

Open
Mryange wants to merge 1 commit into
apache:masterfrom
Mryange:refine-column-filter-selector-const
Open

[refine](column) Make filter_by_selector const#65047
Mryange wants to merge 1 commit into
apache:masterfrom
Mryange:refine-column-filter-selector-const

Conversation

@Mryange

@Mryange Mryange commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

What problem does this PR solve?

filter_by_selector reads from the source column and writes selected rows into a destination column, so the source column should be accessed through a const interface. Root cause: the interface was non-const, and ColumnNullable::filter_by_selector wrote the destination nested column through a const_cast on the destination nullable internals. This change makes the interface const, updates the supported column implementations, removes the nullable const_cast, and keeps ColumnDictionary read-only by using a local temporary StringRef buffer.

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen

Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@Mryange

Mryange commented Jun 30, 2026

Copy link
Copy Markdown
Contributor Author

/review

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary: I reviewed the full PR diff and surrounding lazy-materialization selector path for the column filter_by_selector const-correctness change. I did not find a blocking correctness issue or any evidence-backed inline finding to submit.

Critical checkpoint conclusions:

  • Goal and proof: the PR makes the source-side filter_by_selector interface const, updates the supported vector/decimal/string/dictionary/nullable implementations, removes the dictionary mutable scratch state, and updates BE unit tests to call through const source references.
  • Scope: the change is small and focused on the selector-filter interface and directly related tests.
  • Concurrency/lifecycle: removing ColumnDictI32::_strings avoids per-column mutable scratch during selector filtering; I found no new shared lifecycle or locking requirement.
  • Config/session/compatibility: no new config, session variable, serialization format, or FE/BE protocol surface is introduced.
  • Parallel paths: all visible supported filter_by_selector overrides in the changed column classes match the new const base signature; unsupported columns still use the base error path. The production caller in SegmentIterator obtains mutable or fresh destination columns before selector writes, including nullable nested-column writes.
  • Tests and validation: related BE unit tests were updated for const dispatch and nullable/dictionary/string behavior. git diff --check 257417b029595580f6632b2e0fcf5e5ac3ccdb7d d71a5bb866b6163d9838ca4cbfcc057dc3133c26 is clean. I did not run BE UTs/build because this runner lacks thirdparty/installed and compile_commands.json. build-support/check-format.sh was attempted but failed before checking files because clang-format on PATH is version 18 while the script requires version 16.

User focus: no additional review focus was provided.

Subagent conclusions: the optimizer/rewrite subagent found no candidates. The tests/session-config subagent found no candidates. In convergence round 1, both live subagents reviewed the final ledger/comment set with zero proposed inline comments and replied NO_NEW_VALUABLE_FINDINGS.

@Mryange

Mryange commented Jun 30, 2026

Copy link
Copy Markdown
Contributor Author

run buildall

@hello-stephen

Copy link
Copy Markdown
Contributor
TPC-H: Total hot run time: 29609 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit d71a5bb866b6163d9838ca4cbfcc057dc3133c26, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17687	3964	3924	3924
q2	1998	321	193	193
q3	10332	1413	826	826
q4	4700	470	334	334
q5	7638	866	584	584
q6	199	164	134	134
q7	789	818	630	630
q8	10227	1611	1575	1575
q9	6063	4361	4403	4361
q10	6827	1817	1513	1513
q11	533	352	317	317
q12	749	551	429	429
q13	18133	3379	2753	2753
q14	263	265	243	243
q15	q16	792	781	708	708
q17	1113	993	996	993
q18	6854	5708	5684	5684
q19	1323	1259	1027	1027
q20	775	656	547	547
q21	5725	2698	2528	2528
q22	428	364	306	306
Total cold run time: 103148 ms
Total hot run time: 29609 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4325	4249	4246	4246
q2	291	315	227	227
q3	4534	4970	4427	4427
q4	2052	2122	1378	1378
q5	4432	4293	4305	4293
q6	234	175	132	132
q7	1766	2080	1569	1569
q8	2474	2163	2086	2086
q9	7882	7806	7790	7790
q10	5025	4722	4307	4307
q11	563	424	381	381
q12	747	864	646	646
q13	3294	3559	2982	2982
q14	299	304	267	267
q15	q16	751	753	628	628
q17	1358	1348	1330	1330
q18	8030	7385	6840	6840
q19	1113	1100	1117	1100
q20	2223	2221	1930	1930
q21	5223	4505	4446	4446
q22	529	468	441	441
Total cold run time: 57145 ms
Total hot run time: 51446 ms

@hello-stephen

Copy link
Copy Markdown
Contributor
TPC-DS: Total hot run time: 174585 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit d71a5bb866b6163d9838ca4cbfcc057dc3133c26, data reload: false

query5	4322	653	510	510
query6	472	222	196	196
query7	4829	611	344	344
query8	341	191	167	167
query9	8750	4043	4028	4028
query10	451	348	292	292
query11	5880	2378	2151	2151
query12	162	104	103	103
query13	1353	620	429	429
query14	6267	5317	4987	4987
query14_1	4315	4291	4308	4291
query15	215	203	186	186
query16	1049	498	435	435
query17	1131	741	585	585
query18	2505	483	356	356
query19	213	197	159	159
query20	116	114	106	106
query21	235	154	140	140
query22	13731	13776	13477	13477
query23	17454	16662	16114	16114
query23_1	16388	16378	16384	16378
query24	7525	1771	1312	1312
query24_1	1337	1321	1311	1311
query25	582	468	404	404
query26	1336	341	207	207
query27	2591	593	379	379
query28	4456	2067	2048	2048
query29	1107	601	502	502
query30	343	264	229	229
query31	1105	1108	1010	1010
query32	113	58	59	58
query33	515	310	258	258
query34	1172	1161	666	666
query35	795	794	658	658
query36	1397	1398	1201	1201
query37	154	107	83	83
query38	1910	1735	1657	1657
query39	934	908	899	899
query39_1	859	907	884	884
query40	239	161	140	140
query41	67	64	63	63
query42	94	92	93	92
query43	314	330	285	285
query44	1448	800	785	785
query45	204	187	177	177
query46	1121	1227	737	737
query47	2395	2331	2228	2228
query48	400	392	300	300
query49	581	412	319	319
query50	1083	420	343	343
query51	4439	4427	4378	4378
query52	86	86	76	76
query53	265	274	207	207
query54	269	226	214	214
query55	79	78	94	78
query56	285	297	283	283
query57	1428	1392	1293	1293
query58	286	250	251	250
query59	1584	1694	1449	1449
query60	293	276	265	265
query61	155	152	154	152
query62	714	634	577	577
query63	243	208	207	207
query64	2518	772	620	620
query65	4868	4811	4750	4750
query66	1799	498	381	381
query67	29842	29876	29665	29665
query68	3277	1644	1014	1014
query69	401	302	271	271
query70	1085	993	986	986
query71	363	328	303	303
query72	2922	2663	2344	2344
query73	828	823	439	439
query74	5156	5019	4752	4752
query75	2619	2613	2236	2236
query76	2326	1191	823	823
query77	362	375	284	284
query78	12416	12539	11997	11997
query79	1431	1194	757	757
query80	828	546	473	473
query81	499	322	284	284
query82	559	157	122	122
query83	388	320	308	308
query84	282	161	130	130
query85	957	610	504	504
query86	394	285	294	285
query87	1850	1831	1785	1785
query88	3735	2827	2809	2809
query89	461	413	353	353
query90	1789	192	201	192
query91	201	193	163	163
query92	63	62	57	57
query93	1624	1460	944	944
query94	627	362	334	334
query95	778	597	467	467
query96	1066	768	365	365
query97	2724	2694	2573	2573
query98	214	202	200	200
query99	1182	1149	1015	1015
Total cold run time: 259204 ms
Total hot run time: 174585 ms

@hello-stephen

Copy link
Copy Markdown
Contributor
ClickBench: Total hot run time: 25 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit d71a5bb866b6163d9838ca4cbfcc057dc3133c26, data reload: false

query1	0.01	0.01	0.00
query2	0.09	0.05	0.05
query3	0.24	0.14	0.13
query4	1.61	0.13	0.13
query5	0.24	0.23	0.22
query6	1.26	1.08	1.03
query7	0.03	0.01	0.00
query8	0.06	0.03	0.03
query9	0.37	0.32	0.32
query10	0.55	0.54	0.53
query11	0.19	0.14	0.13
query12	0.18	0.14	0.15
query13	0.47	0.48	0.46
query14	1.00	1.01	1.02
query15	0.65	0.59	0.61
query16	0.30	0.32	0.30
query17	1.11	1.12	1.10
query18	0.23	0.21	0.21
query19	2.06	1.92	1.89
query20	0.02	0.01	0.01
query21	15.47	0.23	0.12
query22	4.87	0.06	0.05
query23	16.14	0.32	0.12
query24	2.94	0.40	0.33
query25	0.11	0.06	0.04
query26	0.73	0.20	0.16
query27	0.05	0.04	0.03
query28	3.51	0.91	0.52
query29	12.50	4.30	3.46
query30	0.28	0.15	0.17
query31	2.78	0.62	0.31
query32	3.21	0.61	0.48
query33	3.24	3.19	3.22
query34	15.63	4.18	3.50
query35	3.56	3.49	3.47
query36	0.54	0.44	0.42
query37	0.08	0.06	0.06
query38	0.05	0.04	0.03
query39	0.04	0.04	0.03
query40	0.19	0.16	0.16
query41	0.09	0.03	0.03
query42	0.04	0.03	0.03
query43	0.05	0.04	0.03
Total cold run time: 96.77 s
Total hot run time: 25 s

@hello-stephen

Copy link
Copy Markdown
Contributor

BE UT Coverage Report

Increment line coverage 81.25% (13/16) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 55.12% (21728/39420)
Line Coverage 38.60% (207978/538822)
Region Coverage 34.67% (163853/472629)
Branch Coverage 35.68% (71834/201323)

@hello-stephen

Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 87.50% (14/16) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 74.23% (28563/38479)
Line Coverage 58.14% (311416/535632)
Region Coverage 54.88% (260535/474735)
Branch Coverage 56.25% (113390/201566)

@Mryange

Mryange commented Jul 1, 2026

Copy link
Copy Markdown
Contributor Author

run cloud_p0

@hello-stephen

Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 87.50% (14/16) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 74.23% (28563/38479)
Line Coverage 58.14% (311416/535632)
Region Coverage 54.88% (260535/474735)
Branch Coverage 56.25% (113390/201566)

@Mryange

Mryange commented Jul 1, 2026

Copy link
Copy Markdown
Contributor Author

run cloud_p0

@HappenLee HappenLee left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions Bot added the approved Indicates a PR has been approved by one committer. label Jul 1, 2026
@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

PR approved by at least one committer and no changes requested.

@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

PR approved by anyone and no changes requested.

@hello-stephen

Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 87.50% (14/16) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 74.23% (28563/38479)
Line Coverage 58.14% (311416/535632)
Region Coverage 54.88% (260535/474735)
Branch Coverage 56.25% (113390/201566)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants