
[Enhancement](compaction) add information_schema.be_compaction_tasks system table#61428

Draft
Yukang-Lian wants to merge 4 commits into apache:master from Yukang-Lian:feature/be-compaction-tasks-system-table


@Yukang-Lian (Collaborator)

Summary

  • Add a new system table information_schema.be_compaction_tasks that exposes compaction task metadata across all BEs, covering PENDING, RUNNING, FINISHED, and FAILED states with 30 columns including identification, timing, input/output stats, IO stats, and resource usage.
  • Introduce CompactionTaskTracker singleton to track compaction tasks across their full lifecycle, integrated at all 7 compaction entry points (local: base/cumu/full, single-replica, cold-data, manual HTTP; cloud: base/cumu/full, manual HTTP, index-change).
  • Support multi-BE fan-out query via BackendPartitionedSchemaScanNode, fallback records for missed registrations, and proper cleanup on early-return paths.
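To make the lifecycle above concrete, here is a minimal C++17 sketch of a mutex-guarded, process-wide tracker. It is not the PR's CompactionTaskTracker: the class CompactionTaskTrackerSketch, the TaskRecord struct, and the methods register_pending, mark_running, complete, and snapshot are hypothetical names, only a handful of the 30 columns are modeled, and any history-retention limit is left out. It shows the PENDING -> RUNNING -> FINISHED/FAILED transitions and how a completion for a task that was never registered can still leave a fallback record.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical sketch; names are illustrative and not taken from the PR.
enum class TaskStatus { PENDING, RUNNING, FINISHED, FAILED };

struct TaskRecord {
    int64_t compaction_id = 0;
    int64_t tablet_id = 0;
    std::string compaction_type;  // e.g. "base", "cumulative", "full"
    TaskStatus status = TaskStatus::PENDING;
    std::string status_msg;       // empty unless FAILED
    bool is_fallback = false;     // created at completion because registration was missed
};

class CompactionTaskTrackerSketch {
public:
    // Process-wide singleton; function-local statics are initialized thread-safely (C++11).
    static CompactionTaskTrackerSketch& instance() {
        static CompactionTaskTrackerSketch tracker;
        return tracker;
    }

    // New task enters PENDING (e.g. when it is queued for a thread pool).
    void register_pending(int64_t id, int64_t tablet_id, const std::string& type) {
        std::lock_guard<std::mutex> lock(_mu);
        TaskRecord& rec = _tasks[id];
        rec.compaction_id = id;
        rec.tablet_id = tablet_id;
        rec.compaction_type = type;
        rec.status = TaskStatus::PENDING;
    }

    // PENDING -> RUNNING; ignored for unknown or non-PENDING tasks.
    void mark_running(int64_t id) {
        std::lock_guard<std::mutex> lock(_mu);
        auto it = _tasks.find(id);
        if (it != _tasks.end() && it->second.status == TaskStatus::PENDING) {
            it->second.status = TaskStatus::RUNNING;
        }
    }

    // Marks the task FINISHED or FAILED. If it was never registered, a fallback
    // record is created so the completed task is still visible.
    void complete(int64_t id, int64_t tablet_id, const std::string& type, bool ok,
                  const std::string& msg) {
        std::lock_guard<std::mutex> lock(_mu);
        auto [it, inserted] = _tasks.try_emplace(id);
        TaskRecord& rec = it->second;
        if (inserted) {
            rec.compaction_id = id;
            rec.tablet_id = tablet_id;
            rec.compaction_type = type;
            rec.is_fallback = true;
        }
        rec.status = ok ? TaskStatus::FINISHED : TaskStatus::FAILED;
        rec.status_msg = ok ? std::string() : msg;
    }

    // Copy of all records ordered by compaction_id (what a scanner would read).
    std::vector<TaskRecord> snapshot() const {
        std::lock_guard<std::mutex> lock(_mu);
        std::vector<TaskRecord> out;
        out.reserve(_tasks.size());
        for (const auto& entry : _tasks) out.push_back(entry.second);
        return out;
    }

private:
    CompactionTaskTrackerSketch() = default;
    mutable std::mutex _mu;
    std::map<int64_t, TaskRecord> _tasks;
};
```

In this sketch a scanner would call snapshot() and turn each record into a row; holding the mutex during the copy keeps the rows consistent with concurrent updates from compaction threads.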

Usage Examples

-- View all running compaction tasks across all BEs
mysql> SELECT BACKEND_ID, TABLET_ID, COMPACTION_TYPE, STATUS, ELAPSED_TIME_MS, INPUT_DATA_SIZE
       FROM information_schema.be_compaction_tasks
       WHERE STATUS = 'RUNNING'
       ORDER BY ELAPSED_TIME_MS DESC;
+------------+-----------+-----------------+---------+-----------------+-----------------+
| BACKEND_ID | TABLET_ID | COMPACTION_TYPE | STATUS  | ELAPSED_TIME_MS | INPUT_DATA_SIZE |
+------------+-----------+-----------------+---------+-----------------+-----------------+
|      10001 |    123456 | base            | RUNNING |           35210 |       524288000 |
|      10002 |    789012 | cumulative      | RUNNING |            1250 |        10485760 |
+------------+-----------+-----------------+---------+-----------------+-----------------+

-- Find the slowest compactions (potential performance issues)
mysql> SELECT TABLET_ID, COMPACTION_TYPE, ELAPSED_TIME_MS, INPUT_DATA_SIZE, OUTPUT_DATA_SIZE,
              PEAK_MEMORY_BYTES, IS_VERTICAL, STATUS_MSG
       FROM information_schema.be_compaction_tasks
       WHERE STATUS IN ('FINISHED', 'FAILED')
       ORDER BY ELAPSED_TIME_MS DESC LIMIT 5;
+-----------+-----------------+-----------------+-----------------+------------------+-------------------+-------------+----------------------------+
| TABLET_ID | COMPACTION_TYPE | ELAPSED_TIME_MS | INPUT_DATA_SIZE | OUTPUT_DATA_SIZE | PEAK_MEMORY_BYTES | IS_VERTICAL | STATUS_MSG                 |
+-----------+-----------------+-----------------+-----------------+------------------+-------------------+-------------+----------------------------+
|    123456 | base            |           42000 |       524288000 |        210000000 |         268435456 |           1 |                            |
|    567890 | full            |            8500 |        30000000 |                0 |          33554432 |           0 | [INTERNAL_ERROR]disk full  |
+-----------+-----------------+-----------------+-----------------+------------------+-------------------+-------------+----------------------------+

-- Check remote IO ratio (important for disaggregated storage)
mysql> SELECT TABLET_ID, COMPACTION_TYPE, BYTES_READ_FROM_LOCAL, BYTES_READ_FROM_REMOTE,
              ROUND(BYTES_READ_FROM_REMOTE * 100.0 / (BYTES_READ_FROM_LOCAL + BYTES_READ_FROM_REMOTE + 1), 2) AS remote_pct
       FROM information_schema.be_compaction_tasks
       WHERE STATUS = 'FINISHED' AND BYTES_READ_FROM_REMOTE > 0
       ORDER BY remote_pct DESC;
+-----------+-----------------+-----------------------+------------------------+------------+
| TABLET_ID | COMPACTION_TYPE | BYTES_READ_FROM_LOCAL | BYTES_READ_FROM_REMOTE | remote_pct |
+-----------+-----------------+-----------------------+------------------------+------------+
|    234567 | cumulative      |              10485760 |              104857600 |      90.91 |
|    345678 | base            |             524288000 |               52428800 |       9.09 |
+-----------+-----------------+-----------------------+------------------------+------------+
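The remote_pct expression can be checked by hand. The sketch below reproduces it in C++, assuming the SQL ROUND(x, 2) rounds half away from zero the way std::round does; the function name remote_pct simply reuses the query's alias. The + 1 in the denominator only guards against division by zero; with the BYTES_READ_FROM_REMOTE > 0 filter it shifts the result by a negligible amount, so the two rows above come out as 90.91 (100 MiB remote vs 10 MiB local, i.e. about 10/11) and 9.09 (50 MiB remote vs 500 MiB local, i.e. about 1/11).

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Mirrors: ROUND(BYTES_READ_FROM_REMOTE * 100.0 / (BYTES_READ_FROM_LOCAL + BYTES_READ_FROM_REMOTE + 1), 2)
// Assumes ROUND(x, 2) rounds half away from zero, like std::round.
double remote_pct(int64_t local_bytes, int64_t remote_bytes) {
    double pct = static_cast<double>(remote_bytes) * 100.0 /
                 static_cast<double>(local_bytes + remote_bytes + 1);
    return std::round(pct * 100.0) / 100.0;  // keep two decimal places
}
```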

Full Schema (30 columns)

BACKEND_ID, COMPACTION_ID, TABLE_ID, PARTITION_ID, TABLET_ID,
COMPACTION_TYPE, STATUS, TRIGGER_METHOD, COMPACTION_SCORE,
SCHEDULED_TIME, START_TIME, END_TIME, ELAPSED_TIME_MS,
INPUT_ROWSETS_COUNT, INPUT_ROW_NUM, INPUT_DATA_SIZE, INPUT_SEGMENTS_NUM, INPUT_VERSION_RANGE,
MERGED_ROWS, FILTERED_ROWS, OUTPUT_ROW_NUM, OUTPUT_DATA_SIZE, OUTPUT_SEGMENTS_NUM, OUTPUT_VERSION,
BYTES_READ_FROM_LOCAL, BYTES_READ_FROM_REMOTE, PEAK_MEMORY_BYTES,
IS_VERTICAL, PERMITS, STATUS_MSG
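For tooling or tests that need to track column positions, the documented order can be mirrored as a compile-time list. This is a hypothetical C++17 helper (kBeCompactionTasksColumns and column_index are illustrative names), not the BE's actual schema definition, and it says nothing about column types, which the description does not list. The static_asserts catch a missing or duplicated name and pin the first and last positions at build time, in the same spirit as a check that DESC reports exactly 30 columns.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string_view>

// Column order as listed under "Full Schema (30 columns)"; illustrative, not the BE source.
constexpr std::array<std::string_view, 30> kBeCompactionTasksColumns = {
    "BACKEND_ID", "COMPACTION_ID", "TABLE_ID", "PARTITION_ID", "TABLET_ID",
    "COMPACTION_TYPE", "STATUS", "TRIGGER_METHOD", "COMPACTION_SCORE",
    "SCHEDULED_TIME", "START_TIME", "END_TIME", "ELAPSED_TIME_MS",
    "INPUT_ROWSETS_COUNT", "INPUT_ROW_NUM", "INPUT_DATA_SIZE", "INPUT_SEGMENTS_NUM",
    "INPUT_VERSION_RANGE",
    "MERGED_ROWS", "FILTERED_ROWS", "OUTPUT_ROW_NUM", "OUTPUT_DATA_SIZE",
    "OUTPUT_SEGMENTS_NUM", "OUTPUT_VERSION",
    "BYTES_READ_FROM_LOCAL", "BYTES_READ_FROM_REMOTE", "PEAK_MEMORY_BYTES",
    "IS_VERTICAL", "PERMITS", "STATUS_MSG"};

// 0-based position of a column, or -1 if the name is not in the schema.
constexpr int column_index(std::string_view name) {
    for (std::size_t i = 0; i < kBeCompactionTasksColumns.size(); ++i) {
        if (kBeCompactionTasksColumns[i] == name) return static_cast<int>(i);
    }
    return -1;
}

// True if every slot is filled in and no name appears twice.
constexpr bool names_complete_and_unique() {
    for (std::size_t i = 0; i < kBeCompactionTasksColumns.size(); ++i) {
        if (kBeCompactionTasksColumns[i].empty()) return false;
        for (std::size_t j = i + 1; j < kBeCompactionTasksColumns.size(); ++j) {
            if (kBeCompactionTasksColumns[i] == kBeCompactionTasksColumns[j]) return false;
        }
    }
    return true;
}

static_assert(names_complete_and_unique(), "schema list has a gap or a duplicate");
static_assert(column_index("BACKEND_ID") == 0, "BACKEND_ID must come first");
static_assert(column_index("STATUS_MSG") == 29, "STATUS_MSG must come last");
```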

Test plan

  • BE unit tests: 14 cases covering the full lifecycle, failure paths, fallback records, concurrency safety, config changes, and input_version_range backfill
  • Regression test: end-to-end SQL query validation with manual compaction trigger, field verification, filtering

closes #48893

…system table (apache#48893)

Add a new system table `be_compaction_tasks` in `information_schema` that
exposes compaction task metadata across all BEs, covering PENDING, RUNNING,
FINISHED, and FAILED states.

Key components:
- CompactionTaskTracker: singleton that tracks compaction tasks across their
  full lifecycle (PENDING -> RUNNING -> FINISHED/FAILED)
- SchemaCompactionTasksScanner: BE scanner that fills 30 columns including
  identification, timing, input/output stats, IO stats, and resource usage
- FE schema registration with BackendPartitionedSchemaScanNode for multi-BE
  fan-out
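The multi-BE fan-out can be pictured as: send the same scan to every BE in parallel, then concatenate what comes back. The C++ sketch below only illustrates that shape; in Doris the fan-out is done on the FE side by BackendPartitionedSchemaScanNode, whose interface is not shown here. Row, FetchFn, and fan_out are hypothetical, and each FetchFn stands in for the request to one BE's scanner.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <future>
#include <string>
#include <vector>

// Hypothetical row type: a small subset of the 30 columns.
struct Row {
    int64_t backend_id;
    int64_t tablet_id;
    std::string status;
};

// Hypothetical per-BE fetch; a real system would issue a request to that BE.
using FetchFn = std::function<std::vector<Row>()>;

// Run one fetch per backend concurrently and concatenate results in backend order.
std::vector<Row> fan_out(const std::vector<FetchFn>& backends) {
    std::vector<std::future<std::vector<Row>>> futures;
    futures.reserve(backends.size());
    for (const auto& fetch : backends) {
        futures.push_back(std::async(std::launch::async, fetch));
    }
    std::vector<Row> merged;
    for (auto& f : futures) {
        std::vector<Row> part = f.get();  // rethrows if that backend's fetch threw
        merged.insert(merged.end(), part.begin(), part.end());
    }
    return merged;
}
```

Collecting results in backend order keeps the merge deterministic. A failure on one backend surfaces as an exception from get(); whether a real implementation should fail the whole query or skip that BE is a separate design choice.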

Tracker is integrated at all compaction entry points:
- Local: base/cumu/full, single-replica, cold-data, manual HTTP trigger
- Cloud: base/cumu/full, manual HTTP trigger, index-change

Closes apache#48893
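Because the tracker hooks into many entry points, each with its own early-return paths, one conventional C++ way to make sure a task never stays PENDING or RUNNING forever is an RAII guard. This is a sketch under that assumption, not necessarily how the PR does its cleanup; TaskGuard, its Reporter callback, and run_compaction are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>

// Hypothetical guard created right after a task is registered. If the entry point
// returns without calling finish(), the destructor reports the task as failed.
class TaskGuard {
public:
    using Reporter = std::function<void(int64_t id, bool ok, const std::string& msg)>;

    TaskGuard(int64_t id, Reporter report) : _id(id), _report(std::move(report)) {}
    TaskGuard(const TaskGuard&) = delete;
    TaskGuard& operator=(const TaskGuard&) = delete;

    // Reports the final outcome exactly once.
    void finish(bool ok, const std::string& msg) {
        if (_done) return;
        _done = true;
        _report(_id, ok, msg);
    }

    ~TaskGuard() {
        if (!_done) _report(_id, false, "aborted: early return");
    }

private:
    int64_t _id;
    Reporter _report;
    bool _done = false;
};

// Hypothetical entry point with an early-return path.
bool run_compaction(int64_t id, bool prepare_ok, const TaskGuard::Reporter& report) {
    TaskGuard guard(id, report);
    if (!prepare_ok) return false;  // guard reports FAILED on the way out
    guard.finish(true, "");
    return true;
}
```

The destructor calls the reporter, so the reporter should not throw: destructors are implicitly noexcept, and an exception escaping one terminates the program.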
@Thearas (Contributor) commented Mar 17, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@Yukang-Lian marked this pull request as draft March 17, 2026 07:51
@Yukang-Lian force-pushed the feature/be-compaction-tasks-system-table branch from 8717ce1 to 5b67650 March 17, 2026 08:04
…on_tasks regression test

Add SELECT * and explicit 30-column named SELECT to verify all columns are
queryable with reasonable values. Log each column for visual inspection.
Also add DESC test to verify schema has exactly 30 columns.
@Yukang-Lian force-pushed the feature/be-compaction-tasks-system-table branch from 5b67650 to e66a5ff March 17, 2026 08:51
@Yukang-Lian marked this pull request as ready for review March 17, 2026 09:21
SCH_AUTHENTICATION_INTEGRATIONS took 67, bump SCH_BE_COMPACTION_TASKS to 68.
@Yukang-Lian (Collaborator, Author)

run buildall

@hello-stephen (Contributor)

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.15% (1796/2269)
Line Coverage 64.44% (32267/50072)
Region Coverage 65.37% (16165/24727)
Branch Coverage 55.77% (8608/15434)

@hello-stephen (Contributor)

FE UT Coverage Report

Increment line coverage 100.00% (34/34) 🎉
Increment coverage report
Complete coverage report

@doris-robot

TPC-H: Total hot run time: 27068 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e8d6bb12c2b24039b77925bd4591f17cd4a15684, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17625	4492	4571	4492
q2	q3	10706	773	539	539
q4	4733	363	245	245
q5	8052	1222	1031	1031
q6	232	175	145	145
q7	812	860	664	664
q8	10759	1494	1361	1361
q9	6435	4754	4763	4754
q10	6477	1923	1697	1697
q11	458	254	254	254
q12	777	588	466	466
q13	18099	2910	2181	2181
q14	233	247	206	206
q15	q16	731	745	681	681
q17	738	877	442	442
q18	6100	5361	5190	5190
q19	1129	991	613	613
q20	536	495	380	380
q21	4596	2042	1475	1475
q22	380	345	252	252
Total cold run time: 99608 ms
Total hot run time: 27068 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4773	4577	4638	4577
q2	q3	3898	4356	3815	3815
q4	918	1184	769	769
q5	4086	4401	4391	4391
q6	181	180	143	143
q7	1781	1653	1523	1523
q8	2660	2729	2586	2586
q9	7261	7399	7276	7276
q10	3867	3999	3643	3643
q11	531	444	455	444
q12	502	604	437	437
q13	2722	3204	2285	2285
q14	269	307	275	275
q15	q16	714	734	719	719
q17	1155	1470	1344	1344
q18	7230	6953	6692	6692
q19	938	881	1031	881
q20	2155	2221	1995	1995
q21	4001	3546	3296	3296
q22	477	418	368	368
Total cold run time: 50119 ms
Total hot run time: 47459 ms

@doris-robot

TPC-DS: Total hot run time: 167854 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e8d6bb12c2b24039b77925bd4591f17cd4a15684, data reload: false

query5	4335	637	495	495
query6	334	228	206	206
query7	4206	461	269	269
query8	346	242	229	229
query9	8684	2764	2744	2744
query10	559	377	330	330
query11	7003	5168	4845	4845
query12	181	129	122	122
query13	1264	465	332	332
query14	5736	3671	3448	3448
query14_1	2787	2810	2798	2798
query15	205	190	170	170
query16	979	472	486	472
query17	1084	699	604	604
query18	2440	448	342	342
query19	219	240	178	178
query20	140	126	134	126
query21	211	135	107	107
query22	13286	13407	13176	13176
query23	15780	15478	15520	15478
query23_1	15884	15844	15710	15710
query24	7685	1683	1288	1288
query24_1	1265	1275	1265	1265
query25	625	537	484	484
query26	1413	293	172	172
query27	3273	518	312	312
query28	4556	1885	1873	1873
query29	915	558	478	478
query30	293	225	185	185
query31	1019	945	889	889
query32	83	73	72	72
query33	512	337	291	291
query34	923	882	522	522
query35	613	691	586	586
query36	1120	1137	913	913
query37	133	97	79	79
query38	2989	2948	2891	2891
query39	860	840	820	820
query39_1	800	788	803	788
query40	234	156	142	142
query41	64	59	61	59
query42	262	252	254	252
query43	238	256	233	233
query44	
query45	200	198	181	181
query46	883	968	603	603
query47	2132	2142	2057	2057
query48	305	317	235	235
query49	648	452	375	375
query50	691	280	218	218
query51	4150	3982	3987	3982
query52	264	272	258	258
query53	290	336	286	286
query54	309	280	291	280
query55	98	85	92	85
query56	325	325	322	322
query57	1953	1867	1557	1557
query58	289	279	278	278
query59	2786	2958	2759	2759
query60	356	344	335	335
query61	149	147	155	147
query62	646	601	550	550
query63	323	278	278	278
query64	5121	1282	1013	1013
query65	
query66	1464	456	380	380
query67	24242	24250	24132	24132
query68	
query69	431	318	284	284
query70	980	991	987	987
query71	349	312	304	304
query72	2839	2897	2599	2599
query73	548	547	323	323
query74	9648	9564	9408	9408
query75	2916	2773	2504	2504
query76	2382	1034	671	671
query77	393	424	316	316
query78	11111	11210	10438	10438
query79	1103	849	588	588
query80	880	706	588	588
query81	523	278	232	232
query82	1378	175	124	124
query83	361	278	260	260
query84	256	127	104	104
query85	1064	517	459	459
query86	391	308	300	300
query87	3224	3118	3065	3065
query88	3606	2662	2667	2662
query89	450	377	350	350
query90	1756	187	185	185
query91	175	159	142	142
query92	82	79	73	73
query93	980	886	513	513
query94	506	338	287	287
query95	604	350	391	350
query96	659	534	237	237
query97	2452	2469	2375	2375
query98	238	227	220	220
query99	992	985	927	927
Total cold run time: 250016 ms
Total hot run time: 167854 ms

@Yukang-Lian marked this pull request as draft March 17, 2026 12:46
Status res = Status::OK();
auto do_compact = [](Compaction& compaction) {
// Helper to register a compaction task as RUNNING in the tracker (direct execution, MANUAL trigger)
auto register_running_task = [&tablet](Compaction& compaction) {
Contributor

The progress of compaction is not reflected.

The start time of compaction execution is not recorded.

In BE, can the container used to record compaction tasks be shared with the one used to record compaction profiles?

@doris-robot

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.65% (19786/37577)
Line Coverage 36.20% (184744/510357)
Region Coverage 32.36% (142386/440023)
Branch Coverage 33.52% (62293/185863)

@hello-stephen (Contributor)

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.12% (26896/36783)
Line Coverage 56.59% (287852/508643)
Region Coverage 53.97% (239660/444072)
Branch Coverage 55.61% (103625/186345)


Development

Successfully merging this pull request may close these issues.

[Enhancement] add a system table in information_schema named be_compaction_tasks which contains meta about compaction_tasks in be.

5 participants