feat: clustered segments + MSQ compaction by clintropolis · Pull Request #19597 · apache/druid

clintropolis · 2026-06-17T20:23:59Z

Description

Follow-up to #19579. This PR adds MSQ compaction support for clustered segments when using 'inline' or reindexing-template-based compaction.

changes:

* `CompactionTask` can now specify a `baseTable` spec to create clustered segments
* `DataSourceMSQDestination` can now specify a `baseTable` so MSQ can generate clustered segments (or any other future `baseTable` spec)
* adds `baseTable` to 'inline' and reindexing template compaction configs to feed to the compaction task for auto-compaction
* adds `baseTable` and `segmentGranularitySpec` to `CompactionState`; `CompactionStatus` is `baseTable`-aware for checks
* adds guards that prevent `baseTable` from working with 'native' compaction and direct users towards MSQ compaction


capistrant

Looks good. I re-read some of the paths before hitting submit here, and I think the other top-level configs that the base table now owns are silently ignored, which seems kind of sad but at least not destructive. I think it is worth forcing an operator to acknowledge what they are doing by cleaning up their config (or rules).

capistrant · 2026-06-17T21:55:43Z

+      }
      return CompactionConfigValidationResult.success();
    } else {
      return compactionConfigSupportedByMSQEngine(newConfig);


What about adding a `baseTable != null` block to this MSQ validation method that blows up if the config tries to set the top-level data schema configs that are now handled by the base table? It is fuzzy to me whether that situation would be handled okay (base table wins, or it blows up later on), but blowing up eagerly feels safer and will prompt the operator to consciously modify the config (or rules?) to make it play nice.
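The eager check suggested above could look roughly like the sketch below. It is only an illustration: `SketchCompactionConfig` and `BaseTableConflictCheck` are hypothetical stand-ins, not Druid classes, and the choice of `dimensionsSpec` and `metricsSpec` as the fields that `baseTable` takes over is an assumption, since the thread does not list them.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for a compaction config; not Druid's actual class. */
final class SketchCompactionConfig
{
  final Object baseTable;      // non-null when a baseTable spec is configured
  final Object dimensionsSpec; // ASSUMPTION: owned by baseTable when one is present
  final Object metricsSpec;    // ASSUMPTION: owned by baseTable when one is present

  SketchCompactionConfig(Object baseTable, Object dimensionsSpec, Object metricsSpec)
  {
    this.baseTable = baseTable;
    this.dimensionsSpec = dimensionsSpec;
    this.metricsSpec = metricsSpec;
  }
}

final class BaseTableConflictCheck
{
  /** Returns the top-level fields that conflict with baseTable; an empty list means no conflict. */
  static List<String> findConflicts(SketchCompactionConfig config)
  {
    List<String> conflicts = new ArrayList<>();
    if (config.baseTable == null) {
      return conflicts; // legacy config: there is no baseTable to conflict with
    }
    if (config.dimensionsSpec != null) {
      conflicts.add("dimensionsSpec");
    }
    if (config.metricsSpec != null) {
      conflicts.add("metricsSpec");
    }
    return conflicts;
  }

  /** Fails at config time instead of letting the conflicting fields be silently ignored. */
  static void validate(SketchCompactionConfig config)
  {
    List<String> conflicts = findConflicts(config);
    if (!conflicts.isEmpty()) {
      throw new IllegalArgumentException(
          "baseTable cannot be combined with top-level " + conflicts + "; remove them from the config"
      );
    }
  }
}
```

Returning the list of conflicting field names, rather than a boolean, lets the error message tell the operator exactly which fields to clean up.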

FrankChen021

| Severity | Findings |
| --- | --- |
| P0 | 0 |
| P1 | 2 |
| P2 | 1 |
| P3 | 0 |
| Total | 3 |

Reviewed 55 of 55 changed files.

Findings that could not be attached inline:

server/src/main/java/org/apache/druid/client/indexing/ClientCompactionRunnerInfo.java:127 - [P2] Validate range partitions against baseTable columns. The affected call is existing code made incomplete by the new baseTable config field. compactionConfigSupportedByMSQEngine still passes dimensionSchemas from newConfig.getDimensionsSpec(), so baseTable configs with no legacy dimensionsSpec skip range-partition column validation at config time. The same config is later validated against dataSchema.getDimensionsSpec() inside MSQCompactionRunner and can fail only after the coordinator submits tasks. Use newConfig.getBaseTable().getDimensionsSpec()/virtual columns when baseTable is present.
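A minimal sketch of the idea in this finding: resolve which dimension declarations to validate against before checking the range-partition columns. Everything here is hypothetical. `RangePartitionColumnCheck` is not a Druid class, dimensions are modeled as a plain name-to-type map, and the specific rule (flag partition columns declared with a non-string type) is an assumption about what the existing validation checks, not something the thread states.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class RangePartitionColumnCheck
{
  /** Picks the declarations to validate against: baseTable's when present, otherwise the legacy ones. */
  static Map<String, String> effectiveDimensionTypes(
      Map<String, String> baseTableDimensions,
      Map<String, String> legacyDimensions
  )
  {
    return baseTableDimensions != null ? baseTableDimensions : legacyDimensions;
  }

  /**
   * Returns the partition columns declared with a non-string type.
   * ASSUMPTION: the real check is type-based; undeclared columns are skipped here.
   */
  static List<String> nonStringPartitionColumns(
      List<String> partitionColumns,
      Map<String, String> baseTableDimensions,
      Map<String, String> legacyDimensions
  )
  {
    Map<String, String> types = effectiveDimensionTypes(baseTableDimensions, legacyDimensions);
    List<String> offending = new ArrayList<>();
    if (types == null) {
      return offending; // no declarations at all, so there is nothing to validate against
    }
    for (String column : partitionColumns) {
      String type = types.get(column);
      if (type != null && !"string".equalsIgnoreCase(type)) {
        offending.add(column);
      }
    }
    return offending;
  }
}
```

With only the legacy map consulted, a config that declares its dimensions solely under `baseTable` would hit the `types == null` branch and skip the check, which is the gap the finding describes.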

This is an automated review by Codex GPT-5.5

FrankChen021 · 2026-06-18T12:25:30Z

+                       .withDataSource(destination.getDataSource())
+                       .withTimestamp(new TimestampSpec(ColumnHolder.TIME_COLUMN_NAME, "millis", null))
+                       .withTransform(transformSpec)
+                       .withBaseTable(baseTable)


[P1] Carry projections into base-table segment generation

The baseTable branch builds the worker DataSchema without withProjections(destination.getProjections()), unlike the legacy branch. MSQCompactionRunner passes projections through DataSourceMSQDestination, but they are dropped before segment generation, so baseTable compaction configured with projections writes segments without them and stores lastCompactionState.projections as null; CompactionStatus then keeps scheduling the same compaction. Add the projections to this builder path and cover baseTable+projections.

oh this is a good catch.
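One way to keep the two branches from drifting apart is to apply the shared settings once and let only the `baseTable`-specific part differ. The sketch below illustrates that shape with hypothetical types (`SketchSchema`, `SketchSchemaBuilder`, `SketchSchemaFactory`); it does not use Druid's `DataSchema` builder, and the string and list fields are placeholders for the real spec objects.

```java
import java.util.List;

/** Hypothetical stand-in for the worker schema; not Druid's DataSchema. */
final class SketchSchema
{
  final String baseTable;
  final List<String> projections;

  SketchSchema(String baseTable, List<String> projections)
  {
    this.baseTable = baseTable;
    this.projections = projections;
  }
}

final class SketchSchemaBuilder
{
  private String baseTable;
  private List<String> projections;

  SketchSchemaBuilder withBaseTable(String baseTable)
  {
    this.baseTable = baseTable;
    return this;
  }

  SketchSchemaBuilder withProjections(List<String> projections)
  {
    this.projections = projections;
    return this;
  }

  SketchSchema build()
  {
    return new SketchSchema(baseTable, projections);
  }
}

final class SketchSchemaFactory
{
  static SketchSchema create(String baseTable, List<String> projections)
  {
    // Projections are applied before branching, so neither path can drop them.
    SketchSchemaBuilder builder = new SketchSchemaBuilder().withProjections(projections);
    if (baseTable != null) {
      builder.withBaseTable(baseTable);
    }
    return builder.build();
  }
}
```

A test that builds a schema with both a base table and projections, and asserts the projections survive, would cover the case the finding asks for.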

FrankChen021 · 2026-06-18T12:25:30Z

+  {
+    if (queryGranularity == null
+        || Granularities.NONE.equals(queryGranularity)
+        || Granularities.ALL.equals(queryGranularity)


[P1] Do not erase ALL query granularity

Treating Granularities.ALL as a no-op makes baseTable compaction report an effective query granularity of NONE. That bypasses MSQCompactionRunner's existing ALL handling, which assigns rows to the interval start timestamp, so queryGranularity=ALL produces incorrectly timestamped clustered segments; the later status check also compares configured ALL to stored NONE and can recompact forever. Preserve or reject ALL explicitly instead of silently normalizing it away.

I think rejecting ALL and requiring some other granularity makes more sense than going to the effort of trying to support it in the granularity virtual column (if that is even possible).
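A rough sketch of the "reject ALL" option, under stated assumptions: `SketchGranularity` and `QueryGranularityCheck` are hypothetical names, the enum is a stand-in for Druid's `Granularity` objects, and the error wording is invented for illustration. The only behavior taken from the thread is that null and NONE remain no-ops while ALL is refused instead of being normalized away.

```java
/** Hypothetical stand-in for the granularities discussed in the review. */
enum SketchGranularity
{
  NONE, ALL, HOUR, DAY
}

final class QueryGranularityCheck
{
  /**
   * Returns true when timestamps need to be floored to the granularity.
   * null and NONE are no-ops; ALL is rejected explicitly rather than treated as NONE.
   */
  static boolean requiresTimeFloor(SketchGranularity queryGranularity)
  {
    if (queryGranularity == SketchGranularity.ALL) {
      throw new IllegalArgumentException(
          "queryGranularity ALL is not supported with baseTable; use a different granularity"
      );
    }
    return queryGranularity != null && queryGranularity != SketchGranularity.NONE;
  }
}
```

Failing here means the stored state never records NONE for a config that asked for ALL, so the later status comparison cannot disagree with the configured value.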

Frank's comments seem legit. Pulling approval until further discussion, to avoid an early merge by someone who comes along, sees an approved green PR, and merges it for some reason.

github-actions bot added the labels Area - Batch Ingestion, Area - Querying, Area - Ingestion, and Area - MSQ (For multi stage queries - https://github.com/apache/druid/issues/12262) on Jun 17, 2026

builder

9997e4d

capistrant previously approved these changes Jun 17, 2026

FrankChen021 reviewed Jun 18, 2026

capistrant self-requested a review June 18, 2026 14:04

clintropolis wants to merge 2 commits into apache:master from clintropolis:clustered-segment-compaction
