Add skipSecondaryIndexes option to skip building secondary indexes#18761
Add skipSecondaryIndexes option to skip building secondary indexes#18761deepthi912 wants to merge 9 commits into
Conversation
6a95f14 to
43ec9f5
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #18761 +/- ##
============================================
+ Coverage 64.76% 64.77% +0.01%
Complexity 1309 1309
============================================
Files 3380 3381 +1
Lines 209573 209955 +382
Branches 32805 32888 +83
============================================
+ Hits 135735 136008 +273
- Misses 62914 62985 +71
- Partials 10924 10962 +38
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
Drops the new IndexingConfig.skipSecondaryIndexes flag in favor of the existing tableIndexConfig.skipSegmentPreprocess. The reload override mechanism that was on the new flag (setOverrideSkipSecondaryIndexes) moves onto the existing flag (setOverrideSkipSegmentPreprocess). All four reload entry points in BaseTableDataManager and the four minion-build sites (SegmentProcessorFramework, UpsertCompactionTaskExecutor, RefreshSegmentTaskExecutor, MaterializedViewTaskExecutor) use the renamed override. Net effect end-to-end is unchanged from the prior commit set: server-boot load honors skipSegmentPreprocess and ships forward-only; reload paths (manual /reload, AsyncIndexReloader on the STP side, segment-refresh from minion) flip the override on the transient IndexLoadingConfig and run full preprocess including secondary indexes; minion segment rebuilds set the override on SegmentGeneratorConfig so the freshly-built segment carries all secondary indexes regardless of the flag. Reverts the per-handler gating in SegmentPreProcessor and the new test methods in SegmentPreProcessorTest, since skipSegmentPreprocess already short-circuits the entire preprocess via ImmutableSegmentLoader.needPreprocess().
Drops the four explicit setOverrideSkipSegmentPreprocess(true) calls from the public reload entry points and replaces them with one setter at the top of each of the two shared protected workers: - BaseTableDataManager.reloadSegment(SegmentDataManager, IndexLoadingConfig, boolean) — funnels reloadSegment / reloadAllSegments / reloadSegments - BaseTableDataManager.replaceSegmentIfCrcMismatch(SegmentDataManager, SegmentZKMetadata, IndexLoadingConfig) — funnels doReplaceSegment / minion SegmentRefreshMessage The reload code path itself now implicitly ignores tableIndexConfig.skipSegmentPreprocess — callers don't need to remember to set anything, and a stray future reload entry point can't accidentally honor the skip flag. Boot / state-transition loads continue to use plain fetchIndexLoadingConfig() and respect the persisted flag.
…ocess method
Drops the centralization that made every reload silently ignore
skipSegmentPreprocess. Existing reload methods (reloadSegment, reloadAllSegments,
reloadSegments, doReplaceSegment) restore upstream behavior and honor the flag.
Adds a new explicit method instead:
TableDataManager.reloadSegmentBypassingSkipPreprocess(String, boolean, String)
InstanceDataManager.reloadSegmentBypassingSkipPreprocess(String, String, boolean, String)
Implementations:
BaseTableDataManager — fetches IndexLoadingConfig, sets the per-call override,
then delegates to the existing protected reloadSegment worker.
HelixInstanceDataManager — looks up the table data manager and delegates.
Default interface methods delegate to the upstream reloadSegment so existing
custom implementations keep working without recompile. Callers that need to
materialize secondary indexes despite the persisted flag (AsyncIndexReloader's
TierSegmentPublishResource) call the new method explicitly.
…kipPreprocess method" This reverts commit 68de49b.
Drops the build-time changes that made new segments forward-only at creation:
- BaseSegmentCreator (per-column / post-creation gating)
- SegmentGeneratorConfig (build-time override field + accessors)
- SegmentProcessorFramework + 3 minion executors (which all set the build-time override)
The PR now contains only the two essential files for the runtime reload override:
- IndexLoadingConfig: transient override field + setter + override-aware
isSkipSegmentPreprocess()
- BaseTableDataManager: two inline setter calls — one in the protected
reloadSegment worker (funneling reloadSegment/reloadAllSegments/reloadSegments)
and one in replaceSegmentIfCrcMismatch (funneling SegmentRefreshMessage)
skipSegmentPreprocess remains a load-time-only knob in stock Pinot. New segments
get built with whatever indexes the table config asks for. Server boot honors
the flag. Any reload (manual /reload, AsyncIndexReloader, minion refresh)
ignores the flag for the duration of that single reload's IndexLoadingConfig
instance — letting secondary indexes get materialized when the user actually
asks for a reload, without persisting any config change.
xiangfu0
left a comment
There was a problem hiding this comment.
Found one high-signal issue; see inline comment.
| // is discarded when the reload finishes. Centralizing it here means every public reload entry point | ||
| // (reloadSegment / reloadAllSegments / reloadSegments) gets the reload semantics for free; boot / | ||
| // state-transition loads continue to use a plain fetchIndexLoadingConfig() and honor skipSegmentPreprocess. | ||
| indexLoadingConfig.setOverrideSkipSegmentPreprocess(true); |
There was a problem hiding this comment.
This changes the contract of the existing tableIndexConfig.skipSegmentPreprocess flag for every reload path, not just the async index-materialization flow. Today a table that sets this flag can rely on reloads honoring the persisted forward-only load policy; after this override, any manual/controller reload will silently build all secondary indexes instead. That is a behavior change for an existing config with real CPU/disk blast radius for current deployments. Please gate the bypass behind a new explicit reload API/flag instead of unconditionally overriding the persisted setting here.
There was a problem hiding this comment.
Yes, will going to do that! Ack
Summary
Adds a new table-level config
tableIndexConfig.skipSecondaryIndexes(defaultfalse). When enabled:New segments are built with only the forward index.
BaseSegmentCreatorskips every non-forward index creator during the per-row build loop and short-circuitsupdatePostSegmentCreationIndexes(so star-tree is also skipped). Dictionary is still written for dict-encoded columns because the forward index needs it to resolve dict-ids.Existing segments load as-is and preprocess does not add/remove secondary indexes.
SegmentPreProcessor.process()runs only theForwardIndexHandler(still allows forward-encoding upgrades) and default-column materialization; it skips the loop over all other index handlers, plus star-tree and multi-col text. Min/max generation is also skipped — it is purely a query-time pruning hint.needProcess()mirrors the same skips so preprocess isn't triggered just to add a secondary index. The on-disk loader (PhysicalColumnIndexContainer) is intentionally untouched, so segments with pre-existing secondary indexes keep serving them.Motivation
Useful for workloads that don't need secondary indexes during initial ingest so the ingestion and segment building is faster(or for an async "build indexes later" flow where the flag is later flipped off and a reload runs the missing handlers).
Config
{ "tableIndexConfig": { "skipSecondaryIndexes": true } }Reload / minion override —
setOverrideSkipSecondaryIndexesskipSecondaryIndexesis meant to govern initial server-side load and fresh ingestion only. Reload paths and minion-driven segment rebuilds need to materialize secondary indexes synchronously against the latest table config, even when the persisted flag istrue. To avoid forcing users to flip the table config back and forth, bothIndexLoadingConfigandSegmentGeneratorConfigcarry a transientsetOverrideSkipSecondaryIndexes(true)flag that suppresses the skip just for the in-flight operation. The override lives on the per-call config instance, is never persisted, and is automatically applied at these entry points:BaseTableDataManager—reloadSegment,reloadSegments,reloadAllSegments, anddoReplaceSegment(segment-refresh fromSegmentRefreshMessage). SoPOST /segments/{table}/reload, any controller-triggered reload, and minionSegmentRefreshMessageall build secondary indexes synchronously.SegmentProcessorFramework(used byMergeRollupTask,RealtimeToOfflineSegmentsTask,UpsertCompactMergeTask),UpsertCompactionTaskExecutor,RefreshSegmentTaskExecutor, andMaterializedViewTaskExecutorflip the override on theSegmentGeneratorConfigthey hand to the build driver, so the rebuilt segment carries all secondary indexes.Net effect: the only path that actually honors
skipSecondaryIndexes=trueend-to-end is the Helix state-transition (boot / segment first-load) and the initial offline/realtime build. Any subsequent reload or minion rebuild produces a fully-indexed segment.