
Antalya 26.3 Hybrid: added support for segment pruning#1788

Open
mkmkme wants to merge 3 commits into antalya-26.3 from mkmkme/antalya-26.3/hybrid-segment-pruning

Conversation

@mkmkme
Collaborator

@mkmkme mkmkme commented May 14, 2026

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Hybrid tables can now ignore segments that cannot be matched by the combination of the WHERE clause and the segment predicate

Documentation entry for user-facing changes

...

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • S3 Export (2h)
  • Swarms (30m)
  • Tiered Storage (2h)

@mkmkme mkmkme added antalya port-antalya PRs to be ported to all new Antalya releases hybrid antalya-26.3 labels May 14, 2026
@github-actions

github-actions Bot commented May 14, 2026

Workflow [PR], commit [00ca0b4]

@svb-alt svb-alt requested a review from arthurpassos May 14, 2026 12:13
arthurpassos
arthurpassos previously approved these changes May 14, 2026
Collaborator

@arthurpassos arthurpassos left a comment


I can't help much here tbh

Comment on lines +101 to +102

`if (col_ranges.empty())`
Collaborator


Why is it that when there are no ranges you can actually prune the segment?

Collaborator Author


col_ranges being empty at this stage (after extractPlainRangesForColumn has run) means that the segment's own predicate, projected onto column i, admits no values.
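The soundness argument (per-column emptiness implies whole-segment emptiness) can be sketched with plain interval arithmetic. A hypothetical, self-contained illustration; `Range`, `projectColumn`, and `canPruneSegment` are stand-ins, not the actual KeyCondition API:

```cpp
#include <algorithm>
#include <optional>
#include <vector>

// A closed interval over a single column's values (assume left <= right).
struct Range { double left; double right; };

// Intersect two closed intervals; an empty intersection is nullopt.
std::optional<Range> intersect(const Range & a, const Range & b)
{
    double l = std::max(a.left, b.left);
    double r = std::min(a.right, b.right);
    if (l > r)
        return std::nullopt;
    return Range{l, r};
}

// Project the segment predicate and the user's WHERE clause onto one column,
// keeping only the ranges that survive the intersection.
std::vector<Range> projectColumn(const std::vector<Range> & segment_ranges,
                                 const std::vector<Range> & where_ranges)
{
    std::vector<Range> col_ranges;
    for (const auto & s : segment_ranges)
        for (const auto & w : where_ranges)
            if (auto r = intersect(s, w))
                col_ranges.push_back(*r);
    return col_ranges;
}

// Per-column emptiness implies whole-segment emptiness: if no value of this
// column can satisfy both predicates, no row of the segment can match.
bool canPruneSegment(const std::vector<Range> & segment_ranges,
                     const std::vector<Range> & where_ranges)
{
    return projectColumn(segment_ranges, where_ranges).empty();
}
```

If even one column projects to an empty range set, no row can satisfy both predicates, so the segment can be skipped without being read.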

@mkmkme
Collaborator Author

mkmkme commented May 15, 2026

The failing tests seem unrelated to my changes.

@alsugiliazova
Member

Verification: PR #1788

PR-added tests — all GREEN

03646_hybrid_segment_pruning is 6/6 OK across every stateless job that picked it up:

| Job | Status |
| --- | --- |
| Stateless tests (amd_asan, distributed plan, parallel, 4/4) | OK |
| Stateless tests (amd_debug, distributed plan, s3 storage, parallel) | OK |
| Stateless tests (amd_debug, parallel) | OK |
| Stateless tests (arm_asan, azure, parallel, 4/4) | OK |
| Stateless tests (arm_asan, targeted) | OK |
| Stateless tests (arm_binary, parallel) | OK |
| Fast test | SKIPPED (expected, no-fasttest) |

The new gtest gtest_hybrid_segment_pruner is built into the unit-tests binary; the unit-tests job is green.

The Hybrid segment-pruning path has clean positive coverage at both unit and stateless levels.

CI overview (head commit)

  • PR test workflow: 46 success / 48 skipped / 3 failure
  • Regression workflow: 28 success / 69 skipped / 3 failure (chronic baseline; actually one fewer suite than typical)

PR test-workflow failures

| Check | Verdict |
| --- | --- |
| Integration tests (amd_asan, db disk, old analyzer, 1/6) — 9 × test_storage_kafka/test_batch_fast.py::* + Timeout | Job-level timeout cascade. The Kafka batch-fast subset failed together and the job reported Timeout. The shared Timeout symptom has 51 fails across 9 PRs in the last 30d — a recurring integration-job timeout, not PR-caused. The PR diff is in Hybrid/Distributed, not Kafka. |
| Stateless tests (arm_asan, azure, sequential, 1/2) — 03760_backup_tar_archive | Pre-existing flake (18 fails / 9 PRs in 30d). Unrelated to segment pruning. |
| Stress test (arm_asan, s3) — Cannot start clickhouse-server | Stress infra flake (39 fails / 15 PRs in 30d). The server fails to come up in the stress harness; not a product-test FAIL. |

Regression-workflow failures (chronic baseline on antalya-26.3)

| Suite | Fails |
| --- | --- |
| Swarms (Aarch64 + Release) | 230 |
| Parquet (Aarch64 + Release) | 18 |
| S3Export part (Aarch64 + Release) | 14 |

Same fingerprint as sibling antalya-26.3 PRs (1776, 1782, 1783, 1775, 1773, …). On this run S3Export partition was actually green; no new failure modes introduced.

Caveat — partial frontport

PR lands on antalya-26.3 while companion features from antalya-26.1 are still being frontported in parallel. Final re-verify recommended once the rest of the bundle lands.

Verdict

Safe to merge.

@alsugiliazova
Member

Audit: PR #1788 — Antalya 26.3 Hybrid: added support for segment pruning

AI audit note: This review comment was generated by AI (Cursor agent, audit-review skill).

Scope reviewed

Head acccd3f9 against base antalya-26.3. Feature scope: new HybridSegmentPruner (range-based unsat detection over the Hybrid identity key), extension of KeyCondition::extractPlainRangesForColumn for multi-column projection, integration into StorageDistributed::{getStorageSnapshot, getQueryProcessingStage, computeHybridPruningVerdict, substituteHybridWatermarks, getHybridWatermarkSnapshot, read}, new local-plan code path that bypasses ClusterProxy::executeQuery when the base segment is pruned but additional segments survive, and the new gtest plus the SQL test 03646_hybrid_segment_pruning. Method: static analysis (call graph + transition matrix + logical fault injection). No local build or test execution.

Confirmed defects

Low: Misleading parameter comment in createLocalPlan call

  • Impact: Documentation drift; future maintainers will be misled about which createLocalPlan parameter they are setting. Runtime behavior is correct because false happens to be the intended value of the actual parameter.
  • Anchor: src/Storages/StorageDistributed.cpp, StorageDistributed::read, the new cluster.empty() && !additional_query_infos.empty() block (around lines 1556-1558).
  • Trigger: Code review of the new local-plan code path.
  • Why defect: The 7th positional argument of createLocalPlan is bool build_logical_plan (see src/Processors/QueryPlan/DistributedCreateLocalPlan.h), but the call labels it /*has_missing_objects=*/false. There is no has_missing_objects parameter on this overload — that name belongs to createLocalPlanForParallelReplicas, not createLocalPlan. The mirroring block in ClusterProxy::executeQuery passes a plain false, consistent with the actual signature.
  • Fix direction: Replace the inline comment with /*build_logical_plan=*/false, or drop the named comment entirely to match the existing call in ClusterProxy::executeQuery.
  • Regression test direction: No runtime test needed; a single-line edit verified by build.
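The defect above is an instance of a general C++ pitfall: a `/*name=*/` comment on a positional bool argument is invisible to the compiler, so it can silently drift away from the real parameter name. A minimal sketch under hypothetical names (makePlan is not the real createLocalPlan signature), including the scoped-enum alternative that makes the call site compiler-checkable:

```cpp
// A positional bool parameter, like the 7th argument of createLocalPlan.
int makePlan(bool build_logical_plan)
{
    return build_logical_plan ? 1 : 0;
}

// The compiler cannot catch a mislabeled comment:
//   makePlan(/*has_missing_objects=*/false);  // compiles fine, comment is wrong
// A scoped enum moves the intent into the type system instead.
enum class BuildLogicalPlan : bool { No = false, Yes = true };

int makePlanChecked(BuildLogicalPlan build)
{
    return build == BuildLogicalPlan::Yes ? 1 : 0;
}
```

With the enum, writing the wrong name at the call site is a compile error rather than silent documentation drift.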

Low: HybridSegmentPruner builds the identity key over getColumns().getAll(), which includes ALIAS and EPHEMERAL columns

  • Impact: When the Hybrid table has alias / ephemeral columns, those columns are added to the identity key, but the user-side filter_actions_dag references the underlying expression (the Planner has resolved the alias). The user KeyCondition then cannot match those alias entries in the DAG and they sit unused; only the analysis cost is paid. Worst realistic outcome: an exception in HybridSegmentPruner's constructor (e.g. an ALIAS expression failing in KeyDescription::getKeyFromAST) — which getQueryProcessingStage does not catch — would convert a previously-working SELECT from a Hybrid table with such an alias into a hard error.
  • Anchor: src/Storages/StorageDistributed.cpp, StorageDistributed::computeHybridPruningVerdict: NamesAndTypesList hybrid_columns = storage_snapshot->metadata->getColumns().getAll();
  • Trigger: Hybrid table whose CREATE includes an ALIAS column (<name> <type> ALIAS <expr>) or an EPHEMERAL column.
  • Why defect: getAll() returns ordinary + materialized + alias + ephemeral. Only getAllPhysical() (or a similarly filtered list) is appropriate for an "identity key over comparable physical columns".
  • Fix direction: Replace getAll() with getAllPhysical() (or equivalent) before passing to filterComparable. Optionally also drop MATERIALIZED columns whose expression is non-trivial.
  • Regression test direction: Add a query-level test creating a Hybrid table that has an ALIAS column on top of a watermarked column; run a SELECT with a WHERE that should prune; assert the plan does not throw and pruning still applies on the watermark column.
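The fix direction amounts to a simple filter over column kinds. A hypothetical, self-contained illustration (ColumnKind, ColumnDesc, and getAllPhysical here are stand-ins, not the actual ClickHouse ColumnsDescription API): Materialized columns stay because they are physically stored; Alias and Ephemeral are dropped because they have no stored values the read path could compare against.

```cpp
#include <string>
#include <vector>

// Stand-in for ClickHouse's column default kinds (names are illustrative).
enum class ColumnKind { Ordinary, Materialized, Alias, Ephemeral };

struct ColumnDesc
{
    std::string name;
    ColumnKind kind;
};

// Materialized columns are physically stored (their expression only runs at
// INSERT time), so they are kept; Alias/Ephemeral are not stored at read time.
bool isPhysical(const ColumnDesc & col)
{
    return col.kind == ColumnKind::Ordinary || col.kind == ColumnKind::Materialized;
}

// Filter the full column list down to what an identity key may be built over.
std::vector<ColumnDesc> getAllPhysical(const std::vector<ColumnDesc> & all)
{
    std::vector<ColumnDesc> out;
    for (const auto & c : all)
        if (isPhysical(c))
            out.push_back(c);
    return out;
}
```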

Coverage summary

  • Scope reviewed: New HybridSegmentPruner class (src/Storages/HybridSegmentPruner.{h,cpp}); KeyCondition::extractPlainRangesForColumn (multi-column range extraction); StorageDistributed::{getStorageSnapshot, getQueryProcessingStage, computeHybridPruningVerdict, substituteHybridWatermarks, getHybridWatermarkSnapshot, read} changes; the new local-plan code path for empty cluster + surviving additional segments; the gtest (gtest_hybrid_segment_pruner.cpp); and the SQL test (03646_hybrid_segment_pruning).
  • Categories failed: Misleading parameter comment in createLocalPlan; identity key built from getAll() rather than physical columns.
  • Categories passed: Lifetime of KeyCondition after the source DAG goes out of scope (RPN is self-contained, no dangling pointers); thread-safety of MultiVersion<WatermarkParams> snapshot freezing via HybridSnapshotData; consistency of the pruning verdict between getQueryProcessingStage and read (same query_info.filter_actions_dag and same storage_snapshot); soundness of per-column projection in extractPlainRangesForColumn for atoms / AND / OR (existential projection over-approximation); reachability of the FUNCTION_NOT branch (only top-level not(...) wrappers around opaque sub-expressions appear, which surface as FUNCTION_UNKNOWN and short-circuit return false before NOT is processed, so the multi-column NOT-after-projection unsoundness is not exploitable in practice); col_ranges.empty() -> prune is sound (per-column emptiness implies whole-segment emptiness); rect bounds-check in checkInHyperrectangle (rect length equals identity_key.column_names.size(), which equals user_condition.num_key_columns); empty-cluster early-returns (analyzer returns an empty plan, legacy adds NullSource, "case 2" handles surviving additional segments without dereferencing cluster.getShardsAddresses().front() from updateSettingsAndClientInfoForCluster); processing-stage math after base pruning (nodes recomputed from empty cluster + surviving segment count, Complete stage chosen iff one survivor); exception safety on missing watermark (substitution throws BAD_ARGUMENTS consistently in both phases); double pruning-verdict computation matches because the filter DAG, storage snapshot, segments, and watermark snapshot are all stable across the two calls; gtest covers range contradiction, overlap, OR-with-non-mandatory-conjunct, bounded DNF with constant folding, and unsupported-atom-in-OR (conservative keep); SQL test covers baseline, cold/hot pruned individually, all-pruned, three-segment table, OR with non-mandatory conjunct, JOIN (conservative skip), PREWHERE participation, and alias-shadows-Hybrid-column.
  • Assumptions / limits: Static reasoning only; the gtest and SQL test were not executed in this audit. The "FUNCTION_NOT unreachable" argument relies on the current cloneDAGWithInversionPushDown behavior (a not(...) wrapper is added only around non-relational, non-and/or/cast/materialize/indexHint subtrees, which surface as FUNCTION_UNKNOWN); a future change that emits FUNCTION_NOT around a known atom would expose the multi-column NOT-after-projection bug and should be revisited.

mkmkme added 2 commits May 15, 2026 15:39
This was suggested during the verification stage. Replacing getAll()
with getAllPhysical() avoids potentially wasted work.
@mkmkme
Collaborator Author

mkmkme commented May 15, 2026

@alsugiliazova, I've pushed two commits resolving both issues.

Low: Misleading parameter comment in createLocalPlan call

This issue was unrelated to this PR, but I fixed it nonetheless to avoid any confusion in the future.

Low: HybridSegmentPruner builds the identity key over getColumns().getAll(), which includes ALIAS and EPHEMERAL columns

Replaced getAll() with getAllPhysical(). I must say, though, that the audit report slightly overstated the worst-case scenario here. An exception cannot happen in the pruner's constructor, because the alias's expression never reaches KeyDescription; only the column's name and type do. The worst-case scenario would be just wasted work; there wouldn't be any correctness issue.
The optional suggestion to drop MATERIALIZED columns whose expression is non-trivial is not needed. MATERIALIZED columns are physically stored; the expression only runs at INSERT to populate the stored value. From the read path's perspective they are the same as ordinary columns, and the user's filter can reference them directly. Excluding them would be wrong.

@alsugiliazova
Member

Resolved (verification update on 00ca0b48)

Both Low-severity findings from the previous round have been addressed.

Resolved: Misleading parameter comment in createLocalPlan call → fixed in 04758b6a (/*has_missing_objects=*/false → /*build_logical_plan=*/false). The new comment matches the actual 7th positional argument (bool build_logical_plan) of createLocalPlan in src/Processors/QueryPlan/DistributedCreateLocalPlan.h. Verified by re-reading StorageDistributed::read, the cluster.empty() && !additional_query_infos.empty() block.

Resolved: Identity key built from getAll() instead of physical columns → fixed in 00ca0b48 (getAll() → getAllPhysical() in StorageDistributed::computeHybridPruningVerdict). getAllPhysical() correctly keeps Ordinary and Materialized columns and drops Alias/Ephemeral, so the identity key now reflects only what the user filter can actually reference at read time.

Audit corrections accepted

  • Worst-case impact was overstated: my earlier write-up suggested an exception could escape HybridSegmentPruner's constructor when an alias expression failed analysis. That is not reachable: buildIdentityKey calls KeyDescription::getKeyFromAST(tuple(...), ColumnsDescription{NamesAndTypesList}). Constructing ColumnsDescription from a NamesAndTypesList materializes plain columns (ColumnsDescription::add(ColumnDescription(name, type))), so the alias's source expression is dropped at the getAll() boundary. The actual worst case is exactly what the commit message says: wasted analysis work on alias entries that the user filter cannot bind to. No correctness or exception path.
  • MATERIALIZED exclusion suggestion was wrong: my earlier note ("optionally drop MATERIALIZED columns whose expression is non-trivial") was incorrect. MATERIALIZED columns are physically stored — the expression only runs at INSERT time to populate the value — so user filters can reference them directly and pruning on them is valid. getAllPhysical() keeps them, and that is the right behavior; excluding them would have been a missed-pruning regression.

Confirmed defects

No confirmed defects in reviewed scope on head 00ca0b48.

@alsugiliazova alsugiliazova added the verified Approved for release label May 15, 2026

Labels

antalya antalya-26.3 hybrid port-antalya PRs to be ported to all new Antalya releases verified Approved for release
