Skip to content

feat(cubestore): pre-filter in-memory chunks on worker before IPC#11040

Open
waralexrom wants to merge 2 commits into
masterfrom
cubestore-mem-chunks-filter
Open

feat(cubestore): pre-filter in-memory chunks on worker before IPC#11040
waralexrom wants to merge 2 commits into
masterfrom
cubestore-mem-chunks-filter

Conversation

@waralexrom

@waralexrom waralexrom commented Jun 8, 2026

Copy link
Copy Markdown
Member

Summary

Trims in-memory chunks on the worker by the dedup-safe unique-key pushable predicate before they are serialized and shipped over IPC to the select subprocess. Parquet data is already pruned by the predicate; in-memory chunks previously crossed IPC whole, so a partition that survives range-pruning still shipped rows of many keys that the query immediately discards. This closes that gap.

Changes

  • Extract the dedup-safe pushable-filter selection into a shared dedup_safe_unique_key_filter reused by the scan-time FilterExec and the new worker pre-filter, so the two cannot diverge.
  • Compute the per-index pushable predicate once at planning time (choose_index_ext, from the same filters that gate partition pruning), strip column qualifiers, and carry it proto-encoded in PlanningMeta.pushable_chunk_filters (1:1 with indices, #[serde(default)] for back-compat).
  • On the worker, decode the predicate and trim the loaded chunk batches between load_in_memory_chunks and serialization (trace op chunks.prefilter). The subprocess still re-applies the predicate, so this only reduces IPC payload and is never relied on for correctness.
  • A chunk referenced by more than one scan (self-join / self-union of one index) is left untrimmed: the subprocess shares batches by chunk id, and a scan with a different or no predicate would otherwise lose rows. Reference counting spans all index snapshots, not just predicate-bearing ones.
  • Best-effort: any decode/build/eval failure is logged and skipped rather than failing the query.

Testing

  • New regression test prefilter_chunks_shared_scan (self-union of one index, one branch key-filtered, the other not) — reproduced the shared-chunk bug (60 vs expected 130) before the cross-snapshot reference-count fix, green after.
  • filter_pushdown_unique_key, unique_key_and_multi_*, *_stream_table, limit_pushdown_unique_key — pass in-process and multi-process (real select-subprocess IPC), no regressions.
  • cargo build -p cubestore clean; cargo fmt --check passes (pre-commit hook).

@waralexrom waralexrom requested a review from a team as a code owner June 8, 2026 22:53
@github-actions github-actions Bot added cube store Issues relating to Cube Store rust Pull requests that update Rust code labels Jun 8, 2026
@claude

claude Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Claude finished @waralexrom's task in 1m 37s —— View job


Code review

Solid, well-scoped change. The shared-chunk reference count across all index snapshots (not just predicate-bearing ones) is the key correctness insight, and the new prefilter_chunks_shared_scan test pins it down precisely. Defense-in-depth (subprocess re-applies the predicate; planner-side encode failures degrade to no pre-filter) keeps the surface area for correctness regressions small.

A few minor observations — nothing blocking:

1. in_memory_chunk_filter_groups doc is slightly misleadingserialized_plan.rs:1086

The doc says the result is "grouped by the proto-encoded pushable predicate", but the implementation produces one entry per index snapshot — two snapshots with identical predicate bytes yield two entries, not one merged group. Not a correctness issue (downstream just iterates), but the comment overstates the grouping. Consider either deduping by bytes or rewording (e.g., "one entry per index snapshot that carries a pre-filter").

2. phys.as_ref().unwrap()query_executor.rs:2433

After the explicit if phys.is_none() build block, the immediate phys.as_ref().unwrap() is safe but slightly awkward. Binding the freshly built Arc directly avoids the unwrap:

let p = match phys.as_ref() {
    Some(p) => p.clone(),
    None => { /* build, set phys, return p */ }
};

Pure style.

3. Backward-compat direction is one-way

#[serde(default)] on pushable_chunk_filters covers old plan → new worker. The reverse — new router → old worker — relies on serde's default of ignoring unknown fields, which is true for bincode-derived serializers but worth confirming for whatever transport actually serializes PlanningMeta in your mixed-version rollouts. If a rolling upgrade is in scope, a quick check would be cheap insurance.

4. Filter cost for low-selectivity predicates

filter_record_batch reallocates the entire batch even when the predicate selects most rows. For the intended use case (selective unique-key filters in multi-tenant scans) this is a clear win, but if predicates ever land here that match >90% of rows, the IPC-saving math flips. Not worth gating on, but a future selectivity heuristic could short-circuit when the mask popcount is close to row count.

5. Test gap: pre-filter actually fires when it should

prefilter_chunks_shared_scan proves the skip path is correct (shared chunks aren't wrongly trimmed). It doesn't directly verify the active path trims anything — both branches produce the same final result whether the pre-filter ran or not, since the subprocess re-applies the predicate. A counter on chunks.prefilter (or an EXPLAIN ANALYZE DETAILED assertion now that #11016 landed) would catch a future refactor that silently disables the pre-filter. Not required for this PR.

Nits

  • query_executor.rs:2359 doc comment for dedup_safe_unique_key_filter is very good — the "commutes with last-row dedup" framing is exactly the right invariant to call out.
  • in_memory_chunk_filter_groups walks index_snapshots twice (once for chunk_counts, once for emit). Fine at current sizes; mentionable only if profiling ever points here.

LGTM modulo the doc tweak on item 1.

@codecov

codecov Bot commented Jun 8, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 58.51%. Comparing base (d91df4b) to head (a12c836).

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #11040   +/-   ##
=======================================
  Coverage   58.51%   58.51%           
=======================================
  Files         216      216           
  Lines       17265    17265           
  Branches     3524     3524           
=======================================
  Hits        10102    10102           
  Misses       6648     6648           
  Partials      515      515           
Flag Coverage Δ
cube-backend 58.51% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@claude

claude Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Claude finished @waralexrom's task in 40s —— View job


Code review (re-review after sync)

Two new commits since the prior review:

  • 392e472 — doc rewording on in_memory_chunk_filter_groups (addresses item ReferenceError: self is not defined - Issue with whatwg-fetch when using the core lib with Node.js #1 from my prior review — accurately describes "one entry per index snapshot that carries a pre-filter" rather than the misleading "grouped by predicate"). ✅
  • fb10cf6prefilter_chunks_shared_scan excluded from the migration suite. Reasonable: the test relies on a specific shared-snapshot planner shape, and migration replays don't need to re-validate the prefilter correctness path. No production code touched.

The substantive change (392e472's parent) is unchanged from my earlier review — still LGTM. Already approved by @paveltiunov.

Items #2#5 from my prior review remain non-blocking polish, not regressions:

  • phys.as_ref().unwrap() after the build block in query_executor.rs:2433 — pure style.
  • New-router → old-worker compat depends on the transport's tolerance for unknown serde fields; worth a sanity check if mixed-version rollouts are in scope.
  • Selectivity heuristic for filter_record_batch — future work, only matters if low-selectivity predicates ever land here.
  • No assertion that the prefilter actually fires (skip path is covered; active path is not directly observable from the test result since the subprocess re-applies). An EXPLAIN ANALYZE DETAILED assertion on chunks.prefilter (now that feat(cubestore): EXPLAIN ANALYZE DETAILED per-query trace #11016 landed) would close that gap in a follow-up.

Trim in-memory chunks by the dedup-safe unique-key pushable predicate on
the worker before they are serialized and shipped over IPC to the select
subprocess. The predicate is computed once at planning time (from the same
filters that gate partition pruning), carried in PlanningMeta, and applied
to the loaded batches; the subprocess still re-applies it, so this only
reduces IPC payload and is never relied on for correctness.

A chunk referenced by more than one scan (self-join/self-union of one
index) is left untrimmed: the subprocess shares batches by chunk id and a
scan with a different or no predicate would otherwise lose rows. The
pre-filter is best-effort - any failure is logged and skipped rather than
failing the query.
…suite

Migration tests run each registered case against a pre-baked data dir from
the fixtures tarball; a brand-new test has none. Also clarify the
in_memory_chunk_filter_groups doc (one entry per snapshot, not merged by
predicate).
@waralexrom waralexrom force-pushed the cubestore-mem-chunks-filter branch from fb10cf6 to a12c836 Compare June 9, 2026 09:17
@claude

claude Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Claude finished @waralexrom's task in 33s —— View job


Code review (sync re-check)

  • Diffed origin/master...HEAD, listed review threads, inspected new commits.

No new substantive changes since the prior re-review:

  • a5b9014 — feature commit (unchanged from the LGTM'd version).
  • a12c836 — excludes prefilter_chunks_shared_scan from the migration suite. Test-only; reasonable given the test depends on a specific shared-snapshot planner shape that migration replays don't need to re-validate.

Already approved by @paveltiunov; no new threads from human reviewers to consider. Prior non-blocking observations (items #2#5 from my earlier review — phys.as_ref().unwrap() style, new-router→old-worker serde compat sanity check, low-selectivity heuristic for filter_record_batch, and an assertion that the active prefilter path fires) still stand as optional follow-ups and don't gate this PR.

LGTM.
· branch: cubestore-mem-chunks-filter

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cube store Issues relating to Cube Store rust Pull requests that update Rust code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants