Add case-heavy LEFT JOIN benchmark and debug timing/logging for PushDownFilter hot paths by kosiew · Pull Request #20664 · apache/datafusion

kosiew · 2026-03-03T10:10:47Z

Which issue does this PR close?

Part of #20002 (perf: push_down_filter is pathologically slow for some plans).

Rationale for this change

The PushDownFilter optimizer rule exhibits a severe planning-time pathology in the sql_planner_extended benchmark: profiling indicates that it dominates total planning CPU time and repeatedly recomputes expression types.

This PR adds a deterministic, CASE-heavy LEFT JOIN benchmark that reliably reproduces the worst-case behavior, and introduces lightweight debug-only timing and counters inside push_down_filter to make it easier to pinpoint expensive sub-sections (e.g. predicate simplification and join predicate inference) during profiling.

What changes are included in this PR?

Benchmark: add a deterministic CASE-heavy LEFT JOIN workload
- Adds build_case_heavy_left_join_query and helpers to construct a CASE-nested predicate chain over a LEFT JOIN.
- Adds a new benchmark logical_plan_optimize_case_heavy_left_join to stress planning/optimization time.
- Adds an A/B benchmark group push_down_filter_case_heavy_left_join_ab that sweeps predicate counts and CASE depth, comparing:
  - default optimizer with push_down_filter enabled
  - optimizer with push_down_filter removed
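
To illustrate the shape of the workload, here is a minimal std-only sketch of how a CASE-nested predicate chain over a LEFT JOIN could be generated and swept. The function names (nested_case, build_case_heavy_left_join_sql), the tables and columns (t1, t2, a, b), and the sweep values are all hypothetical; this is not the PR's build_case_heavy_left_join_query, whose exact signature is not shown here.

```rust
/// Hypothetical: wrap `col` in `depth` nested CASE expressions.
/// The condition always references `col`, so the string grows linearly with depth.
fn nested_case(col: &str, depth: usize) -> String {
    let mut expr = col.to_string();
    for level in 0..depth {
        expr = format!("CASE WHEN {col} > {level} THEN {expr} ELSE {level} END");
    }
    expr
}

/// Hypothetical: a LEFT JOIN whose WHERE clause is `num_predicates` CASE-heavy
/// conjuncts, alternating between the preserved side (t1) and the
/// null-supplying side (t2) so the filter-pushdown rule has work on both.
fn build_case_heavy_left_join_sql(num_predicates: usize, case_depth: usize) -> String {
    let predicates: Vec<String> = (0..num_predicates)
        .map(|i| {
            let col = if i % 2 == 0 { "t1.a" } else { "t2.b" };
            format!("{} > {}", nested_case(col, case_depth), i)
        })
        .collect();
    // Avoid emitting an empty WHERE clause when no predicates are requested.
    let where_clause = if predicates.is_empty() {
        "TRUE".to_string()
    } else {
        predicates.join(" AND ")
    };
    format!("SELECT t1.a, t2.b FROM t1 LEFT JOIN t2 ON t1.a = t2.a WHERE {where_clause}")
}

fn main() {
    // Hypothetical A/B-style sweep over predicate count and CASE depth.
    for num_predicates in [1usize, 4, 16] {
        for case_depth in [1usize, 4, 8] {
            let sql = build_case_heavy_left_join_sql(num_predicates, case_depth);
            println!("predicates={num_predicates} depth={case_depth} sql_len={}", sql.len());
        }
    }
}
```

In a real benchmark, each generated query would be planned twice, once with the default optimizer and once with push_down_filter removed, so that the timing difference isolates the rule's cost.
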

Optimizer instrumentation (debug-only)
- Adds a small with_debug_timing helper gated by log_enabled!(Debug) to record microsecond timings for specific sections.
- Instruments and logs:
  - time spent in infer_join_predicates
  - time spent in simplify_predicates
  - counts of parent predicates, on_filters, inferred join predicates
  - before/after predicate counts for simplification
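
The timing pattern described above can be sketched with only the standard library. In this hypothetical version, a plain `enabled` flag stands in for the `log_enabled!(Debug)` check and `eprintln!` stands in for debug logging; the helper's signature and the dedup-only `simplify_predicates` are assumptions for illustration, not the PR's code (DataFusion's real predicate simplification does more than remove duplicates).

```rust
use std::collections::HashSet;
use std::time::Instant;

/// Hypothetical stand-in for a `with_debug_timing` helper: runs `f`, and only
/// when `enabled` is true reads the clock and reports elapsed microseconds.
fn with_debug_timing<T>(enabled: bool, label: &str, f: impl FnOnce() -> T) -> (T, Option<u128>) {
    if !enabled {
        // Fast path: no clock reads and no output when debug logging is off.
        return (f(), None);
    }
    let start = Instant::now();
    let out = f();
    let micros = start.elapsed().as_micros();
    eprintln!("{label}: {micros} us");
    (out, Some(micros))
}

/// Hypothetical simplification step: drop duplicate predicates, keeping the first occurrence.
fn simplify_predicates(preds: Vec<String>) -> Vec<String> {
    let mut seen = HashSet::new();
    preds.into_iter().filter(|p| seen.insert(p.clone())).collect()
}

fn main() {
    let debug_enabled = true; // stands in for log_enabled!(Debug)
    let preds = vec!["a > 1".to_string(), "b < 2".to_string(), "a > 1".to_string()];
    let before = preds.len();
    let (simplified, _elapsed) =
        with_debug_timing(debug_enabled, "simplify_predicates", || simplify_predicates(preds));
    if debug_enabled {
        // Before/after counters, mirroring the instrumentation listed above.
        eprintln!("simplify_predicates: {before} -> {} predicates", simplified.len());
    }
}
```

Gating on the flag before calling `Instant::now()` keeps the disabled path to a single branch, which matters in a hot optimizer loop.
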

Are these changes tested?

No new unit/integration tests were added because this PR is focused on benchmarking and debug-only instrumentation rather than changing optimizer semantics.
Coverage is provided by:
- compiling/running the sql_planner_extended benchmark
- validating both benchmark variants (with/without push_down_filter) produce optimized plans without errors
- enabling RUST_LOG=debug to confirm timing sections and counters emit as expected

Are there any user-facing changes?

No user-facing behavior changes.
The optimizer logic is unchanged; only debug logging is added (emits only when RUST_LOG enables Debug for the relevant modules).
Benchmark suite additions only affect developers running benches.

LLM-generated code disclosure

This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.

kosiew added 3 commits March 3, 2026 18:07

feat: add benchmarking for case-heavy left join and improve debug logging in push down filter (054607b)

feat: enhance benchmarking for case-heavy left join with push down filter (b927fe7)

Align benchmark helper setup (8ea18ed)

github-actions bot added optimizer Optimizer rules core Core DataFusion crate labels Mar 3, 2026

cargo fmt (7f6512b)

kosiew wants to merge 4 commits into apache:main from kosiew:push_down-20002
