Add case-heavy LEFT JOIN benchmark and debug timing/logging for PushDownFilter hot paths #20664

Draft
kosiew wants to merge 4 commits into apache:main from kosiew:push_down-20002

@kosiew kosiew commented Mar 3, 2026

Which issue does this PR close?

Rationale for this change

The PushDownFilter optimizer rule shows a severe planner-time performance pathology in the sql_planner_extended benchmark, where profiling indicates it dominates total planning CPU time and repeatedly recomputes expression types.

This PR adds a deterministic, CASE-heavy LEFT JOIN benchmark that reliably reproduces the worst-case behavior. It also introduces lightweight, debug-only timing and counters inside push_down_filter, making it easier to pinpoint expensive sub-sections (e.g. predicate simplification and join predicate inference) during profiling.

What changes are included in this PR?

  • Benchmark: add a deterministic CASE-heavy LEFT JOIN workload

    • Adds build_case_heavy_left_join_query and helpers to construct a CASE-nested predicate chain over a LEFT JOIN.

    • Adds a new benchmark logical_plan_optimize_case_heavy_left_join to stress planning/optimization time.

    • Adds an A/B benchmark group push_down_filter_case_heavy_left_join_ab that sweeps predicate counts and CASE depth, comparing:

      • default optimizer with push_down_filter enabled
      • optimizer with push_down_filter removed
  • Optimizer instrumentation (debug-only)

    • Adds a small with_debug_timing helper gated by log_enabled!(Debug) to record microsecond timings for specific sections.

    • Instruments and logs:

      • time spent in infer_join_predicates
      • time spent in simplify_predicates
      • counts of parent predicates, on_filters, inferred join predicates
      • before/after predicate counts for simplification
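
As a rough illustration of the benchmark item above, a deterministic CASE-heavy LEFT JOIN query can be generated from two knobs: the number of predicates and the CASE nesting depth. The sketch below is not the PR's build_case_heavy_left_join_query; the function names nested_case and build_query, the tables l and r, and the columns id and c0..c3 are all hypothetical and chosen only for illustration.

```rust
/// Wraps `column` in `depth` levels of nested CASE expressions.
/// Hypothetical helper: the PR's actual helpers may build a different shape.
fn nested_case(column: &str, depth: usize) -> String {
    // Start from the bare column; each level wraps the previous expression.
    let mut expr = column.to_string();
    for level in 0..depth {
        expr = format!(
            "CASE WHEN {column} > {level} THEN {expr} ELSE {column} + {level} END"
        );
    }
    expr
}

/// Builds a LEFT JOIN query whose WHERE clause is a chain of
/// `predicate_count` CASE-nested predicates joined by AND.
/// Table and column names are assumptions made for this sketch.
fn build_query(predicate_count: usize, case_depth: usize) -> String {
    let predicates: Vec<String> = (0..predicate_count)
        .map(|i| {
            // Cycle over four illustrative columns so the output is deterministic.
            let column = format!("l.c{}", i % 4);
            format!("{} > {i}", nested_case(&column, case_depth))
        })
        .collect();

    let mut sql = String::from("SELECT l.id FROM l LEFT JOIN r ON l.id = r.id");
    if !predicates.is_empty() {
        sql.push_str(" WHERE ");
        sql.push_str(&predicates.join(" AND "));
    }
    sql
}
```

Because the generator uses no randomness, the same (predicate_count, case_depth) pair always yields the same SQL text, which is what makes a sweep over both parameters comparable between the two optimizer configurations.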

Are these changes tested?

  • No new unit/integration tests were added because this PR is focused on benchmarking and debug-only instrumentation rather than changing optimizer semantics.

  • Coverage is provided by:

    • compiling/running the sql_planner_extended benchmark
    • validating both benchmark variants (with/without push_down_filter) produce optimized plans without errors
    • enabling RUST_LOG=debug to confirm timing sections and counters emit as expected

Are there any user-facing changes?

  • No user-facing behavior changes.
  • The optimizer logic is unchanged; only debug logging is added (emits only when RUST_LOG enables Debug for the relevant modules).
  • Benchmark suite additions only affect developers running benches.
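
The "emits only when Debug is enabled" behavior can be sketched as a small gated timing wrapper. This is a minimal sketch, not the PR's with_debug_timing helper: the name timed_if_enabled and its signature are hypothetical, and a plain bool stands in for the log_enabled!(Debug) check so that the example depends only on the standard library.

```rust
use std::time::Instant;

/// Runs `f` and returns its result. When `enabled` is true, also measures
/// the call and returns the elapsed time in microseconds.
/// `enabled` is a stand-in for a `log_enabled!(Debug)` check (assumption).
fn timed_if_enabled<T>(
    enabled: bool,
    label: &str,
    f: impl FnOnce() -> T,
) -> (T, Option<u128>) {
    if !enabled {
        // Fast path: no clock reads and no output when debug is off.
        return (f(), None);
    }
    let start = Instant::now();
    let out = f();
    let micros = start.elapsed().as_micros();
    // A real implementation would presumably use a logging macro here.
    eprintln!("{label}: {micros} us");
    (out, Some(micros))
}
```

With the gate closed, the wrapper adds only a branch around the wrapped call, which is consistent with the claim that optimizer behavior is unchanged for users who do not enable debug logging.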

LLM-generated code disclosure

This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.

github-actions bot added the optimizer (Optimizer rules) and core (Core DataFusion crate) labels on Mar 3, 2026