perf: eliminate closure allocations in evaluator hot paths #775

Open
He-Pin wants to merge 2 commits into databricks:master from He-Pin:perf/closure-elimination

@He-Pin He-Pin commented Apr 12, 2026

Motivation

Reduce allocation and dispatch overhead in evaluator hot paths by replacing closure-allocating Scala collection operations with explicit loops where the evaluator already operates on arrays and tight traversal paths.

Key Design Decision

Keep the optimization local and behavior-preserving: do not change evaluation semantics, data model, or public API. The PR replaces targeted .map / .filter / .foreach usage with equivalent while loops and reuses a small argument-evaluation helper for repeated apply paths.
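A shared argument-evaluation helper of the kind described above might look roughly like the following. This is a minimal sketch, not the PR's code: `Expr`, `Lazy`, `Evaluator`, the `forced` counter, and `evalArgsStrict` are hypothetical stand-ins, and only the names `visitAsLazy` and `evalArgsToArray` come from the PR text. It also assumes the tailstrict path forces each argument eagerly.

```scala
object EvalArgsSketch {
  // Hypothetical stand-ins for the evaluator's expression and lazy-value types.
  final case class Expr(value: Int)
  final class Lazy(compute: () => Int) {
    lazy val force: Int = compute()
  }

  final class Evaluator {
    // Counts how many thunks have actually been evaluated (for illustration only).
    var forced = 0

    // The thunk itself still captures `e`; that allocation is inherent to laziness.
    def visitAsLazy(e: Expr): Lazy = new Lazy(() => { forced += 1; e.value * 2 })

    // Shared helper: wraps every argument in a thunk with an index loop,
    // so no function value is passed to Array.map at the call site.
    def evalArgsToArray(args: Array[Expr]): Array[Lazy] = {
      val out = new Array[Lazy](args.length)
      var i = 0
      while (i < args.length) {
        out(i) = visitAsLazy(args(i))
        i += 1
      }
      out
    }

    // Assumed tailstrict variant: same helper, then force each argument in order.
    def evalArgsStrict(args: Array[Expr]): Array[Lazy] = {
      val out = evalArgsToArray(args)
      var i = 0
      while (i < out.length) {
        out(i).force
        i += 1
      }
      out
    }
  }
}
```

Both paths go through the same helper, so the laziness boundary stays in one place: the non-strict path returns unevaluated thunks, and the strict path differs only in the forcing loop.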

Modification

  • visitArr: replace .map(visitAsLazy) with a while loop and retain the empty-array shortcut.
  • visitApply: use evalArgsToArray for tailstrict and non-tailstrict argument evaluation paths.
  • visitExprWithTailCallSupport: reuse evalArgsToArray for tail-position apply handling.
  • visitImportBin: replace byte-to-Eval conversion .map with a while loop.
  • visitComp IfSpec: replace .filter with manual ArrayBuilder traversal.
  • visitMemberList: replace fields.foreach with an explicit loop for object construction.
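The two most common rewrites in the list above, `.map` to a `while` loop with an empty-array shortcut and `.filter` to an `ArrayBuilder` traversal, can be sketched on plain `Int` arrays. The object and method names here are hypothetical, and the element type is simplified; the evaluator's real code works on its own expression and value types.

```scala
import scala.collection.mutable.ArrayBuilder

object LoopRewriteSketch {
  // Shared zero-length result; safe to share because it has no elements to mutate.
  private val emptyInts = new Array[Int](0)

  // Closure-based form: a function value is passed to Array.map.
  def doubledWithMap(xs: Array[Int]): Array[Int] = xs.map(_ * 2)

  // Loop form: preallocate the result and fill it by index.
  def doubledWithLoop(xs: Array[Int]): Array[Int] =
    if (xs.length == 0) emptyInts
    else {
      val out = new Array[Int](xs.length)
      var i = 0
      while (i < xs.length) {
        out(i) = xs(i) * 2
        i += 1
      }
      out
    }

  // Closure-based form: a predicate function value is passed to Array.filter.
  def evensWithFilter(xs: Array[Int]): Array[Int] = xs.filter(_ % 2 == 0)

  // Loop form: the result size is unknown up front, so collect into a builder.
  def evensWithBuilder(xs: Array[Int]): Array[Int] = {
    val b = new ArrayBuilder.ofInt
    var i = 0
    while (i < xs.length) {
      val x = xs(i)
      if (x % 2 == 0) b += x
      i += 1
    }
    b.result()
  }
}
```

In both cases the loop visits elements in the same index order as the collection method, which is what keeps the rewrite behavior-preserving.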

Benchmark Results

Earlier hyperfine measurements for this PR (20 runs, JVM, M4 Max, macOS) showed improvements on selected workloads:

| Benchmark | Before | After | Δ (improvement) |
| --- | --- | --- | --- |
| realistic2 | 418.9ms | 397.5ms | +5.1% |
| bench.02 | 336.5ms | 322.8ms | +4.1% |
| realistic1 | 292.5ms | 287.8ms | +1.6% |
| bench.08 | 250.0ms | 249.4ms | flat |

No benchmark was remeasured in this rebase cycle. The full test suite passed after rebasing; prior benchmark evidence should be rerun under controlled conditions if maintainers need refreshed performance numbers on the rebased commit.

Analysis

The changes remove call sites where a lambda may be allocated on every call, along with the dispatch through Scala collection wrappers, from evaluator hot paths. Traversal order and lazy evaluation boundaries stay the same. Risk is limited because the modifications are localized and mirror the explicit-loop style already used elsewhere in the evaluator and in standard library hot paths.
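One way to check the traversal-order claim for a `foreach` to `while` rewrite is to record the visit order of both forms and compare them. The sketch below uses a hypothetical `Field` type and helper names; it is not taken from the PR or from sjsonnet's test suite.

```scala
import scala.collection.mutable.ArrayBuffer

object TraversalOrderCheck {
  // Hypothetical stand-in for an object member; not sjsonnet's real type.
  final case class Field(name: String)

  // Closure-based form: the lambda captures `seen`.
  def namesWithForeach(fields: Array[Field]): ArrayBuffer[String] = {
    val seen = ArrayBuffer.empty[String]
    fields.foreach(f => seen += f.name)
    seen
  }

  // Loop form: same side effects, applied in the same index order.
  def namesWithLoop(fields: Array[Field]): ArrayBuffer[String] = {
    val seen = ArrayBuffer.empty[String]
    var i = 0
    while (i < fields.length) {
      seen += fields(i).name
      i += 1
    }
    seen
  }
}
```

Comparing the two buffers for a few representative inputs gives a cheap regression check that the loop form has not reordered or skipped any element.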

Result

  • Rebase: completed without conflicts.
  • Formatting: ./mill __.reformat completed and left the PR worktree clean.
  • Tests: ./mill __.test completed successfully.
  • Benchmarks: not rerun in this cycle.

@He-Pin He-Pin marked this pull request as draft April 12, 2026 21:28
He-Pin added 2 commits April 25, 2026 13:25
Replace .map() and .filter() calls with explicit while loops in the
evaluator to eliminate intermediate Array allocations that increase GC
pressure in hot paths.

Changes:
- visitArr: replace .map(visitAsLazy) with while loop
- visitApply: extract evalArgsToArray helper to replace .map calls
- visitExprWithTailCallSupport: reuse evalArgsToArray for tail Apply
- visitImportBin: replace .map with while loop
- visitComp IfSpec: replace .filter with manual filtered array builder

Benchmark results (hyperfine, 20 runs, M4 Max macOS):
  realistic2:  418.9ms -> 397.5ms (+5.1%)
  bench.02:    336.5ms -> 322.8ms (+4.1%)
  realistic1:  292.5ms -> 287.8ms (+1.6%)
  bench.08:    250.0ms -> 249.4ms (flat)

🤖 Generated with [Qoder](https://qoder.com)

Replace .map()/.filter()/.foreach() calls with explicit while loops in
the evaluator to eliminate per-call closure allocations on hot paths.

Note: Array.map already creates the target array directly (no "intermediate"
array). The saved allocation is the closure/lambda passed to these methods,
plus the method call overhead of the Scala collections layer.

Changes:
- visitArr: replace .map(visitAsLazy) with while loop + empty array shortcut
- visitApply: extract evalArgsToArray helper to replace 2x .map calls
- visitExprWithTailCallSupport: reuse evalArgsToArray for tail Apply
- visitImportBin: replace .map with while loop for raw bytes conversion
- visitComp IfSpec: replace .filter with manual ArrayBuilder while loop
- visitMemberList: replace fields.foreach with while loop (per-object closure)

Benchmark results (hyperfine, 20 runs, JVM, M4 Max macOS):
  realistic2:  418.9ms -> 397.5ms (+5.1%)
  bench.02:    336.5ms -> 322.8ms (+4.1%)
  realistic1:  292.5ms -> 287.8ms (+1.6%)
  bench.08:    250.0ms -> 249.4ms (flat)

🤖 Generated with [Qoder](https://qoder.com)

@He-Pin He-Pin force-pushed the perf/closure-elimination branch from 455e39c to c36b4de April 25, 2026 05:32
@He-Pin He-Pin marked this pull request as ready for review April 25, 2026 06:09