perf: eliminate closure allocations in evaluator hot paths#775
Open
He-Pin wants to merge 2 commits intodatabricks:masterfrom
Open
perf: eliminate closure allocations in evaluator hot paths#775He-Pin wants to merge 2 commits intodatabricks:masterfrom
He-Pin wants to merge 2 commits intodatabricks:masterfrom
Conversation
Replace .map() and .filter() calls with explicit while loops in the evaluator to eliminate intermediate Array allocations that increase GC pressure in hot paths. Changes: - visitArr: replace .map(visitAsLazy) with while loop - visitApply: extract evalArgsToArray helper to replace .map calls - visitExprWithTailCallSupport: reuse evalArgsToArray for tail Apply - visitImportBin: replace .map with while loop - visitComp IfSpec: replace .filter with manual filtered array builder Benchmark results (hyperfine, 20 runs, M4 Max macOS): realistic2: 418.9ms -> 397.5ms (+5.1%) bench.02: 336.5ms -> 322.8ms (+4.1%) realistic1: 292.5ms -> 287.8ms (+1.6%) bench.08: 250.0ms -> 249.4ms (flat) 🤖 Generated with [Qoder][https://qoder.com]
Replace .map()/.filter()/.foreach() calls with explicit while loops in the evaluator to eliminate per-call closure allocations on hot paths. Note: Array.map already creates the target array directly (no "intermediate" array). The saved allocation is the closure/lambda passed to these methods, plus the method call overhead of the Scala collections layer. Changes: - visitArr: replace .map(visitAsLazy) with while loop + empty array shortcut - visitApply: extract evalArgsToArray helper to replace 2x .map calls - visitExprWithTailCallSupport: reuse evalArgsToArray for tail Apply - visitImportBin: replace .map with while loop for raw bytes conversion - visitComp IfSpec: replace .filter with manual ArrayBuilder while loop - visitMemberList: replace fields.foreach with while loop (per-object closure) Benchmark results (hyperfine, 20 runs, JVM, M4 Max macOS): realistic2: 418.9ms -> 397.5ms (+5.1%) bench.02: 336.5ms -> 322.8ms (+4.1%) realistic1: 292.5ms -> 287.8ms (+1.6%) bench.08: 250.0ms -> 249.4ms (flat) 🤖 Generated with [Qoder][https://qoder.com]
455e39c to
c36b4de
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation
Reduce allocation and dispatch overhead in evaluator hot paths by replacing closure-allocating Scala collection operations with explicit loops where the evaluator already operates on arrays and tight traversal paths.
Key Design Decision
Keep the optimization local and behavior-preserving: do not change evaluation semantics, data model, or public API. The PR replaces targeted
.map/.filter/.foreachusage with equivalentwhileloops and reuses a small argument-evaluation helper for repeated apply paths.Modification
visitArr: replace.map(visitAsLazy)with a while loop and retain the empty-array shortcut.visitApply: useevalArgsToArrayfor tailstrict and non-tailstrict argument evaluation paths.visitExprWithTailCallSupport: reuseevalArgsToArrayfor tail-position apply handling.visitImportBin: replace byte-to-Evalconversion.mapwith a while loop.visitCompIfSpec: replace.filterwith manualArrayBuildertraversal.visitMemberList: replacefields.foreachwith an explicit loop for object construction.Benchmark Results
Prior hyperfine evidence reported in this PR showed improvements on selected JVM workloads:
No benchmark was remeasured in this rebase cycle. The full test suite passed after rebasing; prior benchmark evidence should be rerun under controlled conditions if maintainers need refreshed performance numbers on the rebased commit.
Analysis
The changes remove per-call lambda allocation opportunities and Scala collection wrapper dispatch from evaluator hot paths while preserving the same traversal order and lazy evaluation boundaries. Risk is limited because the modifications are localized and mirror existing explicit-loop style already used elsewhere in the evaluator and standard library hot paths.
References
perf/closure-eliminationupstream/masterin this cycle.Result
./mill __.reformatcompleted and left the PR worktree clean../mill __.testcompleted successfully.