
perf: promote hot-path evaluator to default; remove dual-evaluator flag #788

Open
He-Pin wants to merge 1 commit into databricks:master from He-Pin:perf/remove-old-evaluator

He-Pin commented Apr 25, 2026

Summary

Lifts the NewEvaluator hot-path dispatch (introduced in #785 behind --new-evaluator) into the default Evaluator, then removes the now-redundant subclass, Settings.useNewEvaluator, and the --new-evaluator CLI flag. Maintaining two evaluators doubled the surface for every future optimization and left the slow path as the user-facing default.

Benchmarks

JMH (EvaluatorBenchmark over 18 files; 1 fork, 5 warmup iterations × 3 s, 10 measurement iterations × 3 s; the single 1-fork outlier, bench.06, was re-verified with 3 forks, 8 warmup and 15 measurement iterations):

| Outcome | Count | Examples |
|---|---|---|
| ✓ Strictly better (CIs do not overlap) | 7 | bench.01 -28.8%, bench.02 -27.2%, bench.03 -37.6%, gen_big_object -18.6% |
| = Within noise (CIs overlap) | 11 | realistic2 +1.3%, comparison2 -1.4%, ... |
| ✗ Strictly worse | 0 | The initial 1-fork run flagged bench.06 at +4.0%; the 3-fork rerun measured 0.186 ± 0.001 vs 0.186 ± 0.003 ms, a tie, so the gap was a JIT compilation artifact. |
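The three outcome buckets above reduce to an interval-overlap test. A minimal Scala sketch, assuming each JMH result is summarized as a mean score plus the ± error half-width JMH prints, and that scores are average time per operation (lower is better). `BenchVerdict`, `Score` and `classify` are hypothetical names, not part of sjsonnet or JMH.

```scala
object BenchVerdict {
  // Hypothetical summary of one JMH result: mean score and the half-width of
  // its error interval (the "± error" column). Assumes lower is better (time/op).
  final case class Score(mean: Double, error: Double) {
    def lo: Double = mean - error
    def hi: Double = mean + error
  }

  sealed trait Verdict
  case object StrictlyBetter extends Verdict // candidate interval entirely below the baseline's
  case object WithinNoise extends Verdict    // the two intervals overlap
  case object StrictlyWorse extends Verdict  // candidate interval entirely above the baseline's

  // Compares the new evaluator (candidate) against the old one (baseline).
  def classify(baseline: Score, candidate: Score): Verdict =
    if (candidate.hi < baseline.lo) StrictlyBetter
    else if (candidate.lo > baseline.hi) StrictlyWorse
    else WithinNoise
}
```

Applied to the bench.06 rerun, `classify(Score(0.186, 0.001), Score(0.186, 0.003))` yields `WithinNoise`. Requiring non-overlapping intervals is a conservative criterion: any overlap is reported as noise rather than as a win or a loss.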

Geometric mean: -7.27%.
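The PR does not spell out how the geometric mean is computed. Assuming it is taken over the per-benchmark score ratios (new / old) and reported back as a relative change, the calculation looks like this sketch; `GeoMean.geomeanChange` is a hypothetical helper, and the checks use made-up inputs rather than the PR's per-benchmark data.

```scala
object GeoMean {
  // Each entry is a relative change in score, e.g. -0.288 for "-28.8%".
  // Returns the geometric-mean change in the same form.
  def geomeanChange(changes: Seq[Double]): Double = {
    require(changes.nonEmpty, "need at least one benchmark")
    require(changes.forall(_ > -1.0), "a change of -100% or less has no valid ratio")
    // Averaging the logs of the ratios equals taking the n-th root of their
    // product, without overflow or underflow when many ratios are multiplied.
    val meanLog = changes.map(c => math.log(1.0 + c)).sum / changes.size
    math.exp(meanLog) - 1.0
  }
}
```

A geometric mean is the usual choice for ratios because a 2x speedup and a 2x slowdown cancel out: `geomeanChange(Seq(-0.5, 1.0))` is 0, whereas the arithmetic mean of -50% and +100% would be +25%.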

Changes

  • Evaluator.scala: visitExpr is now the 7-type instanceof hot path (covering ~96.1% of dispatches) plus a tag-based tableswitch in visitExprCold. Profiler instrumentation is preserved in the hot path; NewEvaluator previously dropped it, so --profile --new-evaluator was silently broken.
  • Settings.scala / Config.scala / SjsonnetMainBase.scala — drop the flag plumbing.
  • Interpreter.createEvaluator — always returns the unified Evaluator.
  • TestUtils / EvaluatorTests / AggressiveStaticOptimizationTests — drop the useNewEvaluator parameter; EvaluatorTests no longer runs every assertion twice.
  • bench/EvaluatorBenchmark.scala — deleted; ongoing perf coverage stays in MainBenchmark and RegressionBenchmark.
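The shape of the unified dispatch can be sketched as follows. This is a minimal, self-contained Scala illustration under stated assumptions, not the sjsonnet code: the five-node `Expr` hierarchy, its integer `tag`s, the `Profiler` hook and `SketchEvaluator` are all hypothetical, and only two node types sit on the hot path instead of seven. The hot path is a type-test match (compiled to `instanceof` checks), everything else falls through to `visitExprCold`, which switches on the tag, and the profiler hook runs before either path so hot nodes are not skipped.

```scala
import scala.annotation.switch

// Hypothetical, heavily simplified AST; each node carries a small integer tag.
sealed abstract class Expr(val tag: Int)
final case class Num(v: Double) extends Expr(0)
final case class Add(l: Expr, r: Expr) extends Expr(1)
final case class Mul(l: Expr, r: Expr) extends Expr(2)
final case class Neg(e: Expr) extends Expr(3)
final case class IfZero(cond: Expr, thenE: Expr, elseE: Expr) extends Expr(4)

// Hypothetical stand-in for profiler instrumentation: counts visited nodes.
final class Profiler {
  var visits: Long = 0L
  def enter(e: Expr): Unit = visits += 1
}

// `profiler` is null when profiling is disabled (a common hot-path idiom).
final class SketchEvaluator(profiler: Profiler) {
  def visitExpr(e: Expr): Double = {
    // Instrumentation runs before dispatch, so hot-path nodes are counted too.
    if (profiler != null) profiler.enter(e)
    e match {
      // Hot path: type tests for the most frequent node types.
      case n: Num => n.v
      case a: Add => visitExpr(a.l) + visitExpr(a.r)
      case _      => visitExprCold(e)
    }
  }

  // Cold path: a dense integer switch on the tag (a tableswitch candidate).
  private def visitExprCold(e: Expr): Double = (e.tag: @switch) match {
    case 2 =>
      val m = e.asInstanceOf[Mul]
      visitExpr(m.l) * visitExpr(m.r)
    case 3 => -visitExpr(e.asInstanceOf[Neg].e)
    case 4 =>
      val i = e.asInstanceOf[IfZero]
      if (visitExpr(i.cond) == 0.0) visitExpr(i.thenE) else visitExpr(i.elseE)
    case t => throw new IllegalStateException(s"unexpected tag $t")
  }
}
```

With a `Profiler` attached, evaluating `Add(Num(1), Mul(Num(2), Num(3)))` returns 7.0 and records 5 visits, one per node, whichever path each node took. Ordering the type tests by observed frequency is what makes the split pay off; the PR reports that the seven hot types cover ~96.1% of dispatches.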

Net diff: 9 files, +270 / -530.

Breaking change

Settings.useNewEvaluator and the --new-evaluator CLI flag are removed. Embedders that explicitly set the flag should drop it; the resulting behavior is the same as the (now-default) hot-path dispatch.

Test plan

  • ./mill 'sjsonnet.jvm[3.3.7]'.compile
  • ./mill 'sjsonnet.js[3.3.7]'.compile
  • ./mill 'sjsonnet.native[3.3.7]'.compile
  • ./mill bench.compile
  • ./mill 'sjsonnet.jvm[3.3.7]'.test — 12 suites, 0 failures
  • ./mill __.checkFormat — clean

Motivation:
The hybrid instanceof + tableswitch dispatch introduced as NewEvaluator (databricks#785)
was opt-in via --new-evaluator / Settings.useNewEvaluator. Maintaining two
evaluators doubles the surface for every future optimization, splits test
runs (each EvaluatorTests case ran twice), and leaves the slow path as the
default for users who don't know to flip the flag.

JMH on the existing 18 EvaluatorBenchmark files (1 fork x 5 warmup x 3s +
10 measure x 3s; bench.06 re-verified with 3 forks x 8 warmup + 15 measure)
shows the new dispatch is statistically equal-or-better everywhere, with
geomean -7.27% and up to -37.6% on bench.03. The single 1-fork outlier
(bench.06, +4.03%) came out as a tie on a 3-fork rerun (0.186 vs 0.186 ms).

Modification:
- Lift NewEvaluator.visitExpr (7-type instanceof hot path + tag-switch
  cold path) into base Evaluator. Profiler instrumentation is preserved
  in the hot path -- previously NewEvaluator silently dropped it.
- Delete NewEvaluator class.
- Remove Settings.useNewEvaluator and the --new-evaluator CLI flag
  (breaking change for any embedder that set the flag explicitly; those
  embedders keep the same hot-path behavior once they drop the flag, and
  everyone who never set it moves off the slower default path).
- Collapse EvaluatorTests.allTests(useNewEvaluator) into a single
  tests block (no more 2x prefixed run).
- Remove the now-redundant EvaluatorBenchmark A/B harness; ongoing
  perf is covered by MainBenchmark and RegressionBenchmark.

Result:
One evaluator implementation. All 12 JVM Scala 3 test suites pass with
0 failures. -260 net lines of code.