perf: change Expr from trait to abstract class#786

Draft
He-Pin wants to merge 1 commit into databricks:master from He-Pin:perf/expr-abstract-class-pr

He-Pin commented Apr 18, 2026

perf: change Expr from trait to abstract class

Motivation

On the JVM, calling a method through an interface reference (invokeinterface) requires searching the interface method table (itable), whereas calling through a class reference (invokevirtual) uses a fixed-offset vtable lookup, which is a single indexed memory access. When the JIT cannot inline or devirtualize the call site, the estimated cost difference is roughly 2-5ns per call.

In the evaluator hot path:

  • e.tag (NewEvaluator's tableswitch dispatch)
  • e.pos (position tracking)
  • e.exprErrorString (error reporting)

These are called on Expr-typed references millions of times per evaluation. This adds up to measurable CPU overhead.
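As a minimal sketch of the distinction (the names ExprT, ExprC, NumT, and NumC are hypothetical and not taken from this PR), the same accessor can be declared once on a trait and once on an abstract class. scalac compiles the call in tagViaTrait to invokeinterface and the call in tagViaClass to invokevirtual, which can be checked by running javap -c on the compiled classes.

```scala
object DispatchSketch {
  // Hypothetical stand-ins: the same member declared on a trait and on a class.
  trait ExprT { def tag: Int }
  abstract class ExprC { def tag: Int }

  final class NumT(val value: Double) extends ExprT { def tag: Int = 1 }
  final class NumC(val value: Double) extends ExprC { def tag: Int = 1 }

  // Receiver is typed as a trait (a JVM interface): compiled to invokeinterface.
  def tagViaTrait(e: ExprT): Int = e.tag

  // Receiver is typed as a class: compiled to invokevirtual.
  def tagViaClass(e: ExprC): Int = e.tag
}
```

Both methods return the same value; only the call instruction in the bytecode differs.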

Additionally, instanceof checks during pattern matching (e.g., case e: Val) are faster on class hierarchies than interface hierarchies, because the JIT can use Class Hierarchy Analysis (CHA) to generate more aggressive code.

Key Design Decision

Convert Expr from trait to abstract class:

  • Trait → Abstract class moves all Expr calls from invokeinterface (O(n) itable search) to invokevirtual (O(1) vtable)
  • Val extends Expr (class hierarchy, single inheritance)
  • Eval remains trait (necessary for Val extends Expr with Eval)
  • TailCall.pos becomes var (required by abstract class constructor)
  • Pattern matching case e: Val now checks class hierarchy (faster) instead of mixed class+interface
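The resulting shape of the hierarchy can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: Position, Num, the member bodies, and the tag values are hypothetical, and TailCall is assumed to extend Val.

```scala
object HierarchySketch {
  // Hypothetical stand-in for the real source-position type.
  final case class Position(offset: Int)

  // Previously a trait; as an abstract class its members are reached via invokevirtual.
  abstract class Expr {
    def pos: Position
    def tag: Int
    def exprErrorString: String = "expr" // placeholder default
  }

  // Stays a trait so that Val can extend Expr and still mix Eval in.
  trait Eval {
    def value: Val
  }

  abstract class Val extends Expr with Eval {
    def value: Val = this
  }

  // No 'with Expr' needed: Expr is inherited through Val.
  final class Num(val pos: Position, val n: Double) extends Val {
    def tag: Int = 1
  }

  // pos is a var here, mirroring the TailCall change in the list above.
  final class TailCall(var pos: Position) extends Val {
    def tag: Int = 2
    override def exprErrorString: String = "tailcall"
  }
}
```

Because a class can have only one superclass, any type that previously mixed Expr in alongside another class would no longer compile, which is why Eval has to stay a trait in this layout.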

Why not alternatives?

  • Keep trait + store tag as field: Saves ~1ns/call but requires adding constructor parameter to 200+ Expr case classes — massive refactor for <1% gain
  • Move Eval to class: Would break Val extends Expr with Eval (single inheritance in JVM)
  • Other traits stay unchanged: TailstrictableExpr, CompSpec, ObjBody are not on dispatch hot path and benefit less from class hierarchy

Modifications

| File | Change |
| --- | --- |
| Expr.scala | trait Expr → abstract class Expr |
| Val.scala | Val extends Eval → Val extends Expr with Eval |
| Val.scala | Literal extends Val with Expr → Literal extends Val (redundant) |
| Val.scala | Func extends Val with Expr → Func extends Val (redundant) |
| Val.scala | TailCall.pos: def → var (required by abstract class) |
| Val.scala | TailCall.exprErrorString: add override (now overrides Expr's default) |
| StaticOptimizer.scala | Val with Expr patterns → Val (Val is always Expr now) |
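The StaticOptimizer simplification might look like the sketch below. The types Str and Ident and the method isAlreadyEvaluated are hypothetical and only illustrate the pattern; they are not the optimizer's actual code.

```scala
object PatternSketch {
  abstract class Expr
  trait Eval
  abstract class Val extends Expr with Eval

  // Hypothetical node types used only for this illustration.
  final class Str(val s: String) extends Val
  final class Ident(val name: String) extends Expr

  // With Val extending Expr, the compound pattern `Val with Expr` carries no
  // extra information, so a plain `Val` type test is enough.
  def isAlreadyEvaluated(e: Expr): Boolean = e match {
    case _: Val => true
    case _      => false
  }
}
```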

Benchmark Results

JMH Benchmark Results (JDK 21.0.10, Zulu, aarch64, -f 1 -wi 8 -i 8):

Evaluator-only (no parse/materialize)

| Benchmark | trait (ms/op) | abstract class (ms/op) | Δ |
| --- | --- | --- | --- |
| oldEvaluator bench.02 | 34.409 ± 0.189 | 33.155 ± 0.337 | -3.6% |
| oldEvaluator reverse | 8.891 ± 0.180 | 8.291 ± 0.287 | -6.7% |
| oldEvaluator comparison2 | 19.027 ± 1.172 | 18.477 ± 0.861 | -2.9% ✅ |
| newEvaluator bench.02 | 25.984 ± 1.362 | 25.439 ± 0.471 | -2.1% ✅ |
| newEvaluator reverse | 8.766 ± 0.252 | 8.544 ± 0.042 | -2.5% ✅ |

Full pipeline (parse + eval + materialize)

| Benchmark | trait (ms/op) | abstract class (ms/op) | Δ |
| --- | --- | --- | --- |
| comparison2 | 19.785 ± 2.752 | 17.535 ± 0.450 | -11.4% |
| reverse | 6.844 ± 0.216 | 6.645 ± 0.320 | -2.9% ✅ |

Other

| Benchmark | trait (ms/op) | abstract class (ms/op) | Δ |
| --- | --- | --- | --- |
| Optimizer | 0.545 ± 0.033 | 0.523 ± 0.020 | -4.0% ✅ |
| Parser | 1.584 ± 0.164 | 1.511 ± 0.062 | -4.6% ✅ |

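Every JVM benchmark moves in the same direction, although several deltas (for example oldEvaluator comparison2 and newEvaluator bench.02) are within the reported error. Among the evaluator-only runs, oldEvaluator (instanceof-based dispatch) benefits most (-6.7% on reverse); the -11.4% on full-pipeline comparison2 comes with a wide baseline error (± 2.752).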
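For readers who want to re-derive the Δ column, the sketch below computes the relative change in mean ms/op and checks whether the two "mean ± error" ranges overlap. The Score type and both helper names are hypothetical; the numbers in the usage note are copied from the tables above.

```scala
object BenchDelta {
  // One "mean ± error" cell from a benchmark table (ms/op).
  final case class Score(mean: Double, err: Double) {
    def lo: Double = mean - err
    def hi: Double = mean + err
  }

  // Relative change of the mean, in percent; negative means faster.
  def deltaPercent(before: Score, after: Score): Double =
    (after.mean - before.mean) / before.mean * 100.0

  // True when the two error ranges intersect, i.e. the difference may be noise.
  def rangesOverlap(a: Score, b: Score): Boolean =
    a.lo <= b.hi && b.lo <= a.hi
}
```

For oldEvaluator reverse, deltaPercent(Score(8.891, 0.180), Score(8.291, 0.287)) is about -6.7 and the ranges do not overlap. For oldEvaluator comparison2, Score(19.027, 1.172) and Score(18.477, 0.861) do overlap, so that row on its own is weaker evidence.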

Scala Native

| Benchmark | trait (ms) | abstract class (ms) | Δ |
| --- | --- | --- | --- |
| bench.02 | 62.1 ± 1.5 | 62.7 ± 1.9 | +1.0% (noise) |
| comparison2 | 35.8 ± 2.0 | 36.0 ± 1.8 | +0.6% (noise) |

Native shows no change (expected — Scala Native fully devirtualizes both, so trait vs class is irrelevant).

Analysis

Why does this help?

  1. instanceof checks are faster: in HotSpot, a subtype check against a class is O(1) (a single comparison at a fixed depth in the superclass chain), while a check against an interface can fall back to an O(n) scan of the interfaces the type implements
  2. JIT optimizations: CHA produces stronger type constraints for pure class hierarchies, enabling more aggressive inlining and devirtualization
  3. Hot path coverage: Evaluator's e.tag, e.pos, e.exprErrorString calls happen millions of times, so per-call savings accumulate

Why is the improvement larger in oldEvaluator than in newEvaluator?

  • oldEvaluator uses pattern matching (match { case e: Val => ... }) which compiles to instanceof checks — benefits directly from CHA improvement
  • newEvaluator dispatches with a tableswitch on e.tag, which already bypasses instanceof chains, so its gain comes mainly from e.tag being called through the vtable instead of the itable (~2-5ns per call)
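The two dispatch styles can be contrasted in a small sketch. The node types, tag values, and method names below are hypothetical and much simpler than the real evaluators; the point is only that evalByType performs a chain of type tests, while evalByTag makes one virtual call to tag and then switches on the resulting integer.

```scala
import scala.annotation.switch

object EvalSketch {
  abstract class Expr { def tag: Int }
  final class Num(val n: Double) extends Expr { def tag: Int = 0 }
  final class Neg(val inner: Expr) extends Expr { def tag: Int = 1 }
  final class Add(val l: Expr, val r: Expr) extends Expr { def tag: Int = 2 }

  // oldEvaluator style: each case is an instanceof check, tried in order.
  def evalByType(e: Expr): Double = e match {
    case n: Num => n.n
    case n: Neg => -evalByType(n.inner)
    case a: Add => evalByType(a.l) + evalByType(a.r)
  }

  // newEvaluator style: one call to e.tag, then an integer switch.
  // @switch asks the compiler to warn if it cannot emit a JVM switch.
  def evalByTag(e: Expr): Double = (e.tag: @switch) match {
    case 0 => e.asInstanceOf[Num].n
    case 1 => -evalByTag(e.asInstanceOf[Neg].inner)
    case 2 =>
      val a = e.asInstanceOf[Add]
      evalByTag(a.l) + evalByTag(a.r)
    case _ => throw new IllegalArgumentException("unknown tag " + e.tag)
  }
}
```

In the tag-based version the only call made on the Expr-typed receiver is e.tag, so the trait versus abstract class choice affects just that one call per node.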

No regression on Scala Native: Native backend fully devirtualizes both trait and abstract class implementations, so the JVM vtable/itable distinction doesn't apply.

Safety

  • No semantic changes: Expr behavior is identical, just faster dispatch
  • All tests pass: Full test suite (./mill __.test) passes on JVM/Native/JS
  • Backward compatible: No public API changes (Expr is internal)

References

  • Netty uses ByteBuf as abstract class for identical reasons
  • Java best practice: Prefer abstract class when there's no multiple inheritance (faster dispatch)
  • JMH profiling guides: https://www.baeldung.com/java-jmh-benchmark

Result

Rebased to latest master and verified. Abstract class Expr provides consistent 2-11% speedup on JVM, particularly for instanceof-heavy code paths. No regressions on Native. Ready for merge.


He-Pin commented Apr 18, 2026

In Netty, ByteBuf is an abstract class too.

He-Pin force-pushed the perf/expr-abstract-class-pr branch from 2c7c1ac to a51477b on April 18, 2026 23:11
He-Pin marked this pull request as draft on April 20, 2026 04:06

He-Pin commented Apr 20, 2026

The README needs to be updated too.

Motivation:
trait methods use invokeinterface (~5-8ns, itable search) while abstract
class methods use invokevirtual (~2-3ns, vtable). The e.tag dispatch in
the evaluator hot path benefits from this.

Modification:
- Expr: trait → abstract class
- Val: extends Expr directly (Val already extends Eval)
- Literal: remove redundant 'with Expr' (inherited via Val)
- Func: remove redundant 'with Expr' (inherited via Val)
- TailCall: change def pos to var pos (required by abstract class Expr)
- StaticOptimizer: simplify 'Val with Expr' casts to 'Val'

Result:
e.tag calls use invokevirtual instead of invokeinterface. All tests pass.
He-Pin force-pushed the perf/expr-abstract-class-pr branch from a51477b to dcd1702 on April 25, 2026 08:37