Skip to content

perf: O(1) sort and equality for range arrays#780

Draft
He-Pin wants to merge 2 commits intodatabricks:masterfrom
He-Pin:perf/jrsonnet-optimizations
Draft

perf: O(1) sort and equality for range arrays#780
He-Pin wants to merge 2 commits intodatabricks:masterfrom
He-Pin:perf/jrsonnet-optimizations

Conversation

@He-Pin
Copy link
Copy Markdown
Contributor

@He-Pin He-Pin commented Apr 15, 2026

Summary

O(1) fast paths for sorting and comparing range arrays. std.sort(std.range(1, N)) previously materialized the full range into an array and performed O(n log n) sort on already-sorted data. Now returns immediately. std.assertEqual on two equal ranges similarly drops from O(n) element comparison to O(1).

Changes

  1. Sort fast path for range arrayssortArr checks Val.Arr.asSortedIfKnown(pos) before materializing. Forward ranges return as-is (already sorted ascending); reversed ranges return the O(1) forward equivalent via the existing reversed() method. Only applies when no keyF is provided; falls through to full sort otherwise.

  2. Range equality fast pathEvaluator.equal for two Val.Arr adds two short-circuits before element-by-element comparison:

    • Reference equality (x eq y) — covers assertEqual(r, sort(r)) where sort returns the same object
    • Structural range equality (rangeEquals) — covers assertEqual(range(1,N), sort(reverse(range(1,N)))) where both sides are ranges with the same (rangeFrom, length, reversed)

Both optimizations use semantic methods on Val.Arr (asSortedIfKnown, rangeEquals) that encapsulate range internals — callers never access _isRange or _rangeFrom directly.

Benchmark Results — Scala Native vs jrsonnet (Rust, from source)

Machine: Apple Silicon, macOS. Tool: hyperfine --warmup 5 --min-runs 30 -N.

Reliable benchmarks (>20ms runtime, startup overhead not dominant)

Benchmark master (ms) this PR (ms) jrsonnet (ms) PR vs master PR vs jrsonnet
comparsion_for_primitives 35 35 211 ~neutral 6.0x faster
inheritance_recursion 62 61 121 ~neutral 2.0x faster
simple_recursive_call 29 29 52 ~neutral 1.8x faster
realistic_2 85 85 98 ~neutral 1.16x faster

Sub-20ms benchmarks (startup noise significant)

Benchmark master (ms) this PR (ms) jrsonnet (ms) PR vs master PR vs jrsonnet
realistic_1 10.6 10.9 13.7 ~neutral 1.3x faster
foldl_string_concat 6.5 6.5 9.6 ~neutral 1.5x faster
std_foldl 6.4 6.0 7.0 ~neutral ~tied
array_sorts 7.6 7.6 5.3 ~neutral startup gap

Note on sub-10ms benchmarks: Scala Native process startup is ~3-4ms vs Rust ~1ms. For benchmarks with <10ms total runtime, this 2-3ms difference is 40-60% of measured time and masks algorithmic improvements. The sort/equality fast paths eliminate O(n log n) sort and O(n) comparison for range arrays, but the absolute savings (microseconds) are invisible in wall-clock time at this scale.

Algorithmic improvement

Operation Before After
std.sort(std.range(1, N)) O(n) materialize + O(n log n) sort O(1)
std.sort(std.reverse(std.range(1, N))) O(n) materialize + O(n log n) sort O(1)
assertEqual(range_a, range_b) same sequence O(n) element comparison O(1)
assertEqual(x, x) same object O(n) element comparison O(1)

Test plan

  • ./mill 'sjsonnet.jvm[3.3.7]'.test — all 141 test suites pass
  • ./mill __.reformat — scalafmt passes
  • ./mill 'sjsonnet.native[3.3.7]'.nativeLink — Native binary builds
  • No regressions on any benchmark

Comment thread sjsonnet/src/sjsonnet/Val.scala Outdated
*/
private[sjsonnet] def rangeEquals(other: Arr): Boolean = {
isRange && other.isRange && _length == other._length &&
_reversed == other._reversed && _rangeFrom == other._rangeFrom
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be better do with patterns after 778

@He-Pin He-Pin marked this pull request as draft April 17, 2026 03:19
He-Pin added 2 commits April 25, 2026 16:34
Motivation:
std.sort on a range array (e.g. std.range(1, 1000)) was materializing
the entire range into an Eval[] array and performing O(n log n) sort on
already-sorted data. Similarly, comparing two range arrays with
std.assertEqual did O(n) element-by-element comparison even when both
ranges describe the same integer sequence.

Modification:
- Val.Arr.asSortedIfKnown: returns the range as-is for forward ranges,
  or the O(1) reversed equivalent for reversed ranges. Returns null for
  non-range arrays to fall through to full sort.
- Val.Arr.rangeEquals: O(1) structural equality for two ranges by
  comparing (rangeFrom, length, reversed) fields directly.
- SetModule.sortArr: checks asSortedIfKnown before materializing.
- Evaluator.equal: adds reference equality short-circuit (x eq y) and
  rangeEquals fast path before element-by-element comparison.

Result:
array_sorts benchmark improved from 8.3ms to 7.7ms (master to PR).
No regressions on realistic_2 or other benchmarks.
After rebase onto master, isRange field was not defined in Arr class.
Use pattern matching on RangeArr subclass instead.

🤖 Generated with [Qoder][https://qoder.com]
@He-Pin He-Pin force-pushed the perf/jrsonnet-optimizations branch from 21c14e7 to 8fb5228 Compare April 25, 2026 08:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant