8 changes: 8 additions & 0 deletions .gitignore
@@ -16,3 +16,11 @@ out/
*.rej
*.bak
.bazelbsp/
/jsonnet/
/go-jsonnet/
/jrsonnet/
/rsjsonnet/
/scala-js/
/scala-native/
/fastparse/
/.claude/
8 changes: 8 additions & 0 deletions CLAUDE.md
@@ -103,6 +103,14 @@ JMH benchmarks live in `bench/`. Benchmark suites in `bench/resources/`: `bug_su

For ad-hoc benchmarking, `hyperfine` is available on the PATH.

## Upstream Sync Tracking

When syncing from upstream, process commits in upstream commit order and track every evaluated commit in `sync/upstream-sync-points.tsv`. Keep the TSV machine-friendly: one row per sync point, no comments, factual values only, and leave fields empty rather than guessing.

Use `status=pending|migrated|skipped`. Skipped commits must record one of `reason_code=already-implemented|existing-pr|not-applicable|license-only|superseded` plus short notes. Migrated commits must link the upstream commit/PR, local branch, draft PR, local commit, and report reference so review can trace `tracking row <-> local branch <-> draft PR <-> local commit <-> upstream source`.

Use one dedicated branch per migrated commit (`sync/<order>-<sha7>` or an equivalent PR-specific branch). Commit messages and PR descriptions must include `Upstream-Commit`, `Upstream-PR`, and `Sync-Point` references.
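For illustration, a migrated commit's message could end with trailers like the following (every value here is a placeholder, not a real reference):

```
Port upstream fix for <short description>

Upstream-Commit: <upstream sha>
Upstream-PR: <upstream PR url>
Sync-Point: <upstream_order from sync/upstream-sync-points.tsv>
```

Keeping the trailers machine-readable makes it easy to grep a local commit back to its tracking row.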

## Architecture

### Pipeline
151 changes: 151 additions & 0 deletions docs/performance/benchmark-gaps.md
@@ -0,0 +1,151 @@
# Benchmark Gaps vs jrsonnet

This file tracks only the cases where **sjsonnet is slower than jrsonnet**. Do not record wins, ties, or unmeasured guesses.

## Scope

This document is the local source of truth for newly measured gaps. Historical benchmark tables from upstream or jrsonnet docs are useful context, but only add rows here after a local serial run confirms that sjsonnet still trails jrsonnet.

## Gap summary schema

Use one row per measured benchmark case where the gap is still open.

| field | meaning |
| --- | --- |
| `case` | Stable case name, usually the input file path or suite/id pair |
| `suite_or_fixture` | Benchmark suite, resource path, or fixture name |
| `gap_area` | Short hotspot label such as `parser`, `stdlib`, `imports`, `manifest`, `object`, or `unknown` |
| `jrsonnet_mean_ms` | Mean runtime for jrsonnet in milliseconds |
| `sjsonnet_native_mean_ms` | Mean runtime for Scala Native sjsonnet in milliseconds |
| `sjsonnet_jvm_or_graal_mean_ms` | Mean runtime for JVM or GraalVM sjsonnet when measured |
| `sjsonnet_vs_jrsonnet` | `sjsonnet_native_mean_ms / jrsonnet_mean_ms`; only keep rows where this is `> 1.00` |
| `priority` | `P0`, `P1`, or `P2` based on impact and confidence |
| `evidence` | Link to PR comment, issue, local raw output, or report section |
| `status` | `open`, `remeasure`, or `closed` |
| `next_step` | Smallest actionable follow-up to reduce or explain the gap |

## Open gap table

Populate this table only after a serial benchmark run confirms that sjsonnet still trails jrsonnet.

| case | suite_or_fixture | gap_area | jrsonnet_mean_ms | sjsonnet_native_mean_ms | sjsonnet_jvm_or_graal_mean_ms | sjsonnet_vs_jrsonnet | priority | evidence | status | next_step |
| --- | --- | --- | ---: | ---: | ---: | ---: | --- | --- | --- | --- |

## Serial benchmark workflow

Run the comparison on one machine, one benchmark case at a time.

1. Record the environment before timing:

```bash
java -version
./mill --version
uname -a
```

2. Prepare both executables before timing. Reuse the same built artifacts for every case in the session.
3. Pin the exact commands in shell variables so the comparison is reproducible:

```bash
SJSONNET_CMD='<path-to-sjsonnet-command>'
JRSONNET_CMD='<path-to-jrsonnet-command>'
INPUT='<path-to-benchmark.jsonnet>'
```

4. Time the two commands serially with identical output handling:

```bash
hyperfine --warmup 3 --runs 10 \
"$SJSONNET_CMD $INPUT >/dev/null" \
"$JRSONNET_CMD $INPUT >/dev/null"
```

5. Record the mean, min, max, and stddev as PR evidence, then copy the mean runtimes into the gap table and compute `sjsonnet_vs_jrsonnet`.
6. If `sjsonnet_vs_jrsonnet <= 1.00`, do not add a gap row.
7. Re-run only after a code change, toolchain change, or benchmark-environment change; otherwise update `status` instead of duplicating rows.

Do not run multiple Mill, JMH, or hyperfine jobs in parallel. Mill holds a file lock, and concurrent benchmark jobs distort each other's measurements.
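When collecting the means for the gap table, hyperfine's `--export-json` flag can make step 5 less error-prone. A minimal sketch (the `gap_ratio` helper is illustrative, not part of the repo; it assumes the sjsonnet command was passed to hyperfine first, as in the invocation above):

```python
import json

def gap_ratio(export_path):
    """Compute sjsonnet_vs_jrsonnet from a hyperfine --export-json file.

    Assumes the sjsonnet command was listed first and the jrsonnet
    command second in the hyperfine invocation.
    """
    with open(export_path) as f:
        results = json.load(f)["results"]
    sjsonnet_mean = results[0]["mean"]
    jrsonnet_mean = results[1]["mean"]
    # Only ratios > 1.00 belong in the gap table (rule 6 above).
    return round(sjsonnet_mean / jrsonnet_mean, 2)
```

Run hyperfine with `--export-json results.json` appended to the command in step 4, then call `gap_ratio("results.json")`.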

## JMH before/after schema

Use this table for JVM or microbenchmark changes.

| benchmark | param_or_fixture | metric | unit | before_mean | before_error | after_mean | after_error | delta_pct | speedup | winner | notes |
| --- | --- | --- | --- | ---: | ---: | ---: | ---: | ---: | ---: | --- | --- |
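The `delta_pct` and `speedup` columns follow the usual convention for time-based JMH metrics (ms/op, lower is better); a hypothetical helper sketching the arithmetic:

```python
def delta_pct(before_mean, after_mean):
    """Percent change relative to the 'before' mean.

    Negative values are improvements for time-based metrics (ms/op);
    for throughput metrics (ops/s) the sign flips.
    """
    return (after_mean - before_mean) / before_mean * 100.0

def speedup(before_mean, after_mean):
    """before/after ratio for time-based metrics; > 1.0 means faster."""
    return before_mean / after_mean
```

For example, a benchmark that drops from 200 ms/op to 150 ms/op records `delta_pct = -25.0` and `speedup ≈ 1.33`.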

## Scala Native hyperfine before/after schema

Use this table for executable-level comparisons.

| case | command | before_mean | after_mean | min | max | stddev | delta_pct | correctness_check | notes |
| --- | --- | ---: | ---: | ---: | ---: | ---: | ---: | --- | --- |

## jrsonnet comparison schema

Use this table whenever the PR claims to close or narrow a gap against jrsonnet.

| case | method | jrsonnet_mean | sjsonnet_mean | sjsonnet_vs_jrsonnet | target_gap | status |
| --- | --- | ---: | ---: | ---: | ---: | --- |

## PR body template

Paste this into PR descriptions when the PR changes performance-sensitive code. Keep the sections even when a section says `not measured` so PRs stay comparable.

```md
## Motivation

Explain the performance gap or compatibility issue being targeted.

## Key Design Decision

Explain the core implementation decision, why it is semantics-preserving, and why it should help this gap.

## Modification

- Change 1
- Change 2
- Benchmark fixture(s) affected

## Benchmark Results

### Environment

| field | value |
| --- | --- |
| machine | |
| os | |
| java/runtime | |
| sjsonnet command | |
| jrsonnet command | |

### JMH before / after

| benchmark | param | unit | before | after | delta_pct | speedup |
| --- | --- | --- | ---: | ---: | ---: | ---: |

### Scala Native hyperfine before / after

| case | before_mean | after_mean | min | max | stddev | delta_pct |
| --- | ---: | ---: | ---: | ---: | ---: | ---: |

### jrsonnet comparison

| case | jrsonnet_mean | sjsonnet_mean | sjsonnet_vs_jrsonnet | status |
| --- | ---: | ---: | ---: | --- |

## Analysis

Interpret wins, regressions, measurement noise, and remaining gaps.

## References

- Upstream commit or PR:
- Benchmark raw output:
- Related tracking row:

## Result

State whether this closes, narrows, or only measures the gap.
```

If no remaining gap is observed, replace the table row with: `No measured case in this PR leaves sjsonnet slower than jrsonnet.`
56 changes: 56 additions & 0 deletions sjsonnet/src/sjsonnet/Val.scala
@@ -480,8 +480,18 @@ object Val {
private var _isRange: Boolean = false
private var _rangeFrom: Int = 0

// Lazy repeat state. When _isRepeated is true, this array represents
// the repetition of a base array N times. Inspired by jrsonnet's RepeatedArray (arr/spec.rs).
// _repeatedBase holds the underlying array; _repeatedCount is the repetition count.
// Total length = base.length * _repeatedCount, computed lazily in _length.
// Element access uses modulo: value(i) = base.value(i % base.length)
private var _isRepeated: Boolean = false
private var _repeatedBase: Arr = _
private var _repeatedCount: Int = 0

@inline private def isConcatView: Boolean = _concatLeft ne null
@inline private def isRange: Boolean = _isRange
@inline private def isRepeated: Boolean = _isRepeated

override def asArr: Arr = this

@@ -491,6 +501,7 @@
else {
val computed =
if (isConcatView) _concatLeft.length + _concatRight.length
else if (isRepeated) _repeatedBase.length * _repeatedCount
else arr.length // isRange always has _length pre-set, never reaches here
_length = computed
computed
@@ -501,6 +512,8 @@
if (isConcatView) {
val leftLen = _concatLeft.length
if (i < leftLen) _concatLeft.value(i) else _concatRight.value(i - leftLen)
} else if (isRepeated) {
_repeatedBase.value(i % _repeatedBase.length)
} else if (isRange) {
// For reversed ranges, _rangeFrom is the last element and we count down
if (_reversed) Val.cachedNum(pos, _rangeFrom - i)
@@ -521,6 +534,8 @@
if (isConcatView) {
val leftLen = _concatLeft.length
if (i < leftLen) _concatLeft.eval(i) else _concatRight.eval(i - leftLen)
} else if (isRepeated) {
_repeatedBase.eval(i % _repeatedBase.length)
} else if (isRange) {
if (_reversed) Val.cachedNum(pos, _rangeFrom - i)
else Val.cachedNum(pos, _rangeFrom + i)
@@ -550,6 +565,7 @@
def asLazyArray: Array[Eval] = {
if (isConcatView) materialize()
if (isRange) materializeRange()
if (isRepeated) materializeRepeated()
if (_reversed) {
val len = arr.length
val result = new Array[Eval](len)
@@ -614,6 +630,27 @@
_reversed = false // range is now materialized in correct order
}

/**
* Materialize a lazy repeated view into a flat array. After this call, `arr` holds the full
* repeated contents and the repeated state is cleared. The base array is materialized first via
* asLazyArray to handle nested views (e.g., repeated range).
*/
private def materializeRepeated(): Unit = {
val baseArr = _repeatedBase.asLazyArray
val baseLen = baseArr.length
val count = _repeatedCount
val totalLen = baseLen * count
val result = new Array[Eval](totalLen)
var i = 0
while (i < count) {
System.arraycopy(baseArr, 0, result, i * baseLen, baseLen)
i += 1
}
arr = result
_isRepeated = false // clear repeated flag
_repeatedBase = null // allow GC of base reference
}

/**
* Concatenate two arrays. For large left-side arrays where neither operand is already a concat
* view, creates a lazy ConcatView that defers the copy until bulk access is needed. This is
@@ -730,6 +767,25 @@
a
}

/**
* Create a lazy repeated array representing the base array repeated `count` times. Elements are
* computed on demand via modulo indexing (index % base.length). This turns
* `std.repeat([1, 2, 3], 1000)` from O(n) to O(1) creation and O(1) memory.
*
* Inspired by jrsonnet's RepeatedArray (arr/spec.rs) which uses the same deferred approach.
*/
def repeated(pos: Position, base: Arr, count: Int): Arr = {
if (count <= 0) Arr(pos, EMPTY_EVAL_ARRAY)
else {
val a = new Arr(pos, null)
a._isRepeated = true
a._repeatedBase = base
a._repeatedCount = count
a._length = base.length * count
a
}
}

/**
* Threshold for lazy concat. Arrays with left.length >= this value use a virtual ConcatView
* instead of eager arraycopy. Below this size, arraycopy is cheap enough that the indirection
12 changes: 3 additions & 9 deletions sjsonnet/src/sjsonnet/stdlib/ArrayModule.scala
@@ -655,6 +655,7 @@ object ArrayModule extends AbstractFunctionModule {
builtin("repeat", "what", "count") { (pos, ev, what: Val, count: Int) =>
val res: Val = what match {
case Val.Str(_, str) =>
// For strings, concatenation still uses StringBuilder (O(n) memory minimum for result)
val builder = new StringBuilder(str.length * count)
var i = 0
while (i < count) {
@@ -663,15 +664,8 @@
}
Val.Str(pos, builder.toString())
case a: Val.Arr =>
val lazyArray = a.asLazyArray
val elemLen = lazyArray.length
val result = new Array[Eval](elemLen * count)
var i = 0
while (i < count) {
System.arraycopy(lazyArray, 0, result, i * elemLen, elemLen)
i += 1
}
Val.Arr(pos, result)
// For arrays, use lazy repeated view — O(1) memory, O(1) creation time
Val.Arr.repeated(pos, a, count)
case x => Error.fail("std.repeat first argument must be an array or a string")
}
res
44 changes: 44 additions & 0 deletions sjsonnet/test/resources/new_test_suite/repeat_view.jsonnet
@@ -0,0 +1,44 @@
// Test cases for std.repeat with lazy repeated array view optimization
// Verifies that repeated arrays work correctly and maintain the same semantics
// as the original eager array copy implementation.

local tests = [
// Basic repeat of simple array
std.repeat([1, 2, 3], 3) == [1, 2, 3, 1, 2, 3, 1, 2, 3],

// Edge case: repeat zero times
std.repeat([1, 2, 3], 0) == [],

// Edge case: repeat once
std.repeat([1, 2, 3], 1) == [1, 2, 3],

// Repeat empty array
std.repeat([], 5) == [],

// Repeat single element
std.repeat([42], 5) == [42, 42, 42, 42, 42],

// Repeat with mixed types
std.repeat([1, "a", true], 2) == [1, "a", true, 1, "a", true],

// Repeat nested array
std.repeat([[1, 2], [3, 4]], 2) == [[1, 2], [3, 4], [1, 2], [3, 4]],

// Access by index on repeated array
std.repeat([1, 2, 3], 4)[5] == 3, // index 5 maps to position 5 % 3 = 2

// Length of repeated array
std.length(std.repeat([1, 2, 3], 1000)) == 3000,

// Repeat string
std.repeat("ab", 3) == "ababab",

// Concat with repeated array
std.repeat([1, 2], 2) + [3, 4] == [1, 2, 1, 2, 3, 4],

// Sort repeated array
std.sort(std.repeat([3, 1, 2], 2)) == [1, 1, 2, 2, 3, 3],
];

// Test all conditions pass
std.all(tests)
@@ -0,0 +1 @@
true
1 change: 1 addition & 0 deletions sync/upstream-sync-points.tsv
@@ -0,0 +1 @@
upstream_order upstream_repo upstream_commit upstream_date upstream_pr upstream_title status reason_code local_branch draft_pr_url local_commit report_ref notes