base64 encode/decode is ~6x slower than jrsonnet on large payloads #779

@He-Pin

Description

I was benchmarking base64 performance on large payloads and noticed sjsonnet is significantly slower than jrsonnet — about 6x on a ~4.5MB string with a couple of encode/decode roundtrips.

Dug into it a bit. The bottleneck isn't the base64 codec itself — it's the UTF-16 ↔ UTF-8 conversion that happens on every call. Since Java/Scala strings are UTF-16 internally, every std.base64(str) has to do str.getBytes("UTF-8") to get bytes for the encoder, and every std.base64Decode has to do new String(bytes, "UTF-8") to produce the result. That's two full copies of the data per operation, going through the charset encoder/decoder.
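To make the cost concrete, here's roughly what each roundtrip has to do on the JVM, written as plain Java (the `encode`/`decode` helper names are illustrative, not sjsonnet's actual internals):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64Roundtrip {
    // Encode: UTF-16 String -> UTF-8 byte[] (full copy #1) -> base64 String
    static String encode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(utf8);
    }

    // Decode: base64 String -> UTF-8 byte[] -> UTF-16 String (full copy #2)
    static String decode(String b64) {
        byte[] utf8 = Base64.getDecoder().decode(b64);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "The quick brown fox jumps over the lazy dog. ".repeat(3);
        String d = decode(encode(s));
        System.out.println(d.equals(s)); // prints true
    }
}
```

Both copies go through the charset machinery, so they scale linearly with payload size, on top of the base64 work itself.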

jrsonnet doesn't have this problem because its strings are UTF-8 natively (custom IStr type backed by [u8]), so base64 can work directly on the string bytes with zero conversion.
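The same trick is available on the JVM: store the UTF-8 bytes as the canonical representation and convert at most once, at construction. A minimal sketch of the idea (`Utf8Str` is an invented name for illustration, not an sjsonnet or jrsonnet API):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical UTF-8-backed string value, mirroring jrsonnet's IStr idea.
final class Utf8Str {
    final byte[] utf8; // canonical representation: UTF-8 bytes

    Utf8Str(String s) {
        this.utf8 = s.getBytes(StandardCharsets.UTF_8); // convert once, up front
    }

    // base64 operates directly on the stored bytes: no per-call charset conversion
    String base64() {
        return Base64.getEncoder().encodeToString(utf8);
    }
}

public class Utf8StrDemo {
    public static void main(String[] args) {
        System.out.println(new Utf8Str("hi").base64()); // prints aGk=
    }
}
```

Whether that's feasible for sjsonnet depends on how deeply UTF-16 `String` is baked into its value representation, so this is only a sketch of the direction, not a proposed patch.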

For small payloads (a few KB) this doesn't really matter — interpreter overhead dominates. But once you get into the hundreds-of-KB or MB range, the conversion cost adds up fast.

Repro (requires hyperfine + both tools installed):

// base64_ultra.jsonnet
local s1 = std.repeat("The quick brown fox jumps over the lazy dog. ", 100000);
local e1 = std.base64(s1);
local d1 = std.base64Decode(e1);
local e2 = std.base64(d1);
local d2 = std.base64Decode(e2);
{
  input_len: std.length(s1),
  encoded_len: std.length(e1),
  roundtrip_ok: d2 == s1,
}

hyperfine --warmup 2 \
  'sjsonnet base64_ultra.jsonnet' \
  'jrsonnet base64_ultra.jsonnet'

On my M4 Max (sjsonnet built with Scala Native):

  • sjsonnet: ~88ms
  • jrsonnet: ~14ms
