Summary
Switch the on-the-wire encoding of Actor and Worker records in the Valkey state store from protojson to binary protobuf. Pre-alpha, no production data, so this is a clean cutover with no migration path needed. This is the cheapest, highest-leverage step toward the 1-billion-record north star: it roughly halves per-record value size and is ~4x faster to encode, with no schema changes and no change to the locking path.
Motivation
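The store keeps every Actor/Worker record resident in RAM with no TTL, so per-record size directly sets the capacity bill. At the 1B target, encoding changes alone (no tiering) move the provisioned fleet from ~3.8 TB to ~1.4 TB (~2.4 TB saved). The ~1.4 TB figure appears to track the 283 B in-Valkey size of binary + field trims; binary protobuf by itself lands between the two.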
Measurements (full methodology and all five encodings in docs/record-encoding-benchmarks.md):
| | protojson (today) | binary protobuf | binary + field trims |
|---|---|---|---|
| Value size | 609 B | ~321 B | 170 B |
| In-Valkey size (incl. overhead) | 731 B | n/a | 283 B |
| Encode CPU (per record) | 3124 ns | 721 ns (~4x faster) | ~same as binary |
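The capacity numbers follow from multiplying per-record size by record count. A minimal Go sketch of that arithmetic, counting raw resident bytes only; replication, headroom, and whatever else separates these figures from the ~3.8 TB / ~1.4 TB fleet numbers above are not modeled, and the constant names are illustrative:

```go
package main

import "fmt"

// northStar is the 1B-record target from the Summary.
const northStar = 1_000_000_000

// In-Valkey bytes per record (including overhead), from the measurement table.
const (
	protojsonInValkey = 731
	trimmedInValkey   = 283
)

// residentGB returns the raw resident footprint, in GB (1e9 bytes), of n
// records at b bytes each. It ignores replicas and provisioning headroom.
func residentGB(n, b int64) float64 {
	return float64(n*b) / 1e9
}

func main() {
	before := residentGB(northStar, protojsonInValkey)
	after := residentGB(northStar, trimmedInValkey)
	fmt.Printf("%.0f GB -> %.0f GB (%.2fx smaller)\n", before, after, before/after)

	// Value sizes only (binary protobuf, no trims): 609 B -> ~321 B.
	fmt.Printf("value-size ratio: %.2fx\n", 609.0/321.0)
}
```

The ratio between raw footprints is what carries over to the fleet size, assuming the fleet scales linearly with resident bytes.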
Binary protobuf is the necessary-but-not-sufficient first step: a constant-factor win that extends runway. It does not change the cost class of holding 1B mostly-idle records in RAM. That is the job of hot/cold tiering, tracked separately (see #12).
Proposed change
All in cmd/ateapi/internal/store/ateredis/ateredis.go:
- Write path: replace protojson.Marshal with proto.Marshal for Actor and Worker records (Create/Update).
- Read path: replace protojson.Unmarshal with proto.Unmarshal.
Clean cutover. Since there's no production data, any existing dev keyspace is flushed (records are a re-derivable cache anyway). The version-check logic stays in Go, so the WATCH/MULTI optimistic-concurrency paths are unaffected. The lock-release Lua (ateredis.go:578) is also unaffected: it compares an opaque, ephemeral lock token by byte equality and never touches the record encoding, so the change is invisible to it (verified against a real Valkey cluster, including a token carrying an embedded NUL byte).
Non-goals / out of scope
- No change to the lock-release Lua (ateredis.go:578); its compare-and-delete is opaque byte equality on the lock token and never reads record values.
- Field trims (170 B value / 283 B in-Valkey) are a separate phase. They involve semantic decisions about which fields are droppable or derivable, and carry regression risk.
- zstd + dictionary (112 B) is a different tradeoff: ~1 ms/record to encode, roughly 1400x slower than binary protobuf's 721 ns. It is only justified if RAM becomes the hard binding constraint.
- Hot/cold tiering and any transactional-store direction are tracked elsewhere.
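The lock-release argument can be illustrated with a small in-memory model. releaseScript shows the common compare-and-delete Lua pattern and is an assumption about the shape of the script at ateredis.go:578, not a copy of it; releaseLock and the key names are hypothetical. The interesting case is a token with an embedded NUL: byte-wise equality (Lua string ==, Go bytes.Equal) compares the full length, whereas a C-string comparison such as strcmp would stop at the NUL and wrongly treat the two tokens below as equal.

```go
package main

import (
	"bytes"
	"fmt"
)

// releaseScript is the common compare-and-delete pattern (ASSUMED shape;
// the real script at ateredis.go:578 may differ in detail).
const releaseScript = `if redis.call("GET", KEYS[1]) == ARGV[1] then
  return redis.call("DEL", KEYS[1])
end
return 0`

// releaseLock mirrors the script's semantics on an in-memory map: delete the
// lock key only if its value is byte-for-byte equal to token.
func releaseLock(locks map[string][]byte, key string, token []byte) bool {
	cur, ok := locks[key]
	if !ok || !bytes.Equal(cur, token) {
		return false
	}
	delete(locks, key)
	return true
}

func main() {
	token := []byte("owner-a\x00nonce-1") // embedded NUL byte
	locks := map[string][]byte{"lock:actor:42": token}

	// Matches only up to the NUL, so it must NOT release the lock.
	fmt.Println(releaseLock(locks, "lock:actor:42", []byte("owner-a\x00nonce-2"))) // false
	fmt.Println(releaseLock(locks, "lock:actor:42", token))                        // true
}
```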
Acceptance criteria
Risks
- Hard cutover: any pre-existing dev data won't decode, so flush keyspaces on deploy of this change. Acceptable pre-alpha; must land before any production use.
- Debuggability: values in valkey-cli are no longer human-readable.
- Test coverage gap: miniredis can't run cluster commands, so add at least one real-cluster round-trip test.
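With the schema at hand, the straightforward way to inspect a value is proto.Unmarshal into the generated type followed by protojson.Format. Without it, a raw wire-format dump (similar in spirit to protoc --decode_raw) still shows field numbers and values. A minimal stdlib-only sketch; dumpWire is a hypothetical helper, and it does not handle the deprecated group wire types (3 and 4):

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"strings"
)

// dumpWire renders protobuf binary as "field:type=value" lines without a
// schema. Length-delimited fields are shown as quoted bytes: they may be
// strings, bytes, or nested messages, which the wire format alone can't tell apart.
func dumpWire(b []byte) (string, error) {
	var sb strings.Builder
	for len(b) > 0 {
		key, n := binary.Uvarint(b)
		if n <= 0 {
			return "", errors.New("bad field key")
		}
		b = b[n:]
		field, wt := key>>3, key&7
		switch wt {
		case 0: // varint
			v, n := binary.Uvarint(b)
			if n <= 0 {
				return "", errors.New("bad varint")
			}
			b = b[n:]
			fmt.Fprintf(&sb, "%d:varint=%d\n", field, v)
		case 1: // fixed64, little-endian
			if len(b) < 8 {
				return "", errors.New("short fixed64")
			}
			fmt.Fprintf(&sb, "%d:fixed64=%d\n", field, binary.LittleEndian.Uint64(b))
			b = b[8:]
		case 2: // length-delimited
			l, n := binary.Uvarint(b)
			if n <= 0 || uint64(len(b)-n) < l {
				return "", errors.New("bad length")
			}
			b = b[n:]
			fmt.Fprintf(&sb, "%d:bytes=%q\n", field, b[:l])
			b = b[l:]
		case 5: // fixed32, little-endian
			if len(b) < 4 {
				return "", errors.New("short fixed32")
			}
			fmt.Fprintf(&sb, "%d:fixed32=%d\n", field, binary.LittleEndian.Uint32(b))
			b = b[4:]
		default:
			return "", fmt.Errorf("unsupported wire type %d", wt)
		}
	}
	return sb.String(), nil
}

func main() {
	// Hand-encoded sample: field 1 = "actor-42", field 2 = 7.
	sample := []byte{0x0a, 0x08, 'a', 'c', 't', 'o', 'r', '-', '4', '2', 0x10, 0x07}
	out, err := dumpWire(sample)
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```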
References
- docs/record-encoding-benchmarks.md
- docs/state-store.md