feat: add mergeable containers (#759) #991

Open
typedrat wants to merge 7 commits into loro-dev:main from synapdeck:mergeable-containers

typedrat commented May 30, 2026

Adds mergeable child containers: child containers created under a map key that merge across peers instead of forking into two separate containers. Closes #759.

const a = new LoroDoc(), b = new LoroDoc();
a.getMap("state").getMergeableCounter("revision").increment(1);
b.getMap("state").getMergeableCounter("revision").increment(1);
// after sync: one counter == 2, not two conflicting counters

They're exposed as getMergeable{Counter,Map,List,MovableList,Text,Tree} on the WASM LoroMap, and as get_mergeable_{counter,map,list,movable_list,text,tree} on LoroMap in loro and on MapHandler in loro-internal.

Problem

Right now, if two peers each create a child container at the same map key, they end up with two different containers, because the ContainerID is derived from the creating op's id. One of them wins and the other peer's edits are stranded on a container nobody looks at. That's a problem any time you want a single shared object regardless of who created it first - a revision counter, a settings sub-map, a shared text body, etc.
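To make the fork concrete, here is a minimal std-only sketch. The `OpId`, `ContainerId`, and `SlotWrite` types and the `lww_winner` function are simplified stand-ins invented for illustration, not Loro's actual definitions, and the tie-break rule (higher Lamport, then higher peer) is an assumption.

```rust
/// Simplified stand-in for an op id: the issuing peer and its per-peer counter.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct OpId {
    peer: u64,
    counter: u32,
}

/// Simplified stand-in for a normal child container id: derived from the creating op.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ContainerId {
    created_by: OpId,
}

/// A write to a map slot, carrying the child container the slot points at.
#[derive(Debug, Clone, Copy)]
struct SlotWrite {
    lamport: u32,
    child: ContainerId,
}

/// Last-writer-wins between two concurrent writes to the same key.
/// Assumed tie-break: higher lamport wins, then higher peer id.
fn lww_winner(a: SlotWrite, b: SlotWrite) -> SlotWrite {
    let stamp = |w: &SlotWrite| (w.lamport, w.child.created_by.peer);
    if stamp(&a) >= stamp(&b) { a } else { b }
}

/// Two peers each create a child at the same key. The ids necessarily differ,
/// so only one child stays visible after the slot resolves.
fn concurrent_create() -> (ContainerId, ContainerId, ContainerId) {
    let a = SlotWrite {
        lamport: 1,
        child: ContainerId { created_by: OpId { peer: 1, counter: 0 } },
    };
    let b = SlotWrite {
        lamport: 1,
        child: ContainerId { created_by: OpId { peer: 2, counter: 0 } },
    };
    (a.child, b.child, lww_winner(a, b).child)
}
```

In this toy model peer 2's child wins the slot, and anything peer 1 wrote into its own child is no longer reachable through the map.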

Approach

A mergeable child is a ContainerID::Root with a deterministic name derived from (parent, key, kind), in a reserved "🤝:" namespace. Both peers compute the same id, so they're talking about the same container.

Which child is "live" is decided by the parent map slot, not by separate bookkeeping. When you call getMergeable*, the parent map stores a "🤝:<kind>" discriminator string at that key, and the active child is just whatever that key's normal LWW resolves to. Letting the existing map machinery own this gets us a few things mostly for free:

  • Delete is just clearing the key (an ordinary MapSet { value: None }). No new op types, and no tombstone tracking to keep in sync.
  • If two peers pick different kinds for the same key, the map's LWW picks the winner like it would for any value. The loser's container is still reachable by its deterministic name if you ask for that kind again.
  • Asking for a different kind under an existing key overwrites the discriminator. The previous kind's contents stick around and come back if you request that kind later.
  • Snapshots don't need anything special - the discriminator travels like any other map value. The only thing rebuilt on import is the in-memory parent-edge index.
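The slot-driven behavior in the list above can be sketched with a toy LWW map. Everything here is hypothetical and std-only: `ParentMap`, `Slot`, and the `(lamport, peer)` stamp are illustrative stand-ins rather than Loro's types, and the kind is spelled as a plain string.

```rust
use std::collections::HashMap;

/// Reserved prefix for discriminator values, as described above.
const PREFIX: &str = "🤝:";

/// One LWW register; the (lamport, peer) stamp orders concurrent writes.
#[derive(Debug, Clone, PartialEq)]
struct Slot {
    stamp: (u32, u64),
    value: Option<String>, // None models `MapSet { value: None }`
}

/// Toy parent map: each key holds one LWW slot.
#[derive(Debug, Default, Clone)]
struct ParentMap {
    peer: u64,
    lamport: u32,
    slots: HashMap<String, Slot>,
}

impl ParentMap {
    fn new(peer: u64) -> Self {
        ParentMap { peer, ..Default::default() }
    }

    fn write(&mut self, key: &str, value: Option<String>) {
        self.lamport += 1;
        let slot = Slot { stamp: (self.lamport, self.peer), value };
        self.slots.insert(key.to_string(), slot);
    }

    /// Activating a mergeable child is an ordinary write of the discriminator.
    fn get_mergeable(&mut self, key: &str, kind: &str) {
        self.write(key, Some(format!("{}{}", PREFIX, kind)));
    }

    /// Delete is an ordinary write of `None`; there is no tombstone table.
    fn delete(&mut self, key: &str) {
        self.write(key, None);
    }

    /// Merge another replica: per key, keep the slot with the larger stamp.
    fn merge(&mut self, other: &ParentMap) {
        for (key, theirs) in &other.slots {
            let keep_ours = self
                .slots
                .get(key)
                .map_or(false, |ours| ours.stamp >= theirs.stamp);
            if !keep_ours {
                self.slots.insert(key.clone(), theirs.clone());
            }
        }
        self.lamport = self.lamport.max(other.lamport);
    }

    /// The active child kind is whatever the slot currently resolves to.
    fn active_kind(&self, key: &str) -> Option<&str> {
        self.slots.get(key)?.value.as_deref()?.strip_prefix(PREFIX)
    }
}
```

If peer 1 requests a counter and peer 2 requests a text at the same key, both replicas agree on one kind after merging in either order, and a later `delete` on one side clears the active kind on the other once merged.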

Commits

I split this into 6 commits by layer, each one carrying its own tests:

  1. feat(common) - cid namespace + discriminator encode/parse helpers
  2. feat(internal) - core mergeable state in DocState
  3. feat(internal) - get_mergeable_* handler API
  4. feat(loro) - public LoroMap wrappers
  5. feat(wasm) - getMergeable* bindings
  6. test(fuzz) - opt-in mergeable fuzz surface

There's also a trailing chore: commit with the changeset.

Compatibility

These are all new methods; no existing signatures change, so it shouldn't break anyone. Root names that start with the "🤝:" prefix are rejected by check_root_container_name so user code can't fabricate one.

Testing

57 tests across the internal (43), public-API (8), and cid-encoding (6) targets, plus a WASM suite in mergeable.test.ts that covers each kind and the subscription-flush invariant from AGENTS.md. There's also an opt-in fuzz target (cargo +nightly fuzz run mergeable); the fuzz crate builds with and without the mergeable feature.

Between them they cover concurrent same- and different-kind creation, three-peer convergence, delete + recreate, snapshot/update import round-trips, event and path resolution, and has_container for mergeable cids.

AI usage disclosure

Most of the lines in this PR were written by an AI coding assistant (Claude Opus 4.7, primarily). I want to be upfront about that, and equally upfront that the engineering wasn't outsourced. I read every diff and handled every non-trivial engineering decision myself, although I did regularly seek the advice of the agent during the process.

So: the assistant did most of the typing, I did the deciding and the reviewing. I've reviewed the whole change and I stand behind its correctness; treat it as my work and review it as critically as you would anything else.

…helpers

Introduce the shared encoding primitives that let map keys host
deterministic "mergeable" child containers. Both peers derive the same
`ContainerID::Root` for a given `(parent, key, kind)` triple, so a child
created independently on two sites resolves to one container that merges
rather than two that fork.

- `MERGEABLE_NAMESPACE_PREFIX` ("🤝:"): reserved root-name prefix mirroring
  Loro's existing 🦜 brand sentinel. `check_root_container_name` rejects
  user-supplied root names that collide with it.
- `ContainerID::new_mergeable` / `is_mergeable` / `parse_mergeable`: encode and
  decode the hex `(parent, key, container_type)` payload, with length-prefixed
  segments so arbitrary keys (empty, NUL, embedded prefix) round-trip cleanly.
- Discriminator helpers (`mergeable_discriminator`, `*_string`,
  `parse_mergeable_discriminator`): the `"🤝:<kind>"` value the parent map
  stores at the key to mark which child kind is active.

The cid-encoding integration tests (`mergeable_cid_encoding.rs`) live with
this commit since they exercise only the common-crate surface.
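As an illustration of why length-prefixed segments round-trip arbitrary keys, here is a std-only sketch. The byte layout (one kind byte, then `u32` little-endian length plus bytes for parent and key, all hex-encoded after the prefix) is an assumption made for this example and is not taken from the PR; `encode_mergeable`, `parse_mergeable`, and the helper names are likewise hypothetical.

```rust
use std::convert::TryInto;

const PREFIX: &str = "🤝:";

/// Append one segment as a u32 little-endian length followed by its bytes.
fn push_segment(buf: &mut Vec<u8>, segment: &[u8]) {
    buf.extend_from_slice(&(segment.len() as u32).to_le_bytes());
    buf.extend_from_slice(segment);
}

/// Deterministic root name for (parent, key, kind). Assumed layout, see lead-in.
fn encode_mergeable(parent: &str, key: &str, kind: u8) -> String {
    let mut buf = vec![kind];
    push_segment(&mut buf, parent.as_bytes());
    push_segment(&mut buf, key.as_bytes());
    let hex: String = buf.iter().map(|b| format!("{:02x}", b)).collect();
    format!("{}{}", PREFIX, hex)
}

/// Read one length-prefixed UTF-8 segment starting at `pos`, advancing `pos`.
fn take_segment(bytes: &[u8], pos: &mut usize) -> Option<String> {
    let len_end = pos.checked_add(4)?;
    let raw: [u8; 4] = bytes.get(*pos..len_end)?.try_into().ok()?;
    let end = len_end.checked_add(u32::from_le_bytes(raw) as usize)?;
    let segment = bytes.get(len_end..end)?;
    *pos = end;
    String::from_utf8(segment.to_vec()).ok()
}

/// Inverse of `encode_mergeable`; returns None for anything malformed.
fn parse_mergeable(name: &str) -> Option<(String, String, u8)> {
    let hex = name.strip_prefix(PREFIX)?;
    // Only canonical lowercase hex with an even number of digits is accepted.
    let is_hex = |b: u8| b.is_ascii_digit() || (b'a'..=b'f').contains(&b);
    if hex.len() % 2 != 0 || !hex.bytes().all(is_hex) {
        return None;
    }
    let bytes = (0..hex.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&hex[i..i + 2], 16).ok())
        .collect::<Option<Vec<u8>>>()?;
    let kind = *bytes.first()?;
    let mut pos = 1;
    let parent = take_segment(&bytes, &mut pos)?;
    let key = take_segment(&bytes, &mut pos)?;
    // Trailing bytes mean the payload is not a valid encoding.
    (pos == bytes.len()).then_some((parent, key, kind))
}
```

Because each segment carries its own length, a key that is empty, contains NUL, or embeds the prefix itself cannot be confused with a segment boundary.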
@typedrat typedrat force-pushed the mergeable-containers branch from 6e571b0 to 8be8ca7 Compare May 30, 2026 02:51
typedrat added 6 commits May 30, 2026 16:23
Add the core state machinery for mergeable child containers. A mergeable
child lives as a deterministic `ContainerID::Root` in the reserved
namespace; its visibility is driven entirely by the `"🤝:<kind>"`
discriminator the parent map stores at the key. The parent map's ordinary
LWW picks the active discriminator, which in turn selects the active child
kind — exactly as a regular child container's value-table entry selects its
child. This makes deletes slot-authoritative: clearing the discriminator
removes the child, and concurrent type conflicts resolve by the same LWW
that governs the map slot, with no separate tombstone bookkeeping.

- `state/mergeable.rs`: parent-edge index (`child_containers`) kept in sync
  with discriminators — seeded on snapshot and update import, updated per-op
  as discriminators appear or clear. Path resolution and reachability map a
  mergeable cid back to its key through this index.
- `map_state.rs`: store per-key discriminators, expose
  `active_mergeable_children` / `iter_mergeable_children`, and resolve the
  active child kind from the parent slot.
- `arena.rs` / `state.rs`: parent mergeable roots into arena walks and route
  mergeable map children to their deterministic root ids so reachability,
  child enumeration, and alive-container walks include them.
- `dead_containers_cache.rs`: treat a mergeable child as dead once its parent
  edge (discriminator) is gone, mirroring a regular container whose value slot
  was overwritten.
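A rough sketch of how a parent-edge index could follow discriminator writes is shown here. `ParentEdges`, `on_map_set`, and `child_cid` are hypothetical names, and `child_cid` uses a readable placeholder format instead of the real hex encoding, so keys containing `|` would be ambiguous in this toy version.

```rust
use std::collections::HashMap;

const PREFIX: &str = "🤝:";

/// Placeholder for the deterministic cid; the real encoding is hex and length-prefixed.
fn child_cid(parent: &str, key: &str, kind: &str) -> String {
    format!("{}{}|{}|{}", PREFIX, parent, key, kind)
}

/// Hypothetical parent-edge index: child cid -> (parent cid, key).
#[derive(Debug, Default)]
struct ParentEdges {
    edges: HashMap<String, (String, String)>,
}

impl ParentEdges {
    /// Called for every map write: drop the edge of the value being replaced,
    /// then add an edge if the new value is a discriminator.
    fn on_map_set(&mut self, parent: &str, key: &str, old: Option<&str>, new: Option<&str>) {
        if let Some(kind) = old.and_then(|v| v.strip_prefix(PREFIX)) {
            self.edges.remove(&child_cid(parent, key, kind));
        }
        if let Some(kind) = new.and_then(|v| v.strip_prefix(PREFIX)) {
            self.edges.insert(
                child_cid(parent, key, kind),
                (parent.to_string(), key.to_string()),
            );
        }
    }

    /// A mergeable child with no parent edge is treated as dead.
    fn is_dead(&self, cid: &str) -> bool {
        !self.edges.contains_key(cid)
    }

    /// Path resolution maps the child back to the key it hangs under.
    fn key_of(&self, cid: &str) -> Option<&str> {
        self.edges.get(cid).map(|(_, key)| key.as_str())
    }
}
```

Overwriting a `Counter` discriminator with a `Text` one removes the counter's edge and adds the text's, which is the same liveness rule a regular child follows when its value slot is overwritten.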

Expose mergeable child containers on `MapHandler` via
`get_mergeable_{counter,map,list,movable_list,text,tree}`. Each accessor
writes the `"🤝:<kind>"` discriminator into the parent map at the key and
returns a handler bound to the deterministic mergeable cid, so two peers
calling the same accessor with the same key converge on one container.
Requesting a different kind under an existing key is a deliberate kind change
that overwrites the discriminator; the previous kind's container stays
reachable by its deterministic name. Detached handlers fall back to the
ordinary get-or-create path.

Wire the supporting plumbing through `loro.rs` so the discriminator op and
the derived child route through the normal commit/event path.

The internal integration suite (`tests/mergeable_container/`, split into
focused modules sharing a `common` helper) lands here since it drives the
handler API: concurrent create/convergence, discriminator-based type
conflict resolution, slot-authoritative delete and recreate, snapshot/update
import round-trips, and event/path resolution.

Add `get_mergeable_{counter,map,list,movable_list,text,tree}` to the public
`LoroMap`, wrapping the internal handler accessors and returning the public
container types (`LoroCounter`, `LoroMap`, `LoroList`, `LoroMovableList`,
`LoroText`, `LoroTree`). These are purely additive, non-breaking additions to
the public surface.

The public-API test target (`crates/loro/tests/mergeable_public_api.rs`)
confirms each wrapper forwards to the handler and the returned container is
usable end-to-end, including cross-peer convergence, the kind-change-by-
overwrite semantics, and the `has_container` carve-out for mergeable cids.

Bind `getMergeable{Counter,Map,List,MovableList,Text,Tree}` on the WASM
`LoroMap`, delegating to the internal handler accessors. Each returns the
corresponding JS container handle bound to the deterministic mergeable cid,
so independently created children on two peers merge instead of forking.

These methods emit a discriminator `MapSet` against the parent map, which
flows through the same auto-commit barrier as `LoroMap.set`; the events are
flushed by the already-decorated `commit`, so no `index.ts` allowlist change
is needed. The WASM test (`tests/mergeable.test.ts`) covers convergence for
every kind plus the subscription-flush invariant from AGENTS.md (no
`[LORO_INTERNAL_ERROR] Event not called` under an active subscription).

Extend the fuzz harness to exercise mergeable children alongside regular
containers, behind an opt-in `mergeable` feature so the default fuzz runs are
unaffected. Adds a `mergeable` fuzz target plus per-container action support
(`MapAction::GetMergeable` and friends) so the existing convergence/replay
fuzzers can create and mutate mergeable counters, maps, lists, movable lists,
text, and trees. The crate compiles with and without the feature.
@typedrat typedrat force-pushed the mergeable-containers branch from 8be8ca7 to e3927de Compare May 30, 2026 23:23
@typedrat
Author

Fuzzed via cargo +nightly fuzz run all --features fuzz/mergeable -- -max_total_time=86400 -fork=192 -rss_limit_mb=4096 -ignore_crashes=1 -ignore_ooms=1 -ignore_timeouts=1 on a 192-core c8g.metal-48xl instance. 24h, 402.7M executions, 0 crashes/OOMs/timeouts, final coverage 32,236 edges (saturated).

@zxch3n
Member

zxch3n commented Jun 1, 2026

P1: Invalidate deleted-cache entries when a mergeable child becomes active again

A mergeable child can become reachable again after it was previously deleted.

Example flow:

  1. get_mergeable_counter("revision") creates/activates mergeable cid X.
  2. delete("revision") clears the discriminator, so X becomes unreachable.
  3. Mutating an old handle to X correctly fails and may cache X as deleted.
  4. Calling get_mergeable_counter("revision") again writes the discriminator back, so X is reachable again.
  5. In release builds, is_deleted() may still trust the old deleted-cache entry and keep treating X as deleted.

When sync_mergeable_side_table_for_op() registers an active mergeable cid, it should invalidate the corresponding deleted-cache entry, or conservatively clear the whole dead_containers_cache.
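A minimal model of the stale-cache flow and the suggested invalidation might look like the following. `ToyState`, `activate`, and `deactivate` are hypothetical names, and the cache is assumed to memoize only "dead" answers.

```rust
use std::collections::HashSet;

/// Toy state: which mergeable cids currently have a live discriminator,
/// plus a cache that remembers cids already found to be deleted.
#[derive(Debug, Default)]
struct ToyState {
    active: HashSet<String>,
    dead_cache: HashSet<String>,
}

impl ToyState {
    /// Trusts the cache first, mirroring the release-build behavior described above.
    fn is_deleted(&mut self, cid: &str) -> bool {
        if self.dead_cache.contains(cid) {
            return true;
        }
        let dead = !self.active.contains(cid);
        if dead {
            self.dead_cache.insert(cid.to_string());
        }
        dead
    }

    /// Clearing the discriminator makes the child unreachable.
    fn deactivate(&mut self, cid: &str) {
        self.active.remove(cid);
    }

    /// Registering an active mergeable cid also drops any stale "dead" entry.
    /// Without the `dead_cache.remove` call, step 5 of the flow would misreport X.
    fn activate(&mut self, cid: &str) {
        self.active.insert(cid.to_string());
        self.dead_cache.remove(cid);
    }
}
```

The targeted `remove` is the narrow fix; clearing the whole cache in `activate` would be the conservative alternative mentioned above.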

P1: Clarify or restrict get_mergeable_* overwriting existing non-mergeable values

get_mergeable_* currently writes a 🤝:<kind> discriminator into the parent map slot. That means if the key already contains a normal scalar value or a normal child container, the call silently overwrites it.

That behavior may be intentional, but the name get_mergeable_* makes it easy to read this as a harmless “get or create if empty” API.

I think this should be made explicit in one of these ways:

  • Return ArgErr when the key already contains a non-mergeable value.
  • Rename the API to something that signals mutation/overwrite semantics, such as ensure_mergeable_* or set_mergeable_*.
  • At minimum, document clearly that this call may overwrite the existing map slot with a mergeable discriminator.
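For the first option, a sketch of the guard could look like this. The map is modeled as a plain `HashMap<String, String>`, and `ensure_mergeable` and `MergeableError` are hypothetical stand-ins (the real API would presumably surface an `ArgErr`-style error instead).

```rust
use std::collections::HashMap;

const PREFIX: &str = "🤝:";

#[derive(Debug, PartialEq)]
enum MergeableError {
    /// The key already holds a value that is not a mergeable discriminator.
    SlotOccupied { key: String },
}

/// Activate a mergeable child only if the slot is empty or already holds a
/// discriminator; refuse to overwrite an ordinary value.
fn ensure_mergeable(
    map: &mut HashMap<String, String>,
    key: &str,
    kind: &str,
) -> Result<(), MergeableError> {
    let occupied = map.get(key).map_or(false, |v| !v.starts_with(PREFIX));
    if occupied {
        return Err(MergeableError::SlotOccupied { key: key.to_string() });
    }
    map.insert(key.to_string(), format!("{}{}", PREFIX, kind));
    Ok(())
}
```

A kind change over an existing discriminator still succeeds in this sketch; only non-mergeable values are protected.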

P2: Do not treat every 🤝: root name as a valid mergeable container

If the intended semantics are:

Users may create root names starting with 🤝:, but only names that successfully parse as the mergeable encoding should be treated as mergeable containers.

then is_mergeable() should not be implemented as a simple prefix check.

A root name like 🤝:not-valid-hex or 🤝:abc should remain a normal root container, not a half-valid mergeable container with missing parent/key metadata.

Suggested direction:

pub fn is_mergeable(&self) -> bool {
    self.parse_mergeable().is_some()
}

or at least make all critical code paths distinguish between “has the mergeable-looking prefix” and “is a valid parsed mergeable cid”.

P2: Reject 🤝:Unknown(n) discriminators

parse_mergeable_discriminator() should reject 🤝:Unknown(n).

The discriminator controls which child kind is active under a map key. Unknown(_) does not have a valid user-facing mergeable-container behavior here, and downstream code may hit unreachable/default-value paths for unknown container types.

The parser should only accept supported concrete kinds:

Map, List, MovableList, Text, Tree, Counter

and explicitly reject Unknown(_). A small regression test for 🤝:Unknown(7) would be enough.
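Something along these lines would do, sketched std-only. The `Kind` enum is a stand-in for the real container-type enum, and the exact spelling of each kind inside the discriminator string is an assumption.

```rust
const PREFIX: &str = "🤝:";

/// Stand-in for the supported concrete container kinds.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Map,
    List,
    MovableList,
    Text,
    Tree,
    Counter,
}

/// Accept only the concrete kinds; anything else, including "Unknown(n)",
/// is not a valid discriminator.
fn parse_mergeable_discriminator(value: &str) -> Option<Kind> {
    match value.strip_prefix(PREFIX)? {
        "Map" => Some(Kind::Map),
        "List" => Some(Kind::List),
        "MovableList" => Some(Kind::MovableList),
        "Text" => Some(Kind::Text),
        "Tree" => Some(Kind::Tree),
        "Counter" => Some(Kind::Counter),
        _ => None,
    }
}
```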

@zxch3n
Member

zxch3n commented Jun 1, 2026

Snapshot import does not need to scan/decode every Map to rebuild mergeable parent edges.

For mergeable children that have state entries, the mergeable root cid already encodes (parent, key, kind). We can discover those candidates by scanning container ids for valid 🤝:<hex> mergeable cids, then only verify the corresponding parent map slot still contains the matching discriminator.

The current implementation instead walks all Map containers and reads every Map's shallow value. That is much broader than necessary.

One subtle case is unmutated mergeable children: they may only exist as a discriminator in the parent map and may not have a child state entry yet. If we need those to have side-table entries immediately after snapshot import, then scanning discriminators is one way to find them. But that could also be handled lazily when the child is accessed or when path/reachability is queried.

So I think the import-time recovery should avoid iter_all_container_ids() -> load_all() and either:

  • rebuild from valid mergeable root cids plus targeted parent-slot verification, or
  • make mergeable side-table reconstruction lazy.
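The first option can be sketched as follows. This is a std-only toy: `rebuild_edges` and the `slot_value` lookup are hypothetical, and `parse_mergeable` here uses a readable `parent|key|kind` placeholder payload rather than the real hex encoding.

```rust
use std::collections::HashMap;

const PREFIX: &str = "🤝:";

/// Placeholder parser: treats the payload as "parent|key|kind".
/// The real cid payload is hex and length-prefixed; only the shape matters here.
fn parse_mergeable(cid: &str) -> Option<(&str, &str, &str)> {
    let mut parts = cid.strip_prefix(PREFIX)?.split('|');
    let triple = (parts.next()?, parts.next()?, parts.next()?);
    if parts.next().is_some() {
        return None;
    }
    Some(triple)
}

/// Rebuild child -> (parent, key) edges from container ids alone, then confirm
/// each candidate against its parent slot instead of decoding every map.
fn rebuild_edges<'a>(
    container_ids: &'a [String],
    slot_value: impl Fn(&str, &str) -> Option<String>,
) -> HashMap<&'a str, (&'a str, &'a str)> {
    let mut edges = HashMap::new();
    for cid in container_ids {
        if let Some((parent, key, kind)) = parse_mergeable(cid) {
            let expected = format!("{}{}", PREFIX, kind);
            // Targeted check: one slot read per candidate cid.
            if slot_value(parent, key).as_deref() == Some(expected.as_str()) {
                edges.insert(cid.as_str(), (parent, key));
            }
        }
    }
    edges
}
```

This only finds children that already have a state entry; unmutated children that exist solely as a discriminator would still need the lazy path or a discriminator scan.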

@zxch3n
Member

zxch3n commented Jun 1, 2026

It's an awesome PR overall. I can push the fixes that address these comments directly to this branch if you want.

Development

Successfully merging this pull request may close these issues.

Support Automatic Merging for Concurrent Container Inserts in Maps
