Tech debt: type-driven hardening of the KEL verify path (unsigned-replay unrepresentable, consolidate DelegatorKelLookup, tighten boundary types) #263

@bordumb

Tech debt: type-driven hardening of the KEL verify path

Architectural follow-ups surfaced while remediating RT-002 (the systemic "verify path
replays a KEL by structure only, never checks event signatures" finding; see
docs/prompts/red_team_2026_06_10.md and #262). RT-002's functional gap is closed for
every signature-carrying transport. This issue captures the structural weaknesses that
let the class exist and that still make it easy to reintroduce. None of these are
security-blocking today; they are correctness-by-construction improvements.

Pre-launch repo: no backwards-compatibility constraints — wire formats and public
signatures may change freely.


Zero-context primer (read first)

  • KERI / KEL. An identity is a Key Event Log (KEL): an ordered chain of events —
    icp (inception), rot (rotation), ixn (interaction), plus delegated variants dip
    (delegated inception) and drt (delegated rotation). Each event is content-addressed by
    a SAID (self-addressing identifier, a hash over its own bytes). An identity's prefix
    (its DID, did:keri:<prefix>) is the SAID of its inception event — so the inception is
    self-certifying, but later appended events are not constrained by the prefix.
  • Two kinds of "replay". The engine lives in crates/auths-keri/src/validate.rs:
    • Structural replay: validate_kel(&[Event]) (+ _with_lookup, _with_receipts,
      and the alias replay_kel) checks SAID + sequence + chain-linkage + pre-rotation
      commitment. It does not verify that each event is signed by the controlling key.
    • Authenticated replay: validate_signed_kel(&[SignedEvent], Option<&dyn DelegatorKelLookup>)
      at validate.rs:588 folds per-event signature verification into
      the replay. SignedEvent { event, signatures } pairs an event with its CESR signature
      attachment.
    • RT-002 in one line: untrusted-input verifiers were calling the structural form, so
      a forged/unsigned KEL replayed to attacker-chosen keys and verified.
  • Delegation constraint (important for any refactor here). A delegated event (dip/
    drt) cannot be authenticated standalone: validate_delegated_inception
    (validate.rs:1024) requires a DelegatorKelLookup to resolve the delegator's
    anchoring seal. So authentication of a delegated device KEL must happen where the root
    KEL + the device KEL + the lookup all coexist (the commit-verify layer, or the
    org-bundle pattern in offline_verify.rs).
  • Trust tiers. The local ~/.auths registry is trusted (self-owned). Bundles
    (--identity-bundle, org air-gapped bundle), WASM/FFI inputs, and --remote/--oobi
    fetches are untrusted and must be authenticated.
  • Ports/adapters. RegistryBackend (trait at crates/auths-id/src/storage/registry/backend.rs:502)
    is the storage port; adapters are GitRegistryBackend (auths-storage,
    real), PostgresAdapter (stub), FakeRegistryBackend (testing), plus a blanket
    impl RegistryBackend for Arc<T> (backend.rs:1019).
  • Build/check (per-crate; avoid --all-features on auths-crypto/-core, where a deliberate
    FIPS/CNSA compile_error! guard makes that fail):
    cargo build -p auths-<crate> 2>&1 | grep "^error\[E". The verifier's WASM path compiles
    natively with cargo build -p auths-verifier --features wasm. A standing lint,
    cargo run -p xtask -- check-verify-path-completeness, guards the verify surfaces (below).
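
To make the primer's structural-versus-authenticated distinction concrete, here is a minimal sketch. Everything in it is a toy stand-in: these Event / SignedEvent structs, the signed_by field, the ReplayError enum, and both functions are hypothetical, not the auths-keri types. It also ignores rotation and pre-rotation entirely and models only events appended under a single fixed controlling key.

```rust
// Toy stand-ins: the real Event / SignedEvent carry SAIDs and CESR signatures.
// Here `signed_by` simply names the key that produced the signature.
#[derive(Clone, Debug)]
pub struct Event {
    pub seq: u64,
    pub payload: String,
}

#[derive(Clone, Debug)]
pub struct SignedEvent {
    pub event: Event,
    pub signed_by: String,
}

#[derive(Debug, PartialEq)]
pub enum ReplayError {
    Empty,
    BadSequence(u64),
    BadSignature(u64),
}

/// Structural replay: checks ordering only, so any well-formed chain passes.
pub fn structural_replay(events: &[Event]) -> Result<usize, ReplayError> {
    if events.is_empty() {
        return Err(ReplayError::Empty);
    }
    for (i, e) in events.iter().enumerate() {
        if e.seq != i as u64 {
            return Err(ReplayError::BadSequence(e.seq));
        }
    }
    Ok(events.len())
}

/// Authenticated replay: the same ordering check, plus a per-event check that
/// the event was signed by the controlling key (toy: a string comparison).
pub fn authenticated_replay(
    controller: &str,
    events: &[SignedEvent],
) -> Result<usize, ReplayError> {
    if events.is_empty() {
        return Err(ReplayError::Empty);
    }
    for (i, se) in events.iter().enumerate() {
        if se.event.seq != i as u64 {
            return Err(ReplayError::BadSequence(se.event.seq));
        }
        if se.signed_by != controller {
            return Err(ReplayError::BadSignature(se.event.seq));
        }
    }
    Ok(events.len())
}
```

In this toy, an event appended by a key other than the controller passes structural_replay once its signature is stripped, but authenticated_replay rejects it at that event's sequence number.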

Verdict

Trending right, but mixed. The ports/adapters seam is genuine and the leaf domain types
are good. The weaknesses cluster in three places: (1) a trait-default footgun that caused
RT-002, (2) parse-don't-validate is applied at the leaves but not at the load-bearing
boundary (the public replay API still accepts unsigned &[Event]), and (3) genuine
logic duplication in delegator-seal lookups.


1. Ports/adapters — the lossy trait-default footgun (ALREADY FIXED; keep as a guardrail)

What happened: RegistryBackend::append_signed_event / get_attachment were default
methods whose defaults silently did the wrong thing — append_signed_event delegated to
append_event and dropped the signature attachment; get_attachment returned
Ok(None). The Arc<T> blanket impl did not override them, so every
Arc<dyn RegistryBackend> (the common handle) inherited the lossy default. Nothing failed
loudly — get_attachment just returned None. This is the literal root cause of RT-002:
producers couldn't ship signatures because storage silently discarded them.

Fix already landed (this session): the two methods are now required (no default) at
backend.rs:537/:549, so the compiler enumerates every adapter that must implement them;
Git/Fake/Postgres/Arc all implement them (Arc forwards at backend.rs:1029/:1038).

Lesson to encode (the actionable part):

  • A trait default method is safe only if its default is fail-closed. A default that
    loses data or returns "absent" is a landmine. Audit the rest of RegistryBackend (and
    peer port traits) for other defaults that silently no-op.
  • A blanket impl ... for Arc<T> must forward every method, and there is no
    compile-time guarantee it stays complete as methods are added (a required method does
    force it; a defaulted one does not). Consider whether the Arc blanket impl is worth its
    maintenance hazard, or whether a #[forward]-style macro / explicit newtype is clearer.
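
The failure mode can be reproduced in miniature. The sketch below is an illustration under assumptions, not the real RegistryBackend: the traits LossyBackend and StrictBackend, the MemStore adapter, and the simplified string-based signatures are all hypothetical. It shows an adapter that stores attachments correctly while its Arc handle silently loses them, because the blanket impl forwards only the required method.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical in-memory adapter: rows of (event, optional attachment).
#[derive(Default)]
pub struct MemStore {
    rows: Mutex<Vec<(String, Option<String>)>>,
}

impl MemStore {
    fn push(&self, event: &str, attachment: Option<&str>) {
        let row = (event.to_string(), attachment.map(str::to_string));
        self.rows.lock().unwrap().push(row);
    }
    fn attachment_at(&self, index: usize) -> Option<String> {
        self.rows.lock().unwrap().get(index).and_then(|r| r.1.clone())
    }
}

/// Before: the defaults "work" by dropping data and reporting "absent".
pub trait LossyBackend {
    fn append_event(&self, event: &str);
    fn append_signed_event(&self, event: &str, _attachment: &str) {
        self.append_event(event); // attachment silently discarded
    }
    fn get_attachment(&self, _index: usize) -> Option<String> {
        None
    }
}

impl LossyBackend for MemStore {
    fn append_event(&self, event: &str) { self.push(event, None) }
    fn append_signed_event(&self, event: &str, attachment: &str) {
        self.push(event, Some(attachment))
    }
    fn get_attachment(&self, index: usize) -> Option<String> { self.attachment_at(index) }
}

// The blanket impl compiles while forwarding only the required method,
// so every Arc handle inherits the lossy defaults.
impl<T: LossyBackend + ?Sized> LossyBackend for Arc<T> {
    fn append_event(&self, event: &str) { (**self).append_event(event) }
}

/// After: no defaults, so each impl (Arc included) must spell out every method.
pub trait StrictBackend {
    fn append_event(&self, event: &str);
    fn append_signed_event(&self, event: &str, attachment: &str);
    fn get_attachment(&self, index: usize) -> Option<String>;
}

impl StrictBackend for MemStore {
    fn append_event(&self, event: &str) { self.push(event, None) }
    fn append_signed_event(&self, event: &str, attachment: &str) {
        self.push(event, Some(attachment))
    }
    fn get_attachment(&self, index: usize) -> Option<String> { self.attachment_at(index) }
}

// Omitting either of the last two methods here is a compile error.
impl<T: StrictBackend + ?Sized> StrictBackend for Arc<T> {
    fn append_event(&self, event: &str) { (**self).append_event(event) }
    fn append_signed_event(&self, event: &str, attachment: &str) {
        (**self).append_signed_event(event, attachment)
    }
    fn get_attachment(&self, index: usize) -> Option<String> {
        (**self).get_attachment(index)
    }
}
```

Through an Arc<MemStore>, LossyBackend::append_signed_event stores the event without its attachment and get_attachment reports None, while the same calls through StrictBackend round-trip the attachment.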

2. Parse-don't-validate — solid at the leaves, weak exactly where it's load-bearing

Good (do not "fix"): Prefix, Said, CesrKey, IndexedSignature are real newtypes
that validate on deserialize. SignedEvent { event, signatures } is the right domain type
and validate_signed_kel consumes it.

Gaps:

2a. The dangerous boundary is not typed — this is the big one. All four structural
replays are still public and take bare &[Event]:

  • validate.rs:390 pub fn validate_kel(&[Event])
  • validate.rs:398 pub fn validate_kel_with_lookup(...)
  • validate.rs:472 pub fn validate_kel_with_receipts(...)
  • validate.rs:1131 pub fn replay_kel(&[Event]) (a literal alias of validate_kel)

The type system does nothing to stop an unsigned replay; we rely on the CI lint
xtask check-verify-path-completeness (which bans these on verify surfaces unless a call
site carries an // rt-002-allow: <reason> comment). The lint is a guardrail; the real
fix the audit specified (task "A.0", docs/prompts/red_team_2026_06_10.md) is type-level:
make bare-Event replay pub(crate)/private and have the only public replay take
&[SignedEvent], so "verify an unsigned KEL" becomes a compile error. Then the lint
becomes a backstop instead of the primary defense.
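
A rough sketch of the type-level shape follows. A module boundary stands in for the crate boundary (a private fn inside mod engine plays the role of pub(crate) in auths-keri). The types, the ValidationError enum, the u64 return value, and above all the "at least one signature present" check are placeholders and assumptions; the real validate_signed_kel verifies signatures cryptographically and has a different signature.

```rust
pub mod engine {
    #[derive(Clone, Debug)]
    pub struct Event {
        pub seq: u64,
    }

    #[derive(Clone, Debug)]
    pub struct SignedEvent {
        pub event: Event,
        pub signatures: Vec<String>,
    }

    #[derive(Debug, PartialEq)]
    pub enum ValidationError {
        Empty,
        BadSequence(u64),
        MissingSignature(u64),
    }

    // Private: callers outside `engine` cannot name this function at all.
    fn validate_kel(events: &[Event]) -> Result<u64, ValidationError> {
        for (i, e) in events.iter().enumerate() {
            if e.seq != i as u64 {
                return Err(ValidationError::BadSequence(e.seq));
            }
        }
        // Returns the last sequence number as a stand-in for the key state.
        events.last().map(|e| e.seq).ok_or(ValidationError::Empty)
    }

    /// The only public replay: it accepts SignedEvent, never bare Event.
    pub fn validate_signed_kel(signed: &[SignedEvent]) -> Result<u64, ValidationError> {
        // PLACEHOLDER: real code verifies each signature against the
        // controlling key; this toy only checks that one is attached.
        if let Some(s) = signed.iter().find(|s| s.signatures.is_empty()) {
            return Err(ValidationError::MissingSignature(s.event.seq));
        }
        let events: Vec<Event> = signed.iter().map(|s| s.event.clone()).collect();
        validate_kel(&events)
    }
}

// From outside the module, the unsigned form no longer compiles:
//   engine::validate_kel(&events); // error[E0603]: function `validate_kel` is private
```

With this shape the structural check still runs exactly once, inside the signed entry point, so the single-engine reuse is preserved while the unsigned form stops being reachable from verify surfaces.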

2b. Parallel index-correlated arrays instead of one type. The wire structs carry events
and their attachments as two arrays zipped + length-checked at runtime:

  • crates/auths-verifier/src/core.rs:935 pub kel: Vec<serde_json::Value> +
    :943 pub kel_attachments: Vec<String> (in IdentityBundle, core.rs:918).
  • crates/auths-sdk/src/domains/org/bundle.rs:49 BundledKel { events: Vec<Event>, attachments: Vec<String> }.

A Vec<SignedEvent> makes the length mismatch unrepresentable. Worse, IdentityBundle.kel
is Vec<serde_json::Value>, i.e. unparsed JSON, the weakest possible typing, deferring the
parse past the type boundary entirely. The parallel-array wire shape has a defensible
forward/backward-read reason, but internally we should collapse to Vec<SignedEvent>
immediately at deserialize and never pass the loose arrays around. (Today the verify gate
does reconstruct SignedEvent then call validate_signed_kel, but the struct still leaks
the loose arrays to every other reader.)
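
One possible shape for "collapse at the boundary" is sketched below without serde, to stay self-contained. BundledKelWire, SignedKel, and BundleError are hypothetical names, and the Event / SignedEvent structs are simplified stand-ins (here the attachment is kept as a raw String). In the real crates this conversion would presumably sit behind the Deserialize impl.

```rust
use std::convert::TryFrom;

#[derive(Clone, Debug, PartialEq)]
pub struct Event {
    pub seq: u64,
}

#[derive(Clone, Debug, PartialEq)]
pub struct SignedEvent {
    pub event: Event,
    pub signatures: String,
}

/// Wire shape: two parallel arrays correlated only by index.
pub struct BundledKelWire {
    pub events: Vec<Event>,
    pub attachments: Vec<String>,
}

/// Internal shape: a length mismatch cannot be represented.
#[derive(Debug, PartialEq)]
pub struct SignedKel(pub Vec<SignedEvent>);

#[derive(Debug, PartialEq)]
pub enum BundleError {
    LengthMismatch { events: usize, attachments: usize },
}

impl TryFrom<BundledKelWire> for SignedKel {
    type Error = BundleError;

    fn try_from(wire: BundledKelWire) -> Result<Self, Self::Error> {
        // The one runtime length check, performed once at the boundary.
        if wire.events.len() != wire.attachments.len() {
            return Err(BundleError::LengthMismatch {
                events: wire.events.len(),
                attachments: wire.attachments.len(),
            });
        }
        let signed = wire
            .events
            .into_iter()
            .zip(wire.attachments)
            .map(|(event, signatures)| SignedEvent { event, signatures })
            .collect();
        Ok(SignedKel(signed))
    }
}
```

Readers downstream of the conversion then hold a SignedKel and have no loose arrays left to mis-zip.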

2c. Primitive obsession at the resolver boundary. In
crates/auths-cli/src/commands/verify_commit.rs:

  • :173/:394/:439 thread bundle_kel: Option<&(String, Vec<Event>)> — an anonymous
    tuple whose String is really a DID — through three signatures. A named
    BundleKel { did: Prefix /* or IdentityDid */, events: Vec<Event> } (or Vec<SignedEvent>)
    is clearer and self-documenting.
  • resolve_signer_kel(did: &str) re-runs parse_did_keri(did) on every branch
    (:403, :408, …) instead of parsing once at the boundary into a Prefix and passing
    the typed value down (parse-don't-validate: parse at the edge, pass the proof inward).
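
As a sketch of what the typed resolver boundary could look like: the Prefix below is a deliberately simplified stand-in (the real newtype validates far more than a non-empty string), DidError is hypothetical, and this resolve_signer_kel signature is an assumption for illustration rather than the function in verify_commit.rs.

```rust
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Prefix(String);

#[derive(Debug, PartialEq)]
pub enum DidError {
    NotKeri,
    EmptyPrefix,
}

impl Prefix {
    /// Parse once at the edge; holding a `Prefix` is proof the DID was valid.
    pub fn parse_did_keri(did: &str) -> Result<Self, DidError> {
        let rest = did.strip_prefix("did:keri:").ok_or(DidError::NotKeri)?;
        if rest.is_empty() {
            return Err(DidError::EmptyPrefix);
        }
        Ok(Prefix(rest.to_string()))
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

#[derive(Clone, Debug, PartialEq)]
pub struct Event {
    pub seq: u64,
}

/// Named replacement for the anonymous `(String, Vec<Event>)` tuple.
pub struct BundleKel {
    pub did: Prefix,
    pub events: Vec<Event>,
}

/// Takes the already-parsed prefix, so no branch re-parses a `&str`.
pub fn resolve_signer_kel<'a>(
    signer: &Prefix,
    bundle: Option<&'a BundleKel>,
) -> Option<&'a [Event]> {
    match bundle {
        Some(b) if &b.did == signer => Some(b.events.as_slice()),
        _ => None,
    }
}
```

The caller parses the DID string exactly once, and every function below it receives a value that cannot be malformed.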

3. Shared logic vs. duplication — one clear offender (and I added to it)

There are three near-identical DelegatorKelLookup impls (trait at validate.rs:373,
single method find_seal(&self, delegator_aid: &Prefix, seal_said: &Said) -> Option<SourceSeal>), each indexing "a KEL's anchoring seals":

  • crates/auths-verifier/src/commit_kel.rs:165 RootKelLookup — linear scan
  • crates/auths-verifier/src/presentation.rs:67 DelegatorSeals — linear scan
  • crates/auths-sdk/src/domains/org/offline_verify.rs:88 OrgKelLookup — precomputed
    HashMap, O(1) (added during the RT-002 work because the other two weren't reusable)

Three structs for one concept, and inconsistent in quality (HashMap vs linear scan). This
should be one reusable index — e.g. KelSealIndex::from_events(&[Event]) living in
auths-keri next to the trait — that everyone constructs. Note the lookups span crates
(auths-verifier and auths-sdk), so the shared type belongs in auths-keri.
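
A minimal sketch of such an index, assuming simplified stand-ins for Prefix, Said, SourceSeal, and Event (the seals field and the SourceSeal fields here are invented for illustration). Only the find_seal signature is taken from the trait as quoted above. Keeping the first anchoring event when a seal appears twice is an arbitrary choice in this sketch, not a statement about the existing lookups.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct Prefix(pub String);

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct Said(pub String);

/// Toy seal location: which delegator event anchored the seal.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct SourceSeal {
    pub seq: u64,
    pub event_said: Said,
}

/// Toy event: `seals` lists the SAIDs this event anchors.
pub struct Event {
    pub prefix: Prefix,
    pub seq: u64,
    pub said: Said,
    pub seals: Vec<Said>,
}

pub trait DelegatorKelLookup {
    fn find_seal(&self, delegator_aid: &Prefix, seal_said: &Said) -> Option<SourceSeal>;
}

/// One reusable index, built once per KEL, with O(1) expected lookups.
pub struct KelSealIndex {
    seals: HashMap<(Prefix, Said), SourceSeal>,
}

impl KelSealIndex {
    pub fn from_events(events: &[Event]) -> Self {
        let mut seals = HashMap::new();
        for e in events {
            for anchored in &e.seals {
                // First anchoring event wins (an assumption of this sketch).
                seals
                    .entry((e.prefix.clone(), anchored.clone()))
                    .or_insert_with(|| SourceSeal {
                        seq: e.seq,
                        event_said: e.said.clone(),
                    });
            }
        }
        KelSealIndex { seals }
    }
}

impl DelegatorKelLookup for KelSealIndex {
    fn find_seal(&self, delegator_aid: &Prefix, seal_said: &Said) -> Option<SourceSeal> {
        self.seals
            .get(&(delegator_aid.clone(), seal_said.clone()))
            .cloned()
    }
}
```

Keying on the (delegator prefix, seal SAID) pair lets a single index hold more than one delegator's KEL, which may matter for the org-bundle case.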

Minor: replay_kel (validate.rs:1131) is a literal alias of validate_kel — two public
names for one function; collapse to one (the alias forced the lint to ban both names).

Working as intended (keep): replay logic is not duplicated — bundle, org bundle, and
WASM validateKelJson all call the single validate_signed_kel engine fn. That reuse is
the pattern done right; preserve it.


Proposed work (priority order)

P1 — Type-level A.0: make unsigned replay unrepresentable.
Make validate_kel / validate_kel_with_lookup / validate_kel_with_receipts /
replay_kel pub(crate) (or move behind a private module); expose only signed entrypoints
taking &[SignedEvent]. Signer-side trusted re-derivations that legitimately replay an
already-trusted local KEL stay inside the crate boundary or get a typed "trusted" wrapper.
Acceptance: calling structural replay with bare &[Event] from outside the engine crate
is a compile error; check-verify-path-completeness stays green as a backstop; the
existing rt-002-allow: annotations are revisited (several become unnecessary once the type
enforces it).

P2 — Consolidate the three DelegatorKelLookup impls into one KelSealIndex in
auths-keri; delete RootKelLookup, DelegatorSeals, OrgKelLookup in favor of it.
Acceptance: one impl, O(1) lookup, used by commit-verify + presentation + offline org
verify; behavior unchanged (existing org_delegation + commit_kel + presentation tests
pass).

P3 — Tighten boundary types. IdentityBundle.kel: Vec<serde_json::Value> →
Vec<Event> (ideally fold kel + kel_attachments into Vec<SignedEvent>); same for
BundledKel. Replace (String, Vec<Event>) in verify_commit.rs with a named struct;
have resolve_signer_kel take a parsed Prefix instead of &str + repeated
parse_did_keri.
Acceptance: no serde_json::Value in the bundle wire type; no anonymous DID-bearing
tuples on the resolver path; DID parsed once at the boundary.

Also fold in: audit remaining RegistryBackend (and peer port) trait methods for
lossy/no-op defaults (§1); collapse the replay_kel alias (§3).


Constraints & gotchas for whoever picks this up
