From 7eaff88b4237ebe71a4ae9a09202036b23283809 Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Mon, 2 Mar 2026 17:39:24 +0000 Subject: [PATCH 01/61] product and engineering specifications for simplexmq --- CODE.md | 210 ++++++++++++++++++++++++++++ product/README.md | 22 +++ product/components/agent.md | 9 ++ product/components/notifications.md | 7 + product/components/servers.md | 9 ++ product/components/smp.md | 9 ++ product/components/xftp.md | 9 ++ product/components/xrcp.md | 7 + product/concepts.md | 5 + product/glossary.md | 5 + product/goals.md | 5 + product/rules.md | 5 + product/threat-model.md | 22 +++ spec/README.md | 64 +++++++++ spec/agent-protocol.md | 13 ++ spec/agent.md | 13 ++ spec/compression.md | 7 + spec/crypto-ratchet.md | 13 ++ spec/crypto-tls.md | 11 ++ spec/crypto.md | 19 +++ spec/encoding.md | 11 ++ spec/ntf-protocol.md | 15 ++ spec/ntf-server.md | 11 ++ spec/remote-control.md | 11 ++ spec/security-invariants.md | 19 +++ spec/smp-client.md | 11 ++ spec/smp-protocol.md | 13 ++ spec/smp-server.md | 13 ++ spec/storage-agent.md | 11 ++ spec/storage-server.md | 9 ++ spec/transport-http2.md | 13 ++ spec/transport-websocket.md | 7 + spec/transport.md | 11 ++ spec/version.md | 9 ++ spec/xftp-client.md | 11 ++ spec/xftp-protocol.md | 13 ++ spec/xftp-server.md | 11 ++ spec/xrcp-protocol.md | 13 ++ 38 files changed, 676 insertions(+) create mode 100644 CODE.md create mode 100644 product/README.md create mode 100644 product/components/agent.md create mode 100644 product/components/notifications.md create mode 100644 product/components/servers.md create mode 100644 product/components/smp.md create mode 100644 product/components/xftp.md create mode 100644 product/components/xrcp.md create mode 100644 product/concepts.md create mode 100644 product/glossary.md create mode 100644 product/goals.md create mode 100644 product/rules.md create mode 100644 product/threat-model.md create mode 100644 
spec/README.md create mode 100644 spec/agent-protocol.md create mode 100644 spec/agent.md create mode 100644 spec/compression.md create mode 100644 spec/crypto-ratchet.md create mode 100644 spec/crypto-tls.md create mode 100644 spec/crypto.md create mode 100644 spec/encoding.md create mode 100644 spec/ntf-protocol.md create mode 100644 spec/ntf-server.md create mode 100644 spec/remote-control.md create mode 100644 spec/security-invariants.md create mode 100644 spec/smp-client.md create mode 100644 spec/smp-protocol.md create mode 100644 spec/smp-server.md create mode 100644 spec/storage-agent.md create mode 100644 spec/storage-server.md create mode 100644 spec/transport-http2.md create mode 100644 spec/transport-websocket.md create mode 100644 spec/transport.md create mode 100644 spec/version.md create mode 100644 spec/xftp-client.md create mode 100644 spec/xftp-protocol.md create mode 100644 spec/xftp-server.md create mode 100644 spec/xrcp-protocol.md diff --git a/CODE.md b/CODE.md new file mode 100644 index 000000000..2b99fff9e --- /dev/null +++ b/CODE.md @@ -0,0 +1,210 @@ +# simplexmq — LLM Navigation Guide + +This file is the entry point for LLMs working on simplexmq. Read it before making any code changes. + +## Three-Layer Architecture + +simplexmq maintains three documentation layers alongside source code: + +| Layer | Directory | Answers | Audience | +|-------|-----------|---------|----------| +| **Product** | `product/` | What does this do? Who uses it? What must never break? | Anyone reasoning about behavior, privacy, security | +| **Spec** | `spec/` | How does the code work? What does each function do? What are the security invariants? | LLMs and developers modifying code | +| **Protocol** | `protocol/` | What is the wire protocol? What are the message formats and state machines? 
| Protocol implementors, formal verification | + +Additionally: +- `rfcs/` — Protocol evolution: each RFC describes a delta to a protocol spec +- `product/threat-model.md` — Comprehensive threat model across all protocols +- `spec/security-invariants.md` — Every security invariant with enforcement and test coverage + +## Navigation Workflow + +When modifying code, follow this sequence: + +1. **Identify scope** — Find the relevant component in `product/concepts.md` +2. **Load product context** — Read the component file in `product/components/` to understand what users depend on +3. **Load spec context** — Read the relevant `spec/` file(s) for implementation details and call graphs +4. **Check security** — Read `spec/security-invariants.md` for any invariants enforced by the code you're changing +5. **Load source** — Read the actual source files referenced in spec/ +6. **Identify impact** — Trace the call graph to understand what your change affects +7. **Implement** — Make the change +8. **Update all layers** — Update spec/, product/, and protocol/ (if wire protocol changed) to stay coherent + +## Protocol Specifications + +Consolidated protocol specs live in `protocol/`. These describe the wire protocols as originally specified. Code has advanced beyond these versions — Phase 2 of this project will synchronize them. 
+ +| File | Protocol | Spec version | Code version | +|------|----------|-------------|--------------| +| `simplex-messaging.md` | SMP (simplex messaging) | v9 | SMP relay v18, SMP client v4 | +| `agent-protocol.md` | Agent (duplex connections) | v5 | Agent v7 | +| `xftp.md` | XFTP (file transfer) | v2 | XFTP v3 | +| `xrcp.md` | XRCP (remote control) | v1 | RCP v1 | +| `push-notifications.md` | Push notifications | v2 | NTF v3 | +| `pqdr.md` | PQDR (post-quantum double ratchet) | v1 | E2E v3 | +| `overview-tjr.md` | Cross-protocol overview | — | — | + +Note: SMP has multiple version axes — `VersionSMP` (relay/transport, currently 18), `VersionSMPC` (client protocol, currently 4), and `VersionSMPA` (agent, currently 7). These are negotiated independently. + +Protocol specs are amended in place when implementation changes. RFCs in `rfcs/` track the evolution history. + +## Source Structure + +``` +src/Simplex/ + Messaging/ + Protocol.hs, Protocol/Types.hs — SMP wire protocol types + encoding + Client.hs — SMP client (protocol operations, proxy relay) + Client/Agent.hs — Low-level async SMP agent + Server.hs — SMP server request handling + Server/Env/STM.hs — Server environment + STM state + Server/Main.hs, Server/Main/Init.hs — Server CLI + initialization + Server/QueueStore/ — Queue storage (STM, Postgres) + Server/MsgStore/ — Message storage (STM, Journal, Postgres) + Server/MsgStore/Journal.hs — Journal message store (1000 lines) + Server/StoreLog/ — Store log (append-only write, read-compact-rewrite restore) + Server/NtfStore.hs — Message notification store + Server/Control.hs, Server/CLI.hs — Control protocol + CLI utilities + Server/Stats.hs, Server/Prometheus.hs — Metrics + Server/Information.hs — Server public information / metadata + Agent.hs — SMP agent: duplex connections, queue rotation + Agent/Client.hs — Agent's SMP/XFTP/NTF client management + Agent/Protocol.hs — Agent wire protocol types + encoding (2200 lines) + Agent/Store.hs — Agent storage types 
(queues, connections, messages) + Agent/Store/AgentStore.hs — Agent storage implementation (3500 lines) + Agent/Store/ — Agent storage backends (SQLite, Postgres) + Agent/Env/SQLite.hs — Agent environment + configuration + Agent/NtfSubSupervisor.hs — Notification subscription management + Agent/TSessionSubs.hs — Transport session subscriptions + Agent/Stats.hs — Agent statistics + Agent/RetryInterval.hs — Retry interval logic + Agent/Lock.hs — Named locks + Agent/QueryString.hs — Query string parsing + Transport.hs — TLS transport abstraction + handshake + Transport/Client.hs, Transport/Server.hs — TLS client + server + Transport/HTTP2.hs — HTTP/2 transport setup + Transport/HTTP2/Client.hs — HTTP/2 client + Transport/HTTP2/Server.hs — HTTP/2 server + Transport/HTTP2/File.hs — HTTP/2 file streaming + Transport/WebSockets.hs — WebSocket adapter + Transport/Buffer.hs — Transport buffering + Transport/KeepAlive.hs — TCP keepalive + Transport/Shared.hs — Certificate chain validation + Transport/Credentials.hs — TLS credential generation + Crypto.hs — All cryptographic primitives + Crypto/File.hs — File encryption (NaCl secret box + lazy) + Crypto/Lazy.hs — Lazy hashing + encryption + Crypto/Ratchet.hs — Double ratchet + PQDR + Crypto/ShortLink.hs — Short link key derivation + Crypto/SNTRUP761.hs — Post-quantum KEM hybrid secret + Crypto/SNTRUP761/Bindings.hs — sntrup761 C FFI bindings + Notifications/Protocol.hs — NTF wire protocol types + encoding + Notifications/Types.hs — NTF agent types (tokens, subscriptions) + Notifications/Transport.hs — NTF transport handshake + Notifications/Client.hs — NTF client operations + Notifications/Server.hs — NTF server + Notifications/Server/Env.hs — NTF server environment + config + Notifications/Server/Store.hs — NTF server storage (STM) + Notifications/Server/Store/Postgres.hs — NTF server storage (Postgres) + Notifications/Server/Push/APNS.hs — Apple push notification integration + Notifications/Server/Push/APNS/Internal.hs — 
APNS HTTP/2 client + Notifications/Server/Main.hs — NTF server CLI + Notifications/Server/Stats.hs — NTF server metrics + Notifications/Server/Prometheus.hs — NTF Prometheus metrics + Notifications/Server/Control.hs — NTF server control + Encoding.hs, Encoding/String.hs — Binary + string encoding + Version.hs, Version/Internal.hs — Version ranges + negotiation + Util.hs — Utilities (error handling, STM, grouping) + Parsers.hs — Attoparsec parser combinators + TMap.hs — Transactional map (STM) + Compression.hs — Zstd compression + ServiceScheme.hs — Service scheme + server location types + Session.hs — Session variables (TVar-based) + SystemTime.hs — Rounded system time types + FileTransfer/ + Protocol.hs — XFTP wire protocol types + encoding + Client.hs — XFTP client operations + Client/Agent.hs — XFTP client agent (connection pooling) + Client/Main.hs — XFTP CLI client implementation + Client/Presets.hs — Default XFTP servers + Server.hs — XFTP server request handling + Server/Env.hs — XFTP server environment + config + Server/Store.hs — XFTP server storage + Server/StoreLog.hs — XFTP server store log + Server/Main.hs — XFTP server CLI + Server/Stats.hs — XFTP server metrics + Server/Prometheus.hs — XFTP Prometheus metrics + Server/Control.hs — XFTP server control + Agent.hs — XFTP agent operations + Description.hs — File description format + Transport.hs — XFTP transport + Crypto.hs — File encryption for transfer + Types.hs — File transfer types + Chunks.hs — Chunk sizing + RemoteControl/ + Client.hs — XRCP client (ctrl device) + Invitation.hs — XRCP invitation handling + Discovery.hs — Local network discovery + Discovery/Multicast.hsc — Multicast discovery (C FFI) + Types.hs — XRCP types + version + +apps/ + smp-server/Main.hs — SMP server executable + smp-server/web/Static.hs — SMP server web static files + xftp-server/Main.hs — XFTP server executable + xftp/Main.hs — XFTP CLI executable + ntf-server/Main.hs — Notification server executable + smp-agent/Main.hs 
— SMP agent (experimental, not in cabal) +``` + +## Linking Conventions + +### spec → src +Fully qualified exported function names inline in prose: `Simplex.Messaging.Client.connectSMPProxiedRelay`. Use Grep/Glob to locate in source. For app targets: `xftp/Main.main`. + +### src → spec +Comment above function: +```haskell +-- spec/crypto-tls.md#certificate-chain-validation +-- Validates relay certificate chain to prevent proxy MITM (SI-XX) +connectSMPProxiedRelay :: ... +``` + +### spec ↔ spec +Named markdown heading anchors: `spec/crypto.md#ed25519-signing` + +### spec ↔ product +Cross-references: `product/rules.md#pr-05`, `spec/security-invariants.md#si-01` + +### protocol/ references +`protocol/simplex-messaging.md` with section name + +## Build Flags + +simplexmq builds with several flag combinations: + +| Flag | Effect | +|------|--------| +| (none) | Default: SQLite storage, all executables | +| `-fserver_postgres` | Postgres backend for SMP server | +| `-fclient_postgres` | Postgres backend for agent storage | +| `-fclient_library` | Library-only build (no server executables) | +| `-fswift` | Swift JSON format for mobile bindings | +| `-fuse_crypton` | Use crypton in cryptostore | + +All flag combinations must compile with `--enable-tests`. Verify with: +``` +cabal build all --ghc-options="-O0" [-flags] [--enable-tests] +``` + +## Change Protocol + +Every code change must maintain coherence across all three layers: + +1. **Code change** — Implement in src/ +2. **Spec update** — Update the relevant spec/ file(s): types, call graphs, security notes +3. **Product update** — If user-visible behavior changed, update product/ files +4. **Protocol update** — If wire protocol changed, amend protocol/ spec (requires user approval) +5. **Security check** — If the change touches a trust boundary, update spec/security-invariants.md + +Protocol spec amendments require explicit user approval before committing. 
diff --git a/product/README.md b/product/README.md new file mode 100644 index 000000000..a3466bc66 --- /dev/null +++ b/product/README.md @@ -0,0 +1,22 @@ +# SimpleX Network — Product Layer + +> What does this do? Who uses it? What must never break? + +## Vision + + + +## Components + +| Component | Description | Spec | Protocol | +|-----------|-------------|------|----------| +| SMP | Simplex messaging queues | spec/smp-protocol.md | protocol/simplex-messaging.md | +| Agent | Duplex connections over simplex queues | spec/agent.md | protocol/agent-protocol.md | +| XFTP | File transfer via encrypted chunks | spec/xftp-protocol.md | protocol/xftp.md | +| XRCP | Remote control of mobile clients | spec/remote-control.md | protocol/xrcp.md | +| NTF | Push notifications with privacy | spec/ntf-protocol.md | protocol/push-notifications.md | +| Servers | SMP, XFTP, NTF server operation | spec/smp-server.md | — | + +## Capability Map + + diff --git a/product/components/agent.md b/product/components/agent.md new file mode 100644 index 000000000..f09b878a1 --- /dev/null +++ b/product/components/agent.md @@ -0,0 +1,9 @@ +# Agent — Duplex Connections + +> Bidirectional connections built over pairs of simplex queues. + +## Users + +## Connection Lifecycle + +## Guarantees diff --git a/product/components/notifications.md b/product/components/notifications.md new file mode 100644 index 000000000..4ad41f463 --- /dev/null +++ b/product/components/notifications.md @@ -0,0 +1,7 @@ +# Push Notifications + +> Push notifications with metadata privacy. + +## Users + +## Privacy Trade-offs diff --git a/product/components/servers.md b/product/components/servers.md new file mode 100644 index 000000000..9e6ea5300 --- /dev/null +++ b/product/components/servers.md @@ -0,0 +1,9 @@ +# Server Operation + +> SMP, XFTP, and NTF server deployment and operation. 
+ +## Deployment + +## Configuration + +## Monitoring diff --git a/product/components/smp.md b/product/components/smp.md new file mode 100644 index 000000000..a7fcadc98 --- /dev/null +++ b/product/components/smp.md @@ -0,0 +1,9 @@ +# SMP — Simplex Messaging Protocol + +> Unidirectional messaging queues with sender/receiver separation. + +## Users + +## Guarantees + +## Privacy Properties diff --git a/product/components/xftp.md b/product/components/xftp.md new file mode 100644 index 000000000..104a5f9b4 --- /dev/null +++ b/product/components/xftp.md @@ -0,0 +1,9 @@ +# XFTP — File Transfer + +> Encrypted file transfer via content-addressed chunks. + +## Users + +## Guarantees + +## Privacy Properties diff --git a/product/components/xrcp.md b/product/components/xrcp.md new file mode 100644 index 000000000..bd6b01576 --- /dev/null +++ b/product/components/xrcp.md @@ -0,0 +1,7 @@ +# XRCP — Remote Control + +> Remote control of mobile clients from desktop. + +## Users + +## Trust Model diff --git a/product/concepts.md b/product/concepts.md new file mode 100644 index 000000000..c67705a8f --- /dev/null +++ b/product/concepts.md @@ -0,0 +1,5 @@ +# Concepts & Entity Index + +> Domain concepts with cross-references to spec/ and src/. + + diff --git a/product/glossary.md b/product/glossary.md new file mode 100644 index 000000000..87ad50fbf --- /dev/null +++ b/product/glossary.md @@ -0,0 +1,5 @@ +# Glossary + +> Domain terminology used across simplexmq. + + diff --git a/product/goals.md b/product/goals.md new file mode 100644 index 000000000..26f0fcf43 --- /dev/null +++ b/product/goals.md @@ -0,0 +1,5 @@ +# Design Goals + +> Verified against protocol specs and code. + + diff --git a/product/rules.md b/product/rules.md new file mode 100644 index 000000000..c363de59f --- /dev/null +++ b/product/rules.md @@ -0,0 +1,5 @@ +# Invariant Rules + +> Invariants users depend on: privacy, delivery, ordering, security. 
+ + diff --git a/product/threat-model.md b/product/threat-model.md new file mode 100644 index 000000000..511cdebcf --- /dev/null +++ b/product/threat-model.md @@ -0,0 +1,22 @@ +# Threat Model + +> Comprehensive threat model across all protocols. + +Consistent with threat models in: +- `protocol/overview-tjr.md` (cross-protocol) +- `protocol/simplex-messaging.md` (SMP) +- `protocol/xftp.md` (XFTP) +- `protocol/xrcp.md` (XRCP) +- `protocol/push-notifications.md` (notifications) + +## Actors + + + +## Trust Boundaries + + + +## Security Properties + + diff --git a/spec/README.md b/spec/README.md new file mode 100644 index 000000000..83ce5097c --- /dev/null +++ b/spec/README.md @@ -0,0 +1,64 @@ +# Spec Layer + +> How does the code work? What does each function do? What are the security invariants? + +## Conventions + +Each spec file documents: +1. **Purpose** — What this component does +2. **Protocol reference** — Link to `protocol/` file (where applicable) +3. **Types** — Key data types with field descriptions +4. **Functions** — Every exported function with call graph +5. **Security notes** — Trust assumptions, validation requirements + +Function documentation format: +``` +### Module.functionName +**Purpose**: ... +**Calls**: Module.a, Module.b +**Called by**: Module.c +**Invariant**: SI-XX +**Security**: ... 
+``` + +## Index + +### Protocol Implementation +- [smp-protocol.md](smp-protocol.md) — SMP commands, types, encoding +- [xftp-protocol.md](xftp-protocol.md) — XFTP commands, chunk operations +- [ntf-protocol.md](ntf-protocol.md) — NTF commands, token/subscription lifecycle +- [xrcp-protocol.md](xrcp-protocol.md) — XRCP session handshake, commands +- [agent-protocol.md](agent-protocol.md) — Agent connection procedures, queue rotation + +### Cryptography +- [crypto.md](crypto.md) — All primitives: Ed25519, X25519, NaCl, AES-GCM, SHA, HKDF +- [crypto-ratchet.md](crypto-ratchet.md) — Double ratchet + PQDR +- [crypto-tls.md](crypto-tls.md) — TLS setup, certificate chains, validation + +### Transport +- [transport.md](transport.md) — Transport abstraction, handshake, block padding +- [transport-http2.md](transport-http2.md) — HTTP/2 framing, file streaming +- [transport-websocket.md](transport-websocket.md) — WebSocket adapter + +### Server Implementations +- [smp-server.md](smp-server.md) — SMP server +- [xftp-server.md](xftp-server.md) — XFTP server +- [ntf-server.md](ntf-server.md) — Notification server + +### Client Implementations +- [smp-client.md](smp-client.md) — SMP client, proxy relay +- [xftp-client.md](xftp-client.md) — XFTP client +- [agent.md](agent.md) — SMP agent, duplex connections + +### Storage +- [storage-server.md](storage-server.md) — Server storage backends +- [storage-agent.md](storage-agent.md) — Agent storage backends + +### Auxiliary +- [encoding.md](encoding.md) — Binary and string encoding +- [version.md](version.md) — Version ranges and negotiation +- [remote-control.md](remote-control.md) — XRCP implementation +- [compression.md](compression.md) — Zstd compression + +### Security +- [security-invariants.md](security-invariants.md) — All security invariants diff --git a/spec/agent-protocol.md b/spec/agent-protocol.md new file mode 100644 index 000000000..b84ffb9ce --- /dev/null +++ b/spec/agent-protocol.md @@ -0,0 +1,13 @@ +# Agent Protocol 
Implementation + +> Implements agent connection procedures, queue rotation, and duplex messaging. + +**Protocol reference**: [`protocol/agent-protocol.md`](../protocol/agent-protocol.md) + +## Types + +## Connection Procedures + +## Queue Rotation + +## Functions diff --git a/spec/agent.md b/spec/agent.md new file mode 100644 index 000000000..250bf2253 --- /dev/null +++ b/spec/agent.md @@ -0,0 +1,13 @@ +# SMP Agent + +> SMP agent implementation: duplex connections, queue rotation, ratchet sync, and notification subscriptions. + +## Duplex Connections + +## Queue Rotation + +## Ratchet Sync + +## Notification Subscriptions + +## Functions diff --git a/spec/compression.md b/spec/compression.md new file mode 100644 index 000000000..7e457438b --- /dev/null +++ b/spec/compression.md @@ -0,0 +1,7 @@ +# Compression + +> Compression support for SimpleX protocols. + +## Zstd + +## Functions diff --git a/spec/crypto-ratchet.md b/spec/crypto-ratchet.md new file mode 100644 index 000000000..de5af38a2 --- /dev/null +++ b/spec/crypto-ratchet.md @@ -0,0 +1,13 @@ +# Double Ratchet & PQDR + +> Implements the double ratchet algorithm with post-quantum extensions (PQDR). + +**Protocol reference**: [`protocol/pqdr.md`](../protocol/pqdr.md) + +## State + +## Transitions + +## Key Derivation + +## Functions diff --git a/spec/crypto-tls.md b/spec/crypto-tls.md new file mode 100644 index 000000000..9327ae69a --- /dev/null +++ b/spec/crypto-tls.md @@ -0,0 +1,11 @@ +# TLS & Certificate Chains + +> TLS session setup, certificate chain construction, and server identity validation. + +## TLS Setup + +## Certificate Validation + +## Trust Anchoring + +## Functions diff --git a/spec/crypto.md b/spec/crypto.md new file mode 100644 index 000000000..ec8fb0a49 --- /dev/null +++ b/spec/crypto.md @@ -0,0 +1,19 @@ +# Cryptographic Primitives + +> All cryptographic primitives used across SimpleX protocols. 
+ +## Ed25519 + +## X25519 + +## NaCl + +## AES-GCM + +## SHA + +## HKDF + +## Key Generation + +## Functions diff --git a/spec/encoding.md b/spec/encoding.md new file mode 100644 index 000000000..2b8dded01 --- /dev/null +++ b/spec/encoding.md @@ -0,0 +1,11 @@ +# Encoding + +> Binary and string encoding used across all SimpleX protocols. + +## Binary Encoding + +## String Encoding + +## Parsers + +## Functions diff --git a/spec/ntf-protocol.md b/spec/ntf-protocol.md new file mode 100644 index 000000000..c826e7e72 --- /dev/null +++ b/spec/ntf-protocol.md @@ -0,0 +1,15 @@ +# NTF Protocol Implementation + +> Implements NTF commands, token registration, and subscription lifecycle for push notifications. + +**Protocol reference**: [`protocol/push-notifications.md`](../protocol/push-notifications.md) + +## Types + +## Commands + +## Token Lifecycle + +## Subscription Lifecycle + +## Functions diff --git a/spec/ntf-server.md b/spec/ntf-server.md new file mode 100644 index 000000000..4a39957e3 --- /dev/null +++ b/spec/ntf-server.md @@ -0,0 +1,11 @@ +# Notification Server + +> Notification server implementation: token management, subscriptions, and APNS integration. + +## Token Management + +## Subscription Management + +## APNS Integration + +## Functions diff --git a/spec/remote-control.md b/spec/remote-control.md new file mode 100644 index 000000000..5a064437c --- /dev/null +++ b/spec/remote-control.md @@ -0,0 +1,11 @@ +# Remote Control (XRCP) + +> XRCP implementation: discovery, invitation, and session management. + +## Discovery + +## Invitation + +## Session Management + +## Functions diff --git a/spec/security-invariants.md b/spec/security-invariants.md new file mode 100644 index 000000000..fc1323665 --- /dev/null +++ b/spec/security-invariants.md @@ -0,0 +1,19 @@ +# Security Invariants + +> Every security invariant with enforcement and test coverage. 
+ +## Format + +``` +### SI-XX: [Name] +**Statement**: [Precise invariant] +**Threat**: [What attack this prevents] +**Actors**: [Which threat model actors are relevant] +**Enforced by**: [Qualified function names] — [how] +**Tested by**: [test module.function] or [MISSING TEST] +**Product rule**: PR-XX +``` + +## Invariants + + diff --git a/spec/smp-client.md b/spec/smp-client.md new file mode 100644 index 000000000..39ae87f9a --- /dev/null +++ b/spec/smp-client.md @@ -0,0 +1,11 @@ +# SMP Client + +> SMP client implementation: protocol operations, proxy relay, and reconnection logic. + +## Protocol Operations + +## Proxy Relay + +## Reconnection + +## Functions diff --git a/spec/smp-protocol.md b/spec/smp-protocol.md new file mode 100644 index 000000000..0def97941 --- /dev/null +++ b/spec/smp-protocol.md @@ -0,0 +1,13 @@ +# SMP Protocol Implementation + +> Implements SMP commands, types, and binary encoding for the SimpleX Messaging Protocol. + +**Protocol reference**: [`protocol/simplex-messaging.md`](../protocol/simplex-messaging.md) + +## Types + +## Commands + +## Encoding + +## Functions diff --git a/spec/smp-server.md b/spec/smp-server.md new file mode 100644 index 000000000..696d19067 --- /dev/null +++ b/spec/smp-server.md @@ -0,0 +1,13 @@ +# SMP Server + +> SMP server implementation: connection handling, queue operations, proxying, and control port. + +## Connection Handling + +## Queue Operations + +## Proxying + +## Control + +## Functions diff --git a/spec/storage-agent.md b/spec/storage-agent.md new file mode 100644 index 000000000..4ba4c414c --- /dev/null +++ b/spec/storage-agent.md @@ -0,0 +1,11 @@ +# Agent Storage + +> Agent storage backends: SQLite, Postgres, and migration framework. 
+ +## SQLite Backend + +## Postgres Backend + +## Migration Framework + +## Functions diff --git a/spec/storage-server.md b/spec/storage-server.md new file mode 100644 index 000000000..b2dec1842 --- /dev/null +++ b/spec/storage-server.md @@ -0,0 +1,9 @@ +# Server Storage + +> Server storage backends: STM queues and message stores (STM, Journal, Postgres). + +## STM Queues + +## Message Stores (STM, Journal, Postgres) + +## Functions diff --git a/spec/transport-http2.md b/spec/transport-http2.md new file mode 100644 index 000000000..2594b8431 --- /dev/null +++ b/spec/transport-http2.md @@ -0,0 +1,13 @@ +# HTTP/2 Transport + +> HTTP/2 framing, client and server sessions, and file streaming for XFTP. + +## Framing + +## Client Sessions + +## Server Sessions + +## File Streaming + +## Functions diff --git a/spec/transport-websocket.md b/spec/transport-websocket.md new file mode 100644 index 000000000..182a43c47 --- /dev/null +++ b/spec/transport-websocket.md @@ -0,0 +1,7 @@ +# WebSocket Transport + +> WebSocket adapter for browser-based SimpleX clients. + +## Adapter + +## Functions diff --git a/spec/transport.md b/spec/transport.md new file mode 100644 index 000000000..0e50a67d9 --- /dev/null +++ b/spec/transport.md @@ -0,0 +1,11 @@ +# Transport Layer + +> Transport abstraction, handshake protocol, and block padding for metadata privacy. + +## Abstraction + +## Handshake Protocol + +## Block Padding + +## Functions diff --git a/spec/version.md b/spec/version.md new file mode 100644 index 000000000..f5b954534 --- /dev/null +++ b/spec/version.md @@ -0,0 +1,9 @@ +# Version Negotiation + +> Version ranges and compatibility checking for protocol evolution. + +## Version Ranges + +## Compatibility + +## Functions diff --git a/spec/xftp-client.md b/spec/xftp-client.md new file mode 100644 index 000000000..99306bb73 --- /dev/null +++ b/spec/xftp-client.md @@ -0,0 +1,11 @@ +# XFTP Client + +> XFTP client implementation: file operations, CLI interface, and agent integration. 
+ +## File Operations + +## CLI + +## Agent + +## Functions diff --git a/spec/xftp-protocol.md b/spec/xftp-protocol.md new file mode 100644 index 000000000..26eb950be --- /dev/null +++ b/spec/xftp-protocol.md @@ -0,0 +1,13 @@ +# XFTP Protocol Implementation + +> Implements XFTP commands, types, and chunk operations for the SimpleX File Transfer Protocol. + +**Protocol reference**: [`protocol/xftp.md`](../protocol/xftp.md) + +## Types + +## Commands + +## Chunk Operations + +## Functions diff --git a/spec/xftp-server.md b/spec/xftp-server.md new file mode 100644 index 000000000..bdcbbb9aa --- /dev/null +++ b/spec/xftp-server.md @@ -0,0 +1,11 @@ +# XFTP Server + +> XFTP server implementation: chunk storage, recipient management, and control port. + +## Chunk Storage + +## Recipient Management + +## Control + +## Functions diff --git a/spec/xrcp-protocol.md b/spec/xrcp-protocol.md new file mode 100644 index 000000000..8f084f7ca --- /dev/null +++ b/spec/xrcp-protocol.md @@ -0,0 +1,13 @@ +# XRCP Protocol Implementation + +> Implements XRCP session handshake and commands for remote control of SimpleX clients. 
+ +**Protocol reference**: [`protocol/xrcp.md`](../protocol/xrcp.md) + +## Types + +## Session Handshake + +## Commands + +## Functions From c7ff63743796ae554cc42190a49e9c3dab18eb8f Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Mon, 9 Mar 2026 10:29:12 +0000 Subject: [PATCH 02/61] update CODE.md --- contributing/CODE.md | 1 + 1 file changed, 1 insertion(+) diff --git a/contributing/CODE.md b/contributing/CODE.md index ab5d7efcc..eefe68f6c 100644 --- a/contributing/CODE.md +++ b/contributing/CODE.md @@ -92,6 +92,7 @@ cabal list-bin exe:smp-server ### Cabal Flags - `swift`: Enable Swift JSON format +- `use_crypton`: Use crypton in cryptostore (default: enabled) - `client_library`: Build without server code - `client_postgres`: Use PostgreSQL instead of SQLite for agent persistence - `server_postgres`: PostgreSQL support for server queue/notification store From 40875e319985b6dcec43a4a646aaaff7775ea751 Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Mon, 9 Mar 2026 12:27:02 +0000 Subject: [PATCH 03/61] update --- contributing/PROJECT.md | 2 +- product/threat-model.md | 1 + 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/contributing/PROJECT.md b/contributing/PROJECT.md index cda71597d..5e8f4c6b5 100644 --- a/contributing/PROJECT.md +++ b/contributing/PROJECT.md @@ -13,7 +13,7 @@ Key components: - **SMP Client**: Functional API with STM-based message delivery ([code](../src/Simplex/Messaging/Client.hs)). - **SMP Agent**: High-level duplex connections via multiple simplex queues with E2E encryption ([code](../src/Simplex/Messaging/Agent.hs)). Implements Agent-to-agent protocol ([code](../src/Simplex/Messaging/Agent/Protocol.hs), [spec](../protocol/agent-protocol.md)) via intermediary agent client ([code](../src/Simplex/Messaging/Agent/Client.hs)). 
- **XFTP**: SimpleX File Transfer Protocol, server and CLI client ([code](../src/Simplex/FileTransfer/), [spec](../protocol/xftp.md)). -- **XRCP**: SimpleX Remote Control Protocol ([code](`../src/Simplex/RemoteControl/`), [spec](../protocol/xrcp.md)). +- **XRCP**: SimpleX Remote Control Protocol ([code](../src/Simplex/RemoteControl/), [spec](../protocol/xrcp.md)). - **Notifications**: Push notifications server requires PostgreSQL ([code](../src/Simplex/Messaging/Notifications), [executable](../apps/ntf-server/)). Client protocol is used for clients to communicate with the server ([code](../src/Simplex/Messaging/Notifications/Protocol.hs), [spec](../protocol/push-notifications.md)). For subscribing to SMP notifications the server uses [lightweight SMP client](../src/Simplex/Messaging/Client/Agent.hs). ## Architecture diff --git a/product/threat-model.md b/product/threat-model.md index 511cdebcf..efc2effd0 100644 --- a/product/threat-model.md +++ b/product/threat-model.md @@ -5,6 +5,7 @@ Consistent with threat models in: - `protocol/overview-tjr.md` (cross-protocol) - `protocol/simplex-messaging.md` (SMP) +- `protocol/agent-protocol.md` (Agent: duplex connections, ratchet, queue rotation) - `protocol/xftp.md` (XFTP) - `protocol/xrcp.md` (XRCP) - `protocol/push-notifications.md` (notifications) From 0bba2efc4578fcabce00488da4b009c6aed29775 Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Tue, 10 Mar 2026 22:29:51 +0000 Subject: [PATCH 04/61] add rcv services spec --- spec/README.md | 3 + spec/rcv-services.md | 741 +++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 744 insertions(+) create mode 100644 spec/rcv-services.md diff --git a/spec/README.md b/spec/README.md index 83ce5097c..7154aa957 100644 --- a/spec/README.md +++ b/spec/README.md @@ -60,5 +60,8 @@ Function documentation format: - [remote-control.md](remote-control.md) — XRCP implementation - [compression.md](compression.md) — Zstd 
compression +### Cross-cutting Features +- [rcv-services.md](rcv-services.md) — Service certificates for high-volume SMP clients (bulk subscription) + ### Security - [security-invariants.md](security-invariants.md) — All security invariants diff --git a/spec/rcv-services.md b/spec/rcv-services.md new file mode 100644 index 000000000..b0d97d9f7 --- /dev/null +++ b/spec/rcv-services.md @@ -0,0 +1,741 @@ +# Receive Services (Service Certificates) + +> Cross-cutting specification for the rcv-services feature: service certificates enabling high-volume SMP clients (notification routers, chat relays, directory services) to bulk-subscribe to queues. + +**Source branch**: `rcv-services` +**Protocol reference**: [`protocol/simplex-messaging.md`](../protocol/simplex-messaging.md) +**Phase**: 3.0a (Protocol + Transport + Server), 3.0b (Client + Agent + Store + NTF) + +## Overview + +A **service client** is a high-volume SMP client that presents a TLS client certificate during handshake. The server assigns it a persistent `ServiceId` derived from the certificate fingerprint. Individual queues are then **associated** with this ServiceId via per-queue `SUB` commands carrying a service signature. Once associated, the service client can **bulk-subscribe** all its queues in a single `SUBS` command instead of issuing per-queue `SUB` commands on each reconnection. + +This matters for notification servers, chat relays, and directory services that manage thousands to millions of queues per SMP server. Without service certificates, reconnection requires O(n) SUB commands; with them, it requires O(1) SUBS. + +### Design summary + +``` +Service client SMP Server + | | + |---- TLS + service cert --------->| Three-way handshake + |<--- ServiceId -------------------| (Transport layer) + | | + |---- SUB + service sig ---------->| Per-queue association + |<--- SOK(ServiceId) --------------| (Protocol layer, one-time) + | ...repeat per queue... 
| + | | + |---- SUBS count idsHash --------->| Bulk subscribe + |<--- SOKS count' idsHash' --------| (count/hash from server) + |<--- MSG ... MSG ... MSG ---------| Buffered messages + |<--- ALLS ------------------------| All delivered + | | +``` + +## Version gates + +| Constant | Value | Gate | Source | +|----------|-------|------|--------| +| `serviceCertsSMPVersion` | 16 | Service handshake, `SOK`, `useServiceAuth` | Transport.hs:214 | +| `rcvServiceSMPVersion` | 19 | `SUBS`/`NSUBS` parameters, `SOKS`/`ENDS` idsHash, messaging service role in handshake | Transport.hs:223 | + +The two-version split means: +- v16-18 servers accept service certificates and per-queue `SUB` with service auth, but `SUBS`/`NSUBS` send no count/hash parameters (bare command tag only). +- v19+ servers send and receive full count + idsHash with `SUBS`/`NSUBS`/`SOKS`/`ENDS`. +- Messaging services (`SRMessaging`) are only included in the client handshake at v >= 19. Notifier services (`SRNotifier`) are included at v >= 16. + +## Types + +### ServiceId + +`ServiceId` is an `EntityId` (24-byte base64url-encoded identifier) assigned by the server during the three-way handshake. It is derived from the service certificate fingerprint via `getCreateService` in QueueStore. + +### SMPServiceRole + +```haskell +data SMPServiceRole = SRMessaging | SRNotifier | SRProxy +-- Wire: "M" | "N" | "P" +``` +Source: Transport.hs:594 + +### Party (service-related constructors) + +```haskell +data Party = ... | RecipientService | NotifierService | ... +``` +Source: Protocol.hs:335-346 + +The `ServiceParty` type family constrains to `RecipientService | NotifierService` only: +```haskell +type family ServiceParty (p :: Party) :: Constraint where + ServiceParty RecipientService = () + ServiceParty NotifierService = () + ServiceParty p = (Int ~ Bool, TypeError ...) 
-- compile-time error +``` +Source: Protocol.hs:430-434 + +### IdsHash + +16-byte XOR of MD5 hashes, used for drift detection between client and server subscription state. + +```haskell +newtype IdsHash = IdsHash {unIdsHash :: BS.ByteString} + +instance Semigroup IdsHash where + (IdsHash s1) <> (IdsHash s2) = IdsHash $! BS.pack $ BS.zipWith xor s1 s2 + +instance Monoid IdsHash where + mempty = IdsHash $ BS.replicate 16 0 + +queueIdHash :: QueueId -> IdsHash +queueIdHash = IdsHash . C.md5Hash . unEntityId +``` +Source: Protocol.hs:1501-1526 + +**Key property**: XOR is self-inverse, so `addServiceSubs` and `subtractServiceSubs` both use `<>` (XOR) for the hash component: +```haskell +addServiceSubs (n', idsHash') (n, idsHash) = (n + n', idsHash <> idsHash') +subtractServiceSubs (n', idsHash') (n, idsHash) + | n > n' = (n - n', idsHash <> idsHash') + | otherwise = (0, mempty) +``` +Source: Protocol.hs:1528-1534 + +### ServiceSub / ServiceSubResult / ServiceSubError + +Client-side types for comparing expected vs actual subscription state: +```haskell +data ServiceSub = ServiceSub + { smpServiceId :: ServiceId, + smpQueueCount :: Int64, + smpQueueIdsHash :: IdsHash } + +data ServiceSubResult = ServiceSubResult (Maybe ServiceSubError) ServiceSub + +data ServiceSubError + = SSErrorServiceId {expectedServiceId, subscribedServiceId :: ServiceId} + | SSErrorQueueCount {expectedQueueCount, subscribedQueueCount :: Int64} + | SSErrorQueueIdsHash {expectedQueueIdsHash, subscribedQueueIdsHash :: IdsHash} +``` +Source: Protocol.hs:1476-1499 + +`serviceSubResult` compares expected vs actual, returning the first mismatch (priority: serviceId > count > idsHash). 
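
The first-mismatch comparison described above can be sketched as follows. This is a minimal illustration, not the code in Protocol.hs: `compareServiceSubs` is a hypothetical name, `ServiceId` and `IdsHash` are replaced by simplified string wrappers, the error constructors are positional rather than records, and it is an assumption that the result carries the subscribed (server-side) `ServiceSub`.

```haskell
import Data.Int (Int64)

-- Simplified stand-ins for the real ServiceId and IdsHash types (assumption).
newtype ServiceId = ServiceId String deriving (Eq, Show)
newtype IdsHash = IdsHash String deriving (Eq, Show)

data ServiceSub = ServiceSub
  { smpServiceId :: ServiceId,
    smpQueueCount :: Int64,
    smpQueueIdsHash :: IdsHash
  }
  deriving (Eq, Show)

-- Each constructor holds the expected value first, then the subscribed value.
data ServiceSubError
  = SSErrorServiceId ServiceId ServiceId
  | SSErrorQueueCount Int64 Int64
  | SSErrorQueueIdsHash IdsHash IdsHash
  deriving (Eq, Show)

data ServiceSubResult = ServiceSubResult (Maybe ServiceSubError) ServiceSub
  deriving (Eq, Show)

-- Hypothetical helper: reports only the first mismatch,
-- checked in priority order serviceId > count > idsHash.
compareServiceSubs :: ServiceSub -> ServiceSub -> ServiceSubResult
compareServiceSubs expected subscribed = ServiceSubResult firstMismatch subscribed
  where
    firstMismatch
      | smpServiceId expected /= smpServiceId subscribed =
          Just (SSErrorServiceId (smpServiceId expected) (smpServiceId subscribed))
      | smpQueueCount expected /= smpQueueCount subscribed =
          Just (SSErrorQueueCount (smpQueueCount expected) (smpQueueCount subscribed))
      | smpQueueIdsHash expected /= smpQueueIdsHash subscribed =
          Just (SSErrorQueueIdsHash (smpQueueIdsHash expected) (smpQueueIdsHash subscribed))
      | otherwise = Nothing
```

Because guards are evaluated top to bottom, a ServiceId mismatch hides any count or hash difference, which matches the stated priority.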
+ +### STMService (QueueStore) + +```haskell +data STMService = STMService + { serviceRec :: ServiceRec, + serviceRcvQueues :: TVar (Set RecipientId, IdsHash), + serviceNtfQueues :: TVar (Set NotifierId, IdsHash) } +``` +Source: QueueStore/STM.hs:64-68 + +Tracks the set of queue IDs and their cumulative XOR hash per service, per role (receive vs notify). + +## Transport layer: service handshake + +### Three-way handshake + +Standard SMP handshake is two messages: server sends `SMPServerHandshake`, client sends `SMPClientHandshake`. Service clients extend this to three messages: + +1. **Server -> Client**: `SMPServerHandshake` (standard, with session ID and auth key) +2. **Client -> Server**: `SMPClientHandshake` with `clientService :: Maybe SMPClientHandshakeService` +3. **Server -> Client**: `SMPServerHandshakeResponse {serviceId}` or `SMPServerHandshakeError {handshakeError}` + +Source: Transport.hs:752-791 (server), Transport.hs:796-848 (client) + +### SMPClientHandshakeService + +```haskell +data SMPClientHandshakeService = SMPClientHandshakeService + { serviceRole :: SMPServiceRole, + serviceCertKey :: CertChainPubKey } +``` +Source: Transport.hs:582-585 + +The `serviceCertKey` contains the TLS client certificate chain and a proof-of-possession: the service's Ed25519 session key signed by the service's X.509 signing key (`C.signX509 serviceSignKey $ C.publicToX509 k`). + +### Server-side validation (`getClientService`) + +1. Verify certificate chain matches TLS peer certificate: `getPeerCertChain c == cc` +2. Extract identity certificate and service key from chain +3. Verify signed session key: `C.verifyX509 serviceCertKey exact` +4. Compute fingerprint: `XV.getFingerprint idCert X.HashSHA256` +5. Call `getService` callback (QueueStore.getCreateService) to get/create ServiceId +6. 
Send `SMPServerHandshakeResponse {serviceId}` back to client + +Source: Transport.hs:775-791 + +### Client-side reception (`getClientService`) + +Client receives either `SMPServerHandshakeResponse {serviceId}` (success) or `SMPServerHandshakeError {handshakeError}` (failure). On success, stores `THClientService {serviceId, serviceRole, serviceCertHash, serviceKey}`. + +Source: Transport.hs:843-847 + +### Version-gated service role filtering (`mkClientService`) + +```haskell +mkClientService v (ServiceCredentials {serviceRole, ...}, (k, _)) + | serviceRole == SRMessaging && v < rcvServiceSMPVersion = Nothing + | otherwise = Just SMPClientHandshakeService {..} +``` +Source: Transport.hs:838-842 + +Messaging services are suppressed below v19. Notifier services are sent at v16+. + +### ServiceCredentials (client-side persistent state) + +```haskell +data ServiceCredentials = ServiceCredentials + { serviceRole :: SMPServiceRole, + serviceCreds :: T.Credential, -- TLS certificate + private key + serviceCertHash :: XV.Fingerprint, + serviceSignKey :: C.APrivateSignKey } +``` +Source: Transport.hs:587-592 + +## Protocol layer: commands and messages + +### Commands + +| Command | Party | Entity | Auth | Description | +|---------|-------|--------|------|-------------| +| `SUB` | Recipient | QueueId | Queue key + optional service sig | Subscribe single queue; if service sig present, associates queue with service | +| `NSUB` | Notifier | NotifierId | Queue key + optional service sig | Subscribe single notifier; if service sig present, associates with service | +| `NEW` | Creator | NoEntity | Queue key + optional service sig | Create queue; if service sig present, associates at creation | +| `SUBS count idsHash` | RecipientService | ServiceId | Service session key | Bulk-subscribe all associated receive queues | +| `NSUBS count idsHash` | NotifierService | ServiceId | Service session key | Bulk-subscribe all associated notifier queues | + +### Double authenticator 
(`useServiceAuth`) + +Only `NEW`, `SUB`, and `NSUB` carry a service signature (when sent from a service connection): +```haskell +useServiceAuth = \case + Cmd _ (NEW _) -> True + Cmd _ SUB -> True + Cmd _ NSUB -> True + _ -> False +``` +Source: Protocol.hs:1737-1742 + +For these commands, `tEncodeAuth` appends both the primary queue key signature and an optional service Ed25519 signature. `SUBS`/`NSUBS` use the ServiceId as entity and are signed only by the service session key. + +### Broker messages (responses) + +| Message | Fields | Description | +|---------|--------|-------------| +| `SOK` | `Maybe ServiceId` | Per-queue subscription success; `Just serviceId` when queue was associated with service | +| `SOKS` | `Int64, IdsHash` | Bulk subscription success; server's actual count and hash | +| `ALLS` | (none) | Marker: all buffered messages for this SUBS have been delivered | +| `END` | (none) | Per-queue subscription ended (another client subscribed) | +| `ENDS` | `Int64, IdsHash` | Service subscription ended (another service client took over); server's count and hash at takeover time | + +### Wire encoding (version-dependent) + +**SUBS/NSUBS encoding:** +``` +v >= 19: tag SP count idsHash +v < 19: tag (bare, no parameters) +``` +Source: Protocol.hs:1769-1771, 1787-1789 + +**SOKS/ENDS encoding:** +``` +v >= 19: tag SP count idsHash +v < 19: tag SP count (no idsHash) +``` +Source: Protocol.hs:1951-1953 + +**SOKS/ENDS decoding:** +``` +v >= 19: tag -> resp <$> _smpP <*> smpP (count + idsHash) +v < 19: tag -> resp <$> _smpP <*> pure mempty (count only, mempty hash) +``` +Source: Protocol.hs:1996-1998 + +## Server layer + +### Client state (Env/STM.hs) + +Each connected client tracks: +```haskell +data Client s = Client + { ... + serviceSubscribed :: TVar Bool, -- has SUBS been received? + ntfServiceSubscribed :: TVar Bool, -- has NSUBS been received? 
+ serviceSubsCount :: TVar (Int64, IdsHash), -- running (count, hash) for receive queues + ntfServiceSubsCount :: TVar (Int64, IdsHash), -- running (count, hash) for notifier queues + ... } +``` +Source: Env/STM.hs:437-456 + +Server-global state: +```haskell +data ServerSubscribers s = ServerSubscribers + { subQ :: TQueue (ClientSub, ClientId), + queueSubscribers :: SubscribedClients s, -- per-queue lookup + serviceSubscribers :: SubscribedClients s, -- per-service lookup + totalServiceSubs :: TVar (Int64, IdsHash), -- global service sub count + subClients :: TVar IntSet, + pendingEvents :: TVar (IntMap (NonEmpty (EntityId, BrokerMsg))) } +``` +Source: Env/STM.hs:362-369 + +### ClientSub events + +```haskell +data ClientSub + = CSClient QueueId (Maybe ServiceId) (Maybe ServiceId) -- prev and new service IDs + | CSDeleted QueueId (Maybe ServiceId) -- prev service ID + | CSService ServiceId (Int64, IdsHash) -- service subscription change +``` +Source: Env/STM.hs:426-429 + +These are enqueued into `subQ` and processed by `serverThread` (the subscription event loop). + +### SUBS command flow + +``` +Client sends SUBS count idsHash + | + v +subscribeServiceMessages(serviceId, (count, idsHash)) Server.hs:1800 + | + +-- sharedSubscribeService(SRecipientService, ...) 
Server.hs:1849 + | | + | +-- If already subscribed: return cached (count, hash) + | | + | +-- First time: + | +-- getServiceQueueCountHash(party, serviceId) QueueStore + | | -> returns server's actual (count', idsHash') + | | + | +-- atomically: + | | writeTVar clientServiceSubscribed True + | | writeTVar clientServiceSubs (count', idsHash') + | | + | +-- Compute drift stats: + | | count == -1 && match -> srvSubOk++ (old NTF server) + | | diff > 0 -> srvSubMore++ (server has more) + | | diff < 0 -> srvSubFewer++ (server has fewer) + | | otherwise -> srvSubDiff++ (count match, hash mismatch) + | | + | +-- Enqueue CSService event to subQ + | + +-- If not already subscribed: + | fork "deliverServiceMessages" Server.hs:1806 + | | + | +-- foldRcvServiceMessages(serviceId, deliverQueueMsg, acc) + | | MsgStore + | +-- For each queue in service: + | | +-- Read queue record + first pending message + | | +-- Call deliverQueueMsg(acc, rId, result) Server.hs:1822 + | | | + | | +-- Error -> accumulate ERR + | | +-- No message -> skip + | | +-- Has message: + | | +-- getSubscription(rId) Server.hs:1835 + | | | If sub exists -> Nothing (skip, already delivering) + | | | Else -> create new Sub, insert in subscriptions + | | +-- setDelivered sub msg + | | +-- writeTBQueue msgQ [(corrId, rId, MSG ...)] + | | + | +-- After fold: write ALLS to msgQ + | + +-- Return SOKS count' idsHash' +``` + +### Per-queue SUB with service association + +`sharedSubscribeQueue` handles four cases (Server.hs:1738-1798): + +**Case 1: Service client, queue already associated with this service** (`queueServiceId == Just serviceId`) +- Duplicate association (retry after timeout/error) +- If no service sub exists yet, increment service queue count and enqueue CSClient +- Stats: `srvAssocDuplicate++` + +**Case 2: Service client, queue not yet associated** (new or different service) +- Call `setQueueService(queue, party, Just serviceId)` to update QueueStore +- Increment client's `serviceSubsCount` by `(1, 
queueIdHash rId)` +- Enqueue CSClient event +- Stats: `srvAssocNew++` or `srvAssocUpdated++` + +**Case 3: Non-service client, queue has service association** (downgrade) +- Call `setQueueService(queue, party, Nothing)` to remove association +- Stats: `srvAssocRemoved++` +- Create normal per-queue subscription + +**Case 4: Non-service client, no service association** (standard SUB) +- Create/return per-queue subscription as normal + +### Message delivery for service queues + +When a new message arrives for a queue (`tryDeliverMessage`, Server.hs:1985-2024): + +```haskell +getSubscribed = case rcvServiceId qr of + Just serviceId -> getSubscribedClient serviceId $ serviceSubscribers subscribers + Nothing -> getSubscribedClient rId $ queueSubscribers subscribers +``` + +If the queue has `rcvServiceId`, the server looks up the subscriber in `serviceSubscribers` (by ServiceId) rather than `queueSubscribers` (by QueueId). + +**On-demand Sub creation** (`newServiceDeliverySub`, Server.hs:2019-2024): When a message arrives for a service queue but no `Sub` exists in the client's `subscriptions` TMap, one is created on the fly. This handles messages arriving after SUBS but before the fold reaches that queue. 
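
The get-or-create pattern behind on-demand Sub creation can be sketched in STM. This is a simplified model under stated assumptions, not the server code: `Sub` here is a stand-in newtype, a plain `Map` in a `TVar` replaces the `TMap`, and `getOrCreateSub`, `claimDelivery`, `newSubs`, and `subCount` are hypothetical names.

```haskell
import Data.Maybe (isJust)
import qualified Data.Map.Strict as M
import GHC.Conc (STM, TVar, atomically, newTVarIO, readTVar, readTVarIO, writeTVar)

type QueueId = String

-- Stand-in for the server's Sub record (the real one holds delivery state).
newtype Sub = Sub {subQueueId :: QueueId} deriving (Eq, Show)

-- Returns Just the new Sub when this caller created it,
-- Nothing when a Sub for the queue already existed.
getOrCreateSub :: TVar (M.Map QueueId Sub) -> QueueId -> STM (Maybe Sub)
getOrCreateSub subsVar qId = do
  subs <- readTVar subsVar
  case M.lookup qId subs of
    Just _ -> pure Nothing
    Nothing -> do
      let sub = Sub qId
      writeTVar subsVar (M.insert qId sub subs)
      pure (Just sub)

-- True when the caller should deliver, False when another path already owns delivery.
claimDelivery :: TVar (M.Map QueueId Sub) -> QueueId -> IO Bool
claimDelivery subsVar qId = isJust <$> atomically (getOrCreateSub subsVar qId)

-- Empty subscriptions map for a freshly connected client.
newSubs :: IO (TVar (M.Map QueueId Sub))
newSubs = newTVarIO M.empty

-- Number of Subs currently tracked.
subCount :: TVar (M.Map QueueId Sub) -> IO Int
subCount subsVar = M.size <$> readTVarIO subsVar
```

The lookup and the insert run in one transaction, so whichever of the fold and the message-arrival path commits first creates the Sub, and the other observes it and skips.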
+ +### serverThread subscription event loop + +`serverThread` (Server.hs:250-351) processes `ClientSub` events from `subQ`: + +**CSClient** (per-queue subscription): +- If service association changed: end previous service subscription for that queue +- If new service: increment `totalServiceSubs`, end any per-queue subscriber, cancel previous service subscriber +- If no service: standard per-queue upsert + +**CSDeleted** (queue deletion): +- End both queue and service subscriptions + +**CSService** (bulk SUBS): +- Subtract changed subs from `totalServiceSubs` (because the client already has them counted) +- Cancel previous service subscriber for this ServiceId (sends ENDS to old client) + +**Service takeover** (`cancelServiceSubs`, Server.hs:317-321): +When a new service client subscribes (same ServiceId), the previous client's service subs are zeroed out: +```haskell +cancelServiceSubs serviceId = checkAnotherClient $ \c -> do + changedSubs@(n, idsHash) <- swapTVar (clientServiceSubs c) (0, mempty) + pure [(c, CSADecreaseSubs changedSubs, (serviceId, ENDS n idsHash))] +``` +The previous client receives `ENDS count idsHash`. + +### Client disconnect cleanup + +`clientDisconnected` (Server.hs:1090-1121): +1. Set `connected = False` +2. Swap out all subscriptions and ntf subscriptions (clear TMap) +3. Cancel per-queue Subs +4. Update `queueSubscribers` (delete per-queue entries) and `serviceSubscribers` (delete service entry) +5. Subtract client's `serviceSubsCount` from `totalServiceSubs` +6. Kill delivery threads + +**Queue-service associations persist**: Only live subscription state is cleaned up. The `rcvServiceId` field on `QueueRec` and the `STMService` queue sets survive disconnect. On reconnection, `SUBS` resubscribes without re-associating. + +### Notification service subscription (`NSUBS`) + +`subscribeServiceNotifications` (Server.hs:1845-1847) is a thin wrapper around `sharedSubscribeService` with `SNotifierService` party.
Unlike `SUBS`, it does NOT fork a delivery thread -- notification delivery is handled by the separate `deliverNtfsThread`. + +`deliverNtfsThread` (Server.hs:353) periodically scans `subClients` (which includes service subscribers) and delivers pending notifications. + +## QueueStore layer + +### getCreateService + +Lookup by certificate fingerprint; create if not found (Server/QueueStore/STM.hs:284-310): +1. `TM.lookup fp serviceCerts` -- fast IO lookup +2. If miss: STM transaction to double-check and create +3. If hit: verify service role matches; error `SERVICE` on role mismatch +4. On new service: log via store log + +### setQueueService + +Updates the `rcvServiceId` (or `ntfServiceId`) field on a `QueueRec` and maintains the service's queue set (Server/QueueStore/STM.hs:312-338): +1. Read queue record +2. If same service -> no-op +3. If different: `removeServiceQueue` from old, `addServiceQueue` to new +4. Update `QueueRec` in-place + +### addServiceQueue / removeServiceQueue + +Both use `setServiceQueues_` which XORs the queue's `queueIdHash` into the service's running hash (Server/QueueStore/STM.hs:383-398): +```haskell +update (s, idsHash) = + let !s' = updateSet qId s -- Set insert/delete + !idsHash' = queueIdHash qId <> idsHash -- XOR (self-inverse) + in (s', idsHash') +``` + +## Test coverage + +### Existing tests (ServerTests.hs) + +| Test | Lines | What it covers | +|------|-------|----------------| +| `testServiceDeliverSubscribe` | 682-742 | Create queue as service, reconnect, SUBS, message delivery, ALLS | +| `testServiceUpgradeAndDowngrade` | 744-859 | Regular SUB -> service SUB -> SUBS -> downgrade back to regular SUB | +| `testMessageServiceNotifications` | 1313-1388 | NSUB with service, service takeover (ENDS), NSUBS bulk subscribe | +| `testServiceNotificationsTwoRestarts` | 1390-1434 | NSUBS persistence across two server restarts | + +### Test gaps + +| Gap | Severity | Description | +|-----|----------|-------------| +| **TG-SVC-01** | High | 
No concurrent SUBS + regular SUB on same queue -- race between fold delivery and per-queue subscription | +| **TG-SVC-02** | High | No queue deletion during SUBS fold -- what happens when a queue is deleted mid-fold? | +| **TG-SVC-03** | Medium | No duplicate SUBS test -- what if client sends SUBS twice? (code returns cached count) | +| **TG-SVC-04** | Medium | No drift detection verification -- no test checks that stats are actually logged on count/hash mismatch | +| **TG-SVC-05** | Medium | No SUBS with 0 queues -- edge case where service has no associated queues | +| **TG-SVC-06** | Medium | No concurrent message delivery during fold -- messages sent while fold is in progress | +| **TG-SVC-07** | Low | No large-scale test -- fold performance with 10k+ queues | +| **TG-SVC-08** | Low | No test for `subtractServiceSubs` underflow (`n <= n'` -> `(0, mempty)`) | + +## Security invariants + +| ID | Invariant | Enforced by | Test | +|----|-----------|-------------|------| +| **SI-SVC-01** | Service certificate must match TLS peer certificate | `getClientService`: `getPeerCertChain c == cc` | Implicit in all service tests | +| **SI-SVC-02** | Service session key proof-of-possession: signed by X.509 key | `C.verifyX509 serviceCertKey exact` in `getClientService` | Implicit | +| **SI-SVC-03** | Only NEW, SUB, NSUB carry service signature | `useServiceAuth` pattern match | testServiceDeliverSubscribe (ERR SERVICE on unsigned) | +| **SI-SVC-04** | SUBS/NSUBS require service session key, not queue key | Entity is ServiceId, auth is service key | testServiceDeliverSubscribe (ERR CMD NO_AUTH on wrong key) | +| **SI-SVC-05** | Service role mismatch rejected | `getCreateService`: role check -> `Left SERVICE` | testServiceDeliverSubscribe (ERR SERVICE on wrong role) | +| **SI-SVC-06** | Non-service client cannot send SUBS | `ERR SERVICE` when no service handshake | testServiceUpgradeAndDowngrade (ERR SERVICE on plain client) | +| **SI-SVC-07** | Queue-service associations 
persist across disconnect | `clientDisconnected` only clears live state | testServiceNotificationsTwoRestarts | +| **SI-SVC-08** | Service takeover sends ENDS to previous client | `cancelServiceSubs` -> ENDS | testMessageServiceNotifications | +| **SI-SVC-09** | Drift is informational only -- server never rejects | `sharedSubscribeService` logs stats, always returns subs | No direct test (TG-SVC-04) | + +## Identified risks + +| ID | Risk | Severity | Description | +|----|------|----------|-------------| +| **R-SVC-01** | Postgres fold full table scan | High | `foldRcvServiceMessages` (Postgres.hs:127-139) uses `ROW_NUMBER() OVER (PARTITION BY recipient_id ORDER BY message_id ASC)` as a subquery joined to `msg_queues`. This window function scans the **entire `messages` table** before filtering. For a service with 100k+ queues and millions of messages, this query can be very slow. The STM backend iterates an in-memory Set (fast), and the Journal backend uses per-queue file locks (moderate). Only the Postgres path has this scaling problem. Consider rewriting to use a lateral join or per-queue subquery to avoid the full-table window. | +| **R-SVC-02** | `totalServiceSubs` accounting drift | Low | `totalServiceSubs` is incremented by `serverThread` when processing CSClient events (line 281), but `clientDisconnected` subtracts the full `clientServiceSubs` (line 1120) which was eagerly updated by `sharedSubscribeQueue`. If CSClient events are still pending in `subQ` at disconnect time, `totalServiceSubs` is decremented for increments that never happened, causing negative drift. `totalServiceSubs` is never read for any decision (only written), so this is cosmetic. Resets on server restart. Consider periodic reconciliation or removing the counter if unused. | +| **R-SVC-03** | Fold thread continues after service takeover | Needs analysis | When a second service client connects (same cert), `cancelServiceSubs` sends ENDS to the old client. 
But the old client's `deliverServiceMessages` fold thread (forked via `forkClient`, tracked in `endThreads`) keeps running -- it writes MSG to the old client's `msgQ` (captured in closure). The old client receives and can ACK these messages. After ALLS the thread exits. New messages route to the new client via `tryDeliverMessage`. Questions: (1) Can the old client's ACKs interfere with the new client's subscription state? (2) If the old client disconnects mid-fold, `clientDisconnected` kills the fold thread (line 1111) -- are partially-delivered Subs cleaned up correctly? (3) Could the fold's `getSubscription` (which inserts into old client's `subscriptions`) conflict with the old client's subscription TMap being swapped out by `clientDisconnected`? | +| **R-SVC-04** | Cert rotation = full re-association | Medium (operational) | `getCreateService` maps cert fingerprint -> ServiceId. A new cert = new fingerprint = new ServiceId. All existing queue associations remain on the old ServiceId. The service must re-SUB every queue with the new service signature -- O(n), exactly the cost SUBS was designed to avoid. Old fingerprint->ServiceId mappings remain in memory/DB (no GC). For a notification server with millions of queues, cert rotation means a full re-association storm. | +| **R-SVC-05** | Fold blocking | Low | `foldRcvServiceMessages` iterates all service queues sequentially, reading queue records and first messages. For services with many queues, this could take significant time. It runs in a forked thread, so it doesn't block the client's command processing, but the ALLS marker is delayed. No progress signal between SOKS and ALLS -- client doesn't know how many messages to expect. | +| **R-SVC-06** | XOR hash collision | Very Low | IdsHash uses XOR of MD5 hashes. XOR is commutative and associative, so different queue sets with the same XOR-combined hash would not be detected. 
Given 16-byte hashes, collision probability is negligible for realistic queue counts, but the hash provides no ordering information. | +| **R-SVC-07** | Count underflow in subtractServiceSubs | Very Low | If `n <= n'`, the function returns `(0, mempty)` -- a full reset. This is a defensive fallback but could mask accounting errors. | + +### Considered and dismissed + +- **Fold-delivery race**: Both the fold's `getSubscription` (Server.hs:1828) and `newServiceDeliverySub` (Server.hs:1999-2023) operate on the same `subscriptions clnt` TMap within `atomically` blocks. STM serialization ensures at most one creates the Sub; the other sees it and skips. No race exists. +- **Sub accumulation during fold**: Each service queue with a pending message gets a Sub created in the client's `subscriptions` TMap. This is necessary and correct -- the Sub holds the `delivered` TVar for ACK verification and `subThread` for delivery state. Without per-queue Subs the server cannot track what was delivered or verify ACKs. Subs are cleaned on ACK or disconnect. +- **Store log replay ordering**: `writeQueueStore` writes all services before queues. `addQueue_` (QueueStore/STM.hs:119-132) calls `addServiceQueue` when `rcvServiceId` is present in QueueRec, so snapshot replay correctly rebuilds STMService queue sets. Incremental `QueueService` log entries are always preceded by `NewService` because the handshake (which creates the service) happens before SUB (which associates queues). No ordering issue. 
+ +--- + +## SMP Client layer (Client.hs) + +### Service subscription command + +```haskell +subscribeService :: (PartyI p, ServiceParty p) => SMPClient -> SParty p -> Int64 -> IdsHash -> ExceptT SMPClientError IO ServiceSub +subscribeService c party n idsHash = case smpClientService c of + Just THClientService {serviceId, serviceKey} -> do + sendSMPCommand c NRMBackground (Just (C.APrivateAuthKey C.SEd25519 serviceKey)) serviceId subCmd >>= \case + SOKS n' idsHash' -> pure $ ServiceSub serviceId n' idsHash' + r -> throwE $ unexpectedResponse r + where subCmd = case party of + SRecipientService -> SUBS n idsHash + SNotifierService -> NSUBS n idsHash + Nothing -> throwE PCEServiceUnavailable +``` +Source: Client.hs:921-934 + +Entity is `serviceId`, auth key is the service session key (Ed25519). The client passes its expected count and hash; the server returns its own. + +### Per-queue SUB with service + +`subscribeSMPQueue` (Client.hs:843-846) and `subscribeSMPQueues` (Client.hs:850-855) send `SUB` commands. The response handler `processSUBResponse_` (Client.hs:867-872) accepts both `OK` (no service) and `SOK serviceId_` (service-associated). + +`nsubResponse_` (Client.hs:914-918) does the same for `NSUB`. + +### Dual signature scheme (`authTransmission`) + +When `serviceAuth = True` and `useServiceAuth` returns True for the command (Client.hs:1385-1403): + +1. The entity key signs over `serviceCertHash || transmission` (not just transmission) +2. The service key signs over `transmission` alone + +This prevents MITM service substitution inside TLS: an attacker cannot replace the service certificate hash without invalidating the entity key signature. 
+ +```haskell +(t', serviceSig) = case clientService =<< thAuth of + Just THClientService {serviceCertHash = XV.Fingerprint fp, serviceKey} | serviceAuth -> + (fp <> t, Just $ C.sign' serviceKey t) + _ -> (t, Nothing) +``` +Source: Client.hs:1398-1401 + +### Service runtime accessors + +```haskell +smpClientService :: SMPClient -> Maybe THClientService +smpClientService = thAuth . thParams >=> clientService + +smpClientServiceId :: SMPClient -> Maybe ServiceId +smpClientServiceId = fmap (\THClientService {serviceId} -> serviceId) . smpClientService +``` +Source: Client.hs:936-942 + +### Configuration + +`ProtocolClientConfig` (Client.hs:466-483) carries `serviceCredentials :: Maybe ServiceCredentials`. On handshake, the client generates a fresh Ed25519 key pair per connection and signs it with the service's X.509 key (via `mkClientService`). + +`serviceAuth` flag is set to `thVersion >= serviceCertsSMPVersion` (Client.hs:230), enabling dual signatures for all commands on v16+ connections. + +## Agent layer + +### Agent events + +Four service-specific events (Agent/Protocol.hs:401-404): + +| Event | Payload | When | +|-------|---------|------| +| `SERVICE_UP` | `SMPServer, ServiceSubResult` | SUBS succeeded; carries drift info | +| `SERVICE_DOWN` | `SMPServer, ServiceSub` | Server disconnected while service was subscribed | +| `SERVICE_ALL` | `SMPServer` | ALLS received — all buffered messages delivered | +| `SERVICE_END` | `SMPServer, ServiceSub` | ENDS received — another service client took over | + +### Service subscription flow (`Agent/Client.hs`) + +``` +subscribeClientService(c, withEvent, userId, srv, serviceSub) Client.hs:1743 + | + +-- withServiceClient(c, tSess, ...) 
Client.hs:1752 + | | + | +-- Get SMPClient for tSess + | +-- Check smpClientServiceId is Just -> smpServiceId + | + +-- setPendingServiceSub(tSess, serviceSub, currentSubs) TSessionSubs + | + +-- subscribeClientService_(c, withEvent, tSess, smp, serviceSub) Client.hs:1760 + | + +-- subscribeService smp SRecipientService n idsHash -> ServiceSub + +-- serviceSubResult expected subscribed -> ServiceSubResult + +-- atomically: setActiveServiceSub(tSess, sessId, subscribed) + +-- if withEvent: notify SERVICE_UP srv result +``` + +### Reconnection / resubscription (`Agent/Client.hs:1727-1740`) + +On service subscription failure during resubscription: +- `SSErrorServiceId` (server returned different ServiceId): fall back to `unassocSubscribeQueues` — removes all service associations for this server and resubscribes queues individually +- `clientServiceError`: same fallback +- Other errors: propagated + +### Startup subscription (`Agent.hs:1622-1641`) + +At agent startup, `subscribeService` is called in parallel per server. On `SSErrorServiceId` or `SSErrorQueueCount {n > 0, n' == 0}` (service exists but has no queues): falls back to unassociating queues and resubscribing individually. + +### Server disconnection (`Agent/Client.hs:787-800`) + +`serverDown` emits `SERVICE_DOWN`, then resubscribes: +- If session mode matches: full `resubscribeSMPSession` +- Otherwise: `resubscribeClientService` for service, then `subscribeQueues` for individual queues + +## TSessionSubs (Agent/TSessionSubs.hs) + +Per-session subscription state tracking, ~264 lines. 
+ +```haskell +data SessSubs = SessSubs + { subsSessId :: TVar (Maybe SessionId), + activeSubs :: TMap RecipientId RcvQueueSub, + pendingSubs :: TMap RecipientId RcvQueueSub, + activeServiceSub :: TVar (Maybe ServiceSub), + pendingServiceSub :: TVar (Maybe ServiceSub) } +``` +Source: TSessionSubs.hs:59-65 + +Key operations: +- `setPendingServiceSub`: stores expected ServiceSub before SUBS is sent +- `setActiveServiceSub`: promotes to active after SOKS, validates session ID +- `updateActiveService`: increments count/hash when per-queue SUBs with service signature succeed (used by `Client/Agent.hs` when individual SUBs return `SOK(Just serviceId)`) +- `deleteServiceSub`: clears both active and pending (on ENDS) + +## Agent Store (AgentStore.hs) + +### `client_services` table + +```sql +CREATE TABLE client_services( + user_id INTEGER NOT NULL REFERENCES users ON DELETE CASCADE, + host TEXT NOT NULL, port TEXT NOT NULL, + server_key_hash BLOB, + service_cert BLOB NOT NULL, + service_cert_hash BLOB NOT NULL, + service_priv_key BLOB NOT NULL, + service_id BLOB, -- assigned by server, NULL until first handshake + service_queue_count INTEGER NOT NULL DEFAULT 0, + service_queue_ids_hash BLOB NOT NULL DEFAULT x'00000000000000000000000000000000' +); +``` +Source: Agent/Store/SQLite/Migrations/M20260115_service_certs.hs:11-23 + +### `rcv_queues.rcv_service_assoc` + +Boolean column added to `rcv_queues`. When set, the queue is associated with the service for this server. SQLite triggers automatically maintain `service_queue_count` and `service_queue_ids_hash` on insert/delete/update of `rcv_queues` rows. + +Triggers: `tr_rcv_queue_insert`, `tr_rcv_queue_delete`, `tr_rcv_queue_update_remove`, `tr_rcv_queue_update_add` (same migration file, lines 30-76). All use `simplex_xor_md5_combine` — the SQLite equivalent of Haskell's `queueIdHash <>`. 
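
The bookkeeping that these triggers are described as performing can be modelled in a few lines of Haskell. This is a sketch under stated assumptions, not the trigger code: the 16-byte digest is supplied by the caller instead of being computed with MD5, and `ServiceTotals`, `xorCombine`, `assocAdded`, and `assocRemoved` are hypothetical names.

```haskell
import Data.Bits (xor)
import Data.Int (Int64)
import Data.Word (Word8)

-- 16-byte digest of a queue ID; stands in for an MD5 hash (assumption).
type Digest = [Word8]

-- Identity element for XOR: sixteen zero bytes.
emptyDigest :: Digest
emptyDigest = replicate 16 0

-- Byte-wise XOR of two digests.
xorCombine :: Digest -> Digest -> Digest
xorCombine = zipWith xor

-- Running totals, as kept in service_queue_count and service_queue_ids_hash.
data ServiceTotals = ServiceTotals {queueCount :: Int64, idsHash :: Digest}
  deriving (Eq, Show)

noQueues :: ServiceTotals
noQueues = ServiceTotals 0 emptyDigest

-- Adding and removing an association both XOR the queue digest into the hash;
-- only the direction of the count change differs.
assocAdded, assocRemoved :: Digest -> ServiceTotals -> ServiceTotals
assocAdded d (ServiceTotals n h) = ServiceTotals (n + 1) (xorCombine h d)
assocRemoved d (ServiceTotals n h) = ServiceTotals (n - 1) (xorCombine h d)
```

Since XOR is self-inverse and commutative, removing every added queue returns the totals to `noQueues`, and the order of inserts and deletes does not affect the final hash.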
+ +### Key CRUD operations + +| Function | What it does | +|----------|--------------| +| `getClientServiceCredentials` | Load cert + key for a server; returns `Maybe ((KeyHash, TLS.Credential), Maybe ServiceId)` | +| `getSubscriptionService` | Load `ServiceSub` (serviceId, count, hash) for reconnection | +| `setClientServiceId` | Store ServiceId after first handshake | +| `setRcvServiceAssocs` | Mark queues as service-associated (sets `rcv_service_assoc = 1`) | +| `removeRcvServiceAssocs` | Remove service association for all queues on a server | +| `unassocUserServerRcvQueueSubs` | Remove association and return queues for re-subscription | + +Source: AgentStore.hs:419-494, 2378-2414 + +### Service ID nullification on cert change + +`INSERT ... ON CONFLICT DO UPDATE SET ... service_id = NULL` (AgentStore.hs:429) — when service credentials are updated (new cert), the stored `service_id` is cleared, forcing a new handshake to get a fresh ServiceId. + +## Notification server (Notifications/Server.hs) + +The NTF server is the primary consumer of service certificates for `SRNotifier` role. + +### Configuration + +`NtfServerConfig.useServiceCreds :: Bool` (Env.hs:80) — controls whether the NTF server uses service certificates for SMP subscriptions. + +### Credential generation + +On first use per SMP server, `mkDbService` (Env.hs:126-142) generates a self-signed TLS certificate (valid ~2400 days) and stores it in the `smp_servers` table. The cert is reused across connections to the same SMP server. + +### Startup subscription + +`subscribeSrvSubs` (Server.hs:460-481): +1. If service credentials exist: send NSUBS first (one command for all associated queues) +2. 
Then subscribe remaining individual queues in batches via `subscribeQueuesNtfs` + +### Event handling + +| Event | Handler | +|-------|---------| +| `CAServiceSubscribed` | Log count/hash match or mismatch | +| `CAServiceDisconnected` | Log disconnection | +| `CAServiceSubError` | Log error (non-fatal; fatal errors go to `CAServiceUnavailable`) | +| `CAServiceUnavailable` | **Critical recovery path**: calls `removeServiceAndAssociations`, wipes service creds, resubscribes all queues individually | + +Source: Server.hs:567-602 + +### `removeServiceAndAssociations` (Store/Postgres.hs:620-652) + +Nuclear recovery: clears `ntf_service_id`, `ntf_service_cert*`, resets `smp_notifier_count`/`smp_notifier_ids_hash`, and removes all `ntf_service_assoc` flags from subscriptions. Used when the service subscription is irrecoverably broken (e.g., ServiceId mismatch after cert rotation). + +### NTF Postgres schema + +The `smp_servers` table stores per-SMP-server state: +- `ntf_service_id`, `ntf_service_cert`, `ntf_service_cert_hash`, `ntf_service_priv_key` — service identity +- `smp_notifier_count`, `smp_notifier_ids_hash` — maintained by Postgres triggers on the `subscriptions` table + +Triggers use `xor_combine` (Postgres equivalent of XOR hash combine) and fire on `ntf_service_assoc` changes. 
+ +## Agent test coverage + +### Existing tests + +| Test | File | What it covers | +|------|------|----------------| +| `testMigrateToServiceSubscriptions` | AgentTests/NotificationTests.hs:930-1016 | Full lifecycle: no service -> enable service (creates association) -> use service (NSUBS) -> disable service (downgrade to individual) -> re-enable | + +### Additional test gaps (Phase 3.0b) + +| Gap | Severity | Description | +|-----|----------|-------------| +| **TG-SVC-09** | Medium | No agent-level test for `SSErrorServiceId` recovery — the `unassocQueues` fallback path | +| **TG-SVC-10** | Medium | No agent-level test for concurrent reconnection — service resubscription racing with individual queue resubscription | +| **TG-SVC-11** | Medium | No test for `SERVICE_END` agent event handling — what does the agent do after receiving ENDS? | +| **TG-SVC-12** | Low | No test for SQLite trigger correctness — verifying `service_queue_count`/`service_queue_ids_hash` match expected values after insert/delete/update cycles | From ea2a62ab7e740515ac71546526eb0b246b15ef3f Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Wed, 11 Mar 2026 07:32:57 +0000 Subject: [PATCH 05/61] more specs --- spec/compression.md | 81 ++++++++++- spec/encoding.md | 341 +++++++++++++++++++++++++++++++++++++++++++- spec/version.md | 194 ++++++++++++++++++++++++- 3 files changed, 609 insertions(+), 7 deletions(-) diff --git a/spec/compression.md b/spec/compression.md index 7e457438b..faa8c275f 100644 --- a/spec/compression.md +++ b/spec/compression.md @@ -1,7 +1,84 @@ # Compression -> Compression support for SimpleX protocols. +> Zstd compression for SimpleX protocol messages. -## Zstd +**Source file**: [`Compression.hs`](../src/Simplex/Messaging/Compression.hs) + +## Overview + +Optional Zstd compression for SMP message bodies. Short messages bypass compression entirely to avoid overhead. 
The `Compressed` type carries a tag byte indicating whether the payload is compressed or passthrough, making it self-describing on the wire. + +## Types + +### `Compressed` + +**Source**: `Compression.hs:17-22` + +```haskell +data Compressed + = Passthrough ByteString -- short messages, left intact + | Compressed Large -- Zstd-compressed, 2-byte length prefix +``` + +Wire encoding (`Compression.hs:30-38`): + +``` +Passthrough → '0' ++ smpEncode ByteString (1-byte tag + 1-byte length + data) +Compressed → '1' ++ smpEncode Large (1-byte tag + 2-byte length + data) +``` + +Tags are `'0'` (0x30) and `'1'` (0x31) — same ASCII convention as `Maybe` encoding. + +`Passthrough` uses standard `ByteString` encoding (max 255 bytes, 1-byte length prefix). `Compressed` uses `Large` encoding (max 65535 bytes, 2-byte Word16 length prefix), since compressed output can exceed 255 bytes for larger inputs. + +## Constants + +| Constant | Value | Purpose | Source | +|----------|-------|---------|--------| +| `maxLengthPassthrough` | 180 | Messages at or below this length are not compressed | `Compression.hs:24-25` | +| `compressionLevel` | 3 | Zstd compression level | `Compression.hs:27-28` | + +The 180-byte threshold was "sampled from real client data" — messages above this length show rapidly increasing compression ratio. Below 180 bytes, compression overhead (FFI call, dictionary-less Zstd startup) outweighs savings. ## Functions + +### `compress1` + +**Source**: `Compression.hs:40-43` + +```haskell +compress1 :: ByteString -> Compressed +``` + +Compress a message body: +- If `B.length bs <= 180` → `Passthrough bs` +- Otherwise → `Compressed (Large (Z1.compress 3 bs))` + +No context or dictionary — each message is independently compressed ("1" in `compress1` refers to single-shot compression). 
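The threshold rule and the wire layout described above can be sketched without the Zstd dependency. This is an illustration under assumptions, not the module's code: payloads are plain byte lists, the compressor is passed in as a function, and `compressWith` and `encodeCompressed` are hypothetical names. Returning `Maybe` for oversized payloads is a choice made for this sketch only.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Data.Word (Word8)

-- Simplified stand-in for the spec's type: payloads are byte lists here.
data Compressed
  = Passthrough [Word8] -- short message, left intact
  | Compressed [Word8] -- output of the compressor
  deriving (Eq, Show)

-- Messages of at most 180 bytes bypass the supplied compressor.
compressWith :: ([Word8] -> [Word8]) -> [Word8] -> Compressed
compressWith compressor bs
  | length bs <= 180 = Passthrough bs
  | otherwise = Compressed (compressor bs)

-- ASCII tag byte: '0' is 0x30, '1' is 0x31.
tagByte :: Char -> Word8
tagByte = fromIntegral . ord

-- Tag, then length prefix (1 byte or 2 bytes big-endian), then data.
-- Nothing means the payload does not fit the length prefix.
encodeCompressed :: Compressed -> Maybe [Word8]
encodeCompressed (Passthrough bs)
  | n <= 255 = Just (tagByte '0' : fromIntegral n : bs)
  | otherwise = Nothing
  where
    n = length bs
encodeCompressed (Compressed bs)
  | n <= 65535 = Just (tagByte '1' : hi : lo : bs)
  | otherwise = Nothing
  where
    n = length bs
    hi = fromIntegral (n `shiftR` 8)
    lo = fromIntegral (n .&. 0xff)
```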
+ +### `decompress1` + +**Source**: `Compression.hs:45-53` + +```haskell +decompress1 :: Int -> Compressed -> Either String ByteString +``` + +Decompress with size limit: +- `Passthrough bs` → `Right bs` (no check needed — already bounded by encoding) +- `Compressed (Large bs)` → check `Z1.decompressedSize bs`: + - If size is known and within `limit` → decompress + - If size unknown or exceeds `limit` → `Left` error + +The size limit check happens **before** decompression, using Zstd's frame header (which includes the decompressed size when the compressor wrote it). This prevents decompression bombs — an attacker cannot cause unbounded memory allocation by sending a small compressed payload that expands to gigabytes. + +The `Z1.decompress` result is pattern-matched for three cases: +- `Z1.Error e` → `Left e` +- `Z1.Skip` → `Right mempty` (zero-length output) +- `Z1.Decompress bs'` → `Right bs'` + +## Security notes + +- **Decompression bomb protection**: `decompress1` requires an explicit size limit and checks `decompressedSize` before allocating. Callers must pass an appropriate limit (typically the SMP block size). +- **No dictionary/context**: Each message is independently compressed. No shared state between messages that could leak information across compression boundaries. +- **Passthrough for short messages**: Messages ≤ 180 bytes are never compressed, avoiding timing side channels from compression ratio differences on short, potentially-predictable messages. diff --git a/spec/encoding.md b/spec/encoding.md index 2b8dded01..3a4fdcd27 100644 --- a/spec/encoding.md +++ b/spec/encoding.md @@ -2,10 +2,345 @@ > Binary and string encoding used across all SimpleX protocols. 
-## Binary Encoding +**Source files**: [`Encoding.hs`](../src/Simplex/Messaging/Encoding.hs), [`Encoding/String.hs`](../src/Simplex/Messaging/Encoding/String.hs), [`Parsers.hs`](../src/Simplex/Messaging/Parsers.hs) -## String Encoding +## Overview + +Two encoding layers serve different purposes: + +- **`Encoding`** — Binary wire format for SMP protocol transmissions. Compact, no delimiters between fields. Used in all on-the-wire protocol messages. +- **`StrEncoding`** — Human-readable string format for configuration, URIs, logs, and JSON serialization. Uses base64url for binary data, decimal for numbers, comma-separated lists, space-separated tuples. + +Both are typeclasses with `MINIMAL` pragmas requiring `encode` + (`decode` | `parser`), with the missing one derived from the other. + +## Binary Encoding (`Encoding` class) + +**Source**: `Encoding.hs:38-52` + +```haskell +class Encoding a where + smpEncode :: a -> ByteString + smpDecode :: ByteString -> Either String a -- default: parseAll smpP + smpP :: Parser a -- default: smpDecode <$?> smpP +``` + +### Length-prefix conventions + +| Type | Prefix | Max size | Source | +|------|--------|----------|--------| +| `ByteString` | 1-byte length (Word8 as Char) | 255 bytes | `Encoding.hs:102-106` | +| `Large` (newtype) | 2-byte length (Word16 big-endian) | 65535 bytes | `Encoding.hs:135-143` | +| `Tail` (newtype) | None — consumes rest of input | Unlimited | `Encoding.hs:126-132` | +| Lists (`smpEncodeList`) | 1-byte count prefix, then concatenated items | 255 items | `Encoding.hs:155-159` | +| `NonEmpty` | Same as list (fails on count=0) | 255 items | `Encoding.hs:173-178` | + +### Scalar types + +| Type | Encoding | Bytes | Source | +|------|----------|-------|--------| +| `Char` | Raw byte | 1 | `Encoding.hs:54-58` | +| `Bool` | `'T'` / `'F'` (0x54 / 0x46) | 1 | `Encoding.hs:60-70` | +| `Word16` | Big-endian | 2 | `Encoding.hs:72-76` | +| `Word32` | Big-endian | 4 | `Encoding.hs:78-82` | +| `Int64` | Two big-endian 
Word32s (high then low) | 8 | `Encoding.hs:84-99` | +| `SystemTime` | `systemSeconds` as Int64 (nanoseconds dropped) | 8 | `Encoding.hs:145-149` | +| `Text` | UTF-8 then ByteString encoding (1-byte length prefix) | 1 + len | `Encoding.hs:161-165` | +| `String` | `B.pack` then ByteString encoding | 1 + len | `Encoding.hs:167-171` | + +### `Maybe a` + +**Source**: `Encoding.hs:116-124` + +``` +Nothing → '0' (0x30) +Just x → '1' (0x31) ++ smpEncode x +``` + +Tags are ASCII characters `'0'`/`'1'`, not binary 0x00/0x01. + +### Tuples + +**Source**: `Encoding.hs:180-220` + +Tuples (2 through 8) encode as simple concatenation — no length prefix, no separator. Fields are parsed sequentially using each component's `smpP`. This works because each component's parser knows how many bytes to consume (via its own length prefix or fixed size). + +### Combinators + +| Function | Signature | Purpose | Source | +|----------|-----------|---------|--------| +| `_smpP` | `Parser a` | Space-prefixed parser (`A.space *> smpP`) | `Encoding.hs:151-152` | +| `smpEncodeList` | `[a] -> ByteString` | 1-byte count + concatenated items | `Encoding.hs:155-156` | +| `smpListP` | `Parser [a]` | Parse count then that many items | `Encoding.hs:158-159` | +| `lenEncode` | `Int -> Char` | Int to single-byte length char | `Encoding.hs:108-110` | + +## String Encoding (`StrEncoding` class) + +**Source**: `Encoding/String.hs:56-67` + +```haskell +class StrEncoding a where + strEncode :: a -> ByteString + strDecode :: ByteString -> Either String a -- default: parseAll strP + strP :: Parser a -- default: strDecode <$?> base64urlP +``` + +Key difference from `Encoding`: the default `strP` parses base64url input first, then applies `strDecode`. This means types that only implement `strDecode` will automatically accept base64url-encoded input. 
+ +### Instance conventions + +| Type | Encoding | Source | +|------|----------|--------| +| `ByteString` | base64url (non-empty required) | `String.hs:70-76` | +| `Word16`, `Word32` | Decimal string | `String.hs:114-124` | +| `Int`, `Int64` | Signed decimal | `String.hs:138-148` | +| `Char`, `Bool` | Delegates to `Encoding` (`smpEncode`/`smpP`) | `String.hs:126-136` | +| `Maybe a` | Empty string = `Nothing`, otherwise `strEncode a` | `String.hs:108-112` | +| `Text` | UTF-8 bytes, parsed until space/newline | `String.hs:97-99` | +| `SystemTime` | `systemSeconds` as Int64 (decimal) | `String.hs:150-152` | +| `UTCTime` | ISO 8601 string | `String.hs:154-156` | +| `CertificateChain` | Comma-separated base64url blobs | `String.hs:158-162` | +| `Fingerprint` | base64url of fingerprint bytes | `String.hs:164-168` | + +### Collection encoding + +| Type | Separator | Source | +|------|-----------|--------| +| Lists (`strEncodeList`) | Comma `,` | `String.hs:171-175` | +| `NonEmpty` | Comma (fails on empty) | `String.hs:178-180` | +| `Set a` | Comma | `String.hs:182-184` | +| `IntSet` | Comma | `String.hs:186-188` | +| Tuples (2-6) | Space (` `) | `String.hs:193-221` | + +### `Str` newtype + +**Source**: `String.hs:84-89` + +Raw string (not base64url-encoded). Parses until space, consumes trailing space. Used for string-valued protocol fields that should not be base64-encoded. + +### `TextEncoding` class + +**Source**: `String.hs:51-53` + +```haskell +class TextEncoding a where + textEncode :: a -> Text + textDecode :: Text -> Maybe a +``` + +Separate from `StrEncoding` — operates on `Text` rather than `ByteString`. Used for types that need Text representation (e.g., enum display names). + +### JSON bridge functions + +| Function | Purpose | Source | +|----------|---------|--------| +| `strToJSON` | `StrEncoding a => a -> J.Value` via `decodeLatin1 . 
strEncode` | `String.hs:229-231` | +| `strToJEncoding` | Same, for Aeson encoding | `String.hs:233-235` | +| `strParseJSON` | `StrEncoding a => String -> J.Value -> JT.Parser a` — parse JSON string via `strP` | `String.hs:237-238` | +| `textToJSON` | `TextEncoding a => a -> J.Value` | `String.hs:240-242` | +| `textToEncoding` | Same, for Aeson encoding | `String.hs:244-246` | +| `textParseJSON` | `TextEncoding a => String -> J.Value -> JT.Parser a` | `String.hs:248-249` | ## Parsers -## Functions +**Source**: [`Parsers.hs`](../src/Simplex/Messaging/Parsers.hs) + +### Core parsing functions + +| Function | Signature | Purpose | Source | +|----------|-----------|---------|--------| +| `parseAll` | `Parser a -> ByteString -> Either String a` | Parse consuming all input (fails if bytes remain) | `Parsers.hs:64-65` | +| `parse` | `Parser a -> e -> ByteString -> Either e a` | `parseAll` with custom error type (discards error string) | `Parsers.hs:61-62` | +| `parseE` | `(String -> e) -> Parser a -> ByteString -> ExceptT e IO a` | `parseAll` lifted into `ExceptT` | `Parsers.hs:67-68` | +| `parseE'` | `(String -> e) -> Parser a -> ByteString -> ExceptT e IO a` | Like `parseE` but allows trailing input | `Parsers.hs:70-71` | +| `parseRead1` | `Read a => Parser a` | Parse a word then `readMaybe` it | `Parsers.hs:76-77` | +| `parseString` | `(ByteString -> Either String a) -> String -> a` | Parse from `String` (errors with `error`) | `Parsers.hs:89-90` | + +### `base64P` + +**Source**: `Parsers.hs:44-53` + +Standard base64 parser (not base64url — uses `+`/`/` alphabet). Takes alphanumeric + `+`/`/` characters, optional `=` padding, then decodes. Contrast with `base64urlP` in `Encoding/String.hs` which uses `-`/`_` alphabet. + +### JSON options helpers + +Platform-conditional JSON encoding for cross-platform compatibility (Haskell ↔ Swift). 
+ +| Function | Purpose | Source | +|----------|---------|--------| +| `enumJSON` | All-nullary constructors as strings, with tag modifier | `Parsers.hs:101-106` | +| `sumTypeJSON` | Platform-conditional: `taggedObjectJSON` on non-Darwin, `singleFieldJSON` on Darwin | `Parsers.hs:109-114` | +| `taggedObjectJSON` | `{"type": "Tag", "data": {...}}` format | `Parsers.hs:119-128` | +| `singleFieldJSON` | `{"Tag": value}` format | `Parsers.hs:137-149` | +| `defaultJSON` | Default options with `omitNothingFields = True` | `Parsers.hs:151-152` | + +Pattern synonyms for JSON field names: +- `TaggedObjectJSONTag = "type"` (`Parsers.hs:131`) +- `TaggedObjectJSONData = "data"` (`Parsers.hs:134`) +- `SingleFieldJSONTag = "_owsf"` (`Parsers.hs:117`) + +### String helpers + +| Function | Purpose | Source | +|----------|---------|--------| +| `fstToLower` | Lowercase first character | `Parsers.hs:92-94` | +| `dropPrefix` | Remove prefix string, lowercase remainder | `Parsers.hs:96-99` | +| `textP` | Parse rest of input as UTF-8 `String` | `Parsers.hs:154-155` | + +## Auxiliary Types and Utilities + +### TMap + +**Source**: [`TMap.hs`](../src/Simplex/Messaging/TMap.hs) + +```haskell +type TMap k a = TVar (Map k a) +``` + +STM-based concurrent map. Wraps `Data.Map.Strict` in a `TVar`. All mutations use `modifyTVar'` (strict) to prevent thunk accumulation. 
+ +| Function | Notes | Source | +|----------|-------|--------| +| `emptyIO` | IO allocation (`newTVarIO`) | `TMap.hs:32-34` | +| `singleton` | STM allocation | `TMap.hs:36-38` | +| `clear` | Reset to empty | `TMap.hs:40-42` | +| `lookup` / `lookupIO` | STM / non-transactional IO read | `TMap.hs:48-54` | +| `member` / `memberIO` | STM / non-transactional IO membership | `TMap.hs:56-62` | +| `insert` / `insertM` | Insert value / insert from STM action | `TMap.hs:64-70` | +| `delete` | Remove key | `TMap.hs:72-74` | +| `lookupInsert` | Atomic lookup-then-insert (returns old value) | `TMap.hs:76-78` | +| `lookupDelete` | Atomic lookup-then-delete | `TMap.hs:80-82` | +| `adjust` / `update` / `alter` / `alterF` | Standard Map operations lifted to STM | `TMap.hs:84-100` | +| `union` | Merge `Map` into `TMap` | `TMap.hs:102-104` | + +`lookupIO`/`memberIO` use `readTVarIO` — single-read outside STM transaction, useful when you need a snapshot without composing with other STM operations. + +### SessionVar + +**Source**: [`Session.hs`](../src/Simplex/Messaging/Session.hs) + +Race-safe session management using TMVar + monotonic ID. + +```haskell +data SessionVar a = SessionVar + { sessionVar :: TMVar a -- result slot + , sessionVarId :: Int -- monotonic ID from TVar counter + , sessionVarTs :: UTCTime -- creation timestamp + } +``` + +| Function | Purpose | Source | +|----------|---------|--------| +| `getSessVar` | Lookup or create session. Returns `Left new` or `Right existing` | `Session.hs:24-33` | +| `removeSessVar` | Delete session only if ID matches (prevents removing a replacement) | `Session.hs:35-39` | +| `tryReadSessVar` | Non-blocking read of session result | `Session.hs:41-42` | + +The ID-match check in `removeSessVar` (`sessionVarId v == sessionVarId v'`) prevents a race where: +1. Thread A creates session #5, starts work +2. Thread B creates session #6 (replacing #5 in TMap) +3. 
Thread A finishes, tries to remove — ID mismatch, removal blocked + +### ServiceScheme + +**Source**: [`ServiceScheme.hs`](../src/Simplex/Messaging/ServiceScheme.hs) + +```haskell +data ServiceScheme = SSSimplex | SSAppServer SrvLoc +data SrvLoc = SrvLoc HostName ServiceName +``` + +URI scheme for SimpleX service addresses. `SSSimplex` encodes as `"simplex:"`, `SSAppServer` as `"https://host:port"`. + +`simplexChat :: ServiceScheme` is the constant `SSAppServer (SrvLoc "simplex.chat" "")` (`ServiceScheme.hs:38-39`). + +### SystemTime + +**Source**: [`SystemTime.hs`](../src/Simplex/Messaging/SystemTime.hs) + +```haskell +newtype RoundedSystemTime (t :: Nat) = RoundedSystemTime { roundedSeconds :: Int64 } +type SystemDate = RoundedSystemTime 86400 -- day precision +type SystemSeconds = RoundedSystemTime 1 -- second precision +``` + +Phantom-typed time rounding. The `Nat` type parameter specifies rounding granularity in seconds. + +| Function | Purpose | Source | +|----------|---------|--------| +| `getRoundedSystemTime` | Get current time rounded to `t` seconds | `SystemTime.hs:40-43` | +| `getSystemDate` | Alias for day-rounded time | `SystemTime.hs:45-47` | +| `getSystemSeconds` | Second-precision (no rounding needed, just drops nanoseconds) | `SystemTime.hs:49-51` | +| `roundedToUTCTime` | Convert back to `UTCTime` | `SystemTime.hs:53-55` | + +`RoundedSystemTime` derives `FromField`/`ToField` for SQLite storage and `FromJSON`/`ToJSON` for API serialization. 
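As an illustration of the phantom-typed rounding, the sketch below reads the granularity from the type-level `Nat` and rounds a seconds value down to a multiple of it. `roundSeconds` is a hypothetical pure helper, and rounding down (rather than to nearest) is an assumption; the real `getRoundedSystemTime` reads the clock and is not reproduced here.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE ScopedTypeVariables #-}

import Data.Int (Int64)
import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownNat, Nat, natVal)

newtype RoundedSystemTime (t :: Nat) = RoundedSystemTime {roundedSeconds :: Int64}
  deriving (Eq, Show)

type SystemDate = RoundedSystemTime 86400 -- day precision
type SystemSeconds = RoundedSystemTime 1 -- second precision

-- Hypothetical helper: round epoch seconds down to a multiple of `t`.
roundSeconds :: forall t. KnownNat t => Int64 -> RoundedSystemTime t
roundSeconds s = RoundedSystemTime ((s `div` prec) * prec)
  where
    -- granularity recovered from the type parameter at runtime
    prec = fromIntegral (natVal (Proxy :: Proxy t))
```

The type annotation at the use site selects the precision, for example `roundSeconds 100000 :: SystemDate`.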
+ +### Util + +**Source**: [`Util.hs`](../src/Simplex/Messaging/Util.hs) + +Selected utilities used across the codebase: + +**Monadic combinators**: + +| Function | Signature | Purpose | Source | +|----------|-----------|---------|--------| +| `<$?>` | `MonadFail m => (a -> Either String b) -> m a -> m b` | Lift fallible function into parser | `Util.hs:119-121` | +| `$>>=` | `(Monad m, Monad f, Traversable f) => m (f a) -> (a -> m (f b)) -> m (f b)` | Monadic bind through nested monad | `Util.hs:165-167` | +| `ifM` / `whenM` / `unlessM` | | Monadic conditionals | `Util.hs:147-157` | +| `anyM` | | Short-circuit `any` for monadic predicates (strict) | `Util.hs:159-161` | + +**Error handling**: + +| Function | Purpose | Source | +|----------|---------|--------| +| `tryAllErrors` | Catch all exceptions (including async) into `ExceptT` | `Util.hs:273-275` | +| `catchAllErrors` | Same with handler | `Util.hs:281-283` | +| `tryAllOwnErrors` | Catch only "own" exceptions (re-throws async cancellation) | `Util.hs:322-324` | +| `catchAllOwnErrors` | Same with handler | `Util.hs:330-332` | +| `isOwnException` | `StackOverflow`, `HeapOverflow`, `AllocationLimitExceeded` | `Util.hs:297-304` | +| `isAsyncCancellation` | Any `SomeAsyncException` except own exceptions | `Util.hs:306-310` | +| `catchThrow` | Catch exceptions, wrap in Left | `Util.hs:289-291` | +| `allFinally` | `tryAllErrors` + `final` + `except` (like `finally` for ExceptT) | `Util.hs:293-295` | + +The own-vs-async distinction is critical: `catchOwn`/`tryAllOwnErrors` never swallow async cancellation (`ThreadKilled`, `UserInterrupt`, etc.), only synchronous exceptions and resource exhaustion (`StackOverflow`, `HeapOverflow`, `AllocationLimitExceeded`). 
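The classification described above can be sketched with `Control.Exception` from `base`. The two predicates follow the table's descriptions; `tryOwn` is a hypothetical IO-only simplification and does not reproduce the module's `ExceptT`-based functions.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}

import Control.Exception

-- Resource-exhaustion exceptions are delivered asynchronously by the runtime
-- but are caused by the thread itself, so they count as "own".
isOwnException :: SomeException -> Bool
isOwnException e = case fromException e of
  Just StackOverflow -> True
  Just HeapOverflow -> True
  _ -> case fromException e of
    Just AllocationLimitExceeded -> True
    _ -> False

-- Any other asynchronous exception is treated as cancellation.
isAsyncCancellation :: SomeException -> Bool
isAsyncCancellation e = case fromException e of
  Just (_ :: SomeAsyncException) -> not (isOwnException e)
  Nothing -> False

-- Hypothetical helper: capture own exceptions, re-throw cancellation.
tryOwn :: IO a -> IO (Either SomeException a)
tryOwn action =
  (Right <$> action) `catch` \e ->
    if isAsyncCancellation e then throwIO e else pure (Left e)
```

With this split, a worker that wraps its body in `tryOwn` can log its own failures and still be stopped by `killThread`.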
+ +**STM**: + +| Function | Purpose | Source | +|----------|---------|--------| +| `tryWriteTBQueue` | Non-blocking bounded queue write, returns success | `Util.hs:256-261` | + +**Database result helpers**: + +| Function | Purpose | Source | +|----------|---------|--------| +| `firstRow` | Extract first row with transform, or Left error | `Util.hs:346-347` | +| `maybeFirstRow` | Extract first row as Maybe | `Util.hs:349-350` | +| `firstRow'` | Like `firstRow` but transform can also fail | `Util.hs:355-356` | + +**Collection utilities**: + +| Function | Purpose | Source | +|----------|---------|--------| +| `groupOn` | `groupBy` using equality on projected key | `Util.hs:358-359` | +| `groupAllOn` | `groupOn` after `sortOn` (groups non-adjacent elements) | `Util.hs:372-373` | +| `toChunks` | Split list into `NonEmpty` chunks of size n | `Util.hs:376-380` | +| `packZipWith` | Optimized ByteString zipWith (direct memory access) | `Util.hs:236-254` | + +**Miscellaneous**: + +| Function | Purpose | Source | +|----------|---------|--------| +| `safeDecodeUtf8` | Decode UTF-8 replacing errors with `'?'` | `Util.hs:382-386` | +| `bshow` / `tshow` | `show` to `ByteString` / `Text` | `Util.hs:123-129` | +| `threadDelay'` | `Int64` delay (handles overflow by looping) | `Util.hs:391-399` | +| `diffToMicroseconds` / `diffToMilliseconds` | `NominalDiffTime` conversion | `Util.hs:401-407` | +| `labelMyThread` | Label current thread for debugging | `Util.hs:409-410` | +| `encodeJSON` / `decodeJSON` | `ToJSON a => a -> Text` / `FromJSON a => Text -> Maybe a` | `Util.hs:415-421` | +| `traverseWithKey_` | `Map` traversal discarding results | `Util.hs:423-425` | + +## Security notes + +- **Length prefix overflow**: `ByteString` encoding uses 1-byte length — silently truncates strings > 255 bytes. Callers must ensure size bounds before encoding. `Large` extends to 65535 bytes via Word16 prefix. +- **`Tail` unbounded**: `Tail` consumes all remaining input with no size check. 
Only safe when total message size is already bounded (e.g., within a padded SMP block). +- **base64 vs base64url**: `Parsers.base64P` uses standard alphabet (`+`/`/`), while `String.base64urlP` uses URL-safe alphabet (`-`/`_`). Mixing them causes silent decode failures. +- **`safeDecodeUtf8`**: Replaces invalid UTF-8 with `'?'` rather than failing. Suitable for logging/display, not for security-critical string comparison. diff --git a/spec/version.md b/spec/version.md index f5b954534..6d9a23c09 100644 --- a/spec/version.md +++ b/spec/version.md @@ -2,8 +2,198 @@ > Version ranges and compatibility checking for protocol evolution. -## Version Ranges +**Source files**: [`Version.hs`](../src/Simplex/Messaging/Version.hs), [`Version/Internal.hs`](../src/Simplex/Messaging/Version/Internal.hs) -## Compatibility +## Overview + +All SimpleX protocols use version negotiation during handshake. Each party advertises a `VersionRange` (min..max supported), and negotiation produces a `Compatible` proof value if the ranges overlap — choosing the highest mutually-supported version. + +The `Compatible` newtype can only be constructed internally (constructor is not exported), so the type system enforces that compatibility was actually checked. + +## Types + +### `Version v` + +**Source**: `Version/Internal.hs:11-12` + +```haskell +newtype Version v = Version Word16 +``` + +Phantom-typed version number. The phantom `v` distinguishes version spaces (e.g., SMP versions vs Agent versions vs XFTP versions) at the type level, preventing accidental comparison across protocols. + +- `Encoding`: 2 bytes big-endian (via Word16 instance) +- `StrEncoding`: decimal string +- JSON: numeric value +- Derives: `Eq`, `Ord`, `Show` + +The constructor is exported from `Version.Internal` but not from `Version`, so application code cannot fabricate versions — they must come from protocol constants or parsing. 
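A small sketch may help show what the phantom parameter buys. The tag types `SMPTag` and `AgentTag` and the constants below are hypothetical placeholders, and the constructor is used directly here only because the sketch is a single module; in the real layout it would come from `Version.Internal`.

```haskell
import Data.Word (Word16)

-- Same shape as the spec's newtype; `v` has no runtime representation.
newtype Version v = Version Word16
  deriving (Eq, Ord, Show)

-- Hypothetical empty tag types standing in for protocol version scopes.
data SMPTag
data AgentTag

smpCurrent :: Version SMPTag
smpCurrent = Version 16

agentCurrent :: Version AgentTag
agentCurrent = Version 7

-- Comparison inside one version space type-checks.
supportsServiceCerts :: Version SMPTag -> Bool
supportsServiceCerts v = v >= Version 16

-- Uncommenting the next line is a compile error, because the tags differ:
-- mixed = smpCurrent == agentCurrent
```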
+ +### `VersionRange v` + +**Source**: `Version.hs:46-50` + +```haskell +data VersionRange v = VRange + { minVersion :: Version v + , maxVersion :: Version v + } +``` + +Invariant: `minVersion <= maxVersion` (enforced by smart constructors). + +The `VRange` constructor is not exported — only the pattern synonym `VersionRange` (read-only, `Version.hs:41-44`) is public. + +- `Encoding`: two Word16s concatenated (4 bytes total, `Version.hs:80-84`) +- `StrEncoding`: `"min-max"` or `"v"` if min == max (`Version.hs:86-93`) +- JSON: `{"minVersion": n, "maxVersion": n}` + +### `VersionScope v` + +**Source**: `Version.hs:64` + +```haskell +class VersionScope v +``` + +Empty typeclass used as a constraint on version operations. Each protocol declares its version scope: + +```haskell +instance VersionScope SMP +instance VersionScope Agent +``` + +This prevents accidentally mixing version ranges from different protocols in negotiation functions. + +### `Compatible a` + +**Source**: `Version.hs:117-122` + +```haskell +newtype Compatible a = Compatible_ a + +pattern Compatible :: a -> Compatible a +pattern Compatible a <- Compatible_ a +``` + +Proof that compatibility was checked. The `Compatible_` constructor is not exported — `Compatible` is a read-only pattern synonym. The only way to obtain a `Compatible` value is through `compatibleVersion`, `compatibleVRange`, `proveCompatible`, or the internal `mkCompatibleIf`. + +### `VersionI` / `VersionRangeI` type classes + +**Source**: `Version.hs:95-115` + +Multi-param typeclasses with functional dependencies for generic version/range operations. 
Allow extension types that wrap `Version` or `VersionRange` to participate in negotiation: + +```haskell +class VersionScope v => VersionI v a | a -> v where + type VersionRangeT v a -- associated type: range form + version :: a -> Version v + toVersionRangeT :: a -> VersionRange v -> VersionRangeT v a + +class VersionScope v => VersionRangeI v a | a -> v where + type VersionT v a -- associated type: version form + versionRange :: a -> VersionRange v + toVersionRange :: a -> VersionRange v -> a + toVersionT :: a -> Version v -> VersionT v a +``` + +Identity instances exist for `Version v` and `VersionRange v` themselves. ## Functions + +### Construction + +| Function | Signature | Purpose | Source | +|----------|-----------|---------|--------| +| `mkVersionRange` | `Version v -> Version v -> VersionRange v` | Construct range, `error` if min > max | `Version.hs:67-70` | +| `safeVersionRange` | `Version v -> Version v -> Maybe (VersionRange v)` | Safe construction, `Nothing` if invalid | `Version.hs:72-75` | +| `versionToRange` | `Version v -> VersionRange v` | Singleton range (min == max) | `Version.hs:77-78` | + +### Compatibility checking + +#### `isCompatible` + +**Source**: `Version.hs:124-125` + +```haskell +isCompatible :: VersionI v a => a -> VersionRange v -> Bool +``` + +Check if a single version falls within a range. + +#### `isCompatibleRange` + +**Source**: `Version.hs:127-130` + +```haskell +isCompatibleRange :: VersionRangeI v a => a -> VersionRange v -> Bool +``` + +Check if two version ranges overlap: `min1 <= max2 && min2 <= max1`. + +#### `proveCompatible` + +**Source**: `Version.hs:132-133` + +```haskell +proveCompatible :: VersionI v a => a -> VersionRange v -> Maybe (Compatible a) +``` + +If version is compatible, wrap in `Compatible` proof. Returns `Nothing` if out of range. 
+ +### Negotiation + +#### `compatibleVersion` + +**Source**: `Version.hs:135-140` + +```haskell +compatibleVersion :: VersionRangeI v a => a -> VersionRange v -> Maybe (Compatible (VersionT v a)) +``` + +Negotiate a single version from two ranges. Returns `min(max1, max2)` — the highest mutually-supported version. Returns `Nothing` if ranges don't overlap. + +#### `compatibleVRange` + +**Source**: `Version.hs:143-148` + +```haskell +compatibleVRange :: VersionRangeI v a => a -> VersionRange v -> Maybe (Compatible a) +``` + +Compute the intersection of two version ranges: `(max(min1,min2), min(max1,max2))`. Returns `Nothing` if the intersection is empty (i.e., ranges don't overlap). + +#### `compatibleVRange'` + +**Source**: `Version.hs:151-156` + +```haskell +compatibleVRange' :: VersionRangeI v a => a -> Version v -> Maybe (Compatible a) +``` + +Cap a version range's maximum at a given version. Returns `Nothing` if the cap is below the range's minimum. + +## Protocol version constants + +Version constants for each protocol are defined in their respective Transport modules. For SMP, key gates include: + +- `currentSMPAgentVersion`, `supportedSMPAgentVRange` — current negotiation range +- `serviceCertsSMPVersion = 16` — service certificate handshake +- `rcvServiceSMPVersion = 19` — service subscription commands + +See [`transport.md`](transport.md) and [`rcv-services.md`](rcv-services.md) for protocol-specific version constants. + +## Negotiation protocol + +During handshake: +1. Client sends its `VersionRange` to server +2. Server computes `compatibleVRange clientRange serverRange` +3. If `Nothing` → reject connection (incompatible) +4. If `Just (Compatible agreedRange)` → use `maxVersion agreedRange` as the effective protocol version + +The `Compatible` proof flows through the connection setup, ensuring all subsequent version-gated code paths have evidence that negotiation occurred. 
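The handshake steps above can be sketched as a pure function. This is a simplified illustration: it drops the phantom parameter and the `Compatible` wrapper, and `intersectRanges` and `negotiate` are hypothetical names rather than the module's API.

```haskell
import Data.Word (Word16)

newtype Version = Version Word16
  deriving (Eq, Ord, Show)

data VersionRange = VersionRange
  { minVersion :: Version
  , maxVersion :: Version
  }
  deriving (Eq, Show)

-- Intersection of two ranges: (max of the minimums, min of the maximums).
-- Nothing when the ranges do not overlap.
intersectRanges :: VersionRange -> VersionRange -> Maybe VersionRange
intersectRanges (VersionRange min1 max1) (VersionRange min2 max2)
  | lo <= hi = Just (VersionRange lo hi)
  | otherwise = Nothing
  where
    lo = max min1 min2
    hi = min max1 max2

-- Server-side view: reject on no overlap, otherwise use the highest
-- version supported by both parties.
negotiate :: VersionRange -> VersionRange -> Either String Version
negotiate client server =
  maybe (Left "incompatible version ranges") (Right . maxVersion) (intersectRanges client server)
```

For example, a client range of 6 to 16 against a server range of 10 to 19 intersects to 10 to 16, so version 16 is used.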
+ +## Security notes + +- **No downgrade attack protection in negotiation itself** — an active MITM could modify the version range to force a lower version. Protection comes from the TLS layer (authentication prevents MITM) and from servers setting minimum version floors. +- **`mkVersionRange` uses `error`** — only safe for compile-time constants. Runtime construction must use `safeVersionRange`. From 66d7efa61ea03a771e555dce4216f9d8caf6191d Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Wed, 11 Mar 2026 08:53:57 +0000 Subject: [PATCH 06/61] some modules documented --- spec/README.md | 99 +++--- spec/TOPICS.md | 5 + spec/encoding.md | 290 +++++++++--------- spec/modules/README.md | 155 ++++++++++ spec/modules/Simplex/Messaging/Compression.md | 17 + spec/modules/Simplex/Messaging/Encoding.md | 41 +++ .../Simplex/Messaging/Encoding/String.md | 40 +++ spec/modules/Simplex/Messaging/Parsers.md | 21 ++ .../Simplex/Messaging/ServiceScheme.md | 7 + spec/modules/Simplex/Messaging/Session.md | 15 + spec/modules/Simplex/Messaging/SystemTime.md | 13 + spec/modules/Simplex/Messaging/TMap.md | 17 + spec/modules/Simplex/Messaging/Util.md | 52 ++++ spec/modules/Simplex/Messaging/Version.md | 27 ++ .../Simplex/Messaging/Version/Internal.md | 7 + spec/rcv-services.md | 42 +-- spec/version.md | 62 ++-- 17 files changed, 632 insertions(+), 278 deletions(-) create mode 100644 spec/TOPICS.md create mode 100644 spec/modules/README.md create mode 100644 spec/modules/Simplex/Messaging/Compression.md create mode 100644 spec/modules/Simplex/Messaging/Encoding.md create mode 100644 spec/modules/Simplex/Messaging/Encoding/String.md create mode 100644 spec/modules/Simplex/Messaging/Parsers.md create mode 100644 spec/modules/Simplex/Messaging/ServiceScheme.md create mode 100644 spec/modules/Simplex/Messaging/Session.md create mode 100644 spec/modules/Simplex/Messaging/SystemTime.md create mode 100644 
spec/modules/Simplex/Messaging/TMap.md create mode 100644 spec/modules/Simplex/Messaging/Util.md create mode 100644 spec/modules/Simplex/Messaging/Version.md create mode 100644 spec/modules/Simplex/Messaging/Version/Internal.md diff --git a/spec/README.md b/spec/README.md index 7154aa957..c993f108d 100644 --- a/spec/README.md +++ b/spec/README.md @@ -2,66 +2,73 @@ > How does the code work? What does each function do? What are the security invariants? -## Conventions +## Structure + +Spec has two levels: + +### `spec/modules/` — Per-module documentation + +Mirrors the `src/Simplex/` directory structure exactly. Each `.hs` file has a corresponding `.md` file at the same relative path. Contains only information that is **not obvious from reading the code** and cannot fit in a one-line source comment: + +- Non-obvious behavior (subtle invariants, ordering dependencies, concurrency assumptions) +- Usage considerations (when to use X vs Y, common mistakes, caller obligations) +- Relationships to other modules not visible from imports +- Security notes specific to this module + +**Not included**: type signatures, code snippets, function-by-function prose that restates the source. If reading the code tells you everything, the module doc says so briefly. -Each spec file documents: -1. **Purpose** — What this component does -2. **Protocol reference** — Link to `protocol/` file (where applicable) -3. **Types** — Key data types with field descriptions -4. **Functions** — Every exported function with call graph -5. **Security notes** — Trust assumptions, validation requirements +Function references use fully qualified names with markdown links: +``` +[Simplex.Messaging.Server.subscribeServiceMessages](./modules/Simplex/Messaging/Server.md#subscribeServiceMessages) +``` -Function documentation format: +Source code links back via comments: +```haskell +-- spec: spec/modules/Simplex/Messaging/Server.md#subscribeServiceMessages +subscribeServiceMessages :: ... 
``` -### Module.functionName + +### `spec/` root — Topic documentation + +Cross-module documentation that follows a feature, mechanism, or concern across the entire stack. Topics answer "how does X work end-to-end?" rather than "what does this file do?" + +Topics reference module docs rather than restating implementation details. They focus on: +- End-to-end data flow across modules +- Cross-cutting security analysis and invariants +- Design rationale, risks, test gaps +- Version gates and compatibility concerns + +Some topics may migrate to `product/` if they are primarily about user-visible behavior and guarantees rather than implementation mechanics. + +### `spec/security-invariants.md` — All security invariants + +Cross-referenced from both module docs and topic docs. + +## Conventions + +Module doc entry format: +``` +## functionName **Purpose**: ... -**Calls**: Module.a, Module.b -**Called by**: Module.c +**Calls**: [Module.a](./modules/path.md#a), [Module.b](./modules/path.md#b) +**Called by**: [Module.c](./modules/path.md#c) **Invariant**: SI-XX **Security**: ... 
``` ## Index -### Protocol Implementation -- [smp-protocol.md](smp-protocol.md) — SMP commands, types, encoding -- [xftp-protocol.md](xftp-protocol.md) — XFTP commands, chunk operations -- [ntf-protocol.md](ntf-protocol.md) — NTF commands, token/subscription lifecycle -- [xrcp-protocol.md](xrcp-protocol.md) — XRCP session handshake, commands -- [agent-protocol.md](agent-protocol.md) — Agent connection procedures, queue rotation - -### Cryptography -- [crypto.md](crypto.md) — All primitives: Ed25519, X25519, NaCl, AES-GCM, SHA, HKDF -- [crypto-ratchet.md](crypto-ratchet.md) — Double ratchet + PQDR -- [crypto-tls.md](crypto-tls.md) — TLS setup, certificate chains, validation - -### Transport -- [transport.md](transport.md) — Transport abstraction, handshake, block padding -- [transport-http2.md](transport-http2.md) — HTTP/2 framing, file streaming -- [transport-websocket.md](transport-websocket.md) — WebSocket adapter - -### Server Implementations -- [smp-server.md](smp-server.md) — SMP server -- [xftp-server.md](xftp-server.md) — XFTP server -- [ntf-server.md](ntf-server.md) — Notification server - -### Client Implementations -- [smp-client.md](smp-client.md) — SMP client, proxy relay -- [xftp-client.md](xftp-client.md) — XFTP client -- [agent.md](agent.md) — SMP agent, duplex connections - -### Storage -- [storage-server.md](storage-server.md) — Server storage backends -- [storage-agent.md](storage-agent.md) — Agent storage backends - -### Auxiliary +### Topics + +- [rcv-services.md](rcv-services.md) — Service certificates for high-volume SMP clients (bulk subscription) - [encoding.md](encoding.md) — Binary and string encoding - [version.md](version.md) — Version ranges and negotiation -- [remote-control.md](remote-control.md) — XRCP implementation - [compression.md](compression.md) — Zstd compression -### Cross-cutting Features -- [rcv-services.md](rcv-services.md) — Service certificates for high-volume SMP clients (bulk subscription) +### Modules + +See 
`spec/modules/` — mirrors `src/Simplex/` structure. ### Security + - [security-invariants.md](security-invariants.md) — All security invariants diff --git a/spec/TOPICS.md b/spec/TOPICS.md new file mode 100644 index 000000000..a0c1f4eaf --- /dev/null +++ b/spec/TOPICS.md @@ -0,0 +1,5 @@ +# Topic Candidates + +> Cross-cutting patterns noticed during module documentation. Each entry may become a topic doc in `spec/` after all module docs are complete. + +- **Exception handling strategy**: `catchOwn`/`catchAll`/`tryAllErrors` pattern (defined in Util.hs) used across server, client, and agent modules. The three-category classification (synchronous, own-async, cancellation) and when to use which catch variant is not obvious from any single call site. diff --git a/spec/encoding.md b/spec/encoding.md index 3a4fdcd27..f5501cfab 100644 --- a/spec/encoding.md +++ b/spec/encoding.md @@ -15,8 +15,6 @@ Both are typeclasses with `MINIMAL` pragmas requiring `encode` + (`decode` | `pa ## Binary Encoding (`Encoding` class) -**Source**: `Encoding.hs:38-52` - ```haskell class Encoding a where smpEncode :: a -> ByteString @@ -26,31 +24,29 @@ class Encoding a where ### Length-prefix conventions -| Type | Prefix | Max size | Source | -|------|--------|----------|--------| -| `ByteString` | 1-byte length (Word8 as Char) | 255 bytes | `Encoding.hs:102-106` | -| `Large` (newtype) | 2-byte length (Word16 big-endian) | 65535 bytes | `Encoding.hs:135-143` | -| `Tail` (newtype) | None — consumes rest of input | Unlimited | `Encoding.hs:126-132` | -| Lists (`smpEncodeList`) | 1-byte count prefix, then concatenated items | 255 items | `Encoding.hs:155-159` | -| `NonEmpty` | Same as list (fails on count=0) | 255 items | `Encoding.hs:173-178` | +| Type | Prefix | Max size | +|------|--------|----------| +| `ByteString` | 1-byte length (Word8 as Char) | 255 bytes | +| `Large` (newtype) | 2-byte length (Word16 big-endian) | 65535 bytes | +| `Tail` (newtype) | None — consumes rest of input | 
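
A minimal sketch of the `Maybe` tags and tuple concatenation described above, assuming a toy model where encodings are lists of bytes. The names `encodeWord16`, `encodeMaybe` and `encodePair` are illustrative only and are not part of the `Encoding` module.

```haskell
import Data.Bits (shiftR)
import Data.Word (Word16, Word8)

-- Big-endian Word16: high byte first, then low byte.
encodeWord16 :: Word16 -> [Word8]
encodeWord16 w = [fromIntegral (w `shiftR` 8), fromIntegral w]

-- Maybe uses ASCII tags: '0' (0x30) for Nothing, '1' (0x31) for Just.
encodeMaybe :: (a -> [Word8]) -> Maybe a -> [Word8]
encodeMaybe _ Nothing = [0x30]
encodeMaybe enc (Just x) = 0x31 : enc x

-- Tuples are plain concatenation: no length prefix and no separator.
encodePair :: (a -> [Word8]) -> (b -> [Word8]) -> (a, b) -> [Word8]
encodePair encA encB (a, b) = encA a ++ encB b
```

The pair round-trips only because each component is self-delimiting: `Word16` is always two bytes, and the `Maybe` tag tells the parser whether a payload follows.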
Unlimited | +| Lists (`smpEncodeList`) | 1-byte count prefix, then concatenated items | 255 items | +| `NonEmpty` | Same as list (fails on count=0) | 255 items | ### Scalar types -| Type | Encoding | Bytes | Source | -|------|----------|-------|--------| -| `Char` | Raw byte | 1 | `Encoding.hs:54-58` | -| `Bool` | `'T'` / `'F'` (0x54 / 0x46) | 1 | `Encoding.hs:60-70` | -| `Word16` | Big-endian | 2 | `Encoding.hs:72-76` | -| `Word32` | Big-endian | 4 | `Encoding.hs:78-82` | -| `Int64` | Two big-endian Word32s (high then low) | 8 | `Encoding.hs:84-99` | -| `SystemTime` | `systemSeconds` as Int64 (nanoseconds dropped) | 8 | `Encoding.hs:145-149` | -| `Text` | UTF-8 then ByteString encoding (1-byte length prefix) | 1 + len | `Encoding.hs:161-165` | -| `String` | `B.pack` then ByteString encoding | 1 + len | `Encoding.hs:167-171` | +| Type | Encoding | Bytes | +|------|----------|-------| +| `Char` | Raw byte | 1 | +| `Bool` | `'T'` / `'F'` (0x54 / 0x46) | 1 | +| `Word16` | Big-endian | 2 | +| `Word32` | Big-endian | 4 | +| `Int64` | Two big-endian Word32s (high then low) | 8 | +| `SystemTime` | `systemSeconds` as Int64 (nanoseconds dropped) | 8 | +| `Text` | UTF-8 then ByteString encoding (1-byte length prefix) | 1 + len | +| `String` | `B.pack` then ByteString encoding | 1 + len | ### `Maybe a` -**Source**: `Encoding.hs:116-124` - ``` Nothing → '0' (0x30) Just x → '1' (0x31) ++ smpEncode x @@ -60,23 +56,19 @@ Tags are ASCII characters `'0'`/`'1'`, not binary 0x00/0x01. ### Tuples -**Source**: `Encoding.hs:180-220` - Tuples (2 through 8) encode as simple concatenation — no length prefix, no separator. Fields are parsed sequentially using each component's `smpP`. This works because each component's parser knows how many bytes to consume (via its own length prefix or fixed size). 
### Combinators -| Function | Signature | Purpose | Source | -|----------|-----------|---------|--------| -| `_smpP` | `Parser a` | Space-prefixed parser (`A.space *> smpP`) | `Encoding.hs:151-152` | -| `smpEncodeList` | `[a] -> ByteString` | 1-byte count + concatenated items | `Encoding.hs:155-156` | -| `smpListP` | `Parser [a]` | Parse count then that many items | `Encoding.hs:158-159` | -| `lenEncode` | `Int -> Char` | Int to single-byte length char | `Encoding.hs:108-110` | +| Function | Signature | Purpose | +|----------|-----------|---------| +| `_smpP` | `Parser a` | Space-prefixed parser (`A.space *> smpP`) | +| `smpEncodeList` | `[a] -> ByteString` | 1-byte count + concatenated items | +| `smpListP` | `Parser [a]` | Parse count then that many items | +| `lenEncode` | `Int -> Char` | Int to single-byte length char | ## String Encoding (`StrEncoding` class) -**Source**: `Encoding/String.hs:56-67` - ```haskell class StrEncoding a where strEncode :: a -> ByteString @@ -88,39 +80,35 @@ Key difference from `Encoding`: the default `strP` parses base64url input first, ### Instance conventions -| Type | Encoding | Source | -|------|----------|--------| -| `ByteString` | base64url (non-empty required) | `String.hs:70-76` | -| `Word16`, `Word32` | Decimal string | `String.hs:114-124` | -| `Int`, `Int64` | Signed decimal | `String.hs:138-148` | -| `Char`, `Bool` | Delegates to `Encoding` (`smpEncode`/`smpP`) | `String.hs:126-136` | -| `Maybe a` | Empty string = `Nothing`, otherwise `strEncode a` | `String.hs:108-112` | -| `Text` | UTF-8 bytes, parsed until space/newline | `String.hs:97-99` | -| `SystemTime` | `systemSeconds` as Int64 (decimal) | `String.hs:150-152` | -| `UTCTime` | ISO 8601 string | `String.hs:154-156` | -| `CertificateChain` | Comma-separated base64url blobs | `String.hs:158-162` | -| `Fingerprint` | base64url of fingerprint bytes | `String.hs:164-168` | +| Type | Encoding | +|------|----------| +| `ByteString` | base64url (non-empty required) | +| 
`Word16`, `Word32` | Decimal string | +| `Int`, `Int64` | Signed decimal | +| `Char`, `Bool` | Delegates to `Encoding` (`smpEncode`/`smpP`) | +| `Maybe a` | Empty string = `Nothing`, otherwise `strEncode a` | +| `Text` | UTF-8 bytes, parsed until space/newline | +| `SystemTime` | `systemSeconds` as Int64 (decimal) | +| `UTCTime` | ISO 8601 string | +| `CertificateChain` | Comma-separated base64url blobs | +| `Fingerprint` | base64url of fingerprint bytes | ### Collection encoding -| Type | Separator | Source | -|------|-----------|--------| -| Lists (`strEncodeList`) | Comma `,` | `String.hs:171-175` | -| `NonEmpty` | Comma (fails on empty) | `String.hs:178-180` | -| `Set a` | Comma | `String.hs:182-184` | -| `IntSet` | Comma | `String.hs:186-188` | -| Tuples (2-6) | Space (` `) | `String.hs:193-221` | +| Type | Separator | +|------|-----------| +| Lists (`strEncodeList`) | Comma `,` | +| `NonEmpty` | Comma (fails on empty) | +| `Set a` | Comma | +| `IntSet` | Comma | +| Tuples (2-6) | Space (` `) | ### `Str` newtype -**Source**: `String.hs:84-89` - Raw string (not base64url-encoded). Parses until space, consumes trailing space. Used for string-valued protocol fields that should not be base64-encoded. ### `TextEncoding` class -**Source**: `String.hs:51-53` - ```haskell class TextEncoding a where textEncode :: a -> Text @@ -131,14 +119,14 @@ Separate from `StrEncoding` — operates on `Text` rather than `ByteString`. Use ### JSON bridge functions -| Function | Purpose | Source | -|----------|---------|--------| -| `strToJSON` | `StrEncoding a => a -> J.Value` via `decodeLatin1 . 
strEncode` | `String.hs:229-231` | -| `strToJEncoding` | Same, for Aeson encoding | `String.hs:233-235` | -| `strParseJSON` | `StrEncoding a => String -> J.Value -> JT.Parser a` — parse JSON string via `strP` | `String.hs:237-238` | -| `textToJSON` | `TextEncoding a => a -> J.Value` | `String.hs:240-242` | -| `textToEncoding` | Same, for Aeson encoding | `String.hs:244-246` | -| `textParseJSON` | `TextEncoding a => String -> J.Value -> JT.Parser a` | `String.hs:248-249` | +| Function | Purpose | +|----------|---------| +| `strToJSON` | `StrEncoding a => a -> J.Value` via `decodeLatin1 . strEncode` | +| `strToJEncoding` | Same, for Aeson encoding | +| `strParseJSON` | `StrEncoding a => String -> J.Value -> JT.Parser a` — parse JSON string via `strP` | +| `textToJSON` | `TextEncoding a => a -> J.Value` | +| `textToEncoding` | Same, for Aeson encoding | +| `textParseJSON` | `TextEncoding a => String -> J.Value -> JT.Parser a` | ## Parsers @@ -146,45 +134,43 @@ Separate from `StrEncoding` — operates on `Text` rather than `ByteString`. 
Use ### Core parsing functions -| Function | Signature | Purpose | Source | -|----------|-----------|---------|--------| -| `parseAll` | `Parser a -> ByteString -> Either String a` | Parse consuming all input (fails if bytes remain) | `Parsers.hs:64-65` | -| `parse` | `Parser a -> e -> ByteString -> Either e a` | `parseAll` with custom error type (discards error string) | `Parsers.hs:61-62` | -| `parseE` | `(String -> e) -> Parser a -> ByteString -> ExceptT e IO a` | `parseAll` lifted into `ExceptT` | `Parsers.hs:67-68` | -| `parseE'` | `(String -> e) -> Parser a -> ByteString -> ExceptT e IO a` | Like `parseE` but allows trailing input | `Parsers.hs:70-71` | -| `parseRead1` | `Read a => Parser a` | Parse a word then `readMaybe` it | `Parsers.hs:76-77` | -| `parseString` | `(ByteString -> Either String a) -> String -> a` | Parse from `String` (errors with `error`) | `Parsers.hs:89-90` | +| Function | Signature | Purpose | +|----------|-----------|---------| +| `parseAll` | `Parser a -> ByteString -> Either String a` | Parse consuming all input (fails if bytes remain) | +| `parse` | `Parser a -> e -> ByteString -> Either e a` | `parseAll` with custom error type (discards error string) | +| `parseE` | `(String -> e) -> Parser a -> ByteString -> ExceptT e IO a` | `parseAll` lifted into `ExceptT` | +| `parseE'` | `(String -> e) -> Parser a -> ByteString -> ExceptT e IO a` | Like `parseE` but allows trailing input | +| `parseRead1` | `Read a => Parser a` | Parse a word then `readMaybe` it | +| `parseString` | `(ByteString -> Either String a) -> String -> a` | Parse from `String` (errors with `error`) | ### `base64P` -**Source**: `Parsers.hs:44-53` - Standard base64 parser (not base64url — uses `+`/`/` alphabet). Takes alphanumeric + `+`/`/` characters, optional `=` padding, then decodes. Contrast with `base64urlP` in `Encoding/String.hs` which uses `-`/`_` alphabet. 
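
To make the alphabet difference concrete, here is a small sketch of the two character classes. The predicates `isBase64Char` and `isBase64UrlChar` are hypothetical names for illustration and are not taken from `Parsers.hs` or `Encoding/String.hs`. Padding (`=`) is left out.

```haskell
import Data.Char (isAlphaNum, isAscii)

-- Standard base64 body characters: ASCII letters, digits, '+' and '/'.
isBase64Char :: Char -> Bool
isBase64Char c = (isAscii c && isAlphaNum c) || c == '+' || c == '/'

-- URL-safe variant: same letters and digits, but '-' and '_' instead.
isBase64UrlChar :: Char -> Bool
isBase64UrlChar c = (isAscii c && isAlphaNum c) || c == '-' || c == '_'
```

A string containing `+` or `/` passes only the first predicate, and one containing `-` or `_` passes only the second, so input produced for one parser may be rejected by the other.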
### JSON options helpers Platform-conditional JSON encoding for cross-platform compatibility (Haskell ↔ Swift). -| Function | Purpose | Source | -|----------|---------|--------| -| `enumJSON` | All-nullary constructors as strings, with tag modifier | `Parsers.hs:101-106` | -| `sumTypeJSON` | Platform-conditional: `taggedObjectJSON` on non-Darwin, `singleFieldJSON` on Darwin | `Parsers.hs:109-114` | -| `taggedObjectJSON` | `{"type": "Tag", "data": {...}}` format | `Parsers.hs:119-128` | -| `singleFieldJSON` | `{"Tag": value}` format | `Parsers.hs:137-149` | -| `defaultJSON` | Default options with `omitNothingFields = True` | `Parsers.hs:151-152` | +| Function | Purpose | +|----------|---------| +| `enumJSON` | All-nullary constructors as strings, with tag modifier | +| `sumTypeJSON` | Platform-conditional: `taggedObjectJSON` on non-Darwin, `singleFieldJSON` on Darwin | +| `taggedObjectJSON` | `{"type": "Tag", "data": {...}}` format | +| `singleFieldJSON` | `{"Tag": value}` format | +| `defaultJSON` | Default options with `omitNothingFields = True` | Pattern synonyms for JSON field names: -- `TaggedObjectJSONTag = "type"` (`Parsers.hs:131`) -- `TaggedObjectJSONData = "data"` (`Parsers.hs:134`) -- `SingleFieldJSONTag = "_owsf"` (`Parsers.hs:117`) +- `TaggedObjectJSONTag = "type"` +- `TaggedObjectJSONData = "data"` +- `SingleFieldJSONTag = "_owsf"` ### String helpers -| Function | Purpose | Source | -|----------|---------|--------| -| `fstToLower` | Lowercase first character | `Parsers.hs:92-94` | -| `dropPrefix` | Remove prefix string, lowercase remainder | `Parsers.hs:96-99` | -| `textP` | Parse rest of input as UTF-8 `String` | `Parsers.hs:154-155` | +| Function | Purpose | +|----------|---------| +| `fstToLower` | Lowercase first character | +| `dropPrefix` | Remove prefix string, lowercase remainder | +| `textP` | Parse rest of input as UTF-8 `String` | ## Auxiliary Types and Utilities @@ -198,19 +184,19 @@ type TMap k a = TVar (Map k a) STM-based concurrent map. 
Wraps `Data.Map.Strict` in a `TVar`. All mutations use `modifyTVar'` (strict) to prevent thunk accumulation. -| Function | Notes | Source | -|----------|-------|--------| -| `emptyIO` | IO allocation (`newTVarIO`) | `TMap.hs:32-34` | -| `singleton` | STM allocation | `TMap.hs:36-38` | -| `clear` | Reset to empty | `TMap.hs:40-42` | -| `lookup` / `lookupIO` | STM / non-transactional IO read | `TMap.hs:48-54` | -| `member` / `memberIO` | STM / non-transactional IO membership | `TMap.hs:56-62` | -| `insert` / `insertM` | Insert value / insert from STM action | `TMap.hs:64-70` | -| `delete` | Remove key | `TMap.hs:72-74` | -| `lookupInsert` | Atomic lookup-then-insert (returns old value) | `TMap.hs:76-78` | -| `lookupDelete` | Atomic lookup-then-delete | `TMap.hs:80-82` | -| `adjust` / `update` / `alter` / `alterF` | Standard Map operations lifted to STM | `TMap.hs:84-100` | -| `union` | Merge `Map` into `TMap` | `TMap.hs:102-104` | +| Function | Notes | +|----------|-------| +| `emptyIO` | IO allocation (`newTVarIO`) | +| `singleton` | STM allocation | +| `clear` | Reset to empty | +| `lookup` / `lookupIO` | STM / non-transactional IO read | +| `member` / `memberIO` | STM / non-transactional IO membership | +| `insert` / `insertM` | Insert value / insert from STM action | +| `delete` | Remove key | +| `lookupInsert` | Atomic lookup-then-insert (returns old value) | +| `lookupDelete` | Atomic lookup-then-delete | +| `adjust` / `update` / `alter` / `alterF` | Standard Map operations lifted to STM | +| `union` | Merge `Map` into `TMap` | `lookupIO`/`memberIO` use `readTVarIO` — single-read outside STM transaction, useful when you need a snapshot without composing with other STM operations. @@ -228,13 +214,13 @@ data SessionVar a = SessionVar } ``` -| Function | Purpose | Source | -|----------|---------|--------| -| `getSessVar` | Lookup or create session. 
Returns `Left new` or `Right existing` | `Session.hs:24-33` | -| `removeSessVar` | Delete session only if ID matches (prevents removing a replacement) | `Session.hs:35-39` | -| `tryReadSessVar` | Non-blocking read of session result | `Session.hs:41-42` | +| Function | Purpose | +|----------|---------| +| `getSessVar` | Lookup or create session. Returns `Left new` or `Right existing` | +| `removeSessVar` | Delete session only if ID matches (prevents removing a replacement) | +| `tryReadSessVar` | Non-blocking read of session result | -The ID-match check in `removeSessVar` (`sessionVarId v == sessionVarId v'`) prevents a race where: +The ID-match check in `removeSessVar` prevents a race where: 1. Thread A creates session #5, starts work 2. Thread B creates session #6 (replacing #5 in TMap) 3. Thread A finishes, tries to remove — ID mismatch, removal blocked @@ -250,7 +236,7 @@ data SrvLoc = SrvLoc HostName ServiceName URI scheme for SimpleX service addresses. `SSSimplex` encodes as `"simplex:"`, `SSAppServer` as `"https://host:port"`. -`simplexChat :: ServiceScheme` is the constant `SSAppServer (SrvLoc "simplex.chat" "")` (`ServiceScheme.hs:38-39`). +`simplexChat` is the constant `SSAppServer (SrvLoc "simplex.chat" "")`. ### SystemTime @@ -264,12 +250,12 @@ type SystemSeconds = RoundedSystemTime 1 -- second precision Phantom-typed time rounding. The `Nat` type parameter specifies rounding granularity in seconds. 
-| Function | Purpose | Source | -|----------|---------|--------| -| `getRoundedSystemTime` | Get current time rounded to `t` seconds | `SystemTime.hs:40-43` | -| `getSystemDate` | Alias for day-rounded time | `SystemTime.hs:45-47` | -| `getSystemSeconds` | Second-precision (no rounding needed, just drops nanoseconds) | `SystemTime.hs:49-51` | -| `roundedToUTCTime` | Convert back to `UTCTime` | `SystemTime.hs:53-55` | +| Function | Purpose | +|----------|---------| +| `getRoundedSystemTime` | Get current time rounded to `t` seconds | +| `getSystemDate` | Alias for day-rounded time | +| `getSystemSeconds` | Second-precision (no rounding needed, just drops nanoseconds) | +| `roundedToUTCTime` | Convert back to `UTCTime` | `RoundedSystemTime` derives `FromField`/`ToField` for SQLite storage and `FromJSON`/`ToJSON` for API serialization. @@ -281,62 +267,62 @@ Selected utilities used across the codebase: **Monadic combinators**: -| Function | Signature | Purpose | Source | -|----------|-----------|---------|--------| -| `<$?>` | `MonadFail m => (a -> Either String b) -> m a -> m b` | Lift fallible function into parser | `Util.hs:119-121` | -| `$>>=` | `(Monad m, Monad f, Traversable f) => m (f a) -> (a -> m (f b)) -> m (f b)` | Monadic bind through nested monad | `Util.hs:165-167` | -| `ifM` / `whenM` / `unlessM` | Monadic conditionals | `Util.hs:147-157` | -| `anyM` | Short-circuit `any` for monadic predicates (strict) | `Util.hs:159-161` | +| Function | Signature | Purpose | +|----------|-----------|---------| +| `<$?>` | `MonadFail m => (a -> Either String b) -> m a -> m b` | Lift fallible function into parser | +| `$>>=` | `(Monad m, Monad f, Traversable f) => m (f a) -> (a -> m (f b)) -> m (f b)` | Monadic bind through nested monad | +| `ifM` / `whenM` / `unlessM` | Monadic conditionals | | +| `anyM` | Short-circuit `any` for monadic predicates (strict) | | **Error handling**: -| Function | Purpose | Source | -|----------|---------|--------| -| `tryAllErrors` | 
Catch all exceptions (including async) into `ExceptT` | `Util.hs:273-275` | -| `catchAllErrors` | Same with handler | `Util.hs:281-283` | -| `tryAllOwnErrors` | Catch only "own" exceptions (re-throws async cancellation) | `Util.hs:322-324` | -| `catchAllOwnErrors` | Same with handler | `Util.hs:330-332` | -| `isOwnException` | `StackOverflow`, `HeapOverflow`, `AllocationLimitExceeded` | `Util.hs:297-304` | -| `isAsyncCancellation` | Any `SomeAsyncException` except own exceptions | `Util.hs:306-310` | -| `catchThrow` | Catch exceptions, wrap in Left | `Util.hs:289-291` | -| `allFinally` | `tryAllErrors` + `final` + `except` (like `finally` for ExceptT) | `Util.hs:293-295` | +| Function | Purpose | +|----------|---------| +| `tryAllErrors` | Catch all exceptions (including async) into `ExceptT` | +| `catchAllErrors` | Same with handler | +| `tryAllOwnErrors` | Catch only "own" exceptions (re-throws async cancellation) | +| `catchAllOwnErrors` | Same with handler | +| `isOwnException` | `StackOverflow`, `HeapOverflow`, `AllocationLimitExceeded` | +| `isAsyncCancellation` | Any `SomeAsyncException` except own exceptions | +| `catchThrow` | Catch exceptions, wrap in Left | +| `allFinally` | `tryAllErrors` + `final` + `except` (like `finally` for ExceptT) | The own-vs-async distinction is critical: `catchOwn`/`tryAllOwnErrors` never swallow async cancellation (`ThreadKilled`, `UserInterrupt`, etc.), only synchronous exceptions and resource exhaustion (`StackOverflow`, `HeapOverflow`, `AllocationLimitExceeded`). 
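
The classification above can be approximated with `Control.Exception` from `base`. This is a sketch reconstructed from the table descriptions, not a copy of `Util.hs`. The names `isOwn` and `isCancellation` are hypothetical, and the real `isOwnException` / `isAsyncCancellation` may differ in detail.

```haskell
import Control.Exception

-- Resource-exhaustion exceptions that belong to the current thread,
-- even though GHC delivers them as asynchronous exceptions.
isOwn :: SomeException -> Bool
isOwn e = case fromException e of
  Just StackOverflow -> True
  Just HeapOverflow -> True
  _ -> case fromException e of
    Just AllocationLimitExceeded -> True
    _ -> False

-- Any asynchronous exception that is not one of the "own" ones,
-- for example ThreadKilled or UserInterrupt.
isCancellation :: SomeException -> Bool
isCancellation e = case (fromException e :: Maybe SomeAsyncException) of
  Just _ -> not (isOwn e)
  Nothing -> False
```

A handler built on this split would catch an exception only when `isCancellation` is `False` and re-throw it otherwise, so that cancelling a thread cannot be swallowed by accident.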
**STM**: -| Function | Purpose | Source | -|----------|---------|--------| -| `tryWriteTBQueue` | Non-blocking bounded queue write, returns success | `Util.hs:256-261` | +| Function | Purpose | +|----------|---------| +| `tryWriteTBQueue` | Non-blocking bounded queue write, returns success | **Database result helpers**: -| Function | Purpose | Source | -|----------|---------|--------| -| `firstRow` | Extract first row with transform, or Left error | `Util.hs:346-347` | -| `maybeFirstRow` | Extract first row as Maybe | `Util.hs:349-350` | -| `firstRow'` | Like `firstRow` but transform can also fail | `Util.hs:355-356` | +| Function | Purpose | +|----------|---------| +| `firstRow` | Extract first row with transform, or Left error | +| `maybeFirstRow` | Extract first row as Maybe | +| `firstRow'` | Like `firstRow` but transform can also fail | **Collection utilities**: -| Function | Purpose | Source | -|----------|---------|--------| -| `groupOn` | `groupBy` using equality on projected key | `Util.hs:358-359` | -| `groupAllOn` | `groupOn` after `sortOn` (groups non-adjacent elements) | `Util.hs:372-373` | -| `toChunks` | Split list into `NonEmpty` chunks of size n | `Util.hs:376-380` | -| `packZipWith` | Optimized ByteString zipWith (direct memory access) | `Util.hs:236-254` | +| Function | Purpose | +|----------|---------| +| `groupOn` | `groupBy` using equality on projected key | +| `groupAllOn` | `groupOn` after `sortOn` (groups non-adjacent elements) | +| `toChunks` | Split list into `NonEmpty` chunks of size n | +| `packZipWith` | Optimized ByteString zipWith (direct memory access) | **Miscellaneous**: -| Function | Purpose | Source | -|----------|---------|--------| -| `safeDecodeUtf8` | Decode UTF-8 replacing errors with `'?'` | `Util.hs:382-386` | -| `bshow` / `tshow` | `show` to `ByteString` / `Text` | `Util.hs:123-129` | -| `threadDelay'` | `Int64` delay (handles overflow by looping) | `Util.hs:391-399` | -| `diffToMicroseconds` / `diffToMilliseconds` | 
`NominalDiffTime` conversion | `Util.hs:401-407` | -| `labelMyThread` | Label current thread for debugging | `Util.hs:409-410` | -| `encodeJSON` / `decodeJSON` | `ToJSON a => a -> Text` / `FromJSON a => Text -> Maybe a` | `Util.hs:415-421` | -| `traverseWithKey_` | `Map` traversal discarding results | `Util.hs:423-425` | +| Function | Purpose | +|----------|---------| +| `safeDecodeUtf8` | Decode UTF-8 replacing errors with `'?'` | +| `bshow` / `tshow` | `show` to `ByteString` / `Text` | +| `threadDelay'` | `Int64` delay (handles overflow by looping) | +| `diffToMicroseconds` / `diffToMilliseconds` | `NominalDiffTime` conversion | +| `labelMyThread` | Label current thread for debugging | +| `encodeJSON` / `decodeJSON` | `ToJSON a => a -> Text` / `FromJSON a => Text -> Maybe a` | +| `traverseWithKey_` | `Map` traversal discarding results | ## Security notes diff --git a/spec/modules/README.md b/spec/modules/README.md new file mode 100644 index 000000000..1d18b32e3 --- /dev/null +++ b/spec/modules/README.md @@ -0,0 +1,155 @@ +# How to Document a Module + +> Read this before writing any module doc. It defines what goes in, what stays out, and why. + +## Purpose + +Module docs exist for one reason: to capture knowledge that **cannot be obtained by reading the source code**. If reading the `.hs` file tells you everything you need to know, the module doc should be brief or empty. + +These docs are an investment — their value compounds over time as multiple people (and LLMs) work on the code. Optimize for long-term value, not for looking thorough today. + +## Process + +**Read every line of the source file.** The non-obvious filter applies to what you *write*, not to what you *read*. Without reading each line, you will produce documentation from inferences rather than facts. Many non-obvious behaviors only become visible when you see a specific line of code and recognize that its implications would surprise a reader who doesn't have the surrounding context. 
+ +## File structure + +Module docs mirror `src/Simplex/` exactly. Same subfolder structure, `.hs` replaced with `.md`: + +``` +src/Simplex/Messaging/Server.hs → spec/modules/Simplex/Messaging/Server.md +src/Simplex/Messaging/Crypto.hs → spec/modules/Simplex/Messaging/Crypto.md +src/Simplex/FileTransfer/Agent.hs → spec/modules/Simplex/FileTransfer/Agent.md +``` + +## What to include + +### 1. Non-obvious behavior +Things that would surprise a competent Haskell developer reading the code for the first time: +- Subtle invariants maintained across function calls +- Ordering dependencies ("must call X before Y because...") +- Concurrency assumptions ("this TVar is only written from thread Z") +- Implicit contracts between caller and callee not captured by types + +### 2. Usage considerations +- When to use function X vs function Y +- Common mistakes callers make +- Caller obligations not enforced by the type system +- Performance characteristics that affect usage decisions + +### 3. Cross-module relationships +- Dependencies on other modules' behavior not visible from import lists +- Assumptions about how other modules use this one +- Coordination patterns (e.g., "Server.hs reads this TVar, Agent.hs writes it") + +### 4. Security notes +- Trust boundaries this module enforces or relies on +- What happens if inputs are malicious +- Which functions are security-critical and why (reference SI-XX invariants) + +### 5. 
Design rationale +- Why the code is structured this way (when not obvious) +- Alternatives considered and rejected +- Known limitations and their justification + +## What NOT to include + +- **Type signatures** — the code has them +- **Code snippets** — if you're pasting code, you're making a stale copy +- **Function-by-function prose that restates the implementation** — "this function takes X and returns Y by doing Z" adds nothing +- **Line numbers** — they're brittle and break on every edit +- **Comments that fit in one line in source** — put those in the source file instead as `-- spec:` comments + +## Format + +Each module doc has a header, then entries for functions/types that need documentation. + +```markdown +# Module.Name + +> One-line description of what this module does. + +**Source**: [`Path/To/Module.hs`](relative link to source) + +## Overview + +[Only if the module's purpose or architecture is non-obvious. +Skip for simple modules.] + +## functionName + +**Purpose**: [What this does that isn't obvious from the name and type] +**Calls**: [Qualified.Name.a](link), [Qualified.Name.b](link) +**Called by**: [Qualified.Name.c](link) +**Invariant**: SI-XX +**Security**: [What this function ensures for the threat model] + +[Free-form notes about non-obvious behavior, gotchas, etc.] + +## anotherFunction + +... +``` + +**For trivial modules** (< 100 LOC, no non-obvious behavior): + +```markdown +# Module.Name + +> One-line description. + +**Source**: [`Path/To/Module.hs`](relative link to source) + +No non-obvious behavior. See source. +``` + +This is valuable — it confirms someone looked and found nothing to document. 
+ +## Linking conventions + +### Module doc → other module docs +Use fully qualified names as link text: +```markdown +[Simplex.Messaging.Server.subscribeServiceMessages](./Simplex/Messaging/Server.md#subscribeServiceMessages) +``` + +### Module doc → topic docs +```markdown +See [rcv-services](../rcv-services.md) for the end-to-end service subscription flow. +``` + +### Source → module doc +Comment above function in source: +```haskell +-- spec: spec/modules/Simplex/Messaging/Server.md#subscribeServiceMessages +-- Delivers buffered messages for all service queues after SUBS (SI-SVC-07) +subscribeServiceMessages :: ... +``` + +Only add `-- spec:` comments where the module doc actually has something to say. Don't add links to "No non-obvious behavior" docs. + +## Topic candidate tracking + +While documenting modules, you will notice cross-cutting patterns — behaviors that span multiple modules and can't be understood from any single one. Note these in `spec/TOPICS.md` for later. Don't write the topic doc during module work; just record: + +```markdown +- **Queue rotation**: Agent.hs initiates, Client.hs sends commands, Server.hs processes, + Protocol.hs defines types. End-to-end flow not obvious from any single module. +``` + +## Quality bar + +Before finishing a module doc, ask: +1. Does every entry document something NOT in the source code? +2. Would removing any entry lose information? If not, remove it. +3. Are cross-module relationships captured that imports alone don't reveal? +4. Are security-critical functions flagged with invariant IDs? +5. Is this doc short enough that someone will actually read it? + +If any answer reveals a problem, fix it and repeat from question 1. Only finish when a full pass produces no changes. + +## Exclusions + +- **Individual migration files** (M20XXXXXX_*.hs): Self-describing SQL. No per-migration docs. +- **Auto-generated files** (GitCommit.hs): Skip. 
+- **Pure boilerplate** (Prometheus.hs metrics, Web/Embedded.hs static files): Document only if non-obvious. diff --git a/spec/modules/Simplex/Messaging/Compression.md b/spec/modules/Simplex/Messaging/Compression.md new file mode 100644 index 000000000..67c7317da --- /dev/null +++ b/spec/modules/Simplex/Messaging/Compression.md @@ -0,0 +1,17 @@ +# Simplex.Messaging.Compression + +> Zstd compression with passthrough for short messages. + +**Source**: [`Compression.hs`](../../../../src/Simplex/Messaging/Compression.hs) + +## compress1 + +Messages <= 180 bytes are wrapped as `Passthrough` (no compression). The threshold is empirically derived from real client data — messages above 180 bytes rapidly gain compression ratio. + +## decompress1 + +**Security**: decompression bomb protection. Requires `decompressedSize` to be present in the zstd frame header AND within the caller-specified `limit`. If the compressed data doesn't declare its decompressed size (non-standard zstd frames), decompression is refused entirely. This prevents memory exhaustion from malicious compressed payloads. + +## Wire format + +Tag byte `'0'` (0x30) = passthrough (1-byte length prefix, raw data). Tag byte `'1'` (0x31) = compressed (2-byte `Large` length prefix, zstd data). The passthrough path uses the standard `ByteString` encoding (255-byte limit); the compressed path uses `Large` (65535-byte limit). diff --git a/spec/modules/Simplex/Messaging/Encoding.md b/spec/modules/Simplex/Messaging/Encoding.md new file mode 100644 index 000000000..f485aeaa4 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Encoding.md @@ -0,0 +1,41 @@ +# Simplex.Messaging.Encoding + +> Binary wire-format encoding for SMP protocol transmission. + +**Source**: [`Encoding.hs`](../../../../src/Simplex/Messaging/Encoding.hs) + +## Overview + +`Encoding` is the binary wire format — fixed-size or length-prefixed, no delimiters between fields. 
Contrast with [Simplex.Messaging.Encoding.String](./Encoding/String.md) which is the human-readable, space-delimited, base64url format used in URIs and logs. + +The two encoding classes share some instances (`Char`, `Bool`, `SystemTime`) but differ fundamentally: `Encoding` is self-delimiting via length prefixes, `StrEncoding` is delimiter-based (spaces, commas). + +## ByteString instance + +**Length prefix is 1 byte.** Maximum encodable length is 255 bytes. If a ByteString exceeds 255 bytes, the length silently wraps via `w2c . fromIntegral` — a 300-byte string encodes length as 44 (300 mod 256). Callers must ensure ByteStrings fit in 255 bytes, or use `Large` for longer values. + +**Security**: silent truncation means a caller encoding untrusted input without length validation could produce a malformed message where the decoder reads fewer bytes than were intended, then misparses the remainder as the next field. + +## Large + +2-byte length prefix (`Word16`). Use for ByteStrings that may exceed 255 bytes. Maximum 65535 bytes. + +## Maybe instance + +Tags are ASCII characters `'0'` (0x30) and `'1'` (0x31), not bytes 0x00/0x01. `Nothing` encodes as the single byte 0x30; `Just x` encodes as 0x31 followed by `smpEncode x`. + +## Tail + +Consumes all remaining input. Must be the last field in any composite encoding — placing it elsewhere silently eats subsequent fields. + +## Tuple instances + +Sequential concatenation with no separators. Works because each element's encoding is self-delimiting (length-prefixed ByteString, fixed-size Word16/Word32/Int64/Char, etc.). If an element type isn't self-delimiting, the tuple won't round-trip. + +## SystemTime + +Only seconds are encoded (as Int64); nanoseconds are discarded on encode and set to 0 on decode. + +## smpEncodeList / smpListP + +1-byte length prefix for lists — same 255-item limit as ByteString's 255-byte limit. 
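Illustration for the `ByteString` instance note in `Encoding.md` above: a minimal sketch of how a 1-byte length prefix wraps for inputs over 255 bytes, and how a length check at the call site avoids it. This is not the simplexmq implementation; `encodeShort`, `decodeShort`, and `encodeShortChecked` are hypothetical names introduced only for this example.

```haskell
import qualified Data.ByteString as B
import Data.Word (Word8)

-- Hypothetical 1-byte length-prefixed encoder (illustration only).
-- fromIntegral :: Int -> Word8 wraps modulo 256, so lengths above 255 are silently reduced.
encodeShort :: B.ByteString -> B.ByteString
encodeShort bs = B.cons (fromIntegral (B.length bs) :: Word8) bs

-- Reads the length byte, then splits off that many bytes; returns (field, remainder).
decodeShort :: B.ByteString -> Maybe (B.ByteString, B.ByteString)
decodeShort s = case B.uncons s of
  Just (len, rest)
    | B.length rest >= fromIntegral len -> Just (B.splitAt (fromIntegral len) rest)
  _ -> Nothing

-- Checked variant: refuses input that does not fit the 1-byte prefix.
encodeShortChecked :: B.ByteString -> Maybe B.ByteString
encodeShortChecked bs
  | B.length bs > 255 = Nothing
  | otherwise = Just (encodeShort bs)

main :: IO ()
main = do
  let big = B.replicate 300 0x41
  -- 300 mod 256 = 44: the decoder sees a 44-byte field and leaves 256 bytes
  -- that would be misparsed as subsequent fields.
  print (fmap (\(f, r) -> (B.length f, B.length r)) (decodeShort (encodeShort big)))
  -- The checked encoder rejects the oversized input instead.
  print (encodeShortChecked big)
```

In this sketch the caller decides whether to reject oversized input or switch to a 2-byte prefix, which mirrors the guidance to use `Large` for longer values.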
diff --git a/spec/modules/Simplex/Messaging/Encoding/String.md b/spec/modules/Simplex/Messaging/Encoding/String.md new file mode 100644 index 000000000..60ac9e496 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Encoding/String.md @@ -0,0 +1,40 @@ +# Simplex.Messaging.Encoding.String + +> Human-readable, URI-friendly string encoding for SMP and agent protocols. + +**Source**: [`Encoding/String.hs`](../../../../../src/Simplex/Messaging/Encoding/String.hs) + +## Overview + +`StrEncoding` is the human-readable counterpart to [Simplex.Messaging.Encoding](../Encoding.md)'s binary `Encoding`. Key differences: + +| Aspect | `Encoding` (binary) | `StrEncoding` (string) | +|--------|---------------------|------------------------| +| ByteString | 1-byte length prefix, raw bytes | base64url encoded | +| Tuple separator | none (self-delimiting) | space-delimited | +| List separator | 1-byte count prefix | comma-separated | +| Default parser fallback | `smpP` via `parseAll` | `strP` via `base64urlP` | + +## ByteString instance + +Encodes as base64url. The parser (`strP`) only accepts non-empty strings — empty base64url input fails. + +## String instance + +Inherits from ByteString via `B.pack` / `B.unpack`. Only Char8 (Latin-1) characters round-trip; `B.pack` truncates unicode codepoints above 255. The source comment warns about this. + +## strToJSON / strParseJSON + +`strToJSON` uses `decodeLatin1`, not `decodeUtf8'`. This preserves arbitrary byte sequences (e.g., base64url-encoded binary data) as JSON strings without UTF-8 validation errors, but means the JSON representation is Latin-1, not UTF-8. + +## Default strP fallback + +If only `strDecode` is defined (no custom `strP`), the default parser runs `base64urlP` first, then passes the decoded bytes to `strDecode`. This means the type's own `strDecode` receives raw bytes, not the base64url text. Easy to confuse when implementing a new instance. + +## listItem + +Items are delimited by `,`, ` `, or `\n`. 
List items cannot contain these characters in their `strEncode` output. No escaping mechanism exists. + +## Str newtype + +Plain text (no base64). Delimited by spaces. `strP` consumes the trailing space — this is unusual and means `Str` parsing has a side effect on the input position that other `StrEncoding` parsers don't. diff --git a/spec/modules/Simplex/Messaging/Parsers.md b/spec/modules/Simplex/Messaging/Parsers.md new file mode 100644 index 000000000..d6b054378 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Parsers.md @@ -0,0 +1,21 @@ +# Simplex.Messaging.Parsers + +> Attoparsec helpers and Aeson JSON encoding options. + +**Source**: [`Parsers.hs`](../../../../src/Simplex/Messaging/Parsers.hs) + +## sumTypeJSON (platform-dependent JSON encoding) + +On Darwin with the `swiftJSON` CPP flag, `sumTypeJSON` uses `ObjectWithSingleField` encoding with tag `"_owsf"`. On all other platforms, it uses `TaggedObject` encoding with `"type"` / `"data"` keys. + +This means the same Haskell type produces **different JSON** on macOS/iOS vs Linux. Cross-platform JSON interchange must use `taggedObjectJSON` or `singleFieldJSON` directly, not `sumTypeJSON`. + +The `_owsf` tag enables Swift clients to convert between the two encodings — it's a marker that the value was encoded as ObjectWithSingleField rather than TaggedObject. + +## parseE vs parseE' + +`parseE` requires full input consumption (`endOfInput`). `parseE'` does not — it succeeds if the parser matches a prefix. Using `parseE'` where `parseE` is needed silently ignores trailing input. + +## base64P + +Parses standard base64 (`+` and `/`), not base64url (`-` and `_`). Contrast with `base64urlP` in [Simplex.Messaging.Encoding.String](./Encoding/String.md) which parses URL-safe base64. 
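Illustration for the `parseE` vs `parseE'` note in `Parsers.md` above: a minimal sketch of the difference between requiring full input consumption and accepting a matching prefix, written directly against attoparsec. `parseFull` and `parsePrefix` are hypothetical stand-ins, not the functions from `Simplex.Messaging.Parsers`, and the example assumes the `attoparsec` and `bytestring` packages are available.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Attoparsec.ByteString.Char8 (Parser, decimal, endOfInput, parseOnly)
import Data.ByteString (ByteString)

-- Requires the parser to consume the whole input (the behaviour described for parseE).
parseFull :: Parser a -> ByteString -> Either String a
parseFull p = parseOnly (p <* endOfInput)

-- Succeeds as soon as the parser matches a prefix (the behaviour described for parseE').
parsePrefix :: Parser a -> ByteString -> Either String a
parsePrefix = parseOnly

main :: IO ()
main = do
  let p = decimal :: Parser Int
  print (parsePrefix p "42abc") -- accepts the "42" prefix and ignores "abc"
  print (parseFull p "42abc")   -- fails, because input remains after the number
  print (parseFull p "42")      -- succeeds, the whole input is consumed
```

The prefix variant is the one that can hide malformed trailing data, so it is worth checking which of the two a call site actually needs.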
diff --git a/spec/modules/Simplex/Messaging/ServiceScheme.md b/spec/modules/Simplex/Messaging/ServiceScheme.md new file mode 100644 index 000000000..409e8854d --- /dev/null +++ b/spec/modules/Simplex/Messaging/ServiceScheme.md @@ -0,0 +1,7 @@ +# Simplex.Messaging.ServiceScheme + +> URI scheme for SimpleX service addresses. + +**Source**: [`ServiceScheme.hs`](../../../../src/Simplex/Messaging/ServiceScheme.hs) + +No non-obvious behavior. See source. diff --git a/spec/modules/Simplex/Messaging/Session.md b/spec/modules/Simplex/Messaging/Session.md new file mode 100644 index 000000000..22c5c90ca --- /dev/null +++ b/spec/modules/Simplex/Messaging/Session.md @@ -0,0 +1,15 @@ +# Simplex.Messaging.Session + +> Atomic get-or-create session variables with identity-safe removal. + +**Source**: [`Session.hs`](../../../../src/Simplex/Messaging/Session.hs) + +## getSessVar + +Returns `Left newVar` if the key was absent (variable created), `Right existingVar` if already present. The new variable gets an atomically incremented `sessionVarId` from the shared counter, and its `sessionVar` TMVar starts empty. + +The caller uses the `Left`/`Right` distinction to decide whether to populate the TMVar (new session) or wait on the existing one. + +## removeSessVar + +Only removes if the stored variable's `sessionVarId` matches the one being removed. This is a compare-and-swap pattern: between the time a caller obtained a `SessionVar` and the time it tries to remove it, another thread may have replaced it with a new session (via `getSessVar`). Without the ID check, the stale caller would remove the new session. diff --git a/spec/modules/Simplex/Messaging/SystemTime.md b/spec/modules/Simplex/Messaging/SystemTime.md new file mode 100644 index 000000000..92bf8e546 --- /dev/null +++ b/spec/modules/Simplex/Messaging/SystemTime.md @@ -0,0 +1,13 @@ +# Simplex.Messaging.SystemTime + +> Type-level precision timestamps for date bucketing and expiration. 
+ +**Source**: [`SystemTime.hs`](../../../../src/Simplex/Messaging/SystemTime.hs) + +## getRoundedSystemTime + +Rounds **down** (truncation): `(seconds / precision) * precision`. A timestamp at 23:59:59 with `SystemDate` (precision 86400) rounds to the start of the current day, not the nearest day. + +## roundedToUTCTime + +Sets nanoseconds to 0. Any `RoundedSystemTime` converted to `UTCTime` and back to `SystemTime` will differ from the original `getSystemTime` value. diff --git a/spec/modules/Simplex/Messaging/TMap.md b/spec/modules/Simplex/Messaging/TMap.md new file mode 100644 index 000000000..f994adab1 --- /dev/null +++ b/spec/modules/Simplex/Messaging/TMap.md @@ -0,0 +1,17 @@ +# Simplex.Messaging.TMap + +> STM-safe concurrent map (`TVar (Map k a)`). + +**Source**: [`TMap.hs`](../../../../src/Simplex/Messaging/TMap.hs) + +## lookupInsert / lookupDelete + +Atomic swap operations using `stateTVar` + `alterF`. `lookupInsert` returns the previous value (if any) while inserting the new one; `lookupDelete` returns the value while removing it. Both are single STM operations — no window between lookup and modification. + +## union + +Left-biased: the passed-in `Map` wins on key conflicts. `union additions tmap` overwrites existing keys in `tmap` with values from `additions`. + +## alterF + +The STM action `f` runs inside the same STM transaction. If `f` retries, the entire `alterF` retries. If `f` has side effects via other TVars, they compose atomically with the map modification. diff --git a/spec/modules/Simplex/Messaging/Util.md b/spec/modules/Simplex/Messaging/Util.md new file mode 100644 index 000000000..3b9fd3777 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Util.md @@ -0,0 +1,52 @@ +# Simplex.Messaging.Util + +> Shared utility functions: exception handling, monadic combinators, data helpers. + +**Source**: [`Util.hs`](../../../../src/Simplex/Messaging/Util.hs) + +## Overview + +Most of this module is straightforward. 
The exception handling scheme is the part that warrants documentation — the naming is misleading and the semantics are subtle. + +## Exception handling scheme + +Three categories of exceptions, two catch strategies: + +| Category | Examples | `catchAll` | `catchOwn` | +|----------|----------|------------|------------| +| Synchronous | IOError, protocol errors | caught | caught | +| "Own" async | StackOverflow, HeapOverflow, AllocationLimitExceeded | caught | caught | +| Async cancellation | ThreadKilled, all other SomeAsyncException | caught | **re-thrown** | + +### isOwnException + +Classifies `StackOverflow`, `HeapOverflow`, and `AllocationLimitExceeded` as "own" — exceptions caused by this thread's resource usage, not by external cancellation. Despite being wrapped in `SomeAsyncException` (`StackOverflow` and `HeapOverflow` are `AsyncException` constructors; `AllocationLimitExceeded` is a separate type), these should be caught like synchronous exceptions because they reflect the thread's own failure. + +### isAsyncCancellation + +True for any `SomeAsyncException` that is NOT an own exception. These represent external cancellation (e.g., `cancel`, `killThread`) and must be re-thrown to preserve structured concurrency guarantees. + +### catchOwn / catchOwn' + +Despite the name, these catch **all exceptions except async cancellations** — including synchronous exceptions. The name suggests "catch only own exceptions" but the actual semantics are "catch non-cancellation exceptions." This is the standard pattern for exception-safe cleanup in concurrent Haskell. + +### tryAllErrors vs tryAllOwnErrors + +- `tryAllErrors` / `catchAllErrors`: catch everything including async cancellations. Use when you need to convert any failure into an error value (e.g., returning error responses on a connection). +- `tryAllOwnErrors` / `catchAllOwnErrors`: catch everything except async cancellations. Use in normal business logic where cancellation should propagate. + +### AnyError typeclass + +Bridges `SomeException` into application error types via `fromSomeException`. 
All the `tryAll*` / `catchAll*` functions require this constraint. + +## raceAny_ + +Runs all actions concurrently, waits for any one to complete, then cancels all others. Uses nested `withAsync` — earlier-launched actions are canceled last (LIFO unwinding). + +## threadDelay' + +Handles `Int64` delays exceeding `maxBound :: Int` (~2147 seconds on 32-bit) by looping in chunks. Necessary because `threadDelay` takes `Int`, not `Int64`. + +## toChunks + +Precondition: `n > 0` (comment-only, not enforced). Passing `n = 0` causes infinite loop. diff --git a/spec/modules/Simplex/Messaging/Version.md b/spec/modules/Simplex/Messaging/Version.md new file mode 100644 index 000000000..67bbf1b4f --- /dev/null +++ b/spec/modules/Simplex/Messaging/Version.md @@ -0,0 +1,27 @@ +# Simplex.Messaging.Version + +> Version negotiation with proof-carrying compatibility checks. + +**Source**: [`Version.hs`](../../../../src/Simplex/Messaging/Version.hs) + +## Overview + +The module's central design: `Compatible` and `VRange` constructors are not exported. The only way to obtain a `Compatible` value is through the negotiation functions, and the only way to construct a `VersionRange` is through `mkVersionRange` (which validates) or parsing. This makes "compatibility was checked" a compile-time guarantee — code that holds a `Compatible a` has proof that negotiation succeeded. + +See [Simplex.Messaging.Version.Internal](./Version/Internal.md) for why the `Version` constructor is separated. + +## mkVersionRange + +Uses `error` if `min > max`. Safe only for compile-time constants. Runtime construction must use `safeVersionRange`, which returns `Nothing` on invalid input. + +## compatibleVersion vs compatibleVRange + +`compatibleVersion` selects a single version: `min(max1, max2)` — the highest mutually-supported version. `compatibleVRange` returns the full intersection range: `(max(min1,min2), min(max1,max2))`. 
The intersection is used when both sides need to remember the agreed range for future version-gated behavior, not just the single negotiated version. + +## compatibleVRange' + +Different from `compatibleVRange`: caps the range's *maximum* at a given version, rather than intersecting two ranges. Returns `Nothing` if the cap is below the range's minimum. Used when a peer reports a specific version and you need to constrain your range accordingly. + +## VersionI / VersionRangeI typeclasses + +Allow extension types that wrap `Version` or `VersionRange` (e.g., types carrying additional handshake parameters alongside the version) to participate in negotiation without unwrapping. The associated types (`VersionT`, `VersionRangeT`) map between the version and range forms of the extension type. diff --git a/spec/modules/Simplex/Messaging/Version/Internal.md b/spec/modules/Simplex/Messaging/Version/Internal.md new file mode 100644 index 000000000..9fe8cffe9 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Version/Internal.md @@ -0,0 +1,7 @@ +# Simplex.Messaging.Version.Internal + +> Exports the `Version` constructor for internal use. + +**Source**: [`Version/Internal.hs`](../../../../../src/Simplex/Messaging/Version/Internal.hs) + +This module exists solely to split the `Version` constructor export. `Version.hs` exports `Version` as an opaque type (no constructor); `Version/Internal.hs` exports the `Version` constructor for modules that need to fabricate version values (protocol constants, parsers, tests). Application code should not import this module. 
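Illustration for the negotiation rules in `Version.md` above: a minimal sketch of the three calculations it describes (single negotiated version, range intersection, and capping a range at a peer's version). It uses a plain record instead of the real opaque types and omits the `Compatible` proof wrapper; `Range`, `safeRange`, `negotiate`, `intersect`, and `capAt` are hypothetical names, not the `Simplex.Messaging.Version` API.

```haskell
import Data.Word (Word16)

-- Hypothetical stand-in for a version range; the real type hides its constructor.
data Range = Range {lo :: Word16, hi :: Word16}
  deriving (Eq, Show)

-- Validating constructor: returns Nothing when min > max.
safeRange :: Word16 -> Word16 -> Maybe Range
safeRange a b
  | a <= b = Just (Range a b)
  | otherwise = Nothing

-- Two ranges overlap when min1 <= max2 and min2 <= max1.
overlaps :: Range -> Range -> Bool
overlaps a b = lo a <= hi b && lo b <= hi a

-- Single negotiated version: min(max1, max2), the highest mutually supported one.
negotiate :: Range -> Range -> Maybe Word16
negotiate a b
  | overlaps a b = Just (min (hi a) (hi b))
  | otherwise = Nothing

-- Full intersection: (max(min1, min2), min(max1, max2)).
intersect :: Range -> Range -> Maybe Range
intersect a b
  | overlaps a b = Just (Range (max (lo a) (lo b)) (min (hi a) (hi b)))
  | otherwise = Nothing

-- Cap the range's maximum at a given version; Nothing if the cap is below the minimum.
capAt :: Range -> Word16 -> Maybe Range
capAt r v
  | v >= lo r = Just (r {hi = min (hi r) v})
  | otherwise = Nothing

main :: IO ()
main = do
  let a = Range 1 7
      b = Range 4 9
  print (safeRange 5 3)           -- Nothing
  print (negotiate a b)           -- Just 7
  print (intersect a b)           -- Just (Range {lo = 4, hi = 7})
  print (negotiate a (Range 8 9)) -- Nothing
  print (capAt b 6)               -- Just (Range {lo = 4, hi = 6})
  print (capAt b 3)               -- Nothing
```

The real module additionally wraps successful results in `Compatible`, so that holding a result is itself evidence that the check ran; the sketch leaves that out to keep the arithmetic visible.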
diff --git a/spec/rcv-services.md b/spec/rcv-services.md index b0d97d9f7..6518059f2 100644 --- a/spec/rcv-services.md +++ b/spec/rcv-services.md @@ -33,10 +33,10 @@ Service client SMP Server ## Version gates -| Constant | Value | Gate | Source | -|----------|-------|------|--------| -| `serviceCertsSMPVersion` | 16 | Service handshake, `SOK`, `useServiceAuth` | Transport.hs:214 | -| `rcvServiceSMPVersion` | 19 | `SUBS`/`NSUBS` parameters, `SOKS`/`ENDS` idsHash, messaging service role in handshake | Transport.hs:223 | +| Constant | Value | Gate | +|----------|-------|------| +| `serviceCertsSMPVersion` | 16 | Service handshake, `SOK`, `useServiceAuth` | +| `rcvServiceSMPVersion` | 19 | `SUBS`/`NSUBS` parameters, `SOKS`/`ENDS` idsHash, messaging service role in handshake | The two-version split means: - v16-18 servers accept service certificates and per-queue `SUB` with service auth, but `SUBS`/`NSUBS` send no count/hash parameters (bare command tag only). @@ -55,14 +55,12 @@ The two-version split means: data SMPServiceRole = SRMessaging | SRNotifier | SRProxy -- Wire: "M" | "N" | "P" ``` -Source: Transport.hs:594 ### Party (service-related constructors) ```haskell data Party = ... | RecipientService | NotifierService | ... ``` -Source: Protocol.hs:335-346 The `ServiceParty` type family constrains to `RecipientService | NotifierService` only: ```haskell @@ -71,7 +69,6 @@ type family ServiceParty (p :: Party) :: Constraint where ServiceParty NotifierService = () ServiceParty p = (Int ~ Bool, TypeError ...) -- compile-time error ``` -Source: Protocol.hs:430-434 ### IdsHash @@ -89,7 +86,6 @@ instance Monoid IdsHash where queueIdHash :: QueueId -> IdsHash queueIdHash = IdsHash . C.md5Hash . 
unEntityId ``` -Source: Protocol.hs:1501-1526 **Key property**: XOR is self-inverse, so `addServiceSubs` and `subtractServiceSubs` both use `<>` (XOR) for the hash component: ```haskell @@ -98,7 +94,6 @@ subtractServiceSubs (n', idsHash') (n, idsHash) | n > n' = (n - n', idsHash <> idsHash') | otherwise = (0, mempty) ``` -Source: Protocol.hs:1528-1534 ### ServiceSub / ServiceSubResult / ServiceSubError @@ -116,7 +111,6 @@ data ServiceSubError | SSErrorQueueCount {expectedQueueCount, subscribedQueueCount :: Int64} | SSErrorQueueIdsHash {expectedQueueIdsHash, subscribedQueueIdsHash :: IdsHash} ``` -Source: Protocol.hs:1476-1499 `serviceSubResult` compares expected vs actual, returning the first mismatch (priority: serviceId > count > idsHash). @@ -128,7 +122,6 @@ data STMService = STMService serviceRcvQueues :: TVar (Set RecipientId, IdsHash), serviceNtfQueues :: TVar (Set NotifierId, IdsHash) } ``` -Source: QueueStore/STM.hs:64-68 Tracks the set of queue IDs and their cumulative XOR hash per service, per role (receive vs notify). @@ -142,8 +135,6 @@ Standard SMP handshake is two messages: server sends `SMPServerHandshake`, clien 2. **Client -> Server**: `SMPClientHandshake` with `clientService :: Maybe SMPClientHandshakeService` 3. **Server -> Client**: `SMPServerHandshakeResponse {serviceId}` or `SMPServerHandshakeError {handshakeError}` -Source: Transport.hs:752-791 (server), Transport.hs:796-848 (client) - ### SMPClientHandshakeService ```haskell @@ -151,7 +142,6 @@ data SMPClientHandshakeService = SMPClientHandshakeService { serviceRole :: SMPServiceRole, serviceCertKey :: CertChainPubKey } ``` -Source: Transport.hs:582-585 The `serviceCertKey` contains the TLS client certificate chain and a proof-of-possession: the service's Ed25519 session key signed by the service's X.509 signing key (`C.signX509 serviceSignKey $ C.publicToX509 k`). @@ -164,14 +154,10 @@ The `serviceCertKey` contains the TLS client certificate chain and a proof-of-po 5. 
Call `getService` callback (QueueStore.getCreateService) to get/create ServiceId 6. Send `SMPServerHandshakeResponse {serviceId}` back to client -Source: Transport.hs:775-791 - ### Client-side reception (`getClientService`) Client receives either `SMPServerHandshakeResponse {serviceId}` (success) or `SMPServerHandshakeError {handshakeError}` (failure). On success, stores `THClientService {serviceId, serviceRole, serviceCertHash, serviceKey}`. -Source: Transport.hs:843-847 - ### Version-gated service role filtering (`mkClientService`) ```haskell @@ -179,7 +165,6 @@ mkClientService v (ServiceCredentials {serviceRole, ...}, (k, _)) | serviceRole == SRMessaging && v < rcvServiceSMPVersion = Nothing | otherwise = Just SMPClientHandshakeService {..} ``` -Source: Transport.hs:838-842 Messaging services are suppressed below v19. Notifier services are sent at v16+. @@ -192,7 +177,6 @@ data ServiceCredentials = ServiceCredentials serviceCertHash :: XV.Fingerprint, serviceSignKey :: C.APrivateSignKey } ``` -Source: Transport.hs:587-592 ## Protocol layer: commands and messages @@ -216,7 +200,6 @@ useServiceAuth = \case Cmd _ NSUB -> True _ -> False ``` -Source: Protocol.hs:1737-1742 For these commands, `tEncodeAuth` appends both the primary queue key signature and an optional service Ed25519 signature. `SUBS`/`NSUBS` use the ServiceId as entity and are signed only by the service session key. 
@@ -237,21 +220,18 @@ For these commands, `tEncodeAuth` appends both the primary queue key signature a v >= 19: tag SP count idsHash v < 19: tag (bare, no parameters) ``` -Source: Protocol.hs:1769-1771, 1787-1789 **SOKS/ENDS encoding:** ``` v >= 19: tag SP count idsHash v < 19: tag SP count (no idsHash) ``` -Source: Protocol.hs:1951-1953 **SOKS/ENDS decoding:** ``` v >= 19: tag -> resp <$> _smpP <*> smpP (count + idsHash) v < 19: tag -> resp <$> _smpP <*> pure mempty (count only, mempty hash) ``` -Source: Protocol.hs:1996-1998 ## Server layer @@ -267,7 +247,6 @@ data Client s = Client ntfServiceSubsCount :: TVar (Int64, IdsHash), -- running (count, hash) for notifier queues ... } ``` -Source: Env/STM.hs:437-456 Server-global state: ```haskell @@ -279,7 +258,6 @@ data ServerSubscribers s = ServerSubscribers subClients :: TVar IntSet, pendingEvents :: TVar (IntMap (NonEmpty (EntityId, BrokerMsg))) } ``` -Source: Env/STM.hs:362-369 ### ClientSub events @@ -289,7 +267,6 @@ data ClientSub | CSDeleted QueueId (Maybe ServiceId) -- prev service ID | CSService ServiceId (Int64, IdsHash) -- service subscription change ``` -Source: Env/STM.hs:426-429 These are enqueued into `subQ` and processed by `serverThread` (the subscription event loop). @@ -526,7 +503,6 @@ subscribeService c party n idsHash = case smpClientService c of SNotifierService -> NSUBS n idsHash Nothing -> throwE PCEServiceUnavailable ``` -Source: Client.hs:921-934 Entity is `serviceId`, auth key is the service session key (Ed25519). The client passes its expected count and hash; the server returns its own. @@ -551,7 +527,6 @@ This prevents MITM service substitution inside TLS: an attacker cannot replace t (fp <> t, Just $ C.sign' serviceKey t) _ -> (t, Nothing) ``` -Source: Client.hs:1398-1401 ### Service runtime accessors @@ -562,7 +537,6 @@ smpClientService = thAuth . 
thParams >=> clientService smpClientServiceId :: SMPClient -> Maybe ServiceId smpClientServiceId = fmap (\THClientService {serviceId} -> serviceId) . smpClientService ``` -Source: Client.hs:936-942 ### Configuration @@ -632,8 +606,6 @@ data SessSubs = SessSubs activeServiceSub :: TVar (Maybe ServiceSub), pendingServiceSub :: TVar (Maybe ServiceSub) } ``` -Source: TSessionSubs.hs:59-65 - Key operations: - `setPendingServiceSub`: stores expected ServiceSub before SUBS is sent - `setActiveServiceSub`: promotes to active after SOKS, validates session ID @@ -657,8 +629,6 @@ CREATE TABLE client_services( service_queue_ids_hash BLOB NOT NULL DEFAULT x'00000000000000000000000000000000' ); ``` -Source: Agent/Store/SQLite/Migrations/M20260115_service_certs.hs:11-23 - ### `rcv_queues.rcv_service_assoc` Boolean column added to `rcv_queues`. When set, the queue is associated with the service for this server. SQLite triggers automatically maintain `service_queue_count` and `service_queue_ids_hash` on insert/delete/update of `rcv_queues` rows. @@ -676,8 +646,6 @@ Triggers: `tr_rcv_queue_insert`, `tr_rcv_queue_delete`, `tr_rcv_queue_update_rem | `removeRcvServiceAssocs` | Remove service association for all queues on a server | | `unassocUserServerRcvQueueSubs` | Remove association and return queues for re-subscription | -Source: AgentStore.hs:419-494, 2378-2414 - ### Service ID nullification on cert change `INSERT ... ON CONFLICT DO UPDATE SET ... service_id = NULL` (AgentStore.hs:429) — when service credentials are updated (new cert), the stored `service_id` is cleared, forcing a new handshake to get a fresh ServiceId. 
@@ -709,8 +677,6 @@ On first use per SMP server, `mkDbService` (Env.hs:126-142) generates a self-sig | `CAServiceSubError` | Log error (non-fatal; fatal errors go to `CAServiceUnavailable`) | | `CAServiceUnavailable` | **Critical recovery path**: calls `removeServiceAndAssociations`, wipes service creds, resubscribes all queues individually | -Source: Server.hs:567-602 - ### `removeServiceAndAssociations` (Store/Postgres.hs:620-652) Nuclear recovery: clears `ntf_service_id`, `ntf_service_cert*`, resets `smp_notifier_count`/`smp_notifier_ids_hash`, and removes all `ntf_service_assoc` flags from subscriptions. Used when the service subscription is irrecoverably broken (e.g., ServiceId mismatch after cert rotation). diff --git a/spec/version.md b/spec/version.md index 6d9a23c09..19ad786fe 100644 --- a/spec/version.md +++ b/spec/version.md @@ -14,8 +14,6 @@ The `Compatible` newtype can only be constructed internally (constructor is not ### `Version v` -**Source**: `Version/Internal.hs:11-12` - ```haskell newtype Version v = Version Word16 ``` @@ -31,8 +29,6 @@ The constructor is exported from `Version.Internal` but not from `Version`, so a ### `VersionRange v` -**Source**: `Version.hs:46-50` - ```haskell data VersionRange v = VRange { minVersion :: Version v @@ -42,16 +38,14 @@ data VersionRange v = VRange Invariant: `minVersion <= maxVersion` (enforced by smart constructors). -The `VRange` constructor is not exported — only the pattern synonym `VersionRange` (read-only, `Version.hs:41-44`) is public. +The `VRange` constructor is not exported — only the pattern synonym `VersionRange` (read-only) is public. 
-- `Encoding`: two Word16s concatenated (4 bytes total, `Version.hs:80-84`) -- `StrEncoding`: `"min-max"` or `"v"` if min == max (`Version.hs:86-93`) +- `Encoding`: two Word16s concatenated (4 bytes total) +- `StrEncoding`: `"min-max"` or `"v"` if min == max - JSON: `{"minVersion": n, "maxVersion": n}` ### `VersionScope v` -**Source**: `Version.hs:64` - ```haskell class VersionScope v ``` @@ -67,8 +61,6 @@ This prevents accidentally mixing version ranges from different protocols in neg ### `Compatible a` -**Source**: `Version.hs:117-122` - ```haskell newtype Compatible a = Compatible_ a @@ -80,8 +72,6 @@ Proof that compatibility was checked. The `Compatible_` constructor is not expor ### `VersionI` / `VersionRangeI` type classes -**Source**: `Version.hs:95-115` - Multi-param typeclasses with functional dependencies for generic version/range operations. Allow extension types that wrap `Version` or `VersionRange` to participate in negotiation: ```haskell @@ -103,76 +93,64 @@ Identity instances exist for `Version v` and `VersionRange v` themselves. 
### Construction -| Function | Signature | Purpose | Source | -|----------|-----------|---------|--------| -| `mkVersionRange` | `Version v -> Version v -> VersionRange v` | Construct range, `error` if min > max | `Version.hs:67-70` | -| `safeVersionRange` | `Version v -> Version v -> Maybe (VersionRange v)` | Safe construction, `Nothing` if invalid | `Version.hs:72-75` | -| `versionToRange` | `Version v -> VersionRange v` | Singleton range (min == max) | `Version.hs:77-78` | +| Function | Signature | Purpose | +|----------|-----------|---------| +| `mkVersionRange` | `Version v -> Version v -> VersionRange v` | Construct range, `error` if min > max | +| `safeVersionRange` | `Version v -> Version v -> Maybe (VersionRange v)` | Safe construction, `Nothing` if invalid | +| `versionToRange` | `Version v -> VersionRange v` | Singleton range (min == max) | ### Compatibility checking -#### `isCompatible` +### isCompatible -**Source**: `Version.hs:124-125` +**Purpose**: Check if a single version falls within a range. ```haskell isCompatible :: VersionI v a => a -> VersionRange v -> Bool ``` -Check if a single version falls within a range. - -#### `isCompatibleRange` +### isCompatibleRange -**Source**: `Version.hs:127-130` +**Purpose**: Check if two version ranges overlap: `min1 <= max2 && min2 <= max1`. ```haskell isCompatibleRange :: VersionRangeI v a => a -> VersionRange v -> Bool ``` -Check if two version ranges overlap: `min1 <= max2 && min2 <= max1`. +### proveCompatible -#### `proveCompatible` - -**Source**: `Version.hs:132-133` +**Purpose**: If version is compatible, wrap in `Compatible` proof. Returns `Nothing` if out of range. ```haskell proveCompatible :: VersionI v a => a -> VersionRange v -> Maybe (Compatible a) ``` -If version is compatible, wrap in `Compatible` proof. Returns `Nothing` if out of range. - ### Negotiation -#### `compatibleVersion` +### compatibleVersion -**Source**: `Version.hs:135-140` +**Purpose**: Negotiate a single version from two ranges. 
Returns `min(max1, max2)` — the highest mutually-supported version. Returns `Nothing` if ranges don't overlap. ```haskell compatibleVersion :: VersionRangeI v a => a -> VersionRange v -> Maybe (Compatible (VersionT v a)) ``` -Negotiate a single version from two ranges. Returns `min(max1, max2)` — the highest mutually-supported version. Returns `Nothing` if ranges don't overlap. - -#### `compatibleVRange` +### compatibleVRange -**Source**: `Version.hs:143-148` +**Purpose**: Compute the intersection of two version ranges: `(max(min1,min2), min(max1,max2))`. Returns `Nothing` if the intersection is empty. ```haskell compatibleVRange :: VersionRangeI v a => a -> VersionRange v -> Maybe (Compatible a) ``` -Compute the intersection of two version ranges: `(max(min1,min2), min(max1,max2))`. Returns `Nothing` if the intersection is empty (i.e., ranges don't overlap). +### compatibleVRange' -#### `compatibleVRange'` - -**Source**: `Version.hs:151-156` +**Purpose**: Cap a version range's maximum at a given version. Returns `Nothing` if the cap is below the range's minimum. ```haskell compatibleVRange' :: VersionRangeI v a => a -> Version v -> Maybe (Compatible a) ``` -Cap a version range's maximum at a given version. Returns `Nothing` if the cap is below the range's minimum. - ## Protocol version constants Version constants for each protocol are defined in their respective Transport modules. 
For SMP, key gates include: From e5dbe97e1da8ea49a500cd223968e2298c962276 Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Wed, 11 Mar 2026 09:06:05 +0000 Subject: [PATCH 07/61] spec references in code --- spec/modules/README.md | 19 ++++++++++++++----- src/Simplex/Messaging/Compression.hs | 3 +++ src/Simplex/Messaging/Encoding.hs | 1 + src/Simplex/Messaging/Parsers.hs | 4 +++- src/Simplex/Messaging/Session.hs | 3 +++ src/Simplex/Messaging/Util.hs | 6 ++++++ 6 files changed, 30 insertions(+), 6 deletions(-) diff --git a/spec/modules/README.md b/spec/modules/README.md index 1d18b32e3..9f057b903 100644 --- a/spec/modules/README.md +++ b/spec/modules/README.md @@ -119,14 +119,23 @@ See [rcv-services](../rcv-services.md) for the end-to-end service subscription f ``` ### Source → module doc -Comment above function in source: + +Add `-- spec:` comments as part of the module documentation work — when you document something non-obvious, add the link in source at the same time. Two levels: + +**Module-level** (below the module declaration): when the Overview section has value. +```haskell +module Simplex.Messaging.Util (...) where +-- spec: spec/modules/Simplex/Messaging/Util.md +``` + +**Function-level** (above the function): when that function has a doc entry worth pointing to. ```haskell --- spec: spec/modules/Simplex/Messaging/Server.md#subscribeServiceMessages --- Delivers buffered messages for all service queues after SUBS (SI-SVC-07) -subscribeServiceMessages :: ... +-- spec: spec/modules/Simplex/Messaging/Util.md#catchOwn +-- Catches all exceptions except async cancellations (misleading name) +catchOwn :: ... ``` -Only add `-- spec:` comments where the module doc actually has something to say. Don't add links to "No non-obvious behavior" docs. +Only add `-- spec:` comments where the module doc actually says something the code doesn't. 
Don't add links to "No non-obvious behavior" docs or to entries that merely restate the source. ## Topic candidate tracking diff --git a/src/Simplex/Messaging/Compression.hs b/src/Simplex/Messaging/Compression.hs index 20000ded3..32430bc88 100644 --- a/src/Simplex/Messaging/Compression.hs +++ b/src/Simplex/Messaging/Compression.hs @@ -1,6 +1,7 @@ {-# LANGUAGE LambdaCase #-} {-# LANGUAGE OverloadedStrings #-} +-- spec: spec/modules/Simplex/Messaging/Compression.md module Simplex.Messaging.Compression ( Compressed, maxLengthPassthrough, @@ -42,6 +43,8 @@ compress1 bs | B.length bs <= maxLengthPassthrough = Passthrough bs | otherwise = Compressed . Large $ Z1.compress compressionLevel bs +-- spec: spec/modules/Simplex/Messaging/Compression.md#decompress1 +-- Decompression bomb protection: refuses data without declared size or exceeding limit decompress1 :: Int -> Compressed -> Either String ByteString decompress1 limit = \case Passthrough bs -> Right bs diff --git a/src/Simplex/Messaging/Encoding.hs b/src/Simplex/Messaging/Encoding.hs index d069e5518..4381ff8bb 100644 --- a/src/Simplex/Messaging/Encoding.hs +++ b/src/Simplex/Messaging/Encoding.hs @@ -7,6 +7,7 @@ {-# LANGUAGE TypeFamilies #-} {-# LANGUAGE UndecidableInstances #-} +-- spec: spec/modules/Simplex/Messaging/Encoding.md module Simplex.Messaging.Encoding ( Encoding (..), Tail (..), diff --git a/src/Simplex/Messaging/Parsers.hs b/src/Simplex/Messaging/Parsers.hs index 7acbec743..3a2fd07fc 100644 --- a/src/Simplex/Messaging/Parsers.hs +++ b/src/Simplex/Messaging/Parsers.hs @@ -4,6 +4,7 @@ {-# LANGUAGE OverloadedStrings #-} {-# LANGUAGE PatternSynonyms #-} +-- spec: spec/modules/Simplex/Messaging/Parsers.md module Simplex.Messaging.Parsers ( base64P, parse, @@ -105,7 +106,8 @@ enumJSON tagModifier = J.allNullaryToStringTag = True } --- used in platform-specific encoding, includes tag for single-field encoding of sum types to allow conversion to tagged objects +-- spec: 
spec/modules/Simplex/Messaging/Parsers.md#sumTypeJSON +-- Platform-dependent: ObjectWithSingleField on Darwin+swiftJSON, TaggedObject elsewhere sumTypeJSON :: (String -> String) -> J.Options #if defined(darwin_HOST_OS) && defined(swiftJSON) sumTypeJSON = singleFieldJSON_ $ Just SingleFieldJSONTag diff --git a/src/Simplex/Messaging/Session.hs b/src/Simplex/Messaging/Session.hs index ff5d7e0a0..bb082b1bb 100644 --- a/src/Simplex/Messaging/Session.hs +++ b/src/Simplex/Messaging/Session.hs @@ -2,6 +2,7 @@ {-# LANGUAGE NamedFieldPuns #-} {-# LANGUAGE ScopedTypeVariables #-} +-- spec: spec/modules/Simplex/Messaging/Session.md module Simplex.Messaging.Session ( SessionVar (..), getSessVar, @@ -32,6 +33,8 @@ getSessVar sessSeq sessKey vs sessionVarTs = maybe (Left <$> newSessionVar) (pur TM.insert sessKey v vs pure v +-- spec: spec/modules/Simplex/Messaging/Session.md#removeSessVar +-- Compare-and-swap: only removes if sessionVarId matches, preventing stale removal removeSessVar :: Ord k => SessionVar a -> k -> TMap k (SessionVar a) -> STM () removeSessVar v sessKey vs = TM.lookup sessKey vs >>= \case diff --git a/src/Simplex/Messaging/Util.hs b/src/Simplex/Messaging/Util.hs index 6c1937144..abbf5a3b3 100644 --- a/src/Simplex/Messaging/Util.hs +++ b/src/Simplex/Messaging/Util.hs @@ -3,6 +3,7 @@ {-# LANGUAGE OverloadedStrings #-} {-# LANGUAGE ScopedTypeVariables #-} +-- spec: spec/modules/Simplex/Messaging/Util.md module Simplex.Messaging.Util ( AnyError (..), (<$?>), @@ -294,6 +295,7 @@ allFinally :: (AnyError e, MonadUnliftIO m) => ExceptT e m a -> ExceptT e m b -> allFinally action final = tryAllErrors action >>= \r -> final >> except r {-# INLINE allFinally #-} +-- spec: spec/modules/Simplex/Messaging/Util.md#isOwnException isOwnException :: E.SomeException -> Bool isOwnException e = case E.fromException e of Just StackOverflow -> True @@ -303,16 +305,20 @@ isOwnException e = case E.fromException e of _ -> False {-# INLINE isOwnException #-} +-- spec: 
spec/modules/Simplex/Messaging/Util.md#isAsyncCancellation isAsyncCancellation :: E.SomeException -> Bool isAsyncCancellation e = case E.fromException e of Just (_ :: SomeAsyncException) -> not $ isOwnException e Nothing -> False {-# INLINE isAsyncCancellation #-} +-- spec: spec/modules/Simplex/Messaging/Util.md#catchOwn +-- Catches all exceptions EXCEPT async cancellations (name is misleading) catchOwn' :: IO a -> (E.SomeException -> IO a) -> IO a catchOwn' action handleInternal = action `E.catch` \e -> if isAsyncCancellation e then E.throwIO e else handleInternal e {-# INLINE catchOwn' #-} +-- spec: spec/modules/Simplex/Messaging/Util.md#catchOwn catchOwn :: MonadUnliftIO m => m a -> (E.SomeException -> m a) -> m a catchOwn action handleInternal = withRunInIO $ \run -> From 7ece87f1b63e28f042c5331f48f174709ccd26a5 Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Wed, 11 Mar 2026 09:47:18 +0000 Subject: [PATCH 08/61] encoding notes --- spec/modules/Simplex/Messaging/Encoding.md | 4 ++++ spec/modules/Simplex/Messaging/Encoding/String.md | 7 +++++-- 2 files changed, 9 insertions(+), 2 deletions(-) diff --git a/spec/modules/Simplex/Messaging/Encoding.md b/spec/modules/Simplex/Messaging/Encoding.md index f485aeaa4..8db63d0cc 100644 --- a/spec/modules/Simplex/Messaging/Encoding.md +++ b/spec/modules/Simplex/Messaging/Encoding.md @@ -36,6 +36,10 @@ Sequential concatenation with no separators. Works because each element's encodi Only seconds are encoded (as Int64); nanoseconds are discarded on encode and set to 0 on decode. +## String instance + +`smpEncode` goes through `B.pack`, which silently truncates any Unicode character above codepoint 255 to its lowest byte. A String containing non-Latin-1 characters is silently corrupted on encode with no error. Same issue exists in the `StrEncoding String` instance — see [Simplex.Messaging.Encoding.String](./Encoding/String.md#string-instance). 
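The truncation described in the String instance note can be reproduced with `Data.ByteString.Char8` alone. The following is a minimal sketch, not the module's `smpEncode`; `roundTrip` and `isLatin1Safe` are hypothetical helpers introduced only for illustration.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.Char (ord)

-- Hypothetical helper, not part of Simplex.Messaging.Encoding:
-- packs a String the way a Char8-based encoder would, then unpacks it.
roundTrip :: String -> String
roundTrip = B.unpack . B.pack

-- True when every character fits in one byte, so B.pack loses nothing.
isLatin1Safe :: String -> Bool
isLatin1Safe = all ((<= 255) . ord)

main :: IO ()
main = do
  print (roundTrip "abc" == "abc")      -- Latin-1 input survives: True
  print (roundTrip "\1078" == "\1078")  -- U+0436 does not survive: False
  print (map ord (roundTrip "\1078"))   -- only the low byte 0x36 remains: [54]
  print (isLatin1Safe "\1078")          -- a guard a caller could apply first: False
```

A caller that cannot rule out non-Latin-1 input could run a check like `isLatin1Safe` before encoding, or encode the text to UTF-8 bytes and use the ByteString instance instead.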
+ ## smpEncodeList / smpListP 1-byte length prefix for lists — same 255-item limit as ByteString's 255-byte limit. diff --git a/spec/modules/Simplex/Messaging/Encoding/String.md b/spec/modules/Simplex/Messaging/Encoding/String.md index 60ac9e496..1e60295b8 100644 --- a/spec/modules/Simplex/Messaging/Encoding/String.md +++ b/spec/modules/Simplex/Messaging/Encoding/String.md @@ -27,9 +27,12 @@ Inherits from ByteString via `B.pack` / `B.unpack`. Only Char8 (Latin-1) charact `strToJSON` uses `decodeLatin1`, not `decodeUtf8'`. This preserves arbitrary byte sequences (e.g., base64url-encoded binary data) as JSON strings without UTF-8 validation errors, but means the JSON representation is Latin-1, not UTF-8. -## Default strP fallback +## Class default: strP assumes base64url for all types -If only `strDecode` is defined (no custom `strP`), the default parser runs `base64urlP` first, then passes the decoded bytes to `strDecode`. This means the type's own `strDecode` receives raw bytes, not the base64url text. Easy to confuse when implementing a new instance. +The `MINIMAL` pragma allows defining only `strDecode` without `strP`. But the default `strP = strDecode <$?> base64urlP` then assumes input is base64url-encoded — for *any* type, not just ByteString. Two consequences: + +1. The type's `strDecode` receives raw decoded bytes, not the base64url text. Easy to confuse when implementing a new instance. +2. `base64urlP` requires non-empty input (`takeWhile1`), so the default `strP` cannot parse empty values — even if `strDecode ""` would succeed. Types that can encode to empty output must define `strP` explicitly. 
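Both consequences of the class default can be seen in a small model. The sketch below assumes the `attoparsec` and `base64-bytestring` packages; `base64urlP'`, `decodeAny` and `defaultStrP` are hypothetical stand-ins written for this example, not the definitions in `Encoding/String.hs`, and the `<$?>` operator is spelled out as an explicit bind.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Attoparsec.ByteString.Char8 (Parser)
import qualified Data.Attoparsec.ByteString.Char8 as A
import Data.ByteString (ByteString)
import qualified Data.ByteString.Base64.URL as U
import Data.Char (isAlphaNum)

-- Simplified stand-in for base64urlP: at least one base64url character
-- (takeWhile1), optional '=' padding, then decoding.
base64urlP' :: Parser ByteString
base64urlP' = do
  str <- A.takeWhile1 (\c -> isAlphaNum c || c == '-' || c == '_')
  padding <- A.takeWhile (== '=')
  either fail pure $ U.decode (str <> padding)

-- Hypothetical strDecode for a type that accepts any bytes, including empty.
decodeAny :: ByteString -> Either String ByteString
decodeAny = Right

-- Shape of the class default: parse base64url first,
-- then run strDecode on the decoded bytes.
defaultStrP :: Parser ByteString
defaultStrP = base64urlP' >>= either fail pure . decodeAny

main :: IO ()
main = do
  -- Consequence 1: decodeAny receives "hi", not the text "aGk=".
  print $ A.parseOnly (defaultStrP <* A.endOfInput) "aGk="
  -- Consequence 2: empty input fails in takeWhile1,
  -- although decodeAny "" would succeed.
  print $ A.parseOnly (defaultStrP <* A.endOfInput) ""
```

An instance for a type with a legitimately empty encoding would define `strP` itself rather than rely on this default.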
## listItem From 844b5ad3f11e29d6e2fd7561f22c542545b4afa5 Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Wed, 11 Mar 2026 11:53:18 +0000 Subject: [PATCH 09/61] cryptography modules specs --- spec/TOPICS.md | 4 + spec/modules/Simplex/Messaging/Crypto.md | 82 ++++++++++++++++ spec/modules/Simplex/Messaging/Crypto/File.md | 25 +++++ spec/modules/Simplex/Messaging/Crypto/Lazy.md | 40 ++++++++ .../Simplex/Messaging/Crypto/Ratchet.md | 98 +++++++++++++++++++ .../Simplex/Messaging/Crypto/SNTRUP761.md | 13 +++ .../Simplex/Messaging/Crypto/ShortLink.md | 36 +++++++ src/Simplex/Messaging/Crypto.hs | 6 +- src/Simplex/Messaging/Crypto/Ratchet.hs | 6 +- 9 files changed, 308 insertions(+), 2 deletions(-) create mode 100644 spec/modules/Simplex/Messaging/Crypto.md create mode 100644 spec/modules/Simplex/Messaging/Crypto/File.md create mode 100644 spec/modules/Simplex/Messaging/Crypto/Lazy.md create mode 100644 spec/modules/Simplex/Messaging/Crypto/Ratchet.md create mode 100644 spec/modules/Simplex/Messaging/Crypto/SNTRUP761.md create mode 100644 spec/modules/Simplex/Messaging/Crypto/ShortLink.md diff --git a/spec/TOPICS.md b/spec/TOPICS.md index a0c1f4eaf..a8eafc1a1 100644 --- a/spec/TOPICS.md +++ b/spec/TOPICS.md @@ -3,3 +3,7 @@ > Cross-cutting patterns noticed during module documentation. Each entry may become a topic doc in `spec/` after all module docs are complete. - **Exception handling strategy**: `catchOwn`/`catchAll`/`tryAllErrors` pattern (defined in Util.hs) used across server, client, and agent modules. The three-category classification (synchronous, own-async, cancellation) and when to use which catch variant is not obvious from any single call site. + +- **Padding schemes**: Three different padding formats across the codebase — Crypto.hs uses 2-byte Word16 length prefix (max ~65KB), Crypto/Lazy.hs uses 8-byte Int64 prefix (file-sized), and both use '#' fill character. 
Ratchet header padding uses fixed sizes (88 or 2310 bytes). All use `pad`/`unPad` but with incompatible formats. The relationship between padding, encryption, and message size limits spans Crypto, Lazy, Ratchet, and the protocol layer. + +- **NaCl construction variants**: crypto_box, secret_box, and KEM hybrid secret all use the same XSalsa20+Poly1305 core (Crypto.hs `xSalsa20`), but with different key sources (DH, symmetric, SHA3_256(DH||KEM)). The lazy streaming variant (Lazy.hs) adds prepend-tag vs tail-tag placement. File.hs wraps lazy streaming with handle-based I/O. Full picture requires reading Crypto.hs, Lazy.hs, File.hs, and SNTRUP761.hs together. diff --git a/spec/modules/Simplex/Messaging/Crypto.md b/spec/modules/Simplex/Messaging/Crypto.md new file mode 100644 index 000000000..f1c660512 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Crypto.md @@ -0,0 +1,82 @@ +# Simplex.Messaging.Crypto + +> Core cryptographic primitives: key types, NaCl crypto_box/secret_box, AEAD-GCM, signing, padding, X509, HKDF. + +**Source**: [`Crypto.hs`](../../../../src/Simplex/Messaging/Crypto.hs) + +## Overview + +This is the largest crypto module (~1540 lines). It defines the type-level algorithm system (GADTs + type families), all key types, and the fundamental encrypt/decrypt/sign/verify operations used throughout the protocol stack. Higher-level modules ([Ratchet](./Crypto/Ratchet.md), [Lazy](./Crypto/Lazy.md), [File](./Crypto/File.md)) build on these primitives. + +## Algorithm type system + +Four algorithms (`Ed25519`, `Ed448`, `X25519`, `X448`) are encoded as a promoted data kind `Algorithm`. Type families constrain which algorithms support which operations: + +- `SignatureAlgorithm`: only `Ed25519`, `Ed448` +- `DhAlgorithm`: only `X25519`, `X448` +- `AuthAlgorithm`: `Ed25519`, `Ed448`, `X25519` (but NOT `X448`) + +Using the wrong algorithm produces a **compile-time error** via `TypeError`. 
The runtime bridge uses `Dict` from `Data.Constraint` — functions like `signatureAlgorithm :: SAlgorithm a -> Maybe (Dict (SignatureAlgorithm a))` allow dynamic dispatch while preserving type safety. + +## PrivateKeyEd25519 StrEncoding deliberately omitted + +The `StrEncoding` instance for `PrivateKey Ed25519` is commented out with the note "Do not enable, to avoid leaking key data." Only `PrivateKey X25519` has `StrEncoding`, used specifically for the notification store log. This is a deliberate security decision — Ed25519 signing keys should never appear in human-readable formats. + +## Two AEAD initialization paths + +- **`initAEAD`**: Takes 16-byte `IV`, transforms it internally via `cryptonite_aes_gcm_init`. Used by the double ratchet. +- **`initAEADGCM`**: Takes 12-byte `GCMIV`, does NOT transform. Used for WebRTC frame encryption. + +These are **not interchangeable** — using the wrong IV size or init function produces silent corruption. The code comments note that WebCrypto compatibility requires `initAEADGCM`, and the ratchet may need to migrate away from `initAEAD` in the future. + +## cbNonce — silent truncation/padding + +`cbNonce` adjusts any ByteString to exactly 24 bytes: +- If longer: silently truncates to first 24 bytes +- If shorter: silently pads with zero bytes + +No error is raised for incorrect input lengths. This means a programming error passing the wrong-length nonce will produce valid but wrong encryption, not a failure. + +## pad / unPad — 2-byte length prefix + +`pad` prepends a 2-byte big-endian `Word16` length, then the message, then `'#'` padding characters to fill `paddedLen`. Maximum message length is `2^16 - 3 = 65533` bytes. The `'#'` padding character is a convention, not verified on decode — `unPad` only reads the length prefix and extracts that many bytes. + +Contrast with [Simplex.Messaging.Crypto.Lazy.pad](./Crypto/Lazy.md#padding-8-byte-length-prefix) which uses an 8-byte `Int64` prefix for file-sized data. 
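The layout described for `pad` / `unPad` can be sketched with `bytestring` alone. This is an illustrative re-implementation under the assumptions stated in the text (2-byte big-endian length, `'#'` filler, 65533-byte limit); `padSketch`, `unPadSketch` and `encodeLen` are hypothetical names, and the error type is simplified to `String`.

```haskell
import Data.Bits (shiftL, shiftR, (.|.))
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as B

-- Big-endian 2-byte length prefix (hypothetical helper).
encodeLen :: Int -> B.ByteString
encodeLen n = BS.pack [fromIntegral (n `shiftR` 8), fromIntegral n]

-- Length prefix, then the message, then '#' filler up to paddedLen.
padSketch :: B.ByteString -> Int -> Either String B.ByteString
padSketch msg paddedLen
  | len <= maxMsgLen && padLen >= 0 =
      Right $ encodeLen len <> msg <> B.replicate padLen '#'
  | otherwise = Left "message too large for padded length"
  where
    len = B.length msg
    padLen = paddedLen - len - 2
    maxMsgLen = 65533 -- 2^16 - 3, as stated above

-- Reads only the prefix and extracts that many bytes; the filler is ignored.
unPadSketch :: B.ByteString -> Either String B.ByteString
unPadSketch padded
  | BS.length lenBytes == 2 && B.length rest >= len = Right (B.take len rest)
  | otherwise = Left "invalid padded message"
  where
    (lenBytes, rest) = B.splitAt 2 padded
    len = case BS.unpack lenBytes of
      [hi, lo] -> (fromIntegral hi `shiftL` 8) .|. fromIntegral lo
      _ -> 0

main :: IO ()
main = do
  let padded = either error id $ padSketch (B.pack "hello") 16
  print (B.length padded)     -- the requested padded length: 16
  print (unPadSketch padded)  -- Right "hello"
  -- Replacing the filler does not affect decoding: only the prefix is read.
  let tampered = B.map (\c -> if c == '#' then 'x' else c) padded
  print (unPadSketch tampered) -- Right "hello"
```

The last call shows why the filler character is only a convention: integrity comes from the surrounding authenticated encryption, not from the padding bytes.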
+ +## crypto_box / secret_box + +Both use the same underlying `xSalsa20` + `Poly1305.auth` implementation. The difference is only in the key: +- **crypto_box** (`cbEncrypt`/`cbDecrypt`): uses a DH shared secret (`DhSecret X25519`) +- **secret_box** (`sbEncrypt`/`sbDecrypt`): uses a symmetric key (`SbKey`, 32 bytes) + +Both apply `pad`/`unPad` by default. The `NoPad` variants skip padding. + +## xSalsa20 + +The XSalsa20 implementation splits the 24-byte nonce into an 8-byte prefix and a 16-byte remainder. The prefix initializes the cipher state (prepended with 16 zero bytes), the remainder derives a subkey. The first 32 bytes of output become the Poly1305 one-time key (`rs`), then the rest encrypts the message. This is the standard NaCl construction. + +## CbAuthenticator + +An authentication scheme that encrypts the SHA-512 hash of the message using crypto_box, rather than the message itself. The result is 80 bytes (64 hash + 16 auth tag). Used for authenticating messages where the content is transmitted separately from the authentication proof. + +## Secret box chains (sbcInit / sbcHkdf) + +HKDF-based key chains for deriving sequential key+nonce pairs: +- `sbcInit`: derives two 32-byte chain keys from a salt and shared secret using `HKDF(salt, secret, "SimpleXSbChainInit", 64)` +- `sbcHkdf`: advances a chain key, producing a new chain key (32 bytes), an SbKey (32 bytes), and a CbNonce (24 bytes) from `HKDF("", chainKey, "SimpleXSbChain", 88)` + +## Key encoding + +All keys are encoded as ASN.1 DER (X.509 SubjectPublicKeyInfo for public, PKCS#8 for private). The algorithm is determined by the encoded key length on decode: `decodePubKey` / `decodePrivKey` parse the ASN.1 structure, then dispatch on the X.509 key type. + +## Signature algorithm detection + +`decodeSignature` determines the algorithm by signature length: Ed25519 signatures are 64 bytes, Ed448 signatures are 114 bytes. Any other size is rejected. 
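The length-based rule for signatures can be written as a small dispatch. This is a sketch of the rule as stated above, not the module's `decodeSignature`; the `SigAlg` type and `sigAlgByLength` are hypothetical names that stand in for the GADT-based types used in `Crypto.hs`.

```haskell
import qualified Data.ByteString as BS

-- Hypothetical stand-in for the signature algorithm tags.
data SigAlg = SigEd25519 | SigEd448
  deriving (Eq, Show)

-- Chooses the algorithm from the raw signature length alone:
-- 64 bytes for Ed25519, 114 bytes for Ed448, anything else is rejected.
sigAlgByLength :: BS.ByteString -> Either String SigAlg
sigAlgByLength s = case BS.length s of
  64 -> Right SigEd25519
  114 -> Right SigEd448
  n -> Left ("unexpected signature length: " <> show n)

main :: IO ()
main = do
  print (sigAlgByLength (BS.replicate 64 0))
  print (sigAlgByLength (BS.replicate 114 0))
  print (sigAlgByLength (BS.replicate 65 0))
```

Because the two sizes never coincide, no algorithm tag has to travel with the signature bytes.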
+ +## GCMIV constructor not exported + +`GCMIV` constructor is not exported — only `gcmIV :: ByteString -> Either CryptoError GCMIV` is available, which validates that the input is exactly 12 bytes. This prevents construction of invalid IVs. + +## generateKeyPair is STM + +Key generation uses `TVar ChaChaDRG` and runs in `STM`, not `IO`. This allows key generation inside `atomically` blocks, which is used extensively in handshake and ratchet initialization code. diff --git a/spec/modules/Simplex/Messaging/Crypto/File.md b/spec/modules/Simplex/Messaging/Crypto/File.md new file mode 100644 index 000000000..8fdb22e18 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Crypto/File.md @@ -0,0 +1,25 @@ +# Simplex.Messaging.Crypto.File + +> Streaming encrypted file I/O using NaCl secret_box with tail auth tag. + +**Source**: [`Crypto/File.hs`](../../../../../src/Simplex/Messaging/Crypto/File.hs) + +## Overview + +`CryptoFileHandle` wraps a file `Handle` with an optional `TVar SbState` for streaming encryption/decryption. When `cryptoArgs` is `Nothing`, the file is plaintext and all operations pass through directly. + +## Auth tag position + +The auth tag is written/read **at the end of the file** (tail tag pattern), not prepended. This is important for streaming: `hPut` encrypts chunks as they arrive, accumulating the Poly1305 state in the TVar, and `hPutTag` finalizes and writes the 16-byte tag only after all data is written. + +## hGetTag + +**Security**: Uses `BA.constEq` for constant-time tag comparison, preventing timing side-channels. Must be called after reading all content bytes — it reads exactly `authTagSize` (16) remaining bytes and compares against the finalized Poly1305 state. Caller must know the file size and read only the content portion before calling this. + +## getFileContentsSize + +Subtracts `authTagSize` from the file size when crypto args are present. 
This gives the content size without the tag, which is needed to know how many bytes to read before calling `hGetTag`. + +## readFile / writeFile + +Whole-file variants that read/write everything at once. `readFile` uses `sbDecryptChunk` (encrypt-then-MAC verification — feeds ciphertext to Poly1305), while `writeFile` uses `sbEncryptChunk`. Both use the tail tag layout via [Simplex.Messaging.Crypto.Lazy](./Lazy.md) functions. diff --git a/spec/modules/Simplex/Messaging/Crypto/Lazy.md b/spec/modules/Simplex/Messaging/Crypto/Lazy.md new file mode 100644 index 000000000..42ba9e02e --- /dev/null +++ b/spec/modules/Simplex/Messaging/Crypto/Lazy.md @@ -0,0 +1,40 @@ +# Simplex.Messaging.Crypto.Lazy + +> Streaming NaCl secret_box (XSalsa20 + Poly1305) for lazy ByteStrings. + +**Source**: [`Crypto/Lazy.hs`](../../../../../src/Simplex/Messaging/Crypto/Lazy.hs) + +## Overview + +Lazy counterpart to the strict NaCl operations in [Simplex.Messaging.Crypto](../Crypto.md). Processes data chunk-by-chunk via `SbState = (XSalsa.State, Poly1305.State)`, enabling streaming encryption of large files without loading everything into memory. + +## Encrypt-then-MAC asymmetry + +`sbEncryptChunk` and `sbDecryptChunk` both use XSalsa20 for the cipher operation, but feed different data to Poly1305: + +- **Encrypt**: feeds the **ciphertext** to Poly1305 (`Poly1305.update authSt c`) +- **Decrypt**: feeds the **original ciphertext** (the input chunk) to Poly1305 (`Poly1305.update authSt chunk`), not the decrypted plaintext + +This is the correct encrypt-then-MAC pattern: the MAC is always computed over ciphertext, so both sides compute the same tag. + +## Padding: 8-byte length prefix + +`pad` uses an 8-byte `Int64` length prefix (via `smpEncode`), unlike [Simplex.Messaging.Crypto.pad](../Crypto.md#pad) which uses a 2-byte `Word16` prefix. This is because lazy operations handle file-sized data that can exceed 65535 bytes. 
+ +`unPad` / `splitLen` does not validate that the remaining data is at least `len` bytes — it uses `LB.take len` which silently returns a shorter result. The comment notes this is intentional to avoid consuming all chunks for validation. + +## Auth tag placement: prepend vs tail + +Two families of functions: +- **`sbEncrypt` / `sbDecrypt`**: tag is **prepended** (first 16 bytes of output). Used for message-sized data. +- **`sbEncryptTailTag` / `sbDecryptTailTag`**: tag is **appended** (last 16 bytes). More efficient for large files because you don't need to buffer the tag before the content. + +The tail-tag variants also support `KEMHybridSecret` via `kcbEncryptTailTag` / `kcbDecryptTailTag`. + +## sbDecryptTailTag validity + +`sbDecryptTailTag` returns `(Bool, LazyByteString)` — the `Bool` indicates whether the auth tag was valid, but the decrypted data is returned regardless. This allows the caller to decide how to handle invalid tags (e.g., [Simplex.Messaging.Crypto.File](./File.md) uses strict `unless` checks). + +## fastReplicate + +Optimizes large padding by building the lazy ByteString from 64KB chunks (minus GHC overhead for `Int` size) rather than one enormous strict ByteString. This avoids allocating a single contiguous buffer for multi-megabyte padding. diff --git a/spec/modules/Simplex/Messaging/Crypto/Ratchet.md b/spec/modules/Simplex/Messaging/Crypto/Ratchet.md new file mode 100644 index 000000000..ebbc9c5a6 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Crypto/Ratchet.md @@ -0,0 +1,98 @@ +# Simplex.Messaging.Crypto.Ratchet + +> Double ratchet with post-quantum KEM extension (PQ X3DH + header encryption). + +**Source**: [`Crypto/Ratchet.hs`](../../../../../src/Simplex/Messaging/Crypto/Ratchet.hs) + +## Overview + +Implements the Signal double ratchet protocol extended with: +- **Header encryption** (HE variant): message headers are encrypted with separate header keys, hiding the ratchet public key and message counters from observers. 
+- **Post-quantum KEM** (PQ variant): SNTRUP761 key encapsulation is folded into each ratchet step, providing PQ-resistance alongside X448 DH. + +The ratchet uses X448 (not X25519) for DH operations — `type RatchetX448 = Ratchet 'X448`. + +## PQ X3DH key agreement + +`pqX3dhSnd` / `pqX3dhRcv` perform the extended X3DH: +- Standard triple DH: `DH(rk1, spk2)`, `DH(rk2, spk1)`, `DH(rk2, spk2)` +- Optional KEM shared secret from SNTRUP761 encapsulation +- Combined via `HKDF(salt=64_zeroes, DHs || KEMss, "SimpleXX3DH", 96)` → root key, header key, next-header key + +The roles (who is "Alice" vs "Bob") are **reversed from the double ratchet spec**: the party initiating the connection is Bob (`generateRcvE2EParams`, `initRcvRatchet`), and the party accepting is Alice (`generateSndE2EParams`, `initSndRatchet`). Comments in the source explicitly note this. + +## KDF functions + +- **rootKdf**: `HKDF(rootKey, DH(pubKey, privKey) || KEMss, "SimpleXRootRatchet", 96)` → new root key (32), chain key (32), next header key (32) +- **chainKdf**: `HKDF("", chainKey, "SimpleXChainRatchet", 96)` → new chain key (32), message key (32), two IVs (16 + 16) + +All use HKDF-SHA512 via [Simplex.Messaging.Crypto.hkdf](../Crypto.md). + +## Header encryption and padding + +Headers are encrypted with AEAD-GCM using the header key. The padded header length depends on whether PQ is supported: +- **Without PQ**: 88 bytes (fits DH key + counters) +- **With PQ**: 2310 bytes (fits DH key + KEM params + counters, with reserve for future extension) + +The actual header is ~69 bytes without PQ, ~2288 with PQ. The padding ensures all messages have identical header sizes regardless of content. + +## Version negotiation in headers + +Each message header carries `msgMaxVersion` (the sender's max supported ratchet version). On decryption, the receiver upgrades its `current` version to `min(msgMaxVersion, maxSupported)` but never downgrades. 
The current version determines: +- Whether KEM params are included in headers (v3+) +- Whether 2-byte length prefixes are used for headers (v3+) + +## largeP — backward-compatible length prefix parsing + +`largeP` detects the length-prefix format by peeking at the first byte: if < 32, it's a 2-byte `Large` prefix (new format); otherwise it's a 1-byte prefix (old format). This allows upgrading the header encoding format in a single message without a version bump. + +## Skipped message keys + +When messages arrive out of order, the ratchet computes and stores the message keys for skipped messages (up to `maxSkip = 512`). Skipped keys are stored in a `Map HeaderKey (Map Word32 MessageKey)` — keyed first by header key, then by message number. + +The `SkippedMsgDiff` type represents changes to the skipped key store as a diff rather than a full replacement — this is persisted to the database, and the full state is loaded for the next message. `applySMDiff` is only used in tests. + +## rcDecrypt flow + +Decryption tries three strategies in order: +1. **Skipped message keys**: try all stored header keys to decrypt the header, then look up the message number in skipped keys +2. **Current receiving ratchet**: decrypt header with `rcHKr` +3. **Next header key**: decrypt header with `rcNHKr` (triggers a ratchet advance) + +If strategy 1 decrypts the header but the message number isn't in skipped keys, it checks whether this header key corresponds to the current or next ratchet to decide whether to advance. + +## rcEncryptHeader — separated from rcEncryptMsg + +Encryption is split into two steps: `rcEncryptHeader` produces a `MsgEncryptKey` (containing the encrypted header and message key), then `rcEncryptMsg` uses that key to encrypt the message body. 
This separation allows the ratchet state to be updated (persisted) before the message is encrypted, which is important for crash recovery — if the process crashes after encrypting but before sending, the ratchet state must already reflect the advanced counter. + +## PQ ratchet step + +During each ratchet advance (`pqRatchetStep`), the PQ KEM is folded in: +1. Receive: if the header contains a KEM ciphertext and we have the decapsulation key, compute the shared secret +2. Send: generate a new KEM keypair, encapsulate against the received public key, include in the next header +3. The KEM shared secret is concatenated with the DH shared secret before `rootKdf` + +PQ can be enabled/disabled per-message via `pqEnc_` parameter. `rcSupportKEM` can only be enabled (never disabled) — once PQ headers are used, the larger header size is permanent. + +## PQSupport vs PQEncryption + +Two distinct newtypes with identical structure (`Bool` wrapper): +- `PQSupport`: whether PQ **can** be used (determines header padding size, cannot be disabled once enabled) +- `PQEncryption`: whether PQ **is** being used for the current send/receive ratchet + +## Error semantics + +- `CERatchetEarlierMessage n`: message number is `n` positions before the next expected (already processed or skipped-and-consumed) +- `CERatchetDuplicateMessage`: message number is the most recently received (exact repeat) +- `CERatchetTooManySkipped n`: would need to skip `n` messages, exceeding `maxSkip` +- `CERatchetHeader`: header decryption failed with all available keys +- `CERatchetState`: no sending chain (ratchet not initialized for sending) +- `CERatchetKEMState`: KEM state mismatch between parties + +## InitialKeys + +Controls PQ key inclusion in connection establishment: +- `IKUsePQ`: always include PQ keys (used in contact requests and short link data) +- `IKLinkPQ pq`: include PQ keys only in short link data, if `pq` is enabled + +`initialPQEncryption` resolves this based on whether it's a short link 
context. diff --git a/spec/modules/Simplex/Messaging/Crypto/SNTRUP761.md b/spec/modules/Simplex/Messaging/Crypto/SNTRUP761.md new file mode 100644 index 000000000..d5cd29013 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Crypto/SNTRUP761.md @@ -0,0 +1,13 @@ +# Simplex.Messaging.Crypto.SNTRUP761 + +> Hybrid KEM+DH shared secret combining SNTRUP761 and X25519. + +**Source**: [`Crypto/SNTRUP761.hs`](../../../../../src/Simplex/Messaging/Crypto/SNTRUP761.hs) + +## kemHybridSecret + +The hybrid secret is `SHA3_256(DHSecret || KEMSharedKey)` — not a simple concatenation, not HKDF. This follows the approach in draft-josefsson-ntruprime-hybrid. The result is a `ScrubbedBytes` value used as a symmetric key for NaCl crypto_box operations via `sbEncrypt_`/`sbDecrypt_`. + +## kcbEncrypt / kcbDecrypt + +These delegate directly to `sbEncrypt_` / `sbDecrypt_` from [Simplex.Messaging.Crypto](../Crypto.md), using the hybrid secret as the symmetric key. The hybrid secret is 32 bytes (SHA3-256 output), matching the expected key size for XSalsa20. diff --git a/spec/modules/Simplex/Messaging/Crypto/ShortLink.md b/spec/modules/Simplex/Messaging/Crypto/ShortLink.md new file mode 100644 index 000000000..821a30c32 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Crypto/ShortLink.md @@ -0,0 +1,36 @@ +# Simplex.Messaging.Crypto.ShortLink + +> Short link key derivation, encryption, and signature verification for contact/invitation links. + +**Source**: [`Crypto/ShortLink.hs`](../../../../../src/Simplex/Messaging/Crypto/ShortLink.hs) + +## Overview + +Short links encode connection data in two encrypted blobs: fixed data (2048 bytes padded) and user data (13824 bytes padded). Both are encrypted with `sbEncrypt` using a key derived from the link key via HKDF. + +## KDF schemes + +Two distinct HKDF derivations with different info strings: + +- **contactShortLinkKdf**: `HKDF("", linkKey, "SimpleXContactLink", 56)` → splits into 24-byte LinkId + 32-byte SbKey. 
The LinkId is used as the server-side identifier. +- **invShortLinkKdf**: `HKDF("", linkKey, "SimpleXInvLink", 32)` → 32-byte SbKey only. No LinkId because invitation links don't use server-side lookup. + +## Fixed padding lengths + +- `fixedDataPaddedLength = 2008` (2048 - 24 nonce - 16 auth tag) +- `userDataPaddedLength = 13784` (13824 - 24 - 16) + +These are chosen so the encrypted output (with prepended nonce and appended auth tag) fits exactly in round sizes. + +## decryptLinkData + +**Security**: Performs three-layer verification in order: +1. Hash check: `SHA3_256(fixedData) == linkKey` — ensures data integrity +2. Root key signature: `verify(rootKey, sig1, fixedData)` — ensures authenticity +3. User data signature: `verify(rootKey, sig2, userData)` for invitations, or verify against any owner key for contact links + +For contact links, also calls `validateLinkOwners` to verify the owner chain of trust (each owner is signed by the root key). + +## encodeSign + +Prepends the Ed25519 signature to the data: `smpEncode(sign(pk, data)) <> data`. This is the format expected by `decryptLinkData`'s parser. diff --git a/src/Simplex/Messaging/Crypto.hs b/src/Simplex/Messaging/Crypto.hs index c7b539641..4324b6352 100644 --- a/src/Simplex/Messaging/Crypto.hs +++ b/src/Simplex/Messaging/Crypto.hs @@ -35,6 +35,7 @@ -- -- This module provides cryptography implementation for SMP protocols based on -- . +-- spec: spec/modules/Simplex/Messaging/Crypto.md module Simplex.Messaging.Crypto ( -- * Cryptographic keys Algorithm (..), @@ -1133,7 +1134,8 @@ maxLength :: forall i. 
KnownNat i => Int maxLength = fromIntegral (natVal $ Proxy @i) {-# INLINE maxLength #-} --- this function requires 16 bytes IV, it transforms IV in cryptonite_aes_gcm_init here: +-- spec: spec/modules/Simplex/Messaging/Crypto.md#two-aead-initialization-paths +-- This function requires 16 bytes IV, it transforms IV in cryptonite_aes_gcm_init here: -- https://github.com/haskell-crypto/cryptonite/blob/master/cbits/cryptonite_aes.c -- This is used for double ratchet encryption, so to make it compatible with WebCrypto we will need to deprecate it and start using initAEADGCM initAEAD :: forall c. AES.BlockCipher c => Key -> IV -> ExceptT CryptoError IO (AES.AEAD c) @@ -1393,6 +1395,8 @@ instance ToJSON CbNonce where instance FromJSON CbNonce where parseJSON = strParseJSON "CbNonce" +-- spec: spec/modules/Simplex/Messaging/Crypto.md#cbNonce--silent-truncationpadding +-- Silently truncates or zero-pads to 24 bytes — no error on wrong length cbNonce :: ByteString -> CbNonce cbNonce s | len == 24 = CryptoBoxNonce s diff --git a/src/Simplex/Messaging/Crypto/Ratchet.hs b/src/Simplex/Messaging/Crypto/Ratchet.hs index 7250a1d60..1ea7760fd 100644 --- a/src/Simplex/Messaging/Crypto/Ratchet.hs +++ b/src/Simplex/Messaging/Crypto/Ratchet.hs @@ -18,6 +18,7 @@ {-# LANGUAGE TypeFamilies #-} {-# OPTIONS_GHC -fno-warn-redundant-constraints #-} +-- spec: spec/modules/Simplex/Messaging/Crypto/Ratchet.md module Simplex.Messaging.Crypto.Ratchet ( Ratchet (..), RatchetX448, @@ -435,7 +436,8 @@ generateE2EParams g v useKEM_ = do pure (RKParamsAccepted ct k, PrivateRKParamsAccepted ct shared ks) _ -> pure Nothing --- used by party initiating connection, Bob in double-ratchet spec +-- spec: spec/modules/Simplex/Messaging/Crypto/Ratchet.md#pq-x3dh-key-agreement +-- used by party initiating connection, Bob in double-ratchet spec (roles are reversed) generateRcvE2EParams :: (AlgorithmI a, DhAlgorithm a) => TVar ChaChaDRG -> VersionE2E -> PQSupport -> IO (PrivateKey a, PrivateKey a, Maybe 
(PrivRKEMParams 'RKSProposed), E2ERatchetParams 'RKSProposed a) generateRcvE2EParams g v = generateE2EParams g v . proposeKEM_ where @@ -899,6 +901,8 @@ rcCheckCanPad :: Int -> ByteString -> ExceptT CryptoError IO () rcCheckCanPad paddedMsgLen msg = unless (canPad (B.length msg) paddedMsgLen) $ throwE CryptoLargeMsgError +-- spec: spec/modules/Simplex/Messaging/Crypto/Ratchet.md#rcEncryptHeader--separated-from-rcEncryptMsg +-- Separated from rcEncryptMsg for crash recovery: persist ratchet state between header and message encryption rcEncryptHeader :: AlgorithmI a => Ratchet a -> Maybe PQEncryption -> VersionE2E -> ExceptT CryptoError IO (MsgEncryptKey a, Ratchet a) rcEncryptHeader Ratchet {rcSnd = Nothing} _ _ = throwE CERatchetState rcEncryptHeader rc@Ratchet {rcSnd = Just sr@SndRatchet {rcCKs, rcHKs}, rcDHRs, rcKEM, rcNs, rcPN, rcAD = Str rcAD, rcSupportKEM, rcEnableKEM, rcVersion} pqEnc_ supportedE2EVersion = do From 326d6cc5591a77fc1e0c04a710b4247abaf06f17 Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Wed, 11 Mar 2026 12:17:46 +0000 Subject: [PATCH 10/61] code comments --- src/Simplex/Messaging/Crypto.hs | 10 ++++++++++ src/Simplex/Messaging/Crypto/File.hs | 4 ++++ src/Simplex/Messaging/Crypto/Lazy.hs | 6 ++++++ src/Simplex/Messaging/Crypto/Ratchet.hs | 9 +++++++++ src/Simplex/Messaging/Crypto/SNTRUP761.hs | 3 +++ src/Simplex/Messaging/Crypto/ShortLink.hs | 4 ++++ 6 files changed, 36 insertions(+) diff --git a/src/Simplex/Messaging/Crypto.hs b/src/Simplex/Messaging/Crypto.hs index 4324b6352..d283ab899 100644 --- a/src/Simplex/Messaging/Crypto.hs +++ b/src/Simplex/Messaging/Crypto.hs @@ -343,6 +343,7 @@ deriving instance Eq (PrivateKey a) deriving instance Show (PrivateKey a) +-- spec: spec/modules/Simplex/Messaging/Crypto.md#privatekeyed25519-strencoding-deliberately-omitted -- Do not enable, to avoid leaking key data -- instance StrEncoding (PrivateKey Ed25519) where @@ -736,6 +737,7 @@ 
generatePrivateAuthKey a g = APrivateAuthKey a <$> generatePrivateKey g generateDhKeyPair :: (AlgorithmI a, DhAlgorithm a) => SAlgorithm a -> TVar ChaChaDRG -> STM ADhKeyPair generateDhKeyPair a g = bimap (APublicDhKey a) (APrivateDhKey a) <$> generateKeyPair g +-- spec: spec/modules/Simplex/Messaging/Crypto.md#generatekeypair-is-stm generateKeyPair :: forall a. AlgorithmI a => TVar ChaChaDRG -> STM (KeyPair a) generateKeyPair g = stateTVar g (`withDRG` generateKeyPair_) @@ -826,6 +828,7 @@ instance CryptoSignature (Signature s) => Encoding (Signature s) where smpP = decodeSignature <$?> smpP {-# INLINE smpP #-} +-- spec: spec/modules/Simplex/Messaging/Crypto.md#signature-algorithm-detection instance CryptoSignature ASignature where signatureBytes (ASignature _ sig) = signatureBytes sig {-# INLINE signatureBytes #-} @@ -965,6 +968,7 @@ instance ToJSON IV where instance FromJSON IV where parseJSON = fmap IV . strParseJSON "IV" +-- spec: spec/modules/Simplex/Messaging/Crypto.md#gcmiv-constructor-not-exported -- | GCMIV bytes newtype. newtype GCMIV = GCMIV {unGCMIV :: ByteString} @@ -1081,6 +1085,7 @@ canPad msgLen paddedLen = msgLen <= maxMsgLen && padLen >= 0 where padLen = paddedLen - msgLen - 2 +-- spec: spec/modules/Simplex/Messaging/Crypto.md#pad--unpad--2-byte-length-prefix pad :: ByteString -> Int -> Either CryptoError ByteString pad msg paddedLen | len <= maxMsgLen && padLen >= 0 = Right $ encodeWord16 (fromIntegral len) <> msg <> B.replicate padLen '#' @@ -1290,6 +1295,7 @@ dh' (PublicKeyX25519 k) (PrivateKeyX25519 pk) = DhSecretX25519 $ X25519.dh k pk dh' (PublicKeyX448 k) (PrivateKeyX448 pk) = DhSecretX448 $ X448.dh k pk {-# INLINE dh' #-} +-- spec: spec/modules/Simplex/Messaging/Crypto.md#crypto_box--secret_box -- | NaCl @crypto_box@ encrypt with padding with a shared DH secret and 192-bit nonce. 
cbEncrypt :: DhSecret X25519 -> CbNonce -> ByteString -> Int -> Either CryptoError ByteString cbEncrypt (DhSecretX25519 secret) = sbEncrypt_ secret @@ -1359,6 +1365,7 @@ sbDecryptNoPad_ secret (CbNonce nonce) packet (rs, msg) = xSalsa20 secret nonce c tag = Poly1305.auth rs c +-- spec: spec/modules/Simplex/Messaging/Crypto.md#cbauthenticator -- type for authentication scheme using NaCl @crypto_box@ over the sha512 digest of the message. newtype CbAuthenticator = CbAuthenticator ByteString deriving (Eq, Show) @@ -1454,6 +1461,7 @@ randomSbKey gVar = SecretBoxKey <$> randomBytes 32 gVar newtype SbChainKey = SecretBoxChainKey {unSbChainKey :: ByteString} deriving (Eq, Show) +-- spec: spec/modules/Simplex/Messaging/Crypto.md#secret-box-chains-sbcinit--sbchkdf sbcInit :: ByteArrayAccess secret => ByteString -> secret -> (SbChainKey, SbChainKey) sbcInit salt secret = (SecretBoxChainKey ck1, SecretBoxChainKey ck2) where @@ -1474,6 +1482,7 @@ hkdf salt ikm info n = in H.expand prk info n {-# INLINE hkdf #-} +-- spec: spec/modules/Simplex/Messaging/Crypto.md#xsalsa20 xSalsa20 :: ByteArrayAccess key => key -> ByteString -> ByteString -> (ByteString, ByteString) xSalsa20 secret nonce msg = (rs, msg') where @@ -1501,6 +1510,7 @@ privateToX509 = \case encodeASNObj :: ASN1Object a => a -> ByteString encodeASNObj k = toStrict . encodeASN1 DER $ toASN1 k [] +-- spec: spec/modules/Simplex/Messaging/Crypto.md#key-encoding -- Decoding of binary X509 'CryptoPublicKey'. 
decodePubKey :: CryptoPublicKey k => ByteString -> Either String k decodePubKey = decodeASNKey >=> x509ToPublic >=> pubKey diff --git a/src/Simplex/Messaging/Crypto/File.hs b/src/Simplex/Messaging/Crypto/File.hs index 3ab491946..e07a0db37 100644 --- a/src/Simplex/Messaging/Crypto/File.hs +++ b/src/Simplex/Messaging/Crypto/File.hs @@ -2,6 +2,7 @@ {-# LANGUAGE ScopedTypeVariables #-} {-# LANGUAGE TemplateHaskell #-} +-- spec: spec/modules/Simplex/Messaging/Crypto/File.md module Simplex.Messaging.Crypto.File ( CryptoFile (..), CryptoFileArgs (..), @@ -51,6 +52,7 @@ data CryptoFileArgs = CFArgs {fileKey :: C.SbKey, fileNonce :: C.CbNonce} data CryptoFileHandle = CFHandle Handle (Maybe (TVar LC.SbState)) +-- spec: spec/modules/Simplex/Messaging/Crypto/File.md#readfile--writefile readFile :: CryptoFile -> ExceptT FTCryptoError IO LazyByteString readFile (CryptoFile path cfArgs) = do s <- liftIO $ LB.readFile path @@ -91,6 +93,7 @@ hGet (CFHandle h sb_) n = B.hGet h n >>= maybe pure decrypt sb_ where decrypt sb s = atomically $ stateTVar sb (`LC.sbDecryptChunk` s) +-- spec: spec/modules/Simplex/Messaging/Crypto/File.md#hgettag -- | Read and validate the auth tag. -- This function should be called after reading the whole file, it assumes you know the file size and read only the needed bytes. 
hGetTag :: CryptoFileHandle -> ExceptT FTCryptoError IO () @@ -113,6 +116,7 @@ plain = (`CryptoFile` Nothing) randomArgs :: TVar ChaChaDRG -> STM CryptoFileArgs randomArgs g = CFArgs <$> C.randomSbKey g <*> C.randomCbNonce g +-- spec: spec/modules/Simplex/Messaging/Crypto/File.md#getfilecontentssize getFileContentsSize :: CryptoFile -> IO Integer getFileContentsSize (CryptoFile path cfArgs) = do size <- getFileSize path diff --git a/src/Simplex/Messaging/Crypto/Lazy.hs b/src/Simplex/Messaging/Crypto/Lazy.hs index 6c0cf9613..192cd85b8 100644 --- a/src/Simplex/Messaging/Crypto/Lazy.hs +++ b/src/Simplex/Messaging/Crypto/Lazy.hs @@ -5,6 +5,7 @@ {-# LANGUAGE ScopedTypeVariables #-} {-# LANGUAGE TupleSections #-} +-- spec: spec/modules/Simplex/Messaging/Crypto/Lazy.md module Simplex.Messaging.Crypto.Lazy ( sha256Hash, sha512Hash, @@ -65,6 +66,7 @@ sha256Hash = BA.convert . (hashlazy :: LazyByteString -> Digest SHA256) sha512Hash :: LazyByteString -> ByteString sha512Hash = BA.convert . (hashlazy :: LazyByteString -> Digest SHA512) +-- spec: spec/modules/Simplex/Messaging/Crypto/Lazy.md#padding-8-byte-length-prefix -- this function does not validate the length of the message to avoid consuming all chunks, -- but if the passed string is longer it will truncate it to specified length pad :: LazyByteString -> Int64 -> Int64 -> Either CryptoError LazyByteString @@ -75,6 +77,7 @@ pad msg len paddedLen encodedLen = smpEncode len -- 8 bytes Int64 encoded length padLen = paddedLen - len - 8 +-- spec: spec/modules/Simplex/Messaging/Crypto/Lazy.md#fastreplicate fastReplicate :: Int64 -> Char -> LazyByteString fastReplicate n c | n <= 0 = LB.empty @@ -102,6 +105,7 @@ splitLen padded where (lenStr, rest) = LB.splitAt 8 padded +-- spec: spec/modules/Simplex/Messaging/Crypto/Lazy.md#auth-tag-placement-prepend-vs-tail -- | NaCl @secret_box@ lazy encrypt with a symmetric 256-bit key and 192-bit nonce. 
-- The resulting string will be bigger than paddedLen by the size of the auth tag (16 bytes). sbEncrypt :: SbKey -> CbNonce -> LazyByteString -> Int64 -> Int64 -> Either CryptoError LazyByteString @@ -148,6 +152,7 @@ sbEncryptTailTagNoPad :: SbKeyNonce -> LazyByteString -> Either CryptoError Lazy sbEncryptTailTagNoPad (SbKey key, CbNonce nonce) msg = LB.fromChunks <$> secretBoxTailTag sbEncryptChunk key nonce msg +-- spec: spec/modules/Simplex/Messaging/Crypto/Lazy.md#sbdecrypttailtag-validity -- | NaCl @secret_box@ decrypt with a symmetric 256-bit key and 192-bit nonce with appended auth tag (more efficient with large files). -- paddedLen should NOT include the tag length, it should be the same number that is passed to sbEncrypt / sbEncryptTailTag. sbDecryptTailTag :: SbKey -> CbNonce -> Int64 -> LazyByteString -> Either CryptoError (Bool, LazyByteString) @@ -226,6 +231,7 @@ sbProcessChunkLazy_ :: (SbState -> ByteString -> (ByteString, SbState)) -> SbSta sbProcessChunkLazy_ = first (LB.fromChunks . reverse) .:. secretBoxLazy_ {-# INLINE sbProcessChunkLazy_ #-} +-- spec: spec/modules/Simplex/Messaging/Crypto/Lazy.md#encrypt-then-mac-asymmetry sbEncryptChunk :: SbState -> ByteString -> (ByteString, SbState) sbEncryptChunk (st, authSt) chunk = let (!c, !st') = XSalsa.combine st chunk diff --git a/src/Simplex/Messaging/Crypto/Ratchet.hs b/src/Simplex/Messaging/Crypto/Ratchet.hs index 1ea7760fd..5f91e728b 100644 --- a/src/Simplex/Messaging/Crypto/Ratchet.hs +++ b/src/Simplex/Messaging/Crypto/Ratchet.hs @@ -465,6 +465,7 @@ data RatchetInitParams = RatchetInitParams } deriving (Show) +-- spec: spec/modules/Simplex/Messaging/Crypto/Ratchet.md#pq-x3dh-key-agreement -- this is used by the peer joining the connection pqX3dhSnd :: DhAlgorithm a => PrivateKey a -> PrivateKey a -> Maybe APrivRKEMParams -> E2ERatchetParams 'RKSProposed a -> Either CryptoError (RatchetInitParams, Maybe KEMKeyPair) -- 3. replied 2. 
received @@ -588,6 +589,7 @@ data SkippedMsgDiff | SMDRemove HeaderKey Word32 | SMDAdd SkippedMsgKeys +-- spec: spec/modules/Simplex/Messaging/Crypto/Ratchet.md#skipped-message-keys -- | this function is only used in tests to apply changes in skipped messages, -- in the agent the diff is persisted, and the whole state is loaded for the next message. applySMDiff :: SkippedMsgKeys -> SkippedMsgDiff -> SkippedMsgKeys @@ -712,6 +714,7 @@ data MsgHeader a = MsgHeader } deriving (Show) +-- spec: spec/modules/Simplex/Messaging/Crypto/Ratchet.md#header-encryption-and-padding -- to allow extension without increasing the size, the actual header length is: -- 69 = 2 (original size) + 2 + 1+56 (Curve448) + 4 + 4 -- The exact size is 2288, added reserve @@ -763,6 +766,7 @@ encodeLarge v s | v >= pqRatchetE2EEncryptVersion = smpEncode $ Large s | otherwise = smpEncode s +-- spec: spec/modules/Simplex/Messaging/Crypto/Ratchet.md#largep--backward-compatible-length-prefix-parsing -- This parser relies on the fact that header cannot be shorter than 32 bytes (it is ~69 bytes without PQ KEM), -- therefore if the first byte is less or equal to 31 (x1F), then we have 2 byte-length limited to 8191. -- This allows upgrading the current version in one message. 
@@ -788,6 +792,7 @@ encRatchetMessageP = do (emAuthTag, Tail emBody) <- smpP pure EncRatchetMessage {emHeader, emBody, emAuthTag} +-- spec: spec/modules/Simplex/Messaging/Crypto/Ratchet.md#pqsupport-vs-pqencryption newtype PQEncryption = PQEncryption {enablePQ :: Bool} deriving (Eq, Show) @@ -863,6 +868,7 @@ instance StrEncoding PQSupport where strP = pqEncToSupport <$> strP {-# INLINE strP #-} +-- spec: spec/modules/Simplex/Messaging/Crypto/Ratchet.md#initialkeys data InitialKeys = IKUsePQ -- use PQ keys in contact request and short link data | IKLinkPQ PQSupport -- use PQ keys in short link data only, if PQSupport enabled @@ -991,6 +997,7 @@ type DecryptResult a = (Either CryptoError ByteString, Ratchet a, SkippedMsgDiff maxSkip :: Word32 maxSkip = 512 +-- spec: spec/modules/Simplex/Messaging/Crypto/Ratchet.md#rcdecrypt-flow rcDecrypt :: forall a. (AlgorithmI a, DhAlgorithm a) => @@ -1073,6 +1080,7 @@ rcDecrypt g rc@Ratchet {rcRcv, rcAD = Str rcAD, rcVersion} rcMKSkipped msg' = do rcNHKs = rcNHKs', rcNHKr = rcNHKr' } + -- spec: spec/modules/Simplex/Messaging/Crypto/Ratchet.md#pq-ratchet-step pqRatchetStep :: Ratchet a -> Maybe ARKEMParams -> ExceptT CryptoError IO (Maybe KEMSharedKey, Maybe KEMSharedKey, Maybe RatchetKEM) pqRatchetStep Ratchet {rcKEM, rcEnableKEM = PQEncryption pqEnc, rcVersion = rv} = \case -- received message does not have KEM in header, @@ -1160,6 +1168,7 @@ rcDecrypt g rc@Ratchet {rcRcv, rcAD = Str rcAD, rcVersion} rcMKSkipped msg' = do -- DECRYPT(mk, cipher-text, CONCAT(AD, enc_header)) tryE $ decryptAEAD mk iv (rcAD <> emHeader) emBody emAuthTag +-- spec: spec/modules/Simplex/Messaging/Crypto/Ratchet.md#kdf-functions rootKdf :: (AlgorithmI a, DhAlgorithm a) => RatchetKey -> PublicKey a -> PrivateKey a -> Maybe KEMSharedKey -> (RatchetKey, RatchetKey, Key) rootKdf (RatchetKey rk) k pk kemSecret_ = let dhOut = dhBytes' (dh' k pk) diff --git a/src/Simplex/Messaging/Crypto/SNTRUP761.hs b/src/Simplex/Messaging/Crypto/SNTRUP761.hs index 
839fbc1e7..d5415f829 100644 --- a/src/Simplex/Messaging/Crypto/SNTRUP761.hs +++ b/src/Simplex/Messaging/Crypto/SNTRUP761.hs @@ -2,6 +2,7 @@ {-# LANGUAGE GADTs #-} {-# LANGUAGE LambdaCase #-} +-- spec: spec/modules/Simplex/Messaging/Crypto/SNTRUP761.md module Simplex.Messaging.Crypto.SNTRUP761 ( KEMHybridSecret (..), kcbDecrypt, @@ -22,6 +23,7 @@ import Simplex.Messaging.Crypto.SNTRUP761.Bindings newtype KEMHybridSecret = KEMHybridSecret ScrubbedBytes +-- spec: spec/modules/Simplex/Messaging/Crypto/SNTRUP761.md#kcbencrypt--kcbdecrypt -- | NaCl @crypto_box@ decrypt with a shared hybrid DH + KEM secret and 192-bit nonce. kcbDecrypt :: KEMHybridSecret -> CbNonce -> ByteString -> Either CryptoError ByteString kcbDecrypt (KEMHybridSecret k) = sbDecrypt_ k @@ -30,6 +32,7 @@ kcbDecrypt (KEMHybridSecret k) = sbDecrypt_ k kcbEncrypt :: KEMHybridSecret -> CbNonce -> ByteString -> Int -> Either CryptoError ByteString kcbEncrypt (KEMHybridSecret k) = sbEncrypt_ k +-- spec: spec/modules/Simplex/Messaging/Crypto/SNTRUP761.md#kemhybridsecret kemHybridSecret :: PublicKeyX25519 -> PrivateKeyX25519 -> KEMSharedKey -> KEMHybridSecret kemHybridSecret k pk (KEMSharedKey kem) = let DhSecretX25519 dh = C.dh' k pk diff --git a/src/Simplex/Messaging/Crypto/ShortLink.hs b/src/Simplex/Messaging/Crypto/ShortLink.hs index 013559fd1..12124630f 100644 --- a/src/Simplex/Messaging/Crypto/ShortLink.hs +++ b/src/Simplex/Messaging/Crypto/ShortLink.hs @@ -9,6 +9,7 @@ {-# LANGUAGE TupleSections #-} {-# LANGUAGE TypeApplications #-} +-- spec: spec/modules/Simplex/Messaging/Crypto/ShortLink.md module Simplex.Messaging.Crypto.ShortLink ( contactShortLinkKdf, invShortLinkKdf, @@ -44,6 +45,7 @@ fixedDataPaddedLength = 2008 -- 2048 - 24 (nonce) - 16 (auth tag) userDataPaddedLength :: Int userDataPaddedLength = 13784 -- 13824 - 24 - 16 +-- spec: spec/modules/Simplex/Messaging/Crypto/ShortLink.md#kdf-schemes contactShortLinkKdf :: LinkKey -> (LinkId, C.SbKey) contactShortLinkKdf (LinkKey k) = let (lnkId, sbKey) 
= B.splitAt 24 $ C.hkdf "" k "SimpleXContactLink" 56 @@ -72,6 +74,7 @@ connLinkData vr = \case UserInvLinkData d -> InvitationLinkData vr d UserContactLinkData d -> ContactLinkData vr d +-- spec: spec/modules/Simplex/Messaging/Crypto/ShortLink.md#encodesign encodeSign :: C.PrivateKeyEd25519 -> ByteString -> ByteString encodeSign pk s = smpEncode (C.sign' pk s) <> s @@ -97,6 +100,7 @@ encryptData g k len s = do ct <- liftEitherWith cryptoError $ C.sbEncrypt k nonce s len pure $ EncDataBytes $ smpEncode nonce <> ct +-- spec: spec/modules/Simplex/Messaging/Crypto/ShortLink.md#decryptlinkdata decryptLinkData :: forall c. ConnectionModeI c => LinkKey -> C.SbKey -> QueueLinkData -> Either AgentErrorType (FixedLinkData c, ConnLinkData c) decryptLinkData linkKey k (encFD, encMD) = do (sig1, fd) <- decrypt encFD From 9e3b47a36237b059b65e5f23485caa4d1d00c12e Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Wed, 11 Mar 2026 12:54:56 +0000 Subject: [PATCH 11/61] code refs, additional specs --- spec/modules/README.md | 23 +++++++++++++++ spec/modules/Simplex/Messaging/Crypto.md | 20 ++++++++++--- .../Simplex/Messaging/Crypto/Ratchet.md | 29 ++++++++++++++++++- src/Simplex/Messaging/Crypto.hs | 3 ++ src/Simplex/Messaging/Crypto/Ratchet.hs | 5 ++++ 5 files changed, 75 insertions(+), 5 deletions(-) diff --git a/spec/modules/README.md b/spec/modules/README.md index 9f057b903..ef2b45881 100644 --- a/spec/modules/README.md +++ b/spec/modules/README.md @@ -52,6 +52,21 @@ Things that would surprise a competent Haskell developer reading the code for th - Alternatives considered and rejected - Known limitations and their justification +## Non-obvious threshold + +The guiding principle: **non-obvious state machines and flows require documentation; standard things don't.** + +Document: +- Multi-step protocols and negotiation flows (e.g., KEM propose/accept round-trips) +- Monotonic or irreversible state transitions (e.g., PQ 
support can only be enabled, never disabled) +- Silent error behaviors (e.g., `verify` returns `False` on algorithm mismatch instead of an error) +- Design rationale for non-standard choices (e.g., why byte-reverse a nonce, why hash-then-encrypt for authenticators) + +Do NOT document: +- Standard algorithm properties (e.g., Ed25519 public key derivable from private key) +- Well-known protocol mechanics (e.g., HKDF usage per RFC 5869, deterministic nonce derivation in double ratchet) +- Implementation details that follow directly from the type signatures + ## What NOT to include - **Type signatures** — the code has them @@ -107,6 +122,14 @@ This is valuable — it confirms someone looked and found nothing to document. ## Linking conventions +### Module doc → protocol docs +When a module implements or is governed by a protocol specification in `protocol/`, link to it near the top of the module doc (after the overview). Do not duplicate protocol content — just reference it: +```markdown +**Protocol spec**: [`protocol/pqdr.md`](../../../../protocol/pqdr.md) — Post-quantum resistant augmented double ratchet algorithm. +``` + +This is especially important for modules in transport, protocol, client, server, and agent layers where behavior is defined by the protocol spec rather than being self-evident from the code. + ### Module doc → other module docs Use fully qualified names as link text: ```markdown diff --git a/spec/modules/Simplex/Messaging/Crypto.md b/spec/modules/Simplex/Messaging/Crypto.md index f1c660512..10f074d5b 100644 --- a/spec/modules/Simplex/Messaging/Crypto.md +++ b/spec/modules/Simplex/Messaging/Crypto.md @@ -55,10 +55,6 @@ Both apply `pad`/`unPad` by default. The `NoPad` variants skip padding. The XSalsa20 implementation splits the 24-byte nonce into two 8-byte halves. The first half initializes the cipher state (prepended with 16 zero bytes), the second derives a subkey. 
The first 32 bytes of output become the Poly1305 one-time key (`rs`), then the rest encrypts the message. This is the standard NaCl construction. -## CbAuthenticator - -An authentication scheme that encrypts the SHA-512 hash of the message using crypto_box, rather than the message itself. The result is 80 bytes (64 hash + 16 auth tag). Used for authenticating messages where the content is transmitted separately from the authentication proof. - ## Secret box chains (sbcInit / sbcHkdf) HKDF-based key chains for deriving sequential key+nonce pairs: @@ -77,6 +73,22 @@ All keys are encoded as ASN.1 DER (X.509 SubjectPublicKeyInfo for public, PKCS#8 `GCMIV` constructor is not exported — only `gcmIV :: ByteString -> Either CryptoError GCMIV` is available, which validates that the input is exactly 12 bytes. This prevents construction of invalid IVs. +## verify silently returns False on algorithm mismatch + +`verify :: APublicVerifyKey -> ASignature -> ByteString -> Bool` uses `testEquality` on the algorithm singletons. If the key is Ed25519 but the signature is Ed448 (or vice versa), `testEquality` fails and `verify` returns `False` — no error, no indication of a type mismatch. A correctly-formed signature can "fail" simply because the wrong algorithm key was passed. + +## dh' returns raw DH output — no key derivation + +`dh'` returns the raw X25519/X448 shared point with no hashing or HKDF. Callers must apply their own KDF: [SNTRUP761](./Crypto/SNTRUP761.md) hashes with SHA3-256, the [ratchet](./Crypto/Ratchet.md#kdf-functions) uses HKDF-SHA512. Not all DH libraries behave this way — some hash the output automatically. + +## reverseNonce + +`reverseNonce` creates a "reply" nonce by byte-reversing the original 24-byte nonce. Used for bidirectional communication where both sides need distinct nonces derived from the same starting value. 
The two nonces are guaranteed distinct unless the original is a byte palindrome, which is astronomically unlikely for random 24-byte values. + +## CbAuthenticator + +An authentication scheme that encrypts the SHA-512 hash of the message using crypto_box, rather than the message itself. The result is 80 bytes (64 hash + 16 auth tag). This is the djb-recommended authenticator scheme: it proves knowledge of the shared secret and the message content, without requiring the message to fit in a single crypto_box, and without revealing message content even to someone who compromises the shared key after verification. + ## generateKeyPair is STM Key generation uses `TVar ChaChaDRG` and runs in `STM`, not `IO`. This allows key generation inside `atomically` blocks, which is used extensively in handshake and ratchet initialization code. diff --git a/spec/modules/Simplex/Messaging/Crypto/Ratchet.md b/spec/modules/Simplex/Messaging/Crypto/Ratchet.md index ebbc9c5a6..b26f95ce6 100644 --- a/spec/modules/Simplex/Messaging/Crypto/Ratchet.md +++ b/spec/modules/Simplex/Messaging/Crypto/Ratchet.md @@ -12,6 +12,8 @@ Implements the Signal double ratchet protocol extended with: The ratchet uses X448 (not X25519) for DH operations — `type RatchetX448 = Ratchet 'X448`. +**Protocol spec**: [`protocol/pqdr.md`](../../../../protocol/pqdr.md) — Post-quantum resistant augmented double ratchet algorithm. + ## PQ X3DH key agreement `pqX3dhSnd` / `pqX3dhRcv` perform the extended X3DH: @@ -46,9 +48,13 @@ Each message header carries `msgMaxVersion` (the sender's max supported ratchet `largeP` detects the length-prefix format by peeking at the first byte: if < 32, it's a 2-byte `Large` prefix (new format); otherwise it's a 1-byte prefix (old format). This allows upgrading the header encoding format in a single message without a version bump. +## maxSkip = 512 — DoS protection + +`maxSkip` is a hardcoded constant (not configurable). 
Messages claiming to be more than 512 positions ahead of the current counter are rejected with `CERatchetTooManySkipped`. This prevents an attacker from forcing the receiver to compute and store an unbounded number of skipped message keys. + ## Skipped message keys -When messages arrive out of order, the ratchet computes and stores the message keys for skipped messages (up to `maxSkip = 512`). Skipped keys are stored in a `Map HeaderKey (Map Word32 MessageKey)` — keyed first by header key, then by message number. +When messages arrive out of order, the ratchet computes and stores the message keys for skipped messages (up to `maxSkip`). Skipped keys are stored in a `Map HeaderKey (Map Word32 MessageKey)` — keyed first by header key, then by message number. The `SkippedMsgDiff` type represents changes to the skipped key store as a diff rather than a full replacement — this is persisted to the database, and the full state is loaded for the next message. `applySMDiff` is only used in tests. @@ -61,6 +67,14 @@ Decryption tries three strategies in order: If strategy 1 decrypts the header but the message number isn't in skipped keys, it checks whether this header key corresponds to the current or next ratchet to decide whether to advance. +### decryptSkipped — linear scan through all stored header keys + +`decryptSkipped` iterates through ALL `(HeaderKey, SkippedHdrMsgKeys)` pairs, attempting header decryption with each key. When header decryption succeeds but the message number is NOT in the skipped keys for that header, the result is `SMHeader` — which includes whether the key matches the current ratchet (`rcHKr` → `SameRatchet`) or the next ratchet (`rcNHKr` → `AdvanceRatchet`). This falls through to normal decryption processing rather than producing an error. 
+ +### decryptMessage — ratchet advances even on failure + +`decryptMessage` returns `Either CryptoError ByteString` inside the `ExceptT` monad — a message decryption failure does NOT abort the ratchet state update. The ratchet counter advances (`rcNr + 1`) and chain key updates (`rcCKr'`) regardless of whether the message body decrypts successfully. This preserves ratchet state consistency for retransmission and error recovery. + ## rcEncryptHeader — separated from rcEncryptMsg Encryption is split into two steps: `rcEncryptHeader` produces a `MsgEncryptKey` (containing the encrypted header and message key), then `rcEncryptMsg` uses that key to encrypt the message body. This separation allows the ratchet state to be updated (persisted) before the message is encrypted, which is important for crash recovery — if the process crashes after encrypting but before sending, the ratchet state must already reflect the advanced counter. @@ -80,6 +94,19 @@ Two distinct newtypes with identical structure (`Bool` wrapper): - `PQSupport`: whether PQ **can** be used (determines header padding size, cannot be disabled once enabled) - `PQEncryption`: whether PQ **is** being used for the current send/receive ratchet +### pqEnableSupport is monotonic + +`pqEnableSupport v sup enc = PQSupport $ sup || (v >= pqRatchetE2EEncryptVersion && enc)`. The `||` means once PQ support is `True`, it stays `True` regardless of subsequent messages. PQ encryption (usage) can be toggled per-message; PQ support (capability / header size) only ratchets up. This prevents the larger header format from being downgraded once negotiated. + +## replyKEM_ — two-step KEM negotiation + +KEM establishment requires two message round-trips, as described in the [PQDR KEM state machine](../../../../protocol/pqdr.md#kem-state-machine): + +1. 
**Propose**: if the sender has no KEM in their header but the replier supports PQ at sufficient version, the replier includes a KEM proposal (`RKParamsProposed` — their encapsulation public key) +2. **Accept**: if the sender proposed KEM, the replier accepts by encapsulating against the proposed key and including the ciphertext + their own new encapsulation key (`RKParamsAccepted`) + +After acceptance, both sides have a shared KEM secret that is folded into the root KDF. Subsequent ratchet steps continue the KEM exchange with fresh keypairs on each side. + ## Error semantics - `CERatchetEarlierMessage n`: message number is `n` positions before the next expected (already processed or skipped-and-consumed) diff --git a/src/Simplex/Messaging/Crypto.hs b/src/Simplex/Messaging/Crypto.hs index d283ab899..79a9b593c 100644 --- a/src/Simplex/Messaging/Crypto.hs +++ b/src/Simplex/Messaging/Crypto.hs @@ -1285,11 +1285,13 @@ verify' (PublicKeyEd25519 k) (SignatureEd25519 sig) msg = Ed25519.verify k msg s verify' (PublicKeyEd448 k) (SignatureEd448 sig) msg = Ed448.verify k msg sig {-# INLINE verify' #-} +-- spec: spec/modules/Simplex/Messaging/Crypto.md#verify-silently-returns-false-on-algorithm-mismatch verify :: APublicVerifyKey -> ASignature -> ByteString -> Bool verify (APublicVerifyKey a k) (ASignature a' sig) msg = case testEquality a a' of Just Refl -> verify' k sig msg _ -> False +-- spec: spec/modules/Simplex/Messaging/Crypto.md#dh-returns-raw-dh-output--no-key-derivation dh' :: DhAlgorithm a => PublicKey a -> PrivateKey a -> DhSecret a dh' (PublicKeyX25519 k) (PrivateKeyX25519 pk) = DhSecretX25519 $ X25519.dh k pk dh' (PublicKeyX448 k) (PrivateKeyX448 pk) = DhSecretX448 $ X448.dh k pk @@ -1418,6 +1420,7 @@ randomCbNonce = fmap CryptoBoxNonce . 
randomBytes 24 randomBytes :: Int -> TVar ChaChaDRG -> STM ByteString randomBytes n gVar = stateTVar gVar $ randomBytesGenerate n +-- spec: spec/modules/Simplex/Messaging/Crypto.md#reversenonce reverseNonce :: CbNonce -> CbNonce reverseNonce (CryptoBoxNonce s) = CryptoBoxNonce (B.reverse s) diff --git a/src/Simplex/Messaging/Crypto/Ratchet.hs b/src/Simplex/Messaging/Crypto/Ratchet.hs index 5f91e728b..02ddd6a68 100644 --- a/src/Simplex/Messaging/Crypto/Ratchet.hs +++ b/src/Simplex/Messaging/Crypto/Ratchet.hs @@ -840,9 +840,11 @@ pqEncToSupport (PQEncryption pq) = PQSupport pq pqSupportAnd :: PQSupport -> PQSupport -> PQSupport pqSupportAnd (PQSupport s1) (PQSupport s2) = PQSupport $ s1 && s2 +-- spec: spec/modules/Simplex/Messaging/Crypto/Ratchet.md#pqenablesupport-is-monotonic pqEnableSupport :: VersionE2E -> PQSupport -> PQEncryption -> PQSupport pqEnableSupport v (PQSupport sup) (PQEncryption enc) = PQSupport $ sup || (v >= pqRatchetE2EEncryptVersion && enc) +-- spec: spec/modules/Simplex/Messaging/Crypto/Ratchet.md#replykem_--two-step-kem-negotiation replyKEM_ :: VersionE2E -> Maybe (RKEMParams 'RKSProposed) -> PQSupport -> Maybe AUseKEM replyKEM_ v kem_ = \case PQSupportOn | v >= pqRatchetE2EEncryptVersion -> Just $ case kem_ of @@ -994,6 +996,7 @@ data RatchetStep = AdvanceRatchet | SameRatchet type DecryptResult a = (Either CryptoError ByteString, Ratchet a, SkippedMsgDiff) +-- spec: spec/modules/Simplex/Messaging/Crypto/Ratchet.md#maxskip--512--dos-protection maxSkip :: Word32 maxSkip = 512 @@ -1131,6 +1134,7 @@ rcDecrypt g rc@Ratchet {rcRcv, rcAD = Str rcAD, rcVersion} rcMKSkipped msg' = do let (ck', mk, iv, _) = chainKdf ck mks' = M.insert msgNs (MessageKey mk iv) mks in advanceRcvRatchet (n - 1) ck' (msgNs + 1) mks' + -- spec: spec/modules/Simplex/Messaging/Crypto/Ratchet.md#decryptskipped--linear-scan-through-all-stored-header-keys decryptSkipped :: EncMessageHeader -> EncRatchetMessage -> ExceptT CryptoError IO (SkippedMessage a) decryptSkipped encHdr 
encMsg = tryDecryptSkipped SMNone $ M.assocs rcMKSkipped where @@ -1163,6 +1167,7 @@ rcDecrypt g rc@Ratchet {rcRcv, rcAD = Str rcAD, rcVersion} rcMKSkipped msg' = do decryptHeader k EncMessageHeader {ehVersion, ehBody, ehAuthTag, ehIV} = do header <- decryptAEAD k ehIV rcAD ehBody ehAuthTag `catchE` \_ -> throwE CERatchetHeader parseE' CryptoHeaderError (msgHeaderP ehVersion) header + -- spec: spec/modules/Simplex/Messaging/Crypto/Ratchet.md#decryptmessage--ratchet-advances-even-on-failure decryptMessage :: MessageKey -> EncRatchetMessage -> ExceptT CryptoError IO (Either CryptoError ByteString) decryptMessage (MessageKey mk iv) EncRatchetMessage {emHeader, emBody, emAuthTag} = -- DECRYPT(mk, cipher-text, CONCAT(AD, enc_header)) From 35d4065f325373ca883b9388d826b344a14b8cd9 Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Wed, 11 Mar 2026 15:32:02 +0000 Subject: [PATCH 12/61] specs for transport --- spec/TOPICS.md | 6 + .../modules/Simplex/FileTransfer/Transport.md | 23 ++++ .../Messaging/Notifications/Transport.md | 36 ++++++ spec/modules/Simplex/Messaging/Transport.md | 114 ++++++++++++++++++ .../Simplex/Messaging/Transport/Buffer.md | 17 +++ .../Simplex/Messaging/Transport/Client.md | 23 ++++ .../Messaging/Transport/Credentials.md | 13 ++ .../Simplex/Messaging/Transport/HTTP2.md | 13 ++ .../Messaging/Transport/HTTP2/Client.md | 19 +++ .../Simplex/Messaging/Transport/HTTP2/File.md | 7 ++ .../Messaging/Transport/HTTP2/Server.md | 15 +++ .../Simplex/Messaging/Transport/KeepAlive.md | 11 ++ .../Simplex/Messaging/Transport/Server.md | 33 +++++ .../Simplex/Messaging/Transport/Shared.md | 28 +++++ .../Simplex/Messaging/Transport/WebSockets.md | 15 +++ src/Simplex/FileTransfer/Transport.hs | 2 + src/Simplex/Messaging/Transport.hs | 6 + .../Messaging/Transport/Credentials.hs | 1 + .../Messaging/Transport/HTTP2/Client.hs | 1 + src/Simplex/Messaging/Transport/Server.hs | 2 + 
src/Simplex/Messaging/Transport/Shared.hs | 1 + src/Simplex/Messaging/Transport/WebSockets.hs | 1 + 22 files changed, 387 insertions(+) create mode 100644 spec/modules/Simplex/FileTransfer/Transport.md create mode 100644 spec/modules/Simplex/Messaging/Notifications/Transport.md create mode 100644 spec/modules/Simplex/Messaging/Transport.md create mode 100644 spec/modules/Simplex/Messaging/Transport/Buffer.md create mode 100644 spec/modules/Simplex/Messaging/Transport/Client.md create mode 100644 spec/modules/Simplex/Messaging/Transport/Credentials.md create mode 100644 spec/modules/Simplex/Messaging/Transport/HTTP2.md create mode 100644 spec/modules/Simplex/Messaging/Transport/HTTP2/Client.md create mode 100644 spec/modules/Simplex/Messaging/Transport/HTTP2/File.md create mode 100644 spec/modules/Simplex/Messaging/Transport/HTTP2/Server.md create mode 100644 spec/modules/Simplex/Messaging/Transport/KeepAlive.md create mode 100644 spec/modules/Simplex/Messaging/Transport/Server.md create mode 100644 spec/modules/Simplex/Messaging/Transport/Shared.md create mode 100644 spec/modules/Simplex/Messaging/Transport/WebSockets.md diff --git a/spec/TOPICS.md b/spec/TOPICS.md index a8eafc1a1..d7a627254 100644 --- a/spec/TOPICS.md +++ b/spec/TOPICS.md @@ -7,3 +7,9 @@ - **Padding schemes**: Three different padding formats across the codebase — Crypto.hs uses 2-byte Word16 length prefix (max ~65KB), Crypto/Lazy.hs uses 8-byte Int64 prefix (file-sized), and both use '#' fill character. Ratchet header padding uses fixed sizes (88 or 2310 bytes). All use `pad`/`unPad` but with incompatible formats. The relationship between padding, encryption, and message size limits spans Crypto, Lazy, Ratchet, and the protocol layer. - **NaCl construction variants**: crypto_box, secret_box, and KEM hybrid secret all use the same XSalsa20+Poly1305 core (Crypto.hs `xSalsa20`), but with different key sources (DH, symmetric, SHA3_256(DH||KEM)). 
The lazy streaming variant (Lazy.hs) adds prepend-tag vs tail-tag placement. File.hs wraps lazy streaming with handle-based I/O. Full picture requires reading Crypto.hs, Lazy.hs, File.hs, and SNTRUP761.hs together. + +- **Transport encryption layering**: Three encryption layers overlap — TLS (Transport.hs), optional block encryption via sbcHkdf chains (Transport.hs tPutBlock/tGetBlock), and SMP protocol-level encryption. Block encryption is disabled for proxy connections (already encrypted), and absent for NTF protocol. The interaction of these layers with proxy version downgrade logic spans Transport.hs, Client.hs, and the SMP proxy module. + +- **Certificate chain trust model**: ChainCertificates (Shared.hs) defines 0–4 cert chain semantics, used by both Client.hs (validateCertificateChain) and Server.hs (validateClientCertificate, SNI credential switching). The 4-length case skipping index 2 (operator cert) and the FQHN-disabled x509validate are decisions that span the entire transport security model. + +- **Handshake protocol family**: SMP (Transport.hs), NTF (Notifications/Transport.hs), and XFTP (FileTransfer/Transport.hs) all have handshake protocols with the same structure (version negotiation + session binding + key exchange) but different feature sets. NTF is a strict subset. XFTP doesn't use the TLS handshake at all (HTTP2 layer). The shared types (THandle, THandleParams, THandleAuth) mean changes to the handshake infrastructure affect all three protocols. diff --git a/spec/modules/Simplex/FileTransfer/Transport.md b/spec/modules/Simplex/FileTransfer/Transport.md new file mode 100644 index 000000000..6bad5455c --- /dev/null +++ b/spec/modules/Simplex/FileTransfer/Transport.md @@ -0,0 +1,23 @@ +# Simplex.FileTransfer.Transport + +> XFTP protocol types, version negotiation, and encrypted file streaming with integrity verification. 
+ +**Source**: [`FileTransfer/Transport.hs`](../../../../src/Simplex/FileTransfer/Transport.hs) + +## xftpClientHandshakeStub — XFTP doesn't use TLS handshake + +`xftpClientHandshakeStub` always fails with `throwE TEVersion`. The source comment states: "XFTP protocol does not use this handshake method." The XFTP handshake is performed at the HTTP/2 layer — `XFTPServerHandshake` and `XFTPClientHandshake` are sent as HTTP/2 request/response bodies (see `FileTransfer/Client.hs` and `FileTransfer/Server.hs`). + +## receiveSbFile — constant-time auth tag verification + +`receiveSbFile` validates the authentication tag using `BA.constEq` (constant-time byte comparison). The auth tag is collected from the stream after all file data — if the file data ends mid-chunk, the remaining bytes of that chunk are used first, and a follow-up read provides the rest of the tag if needed. + +## receiveFile_ — two-phase integrity verification + +File reception has two verification phases: +1. **During receive**: either size checking (plaintext via `hReceiveFile`) or auth tag validation (encrypted via `receiveSbFile`) +2. **After receive**: `LC.sha256Hash` of the entire received file is compared to `chunkDigest` + +## sendEncFile — auth tag appended after all chunks + +`sendEncFile` streams encrypted chunks via `LC.sbEncryptChunk`, then sends `LC.sbAuth sbState` (the authentication tag) as a final frame when the remaining size reaches zero. diff --git a/spec/modules/Simplex/Messaging/Notifications/Transport.md b/spec/modules/Simplex/Messaging/Notifications/Transport.md new file mode 100644 index 000000000..dd4564738 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Notifications/Transport.md @@ -0,0 +1,36 @@ +# Simplex.Messaging.Notifications.Transport + +> Notification Router Protocol transport: manages push notification subscriptions between client and NTF Router. 
+
+**Source**: [`Notifications/Transport.hs`](../../../../../src/Simplex/Messaging/Notifications/Transport.hs)
+
+**Protocol spec**: [`protocol/push-notifications.md`](../../../../../protocol/push-notifications.md) — SimpleX Notification Router protocol.
+
+## Overview
+
+This module implements the transport layer for the **Notification Router Protocol**. Per the protocol spec: "To manage notification subscriptions to SMP routers, SimpleX Notification Router provides an RPC protocol with a similar design to SimpleX Messaging Protocol router."
+
+The protocol spec diagram shows three separate protocols in the notification flow:
+1. **Notification Router Protocol** (this module): client ↔ SimpleX Notification Router — subscription management
+2. **SMP protocol**: SMP Router → SimpleX Notifications Subscriber — notification signals
+3. **Push provider** (e.g., APN): SimpleX Push Router → device — per the spec: "the notifications are e2e encrypted between SimpleX Notification Router and the user's device"
+
+## Differences from SMP transport
+
+The NTF protocol reuses SMP's transport infrastructure but with reduced parameters:
+
+| Property | SMP | NTF |
+|----------|-----|-----|
+| Block size | 16384 | 512 |
+| Block encryption | Yes (v11+) | No (`encryptBlock = Nothing`) |
+| Service certificates | Yes (v16+) | No (`serviceAuth = False`) |
+| Version range | 6–19 | 1–3 |
+| Handshake messages | 2–3 | 2 |
+
+## Same ALPN/legacy fallback pattern as SMP
+
+`ntfServerHandshake` uses the same pattern as `smpServerHandshake`: if ALPN is not negotiated (`getSessionALPN` returns `Nothing`), the server offers only `legacyServerNTFVRange` (v1 only).
+
+## NTF handshake uses SMP shared types
+
+The handshake reuses SMP's `THandle`, `THandleParams`, `THandleAuth` types. The `encodeAuthEncryptCmds` and `authEncryptCmdsP` helper functions are defined locally in this module (with NTF-specific version thresholds).
NTF never sets `sessSecret` / `sessSecret'`, `peerClientService`, or `clientService` — these are always `Nothing`. diff --git a/spec/modules/Simplex/Messaging/Transport.md b/spec/modules/Simplex/Messaging/Transport.md new file mode 100644 index 000000000..f88188792 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Transport.md @@ -0,0 +1,114 @@ +# Simplex.Messaging.Transport + +> SMP transport layer: TLS connection management, SMP handshake protocol, block encryption, version negotiation. + +**Source**: [`Transport.hs`](../../../../src/Simplex/Messaging/Transport.hs) + +**Protocol spec**: [`protocol/simplex-messaging.md` — Transport connection](../../../../protocol/simplex-messaging.md#transport-connection-with-the-smp-router) — SMP encrypted transport, handshake syntax, certificate chain requirements. + +## Overview + +This is the core transport module. It defines: +- The `Transport` typeclass abstracting over TLS and WebSocket connections +- The SMP handshake protocol (server and client sides) +- Optional block encryption using HKDF-derived symmetric key chains (v11+) +- Version negotiation with backward-compatible extensions + +Per the protocol spec: "Each transport block has a fixed size of 16384 bytes for traffic uniformity." The `sessionIdentifier` field uses tls-unique channel binding (RFC 5929) — "it should be included in authorized part of all SMP transmissions sent in this transport connection." + +## SMP version 13 is missing + +The version history jumps from 12 (`blockedEntitySMPVersion`) to 14 (`proxyServerHandshakeSMPVersion`). Version 13 was skipped. + +## proxiedSMPRelayVersion — anti-fingerprinting cap + +`proxiedSMPRelayVersion = 18`, one below `currentClientSMPRelayVersion = 19`. The code comment states: "SMP proxy sets it to lower than its current version to prevent client version fingerprinting by the destination relays when clients upgrade at different times." 
+ +In practice (Server.hs), the SMP proxy uses `proxiedSMPRelayVRange` to cap the destination relay's version range in the `PKEY` response sent to the client, so the client sees a capped version range rather than the relay's actual range. + +## withTlsUnique — different API calls yield same value + +`withTlsUnique` extracts the tls-unique channel binding (RFC 5929) using a type-level dispatch: +- **Server** (`STServer`): `T.getPeerFinished` — the peer's (client's) Finished message +- **Client** (`STClient`): `T.getFinished` — own (client's) Finished message + +Both calls yield the client's Finished message. If the result is `Nothing`, the connection is closed immediately (`closeTLS cxt >> ioe_EOF`). + +## defaultSupportedParams vs defaultSupportedParamsHTTPS + +Two TLS parameter sets: + +- **`defaultSupportedParams`**: ChaCha20-Poly1305 ciphers only, Ed448/Ed25519 signatures only, X448/X25519 groups. Per the protocol spec: "TLS_CHACHA20_POLY1305_SHA256 cipher suite, ed25519 EdDSA algorithms for signatures, x25519 ECDHE groups for key exchange." +- **`defaultSupportedParamsHTTPS`**: extends `defaultSupportedParams` with `ciphersuite_strong`, additional groups, and additional hash/signature combinations. The source comment says: "A selection of extra parameters to accomodate browser chains." + +In the SMP server (Server.hs), when HTTP credentials are configured, `defaultSupportedParamsHTTPS` is used for all connections on that port (not selected per-connection). When no HTTP credentials are configured, `defaultSupportedParams` is used. + +## SMP handshake flow + +Per the [protocol spec](../../../../protocol/simplex-messaging.md#transport-handshake), the handshake is a two-message exchange (three if service certs are used): + +1. **Server → Client**: `paddedRouterHello` containing `smpVersionRange`, `sessionIdentifier` (tls-unique), and `routerCertKey` (certificate chain + X25519 key signed by the server's certificate) +2. 
**Client → Server**: `paddedClientHello` containing agreed `smpVersion`, `keyHash` (router identity — CA certificate fingerprint), optional `clientKey`, `proxyRouter` flag, and optional `clientService` +3. **Server → Client** (service only): `paddedRouterHandshakeResponse` containing assigned `serviceId` or `handshakeError` + +The client verifies `sessionIdentifier` matches its own tls-unique (`when (sessionId /= sessId) $ throwE TEBadSession`). The server verifies `keyHash` matches its CA fingerprint (`when (keyHash /= kh) $ throwE $ TEHandshake IDENTITY`). + +Per the protocol spec: "For TLS transport client should assert that sessionIdentifier is equal to tls-unique channel binding defined in RFC 5929." + +### legacyServerSMPRelayVRange when no ALPN + +If ALPN is not negotiated (`getSessionALPN c` returns `Nothing`), the server offers `legacyServerSMPRelayVRange` (v6 only) instead of the full version range. Per the protocol spec: "If the client does not confirm this protocol name, the router would fall back to v6 of SMP protocol." The spec notes: "This is added to allow support of older clients without breaking backward compatibility and to extend or modify handshake syntax." + +### Service certificate handshake extension + +When `clientService` is present in the client handshake, the server performs additional verification: +- The TLS client certificate chain must exactly match the certificate chain in the handshake message (`getPeerCertChain c == cc`) +- The signed X25519 public key is verified against the leaf certificate's key (`getCertVerifyKey leafCert` then `C.verifyX509`) +- On success, the server sends `SMPServerHandshakeResponse` with a `serviceId` +- On failure, the server sends `SMPServerHandshakeError` before raising the error + +Per the protocol spec (v16+): "`clientService` provides long-term service client certificate for high-volume services using SMP router (chat relays, notification routers, high traffic bots). 
The router responds with a third handshake message containing the assigned service ID." + +The client only includes service credentials when `v >= serviceCertsSMPVersion && certificateSent c` (the TLS client certificate was actually sent). + +## tPutBlock / tGetBlock — optional block encryption + +When `encryptBlock` is set, transport blocks are encrypted before being sent over TLS: + +- **Send**: `atomically $ stateTVar sndKey C.sbcHkdf` advances the chain key and returns `(SbKey, CbNonce)`; the block is encrypted with `C.sbEncrypt` +- **Receive**: same pattern with `rcvKey` and `C.sbDecrypt` + +The chain keys are initialized from `C.sbcInit sessionId secret` where `sessionId` is the tls-unique value and `secret` is the session DH shared secret. + +The code comment on `proxyServer` flag states: "This property, if True, disables additional transport encrytion inside TLS. (Proxy server connection already has additional encryption, so this layer is not needed there)." The protocol spec confirms: "`proxyRouter` flag (v14+) disables additional transport encryption inside TLS for proxy connections, since proxy router connection already has additional encryption." + +The protocol spec version history (v11) describes this as "additional encryption of transport blocks with forward secrecy." + +## smpTHandleClient — chain key swap + +`smpTHandleClient` applies `swap` to the chain key pair before creating `TSbChainKeys`. The code comment states: "swap is needed to use client's sndKey as server's rcvKey and vice versa." 
+
+## Proxy version downgrade logic
+
+When the proxy connects to a destination relay older than v14 (`proxyServerHandshakeSMPVersion`), the client-side handshake caps the version range:
+
+```
+if proxyServer && maxVersion smpVersionRange < proxyServerHandshakeSMPVersion
+  then vRange {maxVersion = max (minVersion vRange) deletedEventSMPVersion}
+```
+
+The code comment explains: "Transport encryption between proxy and destination breaks clients with version 10 or earlier, because of a larger message size (see maxMessageLength)." The cap at `deletedEventSMPVersion` (v10) ensures transport encryption (v11+) is not negotiated with older relays.
+
+The comment also notes: "Prior to version v6.3 the version between proxy and destination was capped at 8, by mistake, which also disables transport encryption and the latest features."
+
+## forceCertChain
+
+`forceCertChain` forces evaluation of the certificate chain and signed key via `length (show cc) `seq` show signedKey `seq` cert`. Introduced in commit 9e7e0d10 ("smp-server: conserve resources"), sub-bullet "transport: force auth params, remove async wrapper" — part of a commit that adds strictness annotations throughout (`bang more thunks`, `strict`).
+
+## smpTHandle — version 0 bootstrap
+
+`smpTHandle` creates a `THandle` with version 0, no auth, and no block encryption. This handle is used for the handshake exchange itself (`sendHandshake`/`getHandshake`). After the handshake completes, `smpTHandle_` creates the real handle with the negotiated version, auth, and encryption parameters.
+
+## getHandshake — forward-compatible parsing
+
+The code comment states: "ignores tail bytes to allow future extensions." The protocol spec confirms: "`ignoredPart` in handshake allows to add additional parameters in handshake without changing protocol version — the client and routers must ignore any extra bytes within the original block length."
diff --git a/spec/modules/Simplex/Messaging/Transport/Buffer.md b/spec/modules/Simplex/Messaging/Transport/Buffer.md new file mode 100644 index 000000000..6b1edf9fd --- /dev/null +++ b/spec/modules/Simplex/Messaging/Transport/Buffer.md @@ -0,0 +1,17 @@ +# Simplex.Messaging.Transport.Buffer + +> Buffered TLS reading with TMVar-based concurrency lock. + +**Source**: [`Transport/Buffer.hs`](../../../../../src/Simplex/Messaging/Transport/Buffer.hs) + +## TBuffer — concurrent read safety via getLock + +`TBuffer` uses a `TMVar ()` as a mutex (`getLock`). `getBuffered` acquires the lock via `withBufferLock`, then loops and accumulates bytes until the requested count is reached. + +## getBuffered — first chunk has no timeout + +`getBuffered` reads the first chunk via `getChunk` (no timeout), but applies `withTimedErr t_` (the transport timeout) to subsequent chunks. + +## getLnBuffered — test only + +The source comment states: "This function is only used in test and needs to be improved before it can be used in production, it will never complete if TLS connection is closed before there is newline." diff --git a/spec/modules/Simplex/Messaging/Transport/Client.md b/spec/modules/Simplex/Messaging/Transport/Client.md new file mode 100644 index 000000000..bdc1cdb82 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Transport/Client.md @@ -0,0 +1,23 @@ +# Simplex.Messaging.Transport.Client + +> TLS client connection setup: TCP/SOCKS5 connection, TLS handshake, certificate validation, host types. + +**Source**: [`Transport/Client.hs`](../../../../../src/Simplex/Messaging/Transport/Client.hs) + +## ConnectionHandle — three-stage cleanup + +`ConnectionHandle` has three constructors: `CHSocket` (raw socket), `CHContext` (TLS context), `CHTransport` (transport connection). An `IORef` holds the current handle, updated by `set` on each successful transition. The `E.bracket` cleanup function tears down the connection at whatever stage it reached. 
+ +## SocksIsolateByAuth + +`SocksIsolateByAuth` is the default SOCKS authentication mode. When active, [Simplex.Messaging.Client](../Client.md) generates SOCKS credentials (`SocksCredentials sessionUsername ""`) where `sessionUsername` is `B64.encode $ C.sha256Hash $ bshow userId <> ...` with additional components based on `sessionMode` (`TSMUser`, `TSMSession`, `TSMServer`, `TSMEntity`). + +The three modes defined here: `SocksAuthUsername` (explicit credentials), `SocksAuthNull` (no auth, `@` prefix), `SocksIsolateByAuth` (empty string — credentials generated by the caller). + +## validateCertificateChain + +Validation checks the SHA-256 fingerprint of the identity certificate (extracted via `chainIdCaCerts` — see [Shared.md](./Shared.md#chainidcacerts--certificate-chain-semantics)) against the key hash. If the fingerprint doesn't match, the chain is rejected with `UnknownCA`. If the fingerprint matches, standard X.509 validation is performed using the CA certificate as trust anchor. + +## No TLS timeout for client connections + +The code comment states: "No TLS timeout to avoid failing connections via SOCKS." `transportTimeout` is set to `Nothing` for all client connections via `clientTransportConfig`. diff --git a/spec/modules/Simplex/Messaging/Transport/Credentials.md b/spec/modules/Simplex/Messaging/Transport/Credentials.md new file mode 100644 index 000000000..9aefa53ae --- /dev/null +++ b/spec/modules/Simplex/Messaging/Transport/Credentials.md @@ -0,0 +1,13 @@ +# Simplex.Messaging.Transport.Credentials + +> Certificate generation for transport layer: Ed25519 key pairs, X.509 signing, TLS credential extraction. + +**Source**: [`Transport/Credentials.hs`](../../../../../src/Simplex/Messaging/Transport/Credentials.hs) + +## genCredentials — nanosecond stripping + +`genCredentials` zeroes out nanoseconds from the current time before creating the certificate validity period: `todNSec = 0`. 
The source comment explains: "remove nanoseconds from time - certificate encoding/decoding removes them." + +## tlsCredentials — root fingerprint from last credential + +`tlsCredentials` extracts the SHA-256 fingerprint from `L.last credentials` (the root/CA certificate), and the private key from `L.head credentials` (the leaf). The returned `KeyHash` wraps this root fingerprint. diff --git a/spec/modules/Simplex/Messaging/Transport/HTTP2.md b/spec/modules/Simplex/Messaging/Transport/HTTP2.md new file mode 100644 index 000000000..ad4b5de57 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Transport/HTTP2.md @@ -0,0 +1,13 @@ +# Simplex.Messaging.Transport.HTTP2 + +> Bridges TLS transport to HTTP/2 configuration, buffer management, and body reading. + +**Source**: [`Transport/HTTP2.hs`](../../../../../src/Simplex/Messaging/Transport/HTTP2.hs) + +## allocHTTP2Config — manual buffer allocation + +`allocHTTP2Config` uses `mallocBytes` to allocate a write buffer (`Ptr Word8`) for the `http2` package's `Config`. The config bridges TLS to HTTP/2 by passing `cPut c` and `cGet c` from the `Transport` typeclass into the HTTP/2 config's `confSendAll` and `confReadN`. + +## http2TLSParams + +`http2TLSParams` uses `ciphersuite_strong_det` (from `Network.TLS.Extra`), distinct from the `ciphersuite_strong` used in `defaultSupportedParamsHTTPS`. This is the default `suportedTLSParams` in the HTTP/2 client configuration. diff --git a/spec/modules/Simplex/Messaging/Transport/HTTP2/Client.md b/spec/modules/Simplex/Messaging/Transport/HTTP2/Client.md new file mode 100644 index 000000000..0ff0e0af8 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Transport/HTTP2/Client.md @@ -0,0 +1,19 @@ +# Simplex.Messaging.Transport.HTTP2.Client + +> Thread-safe HTTP/2 client with request queuing, connection lifecycle, and timeout management. 
+ +**Source**: [`Transport/HTTP2/Client.hs`](../../../../../../src/Simplex/Messaging/Transport/HTTP2/Client.hs) + +## sendRequest vs sendRequestDirect — thread safety + +`sendRequest` is thread-safe: it puts the request on a `TBQueue` and waits for the response via a `TMVar`. A single background thread (`process`) dequeues and sends requests sequentially through the HTTP/2 session. + +`sendRequestDirect` bypasses the queue and calls `sendReq` directly. The source comment warns: "this function should not be used until HTTP2 is thread safe, use sendRequest." + +## attachHTTP2Client — runs on both client and server TLS + +The source comment states: "HTTP2 client can be run on both client and server TLS connections." `attachHTTP2Client` takes a `TLS p` where `p` can be `TClient` or `TServer`, allowing an HTTP/2 client session to run on an existing server-side TLS connection. + +## Connection timeout and async lifecycle + +`getVerifiedHTTP2ClientWith` starts the HTTP/2 session in an `async` and waits up to `connTimeout` for the session to establish (signal via `TMVar`). If the timeout fires, the async is cancelled. If the session establishes successfully, the `action` field holds the async handle — `closeHTTP2Client` cancels it with `uninterruptibleCancel`. diff --git a/spec/modules/Simplex/Messaging/Transport/HTTP2/File.md b/spec/modules/Simplex/Messaging/Transport/HTTP2/File.md new file mode 100644 index 000000000..8d03d8239 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Transport/HTTP2/File.md @@ -0,0 +1,7 @@ +# Simplex.Messaging.Transport.HTTP2.File + +> File transfer over HTTP/2: chunked send/receive with size tracking. + +**Source**: [`Transport/HTTP2/File.hs`](../../../../../../src/Simplex/Messaging/Transport/HTTP2/File.hs) + +No non-obvious behavior. See source. 
diff --git a/spec/modules/Simplex/Messaging/Transport/HTTP2/Server.md b/spec/modules/Simplex/Messaging/Transport/HTTP2/Server.md new file mode 100644 index 000000000..931f89bca --- /dev/null +++ b/spec/modules/Simplex/Messaging/Transport/HTTP2/Server.md @@ -0,0 +1,15 @@ +# Simplex.Messaging.Transport.HTTP2.Server + +> HTTP/2 server with inactive client expiration. The single-queue server is for testing only. + +**Source**: [`Transport/HTTP2/Server.hs`](../../../../../../src/Simplex/Messaging/Transport/HTTP2/Server.hs) + +## Inactive client expiration + +`runHTTP2ServerWith_` tracks last activity per client via a `TVar SystemTime`. A background thread (`expireInactiveClient`) periodically checks whether the client has been inactive beyond the `ExpirationConfig` threshold. If so, it calls `closeConnection tls`. + +The activity timestamp is updated on every HTTP/2 request (before dispatching to the handler). + +## getHTTP2Server — testing only + +The source comment states: "This server is for testing only, it processes all requests in a single queue." `getHTTP2Server` puts all requests on a single `TBQueue`. `runHTTP2Server` dispatches requests directly via `H.run` without queueing. diff --git a/spec/modules/Simplex/Messaging/Transport/KeepAlive.md b/spec/modules/Simplex/Messaging/Transport/KeepAlive.md new file mode 100644 index 000000000..1f943bafa --- /dev/null +++ b/spec/modules/Simplex/Messaging/Transport/KeepAlive.md @@ -0,0 +1,11 @@ +# Simplex.Messaging.Transport.KeepAlive + +> Platform-specific TCP keepalive configuration via CApiFFI. + +**Source**: [`Transport/KeepAlive.hs`](../../../../../src/Simplex/Messaging/Transport/KeepAlive.hs) + +## Platform-specific TCP_KEEPIDLE + +macOS uses `TCP_KEEPALIVE` instead of `TCP_KEEPIDLE`. The CPP conditional imports the correct constant at compile time via `foreign import capi`. Windows uses hardcoded numeric values — the source comment states: "The values are copied from windows::Win32::Networking::WinSock." 
+
+Defaults: idle=30s, interval=15s, count=4.
diff --git a/spec/modules/Simplex/Messaging/Transport/Server.md b/spec/modules/Simplex/Messaging/Transport/Server.md
new file mode 100644
index 000000000..181951dcd
--- /dev/null
+++ b/spec/modules/Simplex/Messaging/Transport/Server.md
@@ -0,0 +1,33 @@
+# Simplex.Messaging.Transport.Server
+
+> TLS server: socket lifecycle, client acceptance, SNI credential switching, socket leak detection.
+
+**Source**: [`Transport/Server.hs`](../../../../../src/Simplex/Messaging/Transport/Server.hs)
+
+## safeAccept — errno-based retry
+
+`safeAccept` retries `accept()` on specific errno values. The code comment references the POSIX man page: "man accept says: For reliable operation the application should detect the network errors defined for the protocol after accept() and treat them like EAGAIN by retrying." The retry set: `eCONNABORTED, eAGAIN, eNETDOWN, ePROTO, eNOPROTOOPT, eHOSTDOWN, eNONET, eHOSTUNREACH, eOPNOTSUPP, eNETUNREACH`. Any other error is logged and re-thrown.
+
+## SocketState — leak detection
+
+`SocketState = (TVar Int, TVar Int, TVar (IntMap (Weak ThreadId)))` tracks: accepted count, gracefully-closed count, and active client threads. `getSocketStats` computes `socketsLeaked = socketsAccepted - socketsClosed - socketsActive`.
+
+## closeServer — weak thread references
+
+`closeServer` kills active client threads via `Weak ThreadId`. The code: `readTVarIO clients >>= mapM_ (deRefWeak >=> mapM_ killThread)`. `deRefWeak` returns `Nothing` if the thread has already been garbage collected, so the shutdown does not fail on already-dead threads.
+
+## SNI credential switching
+
+`supportedTLSServerParams` selects TLS credentials based on SNI:
+- **No SNI**: uses `credential` (the primary server credential)
+- **SNI present**: uses `sniCredential` (when configured)
+
+The `sniCredUsed` TVar records whether SNI triggered credential switching.
In the SMP server (Server.hs), when `sniUsed` is `True`, the connection is dispatched to the HTTP handler instead of the SMP handler. + +## startTCPServer — address resolution + +`startTCPServer` resolves the listen address and selects `AF_INET6` first, falling back to `AF_INET`: `select as = fromJust $ family AF_INET6 <|> family AF_INET`. + +## Client certificate validation for services + +`paramsAskClientCert` enables TLS client certificate requests. In `validateClientCertificate`, an empty chain (`CCEmpty`) returns no error — client certificates are optional, as noted by the code comment: "client certificates are only used for services." diff --git a/spec/modules/Simplex/Messaging/Transport/Shared.md b/spec/modules/Simplex/Messaging/Transport/Shared.md new file mode 100644 index 000000000..8248c068b --- /dev/null +++ b/spec/modules/Simplex/Messaging/Transport/Shared.md @@ -0,0 +1,28 @@ +# Simplex.Messaging.Transport.Shared + +> Certificate chain parsing and X.509 validation utilities shared between client and server. + +**Source**: [`Transport/Shared.hs`](../../../../../src/Simplex/Messaging/Transport/Shared.hs) + +**Protocol spec**: [`protocol/simplex-messaging.md` — Router certificate](../../../../protocol/simplex-messaging.md#router-certificate) — certificate chain lengths and semantics. 
+
+## chainIdCaCerts — certificate chain semantics
+
+`chainIdCaCerts` classifies TLS certificate chains (which are ordered leaf-first) by length:
+
+| Length | Constructor | Code comment |
+|--------|------------|--------------|
+| 0 | `CCEmpty` | (no chain) |
+| 1 | `CCSelf cert` | (self-signed) |
+| 2 | `CCValid {leafCert, idCert=cert, caCert=cert}` | "current long-term online/offline certificates chain" |
+| 3 | `CCValid {leafCert, idCert, caCert}` | "with additional operator certificate (preset in the client)" |
+| 4 | `CCValid {leafCert, idCert, _, caCert}` | "with network certificate" |
+| 5+ | `CCLong` | (rejected) |
+
+The protocol spec defines supported chain lengths of 2, 3, and 4 certificates (see [Router certificate](../../../../protocol/simplex-messaging.md#router-certificate)). In all `CCValid` cases, `idCert` is the certificate whose fingerprint is compared against the server address key hash, and `caCert` is used as the X.509 trust anchor.
+
+In the 4-cert case, index 2 is skipped (`_`) — it is present in the chain but not used as either the identity or the trust anchor.
+
+## x509validate — FQHN check disabled
+
+`x509validate` sets `checkFQHN = False`. The protocol spec identifies servers by certificate fingerprint (key hash in the server address), not by domain name. The validation uses a fresh `ValidationCache` (`ValidationCacheUnknown` for all lookups, no-op store) — each connection validates independently.
diff --git a/spec/modules/Simplex/Messaging/Transport/WebSockets.md b/spec/modules/Simplex/Messaging/Transport/WebSockets.md
new file mode 100644
index 000000000..7f1b3e673
--- /dev/null
+++ b/spec/modules/Simplex/Messaging/Transport/WebSockets.md
@@ -0,0 +1,15 @@
+# Simplex.Messaging.Transport.WebSockets
+
+> WebSocket transport implementation over TLS, with strict message framing.
+ +**Source**: [`Transport/WebSockets.hs`](../../../../../src/Simplex/Messaging/Transport/WebSockets.hs) + +## cGet — strict size check (unlike TLS) + +`cGet` throws `TEBadBlock` if the received WebSocket message length doesn't equal `n`. This differs from the TLS `cGet` which uses `getBuffered` to accumulate partial reads. + +## WebSocket options + +- `connectionCompressionOptions = NoCompression` +- `connectionFramePayloadSizeLimit = SizeLimit $ fromIntegral smpBlockSize` (16384) +- `connectionMessageDataSizeLimit = SizeLimit 65536` diff --git a/src/Simplex/FileTransfer/Transport.hs b/src/Simplex/FileTransfer/Transport.hs index d55b25148..11b504a4c 100644 --- a/src/Simplex/FileTransfer/Transport.hs +++ b/src/Simplex/FileTransfer/Transport.hs @@ -103,6 +103,7 @@ currentXFTPVersion = VersionXFTP 3 supportedFileServerVRange :: VersionRangeXFTP supportedFileServerVRange = mkVersionRange initialXFTPVersion currentXFTPVersion +-- spec: spec/modules/Simplex/FileTransfer/Transport.md#xftpclienthandshakestub--xftp-doesnt-use-tls-handshake -- XFTP protocol does not use this handshake method xftpClientHandshakeStub :: c 'TClient -> Maybe C.KeyPairX25519 -> C.KeyHash -> VersionRangeXFTP -> Bool -> Maybe (ServiceCredentials, C.KeyPairEd25519) -> ExceptT TransportError IO (THandle XFTPVersion c 'TClient) xftpClientHandshakeStub _c _ks _keyHash _xftpVRange _proxyServer _serviceKeys = throwE TEVersion @@ -190,6 +191,7 @@ receiveEncFile getBody = receiveFile_ . 
receive data ReceiveFileError = RFESize | RFECrypto +-- spec: spec/modules/Simplex/FileTransfer/Transport.md#receivesbfile--constant-time-auth-tag-verification receiveSbFile :: (Int -> IO ByteString) -> Handle -> LC.SbState -> Word32 -> IO (Either ReceiveFileError ()) receiveSbFile getBody h = receive where diff --git a/src/Simplex/Messaging/Transport.hs b/src/Simplex/Messaging/Transport.hs index f1eb1a8bd..fde483177 100644 --- a/src/Simplex/Messaging/Transport.hs +++ b/src/Simplex/Messaging/Transport.hs @@ -240,6 +240,7 @@ currentServerSMPRelayVersion = VersionSMP 19 -- Max SMP protocol version to be used in e2e encrypted -- connection between client and server, as defined by SMP proxy. +-- spec: spec/modules/Simplex/Messaging/Transport.md#proxiedsmprelayversion--anti-fingerprinting-cap -- SMP proxy sets it to lower than its current version -- to prevent client version fingerprinting by the -- destination relays when clients upgrade at different times. @@ -376,6 +377,7 @@ getTLS cfg tlsCertSent tlsPeerCert cxt = withTlsUnique @TLS @p cxt newTLS tlsALPN <- T.getNegotiatedProtocol cxt pure TLS {tlsContext = cxt, tlsALPN, tlsTransportConfig = cfg, tlsCertSent, tlsPeerCert, tlsUniq, tlsBuffer} +-- spec: spec/modules/Simplex/Messaging/Transport.md#withtlsunique--different-api-calls-yield-same-value withTlsUnique :: forall c p. TransportPeerI p => T.Context -> (ByteString -> IO (c p)) -> IO (c p) withTlsUnique cxt f = cxtFinished cxt @@ -722,6 +724,7 @@ instance Encoding TransportError where TENoServerAuth -> "NO_AUTH" TEHandshake e -> "HANDSHAKE " <> bshow e +-- spec: spec/modules/Simplex/Messaging/Transport.md#tputblock--tgetblock--optional-block-encryption -- | Pad and send block to SMP transport. tPutBlock :: Transport c => THandle v c p -> ByteString -> IO (Either TransportError ()) tPutBlock THandle {connection = c, params = THandleParams {blockSize, encryptBlock}} block = do @@ -797,6 +800,7 @@ smpClientHandshake :: forall c. 
Transport c => c 'TClient -> Maybe C.KeyPairX255 smpClientHandshake c ks_ keyHash@(C.KeyHash kh) vRange proxyServer serviceKeys_ = do SMPServerHandshake {sessionId = sessId, smpVersionRange, authPubKey} <- getHandshake th when (sessionId /= sessId) $ throwE TEBadSession + -- spec: spec/modules/Simplex/Messaging/Transport.md#proxy-version-downgrade-logic -- Below logic downgrades version range in case the "client" is SMP proxy server and it is -- connected to the destination server of the version 11 or older. -- It disables transport encryption between SMP proxy and destination relay. @@ -857,6 +861,7 @@ smpTHandleClient :: forall c. THandleSMP c 'TClient -> VersionSMP -> VersionRang smpTHandleClient th v vr pk_ ck_ proxyServer clientService = do let thAuth = clientTHParams <$!> ck_ be <- blockEncryption th v proxyServer thAuth + -- spec: spec/modules/Simplex/Messaging/Transport.md#smpthandleclient--chain-key-swap -- swap is needed to use client's sndKey as server's rcvKey and vice versa pure $ smpTHandle_ th v vr thAuth $ uncurry TSbChainKeys . 
swap <$> be where @@ -893,6 +898,7 @@ smpTHandle_ th@THandle {params} v vr thAuth encryptBlock = } in (th :: THandleSMP c p) {params = params'} +-- spec: spec/modules/Simplex/Messaging/Transport.md#forcecertchain--space-leak-prevention forceCertChain :: CertChainPubKey -> CertChainPubKey forceCertChain cert@(CertChainPubKey (X.CertificateChain cc) signedKey) = length (show cc) `seq` show signedKey `seq` cert {-# INLINE forceCertChain #-} diff --git a/src/Simplex/Messaging/Transport/Credentials.hs b/src/Simplex/Messaging/Transport/Credentials.hs index 8e3efe795..26bcadf7a 100644 --- a/src/Simplex/Messaging/Transport/Credentials.hs +++ b/src/Simplex/Messaging/Transport/Credentials.hs @@ -40,6 +40,7 @@ privateToTls (C.APrivateSignKey _ k) = case k of type Credentials = (C.ASignatureKeyPair, X509.SignedCertificate) +-- spec: spec/modules/Simplex/Messaging/Transport/Credentials.md#gencredentials--nanosecond-stripping genCredentials :: TVar ChaChaDRG -> Maybe Credentials -> (Hours, Hours) -> Text -> IO Credentials genCredentials g parent (before, after) subjectName = do subjectKeys <- atomically $ C.generateSignatureKeyPair C.SEd25519 g diff --git a/src/Simplex/Messaging/Transport/HTTP2/Client.hs b/src/Simplex/Messaging/Transport/HTTP2/Client.hs index ca0714225..d3402130b 100644 --- a/src/Simplex/Messaging/Transport/HTTP2/Client.hs +++ b/src/Simplex/Messaging/Transport/HTTP2/Client.hs @@ -193,6 +193,7 @@ sendRequest HTTP2Client {client_ = HClient {config, reqQ}} req reqTimeout_ = do let reqTimeout = http2RequestTimeout config reqTimeout_ maybe (Left HCResponseTimeout) Right <$> (reqTimeout `timeout` atomically (takeTMVar resp)) +-- spec: spec/modules/Simplex/Messaging/Transport/HTTP2/Client.md#sendrequest-vs-sendrequestdirect--thread-safety -- | this function should not be used until HTTP2 is thread safe, use sendRequest sendRequestDirect :: HTTP2Client -> Request -> Maybe Int -> IO (Either HTTP2ClientError HTTP2Response) sendRequestDirect HTTP2Client {client_ = HClient 
{config, disconnected}, sendReq} req reqTimeout_ = do diff --git a/src/Simplex/Messaging/Transport/Server.hs b/src/Simplex/Messaging/Transport/Server.hs index cdfc300b7..63e471855 100644 --- a/src/Simplex/Messaging/Transport/Server.hs +++ b/src/Simplex/Messaging/Transport/Server.hs @@ -183,6 +183,7 @@ runTCPServerSocket (accepted, gracefullyClosed, clients) started getSocket serve tId <- mkWeakThreadId =<< server conn `forkFinally` closeConn atomically $ unlessM (readTVar closed) $ modifyTVar' clients $ IM.insert cId tId +-- spec: spec/modules/Simplex/Messaging/Transport/Server.md#safeaccept--errno-based-retry -- | Recover from errors in `accept` whenever it is safe. -- Some errors are safe to ignore, while blindly restaring `accept` may trigger a busy loop. -- @@ -224,6 +225,7 @@ getSocketStats (accepted, closed, active) = do let socketsLeaked = socketsAccepted - socketsClosed - socketsActive pure SocketStats {socketsAccepted, socketsClosed, socketsActive, socketsLeaked} +-- spec: spec/modules/Simplex/Messaging/Transport/Server.md#closeserver--weak-thread-references closeServer :: TMVar Bool -> TVar (IntMap (Weak ThreadId)) -> Socket -> IO () closeServer started clients sock = do close sock diff --git a/src/Simplex/Messaging/Transport/Shared.hs b/src/Simplex/Messaging/Transport/Shared.hs index 204ef3f5e..e7f450f49 100644 --- a/src/Simplex/Messaging/Transport/Shared.hs +++ b/src/Simplex/Messaging/Transport/Shared.hs @@ -24,6 +24,7 @@ data ChainCertificates | CCValid {leafCert :: X.SignedCertificate, idCert :: X.SignedCertificate, caCert :: X.SignedCertificate} | CCLong +-- spec: spec/modules/Simplex/Messaging/Transport/Shared.md#chainidcacerts--certificate-chain-semantics chainIdCaCerts :: X.CertificateChain -> ChainCertificates chainIdCaCerts (X.CertificateChain chain) = case chain of [] -> CCEmpty diff --git a/src/Simplex/Messaging/Transport/WebSockets.hs b/src/Simplex/Messaging/Transport/WebSockets.hs index 3ab213dcd..38ac6627d 100644 --- 
a/src/Simplex/Messaging/Transport/WebSockets.hs +++ b/src/Simplex/Messaging/Transport/WebSockets.hs @@ -69,6 +69,7 @@ instance Transport WS where closeConnection = S.close . wsStream {-# INLINE closeConnection #-} + -- spec: spec/modules/Simplex/Messaging/Transport/WebSockets.md#cget--strict-size-check-unlike-tls cGet :: WS p -> Int -> IO ByteString cGet c n = do s <- receiveData (wsConnection c) From 09d55de115ac2a39903f02b13f66476fccf031fd Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Wed, 11 Mar 2026 20:17:00 +0000 Subject: [PATCH 13/61] protocol and client specs --- spec/TOPICS.md | 6 ++ spec/modules/README.md | 2 + spec/modules/Simplex/Messaging/Client.md | 85 +++++++++++++++++ .../modules/Simplex/Messaging/Client/Agent.md | 92 +++++++++++++++++++ spec/modules/Simplex/Messaging/Protocol.md | 68 ++++++++++++++ .../Simplex/Messaging/Protocol/Types.md | 7 ++ src/Simplex/Messaging/Client.hs | 5 + src/Simplex/Messaging/Client/Agent.hs | 2 + src/Simplex/Messaging/Protocol.hs | 3 + 9 files changed, 270 insertions(+) create mode 100644 spec/modules/Simplex/Messaging/Client.md create mode 100644 spec/modules/Simplex/Messaging/Client/Agent.md create mode 100644 spec/modules/Simplex/Messaging/Protocol.md create mode 100644 spec/modules/Simplex/Messaging/Protocol/Types.md diff --git a/spec/TOPICS.md b/spec/TOPICS.md index d7a627254..977107652 100644 --- a/spec/TOPICS.md +++ b/spec/TOPICS.md @@ -12,4 +12,10 @@ - **Certificate chain trust model**: ChainCertificates (Shared.hs) defines 0–4 cert chain semantics, used by both Client.hs (validateCertificateChain) and Server.hs (validateClientCertificate, SNI credential switching). The 4-length case skipping index 2 (operator cert) and the FQHN-disabled x509validate are decisions that span the entire transport security model. 
+- **SMP proxy protocol flow**: The PRXY/PFWD/RFWD proxy protocol involves Client.hs (proxySMPCommand with 10 error scenarios, forwardSMPTransmission with sessionSecret encryption), Protocol.hs (command types, version-dependent encoding), Transport.hs (proxiedSMPRelayVersion cap, proxyServer flag disabling block encryption). The double encryption (client-relay via PFWD + proxy-relay via RFWD), combined timeout (tcpConnect + tcpTimeout), nonce/reverseNonce pairing, and version downgrade logic are not visible from any single module. + +- **Service certificate subscription model**: Service subscriptions (SUBS/NSUBS) and per-queue subscriptions (SUB/NSUB) coexist with complex state transitions. Client/Agent.hs manages dual active/pending subscription maps with session-aware cleanup. Protocol.hs defines useServiceAuth (only NEW/SUB/NSUB). Client.hs implements authTransmission with dual signing (entity key over cert hash + transmission, service key over transmission only). Transport.hs handles the service certificate handshake extension (v16+). The full subscription lifecycle — from DBService credentials through handshake to service subscription to disconnect/reconnect — spans all four modules. + +- **Two agent layers**: Client/Agent.hs ("small agent") is used only in servers — SMP proxy and notification server — to manage client connections to other SMP servers. Agent.hs + Agent/Client.hs ("big agent") is used in client applications. Both manage SMP client connections with subscription tracking and reconnection, but the big agent adds the full messaging agent layer (connections, double ratchet, file transfer). When documenting Agent/Client.hs, Client/Agent.hs should be reviewed for shared patterns and differences. 
+ - **Handshake protocol family**: SMP (Transport.hs), NTF (Notifications/Transport.hs), and XFTP (FileTransfer/Transport.hs) all have handshake protocols with the same structure (version negotiation + session binding + key exchange) but different feature sets. NTF is a strict subset. XFTP doesn't use the TLS handshake at all (HTTP2 layer). The shared types (THandle, THandleParams, THandleAuth) mean changes to the handshake infrastructure affect all three protocols. diff --git a/spec/modules/README.md b/spec/modules/README.md index ef2b45881..7b7666aad 100644 --- a/spec/modules/README.md +++ b/spec/modules/README.md @@ -74,6 +74,8 @@ Do NOT document: - **Function-by-function prose that restates the implementation** — "this function takes X and returns Y by doing Z" adds nothing - **Line numbers** — they're brittle and break on every edit - **Comments that fit in one line in source** — put those in the source file instead as `-- spec:` comments +- **Verbatim quotes of source comments** — reference them instead: "See comment on `functionName`." Then add only what the comment doesn't cover (cross-module implications, what breaks if violated). If the source comment says everything, the function doesn't need a doc entry. +- **Tables that reproduce code structure** — if the information is self-evident from reading the code's pattern matching or type definitions, it doesn't belong in the doc (e.g., per-command credential requirements, version-conditional encoding branches) ## Format diff --git a/spec/modules/Simplex/Messaging/Client.md b/spec/modules/Simplex/Messaging/Client.md new file mode 100644 index 000000000..a4f7be352 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Client.md @@ -0,0 +1,85 @@ +# Simplex.Messaging.Client + +> Generic protocol client: connection management, command sending/receiving, batching, proxy protocol, reconnection. 
+ +**Source**: [`Client.hs`](../../../../src/Simplex/Messaging/Client.hs) + +**Protocol spec**: [`protocol/simplex-messaging.md`](../../../../protocol/simplex-messaging.md) — SimpleX Messaging Protocol. + +## Overview + +This module implements the client side of the `Protocol` typeclass — connecting to servers, sending commands, receiving responses, and managing connection lifecycle. It is generic over `Protocol v err msg`, instantiated for SMP as `SMPClient` (= `ProtocolClient SMPVersion ErrorType BrokerMsg`). The SMP proxy protocol (PRXY/PFWD/RFWD) is also implemented here. + +## Four concurrent threads — teardown semantics + +`getProtocolClient` launches four threads via `raceAny_`: +- `send`: reads from `sndQ` (TBQueue) and writes to TLS +- `receive`: reads from TLS and writes to `rcvQ` (TBQueue), updates `lastReceived` +- `process`: reads from `rcvQ` and dispatches to response vars or `msgQ` +- `monitor`: periodic ping loop (only when `smpPingInterval > 0`) + +When ANY thread exits (normally or exceptionally), `raceAny_` cancels all others. `E.finally` ensures the `disconnected` callback always fires. Implication: a single stuck thread (e.g., TLS read blocked on a half-open connection) keeps the entire client alive until `monitor` drops it. There is no per-thread health check — liveness depends entirely on the monitor's timeout logic. + +## Request lifecycle and leak risk + +`mkRequest` inserts a `Request` into `sentCommands` TMap BEFORE the transmission is written to TLS. If the TLS write fails silently or the connection drops before the response, the entry remains in `sentCommands` until the monitor's timeout counter exceeds `maxCnt` and drops the entire client. There is no per-request cleanup on send failure — individual request entries are only removed by `processMsg` (on response) or by `getResponse` timeout (which sets `pending = False` but doesn't remove the entry). 
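
The request lifecycle above can be sketched in a few lines of STM. This is a simplified illustration, not the code in `Client.hs`: `RequestSketch`, `registerRequest`, `expireRequest` and `handleResponse` are hypothetical names, correlation IDs are plain `Int`s and responses are plain strings. It assumes only the `stm` and `containers` packages that ship with GHC, and shows one property: the timeout path flips `pending` and leaves the map entry in place, while the response path is the one that removes it.

```haskell
import Control.Concurrent.STM
import qualified Data.Map.Strict as M

-- Hypothetical, simplified stand-in for a request record.
data RequestSketch = RequestSketch
  { pending :: TVar Bool,
    responseVar :: TMVar String
  }

type SentCommands = TVar (M.Map Int RequestSketch)

-- Register the request before anything is written to the connection.
registerRequest :: SentCommands -> Int -> STM RequestSketch
registerRequest sent corrId = do
  r <- RequestSketch <$> newTVar True <*> newEmptyTMVar
  modifyTVar' sent (M.insert corrId r)
  pure r

-- Timeout path: flips the flag only; the entry stays in the map.
expireRequest :: RequestSketch -> STM ()
expireRequest r = writeTVar (pending r) False

-- Response path: removes the entry and reports whether it was still pending.
handleResponse :: SentCommands -> Int -> String -> STM (Maybe Bool)
handleResponse sent corrId resp = do
  m <- readTVar sent
  case M.lookup corrId m of
    Nothing -> pure Nothing
    Just r -> do
      writeTVar sent (M.delete corrId m)
      wasPending <- readTVar (pending r)
      -- Deliver to the waiting caller only if it has not timed out.
      if wasPending then putTMVar (responseVar r) resp else pure ()
      pure (Just wasPending)

main :: IO ()
main = do
  sent <- newTVarIO M.empty
  r <- atomically $ registerRequest sent 1
  atomically $ expireRequest r
  afterTimeout <- M.size <$> readTVarIO sent
  result <- atomically $ handleResponse sent 1 "OK"
  afterResponse <- M.size <$> readTVarIO sent
  -- The entry survives the timeout and is removed only by the response.
  print (afterTimeout, result, afterResponse) -- prints (1,Just False,0)
```

If no response ever arrives, nothing in this sketch removes the entry, which is the leak risk this section describes.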
+ +## getResponse — pending flag race contract + +This is the core concurrency contract between timeout and response processing: + +1. `getResponse` waits with `timeout` for `takeTMVar responseVar` +2. Regardless of result, atomically sets `pending = False` and tries `tryTakeTMVar` again (see comment on `getResponse`) +3. In `processMsg`, when a response arrives for a request where `pending` is already `False` (timeout won), `wasPending` is `False` and the response is forwarded to `msgQ` as `STResponse` rather than discarded + +The double-check pattern (`swapTVar pending False` + `tryTakeTMVar`) handles the race window where a response arrives between timeout firing and `pending` being set to `False`. Without this, responses arriving in that gap would be silently lost. + +`timeoutErrorCount` is reset to 0 in `getResponse` when a response arrives and in `receive` on every TLS read; the monitor uses this count to decide when to drop the connection. + +## processMsg — server events vs expired responses + +When `corrId` is empty, the message is an `STEvent` (server-initiated). When non-empty and the request was already expired (`wasPending` is `False`), the response becomes `STResponse` — not discarded, but forwarded to `msgQ` with the original command context. Entity ID mismatch is `STUnexpectedError`. + +## nonBlockingWriteTBQueue — fork on full + +If `tryWriteTBQueue` returns `False`, a new thread is forked for the blocking write. No backpressure mechanism — under sustained overload, thread count grows without bound. This is a deliberate tradeoff: the caller never blocks (preventing deadlock between send and process threads), at the cost of potential unbounded thread creation. + +## Batch commands do not expire + +See comment on `sendBatch`. Batched commands are written with `Nothing` as the request parameter — the send thread skips the `pending` flag check. Individual commands use `Just r` and the send thread checks `pending` after dequeue. 
The coupling: if the server stops responding, batched commands can block the send queue indefinitely since they have no timeout-based expiry. + +## monitor — quasi-periodic adaptive ping + +The ping loop sleeps for `smpPingInterval`, then checks elapsed time since `lastReceived`. If significant time remains in the interval (> 1 second), it re-sleeps for just the remaining time rather than sending a ping. This means ping frequency adapts to actual receive activity — frequent receives suppress pings. + +Pings are only sent when `sendPings` is `True`, set by `enablePings` (called from `subscribeSMPQueue`, `subscribeSMPQueues`, `subscribeSMPQueueNotifications`, `subscribeSMPQueuesNtfs`, `subscribeService`). The client drops the connection when `maxCnt` commands have timed out in sequence AND at least `recoverWindow` (15 minutes) has passed since the last received response. + +## clientCorrId — dual-purpose random values + +`clientCorrId` is a `TVar ChaChaDRG` generating random `CbNonce` values that serve as both correlation IDs and nonces for proxy encryption. When a nonce is explicitly passed (e.g., by `createSMPQueue`), it is used instead of generating a random one. + +## Proxy command re-parameterization + +`proxySMPCommand` constructs modified `thParams` per-request — setting `sessionId`, `peerServerPubKey`, and `thVersion` to the proxy-relay connection's parameters rather than the client-proxy connection's. A single `SMPClient` connection to the proxy carries commands with different auth parameters per destination relay. The encoding, signing, and encryption all use these per-request params, not the connection's original params. + +## proxySMPCommand — error classification + +See comment above `proxySMPCommand` for the 10 error scenarios (0-9) mapping each combination of success/error at client-proxy and proxy-relay boundaries. Errors from the destination relay wrapped in `PRES` are thrown as `ExceptT` errors (transparent proxy). 
Errors from the proxy itself are returned as `Left ProxyClientError`. + +## forwardSMPTransmission — proxy-side forwarding + +Used by the proxy server to forward `RFWD` to the destination relay. Uses `cbEncryptNoPad`/`cbDecryptNoPad` (no padding) with the session secret from the proxy-relay connection. Response nonce is `reverseNonce` of the request nonce. + +## authTransmission — dual auth with service signature + +When `useServiceAuth` is `True` and a service certificate is present, the entity key signs over `serviceCertHash <> transmission` (not just the transmission) — see comment on `authTransmission`. The service key only signs the transmission itself. For X25519 keys, `cbAuthenticate` produces a `TAAuthenticator`; for Ed25519/Ed448, `C.sign'` produces a `TASignature`. + +The service signature is only added when the entity authenticator is non-empty. If authenticator generation fails silently (returns empty bytes), service signing is silently skipped. This mirrors the [state-dependent parser contract](./Protocol.md#service-signature--state-dependent-parser-contract) in Protocol.hs. + +## action — weak thread reference + +`action` stores a `Weak ThreadId` (via `mkWeakThreadId`) to the main client thread. `closeProtocolClient` dereferences and kills it. The weak reference allows the thread to be garbage collected if all other references are dropped. + +## writeSMPMessage — server-side event injection + +`writeSMPMessage` writes directly to `msgQ` as `STEvent`, bypassing the entire command/response pipeline. This is used by the server to inject MSG events into the subscription response path. 
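
The weak-reference pattern described under `action` above can be illustrated with `base` alone. A minimal sketch, assuming a hypothetical `closeViaWeak` helper in place of `closeProtocolClient`; the real function may do more than kill the thread, and the sleeping thread here is only a stand-in for the main client thread.

```haskell
import Control.Concurrent (ThreadId, forkIO, killThread, mkWeakThreadId, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (finally)
import System.Mem.Weak (Weak, deRefWeak)

-- Hypothetical helper: kill the thread if the weak reference is still alive.
-- Returns False when the thread has already been garbage collected.
closeViaWeak :: Weak ThreadId -> IO Bool
closeViaWeak w = deRefWeak w >>= maybe (pure False) (\tId -> True <$ killThread tId)

main :: IO ()
main = do
  stopped <- newEmptyMVar
  -- Stand-in for the main client thread; signals when it terminates.
  tId <- forkIO $ threadDelay 60000000 `finally` putMVar stopped ()
  weak <- mkWeakThreadId tId
  killed <- closeViaWeak weak
  if killed
    then do
      takeMVar stopped -- wait until the thread's cleanup has run
      putStrLn "thread killed via weak reference"
    else putStrLn "thread was already collected"
```

Holding only the `Weak ThreadId` means the stored handle by itself does not keep the thread object reachable, which is the property the `action` section relies on.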
diff --git a/spec/modules/Simplex/Messaging/Client/Agent.md b/spec/modules/Simplex/Messaging/Client/Agent.md new file mode 100644 index 000000000..96e6ff84b --- /dev/null +++ b/spec/modules/Simplex/Messaging/Client/Agent.md @@ -0,0 +1,92 @@ +# Simplex.Messaging.Client.Agent + +> SMP client connections with subscription management, reconnection, and service certificate support. + +**Source**: [`Client/Agent.hs`](../../../../../src/Simplex/Messaging/Client/Agent.hs) + +## Overview + +This is the "small agent" — used only in servers (SMP proxy, notification server) to manage client connections to other SMP servers. The "big agent" in `Simplex.Messaging.Agent` + `Simplex.Messaging.Agent.Client` serves client applications and adds the full messaging agent layer. See [Two agent layers](../../../../TOPICS.md) topic. + +`SMPClientAgent` manages `SMPClient` connections via `smpClients :: TMap SMPServer SMPClientVar` (one per SMP server), tracks active and pending subscriptions, and handles automatic reconnection. It is parameterized by `Party` (`p`) and uses the `ServiceParty` constraint to support both `RecipientService` and `NotifierService` modes. + +## Dual subscription model + +Four TMap fields track subscriptions in two dimensions: + +| | Active | Pending | +|---|---|---| +| **Service** | `activeServiceSubs` (TMap SMPServer (TVar (Maybe (ServiceSub, SessionId)))) | `pendingServiceSubs` (TMap SMPServer (TVar (Maybe ServiceSub))) | +| **Queue** | `activeQueueSubs` (TMap SMPServer (TMap QueueId (SessionId, C.APrivateAuthKey))) | `pendingQueueSubs` (TMap SMPServer (TMap QueueId C.APrivateAuthKey)) | + +See comments on `activeServiceSubs` and `pendingServiceSubs` for the coexistence rules. Key constraint: only one service subscription per server. Active subs store the `SessionId` that established them. 
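
Storing the `SessionId` with each active queue subscription makes it possible to tell, on disconnect, which subscriptions belonged to the session that ended. A minimal sketch with hypothetical, simplified types: plain `String`s stand in for the real `QueueId`, `SessionId` and `C.APrivateAuthKey`, and pure `Map`s stand in for the `TMap`s. `moveToPending` is an illustrative name, not a function in `Client/Agent.hs`, and the union order is an assumption.

```haskell
import qualified Data.Map.Strict as M

type SessionId = String
type QueueId = String
type AuthKey = String

type ActiveSubs = M.Map QueueId (SessionId, AuthKey)
type PendingSubs = M.Map QueueId AuthKey

-- Move the subscriptions created by the ended session to pending;
-- subscriptions created by any other session stay active.
moveToPending :: SessionId -> ActiveSubs -> PendingSubs -> (ActiveSubs, PendingSubs)
moveToPending sessId active pending =
  let (ended, stillActive) = M.partition ((== sessId) . fst) active
   in (stillActive, M.union (M.map snd ended) pending)

main :: IO ()
main = do
  let active = M.fromList [("q1", ("s1", "k1")), ("q2", ("s2", "k2")), ("q3", ("s1", "k3"))]
      (active', pending') = moveToPending "s1" active M.empty
  print (M.keys active') -- prints ["q2"]
  print (M.keys pending') -- prints ["q1","q3"]
```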
+ +## SessionVar compare-and-swap — core concurrency safety + +`removeSessVar` (in Session.hs) uses `sessionVarId` (monotonically increasing counter from `sessSeq`) to prevent stale removal. When a disconnected client's cleanup runs after a new client already replaced the map entry, the ID mismatch causes removal to silently no-op. See comment on `removeSessVar`. This is used throughout: `removeClientAndSubs` for client map, `cleanup` for worker map. + +## removeClientAndSubs — outside-STM lookup optimization + +See comment on `removeClientAndSubs`. Subscription TVar references are obtained outside STM (via `TM.lookupIO`), then modified inside `atomically`. This is safe because the invariant is that subscription TVar entries for a server are never deleted from the outer TMap, only their contents change. Moving lookups inside the STM transaction would cause excessive re-evaluation under contention. + +## Disconnect preserves others' subscriptions + +`updateServiceSub` only moves active→pending when `sessId` matches the disconnected client (see its comment). If a new client already established different subscriptions on the same server, those are preserved. Queue subs use `M.partition` to split by SessionId — only matching subs move to pending, non-matching remain active. + +## Pending never reset to Nothing on disconnect + +See comment on `updateServiceSub`. After clearing an active service sub, the code sets pending to the cleared value but does NOT reset pending to `Nothing`. This avoids the race where a concurrent new client session has already set a different pending subscription. Implication: pending subs can only grow (be set) during disconnect, never shrink (be cleared). + +## persistErrorInterval — delayed error cleanup + +When `connectClient` calls `newSMPClient` and it fails, the error is stored with an expiry timestamp. `waitForSMPClient` checks expiry before retrying. 
When `persistErrorInterval` is 0, the error is stored without timestamp and the SessionVar is immediately removed from the map. + +## Session validation after subscription RPC + +Both `smpSubscribeQueues` and `smpSubscribeService` validate `activeClientSession` AFTER the subscription RPC completes, before committing results to state. If the session changed during the RPC (client reconnected), results are discarded and reconnection is triggered. This is optimistic execution with post-hoc validation — the RPC may succeed but its results are thrown away if the session is stale. + +## groupSub — subscription response classification + +Each queue response is classified by a `foldr` over the (subs, responses) zip: + +- **Success with matching serviceId**: counted as service-subscribed (`sQs` list) +- **Success without matching serviceId**: counted as queue-only (`qOks` list with SessionId and key) +- **Not in pending map**: silently skipped (handles concurrent activation by another path) +- **Temporary error** (network, timeout): sets the `tempErrs` flag but is NOT removed from pending — queue stays pending for retry on reconnect +- **Permanent error**: removed from pending and added to `finalErrs` — terminal, no automatic retry + +Even if multiple temporary errors occur in a batch, only one `reconnectClient` call is made (via the boolean accumulator flag). + +## updateActiveServiceSub — accumulative merge + +When serviceId and sessionId match the existing active subscription, queue count is added (`n + n'`) and IdsHash is XOR-merged (`idsHash <> idsHash'`). This accumulates across multiple subscription batches for the same service. When they don't match, the subscription is replaced entirely (silently drops old data). + +## CAServiceUnavailable — cascade to queue resubscription + +When `smpSubscribeService` detects service ID or role mismatch with the connection, it fires `CAServiceUnavailable`. 
See comment on `CAServiceUnavailable` for the full implication: the app must resubscribe all queues individually, creating new associations. This can happen if the SMP server reassigns service IDs (e.g., after downgrade and upgrade). + +## getPending — polymorphic over STM/IO + +`getPending` uses rank-2 polymorphism to work in both STM (for the "should we spawn a worker?" check, providing a consistent snapshot) and IO (for the actual reconnection data read, providing fresh data). Between these two calls, new pending subs could be added — the worker loop handles this by re-checking on each iteration. + +## Reconnect worker lifecycle + +### Spawn decision +`reconnectClient` checks `active` outside STM, then atomically checks for pending subs and gets/creates a worker SessionVar. If no pending subs exist, no worker is spawned — this prevents race with cleanup and adding pending queues in another call. + +### Worker cleanup blocks on TMVar fill +See comment on `cleanup`. The STM `retry` loop waits until the async handle is inserted into the TMVar before removing the worker from the map. Without this, cleanup could race ahead of the `putTMVar` in `newSubWorker`, leaving a terminated worker in the map. + +### Double timeout on reconnection +`runSubWorker` wraps the entire reconnection in `System.Timeout.timeout` using `tcpConnectTimeout` in addition to the network-layer timeout. Two layers — network for the connection attempt, outer for the entire operation including subscription. + +### Reconnect filters already-active queues +During reconnection, `reconnectSMPClient` reads current active queue subs (outside STM, same "vars never removed" invariant) and filters them out before resubscribing. Subscription is chunked by `agentSubsBatchSize` — partial success is possible across chunks. + +## Agent shutdown ordering + +`closeSMPClientAgent` executes in order: set `active = False`, close all client connections, then swap workers map to empty and fork cancellation threads. 
The cancel threads use `uninterruptibleCancel` but are fire-and-forget — `closeSMPClientAgent` may return before all workers are actually cancelled. + +## addSubs_ — left-biased union + +`addSubs_` uses `TM.union` which delegates to `M.union` (left-biased). If a queue subscription already exists, the new auth key from the incoming map wins. Service subs use `writeTVar` (overwrite) since only one service sub exists per server. diff --git a/spec/modules/Simplex/Messaging/Protocol.md b/spec/modules/Simplex/Messaging/Protocol.md new file mode 100644 index 000000000..dc1328cdf --- /dev/null +++ b/spec/modules/Simplex/Messaging/Protocol.md @@ -0,0 +1,68 @@ +# Simplex.Messaging.Protocol + +> SMP protocol types, commands, responses, encoding/decoding, and transport functions. + +**Source**: [`Protocol.hs`](../../../../src/Simplex/Messaging/Protocol.hs) + +**Protocol spec**: [`protocol/simplex-messaging.md`](../../../../protocol/simplex-messaging.md) — SimpleX Messaging Protocol. + +## Overview + +This module defines the SMP protocol's type-level structure, wire encoding, and transport batching. It does not implement the server or client — those are in [Server.hs](./Server.md) and [Client.hs](./Client.md). The protocol spec governs the command semantics; this doc focuses on non-obvious implementation choices. + +## Two separate version scopes + +SMP client protocol version (`SMPClientVersion`, 4 versions) is separate from SMP relay protocol version (`SMPVersion`, up to version 19, defined in [Transport.hs](./Transport.md)). The client version governs client-to-client concerns (binary encoding, multi-host addresses, SKEY command, short links). The relay version governs client-to-server wire format, transport encryption, and command availability. See comment above `SMPClientVersion` data declaration for version history. 
+ +## maxMessageLength — version-dependent + +`maxMessageLength` returns three different sizes depending on the relay version: +- v11+ (`encryptedBlockSMPVersion`): 16048 +- v9+ (`sendingProxySMPVersion`): 16064 +- older: 16088 + +The source has `TODO v6.0 remove dependency on version`. The type-level `MaxMessageLen` is fixed at 16088 with `TODO v7.0 change to 16048`. + +## Type-level party system + +10 `Party` constructors with `SParty` singletons, `PartyI` typeclass, and three constraint type families (`QueueParty`, `BatchParty`, `ServiceParty`). Invalid party usage produces compile-time errors via the `(Int ~ Bool, TypeError ...)` trick — the unsatisfiable `Int ~ Bool` constraint forces GHC to emit the `TypeError` message. + +## IdsHash — reversible XOR for state drift monitoring + +`IdsHash` uses `BS.zipWith xor` as its `Semigroup`. `queueIdHash` computes MD5 of the queue ID (16 bytes). `mempty` is 16 zero bytes. See comment on `subtractServiceSubs` for the reversibility property. `mconcat` is optimized to avoid repeated pack/unpack per step. + +## TransmissionAuth — size-based type discrimination + +`decodeTAuthBytes` distinguishes authenticator from signature by checking `B.length s == C.cbAuthenticatorSize`. This is a trap: if `cbAuthenticatorSize` ever coincides with a valid signature encoding size, the discrimination breaks. See comment on `tEncodeAuth` for the backward compatibility note (the encoding is backwards compatible with v6 that used `Maybe C.ASignature`). + +## Service signature — state-dependent parser contract + +In `transmissionP`, the service signature is only parsed when `serviceAuth` is true AND the authenticator is non-empty (`not (B.null authenticator)`). This means the parser's behavior depends on earlier parsed state — the service signature field is conditionally present on the wire. If a future change makes the authenticator always non-empty (or always empty), it silently changes whether service signatures are parsed. 
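
The reversibility of the XOR-based `IdsHash` described above can be checked with a small stand-alone sketch. Assumptions: `IdsHashSketch` is a hypothetical stand-in for the real type, and the 16-byte inputs are arbitrary constants rather than MD5 digests of queue IDs. Only `base` and `bytestring` are used.

```haskell
import Data.Bits (xor)
import qualified Data.ByteString as BS

-- Hypothetical stand-in for IdsHash: a 16-byte value combined by XOR.
newtype IdsHashSketch = IdsHashSketch BS.ByteString
  deriving (Eq, Show)

instance Semigroup IdsHashSketch where
  IdsHashSketch a <> IdsHashSketch b = IdsHashSketch (BS.pack (BS.zipWith xor a b))

instance Monoid IdsHashSketch where
  mempty = IdsHashSketch (BS.replicate 16 0)

main :: IO ()
main = do
  let h1 = IdsHashSketch (BS.pack [1 .. 16])
      h2 = IdsHashSketch (BS.replicate 16 0xAB)
      both = h1 <> h2
  print (both <> h2 == h1) -- removing h2 by combining it again: True
  print (h1 <> mempty == h1) -- 16 zero bytes are the identity: True
  print (h1 <> h1 == mempty) -- every value is its own inverse: True
```

Because every value is its own inverse, combining a hash a second time removes it from the aggregate. This is what allows a set of queue IDs to be subtracted from a running hash without recomputing it from scratch.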
+ +## transmissionP / implySessId + +When `implySessId` is `True`, the session ID is not transmitted on the wire — `transmissionP` sets `sessId` to `""` and prepends the local `sessionId` to the `authorized` bytes for verification. In `tDecodeServer`/`tDecodeClient`, session ID check is bypassed when `implySessId` is `True`. + +## batchTransmissions_ — constraints and ordering + +See comment for the 19-byte overhead calculation (pad size + transmission count + auth tag). Maximum 255 transmissions per batch (single-byte count). Uses `foldr` with `(:)` accumulation, which preserves original transmission order within each batch. + +## ClientMsgEnvelope — two-layer message format + +`ClientMsgEnvelope` has a `PubHeader` (client protocol version + optional X25519 DH key) and an encrypted body. The decrypted body is a `ClientMessage` containing a `PrivHeader` with prefix-based type discrimination: `"K"` for `PHConfirmation` (includes public auth key), `"_"` for `PHEmpty`. + +## MsgFlags — forward-compatible parsing + +The `MsgFlags` parser consumes the `notification` Bool then calls `A.takeTill (== ' ')` to swallow any remaining flag data. See comment on `MsgFlags` encoding for the 7-byte size constraint. Future flags added after `notification` are silently consumed and discarded by old clients. + +## BrokerErrorType NETWORK — detail loss + +The `NETWORK` variant of `BrokerErrorType` encodes as just `"NETWORK"` (detail dropped), with `TODO once all upgrade` comment. The parser falls back to `NEFailedError` when the `NetworkError` detail can't be parsed (`_smpP <|> pure NEFailedError`). This means a newer server's detailed network error is seen as `NEFailedError` by older clients. + +## Version-dependent encoding — scope + +`encodeProtocol` for both `Command` and `BrokerMsg` uses extensive version-conditional encoding. `NEW` has four encoding paths, `IDS` has five. 
All encoding paths for `IDS` must maintain the same field ordering — this is an implicit contract between encoder and decoder with no compile-time enforcement. + +## SUBS/NSUBS — asymmetric defaulting + +When the server parses `SUBS`/`NSUBS` from a client using a version older than `rcvServiceSMPVersion`, both count and hash default (`-1` and `mempty`). For the response side (`SOKS`/`ENDS` via `serviceRespP`), count is still parsed from the wire — only hash defaults to `mempty`. This asymmetry means command-side and response-side parsing have different fallback behavior for the same version boundary. diff --git a/spec/modules/Simplex/Messaging/Protocol/Types.md b/spec/modules/Simplex/Messaging/Protocol/Types.md new file mode 100644 index 000000000..0797bc185 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Protocol/Types.md @@ -0,0 +1,7 @@ +# Simplex.Messaging.Protocol.Types + +> Client notice type with optional TTL, used in BLOCKED error responses. + +**Source**: [`Protocol/Types.hs`](../../../../../src/Simplex/Messaging/Protocol/Types.hs) + +No non-obvious behavior. See source. diff --git a/src/Simplex/Messaging/Client.hs b/src/Simplex/Messaging/Client.hs index 67b31de18..0f3b16813 100644 --- a/src/Simplex/Messaging/Client.hs +++ b/src/Simplex/Messaging/Client.hs @@ -641,6 +641,7 @@ getProtocolClient g nm transportSession@(_, srv, _) cfg@ProtocolClientConfig {qS atomically $ do writeTVar (connected c) True putTMVar cVar $ Right c' + -- spec: spec/modules/Simplex/Messaging/Client.md#four-concurrent-threads--teardown-semantics raceAny_ ([send c' th, process c', receive c' th] <> [monitor c' | smpPingInterval > 0]) `E.finally` disconnected c' @@ -689,6 +690,7 @@ getProtocolClient g nm transportSession@(_, srv, _) cfg@ProtocolClientConfig {qS forM_ msgQ $ \q -> mapM_ (atomically . writeTBQueue q . 
serverTransmission c) (L.nonEmpty ts') + -- spec: spec/modules/Simplex/Messaging/Client.md#processmsg--server-events-vs-expired-responses processMsg :: ProtocolClient v err msg -> Transmission (Either err msg) -> IO (Maybe (EntityId, ServerTransmission err msg)) processMsg ProtocolClient {client_ = PClient {sentCommands}} (corrId, entId, respOrErr) | B.null $ bs corrId = sendMsg $ STEvent clientResp @@ -1338,11 +1340,13 @@ sendProtocolCommand_ c@ProtocolClient {client_ = PClient {sndQ}, thParams = THan | batch = tEncodeBatch1 serviceAuth t | otherwise = tEncode serviceAuth t +-- spec: spec/modules/Simplex/Messaging/Client.md#nonblockingwritetbqueue--fork-on-full nonBlockingWriteTBQueue :: TBQueue a -> a -> IO () nonBlockingWriteTBQueue q x = do sent <- atomically $ tryWriteTBQueue q x unless sent $ void $ forkIO $ atomically $ writeTBQueue q x +-- spec: spec/modules/Simplex/Messaging/Client.md#getresponse--pending-flag-race-contract getResponse :: ProtocolClient v err msg -> NetworkRequestMode -> Maybe Int -> Request err msg -> IO (Response err msg) getResponse ProtocolClient {client_ = PClient {tcpTimeout, timeoutErrorCount}} nm tOut Request {entityId, pending, responseVar} = do r <- fromMaybe (netTimeoutInt tcpTimeout nm) tOut `timeout` atomically (takeTMVar responseVar) @@ -1382,6 +1386,7 @@ mkTransmission_ ProtocolClient {thParams, client_ = PClient {clientCorrId, sentC atomically $ TM.insert corrId r sentCommands pure r +-- spec: spec/modules/Simplex/Messaging/Client.md#authtransmission--dual-auth-with-service-signature authTransmission :: Maybe (THandleAuth 'TClient) -> Bool -> Maybe C.APrivateAuthKey -> C.CbNonce -> ByteString -> Either TransportError (Maybe TAuthorizations) authTransmission thAuth serviceAuth pKey_ nonce t = traverse authenticate pKey_ where diff --git a/src/Simplex/Messaging/Client/Agent.hs b/src/Simplex/Messaging/Client/Agent.hs index d302ba237..8dfd0e563 100644 --- a/src/Simplex/Messaging/Client/Agent.hs +++ 
b/src/Simplex/Messaging/Client/Agent.hs @@ -275,6 +275,7 @@ connectClient ca@SMPClientAgent {agentCfg, dbService, smpClients, smpSessions, m removeClientAndSubs :: SMPClient -> IO (Maybe ServiceSub, Maybe (Map QueueId C.APrivateAuthKey)) removeClientAndSubs smp = do + -- spec: spec/modules/Simplex/Messaging/Client/Agent.md#removeclientandsubs--outside-stm-lookup-optimization -- Looking up subscription vars outside of STM transaction to reduce re-evaluation. -- It is possible because these vars are never removed, they are only added. sVar_ <- TM.lookupIO srv $ activeServiceSubs ca @@ -452,6 +453,7 @@ smpSubscribeQueues ca smp srv subs = do pure acc sessId = sessionId $ thParams smp smpServiceId = smpClientServiceId smp + -- spec: spec/modules/Simplex/Messaging/Client/Agent.md#groupsub--subscription-response-classification groupSub :: Map QueueId C.APrivateAuthKey -> ((QueueId, C.APrivateAuthKey), Either SMPClientError (Maybe ServiceId)) -> diff --git a/src/Simplex/Messaging/Protocol.hs b/src/Simplex/Messaging/Protocol.hs index fa58d8843..3f9773991 100644 --- a/src/Simplex/Messaging/Protocol.hs +++ b/src/Simplex/Messaging/Protocol.hs @@ -527,6 +527,7 @@ tEncodeAuth serviceAuth = \case TASignature s -> C.signatureBytes s TAAuthenticator (C.CbAuthenticator s) -> s +-- spec: spec/modules/Simplex/Messaging/Protocol.md#transmissionauth--size-based-type-discrimination decodeTAuthBytes :: ByteString -> Maybe (C.Signature 'C.Ed25519) -> Either String (Maybe TAuthorizations) decodeTAuthBytes s serviceSig | B.null s = Right Nothing @@ -1703,6 +1704,7 @@ instance ToJSON BlockingReason where instance FromJSON BlockingReason where parseJSON = strParseJSON "BlockingReason" +-- spec: spec/modules/Simplex/Messaging/Protocol.md#transmissionp--implysessid -- | SMP transmission parser. 
transmissionP :: THandleParams v p -> Parser RawTransmission transmissionP THandleParams {sessionId, implySessId, serviceAuth} = do @@ -2244,6 +2246,7 @@ batchTransmissions' THandleParams {batch, blockSize = bSize, serviceAuth} ts s = tEncode serviceAuth t -- | Pack encoded transmissions into batches +-- spec: spec/modules/Simplex/Messaging/Protocol.md#batchtransmissions_--constraints-and-ordering batchTransmissions_ :: Int -> NonEmpty (Either TransportError ByteString, r) -> [TransportBatch r] batchTransmissions_ bSize = addBatch . foldr addTransmission ([], 0, 0, [], []) where From 260ffb1a9dee84ead72a547606e495f2002a74c1 Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Thu, 12 Mar 2026 11:29:18 +0000 Subject: [PATCH 14/61] SMP router specs --- spec/TOPICS.md | 4 + spec/modules/Simplex/Messaging/Server.md | 106 ++++++++++++++++++ spec/modules/Simplex/Messaging/Server/CLI.md | 31 +++++ .../Simplex/Messaging/Server/Control.md | 7 ++ .../Simplex/Messaging/Server/Env/STM.md | 47 ++++++++ .../Simplex/Messaging/Server/Expiration.md | 7 ++ .../Simplex/Messaging/Server/Information.md | 7 ++ spec/modules/Simplex/Messaging/Server/Main.md | 37 ++++++ .../Simplex/Messaging/Server/Main/Init.md | 17 +++ .../Simplex/Messaging/Server/MsgStore.md | 7 ++ .../Messaging/Server/MsgStore/Journal.md | 7 ++ .../Messaging/Server/MsgStore/Postgres.md | 57 ++++++++++ .../Simplex/Messaging/Server/MsgStore/STM.md | 29 +++++ .../Messaging/Server/MsgStore/Types.md | 29 +++++ .../Simplex/Messaging/Server/NtfStore.md | 15 +++ .../Simplex/Messaging/Server/Prometheus.md | 21 ++++ .../Simplex/Messaging/Server/QueueStore.md | 7 ++ .../Messaging/Server/QueueStore/Postgres.md | 97 ++++++++++++++++ .../Messaging/Server/QueueStore/QueueInfo.md | 7 ++ .../Messaging/Server/QueueStore/STM.md | 37 ++++++ .../Messaging/Server/QueueStore/Types.md | 7 ++ .../modules/Simplex/Messaging/Server/Stats.md | 39 +++++++ 
.../Simplex/Messaging/Server/StoreLog.md | 36 ++++++ .../Messaging/Server/StoreLog/ReadWrite.md | 17 +++ .../Messaging/Server/StoreLog/Types.md | 7 ++ spec/modules/Simplex/Messaging/Server/Web.md | 21 ++++ src/Simplex/Messaging/Server.hs | 4 + src/Simplex/Messaging/Server/Env/STM.hs | 1 + .../Messaging/Server/MsgStore/Postgres.hs | 1 + .../Messaging/Server/MsgStore/Types.hs | 1 + .../Messaging/Server/QueueStore/Postgres.hs | 1 + .../Messaging/Server/QueueStore/STM.hs | 1 + src/Simplex/Messaging/Server/StoreLog.hs | 3 + 33 files changed, 715 insertions(+) create mode 100644 spec/modules/Simplex/Messaging/Server.md create mode 100644 spec/modules/Simplex/Messaging/Server/CLI.md create mode 100644 spec/modules/Simplex/Messaging/Server/Control.md create mode 100644 spec/modules/Simplex/Messaging/Server/Env/STM.md create mode 100644 spec/modules/Simplex/Messaging/Server/Expiration.md create mode 100644 spec/modules/Simplex/Messaging/Server/Information.md create mode 100644 spec/modules/Simplex/Messaging/Server/Main.md create mode 100644 spec/modules/Simplex/Messaging/Server/Main/Init.md create mode 100644 spec/modules/Simplex/Messaging/Server/MsgStore.md create mode 100644 spec/modules/Simplex/Messaging/Server/MsgStore/Journal.md create mode 100644 spec/modules/Simplex/Messaging/Server/MsgStore/Postgres.md create mode 100644 spec/modules/Simplex/Messaging/Server/MsgStore/STM.md create mode 100644 spec/modules/Simplex/Messaging/Server/MsgStore/Types.md create mode 100644 spec/modules/Simplex/Messaging/Server/NtfStore.md create mode 100644 spec/modules/Simplex/Messaging/Server/Prometheus.md create mode 100644 spec/modules/Simplex/Messaging/Server/QueueStore.md create mode 100644 spec/modules/Simplex/Messaging/Server/QueueStore/Postgres.md create mode 100644 spec/modules/Simplex/Messaging/Server/QueueStore/QueueInfo.md create mode 100644 spec/modules/Simplex/Messaging/Server/QueueStore/STM.md create mode 100644 spec/modules/Simplex/Messaging/Server/QueueStore/Types.md 
create mode 100644 spec/modules/Simplex/Messaging/Server/Stats.md create mode 100644 spec/modules/Simplex/Messaging/Server/StoreLog.md create mode 100644 spec/modules/Simplex/Messaging/Server/StoreLog/ReadWrite.md create mode 100644 spec/modules/Simplex/Messaging/Server/StoreLog/Types.md create mode 100644 spec/modules/Simplex/Messaging/Server/Web.md diff --git a/spec/TOPICS.md b/spec/TOPICS.md index 977107652..6ef029e96 100644 --- a/spec/TOPICS.md +++ b/spec/TOPICS.md @@ -19,3 +19,7 @@ - **Two agent layers**: Client/Agent.hs ("small agent") is used only in servers — SMP proxy and notification server — to manage client connections to other SMP servers. Agent.hs + Agent/Client.hs ("big agent") is used in client applications. Both manage SMP client connections with subscription tracking and reconnection, but the big agent adds the full messaging agent layer (connections, double ratchet, file transfer). When documenting Agent/Client.hs, Client/Agent.hs should be reviewed for shared patterns and differences. - **Handshake protocol family**: SMP (Transport.hs), NTF (Notifications/Transport.hs), and XFTP (FileTransfer/Transport.hs) all have handshake protocols with the same structure (version negotiation + session binding + key exchange) but different feature sets. NTF is a strict subset. XFTP doesn't use the TLS handshake at all (HTTP2 layer). The shared types (THandle, THandleParams, THandleAuth) mean changes to the handshake infrastructure affect all three protocols. + +- **Server subscription architecture**: The SMP server's subscription model spans Server.hs (serverThread split-STM lifecycle, tryDeliverMessage sync/async, ProhibitSub/ServerSub state machine), Env/STM.hs (SubscribedClients TVar-of-Maybe continuity, Client three-queue architecture), and Client/Agent.hs (small agent dual subscription model). 
The interaction between service subscriptions, direct queue subscriptions, notification subscriptions, and the serverThread subQ processing is not visible from any single module. + +- **Outside-STM lookup pattern**: Multiple modules use the pattern of looking up TVar references outside STM (via readTVarIO/TM.lookupIO), then reading/modifying the TVar contents inside STM. This avoids transaction re-evaluation from unrelated map changes. Used in: Server.hs (serverThread client lookup, tryDeliverMessage subscriber lookup), Env/STM.hs (deleteSubcribedClient), Client/Agent.hs (removeClientAndSubs, reconnectSMPClient). The safety invariant is that the outer map entries (TVars) are never removed — only their contents change. diff --git a/spec/modules/Simplex/Messaging/Server.md b/spec/modules/Simplex/Messaging/Server.md new file mode 100644 index 000000000..0ed6e43e1 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server.md @@ -0,0 +1,106 @@ +# Simplex.Messaging.Server + +> SMP server: client handling, subscription lifecycle, message delivery, proxy forwarding, control port. + +**Source**: [`Server.hs`](../../../../src/Simplex/Messaging/Server.hs) + +**Protocol spec**: [`protocol/simplex-messaging.md`](../../../../protocol/simplex-messaging.md) — SimpleX Messaging Protocol. + +## Overview + +The server runs as `raceAny_` over many threads — any thread exit stops the entire server. The thread set includes: one `serverThread` per subscription type (SMP, NTF), a notification delivery thread, a pending events thread, a proxy agent receiver, a SIGINT handler, plus per-transport listener threads and optional expiration/stats/prometheus/control-port threads. `E.finally` ensures `stopServer` runs on any exit. + +## serverThread — subscription lifecycle with split STM + +See comment on `serverThread`. 
It reads the subscription request from `subQ`, then looks up the client **outside** STM (via `getServerClient`), then enters an STM transaction (`updateSubscribers`) to compute which old subscriptions to end, then runs `endPreviousSubscriptions` in IO. If the client disconnects between lookup and transaction, `updateSubscribers` handles `Nothing` by still sending END/DELD to other subscribed clients. + +`checkAnotherClient` ensures END messages are only sent to clients **other than** the subscribing client — if `clntId == clientId`, the action is skipped. + +`removeWhenNoSubs` removes a client from `subClients` only when **both** queue and service subscriptions are empty — not after each individual unsubscription. + +## SubscribedClients — TVar-of-Maybe pattern + +See comment on `SubscribedClients` in Env/STM.hs. Subscription entries store `TVar (Maybe (Client s))` — the TVar's contents change between `Just client` and `Nothing` on disconnect/reconnect, allowing STM transactions reading the TVar to automatically re-evaluate when the subscriber changes. Entries **are** removed via `lookupDeleteSubscribedClient` (when subscriptions end) and `deleteSubcribedClient` (on client disconnect), though the source comment describes the original intent of never cleaning them up. + +`upsertSubscribedClient` returns the previously subscribed client only if it's a **different** client (checked via `sameClientId`). Same client → returns `Nothing` (no END needed). + +## ProhibitSub / ServerSub state machine + +`Sub.subThread` is either `ProhibitSub` or `ServerSub (TVar SubscriptionThread)`. GET creates `ProhibitSub`, preventing subsequent SUB on the same queue (`CMD PROHIBITED`). SUB creates `ServerSub NoSub`, preventing subsequent GET (`CMD PROHIBITED`). This is enforced per-connection — the state tracks which access pattern the client chose. 
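
The mutual exclusion between GET and SUB described above can be modeled as a small transition function. This is a sketch under stated assumptions: `AccessMode`, `Cmd` and `step` are illustrative names that do not appear in `Server.hs`, and it assumes that repeating the already chosen command stays allowed.

```haskell
-- Toy model of the per-connection access pattern for one queue.
-- AccessMode and Cmd are illustrative names, not types from Server.hs.
data AccessMode = GetMode | SubMode
  deriving (Eq, Show)

data Cmd = GET | SUB
  deriving (Eq, Show)

-- Nothing means the client has not used GET or SUB on this queue yet.
-- Assumption: repeating the already chosen command stays allowed.
step :: Maybe AccessMode -> Cmd -> Either String AccessMode
step Nothing GET = Right GetMode
step Nothing SUB = Right SubMode
step (Just GetMode) GET = Right GetMode
step (Just SubMode) SUB = Right SubMode
step (Just _) _ = Left "CMD PROHIBITED"

main :: IO ()
main = do
  print (step Nothing GET)        -- first GET selects the GET pattern
  print (step (Just GetMode) SUB) -- SUB after GET is rejected
  print (step (Just SubMode) GET) -- GET after SUB is rejected
```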
+ +`SubscriptionThread` transitions: `NoSub` → `SubPending` (sndQ full during delivery) → `SubThread (Weak ThreadId)` (delivery thread spawned). `SubPending` is set **before** the thread is spawned; the thread atomically upgrades to `SubThread` after forking. If the thread exits before upgrading, the `modifyTVar'` is a no-op (checks for `SubPending` specifically). + +## tryDeliverMessage — sync/async split delivery + +See comment on `tryDeliverMessage`. When a SEND arrives and the queue was empty: + +1. Look up subscribed client **outside STM** (avoids transaction cost when no subscriber exists) +2. In STM: check `delivered` is Nothing, check sndQ not full → deliver synchronously, return Nothing +3. If sndQ is full: set `SubPending`, return the client/sub/stateVar triple +4. Fork a delivery thread that waits for sndQ space, verifies `sameClient` (prevents delivery to reconnected client), then delivers and sets state to `NoSub` + +The `sameClient` check inside the delivery thread prevents a race: if the client reconnected between fork and delivery, the new client will receive the message via its own SUB. + +`newServiceDeliverySub` creates a transient subscription **only** for service-associated queues during message delivery — this is separate from the SUB-created subscriptions. + +## Constant-time authorization — dummy keys + +See comment on `verifyCmdAuthorization`. Always runs verification regardless of whether the queue exists. When the queue is missing (AUTH error), `dummyVerifyCmd` runs verification with hardcoded dummy keys (Ed25519, Ed448, X25519) and the result is discarded via `seq`. The time depends only on the authorization type provided, not on queue existence. This mitigates timing side-channel attacks that could reveal whether a queue ID exists. + +When the signature algorithm doesn't match the queue key, verification runs with a dummy key of the **provided** algorithm and the result is forced then discarded (`seq False`). 
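
The "always verify, then discard" idea above can be sketched with toy types. Everything here is hypothetical: `Key`, `Sig`, `verify` and `verifyCmd` are placeholders, and the arithmetic check stands in for real signature verification, so the sketch shows only the control flow, not actual constant-time behavior.

```haskell
-- Toy key and signature types; the real code uses actual cryptographic keys.
newtype Key = Key Int
newtype Sig = Sig Int

-- Placeholder check standing in for real signature verification.
verify :: Key -> Sig -> Int -> Bool
verify (Key k) (Sig s) msg = s == k + msg

-- Hardcoded key used only when no queue (and so no real key) is found.
dummyKey :: Key
dummyKey = Key 0

-- Verification always runs. With a missing queue the result is forced
-- with seq and then discarded, so the command still fails.
verifyCmd :: Maybe Key -> Sig -> Int -> Bool
verifyCmd (Just k) sig msg = verify k sig msg
verifyCmd Nothing sig msg = verify dummyKey sig msg `seq` False

main :: IO ()
main = do
  print (verifyCmd (Just (Key 5)) (Sig 12) 7) -- queue exists, signature matches
  print (verifyCmd Nothing (Sig 7) 7)         -- queue missing: work is done, result is False
```

The point of the second case is that the same verification work happens whether or not the queue exists, which is what removes the timing difference.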
+ +## Service subscription — hash-based drift detection + +See comment on `sharedSubscribeService`. The client sends expected `(count, idsHash)`. The server reads the actual values from storage, then computes `subsChange = subtractServiceSubs currSubs subs'` — the **difference** between what the client's session currently tracks and the new values. This difference (not the absolute values) is passed to `serverThread` via `CSService` to adjust `totalServiceSubs`. Using differences prevents double-counting when a service resubscribes. + +Stats classification: exactly one of `srvSubOk`/`srvSubMore`/`srvSubFewer`/`srvSubDiff` is incremented per subscription. `count == -1` is a special case for old NTF servers. + +## Proxy forwarding — single transmission, no service identity + +See comment on `processForwardedCommand`. Only single forwarded transmissions are allowed — batches are rejected with `BLOCK`. The synthetic `THandleAuth` has `peerClientService = Nothing`, preventing forwarded clients from claiming service identity. Only SEND, SKEY, LKEY, and LGET are allowed through `rejectOrVerify`. + +Double encryption: response is encrypted first to the client (with `C.cbEncrypt` using `reverseNonce clientNonce`), then wrapped and encrypted to the proxy (with `C.cbEncryptNoPad` using `reverseNonce proxyNonce`). Using reversed nonces ensures request and response directions use distinct nonces. + +## Proxy concurrency limiter + +See `wait`/`signal` around `forkProxiedCmd`. `procThreads` TVar implements a counting semaphore via STM `retry`. When `used >= serverClientConcurrency`, the transaction retries until another thread finishes. No bound on wait time — under sustained proxy load, commands queue indefinitely. + +## sendPendingEvtsThread — atomic swap + +`swapTVar pendingEvents IM.empty` atomically takes all pending events and clears the map. Events accumulated during processing are captured in the next interval. 
`tryWriteTBQueue` is tried first (non-blocking); if the sndQ is full, a forked thread does the blocking write. This prevents the pending events thread from stalling on one slow client. + +## deliverNtfsThread — throwSTM for control flow + +See `withSubscribed`. When a service client unsubscribes between the TVar read and the flush, `throwSTM (userError "service unsubscribed")` aborts the STM transaction. This is caught by `tryAny` and logged as "cancelled" — it's a successful path, not an error. The `flushSubscribedNtfs` function also cancels via `throwSTM` if the client is no longer current or sndQ is full. + +## Batch subscription responses — SOK grouped with MSG + +See comment on `processSubBatch`. When batched SUB commands produce SOK responses plus messages, the first message is appended to the SOK batch (up to 4 SOKs per block) in a single transmission. Remaining messages go to `msgQ` for separate delivery. This ensures the client receives at least one message quickly with its subscription acknowledgments. + +## send thread — MVar fair lock + +The TLS handle is wrapped in an `MVar` (`newMVar h`). Both `send` (command responses from `sndQ`) and `sendMsg` (messages from `msgQ`) acquire this lock via `withMVar`. This ensures fair interleaving between response batches and individual messages, preventing either from starving the other. + +## Queue creation — ID oracle prevention + +See comment on queue creation with client-supplied IDs. When `clntIds = True`, the ID must equal `B.take 24 (C.sha3_384 (bs corrId))`. This prevents ID oracle attacks where an attacker could probe for queue existence by attempting to create a queue with a specific ID and observing DUPLICATE vs AUTH errors. + +## disconnectTransport — subscription-aware idle timeout + +See `noSubscriptions`. The idle client disconnect thread only checks expiration when the client has **no** subscriptions (not in `subClients` for either SMP or NTF subscribers). 
Subscribed clients are kept alive indefinitely regardless of inactivity — they're waiting for messages, not idle. + +## clientDisconnected — ordered cleanup + +On disconnect: (1) set `connected = False`, (2) atomically swap out all subscriptions, (3) cancel subscription threads, (4) if server is still active: delete client from server map, update queue and service subscribers. Service subscription cleanup (`updateServiceSubs`) subtracts the client's accumulated `(count, idsHash)` from `totalServiceSubs`. End threads are swapped out and killed. + +## Control port — single auth, no downgrade + +See `controlPortAuth`. Role is set on first `CPAuth` only (`CPRNone` case). Subsequent AUTH commands print the current role but do not change it — the message says "start new session to change." This prevents role downgrade attacks within a session. + +## withQueue_ — updatedAt time check + +Every queue command calls `withQueue_` which checks if `updatedAt` matches today's date. If not, `updateQueueTime` is called to update it. This means `updatedAt` is a daily resolution activity timestamp, not a per-command timestamp. The SEND path passes `queueNotBlocked = False` to still update the time even for blocked queues (though SEND fails on blocked queues separately). + +## foldrM in client command processing + +`foldrM process ([], [])` processes a batch of verified commands right-to-left, accumulating responses and messages. The responses list is built with `(:)`, so the final order matches the original command order. Messages from SUB are collected separately and passed as the second element of the `sndQ` tuple. diff --git a/spec/modules/Simplex/Messaging/Server/CLI.md b/spec/modules/Simplex/Messaging/Server/CLI.md new file mode 100644 index 000000000..a369b6981 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/CLI.md @@ -0,0 +1,31 @@ +# Simplex.Messaging.Server.CLI + +> CLI argument parsing, INI configuration reading, X.509 certificate generation, and utility functions. 
+ +**Source**: [`CLI.hs`](../../../../../src/Simplex/Messaging/Server/CLI.hs) + +## strictIni / iniOnOff — error semantics + +`strictIni` calls `error` on missing INI keys — no structured error, no recovery. `readStrictIni` chains this with `read`, so both "key missing" and "key present but unparseable" produce exceptions indistinguishable by callers. + +`iniOnOff` returns `Maybe Bool`: "on" → `Just True`, "off" → `Just False`, missing key → `Nothing`, any other value → `error` (not a parse failure). This tri-valued logic drives the implicit-default pattern in [Main.md](./Main.md#restore_messages--implicit-default-propagation). + +## iniTransports — port reuse prevention + +SMP ports are parsed first. When explicit WebSocket ports are provided, they are filtered to exclude already-used SMP ports (`ports ws \\ smpPorts`). However, when "websockets" is "on" with no explicit port, it defaults to `["80"]` without filtering against SMP ports. This means if SMP is also on port 80, the default WebSocket configuration would conflict. + +## iniDBOptions — schema creation disabled at CLI + +When reading database options from INI, `createSchema` is always set to `False` regardless of INI content. This enforces a security invariant: database schemas must be created manually or by migration, never automatically by the server. + +## createServerX509_ — external tool dependency + +Certificate generation shells out to `openssl` commands via `readCreateProcess`, which throws `IOError` on non-zero exit codes. Failures are thus detected but propagate as uncaught exceptions — no structured error handling wraps the certificate generation sequence. + +## checkSavedFingerprint — startup invariant + +Fingerprint is extracted from the CA certificate and saved during init. On every server start, the saved fingerprint is compared against the current certificate. Mismatch → startup failure. 
See [Main.md#initializeserver--fingerprint-invariant](./Main.md#initializeserver--fingerprint-invariant). + +## genOnline — existing certificate dependency + +When `signAlgorithm_` or `commonName_` are not provided, `genOnline` reads them from the existing certificate. This creates a hidden dependency on current certificate state that's not visible from the function signature. Expects exactly one certificate in the PEM file. diff --git a/spec/modules/Simplex/Messaging/Server/Control.md b/spec/modules/Simplex/Messaging/Server/Control.md new file mode 100644 index 000000000..644fb786a --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/Control.md @@ -0,0 +1,7 @@ +# Simplex.Messaging.Server.Control + +> Control port protocol types and encoding for server administration. + +**Source**: [`Control.hs`](../../../../../src/Simplex/Messaging/Server/Control.hs) + +No non-obvious behavior. See source. diff --git a/spec/modules/Simplex/Messaging/Server/Env/STM.md b/spec/modules/Simplex/Messaging/Server/Env/STM.md new file mode 100644 index 000000000..d0e948120 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/Env/STM.md @@ -0,0 +1,47 @@ +# Simplex.Messaging.Server.Env.STM + +> Server environment, configuration, client state, subscription types, and storage initialization. + +**Source**: [`Env/STM.hs`](../../../../../../src/Simplex/Messaging/Server/Env/STM.hs) + +## Overview + +This module defines the server's shared state (`Env`, `Server`, `Client`) and the subscription model types. Most non-obvious patterns are about concurrency safety — preventing STM contention while maintaining consistency. Key patterns are documented in [Server.md](../Server.md) where they're used; this doc covers patterns specific to the type definitions and initialization. + +## SubscribedClients — TVar-of-Maybe pattern + +See comment on `SubscribedClients`. Entries store `TVar (Maybe (Client s))` rather than the client directly. Three implications: + +1. 
STM transactions reading the TVar automatically re-evaluate when the subscriber changes (disconnect/reconnect) +2. IO lookups via `TM.lookupIO` can be done outside STM safely (the TVar reference itself is stable while it exists) +3. Reconnecting clients can reuse existing subscription slots without map-level contention + +Note: despite the source comment saying subscriptions "are not removed," the code does remove entries via `lookupDeleteSubscribedClient` (when subscriptions end) and `deleteSubcribedClient` (on client disconnect). The comment reflects the original design intent for mobile client continuity, but the current implementation does clean up. + +See also [Server.md#subscribedclients--tvar-of-maybe-pattern](../Server.md#subscribedclients--tvar-of-maybe-pattern). + +## deleteSubcribedClient — split transaction for contention avoidance + +See comment on `deleteSubcribedClient`. The TVar lookup is in a separate IO read from the client comparison and deletion. This is safe because the client is read in the same STM transaction as the deletion — if another client was inserted between lookup and delete, `sameClient` returns False and the delete is skipped. After setting the TVar to `Nothing`, the entry is also removed from the TMap. + +## insertServerClient — connected check + +`insertServerClient` checks `connected` inside the STM transaction before inserting. If the client was already marked disconnected (race with cleanup), the insert is skipped and returns `False`. This prevents resurrecting a disconnected client in the server map. + +## SupportedStore — compile-time storage validation + +Type family with `(Int ~ Bool, TypeError ...)` for invalid combinations. The unsatisfiable `Int ~ Bool` constraint forces GHC to emit the `TypeError` message. Valid: Memory+Memory, Memory+Journal, Postgres+Journal, Postgres+Postgres (with flag). Invalid: Memory+Postgres, Postgres+Memory. The `dbServerPostgres` CPP flag controls whether Postgres+Postgres is available. 
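
The `(Int ~ Bool, TypeError ...)` technique can be shown in a reduced form. This is a sketch, not the definition from `Env/STM.hs`: the `Store` kind, the `Supported` family, the `Config` tag and the error texts are all assumptions, and only two store kinds are modeled.

```haskell
{-# LANGUAGE ConstraintKinds #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE UndecidableInstances #-}

import Data.Kind (Constraint)
import GHC.TypeLits (ErrorMessage (..), TypeError)

-- Illustrative store kinds; the real module distinguishes queue and message stores.
data Store = Memory | Postgres

-- Valid combinations reduce to the empty constraint; invalid ones reduce to
-- an unsatisfiable equality paired with a custom error message.
type family Supported (q :: Store) (m :: Store) :: Constraint where
  Supported 'Memory 'Memory = ()
  Supported 'Postgres 'Postgres = ()
  Supported 'Memory 'Postgres =
    (Int ~ Bool, TypeError ('Text "Memory queues with Postgres messages are not supported"))
  Supported 'Postgres 'Memory =
    (Int ~ Bool, TypeError ('Text "Postgres queues with in-memory messages are not supported"))

-- Value-level tag carrying the two store kinds as phantom parameters.
data Config (q :: Store) (m :: Store) = Config

describe :: Supported q m => Config q m -> String
describe _ = "supported combination"

main :: IO ()
main = putStrLn (describe (Config :: Config 'Memory 'Memory))
-- describe (Config :: Config 'Memory 'Postgres) would be rejected at compile time.
```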
+ +## newEnv — initialization order + +Store initialization order matters: (1) create message store (loads store log for STM backends), (2) create notification store (empty TMap), (3) generate TLS credentials, (4) compute server identity from fingerprint, (5) create stats, (6) create proxy agent. The store log load (`loadStoreLog`) calls `readWriteQueueStore` which reads the existing log, replays it to build state, then opens a new log for writing. `setStoreLog` attaches the write log to the store. + +HTTPS credentials are validated: must be at least 4096-bit RSA (`public_size >= 512` bytes). The check explicitly notes that Let's Encrypt ECDSA uses "insecure curve p256." + +## ServerSubscribers — dual subscriber tracking + +`ServerSubscribers` has two `SubscribedClients` maps: `queueSubscribers` (one entry per queue, for direct subscriptions) and `serviceSubscribers` (one entry per service, for service-certificate subscriptions). `totalServiceSubs` tracks the aggregate `(count, IdsHash)` across all services. `subClients` is an `IntSet` of all client IDs with any subscription (union of queue and service subscribers) — used for idle disconnect decisions. + +## endThreads — weak references with sequence counter + +See comment on `endThreads`. Forked client threads (delivery, proxy commands) are tracked in `IntMap (Weak ThreadId)` with a monotonically increasing `endThreadSeq`. On client disconnect, all threads are swapped out and killed. Weak references allow GC to collect threads that finished normally without explicit cleanup. diff --git a/spec/modules/Simplex/Messaging/Server/Expiration.md b/spec/modules/Simplex/Messaging/Server/Expiration.md new file mode 100644 index 000000000..a51e24c20 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/Expiration.md @@ -0,0 +1,7 @@ +# Simplex.Messaging.Server.Expiration + +> Expiration configuration and epoch calculation. 
+ +**Source**: [`Expiration.hs`](../../../../../src/Simplex/Messaging/Server/Expiration.hs) + +No non-obvious behavior. See source. diff --git a/spec/modules/Simplex/Messaging/Server/Information.md b/spec/modules/Simplex/Messaging/Server/Information.md new file mode 100644 index 000000000..16f153154 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/Information.md @@ -0,0 +1,7 @@ +# Simplex.Messaging.Server.Information + +> Server public information types (config, operator, hosting) for the server info page. + +**Source**: [`Information.hs`](../../../../../src/Simplex/Messaging/Server/Information.hs) + +No non-obvious behavior. See source. diff --git a/spec/modules/Simplex/Messaging/Server/Main.md b/spec/modules/Simplex/Messaging/Server/Main.md new file mode 100644 index 000000000..aed538573 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/Main.md @@ -0,0 +1,37 @@ +# Simplex.Messaging.Server.Main + +> Server CLI entry point: dispatches Init, Start, Delete, Journal, and Database commands. + +**Source**: [`Main.hs`](../../../../../src/Simplex/Messaging/Server/Main.hs) + +## Overview + +This is the CLI dispatcher for the SMP server. It parses INI configuration, validates storage mode combinations, and dispatches to the appropriate command handler. The most complex logic is storage configuration validation and migration between storage modes. + +## Storage mode compatibility — state machine + +`checkMsgStoreMode` and `iniStoreCfg` implement a state machine of valid storage mode combinations. Valid: Memory+Memory, Memory+Journal (deprecated), Postgres+Journal, Postgres+Postgres (with flag). Invalid: Memory+Postgres (queue store doesn't support it), Postgres+Memory (messages can't be in-memory with DB queues). Error messages guide the user toward migration commands. The validity is also enforced at the type level via `SupportedStore` in [Env/STM.md](./Env/STM.md#supportedstore--compile-time-storage-validation). 
+ +## INI parsing — error context loss + +`readIniFile` errors are coerced to `String` without structured error information. When INI keys are missing or unparseable, `strictIni` calls `error` (see [CLI.md](./CLI.md#strictini--inionoff--error-semantics)). No line numbers or parse context are preserved. + +## restore_messages — implicit default propagation + +The `restore_messages` INI setting has three-valued logic: explicit "on" → restore, explicit "off" → skip, missing → inherits from `enable_store_log`. This implicit default is not captured in the type system — callers see `Maybe Bool` that silently resolves against another setting. + +## serverPublicInfo — validation with field dependencies + +`sourceCode` is required if ANY `ServerPublicInfo` field is present (in line with AGPLv3 license). `operator_country` requires `operator` to be set. `hosting_country` requires `hosting`. These constraints are enforced at parse time, not by the type system — they can be violated by programmatic construction. + +## initializeServer — fingerprint invariant + +During init, the CA certificate fingerprint is saved to a file. On every subsequent start, `checkSavedFingerprint` (in CLI.hs) validates that the current CA certificate matches the saved fingerprint. If the certificate is replaced without updating the fingerprint file, startup fails. This prevents silent key rotation. + +## Database import — non-atomic migration + +`importStoreLogToDatabase` reads the store log into memory, writes to database, then renames the original file with `.bak` suffix. If the function fails after partial database writes, the original file is still present but the database has partial data. No transactional guarantee across the file→DB boundary. + +## Journal store deprecation warning + +`SSCMemoryJournal` initialization prints a deprecation warning (see `newEnv` in Env/STM.hs). Journal message stores will be removed — migration path is: journal export → database import. 
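
As a small illustration of the `restore_messages` default described earlier in this file, the three-valued setting can be resolved like this. The function name and argument order are hypothetical; only the resolution rule (a missing value inherits `enable_store_log`) comes from the text above.

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical helper: the first argument is the parsed enable_store_log
-- value, the second is restore_messages (Nothing when the key is absent).
resolveRestoreMessages :: Bool -> Maybe Bool -> Bool
resolveRestoreMessages enableStoreLog = fromMaybe enableStoreLog
```

Making the fallback an explicit function argument is one way to keep the dependency on `enable_store_log` visible at call sites.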
diff --git a/spec/modules/Simplex/Messaging/Server/Main/Init.md b/spec/modules/Simplex/Messaging/Server/Main/Init.md new file mode 100644 index 000000000..665938ae8 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/Main/Init.md @@ -0,0 +1,17 @@ +# Simplex.Messaging.Server.Main.Init + +> Server initialization: INI file content generation, default settings, and CLI option structures. + +**Source**: [`Main/Init.hs`](../../../../../../src/Simplex/Messaging/Server/Main/Init.hs) + +## iniFileContent — selective commenting + +`iniFileContent` uses `optDisabled`/`optDisabled'` to conditionally comment out INI settings. A setting appears commented when it was not explicitly provided or matches the default value. Consequence: regenerating the INI file after user changes will re-comment modified-to-default values, making it appear the user's change was reverted. + +## iniDbOpts — default fallback + +Database connection settings are uncommented only if they differ from `defaultDBOpts`. A custom connection string that matches the default will appear commented. + +## Control port passwords — independent generation + +Admin and user control port passwords are generated as two independent `randomBase64 18` calls during initialization. Despite `let pwd = ... in (,) <$> pwd <*> pwd` appearing to share a value, `pwd` is an IO action — applicative `<*>` executes it twice, producing two different random passwords. The INI template thus has distinct admin and user passwords. diff --git a/spec/modules/Simplex/Messaging/Server/MsgStore.md b/spec/modules/Simplex/Messaging/Server/MsgStore.md new file mode 100644 index 000000000..625649170 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/MsgStore.md @@ -0,0 +1,7 @@ +# Simplex.Messaging.Server.MsgStore + +> Message log record type for store log serialization. + +**Source**: [`MsgStore.hs`](../../../../../src/Simplex/Messaging/Server/MsgStore.hs) + +No non-obvious behavior. See source. 
diff --git a/spec/modules/Simplex/Messaging/Server/MsgStore/Journal.md b/spec/modules/Simplex/Messaging/Server/MsgStore/Journal.md new file mode 100644 index 000000000..3c0ab6afc --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/MsgStore/Journal.md @@ -0,0 +1,7 @@ +# Simplex.Messaging.Server.MsgStore.Journal + +> **Deprecated** — will be removed. Migration path: `journal export` → in-memory, then `database import` → PostgreSQL. See deprecation warning in [Env/STM.hs](../../../../../../src/Simplex/Messaging/Server/Env/STM.hs) `SSCMemoryJournal` initialization. + +**Source**: [`Journal.hs`](../../../../../../src/Simplex/Messaging/Server/MsgStore/Journal.hs) + +No further documentation — this module is deprecated. diff --git a/spec/modules/Simplex/Messaging/Server/MsgStore/Postgres.md b/spec/modules/Simplex/Messaging/Server/MsgStore/Postgres.md new file mode 100644 index 000000000..7262bde0a --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/MsgStore/Postgres.md @@ -0,0 +1,57 @@ +# Simplex.Messaging.Server.MsgStore.Postgres + +> PostgreSQL message store: server-side stored procedures for message operations, COPY protocol for bulk import. + +**Source**: [`Postgres.hs`](../../../../../../src/Simplex/Messaging/Server/MsgStore/Postgres.hs) + +## MsgQueue is unit type + +`type MsgQueue PostgresMsgStore = ()`. There is no message queue object for Postgres — all message operations go directly to the database via stored procedures. Functions like `getMsgQueue` return `pure ()`. + +## Partial interface — error stubs + +Multiple `MsgStoreClass` methods are `error "X not used"`: `withActiveMsgQueues`, `unsafeWithAllMsgQueues`, `logQueueStates`, `withIdleMsgQueue`, `getQueueMessages_`, `tryDeleteMsg_`, `setOverQuota_`, `getQueueSize_`, `unsafeRunStore`. These are required by the type class but not applicable to Postgres. Calling any at runtime crashes. 
Postgres overrides the default implementations of `tryPeekMsg`, `tryDelMsg`, `tryDelPeekMsg`, `deleteExpiredMsgs`, and `getQueueSize` with direct database calls. + +## writeMsg — quota logic in stored procedure + +`write_message(?,?,?,?,?,?,?)` is a PostgreSQL stored procedure that returns `(quota_written, was_empty)`. Quota enforcement happens in SQL, not in Haskell. This means quota logic is duplicated: STM store checks `canWrite` flag in Haskell, Postgres store checks in the database function. The two implementations must agree on quota semantics. + +## tryDelPeekMsg — variable row count + +The stored procedure `try_del_peek_msg` returns 0, 1, or 2 rows. For the 1-row case, the code checks whether the returned message's `messageId` matches the requested `msgId` to distinguish "deleted, no next message" from "delete failed, current message returned." This disambiguation is possible because the stored procedure always returns available messages even when deletion doesn't match. + +## uninterruptibleMask_ on all database operations + +All write operations (`writeMsg`, `tryDelMsg`, `tryDelPeekMsg`, `deleteExpiredMsgs`) and `isolateQueue` are wrapped in `E.uninterruptibleMask_`. This prevents async exceptions (e.g., client disconnect) from interrupting mid-transaction, which could leave database connections in an inconsistent state. + +## batchInsertMessages — COPY protocol + +Uses PostgreSQL's COPY FROM STDIN protocol (`DB.copy_` + `DB.putCopyData` + `DB.putCopyEnd`) for bulk message import, which is much faster than individual INSERTs. Messages are encoded to CSV format. Parse errors on individual records are logged and skipped — the import is error-tolerant. The entire operation runs in a single transaction (`withTransaction`). + +## exportDbMessages — batched I/O + +Accumulates rows in an `IORef` list (prepended for O(1) insert), flushing every 1000 records with `reverse` to restore order. 
Uses `DB.foldWithOptions_` with `fetchQuantity = Fixed 1000` to avoid loading all messages into memory. + +## updateQueueCounts — two-step reset + +Creates a temp table with aggregated message stats, then updates `msg_queues` in two steps: first zeros all queue counts, then applies actual stats from the temp table. The two-step approach handles queues with zero messages: they're reset by the first UPDATE but not touched by the second (no matching row in temp table). + +## toMessage — nanosecond precision lost + +`MkSystemTime ts 0` constructs timestamps with zero nanoseconds. Only whole seconds are stored in the database. Messages read from Postgres have coarser timestamps than messages in STM/Journal stores. + +## isolateQueue IS the transaction boundary + +`isolateQueue` for Postgres does `uninterruptibleMask_ $ withDB' op ... $ runReaderT a . DBTransaction`. Each `isolateQueue` call creates a fresh `DBTransaction` carrying the DB connection. This is how `tryPeekMsg_` (which uses `asks dbConn`) gets its connection. The `withQueueLock` is identity for Postgres, so `isolateQueue` provides no mutual exclusion — only the DB transaction provides isolation. + +## newMsgStore hardcodes useCache = False + +`newQueueStore @PostgresQueue (queueStoreCfg config, False)` — the Postgres message store always disables queue caching. All lookups go directly to the database. Contrast with the Journal+Postgres combination where caching is enabled. + +## deleteQueueSize — size before delete + +`deleteQueueSize` calls `getQueueSize` BEFORE `deleteStoreQueue`. The returned size is the count at query time — a concurrent `writeMsg` between the size query and the delete means the reported size is stale. This is acceptable because the size is used for statistics, not for correctness. + +## unsafeMaxLenBS + +`toMessage` uses `C.unsafeMaxLenBS` to bypass the `MaxLen` length check on message bodies read from the database. A TODO comment questions this choice. 
If the database contains oversized data, the length invariant is silently violated. diff --git a/spec/modules/Simplex/Messaging/Server/MsgStore/STM.md b/spec/modules/Simplex/Messaging/Server/MsgStore/STM.md new file mode 100644 index 000000000..95423cdf1 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/MsgStore/STM.md @@ -0,0 +1,29 @@ +# Simplex.Messaging.Server.MsgStore.STM + +> In-memory STM message store: TQueue-based message queues with quota enforcement. + +**Source**: [`STM.hs`](../../../../../../src/Simplex/Messaging/Server/MsgStore/STM.hs) + +## withQueueLock is identity + +`withQueueLock _ _ = id` — STM queues need no locking since STM provides atomicity. Journal.hs overrides this with a `TMVar`-based in-memory lock (via `withLockWaitShared`). Any code calling `withQueueLock` transparently gets the right concurrency control for the backend. + +## writeMsg — quota with empty-queue override + +When `canWrite` is `False` (over quota) but the queue is empty, writing is still allowed. This handles the case where all messages were deleted or expired but the `canWrite` flag was not reset. When the quota is exceeded, the actual message content is replaced with a `MessageQuota` (preserving only `msgId` and `msgTs`) — the client receives a quota notification instead of the message. + +## getMsgQueue — lazy initialization + +The message queue TVar (`msgQueue'`) starts as `Nothing`. The queue is created on first `getMsgQueue` call (lazy initialization). This means queues that are created but never receive messages don't allocate a TQueue. `getPeekMsgQueue` returns `Nothing` if no message queue exists — callers handle this as "queue is empty." + +## deleteQueue_ — atomic swap prevents post-delete operations + +`swapTVar (msgQueue' q) Nothing` atomically retrieves the old message queue and sets the TVar to `Nothing`.
Any subsequent `getMsgQueue` call would create a fresh empty queue, but the deleted queue's `queueRec` TVar is also set to `Nothing` by `deleteStoreQueue`, so all operations would fail with `AUTH` first. + +## tryDeleteMsg_ — blind dequeue, no msgId check + +`tryDeleteMsg_` does `tryReadTQueue` — removes whatever is at the head without verifying the message ID. The msgId check lives in the default `tryDelMsg` / `tryDelPeekMsg` implementations in `Types.hs`, which always call `tryPeekMsg_` first to verify. Calling `tryDeleteMsg_` directly would silently delete the wrong message if the head changed between peek and delete. Safe only because `isolateQueue` serializes all operations on the same queue. + +## getQueueMessages_ snapshot — invisible gap + +`getQueueMessages_ False` implements non-destructive read by flushing TQueue then writing back. This runs inside `atomically` (via `isolateQueue`), so the temporarily-empty state is never visible to other transactions. diff --git a/spec/modules/Simplex/Messaging/Server/MsgStore/Types.md b/spec/modules/Simplex/Messaging/Server/MsgStore/Types.md new file mode 100644 index 000000000..2fd4c79bf --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/MsgStore/Types.md @@ -0,0 +1,29 @@ +# Simplex.Messaging.Server.MsgStore.Types + +> Type class for message stores with injective type families and polymorphic isolation. + +**Source**: [`Types.hs`](../../../../../../src/Simplex/Messaging/Server/MsgStore/Types.hs) + +## Injective type families + +All associated types (`StoreMonad`, `MsgQueue`, `StoreQueue`, `QueueStore`, `MsgStoreConfig`) use injective type families (`| m -> s`). This means each associated type uniquely determines the store type, avoiding ambiguity at call sites. Without injectivity, most call sites would need explicit type applications. + +## isolateQueue — polymorphic isolation + +`isolateQueue` abstracts the concurrency model: STM store implements it as `liftIO . 
atomically` (single STM transaction), while Journal store acquires a TMVar-based in-memory lock (not a filesystem lock). All message operations go through `isolateQueue` or `withPeekMsgQueue` (which calls `isolateQueue`). This means the atomicity guarantee varies by backend — STM gives true atomicity, Journal gives mutual exclusion via lock. + +## tryDelPeekMsg — atomic delete-and-peek + +Deletes the current message AND peeks the next one in a single `isolateQueue` call. This atomicity is critical for the ACK flow: the server needs to know if there's a next message to deliver immediately after acknowledging the current one, without a window where a concurrent SEND could interleave. + +## withIdleMsgQueue — journal-specific lifecycle + +For Journal store, the message queue file handle is closed after the action if it was initially closed or idle longer than the configured interval. For STM store, this is effectively a no-op (always open, never "idle"). The return tuple `(Maybe a, Int)` provides both the action result and the queue size — the `Maybe` is `Nothing` when no message queue exists (no messages ever written). + +## unsafeWithAllMsgQueues — CLI-only + +Explicitly unsafe: iterates all queues including those not in active memory. Only safe before server start or in CLI commands. During normal operation, Journal store may have queues on disk but not loaded — this function would load them, interfering with the lazy-loading lifecycle. + +## snapshotTQueue visibility gap + +`getQueueMessages_ False` (non-destructive read) flushes the TQueue then writes all messages back. Between flush and rewrite, concurrent STM transactions would see an empty queue. Since this runs inside `atomically` for STM store, the gap is invisible to other transactions. For Journal store (where `StoreMonad` is IO-based), this is not used. 
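
The flush-then-rewrite snapshot described above can be sketched with plain `stm` primitives. This is a minimal illustration, assuming the `stm` package is available; `snapshotTQueue` here is written from the description, not copied from the source, and `demo` is a hypothetical helper.

```haskell
import Control.Concurrent.STM

-- Non-destructive read: drain the queue, then write every message back in
-- the same order. Both steps live in one STM action, so when it is run
-- with `atomically` no other transaction can observe the empty queue.
snapshotTQueue :: TQueue a -> STM [a]
snapshotTQueue q = do
  msgs <- flushTQueue q
  mapM_ (writeTQueue q) msgs
  pure msgs

-- Hypothetical demo: take a snapshot, then drain the queue to confirm
-- that the snapshot left its contents intact.
demo :: IO ([Int], [Int])
demo = do
  q <- newTQueueIO
  atomically $ mapM_ (writeTQueue q) [1, 2, 3]
  snap <- atomically (snapshotTQueue q)
  rest <- atomically (flushTQueue q)
  pure (snap, rest)
```

If the flush and the write-back were run as two separate `atomically` calls, a concurrent reader could see the empty queue in between, which is the gap this section says is avoided.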
diff --git a/spec/modules/Simplex/Messaging/Server/NtfStore.md b/spec/modules/Simplex/Messaging/Server/NtfStore.md new file mode 100644 index 000000000..b58a44fad --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/NtfStore.md @@ -0,0 +1,15 @@ +# Simplex.Messaging.Server.NtfStore + +> In-memory notification store: per-notifier message notification lists with expiration. + +**Source**: [`NtfStore.hs`](../../../../../src/Simplex/Messaging/Server/NtfStore.hs) + +## storeNtf — outside-STM lookup with STM fallback + +`storeNtf` uses `TM.lookupIO` outside STM, then falls back to `TM.lookup` inside STM if the notifier entry doesn't exist. This is the same outside-STM lookup pattern used in Server.hs and Client/Agent.hs — avoids transaction re-evaluation from unrelated map changes. The double-check inside STM prevents races when two messages arrive concurrently for a new notifier. + +## deleteExpiredNtfs — last-is-earliest optimization + +Notifications are prepended (cons), so the last element in the list is the earliest. `deleteExpiredNtfs` checks `last ntfs` first — if the earliest notification is not expired, none are, and the entire list is skipped without filtering. This avoids traversing notification lists that have no expired entries. + +The outer `readTVarIO` check for empty list avoids entering an STM transaction at all for notifiers with no notifications. diff --git a/spec/modules/Simplex/Messaging/Server/Prometheus.md b/spec/modules/Simplex/Messaging/Server/Prometheus.md new file mode 100644 index 000000000..11610ee23 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/Prometheus.md @@ -0,0 +1,21 @@ +# Simplex.Messaging.Server.Prometheus + +> Prometheus text exposition format for server metrics, with histogram gap-filling and derived aggregations. + +**Source**: [`Prometheus.hs`](../../../../../src/Simplex/Messaging/Server/Prometheus.hs) + +## Histogram gap-filling + +`showTimeBuckets` uses `mapAccumL` over sorted bucket keys. 
When the gap between consecutive buckets exceeds 60 seconds, it inserts a synthetic bucket at `sec - 60` with the cumulative total up to that point. This fills sparse `TimeBuckets` maps into continuous Prometheus histograms. The 60-second gap threshold is hardcoded. + +## Bucket sum aggregation — filters by value, not key + +`showBucketSums` intends to aggregate buckets into fixed time periods: 0-60s, 60-300s, 300-1200s, 1200-3600s, 3600+s. However, `IM.filter` (from `Data.IntMap.Strict`) filters by **value** (count), not by key (time). The predicate `\sec -> minTime <= sec && sec < maxTime` is applied to count values, not to the IntMap keys that represent seconds. This means buckets are selected based on whether their count falls in the range, not based on their time boundary. The aggregation boundaries are also independent of the bucketing thresholds in `updateTimeBuckets` (Stats.hs), which uses 5s/10s/30s/60s quantization. + +## Non-standard Prometheus timestamp output + +The `mstr` function appends `tsEpoch ts` (millisecond-precision Unix timestamp) directly after metric values, which is valid Prometheus text exposition format. + +## Delivery histogram count/sum source + +`simplex_smp_delivery_ack_confirmed_time_count` is `_msgRecv + _msgRecvGet`. `simplex_smp_delivery_ack_confirmed_time_sum` is `sumTime` from `_msgRecvAckTimes`. The count is accumulated separately from the histogram — if there's a code path that increments `msgRecv` without calling `updateTimeBuckets`, count and sum diverge. diff --git a/spec/modules/Simplex/Messaging/Server/QueueStore.md b/spec/modules/Simplex/Messaging/Server/QueueStore.md new file mode 100644 index 000000000..c906c9ecc --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/QueueStore.md @@ -0,0 +1,7 @@ +# Simplex.Messaging.Server.QueueStore + +> Core record types for queue storage: QueueRec, NtfCreds, ServiceRec, ServerEntityStatus. 
+ +**Source**: [`QueueStore.hs`](../../../../../src/Simplex/Messaging/Server/QueueStore.hs) + +No non-obvious behavior. See source. diff --git a/spec/modules/Simplex/Messaging/Server/QueueStore/Postgres.md b/spec/modules/Simplex/Messaging/Server/QueueStore/Postgres.md new file mode 100644 index 000000000..f97acaa2d --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/QueueStore/Postgres.md @@ -0,0 +1,97 @@ +# Simplex.Messaging.Server.QueueStore.Postgres + +> PostgreSQL queue store: cache-coherent TMap layer over database, double-checked locking, soft-delete lifecycle, COPY-based bulk import. + +**Source**: [`Postgres.hs`](../../../../../../src/Simplex/Messaging/Server/QueueStore/Postgres.hs) + +## addQueue_ — no in-memory duplicate check, relies on DB constraint + +See comment on `addQueue_`: "Not doing duplicate checks in maps as the probability of duplicates is very low." The STM implementation checks all four ID maps before insertion and returns `DUPLICATE_`. The Postgres implementation skips this and relies on `UniqueViolation` from the DB, which `handleDuplicate` maps to `AUTH`, not `DUPLICATE_`. The same logical error produces different error codes depending on the store backend. + +## addQueue_ — non-atomic cache updates + +After the successful SQL INSERT, each cache map (`queues`, `senders`, `notifiers`, `links`) is updated in its own `atomically` block. Between these updates, the cache is partially consistent — a concurrent `getQueue_` by sender ID could miss the queue during the window between the `queues` insert and the `senders` insert. The STM implementation updates all maps in a single `atomically` block. `E.uninterruptibleMask_` prevents async exceptions but not concurrent reads. + +## getQueue_ / SNotifier — one-shot cache eviction on read + +See comment on `getQueue_` for the SNotifier case. After a successful notifier lookup, the notifier ID is deleted from the `notifiers` TMap. 
This makes the notifier cache a one-shot cache: the first lookup uses the cache, subsequent lookups hit the database. Unique to SNotifier — SSender entries persist indefinitely. The batch path (`getQueues_` SNotifier) does NOT do this eviction, so single and batch paths have different cache side effects. + +## getQueue_ / loadNtfQueue — notifier lookups never cache the queue + +See comment on `loadNtfQueue`: "checking recipient map first, not creating lock in map, not caching queue." Notifier-initiated DB loads produce ephemeral queue objects created with `mkQ False` (no persistent lock). Two concurrent notifier lookups for the same queue create independent queue objects with separate `TVar`s. Contrast with `loadSndQueue_` which caches via `cacheQueue`. + +## cacheQueue — double-checked locking + +Classic pattern: (1) TMap lookup outside lock, (2) if miss, DB load + create queue + acquire `withQueueLock`, (3) second TMap check inside lock + `atomically`, (4) if another thread won the race, discard the freshly created queue. See comment on `cacheQueue` for the rationale about preventing duplicate file opens. For Journal storage, the losing thread's lock remains in `queueLocks` as a harmless orphan. For Postgres-only storage (`mkQueue` creates a TVar), no resource leak. + +## getQueues_ — snapshot-based cache with stale-read risk + +Both SRecipient and SNotifier paths start with `readTVarIO` snapshots of the relevant TMap(s), then partition requested IDs into "found" and "need DB load." Between snapshot and DB query, the cache can change. The `cacheRcvQueue` path handles this with a second check inside the lock. The SNotifier path does NOT cache — it uses the stale snapshot to decide `maybe (mkQ False rId qRec) pure (M.lookup rId qs)`, so concurrent loads can create duplicate ephemeral objects. + +## getQueues_ — error code asymmetry: INTERNAL vs AUTH + +When all IDs are found in cache but some map to `Left` (theoretically impossible), the error is `INTERNAL`. 
When some IDs needed DB loading and were missing, the error is `AUTH`. Same "not found" condition, different error codes depending on whether the DB was consulted. The `INTERNAL` branch is a defensive assertion against inconsistent TMap snapshots. + +## withDB — every operation runs in its own transaction + +`withDB` wraps each action in `withTransaction` (PostgreSQL `READ COMMITTED`). No multi-statement transactions in queue store operations (unlike `getEntityCounts` and `batchInsertQueues` which use `withTransaction` directly). SQL exceptions are caught, logged, and mapped to `STORE` with the exception text — which propagates to the SMP client over the wire. + +## withQueueRec — lock-mask-read pattern + +All mutating operations share: (1) `withQueueLock` (per-queue lock), (2) `E.uninterruptibleMask_` (no async exceptions mid-operation), (3) `readQueueRecIO` (check queue not deleted). If the TVar reads `Nothing`, the operation short-circuits with `AUTH` without touching the database. The TVar is the authoritative "is deleted" check; `assertUpdated` (zero rows → `AUTH`) catches cache-DB divergence as a secondary check. + +## deleteStoreQueue — two-phase soft delete + +Queue deletion is soft: `UPDATE ... SET deleted_at = ?`. The row remains in the database. `compactQueues` later does the hard delete: `DELETE ... WHERE deleted_at < ?` using the configurable `deletedTTL`. All queries include `AND deleted_at IS NULL` to exclude soft-deleted rows. The STM implementation has no equivalent — `compactQueues` returns `pure 0`. + +## deleteStoreQueue — non-atomic cache cleanup, links never cleaned + +The TVar is set to `Nothing` first, then secondary maps (`senders`, `notifiers`, `notifierLocks`) are cleaned in separate `atomically` blocks. Between these, secondary maps point to a dead queue (functionally correct — returns AUTH either way). The `links` map is never cleaned up here — link entries for deleted queues remain in memory indefinitely. 
+ +## secureQueue — idempotency difference from STM + +Re-securing with the same key falls through the verify function to `pure ()`, then **still executes the SQL UPDATE and TVar write**. The STM implementation returns `Right ()` without TVar mutation when the same key is provided. Both implementations write a store log entry either way. The Postgres version performs an unnecessary DB round-trip, connection pool checkout, and TVar write that the STM version avoids. + +## addQueueNotifier — three-layer duplicate detection + +(1) **Cache check**: `checkCachedNotifier` acquires a per-notifier-ID lock via `notifierLocks`, then checks `TM.memberIO`. Returns `DUPLICATE_`. (2) **Queue lock**: Via `withQueueRec`, prevents concurrent modifications to the same queue. (3) **Database constraint**: `handleDuplicate` catches `UniqueViolation`, returns `AUTH`. Same duplicate, different error codes depending on whether cache was warm. The `notifierLocks` map grows unboundedly — locks are never removed except when the queue is deleted. + +## addQueueNotifier — always clears notification service + +The SQL UPDATE always sets `ntf_service_id = NULL` when adding/replacing a notifier. The previous notifier's service association is silently lost. The STM implementation additionally calls `removeServiceQueue` to update service-level tracking; the Postgres version does not. + +## rowToQueueRec — link data replaced with empty stubs + +The standard `queueRecQuery` does NOT select `fixed_data` and `user_data` columns. When converting to `QueueRec`, link data is stubbed: `(,(EncDataBytes "", EncDataBytes "")) <$> linkId_`. Actual link data is loaded on demand via `getQueueLinkData`. Any code reading `queueData` from a cached `QueueRec` without going through `getQueueLinkData` sees empty bytes. The separate `rowToQueueRecWithData` (used by `foldQueueRecs` with `withData = True`) includes real data. 
+ +## getCreateService — serialization via serviceLocks + +Entire operation wrapped in `withLockMap (serviceLocks st) fp`, serializing all creation/lookup for the same certificate fingerprint. Inside the lock: SELECT by `service_cert_hash`, if not found attempt INSERT catching `UniqueViolation`. The `serviceLocks` map grows unboundedly — no cleanup mechanism. + +## batchInsertQueues — COPY protocol with manual CSV serialization + +Uses PostgreSQL's `COPY FROM STDIN WITH (FORMAT CSV)` for bulk import. Queue records manually serialized via `queueRecToText`/`renderField`. This must stay in sync with `insertQueueQuery` column order — a mismatch causes silent data corruption. The `renderField` function does not escape CSV metacharacters, which is safe only because field values (entity IDs, keys, DH secrets) are binary data without commas/quotes/newlines. Runs in a single transaction; row count queried in a separate transaction afterward. + +## withLog_ — fire-and-forget store log writes + +`withLog_` catches all exceptions via `catchAny` and logs a warning, but does not fail the operation. Store log writes are best-effort. Contrast with the STM `withLog'` where log failures can propagate as `STORE` errors. In the Postgres implementation, the store log can fall behind the database state since the DB is the authoritative persistence layer. + +## useCache flag — behavioral bifurcation + +`useCache :: Bool` creates two distinct code paths. When `False`: `addQueue_` skips all TMap updates, `getQueue_` always loads from DB, `addQueueNotifier` skips cache duplicate check, `deleteStoreQueue` skips cache cleanup. Notably, `loadQueueNoCache` still creates queues with `mkQ True` (persistent lock) even though caching is disabled — the lock is needed for `withQueueRec`'s `withQueueLock`. + +## getServiceQueueCountHash — behavioral divergence from STM + +Postgres returns `Right (0, mempty)` when the service is not found (via `maybeFirstRow'` default). STM returns `Left AUTH`. 
Same logical condition, different error handling. Callers that expect AUTH on missing service will silently get a zero count from Postgres. + +## deleteStoreQueue — cross-module lock contract + +See comment on `deleteStoreQueue`: "this method is called from JournalMsgStore deleteQueue that already locks the queue." Unlike other mutations that go through `withQueueRec` (which acquires the lock), `deleteStoreQueue` uses `E.uninterruptibleMask_ $ runExceptT` directly — no `withQueueLock`. The caller must hold the lock. + +## addQueueLinkData — immutable data protection + +When link data already exists with the same `lnkId`, the SQL UPDATE adds `AND (fixed_data IS NULL OR fixed_data = ?)` to prevent overwriting immutable (fixed) data. If the immutable portion doesn't match, `assertUpdated` triggers AUTH. This enforces the invariant that `fixed_data` can only be set once. + +## assertUpdated — AUTH is overloaded + +`assertUpdated` checks that non-zero rows were affected. Zero rows → `AUTH`. This is the same error code returned for "not found" (via `readQueueRecIO`) and "duplicate" (via `handleDuplicate`). The actual cause — stale cache, deleted queue, or constraint violation — is indistinguishable in logs. diff --git a/spec/modules/Simplex/Messaging/Server/QueueStore/QueueInfo.md b/spec/modules/Simplex/Messaging/Server/QueueStore/QueueInfo.md new file mode 100644 index 000000000..b0ca64877 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/QueueStore/QueueInfo.md @@ -0,0 +1,7 @@ +# Simplex.Messaging.Server.QueueStore.QueueInfo + +> Data types for queue info display (control port), with JSON encoding. + +**Source**: [`QueueInfo.hs`](../../../../../../src/Simplex/Messaging/Server/QueueStore/QueueInfo.hs) + +No non-obvious behavior. See source. 
diff --git a/spec/modules/Simplex/Messaging/Server/QueueStore/STM.md b/spec/modules/Simplex/Messaging/Server/QueueStore/STM.md new file mode 100644 index 000000000..6ff8da3b5 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/QueueStore/STM.md @@ -0,0 +1,37 @@ +# Simplex.Messaging.Server.QueueStore.STM + +> In-memory STM queue store: queue CRUD with store log journaling and service tracking. + +**Source**: [`STM.hs`](../../../../../../src/Simplex/Messaging/Server/QueueStore/STM.hs) + +## addQueue_ — atomic multi-ID DUPLICATE check + +`addQueue_` checks ALL entity IDs (recipient, sender, notifier, link) for existence in a single STM transaction. If ANY already exist, returns `DUPLICATE_` without inserting anything. This prevents partial state where some IDs were inserted before the duplicate was detected on another. The `mkQ` callback runs outside STM before the check — the queue object is created optimistically and discarded if the check fails. + +## getCreateService — outside-STM with role validation + +`getCreateService` uses the outside-STM lookup pattern (`TM.lookupIO` then STM fallback). When a service cert already exists, `checkService` validates the role matches — a cert attempting to register with a different `SMPServiceRole` gets `SERVICE` error. A new service is only created if the ID is not already in `services` (prevents DUPLICATE). The `(serviceId, True/False)` return indicates whether the log should be written (only for new services). + +## IdsHash XOR in setServiceQueues_ + +Both `addServiceQueue` and `removeServiceQueue` use `setServiceQueues_`, which unconditionally XORs `queueIdHash qId` into `idsHash`. Since XOR is self-inverse, removal cancels addition. However, the XOR is applied blindly — there is no `S.member` guard. If `addServiceQueue` were called twice for the same `qId`, the XOR would self-cancel while the `Set` (via `S.insert` idempotency) retains the element, making hash and Set inconsistent. 
Similarly, `removeServiceQueue` on a non-member XORs a phantom ID into the hash. Correctness relies on callers maintaining the invariant: each `qId` is added exactly once and removed at most once per service. + +## withLog — uninterruptibleMask_ for log integrity + +Store log writes are wrapped in `E.uninterruptibleMask_` — cannot be interrupted by async exceptions during the write. This prevents partial log records that would corrupt the store log file during replay. Synchronous exceptions are caught by `E.try` and converted to `STORE` error (logged, not crashed). + +## secureQueue — idempotent replay + +If `senderKey` already matches the provided key, returns `Right ()`. A different key returns `Left AUTH`. This idempotency is essential for store log replay where the same `SecureQueue` record may be applied multiple times. + +## getQueues_ — map snapshot for batch consistency + +Batch queue lookups (`getQueues_`) read the entire TVar map once with `readTVarIO`, then look up each queue ID in the pure `Map`. This provides a consistent snapshot (all lookups see the same map state) and is more efficient than per-queue IO lookups for large batches. + +## closeQueueStore — non-atomic shutdown + +`closeQueueStore` clears TMaps in separate `atomically` calls, not one transaction. Concurrent operations during shutdown could see partially cleared state. This is acceptable because the store log is closed first, and the server should not be processing new requests during shutdown. + +## addQueueLinkData — conditional idempotency + +Re-adding link data with the same `lnkId` and matching first component of `QueueLinkData` succeeds (idempotent replay). Different `lnkId` or mismatched data returns `AUTH`. This handles store log replay where the same `CreateLink` may be applied multiple times. 
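The `IdsHash` XOR invariant described for `setServiceQueues_` in the STM queue store notes above can be illustrated with a small model. This is a minimal sketch under stated assumptions, not the store's implementation: `ServiceQueues`, `addBlind`, `removeBlind`, `addGuarded`, `consistent`, and the placeholder `queueIdHash` over `Int` are all hypothetical names introduced here for illustration.

```haskell
import Data.Bits (xor)
import qualified Data.Set as S

-- Hypothetical stand-ins: the real store keeps an IdsHash next to a Set of queue IDs.
type QueueId = Int

data ServiceQueues = ServiceQueues
  { queueIds :: S.Set QueueId
  , idsHash :: Int
  }
  deriving (Eq, Show)

emptyQueues :: ServiceQueues
emptyQueues = ServiceQueues S.empty 0

-- Placeholder hash (assumption): any deterministic function works for this sketch.
queueIdHash :: QueueId -> Int
queueIdHash qId = qId * 2654435761 + 1

-- Unguarded updates: the hash is always XORed, whether or not membership changes.
addBlind, removeBlind :: QueueId -> ServiceQueues -> ServiceQueues
addBlind qId (ServiceQueues s h) = ServiceQueues (S.insert qId s) (h `xor` queueIdHash qId)
removeBlind qId (ServiceQueues s h) = ServiceQueues (S.delete qId s) (h `xor` queueIdHash qId)

-- Guarded variant (illustration only): skip the update when the ID is already a member.
addGuarded :: QueueId -> ServiceQueues -> ServiceQueues
addGuarded qId sq
  | S.member qId (queueIds sq) = sq
  | otherwise = addBlind qId sq

-- The invariant: the stored hash equals the XOR of the hashes of all members.
consistent :: ServiceQueues -> Bool
consistent (ServiceQueues s h) = h == foldr (xor . queueIdHash) 0 (S.toList s)
```

In this model, `consistent (addBlind 7 (addBlind 7 emptyQueues))` is `False`, because the set still holds `7` while the two XORs cancel. `consistent (removeBlind 9 emptyQueues)` is also `False`, because a phantom ID is folded into the hash. The guarded variant keeps the invariant at the cost of a membership lookup; relying on callers instead, as the spec describes, avoids that lookup.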
diff --git a/spec/modules/Simplex/Messaging/Server/QueueStore/Types.md b/spec/modules/Simplex/Messaging/Server/QueueStore/Types.md new file mode 100644 index 000000000..173cbe967 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/QueueStore/Types.md @@ -0,0 +1,7 @@ +# Simplex.Messaging.Server.QueueStore.Types + +> Type classes for queue store and stored queue operations. + +**Source**: [`Types.hs`](../../../../../../src/Simplex/Messaging/Server/QueueStore/Types.hs) + +No non-obvious behavior. See source. diff --git a/spec/modules/Simplex/Messaging/Server/Stats.md b/spec/modules/Simplex/Messaging/Server/Stats.md new file mode 100644 index 000000000..056dc4a88 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/Stats.md @@ -0,0 +1,39 @@ +# Simplex.Messaging.Server.Stats + +> Server statistics: counters, rolling period tracking, delivery time histograms, proxy stats, service stats. + +**Source**: [`Stats.hs`](../../../../../src/Simplex/Messaging/Server/Stats.hs) + +## Overview + +All stats are `IORef`-based, not STM — individual increments are atomic (`atomicModifyIORef'_`) but multi-field reads are not transactional. `getServerStatsData` reads 30+ IORefs sequentially — the resulting snapshot is temporally smeared, not a point-in-time atomic view. + +## PeriodStats — rolling window with boundary-only reset + +`PeriodStats` maintains three `IORef IntSet` (day, week, month). `updatePeriodStats` hashes the entity ID and inserts into all three periods. `periodStatCounts` resets a period's IntSet **only** when the period boundary is reached (day 1 of that period). At non-boundary times, it returns `""` (empty string) — the data is kept accumulating but not reported. + +See comment on `periodCount`. At day boundary (`periodCount 1 ref`), the day set is atomically swapped to empty and its size returned. Week resets on Monday (day 1 of week), month on the 1st. Periods are independent — day reset does NOT affect week/month accumulation. 
Each period counts unique queue hashes that were active during that period. + +## Disabled metrics — performance trade-offs + +See comments on `qSubNoMsg` and `subscribedQueues` in the source. `qSubNoMsg` is disabled because counting "subscription with no message" creates too many STM transactions. `subscribedQueues` is disabled because maintaining PeriodStats-style IntSets for all subscribed queues uses too much memory. Both fields are omitted from the stats output entirely. The parser handles old log files that contain these fields: `qSubNoMsg` is silently skipped via `skipInt`, and `subscribedQueues` is parsed but replaced with empty data. + +## TimeBuckets — ceil-aligned bucketing with precision loss + +`updateTimeBuckets` quantizes delivery-to-acknowledgment times into sparse buckets. Exact for 0-5s, then ceil-aligned: 6-30s → 5s buckets, 31-60s → 10s, 61-180s → 30s, 180+s → 60s. The `toBucket` formula uses `- ((- n) \`div\` m) * m` for ceiling division. `sumTime` and `maxTime` preserve exact values; only the histogram is lossy. + +## Serialization backward compatibility — silent data coercion + +The `strP` parser for `ServerStatsData` handles multiple format generations. Old format `qDeleted=` is read as `(value, 0, 0)` — `qDeletedNew` and `qDeletedSecured` default to 0. `qSubNoMsg` is parsed and silently discarded (`skipInt`). `subscribedQueues` is parsed but replaced with empty data. Data loaded from old formats is coerced, not reconstructed — precision is permanently lost. + +## Serialization typo — internally consistent + +The field `_srvAssocUpdated` is serialized as `"assocUpdatedt="` (extra 't') in `ServiceStatsData` encoding. The parser expects the same misspelling. Both sides are consistent, so it works — but external systems expecting `assocUpdated=` will fail to parse. + +## atomicSwapIORef for stats logging + +In `logServerStats` (Server.hs), each counter is read and reset via `atomicSwapIORef ref 0`. 
This is lock-free but means counters are zeroed after each logging interval — values represent delta since last log, not cumulative totals. `qCount` and `msgCount` are exceptions: they're read-only (via `readIORef`) because they track absolute current values, not deltas. + +## setPeriodStats — not thread safe + +See comment on `setPeriodStats`. Uses `writeIORef` (not atomic). Only safe during server startup when no other threads are running. If called concurrently, period data could be corrupted. diff --git a/spec/modules/Simplex/Messaging/Server/StoreLog.md b/spec/modules/Simplex/Messaging/Server/StoreLog.md new file mode 100644 index 000000000..cef1bdfb2 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/StoreLog.md @@ -0,0 +1,36 @@ +# Simplex.Messaging.Server.StoreLog + +> Append-only log for queue state changes: write, read/replay, compaction, crash recovery, backup retention. + +**Source**: [`StoreLog.hs`](../../../../../src/Simplex/Messaging/Server/StoreLog.hs) + +## writeStoreLogRecord — atomicity via manual write + +See comment in `writeStoreLogRecord`. `hPutStrLn` breaks writes larger than 1024 bytes into multiple system calls on `LineBuffered` handles, which could interleave with concurrent writes. The solution is manual `B.hPut` (single call for the complete record + newline) plus `hFlush`. `E.uninterruptibleMask_` prevents async exceptions between write and flush — ensures a complete record is always written. + +## readWriteStoreLog — crash recovery state machine + +The `.start` temp backup file provides crash recovery during compaction. The sequence: + +1. Read existing log, replay into memory +2. Rename log to `.start` (atomic rename = backup point) +3. Write compacted state to new file +4. Rename `.start` to timestamped backup, remove old backups + +If the server crashes during step 3, the next startup detects `.start` and restores from it instead of the incomplete new file. Any partially-written current file is preserved as `.bak`. 
The comment says "do not terminate" during compaction — there is no safe interrupt point between steps 2 and 4. + +## removeStoreLogBackups — layered retention policy + +Backup retention is layered: (1) keep all backups newer than 24 hours, (2) of the rest, keep at least 3, (3) of those eligible for deletion, only delete backups older than 21 days. This means a server with infrequent restarts accumulates many backups (only cleaned on startup), while a frequently-restarting server keeps a rolling window. Backup timestamps come from ISO 8601 suffixes parsed from filenames. + +## QueueRec StrEncoding — backward-compatible parsing + +The `strP` parser handles two field name generations: old format `sndSecure=` (boolean, mapping `True` → `QMMessaging`, `False` → `QMContact`) and new format `queue_mode=`. Missing queue mode defaults to `Nothing` with the comment "unknown queue mode, we cannot imply that it is contact address." `EntityActive` status is implicit — not written to the log, and parsed as default when `status=` is absent. + +## openReadStoreLog — creates file if missing + +`openReadStoreLog` creates an empty file if it doesn't exist. Callers never need to handle "file not found." + +## foldLogLines — EOF flag for batching + +The `action` callback receives a `Bool` indicating whether the current line is the last one. This allows consumers (like `readQueueStore`) to batch operations and flush only on the final line. diff --git a/spec/modules/Simplex/Messaging/Server/StoreLog/ReadWrite.md b/spec/modules/Simplex/Messaging/Server/StoreLog/ReadWrite.md new file mode 100644 index 000000000..c6fd7e745 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/StoreLog/ReadWrite.md @@ -0,0 +1,17 @@ +# Simplex.Messaging.Server.StoreLog.ReadWrite + +> Store log replay (read) and snapshot (write) for STM queue store. 
+ +**Source**: [`ReadWrite.hs`](../../../../../../src/Simplex/Messaging/Server/StoreLog/ReadWrite.hs) + +## readQueueStore — error-tolerant replay + +Log replay (`readQueueStore`) processes each line independently. Parse errors are printed to stdout and skipped. Operation errors (e.g., queue not found during `SecureQueue` replay) are logged and skipped. A deleted queue encountered during replay (`queueRec` is `Nothing`) logs a warning but does not fail. This means a corrupted log line only loses that single operation, not the entire store. + +## NewService ID validation + +During replay, `getCreateService` may return a different `serviceId` than the one stored in the log (if the service cert already exists with a different ID). This is logged as an error but does not abort replay — the store continues with the ID it assigned. This handles the case where a store log was manually edited or partially corrupted. + +## writeQueueStore — services before queues + +`writeQueueStore` writes services first, then queues. Order matters: when the log is replayed, service IDs must already exist before queues reference them via `rcvServiceId`/`ntfServiceId`. diff --git a/spec/modules/Simplex/Messaging/Server/StoreLog/Types.md b/spec/modules/Simplex/Messaging/Server/StoreLog/Types.md new file mode 100644 index 000000000..491815282 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/StoreLog/Types.md @@ -0,0 +1,7 @@ +# Simplex.Messaging.Server.StoreLog.Types + +> GADT wrapper for file handles with type-level IOMode enforcement. + +**Source**: [`Types.hs`](../../../../../../src/Simplex/Messaging/Server/StoreLog/Types.hs) + +No non-obvious behavior. See source. Constructors are intentionally not exported — callers must use `openWriteStoreLog`/`openReadStoreLog`. 
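The "GADT wrapper for file handles with type-level IOMode enforcement" noted above can be sketched as follows. This is a hedged illustration of the general technique, not the module's actual definition: `LogHandle`, `openWriteLog`, `openReadLog`, `writeRecord`, `readRecord`, and `closeLog` are hypothetical names. Unlike the documented `openReadStoreLog`, this sketch does not create a missing file.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}

import System.IO (Handle, IOMode (..), hClose, hGetLine, hPutStrLn, openFile)

-- Hypothetical wrapper: the IOMode is promoted to the type level via DataKinds,
-- so each constructor fixes the mode in the handle's type.
data LogHandle (m :: IOMode) where
  ReadLog :: FilePath -> Handle -> LogHandle 'ReadMode
  WriteLog :: FilePath -> Handle -> LogHandle 'WriteMode

-- Smart constructors: in a real module only these would be exported,
-- so callers cannot pair a handle with the wrong mode tag.
openWriteLog :: FilePath -> IO (LogHandle 'WriteMode)
openWriteLog f = WriteLog f <$> openFile f WriteMode

openReadLog :: FilePath -> IO (LogHandle 'ReadMode)
openReadLog f = ReadLog f <$> openFile f ReadMode

-- Accepts only write-mode logs; passing a read-mode log is a type error.
writeRecord :: LogHandle 'WriteMode -> String -> IO ()
writeRecord (WriteLog _ h) s = hPutStrLn h s

-- Accepts only read-mode logs.
readRecord :: LogHandle 'ReadMode -> IO String
readRecord (ReadLog _ h) = hGetLine h

-- Closing works for either mode.
closeLog :: LogHandle m -> IO ()
closeLog (ReadLog _ h) = hClose h
closeLog (WriteLog _ h) = hClose h
```

With this shape, an expression such as `writeRecord =<< openReadLog "store.log"` is rejected by the type checker, which is the enforcement the summary line refers to. Hiding the constructors moves the remaining obligation, opening the file in the matching mode, into the two smart constructors.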
diff --git a/spec/modules/Simplex/Messaging/Server/Web.md b/spec/modules/Simplex/Messaging/Server/Web.md new file mode 100644 index 000000000..716845aa5 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/Web.md @@ -0,0 +1,21 @@ +# Simplex.Messaging.Server.Web + +> Static site generation, serving (HTTP, HTTPS, HTTP/2), and template rendering for the server info page. + +**Source**: [`Web.hs`](../../../../../src/Simplex/Messaging/Server/Web.hs) + +## attachStaticFiles — reusing Warp internals for TLS connections + +`attachStaticFiles` receives already-established TLS connections (which passed TLS handshake and ALPN check in the SMP transport layer) and runs Warp's HTTP handler on them. It manually calls `WI.withII`, `WT.attachConn`, `WI.registerKillThread`, and `WI.serveConnection` — internal Warp APIs. This couples the server to Warp internals and could break on Warp library updates. + +## serveStaticPageH2 — path traversal protection + +The H2 static file server uses `canonicalizePath` to resolve symlinks and `..` components, then checks the resolved path is a prefix of `canonicalRoot`. The caller must pre-compute `canonicalRoot` via `canonicalizePath` for the check to work. Without pre-canonicalization, a symlink in the root itself could defeat the protection. + +## .well-known path rewriting + +Both WAI (`changeWellKnownPath`) and H2 (`rewriteWellKnownH2`) rewrite `/.well-known/` to `/well-known/` because `staticApp` does not serve hidden directories (dot-prefixed). The generated site uses `well-known/` as the physical directory. If one rewrite path is updated without the other, the served files diverge between HTTP/1.1 and HTTP/2. + +## section_ / item_ — template rendering + +`render` applies substitutions to HTML templates using `...` section markers and `${label}` item markers. When a substitution value is `Nothing`, the entire section (including content between markers) is removed. 
`section_` recurses to handle multiple occurrences of the same section. `item_` is a simple find-and-replace. The section end marker is mandatory — a missing end marker calls `error` (crashes). diff --git a/src/Simplex/Messaging/Server.hs b/src/Simplex/Messaging/Server.hs index 3d977dc8c..bce73d338 100644 --- a/src/Simplex/Messaging/Server.hs +++ b/src/Simplex/Messaging/Server.hs @@ -247,6 +247,7 @@ smpServer started cfg@ServerConfig {transports, transportConfig = tCfg, startOpt closeServer :: M s () closeServer = asks (smpAgent . proxyAgent) >>= liftIO . closeSMPClientAgent + -- spec: spec/modules/Simplex/Messaging/Server.md#serverthread--subscription-lifecycle-with-split-stm serverThread :: forall sub. String -> Server s -> @@ -1223,6 +1224,7 @@ disconnectTransport THandle {connection, params = THandleParams {sessionId}} rcv data VerificationResult s = VRVerified (Maybe (StoreQueue s, QueueRec)) | VRFailed ErrorType +-- spec: spec/modules/Simplex/Messaging/Server.md#constant-time-authorization--dummy-keys -- This function verifies queue command authorization, with the objective to have constant time between the three AUTH error scenarios: -- - the queue and party key exist, and the provided authorization has type matching queue key, but it is made with the different key. -- - the queue and party key exist, but the provided authorization has incorrect type. @@ -1982,6 +1984,7 @@ client -- If the queue is not full, then the thread is created where these checks are made: -- - it is the same subscribed client (in case it was reconnected it would receive message via SUB command) -- - nothing was delivered to this subscription (to avoid race conditions with the recipient). 
+ -- spec: spec/modules/Simplex/Messaging/Server.md#trydelivermessage--syncasync-split-delivery tryDeliverMessage :: Message -> IO () tryDeliverMessage msg = -- the subscribed client var is read outside of STM to avoid transaction cost @@ -2063,6 +2066,7 @@ client encNMsgMeta = C.cbEncrypt rcvNtfDhSecret ntfNonce (smpEncode msgMeta) 128 pure $ MsgNtf {ntfMsgId = msgId, ntfTs = msgTs, ntfNonce, ntfEncMeta = fromRight "" encNMsgMeta} + -- spec: spec/modules/Simplex/Messaging/Server.md#proxy-forwarding--single-transmission-no-service-identity processForwardedCommand :: EncFwdTransmission -> M s BrokerMsg processForwardedCommand (EncFwdTransmission s) = fmap (either ERR RRES) . runExceptT $ do THAuthServer {serverPrivKey, sessSecret'} <- maybe (throwE $ transportErr TENoServerAuth) pure (thAuth thParams') diff --git a/src/Simplex/Messaging/Server/Env/STM.hs b/src/Simplex/Messaging/Server/Env/STM.hs index 574111c15..b4a275922 100644 --- a/src/Simplex/Messaging/Server/Env/STM.hs +++ b/src/Simplex/Messaging/Server/Env/STM.hs @@ -368,6 +368,7 @@ data ServerSubscribers s = ServerSubscribers pendingEvents :: TVar (IntMap (NonEmpty (EntityId, BrokerMsg))) } +-- spec: spec/modules/Simplex/Messaging/Server/Env/STM.md#subscribedclients--tvar-of-maybe-pattern -- not exported, to prevent accidental concurrent Map lookups inside STM transactions. -- Map stores TVars with pointers to the clients rather than client ID to allow reading the same TVar -- inside transactions to ensure that transaction is re-evaluated in case subscriber changes. 
diff --git a/src/Simplex/Messaging/Server/MsgStore/Postgres.hs b/src/Simplex/Messaging/Server/MsgStore/Postgres.hs index edf7f481c..1617c1c91 100644 --- a/src/Simplex/Messaging/Server/MsgStore/Postgres.hs +++ b/src/Simplex/Messaging/Server/MsgStore/Postgres.hs @@ -76,6 +76,7 @@ data PostgresQueue = PostgresQueue queueRec' :: TVar (Maybe QueueRec) } +-- spec: spec/modules/Simplex/Messaging/Server/MsgStore/Postgres.md#msgqueue-is-unit-type instance StoreQueueClass PostgresQueue where recipientId = recipientId' {-# INLINE recipientId #-} diff --git a/src/Simplex/Messaging/Server/MsgStore/Types.hs b/src/Simplex/Messaging/Server/MsgStore/Types.hs index acb661a40..12566ec2f 100644 --- a/src/Simplex/Messaging/Server/MsgStore/Types.hs +++ b/src/Simplex/Messaging/Server/MsgStore/Types.hs @@ -49,6 +49,7 @@ import Simplex.Messaging.Server.QueueStore import Simplex.Messaging.Server.QueueStore.Types import Simplex.Messaging.Util ((<$$>), ($>>=)) +-- spec: spec/modules/Simplex/Messaging/Server/MsgStore/Types.md#injective-type-families--unambiguous-type-resolution class (Monad (StoreMonad s), QueueStoreClass (StoreQueue s) (QueueStore s)) => MsgStoreClass s where type StoreMonad s = (m :: Type -> Type) | m -> s type MsgStoreConfig s = c | c -> s diff --git a/src/Simplex/Messaging/Server/QueueStore/Postgres.hs b/src/Simplex/Messaging/Server/QueueStore/Postgres.hs index a8c8c040a..bba58e35b 100644 --- a/src/Simplex/Messaging/Server/QueueStore/Postgres.hs +++ b/src/Simplex/Messaging/Server/QueueStore/Postgres.hs @@ -169,6 +169,7 @@ instance StoreQueueClass q => QueueStoreClass q (PostgresQueueStore q) where (SRMessaging, SRNotifier) pure EntityCounts {queueCount, notifierCount, rcvServiceCount, ntfServiceCount, rcvServiceQueuesCount, ntfServiceQueuesCount} + -- spec: spec/modules/Simplex/Messaging/Server/QueueStore/Postgres.md#addqueue_--no-in-memory-duplicate-check-relies-on-db-constraint -- this implementation assumes that the lock is already taken by addQueue -- and relies on 
unique constraints in the database to prevent duplicate IDs. addQueue_ :: PostgresQueueStore q -> (RecipientId -> QueueRec -> IO q) -> RecipientId -> QueueRec -> IO (Either ErrorType q) diff --git a/src/Simplex/Messaging/Server/QueueStore/STM.hs b/src/Simplex/Messaging/Server/QueueStore/STM.hs index 3a236076c..110a9cd33 100644 --- a/src/Simplex/Messaging/Server/QueueStore/STM.hs +++ b/src/Simplex/Messaging/Server/QueueStore/STM.hs @@ -116,6 +116,7 @@ instance StoreQueueClass q => QueueStoreClass q (STMQueueStore q) where serviceCount role = M.foldl' (\ !n s -> if serviceRole (serviceRec s) == role then n + 1 else n) 0 serviceQueuesCount serviceSel = foldM (\n s -> (n +) . S.size . fst <$> readTVarIO (serviceSel s)) 0 + -- spec: spec/modules/Simplex/Messaging/Server/QueueStore/STM.md#addqueue_--atomic-multi-id-duplicate-check addQueue_ :: STMQueueStore q -> (RecipientId -> QueueRec -> IO q) -> RecipientId -> QueueRec -> IO (Either ErrorType q) addQueue_ st mkQ rId qr@QueueRec {senderId = sId, notifier, queueData, rcvServiceId} = do sq <- mkQ rId qr diff --git a/src/Simplex/Messaging/Server/StoreLog.hs b/src/Simplex/Messaging/Server/StoreLog.hs index 4ceb3cddd..8c69b4063 100644 --- a/src/Simplex/Messaging/Server/StoreLog.hs +++ b/src/Simplex/Messaging/Server/StoreLog.hs @@ -96,6 +96,7 @@ data SLRTag | NewService_ | QueueService_ +-- spec: spec/modules/Simplex/Messaging/Server/StoreLog.md#queuerec-strencoding--backward-compatible-parsing instance StrEncoding QueueRec where strEncode QueueRec {recipientKeys, rcvDhSecret, rcvServiceId, senderId, senderKey, queueMode, queueData, notifier, status, updatedAt} = B.concat @@ -242,6 +243,7 @@ closeStoreLog = \case where close_ h = hClose h `catchAny` \e -> logError ("STORE: closeStoreLog, error closing, " <> tshow e) +-- spec: spec/modules/Simplex/Messaging/Server/StoreLog.md#writestorelogrecord--atomicity-via-manual-write writeStoreLogRecord :: StrEncoding r => StoreLog 'WriteMode -> r -> IO () writeStoreLogRecord 
(WriteStoreLog _ h) r = E.uninterruptibleMask_ $ do B.hPut h $ strEncode r `B.snoc` '\n' -- hPutStrLn makes write non-atomic for length > 1024 @@ -289,6 +291,7 @@ logNewService s = writeStoreLogRecord s . NewService logQueueService :: (PartyI p, ServiceParty p) => StoreLog 'WriteMode -> RecipientId -> SParty p -> Maybe ServiceId -> IO () logQueueService s rId party = writeStoreLogRecord s . QueueService rId (ASP party) +-- spec: spec/modules/Simplex/Messaging/Server/StoreLog.md#readwritestorelog--crash-recovery-state-machine readWriteStoreLog :: (FilePath -> s -> IO ()) -> (StoreLog 'WriteMode -> s -> IO ()) -> FilePath -> s -> IO (StoreLog 'WriteMode) readWriteStoreLog readStore writeStore f st = ifM From f7be44981a3685b6ebc9c73cc1de2d9fd588a4b4 Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Thu, 12 Mar 2026 11:29:18 +0000 Subject: [PATCH 15/61] SMP router specs --- spec/modules/README.md | 8 ++++++ spec/modules/Simplex/Messaging/Client.md | 14 +++++----- .../modules/Simplex/Messaging/Client/Agent.md | 14 +++++----- .../Simplex/Messaging/Crypto/ShortLink.md | 4 +-- .../Messaging/Notifications/Transport.md | 2 +- spec/modules/Simplex/Messaging/Protocol.md | 8 +++--- spec/modules/Simplex/Messaging/Server.md | 8 +++--- spec/modules/Simplex/Messaging/Server/CLI.md | 4 +-- .../Simplex/Messaging/Server/Control.md | 2 +- .../Simplex/Messaging/Server/Env/STM.md | 8 +++--- .../Simplex/Messaging/Server/Information.md | 2 +- spec/modules/Simplex/Messaging/Server/Main.md | 4 +-- .../Simplex/Messaging/Server/Main/Init.md | 2 +- .../Messaging/Server/MsgStore/Postgres.md | 2 +- .../Messaging/Server/MsgStore/Types.md | 4 +-- .../Simplex/Messaging/Server/NtfStore.md | 2 +- .../Simplex/Messaging/Server/Prometheus.md | 2 +- .../Messaging/Server/QueueStore/STM.md | 2 +- .../modules/Simplex/Messaging/Server/Stats.md | 4 +-- .../Simplex/Messaging/Server/StoreLog.md | 4 +-- spec/modules/Simplex/Messaging/Server/Web.md | 4 
+-- spec/modules/Simplex/Messaging/Transport.md | 26 +++++++++---------- .../Simplex/Messaging/Transport/Server.md | 6 ++--- .../Simplex/Messaging/Transport/Shared.md | 6 ++--- 24 files changed, 75 insertions(+), 67 deletions(-) diff --git a/spec/modules/README.md b/spec/modules/README.md index 7b7666aad..0a961abdf 100644 --- a/spec/modules/README.md +++ b/spec/modules/README.md @@ -182,6 +182,14 @@ Before finishing a module doc, ask: If any answer reveals a problem, fix it and repeat from question 1. Only finish when a full pass produces no changes. +## Terminology — the spec as translation boundary + +The protocol documents (`protocol/overview-tjr.md`, `protocol/simplex-messaging.md`, `protocol/agent-protocol.md`) define the canonical terminology. Code uses different names for some of the same concepts. The spec is where the translation happens. + +The most important distinction: SimpleX protocol routers are referred to as "servers" in code. The term "server" was adopted historically because SimpleX routers were implemented as Linux-based software that is deployed in the same way as servers. But the similarity is entirely formal. Functionally, servers serve responses to the requests of their users - that is why the term "server" was adopted for computers and software that provide Internet services. SimpleX protocol routers don't serve responses - they route packets between endpoints, and they have no concept of a user. Functionally they are similar to Internet Protocol routers, but with a resource-based addressing scheme. Further, SimpleX protocol routers are hardware and software agnostic. SimpleX protocols are open and documented, so they can be implemented in any language and run on a different architecture. For example, [SimpleGo](https://simplego.dev) is a prototype implementation of the SimpleX protocol stack in C for a microcontroller architecture. + +**The rule**: use protocol terms for concepts, code terms for identifiers. 
Write "router" when describing the network node's role, `SMPServer` or `Server.hs` when referencing code. Similarly, "router identity" for the concept (called "server key hash" or "fingerprint" in code). When the distinction matters, bridge explicitly: "the SMP router (implemented by the `Server` module)" or "the `SMPServer` type (representing a router address)." + ## Exclusions - **Individual migration files** (M20XXXXXX_*.hs): Self-describing SQL. No per-migration docs. diff --git a/spec/modules/Simplex/Messaging/Client.md b/spec/modules/Simplex/Messaging/Client.md index a4f7be352..35fee9226 100644 --- a/spec/modules/Simplex/Messaging/Client.md +++ b/spec/modules/Simplex/Messaging/Client.md @@ -8,7 +8,7 @@ ## Overview -This module implements the client side of the `Protocol` typeclass — connecting to servers, sending commands, receiving responses, and managing connection lifecycle. It is generic over `Protocol v err msg`, instantiated for SMP as `SMPClient` (= `ProtocolClient SMPVersion ErrorType BrokerMsg`). The SMP proxy protocol (PRXY/PFWD/RFWD) is also implemented here. +This module implements the client side of the `Protocol` typeclass — connecting to SMP routers, sending commands, receiving responses, and managing connection lifecycle. It is generic over `Protocol v err msg`, instantiated for SMP as `SMPClient` (= `ProtocolClient SMPVersion ErrorType BrokerMsg`). The SMP proxy protocol (PRXY/PFWD/RFWD) is also implemented here. ## Four concurrent threads — teardown semantics @@ -36,9 +36,9 @@ The double-check pattern (`swapTVar pending False` + `tryTakeTMVar`) handles the `timeoutErrorCount` is reset to 0 in three places: in `getResponse` when a response arrives, in `receive` on every TLS read, and the monitor uses this count to decide when to drop the connection. -## processMsg — server events vs expired responses +## processMsg — router events vs expired responses -When `corrId` is empty, the message is an `STEvent` (server-initiated). 
When non-empty and the request was already expired (`wasPending` is `False`), the response becomes `STResponse` — not discarded, but forwarded to `msgQ` with the original command context. Entity ID mismatch is `STUnexpectedError`. +When `corrId` is empty, the message is an `STEvent` (router-initiated). When non-empty and the request was already expired (`wasPending` is `False`), the response becomes `STResponse` — not discarded, but forwarded to `msgQ` with the original command context. Entity ID mismatch is `STUnexpectedError`. ## nonBlockingWriteTBQueue — fork on full @@ -46,7 +46,7 @@ If `tryWriteTBQueue` returns `False`, a new thread is forked for the blocking wr ## Batch commands do not expire -See comment on `sendBatch`. Batched commands are written with `Nothing` as the request parameter — the send thread skips the `pending` flag check. Individual commands use `Just r` and the send thread checks `pending` after dequeue. The coupling: if the server stops responding, batched commands can block the send queue indefinitely since they have no timeout-based expiry. +See comment on `sendBatch`. Batched commands are written with `Nothing` as the request parameter — the send thread skips the `pending` flag check. Individual commands use `Just r` and the send thread checks `pending` after dequeue. The coupling: if the router stops responding, batched commands can block the send queue indefinitely since they have no timeout-based expiry. ## monitor — quasi-periodic adaptive ping @@ -68,7 +68,7 @@ See comment above `proxySMPCommand` for the 9 error scenarios (0-9) mapping each ## forwardSMPTransmission — proxy-side forwarding -Used by the proxy server to forward `RFWD` to the destination relay. Uses `cbEncryptNoPad`/`cbDecryptNoPad` (no padding) with the session secret from the proxy-relay connection. Response nonce is `reverseNonce` of the request nonce. +Used by the proxy router to forward `RFWD` to the destination relay. 
Uses `cbEncryptNoPad`/`cbDecryptNoPad` (no padding) with the session secret from the proxy-relay connection. Response nonce is `reverseNonce` of the request nonce. ## authTransmission — dual auth with service signature @@ -80,6 +80,6 @@ The service signature is only added when the entity authenticator is non-empty. `action` stores a `Weak ThreadId` (via `mkWeakThreadId`) to the main client thread. `closeProtocolClient` dereferences and kills it. The weak reference allows the thread to be garbage collected if all other references are dropped. -## writeSMPMessage — server-side event injection +## writeSMPMessage — router-side event injection -`writeSMPMessage` writes directly to `msgQ` as `STEvent`, bypassing the entire command/response pipeline. This is used by the server to inject MSG events into the subscription response path. +`writeSMPMessage` writes directly to `msgQ` as `STEvent`, bypassing the entire command/response pipeline. This is used by the router to inject MSG events into the subscription response path. diff --git a/spec/modules/Simplex/Messaging/Client/Agent.md b/spec/modules/Simplex/Messaging/Client/Agent.md index 96e6ff84b..30fbe2ac2 100644 --- a/spec/modules/Simplex/Messaging/Client/Agent.md +++ b/spec/modules/Simplex/Messaging/Client/Agent.md @@ -6,9 +6,9 @@ ## Overview -This is the "small agent" — used only in servers (SMP proxy, notification server) to manage client connections to other SMP servers. The "big agent" in `Simplex.Messaging.Agent` + `Simplex.Messaging.Agent.Client` serves client applications and adds the full messaging agent layer. See [Two agent layers](../../../../TOPICS.md) topic. +This is the "small agent" — used only in routers (SMP proxy, notification router) to manage client connections to other SMP routers. The "big agent" in `Simplex.Messaging.Agent` + `Simplex.Messaging.Agent.Client` serves client applications and adds the full messaging agent layer. See [Two agent layers](../../../../TOPICS.md) topic. 
-`SMPClientAgent` manages `SMPClient` connections via `smpClients :: TMap SMPServer SMPClientVar` (one per SMP server), tracks active and pending subscriptions, and handles automatic reconnection. It is parameterized by `Party` (`p`) and uses the `ServiceParty` constraint to support both `RecipientService` and `NotifierService` modes. +`SMPClientAgent` manages `SMPClient` connections via `smpClients :: TMap SMPServer SMPClientVar` (one per router), tracks active and pending subscriptions, and handles automatic reconnection. It is parameterized by `Party` (`p`) and uses the `ServiceParty` constraint to support both `RecipientService` and `NotifierService` modes. ## Dual subscription model @@ -19,7 +19,7 @@ Four TMap fields track subscriptions in two dimensions: | **Service** | `activeServiceSubs` (TMap SMPServer (TVar (Maybe (ServiceSub, SessionId)))) | `pendingServiceSubs` (TMap SMPServer (TVar (Maybe ServiceSub))) | | **Queue** | `activeQueueSubs` (TMap SMPServer (TMap QueueId (SessionId, C.APrivateAuthKey))) | `pendingQueueSubs` (TMap SMPServer (TMap QueueId C.APrivateAuthKey)) | -See comments on `activeServiceSubs` and `pendingServiceSubs` for the coexistence rules. Key constraint: only one service subscription per server. Active subs store the `SessionId` that established them. +See comments on `activeServiceSubs` and `pendingServiceSubs` for the coexistence rules. Key constraint: only one service subscription per router. Active subs store the `SessionId` that established them. ## SessionVar compare-and-swap — core concurrency safety @@ -27,11 +27,11 @@ See comments on `activeServiceSubs` and `pendingServiceSubs` for the coexistence ## removeClientAndSubs — outside-STM lookup optimization -See comment on `removeClientAndSubs`. Subscription TVar references are obtained outside STM (via `TM.lookupIO`), then modified inside `atomically`. 
This is safe because the invariant is that subscription TVar entries for a server are never deleted from the outer TMap, only their contents change. Moving lookups inside the STM transaction would cause excessive re-evaluation under contention. +See comment on `removeClientAndSubs`. Subscription TVar references are obtained outside STM (via `TM.lookupIO`), then modified inside `atomically`. This is safe because the invariant is that subscription TVar entries for a router are never deleted from the outer TMap, only their contents change. Moving lookups inside the STM transaction would cause excessive re-evaluation under contention. ## Disconnect preserves others' subscriptions -`updateServiceSub` only moves active→pending when `sessId` matches the disconnected client (see its comment). If a new client already established different subscriptions on the same server, those are preserved. Queue subs use `M.partition` to split by SessionId — only matching subs move to pending, non-matching remain active. +`updateServiceSub` only moves active→pending when `sessId` matches the disconnected client (see its comment). If a new client already established different subscriptions on the same router, those are preserved. Queue subs use `M.partition` to split by SessionId — only matching subs move to pending, non-matching remain active. ## Pending never reset to Nothing on disconnect @@ -63,7 +63,7 @@ When serviceId and sessionId match the existing active subscription, queue count ## CAServiceUnavailable — cascade to queue resubscription -When `smpSubscribeService` detects service ID or role mismatch with the connection, it fires `CAServiceUnavailable`. See comment on `CAServiceUnavailable` for the full implication: the app must resubscribe all queues individually, creating new associations. This can happen if the SMP server reassigns service IDs (e.g., after downgrade and upgrade). 
+When `smpSubscribeService` detects service ID or role mismatch with the connection, it fires `CAServiceUnavailable`. See comment on `CAServiceUnavailable` for the full implication: the app must resubscribe all queues individually, creating new associations. This can happen if the SMP router reassigns service IDs (e.g., after downgrade and upgrade). ## getPending — polymorphic over STM/IO @@ -89,4 +89,4 @@ During reconnection, `reconnectSMPClient` reads current active queue subs (outsi ## addSubs_ — left-biased union -`addSubs_` uses `TM.union` which delegates to `M.union` (left-biased). If a queue subscription already exists, the new auth key from the incoming map wins. Service subs use `writeTVar` (overwrite) since only one service sub exists per server. +`addSubs_` uses `TM.union` which delegates to `M.union` (left-biased). If a queue subscription already exists, the new auth key from the incoming map wins. Service subs use `writeTVar` (overwrite) since only one service sub exists per router. diff --git a/spec/modules/Simplex/Messaging/Crypto/ShortLink.md b/spec/modules/Simplex/Messaging/Crypto/ShortLink.md index 821a30c32..5b83de31a 100644 --- a/spec/modules/Simplex/Messaging/Crypto/ShortLink.md +++ b/spec/modules/Simplex/Messaging/Crypto/ShortLink.md @@ -12,8 +12,8 @@ Short links encode connection data in two encrypted blobs: fixed data (2048 byte Two distinct HKDF derivations with different info strings: -- **contactShortLinkKdf**: `HKDF("", linkKey, "SimpleXContactLink", 56)` → splits into 24-byte LinkId + 32-byte SbKey. The LinkId is used as the server-side identifier. -- **invShortLinkKdf**: `HKDF("", linkKey, "SimpleXInvLink", 32)` → 32-byte SbKey only. No LinkId because invitation links don't use server-side lookup. +- **contactShortLinkKdf**: `HKDF("", linkKey, "SimpleXContactLink", 56)` → splits into 24-byte LinkId + 32-byte SbKey. The LinkId is used as the router-side identifier. 
+- **invShortLinkKdf**: `HKDF("", linkKey, "SimpleXInvLink", 32)` → 32-byte SbKey only. No LinkId because invitation links don't use router-side lookup. ## Fixed padding lengths diff --git a/spec/modules/Simplex/Messaging/Notifications/Transport.md b/spec/modules/Simplex/Messaging/Notifications/Transport.md index dd4564738..7c7955154 100644 --- a/spec/modules/Simplex/Messaging/Notifications/Transport.md +++ b/spec/modules/Simplex/Messaging/Notifications/Transport.md @@ -29,7 +29,7 @@ The NTF protocol reuses SMP's transport infrastructure but with reduced paramete ## Same ALPN/legacy fallback pattern as SMP -`ntfServerHandshake` uses the same pattern as `smpServerHandshake`: if ALPN is not negotiated (`getSessionALPN` returns `Nothing`), the server offers only `legacyServerNTFVRange` (v1 only). +`ntfServerHandshake` uses the same pattern as `smpServerHandshake`: if ALPN is not negotiated (`getSessionALPN` returns `Nothing`), the notification router offers only `legacyServerNTFVRange` (v1 only). ## NTF handshake uses SMP shared types diff --git a/spec/modules/Simplex/Messaging/Protocol.md b/spec/modules/Simplex/Messaging/Protocol.md index dc1328cdf..2ed7113c8 100644 --- a/spec/modules/Simplex/Messaging/Protocol.md +++ b/spec/modules/Simplex/Messaging/Protocol.md @@ -8,11 +8,11 @@ ## Overview -This module defines the SMP protocol's type-level structure, wire encoding, and transport batching. It does not implement the server or client — those are in [Server.hs](./Server.md) and [Client.hs](./Client.md). The protocol spec governs the command semantics; this doc focuses on non-obvious implementation choices. +This module defines the SMP protocol's type-level structure, wire encoding, and transport batching. It does not implement the router or client — those are in [Server.hs](./Server.md) and [Client.hs](./Client.md). The protocol spec governs the command semantics; this doc focuses on non-obvious implementation choices. 
## Two separate version scopes -SMP client protocol version (`SMPClientVersion`, 4 versions) is separate from SMP relay protocol version (`SMPVersion`, up to version 19, defined in [Transport.hs](./Transport.md)). The client version governs client-to-client concerns (binary encoding, multi-host addresses, SKEY command, short links). The relay version governs client-to-server wire format, transport encryption, and command availability. See comment above `SMPClientVersion` data declaration for version history. +SMP client protocol version (`SMPClientVersion`, 4 versions) is separate from SMP relay protocol version (`SMPVersion`, up to version 19, defined in [Transport.hs](./Transport.md)). The client version governs client-to-client concerns (binary encoding, multi-host addresses, SKEY command, short links). The relay version governs client-to-router wire format, transport encryption, and command availability. See comment above `SMPClientVersion` data declaration for version history. ## maxMessageLength — version-dependent @@ -57,7 +57,7 @@ The `MsgFlags` parser consumes the `notification` Bool then calls `A.takeTill (= ## BrokerErrorType NETWORK — detail loss -The `NETWORK` variant of `BrokerErrorType` encodes as just `"NETWORK"` (detail dropped), with `TODO once all upgrade` comment. The parser falls back to `NEFailedError` when the `NetworkError` detail can't be parsed (`_smpP <|> pure NEFailedError`). This means a newer server's detailed network error is seen as `NEFailedError` by older clients. +The `NETWORK` variant of `BrokerErrorType` encodes as just `"NETWORK"` (detail dropped), with `TODO once all upgrade` comment. The parser falls back to `NEFailedError` when the `NetworkError` detail can't be parsed (`_smpP <|> pure NEFailedError`). This means a newer router's detailed network error is seen as `NEFailedError` by older clients. 
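The `NETWORK` fallback described above can be illustrated with a small parser. This is only a sketch under stated assumptions: it uses `Text.ParserCombinators.ReadP` from `base` (with its left-biased `<++`) rather than the parser library used in the module, and the `NetErr` type, the `NEConnectError` constructor, and the detailed wire format are hypothetical stand-ins. Only the `"NETWORK"` tag and the `NEFailedError` default come from the text.

```haskell
import Text.ParserCombinators.ReadP

-- Hypothetical stand-in for the NetworkError detail type.
-- Only NEFailedError is named in the spec text above.
data NetErr = NEConnectError String | NEFailedError
  deriving (Eq, Show)

-- Encoder that drops the detail, as described for the NETWORK variant.
encodeLegacy :: NetErr -> String
encodeLegacy _ = "NETWORK"

-- Parser: read the tag, then try the detail and fall back to NEFailedError.
-- The "FAILED" / "CONNECT <text>" detail syntax is an assumption of this sketch.
networkP :: ReadP NetErr
networkP = string "NETWORK" *> (detailP <++ pure NEFailedError)
  where
    detailP = char ' ' *> (failedP <++ connectP)
    failedP = NEFailedError <$ string "FAILED"
    connectP = NEConnectError <$> (string "CONNECT " *> munch1 (const True))

-- Succeeds only when the whole input is consumed.
parseNetwork :: String -> Maybe NetErr
parseNetwork s = case readP_to_S (networkP <* eof) s of
  [(e, "")] -> Just e
  _ -> Nothing

main :: IO ()
main = do
  -- The detail is lost on encoding, so the receiver sees the default.
  print (parseNetwork (encodeLegacy (NEConnectError "timeout")))
  -- With a detail present on the wire, the detail parser wins.
  print (parseNetwork "NETWORK CONNECT timeout")
```

The round trip in `main` shows the detail loss: whatever error is encoded, the bare tag decodes to the default.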
## Version-dependent encoding — scope @@ -65,4 +65,4 @@ The `NETWORK` variant of `BrokerErrorType` encodes as just `"NETWORK"` (detail d ## SUBS/NSUBS — asymmetric defaulting -When the server parses `SUBS`/`NSUBS` from a client using a version older than `rcvServiceSMPVersion`, both count and hash default (`-1` and `mempty`). For the response side (`SOKS`/`ENDS` via `serviceRespP`), count is still parsed from the wire — only hash defaults to `mempty`. This asymmetry means command-side and response-side parsing have different fallback behavior for the same version boundary. +When the router parses `SUBS`/`NSUBS` from a client using a version older than `rcvServiceSMPVersion`, both count and hash default (`-1` and `mempty`). For the response side (`SOKS`/`ENDS` via `serviceRespP`), count is still parsed from the wire — only hash defaults to `mempty`. This asymmetry means command-side and response-side parsing have different fallback behavior for the same version boundary. diff --git a/spec/modules/Simplex/Messaging/Server.md b/spec/modules/Simplex/Messaging/Server.md index 0ed6e43e1..8d23404c9 100644 --- a/spec/modules/Simplex/Messaging/Server.md +++ b/spec/modules/Simplex/Messaging/Server.md @@ -1,6 +1,6 @@ # Simplex.Messaging.Server -> SMP server: client handling, subscription lifecycle, message delivery, proxy forwarding, control port. +> SMP router (`Server` module): client handling, subscription lifecycle, message delivery, proxy forwarding, control port. **Source**: [`Server.hs`](../../../../src/Simplex/Messaging/Server.hs) @@ -8,7 +8,7 @@ ## Overview -The server runs as `raceAny_` over many threads — any thread exit stops the entire server. The thread set includes: one `serverThread` per subscription type (SMP, NTF), a notification delivery thread, a pending events thread, a proxy agent receiver, a SIGINT handler, plus per-transport listener threads and optional expiration/stats/prometheus/control-port threads. 
`E.finally` ensures `stopServer` runs on any exit. +The router runs as `raceAny_` over many threads — any thread exit stops the entire router process. The thread set includes: one `serverThread` per subscription type (SMP, NTF), a notification delivery thread, a pending events thread, a proxy agent receiver, a SIGINT handler, plus per-transport listener threads and optional expiration/stats/prometheus/control-port threads. `E.finally` ensures `stopServer` runs on any exit. ## serverThread — subscription lifecycle with split STM @@ -51,7 +51,7 @@ When the signature algorithm doesn't match the queue key, verification runs with ## Service subscription — hash-based drift detection -See comment on `sharedSubscribeService`. The client sends expected `(count, idsHash)`. The server reads the actual values from storage, then computes `subsChange = subtractServiceSubs currSubs subs'` — the **difference** between what the client's session currently tracks and the new values. This difference (not the absolute values) is passed to `serverThread` via `CSService` to adjust `totalServiceSubs`. Using differences prevents double-counting when a service resubscribes. +See comment on `sharedSubscribeService`. The client sends expected `(count, idsHash)`. The router reads the actual values from storage, then computes `subsChange = subtractServiceSubs currSubs subs'` — the **difference** between what the client's session currently tracks and the new values. This difference (not the absolute values) is passed to `serverThread` via `CSService` to adjust `totalServiceSubs`. Using differences prevents double-counting when a service resubscribes. Stats classification: exactly one of `srvSubOk`/`srvSubMore`/`srvSubFewer`/`srvSubDiff` is incremented per subscription. `count == -1` is a special case for old NTF servers. @@ -91,7 +91,7 @@ See `noSubscriptions`. 
The idle client disconnect thread only checks expiration ## clientDisconnected — ordered cleanup -On disconnect: (1) set `connected = False`, (2) atomically swap out all subscriptions, (3) cancel subscription threads, (4) if server is still active: delete client from server map, update queue and service subscribers. Service subscription cleanup (`updateServiceSubs`) subtracts the client's accumulated `(count, idsHash)` from `totalServiceSubs`. End threads are swapped out and killed. +On disconnect: (1) set `connected = False`, (2) atomically swap out all subscriptions, (3) cancel subscription threads, (4) if router is still active: delete client from `serverClients` map, update queue and service subscribers. Service subscription cleanup (`updateServiceSubs`) subtracts the client's accumulated `(count, idsHash)` from `totalServiceSubs`. End threads are swapped out and killed. ## Control port — single auth, no downgrade diff --git a/spec/modules/Simplex/Messaging/Server/CLI.md b/spec/modules/Simplex/Messaging/Server/CLI.md index a369b6981..5747eba8f 100644 --- a/spec/modules/Simplex/Messaging/Server/CLI.md +++ b/spec/modules/Simplex/Messaging/Server/CLI.md @@ -16,7 +16,7 @@ SMP ports are parsed first. When explicit WebSocket ports are provided, they are ## iniDBOptions — schema creation disabled at CLI -When reading database options from INI, `createSchema` is always set to `False` regardless of INI content. This enforces a security invariant: database schemas must be created manually or by migration, never automatically by the server. +When reading database options from INI, `createSchema` is always set to `False` regardless of INI content. This enforces a security invariant: database schemas must be created manually or by migration, never automatically by the router. 
## createServerX509_ — external tool dependency @@ -24,7 +24,7 @@ Certificate generation shells out to `openssl` commands via `readCreateProcess`, ## checkSavedFingerprint — startup invariant -Fingerprint is extracted from the CA certificate and saved during init. On every server start, the saved fingerprint is compared against the current certificate. Mismatch → startup failure. See [Main.md#initializeserver--fingerprint-invariant](./Main.md#initializeserver--fingerprint-invariant). +Fingerprint is extracted from the CA certificate and saved during init. On every router start, the saved fingerprint is compared against the current certificate. Mismatch → startup failure. See [Main.md#initializeserver--fingerprint-invariant](./Main.md#initializeserver--fingerprint-invariant). ## genOnline — existing certificate dependency diff --git a/spec/modules/Simplex/Messaging/Server/Control.md b/spec/modules/Simplex/Messaging/Server/Control.md index 644fb786a..ddeedff3a 100644 --- a/spec/modules/Simplex/Messaging/Server/Control.md +++ b/spec/modules/Simplex/Messaging/Server/Control.md @@ -1,6 +1,6 @@ # Simplex.Messaging.Server.Control -> Control port protocol types and encoding for server administration. +> Control port protocol types and encoding for router administration. **Source**: [`Control.hs`](../../../../../src/Simplex/Messaging/Server/Control.hs) diff --git a/spec/modules/Simplex/Messaging/Server/Env/STM.md b/spec/modules/Simplex/Messaging/Server/Env/STM.md index d0e948120..3d990a47d 100644 --- a/spec/modules/Simplex/Messaging/Server/Env/STM.md +++ b/spec/modules/Simplex/Messaging/Server/Env/STM.md @@ -1,12 +1,12 @@ # Simplex.Messaging.Server.Env.STM -> Server environment, configuration, client state, subscription types, and storage initialization. +> Router environment, configuration, client state, subscription types, and storage initialization. 
**Source**: [`Env/STM.hs`](../../../../../../src/Simplex/Messaging/Server/Env/STM.hs) ## Overview -This module defines the server's shared state (`Env`, `Server`, `Client`) and the subscription model types. Most non-obvious patterns are about concurrency safety — preventing STM contention while maintaining consistency. Key patterns are documented in [Server.md](../Server.md) where they're used; this doc covers patterns specific to the type definitions and initialization. +This module defines the router's shared state (`Env`, `Server`, `Client`) and the subscription model types. Most non-obvious patterns are about concurrency safety — preventing STM contention while maintaining consistency. Key patterns are documented in [Server.md](../Server.md) where they're used; this doc covers patterns specific to the type definitions and initialization. ## SubscribedClients — TVar-of-Maybe pattern @@ -26,7 +26,7 @@ See comment on `deleteSubcribedClient`. The TVar lookup is in a separate IO read ## insertServerClient — connected check -`insertServerClient` checks `connected` inside the STM transaction before inserting. If the client was already marked disconnected (race with cleanup), the insert is skipped and returns `False`. This prevents resurrecting a disconnected client in the server map. +`insertServerClient` checks `connected` inside the STM transaction before inserting. If the client was already marked disconnected (race with cleanup), the insert is skipped and returns `False`. This prevents resurrecting a disconnected client in the `serverClients` map. ## SupportedStore — compile-time storage validation @@ -34,7 +34,7 @@ Type family with `(Int ~ Bool, TypeError ...)` for invalid combinations. 
The uns ## newEnv — initialization order -Store initialization order matters: (1) create message store (loads store log for STM backends), (2) create notification store (empty TMap), (3) generate TLS credentials, (4) compute server identity from fingerprint, (5) create stats, (6) create proxy agent. The store log load (`loadStoreLog`) calls `readWriteQueueStore` which reads the existing log, replays it to build state, then opens a new log for writing. `setStoreLog` attaches the write log to the store. +Store initialization order matters: (1) create message store (loads store log for STM backends), (2) create notification store (empty TMap), (3) generate TLS credentials, (4) compute router identity from fingerprint, (5) create stats, (6) create proxy agent. The store log load (`loadStoreLog`) calls `readWriteQueueStore` which reads the existing log, replays it to build state, then opens a new log for writing. `setStoreLog` attaches the write log to the store. HTTPS credentials are validated: must be at least 4096-bit RSA (`public_size >= 512` bytes). The check explicitly notes that Let's Encrypt ECDSA uses "insecure curve p256." diff --git a/spec/modules/Simplex/Messaging/Server/Information.md b/spec/modules/Simplex/Messaging/Server/Information.md index 16f153154..a2efa040c 100644 --- a/spec/modules/Simplex/Messaging/Server/Information.md +++ b/spec/modules/Simplex/Messaging/Server/Information.md @@ -1,6 +1,6 @@ # Simplex.Messaging.Server.Information -> Server public information types (config, operator, hosting) for the server info page. +> Router public information types (config, operator, hosting) for the router info page. 
**Source**: [`Information.hs`](../../../../../src/Simplex/Messaging/Server/Information.hs) diff --git a/spec/modules/Simplex/Messaging/Server/Main.md b/spec/modules/Simplex/Messaging/Server/Main.md index aed538573..00483d35b 100644 --- a/spec/modules/Simplex/Messaging/Server/Main.md +++ b/spec/modules/Simplex/Messaging/Server/Main.md @@ -1,12 +1,12 @@ # Simplex.Messaging.Server.Main -> Server CLI entry point: dispatches Init, Start, Delete, Journal, and Database commands. +> Router CLI entry point: dispatches Init, Start, Delete, Journal, and Database commands. **Source**: [`Main.hs`](../../../../../src/Simplex/Messaging/Server/Main.hs) ## Overview -This is the CLI dispatcher for the SMP server. It parses INI configuration, validates storage mode combinations, and dispatches to the appropriate command handler. The most complex logic is storage configuration validation and migration between storage modes. +This is the CLI dispatcher for the SMP router. It parses INI configuration, validates storage mode combinations, and dispatches to the appropriate command handler. The most complex logic is storage configuration validation and migration between storage modes. ## Storage mode compatibility — state machine diff --git a/spec/modules/Simplex/Messaging/Server/Main/Init.md b/spec/modules/Simplex/Messaging/Server/Main/Init.md index 665938ae8..2472164d0 100644 --- a/spec/modules/Simplex/Messaging/Server/Main/Init.md +++ b/spec/modules/Simplex/Messaging/Server/Main/Init.md @@ -1,6 +1,6 @@ # Simplex.Messaging.Server.Main.Init -> Server initialization: INI file content generation, default settings, and CLI option structures. +> Router initialization: INI file content generation, default settings, and CLI option structures. 
**Source**: [`Main/Init.hs`](../../../../../../src/Simplex/Messaging/Server/Main/Init.hs) diff --git a/spec/modules/Simplex/Messaging/Server/MsgStore/Postgres.md b/spec/modules/Simplex/Messaging/Server/MsgStore/Postgres.md index 7262bde0a..eaeca3b90 100644 --- a/spec/modules/Simplex/Messaging/Server/MsgStore/Postgres.md +++ b/spec/modules/Simplex/Messaging/Server/MsgStore/Postgres.md @@ -1,6 +1,6 @@ # Simplex.Messaging.Server.MsgStore.Postgres -> PostgreSQL message store: server-side stored procedures for message operations, COPY protocol for bulk import. +> PostgreSQL message store: router-side stored procedures for message operations, COPY protocol for bulk import. **Source**: [`Postgres.hs`](../../../../../../src/Simplex/Messaging/Server/MsgStore/Postgres.hs) diff --git a/spec/modules/Simplex/Messaging/Server/MsgStore/Types.md b/spec/modules/Simplex/Messaging/Server/MsgStore/Types.md index 2fd4c79bf..c57aedb0b 100644 --- a/spec/modules/Simplex/Messaging/Server/MsgStore/Types.md +++ b/spec/modules/Simplex/Messaging/Server/MsgStore/Types.md @@ -14,7 +14,7 @@ All associated types (`StoreMonad`, `MsgQueue`, `StoreQueue`, `QueueStore`, `Msg ## tryDelPeekMsg — atomic delete-and-peek -Deletes the current message AND peeks the next one in a single `isolateQueue` call. This atomicity is critical for the ACK flow: the server needs to know if there's a next message to deliver immediately after acknowledging the current one, without a window where a concurrent SEND could interleave. +Deletes the current message AND peeks the next one in a single `isolateQueue` call. This atomicity is critical for the ACK flow: the router needs to know if there's a next message to deliver immediately after acknowledging the current one, without a window where a concurrent SEND could interleave. 
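A minimal sketch of the delete-and-peek idea, assuming a toy in-memory queue held in a single `TVar` (requires the `stm` package). The names `MsgQueue`, `tryDelPeek`, and `send` are hypothetical, and the behaviour on a non-matching message ID is an assumption of this sketch. The actual store performs the operation inside `isolateQueue`, which is not modelled here.

```haskell
import Control.Concurrent.STM
import Data.Maybe (listToMaybe)

data Msg = Msg {msgId :: Int, msgBody :: String}
  deriving (Eq, Show)

-- Toy queue: oldest message first.
newtype MsgQueue = MsgQueue (TVar [Msg])

-- Delete the head message if it matches the acknowledged ID and return
-- (deleted message, next message) from the same transaction, so no
-- concurrent send can be observed between the delete and the peek.
tryDelPeek :: MsgQueue -> Int -> STM (Maybe Msg, Maybe Msg)
tryDelPeek (MsgQueue v) ackId = do
  msgs <- readTVar v
  case msgs of
    m : rest
      | msgId m == ackId -> do
          writeTVar v rest
          pure (Just m, listToMaybe rest)
    -- Assumption: on mismatch nothing is deleted and the head is returned.
    _ -> pure (Nothing, listToMaybe msgs)

-- Append a message at the end of the queue.
send :: MsgQueue -> Msg -> STM ()
send (MsgQueue v) m = modifyTVar' v (++ [m])

main :: IO ()
main = do
  q <- MsgQueue <$> newTVarIO [Msg 1 "first"]
  atomically (send q (Msg 2 "second"))
  result <- atomically (tryDelPeek q 1)
  print result
```

Running the delete and the peek as two separate transactions would allow a `send` to commit between them. The single transaction rules that out.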
## withIdleMsgQueue — journal-specific lifecycle @@ -22,7 +22,7 @@ For Journal store, the message queue file handle is closed after the action if i ## unsafeWithAllMsgQueues — CLI-only -Explicitly unsafe: iterates all queues including those not in active memory. Only safe before server start or in CLI commands. During normal operation, Journal store may have queues on disk but not loaded — this function would load them, interfering with the lazy-loading lifecycle. +Explicitly unsafe: iterates all queues including those not in active memory. Only safe before router start or in CLI commands. During normal operation, Journal store may have queues on disk but not loaded — this function would load them, interfering with the lazy-loading lifecycle. ## snapshotTQueue visibility gap diff --git a/spec/modules/Simplex/Messaging/Server/NtfStore.md b/spec/modules/Simplex/Messaging/Server/NtfStore.md index b58a44fad..fca054f8b 100644 --- a/spec/modules/Simplex/Messaging/Server/NtfStore.md +++ b/spec/modules/Simplex/Messaging/Server/NtfStore.md @@ -6,7 +6,7 @@ ## storeNtf — outside-STM lookup with STM fallback -`storeNtf` uses `TM.lookupIO` outside STM, then falls back to `TM.lookup` inside STM if the notifier entry doesn't exist. This is the same outside-STM lookup pattern used in Server.hs and Client/Agent.hs — avoids transaction re-evaluation from unrelated map changes. The double-check inside STM prevents races when two messages arrive concurrently for a new notifier. +`storeNtf` uses `TM.lookupIO` outside STM, then falls back to `TM.lookup` inside STM if the notifier entry doesn't exist. This is the same outside-STM lookup pattern used in the router (`Server.hs`) and `Client/Agent.hs` — avoids transaction re-evaluation from unrelated map changes. The double-check inside STM prevents races when two messages arrive concurrently for a new notifier. 
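The lookup pattern above can be sketched as follows, assuming a plain `TVar (Map ...)` in place of `TMap` and `readTVarIO` in place of `TM.lookupIO` (requires the `stm` and `containers` packages). All type and function names here are hypothetical simplifications, not the module's definitions.

```haskell
import Control.Concurrent.STM
import qualified Data.Map.Strict as M

type NotifierId = String
type Ntf = String

-- Outer map from notifier to its own TVar of notifications.
type NtfStore = TVar (M.Map NotifierId (TVar [Ntf]))

storeNtfSketch :: NtfStore -> NotifierId -> Ntf -> IO ()
storeNtfSketch store nId ntf = do
  -- Fast path: read the outer map outside any transaction.
  found <- M.lookup nId <$> readTVarIO store
  case found of
    -- This transaction touches only the inner TVar, so unrelated
    -- changes to the outer map cannot cause it to re-run.
    Just v -> atomically $ modifyTVar' v (ntf :)
    -- Slow path: re-check inside STM, because another thread may have
    -- created the entry between the IO read and this transaction.
    Nothing -> atomically $ do
      m <- readTVar store
      case M.lookup nId m of
        Just v -> modifyTVar' v (ntf :)
        Nothing -> do
          v <- newTVar [ntf]
          writeTVar store (M.insert nId v m)

main :: IO ()
main = do
  store <- newTVarIO M.empty
  storeNtfSketch store "notifier-1" "ntf-a"
  storeNtfSketch store "notifier-1" "ntf-b"
  m <- readTVarIO store
  ntfs <- traverse readTVarIO (M.lookup "notifier-1" m)
  print ntfs
```

Without the second lookup inside the transaction, two concurrent first messages for the same notifier could each insert a fresh `TVar`, and one of the notifications would be lost.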
## deleteExpiredNtfs — last-is-earliest optimization diff --git a/spec/modules/Simplex/Messaging/Server/Prometheus.md b/spec/modules/Simplex/Messaging/Server/Prometheus.md index 11610ee23..626459dd1 100644 --- a/spec/modules/Simplex/Messaging/Server/Prometheus.md +++ b/spec/modules/Simplex/Messaging/Server/Prometheus.md @@ -1,6 +1,6 @@ # Simplex.Messaging.Server.Prometheus -> Prometheus text exposition format for server metrics, with histogram gap-filling and derived aggregations. +> Prometheus text exposition format for router metrics, with histogram gap-filling and derived aggregations. **Source**: [`Prometheus.hs`](../../../../../src/Simplex/Messaging/Server/Prometheus.hs) diff --git a/spec/modules/Simplex/Messaging/Server/QueueStore/STM.md b/spec/modules/Simplex/Messaging/Server/QueueStore/STM.md index 6ff8da3b5..b7c5d0593 100644 --- a/spec/modules/Simplex/Messaging/Server/QueueStore/STM.md +++ b/spec/modules/Simplex/Messaging/Server/QueueStore/STM.md @@ -30,7 +30,7 @@ Batch queue lookups (`getQueues_`) read the entire TVar map once with `readTVarI ## closeQueueStore — non-atomic shutdown -`closeQueueStore` clears TMaps in separate `atomically` calls, not one transaction. Concurrent operations during shutdown could see partially cleared state. This is acceptable because the store log is closed first, and the server should not be processing new requests during shutdown. +`closeQueueStore` clears TMaps in separate `atomically` calls, not one transaction. Concurrent operations during shutdown could see partially cleared state. This is acceptable because the store log is closed first, and the router should not be processing new requests during shutdown. 
## addQueueLinkData — conditional idempotency diff --git a/spec/modules/Simplex/Messaging/Server/Stats.md b/spec/modules/Simplex/Messaging/Server/Stats.md index 056dc4a88..e17620d2d 100644 --- a/spec/modules/Simplex/Messaging/Server/Stats.md +++ b/spec/modules/Simplex/Messaging/Server/Stats.md @@ -1,6 +1,6 @@ # Simplex.Messaging.Server.Stats -> Server statistics: counters, rolling period tracking, delivery time histograms, proxy stats, service stats. +> Router statistics: counters, rolling period tracking, delivery time histograms, proxy stats, service stats. **Source**: [`Stats.hs`](../../../../../src/Simplex/Messaging/Server/Stats.hs) @@ -36,4 +36,4 @@ In `logServerStats` (Server.hs), each counter is read and reset via `atomicSwapI ## setPeriodStats — not thread safe -See comment on `setPeriodStats`. Uses `writeIORef` (not atomic). Only safe during server startup when no other threads are running. If called concurrently, period data could be corrupted. +See comment on `setPeriodStats`. Uses `writeIORef` (not atomic). Only safe during router startup when no other threads are running. If called concurrently, period data could be corrupted. diff --git a/spec/modules/Simplex/Messaging/Server/StoreLog.md b/spec/modules/Simplex/Messaging/Server/StoreLog.md index cef1bdfb2..9c7e49127 100644 --- a/spec/modules/Simplex/Messaging/Server/StoreLog.md +++ b/spec/modules/Simplex/Messaging/Server/StoreLog.md @@ -17,11 +17,11 @@ The `.start` temp backup file provides crash recovery during compaction. The seq 3. Write compacted state to new file 4. Rename `.start` to timestamped backup, remove old backups -If the server crashes during step 3, the next startup detects `.start` and restores from it instead of the incomplete new file. Any partially-written current file is preserved as `.bak`. The comment says "do not terminate" during compaction — there is no safe interrupt point between steps 2 and 4. 
+If the router crashes during step 3, the next startup detects `.start` and restores from it instead of the incomplete new file. Any partially-written current file is preserved as `.bak`. The comment says "do not terminate" during compaction — there is no safe interrupt point between steps 2 and 4. ## removeStoreLogBackups — layered retention policy -Backup retention is layered: (1) keep all backups newer than 24 hours, (2) of the rest, keep at least 3, (3) of those eligible for deletion, only delete backups older than 21 days. This means a server with infrequent restarts accumulates many backups (only cleaned on startup), while a frequently-restarting server keeps a rolling window. Backup timestamps come from ISO 8601 suffixes parsed from filenames. +Backup retention is layered: (1) keep all backups newer than 24 hours, (2) of the rest, keep at least 3, (3) of those eligible for deletion, only delete backups older than 21 days. This means a router with infrequent restarts accumulates many backups (only cleaned on startup), while a frequently-restarting router keeps a rolling window. Backup timestamps come from ISO 8601 suffixes parsed from filenames. ## QueueRec StrEncoding — backward-compatible parsing diff --git a/spec/modules/Simplex/Messaging/Server/Web.md b/spec/modules/Simplex/Messaging/Server/Web.md index 716845aa5..eb8449ef8 100644 --- a/spec/modules/Simplex/Messaging/Server/Web.md +++ b/spec/modules/Simplex/Messaging/Server/Web.md @@ -1,12 +1,12 @@ # Simplex.Messaging.Server.Web -> Static site generation, serving (HTTP, HTTPS, HTTP/2), and template rendering for the server info page. +> Static site generation, serving (HTTP, HTTPS, HTTP/2), and template rendering for the router info page. 
**Source**: [`Web.hs`](../../../../../src/Simplex/Messaging/Server/Web.hs) ## attachStaticFiles — reusing Warp internals for TLS connections -`attachStaticFiles` receives already-established TLS connections (which passed TLS handshake and ALPN check in the SMP transport layer) and runs Warp's HTTP handler on them. It manually calls `WI.withII`, `WT.attachConn`, `WI.registerKillThread`, and `WI.serveConnection` — internal Warp APIs. This couples the server to Warp internals and could break on Warp library updates. +`attachStaticFiles` receives already-established TLS connections (which passed TLS handshake and ALPN check in the SMP transport layer) and runs Warp's HTTP handler on them. It manually calls `WI.withII`, `WT.attachConn`, `WI.registerKillThread`, and `WI.serveConnection` — internal Warp APIs. This couples the router to Warp internals and could break on Warp library updates. ## serveStaticPageH2 — path traversal protection diff --git a/spec/modules/Simplex/Messaging/Transport.md b/spec/modules/Simplex/Messaging/Transport.md index f88188792..1b4656071 100644 --- a/spec/modules/Simplex/Messaging/Transport.md +++ b/spec/modules/Simplex/Messaging/Transport.md @@ -10,7 +10,7 @@ This is the core transport module. 
It defines: - The `Transport` typeclass abstracting over TLS and WebSocket connections -- The SMP handshake protocol (server and client sides) +- The SMP handshake protocol (router and client sides) - Optional block encryption using HKDF-derived symmetric key chains (v11+) - Version negotiation with backward-compatible extensions @@ -29,8 +29,8 @@ In practice (Server.hs), the SMP proxy uses `proxiedSMPRelayVRange` to cap the d ## withTlsUnique — different API calls yield same value `withTlsUnique` extracts the tls-unique channel binding (RFC 5929) using a type-level dispatch: -- **Server** (`STServer`): `T.getPeerFinished` — the peer's (client's) Finished message -- **Client** (`STClient`): `T.getFinished` — own (client's) Finished message +- **Router side** (`STServer`): `T.getPeerFinished` — the peer's (client's) Finished message +- **Client side** (`STClient`): `T.getFinished` — own (client's) Finished message Both calls yield the client's Finished message. If the result is `Nothing`, the connection is closed immediately (`closeTLS cxt >> ioe_EOF`). @@ -41,31 +41,31 @@ Two TLS parameter sets: - **`defaultSupportedParams`**: ChaCha20-Poly1305 ciphers only, Ed448/Ed25519 signatures only, X448/X25519 groups. Per the protocol spec: "TLS_CHACHA20_POLY1305_SHA256 cipher suite, ed25519 EdDSA algorithms for signatures, x25519 ECDHE groups for key exchange." - **`defaultSupportedParamsHTTPS`**: extends `defaultSupportedParams` with `ciphersuite_strong`, additional groups, and additional hash/signature combinations. The source comment says: "A selection of extra parameters to accomodate browser chains." -In the SMP server (Server.hs), when HTTP credentials are configured, `defaultSupportedParamsHTTPS` is used for all connections on that port (not selected per-connection). When no HTTP credentials are configured, `defaultSupportedParams` is used. 
+In the SMP router (`Server.hs`), when HTTP credentials are configured, `defaultSupportedParamsHTTPS` is used for all connections on that port (not selected per-connection). When no HTTP credentials are configured, `defaultSupportedParams` is used. ## SMP handshake flow Per the [protocol spec](../../../../protocol/simplex-messaging.md#transport-handshake), the handshake is a two-message exchange (three if service certs are used): -1. **Server → Client**: `paddedRouterHello` containing `smpVersionRange`, `sessionIdentifier` (tls-unique), and `routerCertKey` (certificate chain + X25519 key signed by the server's certificate) -2. **Client → Server**: `paddedClientHello` containing agreed `smpVersion`, `keyHash` (router identity — CA certificate fingerprint), optional `clientKey`, `proxyRouter` flag, and optional `clientService` -3. **Server → Client** (service only): `paddedRouterHandshakeResponse` containing assigned `serviceId` or `handshakeError` +1. **Router → Client**: `paddedRouterHello` containing `smpVersionRange`, `sessionIdentifier` (tls-unique), and `routerCertKey` (certificate chain + X25519 key signed by the router's certificate) +2. **Client → Router**: `paddedClientHello` containing agreed `smpVersion`, `keyHash` (router identity — CA certificate fingerprint), optional `clientKey`, `proxyRouter` flag, and optional `clientService` +3. **Router → Client** (service only): `paddedRouterHandshakeResponse` containing assigned `serviceId` or `handshakeError` -The client verifies `sessionIdentifier` matches its own tls-unique (`when (sessionId /= sessId) $ throwE TEBadSession`). The server verifies `keyHash` matches its CA fingerprint (`when (keyHash /= kh) $ throwE $ TEHandshake IDENTITY`). +The client verifies `sessionIdentifier` matches its own tls-unique (`when (sessionId /= sessId) $ throwE TEBadSession`). The router verifies `keyHash` matches its CA fingerprint (`when (keyHash /= kh) $ throwE $ TEHandshake IDENTITY`). 
Per the protocol spec: "For TLS transport client should assert that sessionIdentifier is equal to tls-unique channel binding defined in RFC 5929." ### legacyServerSMPRelayVRange when no ALPN -If ALPN is not negotiated (`getSessionALPN c` returns `Nothing`), the server offers `legacyServerSMPRelayVRange` (v6 only) instead of the full version range. Per the protocol spec: "If the client does not confirm this protocol name, the router would fall back to v6 of SMP protocol." The spec notes: "This is added to allow support of older clients without breaking backward compatibility and to extend or modify handshake syntax." +If ALPN is not negotiated (`getSessionALPN c` returns `Nothing`), the router offers `legacyServerSMPRelayVRange` (v6 only) instead of the full version range. Per the protocol spec: "If the client does not confirm this protocol name, the router would fall back to v6 of SMP protocol." The spec notes: "This is added to allow support of older clients without breaking backward compatibility and to extend or modify handshake syntax." 
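The ALPN fallback described above can be sketched as a pure choice of the version range offered in the router hello. This is a minimal illustration, not the module's code: `VRange`, `offeredVRange`, and the `Maybe String` ALPN value are hypothetical stand-ins, and only the name `legacyServerSMPRelayVRange` and its v6-only value come from the text.

```haskell
-- Hypothetical, simplified stand-in for the library's version range type.
data VRange = VRange {minV :: Int, maxV :: Int}
  deriving (Eq, Show)

-- v6 only, as described in the text (the real definition lives in Transport.hs).
legacyServerSMPRelayVRange :: VRange
legacyServerSMPRelayVRange = VRange 6 6

-- Choose the range to offer in the router hello.
-- `Nothing` models `getSessionALPN c` returning no negotiated protocol;
-- the full range is passed in because its bounds are configuration here.
offeredVRange :: Maybe String -> VRange -> VRange
offeredVRange alpn fullRange = case alpn of
  Nothing -> legacyServerSMPRelayVRange
  Just _ -> fullRange
```

Keeping the choice a pure function of the ALPN result makes the legacy path easy to check in isolation from the TLS layer.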
### Service certificate handshake extension -When `clientService` is present in the client handshake, the server performs additional verification: +When `clientService` is present in the client handshake, the router performs additional verification: - The TLS client certificate chain must exactly match the certificate chain in the handshake message (`getPeerCertChain c == cc`) - The signed X25519 public key is verified against the leaf certificate's key (`getCertVerifyKey leafCert` then `C.verifyX509`) -- On success, the server sends `SMPServerHandshakeResponse` with a `serviceId` -- On failure, the server sends `SMPServerHandshakeError` before raising the error +- On success, the router sends `SMPServerHandshakeResponse` with a `serviceId` +- On failure, the router sends `SMPServerHandshakeError` before raising the error Per the protocol spec (v16+): "`clientService` provides long-term service client certificate for high-volume services using SMP router (chat relays, notification routers, high traffic bots). The router responds with a third handshake message containing the assigned service ID." @@ -86,7 +86,7 @@ The protocol spec version history (v11) describes this as "additional encryption ## smpTHandleClient — chain key swap -`smpTHandleClient` applies `swap` to the chain key pair before creating `TSbChainKeys`. The code comment states: "swap is needed to use client's sndKey as server's rcvKey and vice versa." +`smpTHandleClient` applies `swap` to the chain key pair before creating `TSbChainKeys`. The code comment states: "swap is needed to use client's sndKey as server's rcvKey and vice versa." (Here "server" is the code's term for the router side of the transport.) 
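The effect of the swap can be illustrated with a small sketch. It assumes, for illustration only, that both sides derive the same ordered pair of chain keys and that the router side uses the pair as produced; `ChainKey`, `ChainKeys`, `routerChainKeys`, and `clientChainKeys` are hypothetical names, not the module's types.

```haskell
import Data.Tuple (swap)

-- Hypothetical stand-in for a chain key derived during the handshake.
newtype ChainKey = ChainKey String
  deriving (Eq, Show)

-- Simplified analogue of the send/receive key pair held by one side.
data ChainKeys = ChainKeys {sndKey :: ChainKey, rcvKey :: ChainKey}
  deriving (Eq, Show)

-- Router side (assumption): uses the derived pair in the order it was produced.
routerChainKeys :: (ChainKey, ChainKey) -> ChainKeys
routerChainKeys (s, r) = ChainKeys {sndKey = s, rcvKey = r}

-- Client side: swaps the pair first, so that the client's sndKey is the
-- router's rcvKey and vice versa.
clientChainKeys :: (ChainKey, ChainKey) -> ChainKeys
clientChainKeys = routerChainKeys . swap
```

With this arrangement, whatever one side encrypts with its `sndKey` chain is decrypted by the peer with its `rcvKey` chain.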
## Proxy version downgrade logic diff --git a/spec/modules/Simplex/Messaging/Transport/Server.md b/spec/modules/Simplex/Messaging/Transport/Server.md index 181951dcd..8027ef2d7 100644 --- a/spec/modules/Simplex/Messaging/Transport/Server.md +++ b/spec/modules/Simplex/Messaging/Transport/Server.md @@ -1,6 +1,6 @@ # Simplex.Messaging.Transport.Server -> TLS server: socket lifecycle, client acceptance, SNI credential switching, socket leak detection. +> TLS listener: socket lifecycle, client acceptance, SNI credential switching, socket leak detection. **Source**: [`Transport/Server.hs`](../../../../../src/Simplex/Messaging/Transport/Server.hs) @@ -19,10 +19,10 @@ ## SNI credential switching `supportedTLSServerParams` selects TLS credentials based on SNI: -- **No SNI**: uses `credential` (the primary server credential) +- **No SNI**: uses `credential` (the primary router credential) - **SNI present**: uses `sniCredential` (when configured) -The `sniCredUsed` TVar records whether SNI triggered credential switching. In the SMP server (Server.hs), when `sniUsed` is `True`, the connection is dispatched to the HTTP handler instead of the SMP handler. +The `sniCredUsed` TVar records whether SNI triggered credential switching. In the SMP router (`Server.hs`), when `sniUsed` is `True`, the connection is dispatched to the HTTP handler instead of the SMP handler. ## startTCPServer — address resolution diff --git a/spec/modules/Simplex/Messaging/Transport/Shared.md b/spec/modules/Simplex/Messaging/Transport/Shared.md index 8248c068b..1817fd13e 100644 --- a/spec/modules/Simplex/Messaging/Transport/Shared.md +++ b/spec/modules/Simplex/Messaging/Transport/Shared.md @@ -1,6 +1,6 @@ # Simplex.Messaging.Transport.Shared -> Certificate chain parsing and X.509 validation utilities shared between client and server. +> Certificate chain parsing and X.509 validation utilities shared between client and router. 
**Source**: [`Transport/Shared.hs`](../../../../../src/Simplex/Messaging/Transport/Shared.hs) @@ -19,10 +19,10 @@ | 4 | `CCValid {leafCert, idCert, _, caCert}` | "with network certificate" | | 5+ | `CCLong` | (rejected) | -The protocol spec defines supported chain lengths of 2, 3, and 4 certificates (see [Router certificate](../../../../protocol/simplex-messaging.md#router-certificate)). In all `CCValid` cases, `idCert` is the certificate whose fingerprint is compared against the server address key hash, and `caCert` is used as the X.509 trust anchor. +The protocol spec defines supported chain lengths of 2, 3, and 4 certificates (see [Router certificate](../../../../protocol/simplex-messaging.md#router-certificate)). In all `CCValid` cases, `idCert` is the certificate whose fingerprint is compared against the router identity (key hash in the queue URI), and `caCert` is used as the X.509 trust anchor. In the 4-cert case, index 2 is skipped (`_`) — it is present in the chain but not used as either the identity or the trust anchor. ## x509validate — FQHN check disabled -`x509validate` sets `checkFQHN = False`. The protocol spec identifies servers by certificate fingerprint (key hash in the server address), not by domain name. The validation uses a fresh `ValidationCache` (`ValidationCacheUnknown` for all lookups, no-op store) — each connection validates independently. +`x509validate` sets `checkFQHN = False`. The protocol spec identifies routers by certificate fingerprint (key hash in the queue URI), not by domain name. The validation uses a fresh `ValidationCache` (`ValidationCacheUnknown` for all lookups, no-op store) — each connection validates independently. 
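The chain-length table earlier in this file can be sketched as a single pattern match. The version below is a simplified, polymorphic illustration rather than the module's definition: `Chain` and `classifyChain` are hypothetical names, and the cases for 0, 1, 2, and 3 certificates are assumptions (the text only spells out the 4-certificate and 5+ rows and states that lengths 2, 3, and 4 are supported).

```haskell
-- Simplified stand-in for the chain classification; `cert` is left abstract.
data Chain cert
  = ChainEmpty
  | ChainSelf cert
  | ChainValid cert cert cert -- leaf, identity, trust anchor
  | ChainLong
  deriving (Eq, Show)

classifyChain :: [cert] -> Chain cert
classifyChain chain = case chain of
  [] -> ChainEmpty -- assumption: empty chain is its own case
  [c] -> ChainSelf c -- assumption: single self-signed certificate
  [leaf, c] -> ChainValid leaf c c -- assumption: one cert is both identity and anchor
  [leaf, i, ca] -> ChainValid leaf i ca
  [leaf, i, _, ca] -> ChainValid leaf i ca -- index 2 is present but unused
  _ -> ChainLong -- 5 or more certificates are rejected
```

In every valid case the second field plays the role of `idCert` (compared against the key hash) and the third plays the role of `caCert` (the trust anchor).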
From c8f2edc242ab8ca14f5341496a5bb1359c9daee7 Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Fri, 13 Mar 2026 07:35:15 +0000 Subject: [PATCH 16/61] spec for agent protocol --- spec/TOPICS.md | 6 + .../Simplex/Messaging/Agent/Protocol.md | 118 ++++++++++++++++++ src/Simplex/Messaging/Agent.hs | 3 +- src/Simplex/Messaging/Agent/Protocol.hs | 3 +- 4 files changed, 128 insertions(+), 2 deletions(-) create mode 100644 spec/modules/Simplex/Messaging/Agent/Protocol.md diff --git a/spec/TOPICS.md b/spec/TOPICS.md index 6ef029e96..097107143 100644 --- a/spec/TOPICS.md +++ b/spec/TOPICS.md @@ -22,4 +22,10 @@ - **Server subscription architecture**: The SMP server's subscription model spans Server.hs (serverThread split-STM lifecycle, tryDeliverMessage sync/async, ProhibitSub/ServerSub state machine), Env/STM.hs (SubscribedClients TVar-of-Maybe continuity, Client three-queue architecture), and Client/Agent.hs (small agent dual subscription model). The interaction between service subscriptions, direct queue subscriptions, notification subscriptions, and the serverThread subQ processing is not visible from any single module. +- **Duplex connection handshake**: The SMP duplex connection procedure (standard 10-step and fast 7-step) spans Agent.hs (orchestration, state machine), Agent/Protocol.hs (message types: AgentConfirmation/AgentConnInfoReply/AgentInvitation/HELLO, queue status types), Client.hs (SMP command dispatch), Protocol.hs (SMP-level KEY/SKEY commands). The handshake involves two-layer encryption (per-queue E2E + double ratchet), version-dependent paths (v2+ duplex, v6+ sender auth key, v7+ ratchet on confirmation, v9+ fast handshake with SKEY), and the asymmetry between initiating and accepting parties (different message types, different confirmation processing). 
The protocol spec (`agent-protocol.md`) defines the procedure but the implementation details — error handling, state persistence across restarts, race conditions between confirmation and message delivery — are only visible by reading the code across these modules. + +- **Connection links**: Full connection links (URI format with `#/?` query parameters) and binary-encoded links (`Encoding` instances) serve different contexts — URIs for out-of-band sharing, binary for agent-to-agent messages. Each has independent version-conditional encoding with different backward-compat rules (URI parser adjusts agent version ranges for old contact links, binary parser patches `queueMode` for forward compat). The `VersionI`/`VersionRangeI` typeclasses convert between `SMPQueueInfo` (versioned, in confirmations) and `SMPQueueUri` (version-ranged, in links). Full picture requires Agent/Protocol.hs, Protocol.hs, and agent-protocol.md. + +- **Short links**: Short links are a compact representation for sharing via URLs, not a replacement for full connection links — both are used. Short links store encrypted link data on the router and encode only a server hostname, link type character, and key hash in the URL. The link data lifecycle (creation, encryption with key derivation, owner chain-of-trust validation, mutable user data updates) spans Agent/Protocol.hs (types, serialization, owner validation, server shortening/restoration), Agent.hs (link creation and resolution API), and the router-side link storage. The `FixedLinkData`/`ConnLinkData` split (immutable vs mutable), `OwnerAuth` chain validation, and `PreparedLinkParams` pre-computation are not visible from any single module. + - **Outside-STM lookup pattern**: Multiple modules use the pattern of looking up TVar references outside STM (via readTVarIO/TM.lookupIO), then reading/modifying the TVar contents inside STM. This avoids transaction re-evaluation from unrelated map changes. 
Used in: Server.hs (serverThread client lookup, tryDeliverMessage subscriber lookup), Env/STM.hs (deleteSubcribedClient), Client/Agent.hs (removeClientAndSubs, reconnectSMPClient). The safety invariant is that the outer map entries (TVars) are never removed — only their contents change. diff --git a/spec/modules/Simplex/Messaging/Agent/Protocol.md b/spec/modules/Simplex/Messaging/Agent/Protocol.md new file mode 100644 index 000000000..ad95df809 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Agent/Protocol.md @@ -0,0 +1,118 @@ +# Simplex.Messaging.Agent.Protocol + +> Agent protocol types, wire formats, connection link serialization, and error taxonomy. + +**Source**: [`Agent/Protocol.hs`](../../../../../../src/Simplex/Messaging/Agent/Protocol.hs) + +**Protocol spec**: [`protocol/agent-protocol.md`](../../../../../protocol/agent-protocol.md) — duplex connection procedure, agent message syntax, connection link formats. + +## Overview + +This module defines the agent-level protocol: the types exchanged between agents (via SMP routers) and between agent and client application. It contains no IO — purely types, serialization, and validation logic. + +The module carries two independent version scopes: `SMPAgentVersion` (agent-to-agent protocol, currently v2–v7) and `SMPClientVersion` (agent-to-router protocol, imported from `Protocol.hs`). These version scopes interact but are negotiated independently — see [Protocol.md](../Protocol.md#two-separate-version-scopes). + +## Two-layer message format + +Agent messages use a two-layer envelope structure: + +1. **Outer envelope** (`AgentMsgEnvelope`): version + single-char type tag (`C`/`M`/`I`/`R`) + type-specific payload. The `C` (confirmation) and `M` (message) variants carry double-ratchet encrypted content. The `I` (invitation) variant is encrypted only with per-queue E2E. The `R` (ratchet key) variant carries ratchet renegotiation parameters. + +2. 
**Inner message** (`AgentMessage`): after double-ratchet decryption, discriminated by tags `I`/`D`/`R`/`M`. The `M` variant contains `APrivHeader` (sequential message ID + previous message hash for integrity) followed by `AMessage` (the actual command: `HELLO`, `A_MSG`, queue rotation, etc). + +The tag characters overlap between layers (`I` means confirmation-conninfo in inner, invitation in outer; `M` means message-envelope in outer, agent-message in inner). These are disambiguated by context — outer parsing happens first, then decryption, then inner parsing. + +## e2eEncConnInfoLength / e2eEncAgentMsgLength — PQ-dependent size budgets + +Connection info and agent message size limits depend on both agent version and PQ support. When PQ is enabled (v5+), the limits are *smaller* — not larger — because the ratchet header and reply link grow with PQ keys (SNTRUP761), consuming space from the fixed SMP message block. The specific reductions (3726 for conninfo, 2222 for messages) are documented in source comments. + +## AgentMsgEnvelope — connInfo encryption asymmetry + +`AgentInvitation` encrypts `connInfo` only with per-queue E2E (no double ratchet) — see source comment. This is because invitations are sent to contact address queues where no ratchet has been established. `AgentConfirmation` encrypts with double ratchet. `AgentRatchetKey` uses per-queue E2E for the ratchet parameters themselves (bootstrapping problem: can't use ratchet to renegotiate the ratchet). + +## AgentMessageType — dual encoding paths + +`AgentMessageType` and `AMsgType` encode the same set of message types but serve different purposes: `AgentMessageType` includes the envelope types (`AM_CONN_INFO`, `AM_CONN_INFO_REPLY`, `AM_RATCHET_INFO`) for database storage, while `AMsgType` only covers the inner `AMessage` types. Both share the same wire tags for the overlapping types (`H`, `M`, `V`, `QC`, `QA`, `QK`, `QU`, `QT`, `E`). 
The `Q`-prefixed types use two-character tags (prefix dispatch), all others use single characters. + +## HELLO — sent once after securing + +`HELLO` is sent exactly once, when the queue is known to be secured (duplex handshake). Not used at all in fast duplex connection (v9+ SMP). The v1 slow handshake (which sent HELLO multiple times until securing succeeded) is no longer supported — `minSupportedSMPAgentVersion = duplexHandshakeSMPAgentVersion` (v2). + +## AEvent entity type system + +`AEvent` is a GADT indexed by `AEntity` (phantom type: `AEConn`, `AERcvFile`, `AESndFile`, `AENone`). This prevents the type system from allowing file events on connection entities and vice versa. The existential wrapper `AEvt` erases the entity type for storage in heterogeneous collections — equality comparison (`Eq AEvt`) uses `testEquality` on the singleton witness to recover type information. + +`AENone` is used for events that aren't associated with any specific entity (e.g., `DOWN`, `UP`, `SUSPENDED`, `DEL_USER`). These are router-level or agent-level notifications, not connection-level. + +## ConnectionMode singleton pattern + +`ConnectionMode` / `SConnectionMode` / `ConnectionModeI` implement the singleton pattern: `SConnectionMode` is the type-level witness, `ConnectionModeI` is the typeclass that lets you recover the singleton from a type parameter. Many types are parameterized by `ConnectionMode` (`ConnectionRequestUri m`, `ConnShortLink m`, `ConnectionLink m`, etc.) to prevent mixing invitation and contact types at compile time. + +`checkConnMode` is the runtime escape hatch — it uses `testEquality` to cast between mode-parameterized types, returning `Left "bad connection mode"` on mismatch. This is used extensively in parsers where the mode is determined at parse time. + +## ConnReqUriData smpP — queueMode patch + +The binary parser for `ConnReqUriData` applies `patchQueueMode` to all queues, setting `queueMode = Just QMContact` when it's `Nothing`. 
See source comment: this compensates for `QMContact` not being included in queue encoding until min SMP client version >= 3. This patch is safe because the binary encoding path was not used before SMP client version 4. + +## Connection link URI parsing — version range adjustment + +`connReqUriP` adjusts the agent version range for contact links: `adjustAgentVRange` clamps the minimum to `minSupportedSMPAgentVersion`. This preserves compatibility with old contact links published online — they may advertise version ranges starting below the current minimum, and clamping prevents negotiation from failing on an unsupported version. + +The semicolon separator for SMP queues in the URI query string is deliberate — commas are used within server addresses to separate hostnames, so semicolons separate queues to avoid ambiguity. + +## Short link encoding — contactConnType as URL path character + +Short links encode `ContactConnType` as a single lowercase letter in the URL path: `a` (contact), `c` (channel), `g` (group), `r` (relay). Invitation links use `i`. The parser uses `toUpper` before dispatching to `ctTypeP` (which expects uppercase), while the encoder uses `toLower` on `ctTypeChar` output. This case dance happens because the wire format wants lowercase URLs but the internal representation uses uppercase. + +## Short link server shortening + +`shortenShortLink` strips port and key hash from preset servers, leaving only the hostname (`SMPServerOnlyHost` pattern). This makes short links shorter for well-known servers. `restoreShortLink` reverses this by looking up the full server definition from the preset list. Both functions match on primary hostname only (first in the `NonEmpty` list). + +`isPresetServer` has a non-obvious port matching rule: empty port in the preset matches `"443"` or `"5223"` in the link. This handles servers that use default ports without explicitly listing them. 
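The matching rule can be sketched as follows. This is an illustration under assumptions: `Server`, `samePort`, and `matchesPreset` are hypothetical names, the record keeps only the fields the text mentions, and the real function may compare further fields (for example the key hash).

```haskell
import Data.List.NonEmpty (NonEmpty (..))
import qualified Data.List.NonEmpty as L

-- Hypothetical, reduced server record: hostnames and port only.
data Server = Server {hosts :: NonEmpty String, port :: String}
  deriving (Eq, Show)

-- An empty preset port matches the two default ports named in the text.
samePort :: String -> String -> Bool
samePort presetPort linkPort =
  presetPort == linkPort
    || (null presetPort && linkPort `elem` ["443", "5223"])

-- Match on the primary (first) hostname only, plus the port rule.
matchesPreset :: Server -> Server -> Bool
matchesPreset preset link =
  L.head (hosts preset) == L.head (hosts link)
    && samePort (port preset) (port link)
```

For example, a preset declared without a port would match a link to the same primary host on port `5223`, but not on port `80`.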
+ +## OwnerAuth — chain-of-trust validation + +`OwnerAuth` is double-encoded: the inner fields are `smpEncode`d, then the result is encoded as a `ByteString` (with length prefix). See source comment: "additionally encoded as ByteString to have known length and allow OwnerAuth extension." The parser uses `parseOnly` on the inner bytes, which silently ignores trailing data — providing forward compatibility for future field additions. + +`validateLinkOwners` enforces a chain-of-trust: each owner must be signed by either the root key or any *preceding* owner in the list. Order matters — an owner signed by a later owner in the list will fail validation. Duplicate keys or IDs are rejected. An owner key matching the root key is rejected (prevents trivial self-authorization). + +## UserLinkData — length-prefix switchover + +`UserLinkData` uses a 1-byte length prefix for data ≤ 254 bytes, switching to a `\255` sentinel byte followed by a 2-byte (`Large`) length prefix for longer data. This is a backward-compatible extension of the standard `smpEncode` string format (which uses 1-byte length, capping at 255 bytes). + +## FixedLinkData / ConnLinkData — forward-compatible parsing + +Both `FixedLinkData` and `ConnLinkData` (invitation variant) consume trailing bytes with `A.takeByteString` after parsing known fields. See source comment: "ignoring tail for forward compatibility with the future link data encoding." This allows newer agents to add fields without breaking older parsers. + +## AgentErrorType — BlockedIndefinitely promotion + +`fromSomeException` in the `AnyError` instance promotes `BlockedIndefinitelyOnSTM` and `BlockedIndefinitelyOnMVar` to `CRITICAL` errors (with `offerRestart = True`) rather than generic `INTERNAL`. These are thread deadlock signals from the GHC runtime — they indicate a program bug, not a transient error. The `CRITICAL` classification with restart offer means the client application should prompt the user. 
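The promotion can be sketched with the standard `Control.Exception` types. The error type here is a hypothetical two-constructor stand-in for `AgentErrorType`, and `classify` is an illustrative name, not the instance method itself.

```haskell
import Control.Exception
  ( BlockedIndefinitelyOnMVar (..),
    BlockedIndefinitelyOnSTM (..),
    SomeException,
    fromException,
    toException
  )

-- Hypothetical, reduced error type: Critical carries the offerRestart flag.
data AgentErr
  = Critical Bool String
  | Internal String
  deriving (Eq, Show)

-- Deadlock signals from the GHC runtime become Critical with restart offered;
-- every other exception falls through to the generic Internal case.
classify :: SomeException -> AgentErr
classify e
  | Just BlockedIndefinitelyOnSTM <- fromException e = Critical True (show e)
  | Just BlockedIndefinitelyOnMVar <- fromException e = Critical True (show e)
  | otherwise = Internal (show e)
```

Matching on the concrete exception types, rather than on the rendered message, keeps the classification independent of how the runtime words these errors.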
+ +## cryptoErrToSyncState — error severity classification + +Maps crypto errors to ratchet sync states: `DECRYPT_AES`, `DECRYPT_CB`, and `RATCHET_EARLIER` map to `RSAllowed` (sync is optional, may self-recover). `RATCHET_HEADER`, `RATCHET_SKIPPED`, and `RATCHET_SYNC` map to `RSRequired` (sync must happen before communication can continue). This classification determines whether the agent automatically initiates ratchet resynchronization. + +## extraSMPServerHosts — hardcoded onion mappings + +Maps clearnet hostnames of preset SMP routers to their `.onion` addresses. `updateSMPServerHosts` adds the onion host as a second hostname when parsing legacy queue URIs that only have one host. This is used for backward compatibility with queue URIs created before multi-host support — modern URIs include all hosts directly. + +## Queue rotation state machines + +`RcvSwitchStatus` and `SndSwitchStatus` encode the two sides of the queue rotation protocol: + +- **Receiver side**: `RSSwitchStarted` → `RSSendingQADD` → `RSSendingQUSE` → `RSReceivedMessage` +- **Sender side**: `SSSendingQKEY` → `SSSendingQTEST` + +The asymmetry reflects the protocol: the receiver initiates rotation and sends more messages (QADD, QUSE), while the sender responds (QKEY, QTEST). These states are persisted to the database — the `StrEncoding` instances use snake_case strings as the canonical serialization format. See [agent-protocol.md — Rotating messaging queue](../../../../../protocol/agent-protocol.md#rotating-messaging-queue). + +## SMPQueueInfo / SMPQueueUri — version duality + +`SMPQueueInfo` (single version) and `SMPQueueUri` (version range) represent the same queue address but in different contexts. `VersionI` / `VersionRangeI` typeclasses convert between them — `toVersionT` pins a version range to a specific version, `toVersionRangeT` wraps a versioned type in a range. 
See source comment on `VersionI SMPClientVersion SMPQueueInfo`: the current conversion is trivial (just swapping the version/range field) but the typeclass exists so that future field additions can have version-dependent conversion logic. + +`SMPQueueInfo` encoding has four version-conditional paths: v1 (legacy server encoding), v2+ (standard encoding), v3+ with secure sender (appends `sndSecure` bool), v4+ (appends `queueMode`). The parser uses `clientVersion` to select between `legacyServerP` and standard `smpP` for the server field, and `updateSMPServerHosts` backfills onion addresses for legacy URIs. + +## ACommand — binary body parsing + +`commandP` takes a custom body parser. `dbCommandP` uses `A.take =<< A.decimal <* "\n"` — length-prefixed binary read. This is for commands stored in the database where the body must be fully parsed (not left as unparsed trailing bytes). The standard command parser uses `A.takeByteString` for bodies, consuming remaining input. + +`pqIKP` defaults to `IKLinkPQ PQSupportOff` when PQ support is not specified, and `pqSupP` defaults to `PQSupportOff`. These defaults maintain backward compatibility with commands serialized before PQ support was added. diff --git a/src/Simplex/Messaging/Agent.hs b/src/Simplex/Messaging/Agent.hs index 9e637ca96..25ad87b21 100644 --- a/src/Simplex/Messaging/Agent.hs +++ b/src/Simplex/Messaging/Agent.hs @@ -3352,6 +3352,7 @@ processSMPTransmissions c@AgentClient {subQ} (tSess@(userId, srv, _), THandlePar agentEnvelope <- parseMessage clientBody -- Version check is removed here, because when connecting via v1 contact address the agent still sends v2 message, -- to allow duplexHandshake mode, in case the receiving agent was updated to v2 after the address was created. + -- v1 slow handshake is no longer supported (minSupportedSMPAgentVersion = duplexHandshakeSMPAgentVersion). -- aVRange <- asks $ smpAgentVRange . 
config -- if agentVersion agentEnvelope `isCompatible` aVRange -- then pure (privHeader, agentEnvelope) @@ -3392,7 +3393,7 @@ processSMPTransmissions c@AgentClient {subQ} (tSess@(userId, srv, _), THandlePar AgentConnInfoReply smpQueues connInfo -> do processConf connInfo SMPConfirmation {senderKey, e2ePubKey, connInfo, smpReplyQueues = L.toList smpQueues, smpClientVersion = phVer} withStore' c $ \db -> updateRcvMsgHash db connId 1 (InternalRcvId 0) (C.sha256Hash agentMsgBody) - _ -> prohibited "conf: not AgentConnInfoReply" -- including AgentConnInfo, that is prohibited here in v2 + _ -> prohibited "conf: not AgentConnInfoReply" -- including AgentConnInfo, that is prohibited here in v2 (v1 slow handshake is no longer supported) where processConf connInfo senderConf = do let newConfirmation = NewConfirmation {connId, senderConf, ratchetState = rc'} diff --git a/src/Simplex/Messaging/Agent/Protocol.hs b/src/Simplex/Messaging/Agent/Protocol.hs index 557a92a73..6c5833e66 100644 --- a/src/Simplex/Messaging/Agent/Protocol.hs +++ b/src/Simplex/Messaging/Agent/Protocol.hs @@ -388,7 +388,7 @@ data AEvent (e :: AEntity) where INV :: AConnectionRequestUri -> AEvent AEConn LINK :: ConnShortLink 'CMContact -> UserConnLinkData 'CMContact -> AEvent AEConn LDATA :: FixedLinkData 'CMContact -> ConnLinkData 'CMContact -> AEvent AEConn - CONF :: ConfirmationId -> PQSupport -> [SMPServer] -> ConnInfo -> AEvent AEConn -- ConnInfo is from sender, [SMPServer] will be empty only in v1 handshake + CONF :: ConfirmationId -> PQSupport -> [SMPServer] -> ConnInfo -> AEvent AEConn -- ConnInfo is from sender, [SMPServer] will be empty only in v1 handshake (no longer supported) REQ :: InvitationId -> PQSupport -> NonEmpty SMPServer -> ConnInfo -> AEvent AEConn -- ConnInfo is from sender INFO :: PQSupport -> ConnInfo -> AEvent AEConn CON :: PQEncryption -> AEvent AEConn -- notification that connection is established @@ -1024,6 +1024,7 @@ data AMessage aMessageType :: AMessage -> AgentMessageType 
aMessageType = \case + -- v1 slow handshake is no longer supported (minSupportedSMPAgentVersion = duplexHandshakeSMPAgentVersion). -- HELLO is used both in v1 and in v2, but differently. -- - in v1 (and, possibly, in v2 for simplex connections) can be sent multiple times, -- until the queue is secured - the OK response from the server instead of initial AUTH errors confirms it. From 3a756f9842f6e549cea6f83914080fb4e831833c Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Fri, 13 Mar 2026 08:49:50 +0000 Subject: [PATCH 17/61] agent client spec --- spec/TOPICS.md | 6 + .../modules/Simplex/Messaging/Agent/Client.md | 221 ++++++++++++++++++ .../Simplex/Messaging/Agent/understanding.md | 59 +++++ 3 files changed, 286 insertions(+) create mode 100644 spec/modules/Simplex/Messaging/Agent/Client.md create mode 100644 spec/modules/Simplex/Messaging/Agent/understanding.md diff --git a/spec/TOPICS.md b/spec/TOPICS.md index 097107143..c29e61705 100644 --- a/spec/TOPICS.md +++ b/spec/TOPICS.md @@ -28,4 +28,10 @@ - **Short links**: Short links are a compact representation for sharing via URLs, not a replacement for full connection links — both are used. Short links store encrypted link data on the router and encode only a server hostname, link type character, and key hash in the URL. The link data lifecycle (creation, encryption with key derivation, owner chain-of-trust validation, mutable user data updates) spans Agent/Protocol.hs (types, serialization, owner validation, server shortening/restoration), Agent.hs (link creation and resolution API), and the router-side link storage. The `FixedLinkData`/`ConnLinkData` split (immutable vs mutable), `OwnerAuth` chain validation, and `PreparedLinkParams` pre-computation are not visible from any single module. 
+- **Agent worker framework**: `getAgentWorker` (lifecycle, restart rate limiting, crash recovery) + `withWork`/`withWork_`/`withWorkItems` (task retrieval with doWork flag atomics) defined in Agent/Client.hs, consumed by Agent.hs (async commands, message delivery), NtfSubSupervisor.hs (notification workers), FileTransfer/Agent.hs (XFTP workers), and simplex-chat. The framework separates two concerns: worker lifecycle (create-or-reuse, fork async, rate-limit restarts, escalate to CRITICAL) and task pattern (get next task, do task, as separate parameters). The doWork TMVar flag choreography (clear before query to prevent race) and the work-item-error vs store-error distinction are not obvious from any single consumer. + +- **Agent operation suspension**: Five `AgentOpState` TVars (RcvNetwork, MsgDelivery, SndNetwork, Database, NtfNetwork) with a cascade ordering: ending RcvNetwork suspends MsgDelivery, ending MsgDelivery suspends SndNetwork + Database, ending SndNetwork suspends Database. `beginAgentOperation` retries if suspended, `endAgentOperation` decrements and cascades. All DB access goes through `withStore` which brackets with AODatabase. This ensures graceful shutdown propagates through dependent operations. Defined in Agent/Client.hs, used by Agent.hs subscriber and worker loops. + +- **Queue rotation protocol**: Four agent messages (QADD → QKEY → QUSE → QTEST) on top of SMP commands, with asymmetric state machines on receiver side (`RcvSwitchStatus`: 4 states) and sender side (`SndSwitchStatus`: 2 states). Receiver initiates, creates new queue, sends QADD. Sender responds with QKEY. Receiver sends QUSE. Sender sends QTEST to complete. State types in Agent/Protocol.hs, orchestration in Agent.hs, queue creation/deletion in Agent/Client.hs. Protocol spec in agent-protocol.md. The fast variant (v9+ SMP with SKEY) skips the KEY command step. 
+ - **Outside-STM lookup pattern**: Multiple modules use the pattern of looking up TVar references outside STM (via readTVarIO/TM.lookupIO), then reading/modifying the TVar contents inside STM. This avoids transaction re-evaluation from unrelated map changes. Used in: Server.hs (serverThread client lookup, tryDeliverMessage subscriber lookup), Env/STM.hs (deleteSubcribedClient), Client/Agent.hs (removeClientAndSubs, reconnectSMPClient). The safety invariant is that the outer map entries (TVars) are never removed — only their contents change. diff --git a/spec/modules/Simplex/Messaging/Agent/Client.md b/spec/modules/Simplex/Messaging/Agent/Client.md new file mode 100644 index 000000000..eb4ff47e7 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Agent/Client.md @@ -0,0 +1,221 @@ +# Simplex.Messaging.Agent.Client + +> Agent infrastructure layer: protocol client lifecycle, worker framework, subscription management, operation suspension, and concurrency primitives. + +**Source**: [`Agent/Client.hs`](../../../../../../src/Simplex/Messaging/Agent/Client.hs) + +**See also**: [Agent.hs](./Agent.md) — the orchestration layer that consumes these primitives. + +## Overview + +This module defines `AgentClient`, the central state container for the messaging agent, and all reusable infrastructure that Agent.hs and other consumers (NtfSubSupervisor.hs, FileTransfer/Agent.hs, simplex-chat) build upon. 
It contains ~2868 lines covering: + +- **Protocol client lifecycle**: lazy singleton connections to SMP/NTF/XFTP routers via `SessionVar` pattern, with disconnect callbacks and reconnection workers +- **Worker framework**: `getAgentWorker` (lifecycle, restart rate limiting, crash recovery) + `withWork`/`withWork_`/`withWorkItems` (task retrieval with doWork flag atomics) +- **Subscription state**: active/pending/removed queues, session-aware cleanup on disconnect, batch subscription RPCs with post-hoc session validation +- **Operation suspension**: five `AgentOpState` TVars with cascade ordering for graceful shutdown +- **Concurrency primitives**: per-connection locks, transport session batching, proxy routing + +The module is consumed by Agent.hs (which passes specific worker bodies, task queries, and handler logic into these frameworks) and by external consumers that reuse the worker and protocol client infrastructure. + +## AgentClient — central state container + +`AgentClient` has ~50 fields, almost all TVars or TMaps. Key architectural groupings: + +- **Event queues**: `subQ` (events to client application), `msgQ` (messages from SMP routers) +- **Protocol client pools**: `smpClients`, `ntfClients`, `xftpClients` — all are TMaps of `TransportSession` → `SessionVar`, implementing lazy singletons via `getSessVar` +- **Subscription tracking**: `currentSubs` (TSessionSubs, active+pending per transport session), `removedSubs` (failed subscriptions with errors), `subscrConns` (set of connection IDs currently subscribed) +- **Worker pools**: `smpDeliveryWorkers`, `asyncCmdWorkers`, `smpSubWorkers` — TMaps keyed by work address/connection +- **Operation states**: `ntfNetworkOp`, `rcvNetworkOp`, `msgDeliveryOp`, `sndNetworkOp`, `databaseOp` +- **Locking**: `connLocks`, `invLocks`, `deleteLock`, `getMsgLocks` + +All TVars are initialized in `newAgentClient`. 
The `active` TVar is the global kill switch — `closeAgentClient` sets it to `False`, and all protocol client getters check it first. + +## Protocol client lifecycle — SessionVar singleton pattern + +Protocol client connections (SMP, NTF, XFTP) use a lazy singleton pattern implemented by [Session.hs](../../../Session.md): + +1. **`getSessVar`** atomically checks the TMap. Returns `Left newVar` if absent (caller must connect), `Right existingVar` if present (caller waits for the TMVar). +2. **`newProtocolClient`** wraps the connection attempt. On success, fills the `sessionVar` TMVar with `Right client`. On failure, fills with `Left (error, maybeRetryTime)` and re-throws. +3. **`waitForProtocolClient`** reads the TMVar with a timeout. If the stored error has an expiry time that has passed, it removes the SessionVar and retries from scratch — this is the `persistErrorInterval` retry mechanism. + +### SessionVar compare-and-swap + +`removeSessVar` (Session.hs) only removes a SessionVar from the map if its `sessionVarId` matches the current entry. The `sessionVarId` is a monotonically increasing counter from `workerSeq`. This prevents a stale disconnection callback from removing a *new* client that was created after the old one disconnected. Without this, the sequence "client A disconnects → client B connects → client A's callback runs" would incorrectly remove client B. + +### SMP disconnect callback + +`smpClientDisconnected` is the most complex disconnect handler (NTF/XFTP have simpler versions that just remove the SessionVar): + +1. `removeSessVar` atomically removes the client if still current +2. If `active`, moves active subscriptions to pending (only those matching the disconnecting client's `sessionId` — see next section) +3. Removes proxied relay sessions that this client created +4. Fires `DOWN` events for affected connections +5. 
Triggers `resubscribeSMPSession` to spawn a reconnection worker + +### Session-aware subscription cleanup + +`removeClientAndSubs` (inside `smpClientDisconnected`) uses `SS.setSubsPending` with the disconnecting client's `sessionId`. Only subscriptions whose session ID matches the disconnecting client are moved to pending. If a new client already connected and made its own subscriptions active, those are *not* disturbed. This prevents the race: "old client disconnects → new client subscribes → old client's cleanup incorrectly demotes new client's subscriptions." + +## ProtocolServerClient typeclass + +Unifies SMP/NTF/XFTP client management with associated types: +- `Client msg` — the connected client type (SMP wraps in `SMPConnectedClient` with proxied relay map; NTF and XFTP use the raw protocol client) +- `ProtoClient msg` — the underlying protocol client for logging/closing + +SMP is special: `SMPConnectedClient` bundles the protocol client with `proxiedRelays :: TMap SMPServer ProxiedRelayVar`, a per-connection map of relay sessions for proxy routing. + +## Worker framework + +Defined here, consumed by Agent.hs, NtfSubSupervisor.hs, FileTransfer/Agent.hs, and simplex-chat. Two separable parts: + +### getAgentWorker — lifecycle management + +Creates or reuses a worker for a given key. Workers are stored in a TMap keyed by their work address. + +- **Create-or-reuse**: atomically checks the map. If absent, creates a new `Worker` (with `doWork` TMVar pre-filled with `()`). If present and `hasWork=True`, signals the existing worker. +- **Fork**: `runWorkerAsync` takes the `action` TMVar. If `Nothing` (worker idle), it starts work. If `Just weakThreadId` (worker running), it puts the value back and returns. This bracket ensures at-most-one concurrent execution. +- **Restart rate limiting**: on worker exit (success or error), checks `restartCount` against `maxWorkerRestartsPerMin`. If under the limit, restarts with `hasWorkToDo` signal. 
If over the limit, deletes the worker from the map and sends a `CRITICAL True` error. +- **Worker identity**: `workerId` (from `workerSeq`) prevents a stale restart from interfering with a new worker that replaced it in the map. + +`getAgentWorker'` is the generic version with custom worker wrapper — used by `smpDeliveryWorkers` which pairs each Worker with a `TMVar ()` retry lock. + +### withWork / withWork_ / withWorkItems — task retrieval + +Takes `getWork` (fetch next task) and `action` (process it) as separate parameters. The consumer's worker body loops: `waitForWork doWork` → `withWork doWork getTask handleTask`. + +**Critical: doWork flag race prevention.** `noWorkToDo` (clearing the flag) happens BEFORE `getWork` (querying for tasks), not after. This prevents the race where: (1) worker queries, finds nothing, (2) another thread adds work and sets the flag, (3) worker clears the flag — losing the signal. By clearing first, any concurrent signal after the query will be preserved. + +**Error classification**: `withWork_` distinguishes work-item errors from store errors: +- **Work item error** (`isWorkItemError`): the worker stops and sends `CRITICAL False`. The next iteration would likely produce the same error, so stopping prevents infinite loops. +- **Store error**: the flag is re-set and an `INTERNAL` error is reported. The assumption is that store errors are transient (e.g., DB busy) and retrying may succeed. + +`withWorkItems` handles batched work — a list of items where some may have individual errors. If all items are work-item errors, the worker stops. If only some are, the worker continues with the successful items and reports errors. + +### runWorkerAsync — at-most-one execution + +Uses a bracket on the `action` TMVar: +- `takeTMVar action` — blocks if another thread is starting the worker (TMVar empty during start) +- If the taken value is `Nothing` — worker is idle, start it. Store `Just weakThreadId` in the TMVar. 
+- If `Just _` — worker is already running, put it back and return. + +The `Weak ThreadId` in `action` is a weak reference — it doesn't prevent the worker thread from being garbage collected. This is the cleanup mechanism: if the thread dies without explicitly clearing `action`, the weak reference becomes stale and the next `runWorkerAsync` call will detect it as idle. + +## Operation suspension cascade + +Five `AgentOpState` TVars track whether each operation category is suspended and how many operations are in-flight: + +``` +AONtfNetwork (independent) +AORcvNetwork → AOMsgDelivery → AOSndNetwork → AODatabase +``` + +The cascade means: +- `endAgentOperation AORcvNetwork` suspends `AOMsgDelivery`, which cascades to `AOSndNetwork` → `AODatabase` +- `endAgentOperation AOMsgDelivery` suspends `AOSndNetwork` → `AODatabase` +- `endAgentOperation AOSndNetwork` suspends `AODatabase` +- Each leaf in the cascade calls `notifySuspended` (writes `SUSPENDED` to `subQ`, sets `agentState` to `ASSuspended`) + +**`beginAgentOperation`** retries (blocks in STM) if the operation is suspended. This provides backpressure: new operations wait until the operation is resumed. + +**`agentOperationBracket`** wraps an operation with begin/end. All database access goes through `withStore` which brackets with `AODatabase`. This ensures graceful shutdown propagates: suspending `AORcvNetwork` eventually suspends all downstream operations, and `notifySuspended` only fires when all in-flight operations have completed. + +**`waitWhileSuspended`** vs **`waitUntilForeground`**: `waitWhileSuspended` proceeds during `ASSuspending` (allowing in-flight operations to complete), while `waitUntilForeground` blocks during both `ASSuspending` and `ASSuspended`. + +## Subscription management + +### subscribeQueues — batch-by-transport-session + +`subscribeQueues` is the main entry point for subscribing to receive queues: + +1. 
`checkQueues` filters out queues with active GET locks (prevents concurrent GET + SUB on the same queue) +2. `batchQueues` groups queues by transport session +3. `addPendingSubs` marks all queues as pending before the RPC +4. `mapConcurrently` subscribes each session batch in parallel + +### subscribeSessQueues_ — post-hoc session validation + +After the subscription RPC completes, `subscribeSessQueues_` validates `activeClientSession` — checking that the SessionVar still holds the same client that was used for the RPC. If the client was replaced during the RPC (reconnection happened), the results are discarded and resubscription is triggered. This is optimistic execution with post-hoc validation: do the work, then check if it's still valid. + +### processSubResults — partitioning + +Subscription results are partitioned into five categories: +1. **Failed with client notice** — queue has a server-side notice (e.g., queue status change) +2. **Failed permanently** — non-temporary error, queue is removed from pending and added to `removedSubs` +3. **Failed temporarily** — error is transient, queue stays in pending for retry on reconnect +4. **Subscribed** — moved from pending to active. Further split into: queues whose service ID matches the session service (added as service-associated) and others. +5. **Ignored** — queue was not in the pending map (already activated by a concurrent path), counted for statistics only + +### Resubscription worker + +`resubscribeSMPSession` spawns a worker per transport session that retries pending subscriptions with exponential backoff (`withRetryForeground`). The worker: + +1. Reads pending subs and pending service sub +2. Waits for foreground and network +3. Resubscribes service and queues +4. Loops until no pending subs remain + +**Cleanup blocks on TMVar fill** — the `cleanup` STM action retries (`whenM (isEmptyTMVar $ sessionVar v) retry`) until the async handle is inserted. 
This prevents the race where cleanup runs before the worker async is stored, which would leave a terminated worker in the map. + +## Proxy routing — sendOrProxySMPCommand + +Implements SMP proxy/direct routing with fallback: + +1. `shouldUseProxy` checks `smpProxyMode` (Always/Unknown/Unprotected/Never) and whether the destination server is "known" (in the user's server list) +2. If proxying: `getSMPProxyClient` creates or reuses a proxy connection, then `connectSMPProxiedRelay` establishes the relay session. On `NO_SESSION` error, re-creates the relay session through the same proxy. +3. If proxying fails with a host error and `smpProxyFallback` allows it: falls back to direct connection +4. `deleteRelaySession` carefully validates that the current relay session matches the one that failed before removing it (prevents removing a concurrently-created replacement session) + +## withStore — database access bracket + +`withStore` wraps database access with `agentOperationBracket c AODatabase`, ensuring the operation suspension cascade is respected. SQLite errors are classified: +- `ErrorBusy`/`ErrorLocked` → `SEDatabaseBusy` → `CRITICAL True` (prompts user restart) +- Other SQL errors → `SEInternal` + +`SEAgentError` is a special wrapper that allows agent-level errors to be threaded through store operations — used when "transaction-like" access is needed but the operation involves agent logic, not just DB queries. See source comment: "network IO should NOT be used inside AgentStoreMonad." + +## Server selection — getNextServer / withNextSrv + +Server selection has two-level diversity: +1. **Operator diversity**: prefer servers from operators not already used (tracked by `usedOperators` set) +2. **Host diversity**: prefer servers with hosts not already used (tracked by `usedHosts` set) + +`filterOrAll` ensures that if all servers are "used," the full list is returned rather than an empty one. 
+ +`withNextSrv` is designed for retry loops — it re-reads user servers on each call (allowing configuration changes during retries) and tracks `triedHosts` across attempts. When all hosts are tried, the tried set is reset (`S.empty`), creating a round-robin effect. + +## Network configuration — slow/fast selection + +`getNetworkConfig` selects between slow and fast network configs based on `userNetworkInfo`: +- `UNCellular` or `UNNone` → slow config (1.5× timeouts via `slowNetworkConfig`) +- `UNWifi`, `UNEthernet`, `UNOther` → fast config + +Both configs are stored together in `useNetworkConfig :: TVar (NetworkConfig, NetworkConfig)`. The slow config is derived from the fast config in `newAgentClient`. + +## closeAgentClient — shutdown sequence + +1. Sets `active = False` — all protocol client getters will throw `INACTIVE` +2. Closes all protocol server clients (SMP, NTF, XFTP) by swapping maps to empty and forking close threads +3. Clears proxied relays +4. Cancels resubscription workers — forks cancellation threads (fire-and-forget, `closeAgentClient` may return before all workers are cancelled) +5. Clears delivery and async command workers +6. Clears subscription state + +The cancellation of resubscription workers reads the TMVar first (to get the Async handle), then calls `uninterruptibleCancel`. This is wrapped in a forked thread to avoid blocking the shutdown sequence. + +## Transport session modes + +`TransportSessionMode` (`TSMEntity` vs other) determines whether the transport session key includes the entity ID (connection/queue ID). When `TSMEntity`, each queue gets its own TLS connection to the router. When not, queues to the same router share a connection. This is controlled by `sessionMode` in the network config. + +`mkSMPTSession` and related functions compute the transport session key based on the current mode. 
This affects connection multiplexing — entity-mode sessions provide better privacy (router can't correlate queues) at the cost of more connections. + +## getMsgLocks — GET exclusion + +`getQueueMessage` creates a TMVar lock keyed by `(server, rcvId)` and takes it before sending GET. This prevents concurrent GET and SUB on the same queue (SUB is checked via `hasGetLock` in `checkQueues`). The lock is released by `releaseGetLock` after ACK or on error. + +## Error classification — temporaryAgentError + +Classifies errors as temporary (retryable) or permanent. Notable non-obvious classifications: +- `TEHandshake BAD_SERVICE` is temporary — it indicates a DB error on the router, not a permanent rejection +- `CRITICAL True` is temporary — `True` means the error shows a restart button, implying the user should retry +- `INACTIVE` is temporary — the agent may be reactivated diff --git a/spec/modules/Simplex/Messaging/Agent/understanding.md b/spec/modules/Simplex/Messaging/Agent/understanding.md new file mode 100644 index 000000000..4ca3ed2d9 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Agent/understanding.md @@ -0,0 +1,59 @@ +# Agent Module Documentation Notes + +> Working notes for documenting Agent/Client.hs and Agent.hs. Not a spec doc — will be deleted after both docs are written. + +## Documentation approach + +**Bottom-up**: Client.hs first, then Agent.hs. + +**Client.hs** documents reusable infrastructure with contracts (what callers must provide, what guarantees they get), listing known consumers. Stands alone as "here's the framework." + +**Agent.hs** references Client.hs for infrastructure, focuses on what it passes into those frameworks — specific worker bodies, task queries, handler logic, and the orchestration policies (handshake, rotation, ratchet sync). Stands alone as "here's how the agent uses that framework." + +Coupling captured by cross-references, not duplication. 
+ +## Module roles + +**Client.hs — infrastructure layer (~2868 lines):** +- `AgentClient`: central state container (TVars, TMaps, worker pools, locks, operation states) +- Protocol client lifecycle: lazy singleton for SMP/NTF/XFTP connections, disconnect callbacks, reconnection via sub workers +- Subscription state machine: active/pending/removed, session-aware cleanup on disconnect +- Worker framework: `getAgentWorker` (lifecycle, restart rate limiting, crash recovery) + `withWork`/`withWork_`/`withWorkItems` (task retrieval pattern with doWork flag atomics) +- Operation suspension cascade: RcvNetwork → MsgDelivery → SndNetwork → Database +- Queue creation (`newRcvQueue`) and protocol-level operations +- Concurrency primitives: per-connection locks, session vars with monotonic IDs, batching by transport session +- Encryption helpers, server selection, statistics + +**Agent.hs — orchestration/policy layer (~3868 lines):** +- Public API: createConnection, joinConnection, allowConnection, sendMessage, ackMessage, switchConnection, etc. +- Subscriber loop: reads `msgQ`, dispatches to per-connection handlers via `processSMP` +- Duplex handshake: confirmation processing, HELLO exchange, CON notification +- Queue rotation protocol: QADD → QKEY → QUSE → QTEST +- Ratchet synchronization: AgentRatchetKey exchange, hash-ordering to break symmetry +- Async command processing: `runCommandProcessing` worker body using `withWork` + `getPendingServerCommand` +- Message delivery: `runSmpQueueMsgDelivery` worker body per SndQueue +- Message integrity: sequential ID + hash chain validation + +## Worker framework details + +Defined in Client.hs, consumed by Agent.hs, NtfSubSupervisor.hs, FileTransfer/Agent.hs, and simplex-chat. + +Two separable parts: +1. **`getAgentWorker`**: lifecycle — create-or-reuse worker for a key, fork async, handle restart rate limiting (max per minute, delete after max). 
`getAgentWorker'` is generic version with custom worker wrapper (e.g., adding a retryLock TMVar for delivery workers). +2. **`withWork` / `withWork_` / `withWorkItems`**: task retrieval pattern — takes `getWork` (fetch next task) and `action` (process it) as separate parameters. Clears doWork flag BEFORE querying (prevents race where another thread sets flag after query returns empty). Re-sets flag if work was found. On work item error vs store error: work item errors stop the worker (CRITICAL), store errors re-set flag and log. + +Worker body (in consumer module) loops: `waitForWork doWork` → `withWork doWork getTask handleTask`. + +## Key non-obvious patterns to document + +### Client.hs — DONE (see Agent/Client.md) + +### Agent.hs +- Subscriber loop is the main event processor +- Duplex handshake role asymmetry: initiator expects AgentConnInfoReply, acceptor expects AgentConnInfo +- Queue rotation is 4 agent messages on top of SMP commands +- Ratchet sync hash-ordering: lower hash initializes receive ratchet +- Message integrity validation: external sender ID sequential + hash chain +- Split-phase connection creation (prepareConnectionLink + createConnectionForLink) prevents race +- ACK is NOT automatic for A_MSG (user must call ackMessage), IS automatic for control messages +- Connection upgrade: RcvConnection → DuplexConnection when reply queue created From 541b3f924b412e1714b810e6ed1b07a36b456c1a Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Fri, 13 Mar 2026 09:43:58 +0000 Subject: [PATCH 18/61] agent spec --- spec/modules/Simplex/Messaging/Agent.md | 238 ++++++++++++++++++ .../modules/Simplex/Messaging/Agent/Client.md | 2 +- .../Simplex/Messaging/Agent/understanding.md | 59 ----- 3 files changed, 239 insertions(+), 60 deletions(-) create mode 100644 spec/modules/Simplex/Messaging/Agent.md delete mode 100644 spec/modules/Simplex/Messaging/Agent/understanding.md diff --git 
a/spec/modules/Simplex/Messaging/Agent.md b/spec/modules/Simplex/Messaging/Agent.md new file mode 100644 index 000000000..0b1e9cec1 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Agent.md @@ -0,0 +1,238 @@ +# Simplex.Messaging.Agent + +> Orchestration layer: duplex connection lifecycle, message processing dispatch, queue rotation, ratchet synchronization, and async command framework. + +**Source**: [`Agent.hs`](../../../../../src/Simplex/Messaging/Agent.hs) + +**See also**: [Agent/Client.md](./Agent/Client.md) — the infrastructure layer (AgentClient, worker framework, protocol client lifecycle, subscription state, operation suspension). + +**Protocol spec**: [`agent-protocol.md`](../../../../protocol/agent-protocol.md) — duplex connection procedure, agent message syntax. + +## Overview + +This module is the top-level messaging agent, consumed by simplex-chat and other client applications. It passes specific worker bodies, task queries, and handler logic into the frameworks defined in [Agent/Client.hs](./Agent/Client.md), and implements the orchestration policies: duplex handshake, queue rotation, ratchet synchronization, message integrity validation. + +The agent starts four threads (in `getSMPAgentClient_`): `subscriber` (main event loop), `runNtfSupervisor` (notification token management), `cleanupManager` (periodic garbage collection), and `logServersStats` (statistics reporting). These threads are raced via `raceAny_` — if any exits, all are cancelled. + +## Split-phase connection creation + +`prepareConnectionLink` and `createConnectionForLink` separate link preparation (key generation, link formatting — no network) from queue creation (single network call). This prevents the race where a link is published before the queue exists on the router. The link can be shared out-of-band after `prepareConnectionLink`, and `createConnectionForLink` is called only when the user is ready to accept connections. 
+ +## Subscriber loop — processSMPTransmissions + +The subscriber thread reads batches from `msgQ` (filled by SMP protocol clients) and dispatches to `processSMPTransmissions`. Key non-obvious behaviors: + +**Batch UP notification accumulation.** Successful subscription confirmations (`processSubOk`) append to a shared `upConnIds` TVar across the batch. A single `UP` event is emitted after all transmissions in the batch are processed, not per-transmission. Similarly, `serviceRQs` accumulates service-associated receive queues for batch processing via `processRcvServiceAssocs`. + +**Double validation for subscription results.** `isPendingSub` checks two conditions atomically: the queue must be in the pending map AND the client session must still be active. If either fails, the subscription result is counted as ignored (statistics only). This handles the race where a subscription response arrives after the client disconnected and a new client connected. + +**subQ overflow to pendingMsgs.** `processSMP` writes events to `subQ` (bounded TBQueue) but when it's full, events go into a `pendingMsgs` TVar instead. After processing completes, pending messages are drained in reverse order. This prevents the message processing thread from blocking on a full queue, which would stall the entire SMP client. + +**END/ENDS session validation.** Both `END` (single queue) and `ENDS` (service) check `activeClientSession` before removing subscriptions. If the session doesn't match (stale disconnect), the event is logged but ignored. This prevents a delayed END from a disconnected client from removing subscriptions that a new client established. + +## Message processing — processSMP + +`processSMP` dispatches on the SMP message type within a per-connection lock (`withConnLock`). 
+ +### Four e2e key states + +The MSG handler discriminates on `(e2eDhSecret, e2ePubKey_)` — the per-queue shared secret and the incoming public key: + +- `(Nothing, Just key)` — **Handshake phase**: no shared secret yet, public key present. Computes DH, decrypts with per-queue E2E. Dispatches to `smpConfirmation` (if AgentConfirmation) or `smpInvitation` (if AgentInvitation). +- `(Just dh, Nothing)` — **Established phase**: shared secret exists, no new key. This is normal message flow. Dispatches to `AgentRatchetKey` (ratchet renegotiation) or `AgentMsgEnvelope` (double-ratchet encrypted message). +- `(Just dh, Just _)` — **Repeated confirmation**: both present. Only AgentConfirmation is accepted (this is a retry because ACK failed), everything else is rejected. +- `(Nothing, Nothing)` — **Error**: no keys at all. + +### ACK semantics + +ACK is NOT automatic for `A_MSG` — the function returns `ACKPending` and the user must call `ackMessage`. ACK IS automatic for all control messages (HELLO, QADD, QKEY, QUSE, QTEST, EREADY, A_RCVD). This is because `A_MSG` delivery to the user application must be confirmed before the message is removed from the router. + +`handleNotifyAck` wraps each MSG processing branch: if any error occurs, it sends `ERR` to the client but still ACKs the SMP message. This prevents a processing error from causing infinite re-delivery of the same message. + +### agentClientMsg — transactional message processing + +The inner function `agentClientMsg` performs ratchet decryption, message parsing, and integrity checking inside a single `withStore` transaction with `lockConnForUpdate`. This serializes all message processing for a given connection, preventing concurrent ratchet state modifications. The function returns the pre-decryption ratchet state (`rcPrev`) alongside the message — this is needed by `ereadyMsg` to decide whether to send EREADY. + +### Duplicate message handling + +Three paths for `A_DUPLICATE` errors: + +1. 
**Stored and user-acked**: `getLastMsg` finds it with `userAck = True` → `ackDel` (delete from router). +2. **Stored, A_MSG, not user-acked**: re-notify the user with `MSG` event and return `ACKPending`. The user may not have seen the original notification. +3. **Not stored or non-A_MSG**: verify via `checkDuplicateHash` that the encrypted hash exists in the DB. If it doesn't, the error is re-thrown (it's a real decryption failure, not a duplicate). + +For crypto errors (`A_CRYPTO`): the encrypted message hash is checked for existence. If the hash already exists, the error is silently suppressed (it's a duplicate that failed decryption differently). If not, `notifySync` classifies the error via `cryptoErrToSyncState` and may trigger ratchet resynchronization. + +### resetRatchetSync on successful decryption + +When a double-ratchet message is successfully decrypted and the connection's ratchet sync state is not `RSOk` or `RSStarted`, the state is reset to `RSOk` and `RSYNC RSOk` is notified. This means successful message delivery is the recovery signal for ratchet desynchronization. + +### updateConnVersion on every message + +Every received `AgentMsgEnvelope` triggers `updateConnVersion`, which upgrades the connection's agreed agent version if the message's version is higher and compatible. This is a monotonic upgrade — versions only increase. The `safeVersionRange` construction handles the case where the sender's version is higher than the receiver's maximum — it creates a range from `minVersion` to the sender's version. + +## Duplex handshake + +See [agent-protocol.md](../../../../protocol/agent-protocol.md) for the protocol description. Implementation-specific details: + +### Initiating party (RcvConnection) + +Receives AgentConfirmation with `e2eEncryption = Just sndParams`. Initializes the receive ratchet from the sender's E2E parameters. 
**v7+ (ratchetOnConfSMPAgentVersion)**: creates the ratchet immediately on confirmation processing, not later on `allowConnection`. See source comment on `processConf` — this supports decrypting messages that may arrive before `allowConnection` is called. The ratchet creation, E2E secret setup, and confirmation storage all happen in one `withStore` transaction. + +### Accepting party (DuplexConnection) + +Receives AgentConfirmation with `e2eEncryption = Nothing` and `AgentConnInfo` (not `AgentConnInfoReply`). The ratchet was already initialized during `joinConnection`. If `senderKey` is present, enqueues `ICDuplexSecure` (the queue needs to be secured with SKEY). If absent (sender already secured via LKEY), sends `CON` immediately. + +### HELLO exchange + +HELLO is processed in `helloMsg`. The key dispatch is on `sndStatus`: +- `sndStatus == Active`: this side already sent HELLO, so receiving HELLO means both sides are connected → emit `CON`. +- Otherwise: this side hasn't sent HELLO yet → enqueue HELLO reply via `enqueueDuplexHello`. + +HELLO is not used at all in fast duplex connection (v9+ SMP with SKEY — the sender secures the queue directly, skipping the HELLO exchange). + +## Queue rotation + +Four agent messages implement queue rotation. See [agent-protocol.md](../../../../protocol/agent-protocol.md#rotating-messaging-queue) for the protocol. Implementation-specific details: + +**QADD** (processed by sender in `qAddMsg`): Creates a new `SndQueue` with DH key exchange. Before creating the new queue, deletes any previous pending replacement (`delSqs` partitioned by `dbReplaceQId`). Responds with `QKEY`. The replacement chain means multiple consecutive rotation requests are handled correctly — only the latest replacement survives. + +**QKEY** (processed by recipient in `qKeyMsg`): Validates that the queue is `New` or `Confirmed` and the switch status is `RSSendingQADD`. 
Enqueues `ICQSecure` to secure the queue asynchronously — the actual KEY command is sent by `runCommandProcessing`. + +**QUSE** (processed by sender in `qUseMsg`): Marks the new queue as `Secured`. Sends `QTEST` **only to the new queue**, not the old one. The old queue is deleted after QTEST is successfully delivered (handled in `runSmpQueueMsgDelivery`). + +**QTEST** (no handler): Comment explains — any message received on the new queue triggers deletion of the old queue via the `dbReplaceQueueId` logic in `processSMP`'s AgentMsgEnvelope branch. QTEST exists only to ensure at least one message traverses the new queue. + +**Ratchet sync guard**: All four handlers check `ratchetSyncSendProhibited` before proceeding. Queue rotation is blocked during ratchet desynchronization. + +## Ratchet synchronization — newRatchetKey + +When an `AgentRatchetKey` message is received, `newRatchetKey` handles ratchet re-establishment. + +### Hash-ordering for initialization role + +Both parties generate key pairs and exchange them. The party whose `rkHash(k1, k2)` is **lower** (lexicographic comparison) initializes as the **receiving** ratchet; the other initializes as **sending** and sends EREADY. This deterministic ordering breaks the symmetry when both parties simultaneously request ratchet sync. + +### State machine + +The current `ratchetSyncState` determines behavior: +- `RSOk`, `RSAllowed`, `RSRequired` → **receiving client**: generate new keys, send `AgentRatchetKey` reply, then proceed with hash-ordering. +- `RSStarted` → **initiating client**: use the keys already stored (from `synchronizeRatchet'`), proceed with hash-ordering. +- `RSAgreed` → **error**: ratchet was already re-established but another key arrived. Sets state to `RSRequired` and throws `RATCHET_SYNC`. This handles the edge case where both parties initiate simultaneously and one has already completed. + +### Deduplication + +`checkRatchetKeyHashExists` prevents processing the same ratchet key message twice. 
The hash is stored before processing, so a duplicate delivery is detected and short-circuited via `ratchetExists`. + +### EREADY + +Sent when the ratchet was initialized as receiving (`rcSnd` is `Nothing` in the pre-decryption ratchet state). Carries `lastExternalSndId` so the other party knows which messages were sent with the old ratchet. Processed by `ereadyMsg`, which checks `rcPrev` (the ratchet state before decrypting the current message) for the same condition — if the pre-decryption ratchet had no send chain, it sends EREADY. + +## Message integrity — checkMsgIntegrity + +Sequential external sender ID + previous message hash chain. Five outcomes: +- **MsgOk**: `extSndId == prevExtSndId + 1` AND hashes match. +- **MsgBadId**: `extSndId < prevExtSndId` — message from the past. +- **MsgDuplicate**: `extSndId == prevExtSndId` — same ID as last message. +- **MsgSkipped**: `extSndId > prevExtSndId + 1` — gap in sequence, reports range of skipped IDs. +- **MsgBadHash**: IDs are sequential but hashes don't match — message was modified or a different message was inserted. + +The integrity result is stored in `MsgMeta` and delivered to the client application. The agent does not reject messages with integrity failures — it reports them and continues processing. This is intentional: the client application decides the policy. + +## Async command processing — runCommandProcessing + +Uses the worker framework from [Agent/Client.hs](./Agent/Client.md#worker-framework). The worker body calls `withWork` with `getPendingServerCommand` as the task source. + +### Internal commands + +The command processor dispatches internal commands that are enqueued by message handlers and other agent operations: + +- **ICAllowSecure / ICDuplexSecure**: Complete the duplex handshake by securing the queue and sending confirmation. `ICAllowSecure` is the user-initiated path (from `allowConnection`), `ICDuplexSecure` is the automatic path (from receiving AgentConnInfo with senderKey). 
+- **ICQSecure / ICQDelete**: Queue rotation — secure the new queue (KEY command) and delete the old queue. +- **ICAck / ICAckDel**: Send ACK to the SMP router, optionally deleting the internal message record. +- **ICDeleteConn / ICDeleteRcvQueue**: Connection and queue cleanup. + +### Retry semantics + +`runCommandProcessing` has two retry intervals: zero (immediate retry via `0`) for commands that fail with temporary errors, and `asyncCmdRetryInterval` for stuck commands. `tryMoveableCommand` attempts to skip a stuck command by marking it with a future `connId` so `getPendingServerCommand` returns the next one instead. + +### withConnLockNotify + +Wraps command execution with `withConnLock` plus automatic error notification to `subQ`. This ensures that even if a command fails, the client application is notified. + +## Message delivery — runSmpQueueMsgDelivery + +Per-queue delivery loop using the worker framework. Each `SndQueue` has its own delivery worker (keyed by queue address in `smpDeliveryWorkers`). + +### Per-message-type error handling + +Error handling differs by message type and SMP error: + +**QUOTA**: The queue has exceeded its message quota. Sets `quotaExceededTs` and starts an expiry timer if `messageExpireInterval` is configured. Does NOT retry — the sender must wait for the recipient to drain messages (signaled by `A_QCONT`). + +**AUTH**: Different response per message type: +- `A_MSG_` (user message): sends `SENT` with `SndMsgRcvQueued` status to the client. The message was accepted by the router but auth failed on the receive side — likely the queue was replaced during rotation. +- Other types: sends `MERR` error to the client. +- In both cases, if `messageExpireInterval` is configured, expired messages are deleted. + +**Timeout/network errors**: retried with the worker framework's built-in retry. 
The `retryLock` TMVar (paired with each delivery worker — see `getAgentWorker'` in [Agent/Client.md](./Agent/Client.md#getagentworker--lifecycle-management)) provides external retry signaling from `A_QCONT`. + +## Batch message sending — sendMessagesB_ + +`sendMessagesB_` sends messages to multiple connections. When multiple messages have the same body (common for group messages), the body is encrypted once and referenced via `VRRef` for subsequent connections. `vrCopyMap` tracks `ByteString → (VRValue encrypted)` mappings. This is a performance optimization — ratchet encryption is expensive, and group messages go to many connections with identical plaintext. + +The function partitions connections by send queue and builds per-queue delivery batches. Each connection's message is encrypted with its own ratchet but the plaintext body lookup avoids redundant work. + +## Subscription management + +### subscribeAllConnections' + +Batch subscription with throttling: `maxPending` limits how many pending subscriptions exist simultaneously. When the pending count exceeds the limit, the function waits before enqueuing more. This prevents memory exhaustion on reconnection when thousands of connections need resubscription. + +Service subscriptions are attempted first (`subscribeClientServices'`). If a service subscription succeeds, its associated queues don't need individual SUB commands — they're covered by the service subscription. Queues not associated with any service are subscribed individually. + +### resubscribeConnection' + +Individual connection resubscription. Checks connection status and queue status before subscribing — deleted or suspended connections are skipped. Used for targeted resubscription after specific operations (e.g., after `allowConnection`). + +## Notification token lifecycle + +`registerNtfToken'` → `verifyNtfToken'` → `checkNtfToken'` → `deleteNtfToken'` manage push notification token registration with the NTF server. 
Token verification uses a challenge-response flow where the NTF server sends a verification code through the push notification channel, and the client confirms receipt. + +## Cleanup manager + +Runs periodically (configurable interval, typically 1 minute). Operations: +- **Delete marked connections**: connections in "deleted" or "deleted-waiting-delivery" states +- **Delete expired/deleted files**: both receive and send files, with configurable TTLs +- **Clean temp paths**: remove temporary file paths from completed transfers +- **Delete orphaned users**: users with no remaining connections get `DEL_USER` notification + +Each cleanup operation catches errors individually (`catchAllErrors`) — a failure in one doesn't prevent others from running. The manager uses `waitActive` to pause during agent suspension, with `tryAny` to handle the case where the agent is being shut down. + +## Agent suspension + +`suspendAgent` triggers the operation suspension cascade defined in [Agent/Client.md](./Agent/Client.md#operation-suspension-cascade). `foregroundAgent` resumes operations. The cascade ordering (RcvNetwork → MsgDelivery → SndNetwork → Database) ensures that receiving stops first, then in-flight message delivery completes, then sending stops, and finally database operations complete. + +## connectReplyQueues — background duplex upgrade + +Used during async command processing to complete the duplex handshake. Handles two cases: +- **Fresh connection** (`sq_ = Nothing`): upgrades `RcvConnection` to `DuplexConnection` by creating a new send queue. +- **SKEY retry** (`sq_ = Just sq`): connection is already duplex from a previous attempt. Reuses the existing send queue. + +Both paths then secure the queue and enqueue the confirmation. + +## secureConfirmQueue vs secureConfirmQueueAsync + +Two paths for sending the confirmation message during duplex handshake: +- **secureConfirmQueue** (synchronous): secures the queue and sends confirmation directly via network. 
Used in `joinConnection` (foreground user-initiated path). +- **secureConfirmQueueAsync** (asynchronous): secures the queue, stores the confirmation in the database, and submits to the delivery worker. Used in `allowConnection` (background path via `ICAllowSecure`). + +Both call `agentSecureSndQueue` first, which returns whether the initiator's ratchet should be created on confirmation (v7+ behavior). + +## smpConfirmation — version compatibility + +The confirmation handler accepts messages where the agent version or client version is either within the configured range OR at-or-below the already-agreed version. See source comment: "checking agreed versions to continue connection in case of client/agent version downgrades." This means a downgraded client can still complete in-progress handshakes. + +## smpInvitation — contact address handling + +Invitation messages received on a contact address connection are passed through even if version-incompatible. See source comment: "show connection request even if invitation via contact address is not compatible." The client application sees the `REQ` event with `PQSupportOff` when incompatible, allowing it to display the request to the user (who may choose to respond from a compatible client). diff --git a/spec/modules/Simplex/Messaging/Agent/Client.md b/spec/modules/Simplex/Messaging/Agent/Client.md index eb4ff47e7..e5d83675b 100644 --- a/spec/modules/Simplex/Messaging/Agent/Client.md +++ b/spec/modules/Simplex/Messaging/Agent/Client.md @@ -8,7 +8,7 @@ ## Overview -This module defines `AgentClient`, the central state container for the messaging agent, and all reusable infrastructure that Agent.hs and other consumers (NtfSubSupervisor.hs, FileTransfer/Agent.hs, simplex-chat) build upon. 
It contains ~2868 lines covering: +This module defines `AgentClient`, the central state container for the messaging agent, and all reusable infrastructure that Agent.hs and other consumers (NtfSubSupervisor.hs, FileTransfer/Agent.hs, simplex-chat) build upon. It covers: - **Protocol client lifecycle**: lazy singleton connections to SMP/NTF/XFTP routers via `SessionVar` pattern, with disconnect callbacks and reconnection workers - **Worker framework**: `getAgentWorker` (lifecycle, restart rate limiting, crash recovery) + `withWork`/`withWork_`/`withWorkItems` (task retrieval with doWork flag atomics) diff --git a/spec/modules/Simplex/Messaging/Agent/understanding.md b/spec/modules/Simplex/Messaging/Agent/understanding.md deleted file mode 100644 index 4ca3ed2d9..000000000 --- a/spec/modules/Simplex/Messaging/Agent/understanding.md +++ /dev/null @@ -1,59 +0,0 @@ -# Agent Module Documentation Notes - -> Working notes for documenting Agent/Client.hs and Agent.hs. Not a spec doc — will be deleted after both docs are written. - -## Documentation approach - -**Bottom-up**: Client.hs first, then Agent.hs. - -**Client.hs** documents reusable infrastructure with contracts (what callers must provide, what guarantees they get), listing known consumers. Stands alone as "here's the framework." - -**Agent.hs** references Client.hs for infrastructure, focuses on what it passes into those frameworks — specific worker bodies, task queries, handler logic, and the orchestration policies (handshake, rotation, ratchet sync). Stands alone as "here's how the agent uses that framework." - -Coupling captured by cross-references, not duplication. 
- -## Module roles - -**Client.hs — infrastructure layer (~2868 lines):** -- `AgentClient`: central state container (TVars, TMaps, worker pools, locks, operation states) -- Protocol client lifecycle: lazy singleton for SMP/NTF/XFTP connections, disconnect callbacks, reconnection via sub workers -- Subscription state machine: active/pending/removed, session-aware cleanup on disconnect -- Worker framework: `getAgentWorker` (lifecycle, restart rate limiting, crash recovery) + `withWork`/`withWork_`/`withWorkItems` (task retrieval pattern with doWork flag atomics) -- Operation suspension cascade: RcvNetwork → MsgDelivery → SndNetwork → Database -- Queue creation (`newRcvQueue`) and protocol-level operations -- Concurrency primitives: per-connection locks, session vars with monotonic IDs, batching by transport session -- Encryption helpers, server selection, statistics - -**Agent.hs — orchestration/policy layer (~3868 lines):** -- Public API: createConnection, joinConnection, allowConnection, sendMessage, ackMessage, switchConnection, etc. -- Subscriber loop: reads `msgQ`, dispatches to per-connection handlers via `processSMP` -- Duplex handshake: confirmation processing, HELLO exchange, CON notification -- Queue rotation protocol: QADD → QKEY → QUSE → QTEST -- Ratchet synchronization: AgentRatchetKey exchange, hash-ordering to break symmetry -- Async command processing: `runCommandProcessing` worker body using `withWork` + `getPendingServerCommand` -- Message delivery: `runSmpQueueMsgDelivery` worker body per SndQueue -- Message integrity: sequential ID + hash chain validation - -## Worker framework details - -Defined in Client.hs, consumed by Agent.hs, NtfSubSupervisor.hs, FileTransfer/Agent.hs, and simplex-chat. - -Two separable parts: -1. **`getAgentWorker`**: lifecycle — create-or-reuse worker for a key, fork async, handle restart rate limiting (max per minute, delete after max). 
`getAgentWorker'` is generic version with custom worker wrapper (e.g., adding a retryLock TMVar for delivery workers). -2. **`withWork` / `withWork_` / `withWorkItems`**: task retrieval pattern — takes `getWork` (fetch next task) and `action` (process it) as separate parameters. Clears doWork flag BEFORE querying (prevents race where another thread sets flag after query returns empty). Re-sets flag if work was found. On work item error vs store error: work item errors stop the worker (CRITICAL), store errors re-set flag and log. - -Worker body (in consumer module) loops: `waitForWork doWork` → `withWork doWork getTask handleTask`. - -## Key non-obvious patterns to document - -### Client.hs — DONE (see Agent/Client.md) - -### Agent.hs -- Subscriber loop is the main event processor -- Duplex handshake role asymmetry: initiator expects AgentConnInfoReply, acceptor expects AgentConnInfo -- Queue rotation is 4 agent messages on top of SMP commands -- Ratchet sync hash-ordering: lower hash initializes receive ratchet -- Message integrity validation: external sender ID sequential + hash chain -- Split-phase connection creation (prepareConnectionLink + createConnectionForLink) prevents race -- ACK is NOT automatic for A_MSG (user must call ackMessage), IS automatic for control messages -- Connection upgrade: RcvConnection → DuplexConnection when reply queue created From c940f16f37b1852ea8bd81b45f8056e511a0c3a7 Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Fri, 13 Mar 2026 10:14:47 +0000 Subject: [PATCH 19/61] update agent specs --- spec/modules/Simplex/Messaging/Agent.md | 266 ++++++++++++------ .../modules/Simplex/Messaging/Agent/Client.md | 125 ++++++-- 2 files changed, 283 insertions(+), 108 deletions(-) diff --git a/spec/modules/Simplex/Messaging/Agent.md b/spec/modules/Simplex/Messaging/Agent.md index 0b1e9cec1..a52be2156 100644 --- a/spec/modules/Simplex/Messaging/Agent.md +++ 
b/spec/modules/Simplex/Messaging/Agent.md @@ -10,25 +10,41 @@ ## Overview -This module is the top-level messaging agent, consumed by simplex-chat and other client applications. It passes specific worker bodies, task queries, and handler logic into the frameworks defined in [Agent/Client.hs](./Agent/Client.md), and implements the orchestration policies: duplex handshake, queue rotation, ratchet synchronization, message integrity validation. +This module is the top-level SimpleX agent, consumed by simplex-chat and other client applications. It passes specific worker bodies, task queries, and handler logic into the frameworks defined in [Agent/Client.hs](./Agent/Client.md), and implements the orchestration policies: duplex handshake, queue rotation, ratchet synchronization, message integrity validation. -The agent starts four threads (in `getSMPAgentClient_`): `subscriber` (main event loop), `runNtfSupervisor` (notification token management), `cleanupManager` (periodic garbage collection), and `logServersStats` (statistics reporting). These threads are raced via `raceAny_` — if any exits, all are cancelled. +### Agent startup — backgroundMode + +`getSMPAgentClient_` accepts a `backgroundMode` flag that fundamentally changes agent capabilities: +- **Normal mode** (`backgroundMode = False`): starts four threads raced via `raceAny_` — `subscriber` (main event loop), `runNtfSupervisor` (notification management), `cleanupManager` (garbage collection), `logServersStats` (statistics). Also restores persisted server statistics. If any thread crashes, all are cancelled; statistics are saved in a `finally` block. +- **Background mode** (`backgroundMode = True`): starts only the `subscriber` thread. No cleanup, no notifications, no stats persistence. Used when the agent needs minimal receive-only operation. + +Thread crashes are caught by the `run` wrapper: if the agent is still active (`acThread` is set), the exception is reported as `CRITICAL True` to `subQ`. 
If the agent is being disposed, crashes are silently ignored. + +### Service + entity session mode prohibition + +Service certificates and entity transport session mode (`TSMEntity`) are mutually exclusive. This is checked in four places: `getSMPAgentClient_`, `setNetworkConfig`, `createUser'`, `setUserService'`. If violated, throws `CMD PROHIBITED`. The constraint exists because service certificates associate multiple queues under one identity, which contradicts entity session mode's goal of preventing queue correlation. ## Split-phase connection creation -`prepareConnectionLink` and `createConnectionForLink` separate link preparation (key generation, link formatting — no network) from queue creation (single network call). This prevents the race where a link is published before the queue exists on the router. The link can be shared out-of-band after `prepareConnectionLink`, and `createConnectionForLink` is called only when the user is ready to accept connections. +`prepareConnectionLink` and `createConnectionForLink` separate link preparation (key generation, link formatting — no network) from queue creation (single network call). This prevents the race where a link is published before the queue exists on the router. + +**Sender ID derivation.** The sender ID is deterministic: `SMP.EntityId $ B.take 24 $ C.sha3_384 corrId` where `corrId` is a random nonce. `createConnectionForLink` validates `actualSndId == sndId` — if the router returns a different sender ID, the connection is rejected. See source comment: "the remaining 24 bytes are reserved, possibly for notifier ID in the new notifications protocol." + +**PQ restriction.** `IKUsePQ` is prohibited for prepared links — throws `CMD PROHIBITED`. PQ keys are too large for the short link format. ## Subscriber loop — processSMPTransmissions -The subscriber thread reads batches from `msgQ` (filled by SMP protocol clients) and dispatches to `processSMPTransmissions`. 
Key non-obvious behaviors: +The subscriber thread reads batches from `msgQ` (filled by SMP protocol clients) and dispatches to `processSMPTransmissions`. Each batch is processed within `agentOperationBracket c AORcvNetwork waitUntilActive`, tying into the operation suspension cascade. -**Batch UP notification accumulation.** Successful subscription confirmations (`processSubOk`) append to a shared `upConnIds` TVar across the batch. A single `UP` event is emitted after all transmissions in the batch are processed, not per-transmission. Similarly, `serviceRQs` accumulates service-associated receive queues for batch processing via `processRcvServiceAssocs`. +**Batch UP notification accumulation.** Successful subscription confirmations (`processSubOk`) append to a shared `upConnIds` TVar across the batch. A single `UP` event is emitted after all transmissions are processed, not per-transmission. Similarly, `serviceRQs` accumulates service-associated receive queues for batch processing via `processRcvServiceAssocs`. -**Double validation for subscription results.** `isPendingSub` checks two conditions atomically: the queue must be in the pending map AND the client session must still be active. If either fails, the subscription result is counted as ignored (statistics only). This handles the race where a subscription response arrives after the client disconnected and a new client connected. +**Double validation for subscription results.** `isPendingSub` checks two conditions atomically: the queue must be in the pending map AND the client session must still be active (`activeClientSession`). If either fails, the result is counted as ignored (statistics only). This handles the race where a subscription response arrives after reconnection. -**subQ overflow to pendingMsgs.** `processSMP` writes events to `subQ` (bounded TBQueue) but when it's full, events go into a `pendingMsgs` TVar instead. After processing completes, pending messages are drained in reverse order. 
This prevents the message processing thread from blocking on a full queue, which would stall the entire SMP client. +**SUB response piggybacking MSG.** When a SUB response arrives as `Right msg@SMP.MSG {}`, the connection is marked UP (via `processSubOk`) AND the MSG is processed. The UP notification happens even if the MSG processing fails — the connection is up regardless. -**END/ENDS session validation.** Both `END` (single queue) and `ENDS` (service) check `activeClientSession` before removing subscriptions. If the session doesn't match (stale disconnect), the event is logged but ignored. This prevents a delayed END from a disconnected client from removing subscriptions that a new client established. +**subQ overflow to pendingMsgs.** `processSMP` writes events to `subQ` (bounded TBQueue) but when full, events go into a `pendingMsgs` TVar. After processing, pending messages are drained in reverse order (LIFO). This prevents the message processing thread from blocking on a full queue, which would stall the entire SMP client. + +**END/ENDS session validation.** Both check `activeClientSession` before removing subscriptions. If the session doesn't match (stale disconnect), the event is logged but ignored. ## Message processing — processSMP @@ -36,40 +52,44 @@ The subscriber thread reads batches from `msgQ` (filled by SMP protocol clients) ### Four e2e key states -The MSG handler discriminates on `(e2eDhSecret, e2ePubKey_)` — the per-queue shared secret and the incoming public key: +The MSG handler discriminates on `(e2eDhSecret, e2ePubKey_)`: -- `(Nothing, Just key)` — **Handshake phase**: no shared secret yet, public key present. Computes DH, decrypts with per-queue E2E. Dispatches to `smpConfirmation` (if AgentConfirmation) or `smpInvitation` (if AgentInvitation). -- `(Just dh, Nothing)` — **Established phase**: shared secret exists, no new key. This is normal message flow. 
Dispatches to `AgentRatchetKey` (ratchet renegotiation) or `AgentMsgEnvelope` (double-ratchet encrypted message). -- `(Just dh, Just _)` — **Repeated confirmation**: both present. Only AgentConfirmation is accepted (this is a retry because ACK failed), everything else is rejected. +- `(Nothing, Just key)` — **Handshake**: computes DH, decrypts with per-queue E2E. Dispatches to `smpConfirmation` or `smpInvitation`. +- `(Just dh, Nothing)` — **Established**: normal message flow. Dispatches to `AgentRatchetKey` or `AgentMsgEnvelope`. +- `(Just dh, Just _)` — **Repeated confirmation**: only AgentConfirmation is accepted (ACK for previous one failed), everything else is rejected. - `(Nothing, Nothing)` — **Error**: no keys at all. ### ACK semantics -ACK is NOT automatic for `A_MSG` — the function returns `ACKPending` and the user must call `ackMessage`. ACK IS automatic for all control messages (HELLO, QADD, QKEY, QUSE, QTEST, EREADY, A_RCVD). This is because `A_MSG` delivery to the user application must be confirmed before the message is removed from the router. +ACK is NOT automatic for `A_MSG` — the function returns `ACKPending` and the user must call `ackMessage`. ACK IS automatic for all control messages (HELLO, QADD, QKEY, QUSE, QTEST, EREADY, A_RCVD). -`handleNotifyAck` wraps each MSG processing branch: if any error occurs, it sends `ERR` to the client but still ACKs the SMP message. This prevents a processing error from causing infinite re-delivery of the same message. +`handleNotifyAck` wraps the MSG processing: if any error occurs, it sends `ERR` to the client but still ACKs the SMP message. This prevents a processing error from causing infinite re-delivery. ### agentClientMsg — transactional message processing -The inner function `agentClientMsg` performs ratchet decryption, message parsing, and integrity checking inside a single `withStore` transaction with `lockConnForUpdate`. 
This serializes all message processing for a given connection, preventing concurrent ratchet state modifications. The function returns the pre-decryption ratchet state (`rcPrev`) alongside the message — this is needed by `ereadyMsg` to decide whether to send EREADY. +Performs ratchet decryption, message parsing, and integrity checking inside a single `withStore` transaction with `lockConnForUpdate`. This serializes all message processing for a given connection, preventing concurrent ratchet state modifications. Returns the pre-decryption ratchet state (`rcPrev`) alongside the message — needed by `ereadyMsg` to decide whether to send EREADY. + +### Additional queue status transitions on message receipt + +When receiving an `AgentMsgEnvelope` on a non-Active queue, the queue is set to Active. For primary queues during rotation (`dbReplaceQueueId` is set), the new queue is set as primary and the old queue is scheduled for deletion via `ICQDelete`. This is how the receiving side completes queue rotation — any message on the new queue triggers cleanup of the old one. ### Duplicate message handling Three paths for `A_DUPLICATE` errors: -1. **Stored and user-acked**: `getLastMsg` finds it with `userAck = True` → `ackDel` (delete from router). -2. **Stored, A_MSG, not user-acked**: re-notify the user with `MSG` event and return `ACKPending`. The user may not have seen the original notification. -3. **Not stored or non-A_MSG**: verify via `checkDuplicateHash` that the encrypted hash exists in the DB. If it doesn't, the error is re-thrown (it's a real decryption failure, not a duplicate). +1. **Stored and user-acked**: `getLastMsg` finds it with `userAck = True` → `ackDel`. +2. **Stored, A_MSG, not user-acked**: re-notify the user with `MSG` event and return `ACKPending`. The user may not have seen the original. +3. **Not stored or non-A_MSG**: `checkDuplicateHash` verifies the encrypted hash exists in the DB. If not, re-throws (real decryption failure, not duplicate). 
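The three duplicate-handling paths listed above can be read as one small decision function. The following is a hedged Python sketch of that reading, not the agent's Haskell code: `StoredMsg`, `handle_duplicate`, `NotADuplicate` and the returned strings are hypothetical names introduced for illustration, and only `getLastMsg`, `userAck`, `ackDel`, `ACKPending` and `checkDuplicateHash` come from the text. What the agent does once the hash is found in path 3 is not stated in the text, so the sketch just returns a marker.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredMsg:
    """Hypothetical stand-in for the record that getLastMsg would return."""
    is_a_msg: bool   # True when the stored message is a user message (A_MSG)
    user_ack: bool   # True when the user has already acknowledged it


class NotADuplicate(Exception):
    """Models the re-thrown error: a real decryption failure, not a duplicate."""


def handle_duplicate(last_msg: Optional[StoredMsg], hash_in_db: bool) -> str:
    """Pick the path taken for an A_DUPLICATE error (illustrative model only)."""
    # Path 1: stored and already acknowledged by the user.
    if last_msg is not None and last_msg.user_ack:
        return "ackDel"
    # Path 2: stored user message that was never acknowledged, so notify again.
    if last_msg is not None and last_msg.is_a_msg:
        return "notify MSG, return ACKPending"
    # Path 3: not stored, or not a user message. Trust the error only if the
    # encrypted hash is known (the role played by checkDuplicateHash).
    if hash_in_db:
        return "confirmed duplicate"  # assumption: follow-up action not in the text
    raise NotADuplicate("hash not found, treating as a decryption failure")
```

The ordering of the checks mirrors the order of the list, so a stored, user-acked control message falls under path 1 rather than path 3.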
-For crypto errors (`A_CRYPTO`): the encrypted message hash is checked for existence. If the hash already exists, the error is silently suppressed (it's a duplicate that failed decryption differently). If not, `notifySync` classifies the error via `cryptoErrToSyncState` and may trigger ratchet resynchronization. +For crypto errors (`A_CRYPTO`): if the encrypted hash already exists, suppressed (duplicate). If not, `notifySync` classifies via `cryptoErrToSyncState` (RSAllowed or RSRequired) and updates the connection's ratchet sync state. ### resetRatchetSync on successful decryption -When a double-ratchet message is successfully decrypted and the connection's ratchet sync state is not `RSOk` or `RSStarted`, the state is reset to `RSOk` and `RSYNC RSOk` is notified. This means successful message delivery is the recovery signal for ratchet desynchronization. +When a double-ratchet message is successfully decrypted and the connection's ratchet sync state is not `RSOk` or `RSStarted`, the state is reset to `RSOk` and `RSYNC RSOk` is notified. Successful message delivery is the recovery signal for ratchet desynchronization. -### updateConnVersion on every message +### updateConnVersion — monotonic upgrade -Every received `AgentMsgEnvelope` triggers `updateConnVersion`, which upgrades the connection's agreed agent version if the message's version is higher and compatible. This is a monotonic upgrade — versions only increase. The `safeVersionRange` construction handles the case where the sender's version is higher than the receiver's maximum — it creates a range from `minVersion` to the sender's version. +Every received `AgentMsgEnvelope` triggers `updateConnVersion`. If the message's agent version is higher than the current agreed version and compatible, the agreed version is upgraded. Versions only increase. `safeVersionRange` handles the case where the sender's version exceeds the receiver's maximum — creates a range from `minVersion` to the sender's version. 
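The monotonic upgrade described for `updateConnVersion` can be illustrated with plain integers. This is a minimal Python sketch under stated assumptions, not the Haskell implementation: the function name `update_conn_version` and its parameters are hypothetical, and "compatible" is assumed here to mean that the range from the local minimum to the sender's version overlaps the local range, with the highest shared version chosen.

```python
def update_conn_version(current: int, msg_version: int,
                        local_min: int, local_max: int) -> int:
    """Return the agreed agent version after receiving one message.

    Illustrative model only. The peer's range is taken as
    [local_min, msg_version], following the description of
    safeVersionRange in the text.
    """
    # Assumption: a sender version below the local minimum gives no usable
    # range, so the agreed version is left unchanged.
    if msg_version < local_min:
        return current
    # Assumption: the compatible version is the highest one in both ranges,
    # which caps a newer sender at the local maximum.
    candidate = min(msg_version, local_max)
    # Monotonic: the agreed version never decreases.
    return max(current, candidate)
```

With a local range of 2 to 9 and an agreed version of 5, a message at version 7 moves the agreed version to 7, a message at version 12 moves it to 9, and a message at version 4 leaves it at 5.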
## Duplex handshake @@ -81,29 +101,41 @@ Receives AgentConfirmation with `e2eEncryption = Just sndParams`. Initializes th ### Accepting party (DuplexConnection) -Receives AgentConfirmation with `e2eEncryption = Nothing` and `AgentConnInfo` (not `AgentConnInfoReply`). The ratchet was already initialized during `joinConnection`. If `senderKey` is present, enqueues `ICDuplexSecure` (the queue needs to be secured with SKEY). If absent (sender already secured via LKEY), sends `CON` immediately. +Receives AgentConfirmation with `e2eEncryption = Nothing` and `AgentConnInfo` (not `AgentConnInfoReply`). The ratchet was already initialized during `joinConnection`. If `senderKey` is present, enqueues `ICDuplexSecure` (queue needs securing with SKEY). If absent (sender already secured via LKEY), sends `CON` immediately and sets the queue Active. ### HELLO exchange HELLO is processed in `helloMsg`. The key dispatch is on `sndStatus`: - `sndStatus == Active`: this side already sent HELLO, so receiving HELLO means both sides are connected → emit `CON`. -- Otherwise: this side hasn't sent HELLO yet → enqueue HELLO reply via `enqueueDuplexHello`. +- Otherwise: this side hasn't sent HELLO yet → enqueue HELLO reply. + +HELLO is not used in fast duplex connection (v9+ SMP with SKEY). + +### startJoinInvitation — retry-safe ratchet creation + +When retrying a join (existing `SndQueue`), `startJoinInvitation` tries to get the existing ratchet via `getSndRatchet` before creating a new one. If the ratchet exists, it reuses it. If not (error), it logs a non-blocking error via `nonBlockingWriteTBQueue` and creates a fresh ratchet. This prevents a retry from corrupting an already-established ratchet. The same pattern appears in `mkJoinInvitation` for contact URI joins. -HELLO is not used at all in fast duplex connection (v9+ SMP with SKEY — the sender secures the queue directly, skipping the HELLO exchange). 
+### PQ support negotiation + +PQ support is the AND of four conditions: the local client's PQ preference, the peer's agent version (>= `pqdrSMPAgentVersion`), the E2E encryption version (>= `pqRatchetE2EEncryptVersion`), and the connection's current PQ support. This negotiation happens at `joinConn` and `smpConfirmation` time via `versionPQSupport_` and `pqSupportAnd`. ## Queue rotation Four agent messages implement queue rotation. See [agent-protocol.md](../../../../protocol/agent-protocol.md#rotating-messaging-queue) for the protocol. Implementation-specific details: -**QADD** (processed by sender in `qAddMsg`): Creates a new `SndQueue` with DH key exchange. Before creating the new queue, deletes any previous pending replacement (`delSqs` partitioned by `dbReplaceQId`). Responds with `QKEY`. The replacement chain means multiple consecutive rotation requests are handled correctly — only the latest replacement survives. +**QADD** (processed by sender in `qAddMsg`): Creates a new `SndQueue` with DH key exchange. Deletes any previous pending replacement (`delSqs` partitioned by `dbReplaceQId`). Responds with `QKEY`. The replacement chain means consecutive rotation requests are handled correctly — only the latest survives. + +**QKEY** (processed by recipient in `qKeyMsg`): Validates queue is `New` or `Confirmed` and switch status is `RSSendingQADD`. Enqueues `ICQSecure` for async processing. + +**QUSE** (processed by sender in `qUseMsg`): Marks new queue `Secured`. Sends `QTEST` **only to the new queue**. -**QKEY** (processed by recipient in `qKeyMsg`): Validates that the queue is `New` or `Confirmed` and the switch status is `RSSendingQADD`. Enqueues `ICQSecure` to secure the queue asynchronously — the actual KEY command is sent by `runCommandProcessing`. +**QTEST** (no handler in processSMP): Any message on the new queue triggers old queue deletion via `dbReplaceQueueId` logic. QTEST exists only to ensure at least one message traverses the new queue. 
-**QUSE** (processed by sender in `qUseMsg`): Marks the new queue as `Secured`. Sends `QTEST` **only to the new queue**, not the old one. The old queue is deleted after QTEST is successfully delivered (handled in `runSmpQueueMsgDelivery`). +**Sender-side completion in delivery handler.** When `AM_QTEST_` is successfully sent in `runSmpQueueMsgDelivery`, the old send queue is removed from the connection: pending messages are deleted, the queue record is removed, and the old queue's delivery worker is deleted from `smpDeliveryWorkers` (stopping its thread). This happens inside `withConnLockNotify` to prevent deadlock with the subscriber. -**QTEST** (no handler): Comment explains — any message received on the new queue triggers deletion of the old queue via the `dbReplaceQueueId` logic in `processSMP`'s AgentMsgEnvelope branch. QTEST exists only to ensure at least one message traverses the new queue. +**ICQDelete error tolerance.** In `runCommandProcessing`, if deleting the old receive queue fails with a permanent error (e.g., queue already gone on router), `finalizeSwitch` still runs — the local switch completes. Only temporary errors prevent completion. -**Ratchet sync guard**: All four handlers check `ratchetSyncSendProhibited` before proceeding. Queue rotation is blocked during ratchet desynchronization. +**Ratchet sync guard**: All four message handlers check `ratchetSyncSendProhibited` before proceeding. ## Ratchet synchronization — newRatchetKey @@ -111,128 +143,192 @@ When an `AgentRatchetKey` message is received, `newRatchetKey` handles ratchet r ### Hash-ordering for initialization role -Both parties generate key pairs and exchange them. The party whose `rkHash(k1, k2)` is **lower** (lexicographic comparison) initializes as the **receiving** ratchet; the other initializes as **sending** and sends EREADY. This deterministic ordering breaks the symmetry when both parties simultaneously request ratchet sync. +Both parties generate key pairs and exchange them. 
The party whose `rkHash(k1, k2)` is **lower** (lexicographic comparison) initializes the **receiving** ratchet; the other initializes **sending** and sends EREADY. This breaks the symmetry when both parties simultaneously request ratchet sync. ### State machine -The current `ratchetSyncState` determines behavior: - `RSOk`, `RSAllowed`, `RSRequired` → **receiving client**: generate new keys, send `AgentRatchetKey` reply, then proceed with hash-ordering. -- `RSStarted` → **initiating client**: use the keys already stored (from `synchronizeRatchet'`), proceed with hash-ordering. -- `RSAgreed` → **error**: ratchet was already re-established but another key arrived. Sets state to `RSRequired` and throws `RATCHET_SYNC`. This handles the edge case where both parties initiate simultaneously and one has already completed. +- `RSStarted` → **initiating client**: use keys already stored (from `synchronizeRatchet'`), proceed with hash-ordering. +- `RSAgreed` → **error**: sets state to `RSRequired`, throws `RATCHET_SYNC`. Handles the edge case where both parties initiate simultaneously and one has completed. ### Deduplication -`checkRatchetKeyHashExists` prevents processing the same ratchet key message twice. The hash is stored before processing, so a duplicate delivery is detected and short-circuited via `ratchetExists`. +`checkRatchetKeyHashExists` prevents processing the same ratchet key twice. The hash is stored atomically before processing begins. ### EREADY -Sent when the ratchet was initialized as receiving (`rcSnd` is `Nothing` in the pre-decryption ratchet state). Carries `lastExternalSndId` so the other party knows which messages were sent with the old ratchet. Processed by `ereadyMsg`, which checks `rcPrev` (the ratchet state before decrypting the current message) for the same condition — if the pre-decryption ratchet had no send chain, it sends EREADY. +Sent when the ratchet was initialized as receiving (`rcSnd` is `Nothing` in the pre-decryption ratchet state). 
Carries `lastExternalSndId` so the other party knows which messages were sent with the old ratchet. ## Message integrity — checkMsgIntegrity -Sequential external sender ID + previous message hash chain. Five outcomes: -- **MsgOk**: `extSndId == prevExtSndId + 1` AND hashes match. -- **MsgBadId**: `extSndId < prevExtSndId` — message from the past. -- **MsgDuplicate**: `extSndId == prevExtSndId` — same ID as last message. -- **MsgSkipped**: `extSndId > prevExtSndId + 1` — gap in sequence, reports range of skipped IDs. -- **MsgBadHash**: IDs are sequential but hashes don't match — message was modified or a different message was inserted. +Sequential external sender ID + previous message hash chain. Five outcomes: `MsgOk` (sequential + hashes match), `MsgBadId` (ID from the past), `MsgDuplicate` (same ID), `MsgSkipped` (gap in sequence), `MsgBadHash` (sequential but hashes differ). -The integrity result is stored in `MsgMeta` and delivered to the client application. The agent does not reject messages with integrity failures — it reports them and continues processing. This is intentional: the client application decides the policy. +The integrity result is delivered to the client application via `MsgMeta`. The agent does not reject messages with integrity failures — it reports them and continues processing. The client decides the policy. ## Async command processing — runCommandProcessing -Uses the worker framework from [Agent/Client.hs](./Agent/Client.md#worker-framework). The worker body calls `withWork` with `getPendingServerCommand` as the task source. +Uses the worker framework from [Agent/Client.hs](./Agent/Client.md#worker-framework). Keyed by `(connId, server)` — each connection/server combination gets its own command worker. Uses `AOSndNetwork` for operation suspension. 
### Internal commands -The command processor dispatches internal commands that are enqueued by message handlers and other agent operations: - -- **ICAllowSecure / ICDuplexSecure**: Complete the duplex handshake by securing the queue and sending confirmation. `ICAllowSecure` is the user-initiated path (from `allowConnection`), `ICDuplexSecure` is the automatic path (from receiving AgentConnInfo with senderKey). -- **ICQSecure / ICQDelete**: Queue rotation — secure the new queue (KEY command) and delete the old queue. -- **ICAck / ICAckDel**: Send ACK to the SMP router, optionally deleting the internal message record. -- **ICDeleteConn / ICDeleteRcvQueue**: Connection and queue cleanup. +- **ICAllowSecure**: User-initiated handshake completion (from `allowConnection`). On DuplexConnection (SKEY retry), if the error is temporary and the send queue's server differs from the command's server, the command is **moved** to the correct server queue via `updateCommandServer` + `getAsyncCmdWorker`. Returns `CCMoved` instead of `CCCompleted`. +- **ICDuplexSecure**: Automatic handshake completion (from receiving AgentConnInfo with senderKey). Secures queue and sends HELLO. +- **ICQSecure / ICQDelete**: Queue rotation — secure the new queue (KEY) and delete the old queue. +- **ICAck / ICAckDel**: Send ACK to the router, optionally deleting the internal message record. +- **ICDeleteConn**: No longer used, but may exist in old databases — cleaned up by deleting the command record. +- **ICDeleteRcvQueue**: Queue cleanup during rotation. ### Retry semantics -`runCommandProcessing` has two retry intervals: zero (immediate retry via `0`) for commands that fail with temporary errors, and `asyncCmdRetryInterval` for stuck commands. `tryMoveableCommand` attempts to skip a stuck command by marking it with a future `connId` so `getPendingServerCommand` returns the next one instead. 
+`tryMoveableCommand` wraps execution with `withRetryInterval`: waits for `waitWhileSuspended` and `waitForUserNetwork`, then executes. Temporary/host errors trigger retry via `retrySndOp`. On success, the command is deleted. On permanent error, the error is notified and the command is deleted. `retrySndOp` separates `endAgentOperation`/`beginAgentOperation` into separate `atomically` blocks — see source comment: if `beginAgentOperation` blocks, `SUSPENDED` won't be sent. -### withConnLockNotify +### withConnLockNotify — deadlock prevention -Wraps command execution with `withConnLock` plus automatic error notification to `subQ`. This ensures that even if a command fails, the client application is notified. +Returns `Maybe ATransmission` and writes to `subQ` **after** releasing the lock. This prevents deadlock: if the lock holder writes to a full `subQ` while the subscriber thread needs the lock to process a message, both block indefinitely. ## Message delivery — runSmpQueueMsgDelivery -Per-queue delivery loop using the worker framework. Each `SndQueue` has its own delivery worker (keyed by queue address in `smpDeliveryWorkers`). +Per-queue delivery loop. Each `SndQueue` has its own worker keyed by queue address in `smpDeliveryWorkers`, paired with a `TMVar ()` retry lock (via `getAgentWorker'`). + +### Deferred encryption + +Message bodies are NOT encrypted at enqueue time. `enqueueMessageB` advances the ratchet header (`agentRatchetEncryptHeader`) and validates padding (`rcCheckCanPad`), but stores only the body reference (`sndMsgBodyId`) and encryption key (`encryptKey`, `paddedLen`). The actual message body encoding (`encodeAgentMsgStr`) and encryption (`rcEncryptMsg`) happen at delivery time. This allows the same body to be shared across multiple send queues via `sndMsgBodyId` — each delivery encrypts independently with its connection's ratchet. 
+ +For confirmation and ratchet key messages (AM_CONN_INFO, AM_CONN_INFO_REPLY, AM_RATCHET_INFO), the body is pre-encrypted and stored in `msgBody` directly — no deferred encryption. ### Per-message-type error handling -Error handling differs by message type and SMP error: +**QUOTA**: Checks `internalTs` against `quotaExceededTimeout`. If the message is older than the timeout, expires it and all subsequent expired messages in the queue (via `getExpiredSndMessages` → bulk `MERRS` notification). If not expired, sends `MWARN` and retries with `RISlow`. For confirmation messages (AM_CONN_INFO/AM_CONN_INFO_REPLY), QUOTA is treated as `NOT_AVAILABLE`. + +**AUTH**: Per message type: +- `AM_CONN_INFO` / `AM_CONN_INFO_REPLY` / `AM_RATCHET_INFO`: connection error `NOT_AVAILABLE` +- `AM_HELLO_` with receive queue (initiating party): `NOT_AVAILABLE`. Without receive queue (joining party): `NOT_ACCEPTED`. +- `AM_A_MSG_` / `AM_A_RCVD_` / `AM_QCONT_` / `AM_EREADY_`: delete message and notify `MERR`. +- Queue rotation messages (`AM_QADD_` through `AM_QTEST_`): queue error with descriptive string. + +**Timeout/network errors**: message-type-aware timeout — `AM_HELLO_` uses `helloTimeout`, all others use `messageTimeout`. If expired, uses `notifyDelMsgs` which expires the current message AND fetches all expired messages for the queue in bulk. If `serverHostError`, sends `MWARN` before retrying. Non-host temporary errors retry silently. + +### Delivery success handling + +On successful send, per message type: +- `AM_CONN_INFO` with `senderCanSecure` (fast handshake): sends `CON` + sets status `Active`. +- `AM_CONN_INFO` without `senderCanSecure`: sets status `Confirmed` only. +- `AM_CONN_INFO_REPLY`: sets status `Confirmed`. +- `AM_HELLO_`: sets status `Active`. If receive queue exists AND its status is `Active`, sends `CON` (accepting party in v2). +- `AM_A_MSG_`: sends `SENT msgId proxySrv_` to notify the client. 
+- `AM_QKEY_`: re-reads connection and sends `SWITCH QDSnd SPConfirmed`. +- `AM_QTEST_`: see "Sender-side completion" under Queue rotation above. +- All other types: no notification. + +After success, the delivery record is deleted. For `AM_A_MSG_`, `keepForReceipt = True` — the record is kept until a receipt is received. -**QUOTA**: The queue has exceeded its message quota. Sets `quotaExceededTs` and starts an expiry timer if `messageExpireInterval` is configured. Does NOT retry — the sender must wait for the recipient to drain messages (signaled by `A_QCONT`). +### withRetryLock2 — external retry signaling -**AUTH**: Different response per message type: -- `A_MSG_` (user message): sends `SENT` with `SndMsgRcvQueued` status to the client. The message was accepted by the router but auth failed on the receive side — likely the queue was replaced during rotation. -- Other types: sends `MERR` error to the client. -- In both cases, if `messageExpireInterval` is configured, expired messages are deleted. +The delivery loop uses `withRetryLock2` which combines the standard retry interval with `qLock` (the `TMVar ()` paired with the worker). When `A_QCONT` is received, the handler puts `()` into the retry lock, causing the retry to fire immediately instead of waiting for the backoff interval. See `continueSending` in `processSMP`. -**Timeout/network errors**: retried with the worker framework's built-in retry. The `retryLock` TMVar (paired with each delivery worker — see `getAgentWorker'` in [Agent/Client.md](./Agent/Client.md#getagentworker--lifecycle-management)) provides external retry signaling from `A_QCONT`. +### submitPendingMsg — operation counting + +`submitPendingMsg` increments `opsInProgress` on `msgDeliveryOp` BEFORE spawning the delivery worker. This means the operation is counted even before the worker starts, ensuring the suspension cascade waits for all enqueued deliveries. 
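The operation-counting order described for `submitPendingMsg` can be sketched as follows. This is a minimal illustration, not the agent's code: the counter is reduced to a bare `TVar Int`, and `beginOp`, `endOp`, `waitAllDone` and `submitPending` are hypothetical names chosen for the example. The point it shows is that the increment happens in the caller's thread, before the worker is forked, so anything waiting on the counter cannot miss a delivery that has been enqueued but not yet started.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.STM
import Control.Exception (finally)
import Control.Monad (void, when)

-- Hypothetical stand-in for the opsInProgress counter of one agent operation.
type OpCounter = TVar Int

-- Count an operation as started.
beginOp :: OpCounter -> STM ()
beginOp v = modifyTVar' v (+ 1)

-- Count an operation as finished.
endOp :: OpCounter -> STM ()
endOp v = modifyTVar' v (subtract 1)

-- Block (via STM retry) until every counted operation has ended.
waitAllDone :: OpCounter -> STM ()
waitAllDone v = do
  n <- readTVar v
  when (n > 0) retry

-- Increment BEFORE forking, decrement when the delivery ends (even on exception).
submitPending :: OpCounter -> IO () -> IO ()
submitPending v deliver = do
  atomically $ beginOp v
  void . forkIO $ deliver `finally` atomically (endOp v)
```

If the increment were moved inside the forked thread, `waitAllDone` could observe zero between the fork and the first instruction of the worker, which is the window the text says the real implementation closes.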
## Batch message sending — sendMessagesB_ -`sendMessagesB_` sends messages to multiple connections. When multiple messages have the same body (common for group messages), the body is encrypted once and referenced via `VRRef` for subsequent connections. `vrCopyMap` tracks `ByteString → (VRValue encrypted)` mappings. This is a performance optimization — ratchet encryption is expensive, and group messages go to many connections with identical plaintext. +### MsgReq grouping contract + +Messages to the same connection must be contiguous in the traversable, with only the first having a non-empty `connId`. Subsequent messages for the same connection must have empty `connId`. This is validated by `addConnId` which rejects duplicate `connId` values and empty first `connId`. The `getConn_` function uses a `TVar prev` to cache the last connection lookup, avoiding redundant database reads. + +### Connection locking + +`withConnLocks` takes locks for ALL connections in the batch before processing. This prevents concurrent sends to the same connection from interleaving ratchet state updates. + +### PQ support monotonic upgrade + +When `pqEnc == PQEncOn` but the connection has `pqSupport == PQSupportOff`, PQ support is upgraded via `setConnPQSupport`. PQ support can only be enabled, never disabled. The upgrade IDs are accumulated via `mapAccumL` and applied in a single batch database write. -The function partitions connections by send queue and builds per-queue delivery batches. Each connection's message is encrypted with its own ratchet but the plaintext body lookup avoids redundant work. +### VRValue/VRRef — database body deduplication + +VRValue/VRRef deduplication operates at the **database body storage** level, not encryption. 
`enqueueMessageB` tracks an `IntMap (Maybe Int64, AMessage)` mapping integer indices to database body IDs (`sndMsgBodyId`): + +- `VRValue (Just i) body`: stores the body in `snd_message_bodies`, records the `sndMsgBodyId`, and associates it with index `i` for future reference. +- `VRRef i`: looks up index `i` to get the previously stored `sndMsgBodyId`, and creates a new `snd_messages` record linked to the same body. + +Encryption is NOT deduplicated — each connection's ratchet header is independently advanced at enqueue time, and each delivery encrypts the body independently. The optimization is purely about avoiding redundant database storage of identical message bodies (common for group messages). + +### Error propagation constraint + +When a connection type is wrong (e.g., SndConnection, NewConnection), the error is returned per-message but the batch continues. See source comment: "we can't fail here, as it may prevent delivery of subsequent messages that reference the body of the failed message." If a VRValue message fails, subsequent VRRef messages that reference it would break. ## Subscription management +### subscribeConnections_ + +Partitions connections by type. SndConnection with `Confirmed` status returns success (it's not subscribed, just waiting). SndConnection with `Active` status returns `CONN SIMPLEX` (can't subscribe a send-only connection). After subscribing queues, resumes delivery workers for connections with pending deliveries (via `getConnectionsForDelivery`). + +**Multi-queue result combining.** For connections with multiple receive queues, results are combined using a priority system: Active+Success (1) > Active+Error (2) > non-Active+Success (3) > non-Active+Error (4). The highest-priority (lowest number) result is used. This ensures that if at least one Active queue subscribes successfully, the connection reports success. 
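The priority rule for combining per-queue subscription results can be illustrated with a small sketch. The types here are simplified assumptions made for the example (`QueueStatus`, `SubResult`, `resultPriority` and `combineResults` are hypothetical names, and errors are plain strings); only the ordering Active+Success (1), Active+Error (2), non-Active+Success (3), non-Active+Error (4) comes from the text above.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- Hypothetical, simplified queue status; only "Active or not" matters here.
data QueueStatus = QNew | QConfirmed | QSecured | QActive
  deriving (Eq, Show)

-- Hypothetical result type: Left carries an error description.
type SubResult = Either String ()

-- Lower number means higher priority, as in the description.
resultPriority :: (QueueStatus, SubResult) -> Int
resultPriority (status, r) = case (status == QActive, r) of
  (True, Right _) -> 1
  (True, Left _) -> 2
  (False, Right _) -> 3
  (False, Left _) -> 4

-- Pick the result with the lowest priority number.
-- Returns Nothing when the connection has no receive queues.
combineResults :: [(QueueStatus, SubResult)] -> Maybe SubResult
combineResults [] = Nothing
combineResults rs = Just . snd $ minimumBy (comparing resultPriority) rs
```

Under this ordering an error on an Active queue outranks a success on a non-Active queue, so a connection reports success only when an Active queue subscribed or when it has no failing Active queue.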
+ ### subscribeAllConnections' -Batch subscription with throttling: `maxPending` limits how many pending subscriptions exist simultaneously. When the pending count exceeds the limit, the function waits before enqueuing more. This prevents memory exhaustion on reconnection when thousands of connections need resubscription. +**Active user priority.** If `activeUserId_` is provided, that user's subscriptions are processed first (`sortOn`). + +**Service subscription with fallback.** Service subscriptions are attempted first. If a service subscription fails with `SSErrorServiceId` or zero subscribed queues, the queues are unassociated from the service and subscribed individually. If the error is a client-level error (not a service-specific error), the same fallback applies. -Service subscriptions are attempted first (`subscribeClientServices'`). If a service subscription succeeds, its associated queues don't need individual SUB commands — they're covered by the service subscription. Queues not associated with any service are subscribed individually. +**Pending throttle.** `maxPending` limits concurrent pending subscriptions. The counter is incremented inside the database transaction (before leaving `withStore'`) and decremented in a `finally` block. When the count exceeds the limit, `subscribeUserServer` blocks in STM via `retry`. -### resubscribeConnection' +### resubscribeConnections' -Individual connection resubscription. Checks connection status and queue status before subscribing — deleted or suspended connections are skipped. Used for targeted resubscription after specific operations (e.g., after `allowConnection`). +Filters out connections that already have active subscriptions (via `hasActiveSubscription`). For store errors, returns `True` for `isActiveConn` — this causes the error to be processed by `subscribeConnections_` which will report it. 
## Notification token lifecycle -`registerNtfToken'` → `verifyNtfToken'` → `checkNtfToken'` → `deleteNtfToken'` manage push notification token registration with the NTF server. Token verification uses a challenge-response flow where the NTF server sends a verification code through the push notification channel, and the client confirms receipt. +`registerNtfToken'` is a complex state machine. Key non-obvious behavior: on `NTF AUTH` error during token operations, the token is removed and re-registered from scratch (see `withToken` catch of `NTF AUTH`). Device token changes trigger `replaceToken`, which attempts an in-place replacement; if that fails with a permanent error, the token is removed and recreated. ## Cleanup manager -Runs periodically (configurable interval, typically 1 minute). Operations: -- **Delete marked connections**: connections in "deleted" or "deleted-waiting-delivery" states -- **Delete expired/deleted files**: both receive and send files, with configurable TTLs -- **Clean temp paths**: remove temporary file paths from completed transfers -- **Delete orphaned users**: users with no remaining connections get `DEL_USER` notification +Runs periodically with a `cleanupStepInterval` delay BETWEEN each cleanup operation (not just between cycles). This prevents cleanup from monopolizing database access. -Each cleanup operation catches errors individually (`catchAllErrors`) — a failure in one doesn't prevent others from running. The manager uses `waitActive` to pause during agent suspension, with `tryAny` to handle the case where the agent is being shut down. 
+Additional cleanup not previously mentioned: +- **Expired receive message hashes**: `deleteRcvMsgHashesExpired` +- **Expired send messages**: `deleteSndMsgsExpired` +- **Expired ratchet key hashes**: `deleteRatchetKeyHashesExpired` +- **Expired notification tokens**: `deleteExpiredNtfTokensToDelete` +- **Expired send chunk replicas**: `deleteDeletedSndChunkReplicasExpired` ## Agent suspension -`suspendAgent` triggers the operation suspension cascade defined in [Agent/Client.md](./Agent/Client.md#operation-suspension-cascade). `foregroundAgent` resumes operations. The cascade ordering (RcvNetwork → MsgDelivery → SndNetwork → Database) ensures that receiving stops first, then in-flight message delivery completes, then sending stops, and finally database operations complete. +`suspendAgent` has two modes: +- **Immediate** (`maxDelay = 0`): sets `ASSuspended` and suspends all operations immediately. +- **Gradual** (`maxDelay > 0`): sets `ASSuspending` and triggers the cascade (NtfNetwork independent; RcvNetwork → MsgDelivery → SndNetwork → Database). A timeout thread fires after `maxDelay` and forces suspension of sending and database if still suspending. + +`foregroundAgent` resumes in reverse order: database → sending → delivery → receiving → notifications. ## connectReplyQueues — background duplex upgrade Used during async command processing to complete the duplex handshake. Handles two cases: -- **Fresh connection** (`sq_ = Nothing`): upgrades `RcvConnection` to `DuplexConnection` by creating a new send queue. -- **SKEY retry** (`sq_ = Just sq`): connection is already duplex from a previous attempt. Reuses the existing send queue. - -Both paths then secure the queue and enqueue the confirmation. +- **Fresh connection** (`sq_ = Nothing`): upgrades `RcvConnection` to `DuplexConnection`. +- **SKEY retry** (`sq_ = Just sq`): connection is already duplex. See source comment: "in case of SKEY retry the connection is already duplex." 
## secureConfirmQueue vs secureConfirmQueueAsync -Two paths for sending the confirmation message during duplex handshake: -- **secureConfirmQueue** (synchronous): secures the queue and sends confirmation directly via network. Used in `joinConnection` (foreground user-initiated path). -- **secureConfirmQueueAsync** (asynchronous): secures the queue, stores the confirmation in the database, and submits to the delivery worker. Used in `allowConnection` (background path via `ICAllowSecure`). +- **secureConfirmQueue** (synchronous): secures queue and sends confirmation directly via network. Used in `joinConnection`. +- **secureConfirmQueueAsync** (asynchronous): secures queue, stores confirmation, submits to delivery worker. Used in `allowConnection` (via `ICAllowSecure`). -Both call `agentSecureSndQueue` first, which returns whether the initiator's ratchet should be created on confirmation (v7+ behavior). +Both call `agentSecureSndQueue`, which returns `initiatorRatchetOnConf` — whether the initiator's ratchet should be created on confirmation (v7+ behavior). When the queue was already secured (retry), returns the same flag without re-securing. ## smpConfirmation — version compatibility -The confirmation handler accepts messages where the agent version or client version is either within the configured range OR at-or-below the already-agreed version. See source comment: "checking agreed versions to continue connection in case of client/agent version downgrades." This means a downgraded client can still complete in-progress handshakes. +The confirmation handler accepts messages where the agent version or client version is either within the configured range OR at-or-below the already-agreed version. See source comment. This means a downgraded client can still complete in-progress handshakes. ## smpInvitation — contact address handling -Invitation messages received on a contact address connection are passed through even if version-incompatible. 
See source comment: "show connection request even if invitation via contact address is not compatible." The client application sees the `REQ` event with `PQSupportOff` when incompatible, allowing it to display the request to the user (who may choose to respond from a compatible client). +Invitation messages received on a contact address are passed through even if version-incompatible. See source comment. The client application sees `REQ` with `PQSupportOff` when incompatible. + +## ackMessage' — receipt sending + +After ACKing a message, if the user provides receipt info (`rcptInfo_`), a receipt message (`A_RCVD`) is enqueued. Receipts are only allowed for `AM_A_MSG_` type. If the user ACKs without receipt info and the message already has a receipt with `MROk` status, the corresponding sent message is deleted from the database — it's confirmed delivered. + +## acceptContactAsync' — rollback on failure + +See source comment. Unlike the synchronous `acceptContact'` which takes a lock first, `acceptContactAsync'` marks the invitation as accepted before joining. On failure, `unacceptInvitation` rolls back. The comment notes this could be improved with an invitation lock map. + +## prepareConnectionToJoin — race prevention + +See source comment. Creates a connection record without queues, returning a `ConnId`. The caller saves this ID before the peer can send a confirmation. Without this, the sequence "joinConnection → peer sends confirmation → caller saves ConnId" could result in the confirmation arriving before the caller has the ID. 
diff --git a/spec/modules/Simplex/Messaging/Agent/Client.md b/spec/modules/Simplex/Messaging/Agent/Client.md index e5d83675b..f1f4965b6 100644 --- a/spec/modules/Simplex/Messaging/Agent/Client.md +++ b/spec/modules/Simplex/Messaging/Agent/Client.md @@ -8,7 +8,7 @@ ## Overview -This module defines `AgentClient`, the central state container for the messaging agent, and all reusable infrastructure that Agent.hs and other consumers (NtfSubSupervisor.hs, FileTransfer/Agent.hs, simplex-chat) build upon. It covers: +This module defines `AgentClient`, the central state container for the SimpleX agent, and all reusable infrastructure that Agent.hs and other consumers (NtfSubSupervisor.hs, FileTransfer/Agent.hs, simplex-chat) build upon. It covers: - **Protocol client lifecycle**: lazy singleton connections to SMP/NTF/XFTP routers via `SessionVar` pattern, with disconnect callbacks and reconnection workers - **Worker framework**: `getAgentWorker` (lifecycle, restart rate limiting, crash recovery) + `withWork`/`withWork_`/`withWorkItems` (task retrieval with doWork flag atomics) @@ -20,14 +20,17 @@ The module is consumed by Agent.hs (which passes specific worker bodies, task qu ## AgentClient — central state container -`AgentClient` has ~50 fields, almost all TVars or TMaps. Key architectural groupings: +`AgentClient` has ~43 fields, almost all TVars or TMaps. 
Key architectural groupings: - **Event queues**: `subQ` (events to client application), `msgQ` (messages from SMP routers) - **Protocol client pools**: `smpClients`, `ntfClients`, `xftpClients` — all are TMaps of `TransportSession` → `SessionVar`, implementing lazy singletons via `getSessVar` - **Subscription tracking**: `currentSubs` (TSessionSubs, active+pending per transport session), `removedSubs` (failed subscriptions with errors), `subscrConns` (set of connection IDs currently subscribed) -- **Worker pools**: `smpDeliveryWorkers`, `asyncCmdWorkers`, `smpSubWorkers` — TMaps keyed by work address/connection +- **Worker pools**: `smpDeliveryWorkers`, `asyncCmdWorkers` — TMaps keyed by work address/connection. `smpSubWorkers` — TMaps keyed by transport session for resubscription. - **Operation states**: `ntfNetworkOp`, `rcvNetworkOp`, `msgDeliveryOp`, `sndNetworkOp`, `databaseOp` -- **Locking**: `connLocks`, `invLocks`, `deleteLock`, `getMsgLocks` +- **Locking**: `connLocks`, `invLocks`, `deleteLock`, `getMsgLocks`, `clientNoticesLock` +- **Service state**: `useClientServices` (per-user boolean controlling whether service certificates are used) +- **Proxy routing**: `smpProxiedRelays` (maps destination transport session → proxy server used) +- **Network state**: `userNetworkInfo`, `userNetworkUpdated`, `useNetworkConfig` (slow/fast pair) All TVars are initialized in `newAgentClient`. The `active` TVar is the global kill switch — `closeAgentClient` sets it to `False`, and all protocol client getters check it first. @@ -36,22 +39,43 @@ All TVars are initialized in `newAgentClient`. The `active` TVar is the global k Protocol client connections (SMP, NTF, XFTP) use a lazy singleton pattern implemented by [Session.hs](../../../Session.md): 1. **`getSessVar`** atomically checks the TMap. Returns `Left newVar` if absent (caller must connect), `Right existingVar` if present (caller waits for the TMVar). -2. **`newProtocolClient`** wraps the connection attempt. 
On success, fills the `sessionVar` TMVar with `Right client`. On failure, fills with `Left (error, maybeRetryTime)` and re-throws. +2. **`newProtocolClient`** wraps the connection attempt. On success, fills the `sessionVar` TMVar with `Right client` and writes a `CONNECT` event to `subQ`. On failure, fills with `Left (error, maybeRetryTime)` and re-throws. 3. **`waitForProtocolClient`** reads the TMVar with a timeout. If the stored error has an expiry time that has passed, it removes the SessionVar and retries from scratch — this is the `persistErrorInterval` retry mechanism. +### Error caching with persistErrorInterval + +When `newProtocolClient` fails and `persistErrorInterval > 0`, the error is cached with an expiry timestamp (`Just ts`). Future connection attempts during the interval immediately receive the cached error from `waitForProtocolClient` without attempting a connection. When `persistErrorInterval == 0`, the SessionVar is removed immediately on failure, so the next attempt starts a fresh connection. This prevents connection storms to unreachable routers. + ### SessionVar compare-and-swap `removeSessVar` (Session.hs) only removes a SessionVar from the map if its `sessionVarId` matches the current entry. The `sessionVarId` is a monotonically increasing counter from `workerSeq`. This prevents a stale disconnection callback from removing a *new* client that was created after the old one disconnected. Without this, the sequence "client A disconnects → client B connects → client A's callback runs" would incorrectly remove client B. +### SMP connection — service credentials and session setup + +`smpConnectClient` connects an SMP client, with two important post-connection steps: + +1. **Session ID registration**: `SS.setSessionId` records the TLS session ID in `currentSubs`, linking the transport session to the actual TLS connection for later session validation. + +2. 
**Service credential synchronization** (`updateClientService`): After connecting, compares client-side and server-side service state. Four cases: + - Both have service and IDs match → update DB (no-op if same) + - Both have service but IDs differ → update DB and remove old queue-service associations + - Client has service, server doesn't → delete client service (handles server version downgrade) + - Server has service, client doesn't → log error (should not happen in normal flow) + +On connection failure, `smpConnectClient` triggers `resubscribeSMPSession` before re-throwing the error. This ensures pending subscriptions get retry logic even when the initial connection attempt fails. + ### SMP disconnect callback -`smpClientDisconnected` is the most complex disconnect handler (NTF/XFTP have simpler versions that just remove the SessionVar): +`smpClientDisconnected` is the most complex disconnect handler (NTF/XFTP have simpler versions that remove the SessionVar and write a `DISCONNECT` event): 1. `removeSessVar` atomically removes the client if still current 2. If `active`, moves active subscriptions to pending (only those matching the disconnecting client's `sessionId` — see next section) 3. Removes proxied relay sessions that this client created -4. Fires `DOWN` events for affected connections -5. Triggers `resubscribeSMPSession` to spawn a reconnection worker +4. Fires `DISCONNECT`, `DOWN`, and `SERVICE_DOWN` events for affected connections +5. Releases GET locks for affected queues +6. Triggers resubscription (see below) + +**Resubscription mode switching**: The disconnect handler chooses between two resubscription paths based on whether the session mode matches the entity presence: `(mode == TSMEntity) == isJust cId`. When they match, it calls `resubscribeSMPSession` which handles both service and queue resubscription in a single worker. 
When they don't match (e.g., entity-mode session disconnects but there's also a shared session), it separately resubscribes the service and queues, because they belong to different transport sessions. ### Session-aware subscription cleanup @@ -65,6 +89,8 @@ Unifies SMP/NTF/XFTP client management with associated types: SMP is special: `SMPConnectedClient` bundles the protocol client with `proxiedRelays :: TMap SMPServer ProxiedRelayVar`, a per-connection map of relay sessions for proxy routing. +XFTP is special in a different way: its `getProtocolServerClient` ignores the `NetworkRequestMode` parameter and always uses `NRMBackground` for `waitForProtocolClient`. This means XFTP connections always use background timing regardless of the caller's request mode. + ## Worker framework Defined here, consumed by Agent.hs, NtfSubSupervisor.hs, FileTransfer/Agent.hs, and simplex-chat. Two separable parts: @@ -75,8 +101,7 @@ Creates or reuses a worker for a given key. Workers are stored in a TMap keyed b - **Create-or-reuse**: atomically checks the map. If absent, creates a new `Worker` (with `doWork` TMVar pre-filled with `()`). If present and `hasWork=True`, signals the existing worker. - **Fork**: `runWorkerAsync` takes the `action` TMVar. If `Nothing` (worker idle), it starts work. If `Just weakThreadId` (worker running), it puts the value back and returns. This bracket ensures at-most-one concurrent execution. -- **Restart rate limiting**: on worker exit (success or error), checks `restartCount` against `maxWorkerRestartsPerMin`. If under the limit, restarts with `hasWorkToDo` signal. If over the limit, deletes the worker from the map and sends a `CRITICAL True` error. -- **Worker identity**: `workerId` (from `workerSeq`) prevents a stale restart from interfering with a new worker that replaced it in the map. +- **Restart rate limiting**: on worker exit (success or error), `restartOrDelete` checks `restartCount` against `maxWorkerRestartsPerMin`. 
If under the limit, resets `action` to `Nothing` (idle), signals `hasWorkToDo`, and reports `INTERNAL` error. If over the limit, deletes the worker from the map and sends a `CRITICAL True` error. The restart only happens if the worker's `workerId` still matches the map entry — a stale restart from a replaced worker silently no-ops. `getAgentWorker'` is the generic version with custom worker wrapper — used by `smpDeliveryWorkers` which pairs each Worker with a `TMVar ()` retry lock. @@ -90,16 +115,20 @@ Takes `getWork` (fetch next task) and `action` (process it) as separate paramete - **Work item error** (`isWorkItemError`): the worker stops and sends `CRITICAL False`. The next iteration would likely produce the same error, so stopping prevents infinite loops. - **Store error**: the flag is re-set and an `INTERNAL` error is reported. The assumption is that store errors are transient (e.g., DB busy) and retrying may succeed. -`withWorkItems` handles batched work — a list of items where some may have individual errors. If all items are work-item errors, the worker stops. If only some are, the worker continues with the successful items and reports errors. +`withWorkItems` handles batched work — a list of items where some may have individual errors. If all items are work-item errors, the worker stops. If only some are, the worker continues with the successful items and reports errors via `ERRS` event. ### runWorkerAsync — at-most-one execution Uses a bracket on the `action` TMVar: - `takeTMVar action` — blocks if another thread is starting the worker (TMVar empty during start) -- If the taken value is `Nothing` — worker is idle, start it. Store `Just weakThreadId` in the TMVar. +- If the taken value is `Nothing` — worker is idle, start it. Store `Just weakThreadId` in the TMVar via `forkIO`. - If `Just _` — worker is already running, put it back and return. 
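
The take, inspect, put-back pattern in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in the agent: it substitutes an `MVar` from `base` for the `TMVar`, an `Int` for the `Weak ThreadId`, and the names `ActionSlot`, `startIfIdle` and `demoStart` are hypothetical.

```haskell
import Control.Concurrent.MVar (MVar, newMVar, putMVar, readMVar, takeMVar, tryPutMVar)
import Control.Exception (bracketOnError)

-- | Hypothetical stand-in for the worker's `action` slot:
-- Nothing = idle, Just n = running (n stands in for Weak ThreadId).
type ActionSlot = MVar (Maybe Int)

-- | Start the worker only if the slot says it is idle.
-- Returns True when this call started it, False when it was already running.
startIfIdle :: ActionSlot -> IO Int -> IO Bool
startIfIdle slot start =
  -- takeMVar blocks while another caller holds the slot (it is empty mid-start);
  -- if an exception escapes, the previous value is restored so the slot is not left empty.
  bracketOnError (takeMVar slot) (tryPutMVar slot) $ \prev ->
    case prev of
      Nothing -> do
        workerId <- start
        putMVar slot (Just workerId)
        pure True
      Just _ -> do
        putMVar slot prev
        pure False

-- | Two consecutive start attempts: only the first one runs `start`.
demoStart :: IO (Bool, Bool, Maybe Int)
demoStart = do
  slot <- newMVar Nothing
  first <- startIfIdle slot (pure 7)
  second <- startIfIdle slot (pure 8)
  final <- readMVar slot
  pure (first, second, final)
```

Because the slot is empty for the whole duration of the start, a second caller cannot observe "idle" while the first is still starting, which is what gives at-most-one execution in this model.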
-The `Weak ThreadId` in `action` is a weak reference — it doesn't prevent the worker thread from being garbage collected. This is the cleanup mechanism: if the thread dies without explicitly clearing `action`, the weak reference becomes stale and the next `runWorkerAsync` call will detect it as idle. +The `Weak ThreadId` in `action` is a weak reference — it doesn't prevent the worker thread from being garbage collected. It is used by `cancelWorker`, which calls `deRefWeak` to get the thread ID and kills it; if the thread was already GC'd, the kill is a no-op. The primary lifecycle management is through the `restartOrDelete` chain in `getAgentWorker'`, not the weak reference. + +### throwWhenNoDelivery — delivery worker self-termination + +Delivery workers call `throwWhenNoDelivery` to check if their entry still exists in the `smpDeliveryWorkers` map. If the worker was removed (delivery complete), it throws `ThreadKilled` to terminate the worker thread. This is distinct from `throwWhenInactive` (which checks global `active` state) — it allows individual workers to be stopped without shutting down the entire agent. ## Operation suspension cascade @@ -118,10 +147,12 @@ The cascade means: **`beginAgentOperation`** retries (blocks in STM) if the operation is suspended. This provides backpressure: new operations wait until the operation is resumed. -**`agentOperationBracket`** wraps an operation with begin/end. All database access goes through `withStore` which brackets with `AODatabase`. This ensures graceful shutdown propagates: suspending `AORcvNetwork` eventually suspends all downstream operations, and `notifySuspended` only fires when all in-flight operations have completed. +**`agentOperationBracket`** wraps an operation with begin/end. It takes a `check` function that runs before `beginAgentOperation` — typically `throwWhenInactive`, which throws `ThreadKilled` if the agent is inactive. All database access goes through `withStore` which brackets with `AODatabase`. 
This ensures graceful shutdown propagates: suspending `AORcvNetwork` eventually suspends all downstream operations, and `notifySuspended` only fires when all in-flight operations have completed. **`waitWhileSuspended`** vs **`waitUntilForeground`**: `waitWhileSuspended` proceeds during `ASSuspending` (allowing in-flight operations to complete), while `waitUntilForeground` blocks during both `ASSuspending` and `ASSuspended`. +**`waitForUserNetwork`**: bounded wait for network — if the network doesn't come online within `userNetworkInterval`, proceeds anyway. Uses `registerDelay` for the timeout. + ## Subscription management ### subscribeQueues — batch-by-transport-session @@ -133,17 +164,28 @@ The cascade means: 3. `addPendingSubs` marks all queues as pending before the RPC 4. `mapConcurrently` subscribes each session batch in parallel -### subscribeSessQueues_ — post-hoc session validation +### subscribeSessQueues_ — post-hoc session validation and atomicity + +After the subscription RPC completes, `subscribeSessQueues_` validates `activeClientSession` — checking that the SessionVar still holds the same client that was used for the RPC. If the client was replaced during the RPC (reconnection happened), the results are discarded (errors converted to temporary `BROKER NETWORK` to ensure retry) and resubscription is triggered. + +The post-RPC processing runs under `uninterruptibleMask_` for atomicity. The sequence is: +1. **Atomically**: `processSubResults` partitions results and updates subscription state; if there are client notices, takes `clientNoticesLock` TMVar +2. **IO**: `processRcvServiceAssocs` updates service associations in the DB +3. **IO**: `processClientNotices` updates notice state, always releases `clientNoticesLock` in `finally` + +The `clientNoticesLock` TMVar serializes notice processing across concurrent subscription batches. 
+ +**UP events for newly-active connections only**: After processing, UP events are sent only for connections that were NOT already active before this batch — existing active subscriptions (from `SS.getActiveSubs`) are excluded to prevent duplicate notifications. -After the subscription RPC completes, `subscribeSessQueues_` validates `activeClientSession` — checking that the SessionVar still holds the same client that was used for the RPC. If the client was replaced during the RPC (reconnection happened), the results are discarded and resubscription is triggered. This is optimistic execution with post-hoc validation: do the work, then check if it's still valid. +**Client close on all-temporary-error**: When ALL subscription results are temporary errors, no connections were already active, and the session is still current, the SMP client session is closed. This forces a fresh connection on the next attempt rather than reusing a potentially broken one. ### processSubResults — partitioning -Subscription results are partitioned into four categories: -1. **Failed with client notice** — queue has a server-side notice (e.g., queue status change) -2. **Failed permanently** — non-temporary error, queue is removed from pending and added to `removedSubs` -3. **Failed temporarily** — error is transient, queue stays in pending for retry on reconnect -4. **Subscribed** — moved from pending to active. Further split into: queues whose service ID matches the session service (added as service-associated) and others. +Subscription results are partitioned into five categories: +1. **Failed with client notice** — error has an associated server-side notice (e.g., queue status change). Queue is treated as failed (removed from pending, added to `removedSubs`) AND the notice is recorded for processing. +2. **Failed permanently** — non-temporary error without notice, queue is removed from pending and added to `removedSubs` +3. 
**Failed temporarily** — error is transient, queue stays in pending unchanged for retry on reconnect +4. **Subscribed** — moved from pending to active. Further split into: queues whose service ID matches the session service (added as service-associated) and others. If the queue had a tracked `clientNoticeId`, it is cleared (notice resolved by successful subscription). 5. **Ignored** — queue was not in the pending map (already activated by a concurrent path), counted for statistics only ### Resubscription worker @@ -155,6 +197,8 @@ Subscription results are partitioned into four categories: 3. Resubscribes service and queues 4. Loops until no pending subs remain +**Spawn guard**: Before creating a new worker, `resubscribeSMPSession` checks `SS.hasPendingSubs`. If there are no pending subs, it returns without spawning. This prevents creating idle workers. + **Cleanup blocks on TMVar fill** — the `cleanup` STM action retries (`whenM (isEmptyTMVar $ sessionVar v) retry`) until the async handle is inserted. This prevents the race where cleanup runs before the worker async is stored, which would leave a terminated worker in the map. ## Proxy routing — sendOrProxySMPCommand @@ -166,6 +210,21 @@ Implements SMP proxy/direct routing with fallback: 3. If proxying fails with a host error and `smpProxyFallback` allows it: falls back to direct connection 4. `deleteRelaySession` carefully validates that the current relay session matches the one that failed before removing it (prevents removing a concurrently-created replacement session) +**NO_SESSION retry limit**: On `NO_SESSION`, `sendViaProxy` is called recursively with `Just proxySrv` to reuse the same proxy server. If the recursive call also gets `NO_SESSION`, it throws `proxyError` instead of recursing again — `proxySrv_` is `Just`, so the `Nothing` branch (which recurses) is not taken. This limits retry to exactly one attempt. 
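
The single-retry rule described above can be modelled in a few lines. This is a sketch under stated assumptions, not the agent's `sendViaProxy`: the network call is replaced by a scripted list of outcomes, and `Outcome`, `sendViaProxyModel` and the proxy name `"proxy-1"` are hypothetical.

```haskell
import Data.Maybe (fromMaybe)

-- | Scripted result of one proxied send (stand-in for the network call).
data Outcome = Sent | NoSession
  deriving (Eq, Show)

-- | Model of the retry rule. The first argument mirrors `proxySrv_`:
-- Nothing on the initial call, Just on the single retry.
-- Returns the proxy that delivered the message, or an error string.
sendViaProxyModel :: Maybe String -> [Outcome] -> Either String String
sendViaProxyModel proxySrv_ outcomes =
  case outcomes of
    [] -> Left "no scripted outcome left"
    Sent : _ -> Right proxySrv
    NoSession : rest -> case proxySrv_ of
      -- First NO_SESSION: recurse once, pinning the same proxy.
      Nothing -> sendViaProxyModel (Just proxySrv) rest
      -- Second NO_SESSION: the argument is already Just, so give up.
      Just _ -> Left "PROXY NO_SESSION"
  where
    -- Assumption: proxy selection is reduced to a fixed name for illustration.
    proxySrv = fromMaybe "proxy-1" proxySrv_
```

In this model `sendViaProxyModel Nothing [NoSession, Sent]` succeeds on the retry, while `sendViaProxyModel Nothing [NoSession, NoSession, Sent]` fails without consuming the third outcome.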
+ +**Proxy selection caching** (`smpProxiedRelays`): When `getSMPProxyClient` selects a proxy for a destination, it atomically inserts the proxy→destination mapping into `smpProxiedRelays`. If a mapping already exists (another thread selected a proxy for the same destination), the existing mapping is used. On relay creation failure with non-host errors, both the relay session and proxy mapping are removed. On host errors, they are preserved to allow fallback logic. + +## Service credentials lifecycle + +`getServiceCredentials` manages per-user, per-server service certificate credentials: + +1. Checks `useClientServices` — if the user has services disabled, returns `Nothing` +2. Looks up existing credentials in DB via `getClientServiceCredentials` +3. If none exist, generates new TLS credentials on-the-fly (`genCredentials`) and stores them +4. Extracts the private signing key from the X.509 certificate + +The generated credentials are Ed25519 self-signed certificates with `simplex` organization, valid for ~2740 years. The certificate chain and hash are bundled into `ServiceCredentials` for the SMP handshake. + ## withStore — database access bracket `withStore` wraps database access with `agentOperationBracket c AODatabase`, ensuring the operation suspension cascade is respected. SQLite errors are classified: @@ -174,6 +233,8 @@ Implements SMP proxy/direct routing with fallback: `SEAgentError` is a special wrapper that allows agent-level errors to be threaded through store operations — used when "transaction-like" access is needed but the operation involves agent logic, not just DB queries. See source comment: "network IO should NOT be used inside AgentStoreMonad." +`withStoreBatch` / `withStoreBatch'` run multiple DB operations in a single transaction, catching exceptions per-operation to report individual failures. The entire batch is within one `agentOperationBracket`. 
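
The batch behavior described above (one enclosing bracket, failures reported per operation) can be illustrated with plain `IO`. This is a hedged sketch, not the agent's `withStoreBatch`: the transaction is simulated by a log in an `IORef`, and `withTransactionLog`, `runBatch` and `demoBatch` are hypothetical names.

```haskell
import Control.Exception (SomeException, bracket_, try)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- | Hypothetical transaction wrapper: records one BEGIN and one COMMIT per batch.
withTransactionLog :: IORef [String] -> IO a -> IO a
withTransactionLog logRef = bracket_ (push "BEGIN") (push "COMMIT")
  where
    push s = modifyIORef' logRef (++ [s])

-- | Run every operation inside a single bracket, catching exceptions
-- per operation so that one failure does not abort the others.
runBatch :: IORef [String] -> [IO a] -> IO [Either SomeException a]
runBatch logRef ops = withTransactionLog logRef (mapM try ops)

-- | Three operations, the middle one fails; results keep their positions.
demoBatch :: IO ([Maybe Int], [String])
demoBatch = do
  logRef <- newIORef []
  results <- runBatch logRef [pure 1, ioError (userError "db busy"), pure 3]
  txLog <- readIORef logRef
  pure (map (either (const Nothing) Just) results, txLog)
```

The design point is that the caller receives one result per input, in order, so it can report individual failures while the bracket is entered and left only once.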
+ ## Server selection — getNextServer / withNextSrv Server selection has two-level diversity: @@ -184,6 +245,14 @@ Server selection has two-level diversity: `withNextSrv` is designed for retry loops — it re-reads user servers on each call (allowing configuration changes during retries) and tracks `triedHosts` across attempts. When all hosts are tried, the tried set is reset (`S.empty`), creating a round-robin effect. +## Locking primitives + +**`withConnLock`**: Per-connection lock via `connLocks` TMap. Non-obvious: `withConnLock'` with empty `ConnId` is a no-op (identity function) — allows agent operations on entities without real connection IDs to skip locking. + +**`withConnLocks`**: Takes a `Set ConnId` and acquires locks for all connections. Uses `withGetLocks` which acquires all locks concurrently via `forConcurrently`. Note: concurrent acquisition of overlapping lock sets from different threads could theoretically deadlock, so callers must ensure non-overlapping lock sets or use a higher-level coordination. + +**`getMapLock`**: Creates a lock on first access and caches it in the TMap. Locks are never removed — the TMap grows monotonically. + ## Network configuration — slow/fast selection `getNetworkConfig` selects between slow and fast network configs based on `userNetworkInfo`: @@ -198,11 +267,15 @@ Both configs are stored together in `useNetworkConfig :: TVar (NetworkConfig, Ne 2. Closes all protocol server clients (SMP, NTF, XFTP) by swapping maps to empty and forking close threads 3. Clears proxied relays 4. Cancels resubscription workers — forks cancellation threads (fire-and-forget, `closeAgentClient` may return before all workers are cancelled) -5. Clears delivery and async command workers +5. Clears delivery and async command workers (delivery workers are also cancelled via `cancelWorker`) 6. Clears subscription state The cancellation of resubscription workers reads the TMVar first (to get the Async handle), then calls `uninterruptibleCancel`. 
This is wrapped in a forked thread to avoid blocking the shutdown sequence. +**`closeClient_` edge case**: When closing individual clients, `closeClient_` handles `BlockedIndefinitelyOnSTM` — which occurs if the SessionVar TMVar was never filled (connection attempt in progress when shutdown started). The exception is caught and treated as a no-op. + +**`reconnectServerClients` vs `closeProtocolServerClients`**: `closeProtocolServerClients` swaps the map to empty and closes all clients — no new connections can be made to those sessions. `reconnectServerClients` reads the map without clearing it and closes current clients — the disconnect callbacks trigger reconnection, effectively forcing fresh connections while keeping the session entries. + ## Transport session modes `TransportSessionMode` (`TSMEntity` vs other) determines whether the transport session key includes the entity ID (connection/queue ID). When `TSMEntity`, each queue gets its own TLS connection to the router. When not, queues to the same router share a connection. This is controlled by `sessionMode` in the network config. @@ -213,9 +286,15 @@ The cancellation of resubscription workers reads the TMVar first (to get the Asy `getQueueMessage` creates a TMVar lock keyed by `(server, rcvId)` and takes it before sending GET. This prevents concurrent GET and SUB on the same queue (SUB is checked via `hasGetLock` in `checkQueues`). The lock is released by `releaseGetLock` after ACK or on error. +The lock creation uses `TM.alterF` to atomically create-or-reuse: if no lock exists, creates a new `TMVar ()` and immediately takes it; if one exists, takes it. This avoids a race between two concurrent GET attempts on the same queue. + ## Error classification — temporaryAgentError Classifies errors as temporary (retryable) or permanent. 
Notable non-obvious classifications: - `TEHandshake BAD_SERVICE` is temporary — it indicates a DB error on the router, not a permanent rejection -- `CRITICAL True` is temporary — `True` means the error shows a restart button, implying the user should retry +- `CRITICAL True` is temporary — `True` means the error shows a restart button, implying the user should retry. `CRITICAL False` is permanent. - `INACTIVE` is temporary — the agent may be reactivated +- `SMP.PROXY NO_SESSION` via proxy is temporary — session can be re-established +- `SMP.STORE _` is temporary — server-side store error, not a client issue + +`temporaryOrHostError` extends `temporaryAgentError` to also include host-related errors (`HOST`, `TRANSPORT TEVersion`). Used in subscription management where host errors should trigger resubscription rather than permanent failure. From 8557d2ab291d4c191307e87b86ad9d664dd3e627 Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Fri, 13 Mar 2026 11:18:29 +0000 Subject: [PATCH 20/61] agent util specs --- .../Simplex/Messaging/Agent/Env/SQLite.md | 9 +++ spec/modules/Simplex/Messaging/Agent/Lock.md | 7 +++ .../Simplex/Messaging/Agent/QueryString.md | 7 +++ .../Simplex/Messaging/Agent/RetryInterval.md | 35 +++++++++++ spec/modules/Simplex/Messaging/Agent/Stats.md | 7 +++ .../Simplex/Messaging/Agent/TSessionSubs.md | 60 +++++++++++++++++++ 6 files changed, 125 insertions(+) create mode 100644 spec/modules/Simplex/Messaging/Agent/Env/SQLite.md create mode 100644 spec/modules/Simplex/Messaging/Agent/Lock.md create mode 100644 spec/modules/Simplex/Messaging/Agent/QueryString.md create mode 100644 spec/modules/Simplex/Messaging/Agent/RetryInterval.md create mode 100644 spec/modules/Simplex/Messaging/Agent/Stats.md create mode 100644 spec/modules/Simplex/Messaging/Agent/TSessionSubs.md diff --git a/spec/modules/Simplex/Messaging/Agent/Env/SQLite.md b/spec/modules/Simplex/Messaging/Agent/Env/SQLite.md new 
file mode 100644 index 000000000..7bfb10bbc --- /dev/null +++ b/spec/modules/Simplex/Messaging/Agent/Env/SQLite.md @@ -0,0 +1,9 @@ +# Simplex.Messaging.Agent.Env.SQLite + +> Agent environment configuration, default values, and worker/supervisor record types. + +**Source**: [`Agent/Env/SQLite.hs`](../../../../../../src/Simplex/Messaging/Agent/Env/SQLite.hs) + +## mkUserServers — silent fallback on all-disabled + +See comment on `mkUserServers`. If filtering servers by `enabled && role` yields an empty list, `fromMaybe srvs` falls back to *all* servers regardless of enabled/role status. This prevents a configuration where all servers are disabled from leaving the user with no servers — but means disabled servers can still be used if every server in a role is disabled. diff --git a/spec/modules/Simplex/Messaging/Agent/Lock.md b/spec/modules/Simplex/Messaging/Agent/Lock.md new file mode 100644 index 000000000..8300266c7 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Agent/Lock.md @@ -0,0 +1,7 @@ +# Simplex.Messaging.Agent.Lock + +> TMVar-based named mutex with concurrent multi-lock acquisition. + +**Source**: [`Agent/Lock.hs`](../../../../../src/Simplex/Messaging/Agent/Lock.hs) + +No non-obvious behavior. See source. See comment on `getPutLock` for the atomicity argument. diff --git a/spec/modules/Simplex/Messaging/Agent/QueryString.md b/spec/modules/Simplex/Messaging/Agent/QueryString.md new file mode 100644 index 000000000..cfcd99451 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Agent/QueryString.md @@ -0,0 +1,7 @@ +# Simplex.Messaging.Agent.QueryString + +> HTTP query string parsing utilities for connection link URIs. + +**Source**: [`Agent/QueryString.hs`](../../../../../src/Simplex/Messaging/Agent/QueryString.hs) + +No non-obvious behavior. See source. 
diff --git a/spec/modules/Simplex/Messaging/Agent/RetryInterval.md b/spec/modules/Simplex/Messaging/Agent/RetryInterval.md new file mode 100644 index 000000000..dbc5c35f4 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Agent/RetryInterval.md @@ -0,0 +1,35 @@ +# Simplex.Messaging.Agent.RetryInterval + +> Retry-with-backoff combinators for agent reconnection and worker loops. + +**Source**: [`Agent/RetryInterval.hs`](../../../../../src/Simplex/Messaging/Agent/RetryInterval.hs) + +## Overview + +Four retry combinators with increasing sophistication: basic (`withRetryInterval`), counted (`withRetryIntervalCount`), foreground-aware (`withRetryForeground`), and dual-interval with external wake-up (`withRetryLock2`). All share the same backoff curve via `nextRetryDelay`. + +## Backoff curve — nextRetryDelay + +Delay stays constant at `initialInterval` until `elapsed >= increaseAfter`, then grows by 1.5x per step (`delay * 3 / 2`) up to `maxInterval`. The `delay == maxInterval` guard short-circuits the comparison once the cap is reached. + +## updateRetryInterval2 — resume from saved state + +Sets `increaseAfter = 0` on both intervals. This skips the initial constant-delay phase — the next retry will immediately begin increasing from the saved interval. Used to restore retry state across reconnections without restarting from the initial interval. + +## withRetryForeground — reset on foreground/online transition + +The retry loop resets to `initialInterval` when either: +- The app transitions from background to foreground (`not wasForeground && foreground`) +- The network transitions from offline to online (`not wasOnline && online`) + +The STM transaction blocks on three things simultaneously: the `registerDelay` timer, the `isForeground` TVar, and the `isOnline` TVar. Whichever fires first unblocks the retry. On reset, elapsed time is zeroed. + +The `registerDelay` is capped at `maxBound :: Int` (~36 minutes on 32-bit) to prevent overflow. 
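
The backoff curve described under `nextRetryDelay` above can be worked through with a small pure model. This is a sketch, not the module's code: the record mirrors the field names used in the text, `nextDelay`, `delays` and `exampleDelays` are hypothetical, and the way elapsed time accumulates (sum of previous delays) is an assumption.

```haskell
import Data.Int (Int64)

-- | Illustrative retry settings in microseconds (field names follow the text above).
data RetryInterval = RetryInterval
  { initialInterval :: Int64
  , increaseAfter :: Int64
  , maxInterval :: Int64
  }

-- | Next delay given total elapsed time and the current delay:
-- constant until `increaseAfter` has elapsed, then 1.5x per step, capped.
nextDelay :: Int64 -> Int64 -> RetryInterval -> Int64
nextDelay elapsed delay ri
  | elapsed < increaseAfter ri || delay == maxInterval ri = delay
  | otherwise = min (delay * 3 `div` 2) (maxInterval ri)

-- | First n delays of a retry loop (assumption: elapsed is the sum of previous delays).
delays :: RetryInterval -> Int -> [Int64]
delays ri n = take n (go 0 (initialInterval ri))
  where
    go elapsed delay =
      let elapsed' = elapsed + delay
       in delay : go elapsed' (nextDelay elapsed' delay ri)

-- | initial = 1000, increaseAfter = 3000, max = 5000
exampleDelays :: [Int64]
exampleDelays = delays (RetryInterval 1000 3000 5000) 9
-- [1000,1000,1000,1500,2250,3375,5000,5000,5000]
```

With these numbers the delay stays at 1000 for three attempts (until 3000 has elapsed), grows 1500, 2250, 3375, and the next step (5062) is clamped to the 5000 cap, after which the `delay == maxInterval` guard keeps it constant.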
+ +## withRetryLock2 — interruptible dual-interval retry + +Maintains two independent backoff states (slow and fast) that the action toggles between by calling the loop continuation with `RISlow` or `RIFast`. Only the chosen interval advances; the other preserves its state. + +The `wait` function is the non-obvious part: it spawns a timer thread that puts `()` into the `lock` TMVar after the delay, while the main thread blocks on `takeTMVar lock`. This means the retry can be woken early by *external code* putting into the same TMVar — the timer is just a fallback. The `waiting` TVar prevents a stale timer from firing after the main thread has already been woken by an external signal. + +**Consumed by**: [Agent/Client.hs](./Client.md) — `reconnectSMPClient` uses the lock TMVar to allow immediate reconnection when new subscriptions arrive, rather than waiting for the full backoff delay. diff --git a/spec/modules/Simplex/Messaging/Agent/Stats.md b/spec/modules/Simplex/Messaging/Agent/Stats.md new file mode 100644 index 000000000..d793564e7 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Agent/Stats.md @@ -0,0 +1,7 @@ +# Simplex.Messaging.Agent.Stats + +> Per-server statistics counters (SMP, XFTP, NTF) with TVar-based live state and serializable snapshots. + +**Source**: [`Agent/Stats.hs`](../../../../../src/Simplex/Messaging/Agent/Stats.hs) + +No non-obvious behavior. See source. diff --git a/spec/modules/Simplex/Messaging/Agent/TSessionSubs.md b/spec/modules/Simplex/Messaging/Agent/TSessionSubs.md new file mode 100644 index 000000000..0274de59d --- /dev/null +++ b/spec/modules/Simplex/Messaging/Agent/TSessionSubs.md @@ -0,0 +1,60 @@ +# Simplex.Messaging.Agent.TSessionSubs + +> Per-session subscription state machine tracking active and pending queue subscriptions. 
+ +**Source**: [`Agent/TSessionSubs.hs`](../../../../../src/Simplex/Messaging/Agent/TSessionSubs.hs) + +## Overview + +TSessionSubs manages the two-tier (active/pending) subscription state for SMP queues, keyed by transport session. Every subscription confirmation from a router is validated against the current session ID before being promoted to active — if the session has changed (reconnect happened), the subscription is demoted to pending for resubscription. + +Service subscriptions (aggregate, router-managed) and queue subscriptions (individual, per-recipient-ID) are tracked separately but follow the same active/pending pattern. + +**Consumed by**: [Agent/Client.hs](./Client.md) — `subscribeSMPQueues`, `subscribeSessQueues_`, `resubscribeSMPSession`, `smpClientDisconnected`. + +## Session ID gating + +The central invariant: a subscription is only active if it was confirmed on the *current* TLS session. Every function that promotes subscriptions to active (`addActiveSub'`, `batchAddActiveSubs`, `setActiveServiceSub`) checks `Just sessId == sessId'` (stored session ID). On mismatch, the subscription goes to pending instead — silently, with no error. + +This means subscription RPCs that succeed but return after a reconnect are safely caught: the response carries the old session ID, which won't match the new one stored by `setSessionId`. + +## setSessionId — silent demotion on reconnect + +`setSessionId` has three cases, depending on the stored session ID: +- **First call** (stored is `Nothing`): stores the session ID. No side effects. +- **Subsequent call with different ID**: calls `setSubsPending_`, which moves *all* active subscriptions to pending and demotes the active service subscription. The new session ID is stored. +- **Same ID**: no-op (the `unless` guard). + +This is the mechanism by which reconnection invalidates all prior subscriptions. Callers don't need to explicitly move subscriptions — setting the new session ID does it atomically.
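
The session ID gating and the demotion performed by `setSessionId` can be illustrated with a pure model. This is a simplified sketch, not the module's STM code: session and queue IDs are plain strings, subscriptions are lists, service subscriptions are left out, and the un-primed `addActiveSub` and `exampleStates` are hypothetical.

```haskell
-- | Simplified per-session state: stored session ID plus active and pending queue IDs.
data SessSubs = SessSubs
  { sessId :: Maybe String
  , activeSubs :: [String]
  , pendingSubs :: [String]
  }
  deriving (Eq, Show)

-- | First call stores the ID; a different ID demotes all active subs; same ID is a no-op.
setSessionId :: String -> SessSubs -> SessSubs
setSessionId new s = case sessId s of
  Nothing -> s {sessId = Just new}
  Just old
    | old == new -> s
    | otherwise -> s {sessId = Just new, activeSubs = [], pendingSubs = pendingSubs s ++ activeSubs s}

-- | Promote a queue to active only if it was confirmed on the stored session;
-- otherwise it silently stays (or becomes) pending.
addActiveSub :: String -> String -> SessSubs -> SessSubs
addActiveSub confirmedOn queue s
  | Just confirmedOn == sessId s =
      s {activeSubs = queue : activeSubs s, pendingSubs = filter (/= queue) (pendingSubs s)}
  | queue `elem` pendingSubs s = s
  | otherwise = s {pendingSubs = queue : pendingSubs s}

-- | Connect on session A, confirm q1, reconnect as session B,
-- then receive a late confirmation that still carries session A.
exampleStates :: [SessSubs]
exampleStates =
  scanl (flip ($)) (SessSubs Nothing [] ["q1"])
    [setSessionId "A", addActiveSub "A" "q1", setSessionId "B", addActiveSub "A" "q1"]
```

In the final state `q1` is pending under session `B`: the reconnect demoted it, and the stale confirmation from session `A` was not allowed to promote it again.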
+ +## addActiveSub' — service-associated queue elision + +When `serviceId_` is `Just` and `serviceAssoc` is `True`, the queue is **not** added to `activeSubs`. Instead, `updateActiveService` increments the service subscription's count and XORs the queue's `IdsHash`. The queue is also removed from `pendingSubs`. + +This means service-associated queues have no individual representation in `activeSubs` — they exist only as aggregated count + hash in `activeServiceSub`. The router tracks them via the service subscription; the agent doesn't need per-queue state. + +When `serviceAssoc` is `False` (or no service ID), the queue goes to `activeSubs` normally. + +## updateActiveService — accumulative XOR merge + +`updateActiveService` adds to an existing `ServiceSub` rather than replacing it. It increments the queue count (`n + addN`) and appends the IdsHash (`idsHash <> addIdsHash`). The `<>` on `IdsHash` is XOR — this means the hash is order-independent and can be built incrementally as individual subscription confirmations arrive. + +The guard `serviceId == serviceId'` silently drops updates if the service ID has changed (e.g., credential rotation happened between individual queue confirmations). + +## setSubsPending — mode-dependent redistribution + +`setSubsPending` handles two cases based on whether the transport session mode (entity vs shared) matches the session key shape: + +1. **Mode matches key shape** (`entitySession == isJust connId_`): in-place demotion via `setSubsPending_` — active subs move to pending within the same `SessSubs` entry. Session ID is cleared (`Nothing`). + +2. **Mode mismatch** (e.g., switching from shared session to entity mode): the entire `SessSubs` entry is **deleted** from the map (`TM.lookupDelete`), and all subscriptions are redistributed to new per-entity session keys via `addPendingSub (uId, srv, sessEntId (connId rq))`. This changes the map granularity — one shared entry becomes many entity entries. 
+ +Both paths check `Just sessId == sessId'` first — if the stored session ID doesn't match the one being invalidated, no work is done (returns empty). + +## getSessSubs — lazy initialization + +`getSessSubs` creates a new `SessSubs` entry if none exists for the transport session. This means any write operation (`addPendingSub`, `setSessionId`, etc.) will create map entries as a side effect. Read operations (`hasActiveSub`, `getActiveSubs`) use `lookupSubs` instead, which returns `Nothing`/empty without creating entries. + +## updateClientNotices + +Adjusts the `clientNoticeId` field on pending subscriptions in bulk. Uses `M.adjust`, so missing recipient IDs are silently skipped. Only modifies pending subs — active subs are not touched because they've already been confirmed. From bde90500ea50ef180ba95abc4b7393f8c37119ef Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Fri, 13 Mar 2026 12:00:22 +0000 Subject: [PATCH 21/61] agent store and notifications specs --- .../Messaging/Agent/NtfSubSupervisor.md | 75 ++++++++++++++++++ spec/modules/Simplex/Messaging/Agent/Store.md | 44 +++++++++++ .../Messaging/Agent/Store/AgentStore.md | 76 ++++++++++++++++++ .../Simplex/Messaging/Agent/Store/Common.md | 7 ++ .../Simplex/Messaging/Agent/Store/DB.md | 7 ++ .../Simplex/Messaging/Agent/Store/Entity.md | 7 ++ .../Messaging/Agent/Store/Interface.md | 7 ++ .../Simplex/Messaging/Agent/Store/Postgres.md | 23 ++++++ .../Simplex/Messaging/Agent/Store/SQLite.md | 26 ++++++ .../Simplex/Messaging/Agent/Store/Shared.md | 7 ++ .../Simplex/Messaging/Notifications/Client.md | 15 ++++ .../Messaging/Notifications/Protocol.md | 43 ++++++++++ .../Simplex/Messaging/Notifications/Server.md | 79 +++++++++++++++++++ .../Messaging/Notifications/Server/Control.md | 7 ++ .../Messaging/Notifications/Server/Env.md | 21 +++++ .../Messaging/Notifications/Server/Main.md | 7 ++ .../Notifications/Server/Push/APNS.md | 35 ++++++++ 
.../Server/Push/APNS/Internal.md | 7 ++ .../Messaging/Notifications/Server/Stats.md | 19 +++++ .../Messaging/Notifications/Server/Store.md | 23 ++++++ .../Notifications/Server/Store/Postgres.md | 54 +++++++++++++ .../Notifications/Server/Store/Types.md | 7 ++ .../Messaging/Notifications/Transport.md | 36 ++++----- .../Simplex/Messaging/Notifications/Types.md | 19 +++++ 24 files changed, 630 insertions(+), 21 deletions(-) create mode 100644 spec/modules/Simplex/Messaging/Agent/NtfSubSupervisor.md create mode 100644 spec/modules/Simplex/Messaging/Agent/Store.md create mode 100644 spec/modules/Simplex/Messaging/Agent/Store/AgentStore.md create mode 100644 spec/modules/Simplex/Messaging/Agent/Store/Common.md create mode 100644 spec/modules/Simplex/Messaging/Agent/Store/DB.md create mode 100644 spec/modules/Simplex/Messaging/Agent/Store/Entity.md create mode 100644 spec/modules/Simplex/Messaging/Agent/Store/Interface.md create mode 100644 spec/modules/Simplex/Messaging/Agent/Store/Postgres.md create mode 100644 spec/modules/Simplex/Messaging/Agent/Store/SQLite.md create mode 100644 spec/modules/Simplex/Messaging/Agent/Store/Shared.md create mode 100644 spec/modules/Simplex/Messaging/Notifications/Client.md create mode 100644 spec/modules/Simplex/Messaging/Notifications/Protocol.md create mode 100644 spec/modules/Simplex/Messaging/Notifications/Server.md create mode 100644 spec/modules/Simplex/Messaging/Notifications/Server/Control.md create mode 100644 spec/modules/Simplex/Messaging/Notifications/Server/Env.md create mode 100644 spec/modules/Simplex/Messaging/Notifications/Server/Main.md create mode 100644 spec/modules/Simplex/Messaging/Notifications/Server/Push/APNS.md create mode 100644 spec/modules/Simplex/Messaging/Notifications/Server/Push/APNS/Internal.md create mode 100644 spec/modules/Simplex/Messaging/Notifications/Server/Stats.md create mode 100644 spec/modules/Simplex/Messaging/Notifications/Server/Store.md create mode 100644 
spec/modules/Simplex/Messaging/Notifications/Server/Store/Postgres.md create mode 100644 spec/modules/Simplex/Messaging/Notifications/Server/Store/Types.md create mode 100644 spec/modules/Simplex/Messaging/Notifications/Types.md diff --git a/spec/modules/Simplex/Messaging/Agent/NtfSubSupervisor.md b/spec/modules/Simplex/Messaging/Agent/NtfSubSupervisor.md new file mode 100644 index 000000000..33cd3eacb --- /dev/null +++ b/spec/modules/Simplex/Messaging/Agent/NtfSubSupervisor.md @@ -0,0 +1,75 @@ +# Simplex.Messaging.Agent.NtfSubSupervisor + +> Supervisor-worker architecture for notification subscription lifecycle management. + +**Source**: [`Agent/NtfSubSupervisor.hs`](../../../../../src/Simplex/Messaging/Agent/NtfSubSupervisor.hs) + +## Architecture + +The notification system uses a supervisor with **three worker pools**, each keyed by server address: + +| Pool | Key | Purpose | +|------|-----|---------| +| `ntfWorkers` | NtfServer | Create/check/delete/rotate subscriptions on notification router | +| `ntfSMPWorkers` | SMPServer | Create/delete notifier credentials on messaging router | +| `ntfTknDelWorkers` | NtfServer | Delete tokens on notification router (background cleanup) | + +The supervisor (`runNtfSupervisor`) reads commands from `ntfSubQ` and dispatches work to the appropriate pools. Workers are created lazily via `getAgentWorker` and process batches from the database. + +## Non-obvious behavior + +### 1. 
NSCCreate four-way partition + +`partitionQueueSubActions` classifies each (queue, subscription) pair into one of four buckets: + +- **New sub**: no existing subscription record — create from scratch +- **Reset sub**: credentials mismatch (SMP server changed, notifier ID changed, action was nulled by error, or action is a delete) — wipe and restart from SMP key exchange +- **Continue SMP work**: existing action is `NSASMP` and credentials are consistent — kick the SMP worker +- **Continue NTF work**: existing action is `NSANtf` and credentials are consistent — kick the NTF worker + +The key decision point: when `subAction_` is `Nothing` (set by `workerErrors` after permanent failures), the subscription is treated as needing a full reset. This interacts with the null-action sentinel pattern from `AgentStore`. + +### 2. retrySubActions shrinking retry with TVar + +`retrySubActions` holds the list of subs-to-retry in a `TVar`. Each iteration, the action function returns only the subs that got temporary errors (via `splitResults`). The `TVar` is overwritten with this shrinking list. On success or permanent error, subs drop out. This means retry batches get smaller over time. + +`splitResults` implements a three-way partition: temporary errors → retry, permanent errors → null the action + notify, successes → continue pipeline. + +### 3. rescheduleWork deferred wake-up + +When the NTF worker finds that all pending `NSACheck` actions have future timestamps, it does not spin-wait. Instead it: +1. Takes itself out of the `doWork` TMVar (so the worker blocks on `waitForWork`) +2. Forks a thread that sleeps until the first action's timestamp +3. The forked thread re-signals `doWork` when the time arrives + +This is the mechanism for time-scheduled subscription health checks. + +### 4. 
checkSubs AUTH triggers full recreation + +When the notification router returns `AUTH` for a subscription check, the subscription is not simply marked as failed — it is fully recreated from scratch by resetting to `NSASMP NSASmpKey` state. This handles the case where the notification router has lost its subscription state (restart, data loss). The SMP worker is kicked to re-establish notifier credentials. + +Non-AUTH failure statuses that are not in `subscribeNtfStatuses` also trigger recreation. + +### 5. deleteToken two-phase with restart survival + +Token deletion splits into two phases: +1. **Store phase**: Remove token from active store, persist `(server, privateKey, tokenId)` to a deletion queue via `addNtfTokenToDelete` +2. **Network phase**: `runNtfTknDelWorker` reads from the queue and performs the actual server-side deletion + +On supervisor startup, `startTknDelete` scans for any pending deletion queue entries and launches workers. This ensures token cleanup survives agent restarts. + +If the token has no server-side ID (`ntfTokenId = Nothing`), only the store phase runs — no worker is launched. + +### 6. workerErrors nulls subscription action + +When permanent (non-temporary, non-host) errors occur in batch operations, `workerErrors` sets the subscription's action to `NULL` in the database and notifies the client. The next `NSCCreate` for that connection will see `subAction_ = Nothing` in `contOrReset` and trigger a full subscription reset. + +This null-action sentinel is the bridge between worker failure recovery and supervisor-driven re-creation. + +### 7. NSADelete and NSARotate are deprecated + +These NTF worker actions are no longer generated by current code but are kept for processing legacy database records. They are explicitly not batched (processed one at a time via `mapM`). `NSARotate` deletes the subscription then re-queues `NSCCreate` back to the supervisor. + +### 8. 
Stats counting groups by userId + +`incStatByUserId` groups batch subscriptions by `userId` before incrementing stats counters, ensuring per-user counts are accurate even when a single batch contains subscriptions from multiple users. diff --git a/spec/modules/Simplex/Messaging/Agent/Store.md b/spec/modules/Simplex/Messaging/Agent/Store.md new file mode 100644 index 000000000..0eecbf8d1 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Agent/Store.md @@ -0,0 +1,44 @@ +# Simplex.Messaging.Agent.Store + +> Domain entity types for agent persistence — queues, connections, messages, commands, and store errors. + +**Source**: [`Agent/Store.hs`](../../../../../src/Simplex/Messaging/Agent/Store.hs) + +## Overview + +This module defines the data types that represent agent state. It contains no database operations — those are in [AgentStore.hs](./Store/AgentStore.md). The key abstractions are: + +- **Queue types** (`StoredRcvQueue`, `StoredSndQueue`) parameterized by `DBStored` phantom type for new vs persisted distinction +- **Connection GADT** (`Connection'`) encoding the connection state machine at the type level +- **Message containers** (`RcvMsgData`, `SndMsgData`, `PendingMsgData`) for the message lifecycle +- **Store errors** (`StoreError`) including two sentinel errors with special semantics + +## Connection' — type-level state machine + +The `Connection'` GADT encodes connection lifecycle as a type parameter: `CNew` → `CRcv`/`CSnd` → `CDuplex`, plus `CContact` for reusable contact connections. `SomeConn` wraps an existential to store connections of unknown type. + +`TestEquality SConnType` deliberately omits `SCNew` — `testEquality SCNew SCNew` returns `Nothing`. This is intentional: `NewConnection` has no queues and is not a valid target for type-level connection matching in store operations. + +## canAbortRcvSwitch — race condition boundary + +See comments on `canAbortRcvSwitch`. 
The `RSSendingQUSE` and `RSReceivedMessage` states cannot be aborted because the sender may have already deleted the original queue. Aborting (deleting the new queue) at that point would break the connection with no recovery path. + +## ratchetSyncAllowed / ratchetSyncSendProhibited — cross-repo contract + +See comments on `ratchetSyncAllowed`. Both functions carry the comment "this function should be mirrored in the clients" — simplex-chat must implement identical logic. The agent enforces these state checks, but the chat client also needs them for UI decisions (e.g., disabling send when `ratchetSyncSendProhibited`). + +## SEWorkItemError — worker suspension sentinel + +`SEWorkItemError` is a sentinel error that triggers worker suspension when encountered during work item retrieval. The `AnyStoreError` typeclass exposes `isWorkItemError` for the worker framework ([Agent/Client.hs](./Client.md)) to detect this case. The comment "do not use!" means it should not be thrown for normal error conditions — only when the work item itself is corrupt/unreadable and the worker should stop rather than retry. + +## SEAgentError — store-level error wrapping + +`SEAgentError` wraps `AgentErrorType` inside store operations. This allows store functions to return agent-level errors (e.g., connection state violations detected during a DB transaction) without breaking the `ExceptT StoreError` type. The "to avoid race conditions" rationale: checking a condition and acting on it must happen in the same DB transaction, so the agent error is returned through the store error channel. + +## InvShortLink — secure-on-read semantics + +See comment on `InvShortLink`. Stored separately from the connection because 1-time invitation short links have a "secure-on-read" property: accessing the link data on the router marks it as read, preventing undetected observation. The `sndPrivateKey` is persisted to allow retries of the link creation without generating new keys. 
+ +## RcvQueueSub — subscription-optimized projection + +`RcvQueueSub` strips cryptographic fields from `RcvQueue`, keeping only what's needed for subscription tracking in [TSessionSubs](./TSessionSubs.md). This reduces memory pressure when tracking thousands of subscriptions in STM. diff --git a/spec/modules/Simplex/Messaging/Agent/Store/AgentStore.md b/spec/modules/Simplex/Messaging/Agent/Store/AgentStore.md new file mode 100644 index 000000000..9fbc2beb3 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Agent/Store/AgentStore.md @@ -0,0 +1,76 @@ +# Simplex.Messaging.Agent.Store.AgentStore + +> Core CRUD operations for agent persistence — users, connections, queues, messages, ratchets, notifications, and file transfers. + +**Source**: [`Agent/Store/AgentStore.hs`](../../../../../../src/Simplex/Messaging/Agent/Store/AgentStore.hs) + +## Overview + +At ~3700 lines, this is the largest module in the codebase. It implements all database operations for the agent, compiled with CPP for both SQLite and PostgreSQL backends. Most functions are straightforward SQL CRUD, but several patterns are non-obvious. + +The module re-exports `withConnection`, `withTransaction`, `withTransactionPriority`, `firstRow`, `firstRow'`, `maybeFirstRow`, and `fromOnlyBI` from the backend-specific Common module. + +## Dual-backend compilation + +The module uses `#if defined(dbPostgres)` throughout. Key behavioral differences: +- **Row locking**: PostgreSQL uses `FOR UPDATE` on reads that precede writes (e.g., `getConnForUpdate`, `getRatchetForUpdate`, `retrieveLastIdsAndHashRcv_`). SQLite relies on its single-writer model instead. +- **Batch queries**: PostgreSQL uses `IN ?` with `In` wrapper for batch operations. SQLite falls back to per-row `forM` loops. +- **Constraint handling**: PostgreSQL uses `constraintViolation`, SQLite checks `SQL.ErrorConstraint`. 
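
The row-locking difference listed under dual-backend compilation can be sketched as follows. This is a minimal illustration, not the module's code: `lockSuffix`, `selectForUpdate`, and the `connections`/`conn_id` names are assumptions made for the example. Only the `dbPostgres` CPP symbol comes from the text above.

```haskell
{-# LANGUAGE CPP #-}
module Main where

-- Hypothetical helper (not from the source): the suffix appended to a read
-- that precedes a write. Only the PostgreSQL build adds row locking; the
-- SQLite build relies on its single-writer model instead.
lockSuffix :: String
#if defined(dbPostgres)
lockSuffix = " FOR UPDATE"
#else
lockSuffix = ""
#endif

-- Illustrative query text; the table and column names are assumptions.
selectForUpdate :: String
selectForUpdate = "SELECT conn_id FROM connections WHERE conn_id = ?" ++ lockSuffix

main :: IO ()
main = putStrLn selectForUpdate
```

Compiling with `-DdbPostgres` selects the first branch; without it the query is emitted with no locking clause. Keeping the difference in one small definition is one way to avoid duplicating whole queries per backend.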
+ +## getWorkItem / getWorkItems — worker store pattern + +`getWorkItem` implements the store-side pattern for the [worker framework](../Client.md): `getId → getItem → markFailed`. If `getId` or `getItem` throws an IO exception, `handleWrkErr` wraps it as `SEWorkItemError` (via `mkWorkItemError`), which signals the worker to suspend rather than retry. This prevents crash loops on corrupt data. + +`getWorkItems` extends this to batch work items, where each item failure is independent. + +**Consumed by**: `getPendingQueueMsg`, `getPendingServerCommand`, `getNextNtfSubNTFActions`, `getNextNtfSubSMPActions`, `getNextDeletedSndChunkReplica`, `getNextNtfTokenToDelete`. + +## Notification subscription — supervisor/worker coordination + +`updateNtfSubscription`, `setNullNtfSubscriptionAction`, and `deleteNtfSubscription` all check `updated_by_supervisor` before writing. When `True`, the worker only updates local fields (ntf IDs, status) and skips action/server fields that the supervisor may have changed. This prevents the worker from overwriting supervisor decisions during concurrent execution. + +`markUpdatedByWorker` resets the flag to `False` before each work item is processed, so the worker "claims" the subscription for the duration of its operation. + +## createServer / getServerKeyHash_ — key hash migration + +`createServer` returns `Maybe KeyHash`: `Nothing` means the server was newly created with the passed hash; `Just kh` means the server already existed and the passed hash differs from the stored one. This `Just` value is stored as `server_key_hash` on queues to allow per-queue key hash overrides. + +The `COALESCE(q.server_key_hash, s.key_hash)` pattern appears throughout queries — queues can override the server-level hash, enabling gradual migration when a router's identity key changes. + +## updateRcvMsgHash / updateSndMsgHash — race condition guard + +Both functions include `AND last_internal_*_msg_id = ?` in their UPDATE WHERE clause. 
This prevents a race: if another message was processed between `updateIds` and `updateHash` (incrementing the last ID), the hash update is silently skipped rather than corrupting the chain. See comments on these functions. + +## deleteConn — conditional delivery wait + +Four cases, of which the first three delete the connection: +1. No timeout: immediate delete. +2. Timeout + no pending deliveries: immediate delete. +3. Timeout + pending deliveries + `deleted_at_wait_delivery` expired: delete. +4. Timeout + pending deliveries + not expired: return `Nothing` (skip). + +This allows graceful delivery completion before connection cleanup. + +## createSndConn — confirmed queue guard + +See comment on `createSndConn`. Checks `checkConfirmedSndQueueExists_` before creating, because `insertSndQueue_` uses `ON CONFLICT DO UPDATE` which would silently replace an existing confirmed send queue. The pre-check prevents this destructive upsert. + +## insertRcvQueue_ / insertSndQueue_ — queue ID preservation + +Both functions check if a queue already exists (by server + queue ID) and reuse the existing database `queue_id`. If not found, they generate the next sequential ID (`MAX + 1`). This preserves database IDs across retries of queue creation. + +## createClientService — service_id reset on upsert + +The `ON CONFLICT DO UPDATE` clause sets `service_id = NULL` when credentials are updated. This forces re-registration with the router after credential rotation — the old service ID is invalidated. + +## deleteSndMsgDelivery — conditional message retention + +After removing the delivery record, checks whether any pending deliveries remain for the message. If none remain and the receipt status is `MROk`, the entire message is deleted. Otherwise, if `keepForReceipt` is true, only the message body is cleared (for debugging receipt mismatches). Handles shared `snd_message_bodies` with `FOR UPDATE` locking on PostgreSQL to prevent concurrent deletion races.
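
The numbered cases in the `deleteConn` section above can be restated as a pure decision function. This is a sketch under stated assumptions: the `Decision` type, the `decideDelete` name, and the three inputs are hypothetical simplifications. The real function works against the database and, per the text, returns `Nothing` in the skip case.

```haskell
module Main where

-- | Outcome of one deletion attempt. Hypothetical type for illustration;
-- the spec only says that the skip case returns Nothing.
data Decision = DeleteNow | SkipForNow
  deriving (Eq, Show)

-- | Pure restatement of the numbered cases.
--   1st argument: Nothing means the caller did not ask to wait for delivery.
--   2nd argument: True if undelivered messages remain for the connection.
--   3rd argument: True if the recorded wait deadline has already passed.
decideDelete :: Maybe Int -> Bool -> Bool -> Decision
decideDelete Nothing _ _ = DeleteNow          -- case 1: no timeout
decideDelete (Just _) False _ = DeleteNow     -- case 2: nothing pending
decideDelete (Just _) True True = DeleteNow   -- case 3: wait expired
decideDelete (Just _) True False = SkipForNow -- case 4: still waiting

main :: IO ()
main =
  mapM_
    (\(t, p, e) -> putStrLn (show (t, p, e) ++ " -> " ++ show (decideDelete t p e)))
    [ (Nothing, True, False)
    , (Just 30, False, False)
    , (Just 30, True, True)
    , (Just 30, True, False)
    ]
```

Only the last input combination is skipped, which matches the stated intent of letting pending deliveries finish before the connection is cleaned up.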
+ +## createWithRandomId' — bounded retry + +Generates random 12-byte IDs (base64url encoded) and retries up to 3 times on constraint violations (unique ID collision). Returns `SEUniqueID` if all attempts fail. + +## setRcvQueuePrimary / setSndQueuePrimary — two-step primary swap + +First clears primary flag on all queues in the connection, then sets it on the target queue. Also clears `replace_*_queue_id` on the new primary — this completes the queue rotation by removing the "replacing" marker. diff --git a/spec/modules/Simplex/Messaging/Agent/Store/Common.md b/spec/modules/Simplex/Messaging/Agent/Store/Common.md new file mode 100644 index 000000000..45db84995 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Agent/Store/Common.md @@ -0,0 +1,7 @@ +# Simplex.Messaging.Agent.Store.Common + +> CPP-conditional re-export of backend-specific common utilities (DBStore, withConnection, withTransaction). + +**Source**: [`Agent/Store/Common.hs`](../../../../../../src/Simplex/Messaging/Agent/Store/Common.hs) + +No non-obvious behavior. See source. One of three CPP re-export wrappers (Interface, Common, DB). diff --git a/spec/modules/Simplex/Messaging/Agent/Store/DB.md b/spec/modules/Simplex/Messaging/Agent/Store/DB.md new file mode 100644 index 000000000..70be997d6 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Agent/Store/DB.md @@ -0,0 +1,7 @@ +# Simplex.Messaging.Agent.Store.DB + +> CPP-conditional re-export of backend-specific database primitives (Connection, FromField, ToField). + +**Source**: [`Agent/Store/DB.hs`](../../../../../../src/Simplex/Messaging/Agent/Store/DB.hs) + +No non-obvious behavior. See source. One of three CPP re-export wrappers (Interface, Common, DB). 
diff --git a/spec/modules/Simplex/Messaging/Agent/Store/Entity.md b/spec/modules/Simplex/Messaging/Agent/Store/Entity.md new file mode 100644 index 000000000..801398f1a --- /dev/null +++ b/spec/modules/Simplex/Messaging/Agent/Store/Entity.md @@ -0,0 +1,7 @@ +# Simplex.Messaging.Agent.Store.Entity + +> Phantom-typed database entity IDs distinguishing new (unsaved) from stored records. + +**Source**: [`Agent/Store/Entity.hs`](../../../../../../src/Simplex/Messaging/Agent/Store/Entity.hs) + +No non-obvious behavior. See source. diff --git a/spec/modules/Simplex/Messaging/Agent/Store/Interface.md b/spec/modules/Simplex/Messaging/Agent/Store/Interface.md new file mode 100644 index 000000000..923cbfca9 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Agent/Store/Interface.md @@ -0,0 +1,7 @@ +# Simplex.Messaging.Agent.Store.Interface + +> CPP-conditional re-export of the active database backend (SQLite or PostgreSQL). + +**Source**: [`Agent/Store/Interface.hs`](../../../../../../src/Simplex/Messaging/Agent/Store/Interface.hs) + +No non-obvious behavior. See source. One of three CPP re-export wrappers (Interface, Common, DB) that select the active backend at compile time via `dbPostgres`. diff --git a/spec/modules/Simplex/Messaging/Agent/Store/Postgres.md b/spec/modules/Simplex/Messaging/Agent/Store/Postgres.md new file mode 100644 index 000000000..8cb29c1b0 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Agent/Store/Postgres.md @@ -0,0 +1,23 @@ +# Simplex.Messaging.Agent.Store.Postgres + +> PostgreSQL backend — dual-pool connection management, schema lifecycle, and migration. + +**Source**: [`Agent/Store/Postgres.hs`](../../../../../../src/Simplex/Messaging/Agent/Store/Postgres.hs) + +## Dual pool architecture + +`connectPostgresStore` creates two connection pools (`dbPriorityPool` and `dbPool`), each with `poolSize` connections. Priority pool is used by `withTransactionPriority` for operations that shouldn't be blocked by regular queries. 
Both pools are TBQueue-based — connections are taken and returned after use. + +All connections are created eagerly at initialization, not lazily on demand. + +## uninterruptibleMask_ — pool atomicity invariant + +See comment on `connectStore`. `uninterruptibleMask_` prevents async exceptions from interrupting pool filling or draining. The invariant: when `dbClosed = True`, queues are empty; when `False`, queues are full (or connections are in-flight with threads that will return them). Interruption mid-fill would break this invariant. + +## Schema creation — fail-fast on missing + +If the PostgreSQL schema doesn't exist and `createSchema` is `False`, the process logs an error and calls `exitFailure`. This prevents silent operation against the wrong schema. + +## execSQL — not implemented + +`execSQL` throws "not implemented" — the PostgreSQL client doesn't support raw SQL execution via the agent API. The function exists only to satisfy the shared interface. diff --git a/spec/modules/Simplex/Messaging/Agent/Store/SQLite.md b/spec/modules/Simplex/Messaging/Agent/Store/SQLite.md new file mode 100644 index 000000000..2513882ff --- /dev/null +++ b/spec/modules/Simplex/Messaging/Agent/Store/SQLite.md @@ -0,0 +1,26 @@ +# Simplex.Messaging.Agent.Store.SQLite + +> SQLite backend — store creation, encrypted connection management, migration, and custom SQL functions. + +**Source**: [`Agent/Store/SQLite.hs`](../../../../../../src/Simplex/Messaging/Agent/Store/SQLite.hs) + +## Security-relevant PRAGMAs + +`connectDB` sets PRAGMAs at connection time: +- `secure_delete = ON`: data is overwritten (not just unlinked) on DELETE +- `auto_vacuum = FULL`: freed pages are reclaimed immediately +- `foreign_keys = ON`: referential integrity enforced + +These are set per-connection, not per-database — every new connection (including re-opens) gets them. + +## simplex_xor_md5_combine — custom SQLite function + +A C-exported SQLite function registered at connection time. 
Takes an existing `IdsHash` and a `RecipientId`, XORs the hash with the MD5 of the ID. This is the SQLite implementation of the accumulative IdsHash used by service subscriptions (see [TSessionSubs.md](../TSessionSubs.md#updateActiveService--accumulative-xor-merge)). PostgreSQL uses its native `md5()` and `decode()` functions instead. + +## openSQLiteStore_ — connection swap under MVar + +Uses `bracketOnError` with `takeMVar`/`tryPutMVar`: takes the connection MVar, creates a new connection, and puts the new one back. If connection fails, `tryPutMVar` restores the old connection. The `dbClosed` TVar is flipped atomically with the key update. + +## storeKey — conditional key retention + +`storeKey key keepKey` stores the encryption key in the `dbKey` TVar only if `keepKey` is true. This allows `reopenDBStore` to re-open without the caller re-supplying the key. If `keepKey` is false and the store is closed, `reopenDBStore` fails with "no key". diff --git a/spec/modules/Simplex/Messaging/Agent/Store/Shared.md b/spec/modules/Simplex/Messaging/Agent/Store/Shared.md new file mode 100644 index 000000000..bc60de14e --- /dev/null +++ b/spec/modules/Simplex/Messaging/Agent/Store/Shared.md @@ -0,0 +1,7 @@ +# Simplex.Messaging.Agent.Store.Shared + +> Migration types, error reporting, and confirmation modes shared across database backends. + +**Source**: [`Agent/Store/Shared.hs`](../../../../../../src/Simplex/Messaging/Agent/Store/Shared.hs) + +No non-obvious behavior. See source. diff --git a/spec/modules/Simplex/Messaging/Notifications/Client.md b/spec/modules/Simplex/Messaging/Notifications/Client.md new file mode 100644 index 000000000..d2c3eef0e --- /dev/null +++ b/spec/modules/Simplex/Messaging/Notifications/Client.md @@ -0,0 +1,15 @@ +# Simplex.Messaging.Notifications.Client + +> Typed wrappers around `ProtocolClient` for NTF protocol commands. 
+ +**Source**: [`Notifications/Client.hs`](../../../../../src/Simplex/Messaging/Notifications/Client.hs) + +## Non-obvious behavior + +### 1. Subscription operations always use NRMBackground + +`ntfCreateSubscription`, `ntfCheckSubscription`, `ntfDeleteSubscription`, and their batch variants hardcode `NRMBackground` as the network request mode. Token operations (`ntfRegisterToken`, `ntfVerifyToken`, etc.) accept the mode as a parameter. This reflects that subscription management is a background activity driven by the supervisor, while token operations can be user-initiated. + +### 2. Batch operations return per-item errors + +`ntfCreateSubscriptions` and `ntfCheckSubscriptions` return `NonEmpty (Either NtfClientError result)` — individual items in a batch can fail independently. Callers must handle partial success (some created, some failed). The singular variants throw on any error. diff --git a/spec/modules/Simplex/Messaging/Notifications/Protocol.md b/spec/modules/Simplex/Messaging/Notifications/Protocol.md new file mode 100644 index 000000000..9354e2086 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Notifications/Protocol.md @@ -0,0 +1,43 @@ +# Simplex.Messaging.Notifications.Protocol + +> NTF protocol entities, commands, responses, and wire encoding for the notification system. + +**Source**: [`Notifications/Protocol.hs`](../../../../../src/Simplex/Messaging/Notifications/Protocol.hs) + +## Non-obvious behavior + +### 1. 
Asymmetric credential validation + +`checkCredentials` enforces different rules per command category: + +| Category | Signature required | Entity ID | +|----------|-------------------|-----------| +| TNEW, SNEW | Yes | Must be empty (new entity) | +| PING | No | Must be empty | +| All others | Yes | Must be present | + +For responses, the rule inverts: `NRTknId`, `NRSubId`, and `NRPong` must NOT have entity IDs (they are returned before/without entity context), while `NRErr` optionally has one (errors can occur with or without entity context). + +### 2. PNMessageData semicolon separator + +`encodePNMessages` uses `;` as the separator between push notification message items instead of the standard `,` used by `NonEmpty` `strEncode`. This is because `SMPQueueNtf` contains an `SMPServer` whose host list encoding already uses commas, which would create ambiguous parsing. + +### 3. NTInvalid reason is version-gated + +When encoding `NRTkn` responses, the `NTInvalid` reason is only included if the negotiated protocol version is >= `invalidReasonNTFVersion` (v3). Older clients receive `NTInvalid Nothing`. This prevents parse failures on clients that don't understand the reason field. + +### 4. subscribeNtfStatuses migration invariant + +The comment on `subscribeNtfStatuses` (`[NSNew, NSPending, NSActive, NSInactive]`) warns that changing these statuses requires a new database migration for queue ID hashes (see `m20250830_queue_ids_hash`). This is a cross-module invariant between protocol types and server storage. + +### 5. allowNtfSubCommands permits NTInvalid and NTExpired + +Token status `NTInvalid` allows subscription commands (SNEW, SCHK, SDEL), which is counterintuitive. The rationale (noted in a TODO comment) is that invalidation can happen after verification, and existing subscriptions should remain manageable. `NTExpired` is also permitted for the same reason. + +### 6. PPApnsNull test provider + +`PPApnsNull` is a push provider that never communicates with APNS. 
It's used for end-to-end testing of the notification server from clients without requiring actual push infrastructure. + +### 7. DeviceToken hex validation + +`DeviceToken` string parsing has two paths: a hardcoded literal match for `"apns_null test_ntf_token"` (test tokens), and hex string validation for real tokens (must be even-length hex). The wire encoding (`smpP`) does not perform this validation — it accepts any `ByteString`. diff --git a/spec/modules/Simplex/Messaging/Notifications/Server.md b/spec/modules/Simplex/Messaging/Notifications/Server.md new file mode 100644 index 000000000..9c88cf7a0 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Notifications/Server.md @@ -0,0 +1,79 @@ +# Simplex.Messaging.Notifications.Server + +> NTF server: manages tokens, subscriptions, SMP subscriber connections, and push notification delivery. + +**Source**: [`Notifications/Server.hs`](../../../../../src/Simplex/Messaging/Notifications/Server.hs) + +## Architecture + +The NTF server runs several concurrent threads via `raceAny_`: + +| Thread | Purpose | +|--------|---------| +| `ntfSubscriber` | Receives SMP messages (NMSG, END, DELD) and agent events (connect/disconnect/subscribe) | +| `ntfPush` | Reads push queue and delivers via APNS provider | +| `periodicNtfsThread` | Sends periodic "check messages" push notifications (cron) | +| `runServer` (per transport) | Accepts client connections and runs NTF protocol | +| Stats/Prometheus/Control | Optional monitoring and admin threads | + +Each client connection spawns `receive`, `send`, and `client` threads via `raceAny_`. + +## Non-obvious behavior + +### 1. Timing attack mitigation on entity lookup + +When `verifyNtfTransmission` encounters an AUTH error (entity not found), it calls `dummyVerifyCmd` to equalize response timing before returning the error. This prevents attackers from distinguishing "entity doesn't exist" from "signature invalid" based on response latency. + +### 2. 
TNEW idempotent re-registration + +When TNEW is received for an already-registered token, the server: +1. Looks up the existing token via `findNtfTokenRegistration` +2. Verifies the DH secret matches (recomputed from the new `dhPubKey` and stored `tknDhPrivKey`) +3. If DH secrets differ → AUTH error (prevents token hijacking) +4. If they match → re-sends verification push notification + +This makes TNEW safe for client retransmission after connection drops. + +### 3. SNEW idempotent subscription + +When SNEW is received for an existing subscription (same token + SMP queue), the server returns the existing `ntfSubId` if the notifier key matches. If keys differ, AUTH error. New subscriptions are only created when no match exists in `findNtfSubscription`. + +### 4. PPApnsNull suppresses statistics + +`incNtfStatT` skips all stat increments when the device token uses `PPApnsNull` provider. This prevents test tokens from polluting production metrics. + +### 5. END requires active session validation + +SMP END messages are only processed when the originating session is the currently active session for that server (`activeClientSession'` check). This prevents stale END messages from previous (reconnected) sessions from incorrectly marking subscriptions as ended. + +### 6. waitForSMPSubscriber two-phase wait + +`waitForSMPSubscriber` first tries a non-blocking `tryReadTMVar`. If the subscriber isn't ready yet, it falls back to a blocking `readTMVar` with a 10-second timeout. This avoids creating an extra timeout thread in the common case where the subscriber is already available. + +### 7. CAServiceUnavailable triggers individual resubscription + +When a service subscription becomes unavailable (SMP server rejects service credentials), the NTF server: +1. Removes the service association from the database +2. Resubscribes all individual queues for that server via `subscribeSrvSubs` + +This is the fallback path from service-level to queue-level SMP subscriptions. + +### 8. 
Push delivery single retry + +`deliverNotification` retries exactly once on connection errors (`PPConnection`) or `PPRetryLater`: +1. Creates a new push client (`newPushClient`) to get a fresh connection +2. Retries the delivery + +On the second failure, the error is logged and returned. `PPTokenInvalid` marks the token as `NTInvalid` on either the first or retry attempt. + +### 9. TCRN minimum interval enforcement + +Cron notification interval has a hard minimum of 20 minutes. `TCRN 0` disables cron notifications. `TCRN n` where `1 <= n < 20` returns `QUOTA` error. + +### 10. Startup resubscription is concurrent per server + +`resubscribe` uses `mapConcurrently` to resubscribe to all known SMP servers in parallel. Within each server, subscriptions are paginated via `subscribeLoop` using cursor-based pagination (`afterSubId_`). + +### 11. receive separates error responses from commands + +The `receive` function processes incoming transmissions and partitions results: malformed/unauthorized requests are written directly to `sndQ` as error responses, while valid commands go to `rcvQ` for processing. This ensures protocol errors get immediate responses without competing for the command processing queue. diff --git a/spec/modules/Simplex/Messaging/Notifications/Server/Control.md b/spec/modules/Simplex/Messaging/Notifications/Server/Control.md new file mode 100644 index 000000000..897f81c16 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Notifications/Server/Control.md @@ -0,0 +1,7 @@ +# Simplex.Messaging.Notifications.Server.Control + +> Control port command protocol for NTF server administration. + +**Source**: [`Notifications/Server/Control.hs`](../../../../../../src/Simplex/Messaging/Notifications/Server/Control.hs) + +No non-obvious behavior. See source. 
diff --git a/spec/modules/Simplex/Messaging/Notifications/Server/Env.md b/spec/modules/Simplex/Messaging/Notifications/Server/Env.md new file mode 100644 index 000000000..96221a012 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Notifications/Server/Env.md @@ -0,0 +1,21 @@ +# Simplex.Messaging.Notifications.Server.Env + +> NTF server environment: configuration, subscriber state, and push provider management. + +**Source**: [`Notifications/Server/Env.hs`](../../../../../../src/Simplex/Messaging/Notifications/Server/Env.hs) + +## Non-obvious behavior + +### 1. Service credentials are lazily generated + +`mkDbService` in `newNtfServerEnv` generates service credentials on demand: when `getCredentials` is called for an SMP server, it first checks the database. If credentials exist, they are used. If not (`Nothing`), new credentials are generated via `genCredentials`, stored in the database, and returned. This happens per SMP server on first connection. + +Service credentials are only used when `useServiceCreds` is enabled in the config. + +### 2. PPApnsNull creates a no-op push client + +`newPushClient` checks `apnsProviderHost` for the push provider. `PPApnsNull` returns `Nothing`, which creates a no-op client (`\_ _ -> pure ()`). Real providers create an actual APNS connection. This is the mechanism that allows `PPApnsNull` tokens to function without push infrastructure. + +### 3. getPushClient lazy initialization + +`getPushClient` looks up the push client by provider in `pushClients` TMap. If not found, it calls `newPushClient` to create and register one. Push provider connections are established on first use, not at server startup. 
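
The lazy initialization described for `getPushClient`, together with the no-op client for `PPApnsNull`, might look roughly like the sketch below. All names here (`Provider`, `PushClient`, `mkClient`, `getClient`) are stand-ins invented for the example, a plain `TVar` holding a `Map` replaces the `TMap` mentioned in the text, and the example assumes the `stm` and `containers` packages are available.

```haskell
module Main where

import Control.Concurrent.STM (TVar, atomically, modifyTVar', newTVarIO, readTVarIO)
import qualified Data.Map.Strict as M

-- Stand-in types for illustration only.
type Provider = String
type PushClient = String -> IO ()

-- Hypothetical constructor: the "null" provider gets a client that does
-- nothing, any other provider gets a client that prints what it would send.
mkClient :: Provider -> IO PushClient
mkClient "null" = pure (\_ -> pure ())
mkClient p = do
  putStrLn ("connecting to " ++ p)
  pure (\msg -> putStrLn (p ++ " <- " ++ msg))

-- Look the client up first and create and register it only on a miss.
getClient :: TVar (M.Map Provider PushClient) -> Provider -> IO PushClient
getClient clients p = do
  known <- M.lookup p <$> readTVarIO clients
  case known of
    Just c -> pure c
    Nothing -> do
      c <- mkClient p
      atomically (modifyTVar' clients (M.insert p c))
      pure c

main :: IO ()
main = do
  clients <- newTVarIO M.empty
  c1 <- getClient clients "apns"
  c1 "first"
  c2 <- getClient clients "apns" -- reuses the registered client
  c2 "second"
  c3 <- getClient clients "null"
  c3 "dropped"
```

In this sketch the lookup and the insert are separate steps, so two threads that miss at the same time could each create a client. Whether the real module guards against that is not stated in the text above.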
diff --git a/spec/modules/Simplex/Messaging/Notifications/Server/Main.md b/spec/modules/Simplex/Messaging/Notifications/Server/Main.md new file mode 100644 index 000000000..3719dcd97 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Notifications/Server/Main.md @@ -0,0 +1,7 @@ +# Simplex.Messaging.Notifications.Server.Main + +> CLI interface and INI configuration parsing for the NTF server. + +**Source**: [`Notifications/Server/Main.hs`](../../../../../../src/Simplex/Messaging/Notifications/Server/Main.hs) + +No non-obvious behavior. Standard CLI/config boilerplate. Notable defaults: `subsBatchSize = 900`, `periodicNtfsInterval = 5 minutes`, `pushQSize = 32768`, `persistErrorInterval = 0` (disables SMP client reconnection error persistence). diff --git a/spec/modules/Simplex/Messaging/Notifications/Server/Push/APNS.md b/spec/modules/Simplex/Messaging/Notifications/Server/Push/APNS.md new file mode 100644 index 000000000..2a6d8c0b1 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Notifications/Server/Push/APNS.md @@ -0,0 +1,35 @@ +# Simplex.Messaging.Notifications.Server.Push.APNS + +> Apple Push Notification Service (APNS) client: JWT authentication, HTTP/2 delivery, and e2e encryption. + +**Source**: [`Notifications/Server/Push/APNS.hs`](../../../../../../../src/Simplex/Messaging/Notifications/Server/Push/APNS.hs) + +## Non-obvious behavior + +### 1. PNCheckMessages is not encrypted + +`PNVerification` and `PNMessage` notifications are encrypted with the shared DH secret (`C.cbEncrypt`) and padded to `paddedNtfLength` (3072 bytes) to prevent metadata leakage. `PNCheckMessages` is sent as a plain `{"checkMessages": true}` background notification — it carries no sensitive data and doesn't need e2e encryption. + +### 2. Fixed-length encryption padding + +All encrypted notifications are padded to `paddedNtfLength` (3072 bytes) regardless of actual content size. 
This prevents notification size from revealing whether it's a verification code (small) or a message batch (larger). + +### 3. JWT token caching with TTL refresh + +`getApnsJWTToken` caches the signed JWT and only regenerates it when the token age exceeds `tokenTTL` (30 minutes). No locking is used — if two threads race to refresh, last writer wins, which is acceptable since both produce valid tokens. + +### 4. HTTP/2 reconnect-on-use + +`createAPNSPushClient` registers a disconnect callback that sets `https2Client` to `Nothing`. `getApnsHTTP2Client` lazily reconnects on the next push delivery attempt. The connection is not proactively maintained. + +### 5. 503 triggers active disconnect before retry + +When APNS returns 503 (Service Unavailable), the client actively closes the HTTP/2 connection (`disconnectApnsHTTP2Client`) before throwing `PPRetryLater`. This ensures a fresh connection is established on retry rather than reusing a potentially degraded connection. + +### 6. ExpiredProviderToken is permanent + +403 errors for `ExpiredProviderToken` and `InvalidProviderToken` are classified as `PPPermanentError` rather than retryable. Since `getApnsJWTToken` just refreshed the JWT before the request, retrying with the same key would produce the same error. This indicates a configuration problem (wrong key/team ID). + +### 7. EC key type assumption + +`readECPrivateKey` uses a specific pattern match for EC keys (`PrivKeyEC_Named`). It will crash at runtime if the APNS key file contains a different key type. The comment acknowledges this limitation. 
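The JWT caching rule from section 3 (reuse the token until its age exceeds `tokenTTL`, no locking, last writer wins) can be sketched as follows. This is an illustration under assumptions, not the module's implementation: `Cached`, `getToken` and the `sign` callback are hypothetical, and an `IORef` is assumed as the cache cell.

```haskell
import Data.IORef
import Data.Time.Clock (NominalDiffTime, UTCTime, addUTCTime, diffUTCTime, getCurrentTime)

-- Hypothetical cached token: the signed value plus its creation time.
data Cached = Cached {token :: String, createdAt :: UTCTime}

tokenTTL :: NominalDiffTime
tokenTTL = 30 * 60 -- 30 minutes, as stated in the spec

-- Return the cached token while its age is within the TTL, otherwise sign a new one.
-- There is no lock: two racing refreshes both write, and the last write wins.
getToken :: IORef (Maybe Cached) -> (UTCTime -> IO String) -> UTCTime -> IO String
getToken ref sign now = do
  cached <- readIORef ref
  case cached of
    Just c | diffUTCTime now (createdAt c) <= tokenTTL -> pure (token c)
    _ -> do
      t <- sign now
      writeIORef ref (Just (Cached t now))
      pure t

main :: IO ()
main = do
  ref <- newIORef Nothing
  now <- getCurrentTime
  let sign t = pure ("jwt@" <> show t) -- placeholder for real JWT signing
  t1 <- getToken ref sign now
  t2 <- getToken ref sign (addUTCTime 60 now) -- still fresh, reused
  t3 <- getToken ref sign (addUTCTime (31 * 60) now) -- past the TTL, re-signed
  print (t1 == t2, t1 == t3) -- prints (True,False)
```

The race is tolerable because every refresh produces an independently valid token, so either write leaves the cache in a usable state.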
diff --git a/spec/modules/Simplex/Messaging/Notifications/Server/Push/APNS/Internal.md b/spec/modules/Simplex/Messaging/Notifications/Server/Push/APNS/Internal.md new file mode 100644 index 000000000..b42753e98 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Notifications/Server/Push/APNS/Internal.md @@ -0,0 +1,7 @@ +# Simplex.Messaging.Notifications.Server.Push.APNS.Internal + +> APNS HTTP header constants and JSON encoding options. + +**Source**: [`Notifications/Server/Push/APNS/Internal.hs`](../../../../../../../../src/Simplex/Messaging/Notifications/Server/Push/APNS/Internal.hs) + +No non-obvious behavior. See source. Defines APNS header names and JSON options (`UntaggedValue` sum encoding, `camelTo2 '-'` for hyphenated field names like `content-available`, `mutable-content`). diff --git a/spec/modules/Simplex/Messaging/Notifications/Server/Stats.md b/spec/modules/Simplex/Messaging/Notifications/Server/Stats.md new file mode 100644 index 000000000..971419abf --- /dev/null +++ b/spec/modules/Simplex/Messaging/Notifications/Server/Stats.md @@ -0,0 +1,19 @@ +# Simplex.Messaging.Notifications.Server.Stats + +> NTF server statistics collection with own-server breakdown and backward-compatible persistence. + +**Source**: [`Notifications/Server/Stats.hs`](../../../../../../src/Simplex/Messaging/Notifications/Server/Stats.hs) + +## Non-obvious behavior + +### 1. incServerStat double lookup + +`incServerStat` performs a non-STM IO lookup first, then only enters an STM transaction on cache miss. The STM block re-checks the map to handle races (another thread may have inserted between the IO lookup and STM entry). This avoids contention on the shared TMap in the common case where the server's counter TVar already exists. + +### 2. setNtfServerStats is not thread safe + +`setNtfServerStats` is explicitly documented as non-thread-safe and intended for server startup only (restoring from backup file). + +### 3. 
Backward-compatible parsing + +The `strP` parser uses `opt` which defaults missing fields to 0. This allows reading stats files from older server versions that don't include newer fields (`ntfReceivedAuth`, `ntfFailed`, `ntfVrf*`, etc.). diff --git a/spec/modules/Simplex/Messaging/Notifications/Server/Store.md b/spec/modules/Simplex/Messaging/Notifications/Server/Store.md new file mode 100644 index 000000000..33acdaad9 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Notifications/Server/Store.md @@ -0,0 +1,23 @@ +# Simplex.Messaging.Notifications.Server.Store + +> STM-based in-memory store for notification tokens, subscriptions, and last-notification accumulation. + +**Source**: [`Notifications/Server/Store.hs`](../../../../../../src/Simplex/Messaging/Notifications/Server/Store.hs) + +## Non-obvious behavior + +### 1. Two-level token registration index + +`tokenRegistrations` uses a nested TMap: `DeviceToken -> TMap ByteString NtfTokenId`, where the inner key is the serialized verify key. This allows **multiple concurrent registrations** per device token (with different keys), protecting against malicious registration attempts if a token is compromised. The inner key is derived via `C.toPubKey C.pubKeyBytes`. + +### 2. stmRemoveInactiveTokenRegistrations cleans up rivals + +When a token is activated, `stmRemoveInactiveTokenRegistrations` removes ALL other registrations for the same device token, including their token records, last notifications, and all subscriptions. Only the activating token's registration survives. + +### 3. stmStoreTokenLastNtf guards against stale tokens + +`stmStoreTokenLastNtf` performs a non-STM IO lookup first, then enters STM. Within the STM block, it re-checks the map to handle the race where another thread modified the map between the IO lookup and STM entry. It only inserts for tokens that exist in the `tokens` map — stale token IDs are silently ignored. + +### 4. 
tokenLastNtfs accumulates via prepend + +New notifications are prepended to the `NonEmpty PNMessageData` list via `(<|)`. The list is unbounded in the STM store — bounding is handled at the push delivery layer (the Postgres store limits to 6). diff --git a/spec/modules/Simplex/Messaging/Notifications/Server/Store/Postgres.md b/spec/modules/Simplex/Messaging/Notifications/Server/Store/Postgres.md new file mode 100644 index 000000000..3cb5c9083 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Notifications/Server/Store/Postgres.md @@ -0,0 +1,54 @@ +# Simplex.Messaging.Notifications.Server.Store.Postgres + +> PostgreSQL-backed persistent store for notification tokens, subscriptions, and last-notification delivery. + +**Source**: [`Notifications/Server/Store/Postgres.hs`](../../../../../../../src/Simplex/Messaging/Notifications/Server/Store/Postgres.hs) + +## Non-obvious behavior + +### 1. deleteNtfToken exclusive row lock + +`deleteNtfToken` acquires `FOR UPDATE` on the token row before cascading deletes. This prevents concurrent subscription inserts for this token during the deletion window. The subscriptions are aggregated by SMP server and returned for in-memory subscription cleanup. + +### 2. addTokenLastNtf atomic CTE + +`addTokenLastNtf` executes a single SQL statement with three CTEs that atomically: +1. **Upserts** the new notification into `last_notifications` (one row per token+subscription) +2. **Collects** the most recent notifications for the token (limited to `maxNtfs = 6`) +3. **Deletes** any older notifications beyond the limit + +This ensures the push notification always contains the most recent notifications across all of a token's subscriptions, with bounded storage. + +### 3. setTokenActive cleans duplicate registrations + +After activating a token, `setTokenActive` deletes all other tokens with the same `push_provider` + `push_provider_token` but different `token_id`. This cleans up incomplete or duplicate registration attempts. + +### 4. 
setTknStatusConfirmed conditional update + +Updates to `NTConfirmed` only if the current status is not already `NTConfirmed` or `NTActive`. This prevents downgrading an already-active token back to confirmed state when a delayed verification push arrives. + +### 5. Silent token date tracking + +`updateTokenDate` is called on every token read (`getNtfToken_`, `findNtfSubscription`, `getNtfSubscription`). It updates `updated_at` only when the current date differs from the stored date. This tracks token activity without explicit client action. + +### 6. getServerNtfSubscriptions marks as pending + +After reading subscriptions for resubscription, `getServerNtfSubscriptions` batch-updates their status to `NSPending`. This prevents the same subscriptions from being picked up by a concurrent resubscription pass — it acts as a "claim" mechanism. + +Only non-service-associated subscriptions (`NOT ntf_service_assoc`) are returned for individual resubscription. + +### 7. Approximate subscription count + +`getEntityCounts` uses `pg_class.reltuples` for the subscription count instead of `count(*)`. This returns an approximate value from PostgreSQL's statistics catalog, avoiding a full table scan on potentially large subscription tables. + +### 8. withFastDB vs withDB priority pools + +`withFastDB` uses `withTransactionPriority ... True` to run on the priority connection pool. Client-facing operations (token registration, subscription commands) use the priority pool, while background operations (batch status updates, resubscription) use the regular pool. + +### 9. Server upsert optimization + +`addNtfSubscription` first tries a plain SELECT for the SMP server, then falls back to INSERT with ON CONFLICT only if the server doesn't exist. This avoids the upsert overhead in the common case where the server already exists. + +### 10. Service association tracking + +`batchUpdateSrvSubStatus` atomically updates both subscription status and `ntf_service_assoc` flag. 
When notifications arrive via a service subscription (`newServiceId` is `Just`), all affected subscriptions are marked as service-associated. `removeServiceAndAssociations` resets all subscriptions for a server to `NSInactive` with `ntf_service_assoc = FALSE`. diff --git a/spec/modules/Simplex/Messaging/Notifications/Server/Store/Types.md b/spec/modules/Simplex/Messaging/Notifications/Server/Store/Types.md new file mode 100644 index 000000000..97f0fce46 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Notifications/Server/Store/Types.md @@ -0,0 +1,7 @@ +# Simplex.Messaging.Notifications.Server.Store.Types + +> Pure record types and STM conversion for notification tokens and subscriptions. + +**Source**: [`Notifications/Server/Store/Types.hs`](../../../../../../../src/Simplex/Messaging/Notifications/Server/Store/Types.hs) + +No non-obvious behavior. `mkTknData`/`mkTknRec` convert between pure records and TVar-based STM data. `tknUpdatedAt` is parsed as optional for backward compatibility with store logs that predate it. diff --git a/spec/modules/Simplex/Messaging/Notifications/Transport.md b/spec/modules/Simplex/Messaging/Notifications/Transport.md index 7c7955154..263b2459a 100644 --- a/spec/modules/Simplex/Messaging/Notifications/Transport.md +++ b/spec/modules/Simplex/Messaging/Notifications/Transport.md @@ -1,36 +1,30 @@ # Simplex.Messaging.Notifications.Transport -> Notification Router Protocol transport: manages push notification subscriptions between client and NTF Router. +> NTF protocol version negotiation, TLS handshake, and transport handle setup. **Source**: [`Notifications/Transport.hs`](../../../../../src/Simplex/Messaging/Notifications/Transport.hs) -**Protocol spec**: [`protocol/push-notifications.md`](../../../../../protocol/push-notifications.md) — SimpleX Notification Router protocol. +## Non-obvious behavior -## Overview +### 1. ALPN-dependent version range -This module implements the transport layer for the **Notification Router Protocol**. 
Per the protocol spec: "To manage notification subscriptions to SMP routers, SimpleX Notification Router provides an RPC protocol with a similar design to SimpleX Messaging Protocol router." +`ntfServerHandshake` advertises `legacyServerNTFVRange` (v1 only) when ALPN is not available (`getSessionALPN` returns `Nothing`). When ALPN is present, it advertises the full `supportedServerNTFVRange`. This is the backward-compatibility mechanism for pre-ALPN clients that cannot negotiate newer protocol features. -The protocol spec diagram shows three separate protocols in the notification flow: -1. **Notification Router Protocol** (this module): client ↔ SimpleX Notification Router — subscription management -2. **SMP protocol**: SMP Router → SimpleX Notifications Subscriber — notification signals -3. **Push provider** (e.g., APN): SimpleX Push Router → device — per the spec: "the notifications are e2e encrypted between SimpleX Notification Router and the user's device" +### 2. Version-gated features -## Differences from SMP transport +Two feature gates exist in the NTF protocol: -The NTF protocol reuses SMP's transport infrastructure but with reduced parameters: +| Version | Feature | Effect | +|---------|---------|--------| +| v2 (`authBatchCmdsNTFVersion`) | Auth key exchange + batching | `authPubKey` sent in handshake, `implySessId` and `batch` enabled | +| v3 (`invalidReasonNTFVersion`) | Token invalid reasons | `NTInvalid` responses include the reason enum | -| Property | SMP | NTF | -|----------|-----|-----| -| Block size | 16384 | 512 | -| Block encryption | Yes (v11+) | No (`encryptBlock = Nothing`) | -| Service certificates | Yes (v16+) | No (`serviceAuth = False`) | -| Version range | 6–19 | 1–3 | -| Handshake messages | 2–3 | 2 | +Pre-v2 connections have no command encryption or batching — commands are sent in plaintext within TLS. -## Same ALPN/legacy fallback pattern as SMP +### 3. 
Unused Protocol typeclass parameters -`ntfServerHandshake` uses the same pattern as `smpServerHandshake`: if ALPN is not negotiated (`getSessionALPN` returns `Nothing`), the notification router offers only `legacyServerNTFVRange` (v1 only). +`ntfClientHandshake` accepts `_proxyServer` and `_serviceKeys` parameters that are ignored. These exist because the `Protocol` typeclass (shared with SMP) requires `protocolClientHandshake` to accept them. The NTF protocol does not support proxy routing or service authentication. -## NTF handshake uses SMP shared types +### 4. Block size -The handshake reuses SMP's `THandle`, `THandleParams`, `THandleAuth` types. The `encodeAuthEncryptCmds` and `authEncryptCmdsP` helper functions are defined locally in this module (with NTF-specific version thresholds). NTF never sets `sessSecret` / `sessSecret'`, `peerClientService`, or `clientService` — these are always `Nothing`. +NTF uses a 512-byte block size (`ntfBlockSize`), significantly smaller than SMP. Notification commands and responses are short — the main payload is the `PNMessageData` which contains encrypted message metadata. diff --git a/spec/modules/Simplex/Messaging/Notifications/Types.md b/spec/modules/Simplex/Messaging/Notifications/Types.md new file mode 100644 index 000000000..bb05ccefb --- /dev/null +++ b/spec/modules/Simplex/Messaging/Notifications/Types.md @@ -0,0 +1,19 @@ +# Simplex.Messaging.Notifications.Types + +> Agent-side notification token and subscription types with action state machines. + +**Source**: [`Notifications/Types.hs`](../../../../../src/Simplex/Messaging/Notifications/Types.hs) + +## Non-obvious behavior + +### 1. NASDeleted is a transient race condition artifact + +`NASDeleted` can only exist when the notification supervisor updates a subscription record while a worker is mid-operation on that same subscription. 
The worker's post-operation database update hits a record that was already modified by the supervisor, resulting in an update to `NASDeleted` status instead of a full deletion. This status should not persist — it is cleaned up on the next supervisor pass. + +### 2. Action space split across two worker types + +`NtfSubAction` is an `Either`-like sum of `NtfSubNTFAction` (handled by NTF router workers) and `NtfSubSMPAction` (handled by SMP router workers). The supervisor writes these to the database, and each worker pool only reads its own action type. `isDeleteNtfSubAction` classifies actions across both types for the supervisor's reset logic. + +### 3. NSADelete and NSARotate are deprecated + +These `NtfSubNTFAction` values are no longer generated by current code but are retained in the type for processing legacy database records. `NSARotate` is logically "delete + recreate" while `NSADelete` is "delete notifier on NTF router + delete credentials on SMP router". From 546ee1a0e10db2de4efe3992e02ece7f8de2e416 Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Fri, 13 Mar 2026 12:43:02 +0000 Subject: [PATCH 22/61] update specs --- .../Messaging/Agent/NtfSubSupervisor.md | 24 ++++++- spec/modules/Simplex/Messaging/Agent/Store.md | 28 +++++++++ .../Messaging/Agent/Store/AgentStore.md | 62 +++++++++++++++++-- .../Simplex/Messaging/Agent/Store/SQLite.md | 8 ++- .../Simplex/Messaging/Notifications/Client.md | 8 +++ .../Messaging/Notifications/Protocol.md | 12 ++++ .../Simplex/Messaging/Notifications/Server.md | 56 ++++++++++++++++- .../Messaging/Notifications/Server/Env.md | 26 +++++++- .../Notifications/Server/Push/APNS.md | 40 ++++++++++++ .../Messaging/Notifications/Server/Stats.md | 22 ++++++- .../Messaging/Notifications/Server/Store.md | 32 ++++++++++ .../Notifications/Server/Store/Postgres.md | 40 ++++++++++++ .../Messaging/Notifications/Transport.md | 14 ++++- .../Simplex/Messaging/Notifications/Types.md 
| 2 +- 14 files changed, 357 insertions(+), 17 deletions(-) diff --git a/spec/modules/Simplex/Messaging/Agent/NtfSubSupervisor.md b/spec/modules/Simplex/Messaging/Agent/NtfSubSupervisor.md index 33cd3eacb..d55cfd746 100644 --- a/spec/modules/Simplex/Messaging/Agent/NtfSubSupervisor.md +++ b/spec/modules/Simplex/Messaging/Agent/NtfSubSupervisor.md @@ -33,7 +33,7 @@ The key decision point: when `subAction_` is `Nothing` (set by `workerErrors` af `retrySubActions` holds the list of subs-to-retry in a `TVar`. Each iteration, the action function returns only the subs that got temporary errors (via `splitResults`). The `TVar` is overwritten with this shrinking list. On success or permanent error, subs drop out. This means retry batches get smaller over time. -`splitResults` implements a three-way partition: temporary errors → retry, permanent errors → null the action + notify, successes → continue pipeline. +`splitResults` implements a three-way partition: temporary or host errors → retry, permanent errors → null the action + notify, successes → continue pipeline. ### 3. rescheduleWork deferred wake-up @@ -48,7 +48,7 @@ This is the mechanism for time-scheduled subscription health checks. When the notification router returns `AUTH` for a subscription check, the subscription is not simply marked as failed — it is fully recreated from scratch by resetting to `NSASMP NSASmpKey` state. This handles the case where the notification router has lost its subscription state (restart, data loss). The SMP worker is kicked to re-establish notifier credentials. -Non-AUTH failure statuses that are not in `subscribeNtfStatuses` also trigger recreation. +Successful check responses with statuses not in `subscribeNtfStatuses` also trigger recreation via `recreateNtfSub`. ### 5. deleteToken two-phase with restart survival @@ -73,3 +73,23 @@ These NTF worker actions are no longer generated by current code but are kept fo ### 8. 
Stats counting groups by userId `incStatByUserId` groups batch subscriptions by `userId` before incrementing stats counters, ensuring per-user counts are accurate even when a single batch contains subscriptions from multiple users. + +### 9. sendNtfSubCommand — gated on instant mode + +`sendNtfSubCommand` only enqueues work if instant notifications are active (`hasInstantNotifications` checks `NTActive` status + `NMInstant` mode). In periodic mode, the entire subscription creation pipeline is dormant — no commands reach the supervisor. + +### 10. deleteNotifierKeys — credential reset before disable + +`resetCredsGetQueue` clears the queue's notification credentials in the store *before* sending the disable command to the SMP router. This "clean first" ordering means local state is already consistent even if the network call fails. + +### 11. runNtfTknDelWorker — permanent error discards record + +When token deletion gets a permanent (non-temporary, non-host) error, the deletion record is removed from the queue rather than retried. This prevents stuck deletion records from blocking the worker. The error is reported to the client. + +### 12. getNtfServer — random selection from multiple + +When multiple notification routers are configured, one is selected randomly using `randomR` with a session-stable `TVar` generator. Single-server configurations skip the randomness. + +### 13. closeNtfSupervisor — atomic swap then cancel + +`swapTVar` atomically replaces the workers map with empty, then cancels all extracted workers. This ensures all existing workers at the point of shutdown are captured for cancellation. Prevention of new work is handled by the supervisor loop termination and operation bracket lifecycle, not by the swap itself. 
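The "atomic swap then cancel" shutdown described for `closeNtfSupervisor` can be sketched in a few lines. The names here (`Workers`, `closeWorkers`) are hypothetical, and `killThread` on plain `ThreadId` values is assumed as a stand-in for whatever cancellation the real workers use.

```haskell
import Control.Concurrent (ThreadId, forkIO, killThread, threadDelay)
import Control.Concurrent.STM
import Control.Monad (forever)
import qualified Data.Map.Strict as M

-- Hypothetical workers map keyed by server name.
type Workers = TVar (M.Map String ThreadId)

-- Atomically take every registered worker and leave an empty map behind,
-- then cancel the extracted workers outside the transaction.
closeWorkers :: Workers -> IO Int
closeWorkers ws = do
  old <- atomically $ swapTVar ws M.empty
  mapM_ killThread old
  pure (M.size old)

main :: IO ()
main = do
  ws <- newTVarIO M.empty
  t1 <- forkIO $ forever (threadDelay 100000)
  t2 <- forkIO $ forever (threadDelay 100000)
  atomically $ writeTVar ws (M.fromList [("ntf1", t1), ("ntf2", t2)])
  n <- closeWorkers ws
  left <- M.size <$> readTVarIO ws
  print (n, left) -- prints (2,0)
```

Because the swap is a single STM transaction, every worker registered at that instant is captured exactly once. As the spec notes, the swap alone does not stop a worker from being added afterwards.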
diff --git a/spec/modules/Simplex/Messaging/Agent/Store.md b/spec/modules/Simplex/Messaging/Agent/Store.md index 0eecbf8d1..ca3dabd1f 100644 --- a/spec/modules/Simplex/Messaging/Agent/Store.md +++ b/spec/modules/Simplex/Messaging/Agent/Store.md @@ -42,3 +42,31 @@ See comment on `InvShortLink`. Stored separately from the connection because 1-t ## RcvQueueSub — subscription-optimized projection `RcvQueueSub` strips cryptographic fields from `RcvQueue`, keeping only what's needed for subscription tracking in [TSessionSubs](./TSessionSubs.md). This reduces memory pressure when tracking thousands of subscriptions in STM. + +## rcvSMPQueueAddress exposes sender-facing ID + +`rcvSMPQueueAddress` constructs the `SMPQueueAddress` from a receive queue using `sndId` (not `rcvId`). The address shared with senders in connection requests contains the sender ID, the public key derived from `e2ePrivKey`, and `queueMode`. The `rcvId` is never exposed externally. + +## enableNtfs is duplicated between queue and connection + +`enableNtfs` exists on both `StoredRcvQueue` and `ConnData`. The comment marks it as "duplicated from ConnData." The queue-level copy enables subscription operations (which work at the queue level) to check notification status without loading the full connection. + +## deleteErrors — queue deletion retry counter + +`StoredRcvQueue` has a `deleteErrors :: Int` field that counts failed deletion attempts. This allows the agent to give up on queue deletion after repeated failures rather than retrying indefinitely. + +## Two-level message preparation + +`SndMsgData` optionally carries `SndMsgPrepData` with a `sndMsgBodyId` reference to a separately stored message body. `PendingMsgData` optionally carries `PendingMsgPrepData` with the actual `AMessage` body. This split allows large message bodies to be stored once and referenced by ID during the send pipeline, avoiding redundant serialization. 
+ +## Per-message retry backoff + +`PendingMsgData` includes `msgRetryState :: Maybe RI2State` — each pending message independently tracks its retry backoff state. This means messages that fail to send don't reset the retry timers of other pending messages in the same connection. + +## Soft deletion and optional contact connection + +`ConnData` has `deleted :: Bool` for soft deletion — connections are marked deleted before queue cleanup completes. `Invitation` has `contactConnId_ :: Maybe ConnId` (note the trailing underscore) — invitations can outlive their originating contact connection. + +## SEBadQueueStatus is vestigial + +`SEBadQueueStatus` is documented in the source as "Currently not used." It was intended for queue status transition validation but was never implemented. diff --git a/spec/modules/Simplex/Messaging/Agent/Store/AgentStore.md b/spec/modules/Simplex/Messaging/Agent/Store/AgentStore.md index 9fbc2beb3..d1271a6d3 100644 --- a/spec/modules/Simplex/Messaging/Agent/Store/AgentStore.md +++ b/spec/modules/Simplex/Messaging/Agent/Store/AgentStore.md @@ -8,7 +8,7 @@ At ~3700 lines, this is the largest module in the codebase. It implements all database operations for the agent, compiled with CPP for both SQLite and PostgreSQL backends. Most functions are straightforward SQL CRUD, but several patterns are non-obvious. -The module re-exports `withConnection`, `withTransaction`, `withTransactionPriority`, `firstRow`, `firstRow'`, `maybeFirstRow`, and `fromOnlyBI` from the backend-specific Common module. +The module re-exports `withConnection`, `withTransaction`, `withTransactionPriority`, `firstRow`, `firstRow'`, and `maybeFirstRow` from the backend-specific Common module. It also exports `fromOnlyBI` (a local helper) and `getWorkItem`/`getWorkItems`. ## Dual-backend compilation @@ -19,11 +19,11 @@ The module uses `#if defined(dbPostgres)` throughout. 
Key behavioral differences ## getWorkItem / getWorkItems — worker store pattern -`getWorkItem` implements the store-side pattern for the [worker framework](../Client.md): `getId → getItem → markFailed`. If `getId` or `getItem` throws an IO exception, `handleWrkErr` wraps it as `SEWorkItemError` (via `mkWorkItemError`), which signals the worker to suspend rather than retry. This prevents crash loops on corrupt data. +`getWorkItem` implements the store-side pattern for the [worker framework](../Client.md): `getId → getItem → markFailed`. If `getId` throws an IO exception, `handleWrkErr` wraps it as `SEWorkItemError` (via `mkWorkItemError`), which signals the worker to suspend rather than retry. If `getItem` fails (returning Left or throwing), `tryGetItem` calls `markFailed` (also wrapped by `handleWrkErr`) and rethrows the original error. This prevents crash loops on corrupt data. `getWorkItems` extends this to batch work items, where each item failure is independent. -**Consumed by**: `getPendingQueueMsg`, `getPendingServerCommand`, `getNextNtfSubNTFActions`, `getNextNtfSubSMPActions`, `getNextDeletedSndChunkReplica`, `getNextNtfTokenToDelete`. +**Consumed by**: `getPendingQueueMsg`, `getPendingServerCommand`, `getNextNtfSubNTFActions`, `getNextNtfSubSMPActions`, `getNextDeletedSndChunkReplica`, `getNextNtfTokenToDelete`, `getNextRcvChunkToDownload`, `getNextRcvFileToDecrypt`, `getNextSndChunkToUpload`, `getNextSndFileToPrepare`. ## Notification subscription — supervisor/worker coordination @@ -43,11 +43,11 @@ Both functions include `AND last_internal_*_msg_id = ?` in their UPDATE WHERE cl ## deleteConn — conditional delivery wait -Three deletion paths: +Four paths: 1. No timeout: immediate delete. 2. Timeout + no pending deliveries: immediate delete. 3. Timeout + pending deliveries + `deleted_at_wait_delivery` expired: delete. -4. Timeout + pending deliveries + not expired: return `Nothing` (skip). +4. 
Timeout + pending deliveries + not expired: return `Nothing` (skip deletion). This allows graceful delivery completion before connection cleanup. @@ -74,3 +74,55 @@ Generates random 12-byte IDs (base64url encoded) and retries up to 3 times on co ## setRcvQueuePrimary / setSndQueuePrimary — two-step primary swap First clears primary flag on all queues in the connection, then sets it on the target queue. Also clears `replace_*_queue_id` on the new primary — this completes the queue rotation by removing the "replacing" marker. + +## checkConfirmedSndQueueExists_ — dpPostgres typo + +The CPP guard reads `#if defined(dpPostgres)` (note `dp` instead of `db`). This means the `FOR UPDATE` clause is never included for any backend. The check still works correctly for SQLite (single-writer model) but on PostgreSQL the query runs without row locking, which could allow a TOCTOU race between checking and inserting. + +## createCommand — silent drop for deleted connections + +When `createCommand` encounters a constraint violation (the referenced connection was already deleted), it logs the error and returns successfully rather than throwing. This means commands targeting deleted connections are silently dropped. The rationale: the connection is already gone, so there's nothing useful to do with the error. + +## updateNewConnRcv — retry tolerance + +`updateNewConnRcv` accepts both `NewConnection` and `RcvConnection` connection states. The `RcvConnection` case is explicitly commented as "to allow retries" — if the initial queue insertion succeeded but the caller didn't get the response, a retry would find the connection already upgraded. `updateNewConnSnd` does not have this tolerance. + +## setLastBrokerTs — monotonic advance + +The WHERE clause includes `AND (last_broker_ts IS NULL OR last_broker_ts < ?)`, which ensures the timestamp only moves forward. Out-of-order message processing (e.g., from different queues) cannot regress the broker timestamp. 
+ +## deleteDeliveredSndMsg — FOR UPDATE + count zero check + +On PostgreSQL, acquires a `FOR UPDATE` lock on the message row before counting pending deliveries. This prevents a race where two concurrent delivery completions both see count > 0 before either deletes, then both try to delete. Only deletes the message when the count reaches exactly 0. + +## createWithRandomId' — savepoint-based retry + +Uses `withSavepoint` around each insertion attempt rather than bare execute. This is critical for PostgreSQL: a failed statement within a transaction aborts the entire transaction, but savepoints allow rolling back just the failed INSERT and retrying with a new ID. + +## Explicit row-lock functions + +`lockConnForUpdate`, `lockRcvFileForUpdate`, and `lockSndFileForUpdate` are PostgreSQL-only explicit lock acquisition that compile to no-ops on SQLite. They acquire `FOR UPDATE` locks on rows that need serialized access without modifying them. + +## XFTP work item retry ordering + +`getNextRcvChunkToDownload` and `getNextSndChunkToUpload` order by `retries ASC, created_at ASC`. This prioritizes chunks with fewer retries, ensuring a repeatedly-failing chunk doesn't starve others. Same pattern for `getNextDeletedSndChunkReplica`. + +## getRcvFileRedirects — error resilience + +When loading redirect chains, errors loading individual redirect files are silently swallowed (`either (const $ pure Nothing) (pure . Just)`). This prevents a corrupt redirect from blocking access to the main file. + +## enableNtfs defaults to True when NULL + +Both `toRcvQueue` and `rowToConnData` default `enableNtfs` to `True` when the database value is NULL (`maybe True unBI enableNtfs_`). This is a backward-compatibility default for connections created before the field existed. + +## primaryFirst — queue ordering + +The `primaryFirst` comparator sorts queues with the primary queue first (`Down` on primary flag), then by `dbReplaceQId` to place the "replacing" queue second. 
This ensures all queue lists are consistently ordered for connection reconstruction. + +## getAnyConn_ — connection GADT reconstruction + +Reconstructs the type-level `Connection'` GADT by combining connection mode with the presence/absence of receive and send queues. The `CMContact` mode only maps to `ContactConnection` (receive-only); all other combinations use `CMInvitation` mode. When neither rcv nor snd queues exist, the result is always `NewConnection` regardless of mode. + +## deleteNtfSubscription — soft delete when supervisor active + +When `updated_by_supervisor` is true, `deleteNtfSubscription` doesn't actually delete the row. Instead, it nulls out the IDs and sets status to `NASDeleted`, preserving the row for the supervisor to observe. Only when the supervisor has not intervened does it perform a real DELETE. diff --git a/spec/modules/Simplex/Messaging/Agent/Store/SQLite.md b/spec/modules/Simplex/Messaging/Agent/Store/SQLite.md index 2513882ff..14bd97f8e 100644 --- a/spec/modules/Simplex/Messaging/Agent/Store/SQLite.md +++ b/spec/modules/Simplex/Messaging/Agent/Store/SQLite.md @@ -15,7 +15,7 @@ These are set per-connection, not per-database — every new connection (includi ## simplex_xor_md5_combine — custom SQLite function -A C-exported SQLite function registered at connection time. Takes an existing `IdsHash` and a `RecipientId`, XORs the hash with the MD5 of the ID. This is the SQLite implementation of the accumulative IdsHash used by service subscriptions (see [TSessionSubs.md](../TSessionSubs.md#updateActiveService--accumulative-xor-merge)). PostgreSQL uses its native `md5()` and `decode()` functions instead. +A C-exported SQLite function registered at connection time. Takes an existing `IdsHash` and a `RecipientId`, XORs the hash with the MD5 of the ID. This is the SQLite implementation of the accumulative IdsHash used by service subscriptions (see [TSessionSubs.md](../TSessionSubs.md#updateActiveService--accumulative-xor-merge)). 
PostgreSQL uses `pgcrypto`'s `digest()` function for MD5 and a custom `xor_combine` PL/pgSQL function for the XOR. ## openSQLiteStore_ — connection swap under MVar @@ -23,4 +23,8 @@ Uses `bracketOnError` with `takeMVar`/`tryPutMVar`: takes the connection MVar, c ## storeKey — conditional key retention -`storeKey key keepKey` stores the encryption key in the `dbKey` TVar only if `keepKey` is true. This allows `reopenDBStore` to re-open without the caller re-supplying the key. If `keepKey` is false and the store is closed, `reopenDBStore` fails with "no key". +`storeKey key keepKey` stores the encryption key in the `dbKey` TVar if `keepKey` is true or if the key is empty (no encryption). This means unencrypted stores can always be reopened. If `keepKey` is false and the key is non-empty, `reopenDBStore` fails with "no key". + +## dbBusyLoop — initial connection retry + +`connectSQLiteStore` wraps `connectDB` in `dbBusyLoop` to handle database locking during initial connection. All transactions (`withTransactionPriority`) are also wrapped in `dbBusyLoop` as a retry layer on top of the `busy_timeout` PRAGMA. diff --git a/spec/modules/Simplex/Messaging/Notifications/Client.md b/spec/modules/Simplex/Messaging/Notifications/Client.md index d2c3eef0e..ffecbcad0 100644 --- a/spec/modules/Simplex/Messaging/Notifications/Client.md +++ b/spec/modules/Simplex/Messaging/Notifications/Client.md @@ -13,3 +13,11 @@ ### 2. Batch operations return per-item errors `ntfCreateSubscriptions` and `ntfCheckSubscriptions` return `NonEmpty (Either NtfClientError result)` — individual items in a batch can fail independently. Callers must handle partial success (some created, some failed). The singular variants throw on any error. + +### 3. Default port is 443 + +`defaultNTFClientConfig` sets the default transport to `("443", transport @TLS)`. Unlike the SMP protocol which typically uses port 5223, the NTF protocol defaults to the standard HTTPS port. + +### 4. 
okNtfCommand parameter ordering + +`okNtfCommand` has an unusual parameter order — the command comes first, then client, mode, key, entityId. This enables partial application in the `ntfDeleteToken`, `ntfVerifyToken` etc. definitions, where the command is fixed and the remaining parameters flow through. diff --git a/spec/modules/Simplex/Messaging/Notifications/Protocol.md b/spec/modules/Simplex/Messaging/Notifications/Protocol.md index 9354e2086..71daf771d 100644 --- a/spec/modules/Simplex/Messaging/Notifications/Protocol.md +++ b/spec/modules/Simplex/Messaging/Notifications/Protocol.md @@ -41,3 +41,15 @@ Token status `NTInvalid` allows subscription commands (SNEW, SCHK, SDEL), which ### 7. DeviceToken hex validation `DeviceToken` string parsing has two paths: a hardcoded literal match for `"apns_null test_ntf_token"` (test tokens), and hex string validation for real tokens (must be even-length hex). The wire encoding (`smpP`) does not perform this validation — it accepts any `ByteString`. + +### 8. SMPQueueNtf parsing applies updateSMPServerHosts + +Both `smpP` and `strP` for `SMPQueueNtf` apply `updateSMPServerHosts` to the parsed SMP server. This normalizes server host addresses on deserialization, ensuring consistent comparison even if the on-wire format uses different host representations. + +### 9. NRTknId response tag comment + +The `NRTknId_` tag encodes as `"IDTKN"` with a source comment: "it should be 'TID', 'SID'". This indicates a naming inconsistency that was preserved for backward compatibility — the tag names don't follow the pattern of other NTF protocol tags. + +### 10. useServiceAuth is False + +The `Protocol` instance explicitly returns `False` for `useServiceAuth`, meaning the NTF protocol never uses service-level authentication. All authentication is entity-level (per token/subscription). 
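The two-path string parsing described in item 7 (DeviceToken hex validation) might look roughly like the sketch below. It is a simplified illustration under stated assumptions: `ParsedToken`, `testTokenLiteral`, and `parseTokenString` are hypothetical names, the hex path is applied to the whole input (any push-provider prefix that real tokens carry is ignored), and the handling of an empty string is not specified in the text.

```haskell
import Data.Char (isHexDigit)

-- Hypothetical result type for illustration; not the library's DeviceToken.
data ParsedToken = TestToken | HexToken String
  deriving (Eq, Show)

-- The literal quoted in item 7, matched as an opaque whole string.
testTokenLiteral :: String
testTokenLiteral = "apns_null test_ntf_token"

-- Path 1: exact match on the test literal.
-- Path 2: the input must be hex with an even number of characters.
-- Assumption: the empty string passes path 2 here; the text does not say
-- how the real parser treats it.
parseTokenString :: String -> Either String ParsedToken
parseTokenString s
  | s == testTokenLiteral = Right TestToken
  | odd (length s) = Left "odd-length hex"
  | all isHexDigit s = Right (HexToken s)
  | otherwise = Left "non-hex character"

main :: IO ()
main = mapM_ (print . parseTokenString) [testTokenLiteral, "0a1b2c", "abc", "zz"]
```

As item 7 notes, the wire encoding performs no such check, so a validation of this kind would only guard the string-parsing path.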
diff --git a/spec/modules/Simplex/Messaging/Notifications/Server.md b/spec/modules/Simplex/Messaging/Notifications/Server.md index 9c88cf7a0..5c74878d7 100644 --- a/spec/modules/Simplex/Messaging/Notifications/Server.md +++ b/spec/modules/Simplex/Messaging/Notifications/Server.md @@ -27,12 +27,12 @@ When `verifyNtfTransmission` encounters an AUTH error (entity not found), it cal ### 2. TNEW idempotent re-registration When TNEW is received for an already-registered token, the server: -1. Looks up the existing token via `findNtfTokenRegistration` +1. Looks up the existing token via `findNtfTokenRegistration` (matches on push provider, device token, AND verify key) 2. Verifies the DH secret matches (recomputed from the new `dhPubKey` and stored `tknDhPrivKey`) 3. If DH secrets differ → AUTH error (prevents token hijacking) 4. If they match → re-sends verification push notification -This makes TNEW safe for client retransmission after connection drops. +If the verify key doesn't match in step 1, the lookup returns `Nothing` and a new token is created instead — the DH secret check never runs. This makes TNEW safe for client retransmission after connection drops. ### 3. SNEW idempotent subscription @@ -77,3 +77,55 @@ Cron notification interval has a hard minimum of 20 minutes. `TCRN 0` disables c ### 11. receive separates error responses from commands The `receive` function processes incoming transmissions and partitions results: malformed/unauthorized requests are written directly to `sndQ` as error responses, while valid commands go to `rcvQ` for processing. This ensures protocol errors get immediate responses without competing for the command processing queue. + +### 12. Maintenance mode saves state then exits immediately + +When `maintenance` is set in `startOptions`, the server restores stats, calls `stopServer` (closes DB, saves stats), and exits with `exitSuccess`. It never starts transport listeners, subscriber threads, or resubscription. 
This provides a way to run database migrations without the server serving traffic. + +### 13. Resubscription runs as a detached fork + +`resubscribe` is launched via `forkIO` before `raceAny_` starts — it is **not part of the `raceAny_` group**. Most exceptions are silently lost per `forkIO` semantics. However, `ExitCode` exceptions (like `exitFailure` from pattern 20) are special-cased by GHC's runtime and propagate to the main thread, terminating the process. + +### 14. TNEW re-registration resets status for non-verifiable tokens + +When a re-registration TNEW matches on DH secret but `allowTokenVerification tknStatus` is `False` (token is `NTNew`, `NTInvalid`, or `NTExpired`), the server resets status to `NTRegistered` before sending the verification push. This makes TNEW a "status repair" mechanism — clients with stuck tokens can restart the verification flow by re-registering with the same DH key. + +### 15. DELD unconditionally updates status (no session validation) + +Unlike `SMP.END` which checks `activeClientSession'` to prevent stale session messages from changing state, `SMP.DELD` updates subscription status to `NSDeleted` unconditionally. This is correct because DELD means the queue was permanently deleted on the SMP router — the information is valid regardless of which session reports it. + +### 16. TRPL generates new code but reuses the DH key + +`TRPL` (token replace) creates a new registration code and resets status to `NTRegistered`, but does NOT generate a new server DH key pair. The existing `tknDhPrivKey` and `tknDhSecret` are preserved — only the push provider token and registration code change. The encrypted channel between client and NTF router persists across device token replacements. + +### 17. PNMessage delivery requires NTActive, verification and cron do not + +`ntfPush` applies `checkActiveTkn` only to `PNMessage` notifications. 
Verification pushes (`PNVerification`) and cron check-messages pushes (`PNCheckMessages`) are delivered regardless of token status. This is necessary because verification pushes must be sent before NTActive, and cron pushes are already filtered at the database level. + +### 18. CAServiceSubscribed validates count and hash with warning-only behavior + +When a service subscription is confirmed, the NTF router compares expected and confirmed subscription count and IDs hash. Mismatches in either are logged as warnings but no corrective action is taken. Only when both match is an informational message logged. + +### 19. subscribeLoop uses 100x database batch multiplier + +`dbBatchSize = batchSize * 100` reads subscriptions from the database in chunks 100 times larger than the SMP subscription batches. This reduces database round-trips during resubscription while keeping individual SMP batches small enough to avoid overwhelming SMP routers. + +### 20. subscribeLoop calls exitFailure on database error + +If `getServerNtfSubscriptions` returns `Left _` during startup resubscription, the server terminates via `exitFailure`. Since `resubscribe` runs in a forked thread (pattern 13), this `exitFailure` terminates the entire process — a transient database error during startup resubscription kills the server. + +### 21. Stats log aligns to wall-clock time of day + +The stats logging thread calculates an `initialDelay` to synchronize the first flush to `logStatsStartTime`. If the target time already passed today, it adds 86400 seconds to schedule for the next day. Subsequent flushes occur at exact `logInterval` cadence from that aligned start point. + +### 22. NMSG AUTH errors silently counted, not logged + +When `addTokenLastNtf` returns `Left AUTH` (notification for a queue whose subscription/token association is invalid), the server increments `ntfReceivedAuth` but takes no corrective action. Other error types are silently ignored. 
This is expected — subscriptions may be deleted while messages are in-flight. + +### 23. PNVerification delivery transitions token to NTConfirmed + +When a verification push is successfully delivered to the push provider, `setTknStatusConfirmed` transitions the token to `NTConfirmed`, but only if not already `NTConfirmed` or `NTActive`. This creates a two-phase confirmation: push delivery confirms the channel works (`NTConfirmed`), then TVFY confirms the client received it (`NTActive`). + +### 24. disconnectTransport always passes noSubscriptions = True + +Unlike the SMP router which checks active subscriptions before disconnecting idle clients, the NTF router always returns `True` for the "no subscriptions" check. NTF clients are disconnected purely on inactivity timeout — the NTF protocol has no long-lived client subscriptions. diff --git a/spec/modules/Simplex/Messaging/Notifications/Server/Env.md b/spec/modules/Simplex/Messaging/Notifications/Server/Env.md index 96221a012..c266390d2 100644 --- a/spec/modules/Simplex/Messaging/Notifications/Server/Env.md +++ b/spec/modules/Simplex/Messaging/Notifications/Server/Env.md @@ -8,7 +8,7 @@ ### 1. Service credentials are lazily generated -`mkDbService` in `newNtfServerEnv` generates service credentials on demand: when `getCredentials` is called for an SMP server, it first checks the database. If credentials exist, they are used. If not (`Nothing`), new credentials are generated via `genCredentials`, stored in the database, and returned. This happens per SMP server on first connection. +`mkDbService` in `newNtfServerEnv` generates service credentials on demand: when `getCredentials` is called for an SMP server, it checks the database. If the server is known and already has credentials, they are reused. If the server is known but has no credentials yet (first connection), new credentials are generated via `genCredentials`, stored in the database, and returned. 
If the server is not in the database at all, `PCEServiceUnavailable` is thrown (this case should not occur in practice, as clients only connect to servers already tracked in the database). Service credentials are only used when `useServiceCreds` is enabled in the config. @@ -19,3 +19,27 @@ Service credentials are only used when `useServiceCreds` is enabled in the confi ### 3. getPushClient lazy initialization `getPushClient` looks up the push client by provider in `pushClients` TMap. If not found, it calls `newPushClient` to create and register one. Push provider connections are established on first use, not at server startup. + +### 4. Service credential validity: 25h backdating, ~2700yr forward + +`genCredentials` creates self-signed Ed25519 certificates valid from 25 hours in the past to `24 * 999999` hours (~2,739 years) in the future. The 25-hour backdating protects against clock skew between NTF and SMP routers. The near-permanent forward validity avoids the need for credential rotation infrastructure. + +### 5. newPushClient race creates duplicate clients + +`newPushClient` atomically inserts into `pushClients` after creating the client. A concurrent `getPushClient` call between creation start and TMap insert will see `Nothing`, create a second client, and overwrite the first. This race is tolerable — APNS connections are cheap and the overwritten client is garbage collected. + +### 6. Bidirectional activity timestamps + +`NtfServerClient` has separate `rcvActiveAt` and `sndActiveAt` TVars, both initialized to connection time and updated independently. `disconnectTransport` considers both — a client that only receives (or only sends) is still considered active. + +### 7. pushQ bounded TBQueue creates backpressure + +`pushQ` in `NtfPushServer` is a `TBQueue` sized by `pushQSize`. When full, any thread writing to it (NMSG processing, periodic cron, verification) blocks in STM until space is available. 
This prevents the push delivery pipeline from being overwhelmed. + +### 8. subscriberSeq provides monotonic session variable ordering + +The `subscriberSeq` TVar is used by `getSessVar` to assign monotonically increasing IDs to subscriber session variables. `removeSessVar` uses compare-and-swap with this ID — only the variable with the matching ID can be removed, preventing stale removal when a new subscriber has already replaced the old one. + +### 9. SMPSubscriber holds Weak ThreadId for GC-based cleanup + +`subThreadId` is `Weak ThreadId`, not `ThreadId`. Using `Weak ThreadId` allows the GC to collect thread resources when no strong references remain. `stopSubscriber` uses `deRefWeak` to obtain the `ThreadId` (if the thread hasn't been GC'd) before calling `killThread`. The `Nothing` case (thread already collected) is simply skipped. diff --git a/spec/modules/Simplex/Messaging/Notifications/Server/Push/APNS.md b/spec/modules/Simplex/Messaging/Notifications/Server/Push/APNS.md index 2a6d8c0b1..d2a49471d 100644 --- a/spec/modules/Simplex/Messaging/Notifications/Server/Push/APNS.md +++ b/spec/modules/Simplex/Messaging/Notifications/Server/Push/APNS.md @@ -33,3 +33,43 @@ When APNS returns 503 (Service Unavailable), the client actively closes the HTTP ### 7. EC key type assumption `readECPrivateKey` uses a specific pattern match for EC keys (`PrivKeyEC_Named`). It will crash at runtime if the APNS key file contains a different key type. The comment acknowledges this limitation. + +### 8. JWT signature uses DER-encoded ASN.1, not raw r||s + +`signedJWTToken` serializes the ECDSA signature as a DER-encoded ASN.1 SEQUENCE of two INTEGERs, then base64url-encodes it. RFC 7518 Section 3.4 requires raw concatenation of fixed-length r and s values instead. This deviation works because Apple's APNS server accepts DER-encoded signatures, but it would break if Apple enforced strict JWS compliance. + +### 9. 
Two different base64url encodings + +The encryption path uses `U.encode` (base64url **with** padding `=`), while the JWT path uses `U.encodeUnpadded` (base64url **without** padding). JWT requires unpadded base64url per RFC 7515, but the encrypted notification ciphertext is padded before being embedded as a JSON text value. + +### 10. Error response defaults to empty string on parse failure + +If the APNS error response body is empty, malformed, or not JSON, `decodeStrict'` returns `Nothing` and the reason defaults to `""`. This empty string never matches named error patterns, so unparseable error bodies fall through to the catch-all of whichever status code branch matches. For 410, this means a malformed body is treated as `PPRetryLater` rather than a token invalidation. + +### 11. 410 unknown reasons are retryable, unlike 400/403 unknowns + +Unknown 410 (Gone) reasons fall through to `PPRetryLater`, while unknown 400 and 403 reasons fall through to `PPResponseError`. This means an unexpected APNS 410 reason string triggers retry behavior rather than permanent failure. + +### 12. 429 TooManyRequests is not explicitly handled + +There is a commented-out note but no actual 429 handler. A rate-limiting response falls through to the `otherwise` branch and becomes `PPResponseError`, surfacing as a generic error rather than a retryable condition. + +### 13. Nonce generation is STM-atomic, separate from encryption + +The per-notification nonce is generated inside `atomically` using the `ChaChaDRG` TVar, guaranteeing uniqueness under concurrent delivery. The nonce is then used by `cbEncrypt` outside STM. This separation means the nonce is committed to the DRG state even if encryption or send subsequently fails — correct behavior since nonce reuse would be catastrophic. + +### 14. Background notifications use priority 5, alerts use default 10 + +`apnsRequest` conditionally appends `apns-priority: 5` only for `APNSBackground` notifications. 
Alert and mutable-content notifications omit the header, relying on APNS's default priority of 10. Apple requires background pushes to use priority 5 — using 10 can cause APNS to reject them. + +### 15. APNSErrorResponse is data, not newtype + +The comment explicitly states `APNSErrorResponse` is `data` rather than `newtype` "to have a correct JSON encoding as a record." With `deriveFromJSON`, a newtype around `Text` would serialize as a bare string, not `{"reason": "..."}`. The `data` wrapper forces record encoding matching APNS's JSON error format. + +### 16. HTTP/2 requests go through a serializing queue + +`sendRequest` routes through the HTTP2Client's `reqQ` (a `TBQueue`), serializing all requests through a single sender thread. Concurrent push deliveries are implicitly serialized at the HTTP/2 layer, meaning high-throughput scenarios bottleneck on this queue rather than utilizing HTTP/2's multiplexing. + +### 17. Connection initialization is fire-and-forget + +`createAPNSPushClient` calls `connectHTTPS2` and discards the result with `void`. If the initial connection fails, the error is only logged — the client is still created. The first push delivery triggers `getApnsHTTP2Client` which reconnects. This means the server can start even if APNS is unreachable. diff --git a/spec/modules/Simplex/Messaging/Notifications/Server/Stats.md b/spec/modules/Simplex/Messaging/Notifications/Server/Stats.md index 971419abf..4a4439f54 100644 --- a/spec/modules/Simplex/Messaging/Notifications/Server/Stats.md +++ b/spec/modules/Simplex/Messaging/Notifications/Server/Stats.md @@ -8,7 +8,7 @@ ### 1. incServerStat double lookup -`incServerStat` performs a non-STM IO lookup first, then only enters an STM transaction on cache miss. The STM block re-checks the map to handle races (another thread may have inserted between the IO lookup and STM entry). This avoids contention on the shared TMap in the common case where the server's counter TVar already exists. 
+`incServerStat` performs a non-STM IO lookup first. On cache hit, the STM transaction only touches the per-server `TVar Int` without reading the shared TMap, avoiding contention. On cache miss, the STM block re-checks the map to handle races (another thread may have inserted between the IO lookup and STM entry). ### 2. setNtfServerStats is not thread safe @@ -17,3 +17,23 @@ ### 3. Backward-compatible parsing The `strP` parser uses `opt` which defaults missing fields to 0. This allows reading stats files from older server versions that don't include newer fields (`ntfReceivedAuth`, `ntfFailed`, `ntfVrf*`, etc.). + +### 4. getNtfServerStatsData is a non-atomic snapshot + +`getNtfServerStatsData` reads each `IORef` and `TMap` field sequentially in plain `IO`, not inside a single STM transaction. The returned `NtfServerStatsData` is not a consistent point-in-time snapshot — invariants like "received >= delivered" may not hold. The same applies to `getStatsByServer`, which does one `readTVarIO` for the map root TVar, then a separate `readTVarIO` for each per-server TVar. This is acceptable for periodic reporting where approximate consistency suffices. + +### 5. Mixed IORef/TVar concurrency primitives + +Aggregate counters (`ntfReceived`, `ntfDelivered`, etc.) use `IORef Int` incremented via `atomicModifyIORef'_`, while per-server breakdowns use `TMap Text (TVar Int)` incremented atomically via STM in `incServerStat`. Although both individual operations are atomic, the aggregate and per-server increments are separate operations, so their values can drift: a thread could increment the aggregate `IORef` before `incServerStat` runs, or vice versa. + +### 6. setStatsByServer replaces TMap atomically but orphans old TVars + +`setStatsByServer` builds a fresh `Map Text (TVar Int)` in IO via `newTVarIO`, then atomically replaces the TMap's root TVar. 
Old per-server TVars are not reused — any other thread holding a reference from a prior `TM.lookupIO` would modify an orphaned counter. Safe only because it's called at startup (like `setNtfServerStats`), but lacks the explicit "not thread safe" comment. + +### 7. Positional parser format despite key=value appearance + +The parser is strictly positional: fields must appear in exactly the serialization order. The `opt` alternatives only handle entirely absent fields (defaulting to 0), not reordered fields. Despite the `key=value` on-disk appearance, this is a sequential format — the named prefixes are for human readability, not key-lookup parsing. + +### 8. B.unlines trailing newline asymmetry + +`strEncode` uses `B.unlines`, which appends `\n` after every element including the last. The parser compensates with `optional A.endOfLine` on the last field. The file always ends with `\n`, but the parser tolerates its absence. diff --git a/spec/modules/Simplex/Messaging/Notifications/Server/Store.md b/spec/modules/Simplex/Messaging/Notifications/Server/Store.md index 33acdaad9..05a7e70e2 100644 --- a/spec/modules/Simplex/Messaging/Notifications/Server/Store.md +++ b/spec/modules/Simplex/Messaging/Notifications/Server/Store.md @@ -21,3 +21,35 @@ When a token is activated, `stmRemoveInactiveTokenRegistrations` removes ALL oth ### 4. tokenLastNtfs accumulates via prepend New notifications are prepended to the `NonEmpty PNMessageData` list via `(<|)`. The list is unbounded in the STM store — bounding is handled at the push delivery layer (the Postgres store limits to 6). + +### 5. stmDeleteNtfToken prunes empty registration maps + +When `stmDeleteNtfToken` removes a token, it deletes the entry from the inner `TMap` of `tokenRegistrations`, then checks whether that inner map is now empty via `TM.null`. If empty, it removes the outer `DeviceToken` key entirely, preventing unbounded growth of empty inner maps. 
In contrast, `stmRemoveInactiveTokenRegistrations` does **not** perform this cleanup — the surviving active token's registration always remains. + +### 6. stmRemoveTokenRegistration is identity-guarded + +`stmRemoveTokenRegistration` looks up the registration entry for the token's own verify key and only deletes it if the stored `NtfTokenId` matches the token's own ID. This guard prevents a token from accidentally removing a **different** token's registration that was inserted under the same `(DeviceToken, verifyKey)` pair due to a re-registration race. + +### 7. stmDeleteNtfToken silently succeeds on missing tokens + +`stmDeleteNtfToken` uses `lookupDelete` chained with monadic bind over `Maybe`. If the token ID does not exist in the `tokens` map, the registration-cleanup branch is silently skipped, and the function still proceeds to delete from `tokenLastNtfs` and `deleteTokenSubs`. It returns an empty list rather than signaling an error — the caller cannot distinguish "deleted a token with no subscriptions" from "token never existed." + +### 8. deleteTokenSubs returns SMP queues for upstream unsubscription + +`deleteTokenSubs` atomically collects all `SMPQueueNtf` values from the deleted subscriptions and returns them. This is how the server layer knows which SMP notifier subscriptions to tear down. `stmRemoveInactiveTokenRegistrations` discards this list (`void $`), meaning rival-token cleanup does **not** trigger SMP unsubscription — only explicit token deletion does. + +### 9. stmAddNtfSubscription always returns Just (vestigial Maybe) + +`stmAddNtfSubscription` has return type `STM (Maybe ())` with a comment "return Nothing if subscription existed before," but **unconditionally returns `Just ()`**. `TM.insert` overwrites any existing subscription silently. The `Maybe` return type is vestigial — the function never detects duplicates. + +### 10. 
stmDeleteNtfSubscription leaves empty tokenSubscriptions entries + +When `stmDeleteNtfSubscription` removes a subscription, it deletes the `subId` from the token's `Set NtfSubscriptionId` in `tokenSubscriptions` but never checks whether the set became empty. Tokens with all subscriptions individually deleted accumulate empty set entries — these are only cleaned up when the token itself is deleted via `deleteTokenSubs`. + +### 11. stmSetNtfService — asymmetric cleanup with Postgres store + +`stmSetNtfService` uses `maybe TM.delete TM.insert` to either remove or set the service association for an SMP server. This is purely a key-value update with no cascading effects on subscriptions. The Postgres store's `removeServiceAndAssociations` handles subscription cleanup separately, meaning the STM and Postgres stores have **different cleanup semantics** for service removal. + +### 12. Subscription index triple-write invariant + +`stmAddNtfSubscription` writes to three maps atomically: `subscriptions` (subId → data), `subscriptionLookup` (smpQueue → subId), and `tokenSubscriptions` (tokenId → Set subId). Single-subscription deletion (`stmDeleteNtfSubscription`) cleans the first two but only removes from the Set in the third. Bulk-token deletion (`deleteTokenSubs`) deletes the outer `tokenSubscriptions` entry entirely. Different deletion paths have different completeness guarantees. diff --git a/spec/modules/Simplex/Messaging/Notifications/Server/Store/Postgres.md b/spec/modules/Simplex/Messaging/Notifications/Server/Store/Postgres.md index 3cb5c9083..440797539 100644 --- a/spec/modules/Simplex/Messaging/Notifications/Server/Store/Postgres.md +++ b/spec/modules/Simplex/Messaging/Notifications/Server/Store/Postgres.md @@ -52,3 +52,43 @@ Only non-service-associated subscriptions (`NOT ntf_service_assoc`) are returned ### 10. Service association tracking `batchUpdateSrvSubStatus` atomically updates both subscription status and `ntf_service_assoc` flag. 
When notifications arrive via a service subscription (`newServiceId` is `Just`), all affected subscriptions are marked as service-associated. `removeServiceAndAssociations` resets all subscriptions for a server to `NSInactive` with `ntf_service_assoc = FALSE`. + +### 11. uninterruptibleMask_ wraps most store operations + +`withDB_` and `withClientDB` wrap the database transaction in `E.uninterruptibleMask_`. This prevents async exceptions from interrupting a PostgreSQL transaction mid-flight, which could leave a connection in a half-committed state and corrupt the pool. Functions that take a raw `DB.Connection` parameter (`getNtfServiceCredentials`, `setNtfServiceCredentials`, `updateNtfServiceId`) operate within a caller-managed transaction and are not independently wrapped. `getUsedSMPServers` uses `withTransaction` directly (intentionally: it is expected to crash on error at startup). + +### 12. Silent error swallowing with sentinel returns + +`withDB_` catches all `SomeException`, logs the error, and returns `Left (STORE msg)` — callers never see database failures as exceptions. Additionally, `batchUpdateSrvSubStatus` and `batchUpdateSrvSubErrors` use `fromRight (-1)` to convert database errors into a `-1` count, and `withPeriodicNtfTokens` uses `fromRight 0`, making database failures indistinguishable from "zero results" at the call site. + +### 13. getUsedSMPServers uncorrelated EXISTS + +The `EXISTS` subquery in `getUsedSMPServers` has no join condition to the outer `smp_servers` table — it returns ALL servers if ANY subscription anywhere has a subscribable status. This is intentional for server startup: the server needs all SMP server records (including `ServiceSub` data) to rebuild in-memory state, and the EXISTS clause is a cheap guard against an empty subscription table. + +### 14. 
Trigger-maintained XOR hash aggregates + +Subscription insert, update, and delete trigger functions incrementally maintain `smp_notifier_count` and `smp_notifier_ids_hash` on `smp_servers` using XOR-based hash aggregation of MD5 digests. Every `batchUpdateSrvSubStatus` or cascade-delete from token deletion implicitly fires these triggers. The XOR hash is self-inverting: adding and removing the same notifier ID restores the previous hash. `updateNtfServiceId` resets these counters to zero when the service ID changes, invalidating the previous aggregate. + +### 15. updateNtfServiceId asymmetric credential cleanup + +Setting a new service ID preserves existing TLS credentials (`ntf_service_cert`, etc.) while only resetting aggregate counters. Setting service ID to `NULL` clears both credentials AND counters. In both cases, if a previous service ID existed, all subscription associations are reset first via `removeServiceAssociation_`, and a `logError` is emitted — treating a service ID change as anomalous. + +### 16. Server upsert no-op DO UPDATE for RETURNING + +The `insertServer` fallback uses `ON CONFLICT ... DO UPDATE SET smp_host = EXCLUDED.smp_host` — a no-op update solely to make `RETURNING smp_server_id` work. PostgreSQL's `ON CONFLICT DO NOTHING` does not support `RETURNING` for conflicting rows, so this pattern forces a row to always be "affected" and thus returnable. This handles races where two concurrent `addNtfSubscription` calls both miss the initial SELECT. + +### 17. getNtfServiceCredentials FOR UPDATE serializes provisioning + +`getNtfServiceCredentials` acquires `FOR UPDATE` on the server row even though it is a read operation. The caller needs to atomically check whether credentials exist and then set them in the same transaction. Without `FOR UPDATE`, two concurrent provisioning attempts could both see `Nothing` and both provision, resulting in credential mismatch. + +### 18. 
deleteNtfToken string_agg with hex parsing + +`deleteNtfToken` uses `string_agg(s.smp_notifier_id :: TEXT, ',')` to aggregate `BYTEA` notifier IDs into comma-separated text, then parses it with `parseByteaString`, which drops the `\x` prefix and hex-decodes. `mapMaybe` silently drops any IDs that fail hex decoding, which could mask data corruption. + +### 19. withPeriodicNtfTokens streams with DB.fold + +`withPeriodicNtfTokens` uses `DB.fold` to stream token rows one at a time through a callback that performs IO (sending push notifications), meaning the database transaction and connection are held open for the entire duration of all notifications. This is deliberately routed through the non-priority pool to avoid blocking client-facing operations. + +### 20. Cursor-based pagination with byte-ordering + +`getServerNtfSubscriptions` uses `subscription_id > ?` with `ORDER BY subscription_id LIMIT ?`. Since `subscription_id` is `BYTEA`, ordering is by raw byte comparison. The batch status update uses a `FROM (VALUES ...)` pattern instead of `WHERE IN (...)`, and the `s.status != upd.status` guard prevents no-op writes from firing XOR hash triggers. diff --git a/spec/modules/Simplex/Messaging/Notifications/Transport.md b/spec/modules/Simplex/Messaging/Notifications/Transport.md index 263b2459a..df4021475 100644 --- a/spec/modules/Simplex/Messaging/Notifications/Transport.md +++ b/spec/modules/Simplex/Messaging/Notifications/Transport.md @@ -8,7 +8,7 @@ ### 1. ALPN-dependent version range -`ntfServerHandshake` advertises `legacyServerNTFVRange` (v1 only) when ALPN is not available (`getSessionALPN` returns `Nothing`). When ALPN is present, it advertises the full `supportedServerNTFVRange`. This is the backward-compatibility mechanism for pre-ALPN clients that cannot negotiate newer protocol features. +`ntfServerHandshake` advertises `legacyServerNTFVRange` (v1 only) when ALPN is not available (`getSessionALPN` returns `Nothing`).
When ALPN is present, it advertises the caller-provided `ntfVRange`. This is the backward-compatibility mechanism for pre-ALPN clients that cannot negotiate newer protocol features. ### 2. Version-gated features @@ -23,8 +23,16 @@ Pre-v2 connections have no command encryption or batching — commands are sent ### 3. Unused Protocol typeclass parameters -`ntfClientHandshake` accepts `_proxyServer` and `_serviceKeys` parameters that are ignored. These exist because the `Protocol` typeclass (shared with SMP) requires `protocolClientHandshake` to accept them. The NTF protocol does not support proxy routing or service authentication. +`ntfClientHandshake` accepts `_proxyServer` and `_serviceKeys` parameters that are ignored. These are passed through from the `Protocol` typeclass's `protocolClientHandshake` method for consistency with SMP. A third parameter (`Maybe C.KeyPairX25519` for key agreement) is discarded at the Protocol instance wrapper level. The NTF protocol does not support proxy routing or service authentication. ### 4. Block size -NTF uses a 512-byte block size (`ntfBlockSize`), significantly smaller than SMP. Notification commands and responses are short — the main payload is the `PNMessageData` which contains encrypted message metadata. +NTF uses a 512-byte block size (`ntfBlockSize`), significantly smaller than SMP. This is sufficient because NTF protocol commands (TNEW, SNEW, TCHK, etc.) and their responses are short. `PNMessageData` (which contains encrypted message metadata) is not sent over the NTF transport — it is delivered via APNS push notifications. + +### 5. Initial THandle has version 0 + +`ntfTHandle` creates a THandle with `thVersion = VersionNTF 0` — a version that no real protocol supports. This is a placeholder value that gets overwritten during version negotiation. All feature gates check `v >= authBatchCmdsNTFVersion` (v2), so the v0 placeholder disables all optional features. + +### 6. 
Server handshake always sends authPubKey + +`ntfServerHandshake` always includes `authPubKey = Just sk` in the server handshake, regardless of the advertised version range. The encoding functions (`encodeAuthEncryptCmds`) then decide whether to actually serialize it based on the max version. This means the key is computed even when it won't be sent. diff --git a/spec/modules/Simplex/Messaging/Notifications/Types.md b/spec/modules/Simplex/Messaging/Notifications/Types.md index bb05ccefb..97cc66913 100644 --- a/spec/modules/Simplex/Messaging/Notifications/Types.md +++ b/spec/modules/Simplex/Messaging/Notifications/Types.md @@ -16,4 +16,4 @@ ### 3. NSADelete and NSARotate are deprecated -These `NtfSubNTFAction` values are no longer generated by current code but are retained in the type for processing legacy database records. `NSARotate` is logically "delete + recreate" while `NSADelete` is "delete notifier on NTF router + delete credentials on SMP router". +These `NtfSubNTFAction` values are no longer generated by current code but are retained in the type for processing legacy database records. `NSARotate` is logically "delete + recreate" while `NSADelete` is "delete subscription on NTF server + delete notifier credentials on SMP server". 
From f131531f5aefe51d0525fab59087affefb1172ee Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Fri, 13 Mar 2026 16:12:20 +0000 Subject: [PATCH 23/61] xftp specs --- spec/modules/Simplex/FileTransfer/Agent.md | 86 +++++++++++++++++++ spec/modules/Simplex/FileTransfer/Client.md | 37 ++++++++ .../Simplex/FileTransfer/Client/Agent.md | 27 ++++++ .../Simplex/FileTransfer/Client/Main.md | 43 ++++++++++ spec/modules/Simplex/FileTransfer/Crypto.md | 31 +++++++ .../Simplex/FileTransfer/Description.md | 43 ++++++++++ spec/modules/Simplex/FileTransfer/Protocol.md | 36 ++++++++ spec/modules/Simplex/FileTransfer/Server.md | 85 ++++++++++++++++++ .../Simplex/FileTransfer/Server/Env.md | 24 ++++++ .../Simplex/FileTransfer/Server/Main.md | 28 ++++++ .../Simplex/FileTransfer/Server/Stats.md | 19 ++++ .../Simplex/FileTransfer/Server/Store.md | 39 +++++++++ .../Simplex/FileTransfer/Server/StoreLog.md | 33 +++++++ spec/modules/Simplex/FileTransfer/Types.md | 27 ++++++ 14 files changed, 558 insertions(+) create mode 100644 spec/modules/Simplex/FileTransfer/Agent.md create mode 100644 spec/modules/Simplex/FileTransfer/Client.md create mode 100644 spec/modules/Simplex/FileTransfer/Client/Agent.md create mode 100644 spec/modules/Simplex/FileTransfer/Client/Main.md create mode 100644 spec/modules/Simplex/FileTransfer/Crypto.md create mode 100644 spec/modules/Simplex/FileTransfer/Description.md create mode 100644 spec/modules/Simplex/FileTransfer/Protocol.md create mode 100644 spec/modules/Simplex/FileTransfer/Server.md create mode 100644 spec/modules/Simplex/FileTransfer/Server/Env.md create mode 100644 spec/modules/Simplex/FileTransfer/Server/Main.md create mode 100644 spec/modules/Simplex/FileTransfer/Server/Stats.md create mode 100644 spec/modules/Simplex/FileTransfer/Server/Store.md create mode 100644 spec/modules/Simplex/FileTransfer/Server/StoreLog.md create mode 100644 spec/modules/Simplex/FileTransfer/Types.md diff 
--git a/spec/modules/Simplex/FileTransfer/Agent.md b/spec/modules/Simplex/FileTransfer/Agent.md new file mode 100644 index 000000000..fd2a361d0 --- /dev/null +++ b/spec/modules/Simplex/FileTransfer/Agent.md @@ -0,0 +1,86 @@ +# Simplex.FileTransfer.Agent + +> XFTP agent: worker-based file send/receive/delete with retry, encryption, redirect chains, and file description generation. + +**Source**: [`FileTransfer/Agent.hs`](../../../../src/Simplex/FileTransfer/Agent.hs) + +## Architecture + +The XFTP agent uses five worker types organized in three categories: + +| Worker | Key (server) | Purpose | +|--------|-------------|---------| +| `xftpRcvWorker` | `Just server` | Download chunks from a specific XFTP server | +| `xftpRcvLocalWorker` | `Nothing` | Decrypt completed downloads locally | +| `xftpSndPrepareWorker` | `Nothing` | Encrypt files and create chunks on servers | +| `xftpSndWorker` | `Just server` | Upload chunks to a specific XFTP server | +| `xftpDelWorker` | `Just server` | Delete chunks from a specific XFTP server | + +Workers are created on-demand via `getAgentWorker` and keyed by server address. The local workers (keyed by `Nothing`) handle CPU-bound operations that don't require network access. + +## Non-obvious behavior + +### 1. startXFTPWorkers vs startXFTPSndWorkers + +`startXFTPWorkers` starts all three worker categories (rcv, snd, del). `startXFTPSndWorkers` starts only snd workers. This distinction exists because receiving and deleting require a full agent context, while sending can operate with a partial setup (used when the agent is in send-only mode). + +### 2. Download completion triggers local worker + +When `downloadFileChunk` determines that all chunks are received (`all chunkReceived chunks`), it calls `getXFTPRcvWorker True c Nothing` to wake the local decryption worker. The `True` parameter signals that work is available. Without this, the local worker would sleep until the next `waitForWork` check. + +### 3. 
Decryption verifies both digest and size before decrypting + +`decryptFile` first computes the total size of all encrypted chunk files, then their SHA-512 digest. If either mismatches the expected values, it throws an error *before* starting decryption. This prevents wasting CPU on corrupted or tampered downloads. + +### 4. Redirect chain with depth limit + +When a received file has a `redirect`, the local worker: +1. Decrypts the redirect file (a YAML file description) +2. Validates the inner description's size and digest against `RedirectFileInfo` +3. Registers the inner file's chunks and starts downloading them + +The redirect chain is implicitly limited to depth 1: `createRcvFileRedirect` creates the destination file entry with `redirect = Nothing`, and `updateRcvFileRedirect` does not update the redirect column. So even if the decoded inner description contains a redirect field, the database record for the destination file has no redirect, preventing further chaining. + +### 5. Decrypting worker resumes from RFSDecrypting + +If the agent restarts while a file is in `RFSDecrypting` status, the local worker detects this and deletes the partially-decrypted output file before restarting decryption. This prevents corrupted output from a previous incomplete decryption attempt. + +### 6. Encryption worker resumes from SFSEncrypting + +Similarly, `prepareFile` checks `status /= SFSEncrypted` and deletes the partial encrypted file if status is `SFSEncrypting`. This allows clean restart of interrupted encryption. + +### 7. Redirect files must be single-chunk + +`encryptFileForUpload` for redirect files calls `singleChunkSize` instead of `prepareChunkSizes`. If the redirect file description doesn't fit in a single chunk, it throws `FILE SIZE`. This ensures redirect files are atomic — they either download completely or not at all. + +### 8. 
addRecipients recursive batching + +During upload, `addRecipients` recursively calls itself if a chunk needs more recipients than `xftpMaxRecipientsPerRequest`. Each iteration sends an FADD command for up to `maxRecipients` new recipients, accumulates the results, and recurses until all recipients are registered. + +### 9. File description generation cross-product + +`createRcvFileDescriptions` (in both `Agent.hs` and `Client/Main.hs`) performs a cross-product transformation: M chunks × R replicas × N recipients → N file descriptions, each containing M chunks with R replicas. The `addRcvChunk` accumulator builds a `Map rcvNo (Map chunkNo FileChunk)` to correctly distribute replicas across recipient descriptions. + +### 10. withRetryIntervalLimit caps consecutive retries + +`withRetryIntervalLimit maxN` allows at most `maxN` total attempts (initial attempt at `n=0` plus `maxN-1` retries). When all attempts are exhausted for temporary errors, the operation is silently abandoned for this work cycle — the chunk remains in pending state and may be retried on the next cycle. Only permanent errors (handled by `retryDone`) mark the file as errored. + +### 11. Retry distinguishes temporary from permanent errors + +`retryOnError` checks `temporaryOrHostError`: temporary/host errors trigger retry with exponential backoff; permanent errors (AUTH, SIZE, etc.) immediately mark the file as failed. On host errors during retry, a warning notification is sent to the client. + +### 12. Delete workers skip files older than rcvFilesTTL + +`runXFTPDelWorker` uses `rcvFilesTTL` (not a dedicated delete TTL) to filter pending deletions. Files older than this TTL would already be expired on the server, so attempting deletion is pointless. This reuses the receive TTL as a proxy for server-side expiration. + +### 13. 
closeXFTPAgent atomically swaps worker maps + +`closeXFTPAgent` uses `swapTVar workers M.empty` to atomically replace each worker map with an empty map, then cancels all retrieved workers. This prevents races where a new worker could be inserted between reading and clearing the map. + +### 14. assertAgentForeground dual check + +`assertAgentForeground` both throws if the agent is inactive (`throwWhenInactive`) and blocks until it's in the foreground (`waitUntilForeground`). This is called before every chunk operation to ensure the agent isn't suspended or backgrounded during file transfers. + +### 15. Per-server stats tracking + +Every chunk download, upload, and delete operation increments per-server statistics (`downloads`, `uploads`, `deletions`, `downloadAttempts`, `uploadAttempts`, `deleteAttempts`, and error variants). Size-based stats (`downloadsSize`, `uploadsSize`) track throughput in kilobytes. diff --git a/spec/modules/Simplex/FileTransfer/Client.md b/spec/modules/Simplex/FileTransfer/Client.md new file mode 100644 index 000000000..5cf87a594 --- /dev/null +++ b/spec/modules/Simplex/FileTransfer/Client.md @@ -0,0 +1,37 @@ +# Simplex.FileTransfer.Client + +> XFTP client: connection management, handshake, chunk upload/download with forward secrecy. + +**Source**: [`FileTransfer/Client.hs`](../../../../src/Simplex/FileTransfer/Client.hs) + +## Non-obvious behavior + +### 1. ALPN-based handshake version selection + +`getXFTPClient` checks the ALPN result after TLS negotiation: +- **`xftpALPNv1` or `httpALPN11`**: performs v1 handshake with key exchange (`httpALPN11` is used for web port connections) +- **No ALPN or unrecognized**: uses legacy v1 transport parameters without handshake + +### 2. Server certificate chain validation + +`xftpClientHandshakeV1` validates the server's identity by checking that the CA fingerprint from the certificate chain matches the expected `keyHash` from the server address. 
The server signs an authentication public key (X25519) with its long-term key. The client verifies this signature against the certificate chain, then extracts the X25519 key for HMAC-based command authentication. This authentication key is distinct from the per-download ephemeral DH keys. + +### 3. Ephemeral DH key pair per download + +`downloadXFTPChunk` generates a fresh X25519 key pair for each chunk download. The public key is sent with the FGET command; the server responds with its own ephemeral key. The derived shared secret encrypts the file data in transit. This provides forward secrecy — compromising a past DH key doesn't decrypt other downloads. + +### 4. Chunk-size-proportional download timeout + +`downloadXFTPChunk` calculates the timeout as `baseTimeout + (sizeInKB * perKbTimeout)`, where `baseTimeout` is the base TCP timeout and `perKbTimeout` is a per-kilobyte timeout from the network config. Larger chunks get proportionally more time. This prevents premature timeouts on large chunks over slow connections. + +### 5. prepareChunkSizes threshold algorithm + +`prepareChunkSizes` selects chunk sizes using a 75% threshold: if the remaining payload exceeds 75% of the next larger chunk size, it uses the larger size. Otherwise, it uses the smaller size. `singleChunkSize` returns `Just size` only if the payload fits in a single chunk (used for redirect files which must be single-chunk). + +### 6. Upload sends file body with the command, before the response + +`uploadXFTPChunk` sends the FPUT command and file body in the same streaming HTTP/2 request: the protocol command block is sent first, followed immediately by the raw file data via `hSendFile`. The server response (`FROk` or error) is received only after both the command and file body have been fully sent. This is a single HTTP/2 round trip, not a two-phase interaction. + +### 7. Empty corrId as nonce + +`sendXFTPCommand` uses `""` (empty bytestring) as the correlation ID for all commands.
XFTP is strictly request-response within a single HTTP/2 stream, so correlation IDs are unnecessary. The empty value is passed to `C.cbNonce` to produce a constant nonce for command authentication (HMAC/signing), not encryption — XFTP authenticates commands but does not encrypt them within the TLS tunnel. diff --git a/spec/modules/Simplex/FileTransfer/Client/Agent.md b/spec/modules/Simplex/FileTransfer/Client/Agent.md new file mode 100644 index 000000000..6ff1eebb7 --- /dev/null +++ b/spec/modules/Simplex/FileTransfer/Client/Agent.md @@ -0,0 +1,27 @@ +# Simplex.FileTransfer.Client.Agent + +> XFTP client connection management with TMVar-based sharing, async retry, and connection lifecycle. + +**Source**: [`FileTransfer/Client/Agent.hs`](../../../../../src/Simplex/FileTransfer/Client/Agent.hs) + +## Non-obvious behavior + +### 1. TMVar-based connection sharing + +`getXFTPServerClient` first checks the `TMap XFTPServer (TMVar (Either XFTPClientAgentError XFTPClient))`. If no entry exists, it atomically inserts an empty `TMVar` and initiates connection. Other threads requesting the same server block on `readTMVar` until the connection is established or fails. This prevents duplicate connections to the same server. + +### 2. Async retry on temporary errors + +When `newXFTPClient` encounters a temporary error, it launches an async retry loop that attempts reconnection with backoff. The `TMVar` remains in the map but is empty until the retry succeeds. Other threads waiting on `readTMVar` block until either the retry succeeds or a permanent error occurs. + +### 3. Permanent error cleanup + +On permanent error, `newXFTPClient` puts the `Left error` into the `TMVar` (unblocking waiters) AND deletes the entry from the `TMap`. This means the next caller will see no entry and create a fresh connection attempt, rather than reading a stale error. Waiters that already read the `Left` receive the error. + +### 4. 
Connection timeout + +`waitForXFTPClient` wraps `readTMVar` in a timeout. If the connection establishment takes too long (e.g., server unreachable and retry loop is slow), the caller gets a timeout error rather than blocking indefinitely. The underlying connection attempt continues in the background. + +### 5. closeXFTPServerClient removes from TMap + +Closing a server client deletes its entry from the TMap, so the next request will establish a fresh connection. This is called on connection errors during file operations to force reconnection. diff --git a/spec/modules/Simplex/FileTransfer/Client/Main.md b/spec/modules/Simplex/FileTransfer/Client/Main.md new file mode 100644 index 000000000..5f7b45af4 --- /dev/null +++ b/spec/modules/Simplex/FileTransfer/Client/Main.md @@ -0,0 +1,43 @@ +# Simplex.FileTransfer.Client.Main + +> XFTP CLI client: send, receive, delete files with parallel chunk operations and web URI encoding. + +**Source**: [`FileTransfer/Client/Main.hs`](../../../../../src/Simplex/FileTransfer/Client/Main.hs) + +## Non-obvious behavior + +### 1. Web URI encoding: base64url(deflate(YAML)) + +`encodeWebURI` compresses the YAML-encoded file description with raw DEFLATE, then base64url-encodes the result. `decodeWebURI` reverses this. The compressed description goes in the URL fragment (after `#`), which is never sent to the server — the file description stays client-side. + +### 2. CLI receive accepts both file paths and URLs + +`getInputFileDescription` checks if the input starts with `http://` or `https://`. If so, it extracts the URL fragment, decodes it via `decodeWebURI`, and uses the resulting file description. Otherwise, it reads a YAML file from disk. This allows receiving files via web links without a browser. + +### 3. Redirect chain depth limited to 1 + +`receive` tracks a `depth` parameter starting at 1. After following one redirect, `depth` becomes 0. A second redirect throws "Redirect chain too long". 
This prevents infinite redirect loops from malicious file descriptions. + +### 4. Parallel chunk uploads with server grouping + +`uploadFile` groups chunks by server via `groupAllOn`, then uses `pooledForConcurrentlyN 16` to process up to 16 server-groups concurrently. Within each group, chunks are uploaded sequentially (`mapM`). Errors from any chunk are collected and the first one is thrown. + +### 5. Random server selection + +`getXFTPServer` selects a random server from the provided list for each chunk. With a single server, it's deterministic. With multiple servers, it uses `StdGen` in a TVar for thread-safe random selection via `stateTVar`. + +### 6. withReconnect nests retry with reconnection + +`withReconnect` wraps `withRetry` twice: the outer retry reconnects to the server, and the inner operation runs against the connection. On failure, the server connection is explicitly closed before retrying, forcing a fresh connection on the next attempt. + +### 7. withRetry rejects zero retries + +`withRetry' 0` returns an "internal: no retry attempts" error. `withRetry' 1` executes the action once without retry. This off-by-one convention means `retryCount = 3` (the default) gives 3 total attempts (1 initial + 2 retries). + +### 8. File description auto-deletion prompt + +After successful receive or delete, `removeFD` either auto-deletes the file description (if `--yes` flag) or prompts the user. This prevents accidental reuse of one-time file descriptions — each receive consumes the description by ACKing chunks on the server. + +### 9. Sender description uses first replica's server + +`createSndFileDescription` takes the server from the first replica of each chunk for the sender's `FileChunkReplica`. This reflects the current limitation that each chunk is uploaded to exactly one server — the sender description records that single server. 
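The attempt-counting convention from section 7 (`withRetry' 0` is an error, `withRetry' 1` runs once) can be illustrated with a minimal sketch. The names `withRetrySketch` and `failing` are hypothetical and not taken from `Client/Main.hs`; the real helper works with the CLI's own error type and monad, while this version uses `IO (Either String a)` to stay self-contained. Rejecting negative counts is an added assumption.

```haskell
module Main where

import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Hypothetical stand-in for the CLI retry helper: the count is the total
-- number of attempts, so 0 is rejected and 1 means "run once, no retry".
-- Treating negative counts like 0 is an assumption of this sketch.
withRetrySketch :: Int -> IO (Either String a) -> IO (Either String a)
withRetrySketch n action
  | n <= 0 = pure (Left "internal: no retry attempts")
  | n == 1 = action
  | otherwise = do
      r <- action
      case r of
        Left _ -> withRetrySketch (n - 1) action -- failed: one fewer attempt left
        ok -> pure ok

-- An action that always fails and counts how often it was invoked.
failing :: IORef Int -> IO (Either String ())
failing counter = do
  modifyIORef' counter (+ 1)
  pure (Left "temporary error")

main :: IO ()
main = do
  counter <- newIORef 0
  _ <- withRetrySketch 3 (failing counter)
  readIORef counter >>= print -- prints 3: one initial attempt plus two retries
```

Counting total attempts rather than retries is what makes a count of 0 meaningless, which is presumably why it is reported as an internal error instead of being treated as "no retries".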
diff --git a/spec/modules/Simplex/FileTransfer/Crypto.md b/spec/modules/Simplex/FileTransfer/Crypto.md new file mode 100644 index 000000000..1911de60e --- /dev/null +++ b/spec/modules/Simplex/FileTransfer/Crypto.md @@ -0,0 +1,31 @@ +# Simplex.FileTransfer.Crypto + +> File encryption and decryption with streaming, padding, and auth tag verification. + +**Source**: [`FileTransfer/Crypto.hs`](../../../../src/Simplex/FileTransfer/Crypto.hs) + +## Non-obvious behavior + +### 1. Embedded file header in encrypted stream + +`encryptFile` prepends the `FileHeader` (containing filename and optional `fileExtra`) to the plaintext before encryption. A total data size field (8 bytes, `fileSizeLen`) is prepended before the header, encoding the combined size of header + file content. The decryptor uses this to distinguish real data from padding. The recipient must parse the header after decryption to recover the original filename — the header is not transmitted separately. + +### 2. Fixed-size padding hides actual file size + +The encrypted output is padded to `encSize` (the sum of chunk sizes). Since chunk sizes are fixed powers of 2 (64KB, 256KB, 1MB, 4MB), the encrypted file size reveals only which chunk size bucket the file falls into, not the actual size. The encryption streams data with `LC.sbEncryptChunk` in a loop, pads the remaining space, then manually appends the auth tag via `LC.sbAuth`. This manual streaming approach (rather than using the all-at-once `LC.sbEncryptTailTag`) is necessary because encryption is interleaved with file I/O. + +### 3. 
Dual decrypt paths: single-chunk vs multi-chunk + +`decryptChunks` takes different paths based on chunk count: +- **Single chunk**: reads the entire file into memory via `LB.readFile`, decrypts in-memory with `LC.sbDecryptTailTag` +- **Multiple chunks**: opens the destination file for writing and streams through each chunk file with `LC.sbDecryptChunkLazy` (lazy bytestring variant), verifying the auth tag from the final chunk + +The single-chunk path avoids file handle management overhead for small files. + +### 4. Auth tag failure deletes output file + +In the multi-chunk streaming path, if `BA.constEq` detects an auth tag mismatch after decrypting all chunks, the partially-written output file is deleted before returning `FTCEInvalidAuthTag`. This prevents consumers from using a file whose integrity is unverified. + +### 5. Streaming encryption uses 64KB blocks + +`encryptFile` reads plaintext in 65536-byte blocks (`LC.sbEncryptChunk`), regardless of the XFTP chunk size. These are encryption blocks within a single continuous stream, not to be confused with XFTP protocol chunks, which range from 64KB to 4MB. diff --git a/spec/modules/Simplex/FileTransfer/Description.md b/spec/modules/Simplex/FileTransfer/Description.md new file mode 100644 index 000000000..b4c7e2fe9 --- /dev/null +++ b/spec/modules/Simplex/FileTransfer/Description.md @@ -0,0 +1,43 @@ +# Simplex.FileTransfer.Description + +> File description: YAML encoding/decoding, validation, URI format, and replica optimization. + +**Source**: [`FileTransfer/Description.hs`](../../../../src/Simplex/FileTransfer/Description.hs) + +## Non-obvious behavior + +### 1. ValidFileDescription non-exported constructor + +`ValidFileDescription` is a newtype with a non-exported data constructor (`ValidFD`), but the module exports a bidirectional pattern synonym `ValidFileDescription` that can be used as a constructor.
Despite this, `validateFileDescription` provides the canonical validation path, checking: +- Chunk numbers are sequential starting from 1 +- Total chunk sizes equal the declared file size + +Note: an empty chunk list with size 0 passes validation — there is no explicit "at least one chunk" check. + +### 2. First-replica-only digest and chunkSize + +When encoding chunks to YAML via `unfoldChunksToReplicas`, the `digest` and non-default `chunkSize` fields are only included on the first replica of each chunk. Subsequent replicas of the same chunk omit these fields. `foldReplicasToChunks` reconstructs them by carrying forward the digest/size from the first replica. If replicas have conflicting digests or sizes, validation fails. + +### 3. Default chunkSize elision + +The top-level `FileDescription` has a `chunkSize` field. Individual chunk replicas only serialize their `chunkSize` if it differs from this default. This saves space in the common case where most chunks are the same size (only the last chunk may be smaller). + +### 4. YAML encoding groups replicas by server + +`groupReplicasByServer` groups all chunk replicas by their server, producing `FileServerReplica` records. This is the serialization format — replicas are organized by server, not by chunk. The parser (`foldReplicasToChunks`) reverses this grouping back to per-chunk replica lists. + +### 5. FileDescriptionURI uses query-string encoding + +`FileDescriptionURI` serializes file descriptions into a compact query-string format (key=value pairs separated by `&`) with `QEscape` encoding for binary values. This is distinct from the YAML format used for file-based descriptions. The URI format is designed for embedding in links. + +### 6. QR code size limit + +`qrSizeLimit = 1002` bytes limits the maximum size of a file description URI that can be encoded as a QR code. Descriptions exceeding this limit cannot be shared via QR code and require alternative transport. + +### 7. 
Soft and hard file size limits + +Two limits exist: `maxFileSize = 1GB` (soft limit, checked by CLI client) and `maxFileSizeHard = 5GB` (hard limit, checked during agent-side encryption). The soft limit is a user-facing guard; the hard limit prevents resource exhaustion during encryption. + +### 8. Redirect file descriptions + +A `FileDescription` can contain a `redirect` field pointing to another file's metadata (`RedirectFileInfo` with size and digest). The outer description downloads an encrypted YAML file that, once decrypted, yields the actual `FileDescription` for the real file. This adds one level of indirection for privacy — the relay servers hosting the redirect don't know the actual file's servers. diff --git a/spec/modules/Simplex/FileTransfer/Protocol.md b/spec/modules/Simplex/FileTransfer/Protocol.md new file mode 100644 index 000000000..f31c90561 --- /dev/null +++ b/spec/modules/Simplex/FileTransfer/Protocol.md @@ -0,0 +1,36 @@ +# Simplex.FileTransfer.Protocol + +> XFTP protocol types, commands, responses, and credential verification. + +**Source**: [`FileTransfer/Protocol.hs`](../../../../src/Simplex/FileTransfer/Protocol.hs) + +## Non-obvious behavior + +### 1. Asymmetric credential checks by command + +`checkCredentials` enforces different rules per command: +- **FNEW**: requires `auth` (signature) but must NOT have a `fileId` — the sender key from the command body is used for verification +- **PING**: must have NEITHER `auth` NOR `fileId` — actively rejects their presence +- **All others** (FADD, FPUT, FDEL, FGET, FACK): require both `fileId` AND auth key + +This asymmetry means FNEW and PING bypass the standard entity-lookup path entirely — they are handled as separate `XFTPRequest` constructors (`XFTPReqNew`, `XFTPReqPing`). + +### 2. BLOCKED response downgraded to AUTH for old clients + +`encodeProtocol` checks the protocol version: if `v < blockedFilesXFTPVersion`, a `BLOCKED` response is encoded as `AUTH` instead. 
This prevents old clients that don't understand `BLOCKED` from receiving an unknown error type. The blocking information is silently lost for these clients. + +### 3. Single-transmission batch enforcement + +`xftpDecodeTServer` calls `xftpDecodeTransmission` which rejects batches containing more than one transmission. Despite using the batch framing format (length-prefixed), XFTP requires exactly one command per request. This differs from SMP where true batching is supported. + +### 4. xftpEncodeBatch1 always uses batch framing + +Even for single transmissions, `xftpEncodeBatch1` wraps the encoded transmission in batch format (1-byte count prefix + 2-byte length-prefixed transmission). There is no "non-batch" mode in XFTP — all protocol messages use the batch wire format regardless of the negotiated version. + +### 5. FileParty GADT partitions command space + +Commands are indexed by `FileParty` (`SFSender` / `SFRecipient`) at the type level via `FileCmd`. This ensures at compile time that sender commands (FNEW, FADD, FPUT, FDEL) and recipient commands (FGET, FACK, PING) cannot be confused. The server pattern-matches on `SFileParty` to determine which index (sender vs recipient) to look up in the file store. + +### 6. Empty corrId and implicit session ID + +`sendXFTPCommand` in the client uses an empty bytestring as `corrId`. This empty value is passed to `C.cbNonce` to produce a constant nonce for command authentication (HMAC/signing). With `implySessId = False` in the default XFTP transport setup, the session ID is not prepended to entity IDs during parsing. Session identity is provided by the TLS connection itself. 
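The batch framing from sections 3 and 4 (a 1-byte count, then a 2-byte length prefix, then the transmission) can be sketched as follows. This is an illustration only: `encodeBatch1Sketch` and `decodeSingleSketch` are hypothetical names, the real functions operate on `ByteString` inside padded transport blocks, and the big-endian byte order of the length prefix is an assumption here.

```haskell
module Main where

import Data.Word (Word8)

-- Frame one transmission as a batch of exactly one element:
-- count byte (always 1), 2-byte length (big-endian assumed), payload.
encodeBatch1Sketch :: [Word8] -> Maybe [Word8]
encodeBatch1Sketch t
  | len > 0xFFFF = Nothing -- does not fit a 2-byte length prefix
  | otherwise = Just (1 : hi : lo : t)
  where
    len = length t
    hi = fromIntegral (len `div` 256)
    lo = fromIntegral (len `mod` 256)

-- Accept only blocks whose count byte is 1; trailing bytes after the
-- declared length are treated as padding and ignored.
decodeSingleSketch :: [Word8] -> Either String [Word8]
decodeSingleSketch [] = Left "empty block"
decodeSingleSketch (count : body)
  | count /= 1 = Left "expected exactly one transmission"
  | otherwise = case body of
      hi : lo : rest ->
        let len = fromIntegral hi * 256 + fromIntegral lo :: Int
         in if length rest >= len
              then Right (take len rest)
              else Left "truncated transmission"
      _ -> Left "missing length prefix"

main :: IO ()
main = do
  let cmd = [80, 73, 78, 71] :: [Word8] -- ASCII bytes of "PING"
  print (encodeBatch1Sketch cmd) -- Just [1,0,4,80,73,78,71]
  print (fmap decodeSingleSketch (encodeBatch1Sketch cmd))
  -- A block declaring two transmissions is rejected outright.
  print (decodeSingleSketch [2, 0, 1, 65, 0, 1, 66])
```

Keeping the batch wire format while rejecting counts other than 1 lets XFTP share framing conventions with SMP without taking on batch semantics.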
diff --git a/spec/modules/Simplex/FileTransfer/Server.md b/spec/modules/Simplex/FileTransfer/Server.md
new file mode 100644
index 000000000..f3a01314d
--- /dev/null
+++ b/spec/modules/Simplex/FileTransfer/Server.md
@@ -0,0 +1,85 @@
+# Simplex.FileTransfer.Server
+
+> XFTP server: HTTP/2 request handling, handshake state machine, file operations, and statistics.
+
+**Source**: [`FileTransfer/Server.hs`](../../../../src/Simplex/FileTransfer/Server.hs)
+
+## Architecture
+
+The XFTP server runs several concurrent threads via `raceAny_`:
+
+| Thread | Purpose |
+|--------|---------|
+| `runServer` | HTTP/2 server accepting file transfer requests |
+| `expireFiles` | Periodic file expiration with throttling |
+| `logServerStats` | Periodic stats flush to CSV |
+| `savePrometheusMetrics` | Periodic Prometheus metrics dump |
+| `runCPServer` | Control port for admin commands |
+
+## Non-obvious behavior
+
+### 1. Three-state handshake with session caching
+
+The server maintains a `TMap SessionId Handshake` with three states:
+- **No entry**: first request — for non-SNI or `xftp-web-hello` requests, `processHello` generates DH key pair and sends server handshake; for SNI requests without `xftp-web-hello`, returns `SESSION` error
+- **`HandshakeSent pk`**: server hello sent, waiting for client handshake with version negotiation
+- **`HandshakeAccepted thParams`**: handshake complete, subsequent requests use cached params
+
+Web clients can re-send hello (`xftp-web-hello` header) even in `HandshakeSent` or `HandshakeAccepted` states — the server reuses the existing private key rather than generating a new one.
+
+### 2. Web identity proof via challenge-response
+
+When a web client sends a hello with a non-empty body, the server parses an `XFTPClientHello` containing a `webChallenge`. The server signs `challenge <> sessionId` with its long-term key and includes the signature in the handshake response. This proves server identity to web clients that cannot verify TLS certificates directly.
+
+### 3. skipCommitted drains request body on re-upload
+
+If `receiveServerFile` detects the file is already uploaded (`filePath` TVar is `Just`), it cannot simply ignore the request body — the HTTP/2 client would block waiting for the server to consume it. Instead, `skipCommitted` reads and discards the entire body in `fileBlockSize` increments, returning `FROk` when complete. This makes FPUT idempotent from the client's perspective.
+
+### 4. Atomic quota reservation with rollback
+
+`receiveServerFile` uses `stateTVar` to atomically check and reserve storage quota before receiving the file. If the upload fails (timeout, size mismatch, IO error), the reserved size is subtracted from `usedStorage` and the partial file is deleted. This prevents failed uploads from permanently consuming quota.
+
+### 5. retryAdd generates new IDs on collision
+
+`createFile` and `addRecipient` use `retryAdd` which generates a random ID and makes up to 3 total attempts (initial + 2 retries) on `DUPLICATE_` errors. This handles the astronomically unlikely case of random ID collision without requiring uniqueness checking before insertion.
+
+### 6. Timing attack mitigation on entity lookup
+
+`verifyXFTPTransmission` calls `dummyVerifyCmd` (imported from SMP server) when a file entity is not found. This equalizes response timing to prevent attackers from distinguishing "entity doesn't exist" from "signature invalid" based on latency.
+
+### 7. BLOCKED vs EntityOff distinction
+
+When `verifyXFTPTransmission` reads `fileStatus`:
+- `EntityActive` → proceed with command
+- `EntityBlocked info` → return `BLOCKED` with blocking reason
+- `EntityOff` → return `AUTH` (same as entity-not-found)
+
+`EntityOff` is treated identically to missing entities for information-hiding purposes.
+
+### 8. blockServerFile deletes the physical file
+
+Despite the name suggesting it only marks a file as blocked, `blockServerFile` also deletes the physical file from disk via `deleteOrBlockServerFile_`. The `deleted = True` parameter to `blockFile` in the store adjusts `usedStorage`. A blocked file returns `BLOCKED` errors on access but has no data on disk.
+
+### 9. Stats restore overrides counts from live store
+
+`restoreServerStats` loads stats from the backup file but overrides `_filesCount` and `_filesSize` with values computed from the live file store (TMap size and `usedStorage` TVar). If the backup values differ, warnings are logged. This handles cases where files were expired or deleted while the server was down.
+
+### 10. File expiration with configurable throttling
+
+`expireServerFiles` accepts an optional `itemDelay` (100ms when called from the periodic thread, `Nothing` at startup). Between each file check, `threadDelay itemDelay` prevents expiration from monopolizing IO. At startup, files are expired without delay to clean up quickly.
+
+### 11. Stats log aligns to wall-clock midnight
+
+`logServerStats` computes an `initialDelay` to align the first stats flush to `logStatsStartTime` (default 0 = midnight UTC). If the target time already passed today, it adds 86400 seconds for the next day. Subsequent flushes use exact `logInterval` cadence.
+
+### 12. Physical file deleted before store cleanup
+
+`deleteOrBlockServerFile_` removes the physical file first, then runs the STM store action. If the process crashes between these two operations, the store will reference a file that no longer exists on disk. The next access would return `AUTH` (file not found on disk), and eventual expiration would clean the store entry.
+
+### 13. SNI-dependent CORS and web serving
+
+CORS headers require both `sniUsed = True` and `addCORSHeaders = True` in the transport config. Static web page serving is enabled when `sniUsed = True`. Non-SNI connections (direct TLS without hostname) skip both CORS and web serving. This separates the web-facing and protocol-facing behaviors of the same port.
+
+### 14. Control port file operations use recipient index
+
+`CPDelete` and `CPBlock` commands look up files via `getFile fs SFRecipient fileId`, meaning the control port takes a recipient ID, not a sender ID. This is the ID visible to recipients and contained in file descriptions.
diff --git a/spec/modules/Simplex/FileTransfer/Server/Env.md b/spec/modules/Simplex/FileTransfer/Server/Env.md
new file mode 100644
index 000000000..e9f509a1a
--- /dev/null
+++ b/spec/modules/Simplex/FileTransfer/Server/Env.md
@@ -0,0 +1,24 @@
+# Simplex.FileTransfer.Server.Env
+
+> XFTP server environment: configuration, storage quota tracking, and request routing.
+
+**Source**: [`FileTransfer/Server/Env.hs`](../../../../../src/Simplex/FileTransfer/Server/Env.hs)
+
+## Non-obvious behavior
+
+### 1. Startup storage accounting with quota warning
+
+`newXFTPServerEnv` computes `usedStorage` by summing file sizes from the in-memory store at startup. If the computed usage exceeds the configured `fileSizeQuota`, a warning is logged but the server still starts. This allows the server to come up even if it's over quota (e.g., after a quota reduction), relying on expiration to reclaim space.
+
+### 2. XFTPRequest ADT separates new files from commands
+
+`XFTPRequest` has three constructors:
+- `XFTPReqNew`: file creation (carries `FileInfo`, recipient keys, optional basic auth)
+- `XFTPReqCmd`: command on an existing file (carries file ID, `FileRec`, and the command)
+- `XFTPReqPing`: health check
+
+This separation occurs after credential verification in `Server.hs`. `XFTPReqNew` bypasses entity lookup entirely since the file doesn't exist yet.
+
+### 3. fileTimeout for upload deadline
+
+`fileTimeout` in `XFTPServerConfig` sets the maximum time allowed for a single file upload (FPUT). The server wraps the receive operation in `timeout fileTimeout`. Default is 5 minutes (for 4MB chunks). This prevents slow or stalled uploads from holding server resources indefinitely.
diff --git a/spec/modules/Simplex/FileTransfer/Server/Main.md b/spec/modules/Simplex/FileTransfer/Server/Main.md
new file mode 100644
index 000000000..54a45751f
--- /dev/null
+++ b/spec/modules/Simplex/FileTransfer/Server/Main.md
@@ -0,0 +1,28 @@
+# Simplex.FileTransfer.Server.Main
+
+> XFTP server CLI: INI configuration parsing, TLS setup, and default constants.
+
+**Source**: [`FileTransfer/Server/Main.hs`](../../../../../src/Simplex/FileTransfer/Server/Main.hs)
+
+## Non-obvious behavior
+
+### 1. Key server constants
+
+| Constant | Value | Purpose |
+|----------|-------|---------|
+| `fileIdSize` | 16 bytes | Random file/recipient ID length |
+| `fileTimeout` | 5 minutes | Maximum upload duration per chunk |
+| `logStatsInterval` | 86400s (daily) | Stats CSV flush interval |
+| `logStatsStartTime` | 0 (midnight UTC) | First stats flush time-of-day |
+
+### 2. allowedChunkSizes defaults to all four sizes
+
+If not configured, `allowedChunkSizes` defaults to `[kb 64, kb 256, mb 1, mb 4]`. The INI file can restrict this to a subset, controlling which chunk sizes the server accepts.
+
+### 3. Storage quota from INI with unit parsing
+
+`fileSizeQuota` is parsed from the INI `[STORE_LOG]` section using `FileSize` parsing, which accepts byte values with optional unit suffixes (KB, MB, GB). Absence means unlimited quota (`Nothing`).
+
+### 4. Dual TLS credential support
+
+The server supports both primary TLS credentials (`caCertificateFile`/`certificateFile`/`privateKeyFile`) and optional HTTP-specific credentials (`httpCaCertificateFile`/etc.). When HTTP credentials are present, the server uses `defaultSupportedParamsHTTPS` which enables broader TLS compatibility for web clients.
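Review note: the quota parsing described in section 3 of `Server/Main.md` might look roughly like the sketch below. It is not the `FileSize` parser from the codebase: `parseFileSize` and `quotaFromIni` are hypothetical names, and both the binary multipliers (1 KB = 1024 bytes) and the case-insensitive suffixes are assumptions.

```haskell
import Data.Char (isDigit, toUpper)

-- | Parse a size such as "500", "64KB", "10mb" or "2GB" into bytes.
-- Binary multipliers and case-insensitive suffixes are assumptions
-- of this sketch, not confirmed behavior of the real parser.
parseFileSize :: String -> Maybe Integer
parseFileSize s = case span isDigit s of
  ("", _) -> Nothing -- a size must start with digits
  (digits, suffix) -> (read digits *) <$> multiplier (map toUpper suffix)
  where
    multiplier "" = Just 1
    multiplier "KB" = Just 1024
    multiplier "MB" = Just (1024 * 1024)
    multiplier "GB" = Just (1024 * 1024 * 1024)
    multiplier _ = Nothing

-- | An absent INI key means no quota (Nothing); a present key must parse.
quotaFromIni :: Maybe String -> Either String (Maybe Integer)
quotaFromIni Nothing = Right Nothing
quotaFromIni (Just v) =
  maybe (Left ("invalid size: " <> v)) (Right . Just) (parseFileSize v)
```

Keeping "key absent" (`Right Nothing`) distinct from "key present but malformed" (`Left`) avoids silently turning a typo in the INI file into an unlimited quota.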
diff --git a/spec/modules/Simplex/FileTransfer/Server/Stats.md b/spec/modules/Simplex/FileTransfer/Server/Stats.md
new file mode 100644
index 000000000..7e684c58a
--- /dev/null
+++ b/spec/modules/Simplex/FileTransfer/Server/Stats.md
@@ -0,0 +1,19 @@
+# Simplex.FileTransfer.Server.Stats
+
+> XFTP server statistics: IORef-based counters with backward-compatible persistence.
+
+**Source**: [`FileTransfer/Server/Stats.hs`](../../../../../src/Simplex/FileTransfer/Server/Stats.hs)
+
+## Non-obvious behavior
+
+### 1. setFileServerStats is not thread safe
+
+`setFileServerStats` directly writes to IORefs without synchronization. It is explicitly intended for server startup only (restoring from backup file), before any concurrent threads are running.
+
+### 2. Backward-compatible parsing
+
+The `strP` parser uses `opt` for newer fields, defaulting missing fields to 0. This allows reading stats files from older server versions that don't include fields like `filesBlocked` or `fileDownloadAcks`.
+
+### 3. PeriodStats for download tracking
+
+`filesDownloaded` uses `PeriodStats` (not a simple `IORef Int`) to track unique file downloads over time periods (day/week/month). This enables the CSV stats log to report distinct files downloaded per period, not just total download count.
diff --git a/spec/modules/Simplex/FileTransfer/Server/Store.md b/spec/modules/Simplex/FileTransfer/Server/Store.md
new file mode 100644
index 000000000..89b0c3b36
--- /dev/null
+++ b/spec/modules/Simplex/FileTransfer/Server/Store.md
@@ -0,0 +1,39 @@
+# Simplex.FileTransfer.Server.Store
+
+> STM-based in-memory file store with dual indices, storage accounting, and privacy-preserving expiration.
+
+**Source**: [`FileTransfer/Server/Store.hs`](../../../../../src/Simplex/FileTransfer/Server/Store.hs)
+
+## Non-obvious behavior
+
+### 1. Dual-index lookup by sender and recipient
+
+The file store maintains two indices: `files :: TMap SenderId FileRec` (by sender ID) and `recipients :: TMap RecipientId (SenderId, RcvPublicAuthKey)` (by recipient ID, storing the sender ID and the recipient's public auth key). `getFile` dispatches on `SFileParty`: sender lookups use `files` directly, recipient lookups use `recipients` to find the `SenderId` then look up the `FileRec` in `files`. This means recipient operations require two TMap lookups.
+
+### 2. addRecipient checks both inner Set and global TMap
+
+`addRecipient` first checks the per-file `recipientIds` Set for duplicates, then inserts into the global `recipients` TMap. If either has a collision, it returns `DUPLICATE_`. The dual check is necessary because the Set tracks per-file membership while the TMap enforces global uniqueness of recipient IDs.
+
+### 3. Storage accounting on upload completion
+
+`setFilePath` adds the file size to `usedStorage` and records the file path in the `filePath` TVar. However, during normal FPUT handling, `Server.hs` does NOT call `setFilePath` — it directly writes `filePath` via `writeTVar`. The quota reservation in `Server.hs` (`stateTVar` on `usedStorage`) is the sole `usedStorage` increment during upload. `setFilePath` IS called during store log replay (`StoreLog.hs`), where it increments `usedStorage`; `newXFTPServerEnv` then overwrites with the correct value computed from the live store.
+
+### 4. deleteFile removes all recipients atomically
+
+`deleteFile` atomically removes the sender entry from `files`, all recipient entries from the global `recipients` TMap, and unconditionally subtracts the file size from `usedStorage` (regardless of whether the file was actually uploaded). The entire operation runs in a single STM transaction.
+
+### 5. RoundedSystemTime for privacy-preserving expiration
+
+File timestamps use `RoundedFileTime` which is `RoundedSystemTime 3600` — system time rounded to 1-hour precision. This means files created within the same hour have identical timestamps. An observer with access to the store cannot determine exact file creation times, only the hour.
+
+### 6. expiredFilePath returns path only if expired
+
+`expiredFilePath` returns `STM (Maybe (Maybe FilePath))`. The outer `Maybe` is `Nothing` when the file doesn't exist or isn't expired; the inner `Maybe` is the file path (present only if the file was uploaded). The expiration check adds `fileTimePrecision` (one hour) to the creation timestamp before comparing, providing a grace period. The caller uses the inner path to decide whether to also delete the physical file.
+
+### 7. ackFile removes single recipient
+
+`ackFile` removes a specific recipient from both the global `recipients` TMap and the per-file `recipientIds` Set. Unlike `deleteFile` which removes the entire file, `ackFile` only removes one recipient's access. The file and other recipients remain intact.
+
+### 8. blockFile conditional storage adjustment
+
+`blockFile` takes a `deleted :: Bool` parameter. When `True` (file blocked with physical deletion), it subtracts the file size from `usedStorage`. When `False` (block without deletion), storage is unchanged. This allows blocking without physical deletion for audit purposes. Currently, both the server's `blockServerFile` and the store log replay path pass `True`.
diff --git a/spec/modules/Simplex/FileTransfer/Server/StoreLog.md b/spec/modules/Simplex/FileTransfer/Server/StoreLog.md
new file mode 100644
index 000000000..35a339515
--- /dev/null
+++ b/spec/modules/Simplex/FileTransfer/Server/StoreLog.md
@@ -0,0 +1,33 @@
+# Simplex.FileTransfer.Server.StoreLog
+
+> Append-only store log for XFTP file operations with error-resilient replay and compaction.
+
+**Source**: [`FileTransfer/Server/StoreLog.hs`](../../../../../src/Simplex/FileTransfer/Server/StoreLog.hs)
+
+## Non-obvious behavior
+
+### 1. Error-resilient replay
+
+`readFileStore` parses the store log line-by-line. Lines that fail to parse or fail to process (e.g., referencing a nonexistent sender ID) are logged as errors but do not halt replay. The store is reconstructed from whatever valid entries exist. This allows the server to recover from partial log corruption.
+
+### 2. Sender ID validation on recipient writes
+
+`writeFileStore` during compaction validates that each recipient's sender ID in the `recipients` TMap matches the `senderId` of the corresponding `FileRec`. This guards against in-memory state corruption (e.g., if a bug caused the `recipients` TMap and `FileRec.recipientIds` to get out of sync), not log corruption — the validation happens before writing the compacted log.
+
+### 3. Backward-compatible status parsing
+
+`AddFile` log entries include an `EntityStatus` field. The parser uses `<|> pure EntityActive` as a fallback, defaulting to `EntityActive` when the status field is missing. This allows reading store logs from older server versions that didn't record entity status.
+
+### 4. Compaction on restart
+
+`readFileStore` replays the full log to rebuild the in-memory store. The caller (in `Server/Env.hs`) then writes a fresh, compacted store log containing only the current state. This eliminates deleted entries and redundant operations, keeping the log size proportional to active state rather than total history.
+
+### 5. Log entry types track operation lifecycle
+
+Six log entry types capture the complete file lifecycle:
+- `AddFile`: file creation with sender ID, file info, timestamp, and status
+- `AddRecipients`: recipient registration (batched as `NonEmpty FileRecipient`) with sender ID association
+- `PutFile`: upload completion with file path
+- `DeleteFile`: file deletion by sender ID
+- `AckFile`: single recipient acknowledgment
+- `BlockFile`: file blocking with blocking info
diff --git a/spec/modules/Simplex/FileTransfer/Types.md b/spec/modules/Simplex/FileTransfer/Types.md
new file mode 100644
index 000000000..814e65195
--- /dev/null
+++ b/spec/modules/Simplex/FileTransfer/Types.md
@@ -0,0 +1,27 @@
+# Simplex.FileTransfer.Types
+
+> Agent-side file transfer types: receive/send file records, status state machines, chunk/replica structures.
+
+**Source**: [`FileTransfer/Types.hs`](../../../../src/Simplex/FileTransfer/Types.hs)
+
+## Non-obvious behavior
+
+### 1. Receive file status state machine
+
+`RcvFileStatus` progresses: `RFSReceiving` → `RFSReceived` → `RFSDecrypting` → `RFSComplete`, with `RFSError` as a terminal state reachable from any non-complete state. The `RFSReceived` → `RFSDecrypting` transition is significant: all chunks are downloaded but decryption hasn't started. The local worker (server=Nothing) picks up files in `RFSReceived` status.
+
+### 2. Send file status state machine
+
+`SndFileStatus` progresses: `SFSNew` → `SFSEncrypting` → `SFSEncrypted` → `SFSUploading` → `SFSComplete`, with `SFSError` as terminal. The prepare worker handles `SFSNew` → `SFSEncrypted` (including retry from `SFSEncrypting`), while per-server upload workers handle `SFSUploading` → `SFSComplete`.
+
+### 3. Encrypted file path convention
+
+`sndFileEncPath` constructs the path as `prefixPath "xftp.encrypted"`. This is a convention shared between the agent (`Agent.hs`) and this module — both must agree on where the encrypted intermediate file lives relative to the prefix directory.
+
+### 4. FileHeader fileExtra for future extension
+
+`FileHeader` contains `fileName` and an optional `fileExtra :: Maybe Text` field. Currently unused (`Nothing` in all callers), it provides a forward-compatible extension point embedded in the encrypted file header without requiring protocol version changes.
+
+### 5. authTagSize = 16 bytes
+
+`authTagSize` is defined as `fromIntegral C.authTagSize` (16 bytes). This is the AES-GCM authentication tag appended to the encrypted file stream. It is included in the payload size calculation (`payloadSize = fileSize' + fileSizeLen + authTagSize`), which is then passed to `prepareChunkSizes` to determine chunk allocation.
From ceeeeec4765d644490dde625bb152bce42ff6bf0 Mon Sep 17 00:00:00 2001
From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com>
Date: Fri, 13 Mar 2026 16:33:45 +0000
Subject: [PATCH 24/61] more topics

---
 spec/TOPICS.md | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/spec/TOPICS.md b/spec/TOPICS.md
index c29e61705..0489d79df 100644
--- a/spec/TOPICS.md
+++ b/spec/TOPICS.md
@@ -35,3 +35,35 @@
 - **Queue rotation protocol**: Four agent messages (QADD → QKEY → QUSE → QTEST) on top of SMP commands, with asymmetric state machines on receiver side (`RcvSwitchStatus`: 4 states) and sender side (`SndSwitchStatus`: 2 states). Receiver initiates, creates new queue, sends QADD. Sender responds with QKEY. Receiver sends QUSE. Sender sends QTEST to complete. State types in Agent/Protocol.hs, orchestration in Agent.hs, queue creation/deletion in Agent/Client.hs. Protocol spec in agent-protocol.md. The fast variant (v9+ SMP with SKEY) skips the KEY command step.
 - **Outside-STM lookup pattern**: Multiple modules use the pattern of looking up TVar references outside STM (via readTVarIO/TM.lookupIO), then reading/modifying the TVar contents inside STM. This avoids transaction re-evaluation from unrelated map changes. Used in: Server.hs (serverThread client lookup, tryDeliverMessage subscriber lookup), Env/STM.hs (deleteSubcribedClient), Client/Agent.hs (removeClientAndSubs, reconnectSMPClient). The safety invariant is that the outer map entries (TVars) are never removed — only their contents change.
+
+- **NTF token lifecycle**: Token registration (TNEW) → verification push → NTConfirmed → TVFY → NTActive, with idempotent re-registration (DH secret check), TRPL (device token replacement reusing DH key), status repair for stuck tokens, and `PPApnsNull` test tokens suppressing stats. The lifecycle spans [Server.hs](modules/Simplex/Messaging/Notifications/Server.md) (command handling, verification push delivery), [Store/Postgres.hs](modules/Simplex/Messaging/Notifications/Server/Store/Postgres.md) (conditional status updates, duplicate registration cleanup), [Types.hs](modules/Simplex/Messaging/Notifications/Types.md) (NtfTknStatus state machine), and [Env.hs](modules/Simplex/Messaging/Notifications/Server/Env.md) (push client lazy initialization).
+
+- **NTF push delivery pipeline**: Bounded TBQueue (`pushQ`) creates backpressure → `ntfPush` thread reads → `checkActiveTkn` gates PNMessage (but not PNVerification or PNCheckMessages) → APNS delivery with single retry on connection errors (new push client on retry) → PPTokenInvalid marks token NTInvalid. Spans [Server.hs](modules/Simplex/Messaging/Notifications/Server.md), [APNS.hs](modules/Simplex/Messaging/Notifications/Server/Push/APNS.md) (DER JWT signing, HTTP/2 serializing queue, fire-and-forget connection), [Env.hs](modules/Simplex/Messaging/Notifications/Server/Env.md) (push client caching with race tolerance).
+
+- **NTF service subscription model**: Service-level subscriptions (SUBS/NSUBS on SMP) vs individual queue subscriptions, with fallback from service to individual when `CAServiceUnavailable`. Service credentials are lazily generated per SMP server with 25h backdating and ~2700yr validity. XOR hash triggers on PostgreSQL maintain subscription aggregate counts. Subscription status tracking uses `ntf_service_assoc` flag to distinguish service-associated from individually-subscribed queues. Spans [Server.hs](modules/Simplex/Messaging/Notifications/Server.md) (subscriber thread, service fallback), [Env.hs](modules/Simplex/Messaging/Notifications/Server/Env.md) (lazy credential generation, Weak ThreadId subscriber cleanup), [Store/Postgres.hs](modules/Simplex/Messaging/Notifications/Server/Store/Postgres.md) (XOR hash triggers, batch status updates, cursor-based pagination).
+
+- **NTF startup resubscription**: `resubscribe` runs as detached `forkIO` (not in `raceAny_` group), uses `mapConcurrently` across SMP servers, each with `subscribeLoop` using 100x database batch multiplier and cursor-based pagination. `ExitCode` exceptions from `exitFailure` on DB error propagate to main thread despite `forkIO`. `getServerNtfSubscriptions` claims subscriptions by batch-updating to `NSPending`. Spans [Server.hs](modules/Simplex/Messaging/Notifications/Server.md), [Store/Postgres.hs](modules/Simplex/Messaging/Notifications/Server/Store/Postgres.md).
+
+- **XFTP file upload pipeline**: Agent-side encryption (streaming 64KB blocks, fixed-size padding) → chunk size selection (75% threshold algorithm) → per-server chunk creation with ID collision retry (3 attempts) → recipient registration (recursive batching up to `maxRecipients` per FADD) → per-server upload (command + file body in single HTTP/2 streaming request) → file description generation (cross-product: M chunks × R replicas × N recipients → N descriptions). Spans [Agent.hs](modules/Simplex/FileTransfer/Agent.md) (worker orchestration, description generation), [Client.hs](modules/Simplex/FileTransfer/Client.md) (upload protocol), [Server.hs](modules/Simplex/FileTransfer/Server.md) (quota reservation with rollback, skipCommitted idempotency), [Crypto.hs](modules/Simplex/FileTransfer/Crypto.md) (streaming encryption with embedded header), [Description.hs](modules/Simplex/FileTransfer/Description.md) (validation, first-replica-only digest optimization).
+
+- **XFTP file download pipeline**: Description parsing (ValidFileDescription validation, YAML or web URI) → per-server chunk download with ephemeral DH key pair per download (forward secrecy) → size and digest verification before decryption → streaming decryption with auth tag verification (output deleted on failure) → redirect resolution (depth-1 chain: decrypt redirect YAML, validate size/digest, download actual file). Spans [Agent.hs](modules/Simplex/FileTransfer/Agent.md) (worker orchestration, redirect handling), [Client.hs](modules/Simplex/FileTransfer/Client.md) (ephemeral DH, chunk-proportional timeout), [Client/Main.hs](modules/Simplex/FileTransfer/Client/Main.md) (web URI decoding, parallel download with server grouping), [Crypto.hs](modules/Simplex/FileTransfer/Crypto.md) (dual decrypt paths, auth tag deletion), [Description.hs](modules/Simplex/FileTransfer/Description.md) (redirect file descriptions).
+
+- **XFTP handshake state machine**: Three-state session-cached handshake (`No entry` → `HandshakeSent` → `HandshakeAccepted`) per HTTP/2 session. Web clients use `xftp-web-hello` header and challenge-response identity proof; native clients use standard ALPN. SNI presence gates CORS headers, web serving, and SESSION error for unrecognized connections. Key reuse on re-hello preserves existing DH keys. Spans [Server.hs](modules/Simplex/FileTransfer/Server.md) (handshake logic, CORS, web serving), [Client.hs](modules/Simplex/FileTransfer/Client.md) (ALPN selection, cert chain validation), [Transport.hs](modules/Simplex/FileTransfer/Transport.md) (block size, version).
+
+- **XFTP storage lifecycle**: Quota reservation via atomic `stateTVar` before upload → rollback on failure (subtract + delete partial file) → physical file deleted before store cleanup (crash risk: store references missing file) → `RoundedSystemTime 3600` for privacy-preserving expiration timestamps → expiration with configurable throttling (100ms between files) → startup storage reconciliation (override stats from live store). Spans [Server.hs](modules/Simplex/FileTransfer/Server.md), [Server/Store.hs](modules/Simplex/FileTransfer/Server/Store.md), [Server/Env.hs](modules/Simplex/FileTransfer/Server/Env.md), [Server/StoreLog.hs](modules/Simplex/FileTransfer/Server/StoreLog.md) (error-resilient replay, compaction).
+
+- **XFTP worker architecture**: Five worker types in three categories: rcv (per-server download + local decryption), snd (local prepare/encrypt + per-server upload), del (per-server delete). TMVar-based connection sharing with async retry on temporary errors, permanent error cleanup (put Left + delete from TMap). `withRetryIntervalLimit` caps consecutive retries; exhausted temporary errors silently abandon work cycle (chunk stays pending). `assertAgentForeground` dual check (throw if inactive + wait if backgrounded) gates every chunk operation. Spans [Agent.hs](modules/Simplex/FileTransfer/Agent.md), [Client/Agent.hs](modules/Simplex/FileTransfer/Client/Agent.md).
+
+- **SessionVar protocol client lifecycle**: Protocol client connections (SMP, NTF, XFTP) use a lazy singleton pattern: `getSessVar` atomically checks TMap → `newProtocolClient` fills TMVar on success/failure → `waitForProtocolClient` reads with timeout. Error caching via `persistErrorInterval` prevents connection storms (failed connections cache the error with expiry; callers receive cached error without reconnecting). `removeSessVar` uses monotonic `sessionVarId` compare-and-swap to prevent stale disconnect callbacks from removing newer clients. SMP has additional complexity: `SMPConnectedClient` wraps client with per-connection proxied relay map, `updateClientService` synchronizes service credentials post-connect, disconnect callback moves subscriptions to pending with session-ID matching. XFTP always uses `NRMBackground` timing regardless of caller request. Spans [Session.md](modules/Simplex/Messaging/Session.md), [Agent/Client.md](modules/Simplex/Messaging/Agent/Client.md) (lifecycle, disconnect callbacks, reconnection workers), [Agent.md](modules/Simplex/Messaging/Agent.md) (subscriber loop consuming events).
+
+- **Dual-backend agent store**: The agent store (~3700 lines in AgentStore.hs) compiles for both SQLite and PostgreSQL via `#if defined(dbPostgres)` CPP guards. Key behavioral differences: PostgreSQL uses `FOR UPDATE` row locking on reads preceding writes (SQLite relies on single-writer model); PostgreSQL uses `IN ?` with `In` wrapper for batch queries (SQLite falls back to per-row `forM` loops); PostgreSQL uses `constraintViolation` (SQLite checks `SQL.ErrorConstraint`); `createWithRandomId'` uses savepoints on PostgreSQL (failed statement aborts entire transaction without them). One known bug: `checkConfirmedSndQueueExists_` uses `#if defined(dpPostgres)` (typo: `dp` not `db`), so the `FOR UPDATE` clause is never included on any backend. Spans [AgentStore.md](modules/Simplex/Messaging/Agent/Store/AgentStore.md), [SQLite.md](modules/Simplex/Messaging/Agent/Store/SQLite.md).
+
+- **Deferred message encryption**: Message bodies are NOT encrypted at enqueue time. `enqueueMessageB` advances the ratchet header and validates padding, but stores only the body reference (`sndMsgBodyId`) and encryption key. Actual encryption (`rcEncryptMsg`) happens at delivery time in `runSmpQueueMsgDelivery`. This enables body deduplication via `VRValue`/`VRRef` — identical bodies (common for group messages) share one database row, but each connection's delivery encrypts independently with its own ratchet. Confirmation and ratchet key messages bypass deferred encryption (pre-encrypted at enqueue time). Spans [Agent.md](modules/Simplex/Messaging/Agent.md) (enqueue + delivery), [AgentStore.md](modules/Simplex/Messaging/Agent/Store/AgentStore.md) (`snd_message_bodies` storage).
+
+- **NTF agent subscription lifecycle**: The agent-side notification subscription system uses a supervisor-worker architecture with three worker pools (NTF server, SMP server, token deletion). `NSCCreate` triggers a four-way partition (`partitionQueueSubActions`): new sub, reset sub (credential mismatch or null action), continue SMP work, continue NTF work. Workers coordinate with the supervisor via `updated_by_supervisor` flag — workers only update local fields when the flag is set, preventing overwrite of supervisor decisions. The null-action sentinel (`workerErrors` sets action to NULL on permanent failure) bridges worker failure recovery to supervisor-driven re-creation. `retrySubActions` uses a shrinking TVar — each iteration only retries subs with temporary errors, so batches get smaller over time. `rescheduleWork` handles time-scheduled health checks by forking a sleep thread that re-signals `doWork`. Spans [NtfSubSupervisor.md](modules/Simplex/Messaging/Agent/NtfSubSupervisor.md) (supervisor, worker pools), [AgentStore.md](modules/Simplex/Messaging/Agent/Store/AgentStore.md) (updated_by_supervisor, null-action sentinel), [Agent/Client.md](modules/Simplex/Messaging/Agent/Client.md) (worker framework).
+
+- **Session-aware SMP subscription management**: SMP queue subscriptions are tracked per transport session with session-ID validation at multiple points. `subscribeQueues` groups queues by transport session, subscribes concurrently, then validates `activeClientSession` post-RPC — if the client was replaced during the RPC, results are discarded and converted to temporary errors for retry. `removeClientAndSubs` (disconnect cleanup) only demotes subscriptions whose session ID matches the disconnecting client. Batch UP notifications are accumulated across transmissions and deduplicated against already-active subscriptions. When ALL results are temporary errors and no connections were already active, the SMP client is closed to force fresh connection. `maxPending` throttles concurrent pending subscriptions with STM retry backpressure. Spans [Agent/Client.md](modules/Simplex/Messaging/Agent/Client.md) (subscription state, session validation), [Agent.md](modules/Simplex/Messaging/Agent.md) (subscriber loop, processSMPTransmissions, UP accumulation).
+
+- **Agent message envelope**: Agent messages use a two-layer format — outer `AgentMsgEnvelope` (version + type tag C/M/I/R + payload) and inner `AgentMessage` (after double-ratchet decryption, tags I/D/R/M + AMessage). Tag characters deliberately overlap between layers (disambiguated by context). `AgentInvitation` uses only per-queue E2E encryption (no ratchet established yet); `AgentRatchetKey` uses per-queue E2E (can't use ratchet to renegotiate ratchet); `AgentConfirmation` uses double ratchet. PQ support *shrinks* message size budgets (ratchet header + reply link grow with SNTRUP761 keys). `AEvent` is a GADT indexed by `AEntity` — prevents file events on connection entities at the type level. Spans [Agent/Protocol.md](modules/Simplex/Messaging/Agent/Protocol.md) (types, encoding, size budgets), [Agent.md](modules/Simplex/Messaging/Agent.md) (four e2e key states dispatch, message processing).
+
+- **Ratchet synchronization protocol**: When the double ratchet gets out of sync (backup restoration, message loss), both parties exchange `AgentRatchetKey` messages with fresh DH keys. Role determination uses hash-ordering: `rkHash(k1, k2)` is computed by both sides — the party with the lower hash initializes the receiving ratchet, the other initializes sending and sends EREADY. This breaks symmetry when both parties simultaneously initiate. State machine: `RSOk`/`RSAllowed`/`RSRequired` → generate keys + reply; `RSStarted` → use stored keys; `RSAgreed` → error (reset to `RSRequired`). EREADY carries `lastExternalSndId` so the peer knows which messages used the old ratchet. `checkRatchetKeyHashExists` prevents processing the same key twice. Successful message decryption resets sync state to `RSOk` (the recovery signal). Spans [Agent.md](modules/Simplex/Messaging/Agent.md) (newRatchetKey, ereadyMsg, resetRatchetSync), [Agent/Protocol.md](modules/Simplex/Messaging/Agent/Protocol.md) (AgentRatchetKey type, cryptoErrToSyncState classification).
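Review note: the ratchet synchronization rules in the last topic can be read as two small pure functions, sketched below. Only the `RS*` state names come from the text. `KeyAction`, `RatchetRole`, `onRatchetKey` and `ratchetRole` are hypothetical names, the way each side's hash is derived from the exchanged keys is deliberately left abstract, and returning `Nothing` for equal hashes is an assumption of this sketch.

```haskell
-- Sync states named in the topic above.
data RatchetSyncState = RSOk | RSAllowed | RSRequired | RSStarted | RSAgreed
  deriving (Eq, Show)

-- Hypothetical names for what happens when a ratchet key message arrives.
data KeyAction = GenerateKeysAndReply | UseStoredKeys | RejectAndRequireSync
  deriving (Eq, Show)

-- | Reaction to an incoming ratchet key message in each sync state.
onRatchetKey :: RatchetSyncState -> KeyAction
onRatchetKey st = case st of
  RSOk -> GenerateKeysAndReply
  RSAllowed -> GenerateKeysAndReply
  RSRequired -> GenerateKeysAndReply
  RSStarted -> UseStoredKeys -- we already sent our own keys
  RSAgreed -> RejectAndRequireSync -- error; state resets to RSRequired

-- Hypothetical role names for the two sides after hash ordering.
data RatchetRole = InitReceiving | InitSendingThenEReady
  deriving (Eq, Show)

-- | Break symmetry by ordering two hashes: the lower one initializes the
-- receiving ratchet, the other initializes sending and sends EREADY.
-- Equal hashes give no ordering, so this sketch returns Nothing.
ratchetRole :: Ord h => h -> h -> Maybe RatchetRole
ratchetRole ownHash peerHash = case compare ownHash peerHash of
  LT -> Just InitReceiving
  GT -> Just InitSendingThenEReady
  EQ -> Nothing
```

Because both sides compare the same two values with the arguments swapped, they always land on complementary roles, which is what lets simultaneous initiation resolve without extra messages.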
From b9288544ed34994f9d951e56db13ef5d1b6a808c Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Fri, 13 Mar 2026 16:46:36 +0000 Subject: [PATCH 25/61] xrcp specs --- spec/modules/Simplex/RemoteControl/Client.md | 65 +++++++++++++++++++ .../Simplex/RemoteControl/Discovery.md | 25 +++++++ .../RemoteControl/Discovery/Multicast.md | 7 ++ .../Simplex/RemoteControl/Invitation.md | 26 ++++++++ spec/modules/Simplex/RemoteControl/Types.md | 31 +++++++++ 5 files changed, 154 insertions(+) create mode 100644 spec/modules/Simplex/RemoteControl/Client.md create mode 100644 spec/modules/Simplex/RemoteControl/Discovery.md create mode 100644 spec/modules/Simplex/RemoteControl/Discovery/Multicast.md create mode 100644 spec/modules/Simplex/RemoteControl/Invitation.md create mode 100644 spec/modules/Simplex/RemoteControl/Types.md diff --git a/spec/modules/Simplex/RemoteControl/Client.md b/spec/modules/Simplex/RemoteControl/Client.md new file mode 100644 index 000000000..55fd05bc1 --- /dev/null +++ b/spec/modules/Simplex/RemoteControl/Client.md @@ -0,0 +1,65 @@ +# Simplex.RemoteControl.Client + +> XRCP session establishment: controller-host handshake with KEM hybrid key exchange, multicast discovery, and session encryption. + +**Source**: [`RemoteControl/Client.hs`](../../../../../../src/Simplex/RemoteControl/Client.hs) + +## Overview + +This module implements the two sides of the XRCP remote control protocol: the **controller** side (`connectRCHost`) and the **host** side (`connectRCCtrl`). The naming follows [Types.md](./Types.md) — "host" means connecting **to** the host (controller's perspective). + +The handshake is a multi-step flow using `RCStepTMVar` — a `TMVar (Either RCErrorType a)` that allows each phase to be observed and controlled by the application. The application receives the session code (TLS channel binding) for user verification before the session proceeds. + +## Handshake flow + +1. 
**Controller** starts TLS server, creates invitation with ephemeral session key + DH key + identity key +2. **Host** connects via TLS (with mutual certificate authentication), receives invitation out-of-band or via multicast +3. **Host** sends `RCHostEncHello`: ephemeral DH public key + nonce + encrypted hello body (containing KEM public key, CA fingerprint, app info) +4. **Controller** decrypts hello, verifies CA fingerprint matches TLS certificate, performs KEM encapsulation, derives hybrid key (DH + KEM), sends `RCCtrlEncHello` with KEM ciphertext + encrypted response +5. **Host** decrypts with KEM hybrid key, session established with `TSbChainKeys` + +## KEM hybrid key derivation + +The session key combines DH and post-quantum KEM via `kemHybridSecret`: `SHA3_256(dhSecret || kemSharedKey)`. This is used to initialize `sbcInit` chain keys. The chain keys are **swapped** between controller and host — `prepareCtrlSession` explicitly calls `swap` on the `sbcInit` result so that the controller's send key matches the host's receive key. + +## Two-phase session with user confirmation + +`connectRCCtrl` (host side) splits the session into two phases via `confirmSession` TMVar: + +1. TLS connection established → first `RCStepTMVar` resolved with session code +2. Application displays session code for user verification → calls `confirmCtrlSession` with `True`/`False` +3. If confirmed, `runSession` proceeds with hello exchange → second `RCStepTMVar` resolved with session + +`confirmCtrlSession` does a double `putTMVar` — the first signals the decision, the second blocks until the session thread does `takeTMVar` (synchronization point). See TODO in source: no timeout on this wait. + +## TLS hooks — single-session enforcement + +`tlsHooks` on the controller side enforces at most one TLS session: `onNewHandshake` checks if the result TMVar is still empty (`isNothing <$> tryReadTMVar r`). A second TLS connection attempt is rejected because `r` is already filled. 
Similarly, `onClientCertificate` validates the host's CA certificate chain (must be exactly 2 certs: leaf + CA) and checks the CA fingerprint against the known host pairing. + +## Multicast discovery — prevDhPrivKey fallback + +`findRCCtrlPairing` tries to decrypt the multicast announcement with each known pairing's current DH key, falling back to `prevDhPrivKey` if present. This handles the case where the host rotated its DH key (in `updateCtrlPairing` during `connectRCCtrl`) but the controller still has the old public key — the announcement is encrypted with the host's old DH public key, so the host needs its old private key to decrypt. + +`discoverRCCtrl` wraps this in a 30-second timeout (`timeoutThrow RCENotDiscovered 30000000`) and an error-recovery loop — failed decryption attempts are logged and retried rather than aborting discovery. + +After decryption, the invitation's `dh` field is verified against the announcement's `dhPubKey` to prevent a relay attack where someone re-encrypts a legitimate invitation with a different DH key. + +## announceRC — fire-and-forget loop + +Sends the signed invitation encrypted to the known host's DH key, repeated `maxCount` times (default 60) with 1-second intervals via UDP multicast. The announcement is padded to `encInvitationSize` (900 bytes). The announcer runs as a separate async that is cancelled when the session is established (`uninterruptibleCancel` in `runSession`). + +## Session encryption — no padding + +`rcEncryptBody` / `rcDecryptBody` use `sbEncryptTailTagNoPad` / `sbDecryptTailTagNoPad` — lazy streaming encryption without padding. This is for application-level data after the handshake, where message sizes are variable and padding would be wasteful. The auth tag is appended at the tail (not prepended). + +## putRCError — error propagation to TMVar + +`putRCError` is an error combinator that catches all errors from an `ExceptT` action and writes them to the step TMVar before re-throwing. 
This ensures the application observes the error via the TMVar even if the async thread terminates. Uses `tryPutTMVar` (not `putTMVar`) so the TMVar write is idempotent — if already filled, the write is skipped, but the error is still re-thrown via `throwE`. + +## Asymmetric hello encryption + +The two directions of the hello exchange use different encryption primitives. The host encrypts `RCHostEncHello` with `cbEncrypt` using the DH shared key directly (classical DH only). The controller encrypts `RCCtrlEncHello` with `sbEncrypt` using a key derived from `sbcHkdf` on the KEM-hybrid chain key (post-quantum protected). This asymmetry means the host's initial hello is only protected by classical DH, while the controller's response has post-quantum protection. + +## Packet framing + +`sendRCPacket` / `receiveRCPacket` use fixed-size 16384-byte blocks with `C.pad`/`C.unPad` (2-byte length prefix + '#' padding). The hello exchange uses a smaller 12288-byte block size (`helloBlockSize`) for the encrypted hello bodies within the padded packet. diff --git a/spec/modules/Simplex/RemoteControl/Discovery.md b/spec/modules/Simplex/RemoteControl/Discovery.md new file mode 100644 index 000000000..52c861c79 --- /dev/null +++ b/spec/modules/Simplex/RemoteControl/Discovery.md @@ -0,0 +1,25 @@ +# Simplex.RemoteControl.Discovery + +> Network discovery: local address enumeration, multicast group management, and TLS server startup. + +**Source**: [`RemoteControl/Discovery.hs`](../../../../../../src/Simplex/RemoteControl/Discovery.hs) + +## getLocalAddress — filtered interface enumeration + +Enumerates network interfaces and filters out non-routable addresses (0.0.0.0, broadcast, link-local 169.254.x.x). Results are sorted: `mkLastLocalHost` moves localhost (127.x.x.x) to the end. If a preferred address is provided, `preferAddress` moves the matching entry to the front — matches by address first, falling back to interface name. 
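The filtering and ordering described for `getLocalAddress` can be sketched as below. This is an illustration under assumptions, not the module's implementation: `Iface`, `routable`, `localhostLast`, `prefer` and `localAddresses` are hypothetical names, addresses are modelled as plain tuples, and "broadcast" is assumed to mean 255.255.255.255.

```haskell
module Main where

import Data.List (partition)
import Data.Word (Word8)

type IPv4 = (Word8, Word8, Word8, Word8)

-- Hypothetical record for one enumerated interface.
data Iface = Iface {ifName :: String, ifAddr :: IPv4}
  deriving (Eq, Show)

-- Drop 0.0.0.0, broadcast (assumed 255.255.255.255) and link-local 169.254.x.x.
routable :: IPv4 -> Bool
routable (0, 0, 0, 0) = False
routable (255, 255, 255, 255) = False
routable (169, 254, _, _) = False
routable _ = True

isLocalhost :: IPv4 -> Bool
isLocalhost (127, _, _, _) = True
isLocalhost _ = False

-- Move loopback entries to the end, keeping the relative order otherwise.
localhostLast :: [Iface] -> [Iface]
localhostLast ifaces = others ++ locals
  where
    (locals, others) = partition (isLocalhost . ifAddr) ifaces

-- Move the preferred entry to the front: match by address first,
-- fall back to interface name, otherwise leave the list unchanged.
prefer :: Iface -> [Iface] -> [Iface]
prefer p ifaces = case break ((== ifAddr p) . ifAddr) ifaces of
  (before, x : after) -> x : before ++ after
  _ -> case break ((== ifName p) . ifName) ifaces of
    (before, x : after) -> x : before ++ after
    _ -> ifaces

localAddresses :: Maybe Iface -> [Iface] -> [Iface]
localAddresses preferred =
  maybe id prefer preferred . localhostLast . filter (routable . ifAddr)

main :: IO ()
main = do
  let found =
        [ Iface "lo" (127, 0, 0, 1),
          Iface "eth0" (192, 168, 1, 10),
          Iface "wlan0" (10, 0, 0, 5),
          Iface "eth1" (169, 254, 3, 4)
        ]
      -- The preferred address no longer exists, so the name match is used.
      preferred = Iface "wlan0" (10, 0, 0, 99)
  mapM_ (putStrLn . ifName) (localAddresses (Just preferred) found)
```

In this run the link-local `eth1` entry is dropped, `wlan0` is moved to the front through the interface-name fallback, and `lo` stays last.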
+ +## Multicast subscriber counting + +`joinMulticast` / `partMulticast` use a shared `TMVar Int` counter to track active listeners. Multicast group membership is per-host (not per-process — see comment in Multicast.hsc), so the counter ensures `IP_ADD_MEMBERSHIP` is called only when transitioning from 0→1 listeners and `IP_DROP_MEMBERSHIP` only when transitioning from 1→0. If `setMembership` fails, the counter is restored to its previous value and the error is logged (not thrown). + +**TMVar hazard**: Both functions take the counter from the TMVar unconditionally but only put it back in the 0-or-1 branches. If `joinMulticast` is called when the counter is already >0, or `partMulticast` when >1, the TMVar is left empty and subsequent accesses will deadlock. In practice this is safe because `withListener` serializes access through a single `TMVar Int`, but the abstraction does not protect against concurrent use. + +## startTLSServer — ephemeral port support + +When `port_` is `Nothing`, passes `"0"` to `startTCPServer`, which causes the OS to assign an ephemeral port. The assigned port is read via `socketPort` and communicated back through the `startedOnPort` TMVar. On any startup error, `setPort Nothing` is signalled so callers don't block indefinitely on the TMVar. + +The TLS server requires client certificates (`serverWantClientCert = True`) and delegates certificate validation to the caller-provided `TLS.ServerHooks`. + +## withListener — bracket with subscriber tracking + +`openListener` increments the multicast subscriber counter; `closeListener` decrements it in a `finally` block (ensuring cleanup even on exception). The `UDP.stop` call that closes the socket runs after the multicast part — if `partMulticast` fails, the socket is still closed. 
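The 0→1 and 1→0 membership transitions from the subscriber-counting section can be sketched as follows. This is a hedged illustration rather than the module's code: it uses an `MVar Int` with `modifyMVar_` from `base` instead of the `TMVar Int` named above, `joinGroup`, `partGroup` and `guarded` are hypothetical names, the `setMembership` argument is a stand-in callback for the socket option call, and leaving with a zero count is assumed to be a no-op. With `modifyMVar_` the counter is put back on every branch, so the empty-variable hazard described above does not arise in this form.

```haskell
module Main where

import Control.Concurrent.MVar (MVar, modifyMVar_, newMVar, readMVar)
import Control.Exception (SomeException, try)
import Data.IORef (modifyIORef, newIORef, readIORef)

-- Hypothetical stand-in for the membership call: True = add, False = drop.
type SetMembership = Bool -> IO ()

-- Join the group only when the first listener appears.
joinGroup :: MVar Int -> SetMembership -> IO ()
joinGroup counter setMembership = modifyMVar_ counter $ \n ->
  if n == 0
    then guarded n 1 (setMembership True)
    else pure (n + 1)

-- Leave the group only when the last listener goes away.
partGroup :: MVar Int -> SetMembership -> IO ()
partGroup counter setMembership = modifyMVar_ counter $ \n ->
  case n of
    0 -> pure 0 -- assumption: nothing to leave
    1 -> guarded n 0 (setMembership False)
    _ -> pure (n - 1)

-- Run the membership change; on failure log it and keep the old count.
guarded :: Int -> Int -> IO () -> IO Int
guarded old new action = do
  r <- try action :: IO (Either SomeException ())
  case r of
    Right () -> pure new
    Left e -> do
      putStrLn ("setMembership failed: " ++ show e)
      pure old

main :: IO ()
main = do
  calls <- newIORef []
  counter <- newMVar 0
  let setM on = modifyIORef calls (on :)
  joinGroup counter setM
  joinGroup counter setM
  partGroup counter setM
  partGroup counter setM
  readMVar counter >>= print -- 0
  readIORef calls >>= print . reverse -- [True,False]
```

Two listeners joining and leaving produce exactly one add and one drop, which is the property the counter exists to guarantee.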
diff --git a/spec/modules/Simplex/RemoteControl/Discovery/Multicast.md b/spec/modules/Simplex/RemoteControl/Discovery/Multicast.md new file mode 100644 index 000000000..97b9886b8 --- /dev/null +++ b/spec/modules/Simplex/RemoteControl/Discovery/Multicast.md @@ -0,0 +1,7 @@ +# Simplex.RemoteControl.Discovery.Multicast + +> FFI binding for IPv4 multicast group membership via `setsockopt`. + +**Source**: [`Discovery/Multicast.hsc`](../../../../../../../src/Simplex/RemoteControl/Discovery/Multicast.hsc) + +No non-obvious behavior. See source. diff --git a/spec/modules/Simplex/RemoteControl/Invitation.md b/spec/modules/Simplex/RemoteControl/Invitation.md new file mode 100644 index 000000000..3f65ec46c --- /dev/null +++ b/spec/modules/Simplex/RemoteControl/Invitation.md @@ -0,0 +1,26 @@ +# Simplex.RemoteControl.Invitation + +> XRCP invitation creation, dual-signature scheme, and URI encoding. + +**Source**: [`RemoteControl/Invitation.hs`](../../../../../../src/Simplex/RemoteControl/Invitation.hs) + +## Dual-signature chain + +`signInvitation` applies two Ed25519 signatures in a specific order that creates a chain: + +1. `ssig` signs the invitation URI with the **session** private key +2. `idsig` signs the URI **with `ssig` appended** using the **identity** private key + +Verification in `verifySignedInvitation` mirrors this: `ssig` is verified against the bare URI, `idsig` against the URI+ssig concatenation. This chain means `idsig` covers both the invitation content and the session key's signature — a compromised session key cannot forge an identity-valid invitation. + +## Invitation URI format + +The `xrcp:/` scheme uses the SMP-style pattern: CA fingerprint as userinfo (`ca@host:port`), query parameters after `#/?`. The `app` field is raw JSON encoded in a query parameter. 
`RCInvitation`'s parser uses `parseSimpleQuery` + `lookup` (order-independent), but `RCSignedInvitation`'s parser uses `B.breakSubstring "&ssig="` which assumes the signatures appear at a fixed position — see TODO in source on `RCSignedInvitation`'s `strP`. + +## RCVerifiedInvitation — newtype trust boundary + +`RCVerifiedInvitation` is a newtype wrapper. The constructor is exported (via `RCVerifiedInvitation (..)`), so it can be constructed without validation — the trust boundary is conventional, not enforced by the type system. `verifySignedInvitation` is the intended smart constructor. [Client.hs](./Client.md) accepts only `RCVerifiedInvitation` for `connectRCCtrl`. + +## RCEncInvitation — multicast envelope + +`RCEncInvitation` wraps a signed invitation for UDP multicast: ephemeral DH public key + nonce + encrypted body. The encryption uses a DH shared secret between the host's DH public key (known to the controller from the pairing) and the controller's ephemeral DH private key. Uses `Tail` encoding for the ciphertext (no length prefix — consumes remaining bytes). diff --git a/spec/modules/Simplex/RemoteControl/Types.md b/spec/modules/Simplex/RemoteControl/Types.md new file mode 100644 index 000000000..ad165f442 --- /dev/null +++ b/spec/modules/Simplex/RemoteControl/Types.md @@ -0,0 +1,31 @@ +# Simplex.RemoteControl.Types + +> Type definitions for the XRCP remote control protocol: pairing records, session state, hello messages, and error taxonomy. + +**Source**: [`RemoteControl/Types.hs`](../../../../../../src/Simplex/RemoteControl/Types.hs) + +## Overview + +This module defines the data types for the XRCP (remote control) protocol, which connects a "host" (mobile device) to a "controller" (desktop). 
Key architectural point: the naming is from the **controller's perspective** — the controller connects to the host, so: +- `RCHostPairing` / `RCHostSession` are the controller-side records (connecting **to** the host) +- `RCCtrlPairing` / `RCCtrlSession` are the host-side records (connecting **from** the controller) + +## Asymmetric pairing records + +`RCHostPairing` (controller side) stores the CA key pair (private key + certificate), identity private key, and optionally a `KnownHostPairing` (fingerprint + last DH public key of the host). `RCCtrlPairing` (host side) stores the CA key pair, controller's fingerprint and identity public key, current DH private key, and `prevDhPrivKey` — the previous DH key retained so that announcements encrypted with the old key can still be decrypted during key rotation. + +## Asymmetric session keys + +`HostSessKeys` stores private keys (identity + session) — the controller needs to sign commands. `CtrlSessKeys` stores public keys (identity + session) — the host needs to verify commands. Both store `TSbChainKeys` for the symmetric session encryption, but note that the chain key direction is swapped between the two sides (see `prepareCtrlSession` in [Client.md](./Client.md)). + +## RCCtrlEncHello — two variants + +`RCCtrlEncHello` is a sum type with two variants: `RCCtrlEncHello` (success: KEM ciphertext + encrypted hello body) and `RCCtrlEncError` (failure: nonce + encrypted error message). The error variant uses the original DH shared key for encryption (not the KEM hybrid key), since the error occurs before KEM exchange completes. + +## AnyError instance — TLS UnknownCa promotion + +`fromSomeException` promotes TLS `Terminated` / `Error_Protocol` / `UnknownCa` to `RCEIdentity` rather than the generic `RCEException`. This maps a TLS-level certificate rejection (either side's CA not recognized by the peer) to a meaningful XRCP error. 
+ +## IpProbe — unused discovery type + +`IpProbe` is defined with `Encoding` instance but not used anywhere in the current codebase. It appears to be a placeholder for a planned IP discovery mechanism. Note: the `smpP` parser has a precedence bug — `IpProbe <$> (smpP <* "I") *> smpP` parses as `(IpProbe <$> (smpP <* "I")) *> smpP`, which discards the `IpProbe` wrapper. This has never manifested because the type is unused. From 3bde77da10a428f66264692f76a7b604aa23bfc7 Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Fri, 13 Mar 2026 17:17:31 +0000 Subject: [PATCH 26/61] update terms --- spec/TOPICS.md | 20 ++++++------- spec/modules/Simplex/FileTransfer/Agent.md | 18 +++++------ spec/modules/Simplex/FileTransfer/Client.md | 8 ++--- .../Simplex/FileTransfer/Client/Agent.md | 6 ++-- .../Simplex/FileTransfer/Client/Main.md | 18 +++++------ .../Simplex/FileTransfer/Description.md | 6 ++-- spec/modules/Simplex/FileTransfer/Protocol.md | 2 +- spec/modules/Simplex/FileTransfer/Server.md | 28 ++++++++--------- .../Simplex/FileTransfer/Server/Env.md | 6 ++-- .../Simplex/FileTransfer/Server/Main.md | 8 ++--- .../Simplex/FileTransfer/Server/Stats.md | 6 ++-- .../Simplex/FileTransfer/Server/Store.md | 2 +- .../Simplex/FileTransfer/Server/StoreLog.md | 6 ++-- spec/modules/Simplex/FileTransfer/Types.md | 2 +- spec/modules/Simplex/Messaging/Agent.md | 2 +- .../modules/Simplex/Messaging/Agent/Client.md | 30 +++++++++---------- .../Simplex/Messaging/Agent/Env/SQLite.md | 2 +- .../Messaging/Agent/NtfSubSupervisor.md | 10 +++---- .../Simplex/Messaging/Agent/Protocol.md | 4 +-- spec/modules/Simplex/Messaging/Agent/Stats.md | 2 +- .../Messaging/Notifications/Protocol.md | 6 ++-- .../Simplex/Messaging/Notifications/Server.md | 28 ++++++++--------- .../Messaging/Notifications/Server/Control.md | 2 +- .../Messaging/Notifications/Server/Env.md | 6 ++-- .../Messaging/Notifications/Server/Main.md | 2 +- 
.../Notifications/Server/Push/APNS.md | 2 +- .../Messaging/Notifications/Server/Stats.md | 14 ++++----- .../Messaging/Notifications/Server/Store.md | 4 +-- .../Notifications/Server/Store/Postgres.md | 8 ++--- .../Messaging/Notifications/Transport.md | 4 +-- .../Simplex/Messaging/Notifications/Types.md | 2 +- 31 files changed, 132 insertions(+), 132 deletions(-) diff --git a/spec/TOPICS.md b/spec/TOPICS.md index 0489d79df..a62e23c29 100644 --- a/spec/TOPICS.md +++ b/spec/TOPICS.md @@ -2,7 +2,7 @@ > Cross-cutting patterns noticed during module documentation. Each entry may become a topic doc in `spec/` after all module docs are complete. -- **Exception handling strategy**: `catchOwn`/`catchAll`/`tryAllErrors` pattern (defined in Util.hs) used across server, client, and agent modules. The three-category classification (synchronous, own-async, cancellation) and when to use which catch variant is not obvious from any single call site. +- **Exception handling strategy**: `catchOwn`/`catchAll`/`tryAllErrors` pattern (defined in Util.hs) used across router, client, and agent modules. The three-category classification (synchronous, own-async, cancellation) and when to use which catch variant is not obvious from any single call site. - **Padding schemes**: Three different padding formats across the codebase — Crypto.hs uses 2-byte Word16 length prefix (max ~65KB), Crypto/Lazy.hs uses 8-byte Int64 prefix (file-sized), and both use '#' fill character. Ratchet header padding uses fixed sizes (88 or 2310 bytes). All use `pad`/`unPad` but with incompatible formats. The relationship between padding, encryption, and message size limits spans Crypto, Lazy, Ratchet, and the protocol layer. @@ -16,17 +16,17 @@ - **Service certificate subscription model**: Service subscriptions (SUBS/NSUBS) and per-queue subscriptions (SUB/NSUB) coexist with complex state transitions. Client/Agent.hs manages dual active/pending subscription maps with session-aware cleanup. 
Protocol.hs defines useServiceAuth (only NEW/SUB/NSUB). Client.hs implements authTransmission with dual signing (entity key over cert hash + transmission, service key over transmission only). Transport.hs handles the service certificate handshake extension (v16+). The full subscription lifecycle — from DBService credentials through handshake to service subscription to disconnect/reconnect — spans all four modules. -- **Two agent layers**: Client/Agent.hs ("small agent") is used only in servers — SMP proxy and notification server — to manage client connections to other SMP servers. Agent.hs + Agent/Client.hs ("big agent") is used in client applications. Both manage SMP client connections with subscription tracking and reconnection, but the big agent adds the full messaging agent layer (connections, double ratchet, file transfer). When documenting Agent/Client.hs, Client/Agent.hs should be reviewed for shared patterns and differences. +- **Two agent layers**: Client/Agent.hs ("small agent") is used only in routers — SMP proxy and notification router — to manage client connections to other SMP routers. Agent.hs + Agent/Client.hs ("big agent") is used in client applications. Both manage SMP client connections with subscription tracking and reconnection, but the big agent adds the full messaging agent layer (connections, double ratchet, file transfer). When documenting Agent/Client.hs, Client/Agent.hs should be reviewed for shared patterns and differences. - **Handshake protocol family**: SMP (Transport.hs), NTF (Notifications/Transport.hs), and XFTP (FileTransfer/Transport.hs) all have handshake protocols with the same structure (version negotiation + session binding + key exchange) but different feature sets. NTF is a strict subset. XFTP doesn't use the TLS handshake at all (HTTP2 layer). The shared types (THandle, THandleParams, THandleAuth) mean changes to the handshake infrastructure affect all three protocols. 
-- **Server subscription architecture**: The SMP server's subscription model spans Server.hs (serverThread split-STM lifecycle, tryDeliverMessage sync/async, ProhibitSub/ServerSub state machine), Env/STM.hs (SubscribedClients TVar-of-Maybe continuity, Client three-queue architecture), and Client/Agent.hs (small agent dual subscription model). The interaction between service subscriptions, direct queue subscriptions, notification subscriptions, and the serverThread subQ processing is not visible from any single module. +- **Router subscription architecture**: The SMP router's subscription model spans Server.hs (serverThread split-STM lifecycle, tryDeliverMessage sync/async, ProhibitSub/ServerSub state machine), Env/STM.hs (SubscribedClients TVar-of-Maybe continuity, Client three-queue architecture), and Client/Agent.hs (small agent dual subscription model). The interaction between service subscriptions, direct queue subscriptions, notification subscriptions, and the serverThread subQ processing is not visible from any single module. - **Duplex connection handshake**: The SMP duplex connection procedure (standard 10-step and fast 7-step) spans Agent.hs (orchestration, state machine), Agent/Protocol.hs (message types: AgentConfirmation/AgentConnInfoReply/AgentInvitation/HELLO, queue status types), Client.hs (SMP command dispatch), Protocol.hs (SMP-level KEY/SKEY commands). The handshake involves two-layer encryption (per-queue E2E + double ratchet), version-dependent paths (v2+ duplex, v6+ sender auth key, v7+ ratchet on confirmation, v9+ fast handshake with SKEY), and the asymmetry between initiating and accepting parties (different message types, different confirmation processing). The protocol spec (`agent-protocol.md`) defines the procedure but the implementation details — error handling, state persistence across restarts, race conditions between confirmation and message delivery — are only visible by reading the code across these modules. 
- **Connection links**: Full connection links (URI format with `#/?` query parameters) and binary-encoded links (`Encoding` instances) serve different contexts — URIs for out-of-band sharing, binary for agent-to-agent messages. Each has independent version-conditional encoding with different backward-compat rules (URI parser adjusts agent version ranges for old contact links, binary parser patches `queueMode` for forward compat). The `VersionI`/`VersionRangeI` typeclasses convert between `SMPQueueInfo` (versioned, in confirmations) and `SMPQueueUri` (version-ranged, in links). Full picture requires Agent/Protocol.hs, Protocol.hs, and agent-protocol.md. -- **Short links**: Short links are a compact representation for sharing via URLs, not a replacement for full connection links — both are used. Short links store encrypted link data on the router and encode only a server hostname, link type character, and key hash in the URL. The link data lifecycle (creation, encryption with key derivation, owner chain-of-trust validation, mutable user data updates) spans Agent/Protocol.hs (types, serialization, owner validation, server shortening/restoration), Agent.hs (link creation and resolution API), and the router-side link storage. The `FixedLinkData`/`ConnLinkData` split (immutable vs mutable), `OwnerAuth` chain validation, and `PreparedLinkParams` pre-computation are not visible from any single module. +- **Short links**: Short links are a compact representation for sharing via URLs, not a replacement for full connection links — both are used. Short links store encrypted link data on the router and encode only a router hostname, link type character, and key hash in the URL. The link data lifecycle (creation, encryption with key derivation, owner chain-of-trust validation, mutable user data updates) spans Agent/Protocol.hs (types, serialization, owner validation, router shortening/restoration), Agent.hs (link creation and resolution API), and the router-side link storage. 
The `FixedLinkData`/`ConnLinkData` split (immutable vs mutable), `OwnerAuth` chain validation, and `PreparedLinkParams` pre-computation are not visible from any single module. - **Agent worker framework**: `getAgentWorker` (lifecycle, restart rate limiting, crash recovery) + `withWork`/`withWork_`/`withWorkItems` (task retrieval with doWork flag atomics) defined in Agent/Client.hs, consumed by Agent.hs (async commands, message delivery), NtfSubSupervisor.hs (notification workers), FileTransfer/Agent.hs (XFTP workers), and simplex-chat. The framework separates two concerns: worker lifecycle (create-or-reuse, fork async, rate-limit restarts, escalate to CRITICAL) and task pattern (get next task, do task, as separate parameters). The doWork TMVar flag choreography (clear before query to prevent race) and the work-item-error vs store-error distinction are not obvious from any single consumer. @@ -40,19 +40,19 @@ - **NTF push delivery pipeline**: Bounded TBQueue (`pushQ`) creates backpressure → `ntfPush` thread reads → `checkActiveTkn` gates PNMessage (but not PNVerification or PNCheckMessages) → APNS delivery with single retry on connection errors (new push client on retry) → PPTokenInvalid marks token NTInvalid. Spans [Server.hs](modules/Simplex/Messaging/Notifications/Server.md), [APNS.hs](modules/Simplex/Messaging/Notifications/Server/Push/APNS.md) (DER JWT signing, HTTP/2 serializing queue, fire-and-forget connection), [Env.hs](modules/Simplex/Messaging/Notifications/Server/Env.md) (push client caching with race tolerance). -- **NTF service subscription model**: Service-level subscriptions (SUBS/NSUBS on SMP) vs individual queue subscriptions, with fallback from service to individual when `CAServiceUnavailable`. Service credentials are lazily generated per SMP server with 25h backdating and ~2700yr validity. XOR hash triggers on PostgreSQL maintain subscription aggregate counts. 
Subscription status tracking uses `ntf_service_assoc` flag to distinguish service-associated from individually-subscribed queues. Spans [Server.hs](modules/Simplex/Messaging/Notifications/Server.md) (subscriber thread, service fallback), [Env.hs](modules/Simplex/Messaging/Notifications/Server/Env.md) (lazy credential generation, Weak ThreadId subscriber cleanup), [Store/Postgres.hs](modules/Simplex/Messaging/Notifications/Server/Store/Postgres.md) (XOR hash triggers, batch status updates, cursor-based pagination). +- **NTF service subscription model**: Service-level subscriptions (SUBS/NSUBS on SMP) vs individual queue subscriptions, with fallback from service to individual when `CAServiceUnavailable`. Service credentials are lazily generated per SMP router with 25h backdating and ~2700yr validity. XOR hash triggers on PostgreSQL maintain subscription aggregate counts. Subscription status tracking uses `ntf_service_assoc` flag to distinguish service-associated from individually-subscribed queues. Spans [Server.hs](modules/Simplex/Messaging/Notifications/Server.md) (subscriber thread, service fallback), [Env.hs](modules/Simplex/Messaging/Notifications/Server/Env.md) (lazy credential generation, Weak ThreadId subscriber cleanup), [Store/Postgres.hs](modules/Simplex/Messaging/Notifications/Server/Store/Postgres.md) (XOR hash triggers, batch status updates, cursor-based pagination). -- **NTF startup resubscription**: `resubscribe` runs as detached `forkIO` (not in `raceAny_` group), uses `mapConcurrently` across SMP servers, each with `subscribeLoop` using 100x database batch multiplier and cursor-based pagination. `ExitCode` exceptions from `exitFailure` on DB error propagate to main thread despite `forkIO`. `getServerNtfSubscriptions` claims subscriptions by batch-updating to `NSPending`. Spans [Server.hs](modules/Simplex/Messaging/Notifications/Server.md), [Store/Postgres.hs](modules/Simplex/Messaging/Notifications/Server/Store/Postgres.md). 
+- **NTF startup resubscription**: `resubscribe` runs as detached `forkIO` (not in `raceAny_` group), uses `mapConcurrently` across SMP routers, each with `subscribeLoop` using 100x database batch multiplier and cursor-based pagination. `ExitCode` exceptions from `exitFailure` on DB error propagate to main thread despite `forkIO`. `getServerNtfSubscriptions` claims subscriptions by batch-updating to `NSPending`. Spans [Server.hs](modules/Simplex/Messaging/Notifications/Server.md), [Store/Postgres.hs](modules/Simplex/Messaging/Notifications/Server/Store/Postgres.md). -- **XFTP file upload pipeline**: Agent-side encryption (streaming 64KB blocks, fixed-size padding) → chunk size selection (75% threshold algorithm) → per-server chunk creation with ID collision retry (3 attempts) → recipient registration (recursive batching up to `maxRecipients` per FADD) → per-server upload (command + file body in single HTTP/2 streaming request) → file description generation (cross-product: M chunks × R replicas × N recipients → N descriptions). Spans [Agent.hs](modules/Simplex/FileTransfer/Agent.md) (worker orchestration, description generation), [Client.hs](modules/Simplex/FileTransfer/Client.md) (upload protocol), [Server.hs](modules/Simplex/FileTransfer/Server.md) (quota reservation with rollback, skipCommitted idempotency), [Crypto.hs](modules/Simplex/FileTransfer/Crypto.md) (streaming encryption with embedded header), [Description.hs](modules/Simplex/FileTransfer/Description.md) (validation, first-replica-only digest optimization). 
+- **XFTP file upload pipeline**: Agent-side encryption (streaming 64KB blocks, fixed-size padding) → chunk size selection (75% threshold algorithm) → per-router chunk creation with ID collision retry (3 attempts) → recipient registration (recursive batching up to `maxRecipients` per FADD) → per-router upload (command + file body in single HTTP/2 streaming request) → file description generation (cross-product: M chunks × R replicas × N recipients → N descriptions). Spans [Agent.hs](modules/Simplex/FileTransfer/Agent.md) (worker orchestration, description generation), [Client.hs](modules/Simplex/FileTransfer/Client.md) (upload protocol), [Server.hs](modules/Simplex/FileTransfer/Server.md) (quota reservation with rollback, skipCommitted idempotency), [Crypto.hs](modules/Simplex/FileTransfer/Crypto.md) (streaming encryption with embedded header), [Description.hs](modules/Simplex/FileTransfer/Description.md) (validation, first-replica-only digest optimization). -- **XFTP file download pipeline**: Description parsing (ValidFileDescription validation, YAML or web URI) → per-server chunk download with ephemeral DH key pair per download (forward secrecy) → size and digest verification before decryption → streaming decryption with auth tag verification (output deleted on failure) → redirect resolution (depth-1 chain: decrypt redirect YAML, validate size/digest, download actual file). Spans [Agent.hs](modules/Simplex/FileTransfer/Agent.md) (worker orchestration, redirect handling), [Client.hs](modules/Simplex/FileTransfer/Client.md) (ephemeral DH, chunk-proportional timeout), [Client/Main.hs](modules/Simplex/FileTransfer/Client/Main.md) (web URI decoding, parallel download with server grouping), [Crypto.hs](modules/Simplex/FileTransfer/Crypto.md) (dual decrypt paths, auth tag deletion), [Description.hs](modules/Simplex/FileTransfer/Description.md) (redirect file descriptions). 
+- **XFTP file download pipeline**: Description parsing (ValidFileDescription validation, YAML or web URI) → per-router chunk download with ephemeral DH key pair per download (forward secrecy) → size and digest verification before decryption → streaming decryption with auth tag verification (output deleted on failure) → redirect resolution (depth-1 chain: decrypt redirect YAML, validate size/digest, download actual file). Spans [Agent.hs](modules/Simplex/FileTransfer/Agent.md) (worker orchestration, redirect handling), [Client.hs](modules/Simplex/FileTransfer/Client.md) (ephemeral DH, chunk-proportional timeout), [Client/Main.hs](modules/Simplex/FileTransfer/Client/Main.md) (web URI decoding, parallel download with router grouping), [Crypto.hs](modules/Simplex/FileTransfer/Crypto.md) (dual decrypt paths, auth tag deletion), [Description.hs](modules/Simplex/FileTransfer/Description.md) (redirect file descriptions). - **XFTP handshake state machine**: Three-state session-cached handshake (`No entry` → `HandshakeSent` → `HandshakeAccepted`) per HTTP/2 session. Web clients use `xftp-web-hello` header and challenge-response identity proof; native clients use standard ALPN. SNI presence gates CORS headers, web serving, and SESSION error for unrecognized connections. Key reuse on re-hello preserves existing DH keys. Spans [Server.hs](modules/Simplex/FileTransfer/Server.md) (handshake logic, CORS, web serving), [Client.hs](modules/Simplex/FileTransfer/Client.md) (ALPN selection, cert chain validation), [Transport.hs](modules/Simplex/FileTransfer/Transport.md) (block size, version). 
- **XFTP storage lifecycle**: Quota reservation via atomic `stateTVar` before upload → rollback on failure (subtract + delete partial file) → physical file deleted before store cleanup (crash risk: store references missing file) → `RoundedSystemTime 3600` for privacy-preserving expiration timestamps → expiration with configurable throttling (100ms between files) → startup storage reconciliation (override stats from live store). Spans [Server.hs](modules/Simplex/FileTransfer/Server.md), [Server/Store.hs](modules/Simplex/FileTransfer/Server/Store.md), [Server/Env.hs](modules/Simplex/FileTransfer/Server/Env.md), [Server/StoreLog.hs](modules/Simplex/FileTransfer/Server/StoreLog.md) (error-resilient replay, compaction). -- **XFTP worker architecture**: Five worker types in three categories: rcv (per-server download + local decryption), snd (local prepare/encrypt + per-server upload), del (per-server delete). TMVar-based connection sharing with async retry on temporary errors, permanent error cleanup (put Left + delete from TMap). `withRetryIntervalLimit` caps consecutive retries; exhausted temporary errors silently abandon work cycle (chunk stays pending). `assertAgentForeground` dual check (throw if inactive + wait if backgrounded) gates every chunk operation. Spans [Agent.hs](modules/Simplex/FileTransfer/Agent.md), [Client/Agent.hs](modules/Simplex/FileTransfer/Client/Agent.md). +- **XFTP worker architecture**: Five worker types in three categories: rcv (per-router download + local decryption), snd (local prepare/encrypt + per-router upload), del (per-router delete). TMVar-based connection sharing with async retry on temporary errors, permanent error cleanup (put Left + delete from TMap). `withRetryIntervalLimit` caps consecutive retries; exhausted temporary errors silently abandon work cycle (chunk stays pending). `assertAgentForeground` dual check (throw if inactive + wait if backgrounded) gates every chunk operation. 
Spans [Agent.hs](modules/Simplex/FileTransfer/Agent.md), [Client/Agent.hs](modules/Simplex/FileTransfer/Client/Agent.md). - **SessionVar protocol client lifecycle**: Protocol client connections (SMP, NTF, XFTP) use a lazy singleton pattern: `getSessVar` atomically checks TMap → `newProtocolClient` fills TMVar on success/failure → `waitForProtocolClient` reads with timeout. Error caching via `persistErrorInterval` prevents connection storms (failed connections cache the error with expiry; callers receive cached error without reconnecting). `removeSessVar` uses monotonic `sessionVarId` compare-and-swap to prevent stale disconnect callbacks from removing newer clients. SMP has additional complexity: `SMPConnectedClient` wraps client with per-connection proxied relay map, `updateClientService` synchronizes service credentials post-connect, disconnect callback moves subscriptions to pending with session-ID matching. XFTP always uses `NRMBackground` timing regardless of caller request. Spans [Session.md](modules/Simplex/Messaging/Session.md), [Agent/Client.md](modules/Simplex/Messaging/Agent/Client.md) (lifecycle, disconnect callbacks, reconnection workers), [Agent.md](modules/Simplex/Messaging/Agent.md) (subscriber loop consuming events). @@ -60,7 +60,7 @@ - **Deferred message encryption**: Message bodies are NOT encrypted at enqueue time. `enqueueMessageB` advances the ratchet header and validates padding, but stores only the body reference (`sndMsgBodyId`) and encryption key. Actual encryption (`rcEncryptMsg`) happens at delivery time in `runSmpQueueMsgDelivery`. This enables body deduplication via `VRValue`/`VRRef` — identical bodies (common for group messages) share one database row, but each connection's delivery encrypts independently with its own ratchet. Confirmation and ratchet key messages bypass deferred encryption (pre-encrypted at enqueue time). 
Spans [Agent.md](modules/Simplex/Messaging/Agent.md) (enqueue + delivery), [AgentStore.md](modules/Simplex/Messaging/Agent/Store/AgentStore.md) (`snd_message_bodies` storage). -- **NTF agent subscription lifecycle**: The agent-side notification subscription system uses a supervisor-worker architecture with three worker pools (NTF server, SMP server, token deletion). `NSCCreate` triggers a four-way partition (`partitionQueueSubActions`): new sub, reset sub (credential mismatch or null action), continue SMP work, continue NTF work. Workers coordinate with the supervisor via `updated_by_supervisor` flag — workers only update local fields when the flag is set, preventing overwrite of supervisor decisions. The null-action sentinel (`workerErrors` sets action to NULL on permanent failure) bridges worker failure recovery to supervisor-driven re-creation. `retrySubActions` uses a shrinking TVar — each iteration only retries subs with temporary errors, so batches get smaller over time. `rescheduleWork` handles time-scheduled health checks by forking a sleep thread that re-signals `doWork`. Spans [NtfSubSupervisor.md](modules/Simplex/Messaging/Agent/NtfSubSupervisor.md) (supervisor, worker pools), [AgentStore.md](modules/Simplex/Messaging/Agent/Store/AgentStore.md) (updated_by_supervisor, null-action sentinel), [Agent/Client.md](modules/Simplex/Messaging/Agent/Client.md) (worker framework). +- **NTF agent subscription lifecycle**: The agent-side notification subscription system uses a supervisor-worker architecture with three worker pools (NTF router, SMP router, token deletion). `NSCCreate` triggers a four-way partition (`partitionQueueSubActions`): new sub, reset sub (credential mismatch or null action), continue SMP work, continue NTF work. Workers coordinate with the supervisor via `updated_by_supervisor` flag — workers only update local fields when the flag is set, preventing overwrite of supervisor decisions. 
The null-action sentinel (`workerErrors` sets action to NULL on permanent failure) bridges worker failure recovery to supervisor-driven re-creation. `retrySubActions` uses a shrinking TVar — each iteration only retries subs with temporary errors, so batches get smaller over time. `rescheduleWork` handles time-scheduled health checks by forking a sleep thread that re-signals `doWork`. Spans [NtfSubSupervisor.md](modules/Simplex/Messaging/Agent/NtfSubSupervisor.md) (supervisor, worker pools), [AgentStore.md](modules/Simplex/Messaging/Agent/Store/AgentStore.md) (updated_by_supervisor, null-action sentinel), [Agent/Client.md](modules/Simplex/Messaging/Agent/Client.md) (worker framework). - **Session-aware SMP subscription management**: SMP queue subscriptions are tracked per transport session with session-ID validation at multiple points. `subscribeQueues` groups queues by transport session, subscribes concurrently, then validates `activeClientSession` post-RPC — if the client was replaced during the RPC, results are discarded and converted to temporary errors for retry. `removeClientAndSubs` (disconnect cleanup) only demotes subscriptions whose session ID matches the disconnecting client. Batch UP notifications are accumulated across transmissions and deduplicated against already-active subscriptions. When ALL results are temporary errors and no connections were already active, the SMP client is closed to force fresh connection. `maxPending` throttles concurrent pending subscriptions with STM retry backpressure. Spans [Agent/Client.md](modules/Simplex/Messaging/Agent/Client.md) (subscription state, session validation), [Agent.md](modules/Simplex/Messaging/Agent.md) (subscriber loop, processSMPTransmissions, UP accumulation). 
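The `sessionVarId` compare-and-swap mentioned in the SessionVar lifecycle entry above can be illustrated with a minimal sketch. It assumes simplified, hypothetical stand-ins for the real types (`SessVar`, `SessMap`, and a plain `Int` counter) and is not the actual code in `Session.hs`. It only shows why a late disconnect callback that still holds an old variable cannot remove a newer client registered under the same key.

```haskell
import Control.Concurrent.STM
import qualified Data.Map.Strict as M

-- Hypothetical, simplified stand-ins for the real session types.
data SessVar a = SessVar {sessVarId :: Int, sessVar :: TMVar a}

type SessMap k a = TVar (M.Map k (SessVar a))

-- Return the existing entry (Right) or atomically insert a fresh,
-- still-empty one tagged with the next monotonic ID (Left).
getSessVar :: Ord k => TVar Int -> k -> SessMap k a -> STM (Either (SessVar a) (SessVar a))
getSessVar nextId key sessions = do
  m <- readTVar sessions
  case M.lookup key m of
    Just v -> pure (Right v)
    Nothing -> do
      i <- stateTVar nextId (\n -> (n, n + 1))
      v <- SessVar i <$> newEmptyTMVar
      writeTVar sessions (M.insert key v m)
      pure (Left v)

-- Remove the entry only if it is still the variable the caller holds;
-- a stale caller (older ID) leaves the newer entry untouched.
removeSessVar :: Ord k => SessVar a -> k -> SessMap k a -> STM Bool
removeSessVar v key sessions = do
  m <- readTVar sessions
  case M.lookup key m of
    Just v' | sessVarId v' == sessVarId v -> do
      writeTVar sessions (M.delete key m)
      pure True
    _ -> pure False

-- Old client disconnects, a new client is registered under the same key,
-- then the old client's disconnect callback fires a second time.
demo :: IO (Bool, Bool)
demo = do
  nextId <- newTVarIO 0
  sessions <- newTVarIO (M.empty :: M.Map String (SessVar ()))
  old <- either id id <$> atomically (getSessVar nextId "smp1" sessions)
  _ <- atomically (removeSessVar old "smp1" sessions)
  _new <- atomically (getSessVar nextId "smp1" sessions)
  staleRemoved <- atomically (removeSessVar old "smp1" sessions)
  stillThere <- M.member "smp1" <$> readTVarIO sessions
  pure (staleRemoved, stillThere)
```

In this sketch, `demo` evaluates to `(False, True)`: the stale removal is rejected and the newer entry survives. Comparing IDs rather than keys is what makes the removal safe when a reconnect races with a delayed callback.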
diff --git a/spec/modules/Simplex/FileTransfer/Agent.md b/spec/modules/Simplex/FileTransfer/Agent.md index fd2a361d0..e5f58e996 100644 --- a/spec/modules/Simplex/FileTransfer/Agent.md +++ b/spec/modules/Simplex/FileTransfer/Agent.md @@ -8,15 +8,15 @@ The XFTP agent uses five worker types organized in three categories: -| Worker | Key (server) | Purpose | +| Worker | Key (router) | Purpose | |--------|-------------|---------| -| `xftpRcvWorker` | `Just server` | Download chunks from a specific XFTP server | +| `xftpRcvWorker` | `Just server` | Download chunks from a specific XFTP router | | `xftpRcvLocalWorker` | `Nothing` | Decrypt completed downloads locally | -| `xftpSndPrepareWorker` | `Nothing` | Encrypt files and create chunks on servers | -| `xftpSndWorker` | `Just server` | Upload chunks to a specific XFTP server | -| `xftpDelWorker` | `Just server` | Delete chunks from a specific XFTP server | +| `xftpSndPrepareWorker` | `Nothing` | Encrypt files and create chunks on routers | +| `xftpSndWorker` | `Just server` | Upload chunks to a specific XFTP router | +| `xftpDelWorker` | `Just server` | Delete chunks from a specific XFTP router | -Workers are created on-demand via `getAgentWorker` and keyed by server address. The local workers (keyed by `Nothing`) handle CPU-bound operations that don't require network access. +Workers are created on-demand via `getAgentWorker` and keyed by router address. The local workers (keyed by `Nothing`) handle CPU-bound operations that don't require network access. ## Non-obvious behavior @@ -71,7 +71,7 @@ During upload, `addRecipients` recursively calls itself if a chunk needs more re ### 12. Delete workers skip files older than rcvFilesTTL -`runXFTPDelWorker` uses `rcvFilesTTL` (not a dedicated delete TTL) to filter pending deletions. Files older than this TTL would already be expired on the server, so attempting deletion is pointless. This reuses the receive TTL as a proxy for server-side expiration. 
+`runXFTPDelWorker` uses `rcvFilesTTL` (not a dedicated delete TTL) to filter pending deletions. Files older than this TTL would already be expired on the router, so attempting deletion is pointless. This reuses the receive TTL as a proxy for router-side expiration. ### 13. closeXFTPAgent atomically swaps worker maps @@ -81,6 +81,6 @@ During upload, `addRecipients` recursively calls itself if a chunk needs more re `assertAgentForeground` both throws if the agent is inactive (`throwWhenInactive`) and blocks until it's in the foreground (`waitUntilForeground`). This is called before every chunk operation to ensure the agent isn't suspended or backgrounded during file transfers. -### 15. Per-server stats tracking +### 15. Per-router stats tracking -Every chunk download, upload, and delete operation increments per-server statistics (`downloads`, `uploads`, `deletions`, `downloadAttempts`, `uploadAttempts`, `deleteAttempts`, and error variants). Size-based stats (`downloadsSize`, `uploadsSize`) track throughput in kilobytes. +Every chunk download, upload, and delete operation increments per-router statistics (`downloads`, `uploads`, `deletions`, `downloadAttempts`, `uploadAttempts`, `deleteAttempts`, and error variants). Size-based stats (`downloadsSize`, `uploadsSize`) track throughput in kilobytes. diff --git a/spec/modules/Simplex/FileTransfer/Client.md b/spec/modules/Simplex/FileTransfer/Client.md index 5cf87a594..27fb50bc3 100644 --- a/spec/modules/Simplex/FileTransfer/Client.md +++ b/spec/modules/Simplex/FileTransfer/Client.md @@ -12,13 +12,13 @@ - **`xftpALPNv1` or `httpALPN11`**: performs v1 handshake with key exchange (`httpALPN11` is used for web port connections) - **No ALPN or unrecognized**: uses legacy v1 transport parameters without handshake -### 2. Server certificate chain validation +### 2. 
Router certificate chain validation -`xftpClientHandshakeV1` validates the server's identity by checking that the CA fingerprint from the certificate chain matches the expected `keyHash` from the server address. The server signs an authentication public key (X25519) with its long-term key. The client verifies this signature against the certificate chain, then extracts the X25519 key for HMAC-based command authentication. This authentication key is distinct from the per-download ephemeral DH keys. +`xftpClientHandshakeV1` validates the router's identity by checking that the CA fingerprint from the certificate chain matches the expected `keyHash` from the router address. The router signs an authentication public key (X25519) with its long-term key. The client verifies this signature against the certificate chain, then extracts the X25519 key for HMAC-based command authentication. This authentication key is distinct from the per-download ephemeral DH keys. ### 3. Ephemeral DH key pair per download -`downloadXFTPChunk` generates a fresh X25519 key pair for each chunk download. The public key is sent with the FGET command; the server responds with its own ephemeral key. The derived shared secret encrypts the file data in transit. This provides forward secrecy — compromising a past DH key doesn't decrypt other downloads. +`downloadXFTPChunk` generates a fresh X25519 key pair for each chunk download. The public key is sent with the FGET command; the router responds with its own ephemeral key. The derived shared secret encrypts the file data in transit. This provides forward secrecy — compromising a past DH key doesn't decrypt other downloads. ### 4. Chunk-size-proportional download timeout @@ -30,7 +30,7 @@ ### 6. Upload sends file body after command response -`uploadXFTPChunk` sends the FPUT command and file body in the same streaming HTTP/2 request: the protocol command block is sent first, followed immediately by the raw file data via `hSendFile`. 
The server response (`FROk` or error) is received only after both the command and file body have been fully sent. This is a single HTTP/2 round trip, not a two-phase interaction. +`uploadXFTPChunk` sends the FPUT command and file body in the same streaming HTTP/2 request: the protocol command block is sent first, followed immediately by the raw file data via `hSendFile`. The router response (`FROk` or error) is received only after both the command and file body have been fully sent. This is a single HTTP/2 round trip, not a two-phase interaction. ### 7. Empty corrId as nonce diff --git a/spec/modules/Simplex/FileTransfer/Client/Agent.md b/spec/modules/Simplex/FileTransfer/Client/Agent.md index 6ff1eebb7..c03400d90 100644 --- a/spec/modules/Simplex/FileTransfer/Client/Agent.md +++ b/spec/modules/Simplex/FileTransfer/Client/Agent.md @@ -8,7 +8,7 @@ ### 1. TMVar-based connection sharing -`getXFTPServerClient` first checks the `TMap XFTPServer (TMVar (Either XFTPClientAgentError XFTPClient))`. If no entry exists, it atomically inserts an empty `TMVar` and initiates connection. Other threads requesting the same server block on `readTMVar` until the connection is established or fails. This prevents duplicate connections to the same server. +`getXFTPServerClient` first checks the `TMap XFTPServer (TMVar (Either XFTPClientAgentError XFTPClient))`. If no entry exists, it atomically inserts an empty `TMVar` and initiates connection. Other threads requesting the same router block on `readTMVar` until the connection is established or fails. This prevents duplicate connections to the same router. ### 2. Async retry on temporary errors @@ -20,8 +20,8 @@ On permanent error, `newXFTPClient` puts the `Left error` into the `TMVar` (unbl ### 4. Connection timeout -`waitForXFTPClient` wraps `readTMVar` in a timeout. If the connection establishment takes too long (e.g., server unreachable and retry loop is slow), the caller gets a timeout error rather than blocking indefinitely. 
The underlying connection attempt continues in the background. +`waitForXFTPClient` wraps `readTMVar` in a timeout. If the connection establishment takes too long (e.g., router unreachable and retry loop is slow), the caller gets a timeout error rather than blocking indefinitely. The underlying connection attempt continues in the background. ### 5. closeXFTPServerClient removes from TMap -Closing a server client deletes its entry from the TMap, so the next request will establish a fresh connection. This is called on connection errors during file operations to force reconnection. +Closing a router client deletes its entry from the TMap, so the next request will establish a fresh connection. This is called on connection errors during file operations to force reconnection. diff --git a/spec/modules/Simplex/FileTransfer/Client/Main.md b/spec/modules/Simplex/FileTransfer/Client/Main.md index 5f7b45af4..abb9eceb5 100644 --- a/spec/modules/Simplex/FileTransfer/Client/Main.md +++ b/spec/modules/Simplex/FileTransfer/Client/Main.md @@ -8,7 +8,7 @@ ### 1. Web URI encoding: base64url(deflate(YAML)) -`encodeWebURI` compresses the YAML-encoded file description with raw DEFLATE, then base64url-encodes the result. `decodeWebURI` reverses this. The compressed description goes in the URL fragment (after `#`), which is never sent to the server — the file description stays client-side. +`encodeWebURI` compresses the YAML-encoded file description with raw DEFLATE, then base64url-encodes the result. `decodeWebURI` reverses this. The compressed description goes in the URL fragment (after `#`), which is never sent to the router — the file description stays client-side. ### 2. CLI receive accepts both file paths and URLs @@ -18,17 +18,17 @@ `receive` tracks a `depth` parameter starting at 1. After following one redirect, `depth` becomes 0. A second redirect throws "Redirect chain too long". This prevents infinite redirect loops from malicious file descriptions. -### 4. 
Parallel chunk uploads with server grouping +### 4. Parallel chunk uploads with router grouping -`uploadFile` groups chunks by server via `groupAllOn`, then uses `pooledForConcurrentlyN 16` to process up to 16 server-groups concurrently. Within each group, chunks are uploaded sequentially (`mapM`). Errors from any chunk are collected and the first one is thrown. +`uploadFile` groups chunks by router via `groupAllOn`, then uses `pooledForConcurrentlyN 16` to process up to 16 router-groups concurrently. Within each group, chunks are uploaded sequentially (`mapM`). Errors from any chunk are collected and the first one is thrown. -### 5. Random server selection +### 5. Random router selection -`getXFTPServer` selects a random server from the provided list for each chunk. With a single server, it's deterministic. With multiple servers, it uses `StdGen` in a TVar for thread-safe random selection via `stateTVar`. +`getXFTPServer` selects a random router from the provided list for each chunk. With a single router, it's deterministic. With multiple routers, it uses `StdGen` in a TVar for thread-safe random selection via `stateTVar`. ### 6. withReconnect nests retry with reconnection -`withReconnect` wraps `withRetry` twice: the outer retry reconnects to the server, and the inner operation runs against the connection. On failure, the server connection is explicitly closed before retrying, forcing a fresh connection on the next attempt. +`withReconnect` wraps `withRetry` twice: the outer retry reconnects to the router, and the inner operation runs against the connection. On failure, the router connection is explicitly closed before retrying, forcing a fresh connection on the next attempt. ### 7. withRetry rejects zero retries @@ -36,8 +36,8 @@ ### 8. File description auto-deletion prompt -After successful receive or delete, `removeFD` either auto-deletes the file description (if `--yes` flag) or prompts the user. 
This prevents accidental reuse of one-time file descriptions — each receive consumes the description by ACKing chunks on the server. +After successful receive or delete, `removeFD` either auto-deletes the file description (if `--yes` flag) or prompts the user. This prevents accidental reuse of one-time file descriptions — each receive consumes the description by ACKing chunks on the router. -### 9. Sender description uses first replica's server +### 9. Sender description uses first replica's router -`createSndFileDescription` takes the server from the first replica of each chunk for the sender's `FileChunkReplica`. This reflects the current limitation that each chunk is uploaded to exactly one server — the sender description records that single server. +`createSndFileDescription` takes the router from the first replica of each chunk for the sender's `FileChunkReplica`. This reflects the current limitation that each chunk is uploaded to exactly one router — the sender description records that single router. diff --git a/spec/modules/Simplex/FileTransfer/Description.md b/spec/modules/Simplex/FileTransfer/Description.md index b4c7e2fe9..0edd0bee8 100644 --- a/spec/modules/Simplex/FileTransfer/Description.md +++ b/spec/modules/Simplex/FileTransfer/Description.md @@ -22,9 +22,9 @@ When encoding chunks to YAML via `unfoldChunksToReplicas`, the `digest` and non- The top-level `FileDescription` has a `chunkSize` field. Individual chunk replicas only serialize their `chunkSize` if it differs from this default. This saves space in the common case where most chunks are the same size (only the last chunk may be smaller). -### 4. YAML encoding groups replicas by server +### 4. YAML encoding groups replicas by router -`groupReplicasByServer` groups all chunk replicas by their server, producing `FileServerReplica` records. This is the serialization format — replicas are organized by server, not by chunk. 
The parser (`foldReplicasToChunks`) reverses this grouping back to per-chunk replica lists. +`groupReplicasByServer` groups all chunk replicas by their router, producing `FileServerReplica` records. This is the serialization format — replicas are organized by router, not by chunk. The parser (`foldReplicasToChunks`) reverses this grouping back to per-chunk replica lists. ### 5. FileDescriptionURI uses query-string encoding @@ -40,4 +40,4 @@ Two limits exist: `maxFileSize = 1GB` (soft limit, checked by CLI client) and `m ### 8. Redirect file descriptions -A `FileDescription` can contain a `redirect` field pointing to another file's metadata (`RedirectFileInfo` with size and digest). The outer description downloads an encrypted YAML file that, once decrypted, yields the actual `FileDescription` for the real file. This adds one level of indirection for privacy — the relay servers hosting the redirect don't know the actual file's servers. +A `FileDescription` can contain a `redirect` field pointing to another file's metadata (`RedirectFileInfo` with size and digest). The outer description downloads an encrypted YAML file that, once decrypted, yields the actual `FileDescription` for the real file. This adds one level of indirection for privacy — the relay routers hosting the redirect don't know the actual file's routers. diff --git a/spec/modules/Simplex/FileTransfer/Protocol.md b/spec/modules/Simplex/FileTransfer/Protocol.md index f31c90561..4bbcb8726 100644 --- a/spec/modules/Simplex/FileTransfer/Protocol.md +++ b/spec/modules/Simplex/FileTransfer/Protocol.md @@ -29,7 +29,7 @@ Even for single transmissions, `xftpEncodeBatch1` wraps the encoded transmission ### 5. FileParty GADT partitions command space -Commands are indexed by `FileParty` (`SFSender` / `SFRecipient`) at the type level via `FileCmd`. This ensures at compile time that sender commands (FNEW, FADD, FPUT, FDEL) and recipient commands (FGET, FACK, PING) cannot be confused. 
The server pattern-matches on `SFileParty` to determine which index (sender vs recipient) to look up in the file store. +Commands are indexed by `FileParty` (`SFSender` / `SFRecipient`) at the type level via `FileCmd`. This ensures at compile time that sender commands (FNEW, FADD, FPUT, FDEL) and recipient commands (FGET, FACK, PING) cannot be confused. The router pattern-matches on `SFileParty` to determine which index (sender vs recipient) to look up in the file store. ### 6. Empty corrId and implicit session ID diff --git a/spec/modules/Simplex/FileTransfer/Server.md b/spec/modules/Simplex/FileTransfer/Server.md index f3a01314d..99e17a427 100644 --- a/spec/modules/Simplex/FileTransfer/Server.md +++ b/spec/modules/Simplex/FileTransfer/Server.md @@ -1,16 +1,16 @@ # Simplex.FileTransfer.Server -> XFTP server: HTTP/2 request handling, handshake state machine, file operations, and statistics. +> XFTP router: HTTP/2 request handling, handshake state machine, file operations, and statistics. **Source**: [`FileTransfer/Server.hs`](../../../../src/Simplex/FileTransfer/Server.hs) ## Architecture -The XFTP server runs several concurrent threads via `raceAny_`: +The XFTP router runs several concurrent threads via `raceAny_`: | Thread | Purpose | |--------|---------| -| `runServer` | HTTP/2 server accepting file transfer requests | +| `runServer` | HTTP/2 router accepting file transfer requests | | `expireFiles` | Periodic file expiration with throttling | | `logServerStats` | Periodic stats flush to CSV | | `savePrometheusMetrics` | Periodic Prometheus metrics dump | @@ -20,24 +20,24 @@ The XFTP server runs several concurrent threads via `raceAny_`: ### 1. 
Three-state handshake with session caching -The server maintains a `TMap SessionId Handshake` with three states: -- **No entry**: first request — for non-SNI or `xftp-web-hello` requests, `processHello` generates DH key pair and sends server handshake; for SNI requests without `xftp-web-hello`, returns `SESSION` error -- **`HandshakeSent pk`**: server hello sent, waiting for client handshake with version negotiation +The router maintains a `TMap SessionId Handshake` with three states: +- **No entry**: first request — for non-SNI or `xftp-web-hello` requests, `processHello` generates DH key pair and sends router handshake; for SNI requests without `xftp-web-hello`, returns `SESSION` error +- **`HandshakeSent pk`**: router hello sent, waiting for client handshake with version negotiation - **`HandshakeAccepted thParams`**: handshake complete, subsequent requests use cached params -Web clients can re-send hello (`xftp-web-hello` header) even in `HandshakeSent` or `HandshakeAccepted` states — the server reuses the existing private key rather than generating a new one. +Web clients can re-send hello (`xftp-web-hello` header) even in `HandshakeSent` or `HandshakeAccepted` states — the router reuses the existing private key rather than generating a new one. ### 2. Web identity proof via challenge-response -When a web client sends a hello with a non-empty body, the server parses an `XFTPClientHello` containing a `webChallenge`. The server signs `challenge <> sessionId` with its long-term key and includes the signature in the handshake response. This proves server identity to web clients that cannot verify TLS certificates directly. +When a web client sends a hello with a non-empty body, the router parses an `XFTPClientHello` containing a `webChallenge`. The router signs `challenge <> sessionId` with its long-term key and includes the signature in the handshake response. This proves router identity to web clients that cannot verify TLS certificates directly. ### 3. 
skipCommitted drains request body on re-upload -If `receiveServerFile` detects the file is already uploaded (`filePath` TVar is `Just`), it cannot simply ignore the request body — the HTTP/2 client would block waiting for the server to consume it. Instead, `skipCommitted` reads and discards the entire body in `fileBlockSize` increments, returning `FROk` when complete. This makes FPUT idempotent from the client's perspective. +If `receiveServerFile` detects the file is already uploaded (`filePath` TVar is `Just`), it cannot simply ignore the request body — the HTTP/2 client would block waiting for the router to consume it. Instead, `skipCommitted` reads and discards the entire body in `fileBlockSize` increments, returning `FROk` when complete. This makes FPUT idempotent from the client's perspective. ### 4. Atomic quota reservation with rollback -`receiveServerFile` uses `stateTVar` to atomically check and reserve storage quota before receiving the file. If the upload fails (timeout, size mismatch, IO error), the reserved size is subtracted from `usedStorage` and the partial file is deleted. This prevents failed uploads from permanently consuming quota. +`receiveServerFile` uses `stateTVar` to atomically check and reserve storage quota before receiving the file. If the upload fails (timeout, size mismatch, IO error), the reserved size is subtracted from `usedStorage` and the partial file is deleted on the router. This prevents failed uploads from permanently consuming quota. ### 5. retryAdd generates new IDs on collision @@ -45,7 +45,7 @@ If `receiveServerFile` detects the file is already uploaded (`filePath` TVar is ### 6. Timing attack mitigation on entity lookup -`verifyXFTPTransmission` calls `dummyVerifyCmd` (imported from SMP server) when a file entity is not found. This equalizes response timing to prevent attackers from distinguishing "entity doesn't exist" from "signature invalid" based on latency. 
+`verifyXFTPTransmission` calls `dummyVerifyCmd` (imported from SMP router) when a file entity is not found. This equalizes response timing to prevent attackers from distinguishing "entity doesn't exist" from "signature invalid" based on latency. ### 7. BLOCKED vs EntityOff distinction @@ -62,11 +62,11 @@ Despite the name suggesting it only marks a file as blocked, `blockServerFile` a ### 9. Stats restore overrides counts from live store -`restoreServerStats` loads stats from the backup file but overrides `_filesCount` and `_filesSize` with values computed from the live file store (TMap size and `usedStorage` TVar). If the backup values differ, warnings are logged. This handles cases where files were expired or deleted while the server was down. +`restoreServerStats` loads stats from the backup file but overrides `_filesCount` and `_filesSize` with values computed from the live file store (TMap size and `usedStorage` TVar). If the backup values differ, warnings are logged. This handles cases where files were expired or deleted while the router was down. ### 10. File expiration with configurable throttling -`expireServerFiles` accepts an optional `itemDelay` (100ms when called from the periodic thread, `Nothing` at startup). Between each file check, `threadDelay itemDelay` prevents expiration from monopolizing IO. At startup, files are expired without delay to clean up quickly. +`expireServerFiles` accepts an optional `itemDelay` (100ms when called from the periodic thread, `Nothing` at router startup). Between each file check, `threadDelay itemDelay` prevents expiration from monopolizing IO. At startup, files are expired without delay to clean up quickly. ### 11. Stats log aligns to wall-clock midnight @@ -78,7 +78,7 @@ Despite the name suggesting it only marks a file as blocked, `blockServerFile` a ### 13. SNI-dependent CORS and web serving -CORS headers require both `sniUsed = True` and `addCORSHeaders = True` in the transport config. 
Static web page serving is enabled when `sniUsed = True`. Non-SNI connections (direct TLS without hostname) skip both CORS and web serving. This separates the web-facing and protocol-facing behaviors of the same port. +CORS headers require both `sniUsed = True` and `addCORSHeaders = True` in the transport config. Static web page serving is enabled when `sniUsed = True`. Non-SNI connections (direct TLS without hostname) skip both CORS and web serving. This separates the web-facing and protocol-facing behaviors of the same router port. ### 14. Control port file operations use recipient index diff --git a/spec/modules/Simplex/FileTransfer/Server/Env.md b/spec/modules/Simplex/FileTransfer/Server/Env.md index e9f509a1a..0b3bba3ff 100644 --- a/spec/modules/Simplex/FileTransfer/Server/Env.md +++ b/spec/modules/Simplex/FileTransfer/Server/Env.md @@ -1,6 +1,6 @@ # Simplex.FileTransfer.Server.Env -> XFTP server environment: configuration, storage quota tracking, and request routing. +> XFTP router environment: configuration, storage quota tracking, and request routing. **Source**: [`FileTransfer/Server/Env.hs`](../../../../../src/Simplex/FileTransfer/Server/Env.hs) @@ -8,7 +8,7 @@ ### 1. Startup storage accounting with quota warning -`newXFTPServerEnv` computes `usedStorage` by summing file sizes from the in-memory store at startup. If the computed usage exceeds the configured `fileSizeQuota`, a warning is logged but the server still starts. This allows the server to come up even if it's over quota (e.g., after a quota reduction), relying on expiration to reclaim space. +`newXFTPServerEnv` computes `usedStorage` by summing file sizes from the in-memory store at startup. If the computed usage exceeds the configured `fileSizeQuota`, a warning is logged but the router still starts. This allows the router to come up even if it's over quota (e.g., after a quota reduction), relying on expiration to reclaim space. ### 2. 
XFTPRequest ADT separates new files from commands @@ -21,4 +21,4 @@ This separation occurs after credential verification in `Server.hs`. `XFTPReqNew ### 3. fileTimeout for upload deadline -`fileTimeout` in `XFTPServerConfig` sets the maximum time allowed for a single file upload (FPUT). The server wraps the receive operation in `timeout fileTimeout`. Default is 5 minutes (for 4MB chunks). This prevents slow or stalled uploads from holding server resources indefinitely. +`fileTimeout` in `XFTPServerConfig` sets the maximum time allowed for a single file upload (FPUT). The router wraps the receive operation in `timeout fileTimeout`. Default is 5 minutes (for 4MB chunks). This prevents slow or stalled uploads from holding router resources indefinitely. diff --git a/spec/modules/Simplex/FileTransfer/Server/Main.md b/spec/modules/Simplex/FileTransfer/Server/Main.md index 54a45751f..c892e6bf5 100644 --- a/spec/modules/Simplex/FileTransfer/Server/Main.md +++ b/spec/modules/Simplex/FileTransfer/Server/Main.md @@ -1,12 +1,12 @@ # Simplex.FileTransfer.Server.Main -> XFTP server CLI: INI configuration parsing, TLS setup, and default constants. +> XFTP router CLI: INI configuration parsing, TLS setup, and default constants. **Source**: [`FileTransfer/Server/Main.hs`](../../../../../src/Simplex/FileTransfer/Server/Main.hs) ## Non-obvious behavior -### 1. Key server constants +### 1. Key router constants | Constant | Value | Purpose | |----------|-------|---------| @@ -17,7 +17,7 @@ ### 2. allowedChunkSizes defaults to all four sizes -If not configured, `allowedChunkSizes` defaults to `[kb 64, kb 256, mb 1, mb 4]`. The INI file can restrict this to a subset, controlling which chunk sizes the server accepts. +If not configured, `allowedChunkSizes` defaults to `[kb 64, kb 256, mb 1, mb 4]`. The INI file can restrict this to a subset, controlling which chunk sizes the router accepts. ### 3. 
Storage quota from INI with unit parsing @@ -25,4 +25,4 @@ If not configured, `allowedChunkSizes` defaults to `[kb 64, kb 256, mb 1, mb 4]` ### 4. Dual TLS credential support -The server supports both primary TLS credentials (`caCertificateFile`/`certificateFile`/`privateKeyFile`) and optional HTTP-specific credentials (`httpCaCertificateFile`/etc.). When HTTP credentials are present, the server uses `defaultSupportedParamsHTTPS` which enables broader TLS compatibility for web clients. +The router supports both primary TLS credentials (`caCertificateFile`/`certificateFile`/`privateKeyFile`) and optional HTTP-specific credentials (`httpCaCertificateFile`/etc.). When HTTP credentials are present, the router uses `defaultSupportedParamsHTTPS` which enables broader TLS compatibility for web clients. diff --git a/spec/modules/Simplex/FileTransfer/Server/Stats.md b/spec/modules/Simplex/FileTransfer/Server/Stats.md index 7e684c58a..7eb2ad47b 100644 --- a/spec/modules/Simplex/FileTransfer/Server/Stats.md +++ b/spec/modules/Simplex/FileTransfer/Server/Stats.md @@ -1,6 +1,6 @@ # Simplex.FileTransfer.Server.Stats -> XFTP server statistics: IORef-based counters with backward-compatible persistence. +> XFTP router statistics: IORef-based counters with backward-compatible persistence. **Source**: [`FileTransfer/Server/Stats.hs`](../../../../../src/Simplex/FileTransfer/Server/Stats.hs) @@ -8,11 +8,11 @@ ### 1. setFileServerStats is not thread safe -`setFileServerStats` directly writes to IORefs without synchronization. It is explicitly intended for server startup only (restoring from backup file), before any concurrent threads are running. +`setFileServerStats` directly writes to IORefs without synchronization. It is explicitly intended for router startup only (restoring from backup file), before any concurrent threads are running. ### 2. Backward-compatible parsing -The `strP` parser uses `opt` for newer fields, defaulting missing fields to 0. 
This allows reading stats files from older server versions that don't include fields like `filesBlocked` or `fileDownloadAcks`. +The `strP` parser uses `opt` for newer fields, defaulting missing fields to 0. This allows reading stats files from older router versions that don't include fields like `filesBlocked` or `fileDownloadAcks`. ### 3. PeriodStats for download tracking diff --git a/spec/modules/Simplex/FileTransfer/Server/Store.md b/spec/modules/Simplex/FileTransfer/Server/Store.md index 89b0c3b36..f2ded441e 100644 --- a/spec/modules/Simplex/FileTransfer/Server/Store.md +++ b/spec/modules/Simplex/FileTransfer/Server/Store.md @@ -36,4 +36,4 @@ File timestamps use `RoundedFileTime` which is `RoundedSystemTime 3600` — syst ### 8. blockFile conditional storage adjustment -`blockFile` takes a `deleted :: Bool` parameter. When `True` (file blocked with physical deletion), it subtracts the file size from `usedStorage`. When `False` (block without deletion), storage is unchanged. This allows blocking without physical deletion for audit purposes. Currently, both the server's `blockServerFile` and the store log replay path pass `True`. +`blockFile` takes a `deleted :: Bool` parameter. When `True` (file blocked with physical deletion), it subtracts the file size from `usedStorage`. When `False` (block without deletion), storage is unchanged. This allows blocking without physical deletion for audit purposes. Currently, both the router's `blockServerFile` and the store log replay path pass `True`. diff --git a/spec/modules/Simplex/FileTransfer/Server/StoreLog.md b/spec/modules/Simplex/FileTransfer/Server/StoreLog.md index 35a339515..6549c3666 100644 --- a/spec/modules/Simplex/FileTransfer/Server/StoreLog.md +++ b/spec/modules/Simplex/FileTransfer/Server/StoreLog.md @@ -1,6 +1,6 @@ # Simplex.FileTransfer.Server.StoreLog -> Append-only store log for XFTP file operations with error-resilient replay and compaction. 
+> Append-only store log for XFTP router file operations with error-resilient replay and compaction. **Source**: [`FileTransfer/Server/StoreLog.hs`](../../../../../src/Simplex/FileTransfer/Server/StoreLog.hs) @@ -8,7 +8,7 @@ ### 1. Error-resilient replay -`readFileStore` parses the store log line-by-line. Lines that fail to parse or fail to process (e.g., referencing a nonexistent sender ID) are logged as errors but do not halt replay. The store is reconstructed from whatever valid entries exist. This allows the server to recover from partial log corruption. +`readFileStore` parses the store log line-by-line. Lines that fail to parse or fail to process (e.g., referencing a nonexistent sender ID) are logged as errors but do not halt replay. The store is reconstructed from whatever valid entries exist. This allows the router to recover from partial log corruption. ### 2. Sender ID validation on recipient writes @@ -16,7 +16,7 @@ ### 3. Backward-compatible status parsing -`AddFile` log entries include an `EntityStatus` field. The parser uses `<|> pure EntityActive` as a fallback, defaulting to `EntityActive` when the status field is missing. This allows reading store logs from older server versions that didn't record entity status. +`AddFile` log entries include an `EntityStatus` field. The parser uses `<|> pure EntityActive` as a fallback, defaulting to `EntityActive` when the status field is missing. This allows reading store logs from older router versions that didn't record entity status. ### 4. Compaction on restart diff --git a/spec/modules/Simplex/FileTransfer/Types.md b/spec/modules/Simplex/FileTransfer/Types.md index 814e65195..14abc7b21 100644 --- a/spec/modules/Simplex/FileTransfer/Types.md +++ b/spec/modules/Simplex/FileTransfer/Types.md @@ -12,7 +12,7 @@ ### 2. Send file status state machine -`SndFileStatus` progresses: `SFSNew` → `SFSEncrypting` → `SFSEncrypted` → `SFSUploading` → `SFSComplete`, with `SFSError` as terminal. 
The prepare worker handles `SFSNew` → `SFSEncrypted` (including retry from `SFSEncrypting`), while per-server upload workers handle `SFSUploading` → `SFSComplete`. +`SndFileStatus` progresses: `SFSNew` → `SFSEncrypting` → `SFSEncrypted` → `SFSUploading` → `SFSComplete`, with `SFSError` as terminal. The prepare worker handles `SFSNew` → `SFSEncrypted` (including retry from `SFSEncrypting`), while per-router upload workers handle `SFSUploading` → `SFSComplete`. ### 3. Encrypted file path convention diff --git a/spec/modules/Simplex/Messaging/Agent.md b/spec/modules/Simplex/Messaging/Agent.md index a52be2156..e2cac0638 100644 --- a/spec/modules/Simplex/Messaging/Agent.md +++ b/spec/modules/Simplex/Messaging/Agent.md @@ -15,7 +15,7 @@ This module is the top-level SimpleX agent, consumed by simplex-chat and other c ### Agent startup — backgroundMode `getSMPAgentClient_` accepts a `backgroundMode` flag that fundamentally changes agent capabilities: -- **Normal mode** (`backgroundMode = False`): starts four threads raced via `raceAny_` — `subscriber` (main event loop), `runNtfSupervisor` (notification management), `cleanupManager` (garbage collection), `logServersStats` (statistics). Also restores persisted server statistics. If any thread crashes, all are cancelled; statistics are saved in a `finally` block. +- **Normal mode** (`backgroundMode = False`): starts four threads raced via `raceAny_` — `subscriber` (main event loop), `runNtfSupervisor` (notification management), `cleanupManager` (garbage collection), `logServersStats` (statistics). Also restores persisted router statistics. If any thread crashes, all are cancelled; statistics are saved in a `finally` block. - **Background mode** (`backgroundMode = True`): starts only the `subscriber` thread. No cleanup, no notifications, no stats persistence. Used when the agent needs minimal receive-only operation. 
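
The difference between the two startup modes above can be sketched roughly as follows. This is a minimal illustration using only `base`, not the agent's actual code: `raceAnySketch` is a hypothetical stand-in for `raceAny_` (the real helper may be implemented differently), the thread bodies are placeholders, and the "saving statistics" step is only a printed message.

```haskell
import Control.Concurrent (forkIO, killThread, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, takeMVar, tryPutMVar)
import Control.Exception (SomeException, finally, try)
import Control.Monad (forM, forever, void)

-- Hypothetical stand-in for raceAny_: fork every action, wait until the
-- first one finishes (normally or with an exception), then kill the rest.
raceAnySketch :: [IO ()] -> IO ()
raceAnySketch actions = do
  done <- newEmptyMVar
  tids <- forM actions $ \action -> forkIO $ do
    _ <- try action :: IO (Either SomeException ())
    -- tryPutMVar never blocks, so threads killed later cannot get stuck here
    void (tryPutMVar done ())
  takeMVar done `finally` mapM_ killThread tids

-- Thread names follow the text; the bodies are placeholders (assumption).
agentThreads :: Bool -> [(String, IO ())]
agentThreads backgroundMode
  | backgroundMode = [subscriber]
  | otherwise =
      [ subscriber
      , worker "runNtfSupervisor"
      , worker "cleanupManager"
      , worker "logServersStats"
      ]
  where
    -- placeholder: pretends the event loop stops after 0.1s
    subscriber = ("subscriber", threadDelay 100000)
    -- placeholder: a loop that only ends when it is killed
    worker name = (name, forever (threadDelay 1000000))

main :: IO ()
main = do
  let threads = agentThreads False
  putStrLn ("starting: " <> unwords (map fst threads))
  -- finally mirrors the "statistics are saved in a finally block" behavior
  raceAnySketch (map snd threads)
    `finally` putStrLn "saving statistics (placeholder)"
```

Calling `agentThreads True` yields only the `subscriber` entry, which corresponds to the receive-only background mode described above.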
Thread crashes are caught by the `run` wrapper: if the agent is still active (`acThread` is set), the exception is reported as `CRITICAL True` to `subQ`. If the agent is being disposed, crashes are silently ignored. diff --git a/spec/modules/Simplex/Messaging/Agent/Client.md b/spec/modules/Simplex/Messaging/Agent/Client.md index f1f4965b6..0177b4f70 100644 --- a/spec/modules/Simplex/Messaging/Agent/Client.md +++ b/spec/modules/Simplex/Messaging/Agent/Client.md @@ -29,7 +29,7 @@ The module is consumed by Agent.hs (which passes specific worker bodies, task qu - **Operation states**: `ntfNetworkOp`, `rcvNetworkOp`, `msgDeliveryOp`, `sndNetworkOp`, `databaseOp` - **Locking**: `connLocks`, `invLocks`, `deleteLock`, `getMsgLocks`, `clientNoticesLock` - **Service state**: `useClientServices` (per-user boolean controlling whether service certificates are used) -- **Proxy routing**: `smpProxiedRelays` (maps destination transport session → proxy server used) +- **Proxy routing**: `smpProxiedRelays` (maps destination transport session → proxy router used) - **Network state**: `userNetworkInfo`, `userNetworkUpdated`, `useNetworkConfig` (slow/fast pair) All TVars are initialized in `newAgentClient`. The `active` TVar is the global kill switch — `closeAgentClient` sets it to `False`, and all protocol client getters check it first. @@ -56,11 +56,11 @@ When `newProtocolClient` fails and `persistErrorInterval > 0`, the error is cach 1. **Session ID registration**: `SS.setSessionId` records the TLS session ID in `currentSubs`, linking the transport session to the actual TLS connection for later session validation. -2. **Service credential synchronization** (`updateClientService`): After connecting, compares client-side and server-side service state. Four cases: +2. **Service credential synchronization** (`updateClientService`): After connecting, compares client-side and router-side service state. 
Four cases: - Both have service and IDs match → update DB (no-op if same) - Both have service but IDs differ → update DB and remove old queue-service associations - - Client has service, server doesn't → delete client service (handles server version downgrade) - - Server has service, client doesn't → log error (should not happen in normal flow) + - Client has service, router doesn't → delete client service (handles router version downgrade) + - Router has service, client doesn't → log error (should not happen in normal flow) On connection failure, `smpConnectClient` triggers `resubscribeSMPSession` before re-throwing the error. This ensures pending subscriptions get retry logic even when the initial connection attempt fails. @@ -182,7 +182,7 @@ The `clientNoticesLock` TMVar serializes notice processing across concurrent sub ### processSubResults — partitioning Subscription results are partitioned into five categories: -1. **Failed with client notice** — error has an associated server-side notice (e.g., queue status change). Queue is treated as failed (removed from pending, added to `removedSubs`) AND the notice is recorded for processing. +1. **Failed with client notice** — error has an associated router-side notice (e.g., queue status change). Queue is treated as failed (removed from pending, added to `removedSubs`) AND the notice is recorded for processing. 2. **Failed permanently** — non-temporary error without notice, queue is removed from pending and added to `removedSubs` 3. **Failed temporarily** — error is transient, queue stays in pending unchanged for retry on reconnect 4. **Subscribed** — moved from pending to active. Further split into: queues whose service ID matches the session service (added as service-associated) and others. If the queue had a tracked `clientNoticeId`, it is cleared (notice resolved by successful subscription). 
@@ -205,18 +205,18 @@ Subscription results are partitioned into five categories: Implements SMP proxy/direct routing with fallback: -1. `shouldUseProxy` checks `smpProxyMode` (Always/Unknown/Unprotected/Never) and whether the destination server is "known" (in the user's server list) +1. `shouldUseProxy` checks `smpProxyMode` (Always/Unknown/Unprotected/Never) and whether the destination router is "known" (in the user's router list) 2. If proxying: `getSMPProxyClient` creates or reuses a proxy connection, then `connectSMPProxiedRelay` establishes the relay session. On `NO_SESSION` error, re-creates the relay session through the same proxy. 3. If proxying fails with a host error and `smpProxyFallback` allows it: falls back to direct connection 4. `deleteRelaySession` carefully validates that the current relay session matches the one that failed before removing it (prevents removing a concurrently-created replacement session) -**NO_SESSION retry limit**: On `NO_SESSION`, `sendViaProxy` is called recursively with `Just proxySrv` to reuse the same proxy server. If the recursive call also gets `NO_SESSION`, it throws `proxyError` instead of recursing again — `proxySrv_` is `Just`, so the `Nothing` branch (which recurses) is not taken. This limits retry to exactly one attempt. +**NO_SESSION retry limit**: On `NO_SESSION`, `sendViaProxy` is called recursively with `Just proxySrv` to reuse the same proxy router. If the recursive call also gets `NO_SESSION`, it throws `proxyError` instead of recursing again — `proxySrv_` is `Just`, so the `Nothing` branch (which recurses) is not taken. This limits retry to exactly one attempt. **Proxy selection caching** (`smpProxiedRelays`): When `getSMPProxyClient` selects a proxy for a destination, it atomically inserts the proxy→destination mapping into `smpProxiedRelays`. If a mapping already exists (another thread selected a proxy for the same destination), the existing mapping is used. 
On relay creation failure with non-host errors, both the relay session and proxy mapping are removed. On host errors, they are preserved to allow fallback logic. ## Service credentials lifecycle -`getServiceCredentials` manages per-user, per-server service certificate credentials: +`getServiceCredentials` manages per-user, per-router service certificate credentials: 1. Checks `useClientServices` — if the user has services disabled, returns `Nothing` 2. Looks up existing credentials in DB via `getClientServiceCredentials` @@ -235,15 +235,15 @@ The generated credentials are Ed25519 self-signed certificates with `simplex` or `withStoreBatch` / `withStoreBatch'` run multiple DB operations in a single transaction, catching exceptions per-operation to report individual failures. The entire batch is within one `agentOperationBracket`. -## Server selection — getNextServer / withNextSrv +## Router selection — getNextServer / withNextSrv -Server selection has two-level diversity: -1. **Operator diversity**: prefer servers from operators not already used (tracked by `usedOperators` set) -2. **Host diversity**: prefer servers with hosts not already used (tracked by `usedHosts` set) +Router selection has two-level diversity: +1. **Operator diversity**: prefer routers from operators not already used (tracked by `usedOperators` set) +2. **Host diversity**: prefer routers with hosts not already used (tracked by `usedHosts` set) -`filterOrAll` ensures that if all servers are "used," the full list is returned rather than an empty one. +`filterOrAll` ensures that if all routers are "used," the full list is returned rather than an empty one. -`withNextSrv` is designed for retry loops — it re-reads user servers on each call (allowing configuration changes during retries) and tracks `triedHosts` across attempts. When all hosts are tried, the tried set is reset (`S.empty`), creating a round-robin effect. 
+`withNextSrv` is designed for retry loops — it re-reads user routers on each call (allowing configuration changes during retries) and tracks `triedHosts` across attempts. When all hosts are tried, the tried set is reset (`S.empty`), creating a round-robin effect. ## Locking primitives @@ -295,6 +295,6 @@ Classifies errors as temporary (retryable) or permanent. Notable non-obvious cla - `CRITICAL True` is temporary — `True` means the error shows a restart button, implying the user should retry. `CRITICAL False` is permanent. - `INACTIVE` is temporary — the agent may be reactivated - `SMP.PROXY NO_SESSION` via proxy is temporary — session can be re-established -- `SMP.STORE _` is temporary — server-side store error, not a client issue +- `SMP.STORE _` is temporary — router-side store error, not a client issue `temporaryOrHostError` extends `temporaryAgentError` to also include host-related errors (`HOST`, `TRANSPORT TEVersion`). Used in subscription management where host errors should trigger resubscription rather than permanent failure. diff --git a/spec/modules/Simplex/Messaging/Agent/Env/SQLite.md b/spec/modules/Simplex/Messaging/Agent/Env/SQLite.md index 7bfb10bbc..ec7852acf 100644 --- a/spec/modules/Simplex/Messaging/Agent/Env/SQLite.md +++ b/spec/modules/Simplex/Messaging/Agent/Env/SQLite.md @@ -6,4 +6,4 @@ ## mkUserServers — silent fallback on all-disabled -See comment on `mkUserServers`. If filtering servers by `enabled && role` yields an empty list, `fromMaybe srvs` falls back to *all* servers regardless of enabled/role status. This prevents a configuration where all servers are disabled from leaving the user with no servers — but means disabled servers can still be used if every server in a role is disabled. +See comment on `mkUserServers`. If filtering routers by `enabled && role` yields an empty list, `fromMaybe srvs` falls back to *all* routers regardless of enabled/role status. 
This prevents a configuration where all routers are disabled from leaving the user with no routers — but means disabled routers can still be used if every router in a role is disabled. diff --git a/spec/modules/Simplex/Messaging/Agent/NtfSubSupervisor.md b/spec/modules/Simplex/Messaging/Agent/NtfSubSupervisor.md index d55cfd746..ac591c192 100644 --- a/spec/modules/Simplex/Messaging/Agent/NtfSubSupervisor.md +++ b/spec/modules/Simplex/Messaging/Agent/NtfSubSupervisor.md @@ -6,7 +6,7 @@ ## Architecture -The notification system uses a supervisor with **three worker pools**, each keyed by server address: +The notification system uses a supervisor with **three worker pools**, each keyed by router address: | Pool | Key | Purpose | |------|-----|---------| @@ -23,7 +23,7 @@ The supervisor (`runNtfSupervisor`) reads commands from `ntfSubQ` and dispatches `partitionQueueSubActions` classifies each (queue, subscription) pair into one of four buckets: - **New sub**: no existing subscription record — create from scratch -- **Reset sub**: credentials mismatch (SMP server changed, notifier ID changed, action was nulled by error, or action is a delete) — wipe and restart from SMP key exchange +- **Reset sub**: credentials mismatch (SMP router changed, notifier ID changed, action was nulled by error, or action is a delete) — wipe and restart from SMP key exchange - **Continue SMP work**: existing action is `NSASMP` and credentials are consistent — kick the SMP worker - **Continue NTF work**: existing action is `NSANtf` and credentials are consistent — kick the NTF worker @@ -54,11 +54,11 @@ Successful check responses with statuses not in `subscribeNtfStatuses` also trig Token deletion splits into two phases: 1. **Store phase**: Remove token from active store, persist `(server, privateKey, tokenId)` to a deletion queue via `addNtfTokenToDelete` -2. **Network phase**: `runNtfTknDelWorker` reads from the queue and performs the actual server-side deletion +2. 
**Network phase**: `runNtfTknDelWorker` reads from the queue and performs the actual router-side deletion On supervisor startup, `startTknDelete` scans for any pending deletion queue entries and launches workers. This ensures token cleanup survives agent restarts. -If the token has no server-side ID (`ntfTokenId = Nothing`), only the store phase runs — no worker is launched. +If the token has no router-side ID (`ntfTokenId = Nothing`), only the store phase runs — no worker is launched. ### 6. workerErrors nulls subscription action @@ -88,7 +88,7 @@ When token deletion gets a permanent (non-temporary, non-host) error, the deleti ### 12. getNtfServer — random selection from multiple -When multiple notification routers are configured, one is selected randomly using `randomR` with a session-stable `TVar` generator. Single-server configurations skip the randomness. +When multiple notification routers are configured, one is selected randomly using `randomR` with a session-stable `TVar` generator. Single-router configurations skip the randomness. ### 13. closeNtfSupervisor — atomic swap then cancel diff --git a/spec/modules/Simplex/Messaging/Agent/Protocol.md b/spec/modules/Simplex/Messaging/Agent/Protocol.md index ad95df809..c6e65fbdf 100644 --- a/spec/modules/Simplex/Messaging/Agent/Protocol.md +++ b/spec/modules/Simplex/Messaging/Agent/Protocol.md @@ -64,9 +64,9 @@ The semicolon separator for SMP queues in the URI query string is deliberate — Short links encode `ContactConnType` as a single lowercase letter in the URL path: `a` (contact), `c` (channel), `g` (group), `r` (relay). Invitation links use `i`. The parser uses `toUpper` before dispatching to `ctTypeP` (which expects uppercase), while the encoder uses `toLower` on `ctTypeChar` output. This case dance happens because the wire format wants lowercase URLs but the internal representation uses uppercase. 
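
The case conversion described above can be illustrated with a small sketch. The constructor names and the helpers `ctTypeCharSketch` and `ctTypeFromUpper` are hypothetical stand-ins (the latter for `ctTypeP`); only the letters `a`, `c`, `g`, `r` come from the text, and the invitation letter `i` is not modeled.

```haskell
import Data.Char (toLower, toUpper)

-- Hypothetical constructor names; only the letters come from the text.
data ConnTypeSketch = Contact | Channel | Group | Relay
  deriving (Eq, Show)

-- The internal representation uses uppercase characters.
ctTypeCharSketch :: ConnTypeSketch -> Char
ctTypeCharSketch Contact = 'A'
ctTypeCharSketch Channel = 'C'
ctTypeCharSketch Group = 'G'
ctTypeCharSketch Relay = 'R'

-- Stand-in for ctTypeP: accepts uppercase characters only.
ctTypeFromUpper :: Char -> Maybe ConnTypeSketch
ctTypeFromUpper 'A' = Just Contact
ctTypeFromUpper 'C' = Just Channel
ctTypeFromUpper 'G' = Just Group
ctTypeFromUpper 'R' = Just Relay
ctTypeFromUpper _ = Nothing

-- Encoder: lowercase the internal character for the URL path.
encodePathChar :: ConnTypeSketch -> Char
encodePathChar = toLower . ctTypeCharSketch

-- Parser: uppercase the URL character before dispatching.
decodePathChar :: Char -> Maybe ConnTypeSketch
decodePathChar = ctTypeFromUpper . toUpper

main :: IO ()
main = mapM_ check [Contact, Channel, Group, Relay]
  where
    -- prints each type, its URL letter, and whether it round-trips
    check t = print (t, encodePathChar t, decodePathChar (encodePathChar t) == Just t)
```

In this sketch `decodePathChar` also accepts an uppercase letter, because `toUpper` leaves it unchanged before the uppercase-only dispatch.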
-## Short link server shortening +## Short link router shortening -`shortenShortLink` strips port and key hash from preset servers, leaving only the hostname (`SMPServerOnlyHost` pattern). This makes short links shorter for well-known servers. `restoreShortLink` reverses this by looking up the full server definition from the preset list. Both functions match on primary hostname only (first in the `NonEmpty` list). +`shortenShortLink` strips port and key hash from preset routers, leaving only the hostname (`SMPServerOnlyHost` pattern). This makes short links shorter for well-known routers. `restoreShortLink` reverses this by looking up the full router definition from the preset list. Both functions match on primary hostname only (first in the `NonEmpty` list). `isPresetServer` has a non-obvious port matching rule: empty port in the preset matches `"443"` or `"5223"` in the link. This handles servers that use default ports without explicitly listing them. diff --git a/spec/modules/Simplex/Messaging/Agent/Stats.md b/spec/modules/Simplex/Messaging/Agent/Stats.md index d793564e7..d501c3f7e 100644 --- a/spec/modules/Simplex/Messaging/Agent/Stats.md +++ b/spec/modules/Simplex/Messaging/Agent/Stats.md @@ -1,6 +1,6 @@ # Simplex.Messaging.Agent.Stats -> Per-server statistics counters (SMP, XFTP, NTF) with TVar-based live state and serializable snapshots. +> Per-router statistics counters (SMP, XFTP, NTF) with TVar-based live state and serializable snapshots. **Source**: [`Agent/Stats.hs`](../../../../../src/Simplex/Messaging/Agent/Stats.hs) diff --git a/spec/modules/Simplex/Messaging/Notifications/Protocol.md b/spec/modules/Simplex/Messaging/Notifications/Protocol.md index 71daf771d..fb718fd80 100644 --- a/spec/modules/Simplex/Messaging/Notifications/Protocol.md +++ b/spec/modules/Simplex/Messaging/Notifications/Protocol.md @@ -28,7 +28,7 @@ When encoding `NRTkn` responses, the `NTInvalid` reason is only included if the ### 4. 
subscribeNtfStatuses migration invariant -The comment on `subscribeNtfStatuses` (`[NSNew, NSPending, NSActive, NSInactive]`) warns that changing these statuses requires a new database migration for queue ID hashes (see `m20250830_queue_ids_hash`). This is a cross-module invariant between protocol types and server storage. +The comment on `subscribeNtfStatuses` (`[NSNew, NSPending, NSActive, NSInactive]`) warns that changing these statuses requires a new database migration for queue ID hashes (see `m20250830_queue_ids_hash`). This is a cross-module invariant between protocol types and router storage. ### 5. allowNtfSubCommands permits NTInvalid and NTExpired @@ -36,7 +36,7 @@ Token status `NTInvalid` allows subscription commands (SNEW, SCHK, SDEL), which ### 6. PPApnsNull test provider -`PPApnsNull` is a push provider that never communicates with APNS. It's used for end-to-end testing of the notification server from clients without requiring actual push infrastructure. +`PPApnsNull` is a push provider that never communicates with APNS. It's used for end-to-end testing of the notification router from clients without requiring actual push infrastructure. ### 7. DeviceToken hex validation @@ -44,7 +44,7 @@ Token status `NTInvalid` allows subscription commands (SNEW, SCHK, SDEL), which ### 8. SMPQueueNtf parsing applies updateSMPServerHosts -Both `smpP` and `strP` for `SMPQueueNtf` apply `updateSMPServerHosts` to the parsed SMP server. This normalizes server host addresses on deserialization, ensuring consistent comparison even if the on-wire format uses different host representations. +Both `smpP` and `strP` for `SMPQueueNtf` apply `updateSMPServerHosts` to the parsed SMP server. This normalizes router host addresses on deserialization, ensuring consistent comparison even if the on-wire format uses different host representations. ### 9. 
NRTknId response tag comment diff --git a/spec/modules/Simplex/Messaging/Notifications/Server.md b/spec/modules/Simplex/Messaging/Notifications/Server.md index 5c74878d7..d77a30a00 100644 --- a/spec/modules/Simplex/Messaging/Notifications/Server.md +++ b/spec/modules/Simplex/Messaging/Notifications/Server.md @@ -1,12 +1,12 @@ # Simplex.Messaging.Notifications.Server -> NTF server: manages tokens, subscriptions, SMP subscriber connections, and push notification delivery. +> NTF router: manages tokens, subscriptions, SMP subscriber connections, and push notification delivery. **Source**: [`Notifications/Server.hs`](../../../../../src/Simplex/Messaging/Notifications/Server.hs) ## Architecture -The NTF server runs several concurrent threads via `raceAny_`: +The NTF router runs several concurrent threads via `raceAny_`: | Thread | Purpose | |--------|---------| @@ -26,7 +26,7 @@ When `verifyNtfTransmission` encounters an AUTH error (entity not found), it cal ### 2. TNEW idempotent re-registration -When TNEW is received for an already-registered token, the server: +When TNEW is received for an already-registered token, the router: 1. Looks up the existing token via `findNtfTokenRegistration` (matches on push provider, device token, AND verify key) 2. Verifies the DH secret matches (recomputed from the new `dhPubKey` and stored `tknDhPrivKey`) 3. If DH secrets differ → AUTH error (prevents token hijacking) @@ -36,7 +36,7 @@ If the verify key doesn't match in step 1, the lookup returns `Nothing` and a ne ### 3. SNEW idempotent subscription -When SNEW is received for an existing subscription (same token + SMP queue), the server returns the existing `ntfSubId` if the notifier key matches. If keys differ, AUTH error. New subscriptions are only created when no match exists in `findNtfSubscription`. +When SNEW is received for an existing subscription (same token + SMP queue), the router returns the existing `ntfSubId` if the notifier key matches. If keys differ, AUTH error. 
New subscriptions are only created when no match exists in `findNtfSubscription`. ### 4. PPApnsNull suppresses statistics @@ -44,7 +44,7 @@ When SNEW is received for an existing subscription (same token + SMP queue), the ### 5. END requires active session validation -SMP END messages are only processed when the originating session is the currently active session for that server (`activeClientSession'` check). This prevents stale END messages from previous (reconnected) sessions from incorrectly marking subscriptions as ended. +SMP END messages are only processed when the originating session is the currently active session for that router (`activeClientSession'` check). This prevents stale END messages from previous (reconnected) sessions from incorrectly marking subscriptions as ended. ### 6. waitForSMPSubscriber two-phase wait @@ -52,9 +52,9 @@ SMP END messages are only processed when the originating session is the currentl ### 7. CAServiceUnavailable triggers individual resubscription -When a service subscription becomes unavailable (SMP server rejects service credentials), the NTF server: +When a service subscription becomes unavailable (SMP router rejects service credentials), the NTF router: 1. Removes the service association from the database -2. Resubscribes all individual queues for that server via `subscribeSrvSubs` +2. Resubscribes all individual queues for that router via `subscribeSrvSubs` This is the fallback path from service-level to queue-level SMP subscriptions. @@ -70,9 +70,9 @@ On the second failure, the error is logged and returned. `PPTokenInvalid` marks Cron notification interval has a hard minimum of 20 minutes. `TCRN 0` disables cron notifications. `TCRN n` where `1 <= n < 20` returns `QUOTA` error. -### 10. Startup resubscription is concurrent per server +### 10. Startup resubscription is concurrent per router -`resubscribe` uses `mapConcurrently` to resubscribe to all known SMP servers in parallel. 
Within each server, subscriptions are paginated via `subscribeLoop` using cursor-based pagination (`afterSubId_`). +`resubscribe` uses `mapConcurrently` to resubscribe to all known SMP routers in parallel. Within each router, subscriptions are paginated via `subscribeLoop` using cursor-based pagination (`afterSubId_`). ### 11. receive separates error responses from commands @@ -80,7 +80,7 @@ The `receive` function processes incoming transmissions and partitions results: ### 12. Maintenance mode saves state then exits immediately -When `maintenance` is set in `startOptions`, the server restores stats, calls `stopServer` (closes DB, saves stats), and exits with `exitSuccess`. It never starts transport listeners, subscriber threads, or resubscription. This provides a way to run database migrations without the server serving traffic. +When `maintenance` is set in `startOptions`, the router restores stats, calls `stopServer` (closes DB, saves stats), and exits with `exitSuccess`. It never starts transport listeners, subscriber threads, or resubscription. This provides a way to run database migrations without the router serving traffic. ### 13. Resubscription runs as a detached fork @@ -88,7 +88,7 @@ When `maintenance` is set in `startOptions`, the server restores stats, calls `s ### 14. TNEW re-registration resets status for non-verifiable tokens -When a re-registration TNEW matches on DH secret but `allowTokenVerification tknStatus` is `False` (token is `NTNew`, `NTInvalid`, or `NTExpired`), the server resets status to `NTRegistered` before sending the verification push. This makes TNEW a "status repair" mechanism — clients with stuck tokens can restart the verification flow by re-registering with the same DH key. +When a re-registration TNEW matches on DH secret but `allowTokenVerification tknStatus` is `False` (token is `NTNew`, `NTInvalid`, or `NTExpired`), the router resets status to `NTRegistered` before sending the verification push. 
This makes TNEW a "status repair" mechanism — clients with stuck tokens can restart the verification flow by re-registering with the same DH key. ### 15. DELD unconditionally updates status (no session validation) @@ -96,7 +96,7 @@ Unlike `SMP.END` which checks `activeClientSession'` to prevent stale session me ### 16. TRPL generates new code but reuses the DH key -`TRPL` (token replace) creates a new registration code and resets status to `NTRegistered`, but does NOT generate a new server DH key pair. The existing `tknDhPrivKey` and `tknDhSecret` are preserved — only the push provider token and registration code change. The encrypted channel between client and NTF router persists across device token replacements. +`TRPL` (token replace) creates a new registration code and resets status to `NTRegistered`, but does NOT generate a new router DH key pair. The existing `tknDhPrivKey` and `tknDhSecret` are preserved — only the push provider token and registration code change. The encrypted channel between client and NTF router persists across device token replacements. ### 17. PNMessage delivery requires NTActive, verification and cron do not @@ -112,7 +112,7 @@ When a service subscription is confirmed, the NTF router compares expected and c ### 20. subscribeLoop calls exitFailure on database error -If `getServerNtfSubscriptions` returns `Left _` during startup resubscription, the server terminates via `exitFailure`. Since `resubscribe` runs in a forked thread (pattern 13), this `exitFailure` terminates the entire process — a transient database error during startup resubscription kills the server. +If `getServerNtfSubscriptions` returns `Left _` during startup resubscription, the router terminates via `exitFailure`. Since `resubscribe` runs in a forked thread (pattern 13), this `exitFailure` terminates the entire process — a transient database error during startup resubscription kills the router. ### 21. 
Stats log aligns to wall-clock time of day @@ -120,7 +120,7 @@ The stats logging thread calculates an `initialDelay` to synchronize the first f ### 22. NMSG AUTH errors silently counted, not logged -When `addTokenLastNtf` returns `Left AUTH` (notification for a queue whose subscription/token association is invalid), the server increments `ntfReceivedAuth` but takes no corrective action. Other error types are silently ignored. This is expected — subscriptions may be deleted while messages are in-flight. +When `addTokenLastNtf` returns `Left AUTH` (notification for a queue whose subscription/token association is invalid), the router increments `ntfReceivedAuth` but takes no corrective action. Other error types are silently ignored. This is expected — subscriptions may be deleted while messages are in-flight. ### 23. PNVerification delivery transitions token to NTConfirmed diff --git a/spec/modules/Simplex/Messaging/Notifications/Server/Control.md b/spec/modules/Simplex/Messaging/Notifications/Server/Control.md index 897f81c16..cbdb5b416 100644 --- a/spec/modules/Simplex/Messaging/Notifications/Server/Control.md +++ b/spec/modules/Simplex/Messaging/Notifications/Server/Control.md @@ -1,6 +1,6 @@ # Simplex.Messaging.Notifications.Server.Control -> Control port command protocol for NTF server administration. +> Control port command protocol for NTF router administration. **Source**: [`Notifications/Server/Control.hs`](../../../../../../src/Simplex/Messaging/Notifications/Server/Control.hs) diff --git a/spec/modules/Simplex/Messaging/Notifications/Server/Env.md b/spec/modules/Simplex/Messaging/Notifications/Server/Env.md index c266390d2..17ae63862 100644 --- a/spec/modules/Simplex/Messaging/Notifications/Server/Env.md +++ b/spec/modules/Simplex/Messaging/Notifications/Server/Env.md @@ -1,6 +1,6 @@ # Simplex.Messaging.Notifications.Server.Env -> NTF server environment: configuration, subscriber state, and push provider management. 
+> NTF router environment: configuration, subscriber state, and push provider management. **Source**: [`Notifications/Server/Env.hs`](../../../../../../src/Simplex/Messaging/Notifications/Server/Env.hs) @@ -8,7 +8,7 @@ ### 1. Service credentials are lazily generated -`mkDbService` in `newNtfServerEnv` generates service credentials on demand: when `getCredentials` is called for an SMP server, it checks the database. If the server is known and already has credentials, they are reused. If the server is known but has no credentials yet (first connection), new credentials are generated via `genCredentials`, stored in the database, and returned. If the server is not in the database at all, `PCEServiceUnavailable` is thrown (this case should not occur in practice, as clients only connect to servers already tracked in the database). +`mkDbService` in `newNtfServerEnv` generates service credentials on demand: when `getCredentials` is called for an SMP router, it checks the database. If the router is known and already has credentials, they are reused. If the router is known but has no credentials yet (first connection), new credentials are generated via `genCredentials`, stored in the database, and returned. If the router is not in the database at all, `PCEServiceUnavailable` is thrown (this case should not occur in practice, as clients only connect to routers already tracked in the database). Service credentials are only used when `useServiceCreds` is enabled in the config. @@ -18,7 +18,7 @@ Service credentials are only used when `useServiceCreds` is enabled in the confi ### 3. getPushClient lazy initialization -`getPushClient` looks up the push client by provider in `pushClients` TMap. If not found, it calls `newPushClient` to create and register one. Push provider connections are established on first use, not at server startup. +`getPushClient` looks up the push client by provider in `pushClients` TMap. If not found, it calls `newPushClient` to create and register one. 
Push provider connections are established on first use, not at router startup. ### 4. Service credential validity: 25h backdating, ~2700yr forward diff --git a/spec/modules/Simplex/Messaging/Notifications/Server/Main.md b/spec/modules/Simplex/Messaging/Notifications/Server/Main.md index 3719dcd97..54136f1c3 100644 --- a/spec/modules/Simplex/Messaging/Notifications/Server/Main.md +++ b/spec/modules/Simplex/Messaging/Notifications/Server/Main.md @@ -1,6 +1,6 @@ # Simplex.Messaging.Notifications.Server.Main -> CLI interface and INI configuration parsing for the NTF server. +> CLI interface and INI configuration parsing for the NTF router. **Source**: [`Notifications/Server/Main.hs`](../../../../../../src/Simplex/Messaging/Notifications/Server/Main.hs) diff --git a/spec/modules/Simplex/Messaging/Notifications/Server/Push/APNS.md b/spec/modules/Simplex/Messaging/Notifications/Server/Push/APNS.md index d2a49471d..3fd2bd880 100644 --- a/spec/modules/Simplex/Messaging/Notifications/Server/Push/APNS.md +++ b/spec/modules/Simplex/Messaging/Notifications/Server/Push/APNS.md @@ -72,4 +72,4 @@ The comment explicitly states `APNSErrorResponse` is `data` rather than `newtype ### 17. Connection initialization is fire-and-forget -`createAPNSPushClient` calls `connectHTTPS2` and discards the result with `void`. If the initial connection fails, the error is only logged — the client is still created. The first push delivery triggers `getApnsHTTP2Client` which reconnects. This means the server can start even if APNS is unreachable. +`createAPNSPushClient` calls `connectHTTPS2` and discards the result with `void`. If the initial connection fails, the error is only logged — the client is still created. The first push delivery triggers `getApnsHTTP2Client` which reconnects. This means the router can start even if APNS is unreachable. 
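
The "getPushClient lazy initialization" pattern in the Env.md hunk above (look up by provider, create and register on a miss) can be sketched roughly as follows. This is a minimal illustration, not the module's code: `Provider`, `PushClient`, `getOrCreateClient` and `demo` are hypothetical stand-ins, a plain `TVar (Map ...)` is assumed in place of the project's `TMap`, and the sketch makes no attempt to deduplicate two threads that miss at the same time.

```haskell
import Control.Concurrent.STM (TVar, atomically, modifyTVar', newTVarIO, readTVarIO)
import Data.IORef (modifyIORef', newIORef, readIORef)
import qualified Data.Map.Strict as M

-- Hypothetical stand-ins for the push provider and push client types.
data Provider = ProviderA | ProviderB deriving (Eq, Ord, Show)

newtype PushClient = PushClient String deriving (Eq, Show)

-- Assumption: a TVar-wrapped Map stands in for the TMap described in the spec.
type PushClients = TVar (M.Map Provider PushClient)

-- Look up the client for a provider; on a miss, create one and register it.
-- Nothing is created at startup, only on first use.
getOrCreateClient :: PushClients -> (Provider -> IO PushClient) -> Provider -> IO PushClient
getOrCreateClient clients create provider = do
  existing <- M.lookup provider <$> readTVarIO clients
  case existing of
    Just client -> pure client
    Nothing -> do
      client <- create provider
      atomically $ modifyTVar' clients (M.insert provider client)
      pure client

-- Requests the same provider twice and reports how often the constructor ran.
demo :: IO (Int, PushClient, PushClient)
demo = do
  clients <- newTVarIO M.empty
  created <- newIORef (0 :: Int)
  let create p = modifyIORef' created (+ 1) >> pure (PushClient (show p))
  c1 <- getOrCreateClient clients create ProviderA
  c2 <- getOrCreateClient clients create ProviderA
  n <- readIORef created
  pure (n, c1, c2)
```

In `demo` the second request is served from the map, so the constructor runs once. Combined with the fire-and-forget connection described for APNS, this is why an unreachable push provider need not block router startup.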
diff --git a/spec/modules/Simplex/Messaging/Notifications/Server/Stats.md b/spec/modules/Simplex/Messaging/Notifications/Server/Stats.md index 4a4439f54..d954f03d1 100644 --- a/spec/modules/Simplex/Messaging/Notifications/Server/Stats.md +++ b/spec/modules/Simplex/Messaging/Notifications/Server/Stats.md @@ -1,6 +1,6 @@ # Simplex.Messaging.Notifications.Server.Stats -> NTF server statistics collection with own-server breakdown and backward-compatible persistence. +> NTF router statistics collection with own-router breakdown and backward-compatible persistence. **Source**: [`Notifications/Server/Stats.hs`](../../../../../../src/Simplex/Messaging/Notifications/Server/Stats.hs) @@ -8,27 +8,27 @@ ### 1. incServerStat double lookup -`incServerStat` performs a non-STM IO lookup first. On cache hit, the STM transaction only touches the per-server `TVar Int` without reading the shared TMap, avoiding contention. On cache miss, the STM block re-checks the map to handle races (another thread may have inserted between the IO lookup and STM entry). +`incServerStat` performs a non-STM IO lookup first. On cache hit, the STM transaction only touches the per-router `TVar Int` without reading the shared TMap, avoiding contention. On cache miss, the STM block re-checks the map to handle races (another thread may have inserted between the IO lookup and STM entry). ### 2. setNtfServerStats is not thread safe -`setNtfServerStats` is explicitly documented as non-thread-safe and intended for server startup only (restoring from backup file). +`setNtfServerStats` is explicitly documented as non-thread-safe and intended for router startup only (restoring from backup file). ### 3. Backward-compatible parsing -The `strP` parser uses `opt` which defaults missing fields to 0. This allows reading stats files from older server versions that don't include newer fields (`ntfReceivedAuth`, `ntfFailed`, `ntfVrf*`, etc.). +The `strP` parser uses `opt` which defaults missing fields to 0. 
This allows reading stats files from older router versions that don't include newer fields (`ntfReceivedAuth`, `ntfFailed`, `ntfVrf*`, etc.). ### 4. getNtfServerStatsData is a non-atomic snapshot -`getNtfServerStatsData` reads each `IORef` and `TMap` field sequentially in plain `IO`, not inside a single STM transaction. The returned `NtfServerStatsData` is not a consistent point-in-time snapshot — invariants like "received >= delivered" may not hold. The same applies to `getStatsByServer`, which does one `readTVarIO` for the map root TVar, then a separate `readTVarIO` for each per-server TVar. This is acceptable for periodic reporting where approximate consistency suffices. +`getNtfServerStatsData` reads each `IORef` and `TMap` field sequentially in plain `IO`, not inside a single STM transaction. The returned `NtfServerStatsData` is not a consistent point-in-time snapshot — invariants like "received >= delivered" may not hold. The same applies to `getStatsByServer`, which does one `readTVarIO` for the map root TVar, then a separate `readTVarIO` for each per-router TVar. This is acceptable for periodic reporting where approximate consistency suffices. ### 5. Mixed IORef/TVar concurrency primitives -Aggregate counters (`ntfReceived`, `ntfDelivered`, etc.) use `IORef Int` incremented via `atomicModifyIORef'_`, while per-server breakdowns use `TMap Text (TVar Int)` incremented atomically via STM in `incServerStat`. Although both individual operations are atomic, the aggregate and per-server increments are separate operations, so their values can drift: a thread could increment the aggregate `IORef` before `incServerStat` runs, or vice versa. +Aggregate counters (`ntfReceived`, `ntfDelivered`, etc.) use `IORef Int` incremented via `atomicModifyIORef'_`, while per-router breakdowns use `TMap Text (TVar Int)` incremented atomically via STM in `incServerStat`. 
Although both individual operations are atomic, the aggregate and per-router increments are separate operations, so their values can drift: a thread could increment the aggregate `IORef` before `incServerStat` runs, or vice versa. ### 6. setStatsByServer replaces TMap atomically but orphans old TVars -`setStatsByServer` builds a fresh `Map Text (TVar Int)` in IO via `newTVarIO`, then atomically replaces the TMap's root TVar. Old per-server TVars are not reused — any other thread holding a reference from a prior `TM.lookupIO` would modify an orphaned counter. Safe only because it's called at startup (like `setNtfServerStats`), but lacks the explicit "not thread safe" comment. +`setStatsByServer` builds a fresh `Map Text (TVar Int)` in IO via `newTVarIO`, then atomically replaces the TMap's root TVar. Old per-router TVars are not reused — any other thread holding a reference from a prior `TM.lookupIO` would modify an orphaned counter. Safe only because it's called at startup (like `setNtfServerStats`), but lacks the explicit "not thread safe" comment. ### 7. Positional parser format despite key=value appearance diff --git a/spec/modules/Simplex/Messaging/Notifications/Server/Store.md b/spec/modules/Simplex/Messaging/Notifications/Server/Store.md index 05a7e70e2..d9deedbf4 100644 --- a/spec/modules/Simplex/Messaging/Notifications/Server/Store.md +++ b/spec/modules/Simplex/Messaging/Notifications/Server/Store.md @@ -36,7 +36,7 @@ When `stmDeleteNtfToken` removes a token, it deletes the entry from the inner `T ### 8. deleteTokenSubs returns SMP queues for upstream unsubscription -`deleteTokenSubs` atomically collects all `SMPQueueNtf` values from the deleted subscriptions and returns them. This is how the server layer knows which SMP notifier subscriptions to tear down. `stmRemoveInactiveTokenRegistrations` discards this list (`void $`), meaning rival-token cleanup does **not** trigger SMP unsubscription — only explicit token deletion does. 
+`deleteTokenSubs` atomically collects all `SMPQueueNtf` values from the deleted subscriptions and returns them. This is how the router layer knows which SMP notifier subscriptions to tear down. `stmRemoveInactiveTokenRegistrations` discards this list (`void $`), meaning rival-token cleanup does **not** trigger SMP unsubscription — only explicit token deletion does. ### 9. stmAddNtfSubscription always returns Just (vestigial Maybe) @@ -48,7 +48,7 @@ When `stmDeleteNtfSubscription` removes a subscription, it deletes the `subId` f ### 11. stmSetNtfService — asymmetric cleanup with Postgres store -`stmSetNtfService` uses `maybe TM.delete TM.insert` to either remove or set the service association for an SMP server. This is purely a key-value update with no cascading effects on subscriptions. The Postgres store's `removeServiceAndAssociations` handles subscription cleanup separately, meaning the STM and Postgres stores have **different cleanup semantics** for service removal. +`stmSetNtfService` uses `maybe TM.delete TM.insert` to either remove or set the service association for an SMP router. This is purely a key-value update with no cascading effects on subscriptions. The Postgres store's `removeServiceAndAssociations` handles subscription cleanup separately, meaning the STM and Postgres stores have **different cleanup semantics** for service removal. ### 12. Subscription index triple-write invariant diff --git a/spec/modules/Simplex/Messaging/Notifications/Server/Store/Postgres.md b/spec/modules/Simplex/Messaging/Notifications/Server/Store/Postgres.md index 440797539..bde863eb6 100644 --- a/spec/modules/Simplex/Messaging/Notifications/Server/Store/Postgres.md +++ b/spec/modules/Simplex/Messaging/Notifications/Server/Store/Postgres.md @@ -8,7 +8,7 @@ ### 1. deleteNtfToken exclusive row lock -`deleteNtfToken` acquires `FOR UPDATE` on the token row before cascading deletes. This prevents concurrent subscription inserts for this token during the deletion window. 
The subscriptions are aggregated by SMP server and returned for in-memory subscription cleanup. +`deleteNtfToken` acquires `FOR UPDATE` on the token row before cascading deletes. This prevents concurrent subscription inserts for this token during the deletion window. The subscriptions are aggregated by SMP router and returned for in-memory subscription cleanup. ### 2. addTokenLastNtf atomic CTE @@ -47,11 +47,11 @@ Only non-service-associated subscriptions (`NOT ntf_service_assoc`) are returned ### 9. Server upsert optimization -`addNtfSubscription` first tries a plain SELECT for the SMP server, then falls back to INSERT with ON CONFLICT only if the server doesn't exist. This avoids the upsert overhead in the common case where the server already exists. +`addNtfSubscription` first tries a plain SELECT for the SMP router, then falls back to INSERT with ON CONFLICT only if the router doesn't exist. This avoids the upsert overhead in the common case where the router already exists. ### 10. Service association tracking -`batchUpdateSrvSubStatus` atomically updates both subscription status and `ntf_service_assoc` flag. When notifications arrive via a service subscription (`newServiceId` is `Just`), all affected subscriptions are marked as service-associated. `removeServiceAndAssociations` resets all subscriptions for a server to `NSInactive` with `ntf_service_assoc = FALSE`. +`batchUpdateSrvSubStatus` atomically updates both subscription status and `ntf_service_assoc` flag. When notifications arrive via a service subscription (`newServiceId` is `Just`), all affected subscriptions are marked as service-associated. `removeServiceAndAssociations` resets all subscriptions for a router to `NSInactive` with `ntf_service_assoc = FALSE`. ### 11. uninterruptibleMask_ wraps most store operations @@ -63,7 +63,7 @@ Only non-service-associated subscriptions (`NOT ntf_service_assoc`) are returned ### 13. 
getUsedSMPServers uncorrelated EXISTS -The `EXISTS` subquery in `getUsedSMPServers` has no join condition to the outer `smp_servers` table — it returns ALL servers if ANY subscription anywhere has a subscribable status. This is intentional for server startup: the server needs all SMP server records (including `ServiceSub` data) to rebuild in-memory state, and the EXISTS clause is a cheap guard against an empty subscription table. +The `EXISTS` subquery in `getUsedSMPServers` has no join condition to the outer `smp_servers` table — it returns ALL servers if ANY subscription anywhere has a subscribable status. This is intentional for router startup: the router needs all SMP router records (including `ServiceSub` data) to rebuild in-memory state, and the EXISTS clause is a cheap guard against an empty subscription table. ### 14. Trigger-maintained XOR hash aggregates diff --git a/spec/modules/Simplex/Messaging/Notifications/Transport.md b/spec/modules/Simplex/Messaging/Notifications/Transport.md index df4021475..9b94d7e0d 100644 --- a/spec/modules/Simplex/Messaging/Notifications/Transport.md +++ b/spec/modules/Simplex/Messaging/Notifications/Transport.md @@ -33,6 +33,6 @@ NTF uses a 512-byte block size (`ntfBlockSize`), significantly smaller than SMP. `ntfTHandle` creates a THandle with `thVersion = VersionNTF 0` — a version that no real protocol supports. This is a placeholder value that gets overwritten during version negotiation. All feature gates check `v >= authBatchCmdsNTFVersion` (v2), so the v0 placeholder disables all optional features. -### 6. Server handshake always sends authPubKey +### 6. Router handshake always sends authPubKey -`ntfServerHandshake` always includes `authPubKey = Just sk` in the server handshake, regardless of the advertised version range. The encoding functions (`encodeAuthEncryptCmds`) then decide whether to actually serialize it based on the max version. This means the key is computed even when it won't be sent. 
+`ntfServerHandshake` always includes `authPubKey = Just sk` in the router handshake, regardless of the advertised version range. The encoding functions (`encodeAuthEncryptCmds`) then decide whether to actually serialize it based on the max version. This means the key is computed even when it won't be sent. diff --git a/spec/modules/Simplex/Messaging/Notifications/Types.md b/spec/modules/Simplex/Messaging/Notifications/Types.md index 97cc66913..576d9c088 100644 --- a/spec/modules/Simplex/Messaging/Notifications/Types.md +++ b/spec/modules/Simplex/Messaging/Notifications/Types.md @@ -16,4 +16,4 @@ ### 3. NSADelete and NSARotate are deprecated -These `NtfSubNTFAction` values are no longer generated by current code but are retained in the type for processing legacy database records. `NSARotate` is logically "delete + recreate" while `NSADelete` is "delete subscription on NTF server + delete notifier credentials on SMP server". +These `NtfSubNTFAction` values are no longer generated by current code but are retained in the type for processing legacy database records. `NSARotate` is logically "delete + recreate" while `NSADelete` is "delete subscription on NTF router + delete notifier credentials on SMP router". 
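
The placeholder version described for `ntfTHandle` in the Transport.md hunk above (a `VersionNTF 0` handle whose feature gates all compare against `authBatchCmdsNTFVersion`, stated there as v2) can be illustrated with a small sketch. The `Word16` representation, `placeholderVersion`, `supportsAuthBatchCmds` and `negotiate` are assumptions made for this example and are not taken from the source; only the v0 placeholder and the v2 gate come from the text.

```haskell
import Data.Word (Word16)

-- Simplified stand-in for the protocol version type (representation assumed).
newtype VersionNTF = VersionNTF Word16 deriving (Eq, Ord, Show)

-- The spec text states that the feature gate sits at v2.
authBatchCmdsNTFVersion :: VersionNTF
authBatchCmdsNTFVersion = VersionNTF 2

-- Placeholder used before negotiation; no real protocol version is 0.
placeholderVersion :: VersionNTF
placeholderVersion = VersionNTF 0

-- Hypothetical feature gate of the form `v >= authBatchCmdsNTFVersion`.
supportsAuthBatchCmds :: VersionNTF -> Bool
supportsAuthBatchCmds v = v >= authBatchCmdsNTFVersion

-- Hypothetical negotiation: the highest version inside both (min, max) ranges.
negotiate :: (VersionNTF, VersionNTF) -> (VersionNTF, VersionNTF) -> Maybe VersionNTF
negotiate (minA, maxA) (minB, maxB)
  | lowest <= highest = Just highest
  | otherwise = Nothing
  where
    lowest = max minA minB
    highest = min maxA maxB
```

Because every gate is a lower-bound comparison, a handle that still carries the placeholder fails all of them, so optional features stay off until negotiation replaces the value.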
From 1cc4d98dd082ffa0ab9b5b153f8d7d97768c2134 Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Fri, 13 Mar 2026 17:56:14 +0000 Subject: [PATCH 27/61] terms 2 --- spec/TOPICS.md | 8 ++--- spec/modules/Simplex/FileTransfer/Agent.md | 18 +++++++---- spec/modules/Simplex/FileTransfer/Client.md | 14 ++++---- .../Simplex/FileTransfer/Client/Agent.md | 4 +-- .../Simplex/FileTransfer/Client/Main.md | 8 ++--- spec/modules/Simplex/FileTransfer/Crypto.md | 4 +-- .../Simplex/FileTransfer/Description.md | 6 ++-- spec/modules/Simplex/FileTransfer/Protocol.md | 6 ++-- spec/modules/Simplex/FileTransfer/Server.md | 32 +++++++++---------- .../Simplex/FileTransfer/Server/Env.md | 12 +++---- .../Simplex/FileTransfer/Server/Main.md | 2 +- .../Simplex/FileTransfer/Server/Stats.md | 2 +- .../Simplex/FileTransfer/Server/Store.md | 12 +++---- .../Simplex/FileTransfer/Server/StoreLog.md | 10 +++--- spec/modules/Simplex/FileTransfer/Types.md | 4 +-- spec/modules/Simplex/Messaging/Agent.md | 4 +-- .../Messaging/Agent/NtfSubSupervisor.md | 2 +- .../Simplex/Messaging/Agent/TSessionSubs.md | 2 +- spec/modules/Simplex/Messaging/Client.md | 26 +++++++-------- .../modules/Simplex/Messaging/Client/Agent.md | 4 +-- .../Messaging/Notifications/Protocol.md | 8 ++--- .../Simplex/Messaging/Notifications/Server.md | 6 ++-- .../Messaging/Notifications/Transport.md | 4 +-- spec/modules/Simplex/Messaging/Protocol.md | 4 +-- .../Simplex/Messaging/Protocol/Types.md | 2 +- spec/modules/Simplex/Messaging/Server.md | 10 +++--- spec/modules/Simplex/Messaging/Transport.md | 4 +-- 27 files changed, 111 insertions(+), 107 deletions(-) diff --git a/spec/TOPICS.md b/spec/TOPICS.md index a62e23c29..8ce45e800 100644 --- a/spec/TOPICS.md +++ b/spec/TOPICS.md @@ -44,15 +44,15 @@ - **NTF startup resubscription**: `resubscribe` runs as detached `forkIO` (not in `raceAny_` group), uses `mapConcurrently` across SMP routers, each with `subscribeLoop` using 100x 
database batch multiplier and cursor-based pagination. `ExitCode` exceptions from `exitFailure` on DB error propagate to main thread despite `forkIO`. `getServerNtfSubscriptions` claims subscriptions by batch-updating to `NSPending`. Spans [Server.hs](modules/Simplex/Messaging/Notifications/Server.md), [Store/Postgres.hs](modules/Simplex/Messaging/Notifications/Server/Store/Postgres.md). -- **XFTP file upload pipeline**: Agent-side encryption (streaming 64KB blocks, fixed-size padding) → chunk size selection (75% threshold algorithm) → per-router chunk creation with ID collision retry (3 attempts) → recipient registration (recursive batching up to `maxRecipients` per FADD) → per-router upload (command + file body in single HTTP/2 streaming request) → file description generation (cross-product: M chunks × R replicas × N recipients → N descriptions). Spans [Agent.hs](modules/Simplex/FileTransfer/Agent.md) (worker orchestration, description generation), [Client.hs](modules/Simplex/FileTransfer/Client.md) (upload protocol), [Server.hs](modules/Simplex/FileTransfer/Server.md) (quota reservation with rollback, skipCommitted idempotency), [Crypto.hs](modules/Simplex/FileTransfer/Crypto.md) (streaming encryption with embedded header), [Description.hs](modules/Simplex/FileTransfer/Description.md) (validation, first-replica-only digest optimization). +- **XFTP file upload pipeline**: Agent-side encryption (streaming 64KB blocks, fixed-size padding) → chunk size selection (75% threshold algorithm) → per-router data packet creation with ID collision retry (3 attempts) → recipient registration (recursive batching up to `maxRecipients` per FADD) → per-router data packet upload (command + data in single HTTP/2 streaming request) → file description generation (cross-product: M chunks × R replicas × N recipients → N descriptions). 
Spans [Agent.hs](modules/Simplex/FileTransfer/Agent.md) (worker orchestration, description generation), [Client.hs](modules/Simplex/FileTransfer/Client.md) (upload protocol), [Server.hs](modules/Simplex/FileTransfer/Server.md) (quota reservation with rollback, skipCommitted idempotency), [Crypto.hs](modules/Simplex/FileTransfer/Crypto.md) (streaming encryption with embedded header), [Description.hs](modules/Simplex/FileTransfer/Description.md) (validation, first-replica-only digest optimization). -- **XFTP file download pipeline**: Description parsing (ValidFileDescription validation, YAML or web URI) → per-router chunk download with ephemeral DH key pair per download (forward secrecy) → size and digest verification before decryption → streaming decryption with auth tag verification (output deleted on failure) → redirect resolution (depth-1 chain: decrypt redirect YAML, validate size/digest, download actual file). Spans [Agent.hs](modules/Simplex/FileTransfer/Agent.md) (worker orchestration, redirect handling), [Client.hs](modules/Simplex/FileTransfer/Client.md) (ephemeral DH, chunk-proportional timeout), [Client/Main.hs](modules/Simplex/FileTransfer/Client/Main.md) (web URI decoding, parallel download with router grouping), [Crypto.hs](modules/Simplex/FileTransfer/Crypto.md) (dual decrypt paths, auth tag deletion), [Description.hs](modules/Simplex/FileTransfer/Description.md) (redirect file descriptions). +- **XFTP file download pipeline**: Description parsing (ValidFileDescription validation, YAML or web URI) → per-router data packet download with ephemeral DH key pair per download (forward secrecy) → size and digest verification before decryption → streaming decryption with auth tag verification (output deleted on failure) → redirect resolution (depth-1 chain: decrypt redirect YAML, validate size/digest, download actual file). 
Spans [Agent.hs](modules/Simplex/FileTransfer/Agent.md) (worker orchestration, redirect handling), [Client.hs](modules/Simplex/FileTransfer/Client.md) (ephemeral DH, size-proportional timeout), [Client/Main.hs](modules/Simplex/FileTransfer/Client/Main.md) (web URI decoding, parallel download with router grouping), [Crypto.hs](modules/Simplex/FileTransfer/Crypto.md) (dual decrypt paths, auth tag deletion), [Description.hs](modules/Simplex/FileTransfer/Description.md) (redirect file descriptions). - **XFTP handshake state machine**: Three-state session-cached handshake (`No entry` → `HandshakeSent` → `HandshakeAccepted`) per HTTP/2 session. Web clients use `xftp-web-hello` header and challenge-response identity proof; native clients use standard ALPN. SNI presence gates CORS headers, web serving, and SESSION error for unrecognized connections. Key reuse on re-hello preserves existing DH keys. Spans [Server.hs](modules/Simplex/FileTransfer/Server.md) (handshake logic, CORS, web serving), [Client.hs](modules/Simplex/FileTransfer/Client.md) (ALPN selection, cert chain validation), [Transport.hs](modules/Simplex/FileTransfer/Transport.md) (block size, version). -- **XFTP storage lifecycle**: Quota reservation via atomic `stateTVar` before upload → rollback on failure (subtract + delete partial file) → physical file deleted before store cleanup (crash risk: store references missing file) → `RoundedSystemTime 3600` for privacy-preserving expiration timestamps → expiration with configurable throttling (100ms between files) → startup storage reconciliation (override stats from live store). Spans [Server.hs](modules/Simplex/FileTransfer/Server.md), [Server/Store.hs](modules/Simplex/FileTransfer/Server/Store.md), [Server/Env.hs](modules/Simplex/FileTransfer/Server/Env.md), [Server/StoreLog.hs](modules/Simplex/FileTransfer/Server/StoreLog.md) (error-resilient replay, compaction). 
+- **XFTP storage lifecycle**: Quota reservation via atomic `stateTVar` before upload → rollback on failure (subtract + delete partial data packet) → stored data packet deleted before store cleanup (crash risk: store references missing data packet) → `RoundedSystemTime 3600` for privacy-preserving expiration timestamps → expiration with configurable throttling (100ms between data packets) → startup storage reconciliation (override stats from live store). Spans [Server.hs](modules/Simplex/FileTransfer/Server.md), [Server/Store.hs](modules/Simplex/FileTransfer/Server/Store.md), [Server/Env.hs](modules/Simplex/FileTransfer/Server/Env.md), [Server/StoreLog.hs](modules/Simplex/FileTransfer/Server/StoreLog.md) (error-resilient replay, compaction). -- **XFTP worker architecture**: Five worker types in three categories: rcv (per-router download + local decryption), snd (local prepare/encrypt + per-router upload), del (per-router delete). TMVar-based connection sharing with async retry on temporary errors, permanent error cleanup (put Left + delete from TMap). `withRetryIntervalLimit` caps consecutive retries; exhausted temporary errors silently abandon work cycle (chunk stays pending). `assertAgentForeground` dual check (throw if inactive + wait if backgrounded) gates every chunk operation. Spans [Agent.hs](modules/Simplex/FileTransfer/Agent.md), [Client/Agent.hs](modules/Simplex/FileTransfer/Client/Agent.md). +- **XFTP worker architecture**: Five worker types in three categories: rcv (per-router data packet download + local decryption), snd (local prepare/encrypt + per-router data packet upload), del (per-router data packet delete). TMVar-based connection sharing with async retry on temporary errors, permanent error cleanup (put Left + delete from TMap). `withRetryIntervalLimit` caps consecutive retries; exhausted temporary errors silently abandon work cycle (chunk stays pending). 
`assertAgentForeground` dual check (throw if inactive + wait if backgrounded) gates every data packet operation. Spans [Agent.hs](modules/Simplex/FileTransfer/Agent.md), [Client/Agent.hs](modules/Simplex/FileTransfer/Client/Agent.md). - **SessionVar protocol client lifecycle**: Protocol client connections (SMP, NTF, XFTP) use a lazy singleton pattern: `getSessVar` atomically checks TMap → `newProtocolClient` fills TMVar on success/failure → `waitForProtocolClient` reads with timeout. Error caching via `persistErrorInterval` prevents connection storms (failed connections cache the error with expiry; callers receive cached error without reconnecting). `removeSessVar` uses monotonic `sessionVarId` compare-and-swap to prevent stale disconnect callbacks from removing newer clients. SMP has additional complexity: `SMPConnectedClient` wraps client with per-connection proxied relay map, `updateClientService` synchronizes service credentials post-connect, disconnect callback moves subscriptions to pending with session-ID matching. XFTP always uses `NRMBackground` timing regardless of caller request. Spans [Session.md](modules/Simplex/Messaging/Session.md), [Agent/Client.md](modules/Simplex/Messaging/Agent/Client.md) (lifecycle, disconnect callbacks, reconnection workers), [Agent.md](modules/Simplex/Messaging/Agent.md) (subscriber loop consuming events). diff --git a/spec/modules/Simplex/FileTransfer/Agent.md b/spec/modules/Simplex/FileTransfer/Agent.md index e5f58e996..84238e1d0 100644 --- a/spec/modules/Simplex/FileTransfer/Agent.md +++ b/spec/modules/Simplex/FileTransfer/Agent.md @@ -4,17 +4,21 @@ **Source**: [`FileTransfer/Agent.hs`](../../../../src/Simplex/FileTransfer/Agent.hs) +## Terminology + +The agent splits a **file** into **chunks** determined by the chunking algorithm. Each chunk is stored on an XFTP router as a **data packet** — the router has no concept of files or chunks, only directly addressable data packets. 
This document uses "chunk" for the agent's internal tracking and "data packet" when referring to what is transferred to/from or stored on routers. + ## Architecture The XFTP agent uses five worker types organized in three categories: | Worker | Key (router) | Purpose | |--------|-------------|---------| -| `xftpRcvWorker` | `Just server` | Download chunks from a specific XFTP router | +| `xftpRcvWorker` | `Just server` | Download data packets from a specific XFTP router | | `xftpRcvLocalWorker` | `Nothing` | Decrypt completed downloads locally | -| `xftpSndPrepareWorker` | `Nothing` | Encrypt files and create chunks on routers | -| `xftpSndWorker` | `Just server` | Upload chunks to a specific XFTP router | -| `xftpDelWorker` | `Just server` | Delete chunks from a specific XFTP router | +| `xftpSndPrepareWorker` | `Nothing` | Encrypt files and create data packets on routers | +| `xftpSndWorker` | `Just server` | Upload data packets to a specific XFTP router | +| `xftpDelWorker` | `Just server` | Delete data packets from a specific XFTP router | Workers are created on-demand via `getAgentWorker` and keyed by router address. The local workers (keyed by `Nothing`) handle CPU-bound operations that don't require network access. @@ -55,7 +59,7 @@ Similarly, `prepareFile` checks `status /= SFSEncrypted` and deletes the partial ### 8. addRecipients recursive batching -During upload, `addRecipients` recursively calls itself if a chunk needs more recipients than `xftpMaxRecipientsPerRequest`. Each iteration sends an FADD command for up to `maxRecipients` new recipients, accumulates the results, and recurses until all recipients are registered. +During upload, `addRecipients` recursively calls itself if a data packet needs more recipients than `xftpMaxRecipientsPerRequest`. Each iteration sends an FADD command for up to `maxRecipients` new recipients, accumulates the results, and recurses until all recipients are registered. ### 9. 
File description generation cross-product @@ -71,7 +75,7 @@ During upload, `addRecipients` recursively calls itself if a chunk needs more re ### 12. Delete workers skip files older than rcvFilesTTL -`runXFTPDelWorker` uses `rcvFilesTTL` (not a dedicated delete TTL) to filter pending deletions. Files older than this TTL would already be expired on the router, so attempting deletion is pointless. This reuses the receive TTL as a proxy for router-side expiration. +`runXFTPDelWorker` uses `rcvFilesTTL` (not a dedicated delete TTL) to filter pending deletions. Data packets older than this TTL would already be expired on the router, so attempting deletion is pointless. This reuses the receive TTL as a proxy for router-side expiration. ### 13. closeXFTPAgent atomically swaps worker maps @@ -83,4 +87,4 @@ During upload, `addRecipients` recursively calls itself if a chunk needs more re ### 15. Per-router stats tracking -Every chunk download, upload, and delete operation increments per-router statistics (`downloads`, `uploads`, `deletions`, `downloadAttempts`, `uploadAttempts`, `deleteAttempts`, and error variants). Size-based stats (`downloadsSize`, `uploadsSize`) track throughput in kilobytes. +Every data packet download, upload, and delete operation increments per-router statistics (`downloads`, `uploads`, `deletions`, `downloadAttempts`, `uploadAttempts`, `deleteAttempts`, and error variants). Size-based stats (`downloadsSize`, `uploadsSize`) track throughput in kilobytes. diff --git a/spec/modules/Simplex/FileTransfer/Client.md b/spec/modules/Simplex/FileTransfer/Client.md index 27fb50bc3..da659cf49 100644 --- a/spec/modules/Simplex/FileTransfer/Client.md +++ b/spec/modules/Simplex/FileTransfer/Client.md @@ -1,6 +1,6 @@ # Simplex.FileTransfer.Client -> XFTP client: connection management, handshake, chunk upload/download with forward secrecy. +> XFTP client: connection management, handshake, data packet upload/download with forward secrecy. 
**Source**: [`FileTransfer/Client.hs`](../../../../src/Simplex/FileTransfer/Client.hs) @@ -18,19 +18,19 @@ ### 3. Ephemeral DH key pair per download -`downloadXFTPChunk` generates a fresh X25519 key pair for each chunk download. The public key is sent with the FGET command; the router responds with its own ephemeral key. The derived shared secret encrypts the file data in transit. This provides forward secrecy — compromising a past DH key doesn't decrypt other downloads. +`downloadXFTPChunk` generates a fresh X25519 key pair for each data packet download. The public key is sent with the FGET command; the router returns its own ephemeral key. The derived shared secret encrypts the data packet in transit. This provides forward secrecy — compromising a past DH key doesn't decrypt other downloads. -### 4. Chunk-size-proportional download timeout +### 4. Size-proportional download timeout -`downloadXFTPChunk` calculates the timeout as `baseTimeout + (sizeInKB * perKbTimeout)`, where `baseTimeout` is the base TCP timeout and `perKbTimeout` is a per-kilobyte timeout from the network config. Larger chunks get proportionally more time. This prevents premature timeouts on large chunks over slow connections. +`downloadXFTPChunk` calculates the timeout as `baseTimeout + (sizeInKB * perKbTimeout)`, where `baseTimeout` is the base TCP timeout and `perKbTimeout` is a per-kilobyte timeout from the network config. Larger data packets get proportionally more time. This prevents premature timeouts on large data packets over slow connections. ### 5. prepareChunkSizes threshold algorithm -`prepareChunkSizes` selects chunk sizes using a 75% threshold: if the remaining payload exceeds 75% of the next larger chunk size, it uses the larger size. Otherwise, it uses the smaller size. `singleChunkSize` returns `Just size` only if the payload fits in a single chunk (used for redirect files which must be single-chunk). 
+`prepareChunkSizes` selects data packet sizes using a 75% threshold: if the remaining payload exceeds 75% of the next larger size, it uses the larger size. Otherwise, it uses the smaller size. `singleChunkSize` returns `Just size` only if the payload fits in a single data packet (used for redirect files which must be single-packet). -### 6. Upload sends file body after command response +### 6. Upload sends data packet after command block -`uploadXFTPChunk` sends the FPUT command and file body in the same streaming HTTP/2 request: the protocol command block is sent first, followed immediately by the raw file data via `hSendFile`. The router response (`FROk` or error) is received only after both the command and file body have been fully sent. This is a single HTTP/2 round trip, not a two-phase interaction. +`uploadXFTPChunk` sends the FPUT command and data packet body in the same streaming HTTP/2 request: the protocol command block is sent first, followed immediately by the raw encrypted data via `hSendFile`. The command result (`FROk` or error) is received only after both the command and data have been fully sent. This is a single HTTP/2 round trip, not a two-phase interaction. ### 7. Empty corrId as nonce diff --git a/spec/modules/Simplex/FileTransfer/Client/Agent.md b/spec/modules/Simplex/FileTransfer/Client/Agent.md index c03400d90..0a8f17bcf 100644 --- a/spec/modules/Simplex/FileTransfer/Client/Agent.md +++ b/spec/modules/Simplex/FileTransfer/Client/Agent.md @@ -1,6 +1,6 @@ # Simplex.FileTransfer.Client.Agent -> XFTP client connection management with TMVar-based sharing, async retry, and connection lifecycle. +> XFTP client: router connection management with TMVar-based sharing, async retry, and connection lifecycle. **Source**: [`FileTransfer/Client/Agent.hs`](../../../../../src/Simplex/FileTransfer/Client/Agent.hs) @@ -24,4 +24,4 @@ On permanent error, `newXFTPClient` puts the `Left error` into the `TMVar` (unbl ### 5. 
closeXFTPServerClient removes from TMap -Closing a router client deletes its entry from the TMap, so the next request will establish a fresh connection. This is called on connection errors during file operations to force reconnection. +Closing a router client deletes its entry from the TMap, so the next request will establish a fresh connection. This is called on connection errors during data packet operations to force reconnection. diff --git a/spec/modules/Simplex/FileTransfer/Client/Main.md b/spec/modules/Simplex/FileTransfer/Client/Main.md index abb9eceb5..a7589c05e 100644 --- a/spec/modules/Simplex/FileTransfer/Client/Main.md +++ b/spec/modules/Simplex/FileTransfer/Client/Main.md @@ -18,9 +18,9 @@ `receive` tracks a `depth` parameter starting at 1. After following one redirect, `depth` becomes 0. A second redirect throws "Redirect chain too long". This prevents infinite redirect loops from malicious file descriptions. -### 4. Parallel chunk uploads with router grouping +### 4. Parallel data packet uploads with router grouping -`uploadFile` groups chunks by router via `groupAllOn`, then uses `pooledForConcurrentlyN 16` to process up to 16 router-groups concurrently. Within each group, chunks are uploaded sequentially (`mapM`). Errors from any chunk are collected and the first one is thrown. +`uploadFile` groups data packets by router via `groupAllOn`, then uses `pooledForConcurrentlyN 16` to process up to 16 router-groups concurrently. Within each group, data packets are uploaded sequentially (`mapM`). Errors from any upload are collected and the first one is thrown. ### 5. Random router selection @@ -36,8 +36,8 @@ ### 8. File description auto-deletion prompt -After successful receive or delete, `removeFD` either auto-deletes the file description (if `--yes` flag) or prompts the user. This prevents accidental reuse of one-time file descriptions — each receive consumes the description by ACKing chunks on the router. 
+After successful receive or delete, `removeFD` either auto-deletes the file description (if `--yes` flag) or prompts the user. This prevents accidental reuse of one-time file descriptions — each receive consumes the description by ACKing data packets on the router. ### 9. Sender description uses first replica's router -`createSndFileDescription` takes the router from the first replica of each chunk for the sender's `FileChunkReplica`. This reflects the current limitation that each chunk is uploaded to exactly one router — the sender description records that single router. +`createSndFileDescription` takes the router from the first replica of each chunk for the sender's `FileChunkReplica`. This reflects the current limitation that each data packet is uploaded to exactly one router — the sender description records that single router. diff --git a/spec/modules/Simplex/FileTransfer/Crypto.md b/spec/modules/Simplex/FileTransfer/Crypto.md index 1911de60e..a3e625a8e 100644 --- a/spec/modules/Simplex/FileTransfer/Crypto.md +++ b/spec/modules/Simplex/FileTransfer/Crypto.md @@ -12,7 +12,7 @@ ### 2. Fixed-size padding hides actual file size -The encrypted output is padded to `encSize` (the sum of chunk sizes). Since chunk sizes are fixed powers of 2 (64KB, 256KB, 1MB, 4MB), the encrypted file size reveals only which chunk size bucket the file falls into, not the actual size. The encryption streams data with `LC.sbEncryptChunk` in a loop, pads the remaining space, then manually appends the auth tag via `LC.sbAuth`. This manual streaming approach (rather than using the all-at-once `LC.sbEncryptTailTag`) is necessary because encryption is interleaved with file I/O. +The encrypted output is padded to `encSize` (the sum of data packet sizes). Since data packet sizes are fixed powers of 2 (64KB, 256KB, 1MB, 4MB), the encrypted file size reveals only which size bucket the file falls into, not the actual size. 
The encryption streams data with `LC.sbEncryptChunk` in a loop, pads the remaining space, then manually appends the auth tag via `LC.sbAuth`. This manual streaming approach (rather than using the all-at-once `LC.sbEncryptTailTag`) is necessary because encryption is interleaved with file I/O. ### 3. Dual decrypt paths: single-chunk vs multi-chunk @@ -28,4 +28,4 @@ In the multi-chunk streaming path, if `BA.constEq` detects an auth tag mismatch ### 5. Streaming encryption uses 64KB blocks -`encryptFile` reads plaintext in 65536-byte blocks (`LC.sbEncryptChunk`), regardless of the XFTP chunk size. These are encryption blocks within a single continuous stream — not to be confused with XFTP protocol chunks which are much larger (64KB–4MB). +`encryptFile` reads plaintext in 65536-byte blocks (`LC.sbEncryptChunk`), regardless of the XFTP data packet size. These are encryption blocks within a single continuous stream — not to be confused with XFTP data packets which are much larger (64KB–4MB). diff --git a/spec/modules/Simplex/FileTransfer/Description.md b/spec/modules/Simplex/FileTransfer/Description.md index 0edd0bee8..835ca081a 100644 --- a/spec/modules/Simplex/FileTransfer/Description.md +++ b/spec/modules/Simplex/FileTransfer/Description.md @@ -1,6 +1,6 @@ # Simplex.FileTransfer.Description -> File description: YAML encoding/decoding, validation, URI format, and replica optimization. +> File description: YAML encoding/decoding, validation, URI format, and replica optimization. A file description maps a file's chunks to data packets stored on XFTP routers — each chunk corresponds to one data packet, and each data packet may have multiple replicas on different routers. **Source**: [`FileTransfer/Description.hs`](../../../../src/Simplex/FileTransfer/Description.hs) @@ -24,7 +24,7 @@ The top-level `FileDescription` has a `chunkSize` field. Individual chunk replic ### 4. 
YAML encoding groups replicas by router -`groupReplicasByServer` groups all chunk replicas by their router, producing `FileServerReplica` records. This is the serialization format — replicas are organized by router, not by chunk. The parser (`foldReplicasToChunks`) reverses this grouping back to per-chunk replica lists. +`groupReplicasByServer` groups all data packet replicas by their router, producing `FileServerReplica` records. This is the serialization format — replicas are organized by router, not by chunk. The parser (`foldReplicasToChunks`) reverses this grouping back to per-chunk replica lists. ### 5. FileDescriptionURI uses query-string encoding @@ -40,4 +40,4 @@ Two limits exist: `maxFileSize = 1GB` (soft limit, checked by CLI client) and `m ### 8. Redirect file descriptions -A `FileDescription` can contain a `redirect` field pointing to another file's metadata (`RedirectFileInfo` with size and digest). The outer description downloads an encrypted YAML file that, once decrypted, yields the actual `FileDescription` for the real file. This adds one level of indirection for privacy — the relay routers hosting the redirect don't know the actual file's routers. +A `FileDescription` can contain a `redirect` field pointing to another file's metadata (`RedirectFileInfo` with size and digest). The outer description downloads an encrypted YAML data packet that, once decrypted, yields the actual `FileDescription` for the real file. This adds one level of indirection for privacy — the routers hosting the redirect data packet don't know the actual file's routers. diff --git a/spec/modules/Simplex/FileTransfer/Protocol.md b/spec/modules/Simplex/FileTransfer/Protocol.md index 4bbcb8726..8b99e6849 100644 --- a/spec/modules/Simplex/FileTransfer/Protocol.md +++ b/spec/modules/Simplex/FileTransfer/Protocol.md @@ -1,6 +1,6 @@ # Simplex.FileTransfer.Protocol -> XFTP protocol types, commands, responses, and credential verification. 
+> XFTP protocol types, commands, command results, and credential verification. **Source**: [`FileTransfer/Protocol.hs`](../../../../src/Simplex/FileTransfer/Protocol.hs) @@ -15,9 +15,9 @@ This asymmetry means FNEW and PING bypass the standard entity-lookup path entirely — they are handled as separate `XFTPRequest` constructors (`XFTPReqNew`, `XFTPReqPing`). -### 2. BLOCKED response downgraded to AUTH for old clients +### 2. BLOCKED result downgraded to AUTH for old clients -`encodeProtocol` checks the protocol version: if `v < blockedFilesXFTPVersion`, a `BLOCKED` response is encoded as `AUTH` instead. This prevents old clients that don't understand `BLOCKED` from receiving an unknown error type. The blocking information is silently lost for these clients. +`encodeProtocol` checks the protocol version: if `v < blockedFilesXFTPVersion`, a `BLOCKED` result is encoded as `AUTH` instead. This prevents old clients that don't understand `BLOCKED` from receiving an unknown error type. The blocking information is silently lost for these clients. ### 3. Single-transmission batch enforcement diff --git a/spec/modules/Simplex/FileTransfer/Server.md b/spec/modules/Simplex/FileTransfer/Server.md index 99e17a427..cb64adad2 100644 --- a/spec/modules/Simplex/FileTransfer/Server.md +++ b/spec/modules/Simplex/FileTransfer/Server.md @@ -1,6 +1,6 @@ # Simplex.FileTransfer.Server -> XFTP router: HTTP/2 request handling, handshake state machine, file operations, and statistics. +> XFTP router: HTTP/2 request handling, handshake state machine, data packet operations, and statistics. 
**Source**: [`FileTransfer/Server.hs`](../../../../src/Simplex/FileTransfer/Server.hs) @@ -10,8 +10,8 @@ The XFTP router runs several concurrent threads via `raceAny_`: | Thread | Purpose | |--------|---------| -| `runServer` | HTTP/2 router accepting file transfer requests | -| `expireFiles` | Periodic file expiration with throttling | +| `runServer` | HTTP/2 router accepting data packet transfer requests | +| `expireFiles` | Periodic data packet expiration with throttling | | `logServerStats` | Periodic stats flush to CSV | | `savePrometheusMetrics` | Periodic Prometheus metrics dump | | `runCPServer` | Control port for admin commands | @@ -29,15 +29,15 @@ Web clients can re-send hello (`xftp-web-hello` header) even in `HandshakeSent` ### 2. Web identity proof via challenge-response -When a web client sends a hello with a non-empty body, the router parses an `XFTPClientHello` containing a `webChallenge`. The router signs `challenge <> sessionId` with its long-term key and includes the signature in the handshake response. This proves router identity to web clients that cannot verify TLS certificates directly. +When a web client sends a hello with a non-empty body, the router parses an `XFTPClientHello` containing a `webChallenge`. The router signs `challenge <> sessionId` with its long-term key and includes the signature in the handshake result. This proves router identity to web clients that cannot verify TLS certificates directly. ### 3. skipCommitted drains request body on re-upload -If `receiveServerFile` detects the file is already uploaded (`filePath` TVar is `Just`), it cannot simply ignore the request body — the HTTP/2 client would block waiting for the router to consume it. Instead, `skipCommitted` reads and discards the entire body in `fileBlockSize` increments, returning `FROk` when complete. This makes FPUT idempotent from the client's perspective. 
+If `receiveServerFile` detects the data packet is already uploaded (`filePath` TVar is `Just`), it cannot simply ignore the request body — the HTTP/2 client would block waiting for the router to consume it. Instead, `skipCommitted` reads and discards the entire body in `fileBlockSize` increments, returning `FROk` when complete. This makes FPUT idempotent from the client's perspective. ### 4. Atomic quota reservation with rollback -`receiveServerFile` uses `stateTVar` to atomically check and reserve storage quota before receiving the file. If the upload fails (timeout, size mismatch, IO error), the reserved size is subtracted from `usedStorage` and the partial file is deleted on the router. This prevents failed uploads from permanently consuming quota. +`receiveServerFile` uses `stateTVar` to atomically check and reserve storage quota before receiving the data packet. If the upload fails (timeout, size mismatch, IO error), the reserved size is subtracted from `usedStorage` and the partial data packet is deleted on the router. This prevents failed uploads from permanently consuming quota. ### 5. retryAdd generates new IDs on collision @@ -45,7 +45,7 @@ If `receiveServerFile` detects the file is already uploaded (`filePath` TVar is ### 6. Timing attack mitigation on entity lookup -`verifyXFTPTransmission` calls `dummyVerifyCmd` (imported from SMP router) when a file entity is not found. This equalizes response timing to prevent attackers from distinguishing "entity doesn't exist" from "signature invalid" based on latency. +`verifyXFTPTransmission` calls `dummyVerifyCmd` (imported from SMP router) when a data packet entity is not found. This equalizes result timing to prevent attackers from distinguishing "entity doesn't exist" from "signature invalid" based on latency. ### 7. BLOCKED vs EntityOff distinction @@ -56,30 +56,30 @@ When `verifyXFTPTransmission` reads `fileStatus`: `EntityOff` is treated identically to missing entities for information-hiding purposes. 
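The atomic quota reservation with rollback described in section 4 above can be sketched as follows. This is a minimal illustration, not the router's actual code: `reserveQuota`, `withReservedQuota`, and `tryAny` are hypothetical helpers, the quota is passed as a plain `Int64`, and the rollback only restores the counter (the real handler also deletes the partial data packet and distinguishes timeout, size mismatch, and IO errors).

```haskell
import Control.Concurrent.STM (TVar, atomically, modifyTVar', newTVarIO, readTVarIO, stateTVar)
import Control.Exception (SomeException, try)
import Data.Int (Int64)

-- Hypothetical helper: reserve `size` bytes only if the quota allows it.
-- The check and the increment happen in a single STM transaction,
-- so two concurrent uploads cannot both pass the check and overshoot.
reserveQuota :: TVar Int64 -> Int64 -> Int64 -> IO Bool
reserveQuota usedStorage quota size =
  atomically . stateTVar usedStorage $ \used ->
    let used' = used + size
     in if used' <= quota then (True, used') else (False, used)

-- Hypothetical helper: `try` specialised to SomeException.
tryAny :: IO a -> IO (Either SomeException a)
tryAny = try

-- Hypothetical wrapper: run the receive action under a reservation;
-- if it throws, subtract the reserved size again (rollback).
withReservedQuota :: TVar Int64 -> Int64 -> Int64 -> IO a -> IO (Maybe a)
withReservedQuota usedStorage quota size receive = do
  ok <- reserveQuota usedStorage quota size
  if not ok
    then pure Nothing -- over quota: nothing was reserved
    else do
      r <- tryAny receive
      case r of
        Right a -> pure (Just a) -- reservation becomes permanent
        Left _ -> do
          atomically $ modifyTVar' usedStorage (subtract size)
          pure Nothing

main :: IO ()
main = do
  used <- newTVarIO 0
  _ <- withReservedQuota used 100 60 (pure ()) -- fits: 60 reserved
  _ <- withReservedQuota used 100 60 (pure ()) -- rejected: 120 > 100
  _ <- withReservedQuota used 100 30 (ioError (userError "upload failed")) -- rolled back
  readTVarIO used >>= print -- prints 60
```

Reserving before the transfer, rather than accounting afterwards, is what keeps concurrent uploads from jointly exceeding the quota. The rollback branch is what keeps failed uploads from consuming it permanently.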
-### 8. blockServerFile deletes the physical file +### 8. blockServerFile deletes the stored data packet -Despite the name suggesting it only marks a file as blocked, `blockServerFile` also deletes the physical file from disk via `deleteOrBlockServerFile_`. The `deleted = True` parameter to `blockFile` in the store adjusts `usedStorage`. A blocked file returns `BLOCKED` errors on access but has no data on disk. +Despite the name suggesting it only marks a data packet as blocked, `blockServerFile` also deletes the stored data packet from disk via `deleteOrBlockServerFile_`. The `deleted = True` parameter to `blockFile` in the store adjusts `usedStorage`. A blocked data packet returns `BLOCKED` errors on access but has no data on disk. ### 9. Stats restore overrides counts from live store -`restoreServerStats` loads stats from the backup file but overrides `_filesCount` and `_filesSize` with values computed from the live file store (TMap size and `usedStorage` TVar). If the backup values differ, warnings are logged. This handles cases where files were expired or deleted while the router was down. +`restoreServerStats` loads stats from the backup file but overrides `_filesCount` and `_filesSize` with values computed from the live file store (TMap size and `usedStorage` TVar). If the backup values differ, warnings are logged. This handles cases where data packets were expired or deleted while the router was down. -### 10. File expiration with configurable throttling +### 10. Data packet expiration with configurable throttling -`expireServerFiles` accepts an optional `itemDelay` (100ms when called from the periodic thread, `Nothing` at router startup). Between each file check, `threadDelay itemDelay` prevents expiration from monopolizing IO. At startup, files are expired without delay to clean up quickly. +`expireServerFiles` accepts an optional `itemDelay` (100ms when called from the periodic thread, `Nothing` at router startup). 
Between each data packet check, `threadDelay itemDelay` prevents expiration from monopolizing IO. At startup, data packets are expired without delay to clean up quickly. ### 11. Stats log aligns to wall-clock midnight `logServerStats` computes an `initialDelay` to align the first stats flush to `logStatsStartTime` (default 0 = midnight UTC). If the target time already passed today, it adds 86400 seconds for the next day. Subsequent flushes use exact `logInterval` cadence. -### 12. Physical file deleted before store cleanup +### 12. Stored data packet deleted before store cleanup -`deleteOrBlockServerFile_` removes the physical file first, then runs the STM store action. If the process crashes between these two operations, the store will reference a file that no longer exists on disk. The next access would return `AUTH` (file not found on disk), and eventual expiration would clean the store entry. +`deleteOrBlockServerFile_` removes the stored data packet first, then runs the STM store action. If the process crashes between these two operations, the store will reference a data packet that no longer exists on disk. The next access would return `AUTH` (data packet not found on disk), and eventual expiration would clean the store entry. ### 13. SNI-dependent CORS and web serving CORS headers require both `sniUsed = True` and `addCORSHeaders = True` in the transport config. Static web page serving is enabled when `sniUsed = True`. Non-SNI connections (direct TLS without hostname) skip both CORS and web serving. This separates the web-facing and protocol-facing behaviors of the same router port. -### 14. Control port file operations use recipient index +### 14. Control port data packet operations use recipient index -`CPDelete` and `CPBlock` commands look up files via `getFile fs SFRecipient fileId`, meaning the control port takes a recipient ID, not a sender ID. This is the ID visible to recipients and contained in file descriptions. 
+`CPDelete` and `CPBlock` commands look up data packets via `getFile fs SFRecipient fileId`, meaning the control port takes a recipient ID, not a sender ID. This is the ID visible to recipients and contained in data packet descriptions. diff --git a/spec/modules/Simplex/FileTransfer/Server/Env.md b/spec/modules/Simplex/FileTransfer/Server/Env.md index 0b3bba3ff..161bcd487 100644 --- a/spec/modules/Simplex/FileTransfer/Server/Env.md +++ b/spec/modules/Simplex/FileTransfer/Server/Env.md @@ -8,17 +8,17 @@ ### 1. Startup storage accounting with quota warning -`newXFTPServerEnv` computes `usedStorage` by summing file sizes from the in-memory store at startup. If the computed usage exceeds the configured `fileSizeQuota`, a warning is logged but the router still starts. This allows the router to come up even if it's over quota (e.g., after a quota reduction), relying on expiration to reclaim space. +`newXFTPServerEnv` computes `usedStorage` by summing data packet sizes from the in-memory store at startup. If the computed usage exceeds the configured `fileSizeQuota`, a warning is logged but the router still starts. This allows the router to come up even if it's over quota (e.g., after a quota reduction), relying on expiration to reclaim space. -### 2. XFTPRequest ADT separates new files from commands +### 2. XFTPRequest ADT separates new data packets from commands `XFTPRequest` has three constructors: -- `XFTPReqNew`: file creation (carries `FileInfo`, recipient keys, optional basic auth) -- `XFTPReqCmd`: command on an existing file (carries file ID, `FileRec`, and the command) +- `XFTPReqNew`: data packet creation (carries `FileInfo`, recipient keys, optional basic auth) +- `XFTPReqCmd`: command on an existing data packet (carries file ID, `FileRec`, and the command) - `XFTPReqPing`: health check -This separation occurs after credential verification in `Server.hs`. `XFTPReqNew` bypasses entity lookup entirely since the file doesn't exist yet. 
+This separation occurs after credential verification in `Server.hs`. `XFTPReqNew` bypasses entity lookup entirely since the data packet doesn't exist yet. ### 3. fileTimeout for upload deadline -`fileTimeout` in `XFTPServerConfig` sets the maximum time allowed for a single file upload (FPUT). The router wraps the receive operation in `timeout fileTimeout`. Default is 5 minutes (for 4MB chunks). This prevents slow or stalled uploads from holding router resources indefinitely. +`fileTimeout` in `XFTPServerConfig` sets the maximum time allowed for a single data packet upload (FPUT). The router wraps the receive operation in `timeout fileTimeout`. Default is 5 minutes (for 4MB chunks). This prevents slow or stalled uploads from holding router resources indefinitely. diff --git a/spec/modules/Simplex/FileTransfer/Server/Main.md b/spec/modules/Simplex/FileTransfer/Server/Main.md index c892e6bf5..2a5c78288 100644 --- a/spec/modules/Simplex/FileTransfer/Server/Main.md +++ b/spec/modules/Simplex/FileTransfer/Server/Main.md @@ -10,7 +10,7 @@ | Constant | Value | Purpose | |----------|-------|---------| -| `fileIdSize` | 16 bytes | Random file/recipient ID length | +| `fileIdSize` | 16 bytes | Random data packet/recipient ID length | | `fileTimeout` | 5 minutes | Maximum upload duration per chunk | | `logStatsInterval` | 86400s (daily) | Stats CSV flush interval | | `logStatsStartTime` | 0 (midnight UTC) | First stats flush time-of-day | diff --git a/spec/modules/Simplex/FileTransfer/Server/Stats.md b/spec/modules/Simplex/FileTransfer/Server/Stats.md index 7eb2ad47b..30b04c496 100644 --- a/spec/modules/Simplex/FileTransfer/Server/Stats.md +++ b/spec/modules/Simplex/FileTransfer/Server/Stats.md @@ -16,4 +16,4 @@ The `strP` parser uses `opt` for newer fields, defaulting missing fields to 0. T ### 3. PeriodStats for download tracking -`filesDownloaded` uses `PeriodStats` (not a simple `IORef Int`) to track unique file downloads over time periods (day/week/month). 
This enables the CSV stats log to report distinct files downloaded per period, not just total download count. +`filesDownloaded` uses `PeriodStats` (not a simple `IORef Int`) to track unique data packet downloads over time periods (day/week/month). This enables the CSV stats log to report distinct data packets downloaded per period, not just total download count. diff --git a/spec/modules/Simplex/FileTransfer/Server/Store.md b/spec/modules/Simplex/FileTransfer/Server/Store.md index f2ded441e..bbcc419f6 100644 --- a/spec/modules/Simplex/FileTransfer/Server/Store.md +++ b/spec/modules/Simplex/FileTransfer/Server/Store.md @@ -16,24 +16,24 @@ The file store maintains two indices: `files :: TMap SenderId FileRec` (by sende ### 3. Storage accounting on upload completion -`setFilePath` adds the file size to `usedStorage` and records the file path in the `filePath` TVar. However, during normal FPUT handling, `Server.hs` does NOT call `setFilePath` — it directly writes `filePath` via `writeTVar`. The quota reservation in `Server.hs` (`stateTVar` on `usedStorage`) is the sole `usedStorage` increment during upload. `setFilePath` IS called during store log replay (`StoreLog.hs`), where it increments `usedStorage`; `newXFTPServerEnv` then overwrites with the correct value computed from the live store. +`setFilePath` adds the data packet size to `usedStorage` and records the file path in the `filePath` TVar. However, during normal FPUT handling, `Server.hs` does NOT call `setFilePath` — it directly writes `filePath` via `writeTVar`. The quota reservation in `Server.hs` (`stateTVar` on `usedStorage`) is the sole `usedStorage` increment during upload. `setFilePath` IS called during store log replay (`StoreLog.hs`), where it increments `usedStorage`; `newXFTPServerEnv` then overwrites with the correct value computed from the live store. ### 4. 
deleteFile removes all recipients atomically -`deleteFile` atomically removes the sender entry from `files`, all recipient entries from the global `recipients` TMap, and unconditionally subtracts the file size from `usedStorage` (regardless of whether the file was actually uploaded). The entire operation runs in a single STM transaction. +`deleteFile` atomically removes the sender entry from `files`, all recipient entries from the global `recipients` TMap, and unconditionally subtracts the data packet size from `usedStorage` (regardless of whether the data packet was actually uploaded). The entire operation runs in a single STM transaction. ### 5. RoundedSystemTime for privacy-preserving expiration -File timestamps use `RoundedFileTime` which is `RoundedSystemTime 3600` — system time rounded to 1-hour precision. This means files created within the same hour have identical timestamps. An observer with access to the store cannot determine exact file creation times, only the hour. +Data packet timestamps use `RoundedFileTime` which is `RoundedSystemTime 3600` — system time rounded to 1-hour precision. This means data packets created within the same hour have identical timestamps. An observer with access to the store cannot determine exact data packet creation times, only the hour. ### 6. expiredFilePath returns path only if expired -`expiredFilePath` returns `STM (Maybe (Maybe FilePath))`. The outer `Maybe` is `Nothing` when the file doesn't exist or isn't expired; the inner `Maybe` is the file path (present only if the file was uploaded). The expiration check adds `fileTimePrecision` (one hour) to the creation timestamp before comparing, providing a grace period. The caller uses the inner path to decide whether to also delete the physical file. +`expiredFilePath` returns `STM (Maybe (Maybe FilePath))`. The outer `Maybe` is `Nothing` when the data packet doesn't exist or isn't expired; the inner `Maybe` is the file path (present only if the data packet was uploaded). 
The expiration check adds `fileTimePrecision` (one hour) to the creation timestamp before comparing, providing a grace period. The caller uses the inner path to decide whether to also delete the stored data packet. ### 7. ackFile removes single recipient -`ackFile` removes a specific recipient from both the global `recipients` TMap and the per-file `recipientIds` Set. Unlike `deleteFile` which removes the entire file, `ackFile` only removes one recipient's access. The file and other recipients remain intact. +`ackFile` removes a specific recipient from both the global `recipients` TMap and the per-file `recipientIds` Set. Unlike `deleteFile` which removes the entire data packet, `ackFile` only removes one recipient's access. The data packet and other recipients remain intact. ### 8. blockFile conditional storage adjustment -`blockFile` takes a `deleted :: Bool` parameter. When `True` (file blocked with physical deletion), it subtracts the file size from `usedStorage`. When `False` (block without deletion), storage is unchanged. This allows blocking without physical deletion for audit purposes. Currently, both the router's `blockServerFile` and the store log replay path pass `True`. +`blockFile` takes a `deleted :: Bool` parameter. When `True` (data packet blocked with physical deletion), it subtracts the data packet size from `usedStorage`. When `False` (block without deletion), storage is unchanged. This allows blocking without physical deletion for audit purposes. Currently, both the router's `blockServerFile` and the store log replay path pass `True`. 
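The rounded timestamps and the nested `Maybe` returned by `expiredFilePath` (Store.md, items 5 and 6) can be illustrated with a small pure sketch. Everything here is a simplified, hypothetical stand-in: `RoundedTime`, `PacketRec`, `roundTime` and `expiredPath` are illustrative names rather than the definitions in `Store.hs`, rounding is assumed to truncate down to the hour, and the strictness of the comparison is a guess. The real function also runs in `STM` and returns `Nothing` when the data packet does not exist, which this sketch leaves out.

```haskell
import Data.Int (Int64)

-- Hypothetical, simplified stand-ins; not the types from Store.hs.
newtype RoundedTime = RoundedTime Int64
  deriving (Eq, Ord, Show)

-- One hour in seconds, mirroring the precision described in the spec.
precision :: Int64
precision = 3600

-- Assumption: rounding truncates down to the start of the hour.
roundTime :: Int64 -> RoundedTime
roundTime t = RoundedTime ((t `div` precision) * precision)

data PacketRec = PacketRec
  { createdAt :: RoundedTime
  , storedPath :: Maybe FilePath -- Nothing until an upload completes
  }

-- Outer Maybe: is the record expired at all?
-- Inner Maybe: is there a stored file that also has to be deleted?
-- Adding `precision` before comparing gives the one-hour grace period.
expiredPath :: Int64 -> PacketRec -> Maybe (Maybe FilePath)
expiredPath old (PacketRec (RoundedTime ts) path)
  | ts + precision < old = Just path
  | otherwise = Nothing

main :: IO ()
main = do
  let uploaded = PacketRec (roundTime 7300) (Just "/files/abc")
      pending = PacketRec (roundTime 7300) Nothing
  print (roundTime 7300 == roundTime 9000) -- True: same hour, same timestamp
  print (expiredPath 11000 uploaded)       -- Just (Just "/files/abc")
  print (expiredPath 11000 pending)        -- Just Nothing
  print (expiredPath 10000 uploaded)       -- Nothing: still in the grace period
```

A caller would pattern-match on the result: `Nothing` means leave the record alone, `Just Nothing` means remove only the store entry, and `Just (Just path)` means remove the store entry and the file at `path`.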
diff --git a/spec/modules/Simplex/FileTransfer/Server/StoreLog.md b/spec/modules/Simplex/FileTransfer/Server/StoreLog.md index 6549c3666..5514cbd27 100644 --- a/spec/modules/Simplex/FileTransfer/Server/StoreLog.md +++ b/spec/modules/Simplex/FileTransfer/Server/StoreLog.md @@ -1,6 +1,6 @@ # Simplex.FileTransfer.Server.StoreLog -> Append-only store log for XFTP router file operations with error-resilient replay and compaction. +> Append-only store log for XFTP router data packet operations with error-resilient replay and compaction. **Source**: [`FileTransfer/Server/StoreLog.hs`](../../../../../src/Simplex/FileTransfer/Server/StoreLog.hs) @@ -24,10 +24,10 @@ ### 5. Log entry types track operation lifecycle -Six log entry types capture the complete file lifecycle: -- `AddFile`: file creation with sender ID, file info, timestamp, and status +Six log entry types capture the complete data packet lifecycle: +- `AddFile`: data packet creation with sender ID, file info, timestamp, and status - `AddRecipients`: recipient registration (batched as `NonEmpty FileRecipient`) with sender ID association - `PutFile`: upload completion with file path -- `DeleteFile`: file deletion by sender ID +- `DeleteFile`: data packet deletion by sender ID - `AckFile`: single recipient acknowledgment -- `BlockFile`: file blocking with blocking info +- `BlockFile`: data packet blocking with blocking info diff --git a/spec/modules/Simplex/FileTransfer/Types.md b/spec/modules/Simplex/FileTransfer/Types.md index 14abc7b21..0cd889dcf 100644 --- a/spec/modules/Simplex/FileTransfer/Types.md +++ b/spec/modules/Simplex/FileTransfer/Types.md @@ -1,6 +1,6 @@ # Simplex.FileTransfer.Types -> Agent-side file transfer types: receive/send file records, status state machines, chunk/replica structures. +> Agent-side file transfer types: receive/send file records, status state machines, and chunk/replica structures. Chunks are the agent's view of file pieces; each chunk maps to a data packet on an XFTP router. 
**Source**: [`FileTransfer/Types.hs`](../../../../src/Simplex/FileTransfer/Types.hs) @@ -24,4 +24,4 @@ ### 5. authTagSize = 16 bytes -`authTagSize` is defined as `fromIntegral C.authTagSize` (16 bytes). This is the AES-GCM authentication tag appended to the encrypted file stream. It is included in the payload size calculation (`payloadSize = fileSize' + fileSizeLen + authTagSize`), which is then passed to `prepareChunkSizes` to determine chunk allocation. +`authTagSize` is defined as `fromIntegral C.authTagSize` (16 bytes). This is the AES-GCM authentication tag appended to the encrypted file stream. It is included in the payload size calculation (`payloadSize = fileSize' + fileSizeLen + authTagSize`), which is then passed to `prepareChunkSizes` to determine data packet allocation. diff --git a/spec/modules/Simplex/Messaging/Agent.md b/spec/modules/Simplex/Messaging/Agent.md index e2cac0638..1b8769416 100644 --- a/spec/modules/Simplex/Messaging/Agent.md +++ b/spec/modules/Simplex/Messaging/Agent.md @@ -38,9 +38,9 @@ The subscriber thread reads batches from `msgQ` (filled by SMP protocol clients) **Batch UP notification accumulation.** Successful subscription confirmations (`processSubOk`) append to a shared `upConnIds` TVar across the batch. A single `UP` event is emitted after all transmissions are processed, not per-transmission. Similarly, `serviceRQs` accumulates service-associated receive queues for batch processing via `processRcvServiceAssocs`. -**Double validation for subscription results.** `isPendingSub` checks two conditions atomically: the queue must be in the pending map AND the client session must still be active (`activeClientSession`). If either fails, the result is counted as ignored (statistics only). This handles the race where a subscription response arrives after reconnection. 
+**Double validation for subscription results.** `isPendingSub` checks two conditions atomically: the queue must be in the pending map AND the client session must still be active (`activeClientSession`). If either fails, the result is counted as ignored (statistics only). This handles the race where a subscription result arrives after reconnection. -**SUB response piggybacking MSG.** When a SUB response arrives as `Right msg@SMP.MSG {}`, the connection is marked UP (via `processSubOk`) AND the MSG is processed. The UP notification happens even if the MSG processing fails — the connection is up regardless. +**SUB result piggybacking MSG.** When a SUB result arrives as `Right msg@SMP.MSG {}`, the connection is marked UP (via `processSubOk`) AND the MSG is processed. The UP notification happens even if the MSG processing fails — the connection is up regardless. **subQ overflow to pendingMsgs.** `processSMP` writes events to `subQ` (bounded TBQueue) but when full, events go into a `pendingMsgs` TVar. After processing, pending messages are drained in reverse order (LIFO). This prevents the message processing thread from blocking on a full queue, which would stall the entire SMP client. diff --git a/spec/modules/Simplex/Messaging/Agent/NtfSubSupervisor.md b/spec/modules/Simplex/Messaging/Agent/NtfSubSupervisor.md index ac591c192..ae4c803e3 100644 --- a/spec/modules/Simplex/Messaging/Agent/NtfSubSupervisor.md +++ b/spec/modules/Simplex/Messaging/Agent/NtfSubSupervisor.md @@ -48,7 +48,7 @@ This is the mechanism for time-scheduled subscription health checks. When the notification router returns `AUTH` for a subscription check, the subscription is not simply marked as failed — it is fully recreated from scratch by resetting to `NSASMP NSASmpKey` state. This handles the case where the notification router has lost its subscription state (restart, data loss). The SMP worker is kicked to re-establish notifier credentials. 
-Successful check responses with statuses not in `subscribeNtfStatuses` also trigger recreation via `recreateNtfSub`. +Successful check results with statuses not in `subscribeNtfStatuses` also trigger recreation via `recreateNtfSub`. ### 5. deleteToken two-phase with restart survival diff --git a/spec/modules/Simplex/Messaging/Agent/TSessionSubs.md b/spec/modules/Simplex/Messaging/Agent/TSessionSubs.md index 0274de59d..68337208c 100644 --- a/spec/modules/Simplex/Messaging/Agent/TSessionSubs.md +++ b/spec/modules/Simplex/Messaging/Agent/TSessionSubs.md @@ -16,7 +16,7 @@ Service subscriptions (aggregate, router-managed) and queue subscriptions (indiv The central invariant: a subscription is only active if it was confirmed on the *current* TLS session. Every function that promotes subscriptions to active (`addActiveSub'`, `batchAddActiveSubs`, `setActiveServiceSub`) checks `Just sessId == sessId'` (stored session ID). On mismatch, the subscription goes to pending instead — silently, with no error. -This means subscription RPCs that succeed but return after a reconnect are safely caught: the response carries the old session ID, which won't match the new one stored by `setSessionId`. +This means subscription RPCs that succeed but return after a reconnect are safely caught: the result carries the old session ID, which won't match the new one stored by `setSessionId`. ## setSessionId — silent demotion on reconnect diff --git a/spec/modules/Simplex/Messaging/Client.md b/spec/modules/Simplex/Messaging/Client.md index 35fee9226..f23d5a005 100644 --- a/spec/modules/Simplex/Messaging/Client.md +++ b/spec/modules/Simplex/Messaging/Client.md @@ -8,37 +8,37 @@ ## Overview -This module implements the client side of the `Protocol` typeclass — connecting to SMP routers, sending commands, receiving responses, and managing connection lifecycle. It is generic over `Protocol v err msg`, instantiated for SMP as `SMPClient` (= `ProtocolClient SMPVersion ErrorType BrokerMsg`). 
The SMP proxy protocol (PRXY/PFWD/RFWD) is also implemented here. +This module implements the client side of the `Protocol` typeclass — connecting to SMP routers, sending commands, receiving command results, and managing connection lifecycle. It is generic over `Protocol v err msg`, instantiated for SMP as `SMPClient` (= `ProtocolClient SMPVersion ErrorType BrokerMsg`). The SMP proxy protocol (PRXY/PFWD/RFWD) is also implemented here. ## Four concurrent threads — teardown semantics `getProtocolClient` launches four threads via `raceAny_`: - `send`: reads from `sndQ` (TBQueue) and writes to TLS - `receive`: reads from TLS and writes to `rcvQ` (TBQueue), updates `lastReceived` -- `process`: reads from `rcvQ` and dispatches to response vars or `msgQ` +- `process`: reads from `rcvQ` and dispatches to result vars or `msgQ` - `monitor`: periodic ping loop (only when `smpPingInterval > 0`) When ANY thread exits (normally or exceptionally), `raceAny_` cancels all others. `E.finally` ensures the `disconnected` callback always fires. Implication: a single stuck thread (e.g., TLS read blocked on a half-open connection) keeps the entire client alive until `monitor` drops it. There is no per-thread health check — liveness depends entirely on the monitor's timeout logic. ## Request lifecycle and leak risk -`mkRequest` inserts a `Request` into `sentCommands` TMap BEFORE the transmission is written to TLS. If the TLS write fails silently or the connection drops before the response, the entry remains in `sentCommands` until the monitor's timeout counter exceeds `maxCnt` and drops the entire client. There is no per-request cleanup on send failure — individual request entries are only removed by `processMsg` (on response) or by `getResponse` timeout (which sets `pending = False` but doesn't remove the entry). +`mkRequest` inserts a `Request` into `sentCommands` TMap BEFORE the transmission is written to TLS. 
If the TLS write fails silently or the connection drops before the result arrives, the entry remains in `sentCommands` until the monitor's timeout counter exceeds `maxCnt` and drops the entire client. There is no per-request cleanup on send failure — individual request entries are only removed by `processMsg` (on result) or by `getResponse` timeout (which sets `pending = False` but doesn't remove the entry). ## getResponse — pending flag race contract -This is the core concurrency contract between timeout and response processing: +This is the core concurrency contract between timeout and result processing: 1. `getResponse` waits with `timeout` for `takeTMVar responseVar` 2. Regardless of result, atomically sets `pending = False` and tries `tryTakeTMVar` again (see comment on `getResponse`) -3. In `processMsg`, when a response arrives for a request where `pending` is already `False` (timeout won), `wasPending` is `False` and the response is forwarded to `msgQ` as `STResponse` rather than discarded +3. In `processMsg`, when a result arrives for a request where `pending` is already `False` (timeout won), `wasPending` is `False` and the result is forwarded to `msgQ` as `STResponse` rather than discarded -The double-check pattern (`swapTVar pending False` + `tryTakeTMVar`) handles the race window where a response arrives between timeout firing and `pending` being set to `False`. Without this, responses arriving in that gap would be silently lost. +The double-check pattern (`swapTVar pending False` + `tryTakeTMVar`) handles the race window where a result arrives between timeout firing and `pending` being set to `False`. Without this, results arriving in that gap would be silently lost. -`timeoutErrorCount` is reset to 0 in three places: in `getResponse` when a response arrives, in `receive` on every TLS read, and the monitor uses this count to decide when to drop the connection. 
+`timeoutErrorCount` is reset to 0 in three places: in `getResponse` when a result arrives, in `receive` on every TLS read, and the monitor uses this count to decide when to drop the connection. -## processMsg — router events vs expired responses +## processMsg — router events vs expired results -When `corrId` is empty, the message is an `STEvent` (router-initiated). When non-empty and the request was already expired (`wasPending` is `False`), the response becomes `STResponse` — not discarded, but forwarded to `msgQ` with the original command context. Entity ID mismatch is `STUnexpectedError`. +When `corrId` is empty, the message is an `STEvent` (router-initiated). When non-empty and the request was already expired (`wasPending` is `False`), the result becomes `STResponse` — not discarded, but forwarded to `msgQ` with the original command context. Entity ID mismatch is `STUnexpectedError`. ## nonBlockingWriteTBQueue — fork on full @@ -46,13 +46,13 @@ If `tryWriteTBQueue` returns `False`, a new thread is forked for the blocking wr ## Batch commands do not expire -See comment on `sendBatch`. Batched commands are written with `Nothing` as the request parameter — the send thread skips the `pending` flag check. Individual commands use `Just r` and the send thread checks `pending` after dequeue. The coupling: if the router stops responding, batched commands can block the send queue indefinitely since they have no timeout-based expiry. +See comment on `sendBatch`. Batched commands are written with `Nothing` as the request parameter — the send thread skips the `pending` flag check. Individual commands use `Just r` and the send thread checks `pending` after dequeue. The coupling: if the router stops returning results, batched commands can block the send queue indefinitely since they have no timeout-based expiry. ## monitor — quasi-periodic adaptive ping The ping loop sleeps for `smpPingInterval`, then checks elapsed time since `lastReceived`. 
If significant time remains in the interval (> 1 second), it re-sleeps for just the remaining time rather than sending a ping. This means ping frequency adapts to actual receive activity — frequent receives suppress pings. -Pings are only sent when `sendPings` is `True`, set by `enablePings` (called from `subscribeSMPQueue`, `subscribeSMPQueues`, `subscribeSMPQueueNotifications`, `subscribeSMPQueuesNtfs`, `subscribeService`). The client drops the connection when `maxCnt` commands have timed out in sequence AND at least `recoverWindow` (15 minutes) has passed since the last received response. +Pings are only sent when `sendPings` is `True`, set by `enablePings` (called from `subscribeSMPQueue`, `subscribeSMPQueues`, `subscribeSMPQueueNotifications`, `subscribeSMPQueuesNtfs`, `subscribeService`). The client drops the connection when `maxCnt` commands have timed out in sequence AND at least `recoverWindow` (15 minutes) has passed since the last received result. ## clientCorrId — dual-purpose random values @@ -68,7 +68,7 @@ See comment above `proxySMPCommand` for the 9 error scenarios (0-9) mapping each ## forwardSMPTransmission — proxy-side forwarding -Used by the proxy router to forward `RFWD` to the destination relay. Uses `cbEncryptNoPad`/`cbDecryptNoPad` (no padding) with the session secret from the proxy-relay connection. Response nonce is `reverseNonce` of the request nonce. +Used by the proxy router to forward `RFWD` to the destination relay. Uses `cbEncryptNoPad`/`cbDecryptNoPad` (no padding) with the session secret from the proxy-relay connection. Result nonce is `reverseNonce` of the request nonce. ## authTransmission — dual auth with service signature @@ -82,4 +82,4 @@ The service signature is only added when the entity authenticator is non-empty. ## writeSMPMessage — router-side event injection -`writeSMPMessage` writes directly to `msgQ` as `STEvent`, bypassing the entire command/response pipeline. 
This is used by the router to inject MSG events into the subscription response path. +`writeSMPMessage` writes directly to `msgQ` as `STEvent`, bypassing the entire command/result pipeline. This is used by the router to inject MSG events into the subscription result path. diff --git a/spec/modules/Simplex/Messaging/Client/Agent.md b/spec/modules/Simplex/Messaging/Client/Agent.md index 30fbe2ac2..7c62dce82 100644 --- a/spec/modules/Simplex/Messaging/Client/Agent.md +++ b/spec/modules/Simplex/Messaging/Client/Agent.md @@ -45,9 +45,9 @@ When `connectClient` calls `newSMPClient` and it fails, the error is stored with Both `smpSubscribeQueues` and `smpSubscribeService` validate `activeClientSession` AFTER the subscription RPC completes, before committing results to state. If the session changed during the RPC (client reconnected), results are discarded and reconnection is triggered. This is optimistic execution with post-hoc validation — the RPC may succeed but its results are thrown away if the session is stale. -## groupSub — subscription response classification +## groupSub — subscription result classification -Each queue response is classified by a `foldr` over the (subs, responses) zip: +Each queue result is classified by a `foldr` over the (subs, results) zip: - **Success with matching serviceId**: counted as service-subscribed (`sQs` list) - **Success without matching serviceId**: counted as queue-only (`qOks` list with SessionId and key) diff --git a/spec/modules/Simplex/Messaging/Notifications/Protocol.md b/spec/modules/Simplex/Messaging/Notifications/Protocol.md index fb718fd80..6347aef11 100644 --- a/spec/modules/Simplex/Messaging/Notifications/Protocol.md +++ b/spec/modules/Simplex/Messaging/Notifications/Protocol.md @@ -1,6 +1,6 @@ # Simplex.Messaging.Notifications.Protocol -> NTF protocol entities, commands, responses, and wire encoding for the notification system. 
+> NTF protocol entities, commands, command results, and wire encoding for the notification system. **Source**: [`Notifications/Protocol.hs`](../../../../../src/Simplex/Messaging/Notifications/Protocol.hs) @@ -16,7 +16,7 @@ | PING | No | Must be empty | | All others | Yes | Must be present | -For responses, the rule inverts: `NRTknId`, `NRSubId`, and `NRPong` must NOT have entity IDs (they are returned before/without entity context), while `NRErr` optionally has one (errors can occur with or without entity context). +For command results, the rule inverts: `NRTknId`, `NRSubId`, and `NRPong` must NOT have entity IDs (they are returned before/without entity context), while `NRErr` optionally has one (errors can occur with or without entity context). ### 2. PNMessageData semicolon separator @@ -24,7 +24,7 @@ For responses, the rule inverts: `NRTknId`, `NRSubId`, and `NRPong` must NOT hav ### 3. NTInvalid reason is version-gated -When encoding `NRTkn` responses, the `NTInvalid` reason is only included if the negotiated protocol version is >= `invalidReasonNTFVersion` (v3). Older clients receive `NTInvalid Nothing`. This prevents parse failures on clients that don't understand the reason field. +When encoding `NRTkn` results, the `NTInvalid` reason is only included if the negotiated protocol version is >= `invalidReasonNTFVersion` (v3). Older clients receive `NTInvalid Nothing`. This prevents parse failures on clients that don't understand the reason field. ### 4. subscribeNtfStatuses migration invariant @@ -46,7 +46,7 @@ Token status `NTInvalid` allows subscription commands (SNEW, SCHK, SDEL), which Both `smpP` and `strP` for `SMPQueueNtf` apply `updateSMPServerHosts` to the parsed SMP server. This normalizes router host addresses on deserialization, ensuring consistent comparison even if the on-wire format uses different host representations. -### 9. NRTknId response tag comment +### 9. 
NRTknId result tag comment The `NRTknId_` tag encodes as `"IDTKN"` with a source comment: "it should be 'TID', 'SID'". This indicates a naming inconsistency that was preserved for backward compatibility — the tag names don't follow the pattern of other NTF protocol tags. diff --git a/spec/modules/Simplex/Messaging/Notifications/Server.md b/spec/modules/Simplex/Messaging/Notifications/Server.md index d77a30a00..b87f64ce8 100644 --- a/spec/modules/Simplex/Messaging/Notifications/Server.md +++ b/spec/modules/Simplex/Messaging/Notifications/Server.md @@ -22,7 +22,7 @@ Each client connection spawns `receive`, `send`, and `client` threads via `raceA ### 1. Timing attack mitigation on entity lookup -When `verifyNtfTransmission` encounters an AUTH error (entity not found), it calls `dummyVerifyCmd` to equalize response timing before returning the error. This prevents attackers from distinguishing "entity doesn't exist" from "signature invalid" based on response latency. +When `verifyNtfTransmission` encounters an AUTH error (entity not found), it calls `dummyVerifyCmd` to equalize result timing before returning the error. This prevents attackers from distinguishing "entity doesn't exist" from "signature invalid" based on result latency. ### 2. TNEW idempotent re-registration @@ -74,9 +74,9 @@ Cron notification interval has a hard minimum of 20 minutes. `TCRN 0` disables c `resubscribe` uses `mapConcurrently` to resubscribe to all known SMP routers in parallel. Within each router, subscriptions are paginated via `subscribeLoop` using cursor-based pagination (`afterSubId_`). -### 11. receive separates error responses from commands +### 11. receive separates error results from commands -The `receive` function processes incoming transmissions and partitions results: malformed/unauthorized requests are written directly to `sndQ` as error responses, while valid commands go to `rcvQ` for processing. 
This ensures protocol errors get immediate responses without competing for the command processing queue. +The `receive` function processes incoming transmissions and partitions results: malformed/unauthorized requests are written directly to `sndQ` as error results, while valid commands go to `rcvQ` for processing. This ensures protocol errors get immediate results without competing for the command processing queue. ### 12. Maintenance mode saves state then exits immediately diff --git a/spec/modules/Simplex/Messaging/Notifications/Transport.md b/spec/modules/Simplex/Messaging/Notifications/Transport.md index 9b94d7e0d..dfc4cdb5e 100644 --- a/spec/modules/Simplex/Messaging/Notifications/Transport.md +++ b/spec/modules/Simplex/Messaging/Notifications/Transport.md @@ -17,7 +17,7 @@ Two feature gates exist in the NTF protocol: | Version | Feature | Effect | |---------|---------|--------| | v2 (`authBatchCmdsNTFVersion`) | Auth key exchange + batching | `authPubKey` sent in handshake, `implySessId` and `batch` enabled | -| v3 (`invalidReasonNTFVersion`) | Token invalid reasons | `NTInvalid` responses include the reason enum | +| v3 (`invalidReasonNTFVersion`) | Token invalid reasons | `NTInvalid` results include the reason enum | Pre-v2 connections have no command encryption or batching — commands are sent in plaintext within TLS. @@ -27,7 +27,7 @@ Pre-v2 connections have no command encryption or batching — commands are sent ### 4. Block size -NTF uses a 512-byte block size (`ntfBlockSize`), significantly smaller than SMP. This is sufficient because NTF protocol commands (TNEW, SNEW, TCHK, etc.) and their responses are short. `PNMessageData` (which contains encrypted message metadata) is not sent over the NTF transport — it is delivered via APNS push notifications. +NTF uses a 512-byte block size (`ntfBlockSize`), significantly smaller than SMP. This is sufficient because NTF protocol commands (TNEW, SNEW, TCHK, etc.) and their results are short. 
`PNMessageData` (which contains encrypted message metadata) is not sent over the NTF transport — it is delivered via APNS push notifications. ### 5. Initial THandle has version 0 diff --git a/spec/modules/Simplex/Messaging/Protocol.md b/spec/modules/Simplex/Messaging/Protocol.md index 2ed7113c8..082e7d0e8 100644 --- a/spec/modules/Simplex/Messaging/Protocol.md +++ b/spec/modules/Simplex/Messaging/Protocol.md @@ -1,6 +1,6 @@ # Simplex.Messaging.Protocol -> SMP protocol types, commands, responses, encoding/decoding, and transport functions. +> SMP protocol types, commands, command results, encoding/decoding, and transport functions. **Source**: [`Protocol.hs`](../../../../src/Simplex/Messaging/Protocol.hs) @@ -65,4 +65,4 @@ The `NETWORK` variant of `BrokerErrorType` encodes as just `"NETWORK"` (detail d ## SUBS/NSUBS — asymmetric defaulting -When the router parses `SUBS`/`NSUBS` from a client using a version older than `rcvServiceSMPVersion`, both count and hash default (`-1` and `mempty`). For the response side (`SOKS`/`ENDS` via `serviceRespP`), count is still parsed from the wire — only hash defaults to `mempty`. This asymmetry means command-side and response-side parsing have different fallback behavior for the same version boundary. +When the router parses `SUBS`/`NSUBS` from a client using a version older than `rcvServiceSMPVersion`, both count and hash default (`-1` and `mempty`). For the result side (`SOKS`/`ENDS` via `serviceRespP`), count is still parsed from the wire — only hash defaults to `mempty`. This asymmetry means command-side and result-side parsing have different fallback behavior for the same version boundary. 
diff --git a/spec/modules/Simplex/Messaging/Protocol/Types.md b/spec/modules/Simplex/Messaging/Protocol/Types.md index 0797bc185..06e60adaa 100644 --- a/spec/modules/Simplex/Messaging/Protocol/Types.md +++ b/spec/modules/Simplex/Messaging/Protocol/Types.md @@ -1,6 +1,6 @@ # Simplex.Messaging.Protocol.Types -> Client notice type with optional TTL, used in BLOCKED error responses. +> Client notice type with optional TTL, used in BLOCKED error results. **Source**: [`Protocol/Types.hs`](../../../../../src/Simplex/Messaging/Protocol/Types.hs) diff --git a/spec/modules/Simplex/Messaging/Server.md b/spec/modules/Simplex/Messaging/Server.md index 8d23404c9..5cfdfa24a 100644 --- a/spec/modules/Simplex/Messaging/Server.md +++ b/spec/modules/Simplex/Messaging/Server.md @@ -59,7 +59,7 @@ Stats classification: exactly one of `srvSubOk`/`srvSubMore`/`srvSubFewer`/`srvS See comment on `processForwardedCommand`. Only single forwarded transmissions are allowed — batches are rejected with `BLOCK`. The synthetic `THandleAuth` has `peerClientService = Nothing`, preventing forwarded clients from claiming service identity. Only SEND, SKEY, LKEY, and LGET are allowed through `rejectOrVerify`. -Double encryption: response is encrypted first to the client (with `C.cbEncrypt` using `reverseNonce clientNonce`), then wrapped and encrypted to the proxy (with `C.cbEncryptNoPad` using `reverseNonce proxyNonce`). Using reversed nonces ensures request and response directions use distinct nonces. +Double encryption: the result is encrypted first to the client (with `C.cbEncrypt` using `reverseNonce clientNonce`), then wrapped and encrypted to the proxy (with `C.cbEncryptNoPad` using `reverseNonce proxyNonce`). Using reversed nonces ensures command and result directions use distinct nonces. ## Proxy concurrency limiter @@ -73,13 +73,13 @@ See `wait`/`signal` around `forkProxiedCmd`. `procThreads` TVar implements a cou See `withSubscribed`. 
When a service client unsubscribes between the TVar read and the flush, `throwSTM (userError "service unsubscribed")` aborts the STM transaction. This is caught by `tryAny` and logged as "cancelled" — it's a successful path, not an error. The `flushSubscribedNtfs` function also cancels via `throwSTM` if the client is no longer current or sndQ is full. -## Batch subscription responses — SOK grouped with MSG +## Batch subscription results — SOK grouped with MSG -See comment on `processSubBatch`. When batched SUB commands produce SOK responses plus messages, the first message is appended to the SOK batch (up to 4 SOKs per block) in a single transmission. Remaining messages go to `msgQ` for separate delivery. This ensures the client receives at least one message quickly with its subscription acknowledgments. +See comment on `processSubBatch`. When batched SUB commands produce SOK results plus messages, the first message is appended to the SOK batch (up to 4 SOKs per block) in a single transmission. Remaining messages go to `msgQ` for separate delivery. This ensures the client receives at least one message quickly with its subscription acknowledgments. ## send thread — MVar fair lock -The TLS handle is wrapped in an `MVar` (`newMVar h`). Both `send` (command responses from `sndQ`) and `sendMsg` (messages from `msgQ`) acquire this lock via `withMVar`. This ensures fair interleaving between response batches and individual messages, preventing either from starving the other. +The TLS handle is wrapped in an `MVar` (`newMVar h`). Both `send` (command results from `sndQ`) and `sendMsg` (messages from `msgQ`) acquire this lock via `withMVar`. This ensures fair interleaving between result batches and individual messages, preventing either from starving the other. 
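The shared-handle locking described under "send thread — MVar fair lock" can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `Server.hs`: an `IORef [String]` stands in for the TLS handle, and `writeBlock` and `runDemo` are hypothetical names. It relies only on the documented behaviour of `withMVar`, which takes the `MVar` for the duration of the action, and on GHC waking threads blocked on an `MVar` in FIFO order.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, newMVar, putMVar, takeMVar, withMVar)
import Control.Monad (forM_)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- The IORef is a stand-in for the TLS handle; holding the MVar is what
-- serialises the two writer threads.
writeBlock :: MVar (IORef [String]) -> String -> IO ()
writeBlock lock block = withMVar lock $ \h -> modifyIORef' h (block :)

-- Two writers share one locked handle: one plays the role of the thread
-- sending command results, the other the thread sending messages.
runDemo :: IO Int
runDemo = do
  out <- newIORef []
  lock <- newMVar out
  doneResults <- newEmptyMVar
  doneMsgs <- newEmptyMVar
  _ <- forkIO $ forM_ ["r1", "r2", "r3"] (writeBlock lock) >> putMVar doneResults ()
  _ <- forkIO $ forM_ ["m1", "m2", "m3"] (writeBlock lock) >> putMVar doneMsgs ()
  takeMVar doneResults
  takeMVar doneMsgs
  length <$> readIORef out

main :: IO ()
main = runDemo >>= print -- 6: every block from both writers was written
```

The interleaving of the two writers is not deterministic, but no write is lost and neither writer can hold the handle while the other is mid-block.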
## Queue creation — ID oracle prevention @@ -103,4 +103,4 @@ Every queue command calls `withQueue_` which checks if `updatedAt` matches today ## foldrM in client command processing -`foldrM process ([], [])` processes a batch of verified commands right-to-left, accumulating responses and messages. The responses list is built with `(:)`, so the final order matches the original command order. Messages from SUB are collected separately and passed as the second element of the `sndQ` tuple. +`foldrM process ([], [])` processes a batch of verified commands right-to-left, accumulating results and messages. The results list is built with `(:)`, so the final order matches the original command order. Messages from SUB are collected separately and passed as the second element of the `sndQ` tuple. diff --git a/spec/modules/Simplex/Messaging/Transport.md b/spec/modules/Simplex/Messaging/Transport.md index 1b4656071..4daa5b23f 100644 --- a/spec/modules/Simplex/Messaging/Transport.md +++ b/spec/modules/Simplex/Messaging/Transport.md @@ -24,7 +24,7 @@ The version history jumps from 12 (`blockedEntitySMPVersion`) to 14 (`proxyServe `proxiedSMPRelayVersion = 18`, one below `currentClientSMPRelayVersion = 19`. The code comment states: "SMP proxy sets it to lower than its current version to prevent client version fingerprinting by the destination relays when clients upgrade at different times." -In practice (Server.hs), the SMP proxy uses `proxiedSMPRelayVRange` to cap the destination relay's version range in the `PKEY` response sent to the client, so the client sees a capped version range rather than the relay's actual range. +In practice (Server.hs), the SMP proxy uses `proxiedSMPRelayVRange` to cap the destination relay's version range in the `PKEY` result sent to the client, so the client sees a capped version range rather than the relay's actual range. 
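The version capping described above can be illustrated with a simplified model. The `VRange` type and `capRange` function below are assumptions made for this sketch and are not the library's own version-range API. The numbers 18 and 19 are the values quoted in the text, and the lower bound 6 is arbitrary.

```haskell
-- Hypothetical, simplified version range: inclusive lower and upper bounds.
data VRange = VRange {minV :: Int, maxV :: Int}
  deriving (Eq, Show)

-- Cap the upper bound of a relay's range at the proxied version.
-- Returns Nothing when the whole range lies above the cap.
capRange :: Int -> VRange -> Maybe VRange
capRange cap (VRange lo hi)
  | lo > cap = Nothing
  | otherwise = Just (VRange lo (min hi cap))

-- A relay advertising versions up to 19, seen through a proxy capped at 18.
cappedExample :: Maybe VRange
cappedExample = capRange 18 (VRange 6 19)
```

Under this model the client sees `VRange 6 18` whether or not the relay has already upgraded to 19, which is the fingerprinting protection the code comment describes.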
## withTlsUnique — different API calls yield same value @@ -67,7 +67,7 @@ When `clientService` is present in the client handshake, the router performs add - On success, the router sends `SMPServerHandshakeResponse` with a `serviceId` - On failure, the router sends `SMPServerHandshakeError` before raising the error -Per the protocol spec (v16+): "`clientService` provides long-term service client certificate for high-volume services using SMP router (chat relays, notification routers, high traffic bots). The router responds with a third handshake message containing the assigned service ID." +Per the protocol spec (v16+): "`clientService` provides long-term service client certificate for high-volume services using SMP router (chat relays, notification routers, high traffic bots). The router returns a third handshake message containing the assigned service ID." The client only includes service credentials when `v >= serviceCertsSMPVersion && certificateSent c` (the TLS client certificate was actually sent). 
From fc5b601cb43b5065c58bfc93cf8386a96f4b2b7d Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Fri, 13 Mar 2026 21:45:24 +0000 Subject: [PATCH 28/61] notes --- spec/modules/NOTES.md | 155 ++++++++++++++++++ .../Messaging/Agent/Store/AgentStore.md | 4 - spec/modules/Simplex/Messaging/Client.md | 2 +- spec/modules/Simplex/Messaging/Encoding.md | 6 - .../Simplex/Messaging/Encoding/String.md | 2 +- .../Messaging/Notifications/Server/Store.md | 6 +- .../Notifications/Server/Store/Postgres.md | 2 +- .../Messaging/Server/MsgStore/Postgres.md | 3 - .../Messaging/Server/QueueStore/Postgres.md | 8 +- spec/modules/Simplex/Messaging/Util.md | 3 - spec/modules/Simplex/RemoteControl/Client.md | 2 +- .../Simplex/RemoteControl/Discovery.md | 2 - .../Simplex/RemoteControl/Invitation.md | 2 +- spec/modules/Simplex/RemoteControl/Types.md | 2 +- 14 files changed, 166 insertions(+), 33 deletions(-) create mode 100644 spec/modules/NOTES.md diff --git a/spec/modules/NOTES.md b/spec/modules/NOTES.md new file mode 100644 index 000000000..0fad99561 --- /dev/null +++ b/spec/modules/NOTES.md @@ -0,0 +1,155 @@ +# Design Notes + +Non-bug observations from module specs that are worth tracking. These remain documented in their respective module specs — this file serves as an index. + +## Backend Observations + +### N-01: SNotifier path doesn't cache + +**Location**: `Simplex.Messaging.Server.QueueStore.Postgres` — `getQueues_` SNotifier branch +**Description**: The SRecipient path caches loaded queues via `cacheRcvQueue` with double-check locking. The SNotifier path does NOT cache — it uses a stale TMap snapshot and `maybe (mkQ False rId qRec) pure`, so concurrent loads for the same notifier can create duplicate ephemeral queue objects. Functionally correct but wasteful. 
+**Module spec**: [QueueStore/Postgres.md](Simplex/Messaging/Server/QueueStore/Postgres.md) + +### N-02: assertUpdated error conflation + +**Location**: `Simplex.Messaging.Server.QueueStore.Postgres` — `assertUpdated` +**Description**: `assertUpdated` returns `AUTH` for zero-rows-affected. This is the same error code used for "not found" (via `readQueueRecIO`) and "duplicate" (via `handleDuplicate`). The actual cause — stale cache, deleted queue, or constraint violation — is indistinguishable in logs. +**Module spec**: [QueueStore/Postgres.md](Simplex/Messaging/Server/QueueStore/Postgres.md) + +## Design Characteristics + +### N-03: RCVerifiedInvitation constructor exported + +**Location**: `Simplex.RemoteControl.Invitation` — `RCVerifiedInvitation` +**Description**: `RCVerifiedInvitation` is a newtype with constructor exported via `(..)`. It can be constructed without calling `verifySignedInvitation`, bypassing signature verification. The trust boundary is conventional, not enforced by the type system. `connectRCCtrl` accepts only `RCVerifiedInvitation`. +**Module spec**: [RemoteControl/Invitation.md](Simplex/RemoteControl/Invitation.md) + +### N-04: smpEncode Word16 silent truncation + +**Location**: `Simplex.Messaging.Encoding` — `Encoding Word16` instance +**Description**: `smpEncode` for ByteString uses a 1-byte length prefix. Maximum encodable length is 255 bytes. Longer values silently wrap via `w2c . fromIntegral`. Callers must ensure ByteStrings fit or use `Large`. +**Module spec**: [Encoding.md](Simplex/Messaging/Encoding.md) + +### N-05: writeIORef for period stats — not atomic + +**Location**: `Simplex.Messaging.Server.Stats` — `setPeriodStats` +**Description**: Uses `writeIORef` (not atomic). Only safe during router startup when no other threads are running. If called concurrently, period data could be corrupted. 
+**Module spec**: [Server/Stats.md](Simplex/Messaging/Server/Stats.md) + +### N-06: setStatsByServer orphans old TVars + +**Location**: `Simplex.Messaging.Notifications.Server.Stats` — `setStatsByServer` +**Description**: Builds a fresh `Map Text (TVar Int)` in IO, then atomically replaces the TMap's root TVar. Old per-router TVars are not reused — any other thread holding a reference from a prior `TM.lookupIO` would modify an orphaned counter. Called at startup, but lacks the explicit "not thread safe" comment. +**Module spec**: [Notifications/Server/Stats.md](Simplex/Messaging/Notifications/Server/Stats.md) + +### N-07: Lazy.unPad doesn't validate data length + +**Location**: `Simplex.Messaging.Crypto.Lazy` — `unPad` / `splitLen` +**Description**: `splitLen` does not validate that the remaining data is at least `len` bytes — `LB.take len` silently returns a shorter result. The source comment notes this is intentional to avoid consuming all lazy chunks for validation. +**Module spec**: [Crypto/Lazy.md](Simplex/Messaging/Crypto/Lazy.md) + +### N-08: Batched commands have no timeout-based expiry + +**Location**: `Simplex.Messaging.Client` — `sendBatch` +**Description**: Batched commands are written with `Nothing` as the request parameter — the send thread skips the `pending` flag check. Individual commands have timeout-based expiry. If the router stops returning results, batched commands can block the send queue indefinitely. +**Module spec**: [Client.md](Simplex/Messaging/Client.md) + +### N-09: Postgres MsgStore nanosecond precision + +**Location**: `Simplex.Messaging.Server.MsgStore.Postgres` — `toMessage` +**Description**: `MkSystemTime ts 0` constructs timestamps with zero nanoseconds. Only whole seconds are stored. Messages read from Postgres have coarser timestamps than STM/Journal stores. Not a practical issue — timestamps are typically rounded to hours or days. 
+**Module spec**: [Server/MsgStore/Postgres.md](Simplex/Messaging/Server/MsgStore/Postgres.md) + +### N-10: MsgStore Postgres — error stubs crash at runtime + +**Location**: `Simplex.Messaging.Server.MsgStore.Postgres` — multiple `MsgStoreClass` methods +**Description**: Multiple `MsgStoreClass` methods are `error "X not used"`. Required by the type class but not applicable to Postgres. Calling any at runtime crashes. Safe because Postgres overrides the relevant default methods, but a new caller using the wrong method would crash with no compile-time warning. +**Module spec**: [Server/MsgStore/Postgres.md](Simplex/Messaging/Server/MsgStore/Postgres.md) + +### N-11: strP default assumes base64url for all types + +**Location**: `Simplex.Messaging.Encoding.String` — `StrEncoding` class default +**Description**: The `MINIMAL` pragma allows defining only `strDecode` without `strP`. The default `strP = strDecode <$?> base64urlP` assumes input is base64url-encoded for any type. A new `StrEncoding` instance that defines only `strDecode` for non-base64 data would get a broken parser. +**Module spec**: [Encoding/String.md](Simplex/Messaging/Encoding/String.md) + +## Silent Behaviors + +Intentional design choices that are correct but non-obvious. A code modifier who doesn't know these could introduce bugs. + +### N-12: Service signing silently skipped on empty authenticator + +**Location**: `Simplex.Messaging.Client` — service signature path +**Description**: The service signature is only added when the entity authenticator is non-empty. If authenticator generation fails silently (returns empty bytes), service signing is silently skipped. 
+**Module spec**: [Client.md](Simplex/Messaging/Client.md) + +### N-13: stmDeleteNtfToken — nonexistent token indistinguishable from empty + +**Location**: `Simplex.Messaging.Notifications.Server.Store` — `stmDeleteNtfToken` +**Description**: If the token ID doesn't exist in the `tokens` map, the registration-cleanup branch is skipped and the function returns an empty list. The caller cannot distinguish "deleted a token with no subscriptions" from "token never existed." +**Module spec**: [Notifications/Server/Store.md](Simplex/Messaging/Notifications/Server/Store.md) + +### N-14: createCommand silently drops commands for deleted connections + +**Location**: `Simplex.Messaging.Agent.Store.AgentStore` — `createCommand` +**Description**: When `createCommand` encounters a constraint violation (the referenced connection was already deleted), it logs the error and returns successfully. Commands targeting deleted connections are silently dropped. +**Module spec**: [Agent/Store/AgentStore.md](Simplex/Messaging/Agent/Store/AgentStore.md) + +### N-15: Redirect chain loading errors silently swallowed + +**Location**: `Simplex.Messaging.Agent.Store.AgentStore` +**Description**: When loading redirect chains, errors loading individual redirect files are silently swallowed via `either (const $ pure Nothing) (pure . Just)`. Prevents a corrupt redirect from blocking access to the main file. +**Module spec**: [Agent/Store/AgentStore.md](Simplex/Messaging/Agent/Store/AgentStore.md) + +### N-16: BLOCKED encoded as AUTH for old XFTP clients + +**Location**: `Simplex.FileTransfer.Protocol` — `encodeProtocol` +**Description**: If the protocol version is below `blockedFilesXFTPVersion`, a `BLOCKED` result is encoded as `AUTH` instead. The blocking information (reason) is permanently lost for these clients. 
+**Module spec**: [FileTransfer/Protocol.md](Simplex/FileTransfer/Protocol.md) + +### N-17: restore_messages three-valued logic with implicit default + +**Location**: `Simplex.Messaging.Server.Main` — INI config +**Description**: The `restore_messages` INI setting has three-valued logic: explicit "on" → restore, explicit "off" → skip, missing → inherits from `enable_store_log`. This implicit default is not captured in the type system — callers see `Maybe Bool`. +**Module spec**: [Server/Main.md](Simplex/Messaging/Server/Main.md) + +### N-18: Stats format migration permanently loses precision + +**Location**: `Simplex.Messaging.Server.Stats` — `strP` for `ServerStatsData` +**Description**: The parser handles multiple format generations. Old format `qDeleted=` is read as `(value, 0, 0)`. `qSubNoMsg` is parsed and discarded. `subscribedQueues` is parsed but replaced with empty data. Data loaded from old formats is coerced — precision is permanently lost. +**Module spec**: [Server/Stats.md](Simplex/Messaging/Server/Stats.md) + +### N-19: resubscribe exceptions silently lost + +**Location**: `Simplex.Messaging.Notifications.Server` — `resubscribe` +**Description**: `resubscribe` is launched via `forkIO` before `raceAny_` starts — not part of the `raceAny_` group. Most exceptions are silently lost per `forkIO` semantics. `ExitCode` exceptions are special-cased by GHC's runtime and do propagate. +**Module spec**: [Notifications/Server.md](Simplex/Messaging/Notifications/Server.md) + +### N-20: closeSMPClientAgent worker cancellation is fire-and-forget + +**Location**: `Simplex.Messaging.Client.Agent` — `closeSMPClientAgent` +**Description**: Executes in order: set `active = False`, close all client connections, swap workers map to empty and fork cancellation threads. Cancel threads use `uninterruptibleCancel` but are fire-and-forget — the function may return before all workers are cancelled. 
+**Module spec**: [Client/Agent.md](Simplex/Messaging/Client/Agent.md) + +### N-21: APNS unknown 410 reasons trigger retry instead of permanent failure + +**Location**: `Simplex.Messaging.Notifications.Server.Push.APNS` +**Description**: Unknown 410 (Gone) reasons fall through to `PPRetryLater`, while unknown 400 and 403 reasons fall through to `PPResponseError`. An unexpected APNS 410 reason string triggers retry rather than permanent failure. +**Module spec**: [Notifications/Server/Push/APNS.md](Simplex/Messaging/Notifications/Server/Push/APNS.md) + +### N-22: NTInvalid/NTExpired tokens can create subscriptions + +**Location**: `Simplex.Messaging.Notifications.Protocol` — token status permissions +**Description**: Token status `NTInvalid` allows subscription commands (SNEW, SCHK, SDEL). A TODO comment explains: invalidation can happen after verification, and existing subscriptions should remain manageable. `NTExpired` is also permitted. +**Module spec**: [Notifications/Protocol.md](Simplex/Messaging/Notifications/Protocol.md) + +### N-23: removeInactiveTokenRegistrations doesn't clean up empty inner maps + +**Location**: `Simplex.Messaging.Notifications.Server.Store` — `stmRemoveInactiveTokenRegistrations` +**Description**: `stmDeleteNtfToken` checks whether inner TMap is empty after removal and cleans up the outer key. `stmRemoveInactiveTokenRegistrations` does not — surviving active tokens' registrations remain, but empty inner maps can persist. +**Module spec**: [Notifications/Server/Store.md](Simplex/Messaging/Notifications/Server/Store.md) + +### N-24: cbNonce silently truncates or pads + +**Location**: `Simplex.Messaging.Crypto` — `cbNonce` +**Description**: If the input is longer than 24 bytes, it is silently truncated. If shorter, it is silently padded. No error is raised. Callers must ensure correct length. 
+**Module spec**: [Crypto.md](Simplex/Messaging/Crypto.md) diff --git a/spec/modules/Simplex/Messaging/Agent/Store/AgentStore.md b/spec/modules/Simplex/Messaging/Agent/Store/AgentStore.md index d1271a6d3..59d7c2010 100644 --- a/spec/modules/Simplex/Messaging/Agent/Store/AgentStore.md +++ b/spec/modules/Simplex/Messaging/Agent/Store/AgentStore.md @@ -75,10 +75,6 @@ Generates random 12-byte IDs (base64url encoded) and retries up to 3 times on co First clears primary flag on all queues in the connection, then sets it on the target queue. Also clears `replace_*_queue_id` on the new primary — this completes the queue rotation by removing the "replacing" marker. -## checkConfirmedSndQueueExists_ — dpPostgres typo - -The CPP guard reads `#if defined(dpPostgres)` (note `dp` instead of `db`). This means the `FOR UPDATE` clause is never included for any backend. The check still works correctly for SQLite (single-writer model) but on PostgreSQL the query runs without row locking, which could allow a TOCTOU race between checking and inserting. - ## createCommand — silent drop for deleted connections When `createCommand` encounters a constraint violation (the referenced connection was already deleted), it logs the error and returns successfully rather than throwing. This means commands targeting deleted connections are silently dropped. The rationale: the connection is already gone, so there's nothing useful to do with the error. diff --git a/spec/modules/Simplex/Messaging/Client.md b/spec/modules/Simplex/Messaging/Client.md index f23d5a005..4e97c9a5c 100644 --- a/spec/modules/Simplex/Messaging/Client.md +++ b/spec/modules/Simplex/Messaging/Client.md @@ -42,7 +42,7 @@ When `corrId` is empty, the message is an `STEvent` (router-initiated). When non ## nonBlockingWriteTBQueue — fork on full -If `tryWriteTBQueue` returns `False`, a new thread is forked for the blocking write. No backpressure mechanism — under sustained overload, thread count grows without bound. 
This is a deliberate tradeoff: the caller never blocks (preventing deadlock between send and process threads), at the cost of potential unbounded thread creation. +If `tryWriteTBQueue` returns `False` (queue full), a new thread is forked for the blocking write. The caller never blocks, preventing deadlock between send and process threads. ## Batch commands do not expire diff --git a/spec/modules/Simplex/Messaging/Encoding.md b/spec/modules/Simplex/Messaging/Encoding.md index 8db63d0cc..984498dd4 100644 --- a/spec/modules/Simplex/Messaging/Encoding.md +++ b/spec/modules/Simplex/Messaging/Encoding.md @@ -14,8 +14,6 @@ The two encoding classes share some instances (`Char`, `Bool`, `SystemTime`) but **Length prefix is 1 byte.** Maximum encodable length is 255 bytes. If a ByteString exceeds 255 bytes, the length silently wraps via `w2c . fromIntegral` — a 300-byte string encodes length as 44 (300 mod 256). Callers must ensure ByteStrings fit in 255 bytes, or use `Large` for longer values. -**Security**: silent truncation means a caller encoding untrusted input without length validation could produce a malformed message where the decoder reads fewer bytes than were intended, then misparses the remainder as the next field. - ## Large 2-byte length prefix (`Word16`). Use for ByteStrings that may exceed 255 bytes. Maximum 65535 bytes. @@ -36,10 +34,6 @@ Sequential concatenation with no separators. Works because each element's encodi Only seconds are encoded (as Int64); nanoseconds are discarded on encode and set to 0 on decode. -## String instance - -`smpEncode` goes through `B.pack`, which silently truncates any Unicode character above codepoint 255 to its lowest byte. A String containing non-Latin-1 characters is silently corrupted on encode with no error. Same issue exists in the `StrEncoding String` instance — see [Simplex.Messaging.Encoding.String](./Encoding/String.md#string-instance). 
- ## smpEncodeList / smpListP 1-byte length prefix for lists — same 255-item limit as ByteString's 255-byte limit. diff --git a/spec/modules/Simplex/Messaging/Encoding/String.md b/spec/modules/Simplex/Messaging/Encoding/String.md index 1e60295b8..378bed11f 100644 --- a/spec/modules/Simplex/Messaging/Encoding/String.md +++ b/spec/modules/Simplex/Messaging/Encoding/String.md @@ -21,7 +21,7 @@ Encodes as base64url. The parser (`strP`) only accepts non-empty strings — emp ## String instance -Inherits from ByteString via `B.pack` / `B.unpack`. Only Char8 (Latin-1) characters round-trip; `B.pack` truncates unicode codepoints above 255. The source comment warns about this. +Inherits from ByteString via `B.pack` / `B.unpack`. Only Char8 (Latin-1) characters round-trip. ## strToJSON / strParseJSON diff --git a/spec/modules/Simplex/Messaging/Notifications/Server/Store.md b/spec/modules/Simplex/Messaging/Notifications/Server/Store.md index d9deedbf4..4259b44c7 100644 --- a/spec/modules/Simplex/Messaging/Notifications/Server/Store.md +++ b/spec/modules/Simplex/Messaging/Notifications/Server/Store.md @@ -20,7 +20,7 @@ When a token is activated, `stmRemoveInactiveTokenRegistrations` removes ALL oth ### 4. tokenLastNtfs accumulates via prepend -New notifications are prepended to the `NonEmpty PNMessageData` list via `(<|)`. The list is unbounded in the STM store — bounding is handled at the push delivery layer (the Postgres store limits to 6). +New notifications are prepended to the `NonEmpty PNMessageData` list via `(<|)`. ### 5. stmDeleteNtfToken prunes empty registration maps @@ -46,9 +46,9 @@ When `stmDeleteNtfToken` removes a token, it deletes the entry from the inner `T When `stmDeleteNtfSubscription` removes a subscription, it deletes the `subId` from the token's `Set NtfSubscriptionId` in `tokenSubscriptions` but never checks whether the set became empty. 
Tokens with all subscriptions individually deleted accumulate empty set entries — these are only cleaned up when the token itself is deleted via `deleteTokenSubs`. -### 11. stmSetNtfService — asymmetric cleanup with Postgres store +### 11. stmSetNtfService — key-value service association -`stmSetNtfService` uses `maybe TM.delete TM.insert` to either remove or set the service association for an SMP router. This is purely a key-value update with no cascading effects on subscriptions. The Postgres store's `removeServiceAndAssociations` handles subscription cleanup separately, meaning the STM and Postgres stores have **different cleanup semantics** for service removal. +`stmSetNtfService` uses `maybe TM.delete TM.insert` to either remove or set the service association for an SMP router. This is purely a key-value update with no cascading effects on subscriptions. ### 12. Subscription index triple-write invariant diff --git a/spec/modules/Simplex/Messaging/Notifications/Server/Store/Postgres.md b/spec/modules/Simplex/Messaging/Notifications/Server/Store/Postgres.md index bde863eb6..1950ee1e1 100644 --- a/spec/modules/Simplex/Messaging/Notifications/Server/Store/Postgres.md +++ b/spec/modules/Simplex/Messaging/Notifications/Server/Store/Postgres.md @@ -83,7 +83,7 @@ The `insertServer` fallback uses `ON CONFLICT ... DO UPDATE SET smp_host = EXCLU ### 18. deleteNtfToken string_agg with hex parsing -`deleteNtfToken` uses `string_agg(s.smp_notifier_id :: TEXT, ',')` to aggregate `BYTEA` notifier IDs into comma-separated text, then parses with `parseByteaString` which drops the `\x` prefix and hex-decodes. `mapMaybe` silently drops any IDs that fail hex decoding, which could mask data corruption. +`deleteNtfToken` uses `string_agg(s.smp_notifier_id :: TEXT, ',')` to aggregate `BYTEA` notifier IDs into comma-separated text, then parses with `parseByteaString` which drops the `\x` prefix and hex-decodes. ### 19. 
withPeriodicNtfTokens streams with DB.fold diff --git a/spec/modules/Simplex/Messaging/Server/MsgStore/Postgres.md b/spec/modules/Simplex/Messaging/Server/MsgStore/Postgres.md index eaeca3b90..a69e2e9ee 100644 --- a/spec/modules/Simplex/Messaging/Server/MsgStore/Postgres.md +++ b/spec/modules/Simplex/Messaging/Server/MsgStore/Postgres.md @@ -52,6 +52,3 @@ Creates a temp table with aggregated message stats, then updates `msg_queues` in `deleteQueueSize` calls `getQueueSize` BEFORE `deleteStoreQueue`. The returned size is the count at query time — a concurrent `writeMsg` between the size query and the delete means the reported size is stale. This is acceptable because the size is used for statistics, not for correctness. -## unsafeMaxLenBS - -`toMessage` uses `C.unsafeMaxLenBS` to bypass the `MaxLen` length check on message bodies read from the database. A TODO comment questions this choice. If the database contains oversized data, the length invariant is silently violated. diff --git a/spec/modules/Simplex/Messaging/Server/QueueStore/Postgres.md b/spec/modules/Simplex/Messaging/Server/QueueStore/Postgres.md index f97acaa2d..39d833169 100644 --- a/spec/modules/Simplex/Messaging/Server/QueueStore/Postgres.md +++ b/spec/modules/Simplex/Messaging/Server/QueueStore/Postgres.md @@ -6,7 +6,7 @@ ## addQueue_ — no in-memory duplicate check, relies on DB constraint -See comment on `addQueue_`: "Not doing duplicate checks in maps as the probability of duplicates is very low." The STM implementation checks all four ID maps before insertion and returns `DUPLICATE_`. The Postgres implementation skips this and relies on `UniqueViolation` from the DB, which `handleDuplicate` maps to `AUTH`, not `DUPLICATE_`. The same logical error produces different error codes depending on the store backend. +See comment on `addQueue_`: "Not doing duplicate checks in maps as the probability of duplicates is very low." 
The Postgres implementation relies on `UniqueViolation` from the DB rather than pre-checking in-memory maps. ## addQueue_ — non-atomic cache updates @@ -56,17 +56,13 @@ Re-securing with the same key falls through the verify function to `pure ()`, th (1) **Cache check**: `checkCachedNotifier` acquires a per-notifier-ID lock via `notifierLocks`, then checks `TM.memberIO`. Returns `DUPLICATE_`. (2) **Queue lock**: Via `withQueueRec`, prevents concurrent modifications to the same queue. (3) **Database constraint**: `handleDuplicate` catches `UniqueViolation`, returns `AUTH`. Same duplicate, different error codes depending on whether cache was warm. The `notifierLocks` map grows unboundedly — locks are never removed except when the queue is deleted. -## addQueueNotifier — always clears notification service - -The SQL UPDATE always sets `ntf_service_id = NULL` when adding/replacing a notifier. The previous notifier's service association is silently lost. The STM implementation additionally calls `removeServiceQueue` to update service-level tracking; the Postgres version does not. - ## rowToQueueRec — link data replaced with empty stubs The standard `queueRecQuery` does NOT select `fixed_data` and `user_data` columns. When converting to `QueueRec`, link data is stubbed: `(,(EncDataBytes "", EncDataBytes "")) <$> linkId_`. Actual link data is loaded on demand via `getQueueLinkData`. Any code reading `queueData` from a cached `QueueRec` without going through `getQueueLinkData` sees empty bytes. The separate `rowToQueueRecWithData` (used by `foldQueueRecs` with `withData = True`) includes real data. ## getCreateService — serialization via serviceLocks -Entire operation wrapped in `withLockMap (serviceLocks st) fp`, serializing all creation/lookup for the same certificate fingerprint. Inside the lock: SELECT by `service_cert_hash`, if not found attempt INSERT catching `UniqueViolation`. The `serviceLocks` map grows unboundedly — no cleanup mechanism. 
+Entire operation wrapped in `withLockMap (serviceLocks st) fp`, serializing all creation/lookup for the same certificate fingerprint. Inside the lock: SELECT by `service_cert_hash`, if not found attempt INSERT catching `UniqueViolation`. ## batchInsertQueues — COPY protocol with manual CSV serialization diff --git a/spec/modules/Simplex/Messaging/Util.md b/spec/modules/Simplex/Messaging/Util.md index 3b9fd3777..d89e27bf1 100644 --- a/spec/modules/Simplex/Messaging/Util.md +++ b/spec/modules/Simplex/Messaging/Util.md @@ -47,6 +47,3 @@ Runs all actions concurrently, waits for any one to complete, then cancels all o Handles `Int64` delays exceeding `maxBound :: Int` (~2147 seconds on 32-bit) by looping in chunks. Necessary because `threadDelay` takes `Int`, not `Int64`. -## toChunks - -Precondition: `n > 0` (comment-only, not enforced). Passing `n = 0` causes infinite loop. diff --git a/spec/modules/Simplex/RemoteControl/Client.md b/spec/modules/Simplex/RemoteControl/Client.md index 55fd05bc1..84a5d2dca 100644 --- a/spec/modules/Simplex/RemoteControl/Client.md +++ b/spec/modules/Simplex/RemoteControl/Client.md @@ -30,7 +30,7 @@ The session key combines DH and post-quantum KEM via `kemHybridSecret`: `SHA3_25 2. Application displays session code for user verification → calls `confirmCtrlSession` with `True`/`False` 3. If confirmed, `runSession` proceeds with hello exchange → second `RCStepTMVar` resolved with session -`confirmCtrlSession` does a double `putTMVar` — the first signals the decision, the second blocks until the session thread does `takeTMVar` (synchronization point). See TODO in source: no timeout on this wait. +`confirmCtrlSession` does a double `putTMVar` — the first signals the decision, the second blocks until the session thread does `takeTMVar` (synchronization point). 
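The double-put synchronization can be sketched with base-only primitives. This sketch swaps `TMVar` for `MVar` to avoid a dependency on `stm`. The names `confirm` and `demoConfirm` are hypothetical, and the forked thread is a stand-in for the session thread.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)

-- First put delivers the decision. The second put cannot complete until
-- the other side has taken the first value, so returning from `confirm`
-- means the decision has been received.
confirm :: MVar Bool -> Bool -> IO ()
confirm decision ok = do
  putMVar decision ok
  putMVar decision ok

-- Stand-in for the session thread: take the decision once and report it.
demoConfirm :: IO Bool
demoConfirm = do
  decision <- newEmptyMVar
  seen <- newEmptyMVar
  _ <- forkIO $ takeMVar decision >>= putMVar seen
  confirm decision True
  takeMVar seen
```

If the receiving thread never takes the value, the second put blocks forever, so a caller that needs a bound on the wait would have to add its own timeout.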
## TLS hooks — single-session enforcement diff --git a/spec/modules/Simplex/RemoteControl/Discovery.md b/spec/modules/Simplex/RemoteControl/Discovery.md index 52c861c79..22fd9d6d6 100644 --- a/spec/modules/Simplex/RemoteControl/Discovery.md +++ b/spec/modules/Simplex/RemoteControl/Discovery.md @@ -12,8 +12,6 @@ Enumerates network interfaces and filters out non-routable addresses (0.0.0.0, b `joinMulticast` / `partMulticast` use a shared `TMVar Int` counter to track active listeners. Multicast group membership is per-host (not per-process — see comment in Multicast.hsc), so the counter ensures `IP_ADD_MEMBERSHIP` is called only when transitioning from 0→1 listeners and `IP_DROP_MEMBERSHIP` only when transitioning from 1→0. If `setMembership` fails, the counter is restored to its previous value and the error is logged (not thrown). -**TMVar hazard**: Both functions take the counter from the TMVar unconditionally but only put it back in the 0-or-1 branches. If `joinMulticast` is called when the counter is already >0, or `partMulticast` when >1, the TMVar is left empty and subsequent accesses will deadlock. In practice this is safe because `withListener` serializes access through a single `TMVar Int`, but the abstraction does not protect against concurrent use. - ## startTLSServer — ephemeral port support When `port_` is `Nothing`, passes `"0"` to `startTCPServer`, which causes the OS to assign an ephemeral port. The assigned port is read via `socketPort` and communicated back through the `startedOnPort` TMVar. On any startup error, `setPort Nothing` is signalled so callers don't block indefinitely on the TMVar. 
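The "always signal the port variable" behaviour can be sketched as below. This is an assumption-laden illustration: `startAndReport` and `demoStart` are hypothetical names, `MVar` replaces `TMVar` to stay within `base`, and the `IO Int` argument stands in for binding a socket and reading the assigned port.

```haskell
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Exception (SomeException, try)

-- Run the (hypothetical) bind action and always fill the variable:
-- Just the port on success, Nothing on any startup error, so that a
-- caller waiting on the variable is never left blocked.
startAndReport :: IO Int -> MVar (Maybe Int) -> IO ()
startAndReport bindServer startedOnPort = do
  r <- try bindServer :: IO (Either SomeException Int)
  putMVar startedOnPort (either (const Nothing) Just r)

-- One successful start and one failing start.
demoStart :: IO (Maybe Int, Maybe Int)
demoStart = do
  ok <- newEmptyMVar
  startAndReport (pure 50123) ok
  failed <- newEmptyMVar
  startAndReport (ioError (userError "bind failed")) failed
  (,) <$> takeMVar ok <*> takeMVar failed
```

The port number 50123 is arbitrary. With a real socket, the value would be whatever the OS assigned after binding to port `"0"`.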
diff --git a/spec/modules/Simplex/RemoteControl/Invitation.md b/spec/modules/Simplex/RemoteControl/Invitation.md index 3f65ec46c..a12a12f99 100644 --- a/spec/modules/Simplex/RemoteControl/Invitation.md +++ b/spec/modules/Simplex/RemoteControl/Invitation.md @@ -15,7 +15,7 @@ Verification in `verifySignedInvitation` mirrors this: `ssig` is verified agains ## Invitation URI format -The `xrcp:/` scheme uses the SMP-style pattern: CA fingerprint as userinfo (`ca@host:port`), query parameters after `#/?`. The `app` field is raw JSON encoded in a query parameter. `RCInvitation`'s parser uses `parseSimpleQuery` + `lookup` (order-independent), but `RCSignedInvitation`'s parser uses `B.breakSubstring "&ssig="` which assumes the signatures appear at a fixed position — see TODO in source on `RCSignedInvitation`'s `strP`. +The `xrcp:/` scheme uses the SMP-style pattern: CA fingerprint as userinfo (`ca@host:port`), query parameters after `#/?`. The `app` field is raw JSON encoded in a query parameter. `RCInvitation`'s parser uses `parseSimpleQuery` + `lookup` (order-independent). ## RCVerifiedInvitation — newtype trust boundary diff --git a/spec/modules/Simplex/RemoteControl/Types.md b/spec/modules/Simplex/RemoteControl/Types.md index ad165f442..f752d465f 100644 --- a/spec/modules/Simplex/RemoteControl/Types.md +++ b/spec/modules/Simplex/RemoteControl/Types.md @@ -28,4 +28,4 @@ This module defines the data types for the XRCP (remote control) protocol, which ## IpProbe — unused discovery type -`IpProbe` is defined with `Encoding` instance but not used anywhere in the current codebase. It appears to be a placeholder for a planned IP discovery mechanism. Note: the `smpP` parser has a precedence bug — `IpProbe <$> (smpP <* "I") *> smpP` parses as `(IpProbe <$> (smpP <* "I")) *> smpP`, which discards the `IpProbe` wrapper. This has never manifested because the type is unused. +`IpProbe` is defined with `Encoding` instance but not used anywhere in the current codebase. 
It appears to be a placeholder for a planned IP discovery mechanism. From 388b13d417d1a88d4411ec0949b01cbb3e4350cf Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Fri, 13 Mar 2026 22:16:49 +0000 Subject: [PATCH 29/61] docs --- README.md | 319 ++++++++---------------------------------------- docs/AGENT.md | 75 ++++++++++++ docs/CLIENT.md | 75 ++++++++++++ docs/ROUTERS.md | 194 +++++++++++++++++++++++++++++ 4 files changed, 392 insertions(+), 271 deletions(-) create mode 100644 docs/AGENT.md create mode 100644 docs/CLIENT.md create mode 100644 docs/ROUTERS.md diff --git a/README.md b/README.md index b138c8cc9..8d6333f3a 100644 --- a/README.md +++ b/README.md @@ -3,84 +3,78 @@ [![GitHub build](https://github.com/simplex-chat/simplexmq/actions/workflows/build.yml/badge.svg)](https://github.com/simplex-chat/simplexmq/actions/workflows/build.yml) [![GitHub release](https://img.shields.io/github/v/release/simplex-chat/simplexmq)](https://github.com/simplex-chat/simplexmq/releases) -📢 SimpleXMQ v1 is released - with many security, privacy and efficiency improvements, new functionality - see [release notes](https://github.com/simplex-chat/simplexmq/releases/tag/v1.0.0). +## SimpleX Network software -**Please note**: v1 is not backwards compatible, but it has the version negotiation built into all protocol layers for forwards compatibility of this version and backwards compatibility of the future versions, that will be backwards compatible for at least two versions back. +SimpleXMQ provides the software for [SimpleX Network](./protocol/overview-tjr.md) — a general-purpose packet routing network where endpoints exchange data through independently operated routers using resource-based addressing. Unlike IP networks, SimpleX addresses identify resources on routers (queues, data packets), not endpoint devices. Participants do not need globally unique identifiers to communicate. 
-If you have a server deployed please deploy a new server to a new host and retire the previous version once it is no longer used. +The software is organized in three layers: -## Message broker for unidirectional (simplex) queues - -SimpleXMQ is a message broker for managing message queues and sending messages over public network. It consists of SMP server, SMP client library and SMP agent that implement [SMP protocol](./protocol/simplex-messaging.md) for client-server communication and [SMP agent protocol](./protocol/agent-protocol.md) to manage duplex connections via simplex queues on multiple SMP servers. - -SMP protocol is inspired by [Redis serialization protocol](https://redis.io/topics/protocol), but it is much simpler - it currently has only 10 client commands and 8 server responses. - -SimpleXMQ is implemented in Haskell - it benefits from robust software transactional memory (STM) and concurrency primitives that Haskell provides. - -## SimpleXMQ roadmap +``` + Application (e.g. SimpleX Chat) ++----------------------------------+ +| SimpleX Agent | Layer 3 — bidirectional connections, e2e encryption ++----------------------------------+ +| SimpleX Client Libraries | Layer 2 — protocol clients for SMP, XFTP ++----------------------------------+ +| SimpleX Routers | Layer 1 — network infrastructure (SMP, XFTP, NTF) ++----------------------------------+ +``` -- SimpleX service protocol and application template - to enable users building services and chat bots that work over SimpleX protocol stack. The first such service will be a notification service for a mobile app. -- SMP queue redundancy and rotation in SMP agent connections. -- SMP agents synchronization to share connections and messages between multiple agents (it would allow using multiple devices for [simplex-chat](https://github.com/simplex-chat/simplex-chat)). +[SimpleX Chat](https://github.com/simplex-chat/simplex-chat) is one application built on Layer 3. 
IoT devices, AI services, monitoring systems, and automated services are other applications that can use Layers 2 or 3 directly. -## Components +SimpleXMQ is implemented in Haskell, benefiting from robust software transactional memory (STM) and concurrency primitives. -### SMP server +See the [SimpleX Network overview](./protocol/overview-tjr.md) for the full protocol architecture, trust model, and security analysis. -[SMP server](./apps/smp-server/Main.hs) can be run on any Linux distribution, including low power/low memory devices. OpenSSL library is required for initialization. +## Architecture -To initialize the server use `smp-server init -n ` (or `smp-server init --ip ` for IP based address) command - it will generate keys and certificates for TLS transport. The fingerprint of offline certificate is used as part of the server address to protect client/server connection against man-in-the-middle attacks: `smp://@[:5223]`. +### SimpleX Routers -SMP server uses in-memory persistence with an optional append-only log of created queues that allows to re-start the server without losing the connections. This log is compacted on every server restart, permanently removing suspended and removed queues. +Routers are the network infrastructure — they accept, buffer, and deliver packets. Three router types serve different purposes: -To enable store log, initialize server using `smp-server -l` command, or modify `smp-server.ini` created during initialization (uncomment `enable: on` option in the store log section). Use `smp-server --help` for other usage tips. +- **SMP routers** provide messaging queues — unidirectional, ordered sequences of fixed-size packets (16,384 bytes). Protocol: [SMP](./protocol/simplex-messaging.md). +- **XFTP routers** provide data packet storage — individually addressed blocks in fixed sizes (64KB–4MB) for larger payloads. Protocol: [XFTP](./protocol/xftp.md). 
+- **NTF routers** bridge to platform push services (APNS) for mobile notification delivery. Protocol: [Push Notifications](./protocol/push-notifications.md). -Starting from version 2.3.0, when store log is enabled, the server would also enable saving undelivered messages on exit and restoring them on start. This can be disabled via a separate setting `restore_messages` in `smp-server.ini` file. Saving messages would only work if the server is stopped with SIGINT signal (keyboard interrupt), if it is stopped with SIGTERM signal the messages would not be saved. +#### Running an SMP router -> **Please note:** On initialization SMP server creates a chain of two certificates: a self-signed CA certificate ("offline") and a server certificate used for TLS handshake ("online"). **You should store CA certificate private key securely and delete it from the server. If server TLS credential is compromised this key can be used to sign a new one, keeping the same server identity and established connections.** CA private key location by default is `/etc/opt/simplex/ca.key`. +[SMP server](./apps/smp-server/Main.hs) runs on any Linux distribution. OpenSSL is required for initialization. -SMP server implements [SMP protocol](./protocol/simplex-messaging.md). +Initialize: `smp-server init -n ` (or `--ip `). This generates TLS certificates. The CA certificate fingerprint becomes part of the server address: `smp://@[:5223]`. -#### Running SMP server on MacOS +The server uses in-memory persistence with an optional append-only store log for queue persistence across restarts. Enable with `smp-server init -l` or in `smp-server.ini`. The log is compacted on every restart. -SMP server requires OpenSSL library for initialization. On MacOS OpenSSL library may be replaced with LibreSSL, which doesn't support required algorithms. 
Before initializing SMP server verify you have OpenSSL installed: +When store log is enabled, undelivered messages are saved on exit (SIGINT only, not SIGTERM) and restored on start. Control this independently with the `restore_messages` setting. -```sh -openssl version -``` +> **Please note:** On initialization, SMP server creates a certificate chain: a self-signed CA certificate ("offline") and a server certificate for TLS ("online"). **Store the CA private key securely and delete it from the server.** If the server TLS credential is compromised, this key can sign a new one while keeping the same server identity. Default location: `/etc/opt/simplex/ca.key`. -If it says "LibreSSL", please install original OpenSSL: +See [docs/ROUTERS.md](./docs/ROUTERS.md) for XFTP/NTF router setup, advanced configuration, MacOS notes, and all deployment options (Docker, installation script, building from source, Linode, DigitalOcean). -```sh -brew update -brew install openssl -echo 'PATH="/opt/homebrew/opt/openssl@3/bin:$PATH"' >> ~/.zprofile # or follow whatever instructions brew suggests -. ~/.zprofile # or restart your terminal to start a new session -``` +### SimpleX Client Libraries -Now `openssl version` should be saying "OpenSSL". You can now run `smp-server init` to initialize your SMP server. +[Client libraries](./docs/CLIENT.md) provide low-level protocol access to SimpleX routers. They implement the wire protocols (SMP, XFTP) and handle connection lifecycle, command authentication, and keep-alive. -### SMP client library +The [SMP client](./src/Simplex/Messaging/Client.hs) offers a functional Haskell API with STM queues for asynchronous event delivery. The [XFTP client](./src/Simplex/FileTransfer/Client.hs) handles data packet upload/download with per-download forward secrecy. 
-[SMP client](./src/Simplex/Messaging/Client.hs) is a Haskell library to connect to SMP servers that allows to: +Applications that manage their own encryption and connection logic — IoT devices, sensors, simple data pipelines — can use this layer directly. See [docs/CLIENT.md](./docs/CLIENT.md). -- execute commands with a functional API. -- receive messages and other notifications via STM queue. -- automatically send keep-alive commands. +### SimpleX Agent -### SMP agent +The [Agent](./docs/AGENT.md) builds bidirectional encrypted connections on top of the client libraries. It manages: -[SMP agent library](./src/Simplex/Messaging/Agent.hs) can be used to run SMP agent as part of another application and to communicate with the agent via STM queues, without serializing and parsing commands and responses. +- Duplex connections from unidirectional queue pairs +- End-to-end encryption with double ratchet and post-quantum extensions +- File transfer with chunking, encryption, and multi-router distribution +- Queue rotation for metadata privacy +- Push notification subscriptions -Haskell type [ACommand](./src/Simplex/Messaging/Agent/Protocol.hs) represents SMP agent protocol to communicate via STM queues. +The [Agent library](./src/Simplex/Messaging/Agent.hs) communicates via STM queues using the [ACommand](./src/Simplex/Messaging/Agent/Protocol.hs) type — no serialization needed. -See [simplex-chat](https://github.com/simplex-chat/simplex-chat) terminal UI for the example of integrating SMP agent into another application. +See [docs/AGENT.md](./docs/AGENT.md). -[SMP agent executable](./apps/smp-agent/Main.hs) can be used to run a standalone SMP agent process that implements plaintext [SMP agent protocol](./protocol/agent-protocol.md) via TCP port 5224, so it can be used via telnet. It can be deployed in private networks to share access to the connections between multiple applications and services. 
+## Quick start -## Using SMP server and SMP agent - -You can either run your own SMP server locally or deploy using [Linode StackScript](https://cloud.linode.com/stackscripts/748014), or try local SMP agent with the deployed servers: +Public SMP routers for testing: `smp://u2dS9sG8nMNURyZwqASV4yROM28Er0luVTx5X1CsMrU=@smp4.simplex.im` @@ -88,233 +82,16 @@ You can either run your own SMP server locally or deploy using [Linode StackScri `smp://PQUV2eL0t7OStZOoAsPEV2QYWt4-xilbakvGUGOItUo=@smp6.simplex.im` -It's the easiest to try SMP agent via a prototype [simplex-chat](https://github.com/simplex-chat/simplex-chat) terminal UI. - -## Deploy SMP/XFTP servers on Linux - -You can run your SMP/XFTP server as a Linux process, optionally using a service manager for booting and restarts. +## Deploy routers -Notice that `smp-server` and `xftp-server` requires `openssl` as run-time dependency (it is used to generate server certificates during initialization). Install it with your packet manager: +You can run SMP/XFTP routers on any Linux distribution. OpenSSL is required: ```sh -# For Ubuntu +# Ubuntu apt update && apt install openssl ``` -### Install binaries - -#### Using Docker - -On Linux, you can deploy smp and xftp server using Docker. This will download image from [Docker Hub](https://hub.docker.com/r/simplexchat). - -1. Create directories for persistent Docker configuration: - - ```sh - mkdir -p $HOME/simplex/{xftp,smp}/{config,logs} && mkdir -p $HOME/simplex/xftp/files - ``` - -2. Run your Docker container. - - - `smp-server` - - You must change **your_ip_or_domain**. `-e "pass=password"` is optional variable to password-protect your `smp` server: - ```sh - docker run -d \ - -e "ADDR=your_ip_or_domain" \ - -e "PASS=password" \ - -p 5223:5223 \ - -v $HOME/simplex/smp/config:/etc/opt/simplex:z \ - -v $HOME/simplex/smp/logs:/var/opt/simplex:z \ - simplexchat/smp-server:latest - ``` - - - `xftp-server` - - You must change **your_ip_or_domain** and **maximum_storage**. 
- ```sh - docker run -d \ - -e "ADDR=your_ip_or_domain" \ - -e "QUOTA=maximum_storage" \ - -p 443:443 \ - -v $HOME/simplex/xftp/config:/etc/opt/simplex-xftp:z \ - -v $HOME/simplex/xftp/logs:/var/opt/simplex-xftp:z \ - -v $HOME/simplex/xftp/files:/srv/xftp:z \ - simplexchat/xftp-server:latest - ``` - -#### Using installation script - -**Please note** that currently, only Ubuntu distribution is supported. - -You can install and setup servers automatically using our script: - -```sh -curl --proto '=https' --tlsv1.2 -sSf https://raw.githubusercontent.com/simplex-chat/simplexmq/stable/install.sh -o simplex-server-install.sh &&\ -if echo '53fcdb4ceab324316e2c4cda7e84dbbb344f32550a65975a7895425e5a1be757 simplex-server-install.sh' | sha256sum -c; then - chmod +x ./simplex-server-install.sh - ./simplex-server-install.sh - rm ./simplex-server-install.sh -else - echo "SHA-256 checksum is incorrect!" - rm ./simplex-server-install.sh -fi -``` - -### Build from source - -#### Using Docker - -> **Please note:** to build the app use source code from [stable branch](https://github.com/simplex-chat/simplexmq/tree/stable). - -On Linux, you can build smp server using Docker. - -1. Build your images: - - ```sh - git clone https://github.com/simplex-chat/simplexmq - cd simplexmq - git checkout stable - DOCKER_BUILDKIT=1 docker build -t local/smp-server --build-arg APP="smp-server" --build-arg APP_PORT="5223" . # For xmp-server - DOCKER_BUILDKIT=1 docker build -t local/xftp-server --build-arg APP="xftp-server" --build-arg APP_PORT="443" . # For xftp-server - ``` - -2. Create directories for persistent Docker configuration: - - ```sh - mkdir -p $HOME/simplex/{xftp,smp}/{config,logs} && mkdir -p $HOME/simplex/xftp/files - ``` - -3. Run your Docker container. - - - `smp-server` - - You must change **your_ip_or_domain**. 
`-e "pass=password"` is optional variable to password-protect your `smp` server: - ```sh - docker run -d \ - -e "ADDR=your_ip_or_domain" \ - -e "PASS=password" \ - -p 5223:5223 \ - -v $HOME/simplex/smp/config:/etc/opt/simplex:z \ - -v $HOME/simplex/smp/logs:/var/opt/simplex:z \ - simplexchat/smp-server:latest - ``` - - - `xftp-server` - - You must change **your_ip_or_domain** and **maximum_storage**. - ```sh - docker run -d \ - -e "ADDR=your_ip_or_domain" \ - -e "QUOTA=maximum_storage" \ - -p 443:443 \ - -v $HOME/simplex/xftp/config:/etc/opt/simplex-xftp:z \ - -v $HOME/simplex/xftp/logs:/var/opt/simplex-xftp:z \ - -v $HOME/simplex/xftp/files:/srv/xftp:z \ - simplexchat/xftp-server:latest - ``` - -#### Using your distribution - -1. Install dependencies and build tools (`GHC`, `cabal` and dev libs): - - ```sh - # On Ubuntu. Depending on your distribution, use your package manager to determine package names. - sudo apt-get update && apt-get install -y build-essential curl libffi-dev libffi7 libgmp3-dev libgmp10 libncurses-dev libncurses5 libtinfo5 pkg-config zlib1g-dev libnuma-dev libssl-dev - export BOOTSTRAP_HASKELL_GHC_VERSION=9.6.3 - export BOOTSTRAP_HASKELL_CABAL_VERSION=3.10.3.0 - curl --proto '=https' --tlsv1.2 -sSf https://get-ghcup.haskell.org | BOOTSTRAP_HASKELL_NONINTERACTIVE=1 sh - ghcup set ghc "${BOOTSTRAP_HASKELL_GHC_VERSION}" - ghcup set cabal "${BOOTSTRAP_HASKELL_CABAL_VERSION}" - source ~/.ghcup/env - ``` - -2. Build the project: - - ```sh - git clone https://github.com/simplex-chat/simplexmq - cd simplexmq - git checkout stable - cabal update - cabal build exe:smp-server exe:xftp-server - ``` - -3. List compiled binaries: - - `smp-server` - ```sh - cabal list-bin exe:smp-server - ``` - - `xftp-server` - ```sh - cabal list-bin exe:xftp-server - ``` - -- Initialize SMP server with `smp-server init [-l] -n ` or `smp-server init [-l] --ip ` - depending on how you initialize it, either FQDN or IP will be used for server's address. 
- -- Run `smp-server start` to start SMP server, or you can configure a service manager to run it as a service. - -- Optionally, `smp-server` can be setup for having an onion address in `tor` network. See: [`scripts/tor`](./scripts/tor/). In this case, the server address can have both public and onion hostname pointing to the same server, to allow two people connect when only one of them is using Tor. The server address would be: `smp://@,` - -See [this section](#smp-server) for more information. Run `smp-server -h` and `smp-server init -h` for explanation of commands and options. - -[Linode](https://cloud.linode.com/stackscripts/748014) - -## Deploy SMP server on Linode - -\* You can use free credit Linode offers when [creating a new account](https://www.linode.com/) to deploy an SMP server. - -Deployment on Linode is performed via StackScripts, which serve as recipes for Linode instances, also called Linodes. To deploy SMP server on Linode: - -- Create a Linode account or login with an already existing one. -- Open [SMP server StackScript](https://cloud.linode.com/stackscripts/748014) and click "Deploy New Linode". -- You can optionally configure the following parameters: - - SMP Server store log flag for queue persistence on server restart, recommended. - - [Linode API token](https://www.linode.com/docs/guides/getting-started-with-the-linode-api#get-an-access-token) to attach server address etc. as tags to Linode and to add A record to your 2nd level domain (e.g. `example.com` [domain should be created](https://cloud.linode.com/domains/create) in your account prior to deployment). The API token access scopes: - - read/write for "linodes" - - read/write for "domains" - - Domain name to use instead of Linode IP address, e.g. `smp1.example.com`. -- Choose the region and plan, Shared CPU Nanode with 1Gb is sufficient. -- Provide ssh key to be able to connect to your Linode via ssh. 
If you haven't provided a Linode API token this step is required to login to your Linode and get the server's fingerprint either from the welcome message or from the file `/etc/opt/simplex/fingerprint` after server starts. See [Linode's guide on ssh](https://www.linode.com/docs/guides/use-public-key-authentication-with-ssh/) . -- Deploy your Linode. After it starts wait for SMP server to start and for tags to appear (if a Linode API token was provided). It may take up to 5 minutes depending on the connection speed on the Linode. Connecting Linode IP address to provided domain name may take some additional time. -- Get `address` and `fingerprint` either from Linode tags (click on a tag and copy it's value from the browser search panel) or via ssh. -- Great, your own SMP server is ready! If you provided FQDN use `smp://@` as SMP server address in the client, otherwise use `smp://@`. - -Please submit an [issue](https://github.com/simplex-chat/simplexmq/issues) if any problems occur. - -[DigitalOcean](https://marketplace.digitalocean.com/apps/simplex-server) - -## Deploy SMP server on DigitalOcean - -> 🚧 DigitalOcean snapshot is currently not up to date, it will soon be updated 🏗️ - -\* When creating a DigitalOcean account you can use [this link](https://try.digitalocean.com/freetrialoffer/) to get free credit. (You would still be required either to provide your credit card details or make a confirmation pre-payment with PayPal) - -To deploy SMP server use [SimpleX Server 1-click app](https://marketplace.digitalocean.com/apps/simplex-server) from DigitalOcean marketplace: - -- Create a DigitalOcean account or login with an already existing one. -- Click 'Create SimpleX server Droplet' button. -- Choose the region and plan according to your requirements (Basic plan should be sufficient). -- Finalize Droplet creation. -- Open "Console" on your Droplet management page to get SMP server fingerprint - either from the welcome message or from `/etc/opt/simplex/fingerprint`. 
Alternatively you can manually SSH to created Droplet, see [DigitalOcean instruction](https://docs.digitalocean.com/products/droplets/how-to/connect-with-ssh/). -- Great, your own SMP server is ready! Use `smp://@` as SMP server address in the client. - -Please submit an [issue](https://github.com/simplex-chat/simplexmq/issues) if any problems occur. - -> **Please note:** SMP server uses server address as a Common Name for server certificate generated during initialization. If you would like your server address to be FQDN instead of IP address, you can log in to your Droplet and run the commands below to re-initialize the server. Alternatively you can use [Linode StackScript](https://cloud.linode.com/stackscripts/748014) which allows this parameterization. - -```sh -smp-server delete -smp-server init [-l] -n -``` - -## SMP server design - -![SMP server design](./design/server.svg) - -## SMP agent design - -![SMP agent design](./design/agent2.svg) +See [docs/ROUTERS.md](./docs/ROUTERS.md) for Docker, binary installation, building from source, and cloud deployment (Linode, DigitalOcean). ## License diff --git a/docs/AGENT.md b/docs/AGENT.md new file mode 100644 index 000000000..128edc76b --- /dev/null +++ b/docs/AGENT.md @@ -0,0 +1,75 @@ +# SimpleX Agent + +The SimpleX Agent builds bidirectional encrypted connections on top of [SimpleX client libraries](CLIENT.md). It manages the full lifecycle of secure communication: connection establishment, end-to-end encryption, queue rotation, file transfer, and push notifications. + +This is **Layer 3** of the [SimpleX Network architecture](../protocol/overview-tjr.md). Layer 1 is the routers; Layer 2 is the [client libraries](CLIENT.md) that speak the wire protocols. The Agent adds the connection semantics that applications need. 
+ +**Source**: [`Simplex.Messaging.Agent`](../src/Simplex/Messaging/Agent.hs) + +## Connections + +The Agent turns unidirectional SMP queues into bidirectional connections: + +- **Duplex connections**: each connection uses a pair of SMP queues — one for each direction. The queues can be on different routers chosen independently by each party. +- **Connection establishment**: one party creates a connection and generates an invitation (containing router address, queue ID, and public keys). The invitation is passed out-of-band (QR code, link, etc.). The other party joins by creating a reverse queue and completing the handshake. +- **Connection links**: the Agent supports connection links (long and short) for sharing connection invitations via URLs. Short links use a separate SMP queue to store the full invitation, allowing compact QR codes. +- **Queue rotation**: the Agent periodically rotates the underlying SMP queues, limiting the window for metadata correlation. Rotation is transparent to the application — the connection identity is stable while the underlying queues change. +- **Redundant queues**: connections can use multiple queues for reliability. If one router becomes unreachable, messages flow through the remaining queues. + +## Encryption + +The Agent provides end-to-end encryption with forward secrecy and break-in recovery: + +- **Double ratchet**: messages are encrypted using a double ratchet protocol derived from the Signal protocol. Each message uses a unique key; compromising one key does not reveal past or future messages. +- **Post-quantum extensions**: the ratchet supports hybrid key exchange using SNTRUP761 (a lattice-based KEM) combined with X25519 DH. This provides protection against future quantum computers that could break classical DH. +- **Ratchet synchronization**: if the ratchet state becomes desynchronized (e.g., due to message loss or device restore), the Agent detects this and can negotiate resynchronization with the peer. 
+- **Per-queue encryption**: in addition to end-to-end encryption, each queue has a separate encryption layer between sender and router, preventing traffic correlation even if TLS is compromised. + +## File Transfer + +The Agent handles file transfer over [XFTP](../protocol/xftp.md) routers: + +- **Chunking**: files are split into chunks, each stored as a data packet on an XFTP router. Chunk sizes are fixed powers of 2 (64KB to 4MB), hiding the actual file size. +- **Client-side encryption**: files are encrypted and padded before upload. The recipient decrypts after downloading all chunks. The encryption key and file metadata are sent through the SMP connection, not through XFTP. +- **Multi-router distribution**: chunks can be uploaded to different XFTP routers, and each chunk can have multiple replicas on different routers for redundancy. +- **Redirect chains**: for metadata privacy, file descriptors can be stored as XFTP data packets themselves, creating an indirection layer between the SMP message and the actual file location. + +## Notifications + +The Agent manages push notification subscriptions for mobile devices: + +- **Token registration**: registers device push tokens with NTF (notification) routers, which bridge to platform push services (APNS). +- **Notification subscriptions**: creates NTF subscriptions for SMP queues so that incoming messages trigger push notifications without requiring persistent connections. +- **Privacy preservation**: push notifications contain only a notification ID, not message content. The device wakes, connects to the SMP router, and retrieves the actual message. + +## Integration + +The Agent is designed to be embedded as a Haskell library: + +- **STM queues**: the application communicates with the Agent via STM queues. Commands go in (`ACommand`), events come out (`AEvent`). No serialization or parsing — direct Haskell values. +- **Async operation**: all network operations are asynchronous. 
The Agent manages internal worker threads for each router connection, message processing, and background tasks (cleanup, statistics, notification supervision). +- **Background mode**: on mobile platforms, the Agent can run in a reduced mode with only the message receiver active, minimizing resource usage when the app is backgrounded. +- **Dual database backends**: the Agent supports both SQLite (for mobile/desktop) and PostgreSQL (for server deployments) as persistence backends, selected at compile time. + +## Use cases + +- **Chat applications**: [SimpleX Chat](https://github.com/simplex-chat/simplex-chat) is the reference application, using the full Agent API for messaging, file sharing, groups, and calls. +- **Bots and automated services**: services that need bidirectional encrypted communication with SimpleX Chat users or other Agent-based applications. +- **Any application needing secure bidirectional communication** over the SimpleX Network without implementing the connection management, encryption, and queue rotation logic directly. 
+ +## What this layer adds over client libraries + +| Capability | Client (Layer 2) | Agent (Layer 3) | +|---|---|---| +| Queue operations | Direct | Managed transparently | +| Connection model | Unidirectional queues | Bidirectional connections | +| Encryption | Application's responsibility | Double ratchet with PQ extensions | +| File transfer | Raw data packet upload/download | Chunking, encryption, reassembly | +| Identity | Per-queue keys | Per-connection, rotatable | +| Notifications | Not available | NTF router integration | + +## Protocol references + +- [Agent Protocol](../protocol/agent-protocol.md) — duplex connection procedure, message format +- [SimpleX Network overview](../protocol/overview-tjr.md) — architecture, trust model +- [PQDR](../protocol/pqdr.md) — post-quantum double ratchet specification diff --git a/docs/CLIENT.md b/docs/CLIENT.md new file mode 100644 index 000000000..6e2ca0c53 --- /dev/null +++ b/docs/CLIENT.md @@ -0,0 +1,75 @@ +# SimpleX Client Libraries + +SimpleX client libraries provide low-level protocol access to SimpleX routers. They implement the wire protocols ([SMP](../protocol/simplex-messaging.md), [XFTP](../protocol/xftp.md)) and handle connection lifecycle, but leave encryption, identity management, and connection orchestration to the application. + +This is **Layer 2** of the [SimpleX Network architecture](../protocol/overview-tjr.md). Layer 1 is the routers themselves; Layer 3 is the [Agent](AGENT.md), which builds bidirectional encrypted connections on top of these libraries. + +## SMP Client + +**Source**: [`Simplex.Messaging.Client`](../src/Simplex/Messaging/Client.hs) + +The SMP client connects to SMP routers and manages messaging queues — the fundamental addressing primitive of the SimpleX Network. Each queue is a unidirectional, ordered sequence of fixed-size packets (16,384 bytes) with separate cryptographic credentials for sending and receiving. 
+ +### Capabilities + +- **Queue management**: create, secure, subscribe to, and delete queues on any SMP router +- **Message sending and receiving**: send messages to a queue's sender address; receive messages from a queue's recipient address +- **Command authentication**: each queue operation is authenticated with per-queue cryptographic keys (Ed25519, Ed448, or X25519) +- **Keep-alive**: automatic ping loop detects and recovers from half-open connections +- **Proxy forwarding**: send messages through a proxy router via 2-hop onion routing (PRXY/PFWD/RFWD commands), protecting the sender's IP address from the destination router +- **Batched commands**: multiple commands can be sent in a single transmission for efficiency + +### API model + +The client uses a functional Haskell API with STM queues for asynchronous event delivery: + +- **Commands** are sent via `sendProtocolCommand` (single) or `sendBatch` (multiple). Each returns a result synchronously or via timeout. +- **Router events** (incoming messages, subscription notifications) arrive on `msgQ`, an STM `TBQueue` that the application reads from its own thread. +- **Connection lifecycle** is managed automatically: the client maintains send, receive, process, and monitor threads internally. When any thread fails, all are torn down and the `disconnected` callback fires. + +### Router identity + +Routers are identified by the SHA-256 hash of their CA certificate fingerprint, not by hostname. The client validates the full X.509 certificate chain on every TLS connection and compares the CA fingerprint against the expected hash from the queue address. This means a DNS or IP-level attacker who cannot produce the correct certificate is detected at connection time. + +## XFTP Client + +**Source**: [`Simplex.FileTransfer.Client`](../src/Simplex/FileTransfer/Client.hs) + +The XFTP client connects to XFTP routers and manages data packets — individually addressed blocks used for larger payload delivery. 
Data packets come in fixed sizes (64KB, 256KB, 1MB, 4MB), hiding the actual payload size. + +### Capabilities + +- **Data packet creation**: create data packets on routers with sender, recipient, and optional additional recipient credentials +- **Upload**: send encrypted data in a single HTTP/2 streaming request (command + body) +- **Download**: retrieve data packets with per-download ephemeral Diffie-Hellman key exchange, providing forward secrecy — compromising one download key does not reveal other downloads +- **Acknowledgment and deletion**: recipients acknowledge receipt; senders delete data packets after delivery + +### Size selection + +`prepareChunkSizes` selects data packet sizes using a threshold algorithm: if the remaining payload exceeds 75% of the next larger size, it uses the larger size. This balances storage efficiency against the number of round trips. Single-chunk payloads (e.g., redirect descriptors) can use `singleChunkSize` to verify they fit in one data packet. + +## Use cases + +These libraries are appropriate when the application manages its own encryption and connection logic: + +- **IoT sensor data collection**: a sensor creates an SMP queue and sends readings; a collector subscribes and receives them. The queue address (router + queue ID + keys) is provisioned once, out-of-band. +- **Device control**: a controller sends commands to an actuator's queue. Separate queues for commands and telemetry provide unidirectional isolation. +- **Bulk data delivery**: an application encrypts and chunks a file, uploads data packets to XFTP routers, and shares the packet addresses with the recipient out-of-band. +- **Custom protocols**: any application that needs unidirectional, router-mediated packet delivery without the overhead of the Agent's connection management. 
+ +## What this layer does NOT provide + +The following capabilities require the [Agent](AGENT.md) (Layer 3): + +- **Bidirectional connections** — the Agent pairs two unidirectional queues into a duplex connection +- **End-to-end encryption** — the Agent manages double ratchet with post-quantum extensions +- **File transfer** — the Agent handles chunking, encryption, padding, multi-router upload, and reassembly +- **Queue rotation** — the Agent transparently rotates queues to limit metadata correlation +- **Connection discovery** — connection links, short links, and contact addresses are Agent-level abstractions +- **Push notifications** — notification token management and subscription is Agent-level + +## Protocol references + +- [SimpleX Messaging Protocol](../protocol/simplex-messaging.md) — SMP wire format, commands, and security properties +- [XFTP Protocol](../protocol/xftp.md) — XFTP wire format, data packet lifecycle +- [SimpleX Network overview](../protocol/overview-tjr.md) — architecture, trust model, and design rationale diff --git a/docs/ROUTERS.md b/docs/ROUTERS.md new file mode 100644 index 000000000..3169d480e --- /dev/null +++ b/docs/ROUTERS.md @@ -0,0 +1,194 @@ +# SimpleX Routers — Deployment and Configuration + +SimpleX routers are the network infrastructure of the [SimpleX Network](../protocol/overview-tjr.md). They accept, buffer, and deliver data packets between endpoints. Each router operates independently and can be run by any party on standard computing hardware. + +This document covers deployment and advanced configuration. For an overview of the router architecture and trust model, see the [SimpleX Network overview](../protocol/overview-tjr.md). + +## SMP Router + +The SMP router provides messaging queues — unidirectional, ordered sequences of fixed-size packets (16,384 bytes each). It implements the [SimpleX Messaging Protocol](../protocol/simplex-messaging.md). 
+ +### Advanced configuration + +`smp-server.ini` is created during initialization and controls all runtime behavior. + +**Message persistence**: when store log is enabled (`enable: on`), the server saves undelivered messages on exit and restores them on start. This only works with SIGINT (keyboard interrupt); SIGTERM does not trigger message saving. The `restore_messages` setting can be used to override this behavior independently of the store log setting. + +**Tor onion addresses**: the server can have both a public hostname and an onion hostname, allowing two users to connect when only one is using Tor. Configure as: `smp://@,`. See [`scripts/tor/`](../scripts/tor/) for setup instructions. + +### Running on MacOS + +SMP server requires OpenSSL for initialization. MacOS may ship LibreSSL instead, which doesn't support the required algorithms. + +```sh +openssl version +``` + +If it says "LibreSSL", install OpenSSL: + +```sh +brew update +brew install openssl +echo 'PATH="/opt/homebrew/opt/openssl@3/bin:$PATH"' >> ~/.zprofile +. ~/.zprofile +``` + +## XFTP Router + +The XFTP router provides data packet storage — individually addressed blocks in fixed sizes (64KB, 256KB, 1MB, 4MB). It implements the [XFTP protocol](../protocol/xftp.md). Data packets are used for larger payload delivery (files, media) where SMP queue packet sizes would be inefficient. + +Initialize with `xftp-server init` and configure storage quota in `xftp-server.ini`. + +## NTF Router + +The NTF router bridges SimpleX Network to platform push notification services (APNS). It implements the [Push Notifications protocol](../protocol/push-notifications.md). Mobile clients register push tokens with the NTF router, which subscribes to their SMP queues and sends push notifications when messages arrive. The push notification contains only a notification ID, not message content. + +Initialize with `ntf-server init` and configure APNS credentials in `ntf-server.ini`. 
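
The OpenSSL check from "Running on MacOS" above can also be scripted before initialization. This is a minimal sketch: `is_libressl` is a hypothetical helper name, and it assumes that `openssl version` prints a line beginning with `LibreSSL` when LibreSSL is installed.

```sh
#!/bin/sh
# Hypothetical helper: succeed when a version string comes from LibreSSL.
is_libressl() {
  case "$1" in
    LibreSSL*) return 0 ;;
    *) return 1 ;;
  esac
}

# Warn before initialization if the openssl found on PATH is LibreSSL.
if is_libressl "$(openssl version 2>/dev/null)"; then
  echo "LibreSSL detected: install OpenSSL before initializing the router"
fi
```

The helper takes the version string as an argument so that the check can be exercised without changing the local installation.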
+ +## Deployment methods + +All routers require `openssl` as a runtime dependency for certificate generation during initialization: + +```sh +# Ubuntu +apt update && apt install openssl +``` + +### Docker (prebuilt images) + +Prebuilt images are available from [Docker Hub](https://hub.docker.com/r/simplexchat). + +1. Create directories for persistent configuration: + + ```sh + mkdir -p $HOME/simplex/{xftp,smp}/{config,logs} && mkdir -p $HOME/simplex/xftp/files + ``` + +2. Run: + + **SMP router** — change `your_ip_or_domain`; `-e "PASS=password"` is optional: + ```sh + docker run -d \ + -e "ADDR=your_ip_or_domain" \ + -e "PASS=password" \ + -p 5223:5223 \ + -v $HOME/simplex/smp/config:/etc/opt/simplex:z \ + -v $HOME/simplex/smp/logs:/var/opt/simplex:z \ + simplexchat/smp-server:latest + ``` + + **XFTP router** — change `your_ip_or_domain` and `maximum_storage`: + ```sh + docker run -d \ + -e "ADDR=your_ip_or_domain" \ + -e "QUOTA=maximum_storage" \ + -p 443:443 \ + -v $HOME/simplex/xftp/config:/etc/opt/simplex-xftp:z \ + -v $HOME/simplex/xftp/logs:/var/opt/simplex-xftp:z \ + -v $HOME/simplex/xftp/files:/srv/xftp:z \ + simplexchat/xftp-server:latest + ``` + +### Installation script (Ubuntu) + +```sh +curl --proto '=https' --tlsv1.2 -sSf https://raw.githubusercontent.com/simplex-chat/simplexmq/stable/install.sh -o simplex-server-install.sh &&\ +if echo '53fcdb4ceab324316e2c4cda7e84dbbb344f32550a65975a7895425e5a1be757 simplex-server-install.sh' | sha256sum -c; then + chmod +x ./simplex-server-install.sh + ./simplex-server-install.sh + rm ./simplex-server-install.sh +else + echo "SHA-256 checksum is incorrect!" 
+ rm ./simplex-server-install.sh +fi +``` + +### Build from source + +#### Using Docker + +Build from the [stable branch](https://github.com/simplex-chat/simplexmq/tree/stable): + +```sh +git clone https://github.com/simplex-chat/simplexmq +cd simplexmq +git checkout stable +DOCKER_BUILDKIT=1 docker build -t local/smp-server --build-arg APP="smp-server" --build-arg APP_PORT="5223" . +DOCKER_BUILDKIT=1 docker build -t local/xftp-server --build-arg APP="xftp-server" --build-arg APP_PORT="443" . +``` + +Then run with the same Docker commands as above, replacing `simplexchat/smp-server:latest` with `local/smp-server` (and similarly for XFTP). + +#### Native build + +1. Install dependencies: + + ```sh + # Ubuntu + sudo apt-get update && sudo apt-get install -y build-essential curl libffi-dev libffi7 libgmp3-dev libgmp10 libncurses-dev libncurses5 libtinfo5 pkg-config zlib1g-dev libnuma-dev libssl-dev + export BOOTSTRAP_HASKELL_GHC_VERSION=9.6.3 + export BOOTSTRAP_HASKELL_CABAL_VERSION=3.10.3.0 + curl --proto '=https' --tlsv1.2 -sSf https://get-ghcup.haskell.org | BOOTSTRAP_HASKELL_NONINTERACTIVE=1 sh + ghcup set ghc "${BOOTSTRAP_HASKELL_GHC_VERSION}" + ghcup set cabal "${BOOTSTRAP_HASKELL_CABAL_VERSION}" + source ~/.ghcup/env + ``` + +2. Build: + + ```sh + git clone https://github.com/simplex-chat/simplexmq + cd simplexmq + git checkout stable + cabal update + cabal build exe:smp-server exe:xftp-server + ``` + +3. Find binaries: + + ```sh + cabal list-bin exe:smp-server + cabal list-bin exe:xftp-server + ``` + +4. Initialize and run: + + ```sh + smp-server init [-l] -n # or --ip + smp-server start + ``` + +### Linode StackScript + +[Deploy via Linode StackScript](https://cloud.linode.com/stackscripts/748014) — Shared CPU Nanode with 1GB is sufficient.
+ +Configuration options: +- SMP Server store log flag for queue persistence (recommended) +- [Linode API token](https://www.linode.com/docs/guides/getting-started-with-the-linode-api#get-an-access-token) for automatic DNS and tagging (scopes: read/write for "linodes" and "domains") +- Domain name (e.g., `smp1.example.com`) — the [domain must exist](https://cloud.linode.com/domains/create) in your Linode account + +After deployment (up to 5 minutes), get the server address from Linode tags or SSH: `smp://@`. + +### DigitalOcean 1-click + +[SimpleX Server 1-click app](https://marketplace.digitalocean.com/apps/simplex-server) from DigitalOcean marketplace. + +After deployment, get the fingerprint from the Droplet console (`/etc/opt/simplex/fingerprint`). Server address: `smp://@`. + +To use FQDN instead of IP: + +```sh +smp-server delete +smp-server init [-l] -n +``` + +## Monitoring + +SMP and XFTP routers expose Prometheus metrics via a control port. The control port also supports commands for runtime inspection (queue counts, client counts, statistics). See module specs for details on available metrics and control commands. 
+ +## Protocol references + +- [SimpleX Messaging Protocol](../protocol/simplex-messaging.md) — SMP wire format and security properties +- [XFTP Protocol](../protocol/xftp.md) — data packet protocol +- [Push Notifications Protocol](../protocol/push-notifications.md) — NTF protocol +- [SimpleX Network overview](../protocol/overview-tjr.md) — architecture and trust model From 13e8b7b41189a758921464cca06cb9a4b47a6ddc Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Fri, 13 Mar 2026 22:48:09 +0000 Subject: [PATCH 30/61] links --- README.md | 26 ++++++++++----------- docs/AGENT.md | 62 +++++++++++++++++++++++++++++++------------------ docs/CLIENT.md | 42 +++++++++++++++++++-------------- docs/ROUTERS.md | 35 ++++++++++++++++++++++++---- 4 files changed, 106 insertions(+), 59 deletions(-) diff --git a/README.md b/README.md index 8d6333f3a..4e3327c18 100644 --- a/README.md +++ b/README.md @@ -1,18 +1,16 @@ -# SimpleXMQ +# SimpleX Network [![GitHub build](https://github.com/simplex-chat/simplexmq/actions/workflows/build.yml/badge.svg)](https://github.com/simplex-chat/simplexmq/actions/workflows/build.yml) [![GitHub release](https://img.shields.io/github/v/release/simplex-chat/simplexmq)](https://github.com/simplex-chat/simplexmq/releases) -## SimpleX Network software - -SimpleXMQ provides the software for [SimpleX Network](./protocol/overview-tjr.md) — a general-purpose packet routing network where endpoints exchange data through independently operated routers using resource-based addressing. Unlike IP networks, SimpleX addresses identify resources on routers (queues, data packets), not endpoint devices. Participants do not need globally unique identifiers to communicate. +The simplexmq package provides the software for [SimpleX Network](./protocol/overview-tjr.md) — a general-purpose packet routing network where endpoints exchange data through independently operated routers using resource-based addressing. 
Unlike IP networks, SimpleX addresses identify resources on routers (queues, data packets), not endpoint devices. Participants do not need globally unique identifiers to communicate. The software is organized in three layers: ``` Application (e.g. SimpleX Chat) +----------------------------------+ -| SimpleX Agent | Layer 3 — bidirectional connections, e2e encryption +| SimpleX Agent | Layer 3 — duplex connections, e2e encryption +----------------------------------+ | SimpleX Client Libraries | Layer 2 — protocol clients for SMP, XFTP +----------------------------------+ @@ -22,9 +20,9 @@ The software is organized in three layers: [SimpleX Chat](https://github.com/simplex-chat/simplex-chat) is one application built on Layer 3. IoT devices, AI services, monitoring systems, and automated services are other applications that can use Layers 2 or 3 directly. -SimpleXMQ is implemented in Haskell, benefiting from robust software transactional memory (STM) and concurrency primitives. +The simplexmq package is implemented in Haskell, benefiting from robust software transactional memory (STM) and concurrency primitives. -See the [SimpleX Network overview](./protocol/overview-tjr.md) for the full protocol architecture, trust model, and security analysis. +See the [SimpleX Network overview](./protocol/overview-tjr.md) for the full protocol architecture, trust model, and [security analysis](./protocol/security.md). ## Architecture @@ -32,9 +30,9 @@ See the [SimpleX Network overview](./protocol/overview-tjr.md) for the full prot Routers are the network infrastructure — they accept, buffer, and deliver packets. Three router types serve different purposes: -- **SMP routers** provide messaging queues — unidirectional, ordered sequences of fixed-size packets (16,384 bytes). Protocol: [SMP](./protocol/simplex-messaging.md). -- **XFTP routers** provide data packet storage — individually addressed blocks in fixed sizes (64KB–4MB) for larger payloads. 
Protocol: [XFTP](./protocol/xftp.md). -- **NTF routers** bridge to platform push services (APNS) for mobile notification delivery. Protocol: [Push Notifications](./protocol/push-notifications.md). +- **SMP routers** provide messaging queues — unidirectional, ordered sequences of fixed-size packets (16,384 bytes). Protocol: [SMP](./protocol/simplex-messaging.md). Module spec: [`Simplex.Messaging.Server`](./spec/modules/Simplex/Messaging/Server.md). +- **XFTP routers** accept and deliver data packets — individually addressed blocks in fixed sizes (64KB–4MB) for larger payloads. Protocol: [XFTP](./protocol/xftp.md). Module spec: [`Simplex.FileTransfer.Server`](./spec/modules/Simplex/FileTransfer/Server.md). +- **NTF routers** bridge to platform push services (APNS) for mobile notification delivery. Protocol: [Push Notifications](./protocol/push-notifications.md). Module spec: [`Simplex.Messaging.Notifications.Server`](./spec/modules/Simplex/Messaging/Notifications/Server.md). #### Running an SMP router @@ -54,21 +52,21 @@ See [docs/ROUTERS.md](./docs/ROUTERS.md) for XFTP/NTF router setup, advanced con [Client libraries](./docs/CLIENT.md) provide low-level protocol access to SimpleX routers. They implement the wire protocols (SMP, XFTP) and handle connection lifecycle, command authentication, and keep-alive. -The [SMP client](./src/Simplex/Messaging/Client.hs) offers a functional Haskell API with STM queues for asynchronous event delivery. The [XFTP client](./src/Simplex/FileTransfer/Client.hs) handles data packet upload/download with per-download forward secrecy. +The [SMP client](./src/Simplex/Messaging/Client.hs) ([module spec](./spec/modules/Simplex/Messaging/Client.md)) offers a functional Haskell API with STM queues for asynchronous event delivery. The [XFTP client](./src/Simplex/FileTransfer/Client.hs) ([module spec](./spec/modules/Simplex/FileTransfer/Client.md)) sends and receives data packets with per-request forward secrecy. 
Applications that manage their own encryption and connection logic — IoT devices, sensors, simple data pipelines — can use this layer directly. See [docs/CLIENT.md](./docs/CLIENT.md). ### SimpleX Agent -The [Agent](./docs/AGENT.md) builds bidirectional encrypted connections on top of the client libraries. It manages: +The [Agent](./docs/AGENT.md) builds duplex encrypted connections on top of the client libraries. It manages: -- Duplex connections from unidirectional queue pairs +- Duplex connections from simplex queue pairs - End-to-end encryption with double ratchet and post-quantum extensions - File transfer with chunking, encryption, and multi-router distribution - Queue rotation for metadata privacy - Push notification subscriptions -The [Agent library](./src/Simplex/Messaging/Agent.hs) communicates via STM queues using the [ACommand](./src/Simplex/Messaging/Agent/Protocol.hs) type — no serialization needed. +The [Agent library](./src/Simplex/Messaging/Agent.hs) ([module spec](./spec/modules/Simplex/Messaging/Agent.md)) communicates via STM queues using the [ACommand](./src/Simplex/Messaging/Agent/Protocol.hs) type — no serialization needed. The Agent implements the [Agent protocol](./protocol/agent-protocol.md) for duplex connections and uses the [PQDR protocol](./protocol/pqdr.md) for end-to-end encryption. Cross-device remote control uses the [XRCP protocol](./protocol/xrcp.md). See [docs/AGENT.md](./docs/AGENT.md). diff --git a/docs/AGENT.md b/docs/AGENT.md index 128edc76b..8079736e8 100644 --- a/docs/AGENT.md +++ b/docs/AGENT.md @@ -1,16 +1,16 @@ # SimpleX Agent -The SimpleX Agent builds bidirectional encrypted connections on top of [SimpleX client libraries](CLIENT.md). It manages the full lifecycle of secure communication: connection establishment, end-to-end encryption, queue rotation, file transfer, and push notifications. +The SimpleX Agent builds duplex encrypted connections on top of [SimpleX client libraries](CLIENT.md). 
It manages the full lifecycle of secure communication: connection establishment, end-to-end encryption, queue rotation, file transfer, and push notifications. This is **Layer 3** of the [SimpleX Network architecture](../protocol/overview-tjr.md). Layer 1 is the routers; Layer 2 is the [client libraries](CLIENT.md) that speak the wire protocols. The Agent adds the connection semantics that applications need. -**Source**: [`Simplex.Messaging.Agent`](../src/Simplex/Messaging/Agent.hs) +**Source**: [`Simplex.Messaging.Agent`](../src/Simplex/Messaging/Agent.hs) — **Module spec**: [`spec/modules/Simplex/Messaging/Agent.md`](../spec/modules/Simplex/Messaging/Agent.md) ## Connections -The Agent turns unidirectional SMP queues into bidirectional connections: +The Agent turns simplex (unidirectional) SMP queues into duplex connections, implementing the [Agent protocol](../protocol/agent-protocol.md): -- **Duplex connections**: each connection uses a pair of SMP queues — one for each direction. The queues can be on different routers chosen independently by each party. +- **Duplex connections**: each connection uses a pair of SMP queues — one for each direction. The queues can be on different routers chosen independently by each party. See the [duplex connection procedure](../protocol/agent-protocol.md) for the full handshake. - **Connection establishment**: one party creates a connection and generates an invitation (containing router address, queue ID, and public keys). The invitation is passed out-of-band (QR code, link, etc.). The other party joins by creating a reverse queue and completing the handshake. - **Connection links**: the Agent supports connection links (long and short) for sharing connection invitations via URLs. Short links use a separate SMP queue to store the full invitation, allowing compact QR codes. - **Queue rotation**: the Agent periodically rotates the underlying SMP queues, limiting the window for metadata correlation. 
Rotation is transparent to the application — the connection identity is stable while the underlying queues change. @@ -18,53 +18,53 @@ The Agent turns unidirectional SMP queues into bidirectional connections: ## Encryption -The Agent provides end-to-end encryption with forward secrecy and break-in recovery: +The Agent provides end-to-end encryption with forward secrecy and break-in recovery, specified in the [Post-Quantum Double Ratchet protocol](../protocol/pqdr.md): -- **Double ratchet**: messages are encrypted using a double ratchet protocol derived from the Signal protocol. Each message uses a unique key; compromising one key does not reveal past or future messages. -- **Post-quantum extensions**: the ratchet supports hybrid key exchange using SNTRUP761 (a lattice-based KEM) combined with X25519 DH. This provides protection against future quantum computers that could break classical DH. +- **Double ratchet**: messages are encrypted using a double ratchet protocol derived from the Signal protocol. Each message uses a unique key; compromising one key does not reveal past or future messages. See the [PQDR specification](../protocol/pqdr.md) for the full ratchet state machine. +- **Post-quantum extensions**: the ratchet supports hybrid key exchange using SNTRUP761 (a lattice-based KEM) combined with X25519 DH. This provides protection against future quantum computers that could break classical DH. See the [SNTRUP761 module spec](../spec/modules/Simplex/Messaging/Crypto/SNTRUP761.md) and [Ratchet module spec](../spec/modules/Simplex/Messaging/Crypto/Ratchet.md) for implementation details. - **Ratchet synchronization**: if the ratchet state becomes desynchronized (e.g., due to message loss or device restore), the Agent detects this and can negotiate resynchronization with the peer. 
-- **Per-queue encryption**: in addition to end-to-end encryption, each queue has a separate encryption layer between sender and router, preventing traffic correlation even if TLS is compromised. +- **Per-queue encryption**: in addition to end-to-end encryption, each queue has a separate encryption layer between sender and router, preventing traffic correlation even if TLS is compromised. See the [SMP protocol security model](../protocol/simplex-messaging.md). ## File Transfer -The Agent handles file transfer over [XFTP](../protocol/xftp.md) routers: +The Agent handles file transfer over [XFTP](../protocol/xftp.md) routers. File transfer orchestration is implemented in the [XFTP Agent module](../spec/modules/Simplex/FileTransfer/Agent.md): -- **Chunking**: files are split into chunks, each stored as a data packet on an XFTP router. Chunk sizes are fixed powers of 2 (64KB to 4MB), hiding the actual file size. -- **Client-side encryption**: files are encrypted and padded before upload. The recipient decrypts after downloading all chunks. The encryption key and file metadata are sent through the SMP connection, not through XFTP. -- **Multi-router distribution**: chunks can be uploaded to different XFTP routers, and each chunk can have multiple replicas on different routers for redundancy. -- **Redirect chains**: for metadata privacy, file descriptors can be stored as XFTP data packets themselves, creating an indirection layer between the SMP message and the actual file location. +- **Chunking**: files are split into chunks, each sent as a data packet to an XFTP router. Chunk sizes are fixed powers of 2 (64KB to 4MB), hiding the actual file size. See the [file description module spec](../spec/modules/Simplex/FileTransfer/Description.md) for chunk size selection and file descriptor format. +- **Client-side encryption**: files are encrypted and padded before being sent to XFTP routers. The recipient decrypts after receiving all chunks. 
The encryption key and file metadata are sent through the SMP connection, not through XFTP. See [file crypto module spec](../spec/modules/Simplex/FileTransfer/Crypto.md). +- **Multi-router distribution**: chunks can be sent to different XFTP routers, and each chunk can have multiple replicas on different routers for redundancy. +- **Redirect chains**: for metadata privacy, file descriptors can be sent as XFTP data packets themselves, creating an indirection layer between the SMP message and the actual file location. ## Notifications -The Agent manages push notification subscriptions for mobile devices: +The Agent manages push notification subscriptions for mobile devices, using the [Push Notifications protocol](../protocol/push-notifications.md). Notification supervision is handled by the [NtfSubSupervisor](../spec/modules/Simplex/Messaging/Agent/NtfSubSupervisor.md): -- **Token registration**: registers device push tokens with NTF (notification) routers, which bridge to platform push services (APNS). +- **Token registration**: registers device push tokens with NTF (notification) routers, which bridge to platform push services (APNS). See the [NTF client module spec](../spec/modules/Simplex/Messaging/Notifications/Client.md). - **Notification subscriptions**: creates NTF subscriptions for SMP queues so that incoming messages trigger push notifications without requiring persistent connections. -- **Privacy preservation**: push notifications contain only a notification ID, not message content. The device wakes, connects to the SMP router, and retrieves the actual message. +- **Privacy preservation**: push notifications contain only a notification ID, not message content. The device wakes, connects to the SMP router, and retrieves the actual message. See the [Push Notifications protocol](../protocol/push-notifications.md) for the full flow. 
## Integration The Agent is designed to be embedded as a Haskell library: -- **STM queues**: the application communicates with the Agent via STM queues. Commands go in (`ACommand`), events come out (`AEvent`). No serialization or parsing — direct Haskell values. -- **Async operation**: all network operations are asynchronous. The Agent manages internal worker threads for each router connection, message processing, and background tasks (cleanup, statistics, notification supervision). +- **STM queues**: the application communicates with the Agent via STM queues. Commands go in (`ACommand`), events come out (`AEvent`). No serialization or parsing — direct Haskell values. The command/event types are defined in the [Agent Protocol module](../spec/modules/Simplex/Messaging/Agent/Protocol.md). +- **Async operation**: all network operations are asynchronous. The Agent manages internal worker threads for each router connection, message processing, and background tasks (cleanup, statistics, notification supervision). See the [Agent Client module spec](../spec/modules/Simplex/Messaging/Agent/Client.md) for worker architecture. - **Background mode**: on mobile platforms, the Agent can run in a reduced mode with only the message receiver active, minimizing resource usage when the app is backgrounded. -- **Dual database backends**: the Agent supports both SQLite (for mobile/desktop) and PostgreSQL (for server deployments) as persistence backends, selected at compile time. +- **Dual database backends**: the Agent supports both SQLite (for mobile/desktop) and PostgreSQL (for server deployments) as persistence backends, selected at compile time. See [Agent Store Interface](../spec/modules/Simplex/Messaging/Agent/Store/Interface.md) and [Agent Store Postgres](../spec/modules/Simplex/Messaging/Agent/Store/Postgres.md). 
## Use cases - **Chat applications**: [SimpleX Chat](https://github.com/simplex-chat/simplex-chat) is the reference application, using the full Agent API for messaging, file sharing, groups, and calls. -- **Bots and automated services**: services that need bidirectional encrypted communication with SimpleX Chat users or other Agent-based applications. -- **Any application needing secure bidirectional communication** over the SimpleX Network without implementing the connection management, encryption, and queue rotation logic directly. +- **Bots and automated services**: services that need duplex encrypted communication with SimpleX Chat users or other Agent-based applications. +- **Any application needing secure duplex communication** over the SimpleX Network without implementing the connection management, encryption, and queue rotation logic directly. ## What this layer adds over client libraries | Capability | Client (Layer 2) | Agent (Layer 3) | |---|---|---| | Queue operations | Direct | Managed transparently | -| Connection model | Unidirectional queues | Bidirectional connections | +| Connection model | Simplex (unidirectional) queues | Duplex connections | | Encryption | Application's responsibility | Double ratchet with PQ extensions | -| File transfer | Raw data packet upload/download | Chunking, encryption, reassembly | +| File transfer | Raw data packet send/receive | Chunking, encryption, reassembly | | Identity | Per-queue keys | Per-connection, rotatable | | Notifications | Not available | NTF router integration | @@ -73,3 +73,19 @@ The Agent is designed to be embedded as a Haskell library: - [Agent Protocol](../protocol/agent-protocol.md) — duplex connection procedure, message format - [SimpleX Network overview](../protocol/overview-tjr.md) — architecture, trust model - [PQDR](../protocol/pqdr.md) — post-quantum double ratchet specification +- [SimpleX Messaging Protocol](../protocol/simplex-messaging.md) — SMP queue operations used by the Agent +- 
[XFTP Protocol](../protocol/xftp.md) — data packet operations for file transfer +- [Push Notifications Protocol](../protocol/push-notifications.md) — NTF token and subscription management +- [XRCP Protocol](../protocol/xrcp.md) — remote control protocol for cross-device Agent access + +## Module specs + +- [Agent](../spec/modules/Simplex/Messaging/Agent.md) — main Agent module, connection lifecycle, message processing +- [Agent Client](../spec/modules/Simplex/Messaging/Agent/Client.md) — worker threads, router connections, subscription management +- [Agent Protocol](../spec/modules/Simplex/Messaging/Agent/Protocol.md) — ACommand/AEvent types, connection invitations +- [Agent Store Interface](../spec/modules/Simplex/Messaging/Agent/Store/Interface.md) — database abstraction for SQLite/Postgres +- [Agent Store (AgentStore)](../spec/modules/Simplex/Messaging/Agent/Store/AgentStore.md) — connection, queue, and message persistence +- [NtfSubSupervisor](../spec/modules/Simplex/Messaging/Agent/NtfSubSupervisor.md) — notification subscription management +- [XFTP Agent](../spec/modules/Simplex/FileTransfer/Agent.md) — file transfer orchestration +- [Ratchet](../spec/modules/Simplex/Messaging/Crypto/Ratchet.md) — double ratchet implementation +- [SNTRUP761](../spec/modules/Simplex/Messaging/Crypto/SNTRUP761.md) — post-quantum KEM diff --git a/docs/CLIENT.md b/docs/CLIENT.md index 6e2ca0c53..8534e4f1a 100644 --- a/docs/CLIENT.md +++ b/docs/CLIENT.md @@ -2,21 +2,21 @@ SimpleX client libraries provide low-level protocol access to SimpleX routers. They implement the wire protocols ([SMP](../protocol/simplex-messaging.md), [XFTP](../protocol/xftp.md)) and handle connection lifecycle, but leave encryption, identity management, and connection orchestration to the application. -This is **Layer 2** of the [SimpleX Network architecture](../protocol/overview-tjr.md). 
Layer 1 is the routers themselves; Layer 3 is the [Agent](AGENT.md), which builds bidirectional encrypted connections on top of these libraries. +This is **Layer 2** of the [SimpleX Network architecture](../protocol/overview-tjr.md). Layer 1 is the routers themselves; Layer 3 is the [Agent](AGENT.md), which builds duplex encrypted connections on top of these libraries. ## SMP Client -**Source**: [`Simplex.Messaging.Client`](../src/Simplex/Messaging/Client.hs) +**Source**: [`Simplex.Messaging.Client`](../src/Simplex/Messaging/Client.hs) — **Module spec**: [`spec/modules/Simplex/Messaging/Client.md`](../spec/modules/Simplex/Messaging/Client.md) -The SMP client connects to SMP routers and manages messaging queues — the fundamental addressing primitive of the SimpleX Network. Each queue is a unidirectional, ordered sequence of fixed-size packets (16,384 bytes) with separate cryptographic credentials for sending and receiving. +The SMP client connects to SMP routers and manages simplex messaging queues — the fundamental addressing primitive of the SimpleX Network. Each simplex queue is a unidirectional, ordered sequence of fixed-size packets (16,384 bytes) with separate cryptographic credentials for sending and receiving. The queue model and command set are defined in the [SMP protocol](../protocol/simplex-messaging.md). ### Capabilities -- **Queue management**: create, secure, subscribe to, and delete queues on any SMP router +- **Queue management**: create, secure, subscribe to, and delete queues on any SMP router. Queue operations use the [SMP command set](../protocol/simplex-messaging.md) (NEW, KEY, SUB, DEL, etc.). 
- **Message sending and receiving**: send messages to a queue's sender address; receive messages from a queue's recipient address -- **Command authentication**: each queue operation is authenticated with per-queue cryptographic keys (Ed25519, Ed448, or X25519) +- **Command authentication**: each queue operation is authenticated with per-queue cryptographic keys (Ed25519, Ed448, or X25519). See the [SMP protocol security model](../protocol/simplex-messaging.md) for key roles. - **Keep-alive**: automatic ping loop detects and recovers from half-open connections -- **Proxy forwarding**: send messages through a proxy router via 2-hop onion routing (PRXY/PFWD/RFWD commands), protecting the sender's IP address from the destination router +- **Proxy forwarding**: send messages through a proxy router via 2-hop onion routing (PRXY/PFWD/RFWD commands), protecting the sender's IP address from the destination router. See [proxy forwarding details](../spec/modules/Simplex/Messaging/Client.md) in the module spec. - **Batched commands**: multiple commands can be sent in a single transmission for efficiency ### API model @@ -33,37 +33,33 @@ Routers are identified by the SHA-256 hash of their CA certificate fingerprint, ## XFTP Client -**Source**: [`Simplex.FileTransfer.Client`](../src/Simplex/FileTransfer/Client.hs) +**Source**: [`Simplex.FileTransfer.Client`](../src/Simplex/FileTransfer/Client.hs) — **Module spec**: [`spec/modules/Simplex/FileTransfer/Client.md`](../spec/modules/Simplex/FileTransfer/Client.md) -The XFTP client connects to XFTP routers and manages data packets — individually addressed blocks used for larger payload delivery. Data packets come in fixed sizes (64KB, 256KB, 1MB, 4MB), hiding the actual payload size. +The XFTP client connects to XFTP routers and manages data packets — individually addressed blocks used for larger payload delivery. Data packets come in fixed sizes (64KB, 256KB, 1MB, 4MB), hiding the actual payload size. 
The data packet lifecycle and command set are defined in the [XFTP protocol](../protocol/xftp.md). ### Capabilities -- **Data packet creation**: create data packets on routers with sender, recipient, and optional additional recipient credentials -- **Upload**: send encrypted data in a single HTTP/2 streaming request (command + body) -- **Download**: retrieve data packets with per-download ephemeral Diffie-Hellman key exchange, providing forward secrecy — compromising one download key does not reveal other downloads +- **Data packet creation**: create data packets on routers with sender, recipient, and optional additional recipient credentials. See the [XFTP protocol](../protocol/xftp.md) for credential roles and packet lifecycle. +- **Send** (FPUT): send encrypted data to the router in a single HTTP/2 streaming request (command + body) +- **Receive** (FGET): receive data packets with per-request ephemeral Diffie-Hellman key exchange, providing forward secrecy — compromising one DH key does not reveal other received data packets - **Acknowledgment and deletion**: recipients acknowledge receipt; senders delete data packets after delivery -### Size selection - -`prepareChunkSizes` selects data packet sizes using a threshold algorithm: if the remaining payload exceeds 75% of the next larger size, it uses the larger size. This balances storage efficiency against the number of round trips. Single-chunk payloads (e.g., redirect descriptors) can use `singleChunkSize` to verify they fit in one data packet. - ## Use cases These libraries are appropriate when the application manages its own encryption and connection logic: - **IoT sensor data collection**: a sensor creates an SMP queue and sends readings; a collector subscribes and receives them. The queue address (router + queue ID + keys) is provisioned once, out-of-band. - **Device control**: a controller sends commands to an actuator's queue. Separate queues for commands and telemetry provide unidirectional isolation. 
-- **Bulk data delivery**: an application encrypts and chunks a file, uploads data packets to XFTP routers, and shares the packet addresses with the recipient out-of-band. +- **Bulk data delivery**: an application encrypts and chunks a file, sends data packets to XFTP routers, and shares the packet addresses with the recipient out-of-band. - **Custom protocols**: any application that needs unidirectional, router-mediated packet delivery without the overhead of the Agent's connection management. ## What this layer does NOT provide The following capabilities require the [Agent](AGENT.md) (Layer 3): -- **Bidirectional connections** — the Agent pairs two unidirectional queues into a duplex connection +- **Duplex connections** — the Agent pairs two simplex queues into a duplex connection - **End-to-end encryption** — the Agent manages double ratchet with post-quantum extensions -- **File transfer** — the Agent handles chunking, encryption, padding, multi-router upload, and reassembly +- **File transfer** — the Agent handles chunking, encryption, padding, multi-router distribution, and reassembly - **Queue rotation** — the Agent transparently rotates queues to limit metadata correlation - **Connection discovery** — connection links, short links, and contact addresses are Agent-level abstractions - **Push notifications** — notification token management and subscription is Agent-level @@ -73,3 +69,13 @@ The following capabilities require the [Agent](AGENT.md) (Layer 3): - [SimpleX Messaging Protocol](../protocol/simplex-messaging.md) — SMP wire format, commands, and security properties - [XFTP Protocol](../protocol/xftp.md) — XFTP wire format, data packet lifecycle - [SimpleX Network overview](../protocol/overview-tjr.md) — architecture, trust model, and design rationale + +## Module specs + +- [SMP Client](../spec/modules/Simplex/Messaging/Client.md) — proxy forwarding, batching, connection lifecycle, keepalive +- [XFTP 
Client](../spec/modules/Simplex/FileTransfer/Client.md) — handshake, data packet operations, forward secrecy +- [SMP Protocol types](../spec/modules/Simplex/Messaging/Protocol.md) — command types, queue addresses, message encoding +- [XFTP Protocol types](../spec/modules/Simplex/FileTransfer/Protocol.md) — data packet types, XFTP commands +- [Transport](../spec/modules/Simplex/Messaging/Transport.md) — TLS transport, session handshake +- [HTTP/2 Client](../spec/modules/Simplex/Messaging/Transport/HTTP2/Client.md) — HTTP/2 transport layer +- [Crypto](../spec/modules/Simplex/Messaging/Crypto.md) — cryptographic primitives used by clients diff --git a/docs/ROUTERS.md b/docs/ROUTERS.md index 3169d480e..77337a814 100644 --- a/docs/ROUTERS.md +++ b/docs/ROUTERS.md @@ -6,7 +6,7 @@ This document covers deployment and advanced configuration. For an overview of t ## SMP Router -The SMP router provides messaging queues — unidirectional, ordered sequences of fixed-size packets (16,384 bytes each). It implements the [SimpleX Messaging Protocol](../protocol/simplex-messaging.md). +The SMP router provides messaging queues — unidirectional, ordered sequences of fixed-size packets (16,384 bytes each). It implements the [SimpleX Messaging Protocol](../protocol/simplex-messaging.md). **Module spec**: [`spec/modules/Simplex/Messaging/Server.md`](../spec/modules/Simplex/Messaging/Server.md). ### Advanced configuration @@ -35,13 +35,13 @@ echo 'PATH="/opt/homebrew/opt/openssl@3/bin:$PATH"' >> ~/.zprofile ## XFTP Router -The XFTP router provides data packet storage — individually addressed blocks in fixed sizes (64KB, 256KB, 1MB, 4MB). It implements the [XFTP protocol](../protocol/xftp.md). Data packets are used for larger payload delivery (files, media) where SMP queue packet sizes would be inefficient. +The XFTP router accepts and delivers data packets — individually addressed blocks in fixed sizes (64KB, 256KB, 1MB, 4MB). It implements the [XFTP protocol](../protocol/xftp.md). 
Data packets are used for larger payload delivery (files, media) where SMP queue packet sizes would be inefficient. **Module spec**: [`spec/modules/Simplex/FileTransfer/Server.md`](../spec/modules/Simplex/FileTransfer/Server.md). Initialize with `xftp-server init` and configure storage quota in `xftp-server.ini`. ## NTF Router -The NTF router bridges SimpleX Network to platform push notification services (APNS). It implements the [Push Notifications protocol](../protocol/push-notifications.md). Mobile clients register push tokens with the NTF router, which subscribes to their SMP queues and sends push notifications when messages arrive. The push notification contains only a notification ID, not message content. +The NTF router bridges SimpleX Network to platform push notification services (APNS). It implements the [Push Notifications protocol](../protocol/push-notifications.md). Mobile clients register push tokens with the NTF router, which subscribes to their SMP queues and sends push notifications when messages arrive. The push notification contains only a notification ID, not message content. **Module spec**: [`spec/modules/Simplex/Messaging/Notifications/Server.md`](../spec/modules/Simplex/Messaging/Notifications/Server.md). Initialize with `ntf-server init` and configure APNS credentials in `ntf-server.ini`. @@ -184,7 +184,7 @@ smp-server init [-l] -n ## Monitoring -SMP and XFTP routers expose Prometheus metrics via a control port. The control port also supports commands for runtime inspection (queue counts, client counts, statistics). See module specs for details on available metrics and control commands. +SMP and XFTP routers expose Prometheus metrics via a control port. The control port also supports commands for runtime inspection (queue counts, client counts, statistics). 
See [SMP Server Prometheus](../spec/modules/Simplex/Messaging/Server/Prometheus.md), [SMP Server Control](../spec/modules/Simplex/Messaging/Server/Control.md), and [NTF Server Control](../spec/modules/Simplex/Messaging/Notifications/Server/Control.md) module specs for available metrics and control commands. ## Protocol references @@ -192,3 +192,30 @@ SMP and XFTP routers expose Prometheus metrics via a control port. The control p - [XFTP Protocol](../protocol/xftp.md) — data packet protocol - [Push Notifications Protocol](../protocol/push-notifications.md) — NTF protocol - [SimpleX Network overview](../protocol/overview-tjr.md) — architecture and trust model + +## Module specs + +### SMP Router +- [Server](../spec/modules/Simplex/Messaging/Server.md) — main server module, client handling, message routing +- [Server Main](../spec/modules/Simplex/Messaging/Server/Main.md) — server startup, initialization +- [QueueStore](../spec/modules/Simplex/Messaging/Server/QueueStore.md) — queue persistence abstraction +- [QueueStore Postgres](../spec/modules/Simplex/Messaging/Server/QueueStore/Postgres.md) — PostgreSQL queue store +- [MsgStore](../spec/modules/Simplex/Messaging/Server/MsgStore.md) — message storage abstraction +- [StoreLog](../spec/modules/Simplex/Messaging/Server/StoreLog.md) — append-only store log for queue persistence +- [Server Control](../spec/modules/Simplex/Messaging/Server/Control.md) — control port commands +- [Server Prometheus](../spec/modules/Simplex/Messaging/Server/Prometheus.md) — metrics export +- [Server Stats](../spec/modules/Simplex/Messaging/Server/Stats.md) — statistics collection + +### XFTP Router +- [Server](../spec/modules/Simplex/FileTransfer/Server.md) — main server module, data packet handling +- [Server Main](../spec/modules/Simplex/FileTransfer/Server/Main.md) — server startup +- [Server Store](../spec/modules/Simplex/FileTransfer/Server/Store.md) — data packet storage +- [Server 
StoreLog](../spec/modules/Simplex/FileTransfer/Server/StoreLog.md) — store log for packet persistence +- [Server Stats](../spec/modules/Simplex/FileTransfer/Server/Stats.md) — statistics + +### NTF Router +- [Server](../spec/modules/Simplex/Messaging/Notifications/Server.md) — main server module +- [Server Main](../spec/modules/Simplex/Messaging/Notifications/Server/Main.md) — server startup +- [Server Store Postgres](../spec/modules/Simplex/Messaging/Notifications/Server/Store/Postgres.md) — PostgreSQL store for tokens and subscriptions +- [APNS Push](../spec/modules/Simplex/Messaging/Notifications/Server/Push/APNS.md) — Apple push notification delivery +- [Server Control](../spec/modules/Simplex/Messaging/Notifications/Server/Control.md) — control port commands From cbf32a33399a0a92f5908ebf135ca1b595aeb7b1 Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Fri, 13 Mar 2026 22:59:28 +0000 Subject: [PATCH 31/61] ntf --- README.md | 6 +++--- docs/CLIENT.md | 18 ++++++++++++++++-- docs/ROUTERS.md | 2 +- 3 files changed, 20 insertions(+), 6 deletions(-) diff --git a/README.md b/README.md index 4e3327c18..eada2d49f 100644 --- a/README.md +++ b/README.md @@ -31,7 +31,7 @@ See the [SimpleX Network overview](./protocol/overview-tjr.md) for the full prot Routers are the network infrastructure — they accept, buffer, and deliver packets. Three router types serve different purposes: - **SMP routers** provide messaging queues — unidirectional, ordered sequences of fixed-size packets (16,384 bytes). Protocol: [SMP](./protocol/simplex-messaging.md). Module spec: [`Simplex.Messaging.Server`](./spec/modules/Simplex/Messaging/Server.md). -- **XFTP routers** accept and deliver data packets — individually addressed blocks in fixed sizes (64KB–4MB) for larger payloads. Protocol: [XFTP](./protocol/xftp.md). Module spec: [`Simplex.FileTransfer.Server`](./spec/modules/Simplex/FileTransfer/Server.md). 
+- **XFTP routers** accept and deliver data packets over HTTP/2 — individually addressed blocks in fixed sizes (64KB–4MB) for larger payloads. Protocol: [XFTP](./protocol/xftp.md). Module spec: [`Simplex.FileTransfer.Server`](./spec/modules/Simplex/FileTransfer/Server.md). - **NTF routers** bridge to platform push services (APNS) for mobile notification delivery. Protocol: [Push Notifications](./protocol/push-notifications.md). Module spec: [`Simplex.Messaging.Notifications.Server`](./spec/modules/Simplex/Messaging/Notifications/Server.md). #### Running an SMP router @@ -50,9 +50,9 @@ See [docs/ROUTERS.md](./docs/ROUTERS.md) for XFTP/NTF router setup, advanced con ### SimpleX Client Libraries -[Client libraries](./docs/CLIENT.md) provide low-level protocol access to SimpleX routers. They implement the wire protocols (SMP, XFTP) and handle connection lifecycle, command authentication, and keep-alive. +[Client libraries](./docs/CLIENT.md) provide low-level protocol access to SimpleX routers. They implement the wire protocols (SMP, XFTP, NTF) and handle connection lifecycle, command authentication, and keep-alive. -The [SMP client](./src/Simplex/Messaging/Client.hs) ([module spec](./spec/modules/Simplex/Messaging/Client.md)) offers a functional Haskell API with STM queues for asynchronous event delivery. The [XFTP client](./src/Simplex/FileTransfer/Client.hs) ([module spec](./spec/modules/Simplex/FileTransfer/Client.md)) sends and receives data packets with per-request forward secrecy. +The [SMP client](./src/Simplex/Messaging/Client.hs) ([module spec](./spec/modules/Simplex/Messaging/Client.md)) offers a functional Haskell API with STM queues for asynchronous event delivery. The [XFTP client](./src/Simplex/FileTransfer/Client.hs) ([module spec](./spec/modules/Simplex/FileTransfer/Client.md)) sends and receives data packets over HTTP/2 with per-request forward secrecy. 
The [NTF client](./src/Simplex/Messaging/Notifications/Client.hs) ([module spec](./spec/modules/Simplex/Messaging/Notifications/Client.md)) manages push notification tokens and subscriptions. Applications that manage their own encryption and connection logic — IoT devices, sensors, simple data pipelines — can use this layer directly. See [docs/CLIENT.md](./docs/CLIENT.md). diff --git a/docs/CLIENT.md b/docs/CLIENT.md index 8534e4f1a..0f3f97528 100644 --- a/docs/CLIENT.md +++ b/docs/CLIENT.md @@ -1,6 +1,6 @@ # SimpleX Client Libraries -SimpleX client libraries provide low-level protocol access to SimpleX routers. They implement the wire protocols ([SMP](../protocol/simplex-messaging.md), [XFTP](../protocol/xftp.md)) and handle connection lifecycle, but leave encryption, identity management, and connection orchestration to the application. +SimpleX client libraries provide low-level protocol access to SimpleX routers. They implement the wire protocols ([SMP](../protocol/simplex-messaging.md), [XFTP](../protocol/xftp.md), [NTF](../protocol/push-notifications.md)) and handle connection lifecycle, but leave encryption, identity management, and connection orchestration to the application. This is **Layer 2** of the [SimpleX Network architecture](../protocol/overview-tjr.md). Layer 1 is the routers themselves; Layer 3 is the [Agent](AGENT.md), which builds duplex encrypted connections on top of these libraries. @@ -35,7 +35,7 @@ Routers are identified by the SHA-256 hash of their CA certificate fingerprint, **Source**: [`Simplex.FileTransfer.Client`](../src/Simplex/FileTransfer/Client.hs) — **Module spec**: [`spec/modules/Simplex/FileTransfer/Client.md`](../spec/modules/Simplex/FileTransfer/Client.md) -The XFTP client connects to XFTP routers and manages data packets — individually addressed blocks used for larger payload delivery. Data packets come in fixed sizes (64KB, 256KB, 1MB, 4MB), hiding the actual payload size. 
The data packet lifecycle and command set are defined in the [XFTP protocol](../protocol/xftp.md). +The XFTP client connects to XFTP routers and manages data packets — individually addressed blocks used for larger payload delivery. Data packets come in fixed sizes (64KB, 256KB, 1MB, 4MB), hiding the actual payload size. The XFTP protocol runs over HTTP/2, simplifying browser integration. The data packet lifecycle and command set are defined in the [XFTP protocol](../protocol/xftp.md). ### Capabilities @@ -44,6 +44,18 @@ The XFTP client connects to XFTP routers and manages data packets — individual - **Receive** (FGET): receive data packets with per-request ephemeral Diffie-Hellman key exchange, providing forward secrecy — compromising one DH key does not reveal other received data packets - **Acknowledgment and deletion**: recipients acknowledge receipt; senders delete data packets after delivery +## NTF Client + +**Source**: [`Simplex.Messaging.Notifications.Client`](../src/Simplex/Messaging/Notifications/Client.hs) — **Module spec**: [`spec/modules/Simplex/Messaging/Notifications/Client.md`](../spec/modules/Simplex/Messaging/Notifications/Client.md) + +The NTF client connects to NTF (notification) routers and manages push notification tokens and subscriptions. It implements the [Push Notifications protocol](../protocol/push-notifications.md). 
+ +### Capabilities + +- **Token management**: register, verify, replace, and delete push notification tokens on NTF routers +- **Subscription management**: create, check, and delete notification subscriptions that link SMP queues to push tokens +- **Batch operations**: create or check multiple subscriptions in a single request, with per-item error handling for partial success + ## Use cases These libraries are appropriate when the application manages its own encryption and connection logic: @@ -74,8 +86,10 @@ The following capabilities require the [Agent](AGENT.md) (Layer 3): - [SMP Client](../spec/modules/Simplex/Messaging/Client.md) — proxy forwarding, batching, connection lifecycle, keepalive - [XFTP Client](../spec/modules/Simplex/FileTransfer/Client.md) — handshake, data packet operations, forward secrecy +- [NTF Client](../spec/modules/Simplex/Messaging/Notifications/Client.md) — token and subscription operations, batch commands - [SMP Protocol types](../spec/modules/Simplex/Messaging/Protocol.md) — command types, queue addresses, message encoding - [XFTP Protocol types](../spec/modules/Simplex/FileTransfer/Protocol.md) — data packet types, XFTP commands +- [NTF Protocol types](../spec/modules/Simplex/Messaging/Notifications/Protocol.md) — notification commands, token/subscription types - [Transport](../spec/modules/Simplex/Messaging/Transport.md) — TLS transport, session handshake - [HTTP/2 Client](../spec/modules/Simplex/Messaging/Transport/HTTP2/Client.md) — HTTP/2 transport layer - [Crypto](../spec/modules/Simplex/Messaging/Crypto.md) — cryptographic primitives used by clients diff --git a/docs/ROUTERS.md b/docs/ROUTERS.md index 77337a814..7ebc0f9ee 100644 --- a/docs/ROUTERS.md +++ b/docs/ROUTERS.md @@ -35,7 +35,7 @@ echo 'PATH="/opt/homebrew/opt/openssl@3/bin:$PATH"' >> ~/.zprofile ## XFTP Router -The XFTP router accepts and delivers data packets — individually addressed blocks in fixed sizes (64KB, 256KB, 1MB, 4MB). 
It implements the [XFTP protocol](../protocol/xftp.md). Data packets are used for larger payload delivery (files, media) where SMP queue packet sizes would be inefficient. **Module spec**: [`spec/modules/Simplex/FileTransfer/Server.md`](../spec/modules/Simplex/FileTransfer/Server.md). +The XFTP router accepts and delivers data packets over HTTP/2 — individually addressed blocks in fixed sizes (64KB, 256KB, 1MB, 4MB). It implements the [XFTP protocol](../protocol/xftp.md). Data packets are used for larger payload delivery (files, media) where SMP queue packet sizes would be inefficient. The use of HTTP/2 simplifies browser integration. **Module spec**: [`spec/modules/Simplex/FileTransfer/Server.md`](../spec/modules/Simplex/FileTransfer/Server.md). Initialize with `xftp-server init` and configure storage quota in `xftp-server.ini`. From ca847b101a89cc08db8bc89660cd1e6ceda09e89 Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Fri, 13 Mar 2026 23:04:00 +0000 Subject: [PATCH 32/61] update --- docs/AGENT.md | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/docs/AGENT.md b/docs/AGENT.md index 8079736e8..e6fe32b08 100644 --- a/docs/AGENT.md +++ b/docs/AGENT.md @@ -20,10 +20,10 @@ The Agent turns simplex (unidirectional) SMP queues into duplex connections, imp The Agent provides end-to-end encryption with forward secrecy and break-in recovery, specified in the [Post-Quantum Double Ratchet protocol](../protocol/pqdr.md): -- **Double ratchet**: messages are encrypted using a double ratchet protocol derived from the Signal protocol. Each message uses a unique key; compromising one key does not reveal past or future messages. See the [PQDR specification](../protocol/pqdr.md) for the full ratchet state machine. +- **Double ratchet**: messages are encrypted using a double ratchet protocol. Each message uses a unique key; compromising one key does not reveal past or future messages. 
See the [PQDR specification](../protocol/pqdr.md) for the full ratchet state machine. - **Post-quantum extensions**: the ratchet supports hybrid key exchange using SNTRUP761 (a lattice-based KEM) combined with X25519 DH. This provides protection against future quantum computers that could break classical DH. See the [SNTRUP761 module spec](../spec/modules/Simplex/Messaging/Crypto/SNTRUP761.md) and [Ratchet module spec](../spec/modules/Simplex/Messaging/Crypto/Ratchet.md) for implementation details. - **Ratchet synchronization**: if the ratchet state becomes desynchronized (e.g., due to message loss or device restore), the Agent detects this and can negotiate resynchronization with the peer. -- **Per-queue encryption**: in addition to end-to-end encryption, each queue has a separate encryption layer between sender and router, preventing traffic correlation even if TLS is compromised. See the [SMP protocol security model](../protocol/simplex-messaging.md). +- **Per-queue encryption**: in addition to end-to-end encryption, the [SMP protocol](../protocol/simplex-messaging.md) provides a separate encryption layer on each queue between sender and router, preventing traffic correlation even if TLS is compromised. 
## File Transfer @@ -66,7 +66,7 @@ The Agent is designed to be embedded as a Haskell library: | Encryption | Application's responsibility | Double ratchet with PQ extensions | | File transfer | Raw data packet send/receive | Chunking, encryption, reassembly | | Identity | Per-queue keys | Per-connection, rotatable | -| Notifications | Not available | NTF router integration | +| Notifications | Direct NTF protocol operations | Automated subscription supervision | ## Protocol references @@ -76,7 +76,9 @@ The Agent is designed to be embedded as a Haskell library: - [SimpleX Messaging Protocol](../protocol/simplex-messaging.md) — SMP queue operations used by the Agent - [XFTP Protocol](../protocol/xftp.md) — data packet operations for file transfer - [Push Notifications Protocol](../protocol/push-notifications.md) — NTF token and subscription management -- [XRCP Protocol](../protocol/xrcp.md) — remote control protocol for cross-device Agent access +## Peer library: Remote Control + +The Agent exposes the [XRCP protocol](../protocol/xrcp.md) API for cross-device remote control (e.g., controlling a mobile app from a desktop). The actual logic is in the standalone [`Simplex.RemoteControl.Client`](../src/Simplex/RemoteControl/Client.hs) library — the Agent provides thin wrappers that pass through its random and multicast state. XRCP is not a managed Agent capability (no workers, persistence, or background supervision). See the [RemoteControl module specs](../spec/modules/Simplex/RemoteControl/Types.md). 
## Module specs From a7c6dde39f9e145debe69b121f811c4003857ec6 Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Sat, 14 Mar 2026 09:07:26 +0000 Subject: [PATCH 33/61] router diagrams --- docs/ROUTERS.md | 37 +--- spec/diagrams/ntf-router.svg | 208 ++++++++++++++++++ spec/diagrams/smp-router.svg | 191 ++++++++++++++++ spec/diagrams/xftp-router.svg | 133 +++++++++++ spec/modules/Simplex/FileTransfer/Server.md | 2 + .../Simplex/Messaging/Notifications/Server.md | 2 + spec/modules/Simplex/Messaging/Server.md | 2 + spec/routers.md | 164 ++++++++++++++ 8 files changed, 707 insertions(+), 32 deletions(-) create mode 100644 spec/diagrams/ntf-router.svg create mode 100644 spec/diagrams/smp-router.svg create mode 100644 spec/diagrams/xftp-router.svg create mode 100644 spec/routers.md diff --git a/docs/ROUTERS.md b/docs/ROUTERS.md index 7ebc0f9ee..29d518d65 100644 --- a/docs/ROUTERS.md +++ b/docs/ROUTERS.md @@ -2,11 +2,11 @@ SimpleX routers are the network infrastructure of the [SimpleX Network](../protocol/overview-tjr.md). They accept, buffer, and deliver data packets between endpoints. Each router operates independently and can be run by any party on standard computing hardware. -This document covers deployment and advanced configuration. For an overview of the router architecture and trust model, see the [SimpleX Network overview](../protocol/overview-tjr.md). +This document covers deployment and advanced configuration. For an overview of the router architecture and trust model, see the [SimpleX Network overview](../protocol/overview-tjr.md). For internal architecture diagrams (thread topology, command processing flows), see [`spec/routers.md`](../spec/routers.md). ## SMP Router -The SMP router provides messaging queues — unidirectional, ordered sequences of fixed-size packets (16,384 bytes each). It implements the [SimpleX Messaging Protocol](../protocol/simplex-messaging.md). 
**Module spec**: [`spec/modules/Simplex/Messaging/Server.md`](../spec/modules/Simplex/Messaging/Server.md). +The SMP router provides messaging queues — unidirectional, ordered sequences of fixed-size packets (16,384 bytes each). It implements the [SimpleX Messaging Protocol](../protocol/simplex-messaging.md). For architecture and module specs, see [SMP Router](../spec/routers.md#smp-router). ### Advanced configuration @@ -35,13 +35,13 @@ echo 'PATH="/opt/homebrew/opt/openssl@3/bin:$PATH"' >> ~/.zprofile ## XFTP Router -The XFTP router accepts and delivers data packets over HTTP/2 — individually addressed blocks in fixed sizes (64KB, 256KB, 1MB, 4MB). It implements the [XFTP protocol](../protocol/xftp.md). Data packets are used for larger payload delivery (files, media) where SMP queue packet sizes would be inefficient. The use of HTTP/2 simplifies browser integration. **Module spec**: [`spec/modules/Simplex/FileTransfer/Server.md`](../spec/modules/Simplex/FileTransfer/Server.md). +The XFTP router accepts and delivers data packets over HTTP/2 — individually addressed blocks in fixed sizes (64KB, 256KB, 1MB, 4MB). It implements the [XFTP protocol](../protocol/xftp.md). Data packets are used for larger payload delivery (files, media) where SMP queue packet sizes would be inefficient. The use of HTTP/2 simplifies browser integration. For architecture and module specs, see [XFTP Router](../spec/routers.md#xftp-router). Initialize with `xftp-server init` and configure storage quota in `xftp-server.ini`. ## NTF Router -The NTF router bridges SimpleX Network to platform push notification services (APNS). It implements the [Push Notifications protocol](../protocol/push-notifications.md). Mobile clients register push tokens with the NTF router, which subscribes to their SMP queues and sends push notifications when messages arrive. The push notification contains only a notification ID, not message content. 
**Module spec**: [`spec/modules/Simplex/Messaging/Notifications/Server.md`](../spec/modules/Simplex/Messaging/Notifications/Server.md). +The NTF router bridges SimpleX Network to platform push notification services (APNS). It implements the [Push Notifications protocol](../protocol/push-notifications.md). Mobile clients register push tokens with the NTF router, which subscribes to their SMP queues and sends push notifications when messages arrive. The push notification contains only a notification ID, not message content. For architecture and module specs, see [NTF Router](../spec/routers.md#ntf-router). Initialize with `ntf-server init` and configure APNS credentials in `ntf-server.ini`. @@ -184,7 +184,7 @@ smp-server init [-l] -n ## Monitoring -SMP and XFTP routers expose Prometheus metrics via a control port. The control port also supports commands for runtime inspection (queue counts, client counts, statistics). See [SMP Server Prometheus](../spec/modules/Simplex/Messaging/Server/Prometheus.md), [SMP Server Control](../spec/modules/Simplex/Messaging/Server/Control.md), and [NTF Server Control](../spec/modules/Simplex/Messaging/Notifications/Server/Control.md) module specs for available metrics and control commands. +SMP and XFTP routers expose Prometheus metrics via a control port. The control port also supports commands for runtime inspection (queue counts, client counts, statistics). See module specs linked from each router section in [`spec/routers.md`](../spec/routers.md) (Control, Prometheus, Stats). ## Protocol references @@ -192,30 +192,3 @@ SMP and XFTP routers expose Prometheus metrics via a control port. 
The control p - [XFTP Protocol](../protocol/xftp.md) — data packet protocol - [Push Notifications Protocol](../protocol/push-notifications.md) — NTF protocol - [SimpleX Network overview](../protocol/overview-tjr.md) — architecture and trust model - -## Module specs - -### SMP Router -- [Server](../spec/modules/Simplex/Messaging/Server.md) — main server module, client handling, message routing -- [Server Main](../spec/modules/Simplex/Messaging/Server/Main.md) — server startup, initialization -- [QueueStore](../spec/modules/Simplex/Messaging/Server/QueueStore.md) — queue persistence abstraction -- [QueueStore Postgres](../spec/modules/Simplex/Messaging/Server/QueueStore/Postgres.md) — PostgreSQL queue store -- [MsgStore](../spec/modules/Simplex/Messaging/Server/MsgStore.md) — message storage abstraction -- [StoreLog](../spec/modules/Simplex/Messaging/Server/StoreLog.md) — append-only store log for queue persistence -- [Server Control](../spec/modules/Simplex/Messaging/Server/Control.md) — control port commands -- [Server Prometheus](../spec/modules/Simplex/Messaging/Server/Prometheus.md) — metrics export -- [Server Stats](../spec/modules/Simplex/Messaging/Server/Stats.md) — statistics collection - -### XFTP Router -- [Server](../spec/modules/Simplex/FileTransfer/Server.md) — main server module, data packet handling -- [Server Main](../spec/modules/Simplex/FileTransfer/Server/Main.md) — server startup -- [Server Store](../spec/modules/Simplex/FileTransfer/Server/Store.md) — data packet storage -- [Server StoreLog](../spec/modules/Simplex/FileTransfer/Server/StoreLog.md) — store log for packet persistence -- [Server Stats](../spec/modules/Simplex/FileTransfer/Server/Stats.md) — statistics - -### NTF Router -- [Server](../spec/modules/Simplex/Messaging/Notifications/Server.md) — main server module -- [Server Main](../spec/modules/Simplex/Messaging/Notifications/Server/Main.md) — server startup -- [Server Store 
Postgres](../spec/modules/Simplex/Messaging/Notifications/Server/Store/Postgres.md) — PostgreSQL store for tokens and subscriptions
-- [APNS Push](../spec/modules/Simplex/Messaging/Notifications/Server/Push/APNS.md) — Apple push notification delivery
-- [Server Control](../spec/modules/Simplex/Messaging/Notifications/Server/Control.md) — control port commands
diff --git a/spec/diagrams/ntf-router.svg b/spec/diagrams/ntf-router.svg
new file mode 100644
index 000000000..fb35fe804
--- /dev/null
+++ b/spec/diagrams/ntf-router.svg
@@ -0,0 +1,208 @@
[New SVG file, 208 lines; markup lost in extraction, only text labels survive.]
[Title: NTF Router -- Component Topology]
[Per client (raceAny_): net, receive, rcvQ, client, sndQ, send, net; commands TNEW, TVFY, TRPL, TDEL, SNEW, SCHK, SDEL; store]
[SMP Client Agent (connects to SMP routers): SMP routers, SMPClientAgent, msgQ, agentQ, ntfSubscriber/receiveSMP, receiveAgent, race_, pushQ, store, runSMPSubscriber (one per SMP router), subscriberSubQ]
[Storage: tokens / subscriptions / tokenLastNtfs (in-memory TMap + PostgreSQL)]
[Push delivery pipeline: pushQ, ntfPush, APNS provider, periodicNtfsThread, pushQ, reads]
[Optional: logServerStats, prometheus, controlPort, resubscribe]
[Legend: per-client thread, singleton thread, storage, external connection. Solid arrows: TBQueue connections. Dashed: store access.]
diff --git a/spec/diagrams/smp-router.svg b/spec/diagrams/smp-router.svg new file mode 100644 index 000000000..796bbf80f --- /dev/null +++ b/spec/diagrams/smp-router.svg @@ -0,0 +1,191 @@ + SMP Router -- Component Topology + per client connection (raceAny_ -- any thread exit tears down connection) + receive + rcvQ + client + sndQ + send + msgQ + sendMsg + net + net + QueueStore + (STM or Postgres) + MsgStore + (STM or Postgres) + StoreLog (optional) + subQ + singleton threads (one instance each, all in raceAny_) + serverThread + (SMP subscriptions) + serverThread + (NTF subscriptions) + pendingEvents + deliverNtfs + sendPendingEvts + expireMessages + expireNtfs + proxyAgent + optional + logServerStats + prometheus + controlPort + per-client thread + singleton thread + storage + optional + Solid arrows: TBQueue connections. Dashed blue: subQ linking per-client to singleton threads.
diff --git a/spec/diagrams/xftp-router.svg b/spec/diagrams/xftp-router.svg new file mode 100644 index 000000000..bf60f000d --- /dev/null +++ b/spec/diagrams/xftp-router.svg @@ -0,0 +1,133 @@ + XFTP Router -- Component Topology + per request (inline HTTP/2 callback, no spawned threads) + net + HTTP/2 handler + Handshake State (per session) + None -> Sent -> Accepted + sessions + Command Processing (FNEW, FADD, FPUT, FGET, FACK, FDEL) + FileStore + (TMap in STM) + Disk Storage + filesPath / senderId / data + quota-managed via usedStorage TVar + StoreLog (append-only) + net + background threads (singleton, in raceAny_) + expireFiles + logServerStats + prometheus + controlPort + request handler (no threads) + storage + per-session state + background thread
diff --git a/spec/modules/Simplex/FileTransfer/Server.md b/spec/modules/Simplex/FileTransfer/Server.md index cb64adad2..b695fe908 100644 --- a/spec/modules/Simplex/FileTransfer/Server.md +++ b/spec/modules/Simplex/FileTransfer/Server.md @@ -16,6 +16,8 @@ The XFTP router runs several concurrent threads via `raceAny_`: | `savePrometheusMetrics` | Periodic Prometheus metrics dump | | `runCPServer` | Control port for admin commands | +See [spec/routers.md](../../routers.md) for component and sequence diagrams. + ## Non-obvious behavior ### 1. Three-state handshake with session caching
diff --git a/spec/modules/Simplex/Messaging/Notifications/Server.md b/spec/modules/Simplex/Messaging/Notifications/Server.md index b87f64ce8..0f7ebc67d 100644 --- a/spec/modules/Simplex/Messaging/Notifications/Server.md +++ b/spec/modules/Simplex/Messaging/Notifications/Server.md @@ -18,6 +18,8 @@ The NTF router runs several concurrent threads via `raceAny_`: Each client connection spawns `receive`, `send`, and `client` threads via `raceAny_`.
+See [spec/routers.md](../../../routers.md) for component and sequence diagrams. + ## Non-obvious behavior ### 1. Timing attack mitigation on entity lookup diff --git a/spec/modules/Simplex/Messaging/Server.md b/spec/modules/Simplex/Messaging/Server.md index 5cfdfa24a..7d991fbb7 100644 --- a/spec/modules/Simplex/Messaging/Server.md +++ b/spec/modules/Simplex/Messaging/Server.md @@ -10,6 +10,8 @@ The router runs as `raceAny_` over many threads — any thread exit stops the entire router process. The thread set includes: one `serverThread` per subscription type (SMP, NTF), a notification delivery thread, a pending events thread, a proxy agent receiver, a SIGINT handler, plus per-transport listener threads and optional expiration/stats/prometheus/control-port threads. `E.finally` ensures `stopServer` runs on any exit. +See [spec/routers.md](../../routers.md) for component and sequence diagrams. + ## serverThread — subscription lifecycle with split STM See comment on `serverThread`. It reads the subscription request from `subQ`, then looks up the client **outside** STM (via `getServerClient`), then enters an STM transaction (`updateSubscribers`) to compute which old subscriptions to end, then runs `endPreviousSubscriptions` in IO. If the client disconnects between lookup and transaction, `updateSubscribers` handles `Nothing` by still sending END/DELD to other subscribed clients. diff --git a/spec/routers.md b/spec/routers.md new file mode 100644 index 000000000..f146ca8af --- /dev/null +++ b/spec/routers.md @@ -0,0 +1,164 @@ +# Router Architecture + +SimpleX routers are the Layer 1 network infrastructure. This document shows their internal architecture: component topology and command processing flows. + +For deployment and configuration, see [docs/ROUTERS.md](../docs/ROUTERS.md). For protocol specifications, see [SMP](../protocol/simplex-messaging.md), [XFTP](../protocol/xftp.md), [Push Notifications](../protocol/push-notifications.md). 
+ +--- + +## SMP Router + +**Module specs**: [Server](modules/Simplex/Messaging/Server.md) · [Main](modules/Simplex/Messaging/Server/Main.md) · [QueueStore](modules/Simplex/Messaging/Server/QueueStore.md) · [QueueStore Postgres](modules/Simplex/Messaging/Server/QueueStore/Postgres.md) · [MsgStore](modules/Simplex/Messaging/Server/MsgStore.md) · [StoreLog](modules/Simplex/Messaging/Server/StoreLog.md) · [Control](modules/Simplex/Messaging/Server/Control.md) · [Prometheus](modules/Simplex/Messaging/Server/Prometheus.md) · [Stats](modules/Simplex/Messaging/Server/Stats.md) + +### Component topology + +![SMP Router — Component Topology](diagrams/smp-router.svg) + +### Packet delivery flow + +```mermaid +sequenceDiagram + participant S as Sender + + box SMP Router + participant auth as Command
Authorization + participant QS as QueueStore + participant MS as MsgStore + participant del as Packet
Delivery + end + + participant R as Recipient + + S->>auth: SEND (queue ID + packet) + auth->>QS: verify sender key (constant-time) + auth->>MS: store packet + auth->>S: OK (via sndQ) + + auth->>del: tryDeliverMessage + + alt recipient has active SUB + del->>R: MSG (via recipient's sndQ) + R->>auth: ACK + auth->>MS: delete packet + else no active subscriber + Note over MS: packet waits in MsgStore + R->>auth: SUB (subscribe to queue) + auth->>MS: fetch pending packets + del->>R: MSG + end +``` + +### Proxy forwarding flow + +```mermaid +sequenceDiagram + participant C as Client + participant P as Proxy Router + participant D as Destination Router + + C->>P: PRXY (destination address) + P->>D: connect (if not already connected) + P->>C: PKEY (proxy session key) + + C->>P: PFWD (encrypted command for destination) + P->>D: RFWD (relay forwarded command) + D->>P: command result + P->>C: command result +``` + +--- + +## XFTP Router + +**Module specs**: [Server](modules/Simplex/FileTransfer/Server.md) · [Main](modules/Simplex/FileTransfer/Server/Main.md) · [Store](modules/Simplex/FileTransfer/Server/Store.md) · [StoreLog](modules/Simplex/FileTransfer/Server/StoreLog.md) · [Stats](modules/Simplex/FileTransfer/Server/Stats.md) · [Transport](modules/Simplex/FileTransfer/Transport.md) + +### Component topology + +![XFTP Router — Component Topology](diagrams/xftp-router.svg) + +### Data packet delivery flow + +```mermaid +sequenceDiagram + participant S as Sender + + box XFTP Router + participant HS as Handshake + participant CP as Command
Processing + participant FS as FileStore + participant D as Disk + end + + participant R as Recipient + + S->>HS: HELLO + HS->>S: server DH key + version + + S->>CP: FNEW (create data packet) + CP->>FS: create FileRec, reserve quota + CP->>S: sender ID + recipient IDs + + S->>CP: FPUT (send encrypted data) + CP->>D: write to disk + CP->>FS: commit filePath + CP->>S: OK + + R->>HS: HELLO + HS->>R: server DH key + version + + R->>CP: FGET (recipient DH key) + CP->>CP: DH key agreement + CP->>D: read file + CP->>R: encrypted data stream + + R->>CP: FACK + CP->>FS: delete recipient entry +``` + +--- + +## NTF Router + +**Module specs**: [Server](modules/Simplex/Messaging/Notifications/Server.md) · [Main](modules/Simplex/Messaging/Notifications/Server/Main.md) · [Store Postgres](modules/Simplex/Messaging/Notifications/Server/Store/Postgres.md) · [APNS](modules/Simplex/Messaging/Notifications/Server/Push/APNS.md) · [Control](modules/Simplex/Messaging/Notifications/Server/Control.md) · [Client](modules/Simplex/Messaging/Notifications/Client.md) · [Protocol](modules/Simplex/Messaging/Notifications/Protocol.md) + +### Component topology + +![NTF Router — Component Topology](diagrams/ntf-router.svg) + +### Token registration and notification delivery + +```mermaid +sequenceDiagram + participant App + + box NTF Router + participant cl as client thread + participant Store + participant sub as ntfSubscriber + participant push as ntfPush + end + + participant SMP as SMP Router + participant APNS + + App->>cl: TNEW (push token + DH key) + cl->>Store: create token (NTRegistered) + cl->>push: PNVerification (via pushQ) + push->>APNS: verification push + APNS-->>App: verification code (encrypted) + App->>cl: TVFY (code) + cl->>Store: token -> NTActive + + App->>cl: SNEW (subscribe to SMP queue) + cl->>Store: create subscription + cl->>SMP: NKEY (subscribe for notifications) + SMP->>cl: OK (notifier ID) + + Note over SMP: message arrives on queue + SMP->>sub: NMSG (via msgQ) + 
sub->>Store: update tokenLastNtfs + sub->>push: PNMessage (via pushQ) + push->>APNS: push notification + APNS-->>App: notification (ID only) + App->>SMP: connect and retrieve message +``` From abcc6da9a09f85527211749909ce52dce56528f7 Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Sat, 14 Mar 2026 10:35:36 +0000 Subject: [PATCH 34/61] fixes --- spec/diagrams/ntf-router.svg | 21 +++++++++++++-------- spec/diagrams/smp-router.svg | 5 +++++ spec/diagrams/xftp-router.svg | 4 ++-- spec/routers.md | 3 ++- 4 files changed, 22 insertions(+), 11 deletions(-) diff --git a/spec/diagrams/ntf-router.svg b/spec/diagrams/ntf-router.svg index fb35fe804..b42429459 100644 --- a/spec/diagrams/ntf-router.svg +++ b/spec/diagrams/ntf-router.svg @@ -52,14 +52,19 @@ net - TNEW, TVFY, TRPL, TDEL - SNEW, SCHK, SDEL + TNEW, TVFY, TCHK, TRPL, TDEL, TCRN + SNEW, SCHK, SDEL, PING - + store + + + pushQ + @@ -104,9 +109,9 @@ pushQ - - store + store subscriberSubQ - tokens / subscriptions / tokenLastNtfs - (in-memory TMap + PostgreSQL) + (PostgreSQL) StoreLog (optional) + + + NtfStore (STM TMap) + - Command Processing (FNEW, FADD, FPUT, FGET, FACK, FDEL) + Command Processing (FNEW, FADD, FPUT, FGET, FACK, FDEL, PING) Disk Storage - filesPath / senderId / data + filesPath / base64(senderId) quota-managed via usedStorage TVar diff --git a/spec/routers.md b/spec/routers.md index f146ca8af..b66c52ce4 100644 --- a/spec/routers.md +++ b/spec/routers.md @@ -95,10 +95,11 @@ sequenceDiagram HS->>S: server DH key + version S->>CP: FNEW (create data packet) - CP->>FS: create FileRec, reserve quota + CP->>FS: create FileRec CP->>S: sender ID + recipient IDs S->>CP: FPUT (send encrypted data) + CP->>FS: reserve quota CP->>D: write to disk CP->>FS: commit filePath CP->>S: OK From 4df501efe4f076b00c6890a6f2cdcf32371327bb Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Sat, 14 Mar 2026 
11:24:12 +0000 Subject: [PATCH 35/61] client diagrams --- docs/CLIENT.md | 20 +--- spec/clients.md | 165 +++++++++++++++++++++++++++++ spec/diagrams/smp-client-agent.svg | 146 +++++++++++++++++++++++++ spec/diagrams/smp-client.svg | 149 ++++++++++++++++++++++++++ spec/diagrams/xftp-client.svg | 83 +++++++++++++++ 5 files changed, 547 insertions(+), 16 deletions(-) create mode 100644 spec/clients.md create mode 100644 spec/diagrams/smp-client-agent.svg create mode 100644 spec/diagrams/smp-client.svg create mode 100644 spec/diagrams/xftp-client.svg diff --git a/docs/CLIENT.md b/docs/CLIENT.md index 0f3f97528..6cd4f2321 100644 --- a/docs/CLIENT.md +++ b/docs/CLIENT.md @@ -2,11 +2,11 @@ SimpleX client libraries provide low-level protocol access to SimpleX routers. They implement the wire protocols ([SMP](../protocol/simplex-messaging.md), [XFTP](../protocol/xftp.md), [NTF](../protocol/push-notifications.md)) and handle connection lifecycle, but leave encryption, identity management, and connection orchestration to the application. -This is **Layer 2** of the [SimpleX Network architecture](../protocol/overview-tjr.md). Layer 1 is the routers themselves; Layer 3 is the [Agent](AGENT.md), which builds duplex encrypted connections on top of these libraries. +This is **Layer 2** of the [SimpleX Network architecture](../protocol/overview-tjr.md). Layer 1 is the routers themselves; Layer 3 is the [Agent](AGENT.md), which builds duplex encrypted connections on top of these libraries. For internal architecture diagrams (thread topology, command processing flows), see [`spec/clients.md`](../spec/clients.md). ## SMP Client -**Source**: [`Simplex.Messaging.Client`](../src/Simplex/Messaging/Client.hs) — **Module spec**: [`spec/modules/Simplex/Messaging/Client.md`](../spec/modules/Simplex/Messaging/Client.md) +**Source**: [`Simplex.Messaging.Client`](../src/Simplex/Messaging/Client.hs). 
For architecture and module specs, see [SMP Client](../spec/clients.md#smp-client-protocolclient). The SMP client connects to SMP routers and manages simplex messaging queues — the fundamental addressing primitive of the SimpleX Network. Each simplex queue is a unidirectional, ordered sequence of fixed-size packets (16,384 bytes) with separate cryptographic credentials for sending and receiving. The queue model and command set are defined in the [SMP protocol](../protocol/simplex-messaging.md). @@ -33,7 +33,7 @@ Routers are identified by the SHA-256 hash of their CA certificate fingerprint, ## XFTP Client -**Source**: [`Simplex.FileTransfer.Client`](../src/Simplex/FileTransfer/Client.hs) — **Module spec**: [`spec/modules/Simplex/FileTransfer/Client.md`](../spec/modules/Simplex/FileTransfer/Client.md) +**Source**: [`Simplex.FileTransfer.Client`](../src/Simplex/FileTransfer/Client.hs). For architecture and module specs, see [XFTP Client](../spec/clients.md#xftp-client). The XFTP client connects to XFTP routers and manages data packets — individually addressed blocks used for larger payload delivery. Data packets come in fixed sizes (64KB, 256KB, 1MB, 4MB), hiding the actual payload size. The XFTP protocol runs over HTTP/2, simplifying browser integration. The data packet lifecycle and command set are defined in the [XFTP protocol](../protocol/xftp.md). @@ -46,7 +46,7 @@ The XFTP client connects to XFTP routers and manages data packets — individual ## NTF Client -**Source**: [`Simplex.Messaging.Notifications.Client`](../src/Simplex/Messaging/Notifications/Client.hs) — **Module spec**: [`spec/modules/Simplex/Messaging/Notifications/Client.md`](../spec/modules/Simplex/Messaging/Notifications/Client.md) +**Source**: [`Simplex.Messaging.Notifications.Client`](../src/Simplex/Messaging/Notifications/Client.hs). For architecture and module specs, see [NTF Client](../spec/clients.md#ntf-client). 
The NTF client connects to NTF (notification) routers and manages push notification tokens and subscriptions. It implements the [Push Notifications protocol](../protocol/push-notifications.md). @@ -81,15 +81,3 @@ The following capabilities require the [Agent](AGENT.md) (Layer 3): - [SimpleX Messaging Protocol](../protocol/simplex-messaging.md) — SMP wire format, commands, and security properties - [XFTP Protocol](../protocol/xftp.md) — XFTP wire format, data packet lifecycle - [SimpleX Network overview](../protocol/overview-tjr.md) — architecture, trust model, and design rationale - -## Module specs - -- [SMP Client](../spec/modules/Simplex/Messaging/Client.md) — proxy forwarding, batching, connection lifecycle, keepalive -- [XFTP Client](../spec/modules/Simplex/FileTransfer/Client.md) — handshake, data packet operations, forward secrecy -- [NTF Client](../spec/modules/Simplex/Messaging/Notifications/Client.md) — token and subscription operations, batch commands -- [SMP Protocol types](../spec/modules/Simplex/Messaging/Protocol.md) — command types, queue addresses, message encoding -- [XFTP Protocol types](../spec/modules/Simplex/FileTransfer/Protocol.md) — data packet types, XFTP commands -- [NTF Protocol types](../spec/modules/Simplex/Messaging/Notifications/Protocol.md) — notification commands, token/subscription types -- [Transport](../spec/modules/Simplex/Messaging/Transport.md) — TLS transport, session handshake -- [HTTP/2 Client](../spec/modules/Simplex/Messaging/Transport/HTTP2/Client.md) — HTTP/2 transport layer -- [Crypto](../spec/modules/Simplex/Messaging/Crypto.md) — cryptographic primitives used by clients diff --git a/spec/clients.md b/spec/clients.md new file mode 100644 index 000000000..871d1b4fd --- /dev/null +++ b/spec/clients.md @@ -0,0 +1,165 @@ +# Client Architecture + +SimpleX clients are the Layer 2 libraries that connect to routers. This document shows their internal architecture: component topology and command processing flows. 
+ +For deployment and usage, see [docs/CLIENT.md](../docs/CLIENT.md). For protocol specifications, see [SMP](../protocol/simplex-messaging.md), [XFTP](../protocol/xftp.md), [Push Notifications](../protocol/push-notifications.md). + +--- + +## SMP Client (ProtocolClient) + +**Module specs**: [Client](modules/Simplex/Messaging/Client.md) · [Protocol](modules/Simplex/Messaging/Protocol.md) · [Transport](modules/Simplex/Messaging/Transport.md) · [Crypto](modules/Simplex/Messaging/Crypto.md) + +Generic protocol client used for both SMP and NTF connections. Manages a single TLS connection with multiplexed command/response matching via correlation IDs. + +### Component topology + +![SMP Client — Component Topology](diagrams/smp-client.svg) + +### Command/response flow + +```mermaid +sequenceDiagram + participant C as Caller
(Agent / router) + + box ProtocolClient + participant SC as sentCommands
(TMap CorrId Request) + participant SQ as sndQ + participant S as send thread + participant R as receive thread + participant RQ as rcvQ + participant P as process thread + end + + participant Router as SMP Router + + C->>SC: mkTransmission (generate CorrId, create Request with empty responseVar) + C->>SQ: write (Request, encoded command) + S->>SQ: read + S-->>S: check pending flag (drop if timed out) + S->>Router: tPutLog (transmit bytes) + + Router->>R: tGetClient (receive batch) + R->>RQ: write transmissions + + P->>RQ: read + P->>SC: lookup CorrId + alt command response (CorrId matches, pending) + P->>SC: remove CorrId + fill responseVar (TMVar) + else expired response (CorrId matches, already timed out) + P->>C: write to msgQ (STResponse) + else server event (empty CorrId) + P->>C: write to msgQ (STEvent) + end + + Note over C: getResponse: takeTMVar with timeout +``` + +--- + +## SMPClientAgent + +**Module specs**: [Client Agent](modules/Simplex/Messaging/Client/Agent.md) + +Connection manager that multiplexes multiple ProtocolClient connections. Tracks subscriptions, handles reconnection with backoff, and forwards server messages and connection events upward. Used by SMP router (proxying) and NTF router (subscriptions). + +### Component topology + +![SMPClientAgent — Component Topology](diagrams/smp-client-agent.svg) + +### Connection lifecycle + +```mermaid +sequenceDiagram + participant C as Consumer
(router / app) + participant A as SMPClientAgent + participant PC as ProtocolClient + participant Router as SMP Router + + C->>A: getSMPServerClient'' (server) + alt client exists in smpClients + A->>C: return existing client + else no client + A->>PC: connectClient (create new ProtocolClient) + PC->>Router: TLS handshake + A->>A: register disconnect handler + A->>C: return new client + end + + C->>A: subscribeQueuesNtfs (queueIds) + A->>A: add to pendingQueueSubs + A->>PC: sendProtocolCommands (SUB batch) + PC->>Router: SUB commands + Router->>PC: OK responses + A->>A: move pending → activeQueueSubs + A->>C: CASubscribed (via agentQ) + + Note over Router: connection drops + + PC->>A: disconnect handler fires + A->>A: filter by SessionId (only remove subs matching disconnected session) + A->>A: move active → pending (queue subs + service subs) + A->>C: CAServiceDisconnected (via agentQ, if service sub existed) + A->>C: CADisconnected (via agentQ, if queue subs existed) + A->>A: spawn smpSubWorker (retry with backoff) + A->>PC: reconnect + resubscribe pending subs + A->>C: CAConnected + CASubscribed (via agentQ) +``` + +--- + +## XFTP Client + +**Module specs**: [Client](modules/Simplex/FileTransfer/Client.md) · [Protocol](modules/Simplex/FileTransfer/Protocol.md) · [HTTP/2 Client](modules/Simplex/Messaging/Transport/HTTP2/Client.md) + +Stateless wrapper around HTTP2Client. XFTPClient adds no threads of its own — each operation is a synchronous HTTP/2 request/response. Serialization and multiplexing happen inside HTTP2Client's internal request queue and process thread. + +### Component topology + +![XFTP Client — Component Topology](diagrams/xftp-client.svg) + +### Upload/download flow + +```mermaid +sequenceDiagram + participant C as Caller
(Agent / app) + participant X as XFTPClient + participant H as HTTP2Client + participant Router as XFTP Router + + C->>X: createXFTPChunk (FNEW) + X->>H: HTTP/2 POST (encoded command) + H->>Router: request + Router->>H: response (sender ID + recipient IDs) + H->>X: decode response + X->>C: return IDs + + C->>X: uploadXFTPChunk (FPUT + file data) + X->>H: HTTP/2 POST (streaming body) + H->>Router: request with file stream + Router->>H: OK + H->>X: OK + X->>C: return OK + + C->>X: downloadXFTPChunk (FGET + ephemeral DH key) + X->>H: HTTP/2 POST (command) + H->>Router: request + Router->>H: streaming response (server DH key + nonce + encrypted data) + H->>X: streaming body + X->>X: compute DH secret, decrypt + save to file + X->>C: return () +``` + +--- + +## NTF Client + +**Module specs**: [Client](modules/Simplex/Messaging/Notifications/Client.md) · [Protocol](modules/Simplex/Messaging/Notifications/Protocol.md) + +Type alias for ProtocolClient — same architecture as SMP Client: + +```haskell +type NtfClient = ProtocolClient NTFVersion ErrorType NtfResponse +``` + +Same threads (send, receive, process, monitor), same queues (sndQ, rcvQ, sentCommands, msgQ), same command/response flow. Different command types: TNEW, TVFY, TCHK, TRPL, TDEL, TCRN, SNEW, SCHK, SDEL, PING. 
diff --git a/spec/diagrams/smp-client-agent.svg b/spec/diagrams/smp-client-agent.svg new file mode 100644 index 000000000..4257726ec --- /dev/null +++ b/spec/diagrams/smp-client-agent.svg @@ -0,0 +1,146 @@ + SMPClientAgent -- Component Topology + consumer + (NTF router / + SMP proxy / + application) + msgQ + (TBQueue, server messages) + agentQ + (TBQueue SMPClientAgentEvent) + CAConnected + CADisconnected + CASubscribed / CASubError + CAServiceDisconnected + CAServiceSubscribed / SubError + SMPClientAgent (connection manager) + smpClients + (TMap SMPServer SMPClientVar) + activeQueueSubs / pendingQueueSubs + (TMap SMPServer (TMap QueueId ...)) + activeServiceSubs / pendingServiceSubs + (TMap SMPServer (TVar (Maybe ...))) + smpSubWorkers + (one per server) + reconnect + resubscribe + getSMPServerClient'': get or create client + connectClient: create ProtocolClient, register disconnect handler + on disconnect: filter by SessionId, move active → pending, notify agentQ, spawn worker + worker: retry connect with backoff, resubscribe pending subs + subscribeQueuesNtfs / subscribeServiceNtfs: subscribe + track state + ProtocolClient connections (one per SMP Router) + ProtocolClient + → SMP Router A + ProtocolClient + → SMP Router B + ProtocolClient + → SMP Router N + ... + ProtocolClient + state / queue + background worker + Solid arrows: TBQueue flow. Dashed: reconnection / resubscription.
diff --git a/spec/diagrams/smp-client.svg b/spec/diagrams/smp-client.svg new file mode 100644 index 000000000..d537ec8d0 --- /dev/null +++ b/spec/diagrams/smp-client.svg @@ -0,0 +1,149 @@ + SMP Client (ProtocolClient) -- Component Topology + per connection (raceAny_ -- any thread exit tears down connection) + caller + (Agent/ + router) + commands + sndQ + (TBQueue, 64) + send + receive + SMP + Router + (TLS) + rcvQ + (TBQueue, 64) + process + sentCommands + (TMap CorrId Request) + match + responseVar + (TMVar) + monitor + PING + optional + msgQ (optional) + (TBQueue, server events) + events (empty CorrId) + to Agent / SMPClientAgent + thread + queue / state + optional + Solid arrows: TBQueue flow. Dashed: STM lookups / TMVar responses.
diff --git a/spec/diagrams/xftp-client.svg b/spec/diagrams/xftp-client.svg new file mode 100644 index 000000000..847922ed9 --- /dev/null +++ b/spec/diagrams/xftp-client.svg @@ -0,0 +1,83 @@ + XFTP Client -- Component Topology + per connection (XFTPClient adds no threads; serialization in HTTP2Client) + caller + (Agent/ + router) + XFTPClient + sendXFTPCommand + HTTP2Client + (TLS + HTTP/2 streams) + XFTP + Router + (HTTP/2) + thParams (negotiated) + uploads: streaming request body + downloads: ephemeral DH + streaming + response body (per-chunk forward secrecy) + client wrapper + external connection + state + XFTPClient adds no threads. HTTP2Client has internal reqQ + process thread.
+ + + From 1db93b936d616ad49bd7978ebd7f4f6da8af4df2 Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Sat, 14 Mar 2026 13:54:41 +0000 Subject: [PATCH 36/61] corrections --- docs/AGENT.md | 40 +++--- docs/CLIENT.md | 37 ++++-- docs/ROUTERS.md | 22 ++-- spec/clients.md | 22 ++-- spec/diagrams/ntf-router.svg | 3 - spec/diagrams/smp-client-agent.svg | 161 ++++++++++++------------ spec/diagrams/smp-client.svg | 191 ++++++++++++++++------------- spec/diagrams/smp-router.svg | 5 +- spec/diagrams/xftp-client.svg | 3 - spec/diagrams/xftp-router.svg | 3 - spec/routers.md | 6 +- 11 files changed, 258 insertions(+), 235 deletions(-) diff --git a/docs/AGENT.md b/docs/AGENT.md index e6fe32b08..83050013f 100644 --- a/docs/AGENT.md +++ b/docs/AGENT.md @@ -4,16 +4,16 @@ The SimpleX Agent builds duplex encrypted connections on top of [SimpleX client This is **Layer 3** of the [SimpleX Network architecture](../protocol/overview-tjr.md). Layer 1 is the routers; Layer 2 is the [client libraries](CLIENT.md) that speak the wire protocols. The Agent adds the connection semantics that applications need. -**Source**: [`Simplex.Messaging.Agent`](../src/Simplex/Messaging/Agent.hs) — **Module spec**: [`spec/modules/Simplex/Messaging/Agent.md`](../spec/modules/Simplex/Messaging/Agent.md) +**Source**: [`Simplex.Messaging.Agent`](../src/Simplex/Messaging/Agent.hs). **Module spec**: [`spec/modules/Simplex/Messaging/Agent.md`](../spec/modules/Simplex/Messaging/Agent.md) ## Connections The Agent turns simplex (unidirectional) SMP queues into duplex connections, implementing the [Agent protocol](../protocol/agent-protocol.md): -- **Duplex connections**: each connection uses a pair of SMP queues — one for each direction. The queues can be on different routers chosen independently by each party. See the [duplex connection procedure](../protocol/agent-protocol.md) for the full handshake. 
+- **Duplex connections**: each connection uses a pair of SMP queues - one for each direction. The queues can be on different routers chosen independently by each party. See the [duplex connection procedure](../protocol/agent-protocol.md) for the full handshake. - **Connection establishment**: one party creates a connection and generates an invitation (containing router address, queue ID, and public keys). The invitation is passed out-of-band (QR code, link, etc.). The other party joins by creating a reverse queue and completing the handshake. - **Connection links**: the Agent supports connection links (long and short) for sharing connection invitations via URLs. Short links use a separate SMP queue to store the full invitation, allowing compact QR codes. -- **Queue rotation**: the Agent periodically rotates the underlying SMP queues, limiting the window for metadata correlation. Rotation is transparent to the application — the connection identity is stable while the underlying queues change. +- **Queue rotation**: the Agent periodically rotates the underlying SMP queues, limiting the window for metadata correlation. Rotation is transparent to the application - the connection identity is stable while the underlying queues change. - **Redundant queues**: connections can use multiple queues for reliability. If one router becomes unreachable, messages flow through the remaining queues. ## Encryption @@ -46,7 +46,7 @@ The Agent manages push notification subscriptions for mobile devices, using the The Agent is designed to be embedded as a Haskell library: -- **STM queues**: the application communicates with the Agent via STM queues. Commands go in (`ACommand`), events come out (`AEvent`). No serialization or parsing — direct Haskell values. The command/event types are defined in the [Agent Protocol module](../spec/modules/Simplex/Messaging/Agent/Protocol.md). +- **STM queues**: the application communicates with the Agent via STM queues. 
Commands go in (`ACommand`), events come out (`AEvent`). No serialization or parsing - direct Haskell values. The command/event types are defined in the [Agent Protocol module](../spec/modules/Simplex/Messaging/Agent/Protocol.md). - **Async operation**: all network operations are asynchronous. The Agent manages internal worker threads for each router connection, message processing, and background tasks (cleanup, statistics, notification supervision). See the [Agent Client module spec](../spec/modules/Simplex/Messaging/Agent/Client.md) for worker architecture. - **Background mode**: on mobile platforms, the Agent can run in a reduced mode with only the message receiver active, minimizing resource usage when the app is backgrounded. - **Dual database backends**: the Agent supports both SQLite (for mobile/desktop) and PostgreSQL (for server deployments) as persistence backends, selected at compile time. See [Agent Store Interface](../spec/modules/Simplex/Messaging/Agent/Store/Interface.md) and [Agent Store Postgres](../spec/modules/Simplex/Messaging/Agent/Store/Postgres.md). 
@@ -70,24 +70,24 @@ The Agent is designed to be embedded as a Haskell library: ## Protocol references -- [Agent Protocol](../protocol/agent-protocol.md) — duplex connection procedure, message format -- [SimpleX Network overview](../protocol/overview-tjr.md) — architecture, trust model -- [PQDR](../protocol/pqdr.md) — post-quantum double ratchet specification -- [SimpleX Messaging Protocol](../protocol/simplex-messaging.md) — SMP queue operations used by the Agent -- [XFTP Protocol](../protocol/xftp.md) — data packet operations for file transfer -- [Push Notifications Protocol](../protocol/push-notifications.md) — NTF token and subscription management +- [Agent Protocol](../protocol/agent-protocol.md) - duplex connection procedure, message format +- [SimpleX Network overview](../protocol/overview-tjr.md) - architecture, trust model +- [PQDR](../protocol/pqdr.md) - post-quantum double ratchet specification +- [SimpleX Messaging Protocol](../protocol/simplex-messaging.md) - SMP queue operations used by the Agent +- [XFTP Protocol](../protocol/xftp.md) - data packet operations for file transfer +- [Push Notifications Protocol](../protocol/push-notifications.md) - NTF token and subscription management ## Peer library: Remote Control -The Agent exposes the [XRCP protocol](../protocol/xrcp.md) API for cross-device remote control (e.g., controlling a mobile app from a desktop). The actual logic is in the standalone [`Simplex.RemoteControl.Client`](../src/Simplex/RemoteControl/Client.hs) library — the Agent provides thin wrappers that pass through its random and multicast state. XRCP is not a managed Agent capability (no workers, persistence, or background supervision). See the [RemoteControl module specs](../spec/modules/Simplex/RemoteControl/Types.md). +The Agent exposes the [XRCP protocol](../protocol/xrcp.md) API for cross-device remote control (e.g., controlling a mobile app from a desktop). 
The actual logic is in the standalone [`Simplex.RemoteControl.Client`](../src/Simplex/RemoteControl/Client.hs) library - the Agent provides thin wrappers that pass through its random and multicast state. XRCP is not a managed Agent capability (no workers, persistence, or background supervision). See the [RemoteControl module specs](../spec/modules/Simplex/RemoteControl/Types.md). ## Module specs -- [Agent](../spec/modules/Simplex/Messaging/Agent.md) — main Agent module, connection lifecycle, message processing -- [Agent Client](../spec/modules/Simplex/Messaging/Agent/Client.md) — worker threads, router connections, subscription management -- [Agent Protocol](../spec/modules/Simplex/Messaging/Agent/Protocol.md) — ACommand/AEvent types, connection invitations -- [Agent Store Interface](../spec/modules/Simplex/Messaging/Agent/Store/Interface.md) — database abstraction for SQLite/Postgres -- [Agent Store (AgentStore)](../spec/modules/Simplex/Messaging/Agent/Store/AgentStore.md) — connection, queue, and message persistence -- [NtfSubSupervisor](../spec/modules/Simplex/Messaging/Agent/NtfSubSupervisor.md) — notification subscription management -- [XFTP Agent](../spec/modules/Simplex/FileTransfer/Agent.md) — file transfer orchestration -- [Ratchet](../spec/modules/Simplex/Messaging/Crypto/Ratchet.md) — double ratchet implementation -- [SNTRUP761](../spec/modules/Simplex/Messaging/Crypto/SNTRUP761.md) — post-quantum KEM +- [Agent](../spec/modules/Simplex/Messaging/Agent.md) - main Agent module, connection lifecycle, message processing +- [Agent Client](../spec/modules/Simplex/Messaging/Agent/Client.md) - worker threads, router connections, subscription management +- [Agent Protocol](../spec/modules/Simplex/Messaging/Agent/Protocol.md) - ACommand/AEvent types, connection invitations +- [Agent Store Interface](../spec/modules/Simplex/Messaging/Agent/Store/Interface.md) - database abstraction for SQLite/Postgres +- [Agent Store 
(AgentStore)](../spec/modules/Simplex/Messaging/Agent/Store/AgentStore.md) - connection, queue, and message persistence +- [NtfSubSupervisor](../spec/modules/Simplex/Messaging/Agent/NtfSubSupervisor.md) - notification subscription management +- [XFTP Agent](../spec/modules/Simplex/FileTransfer/Agent.md) - file transfer orchestration +- [Ratchet](../spec/modules/Simplex/Messaging/Crypto/Ratchet.md) - double ratchet implementation +- [SNTRUP761](../spec/modules/Simplex/Messaging/Crypto/SNTRUP761.md) - post-quantum KEM diff --git a/docs/CLIENT.md b/docs/CLIENT.md index 6cd4f2321..4f71dab89 100644 --- a/docs/CLIENT.md +++ b/docs/CLIENT.md @@ -8,7 +8,7 @@ This is **Layer 2** of the [SimpleX Network architecture](../protocol/overview-t **Source**: [`Simplex.Messaging.Client`](../src/Simplex/Messaging/Client.hs). For architecture and module specs, see [SMP Client](../spec/clients.md#smp-client-protocolclient). -The SMP client connects to SMP routers and manages simplex messaging queues — the fundamental addressing primitive of the SimpleX Network. Each simplex queue is a unidirectional, ordered sequence of fixed-size packets (16,384 bytes) with separate cryptographic credentials for sending and receiving. The queue model and command set are defined in the [SMP protocol](../protocol/simplex-messaging.md). +The SMP client connects to SMP routers and manages simplex messaging queues, the fundamental addressing primitive of the SimpleX Network. Each simplex queue is a unidirectional, ordered sequence of fixed-size packets (16,384 bytes) with separate cryptographic credentials for sending and receiving. The queue model and command set are defined in the [SMP protocol](../protocol/simplex-messaging.md). ### Capabilities @@ -31,17 +31,30 @@ The client uses a functional Haskell API with STM queues for asynchronous event Routers are identified by the SHA-256 hash of their CA certificate fingerprint, not by hostname. 
The client validates the full X.509 certificate chain on every TLS connection and compares the CA fingerprint against the expected hash from the queue address. This means a DNS or IP-level attacker who cannot produce the correct certificate is detected at connection time. +## SMPClientAgent + +**Source**: [`Simplex.Messaging.Client.Agent`](../src/Simplex/Messaging/Client/Agent.hs). For architecture and module specs, see [SMPClientAgent](../spec/clients.md#smpclientagent). + +Connection manager that multiplexes multiple SMP client connections. Maintains one ProtocolClient per SMP router, tracks queue and service subscriptions, and handles reconnection with exponential backoff. Used by the SMP router (for proxy forwarding) and the NTF router (for message subscriptions). + +### Capabilities + +- **Connection pooling**: maintains a pool of ProtocolClient connections keyed by SMP router, creating connections on demand and reusing existing ones +- **Subscription tracking**: tracks active and pending subscriptions (both queue-based and service-based) with automatic state transitions on connect/disconnect +- **Automatic reconnection**: on connection loss, moves subscriptions from active to pending, then spawns a background worker that retries with backoff and resubscribes +- **Session-scoped disconnect handling**: uses session IDs to ensure only subscriptions belonging to the disconnected session are affected, preventing races with newly established connections + ## XFTP Client **Source**: [`Simplex.FileTransfer.Client`](../src/Simplex/FileTransfer/Client.hs). For architecture and module specs, see [XFTP Client](../spec/clients.md#xftp-client). -The XFTP client connects to XFTP routers and manages data packets — individually addressed blocks used for larger payload delivery. Data packets come in fixed sizes (64KB, 256KB, 1MB, 4MB), hiding the actual payload size. The XFTP protocol runs over HTTP/2, simplifying browser integration. 
The data packet lifecycle and command set are defined in the [XFTP protocol](../protocol/xftp.md). +The XFTP client connects to XFTP routers and manages data packets, individually addressed blocks used for larger payload delivery. Data packets come in fixed sizes (64KB, 256KB, 1MB, 4MB), hiding the actual payload size. The XFTP protocol runs over HTTP/2, simplifying browser integration. The data packet lifecycle and command set are defined in the [XFTP protocol](../protocol/xftp.md). ### Capabilities - **Data packet creation**: create data packets on routers with sender, recipient, and optional additional recipient credentials. See the [XFTP protocol](../protocol/xftp.md) for credential roles and packet lifecycle. - **Send** (FPUT): send encrypted data to the router in a single HTTP/2 streaming request (command + body) -- **Receive** (FGET): receive data packets with per-request ephemeral Diffie-Hellman key exchange, providing forward secrecy — compromising one DH key does not reveal other received data packets +- **Receive** (FGET): receive data packets with per-request ephemeral Diffie-Hellman key exchange, providing forward secrecy: compromising one DH key does not reveal other received data packets - **Acknowledgment and deletion**: recipients acknowledge receipt; senders delete data packets after delivery ## NTF Client @@ -69,15 +82,15 @@ These libraries are appropriate when the application manages its own encryption The following capabilities require the [Agent](AGENT.md) (Layer 3): -- **Duplex connections** — the Agent pairs two simplex queues into a duplex connection -- **End-to-end encryption** — the Agent manages double ratchet with post-quantum extensions -- **File transfer** — the Agent handles chunking, encryption, padding, multi-router distribution, and reassembly -- **Queue rotation** — the Agent transparently rotates queues to limit metadata correlation -- **Connection discovery** — connection links, short links, and contact addresses are 
Agent-level abstractions -- **Push notifications** — notification token management and subscription is Agent-level +- **Duplex connections** - the Agent pairs two simplex queues into a duplex connection +- **End-to-end encryption** - the Agent manages double ratchet with post-quantum extensions +- **File transfer** - the Agent handles chunking, encryption, padding, multi-router distribution, and reassembly +- **Queue rotation** - the Agent transparently rotates queues to limit metadata correlation +- **Connection discovery** - connection links, short links, and contact addresses are Agent-level abstractions +- **Push notifications** - notification token management and subscription is Agent-level ## Protocol references -- [SimpleX Messaging Protocol](../protocol/simplex-messaging.md) — SMP wire format, commands, and security properties -- [XFTP Protocol](../protocol/xftp.md) — XFTP wire format, data packet lifecycle -- [SimpleX Network overview](../protocol/overview-tjr.md) — architecture, trust model, and design rationale +- [SimpleX Messaging Protocol](../protocol/simplex-messaging.md) - SMP wire format, commands, and security properties +- [XFTP Protocol](../protocol/xftp.md) - XFTP wire format, data packet lifecycle +- [SimpleX Network overview](../protocol/overview-tjr.md) - architecture, trust model, and design rationale diff --git a/docs/ROUTERS.md b/docs/ROUTERS.md index 29d518d65..d938dc40f 100644 --- a/docs/ROUTERS.md +++ b/docs/ROUTERS.md @@ -1,4 +1,4 @@ -# SimpleX Routers — Deployment and Configuration +# SimpleX Routers: Deployment and Configuration SimpleX routers are the network infrastructure of the [SimpleX Network](../protocol/overview-tjr.md). They accept, buffer, and deliver data packets between endpoints. Each router operates independently and can be run by any party on standard computing hardware. @@ -6,7 +6,7 @@ This document covers deployment and advanced configuration. 
For an overview of t ## SMP Router -The SMP router provides messaging queues — unidirectional, ordered sequences of fixed-size packets (16,384 bytes each). It implements the [SimpleX Messaging Protocol](../protocol/simplex-messaging.md). For architecture and module specs, see [SMP Router](../spec/routers.md#smp-router). +The SMP router provides messaging queues - unidirectional, ordered sequences of fixed-size packets (16,384 bytes each). It implements the [SimpleX Messaging Protocol](../protocol/simplex-messaging.md). For architecture and module specs, see [SMP Router](../spec/routers.md#smp-router). ### Advanced configuration @@ -35,7 +35,7 @@ echo 'PATH="/opt/homebrew/opt/openssl@3/bin:$PATH"' >> ~/.zprofile ## XFTP Router -The XFTP router accepts and delivers data packets over HTTP/2 — individually addressed blocks in fixed sizes (64KB, 256KB, 1MB, 4MB). It implements the [XFTP protocol](../protocol/xftp.md). Data packets are used for larger payload delivery (files, media) where SMP queue packet sizes would be inefficient. The use of HTTP/2 simplifies browser integration. For architecture and module specs, see [XFTP Router](../spec/routers.md#xftp-router). +The XFTP router accepts and delivers data packets over HTTP/2 - individually addressed blocks in fixed sizes (64KB, 256KB, 1MB, 4MB). It implements the [XFTP protocol](../protocol/xftp.md). Data packets are used for larger payload delivery (files, media) where SMP queue packet sizes would be inefficient. The use of HTTP/2 simplifies browser integration. For architecture and module specs, see [XFTP Router](../spec/routers.md#xftp-router). Initialize with `xftp-server init` and configure storage quota in `xftp-server.ini`. @@ -66,7 +66,7 @@ Prebuilt images are available from [Docker Hub](https://hub.docker.com/r/simplex 2. 
Run: - **SMP router** — change `your_ip_or_domain`; `-e "PASS=password"` is optional: + **SMP router** - change `your_ip_or_domain`; `-e "PASS=password"` is optional: ```sh docker run -d \ -e "ADDR=your_ip_or_domain" \ @@ -77,7 +77,7 @@ Prebuilt images are available from [Docker Hub](https://hub.docker.com/r/simplex simplexchat/smp-server:latest ``` - **XFTP router** — change `your_ip_or_domain` and `maximum_storage`: + **XFTP router** - change `your_ip_or_domain` and `maximum_storage`: ```sh docker run -d \ -e "ADDR=your_ip_or_domain" \ @@ -160,12 +160,12 @@ Then run with the same Docker commands as above, replacing `simplexchat/smp-serv ### Linode StackScript -[Deploy via Linode StackScript](https://cloud.linode.com/stackscripts/748014) — Shared CPU Nanode with 1GB is sufficient. +[Deploy via Linode StackScript](https://cloud.linode.com/stackscripts/748014). Shared CPU Nanode with 1GB is sufficient. Configuration options: - SMP Server store log flag for queue persistence (recommended) - [Linode API token](https://www.linode.com/docs/guides/getting-started-with-the-linode-api#get-an-access-token) for automatic DNS and tagging (scopes: read/write for "linodes" and "domains") -- Domain name (e.g., `smp1.example.com`) — the [domain must exist](https://cloud.linode.com/domains/create) in your Linode account +- Domain name (e.g., `smp1.example.com`) - the [domain must exist](https://cloud.linode.com/domains/create) in your Linode account After deployment (up to 5 minutes), get the server address from Linode tags or SSH: `smp://@`. @@ -188,7 +188,7 @@ SMP and XFTP routers expose Prometheus metrics via a control port. 
The control p ## Protocol references -- [SimpleX Messaging Protocol](../protocol/simplex-messaging.md) — SMP wire format and security properties -- [XFTP Protocol](../protocol/xftp.md) — data packet protocol -- [Push Notifications Protocol](../protocol/push-notifications.md) — NTF protocol -- [SimpleX Network overview](../protocol/overview-tjr.md) — architecture and trust model +- [SimpleX Messaging Protocol](../protocol/simplex-messaging.md) - SMP wire format and security properties +- [XFTP Protocol](../protocol/xftp.md) - data packet protocol +- [Push Notifications Protocol](../protocol/push-notifications.md) - NTF protocol +- [SimpleX Network overview](../protocol/overview-tjr.md) - architecture and trust model diff --git a/spec/clients.md b/spec/clients.md index 871d1b4fd..a99686ab0 100644 --- a/spec/clients.md +++ b/spec/clients.md @@ -14,9 +14,9 @@ Generic protocol client used for both SMP and NTF connections. Manages a single ### Component topology -![SMP Client — Component Topology](diagrams/smp-client.svg) +![SMP Client - Component Topology](diagrams/smp-client.svg) -### Command/response flow +### Command/result flow ```mermaid sequenceDiagram @@ -33,7 +33,7 @@ sequenceDiagram participant Router as SMP Router - C->>SC: mkTransmission (generate CorrId, create Request with empty responseVar) + C->>SC: mkTransmission
<br/>(generate CorrId, create Request<br/>
with empty responseVar) C->>SQ: write (Request, encoded command) S->>SQ: read S-->>S: check pending flag (drop if timed out) @@ -65,15 +65,19 @@ Connection manager that multiplexes multiple ProtocolClient connections. Tracks ### Component topology -![SMPClientAgent — Component Topology](diagrams/smp-client-agent.svg) +![SMPClientAgent - Component Topology](diagrams/smp-client-agent.svg) ### Connection lifecycle ```mermaid sequenceDiagram participant C as Consumer
(router / app) - participant A as SMPClientAgent - participant PC as ProtocolClient + + box + participant A as SMPClientAgent + participant PC as ProtocolClient + end + participant Router as SMP Router C->>A: getSMPServerClient'' (server) @@ -112,11 +116,11 @@ sequenceDiagram **Module specs**: [Client](modules/Simplex/FileTransfer/Client.md) · [Protocol](modules/Simplex/FileTransfer/Protocol.md) · [HTTP/2 Client](modules/Simplex/Messaging/Transport/HTTP2/Client.md) -Stateless wrapper around HTTP2Client. XFTPClient adds no threads of its own — each operation is a synchronous HTTP/2 request/response. Serialization and multiplexing happen inside HTTP2Client's internal request queue and process thread. +Stateless wrapper around HTTP2Client. XFTPClient adds no threads of its own; each operation is a synchronous HTTP/2 request/response. Serialization and multiplexing happen inside HTTP2Client's internal request queue and process thread. ### Component topology -![XFTP Client — Component Topology](diagrams/xftp-client.svg) +![XFTP Client - Component Topology](diagrams/xftp-client.svg) ### Upload/download flow @@ -156,7 +160,7 @@ sequenceDiagram **Module specs**: [Client](modules/Simplex/Messaging/Notifications/Client.md) · [Protocol](modules/Simplex/Messaging/Notifications/Protocol.md) -Type alias for ProtocolClient — same architecture as SMP Client: +Type alias for ProtocolClient - same architecture as SMP Client: ```haskell type NtfClient = ProtocolClient NTFVersion ErrorType NtfResponse diff --git a/spec/diagrams/ntf-router.svg b/spec/diagrams/ntf-router.svg index b42429459..c6c9c2431 100644 --- a/spec/diagrams/ntf-router.svg +++ b/spec/diagrams/ntf-router.svg @@ -10,9 +10,6 @@ - - NTF Router -- Component Topology - diff --git a/spec/diagrams/smp-client-agent.svg b/spec/diagrams/smp-client-agent.svg index 4257726ec..be76d2233 100644 --- a/spec/diagrams/smp-client-agent.svg +++ b/spec/diagrams/smp-client-agent.svg @@ -1,4 +1,4 @@ - + @@ -10,137 +10,134 @@ - - 
SMPClientAgent -- Component Topology + + consumer + (NTF router / + SMP proxy / + application) - - consumer - (NTF router / - SMP proxy / - application) - - - + - msgQ - (TBQueue, server messages) + agentQ + (TBQueue SMPClientAgentEvent) - - + - - + CAConnected / Disconnected + CASubscribed / SubError + CAServiceDisconnected + CAServiceSubscribed / SubError + CAServiceUnavailable + + + - agentQ - (TBQueue SMPClientAgentEvent) + msgQ + (TBQueue, server messages) - - + - - CAConnected - CADisconnected - CASubscribed / CASubError - CAServiceDisconnected - CAServiceSubscribed / SubError - - - SMPClientAgent (connection manager) + SMPClientAgent (connection manager) - - smpClients - (TMap SMPServer SMPClientVar) - - - - activeQueueSubs / pendingQueueSubs - (TMap SMPServer (TMap QueueId ...)) - activeServiceSubs / pendingServiceSubs - (TMap SMPServer (TVar (Maybe ...))) + smpClients + (TMap SMPServer SMPClientVar) - - - - - - + - smpSubWorkers - (one per server) + smpSubWorkers + (one per server) - - smpClients (reconnect, dashed) --> + - reconnect + resubscribe + reconnect + + + + activeQueueSubs / pendingQueueSubs + (TMap SMPServer (TMap QueueId ...)) + activeServiceSubs / pendingServiceSubs + (TMap SMPServer (TVar (Maybe ...))) - - getSMPServerClient'': get or create client - connectClient: create ProtocolClient, register disconnect handler - on disconnect: filter by SessionId, move active → pending, notify agentQ, spawn worker - worker: retry connect with backoff, resubscribe pending subs - subscribeQueuesNtfs / subscribeServiceNtfs: subscribe + track state + + - - ProtocolClient connections (one per SMP Router) + ProtocolClient connections (one per SMP Router) - - ProtocolClient - → SMP Router A + ProtocolClient + → SMP Router A - - ProtocolClient - → SMP Router B + ProtocolClient + → SMP Router B - - ProtocolClient - → SMP Router N + ProtocolClient + → SMP Router N - ... + ... 
- - + + getSMPServerClient'': get or create client + connectClient: create ProtocolClient, register disconnect handler + on disconnect: filter by SessionId, move active to pending, notify agentQ, spawn worker + worker: retry connect with backoff, resubscribe pending subs + subscribeQueuesNtfs / subscribeServiceNtfs: subscribe + track state + - - - ProtocolClient + ProtocolClient - - state / queue + state / queue - - background worker + background worker - - Solid arrows: TBQueue flow. Dashed: reconnection / resubscription. + + Solid arrows: TBQueue flow. Dashed: reconnection. diff --git a/spec/diagrams/smp-client.svg b/spec/diagrams/smp-client.svg index d537ec8d0..754c41c89 100644 --- a/spec/diagrams/smp-client.svg +++ b/spec/diagrams/smp-client.svg @@ -1,4 +1,4 @@ - + @@ -10,139 +10,160 @@ - - SMP Client (ProtocolClient) -- Component Topology - - - per connection (raceAny_ -- any thread exit tears down connection) + per connection (raceAny_: any exit tears down all threads) + + + + monitor + optional + + + + PING - - caller - (Agent/ - router) + - - + caller + (Agent / + router) + + + - commands + + + + sendProtocolCommand + mkTransmission + + + - - sndQ - (TBQueue, 64) + sndQ + (TBQueue, 64) - - - send + send - - Router (exits per-connection box) --> + - - - receive + + SMP + Router + (TLS) + + + + + + insert + + + + sentCommands + (TMap CorrId Request) + + + + responseVar + (TMVar) - - receive (enters per-connection box) --> + - - SMP - Router - (TLS) + + + receive - - - rcvQ - (TBQueue, 64) + rcvQ + (TBQueue, 64) + + - - - - process - - - - sentCommands - (TMap CorrId Request) - - - sentCommands (match CorrId, dashed upward) --> + - match + match - - - responseVar - (TMVar) - - - - monitor + + + process - - - PING + - - optional + + + events - - + - msgQ (optional) - (TBQueue, server events) - - - - events (empty CorrId) + msgQ (optional) + (TBQueue, server events) - - to Agent / SMPClientAgent + to Agent / SMPClientAgent - - - thread + thread - - queue / state 
+ queue / state + + + API entry point - - optional + optional - + Solid arrows: TBQueue flow. Dashed: STM lookups / TMVar responses. diff --git a/spec/diagrams/smp-router.svg b/spec/diagrams/smp-router.svg index 09a198bc9..819da8a36 100644 --- a/spec/diagrams/smp-router.svg +++ b/spec/diagrams/smp-router.svg @@ -10,13 +10,10 @@ - - SMP Router -- Component Topology - - per client connection (raceAny_ -- any thread exit tears down connection) + per client connection (raceAny_: any exit tears down connection) - - XFTP Client -- Component Topology - diff --git a/spec/diagrams/xftp-router.svg b/spec/diagrams/xftp-router.svg index 6f13221f9..5a5146275 100644 --- a/spec/diagrams/xftp-router.svg +++ b/spec/diagrams/xftp-router.svg @@ -10,9 +10,6 @@ - - XFTP Router -- Component Topology - diff --git a/spec/routers.md b/spec/routers.md index b66c52ce4..6e8be5e8b 100644 --- a/spec/routers.md +++ b/spec/routers.md @@ -12,7 +12,7 @@ For deployment and configuration, see [docs/ROUTERS.md](../docs/ROUTERS.md). 
For ### Component topology -![SMP Router — Component Topology](diagrams/smp-router.svg) +![SMP Router - Component Topology](diagrams/smp-router.svg) ### Packet delivery flow @@ -74,7 +74,7 @@ sequenceDiagram ### Component topology -![XFTP Router — Component Topology](diagrams/xftp-router.svg) +![XFTP Router - Component Topology](diagrams/xftp-router.svg) ### Data packet delivery flow @@ -124,7 +124,7 @@ sequenceDiagram ### Component topology -![NTF Router — Component Topology](diagrams/ntf-router.svg) +![NTF Router - Component Topology](diagrams/ntf-router.svg) ### Token registration and notification delivery From 8e294cb72dcad242df118473346e85dadf980ce6 Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Sat, 14 Mar 2026 14:47:41 +0000 Subject: [PATCH 37/61] agent diagrams --- docs/AGENT.md | 2 +- spec/agent.md | 134 ++++++++++++++++++++++++-- spec/clients.md | 2 +- spec/diagrams/agent.svg | 202 ++++++++++++++++++++++++++++++++++++++++ 4 files changed, 331 insertions(+), 9 deletions(-) create mode 100644 spec/diagrams/agent.svg diff --git a/docs/AGENT.md b/docs/AGENT.md index 83050013f..415ba08f5 100644 --- a/docs/AGENT.md +++ b/docs/AGENT.md @@ -2,7 +2,7 @@ The SimpleX Agent builds duplex encrypted connections on top of [SimpleX client libraries](CLIENT.md). It manages the full lifecycle of secure communication: connection establishment, end-to-end encryption, queue rotation, file transfer, and push notifications. -This is **Layer 3** of the [SimpleX Network architecture](../protocol/overview-tjr.md). Layer 1 is the routers; Layer 2 is the [client libraries](CLIENT.md) that speak the wire protocols. The Agent adds the connection semantics that applications need. +This is **Layer 3** of the [SimpleX Network architecture](../protocol/overview-tjr.md). Layer 1 is the routers; Layer 2 is the [client libraries](CLIENT.md) that speak the wire protocols. 
The Agent adds the connection semantics that applications need. For internal architecture diagrams (thread topology, message processing flows), see [`spec/agent.md`](../spec/agent.md). **Source**: [`Simplex.Messaging.Agent`](../src/Simplex/Messaging/Agent.hs). **Module spec**: [`spec/modules/Simplex/Messaging/Agent.md`](../spec/modules/Simplex/Messaging/Agent.md) diff --git a/spec/agent.md b/spec/agent.md index 250bf2253..f77ccd4a4 100644 --- a/spec/agent.md +++ b/spec/agent.md @@ -1,13 +1,133 @@ -# SMP Agent +# Agent Architecture -> SMP agent implementation: duplex connections, queue rotation, ratchet sync, and notification subscriptions. +The SimpleX Agent is the Layer 3 connection manager. It builds duplex encrypted connections on top of Layer 2 client libraries. This document shows its internal architecture: component topology and message processing flows. -## Duplex Connections +For usage and API overview, see [docs/AGENT.md](../docs/AGENT.md). For protocol specifications, see [Agent Protocol](../protocol/agent-protocol.md), [PQDR](../protocol/pqdr.md). -## Queue Rotation +--- -## Ratchet Sync +**Module specs**: [Agent](modules/Simplex/Messaging/Agent.md) · [Agent Client](modules/Simplex/Messaging/Agent/Client.md) · [Agent Protocol](modules/Simplex/Messaging/Agent/Protocol.md) · [Store Interface](modules/Simplex/Messaging/Agent/Store/Interface.md) · [NtfSubSupervisor](modules/Simplex/Messaging/Agent/NtfSubSupervisor.md) · [XFTP Agent](modules/Simplex/FileTransfer/Agent.md) · [Ratchet](modules/Simplex/Messaging/Crypto/Ratchet.md) -## Notification Subscriptions +### Component topology -## Functions +![Agent - Component Topology](diagrams/agent.svg) + +### Message receive flow + +```mermaid +sequenceDiagram + participant R as SMP Router + + box Agent + participant SC as smpClients
<br/>(ProtocolClient pool) + participant MQ as msgQ<br/>(TBQueue) + participant S as subscriber + participant St as Store + participant SQ as subQ<br/>(TBQueue) + end + + participant App as Application + + R->>SC: MSG (encrypted packet) + SC->>MQ: write batch + + S->>MQ: read batch + S->>S: withConnLock<br/>(serialize per connection) + S->>St: load ratchet state<br/>(lockConnForUpdate) + S->>S: agentRatchetDecrypt<br/>(double ratchet) + S->>S: checkMsgIntegrity<br/>(sequence + hash chain) + S->>St: store received message,<br/>update ratchet + S->>SQ: write AEvt (MSG + metadata) + + App->>SQ: read event + + Note over App: application processes message + + App->>S: ackMessage (agentMsgId) + Note over S,R: ACK is async<br/>(enqueued as internal command) + S->>SC: ACK + SC->>R: ACK +``` + +### Message send flow + +```mermaid +sequenceDiagram + participant App as Application + + box Agent + participant API as sendMessage + participant St as Store + participant DW as deliveryWorker<br/>(per send queue) + participant SC as smpClients<br/>(ProtocolClient pool) + end + + participant R as SMP Router + + App->>API: sendMessage(connId, body) + API->>St: agentRatchetEncryptHeader<br/>(advance ratchet, store<br/>encrypt key + pending message) + API->>DW: signal doWork (TMVar) + API->>App: return msgId + + DW->>St: getPendingQueueMsg + DW->>DW: rcEncryptMsg<br/>(encrypt body with stored key) + DW->>DW: encode AgentMsgEnvelope + DW->>SC: sendAgentMessage<br/>(per-queue encrypt + SEND) + SC->>R: SEND (encrypted packet) + R->>SC: OK + + DW->>St: delete pending message + DW->>App: SENT msgId (via subQ) +``` + +### Connection establishment flow + +```mermaid +sequenceDiagram + participant A as Alice (initiator) + + box Agent A + participant AA as Agent + end + + participant SMP as SMP Router + + box Agent B + participant AB as Agent + end + + participant B as Bob (joiner) + + A->>AA: createConnection + AA->>SMP: NEW (Alice's receive queue) + SMP->>AA: queue ID + keys + AA->>A: invitation URI<br/>(queue address + DH keys) + + Note over A,B: invitation passed out-of-band<br/>(QR code, link) + + B->>AB: joinConnection(invitation) + AB->>AB: initSndRatchet<br/>(PQ X3DH key agreement) + AB->>SMP: NEW (Bob's receive queue) + SMP->>AB: queue ID + AB->>SMP: KEY (secure Alice's queue) + AB->>SMP: SEND confirmation to<br/>Alice's queue (Bob's queue<br/>address + ratchet keys) + + SMP->>AA: MSG (confirmation) + AA->>AA: initRcvRatchet<br/>(PQ X3DH key agreement),<br/>decrypt confirmation + AA->>A: CONF (request approval) + A->>AA: allowConnection(confId) + AA->>SMP: SKEY (secure Alice's rcv queue) + AA->>SMP: NEW (Alice's send queue) + AA->>SMP: SEND reply to Bob's queue<br/>
(Alice's connection info) + + SMP->>AB: MSG (reply) + AB->>SMP: SKEY (secure Bob's rcv queue) + AB->>SMP: SEND HELLO to Alice + + SMP->>AA: MSG (HELLO) + AA->>SMP: SEND HELLO to Bob + AA->>A: CON (connected) + + SMP->>AB: MSG (HELLO) + AB->>B: CON (connected) +``` diff --git a/spec/clients.md b/spec/clients.md index a99686ab0..10634d0de 100644 --- a/spec/clients.md +++ b/spec/clients.md @@ -122,7 +122,7 @@ Stateless wrapper around HTTP2Client. XFTPClient adds no threads of its own; eac ![XFTP Client - Component Topology](diagrams/xftp-client.svg) -### Upload/download flow +### Packet delivery flow ```mermaid sequenceDiagram diff --git a/spec/diagrams/agent.svg b/spec/diagrams/agent.svg new file mode 100644 index 000000000..4b7bf802c --- /dev/null +++ b/spec/diagrams/agent.svg @@ -0,0 +1,202 @@ + + + + + + + + + + + + + Application + + + + subQ + + + + main threads (raceAny_: any exit tears down all) + + + + subscriber + (reads msgQ) + + + ntfSubQ + + + + + ntfSupervisor + (reads ntfSubQ) + + + + cleanupManager + (periodic cleanup) + + + + logServersStats + (periodic stats) + + + + worker pools (on-demand, one per queue/connection/server) + + + + delivery + (per send queue) + + + asyncCmd + (per connection) + + + smpSub + (per session) + + + + xftpRcv + (per server) + + + xftpSnd + (per server) + + + xftpDel + (per server) + + + + ntfSMP + (per SMP server) + + + ntfWorkers + (per NTF server) + + + ntfTknDel + (per NTF server) + + + + ntf workers (dispatched by ntfSupervisor) + + + + + + + + store + + + + Store + (SQLite / Postgres) + + + currentSubs + (TSessionSubs) + + + Operation State + (5-op suspension cascade) + + + + protocol client pools (lazy singleton per router) + + + smpClients + (TMap SMPTransportSession) + + + xftpClients + (TMap XFTPTransportSession) + + + ntfClients + (TMap NtfTransportSession) + + + SMP Routers + + + XFTP Routers + + + NTF Routers + + + + + msgQ + + + + + + on-demand worker + + + singleton thread + + + storage / state + + + external 
connection + + + Solid arrows: TBQueue connections. Dashed: store access / dispatch. Workers connect to protocol clients in their column. + + + From 7410cebac5532dd60289dd59abcac3c30564381e Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Sat, 14 Mar 2026 15:35:53 +0000 Subject: [PATCH 38/61] update agent diagram --- spec/agent.md | 52 +++++++ spec/diagrams/agent.svg | 292 +++++++++++++++++++++++----------------- 2 files changed, 220 insertions(+), 124 deletions(-) diff --git a/spec/agent.md b/spec/agent.md index f77ccd4a4..a6f944260 100644 --- a/spec/agent.md +++ b/spec/agent.md @@ -131,3 +131,55 @@ sequenceDiagram SMP->>AB: MSG (HELLO) AB->>B: CON (connected) ``` + +### File delivery flow (XFTP) + +```mermaid +sequenceDiagram + participant SA as Sender App + + box Sender Agent + participant S as xftpSnd workers + participant SS as Store + end + + participant XFTP as XFTP Routers + participant SMP as SMP Router + + box Receiver Agent + participant RS as Store + participant R as xftpRcv workers + end + + participant RA as Receiver App + + SA->>S: xftpSendFile(file) + S->>S: encrypt file
<br/>(XSalsa20-Poly1305, random key + nonce) + S->>S: split into chunks<br/>(fixed sizes: 64KB - 4MB) + S->>SS: store SndFile + chunks + + loop each chunk + S->>XFTP: FNEW (create data packet) + XFTP->>S: sender ID + recipient IDs + S->>XFTP: FPUT (upload encrypted chunk) + end + + S->>S: assemble FileDescription<br/>(chunk locations, replicas,<br/>encryption key + nonce) + S->>SA: SFDONE<br/>(sender + recipient descriptions) + + Note over SA,RA: recipient description sent as<br/>SMP message (encrypted, via double ratchet) + + SA->>SMP: description in A_MSG + SMP->>RA: description in MSG + + RA->>R: xftpReceiveFile(description) + R->>RS: store RcvFile + chunks + + loop each chunk (parallel per server) + R->>XFTP: FGET (per-recipient auth key) + XFTP->>R: encrypted chunk stream + end + + R->>R: stream chunks through<br/>stateful decrypt (key + nonce),<br/>
verify auth tag at end + R->>RA: RFDONE (decrypted file path) +``` diff --git a/spec/diagrams/agent.svg b/spec/diagrams/agent.svg index 4b7bf802c..9ec32bf31 100644 --- a/spec/diagrams/agent.svg +++ b/spec/diagrams/agent.svg @@ -1,4 +1,4 @@ - + @@ -8,195 +8,239 @@ markerWidth="6" markerHeight="6" orient="auto-start-reverse"> + + + - - + - Application + Application - - API arrows (down) ===== --> + + + + + + - subQ + subQ - - - main threads (raceAny_: any exit tears down all) + - - - subscriber - (reads msgQ) + + + sendMessage, createConnection + joinConnection, subscribe... - - ntfSubQ - + + + xftpSendFile + xftpReceiveFile... - - - ntfSupervisor - (reads ntfSubQ) - - - - cleanupManager - (periodic cleanup) + + + registerNtfToken + toggleConnectionNtfs... - - - logServersStats - (periodic stats) + - - + - worker pools (on-demand, one per queue/connection/server) + SMP - - + + subscriber + (reads msgQ) + + + - delivery - (per send queue) + delivery + (per send queue) - + - asyncCmd - (per connection) + asyncCmd + (per connection) - + - smpSub - (per session) + smpSub + (per session) + + + + XFTP - - + - xftpRcv - (per server) + xftpRcv + (per server + local) - + - xftpSnd - (per server) + xftpSnd + (per server + local) - + - xftpDel - (per server) + xftpDel + (per server) - - + + NTF + + + + ntfSupervisor + (reads ntfSubQ) + + + - ntfSMP - (per SMP server) + ntfWorkers + (per NTF server) - + - ntfWorkers - (per NTF server) + ntfSMP + (per SMP server) - + - ntfTknDel - (per NTF server) + ntfTknDel + (per NTF server) - - - ntf workers (dispatched by ntfSupervisor) - + + + ntfSubQ - - + + + ntfSubQ (queue rotation) - - - store + + + ntfSMP uses smpClients - - - Store - (SQLite / Postgres) + + + cleanupManager + + + logServersStats + + shared singletons (all green run in raceAny_) - + - currentSubs - (TSessionSubs) + Store + (SQLite / Postgres) - - Operation State - (5-op suspension cascade) + Operation State + (5-op suspension cascade) - - - protocol client pools (lazy 
singleton per router) + + + store - + - smpClients - (TMap SMPTransportSession) + smpClients + (TMap SMPTransportSession) - - xftpClients - (TMap XFTPTransportSession) + xftpClients + (TMap XFTPTransportSession) - - ntfClients - (TMap NtfTransportSession) + ntfClients + (TMap NtfTransportSession) - - SMP Routers - + SMP Routers + - XFTP Routers - XFTP Routers + - NTF Routers - NTF Routers + - - msgQ + msgQ - - - on-demand worker + on-demand worker - - singleton thread + singleton thread - - storage / state + storage / state - - external connection + external connection + + + API entry point + + + cross-protocol - - Solid arrows: TBQueue connections. Dashed: store access / dispatch. Workers connect to protocol clients in their column. + + Solid arrows: TBQueue flow. Dashed grey: store access. Dashed red: cross-protocol link. Workers use clients in their column. From 1354918ed56163c94ee599d7339931a2a423180a Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Sat, 14 Mar 2026 16:08:53 +0000 Subject: [PATCH 39/61] improve diagram --- spec/agent.md | 8 +- spec/diagrams/agent.svg | 270 +++++++++++++++++++++++----------------- 2 files changed, 158 insertions(+), 120 deletions(-) diff --git a/spec/agent.md b/spec/agent.md index a6f944260..9074006e2 100644 --- a/spec/agent.md +++ b/spec/agent.md @@ -107,21 +107,21 @@ sequenceDiagram B->>AB: joinConnection(invitation) AB->>AB: initSndRatchet
(PQ X3DH key agreement) + AB->>SMP: SKEY (sender auth on
Alice's queue) AB->>SMP: NEW (Bob's receive queue) SMP->>AB: queue ID - AB->>SMP: KEY (secure Alice's queue) AB->>SMP: SEND confirmation to
Alice's queue (Bob's queue
address + ratchet keys) SMP->>AA: MSG (confirmation) AA->>AA: initRcvRatchet
(PQ X3DH key agreement),
decrypt confirmation AA->>A: CONF (request approval) A->>AA: allowConnection(confId) - AA->>SMP: SKEY (secure Alice's rcv queue) - AA->>SMP: NEW (Alice's send queue) + AA->>SMP: KEY (register sender key
on Alice's rcv queue) + AA->>SMP: SKEY (sender auth on
Bob's queue) AA->>SMP: SEND reply to Bob's queue
(Alice's connection info) SMP->>AB: MSG (reply) - AB->>SMP: SKEY (secure Bob's rcv queue) + AB->>SMP: KEY (register sender key
on Bob's rcv queue) AB->>SMP: SEND HELLO to Alice SMP->>AA: MSG (HELLO) diff --git a/spec/diagrams/agent.svg b/spec/diagrams/agent.svg index 9ec32bf31..d53e3ec13 100644 --- a/spec/diagrams/agent.svg +++ b/spec/diagrams/agent.svg @@ -1,4 +1,4 @@ - + @@ -15,231 +15,269 @@ - - Application + Application + + + + subQ + (TBQueue) + + + + + + + all threads + write subQ - - - - - - subQ - - - sendMessage, createConnection - joinConnection, subscribe... + sendMessage, createConnection + joinConnection, subscribe... - - xftpSendFile - xftpReceiveFile... + xftpSendFile + xftpReceiveFile... - - registerNtfToken - toggleConnectionNtfs... + registerNtfToken + toggleConnectionNtfs... - - + - SMP + SMP + + + + msgQ + (TBQueue) - - subscriber - (reads msgQ) + subscriber + (reads msgQ) + + + + + + worker pools (on-demand, one per queue / connection / session) - - delivery - (per send queue) + delivery + (per send queue) - - asyncCmd - (per connection) + asyncCmd + (per conn + server) - - smpSub - (per session) + smpSub + (per session) - - + - XFTP + XFTP + + + worker pools (on-demand, one per server) - - xftpRcv - (per server + local) + xftpRcv + (per server + local) - - xftpSnd - (per server + local) + xftpSnd + (per server + local) - - xftpDel - (per server) + xftpDel + (per server) - - + - NTF + NTF + + + + ntfSubQ + (TBQueue) - - ntfSupervisor - (reads ntfSubQ) + ntfSupervisor + (reads ntfSubQ) + + + + + + + + + worker pools (on-demand, one per server) - - ntfWorkers - (per NTF server) + ntfWorkers + (per NTF server) - - ntfSMP - (per SMP server) + ntfSMP + (per SMP server) - - ntfTknDel - (per NTF server) - - - - ntfSubQ + ntfTknDel + (per NTF server) - - ntfSubQ (queue rotation) + ntfSubQ (queue rotation) - - ntfSMP uses smpClients + ntfSMP uses smpClients - shared singletons (all green run in raceAny_) + - cleanupManager + cleanupManager - - logServersStats - - shared singletons (all green run in raceAny_) + logServersStats - - Store - (SQLite / Postgres) + Store + (SQLite / 
Postgres) - - Operation State - (5-op suspension cascade) + Operation State + (5-op suspension cascade) - - store + store - - smpClients - (TMap SMPTransportSession) + smpClients + (TMap SMPTransportSession) - - xftpClients - (TMap XFTPTransportSession) + xftpClients + (TMap XFTPTransportSession) - - ntfClients - (TMap NtfTransportSession) + ntfClients + (TMap NtfTransportSession) - SMP Routers - SMP Routers + - XFTP Routers - XFTP Routers + - NTF Routers - NTF Routers + - - msgQ + msgQ - - - on-demand worker + on-demand worker - - singleton thread + singleton thread - - storage / state + queue / state - - external connection + external connection - - API entry point + API entry point - - cross-protocol + cross-protocol - + Solid arrows: TBQueue flow. Dashed grey: store access. Dashed red: cross-protocol link. Workers use clients in their column. From 021d929e66acedde6d014eac1c1bf49afe0f3602 Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Sat, 14 Mar 2026 16:15:13 +0000 Subject: [PATCH 40/61] titles --- spec/agent.md | 4 ++-- spec/clients.md | 12 ++++++------ spec/routers.md | 12 ++++++------ 3 files changed, 14 insertions(+), 14 deletions(-) diff --git a/spec/agent.md b/spec/agent.md index 9074006e2..6ec9b1b72 100644 --- a/spec/agent.md +++ b/spec/agent.md @@ -8,9 +8,9 @@ For usage and API overview, see [docs/AGENT.md](../docs/AGENT.md). 
For protocol **Module specs**: [Agent](modules/Simplex/Messaging/Agent.md) · [Agent Client](modules/Simplex/Messaging/Agent/Client.md) · [Agent Protocol](modules/Simplex/Messaging/Agent/Protocol.md) · [Store Interface](modules/Simplex/Messaging/Agent/Store/Interface.md) · [NtfSubSupervisor](modules/Simplex/Messaging/Agent/NtfSubSupervisor.md) · [XFTP Agent](modules/Simplex/FileTransfer/Agent.md) · [Ratchet](modules/Simplex/Messaging/Crypto/Ratchet.md) -### Component topology +### Agent components -![Agent - Component Topology](diagrams/agent.svg) +![Agent components](diagrams/agent.svg) ### Message receive flow diff --git a/spec/clients.md b/spec/clients.md index 10634d0de..3ab9d5868 100644 --- a/spec/clients.md +++ b/spec/clients.md @@ -12,9 +12,9 @@ For deployment and usage, see [docs/CLIENT.md](../docs/CLIENT.md). For protocol Generic protocol client used for both SMP and NTF connections. Manages a single TLS connection with multiplexed command/response matching via correlation IDs. -### Component topology +### SMP Client components -![SMP Client - Component Topology](diagrams/smp-client.svg) +![SMP Client components](diagrams/smp-client.svg) ### Command/result flow @@ -63,9 +63,9 @@ sequenceDiagram Connection manager that multiplexes multiple ProtocolClient connections. Tracks subscriptions, handles reconnection with backoff, and forwards server messages and connection events upward. Used by SMP router (proxying) and NTF router (subscriptions). -### Component topology +### SMPClientAgent components -![SMPClientAgent - Component Topology](diagrams/smp-client-agent.svg) +![SMPClientAgent components](diagrams/smp-client-agent.svg) ### Connection lifecycle @@ -118,9 +118,9 @@ sequenceDiagram Stateless wrapper around HTTP2Client. XFTPClient adds no threads of its own; each operation is a synchronous HTTP/2 request/response. Serialization and multiplexing happen inside HTTP2Client's internal request queue and process thread. 
-### Component topology +### XFTP Client components -![XFTP Client - Component Topology](diagrams/xftp-client.svg) +![XFTP Client components](diagrams/xftp-client.svg) ### Packet delivery flow diff --git a/spec/routers.md b/spec/routers.md index 6e8be5e8b..b7b8761ef 100644 --- a/spec/routers.md +++ b/spec/routers.md @@ -10,9 +10,9 @@ For deployment and configuration, see [docs/ROUTERS.md](../docs/ROUTERS.md). For **Module specs**: [Server](modules/Simplex/Messaging/Server.md) · [Main](modules/Simplex/Messaging/Server/Main.md) · [QueueStore](modules/Simplex/Messaging/Server/QueueStore.md) · [QueueStore Postgres](modules/Simplex/Messaging/Server/QueueStore/Postgres.md) · [MsgStore](modules/Simplex/Messaging/Server/MsgStore.md) · [StoreLog](modules/Simplex/Messaging/Server/StoreLog.md) · [Control](modules/Simplex/Messaging/Server/Control.md) · [Prometheus](modules/Simplex/Messaging/Server/Prometheus.md) · [Stats](modules/Simplex/Messaging/Server/Stats.md) -### Component topology +### SMP Router components -![SMP Router - Component Topology](diagrams/smp-router.svg) +![SMP Router components](diagrams/smp-router.svg) ### Packet delivery flow @@ -72,9 +72,9 @@ sequenceDiagram **Module specs**: [Server](modules/Simplex/FileTransfer/Server.md) · [Main](modules/Simplex/FileTransfer/Server/Main.md) · [Store](modules/Simplex/FileTransfer/Server/Store.md) · [StoreLog](modules/Simplex/FileTransfer/Server/StoreLog.md) · [Stats](modules/Simplex/FileTransfer/Server/Stats.md) · [Transport](modules/Simplex/FileTransfer/Transport.md) -### Component topology +### XFTP Router components -![XFTP Router - Component Topology](diagrams/xftp-router.svg) +![XFTP Router components](diagrams/xftp-router.svg) ### Data packet delivery flow @@ -122,9 +122,9 @@ sequenceDiagram **Module specs**: [Server](modules/Simplex/Messaging/Notifications/Server.md) · [Main](modules/Simplex/Messaging/Notifications/Server/Main.md) · [Store 
Postgres](modules/Simplex/Messaging/Notifications/Server/Store/Postgres.md) · [APNS](modules/Simplex/Messaging/Notifications/Server/Push/APNS.md) · [Control](modules/Simplex/Messaging/Notifications/Server/Control.md) · [Client](modules/Simplex/Messaging/Notifications/Client.md) · [Protocol](modules/Simplex/Messaging/Notifications/Protocol.md) -### Component topology +### NTF Router components -![NTF Router - Component Topology](diagrams/ntf-router.svg) +![NTF Router components](diagrams/ntf-router.svg) ### Token registration and notification delivery From 958f030899ba5c06a28865881a8248ef1291bdcc Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Sat, 14 Mar 2026 16:21:49 +0000 Subject: [PATCH 41/61] improve diagram --- spec/diagrams/agent.svg | 248 +++++++++++++++++++++------------------- 1 file changed, 128 insertions(+), 120 deletions(-) diff --git a/spec/diagrams/agent.svg b/spec/diagrams/agent.svg index d53e3ec13..d885d4b25 100644 --- a/spec/diagrams/agent.svg +++ b/spec/diagrams/agent.svg @@ -1,4 +1,4 @@ - + @@ -29,256 +29,264 @@ - - + + + + + + + + + - all threads - write subQ - - - - - sendMessage, createConnection - joinConnection, subscribe... + sendMessage, createConnection + joinConnection, subscribe... - - xftpSendFile - xftpReceiveFile... + xftpSendFile + xftpReceiveFile... - - registerNtfToken - toggleConnectionNtfs... + registerNtfToken + toggleConnectionNtfs... 
- - SMP + SMP - - msgQ - (TBQueue) + msgQ + (TBQueue) - - subscriber - (reads msgQ) + subscriber + (reads msgQ) - - worker pools (on-demand, one per queue / connection / session) + worker pools (on-demand, one per queue / conn+server / session) - - delivery - (per send queue) + delivery + (per send queue) - - asyncCmd - (per conn + server) + asyncCmd + (per conn + server) - - smpSub - (per session) + smpSub + (per session) - - XFTP + XFTP - worker pools (on-demand, one per server) + worker pools (on-demand, one per server) - - xftpRcv - (per server + local) + xftpRcv + (per server + local) - - xftpSnd - (per server + local) + xftpSnd + (per server + local) - - xftpDel - (per server) + xftpDel + (per server) - - NTF + NTF - - ntfSubQ - (TBQueue) + ntfSubQ + (TBQueue) - - ntfSupervisor - (reads ntfSubQ) + ntfSupervisor + (reads ntfSubQ) - - - worker pools (on-demand, one per server) + worker pools (on-demand, one per server) - - ntfWorkers - (per NTF server) + ntfWorkers + (per NTF server) - - ntfSMP - (per SMP server) + ntfSMP + (per SMP server) - - ntfTknDel - (per NTF server) + ntfTknDel + (per NTF server) - - ntfSubQ (queue rotation) - - - - ntfSMP uses smpClients + ntfSubQ (queue rotation) - shared singletons (all green run in raceAny_) - shared singletons (all green run in raceAny_) + - cleanupManager + cleanupManager - - logServersStats + logServersStats + + + + ntfSMP uses smpClients - - Store - (SQLite / Postgres) + Store + (SQLite / Postgres) - - Operation State - (5-op suspension cascade) + Operation State + (5-op suspension cascade) - - store + store - - smpClients - (TMap SMPTransportSession) + smpClients + (TMap SMPTransportSession) - - xftpClients - (TMap XFTPTransportSession) + xftpClients + (TMap XFTPTransportSession) - - ntfClients - (TMap NtfTransportSession) + ntfClients + (TMap NtfTransportSession) - SMP Routers - SMP Routers + - XFTP Routers - XFTP Routers + - NTF Routers - NTF Routers + - - msgQ + msgQ - - - on-demand worker + on-demand worker 
- - singleton thread + singleton thread - - queue / state + queue / state - - external connection + external connection - - API entry point + API entry point - - cross-protocol + cross-protocol - - Solid arrows: TBQueue flow. Dashed grey: store access. Dashed red: cross-protocol link. Workers use clients in their column. + + Solid arrows: TBQueue flow. Dashed grey: store access. Dashed red: cross-protocol link. All threads and pools write events to subQ. From 1ed344405700d26695d244670f99d2d0ce5bbb30 Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Sat, 14 Mar 2026 16:36:54 +0000 Subject: [PATCH 42/61] improve diagram 2 --- spec/diagrams/agent.svg | 17 +++++++---------- 1 file changed, 7 insertions(+), 10 deletions(-) diff --git a/spec/diagrams/agent.svg b/spec/diagrams/agent.svg index d885d4b25..2c4569f78 100644 --- a/spec/diagrams/agent.svg +++ b/spec/diagrams/agent.svg @@ -29,18 +29,15 @@ - + - - - - - + + + + - - + From 05824293e548452c0a9ea9cf9d569c5b3ab079b1 Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Sat, 14 Mar 2026 16:55:34 +0000 Subject: [PATCH 43/61] more diagram --- spec/diagrams/agent.svg | 47 ++++++++++++++++++++--------------------- 1 file changed, 23 insertions(+), 24 deletions(-) diff --git a/spec/diagrams/agent.svg b/spec/diagrams/agent.svg index 2c4569f78..9d7226e52 100644 --- a/spec/diagrams/agent.svg +++ b/spec/diagrams/agent.svg @@ -75,20 +75,20 @@ fill="none" stroke="#888" stroke-dasharray="6,3" /> SMP - - + - msgQ - (TBQueue) + msgQ + (TBQueue) - - + - subscriber - (reads msgQ) + subscriber + (reads msgQ) - @@ -143,24 +143,24 @@ fill="none" stroke="#888" stroke-dasharray="6,3" /> NTF - - + - ntfSubQ - (TBQueue) + ntfSubQ + (TBQueue) - - + - ntfSupervisor - (reads ntfSubQ) + ntfSupervisor + (reads ntfSubQ) - - @@ -185,9 +185,9 @@ (per NTF server) - - ntfSubQ (queue rotation) + ntfSubQ (queue rotation) shared 
singletons (all green run in raceAny_) @@ -249,10 +249,9 @@ - - msgQ (left margin) ===== --> + - msgQ Date: Sat, 14 Mar 2026 17:03:53 +0000 Subject: [PATCH 44/61] sequence diagrams layout --- spec/agent.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/spec/agent.md b/spec/agent.md index 6ec9b1b72..02f93314b 100644 --- a/spec/agent.md +++ b/spec/agent.md @@ -64,7 +64,7 @@ sequenceDiagram participant R as SMP Router - App->>API: sendMessage(connId, body) + App->>API: sendMessage
(connId, body) API->>St: agentRatchetEncryptHeader
(advance ratchet, store
encrypt key + pending message) API->>DW: signal doWork (TMVar) API->>App: return msgId @@ -99,16 +99,16 @@ sequenceDiagram participant B as Bob (joiner) A->>AA: createConnection - AA->>SMP: NEW (Alice's receive queue) + AA->>SMP: NEW
(Alice's receive queue) SMP->>AA: queue ID + keys AA->>A: invitation URI
(queue address + DH keys) Note over A,B: invitation passed out-of-band
(QR code, link) - B->>AB: joinConnection(invitation) + B->>AB: joinConnection
(invitation) AB->>AB: initSndRatchet
(PQ X3DH key agreement) AB->>SMP: SKEY (sender auth on
Alice's queue) - AB->>SMP: NEW (Bob's receive queue) + AB->>SMP: NEW
(Bob's receive queue) SMP->>AB: queue ID AB->>SMP: SEND confirmation to
Alice's queue (Bob's queue
address + ratchet keys) @@ -172,7 +172,7 @@ sequenceDiagram SA->>SMP: description in A_MSG SMP->>RA: description in MSG - RA->>R: xftpReceiveFile(description) + RA->>R: xftpReceiveFile
(description) R->>RS: store RcvFile + chunks loop each chunk (parallel per server) @@ -181,5 +181,5 @@ sequenceDiagram end R->>R: stream chunks through
stateful decrypt (key + nonce),
verify auth tag at end - R->>RA: RFDONE (decrypted file path) + R->>RA: RFDONE
(decrypted file path) ``` From b62a22472e222a9cf6842a8c3b911dfc8c7e7c63 Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Sat, 14 Mar 2026 19:18:14 +0000 Subject: [PATCH 45/61] agent topics --- spec/agent/infrastructure.md | 197 +++++++++++++++++++++++++++++++++++ spec/agent/xrcp.md | 101 ++++++++++++++++++ 2 files changed, 298 insertions(+) create mode 100644 spec/agent/infrastructure.md create mode 100644 spec/agent/xrcp.md diff --git a/spec/agent/infrastructure.md b/spec/agent/infrastructure.md new file mode 100644 index 000000000..ad84fd6e7 --- /dev/null +++ b/spec/agent/infrastructure.md @@ -0,0 +1,197 @@ +# Agent Infrastructure + +The Agent's internal machinery: worker lifecycle, command dispatch, message delivery, subscription tracking, operation suspension, protocol client management, and dual-backend store. These cross-module patterns are not visible from any single module spec. + +This document covers the "big agent" (`Agent.hs` + `Agent/Client.hs`) used in client applications. The "small agent" (`SMPClientAgent`) used in routers is documented in [clients.md](../clients.md). + +For per-module details: [Agent](../modules/Simplex/Messaging/Agent.md) · [Agent Client](../modules/Simplex/Messaging/Agent/Client.md) · [Store Interface](../modules/Simplex/Messaging/Agent/Store/Interface.md) · [NtfSubSupervisor](../modules/Simplex/Messaging/Agent/NtfSubSupervisor.md) · [XFTP Agent](../modules/Simplex/FileTransfer/Agent.md). For the component diagram, see [agent.md](../agent.md). 
+ +- [Worker framework](#worker-framework) +- [Async command processing](#async-command-processing) +- [Message delivery](#message-delivery) +- [Subscription tracking](#subscription-tracking) +- [Operation suspension cascade](#operation-suspension-cascade) +- [SessionVar lifecycle](#sessionvar-lifecycle) +- [Dual-backend store](#dual-backend-store) + +--- + +## Worker framework + +**Source**: [Agent/Client.hs](../../src/Simplex/Messaging/Agent/Client.hs), [Agent/Env/SQLite.hs](../../src/Simplex/Messaging/Agent/Env/SQLite.hs) (Worker type) + +All agent background processing - async commands, message delivery, notification workers, XFTP workers - uses a shared worker infrastructure defined in `Agent/Client.hs`. + +**Create-or-reuse**: `getAgentWorker` atomically checks a `TMap` for an existing worker keyed by the work item (connection+server, send queue address, etc.). If absent, creates a new `Worker` with a unique monotonic `workerId` from `workerSeq` and inserts it. If present and `hasWork=True`, signals the existing worker via `tryPutTMVar doWork ()`. + +**Fork and run**: `runWorkerAsync` uses bracket on the worker's `action` TMVar. If the taken value is `Nothing`, the worker is idle - start it. If `Just _`, it's already running - put it back and return. The `action` TMVar holds `Just (Weak ThreadId)` to avoid preventing GC of the worker thread. + +**Task retrieval race prevention**: `withWork` clears the `doWork` flag *before* calling `getWork` (not after). This prevents a race: query finds nothing → another thread adds work + signals → worker clears flag (losing the signal). By clearing first, any signal that arrives during the query is preserved. + +**Error classification**: `withWork` distinguishes two failure modes: +- *Work-item error* (`isWorkItemError`): the task itself is broken (likely recurring). Worker stops and sends `CRITICAL False`. +- *Store error*: transient database issue. Worker re-signals `doWork` and reports `INTERNAL` (retry may succeed). 
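
The task-retrieval rule above (clear `doWork` before querying, not after) can be illustrated with a minimal sketch. It is not the code from `Agent/Client.hs`: `WorkSignal`, `signalWork`, `signalPending` and `workPass` are hypothetical names, and the store query is reduced to an arbitrary `IO (Maybe a)` action.

```haskell
import Control.Concurrent.STM
import Control.Monad (void)

-- Illustrative stand-in for a worker's wake-up flag (not the simplexmq type).
newtype WorkSignal = WorkSignal (TMVar ())

newWorkSignal :: IO WorkSignal
newWorkSignal = WorkSignal <$> newEmptyTMVarIO

-- Producer side: set the flag; a no-op if it is already set.
signalWork :: WorkSignal -> IO ()
signalWork (WorkSignal v) = void . atomically $ tryPutTMVar v ()

-- Worker side: block until the flag is set, without clearing it.
waitForWork :: WorkSignal -> IO ()
waitForWork (WorkSignal v) = atomically $ readTMVar v

-- True if a signal is currently pending.
signalPending :: WorkSignal -> IO Bool
signalPending (WorkSignal v) = not <$> atomically (isEmptyTMVar v)

-- One pass of the worker loop: clear the flag FIRST, then query.
-- A signal raised while `getWork` runs stays set for the next pass.
workPass :: WorkSignal -> IO (Maybe a) -> (a -> IO ()) -> IO ()
workPass s@(WorkSignal v) getWork process = do
  void . atomically $ tryTakeTMVar v
  r <- getWork
  case r of
    Nothing -> pure () -- nothing found: stay idle until the next signal
    Just item -> signalWork s >> process item -- more items may remain
```

A worker loop built on this would be `forever (waitForWork s >> workPass s getWork process)`. Clearing after the query instead would discard any signal raised while the query was running, which is the race described above.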
+ +**Restart rate limiting**: On worker exit, `restartOrDelete` checks the `restarts` counter against `maxWorkerRestartsPerMin`. Under the limit: reset action, re-signal, restart. Over the limit: delete the worker from the map and send `CRITICAL True` (escalation to the application). A restart only proceeds if the `workerId` in the map still matches the current worker - a stale restart from a replaced worker is a no-op. + +**Consumers**: Four families use this framework: +- Async command workers - keyed by `(ConnId, Maybe SMPServer)`, in `asyncCmdWorkers` TMap +- Delivery workers - keyed by `SndQAddr`, in `smpDeliveryWorkers` TMap, paired with a `TMVar ()` retry lock +- NTF workers - three pools (`ntfWorkers` per NTF server, `ntfSMPWorkers` per SMP server, `ntfTknDelWorkers` for token deletion) in `NtfSubSupervisor` +- XFTP workers - three worker types (rcv, snd, del) with TMVar-based connection sharing + +--- + +## Async command processing + +**Source**: [Agent.hs](../../src/Simplex/Messaging/Agent.hs), [Agent/Protocol.hs](../../src/Simplex/Messaging/Agent/Protocol.hs) (command types), [Agent/Store.hs](../../src/Simplex/Messaging/Agent/Store.hs) (internal command types) + +Async commands handle state transitions that require network calls but shouldn't block the API thread: securing queues, deleting old queues during rotation, acknowledging messages. The dispatch loop `runCommandProcessing` runs one worker per `(ConnId, Maybe SMPServer)` key. + +**Enqueueing**: API functions call `enqueueCommand`, which persists the command to the `commands` table (crash-safe) and spawns/wakes the worker via `getAsyncCmdWorker`. On agent startup, `resumeAllCommands` fetches all pending commands grouped by connection+server and signals their workers. + +**Command types**: Two categories share the same dispatch loop: +- *Client commands* (`AClientCommand`): `NEW`, `JOIN`, `LET` (allow connection), `ACK`, `LSET`/`LGET` (set/get connection link data), `SWCH` (switch queue), `DEL`. 
Triggered by application API calls. +- *Internal commands* (`AInternalCommand`): `ICAck` (ack to router), `ICAckDel` (ack + delete local message), `ICAllowSecure`/`ICDuplexSecure` (secure after confirmation), `ICQSecure` (secure queue during switch), `ICQDelete` (delete old queue after switch), `ICDeleteConn` (delete connection), `ICDeleteRcvQueue` (delete specific receive queue). Generated *during* message processing to handle state transitions asynchronously. + +**Retry and movement**: `tryMoveableCommand` wraps execution with `withRetryInterval`. On `temporaryOrHostError`, it retries with backoff. On cross-server errors (e.g., queue moved to different router), it updates the command's server field in the store (`CCMoved`) and retries against the new server. + +**Locking**: State-sensitive commands use `tryWithLock` / `tryMoveableWithLock`, which acquire `withConnLock` before execution. This serializes operations on the same connection, preventing races between concurrent command processing and message receipt. + +**Event overflow**: Events are written directly to `subQ` if there is room. When `subQ` is full, events overflow into a local `pendingCmds` list and are flushed to `subQ` after the command completes, providing backpressure handling. + +--- + +## Message delivery + +**Source**: [Agent.hs](../../src/Simplex/Messaging/Agent.hs), [Agent/RetryInterval.hs](../../src/Simplex/Messaging/Agent/RetryInterval.hs) + +Message delivery uses a split-phase encryption design: the ratchet advances in the API thread (serialized), while the actual body encryption happens in the per-queue delivery worker (parallel). This avoids ratchet lock contention across queues. + +**Phase 1 - API thread** (`enqueueMessageB`): +1. Encode the agent message with `internalSndId` + `prevMsgHash` (for the receiver's integrity chain) +2. Call `agentRatchetEncryptHeader` - advances the double ratchet, produces a message encryption key (MEK), padded length, and PQ encryption status +3. 
Store `SndMsg` with `SndMsgPrepData` (MEK, paddedLen, sndMsgBodyId) in the database +4. Create `SndMsgDelivery` record for each send queue +5. Increment `msgDeliveryOp.opsInProgress` (for suspension tracking) +6. Signal delivery workers via `getDeliveryWorker` + +**Phase 2 - delivery worker** (`runSmpQueueMsgDelivery`): +1. `throwWhenNoDelivery` - kills the worker thread if the queue's address has been removed from `smpDeliveryWorkers` (prevents delivery to queues replaced during switch) +2. `getPendingQueueMsg` - fetches the next pending message from the store, resolving the `sndMsgBodyId` reference into the actual message body and constructing `PendingMsgPrepData` +3. Re-encode the message with `internalSndId`/`prevMsgHash`, then `rcEncryptMsg` to encrypt with the stored MEK (no ratchet access needed) +4. `sendAgentMessage` - per-queue encrypt + SEND to the router + +**Connection info messages** (`AM_CONN_INFO`, `AM_CONN_INFO_REPLY`) skip split-phase encryption entirely - they are sent as plaintext confirmation bodies via `sendConfirmation`. + +**Retry with dual intervals**: Delivery uses `withRetryLock2`, which maintains two independent retry clocks (slow and fast). A background thread sleeps for the current interval, then signals the delivery worker via `tryPutTMVar`. When the router sends `QCONT` (queue buffer cleared), the agent calls `tryPutTMVar retryLock ()` to wake the delivery thread immediately, avoiding unnecessary delay. 
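
The wake-up part of the retry mechanism can be sketched as follows. This is a simplified illustration under stated assumptions, not the `withRetryLock2` implementation: it models a single interval rather than the slow/fast pair, and `RetryLock`, `wake` and `waitForRetry` are hypothetical names. The `waiting` flag is an assumption of the sketch that keeps a late timer from leaving a stale signal in the lock.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.STM
import Control.Monad (void, when)

-- Illustrative alias: the lock the delivery worker sleeps on between attempts.
type RetryLock = TMVar ()

-- External wake-up, e.g. when the router reports that the queue has capacity
-- again (QCONT). Harmless if the worker has already been woken.
wake :: RetryLock -> IO ()
wake lock = void . atomically $ tryPutTMVar lock ()

-- Sleep until the interval elapses or an external wake-up arrives,
-- whichever comes first.
waitForRetry :: RetryLock -> Int -> IO ()
waitForRetry lock delayMicros = do
  waiting <- newTVarIO True
  _ <- forkIO $ do
    threadDelay delayMicros
    -- the timer signals only if the worker is still waiting
    atomically $ do
      w <- readTVar waiting
      when w . void $ tryPutTMVar lock ()
  atomically $ do
    takeTMVar lock
    writeTVar waiting False
```

Because both the timer and the external event use `tryPutTMVar`, a second wake-up while one is already pending is a no-op rather than an error.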
+ +**Error handling**: +- `SMP QUOTA` - switch to slow retry, don't penalize (backpressure from router) +- `SMP AUTH` - permanent failure: for data messages, notify and delete; for handshake messages, report connection error; for queue-switch messages, report queue error +- `temporaryOrHostError` - retry with backoff +- Other errors - report to application, delete command + +--- + +## Subscription tracking + +**Source**: [Agent/TSessionSubs.hs](../../src/Simplex/Messaging/Agent/TSessionSubs.hs), [Agent/Client.hs](../../src/Simplex/Messaging/Agent/Client.hs) + +The agent tracks per-queue subscription state in `TSessionSubs` (defined in `Agent/TSessionSubs.hs`), keyed by `SMPTransportSession = (UserId, SMPServer, Maybe ByteString)` where the `ByteString` carries the entity ID in entity-session mode or `Nothing` in shared mode. Each transport session holds: + +``` +SessSubs +├── subsSessId :: TVar (Maybe SessionId) -- TLS session ID +├── activeSubs :: TMap RecipientId RcvQueueSub +├── pendingSubs :: TMap RecipientId RcvQueueSub +├── activeServiceSub :: TVar (Maybe ServiceSub) +└── pendingServiceSub :: TVar (Maybe ServiceSub) +``` + +**State machine**: Subscriptions move between three states: + +- **Pending → Active**: After subscription RPC succeeds, `addActiveSub'` promotes the queue - but only if the returned session ID matches the stored TLS session ID (`Just sessId == sessId'`). On mismatch (TLS reconnected between RPC send and response), the subscription is silently added to pending instead. No exception - concurrent resubscription paths handle this naturally. + +- **Active → Pending**: When `setSessionId` is called with a *different* session ID (TLS reconnect), all active subscriptions are atomically demoted to pending. Session ID is updated to the new value. + +- **Pending → Removed**: `failSubscriptions` moves permanently-failed queues (non-temporary SMP errors) to `removedSubs`. The removal is tracked for diagnostic reporting via `getSubscriptions`. 
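
The first two transitions of the state machine above can be modelled with a small pure sketch. It is an illustration only: the real `TSessionSubs` keeps these fields in `TVar`/`TMap` and updates them in STM, the service subscription fields and `removedSubs` are omitted, and `SessionId` and `RecipientId` are replaced by `String` stand-ins.

```haskell
import qualified Data.Map.Strict as M

-- Stand-ins for the real identifier types (assumption of this sketch).
type SessionId = String
type RecipientId = String

-- Pure model of one transport session's subscription state.
data SessSubs q = SessSubs
  { subsSessId :: Maybe SessionId
  , activeSubs :: M.Map RecipientId q
  , pendingSubs :: M.Map RecipientId q
  }
  deriving (Eq, Show)

emptySubs :: SessSubs q
emptySubs = SessSubs Nothing M.empty M.empty

-- Active -> Pending: a different session ID demotes every active subscription.
setSessionId :: SessionId -> SessSubs q -> SessSubs q
setSessionId sessId s
  | subsSessId s == Just sessId = s
  | otherwise =
      s { subsSessId = Just sessId
        , activeSubs = M.empty
        , pendingSubs = M.union (activeSubs s) (pendingSubs s)
        }

-- Pending -> Active: promote only if the response belongs to the current
-- session; otherwise keep the queue pending so that it is resubscribed.
addActiveSub :: SessionId -> RecipientId -> q -> SessSubs q -> SessSubs q
addActiveSub sessId rId q s
  | subsSessId s == Just sessId =
      s { activeSubs = M.insert rId q (activeSubs s)
        , pendingSubs = M.delete rId (pendingSubs s)
        }
  | otherwise = s { pendingSubs = M.insert rId q (pendingSubs s) }
```

For example, `addActiveSub "old" "q2" ()` applied to a state whose session ID is `"s1"` leaves `q2` in `pendingSubs`, matching the silent fallback on session mismatch described above.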
+ +**Service-associated queues**: Queues with `serviceAssoc=True` are *not* added to `activeSubs` individually. Instead, the service subscription's count is incremented and its `idsHash` XOR-accumulates the queue's hash. The router tracks individual queues via the service subscription; the agent only tracks the aggregate. Consequence: `hasActiveSub(rId)` returns `False` for service-associated queues - callers must check the service subscription separately. + +**Disconnect cleanup** (`smpClientDisconnected`): +1. `removeSessVar` with CAS check (monotonic `sessionVarId` prevents stale callbacks from removing newer clients) +2. `setSubsPending` - demote active→pending, filtered by matching `SessionId` only +3. Delete proxied relay sessions created by this client +4. Fire `DISCONNECT`, `DOWN` (affected connections), `SERVICE_DOWN` (if service sub existed) +5. Release GET locks for affected queues +6. Resubscribe: either spawn `resubscribeSMPSession` worker (entity-session mode) or directly resubscribe queues and services (other modes) + +**Resubscription worker**: Per-transport-session worker with exponential backoff. Loops until `pendingSubs` and `pendingServiceSub` are both empty. Uses `waitForUserNetwork` with bounded wait - proceeds even without network (prevents indefinite blocking). Worker self-cleans via `removeSessVar` on exit. + +**UP event deduplication**: After a batch subscription RPC, `UP` events are emitted only for connections that were *not* already in `activeSubs` before the batch. This prevents duplicate notifications for already-subscribed connections. + +--- + +## Operation suspension cascade + +**Source**: [Agent/Client.hs](../../src/Simplex/Messaging/Agent/Client.hs) + +Five `AgentOpState` TVars track in-flight operations for graceful shutdown. Each holds `{opSuspended :: Bool, opsInProgress :: Int}`. 
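
A minimal sketch of the suspension mechanics this section describes, under stated assumptions: `Op`, `newOp`, `beginOp`, `endOp` and `suspendOp` are hypothetical names rather than the functions in `Agent/Client.hs`, and the sketch assumes that suspending an operation with nothing in flight runs the next step immediately.

```haskell
import Control.Concurrent.STM
import Control.Monad (when)

data AgentOpState = AgentOpState {opSuspended :: Bool, opsInProgress :: Int}

-- An operation plus the action to run once it is suspended and drained
-- (suspending the next operation in the chain, or notifying at the end).
data Op = Op {opState :: TVar AgentOpState, onDrained :: STM ()}

newOp :: STM () -> IO Op
newOp next = do
  st <- newTVarIO (AgentOpState False 0)
  pure (Op st next)

-- Blocks (STM retry) while the operation is suspended.
beginOp :: Op -> STM ()
beginOp op = do
  s <- readTVar (opState op)
  when (opSuspended s) retry
  writeTVar (opState op) (s {opsInProgress = opsInProgress s + 1})

-- Decrement the counter; the last operation to finish triggers the next step.
endOp :: Op -> STM ()
endOp op = do
  s <- readTVar (opState op)
  let n = opsInProgress s - 1
  writeTVar (opState op) (s {opsInProgress = n})
  when (opSuspended s && n == 0) (onDrained op)

-- Mark as suspended; assumed to cascade at once if nothing is in flight.
suspendOp :: Op -> STM ()
suspendOp op = do
  s <- readTVar (opState op)
  writeTVar (opState op) (s {opSuspended = True})
  when (opsInProgress s == 0) (onDrained op)
```

A chain is built from the end backwards: create the database operation with a notification action, then create each earlier operation with `suspendOp` of its successor as `onDrained`.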
+ +**Cascade ordering**: +``` +AONtfNetwork (independent - no cascading) + +AORcvNetwork → AOMsgDelivery → AOSndNetwork → AODatabase +``` + +**Mechanics**: `endAgentOperation` decrements `opsInProgress`. If the count reaches zero and the operation is suspended, it calls the cascade action: `AORcvNetwork` suspends `AOMsgDelivery`, which suspends `AOSndNetwork`, which suspends `AODatabase`. At the leaf (`AODatabase`), `notifySuspended` writes `SUSPENDED` to `subQ` and sets `agentState = ASSuspended`. + +**Blocking**: `beginAgentOperation` blocks (STM `retry`) while `opSuspended == True`. This means new operations of a suspended type cannot start - they wait until the operation is resumed. `agentOperationBracket` provides structured bracketing (begin on entry, end on exit). + +**Two wait modes**: +- `waitWhileSuspended` - blocks only during `ASSuspended`, proceeds during `ASSuspending` (allows in-flight operations to complete) +- `waitUntilForeground` - blocks during both `ASSuspending` and `ASSuspended` (stricter, for operations that need full foreground) + +**Usage**: `withStore` brackets all database access with `AODatabase`. Message delivery uses `AOSndNetwork` + `AOMsgDelivery`. Receive processing uses `AORcvNetwork`. This ensures that suspending receive processing cascades through delivery to database, and nothing touches the database after all operations drain. + +--- + +## SessionVar lifecycle + +**Source**: [Agent/Client.hs](../../src/Simplex/Messaging/Agent/Client.hs) + +Protocol client connections (SMP, XFTP, NTF) use a lazy singleton pattern via `SessionVar` - a `TMVar` in a `TMap` keyed by transport session. + +**Connection**: `getSessVar` atomically checks the TMap. Returns `Left newVar` (absent - caller must connect) or `Right existingVar` (present - wait for result). 
`newProtocolClient` wraps the connection attempt: on success, fills the TMVar with `Right client` and writes `CONNECT` event; on failure, fills with `Left (error, maybeRetryTime)` and re-throws. + +**Error caching**: Failed connections cache the error with an expiry timestamp based on `persistErrorInterval`. Future attempts during the interval immediately receive the cached error without reconnecting - this prevents connection storms when a router is down. When `persistErrorInterval == 0`, the SessionVar is removed immediately on failure (fresh connection on next attempt). + +**Compare-and-swap**: Each SessionVar has a monotonic `sessionVarId` from `workerSeq`. `removeSessVar` only removes if the `sessionVarId` matches the current map entry. This prevents a stale disconnect callback (from an old client) from removing a newer client that connected after the old one disconnected. + +**Service credential synchronization** (`updateClientService`): On SMP reconnect, the agent reconciles service credentials between client and router state - updating, creating, or removing service associations as needed. Router version downgrade (router loses service support) triggers client-side service deletion. + +**XFTP special case**: `getProtocolServerClient` ignores the caller's `NetworkRequestMode` parameter for XFTP, always using `NRMBackground` timing. XFTP connections always use background retry timing regardless of the caller's request. + +--- + +## Dual-backend store + +**Source**: [Agent/Store/SQLite.hs](../../src/Simplex/Messaging/Agent/Store/SQLite.hs), [Agent/Store/Postgres.hs](../../src/Simplex/Messaging/Agent/Store/Postgres.hs), [Agent/Store/AgentStore.hs](../../src/Simplex/Messaging/Agent/Store/AgentStore.hs) + +The agent supports SQLite and PostgreSQL via CPP compilation flags (`#if defined(dbPostgres)`). Three wrapper modules (`Interface.hs`, `Common.hs`, `DB.hs`) re-export the appropriate backend. A single binary compiles with one active backend. 
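A minimal sketch of compile-time backend selection with CPP follows. Only the macro name `dbPostgres` comes from the text; the module and `backendName` are hypothetical, and the sketch assumes the macro is supplied with GHC's `-D` option (for example `ghc -DdbPostgres Main.hs`).

```haskell
{-# LANGUAGE CPP #-}
module Main (main) where

-- | Name of the backend selected at compile time.
backendName :: String
#if defined(dbPostgres)
backendName = "PostgreSQL"
#else
backendName = "SQLite"
#endif

main :: IO ()
main = putStrLn ("compiled with the " ++ backendName ++ " store")
```

One property of this approach is worth keeping in mind: the preprocessor does not check macro names, so a misspelled name inside `defined(...)` is simply treated as undefined and the `#else` branch is compiled without any warning.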
+
+**Key behavioral differences**:
+
+| Aspect | SQLite | PostgreSQL |
+|--------|--------|------------|
+| Row locking | Single-writer model (no locking needed) | `FOR UPDATE` on reads preceding writes |
+| Batch queries | Per-row `forM` loops | `IN ?` with `In` wrapper |
+| Constraint violations | `SQL.ErrorConstraint` pattern match | `constraintViolation` function |
+| Transaction savepoints | Not needed | Used in `createWithRandomId'` (failed statement aborts entire transaction without them) |
+| Busy/locked errors | `ErrorBusy`/`ErrorLocked` → `SEDatabaseBusy` → `CRITICAL True` | All SQL errors → `SEInternal` |
+
+**Store access bracketing**: `withStore` wraps all database operations with `agentOperationBracket AODatabase`, connecting the store to the suspension cascade. `withStoreBatch` / `withStoreBatch'` run multiple operations in a single transaction with per-operation error catching.
+
+**Known bug**: `checkConfirmedSndQueueExists_` uses `#if defined(dpPostgres)` (typo - should be `dbPostgres`), so the `FOR UPDATE` clause is never included on either backend.
diff --git a/spec/agent/xrcp.md b/spec/agent/xrcp.md
new file mode 100644
index 000000000..864b4d802
--- /dev/null
+++ b/spec/agent/xrcp.md
@@ -0,0 +1,101 @@
+# XRCP - Cross-Device Remote Control
+
+XRCP enables a desktop application to control a mobile device over the local network. The protocol establishes an encrypted session between two devices using TLS, post-quantum hybrid key exchange, and optional multicast discovery.
+
+This document covers the cross-module flows that are not visible from individual module specs. For message formats and cryptographic operations, see [protocol/xrcp.md](../../protocol/xrcp.md). For per-module details: [Client](../modules/Simplex/RemoteControl/Client.md) · [Invitation](../modules/Simplex/RemoteControl/Invitation.md) · [Discovery](../modules/Simplex/RemoteControl/Discovery.md) · [Types](../modules/Simplex/RemoteControl/Types.md).
+ +**Terminology note**: in the code, "host" is the mobile device (being controlled) and "ctrl" is the desktop (controlling). The protocol spec uses the reverse convention - "host" serves, "controller" connects. This document uses the code convention. + +- [Session handshake flow](#session-handshake-flow) +- [KEM hybrid key exchange](#kem-hybrid-key-exchange) +- [Multicast discovery](#multicast-discovery) +- [Block framing and padding](#block-framing-and-padding) + +--- + +## Session handshake flow + +**Source**: [RemoteControl/Client.hs](../../src/Simplex/RemoteControl/Client.hs), [RemoteControl/Discovery.hs](../../src/Simplex/RemoteControl/Discovery.hs) + +The handshake spans `Client.connectRCHost` (controller side, despite the name), `Client.connectRCCtrl` (host side), `Invitation.mkInvitation`, and `Discovery.startTLSServer`. The full sequence: + +1. **Controller starts TLS server**: generates ephemeral session keys + DH keys, creates a signed invitation containing the CA fingerprint and identity key, starts a TLS server on an ephemeral port. The TLS hook `onNewHandshake` enforces single-session - a second connection attempt is rejected by checking whether the session TMVar is already filled. + +2. **Invitation delivery**: the invitation reaches the host either out-of-band (QR code scan for first pairing) or via encrypted multicast announcement (subsequent sessions - see [Multicast discovery](#multicast-discovery)). + +3. **Host connects via TLS**: `connectRCCtrl` establishes a TLS connection. Both sides validate 2-certificate chains (leaf + CA root). On reconnection, the host validates the controller's CA fingerprint against `KnownHostPairing`; on first pairing, it stores the fingerprint. + +4. **User confirmation barrier**: after TLS connects, the controller extracts the TLS channel binding (`tlsUniq`) as a session code. The application displays this code; the user verifies it on the host. 
`confirmCtrlSession` uses a double `putTMVar` - the first put signals the decision (accept/reject), the second blocks until the session thread consumes it, creating a synchronization point that prevents the session from proceeding before confirmation completes. + +5. **Hello exchange** (asymmetric encryption): + - Controller sends `RCHostEncHello`: DH public key in plaintext + encrypted body containing the KEM encapsulation key, CA fingerprint, and app info. Encrypted with `cbEncrypt` (classical DH secret). + - Host decrypts the hello, performs KEM encapsulation (see [KEM hybrid key exchange](#kem-hybrid-key-exchange)), derives the hybrid session key, and sends `RCCtrlEncHello` encrypted with `sbEncrypt` (post-quantum hybrid key). + - The asymmetry is deliberate: at the time the controller sends its hello, KEM hasn't completed yet, so only classical DH encryption is available. After the host encapsulates, both sides have the hybrid key. + +6. **Chain key initialization**: both sides call `sbcInit` with the hybrid key to derive send/receive chain keys. The controller explicitly **swaps** the key pair (`swap` call in `prepareCtrlSession`) - both sides derive keys in the same order from `sbcInit`, but have opposite send/receive roles, so the controller must reverse them. The host does not swap. + +7. **Error path**: if KEM encapsulation fails, the host sends `RCCtrlEncError` encrypted with the DH key (not the hybrid key, which doesn't exist yet). The controller can decrypt the error because it has the DH secret from step 5. + +--- + +## KEM hybrid key exchange + +**Source**: [RemoteControl/Client.hs](../../src/Simplex/RemoteControl/Client.hs) + +The session key combines classical Diffie-Hellman with SNTRUP761 (lattice-based KEM) via `SHA3_256(dhSecret || kemSharedKey)` (`kemHybridSecret` in Client.hs). This provides protection against quantum computers while maintaining classical security as a fallback. 
+ +**First session** - KEM public key is too large for a QR code invitation, so it travels in the encrypted hello body: + +1. Controller generates DH + KEM key pairs, puts KEM encapsulation key in the hello body +2. Host decrypts hello with DH secret, extracts KEM encapsulation key +3. Host encapsulates: produces `(kemCiphertext, kemSharedKey)` +4. Host derives hybrid key: `SHA3_256(dhSecret || kemSharedKey)` +5. Host sends `kemCiphertext` in the controller hello body +6. Controller decapsulates `kemCiphertext` to recover `kemSharedKey`, derives the same hybrid key + +**Subsequent sessions** (via multicast) - the previous session's KEM secret is cached in the pairing: + +- Both sides already know each other's KEM capabilities from the previous session +- Fresh DH keys are generated per session for forward secrecy +- The hybrid key derivation uses the new DH secret + the cached KEM secret +- `updateKnownHost` (called in `prepareHostSession`) updates the stored DH public key for the next session + +**Key rotation and `prevDhPrivKey`**: when the host updates its DH key pair for a new session, it retains the previous private key in `RCCtrlPairing.prevDhPrivKey`. This is critical for multicast - during the transition window, the controller may send announcements encrypted with the old public key. `findRCCtrlPairing` tries decryption with both the current and previous DH keys. Without this fallback, key rotation would break multicast discovery. + +--- + +## Multicast discovery + +**Source**: [RemoteControl/Client.hs](../../src/Simplex/RemoteControl/Client.hs), [RemoteControl/Invitation.hs](../../src/Simplex/RemoteControl/Invitation.hs), [RemoteControl/Discovery.hs](../../src/Simplex/RemoteControl/Discovery.hs) + +For subsequent sessions (after initial QR pairing), the controller announces its presence via UDP multicast so the host can connect without scanning a new QR code. 
The flow spans `Client.announceRC`, `Client.discoverRCCtrl`, `Client.findRCCtrlPairing`, `Invitation.signInvitation`/`verifySignedInvitation`, and `Discovery.joinMulticast`/`withSender`.
+
+**Announcement creation** (`announceRC`):
+
+1. The invitation is signed with a dual-signature chain: the session key signs the invitation URI, then the identity key signs the URI + session signature concatenated. This chain means a compromised session key alone cannot forge a valid identity-signed announcement - the identity key must also be compromised.
+2. The signed invitation is encrypted with a DH shared secret between the host's known DH public key and the controller's ephemeral DH private key.
+3. The encrypted packet is padded to 900 bytes (privacy: all announcements are indistinguishable by size).
+4. Sent 60 times at 1-second intervals to multicast group `224.0.0.251:5227`.
+5. Runs as a cancellable async task - cancelled in `prepareHostSession` once the session is established.
+
+**Listener and discovery** (`discoverRCCtrl`):
+
+1. Host calls `joinMulticast` to subscribe to the multicast group. A shared `TMVar Int` counter tracks active listeners - OS-level `IP_ADD_MEMBERSHIP` is only issued on 0→1 transition, `IP_DROP_MEMBERSHIP` on 1→0. This prevents duplicate syscalls when multiple listeners are active.
+2. For each received packet, `findRCCtrlPairing` iterates over known pairings and tries decryption with the current DH key, falling back to `prevDhPrivKey` if present.
+3. After successful decryption, the invitation's `dh` field is verified against the announcement's `dhPubKey` to prevent relay attacks.
+4. Dual signatures are verified: session signature first, then identity signature.
+5. 30-second timeout on the entire discovery process (`RCENotDiscovered` on expiry).
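The listener counting in step 1 of the discovery list can be sketched as a reference-counted bracket. This is not the code in Discovery.hs: it uses an `MVar Int` from `base` as a stand-in for the `TMVar Int`, and `joinGroup`, `leaveGroup`, and `withListener` are hypothetical names. The two socket calls are passed in as plain `IO ()` actions so the example stays self-contained.

```haskell
import Control.Concurrent.MVar (MVar, modifyMVar_, newMVar)
import Control.Exception (bracket_)
import Control.Monad (when)
import Data.IORef (modifyIORef', newIORef, readIORef)

-- | Register a listener; run the OS-level join only on the 0 -> 1 transition.
joinGroup :: MVar Int -> IO () -> IO ()
joinGroup counter osJoin = modifyMVar_ counter $ \n -> do
  when (n == 0) osJoin
  pure (n + 1)

-- | Unregister a listener; run the OS-level leave only on the 1 -> 0 transition.
leaveGroup :: MVar Int -> IO () -> IO ()
leaveGroup counter osLeave = modifyMVar_ counter $ \n -> do
  when (n == 1) osLeave
  pure (max 0 (n - 1))

-- | Run an action as a registered listener, unregistering even on exceptions.
withListener :: MVar Int -> IO () -> IO () -> IO a -> IO a
withListener counter osJoin osLeave =
  bracket_ (joinGroup counter osJoin) (leaveGroup counter osLeave)

main :: IO ()
main = do
  counter <- newMVar 0
  calls <- newIORef ([] :: [String])
  let osJoin = modifyIORef' calls ("IP_ADD_MEMBERSHIP" :)
      osLeave = modifyIORef' calls ("IP_DROP_MEMBERSHIP" :)
      listen :: IO a -> IO a
      listen = withListener counter osJoin osLeave
  -- Two overlapping listeners cause a single join and a single leave.
  listen (listen (pure ()))
  readIORef calls >>= print . reverse
  -- prints ["IP_ADD_MEMBERSHIP","IP_DROP_MEMBERSHIP"]
```

Holding the counter while the membership call runs keeps the count and the OS state in step, at the cost of serializing listeners briefly during join and leave.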
+ +--- + +## Block framing and padding + +**Source**: [RemoteControl/Client.hs](../../src/Simplex/RemoteControl/Client.hs), [RemoteControl/Types.hs](../../src/Simplex/RemoteControl/Types.hs) + +XRCP uses three padding sizes at different protocol layers: + +- **16,384 bytes** - XRCP block size for all session messages (hello, commands, responses). Matches SMP's block size. Hides message content size variation within the TLS session. +- **12,288 bytes** - hello body padding within the 16,384-byte block, after encryption overhead. +- **900 bytes** - multicast announcement padding. Constrained by typical UDP MTU to avoid fragmentation. + +All padding uses the standard `pad`/`unPad` format (2-byte length prefix + `#` fill). The fixed sizes ensure that an observer monitoring network traffic cannot distinguish different XRCP operations by packet size. From 7aefcbf91d8d18823dc1c9ca4bdbf38281391ef3 Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Sat, 14 Mar 2026 20:21:26 +0000 Subject: [PATCH 46/61] agent connection topic --- spec/agent/connections.md | 232 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 232 insertions(+) create mode 100644 spec/agent/connections.md diff --git a/spec/agent/connections.md b/spec/agent/connections.md new file mode 100644 index 000000000..2ac83b294 --- /dev/null +++ b/spec/agent/connections.md @@ -0,0 +1,232 @@ +# Agent Connections + +Duplex connection lifecycle: establishment, queue rotation, ratchet synchronization, and message integrity. These cross-module flows span the Agent, protocol client, and store layers. + +For per-module details: [Agent](../modules/Simplex/Messaging/Agent.md) · [Agent Protocol](../modules/Simplex/Messaging/Agent/Protocol.md) · [Ratchet](../modules/Simplex/Messaging/Crypto/Ratchet.md) · [Store Interface](../modules/Simplex/Messaging/Agent/Store/Interface.md). For the component diagram, see [agent.md](../agent.md). 
For protocol specification, see [Agent Protocol](../../protocol/agent-protocol.md) and [PQDR](../../protocol/pqdr.md). + +- [Design constraints](#design-constraints) +- [Connection establishment](#connection-establishment) +- [Queue rotation](#queue-rotation) +- [Ratchet synchronization](#ratchet-synchronization) +- [Message envelope hierarchy](#message-envelope-hierarchy) +- [Integrity chain](#integrity-chain) + +--- + +## Design constraints + +**Source**: [Agent.hs](../../src/Simplex/Messaging/Agent.hs), [Agent/Client.hs](../../src/Simplex/Messaging/Agent/Client.hs) + +Two properties of the protocol drive much of the agent's complexity: + +**TOFU retry safety**: Queues and links are secured via trust-on-first-use - the router accepts the first key presented (SKEY, KEY) and rejects any subsequent different key. If a network call succeeds but the response is lost, the client must retry with the *same* key, or the router will reject it. This means all cryptographic keys must be generated and persisted *before* the network call that uses them. The agent's pervasive store-then-execute pattern (`enqueueCommand` persists to DB, then worker executes with stored keys) exists primarily to satisfy this constraint. + +**Network asymmetry**: After a client sends a message to a queue, the peer's response can arrive at the agent before the originating API call returns to the application. The application must already know the connection exists when it receives the event, otherwise it gets handshake events for an unknown connection. This drives split-phase APIs where the connection is registered locally before any network call. + +Together, these constraints explain why the agent separates key generation from network operations, why commands are persisted before execution, and why connection creation is split into prepare + create phases. 
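The TOFU retry constraint can be illustrated with a toy model. Everything here is hypothetical: `secureQueue` stands in for a router that accepts the first key and rejects a different one later, and an `IORef` stands in for the agent database. It is a sketch of the ordering requirement, not of the agent's `enqueueCommand` machinery.

```haskell
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef, writeIORef)

type Key = String

-- | Hypothetical trust-on-first-use check: the first key presented is
-- recorded; later calls succeed only if they present the same key.
secureQueue :: IORef (Maybe Key) -> Key -> IO Bool
secureQueue router key = atomicModifyIORef' router $ \current ->
  case current of
    Nothing -> (Just key, True)
    Just known -> (current, known == key)

main :: IO ()
main = do
  router <- newIORef Nothing
  store <- newIORef Nothing -- stands in for the agent database
  -- 1. Generate the key and persist it before any network call.
  writeIORef store (Just "key-1")
  -- 2. The call reaches the router, but assume the response is lost.
  _lostResponse <- secureQueue router "key-1"
  -- 3. The retry reads the persisted key, so the router accepts it.
  persisted <- readIORef store
  retried <- maybe (fail "no persisted key") (secureQueue router) persisted
  -- A client that generated a fresh key for the retry is rejected.
  regenerated <- secureQueue router "key-2"
  print (retried, regenerated) -- prints (True,False)
```

The rejected second key is the failure mode that persisting before the call is meant to rule out: once the router has recorded a key, only a client that still holds that exact key can complete the operation.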
+ +--- + +## Connection establishment + +**Source**: [Agent.hs](../../src/Simplex/Messaging/Agent.hs), [Agent/Protocol.hs](../../src/Simplex/Messaging/Agent/Protocol.hs) + +### Split-phase connection creation + +Connection creation is split into two phases to satisfy both design constraints: + +**`prepareConnectionLink`** (no network, no database): generates root Ed25519 signing key pair, queue-level X25519 DH keys, and short link key. Returns the connection link URI and `PreparedLinkParams` in memory. The application can now embed the link in link data (e.g., for short link resolution) before the queue exists. + +**`createConnectionForLink`** (single network call): uses the prepared parameters to create the queue on the router with SKEY (root signature). The sender ID is deterministically derived from the correlation nonce (`SMP.EntityId $ B.take 24 $ C.sha3_384 corrId`), so a lost response can be retried - the router validates the same sender ID. + +Without split-phase, the application would need to create the queue first, get the link, then update the queue with link data containing the link - requiring an extra round-trip. + +### Standard handshake + +The connection establishment flow is shown in [agent.md](../agent.md#connection-establishment-flow). The key non-obvious details: + +**Ratchet initialization is asymmetric**: The initiator (Alice) generates X3DH key parameters during `newRcvConnSrv` and stores them via `createRatchetX3dhKeys`, but does not initialize any ratchet yet. The *receiving* ratchet is only initialized later in `smpConfirmation` via `initRcvRatchet` when the confirmation arrives with the responder's parameters. The responder (Bob) initializes a *sending* ratchet during `startJoinInvitation` via `initSndRatchet`. 
The names `RcvE2ERatchetParams`/`SndE2ERatchetParams` are historical - what matters is that the responder initializes first (sending direction), and the initiator initializes second (receiving direction) using the responder's parameters. + +**Confirmation decryption proves key agreement**: In `smpConfirmation`, the initiator creates a fresh receiving ratchet from the responder's parameters and immediately uses it to decrypt the confirmation body. If decryption fails, the entire confirmation is rejected - there is no state where a connection has mismatched ratchets. + +**HELLO exchange completes the handshake**: After `allowConnection`, both sides have duplex queues but haven't confirmed liveness. The initiator sends HELLO (with `notification = True` in MsgFlags). The responder receives it and sends its own HELLO back (also with `notification = True`). The initiator emits `CON` when it receives the responder's HELLO. The responder emits `CON` when its own HELLO is *successfully delivered* (in the delivery callback, not on receiving a reply). There are exactly two HELLO messages, not three. + +### Contact URI async path + +For contact URIs (`joinConnectionAsync` with `CRContactUri`), the join is enqueued as an async command. The connection record is created locally (NewConnection state) before the network call, satisfying the network asymmetry constraint. The background worker then creates the receive queue, sends the invitation, and processes the handshake. + +### PQ key agreement + +PQ support is negotiated via version numbers: `agentVersion >= pqdrSMPAgentVersion && e2eVersion >= pqRatchetE2EEncryptVersion`. When both sides support PQ, the KEM public key travels in the confirmation body (too large for invitation URI). The responder encapsulates, producing `(kemCiphertext, kemSharedKey)`, and the hybrid key is derived via `SHA3_256(dhSecret || kemSharedKey)`. 
+ +**PQ support is monotonic**: once enabled for a connection (`PQSupport PQSupportOn`), it cannot be downgraded. This affects header padding size (88 bytes without PQ vs 2310 bytes with PQ). + +### Connection type state machine + +``` +NewConnection + +-> RcvConnection (initiator, after newRcvConnSrv) + | +-> DuplexConnection (after allowConnection + connectReplyQueues) + | +-> ContactConnection (contact address case) + +-> SndConnection (responder, before reply queue created) + | +-> DuplexConnection (after reply queue created) + +-> ContactConnection (short link / contact address) +``` + +--- + +## Queue rotation + +**Source**: [Agent.hs](../../src/Simplex/Messaging/Agent.hs), [Agent/Store.hs](../../src/Simplex/Messaging/Agent/Store.hs) + +Queue rotation replaces a receive queue with a new one on a different router, providing forward secrecy for the transport layer. The protocol uses a 4-message handshake. + +### Protocol sequence + +Rotation is initiated by `switchConnectionAsync` (client API) or by receiving QADD from the peer. Preconditions: connection must be duplex, no switch already in progress, ratchet must not be syncing. + +``` +Receiver (switching party) Sender (peer) + | | + |-- QADD (new queue address) ---------->| + | |-- creates SndQueue to new address + |<--------- QKEY (sender auth key) ----| + | | + |-- secures new queue (ICQSecure) ----->| + |-- QUSE (start using new queue) ----->| + | |-- switches delivery to new queue + |<--------- QTEST (on new queue) ------| + | | + |-- deletes old queue (ICQDelete) ---->| +``` + +### State machines + +**Receiver (RcvQueue switch)**: `RSSwitchStarted` -> `RSSendingQADD` -> `RSSendingQUSE` -> `RSReceivedMessage`. The switch becomes non-abortable at `RSSendingQUSE` - by this point the sender may have already deleted the old queue, so aborting would break the connection. `canAbortRcvSwitch` enforces this. + +**Sender (SndQueue switch)**: creates new SndQueue on QADD, sends QKEY, marks old as `SSSendingQKEY`. 
On QUSE: sends QTEST *only to the new queue*, marks as `SSSendingQTEST`. Completes when QTEST delivery succeeds. + +### Consecutive rotation handling + +`dbReplaceQId` tracks which old queue a new one replaces. Each new queue stores `dbReplaceQId = Just oldQueueId`. When QADD is processed, send queues whose `dbReplaceQId` points to the current queue's `dbQueueId` are found and deleted in bulk. This handles consecutive rotation requests - only the latest rotation survives. + +### Old queue deletion + +Three triggers delete the old queue: +1. **Sender-side**: QTEST delivery succeeds - old queue removed from `smpDeliveryWorkers` (worker thread stops) +2. **Receiver-side**: first message arrives on new queue - receiver marks old queue for deletion via `ICQDelete` +3. **Abort cleanup**: `abortConnectionSwitch` explicitly deletes new queues created during a failed switch attempt + +--- + +## Ratchet synchronization + +**Source**: [Agent.hs](../../src/Simplex/Messaging/Agent.hs), [Agent/Protocol.hs](../../src/Simplex/Messaging/Agent/Protocol.hs) + +When double ratchet state becomes desynchronized (e.g., one side restores from backup), the agent can re-establish the ratchet without breaking the connection. + +### State machine + +``` +RSOk (synchronized) + | + v (crypto error detected) +RSAllowed / RSRequired + | + v (synchronizeRatchet called) +RSStarted (waiting for peer) + | + v (peer responds with own keys) +RSAgreed (both exchanged keys) + | + v (ratchet recreated, EREADY sent/received) +RSOk +``` + +**Send prohibition**: `ratchetSyncSendProhibited` returns `True` for `RSRequired`, `RSStarted`, and `RSAgreed`. This blocks *all* messages including queue rotation messages - preventing state corruption while the ratchet is being re-established. + +### Key exchange protocol + +1. Initiator calls `synchronizeRatchet`, which generates new X3DH keys and sends them in an `AgentRatchetKey` envelope (discriminant `'R'`). State becomes `RSStarted`. +2. 
Peer receives the ratchet key in `newRatchetKey`. If peer hasn't started sync, it generates own keys and sends a reply `AgentRatchetKey`. +3. Both sides now have each other's keys. State becomes `RSAgreed`. + +### Hash-ordered role assignment + +Both parties compute `rkHash = SHA256(pubKeyBytes k1 || pubKeyBytes k2)` for their own keys. The party with the *smaller* hash initializes the receiving ratchet (`pqX3dhRcv`); the party with the larger hash initializes the sending ratchet (`pqX3dhSnd`) and sends `EREADY`. This deterministic tie-breaking avoids a separate negotiation round. + +### EREADY completion + +`EREADY` carries `lastExternalSndId` - the ID of the last message sent with the old ratchet. The receiving party uses this to know when the old ratchet's messages are exhausted and the new ratchet is fully active. Until EREADY arrives, messages may arrive encrypted with either the old or new ratchet. + +### Error recovery + +- **Crypto error during decrypt**: `cryptoErrToSyncState` classifies the error and sets state to `RSAllowed` or `RSRequired`. Client is notified via `RSYNC`. +- **Successful decrypt during non-RSOk state**: if state is not `RSStarted` (which means sync is actively in progress), reset to `RSOk`. A successful message proves the ratchets are synchronized. +- **Duplicate handling**: `rkHash` of received keys is checked against stored hashes to prevent reprocessing the same ratchet key message. + +--- + +## Message envelope hierarchy + +**Source**: [Agent/Protocol.hs](../../src/Simplex/Messaging/Agent/Protocol.hs) + +Messages use three nesting levels, each adding a layer of structure: + +### Level 1: AgentMsgEnvelope (transport) + +Four variants with single-character discriminants: + +| Variant | Disc. 
| Encryption | When | +|---------|-------|-----------|------| +| `AgentConfirmation` | `'C'` | Per-queue E2E only | Connection handshake | +| `AgentMsgEnvelope` | `'M'` | Double ratchet | Normal messages | +| `AgentInvitation` | `'I'` | Per-queue E2E only | Contact URI join | +| `AgentRatchetKey` | `'R'` | Per-queue E2E only | Ratchet sync | + +Only `AgentMsgEnvelope` is double-ratchet encrypted. The other three use only the per-queue E2E encryption (DH shared secret from queue creation). This is because during handshake and ratchet sync, the double ratchet is either not yet established or being replaced. + +### Level 2: AgentMessage (application) + +Inside the decrypted envelope: +- `AgentConnInfo` / `AgentConnInfoReply` - connection info during handshake (not double-ratchet encrypted) +- `AgentRatchetInfo` - ratchet sync payload (not double-ratchet encrypted) +- `AgentMessage APrivHeader AMessage` - user and control messages (double-ratchet encrypted) + +The private header (`APrivHeader`) carries `sndMsgId` and `prevMsgHash` for the integrity chain. + +### Level 3: AMessage (semantic) + +Message types with 1-2 character discriminants: +- User messages: `HELLO_`, `A_MSG_`, `A_RCVD_`, `A_QCONT_`, `EREADY_` +- Queue rotation: `QADD_`, `QKEY_`, `QUSE_`, `QTEST_` + +### ACK semantics + +- **User messages** (`A_MSG_`): NOT auto-ACKed. Agent returns `ACKPending`; application must call `ackMessage`. +- **Receipts** (`A_RCVD`): returns `ACKPending` when valid receipts are present (application must ACK after processing); auto-ACKed only when all receipts fail. +- **Other control messages** (HELLO, QADD, QKEY, QUSE, QTEST, EREADY): auto-ACKed by the agent. +- **Error during processing**: `handleNotifyAck` sends `ERR` to the application but still ACKs to the router, preventing re-delivery of a message that will fail again. 
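The ACK rules in the list above can be condensed into one decision function. This is a reading aid under simplifying assumptions: `Msg`, `AckPolicy`, and `ackPolicy` are hypothetical names, receipts are reduced to a count of valid entries, and the error path (where `handleNotifyAck` still ACKs to the router) is left out.

```haskell
-- | Simplified view of a received message, for ACK purposes only.
data Msg
  = UserMsg -- A_MSG
  | Receipts Int -- A_RCVD, carrying the number of valid receipts
  | Control String -- HELLO, QADD, QKEY, QUSE, QTEST, EREADY
  deriving (Show)

-- | Who is responsible for acknowledging the message to the router.
data AckPolicy
  = AckPending -- the application must call ackMessage
  | AutoAck -- the agent acknowledges on its own
  deriving (Eq, Show)

ackPolicy :: Msg -> AckPolicy
ackPolicy UserMsg = AckPending
ackPolicy (Receipts valid)
  | valid > 0 = AckPending
  | otherwise = AutoAck
ackPolicy (Control _) = AutoAck

main :: IO ()
main =
  mapM_
    (print . ackPolicy)
    [UserMsg, Receipts 2, Receipts 0, Control "HELLO"]
  -- prints AckPending, AckPending, AutoAck, AutoAck (one per line)
```

Seen this way, the application is only ever responsible for messages it has something to process; everything the agent consumes internally is acknowledged without involving it.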
+ +--- + +## Integrity chain + +**Source**: [Agent.hs](../../src/Simplex/Messaging/Agent.hs), [Agent/Protocol.hs](../../src/Simplex/Messaging/Agent/Protocol.hs) + +Each message in a connection commits to the previous message via two mechanisms: + +1. **External sender ID** (`lastExternalSndId`): monotonically increasing counter per connection +2. **Previous message hash** (`prevMsgHash`): SHA256 of the previous message body + +`checkMsgIntegrity` produces one of five outcomes: + +| Outcome | Condition | +|---------|-----------| +| `MsgOk` | Sequential ID and matching hash | +| `MsgBadId` | ID from the past (less than previous) | +| `MsgDuplicate` | Same ID as previous | +| `MsgSkipped` | Gap in IDs (messages lost) | +| `MsgBadHash` | Sequential ID but hash mismatch | + +**Non-rejecting semantics**: the agent does NOT reject messages with integrity failures. The result is reported to the application via `MsgMeta.integrity`. The application decides the policy - warn, ignore, or terminate the connection. From eafc84fd02c76ef740856fc2d0416560ff32110f Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Sat, 14 Mar 2026 20:43:26 +0000 Subject: [PATCH 47/61] fix doc --- spec/agent/connections.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/spec/agent/connections.md b/spec/agent/connections.md index 2ac83b294..1926504fc 100644 --- a/spec/agent/connections.md +++ b/spec/agent/connections.md @@ -35,7 +35,7 @@ Together, these constraints explain why the agent separates key generation from Connection creation is split into two phases to satisfy both design constraints: -**`prepareConnectionLink`** (no network, no database): generates root Ed25519 signing key pair, queue-level X25519 DH keys, and short link key. Returns the connection link URI and `PreparedLinkParams` in memory. 
The application can now embed the link in link data (e.g., for short link resolution) before the queue exists. +**`prepareConnectionLink`** (no network, no database): generates root Ed25519 signing key pair and queue-level X25519 DH keys. Derives a short link key as `SHA3_256` of the encoded fixed link data. Returns the connection link URI and `PreparedLinkParams` in memory. The application can now embed the link in link data (e.g., for short link resolution) before the queue exists. **`createConnectionForLink`** (single network call): uses the prepared parameters to create the queue on the router with SKEY (root signature). The sender ID is deterministically derived from the correlation nonce (`SMP.EntityId $ B.take 24 $ C.sha3_384 corrId`), so a lost response can be retried - the router validates the same sender ID. @@ -49,7 +49,7 @@ The connection establishment flow is shown in [agent.md](../agent.md#connection- **Confirmation decryption proves key agreement**: In `smpConfirmation`, the initiator creates a fresh receiving ratchet from the responder's parameters and immediately uses it to decrypt the confirmation body. If decryption fails, the entire confirmation is rejected - there is no state where a connection has mismatched ratchets. -**HELLO exchange completes the handshake**: After `allowConnection`, both sides have duplex queues but haven't confirmed liveness. The initiator sends HELLO (with `notification = True` in MsgFlags). The responder receives it and sends its own HELLO back (also with `notification = True`). The initiator emits `CON` when it receives the responder's HELLO. The responder emits `CON` when its own HELLO is *successfully delivered* (in the delivery callback, not on receiving a reply). There are exactly two HELLO messages, not three. +**HELLO exchange completes the handshake**: After `allowConnection`, both sides have duplex queues but haven't confirmed liveness. 
The responder (Bob) sends the first HELLO (with `notification = True` in MsgFlags), triggered by `ICDuplexSecure`. The initiator (Alice) receives it and sends her own HELLO back (also with `notification = True`). The initiator emits `CON` in the *delivery callback* of her HELLO (her rcvQueue is already Active from receiving Bob's HELLO). The responder emits `CON` when he *receives* the initiator's reply HELLO (his sndQueue is already Active from his own HELLO delivery). There are exactly two HELLO messages. ### Contact URI async path @@ -57,7 +57,7 @@ For contact URIs (`joinConnectionAsync` with `CRContactUri`), the join is enqueu ### PQ key agreement -PQ support is negotiated via version numbers: `agentVersion >= pqdrSMPAgentVersion && e2eVersion >= pqRatchetE2EEncryptVersion`. When both sides support PQ, the KEM public key travels in the confirmation body (too large for invitation URI). The responder encapsulates, producing `(kemCiphertext, kemSharedKey)`, and the hybrid key is derived via `SHA3_256(dhSecret || kemSharedKey)`. +PQ support is negotiated via version numbers: `agentVersion >= pqdrSMPAgentVersion && e2eVersion >= pqRatchetE2EEncryptVersion`. When both sides support PQ, the KEM public key travels in the confirmation body (too large for invitation URI). The responder encapsulates, producing `(kemCiphertext, kemSharedKey)`, and the hybrid key is derived via HKDF-SHA512 over the concatenation of three X3DH shared secrets plus the KEM shared secret, with info string `"SimpleXX3DH"`. **PQ support is monotonic**: once enabled for a connection (`PQSupport PQSupportOn`), it cannot be downgraded. This affects header padding size (88 bytes without PQ vs 2310 bytes with PQ). @@ -108,7 +108,7 @@ Receiver (switching party) Sender (peer) ### Consecutive rotation handling -`dbReplaceQId` tracks which old queue a new one replaces. Each new queue stores `dbReplaceQId = Just oldQueueId`. 
When QADD is processed, send queues whose `dbReplaceQId` points to the current queue's `dbQueueId` are found and deleted in bulk. This handles consecutive rotation requests - only the latest rotation survives. +`dbReplaceQueueId` tracks which old queue a new one replaces. Each new queue stores `dbReplaceQueueId = Just oldQueueId`. When QADD is processed, send queues whose `dbReplaceQueueId` points to the current queue's `dbQueueId` are found and deleted in bulk. This handles consecutive rotation requests - only the latest rotation survives. ### Old queue deletion @@ -205,7 +205,7 @@ Message types with 1-2 character discriminants: - **User messages** (`A_MSG_`): NOT auto-ACKed. Agent returns `ACKPending`; application must call `ackMessage`. - **Receipts** (`A_RCVD`): returns `ACKPending` when valid receipts are present (application must ACK after processing); auto-ACKed only when all receipts fail. -- **Other control messages** (HELLO, QADD, QKEY, QUSE, QTEST, EREADY): auto-ACKed by the agent. +- **Other control messages** (HELLO, QADD, QKEY, QUSE, QTEST, EREADY, A_QCONT): auto-ACKed by the agent. - **Error during processing**: `handleNotifyAck` sends `ERR` to the application but still ACKs to the router, preventing re-delivery of a message that will fail again. 
--- From c8de00872a2396e6eca581f23b6054e595e15096 Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Sat, 14 Mar 2026 21:51:43 +0000 Subject: [PATCH 48/61] fixes --- spec/agent/connections.md | 12 ++++----- spec/agent/infrastructure.md | 13 +++++----- spec/agent/xrcp.md | 48 +++++++++++++++++------------------- 3 files changed, 35 insertions(+), 38 deletions(-) diff --git a/spec/agent/connections.md b/spec/agent/connections.md index 1926504fc..2cda714a3 100644 --- a/spec/agent/connections.md +++ b/spec/agent/connections.md @@ -37,7 +37,7 @@ Connection creation is split into two phases to satisfy both design constraints: **`prepareConnectionLink`** (no network, no database): generates root Ed25519 signing key pair and queue-level X25519 DH keys. Derives a short link key as `SHA3_256` of the encoded fixed link data. Returns the connection link URI and `PreparedLinkParams` in memory. The application can now embed the link in link data (e.g., for short link resolution) before the queue exists. -**`createConnectionForLink`** (single network call): uses the prepared parameters to create the queue on the router with SKEY (root signature). The sender ID is deterministically derived from the correlation nonce (`SMP.EntityId $ B.take 24 $ C.sha3_384 corrId`), so a lost response can be retried - the router validates the same sender ID. +**`createConnectionForLink`** (single network call): uses the prepared parameters to create the queue on the router with NEW (root signing key as owner auth). The sender ID is deterministically derived from the correlation nonce (`SMP.EntityId $ B.take 24 $ C.sha3_384 corrId`), so a lost response can be retried - the router validates the same sender ID. Without split-phase, the application would need to create the queue first, get the link, then update the queue with link data containing the link - requiring an extra round-trip. 
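
The retry property follows from the derivation itself: the same correlation nonce always hashes to the same sender ID. A minimal sketch, assuming the `crypton` (or `cryptonite`) and `memory` packages are available; `sha3_384` and `senderIdFor` are illustrative helpers written for this example, not the agent's own functions:

```haskell
import Crypto.Hash (Digest, SHA3_384, hash)
import qualified Data.ByteArray as BA
import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC

-- SHA3-384 digest as raw bytes (48 bytes).
sha3_384 :: ByteString -> ByteString
sha3_384 = BA.convert . (hash :: ByteString -> Digest SHA3_384)

-- Hypothetical helper: the first 24 bytes of the digest of the
-- correlation nonce, mirroring `B.take 24 $ C.sha3_384 corrId`.
senderIdFor :: ByteString -> ByteString
senderIdFor = B.take 24 . sha3_384

main :: IO ()
main = do
  let corrId = BC.pack "example-correlation-nonce" -- made-up nonce
      firstAttempt = senderIdFor corrId
      retryAttempt = senderIdFor corrId
  print (B.length firstAttempt)       -- 24
  print (firstAttempt == retryAttempt) -- True
```

Because the ID depends only on the nonce, a retried request with the same nonce presents the same sender ID to the router.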
@@ -83,7 +83,7 @@ Queue rotation replaces a receive queue with a new one on a different router, pr ### Protocol sequence -Rotation is initiated by `switchConnectionAsync` (client API) or by receiving QADD from the peer. Preconditions: connection must be duplex, no switch already in progress, ratchet must not be syncing. +Rotation is initiated by the switching party calling `switchConnectionAsync` (client API), which sends QADD. The peer responds to QADD by creating a new send queue and replying with QKEY. Preconditions: connection must be duplex, no switch already in progress, ratchet must not be syncing. ``` Receiver (switching party) Sender (peer) @@ -157,7 +157,7 @@ Both parties compute `rkHash = SHA256(pubKeyBytes k1 || pubKeyBytes k2)` for the ### EREADY completion -`EREADY` carries `lastExternalSndId` - the ID of the last message sent with the old ratchet. The receiving party uses this to know when the old ratchet's messages are exhausted and the new ratchet is fully active. Until EREADY arrives, messages may arrive encrypted with either the old or new ratchet. +`EREADY` carries `lastExternalSndId` - the ID of the last message the sender received from the peer before switching ratchets. The receiving party uses this to know when the old ratchet's messages are exhausted and the new ratchet is fully active. Until EREADY arrives, messages may arrive encrypted with either the old or new ratchet. ### Error recovery @@ -179,17 +179,17 @@ Four variants with single-character discriminants: | Variant | Disc. 
| Encryption | When | |---------|-------|-----------|------| -| `AgentConfirmation` | `'C'` | Per-queue E2E only | Connection handshake | +| `AgentConfirmation` | `'C'` | Per-queue E2E (outer) + double ratchet (inner `encConnInfo`) | Connection handshake | | `AgentMsgEnvelope` | `'M'` | Double ratchet | Normal messages | | `AgentInvitation` | `'I'` | Per-queue E2E only | Contact URI join | | `AgentRatchetKey` | `'R'` | Per-queue E2E only | Ratchet sync | -Only `AgentMsgEnvelope` is double-ratchet encrypted. The other three use only the per-queue E2E encryption (DH shared secret from queue creation). This is because during handshake and ratchet sync, the double ratchet is either not yet established or being replaced. +`AgentMsgEnvelope` is fully double-ratchet encrypted. `AgentConfirmation` uses per-queue E2E for the outer envelope but also contains `encConnInfo` which is double-ratchet encrypted (the ratchet is initialized during confirmation processing). `AgentInvitation` and `AgentRatchetKey` use only per-queue E2E - the double ratchet is either not yet established or being replaced. 
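
The table and paragraph above can be restated as a small lookup. This is a sketch only: the types below are toy stand-ins that reuse the variant names for readability, and `Layer`, `discriminant` and `layers` are hypothetical names introduced for this example:

```haskell
-- Toy enumeration of the four envelope variants from the table.
data Envelope = AgentConfirmation | AgentMsgEnvelope | AgentInvitation | AgentRatchetKey
  deriving (Show, Eq, Enum, Bounded)

-- Encryption layers as named in the "Encryption" column.
data Layer = PerQueueE2E | DoubleRatchet
  deriving (Show, Eq)

-- Single-character discriminant of each variant.
discriminant :: Envelope -> Char
discriminant e = case e of
  AgentConfirmation -> 'C'
  AgentMsgEnvelope -> 'M'
  AgentInvitation -> 'I'
  AgentRatchetKey -> 'R'

-- Layers as listed in the table: only the confirmation combines both
-- (outer per-queue E2E, inner double-ratchet `encConnInfo`).
layers :: Envelope -> [Layer]
layers e = case e of
  AgentConfirmation -> [PerQueueE2E, DoubleRatchet]
  AgentMsgEnvelope -> [DoubleRatchet]
  AgentInvitation -> [PerQueueE2E]
  AgentRatchetKey -> [PerQueueE2E]

main :: IO ()
main = mapM_ (\e -> print (discriminant e, e, layers e)) [minBound .. maxBound]
```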
### Level 2: AgentMessage (application) Inside the decrypted envelope: -- `AgentConnInfo` / `AgentConnInfoReply` - connection info during handshake (not double-ratchet encrypted) +- `AgentConnInfo` / `AgentConnInfoReply` - connection info during handshake (double-ratchet encrypted inside `encConnInfo`) - `AgentRatchetInfo` - ratchet sync payload (not double-ratchet encrypted) - `AgentMessage APrivHeader AMessage` - user and control messages (double-ratchet encrypted) diff --git a/spec/agent/infrastructure.md b/spec/agent/infrastructure.md index ad84fd6e7..5e3c828e6 100644 --- a/spec/agent/infrastructure.md +++ b/spec/agent/infrastructure.md @@ -30,7 +30,7 @@ All agent background processing - async commands, message delivery, notification **Error classification**: `withWork` distinguishes two failure modes: - *Work-item error* (`isWorkItemError`): the task itself is broken (likely recurring). Worker stops and sends `CRITICAL False`. -- *Store error*: transient database issue. Worker re-signals `doWork` and reports `INTERNAL` (retry may succeed). +- *Other error*: any non-work-item error (e.g., transient database issue). Worker re-signals `doWork` and reports `INTERNAL` (retry may succeed). **Restart rate limiting**: On worker exit, `restartOrDelete` checks the `restarts` counter against `maxWorkerRestartsPerMin`. Under the limit: reset action, re-signal, restart. Over the limit: delete the worker from the map and send `CRITICAL True` (escalation to the application). A restart only proceeds if the `workerId` in the map still matches the current worker - a stale restart from a replaced worker is a no-op. @@ -54,7 +54,7 @@ Async commands handle state transitions that require network calls but shouldn't - *Client commands* (`AClientCommand`): `NEW`, `JOIN`, `LET` (allow connection), `ACK`, `LSET`/`LGET` (set/get connection link data), `SWCH` (switch queue), `DEL`. Triggered by application API calls. 
- *Internal commands* (`AInternalCommand`): `ICAck` (ack to router), `ICAckDel` (ack + delete local message), `ICAllowSecure`/`ICDuplexSecure` (secure after confirmation), `ICQSecure` (secure queue during switch), `ICQDelete` (delete old queue after switch), `ICDeleteConn` (delete connection), `ICDeleteRcvQueue` (delete specific receive queue). Generated *during* message processing to handle state transitions asynchronously. -**Retry and movement**: `tryMoveableCommand` wraps execution with `withRetryInterval`. On `temporaryOrHostError`, it retries with backoff. On cross-server errors (e.g., queue moved to different router), it updates the command's server field in the store (`CCMoved`) and retries against the new server. +**Retry and movement**: `tryMoveableCommand` wraps execution with `withRetryInterval`. On `temporaryOrHostError`, it retries with backoff. Individual command handlers can return `CCMoved` (e.g., when a queue has moved to a different router) after updating the command's server field in the store - `tryMoveableCommand` then exits cleanly, letting the moved command be picked up by the appropriate worker. **Locking**: State-sensitive commands use `tryWithLock` / `tryMoveableWithLock`, which acquire `withConnLock` before execution. This serializes operations on the same connection, preventing races between concurrent command processing and message receipt. @@ -73,8 +73,7 @@ Message delivery uses a split-phase encryption design: the ratchet advances in t 2. Call `agentRatchetEncryptHeader` - advances the double ratchet, produces a message encryption key (MEK), padded length, and PQ encryption status 3. Store `SndMsg` with `SndMsgPrepData` (MEK, paddedLen, sndMsgBodyId) in the database 4. Create `SndMsgDelivery` record for each send queue -5. Increment `msgDeliveryOp.opsInProgress` (for suspension tracking) -6. Signal delivery workers via `getDeliveryWorker` +5. 
`submitPendingMsg` - increments `msgDeliveryOp.opsInProgress` (for suspension tracking) and signals delivery workers via `getDeliveryWorker` **Phase 2 - delivery worker** (`runSmpQueueMsgDelivery`): 1. `throwWhenNoDelivery` - kills the worker thread if the queue's address has been removed from `smpDeliveryWorkers` (prevents delivery to queues replaced during switch) @@ -82,9 +81,9 @@ Message delivery uses a split-phase encryption design: the ratchet advances in t 3. Re-encode the message with `internalSndId`/`prevMsgHash`, then `rcEncryptMsg` to encrypt with the stored MEK (no ratchet access needed) 4. `sendAgentMessage` - per-queue encrypt + SEND to the router -**Connection info messages** (`AM_CONN_INFO`, `AM_CONN_INFO_REPLY`) skip split-phase encryption entirely - they are sent as plaintext confirmation bodies via `sendConfirmation`. +**Connection info messages** (`AM_CONN_INFO`, `AM_CONN_INFO_REPLY`) skip split-phase encryption entirely - they are sent as per-queue E2E encrypted confirmation bodies via `sendConfirmation` (encrypted with `agentCbEncrypt`, not with the double ratchet). -**Retry with dual intervals**: Delivery uses `withRetryLock2`, which maintains two independent retry clocks (slow and fast). A background thread sleeps for the current interval, then signals the delivery worker via `tryPutTMVar`. When the router sends `QCONT` (queue buffer cleared), the agent calls `tryPutTMVar retryLock ()` to wake the delivery thread immediately, avoiding unnecessary delay. +**Retry with dual intervals**: Delivery uses `withRetryLock2`, which maintains two retry interval states (slow and fast) but only one wait is active at a time. A background thread sleeps for the current interval, then signals the delivery worker via `tryPutTMVar`. When the router sends `QCONT` (queue buffer cleared), the agent calls `tryPutTMVar retryLock ()` to wake the delivery thread immediately, avoiding unnecessary delay. 
**Error handling**: - `SMP QUOTA` - switch to slow retry, don't penalize (backpressure from router) @@ -115,7 +114,7 @@ SessSubs - **Active → Pending**: When `setSessionId` is called with a *different* session ID (TLS reconnect), all active subscriptions are atomically demoted to pending. Session ID is updated to the new value. -- **Pending → Removed**: `failSubscriptions` moves permanently-failed queues (non-temporary SMP errors) to `removedSubs`. The removal is tracked for diagnostic reporting via `getSubscriptions`. +- **Pending → Removed**: `failSubscriptions` moves permanently-failed queues (non-temporary SMP errors) to `removedSubs` - a separate `TMap` in `AgentClient`, not part of `TSessionSubs`. The removal is tracked for diagnostic reporting via `getSubscriptions`. **Service-associated queues**: Queues with `serviceAssoc=True` are *not* added to `activeSubs` individually. Instead, the service subscription's count is incremented and its `idsHash` XOR-accumulates the queue's hash. The router tracks individual queues via the service subscription; the agent only tracks the aggregate. Consequence: `hasActiveSub(rId)` returns `False` for service-associated queues - callers must check the service subscription separately. diff --git a/spec/agent/xrcp.md b/spec/agent/xrcp.md index 864b4d802..883e64c5c 100644 --- a/spec/agent/xrcp.md +++ b/spec/agent/xrcp.md @@ -23,18 +23,18 @@ The handshake spans `Client.connectRCHost` (controller side, despite the name), 2. **Invitation delivery**: the invitation reaches the host either out-of-band (QR code scan for first pairing) or via encrypted multicast announcement (subsequent sessions - see [Multicast discovery](#multicast-discovery)). -3. **Host connects via TLS**: `connectRCCtrl` establishes a TLS connection. Both sides validate 2-certificate chains (leaf + CA root). On reconnection, the host validates the controller's CA fingerprint against `KnownHostPairing`; on first pairing, it stores the fingerprint. +3. 
**Host connects via TLS**: `connectRCCtrl` establishes a TLS connection. Both sides validate certificate chains. On the controller side, `onClientCertificate` explicitly checks for a 2-certificate chain (leaf + CA root) and validates the host's CA fingerprint against `KnownHostPairing.hostFingerprint` (or stores it on first pairing). On the host side, the controller's CA fingerprint is validated against `RCCtrlPairing.ctrlFingerprint` in `updateCtrlPairing`. -4. **User confirmation barrier**: after TLS connects, the controller extracts the TLS channel binding (`tlsUniq`) as a session code. The application displays this code; the user verifies it on the host. `confirmCtrlSession` uses a double `putTMVar` - the first put signals the decision (accept/reject), the second blocks until the session thread consumes it, creating a synchronization point that prevents the session from proceeding before confirmation completes. +4. **User confirmation barrier**: after TLS connects, both sides extract the TLS channel binding (`tlsUniq`) as a session code. The application displays this code on both devices for the user to verify. On the host side, `confirmCtrlSession` uses a double `putTMVar` - the first put signals the decision (accept/reject), the second blocks until the session thread acknowledges the value, ensuring `confirmCtrlSession` does not return prematurely. 5. **Hello exchange** (asymmetric encryption): - - Controller sends `RCHostEncHello`: DH public key in plaintext + encrypted body containing the KEM encapsulation key, CA fingerprint, and app info. Encrypted with `cbEncrypt` (classical DH secret). - - Host decrypts the hello, performs KEM encapsulation (see [KEM hybrid key exchange](#kem-hybrid-key-exchange)), derives the hybrid session key, and sends `RCCtrlEncHello` encrypted with `sbEncrypt` (post-quantum hybrid key). 
- - The asymmetry is deliberate: at the time the controller sends its hello, KEM hasn't completed yet, so only classical DH encryption is available. After the host encapsulates, both sides have the hybrid key. + - Host sends `RCHostEncHello` (`prepareHostHello`): DH public key in plaintext + encrypted body containing the KEM encapsulation key, CA fingerprint, and app info. Encrypted with `cbEncrypt` (classical DH secret). + - Controller decrypts the hello, performs KEM encapsulation (see [KEM hybrid key exchange](#kem-hybrid-key-exchange)), derives the hybrid session key, initializes a chain via `sbcInit`, and sends `RCCtrlEncHello` (`prepareHostSession`) encrypted with a key derived from the chain (`sbcHkdf` + `sbEncrypt`). + - The asymmetry is deliberate: at the time the host sends its hello, KEM hasn't completed yet, so only classical DH encryption is available. After the controller encapsulates, both sides have the hybrid key. -6. **Chain key initialization**: both sides call `sbcInit` with the hybrid key to derive send/receive chain keys. The controller explicitly **swaps** the key pair (`swap` call in `prepareCtrlSession`) - both sides derive keys in the same order from `sbcInit`, but have opposite send/receive roles, so the controller must reverse them. The host does not swap. +6. **Chain key initialization**: both sides call `sbcInit` with the hybrid key to derive send/receive chain keys. The host explicitly **swaps** the key pair (`swap` call in `prepareCtrlSession`, which runs on the host side despite its name) - both sides derive keys in the same order from `sbcInit`, but have opposite send/receive roles, so the host must reverse them. The controller does not swap. -7. **Error path**: if KEM encapsulation fails, the host sends `RCCtrlEncError` encrypted with the DH key (not the hybrid key, which doesn't exist yet). The controller can decrypt the error because it has the DH secret from step 5. +7. 
**Error path**: if KEM encapsulation fails, the controller sends `RCCtrlEncError` (a variant of `RCCtrlEncHello`) encrypted with the DH key (not the hybrid key, which doesn't exist yet). The host can decrypt the error because it has the DH secret from step 5. Note: this error path is not yet fully implemented in the code. --- @@ -42,23 +42,20 @@ The handshake spans `Client.connectRCHost` (controller side, despite the name), **Source**: [RemoteControl/Client.hs](../../src/Simplex/RemoteControl/Client.hs) -The session key combines classical Diffie-Hellman with SNTRUP761 (lattice-based KEM) via `SHA3_256(dhSecret || kemSharedKey)` (`kemHybridSecret` in Client.hs). This provides protection against quantum computers while maintaining classical security as a fallback. +The session key combines classical Diffie-Hellman with SNTRUP761 (lattice-based KEM) via `SHA3_256(dhSecret || kemSharedKey)` (`kemHybridSecret` in `Crypto/SNTRUP761.hs`). This provides protection against quantum computers while maintaining classical security as a fallback. -**First session** - KEM public key is too large for a QR code invitation, so it travels in the encrypted hello body: +The KEM public key is too large for a QR code invitation, so it travels in the encrypted hello body. Fresh KEM keys are generated every session - no KEM state is cached between sessions. -1. Controller generates DH + KEM key pairs, puts KEM encapsulation key in the hello body -2. Host decrypts hello with DH secret, extracts KEM encapsulation key -3. Host encapsulates: produces `(kemCiphertext, kemSharedKey)` -4. Host derives hybrid key: `SHA3_256(dhSecret || kemSharedKey)` -5. Host sends `kemCiphertext` in the controller hello body -6. Controller decapsulates `kemCiphertext` to recover `kemSharedKey`, derives the same hybrid key +1. Host generates a fresh KEM key pair (`prepareHostHello`), puts the KEM public key in the host hello body +2. Controller decrypts hello with DH secret, extracts KEM public key +3. 
Controller encapsulates (`sntrup761Enc`): produces `(kemCiphertext, kemSharedKey)` +4. Controller derives hybrid key: `SHA3_256(dhSecret || kemSharedKey)` +5. Controller sends `kemCiphertext` in the ctrl hello body (`RCCtrlEncHello`) +6. Host decapsulates `kemCiphertext` (`sntrup761Dec`) to recover `kemSharedKey`, derives the same hybrid key -**Subsequent sessions** (via multicast) - the previous session's KEM secret is cached in the pairing: +The KEM exchange is identical for first and subsequent sessions. The only difference between sessions is how the invitation is delivered (QR code vs multicast) and whether TLS fingerprints are stored for the first time or verified against known pairings. -- Both sides already know each other's KEM capabilities from the previous session -- Fresh DH keys are generated per session for forward secrecy -- The hybrid key derivation uses the new DH secret + the cached KEM secret -- `updateKnownHost` (called in `prepareHostSession`) updates the stored DH public key for the next session +`updateKnownHost` (called in `prepareHostSession` on the controller) updates the stored host DH public key (`hostDhPubKey` in `KnownHostPairing`) - this is used for encrypting multicast announcements in subsequent sessions, not for KEM. **Key rotation and `prevDhPrivKey`**: when the host updates its DH key pair for a new session, it retains the previous private key in `RCCtrlPairing.prevDhPrivKey`. This is critical for multicast - during the transition window, the controller may send announcements encrypted with the old public key. `findRCCtrlPairing` tries decryption with both the current and previous DH keys. Without this fallback, key rotation would break multicast discovery. 
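
The fallback to the previous DH key can be sketched independently of the actual cryptography. In this sketch the decryption function is passed in as a parameter, and `Pairing`, `decryptWithPairing`, `findPairing` and `toyDecrypt` are hypothetical names for illustration, not the definitions in `RemoteControl/Client.hs`:

```haskell
import Control.Applicative ((<|>))
import Data.Maybe (listToMaybe)

-- Simplified pairing record: current key plus an optional previous key.
data Pairing key = Pairing
  { dhPrivKey :: key,
    prevDhPrivKey :: Maybe key
  }

-- Try the current key first, then fall back to the previous key if present.
decryptWithPairing :: (key -> packet -> Maybe plain) -> Pairing key -> packet -> Maybe plain
decryptWithPairing tryDecrypt p packet =
  tryDecrypt (dhPrivKey p) packet
    <|> (prevDhPrivKey p >>= \k -> tryDecrypt k packet)

-- Return the first known pairing that can decrypt the packet.
findPairing :: (key -> packet -> Maybe plain) -> [Pairing key] -> packet -> Maybe (Pairing key, plain)
findPairing tryDecrypt pairings packet =
  listToMaybe
    [ (p, plain)
      | p <- pairings,
        Just plain <- [decryptWithPairing tryDecrypt p packet]
    ]

-- Toy "decryption": succeeds only when the key matches the packet's key tag.
toyDecrypt :: Int -> (Int, String) -> Maybe String
toyDecrypt k (tag, body) = if k == tag then Just body else Nothing

main :: IO ()
main = do
  let pairings = [Pairing 2 (Just 1)] -- key rotated from 1 to 2
  print (snd <$> findPairing toyDecrypt pairings (1, "announcement")) -- Just "announcement"
  print (snd <$> findPairing toyDecrypt pairings (3, "announcement")) -- Nothing
```

The first packet is addressed to the old key and is still accepted during the transition window; a packet for an unknown key is rejected.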
@@ -68,23 +65,24 @@ The session key combines classical Diffie-Hellman with SNTRUP761 (lattice-based **Source**: [RemoteControl/Client.hs](../../src/Simplex/RemoteControl/Client.hs), [RemoteControl/Invitation.hs](../../src/Simplex/RemoteControl/Invitation.hs), [RemoteControl/Discovery.hs](../../src/Simplex/RemoteControl/Discovery.hs) -For subsequent sessions (after initial QR pairing), the controller announces its presence via UDP multicast so the host can connect without scanning a new QR code. The flow spans `Client.announceRC`, `Client.discoverRCCtrl`, `Client.findRCCtrlPairing`, `Invitation.signInvitation`/`verifySignedInvitation`, and `Discovery.joinMulticast`/`withSender`. +For subsequent sessions (after initial QR pairing), the controller announces its presence via UDP multicast so the host can connect without scanning a new QR code. The flow spans `Client.announceRC`, `Client.discoverRCCtrl`, `Client.findRCCtrlPairing`, `Invitation.signInvitation`/`verifySignedInvitation`, and `Discovery.withListener`/`withSender`. **Announcement creation** (`announceRC`): -1. The invitation is signed with a dual-signature chain: the session key signs the invitation URI, then the identity key signs the URI + session signature concatenated. This chain means a compromised session key alone cannot forge a valid identity-signed announcement - the identity key must also be compromised. +1. The invitation is signed with a dual-signature chain: the session key signs the invitation URI, then the identity key signs the concatenation `URI + "&ssig=" + sessionSignature`. This chain means a compromised session key alone cannot forge a valid identity-signed announcement - the identity key must also be compromised. 2. The signed invitation is encrypted with a DH shared secret between the host's known DH public key and the controller's ephemeral DH private key. 3. The encrypted packet is padded to 900 bytes (privacy: all announcements are indistinguishable by size). 4. 
Sent 60 times at 1-second intervals to multicast group `224.0.0.251:5227`. -5. Runs as a cancellable async task - cancelled in `prepareHostSession` once the session is established. +5. Runs as a cancellable async task - cancelled in `connectRCHost` after `prepareHostSession` returns, once the session is established. **Listener and discovery** (`discoverRCCtrl`): 1. Host calls `joinMulticast` to subscribe to the multicast group. A shared `TMVar Int` counter tracks active listeners - OS-level `IP_ADD_MEMBERSHIP` is only issued on 0→1 transition, `IP_DROP_MEMBERSHIP` on 1→0. This prevents duplicate syscalls when multiple listeners are active. 2. For each received packet, `findRCCtrlPairing` iterates over known pairings and tries decryption with the current DH key, falling back to `prevDhPrivKey` if present. 3. After successful decryption, the invitation's `dh` field is verified against the announcement's `dhPubKey` to prevent relay attacks. -4. Dual signatures are verified: session signature first, then identity signature. -5. 30-second timeout on the entire discovery process (`RCENotDiscovered` on expiry). +4. The source IP address is checked against the invitation's `host` field - prevents re-broadcasting a legitimate announcement from a different host. +5. Dual signatures are verified: session signature first, then identity signature. +6. 30-second timeout on the entire discovery process (`RCENotDiscovered` on expiry). 
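
The listener counting in step 1 of the discovery list can be sketched as follows. This is an illustration under stated assumptions: `joinGroup` and `leaveGroup` are placeholder actions standing in for the real socket option calls, and `withMembership` and `update` are hypothetical helpers, not the functions in `Discovery.hs`:

```haskell
import Control.Concurrent.STM
import Control.Exception (bracket_, mask, onException)
import Control.Monad (when)
import Data.IORef

-- Take the counter, run an IO step that computes the new value, put it back.
-- The old value is restored if the step throws, so the TMVar is never left empty.
update :: TMVar Int -> (Int -> IO Int) -> IO ()
update v f = mask $ \restore -> do
  n <- atomically (takeTMVar v)
  n' <- restore (f n) `onException` atomically (putTMVar v n)
  atomically (putTMVar v n')

-- Join the group only on the 0 -> 1 transition, leave only on 1 -> 0.
withMembership :: TMVar Int -> IO () -> IO () -> IO a -> IO a
withMembership counter joinGroup leaveGroup = bracket_ acquire release
  where
    acquire = update counter $ \n -> when (n == 0) joinGroup >> pure (n + 1)
    release = update counter $ \n -> when (n == 1) leaveGroup >> pure (n - 1)

main :: IO ()
main = do
  counter <- newTMVarIO 0
  events <- newIORef []
  let record e = modifyIORef events (e :)
      listen = withMembership counter (record "join") (record "leave")
  -- Two nested listeners share a single join/leave pair.
  listen (listen (pure ()))
  readIORef events >>= print . reverse -- ["join","leave"]
```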
--- From 38fa104c7ec11fd8fe35431e255f9db802f10727 Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Sat, 14 Mar 2026 22:40:10 +0000 Subject: [PATCH 49/61] subscriptions --- spec/topics/subscriptions.md | 222 +++++++++++++++++++++++++++++++++++ 1 file changed, 222 insertions(+) create mode 100644 spec/topics/subscriptions.md diff --git a/spec/topics/subscriptions.md b/spec/topics/subscriptions.md new file mode 100644 index 000000000..b7bc9fa1c --- /dev/null +++ b/spec/topics/subscriptions.md @@ -0,0 +1,222 @@ +# Subscriptions + +How messages reach recipients: router subscription model, subscription-driven delivery, cross-layer subscription flow, and reconnection. This is the cross-cutting view spanning all three layers (router, client, agent). + +For agent-internal subscription tracking (TSessionSubs, pending/active state machine, UP event deduplication), see [agent/infrastructure.md](../agent/infrastructure.md#subscription-tracking). For service subscription lifecycle, see [client-services.md](client-services.md). For the SMP protocol specification, see [simplex-messaging.md](../../protocol/simplex-messaging.md). + +- [Router subscription model](#router-subscription-model) +- [Subscription-driven delivery](#subscription-driven-delivery) +- [Cross-layer subscription flow](#cross-layer-subscription-flow) +- [Reconnection and resubscription](#reconnection-and-resubscription) +- [Service subscriptions](#service-subscriptions) + +--- + +## Router subscription model + +**Source**: [Server.hs](../../src/Simplex/Messaging/Server.hs), [Server/Env/STM.hs](../../src/Simplex/Messaging/Server/Env/STM.hs) + +The router tracks which client connection is subscribed to each queue. At most one client can be subscribed to a given queue at a time - a new subscription displaces the previous one. + +### SubscribedClients - the TVar-of-Maybe pattern + +`SubscribedClients` is a `TMap EntityId (TVar (Maybe (Client s)))`. 
The indirection through `TVar (Maybe ...)` serves two purposes: + +1. **STM re-evaluation**: any transaction reading the TVar automatically re-evaluates when the subscriber changes (disconnects, gets displaced). This is used by `tryDeliverMessage` - if the subscriber disconnects mid-delivery, the STM transaction retries and sees `Nothing`. + +2. **Reconnection continuity**: when a mobile client disconnects and reconnects, the TVar is reused rather than recreated. Subscriptions that were made at any point are never removed from the map - this is a deliberate trade-off for intermittently connected mobile clients. + +The `SubscribedClients` constructor is not exported from `Server/Env/STM.hs` (only the type is). All access goes through `getSubscribedClient` (IO, outside STM) and `upsertSubscribedClient` (STM). This prevents accidental use of `TM.lookup` inside STM transactions, which would add the entire TMap to the transaction's read set. + +Two instances exist: `queueSubscribers` for individually-subscribed queues and `serviceSubscribers` for service-subscribed queues. + +### serverThread - split-STM processing + +`serverThread` processes subscription registration events from `subQ`. It runs separately from the client handler threads and uses a split-STM pattern to reduce contention: + +``` +subQ (TQueue) -- (A) STM: read event + → getServerClient clientId -- (B) IO: lookup client outside STM + → updateSubscribers -- (C) STM: register in SubscribedClients + → endPreviousSubscriptions -- (D) IO: notify displaced clients +``` + +Step (B) is deliberately outside STM. If the client lookup were inside the transaction, the transaction would re-evaluate every time the clients `IntMap` TVar changes (e.g., when any client connects or disconnects). By reading in IO, only the `updateSubscribers` transaction needs to be STM. 
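
The TVar-of-Maybe map and the IO-versus-STM access split can be sketched as below. The types are simplified assumptions (`EntityId` as `String`, clients reduced to an `Int` ID), and the function bodies are illustrative rather than the definitions in `Server/Env/STM.hs`:

```haskell
import Control.Concurrent.STM
import qualified Data.Map.Strict as M

type EntityId = String
type ClientId = Int

-- Simplified stand-in: each queue maps to a TVar holding the current
-- subscriber, or Nothing after a disconnect.
newtype SubscribedClients = SubscribedClients (TVar (M.Map EntityId (TVar (Maybe ClientId))))

-- IO lookup: reads the map outside any transaction, so the whole map
-- does not become part of a transaction's read set.
getSubscribedClient :: EntityId -> SubscribedClients -> IO (Maybe (TVar (Maybe ClientId)))
getSubscribedClient qId (SubscribedClients subs) = M.lookup qId <$> readTVarIO subs

-- STM upsert: reuses the per-queue TVar if it exists and returns the
-- previous subscriber when a different client is displaced.
upsertSubscribedClient :: EntityId -> ClientId -> SubscribedClients -> STM (Maybe ClientId)
upsertSubscribedClient qId c (SubscribedClients subs) = do
  m <- readTVar subs
  case M.lookup qId m of
    Nothing -> do
      v <- newTVar (Just c)
      writeTVar subs (M.insert qId v m)
      pure Nothing
    Just v -> do
      prev <- readTVar v
      writeTVar v (Just c)
      pure (if prev == Just c then Nothing else prev)

main :: IO ()
main = do
  subs <- SubscribedClients <$> newTVarIO M.empty
  first <- atomically (upsertSubscribedClient "queue1" 1 subs)
  second <- atomically (upsertSubscribedClient "queue1" 2 subs)
  print (first, second) -- (Nothing,Just 1)
  current <- getSubscribedClient "queue1" subs >>= traverse readTVarIO
  print current -- Just (Just 2)
```

The second upsert reports client 1 as displaced, which is the value a caller would use to send END.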
+ +If the client disconnects between steps (B) and (C), `updateSubscribers` handles `Nothing` - it still sends END/DELD to any existing subscriber for the same queue. + +### Subscription displacement + +When `upsertSubscribedClient` finds a different client already subscribed to the same entity, it returns the previous client. `endPreviousSubscriptions` then: + +1. Queues `(entityId, END)` or `(entityId, DELD)` into `pendingEvents` (a `TVar (IntMap (NonEmpty ...))` keyed by client ID). +2. Removes the subscription from the displaced client's local `subscriptions` map and cancels any delivery thread. + +A separate `sendPendingEvtsThread` flushes `pendingEvents` on a timer (`pendingENDInterval`), delivering END/DELD events to displaced clients via their `sndQ`. If the client's `sndQ` is full, it forks a blocking thread rather than stalling the flush. + +For service subscriptions, the displacement event is `ENDS n idsHash` rather than `END`. + +### GET vs SUB mutual exclusion + +When `GET` is used on a queue, the server creates a `ProhibitSub` subscription. This prevents `SUB` on the same queue in the same connection (`CMD PROHIBITED`). Conversely, if `SUB` is active, `GET` is prohibited. GET clients are not added to `ServerSubscribers` and do not receive END events. + +--- + +## Subscription-driven delivery + +**Source**: [Server.hs](../../src/Simplex/Messaging/Server.hs) + +The router delivers at most one unacknowledged message per subscription. The `delivered :: TVar (Maybe (MsgId, SystemSeconds))` in each `Sub` record is the gate: `Just _` means a message is in flight (awaiting ACK), `Nothing` means the next message can be delivered. + +### Three delivery triggers + +**1. SUB** - `subscribeQueueAndDeliver`: after registering the subscription, the server peeks the first pending message (`tryPeekMsg`). If one exists, it is delivered alongside the `SOK` response in the same transmission batch. `setDelivered` records the message ID and timestamp. + +**2. 
ACK** - `acknowledgeMsg`: when the client ACKs a message, the server clears `delivered`, then calls `tryDelPeekMsg` which deletes the ACK'd message AND peeks the next. If a next message exists, it is immediately delivered in the ACK response and `setDelivered` is called again. This means ACK responses can piggyback the next message - minimizing round-trips. + +**3. SEND to empty queue** - `tryDeliverMessage`: when a sender writes a message to a previously empty queue (`wasEmpty = True`), the server attempts to push it to the subscribed recipient immediately. + +### Sync/async split in tryDeliverMessage + +`tryDeliverMessage` has a three-phase structure optimized for the common case: + +**Phase 1 - outside STM**: `getSubscribedClient` reads the `SubscribedClients` TMap via `readTVarIO` (IO, not STM). If no subscriber exists, the function returns immediately without entering any STM transaction. This avoids transaction overhead for queues with no active subscriber. + +**Phase 2 - STM transaction** (`deliverToSub`): reads the client TVar (inside STM, so the transaction re-evaluates if the subscriber changes), checks `subThread == NoSub` and `delivered == Nothing`. Then: + +- If the client's `sndQ` is **not full**: delivers the message directly in the same STM transaction (`writeTBQueue sndQ`), sets `delivered`. No thread is needed. This is the fast path. +- If the client's `sndQ` is **full**: sets `subThread = SubPending` and returns the client + sub for phase 3. + +**Phase 3 - forked thread** (`forkDeliver`): a `deliverThread` is spawned that blocks until the `sndQ` has room. Before delivering, it re-checks that the subscriber is still the same client and `delivered` is still `Nothing` - handling the race where the client disconnected and a new one subscribed between phases 2 and 3. 
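
The phase 2 decision can be sketched as one STM transaction. This is a simplified model, not the router code: message IDs are plain strings, a `Bool` stands in for `subThread` being other than `NoSub`, and `Outcome` is a hypothetical result type:

```haskell
import Control.Concurrent.STM

data Outcome = Delivered | NeedsThread | Skipped
  deriving (Show, Eq)

-- Deliver directly if nothing is in flight and the send queue has room;
-- otherwise mark the subscription pending so a thread can deliver later.
deliverToSub :: TBQueue String -> TVar (Maybe String) -> TVar Bool -> String -> STM Outcome
deliverToSub sndQ delivered pending msgId = do
  inFlight <- readTVar delivered
  waiting <- readTVar pending
  full <- isFullTBQueue sndQ
  case inFlight of
    Just _ -> pure Skipped -- a message is awaiting ACK
    Nothing
      | waiting -> pure Skipped -- a delivery thread is already pending
      | full -> writeTVar pending True >> pure NeedsThread
      | otherwise -> do
          writeTBQueue sndQ msgId -- fast path: same transaction
          writeTVar delivered (Just msgId)
          pure Delivered

main :: IO ()
main = do
  sndQ <- newTBQueueIO 1
  delivered <- newTVarIO Nothing
  pending <- newTVarIO False
  let send = atomically . deliverToSub sndQ delivered pending
  r1 <- send "msg1"
  r2 <- send "msg2"
  -- Simulate the gate being cleared while the client has not drained sndQ.
  atomically (writeTVar delivered Nothing)
  r3 <- send "msg2"
  print [r1, r2, r3] -- [Delivered,Skipped,NeedsThread]
```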
+ +### Per-queue encryption + +The server encrypts every message before delivery using `encryptMsg`: `XSalsa20-Poly1305` with the per-queue DH shared secret (`rcvDhSecret` from `QueueRec`) and a nonce derived from the message ID. This is the server-to-recipient transport encryption layer - independent of the end-to-end encryption between sender and recipient. + +--- + +## Cross-layer subscription flow + +**Source**: [Agent.hs](../../src/Simplex/Messaging/Agent.hs), [Agent/Client.hs](../../src/Simplex/Messaging/Agent/Client.hs), [Client.hs](../../src/Simplex/Messaging/Client.hs) + +### Subscribe path (agent → router) + +``` +subscribeConnections' + ├── getConnSubs (DB) → load RcvQueueSub per connection + └── subscribeConnections_ + ├── partition: send-only/new → immediate results; duplex/rcv → subscribe + ├── resumeDelivery, resumeConnCmds + └── subscribeQueues + ├── checkQueues (filter GET-locked queues) + ├── batchQueues by SMPTransportSession + ├── addPendingSubs (mark pending in currentSubs) + └── mapConcurrently per session: + subscribeSessQueues_ + ├── getSMPServerClient (get/create TCP connection) + ├── subscribeSMPQueues (protocol client: batch TLS write) + ├── processSubResults (STM: pending → active, record failures) + └── notify UP (for newly active connections) +``` + +**Batching**: `batchQueues` groups queues by `SMPTransportSession = (UserId, SMPServer, Maybe ByteString)`. The third field carries the connection ID in entity-session mode (each connection gets its own TCP session) or `Nothing` in shared mode (all queues to the same server share one session). Per-session batches are subscribed concurrently via `mapConcurrently`. + +**Protocol client**: `subscribeSMPQueues` maps each queue to a `SUB` command, batches them into physical TLS writes (respecting server block size limits via `batchTransmissions'`), and awaits responses concurrently. 
`processSUBResponse_` classifies responses: `OK`/`SOK serviceId` (success), `MSG` (immediate message delivery piggybacked on response), or error. + +### Receive path (router → application) + +``` +Router MSG → TLS → protocol client rcvQ + → processMsg: server push (empty corrId) → STEvent → msgQ + → subscriber thread (Agent.hs): + readTBQueue msgQ → processSMPTransmissions + ├── STEvent MSG → processSMP → withConnLock → decrypt → subQ → Application + ├── STEvent END → removeSubscription → subQ END + ├── STEvent DELD → removeSubscription → subQ DELD + └── STResponse SUB OK → processSubOk → addActiveSub → accumulate UP +``` + +The protocol client's `processMsg` thread classifies each incoming transmission: +- **Non-empty corrId**: response to a pending command - delivered to the waiting `getResponse` caller via `responseVar`. +- **Empty corrId**: server-initiated push (MSG, END, DELD, ENDS) - wrapped as `STEvent` and forwarded to `msgQ`. +- **Expired/unexpected responses**: also forwarded to `msgQ` as `STResponse`. + +The agent's `subscriber` thread reads from `msgQ` and processes all events under `agentOperationBracket AORcvNetwork`. + +### Dual UP event sources + +UP events can originate from two paths: +- **Synchronous** (`subscribeSessQueues_`): after `processSubResults` promotes pending → active, notifies `UP srv connIds` for newly active connections. Used during initial subscription. +- **Asynchronous** (`processSMPTransmissions`): when SUB responses arrive via `msgQ` (e.g., after reconnection), `processSubOk` promotes pending → active and accumulates `upConnIds`, which are batch-notified at the end of the transmission batch. + +Both paths guard against duplicates: they only emit UP for connections that were not already in `activeSubs`. 
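
The duplicate guard can be sketched as a pure function. This is an illustration under assumptions, not the agent's implementation: `promoteToActive` and the `ConnId = String` alias are hypothetical, and the real code tracks subscriptions per session inside STM rather than in a plain `Set`.

```haskell
import qualified Data.Set as S

-- Hypothetical alias; the agent uses its own connection ID type.
type ConnId = String

-- Promote successfully subscribed connections to active and return
-- the new active set together with the connections to report as UP.
-- Connections that were already active are not reported again.
promoteToActive :: S.Set ConnId -> [ConnId] -> (S.Set ConnId, [ConnId])
promoteToActive active subscribed = (active', newlyUp)
  where
    subscribedSet = S.fromList subscribed
    newlyUp = S.toList (subscribedSet `S.difference` active)
    active' = active `S.union` subscribedSet
```

Whichever path processes a given SUB response first reports the connection. If the other path later sees the same connection, it finds it in the active set and reports nothing.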
+ +--- + +## Reconnection and resubscription + +**Source**: [Server.hs](../../src/Simplex/Messaging/Server.hs), [Agent/Client.hs](../../src/Simplex/Messaging/Agent/Client.hs) + +### Server-side disconnect cleanup + +When a client disconnects (`clientDisconnected`): + +1. `connected = False` - any STM transaction reading this TVar re-evaluates. +2. All `subscriptions` and `ntfSubscriptions` are swapped to empty maps. +3. Each subscription's delivery thread is killed (`cancelSub`). +4. `deleteSubcribedClient` sets each queue's `TVar (Maybe Client)` to `Nothing` and removes the entry from the `SubscribedClients` map. The `sameClient` check (comparing `clientId`) prevents removing a newer subscriber that connected after the disconnect. +5. The client is removed from `subClients` IntSet. + +After disconnect, the queue's messages remain stored. The next client to SUB the same queue will receive the first pending message in the SUB response. + +### Agent-side reconnection + +When the protocol client detects a TLS disconnect, `smpClientDisconnected` fires in the agent: + +1. `removeSessVar` with CAS check (monotonic `sessionVarId` prevents stale callbacks from removing newer clients). +2. `setSubsPending` demotes all active subscriptions for the matching session to pending in `currentSubs`. +3. `DOWN srv connIds` is sent to the application for affected connections. +4. Resubscription begins - the mechanism depends on transport session mode: + - **Entity-session mode**: `resubscribeSMPSession` spawns a persistent worker thread. + - **Shared mode**: directly calls `subscribeQueues` and `subscribeClientService` without a persistent worker. + +In entity-session mode, the resubscription worker loops with exponential backoff until all pending subscriptions are resubscribed: + +1. Gets or creates a new SMP client connection to the server. +2. Reads pending subscriptions for the session. +3. Calls `subscribeSessQueues_` with `withEvents = True` to re-send SUB commands. +4. 
On success, subscriptions move from pending → active and `UP` events are emitted. +5. On temporary error, backs off and retries. +6. Worker self-cleans on exit via `removeSessVar`. + +### Stale response protection + +Both subscription paths (synchronous `processSubResults` and asynchronous `processSubOk`) verify that the queue is still pending in `currentSubs` for the **current** session before promoting to active. If a session was replaced between sending SUB and receiving the response, the stale response is silently discarded. This prevents a response from an old TLS session from marking a queue as active when it should be pending for the new session. + +--- + +## Service subscriptions + +**Source**: [Server.hs](../../src/Simplex/Messaging/Server.hs), [Protocol.hs](../../src/Simplex/Messaging/Protocol.hs) + +Service subscriptions are a bulk mechanism where one `SUBS n idsHash` command subscribes all queues associated with a service identity. The service identity is derived from a long-term TLS client certificate presented during the transport handshake. + +### How service subscriptions differ from individual subscriptions + +| Aspect | Individual (SUB) | Service (SUBS) | +|--------|------------------|----------------| +| Granularity | One queue per SUB command | All associated queues in one command | +| Subscriber tracking | `queueSubscribers` (keyed by QueueId) | `serviceSubscribers` (keyed by ServiceId) | +| Displacement signal | `END` per queue | `ENDS n idsHash` per service | +| Message delivery | Immediate (first message in SUB response) | Iterative (`deliverServiceMessages` iterates all queues, sends `ALLS` when complete) | +| Association | Implicit (queue + subscriber) | Explicit (`rcvServiceId` in QueueRec, set via `setQueueService`) | + +### SUBS flow on the router + +1. 
`sharedSubscribeService` checks the actual queue count and IDs hash against the stored service state, and enqueues a `CSService` event to `subQ` for `serverThread` to process (registration in `serviceSubscribers` happens asynchronously). +2. If this is a new service subscription (not previously subscribed): `deliverServiceMessages` iterates all service-associated queues via `foldRcvServiceMessages`, creates per-queue `Sub` entries, and delivers pending messages. +3. After iteration completes, `ALLS` is sent to signal the client that all pending messages have been delivered. + +For notification servers, `NSUBS` uses the same `sharedSubscribeService` for registration but does not deliver pending messages (no `deliverServiceMessages` call) - notification subscriptions only register for future `NMSG` events. + +For service certificate lifecycle and agent-side service management, see [client-services.md](client-services.md). From 7d6e319fc23c2075445abf58bdbe7e81685e06ec Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Sat, 14 Mar 2026 22:52:16 +0000 Subject: [PATCH 50/61] client services --- spec/topics/client-services.md | 221 +++++++++++++++++++++++++++++++++ 1 file changed, 221 insertions(+) create mode 100644 spec/topics/client-services.md diff --git a/spec/topics/client-services.md b/spec/topics/client-services.md new file mode 100644 index 000000000..998fe51fe --- /dev/null +++ b/spec/topics/client-services.md @@ -0,0 +1,221 @@ +# Client Services + +How service certificates enable bulk queue subscriptions: identity lifecycle, queue association, service subscription flow, tracking, reconnection, and notification server usage. This is the cross-cutting view spanning transport, protocol, server, client, agent, and store layers. 
+ +For agent-internal subscription tracking (TSessionSubs service state, active/pending promotion), see [agent/infrastructure.md](../agent/infrastructure.md#subscription-tracking). For the router subscription model and delivery mechanics, see [subscriptions.md](subscriptions.md). For the full implementation reference with types, wire encoding, test gaps, security invariants, and risk analysis, see [rcv-services.md](../rcv-services.md). + +- [Overview](#overview) +- [Service identity lifecycle](#service-identity-lifecycle) +- [Queue-service association](#queue-service-association) +- [Service subscription flow](#service-subscription-flow) +- [Service tracking in TSessionSubs](#service-tracking-in-tsessionsubs) +- [Reconnection and graceful degradation](#reconnection-and-graceful-degradation) +- [Notification server usage](#notification-server-usage) + +--- + +## Overview + +**Source**: [Server.hs](../../src/Simplex/Messaging/Server.hs), [Client.hs](../../src/Simplex/Messaging/Client.hs), [Agent/Client.hs](../../src/Simplex/Messaging/Agent/Client.hs) + +A **service client** is a high-volume SMP client (notification router, chat relay, directory service) that presents a TLS client certificate during handshake. The router assigns it a persistent `ServiceId` derived from the certificate fingerprint. Individual queues are then associated with this ServiceId via per-queue `SUB` commands carrying a service signature. Once associated, the service client can bulk-subscribe all its queues with a single `SUBS` command instead of O(n) individual `SUB` commands on each reconnection. 
+ +``` +Service client SMP Router + | | + |---- TLS + service cert --------->| Three-way handshake + |<--- ServiceId -------------------| (Transport layer) + | | + |---- SUB + service sig ---------->| Per-queue association + |<--- SOK(ServiceId) --------------| (one-time per queue) + | | + |---- SUBS count idsHash --------->| Bulk subscribe + |<--- SOKS count' idsHash' --------| (server's actual state) + |<--- MSG ... MSG ... MSG ---------| Buffered messages + |<--- ALLS ------------------------| All delivered +``` + +Two version gates control feature availability: `serviceCertsSMPVersion` (v16) enables the service handshake, `SOK`, and dual signatures; `rcvServiceSMPVersion` (v19) adds count+hash parameters to `SUBS`/`NSUBS`/`SOKS`/`ENDS` and enables the messaging service role (`SRMessaging`). Below v19, `SUBS`/`NSUBS` exist but are sent without parameters. + +--- + +## Service identity lifecycle + +**Source**: [Transport.hs](../../src/Simplex/Messaging/Transport.hs), [Agent/Client.hs](../../src/Simplex/Messaging/Agent/Client.hs), [Agent/Store/AgentStore.hs](../../src/Simplex/Messaging/Agent/Store/AgentStore.hs) + +### Credential generation + +The agent generates a self-signed X.509 certificate per (userId, server) pair on first use via `getServiceCredentials`. The certificate is generated with `genCredentials` using a long validity period and is stored in the `client_services` table along with the private signing key and certificate fingerprint. The `ServiceId` column is NULL until the first successful handshake. + +### Three-way handshake + +Standard SMP handshake is two messages (server sends `SMPServerHandshake`, client sends `SMPClientHandshake`). When the client includes service credentials, an optional third message is added: + +1. **Router -> Client**: standard `SMPServerHandshake` +2. **Client -> Router**: `SMPClientHandshake` with `SMPClientHandshakeService {serviceRole, serviceCertKey}`. 
The `serviceCertKey` contains the TLS client certificate chain plus a proof-of-possession - a fresh per-session Ed25519 key pair signed by the X.509 signing key. +3. **Router -> Client**: `SMPServerHandshakeResponse {serviceId}`. The router verifies the certificate chain matches the TLS peer certificate, extracts the fingerprint, and calls `getCreateService` to find or create a `ServiceId` for that fingerprint. + +The per-session Ed25519 key (not the X.509 key) is used to sign `SUBS`/`NSUBS` commands. This limits exposure - compromising a session key does not compromise the long-term service identity. + +### Dual signature scheme + +When the TLS handshake established a service identity (the client has a `THClientService`) and the command is `NEW`, `SUB`, or `NSUB` (per `useServiceAuth`), `authTransmission` appends two signatures: + +1. The entity key signs over `serviceCertHash || transmission` - binding the service identity to the queue operation +2. The service session key signs over `transmission` alone + +This prevents MITM service substitution within TLS: an attacker cannot replace the service certificate hash without invalidating the entity key signature. + +### Version-gated role filtering + +Messaging services (`SRMessaging`) are suppressed below v19 - `mkClientService` returns `Nothing` for messaging role when the router version is below `rcvServiceSMPVersion`. Notifier services (`SRNotifier`) are sent at v16+. This allows gradual rollout - routers can support notification service certificates before full messaging service support. + +--- + +## Queue-service association + +**Source**: [Server.hs](../../src/Simplex/Messaging/Server.hs), [Server/QueueStore.hs](../../src/Simplex/Messaging/Server/QueueStore.hs) + +Queues are associated with services through per-queue `SUB` commands (with service signature) or at creation time via `NEW`. The router stores `rcvServiceId :: Maybe ServiceId` on each `QueueRec`. 
+ +### sharedSubscribeQueue - four cases + +`sharedSubscribeQueue` handles the intersection of client type and existing association: + +**Case 1: Service client, queue already associated with this service** - Duplicate association (retry after lost response). If no service subscription exists yet, increments the client's service queue count. + +**Case 2: Service client, queue not yet associated** (or different service) - Calls `setQueueService` to persist the association in `QueueRec`, increments client's `serviceSubsCount` by `(1, queueIdHash rId)`. + +**Case 3: Non-service client, queue has service association** - Calls `setQueueService` with `Nothing` to **remove** the association. This is the migration path when a user disables services. + +**Case 4: Non-service client, no service association** - Standard per-queue subscription, no service involvement. + +### Association persistence + +The `setQueueService` function in QueueStore updates `rcvServiceId` on the queue record and maintains the service's aggregate queue set (`STMService.serviceRcvQueues`). The set and its XOR hash are updated atomically. Associations persist across client disconnect - only live subscription state is cleaned up, not the stored `rcvServiceId`. + +### IdsHash - XOR-based drift detection + +`IdsHash` is a 16-byte value computed as XOR of MD5 hashes of individual queue IDs. XOR is self-inverse, so both `addServiceSubs` and `subtractServiceSubs` use the same `<>` (XOR) operator for the hash component. The count field prevents collision - two different queue sets with the same XOR could have different counts. + +--- + +## Service subscription flow + +**Source**: [Server.hs](../../src/Simplex/Messaging/Server.hs), [Client.hs](../../src/Simplex/Messaging/Client.hs), [Agent/Client.hs](../../src/Simplex/Messaging/Agent/Client.hs) + +### SUBS command processing + +1. `subscribeServiceMessages` receives `SUBS count idsHash` from the client. +2. 
`sharedSubscribeService` queries `getServiceQueueCountHash` for the router's actual count and hash, sets `clientServiceSubscribed = True`, and enqueues a `CSService` event to `subQ`. `serverThread` processes this asynchronously: adds the client to `subClients`, adjusts `totalServiceSubs`, and upserts into `serviceSubscribers` (displacing any previous subscriber). +3. Returns `SOKS count' idsHash'` immediately - the client can compare expected vs actual to detect drift. + +### deliverServiceMessages and ALLS + +If this is a new subscription (not duplicate), the router forks `deliverServiceMessages`: + +1. `foldRcvServiceMessages` iterates all queues associated with the service. +2. For each queue with a pending message: `getSubscription` creates a `Sub` in the client's `subscriptions` TMap (if not already present), sets `delivered`, and writes the MSG event to `msgQ` immediately. +3. Queue errors are accumulated in a list whose initial value is `[(NoCorrId, NoEntity, ALLS)]`. Errors are prepended, so ALLS ends up as the last event. +4. After the fold completes, the accumulated events (errors plus ALLS) are written to `msgQ` in one batch. + +MSG events are delivered individually during the fold (not accumulated), while ALLS is deferred to the end - this ensures ALLS arrives only after all pending messages have been sent. + +If the subscription is a duplicate (`hasSub` is `True`), `deliverServiceMessages` is NOT forked - only `SOKS` is returned. + +### On-demand Sub creation for new messages + +When a new message arrives for a service-associated queue via `tryDeliverMessage`, the router looks up the subscriber in `serviceSubscribers` (by ServiceId) rather than `queueSubscribers` (by QueueId). If no `Sub` exists in the client's `subscriptions` TMap (the fold hasn't reached this queue yet, or the queue was associated after SUBS), `newServiceDeliverySub` creates one on the fly. The fold's `getSubscription` performs the same check. 
STM serialization ensures at most one path creates the Sub for a given queue. + +### Service displacement + +When a new service client subscribes to the same ServiceId and the previous subscriber is a different, still-connected client, `cancelServiceSubs` atomically zeros out the old client's `clientServiceSubs` counter and prepares an `ENDS count idsHash` event. `endPreviousSubscriptions` then swaps out the old client's individual subscription map, cancels per-queue Subs, and places ENDS in `pendingEvents` for deferred delivery via `sendPendingEvtsThread`. The old client's fold thread (if still running from `deliverServiceMessages`) continues writing to the old client's `msgQ` until ALLS, then exits. + +--- + +## Service tracking in TSessionSubs + +**Source**: [Agent/TSessionSubs.hs](../../src/Simplex/Messaging/Agent/TSessionSubs.hs), [Agent/Client.hs](../../src/Simplex/Messaging/Agent/Client.hs) + +### Aggregate tracking - service queues are not in activeSubs + +When a queue has both a matching `serviceId` and `serviceAssoc = True`, it is tracked only via the count and hash in `activeServiceSub`, **not** in the `activeSubs` TMap. Callers pre-separate queues into two lists before calling `batchAddActiveSubs`: non-service queues go to `activeSubs`, service-associated queues are counted via `updateActiveService`. A queue on a service-capable session but with `serviceAssoc = False` still lands in `activeSubs` normally. Consequence: `hasActiveSub(rId)` returns `False` for service-associated queues - callers must check the service subscription separately. + +### Session ID gating + +`setActiveServiceSub` only promotes the service subscription from pending to active if the session ID matches the current TLS session. If a reconnection occurred between sending SUBS and receiving SOKS, the stale response is kept as pending rather than promoted. This prevents a response from an old session from corrupting the new session's state. 
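
A minimal sketch of this gating, assuming a simplified state record: `ServiceState`, `setActiveIfCurrent` and the `String`-based `SessionId` and hash fields are hypothetical names for illustration, and the real `TSessionSubs` code operates on TVars inside STM.

```haskell
-- Hypothetical alias; the agent uses the TLS session ID bytes.
type SessionId = String

-- Simplified stand-in for the expected count and IDs hash.
data ServiceSub = ServiceSub
  { subCount :: Int
  , subIdsHash :: String
  }
  deriving (Eq, Show)

-- Hypothetical per-session service state.
data ServiceState = ServiceState
  { currentSession :: SessionId
  , pendingServiceSub :: Maybe ServiceSub
  , activeServiceSub :: Maybe ServiceSub
  }
  deriving (Eq, Show)

-- Promote the service subscription to active only when the response
-- belongs to the current session; otherwise keep it pending.
setActiveIfCurrent :: SessionId -> ServiceSub -> ServiceState -> ServiceState
setActiveIfCurrent respSession sub st
  | respSession == currentSession st =
      st {pendingServiceSub = Nothing, activeServiceSub = Just sub}
  | otherwise = st {pendingServiceSub = Just sub}
```

In this sketch a stale response never touches `activeServiceSub`, so the resubscription logic for the new session still sees the service as pending and sends SUBS again.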
+ +### State transitions + +- **setPendingServiceSub**: stores expected `ServiceSub` before SUBS is sent +- **setActiveServiceSub**: promotes to active after SOKS, with session ID validation +- **updateActiveService**: incrementally builds the active service sub as individual queues return `SOK(Just serviceId)` - used when per-queue SUBs succeed with service association +- **setServiceSubPending_**: demotes active to pending on disconnect (called by `setSubsPending`) +- **deleteServiceSub**: clears both active and pending on ENDS + +### Service events + +| Event | When | +|-------|------| +| `SERVICE_UP srv result` | SUBS succeeded; `ServiceSubResult` carries any drift errors (count/hash/serviceId mismatch) | +| `SERVICE_DOWN srv sub` | Client disconnected while service was subscribed | +| `SERVICE_ALL srv` | ALLS received - all buffered messages delivered | +| `SERVICE_END srv sub` | ENDS received - another service client took over | + +All are entity-less (`AENone`) events. + +--- + +## Reconnection and graceful degradation + +**Source**: [Agent/Client.hs](../../src/Simplex/Messaging/Agent/Client.hs) + +### updateClientService - credential synchronization + +After each SMP connection, `updateClientService` reconciles the agent's stored ServiceId with the router's: + +- **ServiceId matches**: normal path, no action needed +- **ServiceId changed** (router data was reset): calls `removeRcvServiceAssocs` to clear all queue-service associations for this server, forcing re-association via individual SUBs +- **Router lost service support** (version downgrade): calls `deleteClientService` to remove the local service record entirely +- **Router returned ServiceId without credentials**: logs error (should not happen) + +### Resubscription ordering + +On reconnect, the resubscription worker processes the pending service subscription **before** individual queues. 
This ensures the service context is established before queue-level SUB commands that depend on it (the router uses `clntServiceId` from the TLS session for queue-service association). + +### Fallback to individual subscriptions + +`resubscribeClientService` handles two error classes by falling back to `unassocSubscribeQueues`: + +- `SSErrorServiceId` - the router returned a different ServiceId than expected +- `clientServiceError` - matches `NO_SERVICE`, `SERVICE`, and `PROXY(BROKER NO_SERVICE)` errors + +`unassocSubscribeQueues` deletes the `client_services` row, sets `rcv_service_assoc = 0` on all queues, and resubscribes them individually. This is the nuclear recovery path - service state is fully reset, and the next connection will generate fresh credentials. + +### Agent store triggers + +The agent's `client_services` table tracks `service_queue_count` and `service_queue_ids_hash`. SQLite triggers on `rcv_queues` automatically maintain these counters when `rcv_service_assoc` changes. The triggers use `simplex_xor_md5_combine` - the SQLite equivalent of Haskell's `queueIdHash <>`. On credential update (new cert), `service_id` is set to NULL via `ON CONFLICT DO UPDATE`, forcing a fresh handshake. + +--- + +## Notification server usage + +**Source**: [Notifications/Server.hs](../../src/Simplex/Messaging/Notifications/Server.hs), [Notifications/Server/Env.hs](../../src/Simplex/Messaging/Notifications/Server/Env.hs) + +The notification server is the primary consumer of service certificates for the `SRNotifier` role. It manages thousands to millions of SMP queue subscriptions per SMP router. + +### Credential management + +`NtfServerConfig.useServiceCreds` controls whether the NTF server uses service certificates. On first use per SMP router, `mkDbService` generates a self-signed TLS certificate (stored in the `smp_servers` table) and reuses it across connections. 
+ +### Startup subscription + +If a stored service subscription exists, `subscribeSrvSubs` sends `NSUBS` first (one command for all associated queues), then subscribes all queues individually in batches via `subscribeQueuesNtfs` (including service-associated queues, which were previously associated via `NSUB`). + +### Recovery path + +On `CAServiceUnavailable` (irrecoverable service error, e.g., ServiceId mismatch after cert rotation), `removeServiceAndAssociations` performs nuclear recovery: clears all service credentials, resets counters, removes all `ntf_service_assoc` flags, and resubscribes all queues individually. The Postgres schema uses `xor_combine` triggers (equivalent to the agent's SQLite triggers) to maintain per-SMP-server notifier count and hash. + +### NSUBS vs SUBS + +`NSUBS` uses the same `sharedSubscribeService` for registration in `serviceSubscribers` but does **not** fork `deliverServiceMessages`. Notification delivery is handled by the separate `deliverNtfsThread` which uses `serviceSubscribers` to look up the subscribed service client for each notification queue. Consequently, there is no `ALLS` signal for NSUBS subscriptions. From 9b15cdc5259812dcf41e3ecc29afa7b92e5f3ae6 Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Sat, 14 Mar 2026 23:34:36 +0000 Subject: [PATCH 51/61] notifications spec --- spec/topics/client-services.md | 8 +- spec/topics/notifications.md | 286 +++++++++++++++++++++++++++++++++ spec/topics/subscriptions.md | 15 +- 3 files changed, 297 insertions(+), 12 deletions(-) create mode 100644 spec/topics/notifications.md diff --git a/spec/topics/client-services.md b/spec/topics/client-services.md index 998fe51fe..f2e0f287b 100644 --- a/spec/topics/client-services.md +++ b/spec/topics/client-services.md @@ -107,7 +107,7 @@ The `setQueueService` function in QueueStore updates `rcvServiceId` on the queue ### SUBS command processing 1. 
`subscribeServiceMessages` receives `SUBS count idsHash` from the client. -2. `sharedSubscribeService` queries `getServiceQueueCountHash` for the router's actual count and hash, sets `clientServiceSubscribed = True`, and enqueues a `CSService` event to `subQ`. `serverThread` processes this asynchronously: adds the client to `subClients`, adjusts `totalServiceSubs`, and upserts into `serviceSubscribers` (displacing any previous subscriber). +2. `sharedSubscribeService` queries `getServiceQueueCountHash` for the router's actual count and hash. In one STM transaction, sets `clientServiceSubscribed = True` and swaps the client's service subs counter to the server's actual values (computing a delta). In a separate STM transaction, enqueues a `CSService` event (carrying the delta) to `subQ`. `serverThread` processes this asynchronously: adds the client to `subClients`, subtracts the delta from `totalServiceSubs` (preventing double-counting of per-queue accumulated counts), and upserts into `serviceSubscribers` (displacing any previous subscriber). 3. Returns `SOKS count' idsHash'` immediately - the client can compare expected vs actual to detect drift. ### deliverServiceMessages and ALLS @@ -115,7 +115,7 @@ The `setQueueService` function in QueueStore updates `rcvServiceId` on the queue If this is a new subscription (not duplicate), the router forks `deliverServiceMessages`: 1. `foldRcvServiceMessages` iterates all queues associated with the service. -2. For each queue with a pending message: `getSubscription` creates a `Sub` in the client's `subscriptions` TMap (if not already present), sets `delivered`, and writes the MSG event to `msgQ` immediately. +2. For each queue with a pending message: `getSubscription` creates a `Sub` in the client's `subscriptions` TMap if not already present (returning `Nothing` for duplicates). If a new Sub is created, `setDelivered` records the message and the MSG event is written to `msgQ` immediately. 3. 
Queue errors are accumulated in a list whose initial value is `[(NoCorrId, NoEntity, ALLS)]`. Errors are prepended, so ALLS ends up as the last event. 4. After the fold completes, the accumulated events (errors plus ALLS) are written to `msgQ` in one batch. @@ -129,7 +129,7 @@ When a new message arrives for a service-associated queue via `tryDeliverMessage ### Service displacement -When a new service client subscribes to the same ServiceId and the previous subscriber is a different, still-connected client, `cancelServiceSubs` atomically zeros out the old client's `clientServiceSubs` counter and prepares an `ENDS count idsHash` event. `endPreviousSubscriptions` then swaps out the old client's individual subscription map, cancels per-queue Subs, and places ENDS in `pendingEvents` for deferred delivery via `sendPendingEvtsThread`. The old client's fold thread (if still running from `deliverServiceMessages`) continues writing to the old client's `msgQ` until ALLS, then exits. +When a new service client subscribes to the same ServiceId and the previous subscriber is a different, still-connected client, `cancelServiceSubs` atomically zeros out the old client's service subs counter and prepares an `ENDS count idsHash` event. `endPreviousSubscriptions` first inserts ENDS into `pendingEvents` (for deferred delivery via `sendPendingEvtsThread`), then subtracts the changed subs from `totalServiceSubs`, swaps out the old client's individual subscription map to empty, and cancels per-queue Subs. The old client's fold thread (if still running from `deliverServiceMessages`) continues writing to the old client's `msgQ` until ALLS, then exits. 
--- @@ -214,7 +214,7 @@ If a stored service subscription exists, `subscribeSrvSubs` sends `NSUBS` first ### Recovery path -On `CAServiceUnavailable` (irrecoverable service error, e.g., ServiceId mismatch after cert rotation), `removeServiceAndAssociations` performs nuclear recovery: clears all service credentials, resets counters, removes all `ntf_service_assoc` flags, and resubscribes all queues individually. The Postgres schema uses `xor_combine` triggers (equivalent to the agent's SQLite triggers) to maintain per-SMP-server notifier count and hash. +On `CAServiceUnavailable` (irrecoverable service error, e.g., ServiceId mismatch after cert rotation), `removeServiceAndAssociations` performs nuclear DB cleanup: clears all service credentials, resets counters, and removes all `ntf_service_assoc` flags. The caller then resubscribes all queues individually via `subscribeSrvSubs`. The Postgres schema uses `xor_combine` triggers (equivalent to the agent's SQLite triggers) to maintain per-SMP-server notifier count and hash. ### NSUBS vs SUBS diff --git a/spec/topics/notifications.md b/spec/topics/notifications.md new file mode 100644 index 000000000..6f086afd2 --- /dev/null +++ b/spec/topics/notifications.md @@ -0,0 +1,286 @@ +# Notifications + +How push notifications work: encryption architecture, SMP server notification infrastructure, NTF server processing, agent subscription supervisor, and push notification delivery. This is the cross-cutting view spanning SMP server, NTF server, agent, and push provider layers. + +For service certificate lifecycle and NSUBS bulk subscription, see [client-services.md](client-services.md). For the router subscription model, see [subscriptions.md](subscriptions.md). For the worker framework used by NtfSubSupervisor, see [agent/infrastructure.md](../agent/infrastructure.md#worker-framework). 
+ +- [End-to-end flow](#end-to-end-flow) +- [Encryption architecture](#encryption-architecture) +- [SMP server notification infrastructure](#smp-server-notification-infrastructure) +- [NTF server](#ntf-server) +- [Agent NtfSubSupervisor](#agent-ntfsubsupervisor) +- [Push notification processing](#push-notification-processing) + +--- + +## End-to-end flow + +**Source**: [Server.hs](../../src/Simplex/Messaging/Server.hs), [Notifications/Server.hs](../../src/Simplex/Messaging/Notifications/Server.hs), [Agent.hs](../../src/Simplex/Messaging/Agent.hs) + +### Setup (one-time per device) + +1. App calls `registerNtfToken` with device token and `NMInstant` mode. +2. Agent sends `TNEW` to NTF server - NTF server sends verification code via APNs. +3. App receives push notification, extracts code, calls `verifyNtfToken` (sends `TVFY`). +4. Token becomes `NTActive`. Agent calls `initializeNtfSubs` for all active connections. + +### Per-connection subscription setup (dual worker pipeline) + +``` +ntfSubQ (NSCCreate) + -> NtfSubSupervisor: partitions queues by SMP server + -> SMP worker: NKEY authKey dhKey -> SMP server + <- SMP server: NID notifierId srvDhKey + -> Agent stores ClientNtfCreds (notifierId, rcvNtfDhSecret) + -> NTF worker: SNEW tknId (server, notifierId) ntfPrivKey -> NTF server + -> NTF server stores sub, sends NSUB to SMP server + <- SMP server registers NTF server as notification subscriber +``` + +### Message notification delivery + +``` +Sender -> SEND msg (notification=True) -> SMP server + -> enqueueNotification: encrypt NMsgMeta with rcvNtfDhSecret -> NtfStore + -> deliverNtfsThread (periodic): NMSG nonce encMeta -> NTF server + -> ntfSubscriber.receiveSMP: PNMessageData -> addTokenLastNtf -> pushQ + -> ntfPush: encrypt PNMessageData list with tknDhSecret -> APNs -> device + -> App wakes, calls getNotificationConns + -> Agent: decrypt with tknDhSecret, then decrypt encMeta with rcvNtfDhSecret + -> App fetches actual message from SMP server +``` + +--- + 
+## Encryption architecture + +**Source**: [Server.hs](../../src/Simplex/Messaging/Server.hs), [Notifications/Server.hs](../../src/Simplex/Messaging/Notifications/Server.hs), [Agent.hs](../../src/Simplex/Messaging/Agent.hs) + +The notification system uses two independent encryption layers to ensure no single entity (other than the recipient) can correlate queue identity with message metadata. + +### Layer 1: SMP server to recipient (rcvNtfDhSecret) + +When the agent sends `NKEY authKey dhKey` to the SMP server, both sides compute a DH shared secret (`rcvNtfDhSecret`). The SMP server uses this to encrypt `NMsgMeta {msgId, msgTs}` inside each `NMSG`. The NTF server cannot decrypt this - it forwards the encrypted blob opaquely. + +### Layer 2: NTF server to device (tknDhSecret) + +During `TNEW`, the agent and NTF server establish `tknDhSecret` via DH exchange. The NTF server encrypts the entire `PNMessageData` list (containing `smpQueue`, `ntfTs`, `nmsgNonce`, `encNMsgMeta`) with this secret before sending via APNs. + +### What each entity can see + +| Entity | Queue identity | Message metadata | Message content | +|--------|---------------|-----------------|----------------| +| SMP server | Yes (stores queue) | Yes (creates NMsgMeta) | No (E2E encrypted) | +| NTF server | Yes (smpQueue in PNMessageData) | No (encNMsgMeta opaque) | No | +| Push provider (APNs) | No (tknDhSecret encrypted) | No | No | +| Recipient | Yes | Yes (two-layer decrypt) | Yes | + +### Device-side two-layer decryption + +In `getNotificationConns`, the agent decrypts in two steps: +1. Decrypt push payload with `tknDhSecret` (NTF-to-device) to get `PNMessageData` list +2. 
For each entry, decrypt `encNMsgMeta` with `rcvNtfDhSecret` (SMP-to-recipient) to get `NMsgMeta {msgId, msgTs}` + +--- + +## SMP server notification infrastructure + +**Source**: [Server.hs](../../src/Simplex/Messaging/Server.hs), [Server/NtfStore.hs](../../src/Simplex/Messaging/Server/NtfStore.hs) + +### Notifier credentials on queues + +Each queue's `QueueRec` has an optional `notifier :: Maybe NtfCreds` containing: +- `notifierId` - the entity ID the NTF server uses for NSUB +- `notifierKey` - public auth key for verifying NSUB commands +- `rcvNtfDhSecret` - shared secret for encrypting notification metadata +- `ntfServiceId` - optional service association for bulk NSUBS + +`NKEY` creates these credentials (generating the DH shared secret server-side). `NDEL` removes them and deletes pending notifications from NtfStore. + +### Notification generation + +When a sender sends a message with `notification msgFlags == True`, `enqueueNotification` creates a `MsgNtf` containing `NMsgMeta {msgId, msgTs}` encrypted with `rcvNtfDhSecret` and a random nonce. The notification is stored in the in-memory `NtfStore` (a `TMap NotifierId (TVar [MsgNtf])`) - multiple notifications can accumulate per queue. + +### deliverNtfsThread - periodic batch delivery + +Runs every `ntfDeliveryInterval` microseconds. Each cycle: + +1. Reads all pending notifications from `NtfStore`. +2. Calls `getQueueNtfServices` to partition notifications by service association. +3. For service-associated queues: delivers NMSG to the subscribed service client via `serviceSubscribers`. +4. For non-service queues: iterates through `subClients` and delivers to individually-subscribed clients. +5. Each NMSG contains `(ntfNonce, encNMsgMeta)` - the encrypted notification metadata. +6. All pending notifications for a given client are delivered in one cycle (no per-cycle cap). Transmissions are batched into TLS frames by the transport layer. +7. 
Notifications for deleted queues (discovered during partitioning) are cleaned up from `NtfStore`. + +This is periodic, not event-driven - there is a deliberate latency trade-off to reduce overhead. Notifications are not pushed immediately when a message arrives. + +--- + +## NTF server + +**Source**: [Notifications/Server.hs](../../src/Simplex/Messaging/Notifications/Server.hs), [Notifications/Server/Env.hs](../../src/Simplex/Messaging/Notifications/Server/Env.hs) + +### Architecture + +Three main concurrent threads: + +- **ntfSubscriber**: receives NMSG events from SMP servers and SMP client agent state changes +- **ntfPush**: sends push notifications (APNs/Firebase) from a bounded queue +- **periodicNtfsThread**: sends periodic "check messages" background notifications based on per-token cron intervals + +### Token lifecycle + +``` +NTRegistered (after TNEW, verification push sent) + -> NTConfirmed (APNs accepts verification push delivery) + -> NTActive (after TVFY with correct code) + +Any state -> NTInvalid (push provider reports token invalid during any push) +Any state -> NTExpired (provider reports token expired) +``` + +`NTNew` exists only on the agent side (pre-registration); the NTF server creates tokens directly in `NTRegistered`. `NTInvalid` can be reached from any state where a push delivery is attempted (including `NTRegistered` during verification), not only from `NTActive`. + +`allowTokenVerification` permits TVFY from `NTRegistered`, `NTConfirmed`, and `NTActive` states. `TRPL` replaces the device token (e.g., after OS token refresh) while keeping all subscriptions - it resets status to `NTRegistered` and re-sends verification. + +### Subscription handling + +`SNEW tknId (SMPQueueNtf smpServer notifierId) ntfPrivateKey` creates a subscription record and delegates to the SMP subscriber infrastructure: + +1. `subscribeNtfs` gets or creates a per-SMP-server `SMPSubscriber` thread. +2. 
The subscriber thread reads from its queue and calls `subscribeQueuesNtfs`, which sends `NSUB` to the SMP server using the `ntfPrivateKey` provided by the agent. +3. `SCHK` returns the current subscription status; the agent uses this for periodic health checks. + +### ntfSubscriber - receiving from SMP + +Runs two concurrent sub-threads: + +**receiveSMP**: reads from the SMP client agent's `msgQ`: +- `NMSG nmsgNonce encNMsgMeta`: Creates `PNMessageData`, calls `addTokenLastNtf` to look up the owning token and aggregate with other recent notifications, then enqueues `PNMessage` to `pushQ`. +- `END`: Updates subscription status to `NSEnd`. +- `DELD`: Updates subscription status to `NSDeleted`. + +**receiveAgent**: reads from `agentQ` for client state changes: +- `CAConnected`: Logs reconnection (no status update). +- `CADisconnected`: Updates affected subscriptions to `NSInactive`. +- `CASubscribed`: Marks subscriptions as `NSActive`. +- `CASubError`: Updates individual subscription errors. +- `CAServiceDisconnected` / `CAServiceSubError`: Logs only. +- `CAServiceSubscribed`: Logs, warns on count/hash mismatches. +- `CAServiceUnavailable`: Calls `removeServiceAndAssociations` - nuclear recovery (see [client-services.md](client-services.md#notification-server-usage)). + +### Token-level notification batching + +`addTokenLastNtf` is critical for push efficiency. The `last_notifications` table is keyed by `(token_id, subscription_id)` and UPSERT'd - each subscription contributes only its most recent notification. When a push is sent, multiple `PNMessageData` entries for the same token are combined into a single APNs payload. This means one push notification can carry metadata for messages across multiple queues. 
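
The upsert behaviour described under "Token-level notification batching" can be pictured with a small in-memory model. This is only a sketch under stated assumptions: the real server keeps `last_notifications` in a database table, and the types and the names `addLastNtf` and `tokenPayload` below are hypothetical stand-ins, not the server's API.

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical simplified identifiers (assumptions for illustration only).
type TokenId = Int
type SubscriptionId = Int

-- Simplified stand-in for PNMessageData; only two placeholder fields are kept.
data PNMessageData = PNMessageData
  { smpQueue :: String, -- placeholder for the (SMP server, notifierId) pair
    ntfTs :: Int -- placeholder timestamp
  }
  deriving (Eq, Show)

-- In-memory model of a table keyed by (token_id, subscription_id).
type LastNotifications = M.Map (TokenId, SubscriptionId) PNMessageData

-- Upsert: a newer notification for the same (token, subscription) key
-- replaces the older one, so each subscription contributes one entry.
addLastNtf :: TokenId -> SubscriptionId -> PNMessageData -> LastNotifications -> LastNotifications
addLastNtf tkn sub = M.insert (tkn, sub)

-- All entries for one token: what would be combined into a single push payload.
tokenPayload :: TokenId -> LastNotifications -> [PNMessageData]
tokenPayload tkn m = [ntf | ((t, _), ntf) <- M.toList m, t == tkn]
```

In this model, adding two notifications for the same subscription and one for another subscription of the same token leaves two entries for that token, which mirrors how one push can carry metadata for several queues.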
+ +### Push notification types + +| Type | Content | Trigger | +|------|---------|---------| +| `PNVerification` | Encrypted registration code | TNEW / TRPL | +| `PNMessage` | Encrypted `PNMessageData` list | NMSG from SMP server | +| `PNCheckMessages` | `{"checkMessages": true}` | periodicNtfsThread (cron) | + +`PNMessage` is sent as a mutable-content alert ("Encrypted message or another app event"). `PNVerification` and `PNCheckMessages` are silent background notifications. + +--- + +## Agent NtfSubSupervisor + +**Source**: [Agent/NtfSubSupervisor.hs](../../src/Simplex/Messaging/Agent/NtfSubSupervisor.hs), [Agent/Env/SQLite.hs](../../src/Simplex/Messaging/Agent/Env/SQLite.hs) + +### Supervisor structure + +``` +NtfSupervisor + ntfTkn :: TVar (Maybe NtfToken) -- current active token + ntfSubQ :: TBQueue (NtfSupervisorCommand, NonEmpty ConnId) + ntfWorkers :: TMap NtfServer Worker -- per-NTF-server + ntfSMPWorkers :: TMap SMPServer Worker -- per-SMP-server + ntfTknDelWorkers :: TMap NtfServer Worker -- token deletion +``` + +The main loop (`runNtfSupervisor`) reads commands from `ntfSubQ` and dispatches to `processNtfCmd`. Commands are only enqueued when `hasInstantNotifications` is true (active token in `NMInstant` mode). + +### Dual worker pipeline + +SMP workers and NTF workers form a two-stage pipeline, communicating through the DB-persisted `NtfSubAction`: + +**Stage 1 - SMP workers** (`runNtfSMPWorker`): +- `NSASmpKey`: Generates auth+DH key pairs, sends `NKEY` to SMP server, stores `ClientNtfCreds`, then sets action to `NSANtf NSACreate` and kicks NTF workers. +- `NSASmpDelete`: Resets notifier credentials, sends `NDEL` to SMP server, deletes the subscription. + +**Stage 2 - NTF workers** (`runNtfWorker`): +- `NSACreate`: Sends `SNEW` to NTF server, stores `ntfSubId`, schedules first check. +- `NSACheck`: Sends `SCHK` to NTF server. AUTH errors from the check are handled separately - those subscriptions are immediately recreated via `recreateNtfSub`. 
For successful checks, if the subscription is in a subscribe-able status (`NSNew`, `NSPending`, `NSActive`, `NSInactive`), reschedules next check. Any other status (ended, deleted, service error, etc.) also triggers recreation from scratch (resets to `NSASmpKey`). + +### Cross-protocol link + +The SMP workers (`enableQueuesNtfs` / `disableQueuesNtfs` in `Agent/Client.hs`) use the agent's normal SMP client pool to send `NKEY`/`NDEL` to SMP servers. This is the cross-protocol dependency visible in the agent architecture - notification subscription setup requires SMP protocol operations. + +### Subscription state machine + +``` +(new connection, notifications enabled) + -> NSASMP NSASmpKey -- SMP worker: send NKEY to SMP server + -> NSANtf NSACreate -- NTF worker: send SNEW to NTF server + -> NSANtf NSACheck -- NTF worker: periodic SCHK + -> (steady state) + +(notifications disabled or connection deleted) + -> NSASMP NSASmpDelete -- SMP worker: send NDEL to SMP server + -> (subscription deleted) + +(check fails: subscription ended/deleted/auth) + -> NSASMP NSASmpKey -- restart from scratch +``` + +Each action is persisted in the store before execution, so the pipeline resumes after agent restart. Workers use `withRetryInterval` for temporary errors. + +### NotificationsMode + +- **NMInstant**: NTF server maintains active NSUB subscriptions and pushes immediately when messages arrive. Requires the full dual-worker pipeline. +- **NMPeriodic**: No NSUB subscriptions. NTF server sends periodic `PNCheckMessages` background notifications based on `tknCronInterval` (set via `TCRN`). Device wakes and fetches messages on its own schedule. + +Switching from NMInstant to NMPeriodic triggers `deleteNtfSubs` which flushes the `ntfSubQ` and sends `NSCSmpDelete` commands through the async worker pipeline to remove all notification subscriptions. 
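
The `NSACheck` handling and the "check fails" branch of the subscription state machine above can be summarised as a pure decision function. This is a hedged sketch, not the agent's code: `CheckResult`, `NextStep`, `nextStep` and the catch-all `NSOther` constructor are hypothetical names introduced here, and only the four subscribe-able statuses are taken from the text.

```haskell
-- Statuses named in the text, plus a hypothetical catch-all for the rest
-- (service error and similar), which the text groups as "any other status".
data NtfSubStatus = NSNew | NSPending | NSActive | NSInactive | NSEnd | NSDeleted | NSOther
  deriving (Eq, Show)

-- Hypothetical model of an SCHK outcome: a status, or an AUTH error.
data CheckResult = CheckOk NtfSubStatus | CheckAuthError
  deriving (Eq, Show)

-- What the NTF worker does next in this simplified model.
data NextStep
  = RescheduleCheck -- stay in NSANtf NSACheck
  | RecreateSub -- reset to NSASMP NSASmpKey and start again
  deriving (Eq, Show)

nextStep :: CheckResult -> NextStep
nextStep CheckAuthError = RecreateSub
nextStep (CheckOk st)
  | st `elem` [NSNew, NSPending, NSActive, NSInactive] = RescheduleCheck
  | otherwise = RecreateSub
```

Keeping the decision pure makes the rule easy to state: only the four subscribe-able statuses keep the subscription in the periodic check loop, and everything else restarts the pipeline from the SMP stage.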
+ +--- + +## Push notification processing + +**Source**: [Agent.hs](../../src/Simplex/Messaging/Agent.hs), [Notifications/Server.hs](../../src/Simplex/Messaging/Notifications/Server.hs) + +### getNotificationConns - device wake path + +When the device wakes from a push notification, the app calls `getNotificationConns`: + +1. Retrieves the active token's `ntfDhSecret`. +2. Decrypts the push payload using `ntfDhSecret` and the nonce from the APNs notification. +3. Parses the result as `NonEmpty PNMessageData` (semicolon-separated list). +4. For each entry: + - Looks up the `RcvQueue` by `smpQueue` (`SMPServer` + `notifierId`) via `getNtfRcvQueue`. + - Decrypts `encNMsgMeta` using the queue's `rcvNtfDhSecret` and `nmsgNonce` to get `NMsgMeta {msgId, msgTs}`. +5. Filters "init" notifications (all but the last) by comparing `msgTs` against `lastBrokerTs` - notifications with timestamps not newer than the last seen broker timestamp are discarded. If `lastBrokerTs` is not set, the notification passes through. +6. Returns `NonEmpty NotificationInfo` for the app to fetch actual messages. + +### Token registration state machine + +`registerNtfToken` handles multiple states based on `(ntfTokenId, ntfTknAction)`: + +- `(Nothing, Just NTARegister)`: Re-register (first attempt failed after key generation). +- `(Just tknId, Nothing)`: Same device token - re-register; different token - replace via `TRPL`. +- `(Just tknId, Just NTAVerify code)`: Same device token - verify; different token - replace via `TRPL`. +- `(Just tknId, Just NTACheck)`: Same device token - check status, then initialize or delete subscriptions based on mode; different token - replace via `TRPL`. + +All `(Just tknId, ...)` branches check whether the device token changed and fall through to `replaceToken` on mismatch. 
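
The dispatch on `(ntfTokenId, ntfTknAction)` listed above can be written as a pattern match. The sketch below is illustrative only: `Step`, `registerStep` and the `sameToken` flag are hypothetical, the token ID and verification code are reduced to `String`, and combinations not listed in the text return `Nothing` rather than guessing at behaviour.

```haskell
-- Token actions named in the text; the verification code is simplified to String.
data NtfTknAction = NTARegister | NTAVerify String | NTACheck
  deriving (Eq, Show)

-- Hypothetical summary of what registerNtfToken would do next.
data Step = ReRegister | Verify String | CheckStatus | ReplaceToken
  deriving (Eq, Show)

-- sameToken: whether the device token passed by the app equals the stored one.
registerStep :: Bool -> (Maybe String, Maybe NtfTknAction) -> Maybe Step
-- No token ID yet: the first attempt failed after key generation.
registerStep _ (Nothing, Just NTARegister) = Just ReRegister
-- Every branch with a token ID first checks for a changed device token.
registerStep sameToken (Just _, action)
  | not sameToken = Just ReplaceToken -- replace via TRPL
  | otherwise = case action of
      Nothing -> Just ReRegister
      Just (NTAVerify code) -> Just (Verify code)
      Just NTACheck -> Just CheckStatus
      Just NTARegister -> Nothing -- combination not described in the text
registerStep _ _ = Nothing -- combination not described in the text
```

The guard on `sameToken` sits above the per-action `case`, which reflects the statement that all `(Just tknId, ...)` branches fall through to token replacement on mismatch.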
+ +### ntfSubQ writers + +The `ntfSubQ` is written by multiple paths in `Agent.hs`, all via `sendNtfSubCommand`: +- `sendNtfCreate` - during `subscribeConnections_` and `subscribeAllConnections'` (writes both `NSCCreate` and `NSCSmpDelete` depending on per-connection `enableNtfs`) +- `toggleConnectionNtfs'` - when the app enables/disables notifications for a connection +- `initializeNtfSubs` / `deleteNtfSubs` - during token activation and mode switching +- `newQueueNtfSubscription` - when joining a new connection +- `unsubNtfConnIds` - writes `NSCDeleteSub` during connection deletion +- `ICQDelete` async command handler - during queue rotation diff --git a/spec/topics/subscriptions.md b/spec/topics/subscriptions.md index b7bc9fa1c..54e1573ee 100644 --- a/spec/topics/subscriptions.md +++ b/spec/topics/subscriptions.md @@ -24,7 +24,7 @@ The router tracks which client connection is subscribed to each queue. At most o 1. **STM re-evaluation**: any transaction reading the TVar automatically re-evaluates when the subscriber changes (disconnects, gets displaced). This is used by `tryDeliverMessage` - if the subscriber disconnects mid-delivery, the STM transaction retries and sees `Nothing`. -2. **Reconnection continuity**: when a mobile client disconnects and reconnects, the TVar is reused rather than recreated. Subscriptions that were made at any point are never removed from the map - this is a deliberate trade-off for intermittently connected mobile clients. +2. **Reconnection continuity**: when a mobile client disconnects and reconnects, the TVar can be reused rather than recreated if a new subscription is established before cleanup. On disconnect, `deleteSubcribedClient` removes entries from the map (with a `sameClient` guard to avoid removing a newer subscriber). The `SubscribedClients` constructor is not exported from `Server/Env/STM.hs` (only the type is). All access goes through `getSubscribedClient` (IO, outside STM) and `upsertSubscribedClient` (STM). 
This prevents accidental use of `TM.lookup` inside STM transactions, which would add the entire TMap to the transaction's read set. @@ -82,12 +82,12 @@ The router delivers at most one unacknowledged message per subscription. The `de **Phase 1 - outside STM**: `getSubscribedClient` reads the `SubscribedClients` TMap via `readTVarIO` (IO, not STM). If no subscriber exists, the function returns immediately without entering any STM transaction. This avoids transaction overhead for queues with no active subscriber. -**Phase 2 - STM transaction** (`deliverToSub`): reads the client TVar (inside STM, so the transaction re-evaluates if the subscriber changes), checks `subThread == NoSub` and `delivered == Nothing`. Then: +**Phase 2 - STM transaction** (`deliverToSub`): reads the client TVar (inside STM, so the transaction re-evaluates if the subscriber changes), checks that `subThread` is `ServerSub` (not `ProhibitSub`), reads the inner `SubscriptionThread` TVar for `NoSub`, and checks `delivered == Nothing`. Then: - If the client's `sndQ` is **not full**: delivers the message directly in the same STM transaction (`writeTBQueue sndQ`), sets `delivered`. No thread is needed. This is the fast path. - If the client's `sndQ` is **full**: sets `subThread = SubPending` and returns the client + sub for phase 3. -**Phase 3 - forked thread** (`forkDeliver`): a `deliverThread` is spawned that blocks until the `sndQ` has room. Before delivering, it re-checks that the subscriber is still the same client and `delivered` is still `Nothing` - handling the race where the client disconnected and a new one subscribed between phases 2 and 3. +**Phase 3 - forked thread** (`forkDeliver`): a `deliverThread` is spawned that blocks until the `sndQ` has room. Before delivering, it re-checks that the subscriber is still the same client and `delivered` is still `Nothing` - handling the race where the client disconnected and a new one subscribed between phases 2 and 3. 
Note: for service-subscribed queues, phase 1 dispatches to `serviceSubscribers` (by ServiceId), but `deliverThread` in phase 3 always uses `queueSubscribers` (by QueueId) - if the queue is only service-subscribed, the phase 3 lookup silently no-ops. ### Per-queue encryption @@ -137,9 +137,9 @@ Router MSG → TLS → protocol client rcvQ ``` The protocol client's `processMsg` thread classifies each incoming transmission: -- **Non-empty corrId**: response to a pending command - delivered to the waiting `getResponse` caller via `responseVar`. +- **Non-empty corrId, matching pending command**: response to a pending command - delivered to the waiting `getResponse` caller via `responseVar`. - **Empty corrId**: server-initiated push (MSG, END, DELD, ENDS) - wrapped as `STEvent` and forwarded to `msgQ`. -- **Expired/unexpected responses**: also forwarded to `msgQ` as `STResponse`. +- **Non-empty corrId, no matching command**: forwarded to `msgQ` as `STUnexpectedError`. Expired responses (command was pending but timed out) are forwarded as `STResponse` only if the entity ID matches. The agent's `subscriber` thread reads from `msgQ` and processes all events under `agentOperationBracket AORcvNetwork`. @@ -173,8 +173,7 @@ After disconnect, the queue's messages remain stored. The next client to SUB the When the protocol client detects a TLS disconnect, `smpClientDisconnected` fires in the agent: -1. `removeSessVar` with CAS check (monotonic `sessionVarId` prevents stale callbacks from removing newer clients). -2. `setSubsPending` demotes all active subscriptions for the matching session to pending in `currentSubs`. +1-2. Atomically (single STM transaction via `removeClientAndSubs`): `removeSessVar` with CAS check (monotonic `sessionVarId` prevents stale callbacks from removing newer clients), then `setSubsPending` demotes all active subscriptions for the matching session to pending in `currentSubs`. 3. `DOWN srv connIds` is sent to the application for affected connections. 4. 
Resubscription begins - the mechanism depends on transport session mode: - **Entity-session mode**: `resubscribeSMPSession` spawns a persistent worker thread. @@ -213,7 +212,7 @@ Service subscriptions are a bulk mechanism where one `SUBS n idsHash` command su ### SUBS flow on the router -1. `sharedSubscribeService` checks the actual queue count and IDs hash against the stored service state, and enqueues a `CSService` event to `subQ` for `serverThread` to process (registration in `serviceSubscribers` happens asynchronously). +1. `sharedSubscribeService` queries the actual queue count and IDs hash from the store, computes drift statistics (for monitoring, not enforcement), and enqueues a `CSService` event to `subQ` for `serverThread` to process (registration in `serviceSubscribers` happens asynchronously). 2. If this is a new service subscription (not previously subscribed): `deliverServiceMessages` iterates all service-associated queues via `foldRcvServiceMessages`, creates per-queue `Sub` entries, and delivers pending messages. 3. After iteration completes, `ALLS` is sent to signal the client that all pending messages have been delivered. From 259e950282b15214a4e2dbf12a9795cab8b9c7b1 Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Sun, 15 Mar 2026 08:52:52 +0000 Subject: [PATCH 52/61] transport --- spec/topics/transport.md | 266 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 266 insertions(+) create mode 100644 spec/topics/transport.md diff --git a/spec/topics/transport.md b/spec/topics/transport.md new file mode 100644 index 000000000..2c3de1cf1 --- /dev/null +++ b/spec/topics/transport.md @@ -0,0 +1,266 @@ +# Transport + +How data moves over the wire: TLS infrastructure, protocol handshake family, block framing, transmission encoding, version negotiation, and connection management. 
This is the cross-cutting view spanning TLS setup, protocol-specific handshakes, block-level framing, and the protocol client's thread architecture. + +For service certificate handshake extensions, see [client-services.md](client-services.md). For the protocol client's role in subscription flow, see [subscriptions.md](subscriptions.md). For the SMP protocol specification, see [simplex-messaging.md](../../protocol/simplex-messaging.md). + +- [TLS infrastructure](#tls-infrastructure) +- [Handshake protocol family](#handshake-protocol-family) +- [Block framing](#block-framing) +- [Transmission encoding and signing](#transmission-encoding-and-signing) +- [Version negotiation](#version-negotiation) +- [Connection management](#connection-management) + +--- + +## TLS infrastructure + +**Source**: [Transport.hs](../../src/Simplex/Messaging/Transport.hs), [Transport/Credentials.hs](../../src/Simplex/Messaging/Transport/Credentials.hs), [Transport/Client.hs](../../src/Simplex/Messaging/Transport/Client.hs) + +### Certificate generation + +`genCredentials` generates Ed25519 key pairs and self-signed (or parent-signed) X.509 v3 certificates. Validity periods use an `Hours` offset type. The certificate serial number is always 1; nanoseconds are stripped from timestamps during encoding. + +For CA + leaf chains (used by routers), a root CA certificate signs a leaf certificate. The leaf's private key is used for per-session signing. For self-signed certificates (used by service clients), a single certificate serves both purposes. 
+ +### TLS parameters + +`defaultSupportedParams` configures a minimal, high-security cipher suite: + +| Parameter | Value | +|-----------|-------| +| TLS versions | TLS 1.3, TLS 1.2 | +| TLS 1.3 cipher | CHACHA20-POLY1305-SHA256 | +| TLS 1.2 cipher | ECDHE-ECDSA-CHACHA20-POLY1305-SHA256 | +| Hash-signature pairs | Ed448, Ed25519 (both HashIntrinsic) | +| DH groups | X448, X25519 | +| Secure renegotiation | Disabled | + +`defaultSupportedParamsHTTPS` extends this with browser-compatible ciphers (RSA, ECDSA with SHA256/384/512, FFDHE groups, P521) for XFTP web clients. + +### Session identity + +Both sides derive `sessionId` from the TLS-unique channel binding value (RFC 5929). The server reads `T.getPeerFinished`; the client reads `T.getFinished`. This `sessionId` is used throughout the session - in handshake validation, transmission signing, and block encryption key derivation. + +--- + +## Handshake protocol family + +**Source**: [Transport.hs](../../src/Simplex/Messaging/Transport.hs), [Notifications/Transport.hs](../../src/Simplex/Messaging/Notifications/Transport.hs) + +All three protocols (SMP, NTF, XFTP) use the same TLS transport, but their application-level handshakes differ in complexity. SMP and NTF use a block-based handshake over TLS; XFTP uses HTTP/2 POST with a custom handshake. 
+ +### SMP handshake - two messages plus optional third + +**Message 1 (router to client)**: `SMPServerHandshake` contains: +- `smpVersionRange` - negotiable version range (uses ALPN to select current vs legacy range) +- `sessionId` - TLS-unique channel binding +- `authPubKey` - `CertChainPubKey`: certificate chain plus X25519 public key signed with the certificate's signing key (v7+) + +**Message 2 (client to router)**: `SMPClientHandshake` contains: +- `smpVersion` - agreed maximum version from intersection +- `keyHash` - SHA256 of router's root CA certificate (identity verification) +- `authPubKey` - client's X25519 public key for DH agreement (v7+) +- `proxyServer` - boolean flag to disable transport block encryption (v14+) +- `clientService` - service credentials with `serviceRole` and `serviceCertKey` (v16+) + +**Message 3 (router to client, conditional)**: Sent only when `clientService` is present. The router verifies the TLS peer certificate matches the handshake certificate chain, extracts the fingerprint, creates or retrieves a `ServiceId`, and returns `SMPServerHandshakeResponse {serviceId}` (or `SMPServerHandshakeError` on failure). + +### NTF handshake - simplified two messages + +The NTF handshake follows the same server-first pattern but is simpler: + +| Difference | SMP | NTF | +|-----------|-----|-----| +| Block size | 16384 bytes | 512 bytes | +| Client auth key | X25519 DH public key | None | +| Service certificates | v16+ | Not supported | +| Block encryption | v11+ | Not supported | +| Batching | v4+ | v2+ | +| Version range | v6 - v19 | v1 - v3 | + +`NtfServerHandshake` sends version range, sessionId, and signed X25519 key (present at v2+, absent at v1). `NtfClientHandshake` returns only version and keyHash. No client public key exchange, no service certificates, no block encryption. + +### XFTP handshake - HTTP/2 based + +XFTP does not use the block-based TLS handshake at all. It uses HTTP/2 POST with ALPN `"xftp/1"`. 
The client sends `XFTPClientHello` (optional 32-byte web challenge for identity proof); the server responds with `XFTPServerHandshake` containing a signed challenge response and `CertChainPubKey`. Block size is 16384 bytes (same as SMP). + +### Block encryption setup (SMP only, v11+) + +After the handshake DH agreement, both sides compute a shared `DhSecretX25519`. `blockEncryption` derives chain keys via `sbcInit`: + +``` +sbcInit sessionId dhSecret + -> HKDF-SHA512(salt=sessionId, ikm=dhSecret, info="SimpleXSbChainInit", len=64) + -> split into (sndChainKey, rcvChainKey) +``` + +Each block encryption advances the chain key: +``` +sbcHkdf chainKey + -> HKDF-SHA512(salt="", ikm=chainKey, info="SimpleXSbChain", len=88) + -> split into (newChainKey[32], aesKey[32], nonce[24]) +``` + +This provides per-block forward secrecy - each block uses a different key, and old keys cannot be derived from new ones. The client swaps send/receive keys (its send key = server's receive key). + +Block encryption is disabled when `proxyServer == True` (proxy connections already have their own encryption layer) and when the version is below v11. + +--- + +## Block framing + +**Source**: [Transport.hs](../../src/Simplex/Messaging/Transport.hs), [Protocol.hs](../../src/Simplex/Messaging/Protocol.hs) + +### Block sizes + +| Protocol | Block size | Effective payload | +|----------|-----------|-------------------| +| SMP | 16384 bytes | 16363 (single in batch) or 16382 (unbatched) | +| NTF | 512 bytes | 491 (single in batch) or 510 (unbatched) | +| XFTP | 16384 bytes | Same as SMP | + +Batch overhead: 2 (pad) + 1 (count byte) + 16 (auth tag) + 2 (`Large` Word16 prefix per transmission) = 21 bytes for a single-item batch. + +### Reading and writing blocks + +`tPutBlock` pads the message to exactly `blockSize` bytes: +- **Without block encryption**: `C.pad` writes a 2-byte big-endian length prefix, then the message, then `'#'` characters to fill the block. 
+- **With block encryption** (v11+): `sbEncrypt` with the chain-derived key and nonce. The available payload is reduced by 16 bytes (Poly1305 auth tag). + +`tGetBlock` reads exactly `blockSize` bytes and reverses the process. If the received data is not exactly `blockSize` bytes, an EOF error is raised. + +### Batch format + +When `batch` is enabled (SMP v4+, NTF v2+), multiple transmissions are packed into a single block: + +1. One byte: transmission count (1-255) +2. Each transmission wrapped in `Large` encoding (fixed 2-byte Word16 length prefix + content) +3. Total size of all `Large`-encoded transmissions must fit in `blockSize - 19` bytes (2 pad + 1 count + 16 auth tag) + +`batchTransmissions_` packs transmissions left-to-right into batches. When the next transmission would exceed the remaining space (or the count reaches 255), a new batch starts. Transmissions that individually exceed the batch limit produce a `TBError TELargeMsg`. + +`tPut` encodes a list of transmissions into batches via `batchTransmissions`, then writes each batch as a separate block via `tPutBlock`. Results are collected per-transmission, not per-block. + +--- + +## Transmission encoding and signing + +**Source**: [Protocol.hs](../../src/Simplex/Messaging/Protocol.hs), [Client.hs](../../src/Simplex/Messaging/Client.hs) + +### Wire format + +`encodeTransmission_` produces the core transmission bytes: + +``` +corrId || entityId || encodedCommand +``` + +- `corrId`: variable-length correlation ID (empty for server-initiated pushes) +- `entityId`: queue/entity identifier +- `encodedCommand`: protocol-specific command encoding + +### Session ID handling (`implySessId`) + +For v7+ (`authCmdsSMPVersion`), `implySessId` is `True`. 
This affects how `sessionId` is used: + +- **`tForAuth`** (what gets signed): always includes `sessionId` prefix +- **`tToSend`** (what goes on the wire): excludes `sessionId` when `implySessId == True` + +This saves bandwidth - the session ID is implicit (both sides know it from the TLS handshake) but still covered by the signature, preventing session fixation attacks. + +`tForAuth` is lazy (uses `~ByteString`) to avoid computing the signed representation when no signing key is present. + +### Dual signature scheme + +`authTransmission` produces `TAuthorizations` - a tuple of entity auth plus optional service signature: + +**Entity auth** (always present when key provided): +- X25519 keys: `C.cbAuthenticate` using the server's public key, the per-queue private key, correlation nonce, and the signing content (see below) +- Ed25519/Ed448 keys: standard signature over the signing content + +**Service auth** (v16+, when `serviceAuth == True` and `clientService` exists): +- The signing content becomes `serviceCertHash || tForAuth` (instead of plain `tForAuth`) - binding the service identity to the queue operation, preventing MITM service substitution within TLS +- Service session key additionally signs over `tForAuth` alone + +Without active service auth, the signing content is `tForAuth` directly. + +The dual signature ensures that even within a TLS session, an attacker cannot substitute a different service certificate without invalidating the entity key signature. + +--- + +## Version negotiation + +**Source**: [Version.hs](../../src/Simplex/Messaging/Version.hs), [Transport.hs](../../src/Simplex/Messaging/Transport.hs) + +### Range intersection + +`VersionRange` is an inclusive `(min, max)` pair with nominal typing per protocol (SMP, NTF, XFTP use distinct phantom types via `VersionScope`). + +`compatibleVRange` computes the intersection of two ranges: `max(min1, min2)` to `min(max1, max2)`. 
Returns `Nothing` if the intersection is empty (no compatible version exists). The agreed version is the maximum of the intersection range. + +`compatibleVRange'` caps a range by a single version (used when the peer advertises a specific maximum rather than a range). + +### Version-gated features + +Feature availability is controlled by version constants. Key SMP version gates: + +| Version | Feature | +|---------|---------| +| v4 | Command batching | +| v7 | Authenticated encryption, implied session ID | +| v9 | SKEY for faster sender handshake | +| v11 | Block encryption with forward secrecy | +| v14 | `proxyServer` handshake property | +| v16 | Service certificates | +| v19 | Service subscriptions (SUBS/NSUBS) | + +### Anti-fingerprinting version cap + +`proxiedSMPRelayVersion` (v18) is the maximum version an SMP proxy advertises to destination routers. The proxy's actual version may be higher (currently v19), but by capping the proxied connection, clients behind the proxy cannot be fingerprinted by the destination router based on their SMP version. All proxied clients appear as v18 or below. + +### Proxy version downgrade logic + +When `smpClientHandshake` detects it is acting as a proxy (`proxyServer == True`) and the destination router's maximum version is below v14 (`proxyServerHandshakeSMPVersion`), it caps the negotiated range at v10 (`deletedEventSMPVersion`). This disables transport block encryption between proxy and relay - transport encryption at v11 would increase message size, breaking clients at v10 or earlier. 
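
The range intersection and the proxy downgrade rule in this section can be sketched with plain integers. This is a simplified model under stated assumptions: the real `VersionRange` is typed per protocol, and `VRange`, `intersectRanges`, `agreedVersion`, `capByVersion` and `proxyRange` are hypothetical names, not the library's functions. The constants 14 and 10 are the version numbers quoted in the text.

```haskell
type Version = Int

-- Inclusive (min, max) range; a hypothetical stand-in for VersionRange.
data VRange = VRange Version Version
  deriving (Eq, Show)

-- Intersection: max of the minimums to min of the maximums, if non-empty.
intersectRanges :: VRange -> VRange -> Maybe VRange
intersectRanges (VRange min1 max1) (VRange min2 max2)
  | lo <= hi = Just (VRange lo hi)
  | otherwise = Nothing
  where
    lo = max min1 min2
    hi = min max1 max2

-- The agreed version is the maximum of the intersection.
agreedVersion :: VRange -> VRange -> Maybe Version
agreedVersion r1 r2 = (\(VRange _ hi) -> hi) <$> intersectRanges r1 r2

-- Cap a range by a single maximum version.
capByVersion :: VRange -> Version -> Maybe VRange
capByVersion (VRange lo hi) v
  | lo <= v = Just (VRange lo (min hi v))
  | otherwise = Nothing

-- Proxy downgrade: if the relay's maximum is below v14, cap the range at v10.
proxyRange :: VRange -> Version -> Maybe VRange
proxyRange negotiated relayMax
  | relayMax < 14 = capByVersion negotiated 10
  | otherwise = Just negotiated
```

For example, in this model a client range of v6 to v19 and a peer range of v4 to v18 intersect to v6 to v18, so the agreed version is v18.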
+ +--- + +## Connection management + +**Source**: [Client.hs](../../src/Simplex/Messaging/Client.hs), [Transport/KeepAlive.hs](../../src/Simplex/Messaging/Transport/KeepAlive.hs) + +### Four concurrent threads + +Each protocol client connection runs four concurrent threads via `raceAny_` - if any thread exits, all are cancelled and the disconnect handler fires: + +**send**: reads `(Maybe Request, ByteString)` tuples from `sndQ` (bounded `TBQueue`). For requests with a `responseVar`, checks the `pending` flag before sending (a cancelled request is silently skipped). Transport errors on write are delivered to the waiting `responseVar`. + +**receive**: calls `tGetClient` in a loop to read and parse blocks. Updates `lastReceived` timestamp and resets `timeoutErrorCount` to 0 on each successful read. + +**process**: reads parsed transmissions from `rcvQ` and classifies each by correlation ID: +- Empty corrId: server-initiated push - forwarded to `msgQ` as `STEvent` (any response with empty corrId is classified this way; typical types are MSG, END, DELD, ENDS) +- Matching pending command: response - delivered to the command's `responseVar` +- No matching command: forwarded to `msgQ` as `STUnexpectedError` + +**monitor** (optional, disabled when `smpPingInterval == 0`): sends application-level PING when the connection is idle for `smpPingInterval` (default 600 seconds / 10 minutes), but only after `sendPings` is explicitly enabled by the caller. Tracks consecutive timeout errors via `timeoutErrorCount`. Drops the client after `smpPingCount` (default 3) consecutive timeouts, but only if at least 15 minutes have passed since the last received response (recovery window). 
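The routing rule of the process thread and the drop condition of the monitor thread can be summarised as two pure functions. This is a hedged sketch: `Routed`, `classify` and `shouldDropClient` are hypothetical names, correlation IDs are modelled as `String`, and the exact comparison operators (`>=`) are assumptions rather than a statement of what the client code does.

```haskell
-- Where a received transmission is delivered (hypothetical type).
data Routed
  = ServerEvent -- empty corrId: forwarded to msgQ as an event
  | CommandResponse String -- corrId matches a pending command
  | UnexpectedResponse -- non-empty corrId with no pending command
  deriving (Eq, Show)

-- Classify a transmission by its correlation ID against pending commands.
classify :: [String] -> String -> Routed
classify pendingCorrIds corrId
  | null corrId = ServerEvent
  | corrId `elem` pendingCorrIds = CommandResponse corrId
  | otherwise = UnexpectedResponse

-- Recovery window from the text above: 15 minutes, in seconds.
recoveryWindowSeconds :: Int
recoveryWindowSeconds = 15 * 60

-- Drop only after enough consecutive timeouts AND a long enough silence.
shouldDropClient :: Int -> Int -> Int -> Bool
shouldDropClient pingCount consecutiveTimeouts secondsSinceLastReceived =
  consecutiveTimeouts >= pingCount
    && secondsSinceLastReceived >= recoveryWindowSeconds
```

Keeping both conditions in the drop check means that a burst of timeouts shortly after a successful response does not tear down the connection.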
+ +### TCP keep-alive + +`defaultKeepAliveOpts` configures OS-level TCP keep-alive probes: + +| Parameter | Value | Socket option | +|-----------|-------|---------------| +| `keepIdle` | 30 seconds | TCP_KEEPIDLE (Linux) / TCP_KEEPALIVE (macOS) | +| `keepIntvl` | 15 seconds | TCP_KEEPINTVL | +| `keepCnt` | 4 probes | TCP_KEEPCNT | + +TCP keep-alive detects dead connections at the OS level. The application-level PING/PONG provides a higher-level liveness check that also validates the protocol layer. + +### Disconnect and teardown + +All four threads run inside `raceAny_` with `E.finally disconnected`. When any thread exits (network error, timeout, or protocol error), the `finally` handler: + +1. Fires the `disconnected` callback provided by the caller (e.g., `smpClientDisconnected` in the agent) +2. The agent callback demotes subscriptions, fires DOWN events, and initiates resubscription + +The `connected` TVar is set to `True` after the handshake succeeds and before the threads start. Note: in the protocol client, this TVar is not reset on disconnect - disconnect detection relies on thread cancellation via `raceAny_` and the `disconnected` callback, not STM re-evaluation. (The server-side `Client` type has a separate `connected` TVar that is reset in `clientDisconnected`.) From 31158ab02e653eeda9dc50f1b14aacf34e5c75aa Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Sun, 15 Mar 2026 09:21:01 +0000 Subject: [PATCH 53/61] update --- spec/topics/transport.md | 67 +++++++++++++++++++++++++++++++++++++--- 1 file changed, 62 insertions(+), 5 deletions(-) diff --git a/spec/topics/transport.md b/spec/topics/transport.md index 2c3de1cf1..2cce46f77 100644 --- a/spec/topics/transport.md +++ b/spec/topics/transport.md @@ -42,6 +42,25 @@ For CA + leaf chains (used by routers), a root CA certificate signs a leaf certi Both sides derive `sessionId` from the TLS-unique channel binding value (RFC 5929). 
The server reads `T.getPeerFinished`; the client reads `T.getFinished`. This `sessionId` is used throughout the session - in handshake validation, transmission signing, and block encryption key derivation. +### Certificate chain semantics + +**Source**: [Transport/Shared.hs](../../src/Simplex/Messaging/Transport/Shared.hs) + +Routers use variable-length certificate chains. The `chainIdCaCerts` function extracts the identity certificate (`idCert`) based on chain length: + +| Chain length | Structure | Identity certificate | +|--------------|-----------|---------------------| +| 0 | `[]` | Rejected as CCEmpty | +| 1 | `[cert]` | Self-signed: `idCert = cert` | +| 2 | `[leaf, ca]` | Current online/offline pattern: `idCert = ca` | +| 3 | `[leaf, id, ca]` | With operator certificate: `idCert = id` (second) | +| 4 | `[leaf, id, net, ca]` | With network certificate: `idCert = id` (second, network cert ignored) | +| 5+ | - | Rejected as CCLong | + +The **router identity** is always determined by `idCert` - its SHA256 fingerprint is compared against the `keyHash` the client expects. For 2-cert chains (the common case), `idCert` equals the CA. For 3+ cert chains, `idCert` is always the **second certificate** (index 1). + +The client verifies the router identity by computing `XV.getFingerprint idCert X.HashSHA256` and comparing against the expected `keyHash`. This allows operators to rotate leaf certificates without changing the router's public identity. 
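The chain-length table above maps naturally onto a pattern match. The sketch below is illustrative only: `ChainCerts`, `classifyChain` and `identityCert` are hypothetical names, and certificates are kept polymorphic (`c`) instead of using the X.509 types from the actual module.

```haskell
-- Result of inspecting a certificate chain (hypothetical simplified type).
data ChainCerts c
  = ChainEmpty -- length 0, rejected
  | ChainSelfSigned c -- length 1
  | ChainValid c c c -- leaf, identity, CA
  | ChainTooLong -- length 5+, rejected
  deriving (Eq, Show)

-- Pick leaf, identity and CA certificates based on chain length.
classifyChain :: [c] -> ChainCerts c
classifyChain chain = case chain of
  [] -> ChainEmpty
  [cert] -> ChainSelfSigned cert
  [leaf, ca] -> ChainValid leaf ca ca -- identity is the CA
  [leaf, ident, ca] -> ChainValid leaf ident ca
  [leaf, ident, _network, ca] -> ChainValid leaf ident ca -- network cert ignored
  _ -> ChainTooLong

-- The certificate whose fingerprint is compared with the expected keyHash.
identityCert :: ChainCerts c -> Maybe c
identityCert cc = case cc of
  ChainSelfSigned cert -> Just cert
  ChainValid _ ident _ -> Just ident
  _ -> Nothing
```

For example, under this sketch `identityCert (classifyChain ["leaf", "id", "net", "ca"])` selects `"id"`, matching the rule that the second certificate is the identity for chains of three or four certificates.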
+ --- ## Handshake protocol family @@ -59,7 +78,7 @@ All three protocols (SMP, NTF, XFTP) use the same TLS transport, but their appli **Message 2 (client to router)**: `SMPClientHandshake` contains: - `smpVersion` - agreed maximum version from intersection -- `keyHash` - SHA256 of router's root CA certificate (identity verification) +- `keyHash` - SHA256 of router's identity certificate (`idCert`, see certificate chain semantics above) - `authPubKey` - client's X25519 public key for DH agreement (v7+) - `proxyServer` - boolean flag to disable transport block encryption (v14+) - `clientService` - service credentials with `serviceRole` and `serviceCertKey` (v16+) @@ -73,7 +92,7 @@ The NTF handshake follows the same server-first pattern but is simpler: | Difference | SMP | NTF | |-----------|-----|-----| | Block size | 16384 bytes | 512 bytes | -| Client auth key | X25519 DH public key | None | +| Client auth key | X25519 DH public key | None (server sends key, client does not) | | Service certificates | v16+ | Not supported | | Block encryption | v11+ | Not supported | | Batching | v4+ | v2+ | @@ -83,7 +102,43 @@ The NTF handshake follows the same server-first pattern but is simpler: ### XFTP handshake - HTTP/2 based -XFTP does not use the block-based TLS handshake at all. It uses HTTP/2 POST with ALPN `"xftp/1"`. The client sends `XFTPClientHello` (optional 32-byte web challenge for identity proof); the server responds with `XFTPServerHandshake` containing a signed challenge response and `CertChainPubKey`. Block size is 16384 bytes (same as SMP). +**Source**: [FileTransfer/Transport.hs](../../src/Simplex/FileTransfer/Transport.hs), [FileTransfer/Server.hs](../../src/Simplex/FileTransfer/Server.hs), [FileTransfer/Client.hs](../../src/Simplex/FileTransfer/Client.hs) + +XFTP does not use the block-based TLS handshake. It uses HTTP/2 POST with ALPN `"xftp/1"`. The handshake has two flows depending on client type. + +**Native client handshake** (standard two-step): + +1. 
Client sends POST with no body, server responds with `XFTPServerHandshake`: + - `xftpVersionRange` - negotiable version range + - `sessionId` - TLS-unique channel binding + - `authPubKey` - `CertChainPubKey` (always required, non-optional) + - `webIdentityProof` - absent for native clients + +2. Client sends POST with `XFTPClientHandshake`: + - `xftpVersion` - agreed version + - `keyHash` - SHA256 of router's identity certificate + +3. Server validates keyHash against `idCert` fingerprint (currently expects exactly 2-cert chain: `[leaf, ca]` where `ca` is identity) + +**Web client handshake** (three-step with identity proof): + +Web browsers cannot access the TLS certificate chain for verification. The web handshake adds a challenge-response mechanism: + +1. Client sends POST with `xftp-web-hello: 1` header and `XFTPClientHello`: + - `webChallenge` - optional 32-byte random challenge + +2. Server responds with `XFTPServerHandshake`: + - `webIdentityProof` - signature over `(webChallenge || sessionId)` using the router's signing key + +3. Client verifies `webIdentityProof` using the public key from `authPubKey`, confirming server identity without needing TLS certificate access + +4. Client sends POST with `xftp-handshake: 1` header and `XFTPClientHandshake` (same as native step 2) + +The server tracks handshake state per `sessionId` in a `TMap SessionId Handshake`: +- `HandshakeSent pk` - hello received, awaiting client handshake +- `HandshakeAccepted thParams` - handshake complete, ready for commands + +Web hello can be re-sent at any state (server reuses existing X25519 key if already generated). Block size is 16384 bytes (same as SMP). 
### Block encryption setup (SMP only, v11+) @@ -99,12 +154,14 @@ Each block encryption advances the chain key: ``` sbcHkdf chainKey -> HKDF-SHA512(salt="", ikm=chainKey, info="SimpleXSbChain", len=88) - -> split into (newChainKey[32], aesKey[32], nonce[24]) + -> split into (newChainKey[32], secretBoxKey[32], nonce[24]) ``` +The keys are used with XSalsa20-Poly1305 (NaCl secret_box), not AES. + This provides per-block forward secrecy - each block uses a different key, and old keys cannot be derived from new ones. The client swaps send/receive keys (its send key = server's receive key). -Block encryption is disabled when `proxyServer == True` (proxy connections already have their own encryption layer) and when the version is below v11. +Block encryption is disabled when `proxyServer == True` (proxy connections already have their own encryption layer), when the version is below v11, or when no DH session secret is available (no `thAuth` or missing `sessSecret`). --- From 73d12aad8a331794f9bd6452c7ce08bc7ca45ed0 Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Sun, 15 Mar 2026 10:22:24 +0000 Subject: [PATCH 54/61] patterns --- spec/topics/patterns.md | 337 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 337 insertions(+) create mode 100644 spec/topics/patterns.md diff --git a/spec/topics/patterns.md b/spec/topics/patterns.md new file mode 100644 index 000000000..a5d61500e --- /dev/null +++ b/spec/topics/patterns.md @@ -0,0 +1,337 @@ +# Code Patterns + +Cross-cutting patterns used throughout the codebase: exception handling, encoding utilities, compression, concurrent data structures, and batch processing. These patterns provide consistency, type safety, and correctness guarantees across all modules. + +For protocol-specific encoding details, see [transport.md](transport.md). For cryptographic operations, see the inline documentation in [Crypto.hs](../../src/Simplex/Messaging/Crypto.hs). 
+ +- [Exception handling](#exception-handling) +- [Binary encoding](#binary-encoding) +- [String encoding](#string-encoding) +- [Compression](#compression) +- [Concurrent data structures](#concurrent-data-structures) +- [Batch processing](#batch-processing) + +--- + +## Exception handling + +**Source**: [Agent/Protocol.hs](../../src/Simplex/Messaging/Agent/Protocol.hs), [Agent/Client.hs](../../src/Simplex/Messaging/Agent/Client.hs), [Agent/Store/SQLite.hs](../../src/Simplex/Messaging/Agent/Store/SQLite.hs) + +### Error type hierarchy + +The codebase uses a hierarchical error type structure: + +**`AgentErrorType`** - top-level error type for agent client responses: +- `CMD` - command/response errors with context string +- `CONN` - connection errors with context (NOT_FOUND, DUPLICATE, SIMPLEX) +- `SMP`/`NTF`/`XFTP` - protocol-specific errors with server address +- `BROKER` - transport-level broker errors +- `AGENT` - internal agent errors (A_DUPLICATE, A_PROHIBITED) +- `INTERNAL` - implementation bugs (should never occur in production) +- `CRITICAL` - critical errors with optional restart offer + +**`StoreError`** - database/storage layer errors: +- `SEInternal` - IO exceptions during database operations +- `SEDatabaseBusy` - database locked/busy (triggers CRITICAL with restart) +- `SEConnNotFound`/`SEUserNotFound` - entity lookup failures +- `SEBadConnType` - wrong connection type for operation + +**`AgentCryptoError`** - cryptographic failures: +- `DECRYPT_AES`/`DECRYPT_CB` - decryption failures +- `RATCHET_HEADER`/`RATCHET_EARLIER Word32`/`RATCHET_SKIPPED Word32`/`RATCHET_SYNC` - double ratchet state issues + +### Monad stack + +``` +AM a = ExceptT AgentErrorType (ReaderT Env IO) a -- full error handling +AM' a = ReaderT Env IO a -- no error handling (for batch ops) +``` + +### Store access patterns + +**Basic operations** lift IO actions into the AM monad: + +```haskell +withStore :: AgentClient -> (DB.Connection -> IO (Either StoreError a)) -> AM a +withStore' 
:: AgentClient -> (DB.Connection -> IO a) -> AM a -- wraps result in Right +``` + +Both wrap the action in a database transaction and convert `StoreError` to `AgentErrorType` via `storeError`. + +**Error mapping** (key cases from `storeError`): +- `SEConnNotFound`/`SERatchetNotFound` -> `CONN NOT_FOUND` +- `SEConnDuplicate` -> `CONN DUPLICATE` +- `SEBadConnType` -> `CONN SIMPLEX` with context +- `SEUserNotFound` -> `NO_USER` +- `SEAgentError e` -> `e` (propagates wrapped error) +- `SEDatabaseBusy` -> `CRITICAL True` (offers restart) +- Other errors -> `INTERNAL` with error message + +### Error recovery patterns + +**tryError** - attempt an operation, handle failure without throwing: +```haskell +tryError (deleteQueue c NRMBackground rq') >>= \case + Left e -> logError e >> continue + Right () -> success +``` + +**catchAllErrors** - catch errors and run cleanup: +```haskell +getQueueMessage c rq `catchAllErrors` \e -> + atomically (releaseGetLock c rq) >> throwError e +``` + +**catchAll_** - catch all exceptions, return default on failure: +```haskell +notices <- liftIO $ withTransaction store (`getClientNotices` servers) `catchAll_` pure [] +``` + +--- + +## Binary encoding + +**Source**: [Encoding.hs](../../src/Simplex/Messaging/Encoding.hs) + +### Encoding typeclass + +```haskell +class Encoding a where + smpEncode :: a -> ByteString -- encode to binary + smpP :: Parser a -- attoparsec parser + smpDecode :: ByteString -> Either String a -- default via parseAll smpP +``` + +### Primitive encoding + +| Type | Wire format | +|------|-------------| +| `Char` | Single byte | +| `Bool` | `'T'` or `'F'` | +| `Word16` | 2-byte big-endian | +| `Word32` | 4-byte big-endian | +| `Int64` | Two `Word32`s combined | +| `ByteString` | 1-byte length prefix + data (max 255 bytes) | +| `Maybe a` | `'0'` (Nothing) or `'1'` + encoded value | +| `(a, b)` | Concatenated encodings (no separator) | + +### Special wrappers + +**`Tail`** - takes remaining bytes without length prefix: 
+```haskell +newtype Tail = Tail {unTail :: ByteString} +-- smpEncode = unTail (no prefix) +-- smpP = takeByteString +``` + +**`Large`** - for ByteStrings > 255 bytes: +```haskell +newtype Large = Large {unLarge :: ByteString} +-- smpEncode = Word16 length prefix + data +-- smpP = read Word16, take that many bytes +``` + +### List encoding + +```haskell +smpEncodeList :: Encoding a => [a] -> ByteString +-- 1-byte count prefix + concatenated encoded items (max 255 items) + +instance Encoding (NonEmpty a) +-- Same format, fails on empty input during parsing +``` + +--- + +## String encoding + +**Source**: [Encoding/String.hs](../../src/Simplex/Messaging/Encoding/String.hs) + +### StrEncoding typeclass + +```haskell +class StrEncoding a where + strEncode :: a -> ByteString -- human-readable encoding + strP :: Parser a -- parser (defaults to base64url) + strDecode :: ByteString -> Either String a +``` + +Used for addresses, keys, and values displayed to users or in URIs. + +### Base64 URL encoding + +`ByteString` instances use base64url encoding (RFC 4648): +- Alphabet: A-Z, a-z, 0-9, `-`, `_` +- No padding by default in output +- Accepts optional `=` padding on input + +### Tuple and list encoding + +**Tuples** use space separation (via `B.unwords`): +```haskell +strEncode (a, b) = B.unwords [strEncode a, strEncode b] +``` + +**Lists** use comma separation: +```haskell +strEncodeList :: StrEncoding a => [a] -> ByteString +strEncodeList = B.intercalate "," . map strEncode +``` + +### Numeric types + +`Int`, `Word16`, `Word32`, `Int64` encode as decimal strings (not binary). + +### JSON conversion utilities + +```haskell +strToJSON :: StrEncoding a => a -> J.Value +strParseJSON :: StrEncoding a => String -> J.Value -> JT.Parser a +``` + +Convert between `StrEncoding` and JSON string values for API serialization. 
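To contrast the binary and string encodings described in the two sections above, here is a small sketch of a few primitives written directly against `bytestring`. The function names (`binWord16`, `binShort`, `binMaybe`, `strWord16`, `strPair`, `strList`) are hypothetical and are not the `Encoding` or `StrEncoding` class methods; the sketch only mirrors the wire formats listed in the tables.

```haskell
import Data.Bits (shiftR)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as B
import Data.Word (Word16)

-- Binary: Word16 as 2 bytes, big-endian.
binWord16 :: Word16 -> B.ByteString
binWord16 w = BS.pack [fromIntegral (w `shiftR` 8), fromIntegral w]

-- Binary: short ByteString with a 1-byte length prefix (max 255 bytes).
binShort :: B.ByteString -> Maybe B.ByteString
binShort s
  | B.length s <= 255 = Just (BS.cons (fromIntegral (B.length s)) s)
  | otherwise = Nothing

-- Binary: Maybe as '0' for Nothing, or '1' followed by the encoded value.
binMaybe :: (a -> B.ByteString) -> Maybe a -> B.ByteString
binMaybe _ Nothing = B.singleton '0'
binMaybe enc (Just x) = B.cons '1' (enc x)

-- String: numbers are decimal text, not binary.
strWord16 :: Word16 -> B.ByteString
strWord16 = B.pack . show

-- String: tuples are space-separated, lists are comma-separated.
strPair :: B.ByteString -> B.ByteString -> B.ByteString
strPair a b = B.unwords [a, b]

strList :: [B.ByteString] -> B.ByteString
strList = B.intercalate (B.pack ",")
```

The same value 258 therefore occupies two bytes (`0x01 0x02`) in the binary form and three ASCII characters (`258`) in the string form.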
+ +--- + +## Compression + +**Source**: [Compression.hs](../../src/Simplex/Messaging/Compression.hs) + +### Algorithm and thresholds + +Uses Zstandard (zstd) compression at level 3 (moderate compression/speed tradeoff). + +```haskell +maxLengthPassthrough :: Int +maxLengthPassthrough = 180 -- messages <= 180 bytes are not compressed +``` + +### Wire format + +```haskell +data Compressed + = Passthrough ByteString -- tag '0' + 1-byte length + data + | Compressed Large -- tag '1' + 2-byte length + zstd data +``` + +### Decompression bomb protection + +`decompress1` requires the compressed data to declare its decompressed size upfront: + +```haskell +decompress1 :: Int -> Compressed -> Either String ByteString +``` + +The function checks `Z1.decompressedSize` before decompressing. If the declared size exceeds the `limit` parameter (or is not specified), decompression is rejected. This prevents zip-bomb attacks where a small compressed payload would expand to exhaust memory. + +Zstd's `decompress` can return `Error`, `Skip` (empty result), or `Decompress bs'` - all cases are handled explicitly. 
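The two decisions in this section, whether to compress at all and whether a declared decompressed size is acceptable, can be sketched without the zstd bindings. `CompressionPlan`, `planFor` and `checkDeclaredSize` are hypothetical names, the declared size is modelled as `Maybe Int`, and the error strings are placeholders.

```haskell
-- Threshold from the text above: messages up to 180 bytes pass through.
maxLengthPassthrough :: Int
maxLengthPassthrough = 180

data CompressionPlan = UsePassthrough | UseZstd
  deriving (Eq, Show)

-- Decide how a message of the given length would be wrapped.
planFor :: Int -> CompressionPlan
planFor len
  | len <= maxLengthPassthrough = UsePassthrough
  | otherwise = UseZstd

-- Reject payloads that do not declare a size, or declare one over the limit,
-- before any decompression work is done.
checkDeclaredSize :: Int -> Maybe Int -> Either String Int
checkDeclaredSize limit declared = case declared of
  Nothing -> Left "decompressed size not specified"
  Just size
    | size > limit -> Left "decompressed size exceeds the limit"
    | otherwise -> Right size
```

Checking the declared size first is what bounds memory use: the decoder is only invoked for payloads whose output size is known and within the caller's limit.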
+ +--- + +## Concurrent data structures + +**Source**: [TMap.hs](../../src/Simplex/Messaging/TMap.hs) + +### TMap + +A `TVar`-wrapped immutable `Data.Map`, providing atomic read-modify-write operations via STM: + +```haskell +type TMap k a = TVar (Map k a) +``` + +### STM operations (atomic) + +| Operation | Description | +|-----------|-------------| +| `lookup k m` | Read value for key | +| `member k m` | Check key existence | +| `insert k v m` | Insert/update value | +| `delete k m` | Remove key | +| `lookupInsert k v m` | Atomic lookup-then-insert, returns old value | +| `lookupDelete k m` | Atomic lookup-then-delete, returns deleted value | + +### IO operations (non-transactional) + +```haskell +lookupIO :: Ord k => k -> TMap k a -> IO (Maybe a) +memberIO :: Ord k => k -> TMap k a -> IO Bool +``` + +These bypass STM for read-only access when atomicity with other operations is not needed. + +### Usage pattern + +```haskell +-- Within STM transaction (atomic with other STM ops) +atomically $ do + existing <- TM.lookup key map + case existing of + Nothing -> TM.insert key newValue map + Just _ -> pure () + +-- Outside transaction (simple read) +value <- TM.lookupIO key map +``` + +--- + +## Batch processing + +**Source**: [Agent/Client.hs](../../src/Simplex/Messaging/Agent/Client.hs) + +### withStoreBatch + +Executes multiple database operations in a single transaction: + +```haskell +withStoreBatch :: Traversable t + => AgentClient + -> (DB.Connection -> t (IO (Either AgentErrorType a))) + -> AM' (t (Either AgentErrorType a)) +``` + +All operations run within one database transaction, ensuring: +- **Atomicity**: All operations succeed or all fail together +- **Isolation**: No partial updates visible to other readers +- **Efficiency**: Single transaction overhead instead of per-operation + +### Result semantics + +Each batched operation produces an individual `Either AgentErrorType a`: +- Partial success is possible (some `Right`, some `Left`) +- If the 
transaction itself fails, all results become errors +- Fine-grained error handling per operation + +### Common patterns + +**Store multiple items**: +```haskell +void $ withStoreBatch' c $ \db -> + map (storeDelivery db) deliveries +``` + +**Fetch multiple items**: +```haskell +results <- withStoreBatch c $ \db -> + map (getConnection db) connIds +``` + +**Update multiple items**: +```haskell +void $ withStoreBatch' c $ \db -> + map (\connId -> setConnPQSupport db connId PQSupportOn) connIds +``` + +### withStoreBatch' + +Convenience variant that wraps results in `Right`: + +```haskell +withStoreBatch' :: Traversable t + => AgentClient + -> (DB.Connection -> t (IO a)) + -> AM' (t (Either AgentErrorType a)) +``` + +Use when operations cannot fail (or failures should become `INTERNAL` errors). From c0698817d173122e8353af49278fa53a6232d5ea Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Sun, 15 Mar 2026 10:54:12 +0000 Subject: [PATCH 55/61] xftp topic --- spec/topics/xftp.md | 394 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 394 insertions(+) create mode 100644 spec/topics/xftp.md diff --git a/spec/topics/xftp.md b/spec/topics/xftp.md new file mode 100644 index 000000000..7efd88495 --- /dev/null +++ b/spec/topics/xftp.md @@ -0,0 +1,394 @@ +# XFTP + +File transfer protocol for large files: router storage architecture, protocol commands, agent upload/download pipelines, chunk management, and encryption. XFTP enables secure file sharing by splitting files into encrypted chunks stored across multiple routers. + +For XFTP transport handshake details, see [transport.md](transport.md). For the XFTP protocol specification, see [xftp.md](../../protocol/xftp.md). 
+ +- [Protocol overview](#protocol-overview) +- [Router storage](#router-storage) +- [Protocol commands](#protocol-commands) +- [Agent upload pipeline](#agent-upload-pipeline) +- [Agent download pipeline](#agent-download-pipeline) +- [Chunk encryption](#chunk-encryption) +- [Chunk management](#chunk-management) + +--- + +## Protocol overview + +**Source**: [FileTransfer/Protocol.hs](../../src/Simplex/FileTransfer/Protocol.hs) + +XFTP separates file metadata from file content. A sender uploads encrypted chunks to one or more routers, then shares a file description (containing chunk locations, keys, and digests) with recipients via SMP messaging. + +Key properties: +- File encrypted as a single stream with XSalsa20-Poly1305, then split into chunks +- Chunks are byte ranges of the encrypted file (not independently encrypted) +- Chunks can be replicated across multiple routers +- Recipients download chunks directly from routers +- Router never sees plaintext or file metadata + +### Parties + +| Party | Role | Authentication | +|-------|------|----------------| +| Sender | Creates file, uploads chunks, manages recipients | Per-file sender key | +| Recipient | Downloads chunks, acknowledges receipt | Per-recipient key (created by sender) | + +### File description + +The sender generates a `ValidFileDescription` containing: +- Chunk specifications: server address, recipient ID, recipient key, size, digest +- Encryption key and nonce for the full file +- File size and SHA-512 digest +- Optional redirect to another file description + +--- + +## Router storage + +**Source**: [FileTransfer/Server/Store.hs](../../src/Simplex/FileTransfer/Server/Store.hs) + +### In-memory store + +```haskell +data FileStore = FileStore + { files :: TMap SenderId FileRec, + recipients :: TMap RecipientId (SenderId, RcvPublicAuthKey), + usedStorage :: TVar Int64 + } +``` + +- `files` maps sender IDs to file records +- `recipients` maps recipient IDs to (sender, auth key) for download 
authorization +- `usedStorage` tracks total bytes for quota enforcement + +### File record + +```haskell +data FileRec = FileRec + { senderId :: SenderId, + fileInfo :: FileInfo, -- sndKey, size, digest + filePath :: TVar (Maybe FilePath), -- set after upload + recipientIds :: TVar (Set RecipientId), + createdAt :: RoundedFileTime, -- truncated to 1-hour precision + fileStatus :: TVar ServerEntityStatus + } +``` + +The `filePath` is `Nothing` until FPUT completes. The file is stored at `filesPath/`. + +### Quota management + +File size is reserved atomically when FNEW is processed. If `usedStorage + fileSize > fileSizeQuota`, the request is rejected with QUOTA error. Storage is released when files are deleted or expire. + +### File expiration + +Files expire based on `ttl` configuration (default 48 hours). The expiration thread periodically scans files where `createdAt + fileTimePrecision < threshold`. Expired files are deleted from disk and removed from the store. + +`fileTimePrecision` is 3600 seconds (1 hour), providing k-anonymity for file creation times. 
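The quota and expiration rules above reduce to simple arithmetic. The sketch below uses pure functions with hypothetical names (`reserveQuota`, `roundFileTime`, `isExpired`) and plain `Int64` seconds; the real store performs the reservation atomically in STM, which this sketch does not model.

```haskell
import Data.Int (Int64)

-- Reserve space for a new file, or fail with QUOTA when it would not fit.
-- Returns the new value of used storage on success.
reserveQuota :: Int64 -> Int64 -> Int64 -> Either String Int64
reserveQuota quota used fileSize
  | used + fileSize > quota = Left "QUOTA"
  | otherwise = Right (used + fileSize)

-- Creation times are kept with 1-hour precision.
fileTimePrecision :: Int64
fileTimePrecision = 3600

-- Truncate a timestamp (in seconds) to the start of its hour.
roundFileTime :: Int64 -> Int64
roundFileTime t = t - t `mod` fileTimePrecision

-- A file is expired when the end of its creation hour is before the threshold.
isExpired :: Int64 -> Int64 -> Bool
isExpired threshold createdAt = createdAt + fileTimePrecision < threshold
```

Adding `fileTimePrecision` in the expiration check compensates for the truncation, so a file is not removed earlier than its configured lifetime because its creation time was rounded down.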
+ +--- + +## Protocol commands + +**Source**: [FileTransfer/Protocol.hs](../../src/Simplex/FileTransfer/Protocol.hs), [FileTransfer/Server.hs](../../src/Simplex/FileTransfer/Server.hs) + +### Command summary + +| Command | Party | Purpose | +|---------|-------|---------| +| FNEW | Sender | Create file with metadata and initial recipient keys | +| FADD | Sender | Add recipient auth keys to existing file | +| FPUT | Sender | Upload encrypted chunk data | +| FDEL | Sender | Delete file from router | +| FGET | Recipient | Download file (initiates DH key exchange) | +| FACK | Recipient | Acknowledge download, remove recipient from file | +| PING | Recipient | Keepalive | + +### FNEW - create file + +Request: `FNEW FileInfo (NonEmpty RcvPublicAuthKey) (Maybe BasicAuth)` + +- `FileInfo`: sender's auth key, file size (Word32), SHA-512 digest +- Recipient keys: one per intended recipient +- Optional basic auth for servers requiring authorization + +Response: `FRSndIds SenderId (NonEmpty RecipientId)` + +The router generates random sender ID and recipient IDs. The sender uses `SenderId` for subsequent commands; recipients receive their `RecipientId` via file description. + +### FPUT - upload chunk + +Request: `FPUT` with chunk data in HTTP/2 body + +The router: +1. Validates sender authorization +2. Reserves storage quota +3. Receives encrypted chunk with timeout +4. Writes to `filesPath/` +5. Updates `filePath` in file record + +If the file already has a `filePath` (re-upload), the body is discarded and `FROk` returned immediately. + +### FGET - download chunk + +Request: `FGET RcvPublicDhKey` + +The recipient provides an ephemeral X25519 public key for DH agreement. + +Response: `FRFile SrvPublicDhKey C.CbNonce` (server's ephemeral DH key and nonce) + +The router: +1. Generates ephemeral DH key pair +2. Computes shared secret: `dh'(recipientDhKey, serverPrivKey)` +3. Initializes encryption state with shared secret and nonce +4. 
Streams encrypted file in HTTP/2 response body + +The recipient uses the returned server DH key and nonce to decrypt the stream. + +### FACK - acknowledge receipt + +Request: `FACK` + +Removes the recipient from the file's recipient set. Once all recipients have acknowledged, only the sender can access the file (until FDEL or expiration). + +### FDEL - delete file + +Request: `FDEL` + +Deletes the file from disk and store, releases quota. All recipient IDs become invalid. + +--- + +## Agent upload pipeline + +**Source**: [FileTransfer/Agent.hs](../../src/Simplex/FileTransfer/Agent.hs), [FileTransfer/Chunks.hs](../../src/Simplex/FileTransfer/Chunks.hs) + +### Upload state machine + +``` +SFSNew -> SFSEncrypting -> SFSEncrypted -> SFSUploading -> SFSComplete + \-> SFSError +``` + +### Phase 1: File preparation (SFSNew -> SFSEncrypted) + +`prepareFile` encrypts the source file: + +1. Generate random `SbKey` and `CbNonce` +2. Create encrypted file structure: + - 8 bytes: encoded content length + - FileHeader: filename and optional metadata (SMP-encoded) + - File content: encrypted in 64KB streaming chunks + - Padding: `'#'` characters to multiple of 16384 bytes + - Auth tag: 16 bytes (Poly1305) +3. Compute SHA-512 digest of encrypted file +4. Calculate chunk boundaries via `prepareChunkSizes` + +### Chunk size selection + +`prepareChunkSizes` selects chunk sizes based on total file size: + +| File size | Chunk size used | +|-----------|-----------------| +| > 3/4 of 4MB (~3.0MB) | 4MB chunks | +| > 3/4 of 1MB (768KB) | 1MB chunks | +| Otherwise | 64KB or 256KB | + +The last chunk may be smaller than the standard size. + +### Phase 2: Chunk registration + +For each chunk: +1. Select XFTP server (different server per chunk recommended) +2. Send FNEW with chunk's digest and recipient keys +3. Store `SndFileChunkReplica` with server-assigned IDs +4. Status: `SFRSCreated` + +### Phase 3: Upload (SFSUploading -> SFSComplete) + +`uploadFileChunk` for each replica: +1. 
If not all recipients added: send FADD +2. Read chunk from encrypted file at (offset, size) +3. Send FPUT with chunk data +4. Update replica status to `SFRSUploaded` +5. Report progress to agent client + +When all chunks uploaded: mark file `SFSComplete`, generate file description. + +### Error handling + +- Retry with exponential backoff per `reconnectInterval` +- Track consecutive retries per replica +- After `xftpConsecutiveRetries` failures: mark `SFSError` +- Delay and retry count stored in DB for resumption + +--- + +## Agent download pipeline + +**Source**: [FileTransfer/Agent.hs](../../src/Simplex/FileTransfer/Agent.hs) + +### Download state machine + +``` +RFSReceiving -> RFSReceived -> RFSDecrypting -> RFSComplete + \-> RFSError +``` + +### Phase 1: Chunk download (RFSReceiving -> RFSReceived) + +`downloadFileChunk` for each chunk: +1. Verify server is in approved relays (if relay approval required) +2. Generate ephemeral DH key pair +3. Send FGET with public DH key +4. Receive `FRFile` with server's DH key and nonce +5. Compute shared secret, initialize decryption +6. Stream-decrypt chunk to `tmpPath/chunkNo` +7. Verify chunk's SHA-256 digest matches specification +8. Mark replica as `received` + +Replicas are tried in order; if the first fails, the next replica of the same chunk is tried. + +### Phase 2: Reassembly (RFSReceived -> RFSComplete) + +`decryptFile` reassembles and decrypts: +1. Concatenate all chunk files in order +2. Validate that the total size and digest match the file description +3. Decrypt with file's `SbKey` and `CbNonce`: + - Parse length prefix and FileHeader + - Stream-decrypt content + - Verify auth tag +4. Write to final destination (`savePath`) +5. Delete temporary chunk files +6. Mark `RFSComplete` + +### Redirect files + +If the file description has a `redirect` field: +1. Decrypt the downloaded content +2. Parse as YAML file description +3. Validate size/digest match redirect specification +4. Register actual chunks from redirect description +5. 
Download from redirected sources + +This enables indirection for large file descriptions or server migration. + +--- + +## Chunk encryption + +**Source**: [FileTransfer/Crypto.hs](../../src/Simplex/FileTransfer/Crypto.hs), [Messaging/Crypto/File.hs](../../src/Simplex/Messaging/Crypto/File.hs) + +### File encryption (sender side) + +``` +[8-byte length][FileHeader][file content][padding][16-byte auth tag] +``` + +- Algorithm: XSalsa20-Poly1305 (NaCl secret_box) +- Key: random 32-byte `SbKey` +- Nonce: random 24-byte `CbNonce` +- Streaming: 64KB chunks encrypted incrementally +- Padding: `'#'` characters to align to 16384-byte boundary + +### Chunk transport encryption (FGET) + +Each FGET establishes a fresh DH shared secret: +1. Recipient generates ephemeral X25519 key pair +2. Sends public key in FGET request +3. Router generates ephemeral key pair +4. Both compute: `secret = dh(peerPubKey, ownPrivKey)` +5. Router streams chunk encrypted with `cbInit(secret, nonce)` +6. Recipient decrypts with same parameters + +This provides forward secrecy per-download - compromising the file encryption key does not reveal transport keys. + +### Auth tag verification + +The 16-byte Poly1305 auth tag is verified after receiving all chunks: +- Single chunk: tag appended at end +- Multiple chunks: tag in final chunk, verified after concatenation + +Failed auth tag verification produces `CRYPTO` error. 
+ +--- + +## Chunk management + +**Source**: [FileTransfer/Types.hs](../../src/Simplex/FileTransfer/Types.hs) + +### Sender chunk state + +```haskell +data SndFileChunkReplica = SndFileChunkReplica + { sndChunkReplicaId :: Int64, + server :: XFTPServer, + replicaId :: ChunkReplicaId, + replicaKey :: C.APrivateAuthKey, + rcvIdsKeys :: [(ChunkReplicaId, C.APrivateAuthKey)], + replicaStatus :: SndFileReplicaStatus, + delay :: Maybe Int64, + retries :: Int + } + +data SndFileReplicaStatus = SFRSCreated | SFRSUploaded +``` + +- `SFRSCreated`: FNEW sent, replica registered on server +- `SFRSUploaded`: FPUT complete, chunk data stored +- `rcvIdsKeys`: recipient IDs and keys for this replica + +### Recipient chunk state + +```haskell +data RcvFileChunk = RcvFileChunk + { rcvFileChunkId :: Int64, + chunkNo :: Int, + chunkSize :: Word32, + digest :: ByteString, + replicas :: [RcvFileChunkReplica], + fileTmpPath :: FilePath, + chunkTmpPath :: Maybe FilePath + } + +data RcvFileChunkReplica = RcvFileChunkReplica + { rcvChunkReplicaId :: Int64, + server :: XFTPServer, + replicaId :: ChunkReplicaId, + replicaKey :: C.APrivateAuthKey, + received :: Bool, + delay :: Maybe Int64, + retries :: Int + } +``` + +### Replica selection + +Each chunk can have multiple replicas on different servers. The file description includes all replicas; the recipient: +1. Tries first replica +2. On failure, tries next replica +3. Continues until success or all replicas exhausted + +This provides redundancy against server unavailability. + +### Retry handling + +Retry state is stored per-replica with two fields: +- `delay :: Maybe Int64` - milliseconds until next retry +- `retries :: Int` - consecutive failure count + +On failure, delay increases with exponential backoff. State persists in DB for resumption after agent restart. 
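The replica selection rule (try each replica in order until one succeeds) can be sketched as a fold over the replica list. This is an illustration under assumptions: `tryReplicas` is a hypothetical name, and the download action is modelled as a pure function returning `Either`, whereas the agent performs network IO and persists retry state between attempts.

```haskell
-- Try replicas in order. Returns the first successful result, or all
-- collected errors (in attempt order) when every replica fails.
tryReplicas :: (r -> Either e a) -> [r] -> Either [e] a
tryReplicas download = go []
  where
    go errs [] = Left (reverse errs)
    go errs (replica : rest) = case download replica of
      Right result -> Right result
      Left err -> go (err : errs) rest
```

Collecting the errors rather than keeping only the last one is a design choice of this sketch; it makes it possible to report which servers were unavailable when all replicas are exhausted.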
+ +### Chunk sizes + +```haskell +chunkSize0 = kb 64 -- 65536 bytes +chunkSize1 = kb 256 -- 262144 bytes +chunkSize2 = mb 1 -- 1048576 bytes +chunkSize3 = mb 4 -- 4194304 bytes + +serverChunkSizes = [chunkSize0, chunkSize1, chunkSize2, chunkSize3] +``` + +Routers validate that uploaded chunks match one of the allowed sizes. This prevents fingerprinting based on exact file sizes. From f56b5940368c5d8773a4f1534c5f5c87daaa95d4 Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Sun, 15 Mar 2026 11:03:30 +0000 Subject: [PATCH 56/61] remove old topic stubs --- spec/agent-protocol.md | 13 -- spec/compression.md | 84 --------- spec/crypto-ratchet.md | 13 -- spec/crypto-tls.md | 11 -- spec/crypto.md | 19 --- spec/encoding.md | 332 ------------------------------------ spec/ntf-protocol.md | 15 -- spec/ntf-server.md | 11 -- spec/remote-control.md | 11 -- spec/smp-client.md | 11 -- spec/smp-protocol.md | 13 -- spec/smp-server.md | 13 -- spec/storage-agent.md | 11 -- spec/storage-server.md | 9 - spec/transport-http2.md | 13 -- spec/transport-websocket.md | 7 - spec/transport.md | 11 -- spec/version.md | 177 ------------------- spec/xftp-client.md | 11 -- spec/xftp-protocol.md | 13 -- spec/xftp-server.md | 11 -- spec/xrcp-protocol.md | 13 -- 22 files changed, 822 deletions(-) delete mode 100644 spec/agent-protocol.md delete mode 100644 spec/compression.md delete mode 100644 spec/crypto-ratchet.md delete mode 100644 spec/crypto-tls.md delete mode 100644 spec/crypto.md delete mode 100644 spec/encoding.md delete mode 100644 spec/ntf-protocol.md delete mode 100644 spec/ntf-server.md delete mode 100644 spec/remote-control.md delete mode 100644 spec/smp-client.md delete mode 100644 spec/smp-protocol.md delete mode 100644 spec/smp-server.md delete mode 100644 spec/storage-agent.md delete mode 100644 spec/storage-server.md delete mode 100644 spec/transport-http2.md delete mode 100644 spec/transport-websocket.md delete mode 
100644 spec/transport.md delete mode 100644 spec/version.md delete mode 100644 spec/xftp-client.md delete mode 100644 spec/xftp-protocol.md delete mode 100644 spec/xftp-server.md delete mode 100644 spec/xrcp-protocol.md diff --git a/spec/agent-protocol.md b/spec/agent-protocol.md deleted file mode 100644 index b84ffb9ce..000000000 --- a/spec/agent-protocol.md +++ /dev/null @@ -1,13 +0,0 @@ -# Agent Protocol Implementation - -> Implements agent connection procedures, queue rotation, and duplex messaging. - -**Protocol reference**: [`protocol/agent-protocol.md`](../protocol/agent-protocol.md) - -## Types - -## Connection Procedures - -## Queue Rotation - -## Functions diff --git a/spec/compression.md b/spec/compression.md deleted file mode 100644 index faa8c275f..000000000 --- a/spec/compression.md +++ /dev/null @@ -1,84 +0,0 @@ -# Compression - -> Zstd compression for SimpleX protocol messages. - -**Source file**: [`Compression.hs`](../src/Simplex/Messaging/Compression.hs) - -## Overview - -Optional Zstd compression for SMP message bodies. Short messages bypass compression entirely to avoid overhead. The `Compressed` type carries a tag byte indicating whether the payload is compressed or passthrough, making it self-describing on the wire. - -## Types - -### `Compressed` - -**Source**: `Compression.hs:17-22` - -```haskell -data Compressed - = Passthrough ByteString -- short messages, left intact - | Compressed Large -- Zstd-compressed, 2-byte length prefix -``` - -Wire encoding (`Compression.hs:30-38`): - -``` -Passthrough → '0' ++ smpEncode ByteString (1-byte tag + 1-byte length + data) -Compressed → '1' ++ smpEncode Large (1-byte tag + 2-byte length + data) -``` - -Tags are `'0'` (0x30) and `'1'` (0x31) — same ASCII convention as `Maybe` encoding. - -`Passthrough` uses standard `ByteString` encoding (max 255 bytes, 1-byte length prefix). 
`Compressed` uses `Large` encoding (max 65535 bytes, 2-byte Word16 length prefix), since compressed output can exceed 255 bytes for larger inputs. - -## Constants - -| Constant | Value | Purpose | Source | -|----------|-------|---------|--------| -| `maxLengthPassthrough` | 180 | Messages at or below this length are not compressed | `Compression.hs:24-25` | -| `compressionLevel` | 3 | Zstd compression level | `Compression.hs:27-28` | - -The 180-byte threshold was "sampled from real client data" — messages above this length show rapidly increasing compression ratio. Below 180 bytes, compression overhead (FFI call, dictionary-less Zstd startup) outweighs savings. - -## Functions - -### `compress1` - -**Source**: `Compression.hs:40-43` - -```haskell -compress1 :: ByteString -> Compressed -``` - -Compress a message body: -- If `B.length bs <= 180` → `Passthrough bs` -- Otherwise → `Compressed (Large (Z1.compress 3 bs))` - -No context or dictionary — each message is independently compressed ("1" in `compress1` refers to single-shot compression). - -### `decompress1` - -**Source**: `Compression.hs:45-53` - -```haskell -decompress1 :: Int -> Compressed -> Either String ByteString -``` - -Decompress with size limit: -- `Passthrough bs` → `Right bs` (no check needed — already bounded by encoding) -- `Compressed (Large bs)` → check `Z1.decompressedSize bs`: - - If size is known and within `limit` → decompress - - If size unknown or exceeds `limit` → `Left` error - -The size limit check happens **before** decompression, using Zstd's frame header (which includes the decompressed size when the compressor wrote it). This prevents decompression bombs — an attacker cannot cause unbounded memory allocation by sending a small compressed payload that expands to gigabytes. 
- -The `Z1.decompress` result is pattern-matched for three cases: -- `Z1.Error e` → `Left e` -- `Z1.Skip` → `Right mempty` (zero-length output) -- `Z1.Decompress bs'` → `Right bs'` - -## Security notes - -- **Decompression bomb protection**: `decompress1` requires an explicit size limit and checks `decompressedSize` before allocating. Callers must pass an appropriate limit (typically the SMP block size). -- **No dictionary/context**: Each message is independently compressed. No shared state between messages that could leak information across compression boundaries. -- **Passthrough for short messages**: Messages ≤ 180 bytes are never compressed, avoiding timing side channels from compression ratio differences on short, potentially-predictable messages. diff --git a/spec/crypto-ratchet.md b/spec/crypto-ratchet.md deleted file mode 100644 index de5af38a2..000000000 --- a/spec/crypto-ratchet.md +++ /dev/null @@ -1,13 +0,0 @@ -# Double Ratchet & PQDR - -> Implements the double ratchet algorithm with post-quantum extensions (PQDR). - -**Protocol reference**: [`protocol/pqdr.md`](../protocol/pqdr.md) - -## State - -## Transitions - -## Key Derivation - -## Functions diff --git a/spec/crypto-tls.md b/spec/crypto-tls.md deleted file mode 100644 index 9327ae69a..000000000 --- a/spec/crypto-tls.md +++ /dev/null @@ -1,11 +0,0 @@ -# TLS & Certificate Chains - -> TLS session setup, certificate chain construction, and server identity validation. - -## TLS Setup - -## Certificate Validation - -## Trust Anchoring - -## Functions diff --git a/spec/crypto.md b/spec/crypto.md deleted file mode 100644 index ec8fb0a49..000000000 --- a/spec/crypto.md +++ /dev/null @@ -1,19 +0,0 @@ -# Cryptographic Primitives - -> All cryptographic primitives used across SimpleX protocols. 
- -## Ed25519 - -## X25519 - -## NaCl - -## AES-GCM - -## SHA - -## HKDF - -## Key Generation - -## Functions diff --git a/spec/encoding.md b/spec/encoding.md deleted file mode 100644 index f5501cfab..000000000 --- a/spec/encoding.md +++ /dev/null @@ -1,332 +0,0 @@ -# Encoding - -> Binary and string encoding used across all SimpleX protocols. - -**Source files**: [`Encoding.hs`](../src/Simplex/Messaging/Encoding.hs), [`Encoding/String.hs`](../src/Simplex/Messaging/Encoding/String.hs), [`Parsers.hs`](../src/Simplex/Messaging/Parsers.hs) - -## Overview - -Two encoding layers serve different purposes: - -- **`Encoding`** — Binary wire format for SMP protocol transmissions. Compact, no delimiters between fields. Used in all on-the-wire protocol messages. -- **`StrEncoding`** — Human-readable string format for configuration, URIs, logs, and JSON serialization. Uses base64url for binary data, decimal for numbers, comma-separated lists, space-separated tuples. - -Both are typeclasses with `MINIMAL` pragmas requiring `encode` + (`decode` | `parser`), with the missing one derived from the other. 
- -## Binary Encoding (`Encoding` class) - -```haskell -class Encoding a where - smpEncode :: a -> ByteString - smpDecode :: ByteString -> Either String a -- default: parseAll smpP - smpP :: Parser a -- default: smpDecode <$?> smpP -``` - -### Length-prefix conventions - -| Type | Prefix | Max size | -|------|--------|----------| -| `ByteString` | 1-byte length (Word8 as Char) | 255 bytes | -| `Large` (newtype) | 2-byte length (Word16 big-endian) | 65535 bytes | -| `Tail` (newtype) | None — consumes rest of input | Unlimited | -| Lists (`smpEncodeList`) | 1-byte count prefix, then concatenated items | 255 items | -| `NonEmpty` | Same as list (fails on count=0) | 255 items | - -### Scalar types - -| Type | Encoding | Bytes | -|------|----------|-------| -| `Char` | Raw byte | 1 | -| `Bool` | `'T'` / `'F'` (0x54 / 0x46) | 1 | -| `Word16` | Big-endian | 2 | -| `Word32` | Big-endian | 4 | -| `Int64` | Two big-endian Word32s (high then low) | 8 | -| `SystemTime` | `systemSeconds` as Int64 (nanoseconds dropped) | 8 | -| `Text` | UTF-8 then ByteString encoding (1-byte length prefix) | 1 + len | -| `String` | `B.pack` then ByteString encoding | 1 + len | - -### `Maybe a` - -``` -Nothing → '0' (0x30) -Just x → '1' (0x31) ++ smpEncode x -``` - -Tags are ASCII characters `'0'`/`'1'`, not binary 0x00/0x01. - -### Tuples - -Tuples (2 through 8) encode as simple concatenation — no length prefix, no separator. Fields are parsed sequentially using each component's `smpP`. This works because each component's parser knows how many bytes to consume (via its own length prefix or fixed size). 
- -### Combinators - -| Function | Signature | Purpose | -|----------|-----------|---------| -| `_smpP` | `Parser a` | Space-prefixed parser (`A.space *> smpP`) | -| `smpEncodeList` | `[a] -> ByteString` | 1-byte count + concatenated items | -| `smpListP` | `Parser [a]` | Parse count then that many items | -| `lenEncode` | `Int -> Char` | Int to single-byte length char | - -## String Encoding (`StrEncoding` class) - -```haskell -class StrEncoding a where - strEncode :: a -> ByteString - strDecode :: ByteString -> Either String a -- default: parseAll strP - strP :: Parser a -- default: strDecode <$?> base64urlP -``` - -Key difference from `Encoding`: the default `strP` parses base64url input first, then applies `strDecode`. This means types that only implement `strDecode` will automatically accept base64url-encoded input. - -### Instance conventions - -| Type | Encoding | -|------|----------| -| `ByteString` | base64url (non-empty required) | -| `Word16`, `Word32` | Decimal string | -| `Int`, `Int64` | Signed decimal | -| `Char`, `Bool` | Delegates to `Encoding` (`smpEncode`/`smpP`) | -| `Maybe a` | Empty string = `Nothing`, otherwise `strEncode a` | -| `Text` | UTF-8 bytes, parsed until space/newline | -| `SystemTime` | `systemSeconds` as Int64 (decimal) | -| `UTCTime` | ISO 8601 string | -| `CertificateChain` | Comma-separated base64url blobs | -| `Fingerprint` | base64url of fingerprint bytes | - -### Collection encoding - -| Type | Separator | -|------|-----------| -| Lists (`strEncodeList`) | Comma `,` | -| `NonEmpty` | Comma (fails on empty) | -| `Set a` | Comma | -| `IntSet` | Comma | -| Tuples (2-6) | Space (` `) | - -### `Str` newtype - -Raw string (not base64url-encoded). Parses until space, consumes trailing space. Used for string-valued protocol fields that should not be base64-encoded. 
- -### `TextEncoding` class - -```haskell -class TextEncoding a where - textEncode :: a -> Text - textDecode :: Text -> Maybe a -``` - -Separate from `StrEncoding` — operates on `Text` rather than `ByteString`. Used for types that need Text representation (e.g., enum display names). - -### JSON bridge functions - -| Function | Purpose | -|----------|---------| -| `strToJSON` | `StrEncoding a => a -> J.Value` via `decodeLatin1 . strEncode` | -| `strToJEncoding` | Same, for Aeson encoding | -| `strParseJSON` | `StrEncoding a => String -> J.Value -> JT.Parser a` — parse JSON string via `strP` | -| `textToJSON` | `TextEncoding a => a -> J.Value` | -| `textToEncoding` | Same, for Aeson encoding | -| `textParseJSON` | `TextEncoding a => String -> J.Value -> JT.Parser a` | - -## Parsers - -**Source**: [`Parsers.hs`](../src/Simplex/Messaging/Parsers.hs) - -### Core parsing functions - -| Function | Signature | Purpose | -|----------|-----------|---------| -| `parseAll` | `Parser a -> ByteString -> Either String a` | Parse consuming all input (fails if bytes remain) | -| `parse` | `Parser a -> e -> ByteString -> Either e a` | `parseAll` with custom error type (discards error string) | -| `parseE` | `(String -> e) -> Parser a -> ByteString -> ExceptT e IO a` | `parseAll` lifted into `ExceptT` | -| `parseE'` | `(String -> e) -> Parser a -> ByteString -> ExceptT e IO a` | Like `parseE` but allows trailing input | -| `parseRead1` | `Read a => Parser a` | Parse a word then `readMaybe` it | -| `parseString` | `(ByteString -> Either String a) -> String -> a` | Parse from `String` (errors with `error`) | - -### `base64P` - -Standard base64 parser (not base64url — uses `+`/`/` alphabet). Takes alphanumeric + `+`/`/` characters, optional `=` padding, then decodes. Contrast with `base64urlP` in `Encoding/String.hs` which uses `-`/`_` alphabet. - -### JSON options helpers - -Platform-conditional JSON encoding for cross-platform compatibility (Haskell ↔ Swift). 
- -| Function | Purpose | -|----------|---------| -| `enumJSON` | All-nullary constructors as strings, with tag modifier | -| `sumTypeJSON` | Platform-conditional: `taggedObjectJSON` on non-Darwin, `singleFieldJSON` on Darwin | -| `taggedObjectJSON` | `{"type": "Tag", "data": {...}}` format | -| `singleFieldJSON` | `{"Tag": value}` format | -| `defaultJSON` | Default options with `omitNothingFields = True` | - -Pattern synonyms for JSON field names: -- `TaggedObjectJSONTag = "type"` -- `TaggedObjectJSONData = "data"` -- `SingleFieldJSONTag = "_owsf"` - -### String helpers - -| Function | Purpose | -|----------|---------| -| `fstToLower` | Lowercase first character | -| `dropPrefix` | Remove prefix string, lowercase remainder | -| `textP` | Parse rest of input as UTF-8 `String` | - -## Auxiliary Types and Utilities - -### TMap - -**Source**: [`TMap.hs`](../src/Simplex/Messaging/TMap.hs) - -```haskell -type TMap k a = TVar (Map k a) -``` - -STM-based concurrent map. Wraps `Data.Map.Strict` in a `TVar`. All mutations use `modifyTVar'` (strict) to prevent thunk accumulation. - -| Function | Notes | -|----------|-------| -| `emptyIO` | IO allocation (`newTVarIO`) | -| `singleton` | STM allocation | -| `clear` | Reset to empty | -| `lookup` / `lookupIO` | STM / non-transactional IO read | -| `member` / `memberIO` | STM / non-transactional IO membership | -| `insert` / `insertM` | Insert value / insert from STM action | -| `delete` | Remove key | -| `lookupInsert` | Atomic lookup-then-insert (returns old value) | -| `lookupDelete` | Atomic lookup-then-delete | -| `adjust` / `update` / `alter` / `alterF` | Standard Map operations lifted to STM | -| `union` | Merge `Map` into `TMap` | - -`lookupIO`/`memberIO` use `readTVarIO` — single-read outside STM transaction, useful when you need a snapshot without composing with other STM operations. 
- -### SessionVar - -**Source**: [`Session.hs`](../src/Simplex/Messaging/Session.hs) - -Race-safe session management using TMVar + monotonic ID. - -```haskell -data SessionVar a = SessionVar - { sessionVar :: TMVar a -- result slot - , sessionVarId :: Int -- monotonic ID from TVar counter - , sessionVarTs :: UTCTime -- creation timestamp - } -``` - -| Function | Purpose | -|----------|---------| -| `getSessVar` | Lookup or create session. Returns `Left new` or `Right existing` | -| `removeSessVar` | Delete session only if ID matches (prevents removing a replacement) | -| `tryReadSessVar` | Non-blocking read of session result | - -The ID-match check in `removeSessVar` prevents a race where: -1. Thread A creates session #5, starts work -2. Thread B creates session #6 (replacing #5 in TMap) -3. Thread A finishes, tries to remove — ID mismatch, removal blocked - -### ServiceScheme - -**Source**: [`ServiceScheme.hs`](../src/Simplex/Messaging/ServiceScheme.hs) - -```haskell -data ServiceScheme = SSSimplex | SSAppServer SrvLoc -data SrvLoc = SrvLoc HostName ServiceName -``` - -URI scheme for SimpleX service addresses. `SSSimplex` encodes as `"simplex:"`, `SSAppServer` as `"https://host:port"`. - -`simplexChat` is the constant `SSAppServer (SrvLoc "simplex.chat" "")`. - -### SystemTime - -**Source**: [`SystemTime.hs`](../src/Simplex/Messaging/SystemTime.hs) - -```haskell -newtype RoundedSystemTime (t :: Nat) = RoundedSystemTime { roundedSeconds :: Int64 } -type SystemDate = RoundedSystemTime 86400 -- day precision -type SystemSeconds = RoundedSystemTime 1 -- second precision -``` - -Phantom-typed time rounding. The `Nat` type parameter specifies rounding granularity in seconds. 
- -| Function | Purpose | -|----------|---------| -| `getRoundedSystemTime` | Get current time rounded to `t` seconds | -| `getSystemDate` | Alias for day-rounded time | -| `getSystemSeconds` | Second-precision (no rounding needed, just drops nanoseconds) | -| `roundedToUTCTime` | Convert back to `UTCTime` | - -`RoundedSystemTime` derives `FromField`/`ToField` for SQLite storage and `FromJSON`/`ToJSON` for API serialization. - -### Util - -**Source**: [`Util.hs`](../src/Simplex/Messaging/Util.hs) - -Selected utilities used across the codebase: - -**Monadic combinators**: - -| Function | Signature | Purpose | -|----------|-----------|---------| -| `<$?>` | `MonadFail m => (a -> Either String b) -> m a -> m b` | Lift fallible function into parser | -| `$>>=` | `(Monad m, Monad f, Traversable f) => m (f a) -> (a -> m (f b)) -> m (f b)` | Monadic bind through nested monad | -| `ifM` / `whenM` / `unlessM` | Monadic conditionals | | -| `anyM` | Short-circuit `any` for monadic predicates (strict) | | - -**Error handling**: - -| Function | Purpose | -|----------|---------| -| `tryAllErrors` | Catch all exceptions (including async) into `ExceptT` | -| `catchAllErrors` | Same with handler | -| `tryAllOwnErrors` | Catch only "own" exceptions (re-throws async cancellation) | -| `catchAllOwnErrors` | Same with handler | -| `isOwnException` | `StackOverflow`, `HeapOverflow`, `AllocationLimitExceeded` | -| `isAsyncCancellation` | Any `SomeAsyncException` except own exceptions | -| `catchThrow` | Catch exceptions, wrap in Left | -| `allFinally` | `tryAllErrors` + `final` + `except` (like `finally` for ExceptT) | - -The own-vs-async distinction is critical: `catchOwn`/`tryAllOwnErrors` never swallow async cancellation (`ThreadKilled`, `UserInterrupt`, etc.), only synchronous exceptions and resource exhaustion (`StackOverflow`, `HeapOverflow`, `AllocationLimitExceeded`). 
- -**STM**: - -| Function | Purpose | -|----------|---------| -| `tryWriteTBQueue` | Non-blocking bounded queue write, returns success | - -**Database result helpers**: - -| Function | Purpose | -|----------|---------| -| `firstRow` | Extract first row with transform, or Left error | -| `maybeFirstRow` | Extract first row as Maybe | -| `firstRow'` | Like `firstRow` but transform can also fail | - -**Collection utilities**: - -| Function | Purpose | -|----------|---------| -| `groupOn` | `groupBy` using equality on projected key | -| `groupAllOn` | `groupOn` after `sortOn` (groups non-adjacent elements) | -| `toChunks` | Split list into `NonEmpty` chunks of size n | -| `packZipWith` | Optimized ByteString zipWith (direct memory access) | - -**Miscellaneous**: - -| Function | Purpose | -|----------|---------| -| `safeDecodeUtf8` | Decode UTF-8 replacing errors with `'?'` | -| `bshow` / `tshow` | `show` to `ByteString` / `Text` | -| `threadDelay'` | `Int64` delay (handles overflow by looping) | -| `diffToMicroseconds` / `diffToMilliseconds` | `NominalDiffTime` conversion | -| `labelMyThread` | Label current thread for debugging | -| `encodeJSON` / `decodeJSON` | `ToJSON a => a -> Text` / `FromJSON a => Text -> Maybe a` | -| `traverseWithKey_` | `Map` traversal discarding results | - -## Security notes - -- **Length prefix overflow**: `ByteString` encoding uses 1-byte length — silently truncates strings > 255 bytes. Callers must ensure size bounds before encoding. `Large` extends to 65535 bytes via Word16 prefix. -- **`Tail` unbounded**: `Tail` consumes all remaining input with no size check. Only safe when total message size is already bounded (e.g., within a padded SMP block). -- **base64 vs base64url**: `Parsers.base64P` uses standard alphabet (`+`/`/`), while `String.base64urlP` uses URL-safe alphabet (`-`/`_`). Mixing them causes silent decode failures. -- **`safeDecodeUtf8`**: Replaces invalid UTF-8 with `'?'` rather than failing. 
Suitable for logging/display, not for security-critical string comparison. diff --git a/spec/ntf-protocol.md b/spec/ntf-protocol.md deleted file mode 100644 index c826e7e72..000000000 --- a/spec/ntf-protocol.md +++ /dev/null @@ -1,15 +0,0 @@ -# NTF Protocol Implementation - -> Implements NTF commands, token registration, and subscription lifecycle for push notifications. - -**Protocol reference**: [`protocol/push-notifications.md`](../protocol/push-notifications.md) - -## Types - -## Commands - -## Token Lifecycle - -## Subscription Lifecycle - -## Functions diff --git a/spec/ntf-server.md b/spec/ntf-server.md deleted file mode 100644 index 4a39957e3..000000000 --- a/spec/ntf-server.md +++ /dev/null @@ -1,11 +0,0 @@ -# Notification Server - -> Notification server implementation: token management, subscriptions, and APNS integration. - -## Token Management - -## Subscription Management - -## APNS Integration - -## Functions diff --git a/spec/remote-control.md b/spec/remote-control.md deleted file mode 100644 index 5a064437c..000000000 --- a/spec/remote-control.md +++ /dev/null @@ -1,11 +0,0 @@ -# Remote Control (XRCP) - -> XRCP implementation: discovery, invitation, and session management. - -## Discovery - -## Invitation - -## Session Management - -## Functions diff --git a/spec/smp-client.md b/spec/smp-client.md deleted file mode 100644 index 39ae87f9a..000000000 --- a/spec/smp-client.md +++ /dev/null @@ -1,11 +0,0 @@ -# SMP Client - -> SMP client implementation: protocol operations, proxy relay, and reconnection logic. - -## Protocol Operations - -## Proxy Relay - -## Reconnection - -## Functions diff --git a/spec/smp-protocol.md b/spec/smp-protocol.md deleted file mode 100644 index 0def97941..000000000 --- a/spec/smp-protocol.md +++ /dev/null @@ -1,13 +0,0 @@ -# SMP Protocol Implementation - -> Implements SMP commands, types, and binary encoding for the SimpleX Messaging Protocol. 
- -**Protocol reference**: [`protocol/simplex-messaging.md`](../protocol/simplex-messaging.md) - -## Types - -## Commands - -## Encoding - -## Functions diff --git a/spec/smp-server.md b/spec/smp-server.md deleted file mode 100644 index 696d19067..000000000 --- a/spec/smp-server.md +++ /dev/null @@ -1,13 +0,0 @@ -# SMP Server - -> SMP server implementation: connection handling, queue operations, proxying, and control port. - -## Connection Handling - -## Queue Operations - -## Proxying - -## Control - -## Functions diff --git a/spec/storage-agent.md b/spec/storage-agent.md deleted file mode 100644 index 4ba4c414c..000000000 --- a/spec/storage-agent.md +++ /dev/null @@ -1,11 +0,0 @@ -# Agent Storage - -> Agent storage backends: SQLite, Postgres, and migration framework. - -## SQLite Backend - -## Postgres Backend - -## Migration Framework - -## Functions diff --git a/spec/storage-server.md b/spec/storage-server.md deleted file mode 100644 index b2dec1842..000000000 --- a/spec/storage-server.md +++ /dev/null @@ -1,9 +0,0 @@ -# Server Storage - -> Server storage backends: STM queues and message stores (STM, Journal, Postgres). - -## STM Queues - -## Message Stores (STM, Journal, Postgres) - -## Functions diff --git a/spec/transport-http2.md b/spec/transport-http2.md deleted file mode 100644 index 2594b8431..000000000 --- a/spec/transport-http2.md +++ /dev/null @@ -1,13 +0,0 @@ -# HTTP/2 Transport - -> HTTP/2 framing, client and server sessions, and file streaming for XFTP. - -## Framing - -## Client Sessions - -## Server Sessions - -## File Streaming - -## Functions diff --git a/spec/transport-websocket.md b/spec/transport-websocket.md deleted file mode 100644 index 182a43c47..000000000 --- a/spec/transport-websocket.md +++ /dev/null @@ -1,7 +0,0 @@ -# WebSocket Transport - -> WebSocket adapter for browser-based SimpleX clients. 
- -## Adapter - -## Functions diff --git a/spec/transport.md b/spec/transport.md deleted file mode 100644 index 0e50a67d9..000000000 --- a/spec/transport.md +++ /dev/null @@ -1,11 +0,0 @@ -# Transport Layer - -> Transport abstraction, handshake protocol, and block padding for metadata privacy. - -## Abstraction - -## Handshake Protocol - -## Block Padding - -## Functions diff --git a/spec/version.md b/spec/version.md deleted file mode 100644 index 19ad786fe..000000000 --- a/spec/version.md +++ /dev/null @@ -1,177 +0,0 @@ -# Version Negotiation - -> Version ranges and compatibility checking for protocol evolution. - -**Source files**: [`Version.hs`](../src/Simplex/Messaging/Version.hs), [`Version/Internal.hs`](../src/Simplex/Messaging/Version/Internal.hs) - -## Overview - -All SimpleX protocols use version negotiation during handshake. Each party advertises a `VersionRange` (min..max supported), and negotiation produces a `Compatible` proof value if the ranges overlap — choosing the highest mutually-supported version. - -The `Compatible` newtype can only be constructed internally (constructor is not exported), so the type system enforces that compatibility was actually checked. - -## Types - -### `Version v` - -```haskell -newtype Version v = Version Word16 -``` - -Phantom-typed version number. The phantom `v` distinguishes version spaces (e.g., SMP versions vs Agent versions vs XFTP versions) at the type level, preventing accidental comparison across protocols. - -- `Encoding`: 2 bytes big-endian (via Word16 instance) -- `StrEncoding`: decimal string -- JSON: numeric value -- Derives: `Eq`, `Ord`, `Show` - -The constructor is exported from `Version.Internal` but not from `Version`, so application code cannot fabricate versions — they must come from protocol constants or parsing. 
- -### `VersionRange v` - -```haskell -data VersionRange v = VRange - { minVersion :: Version v - , maxVersion :: Version v - } -``` - -Invariant: `minVersion <= maxVersion` (enforced by smart constructors). - -The `VRange` constructor is not exported — only the pattern synonym `VersionRange` (read-only) is public. - -- `Encoding`: two Word16s concatenated (4 bytes total) -- `StrEncoding`: `"min-max"` or `"v"` if min == max -- JSON: `{"minVersion": n, "maxVersion": n}` - -### `VersionScope v` - -```haskell -class VersionScope v -``` - -Empty typeclass used as a constraint on version operations. Each protocol declares its version scope: - -```haskell -instance VersionScope SMP -instance VersionScope Agent -``` - -This prevents accidentally mixing version ranges from different protocols in negotiation functions. - -### `Compatible a` - -```haskell -newtype Compatible a = Compatible_ a - -pattern Compatible :: a -> Compatible a -pattern Compatible a <- Compatible_ a -``` - -Proof that compatibility was checked. The `Compatible_` constructor is not exported — `Compatible` is a read-only pattern synonym. The only way to obtain a `Compatible` value is through `compatibleVersion`, `compatibleVRange`, `proveCompatible`, or the internal `mkCompatibleIf`. - -### `VersionI` / `VersionRangeI` type classes - -Multi-param typeclasses with functional dependencies for generic version/range operations. 
Allow extension types that wrap `Version` or `VersionRange` to participate in negotiation: - -```haskell -class VersionScope v => VersionI v a | a -> v where - type VersionRangeT v a -- associated type: range form - version :: a -> Version v - toVersionRangeT :: a -> VersionRange v -> VersionRangeT v a - -class VersionScope v => VersionRangeI v a | a -> v where - type VersionT v a -- associated type: version form - versionRange :: a -> VersionRange v - toVersionRange :: a -> VersionRange v -> a - toVersionT :: a -> Version v -> VersionT v a -``` - -Identity instances exist for `Version v` and `VersionRange v` themselves. - -## Functions - -### Construction - -| Function | Signature | Purpose | -|----------|-----------|---------| -| `mkVersionRange` | `Version v -> Version v -> VersionRange v` | Construct range, `error` if min > max | -| `safeVersionRange` | `Version v -> Version v -> Maybe (VersionRange v)` | Safe construction, `Nothing` if invalid | -| `versionToRange` | `Version v -> VersionRange v` | Singleton range (min == max) | - -### Compatibility checking - -### isCompatible - -**Purpose**: Check if a single version falls within a range. - -```haskell -isCompatible :: VersionI v a => a -> VersionRange v -> Bool -``` - -### isCompatibleRange - -**Purpose**: Check if two version ranges overlap: `min1 <= max2 && min2 <= max1`. - -```haskell -isCompatibleRange :: VersionRangeI v a => a -> VersionRange v -> Bool -``` - -### proveCompatible - -**Purpose**: If version is compatible, wrap in `Compatible` proof. Returns `Nothing` if out of range. - -```haskell -proveCompatible :: VersionI v a => a -> VersionRange v -> Maybe (Compatible a) -``` - -### Negotiation - -### compatibleVersion - -**Purpose**: Negotiate a single version from two ranges. Returns `min(max1, max2)` — the highest mutually-supported version. Returns `Nothing` if ranges don't overlap. 
- -```haskell -compatibleVersion :: VersionRangeI v a => a -> VersionRange v -> Maybe (Compatible (VersionT v a)) -``` - -### compatibleVRange - -**Purpose**: Compute the intersection of two version ranges: `(max(min1,min2), min(max1,max2))`. Returns `Nothing` if the intersection is empty. - -```haskell -compatibleVRange :: VersionRangeI v a => a -> VersionRange v -> Maybe (Compatible a) -``` - -### compatibleVRange' - -**Purpose**: Cap a version range's maximum at a given version. Returns `Nothing` if the cap is below the range's minimum. - -```haskell -compatibleVRange' :: VersionRangeI v a => a -> Version v -> Maybe (Compatible a) -``` - -## Protocol version constants - -Version constants for each protocol are defined in their respective Transport modules. For SMP, key gates include: - -- `currentSMPAgentVersion`, `supportedSMPAgentVRange` — current negotiation range -- `serviceCertsSMPVersion = 16` — service certificate handshake -- `rcvServiceSMPVersion = 19` — service subscription commands - -See [`transport.md`](transport.md) and [`rcv-services.md`](rcv-services.md) for protocol-specific version constants. - -## Negotiation protocol - -During handshake: -1. Client sends its `VersionRange` to server -2. Server computes `compatibleVRange clientRange serverRange` -3. If `Nothing` → reject connection (incompatible) -4. If `Just (Compatible agreedRange)` → use `maxVersion agreedRange` as the effective protocol version - -The `Compatible` proof flows through the connection setup, ensuring all subsequent version-gated code paths have evidence that negotiation occurred. - -## Security notes - -- **No downgrade attack protection in negotiation itself** — an active MITM could modify the version range to force a lower version. Protection comes from the TLS layer (authentication prevents MITM) and from servers setting minimum version floors. -- **`mkVersionRange` uses `error`** — only safe for compile-time constants. Runtime construction must use `safeVersionRange`. 
diff --git a/spec/xftp-client.md b/spec/xftp-client.md deleted file mode 100644 index 99306bb73..000000000 --- a/spec/xftp-client.md +++ /dev/null @@ -1,11 +0,0 @@ -# XFTP Client - -> XFTP client implementation: file operations, CLI interface, and agent integration. - -## File Operations - -## CLI - -## Agent - -## Functions diff --git a/spec/xftp-protocol.md b/spec/xftp-protocol.md deleted file mode 100644 index 26eb950be..000000000 --- a/spec/xftp-protocol.md +++ /dev/null @@ -1,13 +0,0 @@ -# XFTP Protocol Implementation - -> Implements XFTP commands, types, and chunk operations for the SimpleX File Transfer Protocol. - -**Protocol reference**: [`protocol/xftp.md`](../protocol/xftp.md) - -## Types - -## Commands - -## Chunk Operations - -## Functions diff --git a/spec/xftp-server.md b/spec/xftp-server.md deleted file mode 100644 index bdcbbb9aa..000000000 --- a/spec/xftp-server.md +++ /dev/null @@ -1,11 +0,0 @@ -# XFTP Server - -> XFTP server implementation: chunk storage, recipient management, and control port. - -## Chunk Storage - -## Recipient Management - -## Control - -## Functions diff --git a/spec/xrcp-protocol.md b/spec/xrcp-protocol.md deleted file mode 100644 index 8f084f7ca..000000000 --- a/spec/xrcp-protocol.md +++ /dev/null @@ -1,13 +0,0 @@ -# XRCP Protocol Implementation - -> Implements XRCP session handshake and commands for remote control of SimpleX clients. 
- -**Protocol reference**: [`protocol/xrcp.md`](../protocol/xrcp.md) - -## Types - -## Session Handshake - -## Commands - -## Functions From bb72e1a97af98052e29a7655ea9c82df0ddadf2a Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Sun, 15 Mar 2026 12:37:33 +0000 Subject: [PATCH 57/61] update --- spec/agent/infrastructure.md | 15 +++++++++++++++ spec/topics/patterns.md | 20 ++++++++++++++++++++ spec/topics/transport.md | 26 ++++++++++++++++++++++++++ 3 files changed, 61 insertions(+) diff --git a/spec/agent/infrastructure.md b/spec/agent/infrastructure.md index 5e3c828e6..784608b62 100644 --- a/spec/agent/infrastructure.md +++ b/spec/agent/infrastructure.md @@ -194,3 +194,18 @@ The agent supports SQLite and PostgreSQL via CPP compilation flags (`#if defined **Store access bracketing**: `withStore` wraps all database operations with `agentOperationBracket AODatabase`, connecting the store to the suspension cascade. `withStoreBatch` / `withStoreBatch'` run multiple operations in a single transaction with per-operation error catching. **Known bug**: `checkConfirmedSndQueueExists_` uses `#if defined(dpPostgres)` (typo - should be `dbPostgres`), so the `FOR UPDATE` clause is never included on either backend. + +### Migration framework + +**Source**: [Agent/Store/Migrations.hs](../../src/Simplex/Messaging/Agent/Store/Migrations.hs), [Agent/Store/Shared.hs](../../src/Simplex/Messaging/Agent/Store/Shared.hs) + +Migrations are Haskell modules under `Agent/Store/SQLite/Migrations/` and `Agent/Store/Postgres/Migrations/`. Each has `up` SQL and optional `down` SQL. + +**Key behaviors**: + +- `migrationsToRun` compares app migrations against the `migrations` table by name. Divergent histories (app has `[a,b]`, DB has `[a,c]`) produce `MTREDifferent` error - manual intervention required. +- Each migration runs in its own transaction with the `migrations` insert *before* the schema change - failure rolls back both. 
+- Downgrades require all intermediate migrations to have `down` SQL; missing any produces `MTRENoDown`. +- `MigrationConfirmation` controls whether upgrades/downgrades auto-apply, prompt, or error. + +**Special case**: `m20220811_onion_hosts` triggers `updateServers` to expand host entries with Tor addresses - this is data migration, not just schema. diff --git a/spec/topics/patterns.md b/spec/topics/patterns.md index a5d61500e..09bd9a7e1 100644 --- a/spec/topics/patterns.md +++ b/spec/topics/patterns.md @@ -10,6 +10,8 @@ For protocol-specific encoding details, see [transport.md](transport.md). For cr - [Compression](#compression) - [Concurrent data structures](#concurrent-data-structures) - [Batch processing](#batch-processing) +- [Time encoding](#time-encoding) +- [Utilities](#utilities) --- @@ -335,3 +337,21 @@ withStoreBatch' :: Traversable t ``` Use when operations cannot fail (or failures should become `INTERNAL` errors). + +--- + +## Time encoding + +**Source**: [SystemTime.hs](../../src/Simplex/Messaging/SystemTime.hs) + +`RoundedSystemTime t` uses a phantom type-level `Nat` for precision. `SystemDate` (precision 86400) provides k-anonymity for file creation times - all timestamps within a day collapse to the same value, preventing correlation attacks. + +--- + +## Utilities + +**Source**: [Util.hs](../../src/Simplex/Messaging/Util.hs) + +**Functor combinators**: `<$$>` (double fmap), `<$$` (double fmap const), and `<$?>` (fmap with `MonadFail` on `Left`) are used throughout for nested functor manipulation and fallible parsing chains. + +**`threadDelay'`**: Handles `Int64` delays that exceed `maxBound::Int` by looping with `maxBound`-sized chunks. 
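The chunked-delay idea described for `threadDelay'` might look roughly like the sketch below. It is not the code from `Util.hs`: `delayChunks` and `longDelay` are hypothetical names, and the chunk limit is taken as a parameter only so that the splitting can be checked with small numbers.

```haskell
import Control.Concurrent (threadDelay)
import Data.Int (Int64)

-- Split a delay into chunks no larger than `limit` (assumed helper).
delayChunks :: Int64 -> Int64 -> [Int64]
delayChunks limit total
  | limit <= 0 || total <= 0 = []
  | total > limit = limit : delayChunks limit (total - limit)
  | otherwise = [total]

-- Sleep for a number of microseconds that may exceed maxBound :: Int.
-- Each chunk is at most maxBound :: Int, so the conversion cannot overflow.
longDelay :: Int64 -> IO ()
longDelay = mapM_ (threadDelay . fromIntegral) . delayChunks maxInt
  where
    maxInt = fromIntegral (maxBound :: Int)
```

On platforms where `Int` is 64 bits wide the list has at most one element, so the loop only matters where `Int` is narrower than `Int64`.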
diff --git a/spec/topics/transport.md b/spec/topics/transport.md index 2cce46f77..a9ce85f1b 100644 --- a/spec/topics/transport.md +++ b/spec/topics/transport.md @@ -10,6 +10,8 @@ For service certificate handshake extensions, see [client-services.md](client-se - [Transmission encoding and signing](#transmission-encoding-and-signing) - [Version negotiation](#version-negotiation) - [Connection management](#connection-management) +- [HTTP/2 sessions](#http2-sessions) +- [WebSocket adapter](#websocket-adapter) --- @@ -321,3 +323,27 @@ All four threads run inside `raceAny_` with `E.finally disconnected`. When any t 2. The agent callback demotes subscriptions, fires DOWN events, and initiates resubscription The `connected` TVar is set to `True` after the handshake succeeds and before the threads start. Note: in the protocol client, this TVar is not reset on disconnect - disconnect detection relies on thread cancellation via `raceAny_` and the `disconnected` callback, not STM re-evaluation. (The server-side `Client` type has a separate `connected` TVar that is reset in `clientDisconnected`.) + +--- + +## HTTP/2 sessions + +**Source**: [Transport/HTTP2/Client.hs](../../src/Simplex/Messaging/Transport/HTTP2/Client.hs), [Transport/HTTP2/Server.hs](../../src/Simplex/Messaging/Transport/HTTP2/Server.hs) + +HTTP/2 is used for XFTP file transfers and notifications to push providers (APNs). + +**Why the request queue**: `sendRequest` serializes requests through a `TBQueue` because the underlying http2 library is not thread-safe for concurrent stream creation. `sendRequestDirect` exists but is explicitly marked unsafe. + +**Inactivity expiration**: Server connections track `activeAt` and are closed by a background thread when idle beyond `checkInterval`. This is necessary because HTTP/2 has no application-level keepalive - abandoned connections would otherwise persist indefinitely. 
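The request-queue pattern described above can be illustrated with a small model. This sketch makes several assumptions: it uses an unbounded `Chan` from `base` instead of a bounded `TBQueue`, the names `RequestQueue`, `startWorker` and `sendQueued` are hypothetical, and error handling is omitted (an exception in the handler would stop the worker and leave callers blocked).

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forever)

-- Each queued item pairs a request with the slot its response goes into.
newtype RequestQueue req resp = RequestQueue (Chan (req, MVar resp))

-- Start a single worker thread; only this thread ever calls the handler,
-- so a handler that is unsafe under concurrent use is never entered twice.
startWorker :: (req -> IO resp) -> IO (RequestQueue req resp)
startWorker handler = do
  q <- newChan
  _ <- forkIO . forever $ do
    (req, reply) <- readChan q
    handler req >>= putMVar reply
  pure (RequestQueue q)

-- Callers on any thread enqueue a request and block until its response.
sendQueued :: RequestQueue req resp -> req -> IO resp
sendQueued (RequestQueue q) req = do
  reply <- newEmptyMVar
  writeChan q (req, reply)
  takeMVar reply
```

A bounded queue, as in the described design, would additionally apply backpressure to callers when the worker falls behind.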
+ +--- + +## WebSocket adapter + +**Source**: [Transport/WebSockets.hs](../../src/Simplex/Messaging/Transport/WebSockets.hs) + +WebSocket wraps TLS for browser clients, implementing the `Transport` typeclass. + +**Strict size matching**: Unlike raw TLS where `cGet` may accumulate multiple reads, WebSocket `cGet` expects a single `receiveData` to return exactly the requested size. Mismatch throws `TEBadBlock` immediately - WebSocket messages are atomic, so partial reads indicate a protocol error. + +**No compression**: `connectionCompressionOptions = NoCompression` because the payload is already encrypted. Compressing ciphertext wastes CPU and leaks information about plaintext structure. From 8dfb59ba88dcc29c294d7f536eb440ed69928647 Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Sun, 15 Mar 2026 13:00:27 +0000 Subject: [PATCH 58/61] design notes for main spec files --- spec/agent.md | 6 ++++++ spec/clients.md | 12 +++++++++++- spec/routers.md | 14 ++++++++++++++ 3 files changed, 31 insertions(+), 1 deletion(-) diff --git a/spec/agent.md b/spec/agent.md index 02f93314b..af49ac63f 100644 --- a/spec/agent.md +++ b/spec/agent.md @@ -4,6 +4,12 @@ The SimpleX Agent is the Layer 3 connection manager. It builds duplex encrypted For usage and API overview, see [docs/AGENT.md](../docs/AGENT.md). For protocol specifications, see [Agent Protocol](../protocol/agent-protocol.md), [PQDR](../protocol/pqdr.md). +**Split-phase encryption**: Message sending separates ratchet advancement (API thread, serialized) from body encryption (delivery worker, parallel). This prevents ratchet lock contention across queues while maintaining correct message ordering. See [infrastructure.md](agent/infrastructure.md#message-delivery). + +**Worker taxonomy**: Three worker families handle background operations - delivery workers (per send queue), async command workers (per connection), and NTF workers (per server). 
All use the same create-or-reuse pattern with restart rate limiting. See [infrastructure.md](agent/infrastructure.md#worker-framework). + +**Suspension cascade**: Operations drain in dependency order: `AORcvNetwork` → `AOMsgDelivery` → `AOSndNetwork` → `AODatabase`. Suspending receive processing cascades through to database access, ensuring clean shutdown. See [infrastructure.md](agent/infrastructure.md#operation-suspension-cascade). + --- **Module specs**: [Agent](modules/Simplex/Messaging/Agent.md) · [Agent Client](modules/Simplex/Messaging/Agent/Client.md) · [Agent Protocol](modules/Simplex/Messaging/Agent/Protocol.md) · [Store Interface](modules/Simplex/Messaging/Agent/Store/Interface.md) · [NtfSubSupervisor](modules/Simplex/Messaging/Agent/NtfSubSupervisor.md) · [XFTP Agent](modules/Simplex/FileTransfer/Agent.md) · [Ratchet](modules/Simplex/Messaging/Crypto/Ratchet.md) diff --git a/spec/clients.md b/spec/clients.md index 3ab9d5868..acb513e9d 100644 --- a/spec/clients.md +++ b/spec/clients.md @@ -8,6 +8,10 @@ For deployment and usage, see [docs/CLIENT.md](../docs/CLIENT.md). For protocol ## SMP Client (ProtocolClient) +**Four threads**: Send and receive threads are separate to allow backpressure - a slow receiver doesn't block sending. The process thread decouples parsing from delivery, preventing a slow consumer from stalling the receive loop. The monitor thread provides application-level keepalive beyond TCP - detecting protocol-level stalls. See [transport.md](topics/transport.md#connection-management). + +**Correlation ID lifecycle**: IDs are generated before send and removed on response OR timeout. Removal on timeout prevents unbounded growth of `sentCommands` when the router is unresponsive. 
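A minimal sketch of the correlation ID lifecycle, under stated assumptions: the pending-command table is modelled as an `IORef` holding a `Map` rather than the client's actual `sentCommands` structure, correlation IDs are plain `Int`s, and `sendAwait` and `deliver` are hypothetical names.

```haskell
import Control.Concurrent.MVar (MVar, newEmptyMVar, takeMVar, tryPutMVar)
import Control.Exception (finally)
import Data.IORef (IORef, atomicModifyIORef', readIORef)
import qualified Data.Map.Strict as M
import System.Timeout (timeout)

type CorrId = Int

-- Pending commands: correlation ID -> slot awaiting the response.
type Pending resp = IORef (M.Map CorrId (MVar resp))

-- Register the ID before sending, wait up to `micros` microseconds, and
-- always remove the entry afterwards (response, timeout, or exception).
sendAwait :: Pending resp -> Int -> CorrId -> IO () -> IO (Maybe resp)
sendAwait pending micros corrId send = do
  slot <- newEmptyMVar
  atomicModifyIORef' pending (\m -> (M.insert corrId slot m, ()))
  (send >> timeout micros (takeMVar slot)) `finally` forget
  where
    forget = atomicModifyIORef' pending (\m -> (M.delete corrId m, ()))

-- Called by the receiving side; returns False for unknown or expired IDs.
deliver :: Pending resp -> CorrId -> resp -> IO Bool
deliver pending corrId resp = do
  m <- readIORef pending
  maybe (pure False) (`tryPutMVar` resp) (M.lookup corrId m)
```

Because removal runs in `finally`, the table stays bounded by the number of in-flight commands even if no response ever arrives.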
+ **Module specs**: [Client](modules/Simplex/Messaging/Client.md) · [Protocol](modules/Simplex/Messaging/Protocol.md) · [Transport](modules/Simplex/Messaging/Transport.md) · [Crypto](modules/Simplex/Messaging/Crypto.md) Generic protocol client used for both SMP and NTF connections. Manages a single TLS connection with multiplexed command/response matching via correlation IDs. @@ -59,6 +63,10 @@ sequenceDiagram ## SMPClientAgent +**Dual consumers**: Used by both SMP router (for proxy connections to relays) and NTF router (for NSUB subscriptions to SMP routers). Same connection pooling and reconnection logic, different command sets. + +**Session ID gating**: Subscription responses are validated against the current TLS session ID. A response from a stale session (connection dropped and reconnected between send and receive) is discarded rather than corrupting state. See [infrastructure.md](agent/infrastructure.md#subscription-tracking). + **Module specs**: [Client Agent](modules/Simplex/Messaging/Client/Agent.md) Connection manager that multiplexes multiple ProtocolClient connections. Tracks subscriptions, handles reconnection with backoff, and forwards server messages and connection events upward. Used by SMP router (proxying) and NTF router (subscriptions). @@ -114,9 +122,11 @@ sequenceDiagram ## XFTP Client +**No subscriptions**: File operations complete independently - no persistent server-side state to track. This allows XFTPClient to be a thin wrapper with no threads of its own. + **Module specs**: [Client](modules/Simplex/FileTransfer/Client.md) · [Protocol](modules/Simplex/FileTransfer/Protocol.md) · [HTTP/2 Client](modules/Simplex/Messaging/Transport/HTTP2/Client.md) -Stateless wrapper around HTTP2Client. XFTPClient adds no threads of its own; each operation is a synchronous HTTP/2 request/response. Serialization and multiplexing happen inside HTTP2Client's internal request queue and process thread. +Stateless wrapper around HTTP2Client. 
XFTPClient adds no threads of its own. Serialization and multiplexing happen inside HTTP2Client's internal request queue and process thread. ### XFTP Client components diff --git a/spec/routers.md b/spec/routers.md index b7b8761ef..d141afaf3 100644 --- a/spec/routers.md +++ b/spec/routers.md @@ -8,6 +8,12 @@ For deployment and configuration, see [docs/ROUTERS.md](../docs/ROUTERS.md). For ## SMP Router +**Thread model**: Client handler threads process commands synchronously, but subscription registration goes through a separate `serverThread` via `subQ`. This split-STM pattern reduces contention - client handlers don't block on the shared `SubscribedClients` map. See [subscriptions.md](topics/subscriptions.md) for details. + +**Store separation**: `QueueStore` holds queue metadata and auth keys; `MsgStore` holds message bodies. Different durability tradeoffs - queue metadata needs consistency (Postgres option), message bodies optimize for throughput (STM/Journal options). + +**Proxy architecture**: The proxy router maintains an `SMPClientAgent` that pools connections to destination relays - one connection per relay server, shared across all proxy sessions to that relay. Each proxy session gets its own `SessionId` (from the relay's TLS session) and DH keys, but the underlying TCP connection is reused. The proxy is stateless for command forwarding - it doesn't subscribe to queues or maintain transaction state, just relays encrypted commands and responses. 
+ **Module specs**: [Server](modules/Simplex/Messaging/Server.md) · [Main](modules/Simplex/Messaging/Server/Main.md) · [QueueStore](modules/Simplex/Messaging/Server/QueueStore.md) · [QueueStore Postgres](modules/Simplex/Messaging/Server/QueueStore/Postgres.md) · [MsgStore](modules/Simplex/Messaging/Server/MsgStore.md) · [StoreLog](modules/Simplex/Messaging/Server/StoreLog.md) · [Control](modules/Simplex/Messaging/Server/Control.md) · [Prometheus](modules/Simplex/Messaging/Server/Prometheus.md) · [Stats](modules/Simplex/Messaging/Server/Stats.md) ### SMP Router components @@ -70,6 +76,10 @@ sequenceDiagram ## XFTP Router +**Stateless operations**: Unlike SMP, XFTP has no subscriptions or delivery threads. Each command completes independently. This simplifies scaling - no subscription state to synchronize across instances. + +**Quota reservation**: File size is reserved atomically on FNEW (before upload), released on deletion or expiration. This prevents overcommit - a client cannot upload more than they reserved. + **Module specs**: [Server](modules/Simplex/FileTransfer/Server.md) · [Main](modules/Simplex/FileTransfer/Server/Main.md) · [Store](modules/Simplex/FileTransfer/Server/Store.md) · [StoreLog](modules/Simplex/FileTransfer/Server/StoreLog.md) · [Stats](modules/Simplex/FileTransfer/Server/Stats.md) · [Transport](modules/Simplex/FileTransfer/Transport.md) ### XFTP Router components @@ -120,6 +130,10 @@ sequenceDiagram ## NTF Router +**Inverted role**: The NTF router is itself an SMP *client* - it maintains NSUB subscriptions to SMP routers, receiving NMSG events when messages arrive. It doesn't serve queues; it subscribes to them. + +**Token batching**: `tokenLastNtfs` aggregates notifications per token before push. Multiple queue notifications for the same device are combined into a single APNs payload, reducing push overhead. 
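The per-token aggregation described for token batching can be illustrated as a pure grouping step. This is a sketch under assumptions: `Token` and `Ntf` are hypothetical string aliases, `batchByToken` is not a function from the codebase, and any size limit or expiry that `tokenLastNtfs` may apply is not modelled.

```haskell
import qualified Data.Map.Strict as M

type Token = String -- assumed stand-in for a device token
type Ntf = String -- assumed stand-in for one queue notification

-- Group notifications by device token, preserving arrival order, so that
-- each token ends up with one combined batch instead of one push per queue.
batchByToken :: [(Token, Ntf)] -> M.Map Token [Ntf]
batchByToken = M.fromListWith (flip (++)) . map (fmap (: []))
```

Each map entry then corresponds to a single push payload for that device.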
+ **Module specs**: [Server](modules/Simplex/Messaging/Notifications/Server.md) · [Main](modules/Simplex/Messaging/Notifications/Server/Main.md) · [Store Postgres](modules/Simplex/Messaging/Notifications/Server/Store/Postgres.md) · [APNS](modules/Simplex/Messaging/Notifications/Server/Push/APNS.md) · [Control](modules/Simplex/Messaging/Notifications/Server/Control.md) · [Client](modules/Simplex/Messaging/Notifications/Client.md) · [Protocol](modules/Simplex/Messaging/Notifications/Protocol.md) ### NTF Router components From 486d3251fc1af531bd12899eb97c198c91ef52cd Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Sun, 15 Mar 2026 13:05:55 +0000 Subject: [PATCH 59/61] update readme --- spec/README.md | 26 ++++++++++++++++++++++---- 1 file changed, 22 insertions(+), 4 deletions(-) diff --git a/spec/README.md b/spec/README.md index c993f108d..54eda20d9 100644 --- a/spec/README.md +++ b/spec/README.md @@ -58,12 +58,30 @@ Module doc entry format: ## Index +### Architecture + +Component topology and message flow diagrams for each layer: + +- [routers.md](routers.md) — Layer 1: SMP, XFTP, NTF routers +- [clients.md](clients.md) — Layer 2: protocol client libraries +- [agent.md](agent.md) — Layer 3: connection manager + ### Topics -- [rcv-services.md](rcv-services.md) — Service certificates for high-volume SMP clients (bulk subscription) -- [encoding.md](encoding.md) — Binary and string encoding -- [version.md](version.md) — Version ranges and negotiation -- [compression.md](compression.md) — Zstd compression +Cross-cutting concerns that span multiple modules: + +- [topics/transport.md](topics/transport.md) — TLS, HTTP/2, WebSocket transport layers +- [topics/patterns.md](topics/patterns.md) — Exception handling, encoding, compression, TMap +- [topics/subscriptions.md](topics/subscriptions.md) — Queue subscriptions and delivery +- [topics/notifications.md](topics/notifications.md) — Push notification flow +- 
[topics/xftp.md](topics/xftp.md) — File transfer protocol +- [topics/client-services.md](topics/client-services.md) — Service certificates for bulk operations + +### Agent internals + +- [agent/infrastructure.md](agent/infrastructure.md) — Workers, store, operation suspension +- [agent/connections.md](agent/connections.md) — Connection lifecycle and states +- [agent/xrcp.md](agent/xrcp.md) — Remote control protocol ### Modules From ae57fef89851342e6f072c11bbeabc49b973719a Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Sun, 15 Mar 2026 16:57:42 +0000 Subject: [PATCH 60/61] encryption schemes --- spec/README.md | 1 + spec/topics/encryption.md | 139 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 140 insertions(+) create mode 100644 spec/topics/encryption.md diff --git a/spec/README.md b/spec/README.md index 54eda20d9..394eac772 100644 --- a/spec/README.md +++ b/spec/README.md @@ -76,6 +76,7 @@ Cross-cutting concerns that span multiple modules: - [topics/notifications.md](topics/notifications.md) — Push notification flow - [topics/xftp.md](topics/xftp.md) — File transfer protocol - [topics/client-services.md](topics/client-services.md) — Service certificates for bulk operations +- [topics/encryption.md](topics/encryption.md) — Encryption layers (TODO) ### Agent internals diff --git a/spec/topics/encryption.md b/spec/topics/encryption.md new file mode 100644 index 000000000..567b2c384 --- /dev/null +++ b/spec/topics/encryption.md @@ -0,0 +1,139 @@ +# Encryption + +TODO - subjects to cover: + +## TLS layer + +**Protocol**: [simplex-messaging.md#tls-transport-encryption](../../protocol/simplex-messaging.md#tls-transport-encryption) +**Code**: [Transport.hs](../../src/Simplex/Messaging/Transport.hs), [Transport/Credentials.hs](../../src/Simplex/Messaging/Transport/Credentials.hs) + +- **Cipher suites**: CHACHA20-POLY1305-SHA256 (TLS 1.3), ECDHE-ECDSA-CHACHA20-POLY1305 (TLS 1.2) +- **Signature 
algorithms**: Ed448, Ed25519 (HashIntrinsic) +- **DH groups**: X448, X25519 +- **Certificate fingerprints**: SHA256 +- **Browser-compatible extension**: RSA, ECDSA-SHA256/384/512, P521 for XFTP web + +## Transport block encryption (optional, v11+) + +**Protocol**: [simplex-messaging.md#transport-handshake](../../protocol/simplex-messaging.md#transport-handshake) +**Code**: [Crypto.hs#sbcInit](../../src/Simplex/Messaging/Crypto.hs), [Transport.hs#tPutBlock](../../src/Simplex/Messaging/Transport.hs) + +- **Algorithm**: XSalsa20-Poly1305 (NaCl secret_box) +- **Key derivation**: `sbcInit` - HKDF-SHA512(salt=sessionId, ikm=dhSecret, info="SimpleXSbChainInit") +- **Chain advancement**: `sbcHkdf` - HKDF-SHA512(salt="", ikm=chainKey, info="SimpleXSbChain") +- **16-byte auth tag** reduces available payload + +## SMP queue layer + +**Protocol**: [simplex-messaging.md#cryptographic-algorithms](../../protocol/simplex-messaging.md#cryptographic-algorithms), [simplex-messaging.md#deniable-client-authentication-scheme](../../protocol/simplex-messaging.md#deniable-client-authentication-scheme) +**RFC** (design rationale): [2026-03-09-deniability.md](../../rfcs/standard/2026-03-09-deniability.md) +**Code**: [Crypto.hs#cbEncrypt](../../src/Simplex/Messaging/Crypto.hs), [Protocol.hs](../../src/Simplex/Messaging/Protocol.hs) + +- **Message body encryption**: NaCl crypto_box (X25519 + XSalsa20-Poly1305) with per-queue DH secret +- **Recipient/notifier commands**: Ed25519/Ed448 signatures +- **Sender commands**: X25519 DH-based `CbAuthenticator` (80 bytes = SHA512 hash encrypted with crypto_box) - provides deniability +- **Nonce**: correlation ID (24 bytes) +- **Server-to-recipient encryption**: `encryptMsg` with XSalsa20-Poly1305, nonce derived from message ID + +## SMP proxy layer + +**Protocol**: [simplex-messaging.md#sending-messages-via-proxy-router](../../protocol/simplex-messaging.md#sending-messages-via-proxy-router) +**Code**: 
[Protocol.hs#PRXY](../../src/Simplex/Messaging/Protocol.hs), [Server.hs](../../src/Simplex/Messaging/Server.hs) + +- **Double encryption**: client encrypts for relay (s2r), proxy adds layer for relay (p2r) +- **Per-session X25519 keys**: PKEY response contains relay's DH key signed by relay certificate +- **Session ID binding**: `tlsunique` from proxy-relay TLS session included in encrypted transmission +- **PFWD/RFWD**: correlation ID (24 bytes) used as crypto_box nonce + +## Agent/E2E layer (double ratchet) + +**Protocol**: [pqdr.md](../../protocol/pqdr.md) +**RFC** (versioning/migration): [2026-03-09-pqdr-version.md](../../rfcs/standard/2026-03-09-pqdr-version.md) +**Code**: [Crypto/Ratchet.hs](../../src/Simplex/Messaging/Crypto/Ratchet.hs), [Crypto/SNTRUP761.hs](../../src/Simplex/Messaging/Crypto/SNTRUP761.hs) + +- **DH algorithm**: X448 (not X25519) - `RatchetX448` +- **Post-quantum KEM**: SNTRUP761, hybrid secret = SHA3-256(DHSecret || KEMSharedKey) +- **Key derivation**: HKDF-SHA512 with context strings +- **Header encryption**: AES-256-GCM with header key (HKs) +- **Body encryption**: AES-256-GCM with message key derived from chain key +- **Associated data**: ratchet AD concatenated with encrypted header +- **Split-phase**: header encryption (API thread, serialized) vs body encryption (delivery worker, parallel) + +## XFTP file layer + +**Protocol**: [xftp.md#cryptographic-algorithms](../../protocol/xftp.md#cryptographic-algorithms) +**Code**: [Crypto/File.hs](../../src/Simplex/Messaging/Crypto/File.hs), [Crypto/Lazy.hs](../../src/Simplex/Messaging/Crypto/Lazy.hs), [FileTransfer/Crypto.hs](../../src/Simplex/FileTransfer/Crypto.hs) + +- **File encryption**: XSalsa20-Poly1305 (NaCl secret_box), random 32-byte key + 24-byte nonce per file +- **File integrity**: SHA512 digest in FileDescription +- **Command signing**: Ed25519 per-chunk keys from FileDescription +- **Transit encryption**: per-download X25519 DH, server returns ephemeral key with FGET response +- 
**Streaming**: Poly1305 state updated per chunk, 16-byte auth tag at end (tail tag pattern) + +## NTF (notifications) + +**Protocol**: [push-notifications.md](../../protocol/push-notifications.md) +**Code**: [Notifications/Protocol.hs](../../src/Simplex/Messaging/Notifications/Protocol.hs), [Notifications/Transport.hs](../../src/Simplex/Messaging/Notifications/Transport.hs) + +- **E2E encryption**: NaCl crypto_box between router and client +- **Key exchange**: X25519 DH (clientDhPubKey in TNEW, routerDhPubKey in response) +- **Command auth**: Ed25519 + +## Short links + +**Protocol**: [agent-protocol.md#short-invitation-links](../../protocol/agent-protocol.md#short-invitation-links) +**Code**: [Crypto/ShortLink.hs](../../src/Simplex/Messaging/Crypto/ShortLink.hs) + +- **Link key derivation**: SHA3-256(fixedLinkData) +- **Data encryption**: NaCl secret_box (XSalsa20-Poly1305) with HKDF-derived key +- **Fixed/user data**: padded to fixed sizes (2008/13784 bytes) for traffic analysis resistance +- **Signatures**: Ed25519 for owner authentication + +## Remote control (XRCP) + +**Protocol**: [xrcp.md](../../protocol/xrcp.md) +**Code**: [RemoteControl/Client.hs](../../src/Simplex/RemoteControl/Client.hs), [Crypto/SNTRUP761.hs](../../src/Simplex/Messaging/Crypto/SNTRUP761.hs) + +- **Session key**: SHA3-256(dhSecret || kemSharedKey) - hybrid DH + SNTRUP761 KEM +- **Chain keys**: `sbcInit` with HKDF-SHA512, keys swapped between controller and host +- **Command signing**: Ed25519 session key + long-term key (dual signature) + +## Service certificates + +**Protocol**: [simplex-messaging.md#service-certificates](../../protocol/simplex-messaging.md#service-certificates) +**RFC** (design rationale): [2026-03-10-client-certificates.md](../../rfcs/standard/2026-03-10-client-certificates.md) +**Code**: [Agent/Client.hs#getServiceCredentials](../../src/Simplex/Messaging/Agent/Client.hs), [Transport/Credentials.hs](../../src/Simplex/Messaging/Transport/Credentials.hs) + +- 
**Certificate type**: X.509 with Ed25519 signing key +- **Per-session keys**: fresh Ed25519 key pair per connection, signed by X.509 key +- **Fingerprint**: SHA256 of identity certificate +- **Proof-of-possession**: session key signed by service certificate + +## Primitives reference + +**Code**: [Crypto.hs](../../src/Simplex/Messaging/Crypto.hs) + +- **NaCl crypto_box** (`cbEncrypt`/`cbDecrypt`): X25519 DH + XSalsa20-Poly1305 +- **NaCl crypto_secretbox** (`sbEncrypt`/`sbDecrypt`): symmetric XSalsa20-Poly1305 +- **AES-256-GCM** (`encryptAEAD`/`decryptAEAD`): for ratchet message bodies +- **SNTRUP761**: post-quantum KEM via C FFI bindings - [Crypto/SNTRUP761.hs](../../src/Simplex/Messaging/Crypto/SNTRUP761.hs) +- **CbAuthenticator**: 80-byte authenticator = crypto_box(SHA512(message)) +- **HKDF**: SHA512-based, used with various context strings +- **Hashes**: SHA256 (fingerprints), SHA512 (authenticators, HKDF), SHA3-256 (hybrid KEM, short links) + +## Padding + +**Code**: [Crypto.hs#pad](../../src/Simplex/Messaging/Crypto.hs) + +- **Message padding** (`pad`/`unPad`): 2-byte big-endian length prefix + '#' fill +- **Short link data**: fixed-size encrypted blobs +- **XFTP hello**: 16384 bytes (indistinguishable from commands) +- **Ratchet header**: padded before encryption to hide KEM state + +## Key type constraints + +**Code**: [Crypto.hs](../../src/Simplex/Messaging/Crypto.hs) + +- `SignatureAlgorithm`: Ed25519, Ed448 only +- `DhAlgorithm`: X25519, X448 only +- `AuthAlgorithm`: Ed25519, Ed448, X25519 (NOT X448) - for queue command auth From 152c30ca6881e3cee8689d13fbee31332b44d1a6 Mon Sep 17 00:00:00 2001 From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com> Date: Sun, 15 Mar 2026 18:19:06 +0000 Subject: [PATCH 61/61] rcv-services issues --- spec/rcv-services.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/spec/rcv-services.md b/spec/rcv-services.md index 6518059f2..0e98c0605 100644 --- a/spec/rcv-services.md +++ 
b/spec/rcv-services.md @@ -478,6 +478,8 @@ update (s, idsHash) = | **R-SVC-05** | Fold blocking | Low | `foldRcvServiceMessages` iterates all service queues sequentially, reading queue records and first messages. For services with many queues, this could take significant time. It runs in a forked thread, so it doesn't block the client's command processing, but the ALLS marker is delayed. No progress signal between SOKS and ALLS -- client doesn't know how many messages to expect. | | **R-SVC-06** | XOR hash collision | Very Low | IdsHash uses XOR of MD5 hashes. XOR is commutative and associative, so different queue sets with the same XOR-combined hash would not be detected. Given 16-byte hashes, collision probability is negligible for realistic queue counts, but the hash provides no ordering information. | | **R-SVC-07** | Count underflow in subtractServiceSubs | Very Low | If `n <= n'`, the function returns `(0, mempty)` -- a full reset. This is a defensive fallback but could mask accounting errors. | +| **R-SVC-08** | Big agent service handling diverged from small agent | Medium | Small agent (Client/Agent.hs, NTF-proven) has cleaner service unavailable handling: `notifyUnavailable` clears pending service sub and sends `CAServiceUnavailable` event, triggering queue-by-queue resubscription. Big agent (Agent/Client.hs) lacks equivalent path - errors throw without clearing pending state. TransportSessionMode adds complexity (per-entity vs per-user sessions). Service role validation differs (small agent checks `partyServiceRole`, big agent doesn't). These differences may cause subtle bugs when releasing rcv-services. | +| **R-SVC-09** | Server deferred delivery broken for service queues | Critical | In `tryDeliverMessage` (Server.hs), when a message arrives and the subscribed client's `sndQ` is full, the sync path correctly checks `rcvServiceId qr` to find the service subscriber (lines 1996-1998). 
But the spawned `deliverThread` (line 2043) hardcodes `getSubscribedClient rId (queueSubscribers subscribers)` - it looks in `queueSubscribers` instead of `serviceSubscribers`. For service-subscribed queues, `deliverThread` will never find the client. The message remains marked `SubPending` but is never delivered. Only reconnection or explicit re-subscription will deliver it. Impact: under load when sndQ fills, service clients silently lose message delivery until reconnection. | ### Considered and dismissed @@ -705,3 +707,7 @@ Triggers use `xor_combine` (Postgres equivalent of XOR hash combine) and fire on | **TG-SVC-10** | Medium | No agent-level test for concurrent reconnection — service resubscription racing with individual queue resubscription | | **TG-SVC-11** | Medium | No test for `SERVICE_END` agent event handling — what does the agent do after receiving ENDS? | | **TG-SVC-12** | Low | No test for SQLite trigger correctness — verifying `service_queue_count`/`service_queue_ids_hash` match expected values after insert/delete/update cycles | +| **TG-SVC-13** | High | Big agent lacks `CAServiceUnavailable` equivalent — no clean path to resubscribe all queues individually when service becomes unavailable. Small agent has `notifyUnavailable` which triggers queue-by-queue resubscription; big agent just throws error | +| **TG-SVC-14** | Medium | `pendingServiceSub` not cleared on service errors — small agent clears pending in `notifyUnavailable`; big agent may retain stale pending service subs after `clientServiceError` or `SSErrorServiceId` | +| **TG-SVC-15** | High | Missing `rcvServiceAssoc` cleanup on service unavailable — TODO at Agent/Client.hs:1742 notes this is incomplete. 
When service ID changes or becomes unavailable, queue associations should be cleared in database | +| **TG-SVC-16** | Critical | **Server bug**: `deliverThread` uses wrong subscriber lookup for service queues — At Server.hs:2043, deferred delivery (when sndQ is full) always uses `queueSubscribers`, but service clients are in `serviceSubscribers`. The sync path (lines 1996-1998) correctly checks `rcvServiceId qr`. Messages sent when sndQ is full will never be delivered to service subscribers until reconnection/resubscription. |