feat: multi-box deployment (wg-mesh + frigate-edge) #1

Merged: josibake merged 8 commits into main from feat/multi-box on May 19, 2026

@josibake
Member

Summary

  • New wireguard-mesh module (modules/wireguard-mesh.nix) — thin n-peer mesh wrapper around networking.wireguard.interfaces. Identical peers block on every member; only thisHost and privateKeyFile differ per node.
  • New frigate-edge preset (modules/presets/frigate-edge.nix) — TLS + ACME + frigate, with bitcoind/fulcrum/ZMQ on another host. USERPASS auth (cookie can't cross host boundaries). Does not import nix-bitcoin.
  • Refactor: shared TLS + ACME wiring extracted into private modules/_internal/frigate-tls-acme.nix. Both public-frigate and frigate-edge import it. No behavior change for existing public-frigate consumers.
  • New exposeBackends option on public-frigate — binds bitcoind RPC + ZMQ + fulcrum on a configurable mesh address in addition to loopback, with firewall rules scoped to a single interface.

Tests added

  • checks.<system>.wireguard-mesh — two-VM mesh ping + firewall scope.
  • checks.<system>.regtest-edge — two-VM end-to-end: backend runs full nix-bitcoin + public-frigate with exposeBackends on; edge runs frigate-edge against it; verifies the edge serves an Electrum response with chain tip via the remote fulcrum proxy.

Test plan

  • CI: `regtest-e2e`, `regtest-preset`, `regtest-edge`, `wireguard-mesh` all green
  • CI: `nix fmt --ci` green
  • Local eval-clean confirmed (`nix flake check --all-systems --no-build`)

🤖 Generated with Claude Code

josibake and others added 8 commits May 18, 2026 16:45
A thin wrapper around `networking.wireguard.interfaces` that takes a
mesh-wide `peers` definition (same on every host) and builds the
interface from it, dropping the entry that matches `thisHost`. Adding
a new node is a one-place edit to `peers` plus `thisHost` on the new
box.

Mesh is point-to-point with /32 peer allowedIPs — no subnets routed
through. Exposure of services on the mesh interface is the consumer's
concern (scope via `networking.firewall.interfaces.<iface>`).

Includes a two-VM nixosTest (`checks.<system>.wireguard-mesh`) that
brings up the mesh on a shared subnet, asserts cross-mesh reachability,
and confirms the firewall opens the WG UDP port.
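
The peer-filtering rule described above can be modelled in a few lines of Python. This is only a sketch of the logic, not the Nix module itself: the `Peer` fields, the `build_interface_peers` helper, and the keys and addresses in `MESH` are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peer:
    """One mesh member. Field names are illustrative, not the module's options."""
    public_key: str
    mesh_ip: str   # address on the WireGuard interface
    endpoint: str  # host:port reachable on the underlay network


def build_interface_peers(peers: dict, this_host: str) -> list:
    """Return the peer entries for one node: every mesh member except this_host."""
    if this_host not in peers:
        raise KeyError(f"thisHost {this_host!r} is not in the mesh definition")
    return [
        {
            "name": name,
            "publicKey": peer.public_key,
            "endpoint": peer.endpoint,
            # Point-to-point: only the peer's own address is routed to it.
            "allowedIPs": [f"{peer.mesh_ip}/32"],
        }
        for name, peer in sorted(peers.items())
        if name != this_host
    ]


# The same mesh definition is shared by every host (placeholder values).
MESH = {
    "nodeA": Peer("pubkey-A", "10.42.0.1", "192.168.1.1:51820"),
    "nodeB": Peer("pubkey-B", "10.42.0.2", "192.168.1.2:51820"),
}

peers_for_a = build_interface_peers(MESH, "nodeA")
peers_for_b = build_interface_peers(MESH, "nodeB")
```

Adding a third node in this model means adding one entry to `MESH`; each host still passes only its own name.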

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a frigate-edge preset for the multi-box deployment shape: TLS +
ACME + frigate, with bitcoind/fulcrum/ZMQ on another host. Edge nodes
authenticate to bitcoind via USERPASS (cookie auth doesn't cross host
boundaries); the credentials file is consumed via systemd's
LoadCredential and templated into config.toml at start.

Refactor: the ACME/nginx/TLS wiring shared between public-frigate and
frigate-edge moves into a private `_internal/frigate-tls-acme.nix`
helper, set up by the parent preset via the `services._roost.*`
internal namespace. No behavior change for existing public-frigate
consumers; the regtest-preset test still passes the same assertions.

New `exposeBackends` option block on public-frigate lets a backend
host bind its bitcoind RPC, ZMQ sequence publisher, and fulcrum on a
mesh interface in addition to loopback, with firewall rules scoped to
that interface only. Backed by the existing nix-bitcoin typed options
where they support it (rpc.users, rpc.allowip), `extraConfig` where
they're single-bind (rpcbind, fulcrum tcp).

Test: `checks.<system>.regtest-edge` boots two VMs (backend running
the full local stack with exposeBackends on; edge running frigate-edge
against it) and runs the same scan-end-to-end checks regtest-preset
runs, driven against the edge's Electrum listener.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The upstream wireguard module produces two kinds of unit per interface:
one `wireguard-wg0.service` that brings up the interface itself, and one
`wireguard-wg0-peer-<X>.service` per peer that installs that peer's
config. A `wireguard-wg0.target` aggregates all of them.

The test was waiting on the bare interface service, which becomes active
as soon as the interface is up, before the per-peer services have
installed any peers in the kernel. Pings fired immediately afterwards
failed with "ping: sendmsg: Required key not available" because no
peer matching 10.42.0.2 existed yet, and the 30s timeout expired before
the peer service finished its setup.

Wait on the target instead. It is `wantedBy = [ "multi-user.target" ]`
and `wants` both the interface service and every peer service, so a
target-reached state is the right "everything's installed" signal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The exposeBackends firewall rule was hardcoding port 8332. That's
mainnet's bitcoind RPC port; nix-bitcoin's `rpc.port` default tracks
the chain (regtest → 18443, testnet → 18332, signet → 38332). On any
non-mainnet network the firewall opens 8332 while bitcoind listens
elsewhere, and edge consumers see connection refused.

Surfaced by regtest-edge: bitcoind on the backend bound 18443
(regtest), the firewall opened 8332, the edge's frigate hit
"Cannot connect to Bitcoin Core at http://192.168.1.2:18443" and the
service exited.

Pull the port from `config.services.bitcoind.rpc.port` so the firewall
follows whatever bitcoind is actually doing. Safe inside this
mkIf-block because exposeBackends already asserts bitcoind.manage = true,
which guarantees the option is defined.
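
A small Python model of the mismatch, assuming only the per-chain defaults listed in this commit message. The function names are hypothetical; the actual fix is a Nix expression that reads `config.services.bitcoind.rpc.port`.

```python
from typing import Optional

# Default bitcoind RPC ports per chain, as listed in the commit message.
DEFAULT_RPC_PORT = {
    "mainnet": 8332,
    "testnet": 18332,
    "signet": 38332,
    "regtest": 18443,
}


def bitcoind_rpc_port(chain: str, override: Optional[int] = None) -> int:
    """Port bitcoind listens on: an explicit override, else the chain default."""
    return override if override is not None else DEFAULT_RPC_PORT[chain]


def firewall_port_before_fix(chain: str) -> int:
    """The old behaviour: always open mainnet's port, whatever the chain."""
    return 8332


def firewall_port_after_fix(chain: str, override: Optional[int] = None) -> int:
    """The fixed behaviour: open whatever port bitcoind is configured to use."""
    return bitcoind_rpc_port(chain, override)


# Chains on which the old rule opened the wrong port.
mismatched = [
    chain for chain in DEFAULT_RPC_PORT
    if firewall_port_before_fix(chain) != bitcoind_rpc_port(chain)
]
```

In this model `mismatched` contains every chain except mainnet, which matches the failure the regtest-edge check surfaced.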

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The NixOS test framework assigns each node's primary interface address
as 192.168.<vlan>.<nodeNumber>, starting at nodeNumber 1 in
declaration order (range 1 254 in lib/testing/network.nix:23). I had
hardcoded .2 / .3 throughout, which is off-by-one — the first declared
node is .1, not .2.

Consequence in mesh.nix: nodeA's wireguard peer pointed at
192.168.1.3:51820 (a non-existent IP), nodeB pointed at .2 (which was
*its own* address). Handshake never completed; ping timed out with no
peer alive.

Consequence in regtest-edge.nix: the edge tried to reach the backend
at 192.168.1.2:18443, but the backend was actually at .1. Even with
the firewall fix in the preceding commit, the edge couldn't find the
backend at all.
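
The addressing rule can be sketched in Python as follows. It assumes only what the message above states (node numbers start at 1, in declaration order, up to 254); `node_addresses` is a hypothetical helper, not part of the test framework.

```python
def node_addresses(node_names: list, vlan: int = 1) -> dict:
    """Map each node to 192.168.<vlan>.<n>, with n starting at 1."""
    if len(node_names) > 254:
        raise ValueError("at most 254 nodes fit in one VLAN")
    return {
        name: f"192.168.{vlan}.{number}"
        for number, name in enumerate(node_names, start=1)
    }


mesh_nodes = node_addresses(["nodeA", "nodeB"])
edge_nodes = node_addresses(["backend", "edge"])
```

Under this rule no node in a two-node test has the address ending in .3, and .2 belongs to the second node, which is why both hardcoded peer endpoints were wrong.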

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous HMAC was computed with `openssl ... -macopt hexkey:$salt`,
which decodes the salt from hex into raw bytes and uses those bytes as
the HMAC key. bitcoind's rpcauth.py uses the salt's literal UTF-8
string bytes as the key:

  hmac.new(salt.encode("utf-8"), password.encode("utf-8"), "SHA256")

Two different keys → two different HMACs. Every auth attempt the edge
sent to bitcoind was rejected with "incorrect password attempt".

Recomputed with the correct algorithm. Comment now states the exact
key derivation so the next person who hits this doesn't trip over the
same mistake.

Verifiable via:
  python3 -c "import hmac; print(hmac.new('2316d0a5e8ee6339ffb4d86c983bb421'.encode(), 'testpassword'.encode(), 'SHA256').hexdigest())"
  # → 34cc4776187170b359d40928b25deb28ea2bfc436c96fdd0db7150ec5211de85
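
The two key derivations can be compared side by side. This sketch reuses the salt and password from the one-liner above; `hexkey_hmac` is a hypothetical name for a model of what `-macopt hexkey:` does, namely hex-decoding the salt before using it as the key.

```python
import hashlib
import hmac

SALT = "2316d0a5e8ee6339ffb4d86c983bb421"
PASSWORD = "testpassword"


def rpcauth_hmac(salt: str, password: str) -> str:
    """Key is the salt's literal UTF-8 string bytes (32 bytes for this salt)."""
    return hmac.new(
        salt.encode("utf-8"), password.encode("utf-8"), hashlib.sha256
    ).hexdigest()


def hexkey_hmac(salt: str, password: str) -> str:
    """Models the earlier mistake: the salt is hex-decoded to 16 raw bytes first."""
    return hmac.new(
        bytes.fromhex(salt), password.encode("utf-8"), hashlib.sha256
    ).hexdigest()


correct = rpcauth_hmac(SALT, PASSWORD)
wrong = hexkey_hmac(SALT, PASSWORD)
print("digests match:", correct == wrong)
```

Both functions hash the same message with the same algorithm; only the key bytes differ, which is enough to produce unrelated digests.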

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`wait_for_open_port` defaults to a 900s timeout. When frigate fails its
first connect to bitcoind (auth error, DNS, ZMQ), systemd's
Restart=on-failure retries every ~13s. With DefaultStartLimitBurst=5 and
DefaultStartLimitIntervalSec=10s the burst limit never trips, because
consecutive restarts are spaced further apart than the 10s interval. The
unit stays in auto-restart indefinitely while the port never opens; the
test waits 15 minutes and then fails with an uninformative "port never
opened" message.
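
The arithmetic behind this can be checked with a simplified model that assumes starts are evenly spaced and that the limit trips only when more than the burst count of starts fall inside one interval. Both helper names are hypothetical.

```python
def max_starts_in_window(restart_period_s: float, window_s: float) -> int:
    """Most starts that fit in any window of window_s seconds when
    starts happen every restart_period_s seconds."""
    return int(window_s // restart_period_s) + 1


def trips_start_limit(restart_period_s: float, window_s: float, burst: int) -> bool:
    """True when more than `burst` starts can land inside one window."""
    return max_starts_in_window(restart_period_s, window_s) > burst


# Values from the message: ~13s between restarts, 10s interval, burst of 5.
starts = max_starts_in_window(13, 10)
tripped = trips_start_limit(13, 10, 5)
cycles_in_60s = 60 / 13
```

With a 13s period at most one start falls inside any 10s window, so the limit of 5 is never reached, and a 60s bound spans between four and five restart cycles.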

Replace the bare `wait_for_open_port(50001)` with a 60s polling loop
that, on timeout, dumps the last 50 lines of frigate's journal. The
60s bound covers ~4–5 restart cycles — plenty for a legitimately slow
backend boot, and short enough to surface a real configuration bug
quickly.
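
A bounded polling loop of this shape might look like the following. It is a standalone sketch using plain sockets rather than the NixOS test driver; `wait_for_port` and its `on_timeout` hook are hypothetical names, and in the real test the hook would run a journal dump on the edge VM.

```python
import socket
import time
from typing import Callable, Optional


def wait_for_port(
    host: str,
    port: int,
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    on_timeout: Optional[Callable[[], str]] = None,
) -> None:
    """Poll until a TCP connect succeeds; on timeout, raise with diagnostics."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means something is listening.
            with socket.create_connection((host, port), timeout=interval_s):
                return
        except OSError:
            pass  # refused or unreachable: retry until the deadline
        if time.monotonic() >= deadline:
            break
        time.sleep(interval_s)
    # Collect diagnostics only once, after the deadline has passed.
    diagnostics = on_timeout() if on_timeout is not None else ""
    raise TimeoutError(
        f"port {port} on {host} did not open within {timeout_s}s\n{diagnostics}"
    )
```

Attaching the diagnostics to the exception keeps the failure reason next to the timeout in the test log, instead of leaving only "port never opened".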

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The probe pipes Electrum JSON-RPC into `nc -q 3 127.0.0.1 50001`. The
`-q` flag (wait N seconds after stdin EOF before closing) is a
netcat-openbsd extension; NixOS's default nc supports `-z` but not
`-q`, so the probe silently emitted nothing and the 120s loop timed
out with empty responses on every iteration.

regtest-preset.nix avoids this by adding `pkgs.netcat-openbsd` to
`environment.systemPackages`. Mirror that here on the edge node.
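
What `-q 3` buys the probe is a socket that stays open after the request is sent, so the response can still arrive. The same behaviour can be sketched in Python, assuming the newline-delimited JSON-RPC framing that Electrum servers use; `electrum_request` is a hypothetical helper and is not what the test itself runs.

```python
import json
import socket
from typing import Optional


def electrum_request(
    host: str,
    port: int,
    method: str,
    params: Optional[list] = None,
    timeout_s: float = 3.0,
) -> dict:
    """Send one JSON-RPC request and wait up to timeout_s for one response line."""
    request = {"jsonrpc": "2.0", "id": 0, "method": method, "params": params or []}
    with socket.create_connection((host, port), timeout=timeout_s) as sock:
        sock.sendall(json.dumps(request).encode("utf-8") + b"\n")
        # Keep the socket open and read until the server's newline,
        # instead of closing as soon as the request has been written.
        with sock.makefile("rb") as reader:
            line = reader.readline()
    if not line:
        raise ConnectionError("server closed the connection without responding")
    return json.loads(line)
```

A call such as `electrum_request("127.0.0.1", 50001, "blockchain.headers.subscribe")` would ask for the chain tip; whether the PR's probe uses that exact method is not stated in the commit message.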

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@josibake merged commit b9c80be into main on May 19, 2026
2 checks passed