QUIC rewrite + rendezvous + relay + mTLS + 16 audit fixes#3
Open
cdcseacave wants to merge 26 commits into
Open
QUIC rewrite + rendezvous + relay + mTLS + 16 audit fixes#3cdcseacave wants to merge 26 commits into
cdcseacave wants to merge 26 commits into
Conversation
Single QUIC transport on a single UDP socket per endpoint, with per-chunk
unidirectional streams instead of the sliding window. Mandatory TLS 1.3
with per-device Ed25519 + self-signed cert pinned by SHA-256 fingerprint
(generated and persisted to <config_dir>/p2p-transfer/identity.{key,cert}
on first run; published in LAN beacons and via --peer-fingerprint).
Removes: tcp.rs, window.rs (sliding window), per-chunk CRC32 + crc32fast,
ChunkAck / AckStatus / ChunkMessage, ENCRYPTION + WINDOWED capability
bits, --window-size + --max-retries CLI flags, the legacy blocking
nat.rs (collapsed into traversal/stun.rs). PROTOCOL_VERSION bumped to 2;
equality check only — no v1 compat code.
Verified: cargo test --features full and cargo clippy --all-targets
--all-features -- -D warnings green; LAN loopback transfer of a 2 MB
file is byte-identical SHA-256 at ~80 MB/s.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New workspace crate p2p-rendezvous with a pairing-by-code MessagePack protocol (tiny: register-and-await on TCP) and a rendezvousd binary operators can self-host. p2p-core::traversal::establish_via_rendezvous binds a UDP socket, runs STUN on it (the same socket quinn will own), registers the public endpoint + cert fingerprint + device id with the rendezvous under a short shared code, and on peer match races QuicEndpoint::connect against accept on the punched socket — QUIC Initial packets themselves do the punching. Symmetric NAT is detected up front via stun::classify_nat (two STUN servers, compare mapped ports) and surfaces Error::HolePunchFailed without an attempt. New tests/traversal_loopback_test.rs covers the rendezvous + race_connect_and_accept primitives end-to-end on localhost. CLI gains --rendezvous <host:port> and --code <code> on send/receive that take precedence over --peer and --discover when present. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`rendezvousd --relay-bind <addr> --max-relay-mbps <n>` runs a tiny UDP
packet forwarder alongside the rendezvous. When STUN spots symmetric
NAT (or --force-relay is set), the registrant asks for relay mode and
the rendezvous returns Message::RelayMatch with a fresh 16-byte session
token + the relay address + the peer fingerprint. Each peer sends a
RelayHello so the relay records its source address, then runs a normal
QUIC handshake with the relay's address as the apparent peer endpoint.
The relay forwards UDP packets verbatim — QUIC TLS still terminates
end-to-end so the relay only sees ciphertext.
New: p2p_rendezvous::relay::{Relay, RelayHello}, Message::RelayMatch,
RegisterRequest.want_relay (defaults to false for back-compat),
register_full + MatchOutcome enum, --force-relay CLI flag, --relay-bind
and --max-relay-mbps on rendezvousd, tests/relay_loopback_test.rs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…es (Phase 3)
GUI Connection tab gains a third mode "Pair with code (cross-NAT)" with
rendezvous-server + shared-code inputs and a Generate button. The
existing Connect mode now also exposes the --peer-fingerprint field
needed for direct mode. Session establishment runs inside Command::perform
and only the resulting P2PSession is wrapped in Arc<tokio::Mutex<...>>,
so the iced message loop stays responsive during a multi-second
rendezvous wait.
CLI `nat-test --rendezvous <host[:port]>` runs a real self-loop punch
test: spawns two local peers, registers both at the rendezvous with a
fresh code, races a QUIC handshake between them, and reports direct /
relay / failed plus latency.
Two underlying bugs surfaced while validating end-to-end:
1. The race_connect_and_accept tokio::select! could pick different
directions on each peer, leaving them on mismatched connections.
Replaced with a deterministic role split keyed on device_id —
smaller device_id runs connect(), larger runs accept(). This also
fixes the "two processes sharing the same persistent identity
deadlock both as Responder" case.
2. quinn's open_bi is local-only — the responder's accept_bi only
unblocks once the initiator writes to the stream. Have
open_control_initiator write the 4-byte PROTOCOL_MAGIC immediately
after open_bi, and open_control_responder consume it after
accept_bi. The control stream is now visible to both sides without
waiting for the application's first send_message.
Verified: cargo clippy --features full --all-targets -- -D warnings
clean; cargo test --features full all green; manual end-to-end of
`send <file> --rendezvous --code --force-relay` ↔
`receive --rendezvous --code --force-relay` produces a byte-identical
SHA-256 file at ~55 MB/s through a localhost relay.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Data integrity (C1, C3, H4, M4, C4): - Drain QUIC uni-streams via stream.stopped() so the last chunk isn't lost when the sender closes the connection. - Widen chunk indices (ChunkReader.total_chunks, read_chunk, fold_chunk, ChunkWriter.write_chunk) to u64; large files no longer truncate at the 2^32-chunk boundary. - Bounds-check the wire-supplied chunk_index against total_chunks in receive_file. - Receiver SHA mismatch is now a hard Error::Verification, not a silent warn. Path sanitization (M3): receiver and scan_folder both run a sanitize_relative_path that rejects absolute paths, .., ., drive and root components. mTLS (H1): server now requires a client cert via a custom AcceptAnyClientCert verifier; client presents its cert in with_client_auth_cert. cross_check_fingerprint rejects missing observations, closing the responder-side HELLO bypass. NAT traversal (C2, H5, M1): - race_connect_and_accept launches both connect and accept on both sides; the larger-device-id peer staggers its connect by 50ms to avoid Initial-packet collisions. - accept_from loops on accept and drops connections whose source address doesn't match the rendezvous-supplied peer. - STUN query rejects responses whose transaction id doesn't match the request. Rendezvous + relay (H6, M6, H3, H2, M5): - Server::bind_with takes a max_concurrent cap (default 1024) enforced via tokio::sync::Semaphore. - Server rewrites the RegisterRequest IP to the TCP peer's IP (keeping the user-supplied UDP port), blocking traffic reflection through forged public_endpoint. - Relay recv buffer up to 65 KiB; warns on full-buffer reads. - Relay slot binding is now fingerprint-keyed lookup; reserve_session refuses identical fingerprints. - Idle-session eviction moved to a 30s background task off the per-packet forward path. Typed EOF (M2): framing::read_message maps UnexpectedEof on the magic read to Error::Disconnected; frame-interior short reads become Error::Protocol. Drops the string-matching arm in session.rs::run_event_loop. Per the project's no-backwards-compat rule, no compat shims were added: wire formats and call sites changed in place. New tests cover sanitization, STUN tx-id, IP rewriting, concurrency cap, cross-check, peer-address matching, and typed EOF. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- README: highlights now cover mTLS, rendezvous + hole punching, relay fallback, u64 chunk indices, path sanitization, hard SHA verification; add Security Model section. - DESIGN: mTLS identity model (both ends pin), updated module map with traversal/punch.rs + p2p-rendezvous, expanded NAT-traversal phases with anti-reflection IP rewrite + STUN tx-id validation + relay slot pre-binding; new "Security & robustness guarantees" section enumerating the 16 invariants the data path enforces. - TODO: mark "Security & robustness hardening" done (2026-05-23) with bullet summary of all 16 findings. - CHANGELOG: add dated "Fixed - 2026-05-23 - Security & robustness audit (16 findings)" entry grouped by Data integrity / Security / Robustness, listing each C/H/M id alongside the fix. - Root AGENTS: workspace now lists 4 member crates + 3 integration tests; new gotchas covering u64 chunk indices, sanitize_relative_path, mTLS, accept_from source validation, server-side public-IP rewrite, no-backwards-compat rule. - p2p-core AGENTS: mTLS layer description, traversal/punch.rs entry, receiver SHA mismatch + path sanitization + STUN tx-id sections. - p2p-cli AGENTS: SessionParams includes --rendezvous/--code/ --force-relay, drop the stale --window-size / --max-retries mentions, document nat-test self-loop mode. - p2p-gui AGENTS: ConnectionMode triplet (Listen/Connect/Rendezvous), Command::perform pattern for off-thread establishment. - p2p-rendezvous AGENTS: new file - server (concurrency cap + public-IP rewrite), relay (slot pre-binding + off-hot-path eviction), client, rendezvousd binary, wire protocol. - CONTRIBUTING + .github/copilot-instructions: project structure tree refreshed (p2p-rendezvous, traversal/punch.rs, integration tests, test counts no longer claim exact numbers). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR lands a major transport redesign and pairing stack for the project: replacing the legacy TCP + sliding-window protocol with QUIC (TLS 1.3, mutual cert auth + fingerprint pinning), adding rendezvous + relay infrastructure for cross-NAT pairing, and incorporating a set of security/robustness audit fixes across the core library, CLI, and GUI.
Changes:
- Introduces QUIC transport + mTLS identity model (persistent device identity, fingerprint pinning, updated handshake/protocol framing).
- Adds cross-NAT pairing via new
p2p-rendezvouscrate (rendezvousdserver + optional UDP relay) and wires it into CLI/GUI flows. - Updates discovery beacons, tests, and docs; removes legacy TCP and sliding-window modules.
Reviewed changes
Copilot reviewed 61 out of 61 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/traversal_loopback_test.rs | New loopback rendezvous + punch smoke test. |
| tests/relay_loopback_test.rs | New loopback rendezvous + relay + QUIC handshake smoke test. |
| tests/integration_test.rs | Updates workspace integration test to QUIC + P2PSession handshake. |
| p2p-rendezvous/src/protocol.rs | Defines rendezvous wire protocol messages and types. |
| p2p-rendezvous/src/lib.rs | Crate entry + private framing helpers and re-exports. |
| p2p-rendezvous/src/client.rs | Implements rendezvous client registration and match handling. |
| p2p-rendezvous/src/bin/rendezvousd.rs | Adds rendezvousd binary (server + optional relay). |
| p2p-rendezvous/Cargo.toml | Adds new workspace crate manifest. |
| p2p-rendezvous/AGENTS.md | Adds crate-specific developer guidance. |
| p2p-gui/src/views/settings.rs | Removes legacy window-size UI input. |
| p2p-gui/src/views/connection.rs | Adds rendezvous pairing UI and peer fingerprint input for direct connect. |
| p2p-gui/src/state.rs | Adds rendezvous mode state + removes window-size setting. |
| p2p-gui/src/operations.rs | Implements rendezvous pairing + fingerprint parsing + updated establish calls. |
| p2p-gui/src/message.rs | Adds messages for rendezvous + fingerprint input; removes window-size message. |
| p2p-gui/Cargo.toml | Adds hex dependency for fingerprint decoding. |
| p2p-gui/AGENTS.md | Adds GUI-specific developer guidance and pairing mode notes. |
| p2p-core/src/window.rs | Removes legacy sliding-window implementation. |
| p2p-core/src/verification.rs | Drops CRC32 and keeps file-level SHA-256 verification with improved messages. |
| p2p-core/src/traversal/stun.rs | Adds async STUN query + NAT classification on borrowed UDP socket. |
| p2p-core/src/traversal/punch.rs | Adds QUIC-based race connect/accept punch with address validation. |
| p2p-core/src/traversal/mod.rs | Adds traversal orchestrator for STUN + rendezvous + punch/relay. |
| p2p-core/src/tls.rs | Adds rustls QUIC TLS config (pinning verifier + mTLS server config). |
| p2p-core/src/reconnect.rs | Removes legacy transient-error helper in favor of Error::is_recoverable. |
| p2p-core/src/protocol.rs | Updates control-plane protocol for QUIC era; adds cert fingerprints; removes chunk ACK/CRC/window fields. |
| p2p-core/src/network/udp.rs | Updates discovery beacons to include cert fingerprint. |
| p2p-core/src/network/tcp.rs | Removes legacy TCP transport. |
| p2p-core/src/network/mod.rs | Switches network module to QUIC-only transport. |
| p2p-core/src/network/framing.rs | Improves framing error typing (Disconnected vs Protocol). |
| p2p-core/src/lib.rs | Updates exports/constants for QUIC era (protocol v2, ALPN, default rendezvous port). |
| p2p-core/src/known_peers.rs | Adds TOFU known-peers fingerprint store. |
| p2p-core/src/identity.rs | Adds persistent per-device identity (Ed25519 + self-signed cert). |
| p2p-core/src/handshake.rs | Updates handshake for QUIC + fingerprint cross-check under mTLS. |
| p2p-core/src/error.rs | Adds QUIC/TLS/rendezvous traversal error variants + retry classification. |
| p2p-core/src/discovery.rs | Threads cert fingerprint into discovery manager/service. |
| p2p-core/Cargo.toml | Updates dependencies for QUIC/rustls/rcgen + adds rendezvous crate. |
| p2p-core/AGENTS.md | Adds crate-specific developer guidance for the QUIC rewrite. |
| p2p-cli/src/send.rs | Updates send flow to QUIC identity, rendezvous path, and fingerprint pinning. |
| p2p-cli/src/resume.rs | Updates resume to require peer fingerprint and connect via QUIC. |
| p2p-cli/src/rendezvous.rs | Adds CLI helper for --rendezvous/--code establishment. |
| p2p-cli/src/receive.rs | Updates receive flow to QUIC identity, rendezvous path, and fingerprint pinning. |
| p2p-cli/src/nat_test.rs | Updates NAT test to STUN-based classification + optional self-loop rendezvous punch. |
| p2p-cli/src/lib.rs | Wires updated nat-test and resume args; adds rendezvous module. |
| p2p-cli/src/discover.rs | Updates discover output to include peer fingerprint. |
| p2p-cli/src/cli.rs | Adds session flags for fingerprint + rendezvous pairing; removes window-size/max-retries transfer flags. |
| p2p-cli/Cargo.toml | Adds deps for rendezvous + hex parsing. |
| p2p-cli/AGENTS.md | Adds CLI-specific developer guidance reflecting QUIC + rendezvous changes. |
| CONTRIBUTING.md | Updates repo layout documentation for new crates/modules. |
| CHANGELOG.md | Adds detailed unreleased changelog entry for QUIC rewrite + rendezvous/relay + audit fixes. |
| Cargo.toml | Adds p2p-rendezvous to workspace and dev-deps for tests. |
| .github/copilot-instructions.md | Updates project guidance to reflect QUIC/rendezvous architecture and flags. |
`run_cli_sync` only printed the "GUI not available" hint when `cli.command.is_none()`. An explicit `p2p-transfer gui` fell through to the async dispatcher, which hit `unreachable!()` and panicked. Both code paths now match on `None | Some(Commands::Gui)` and exit with the same hint, keeping the dispatcher's `unreachable!()` arm truly unreachable. Found during PR smoke testing on a workspace build that compiled p2p-cli without the gui feature. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- STUN query loops until QUERY_TIMEOUT, dropping packets whose source
isn't the queried server or whose tx-id doesn't match (instead of
failing on the first mismatched packet — fixes stale-response
false negatives when the socket is shared).
- IPv6-safe rendezvous host parsing: new `p2p_core::with_default_port`
helper handles bare/bracketed IPv6, hostname:port, and bare IP forms.
Replaces three `contains(':')` checks in p2p-cli rendezvous/nat_test
and p2p-gui operations.
- `read_message` distinguishes between 0-byte EOF before any magic
byte (Disconnected — clean between-frames close) and partial-magic
truncation (Protocol — mid-frame).
- `--code` CLI help text no longer references the nonexistent
`pair --new` subcommand.
- Apply `cargo fmt --all` across the workspace.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rust 1.95 (current CI stable) adds `clippy::unnecessary_sort_by`, which flags the `sort_by(|a, b| b.x.cmp(&a.x))` pattern in two history files and breaks `cargo clippy -- -D warnings` on all three test runners. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Code Coverage job fails when codecov.io returns 429 (rate limit for tokenless uploads). Switch fail_ci_if_error to false so an upload failure does not gate the PR — coverage is informational, and the codecov report still appears whenever the upload succeeds. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- DEFAULT_CHUNK_SIZE 64 KB → 1 MiB (CLI flag --chunk-size and TransferConfig::chunk_size_kb defaults follow). Under QUIC the chunk is no longer the ACK unit (packets are), so chunk size now only affects per-chunk overhead (stream setup, progress events, SHA-256 segmentation) and resume granularity. Larger chunks amortize that overhead ~16x with negligible cost on resume. - quinn TransportConfig flow-control windows: stream_receive_window = 8 MiB, receive_window / send_window = 64 MiB. quinn's defaults (~1.25 MB / ~12.5 MB) stall on high-BDP links (e.g. gigabit at 30 ms RTT ≈ 3.75 MB BDP — just at the connection-window limit). - Drop the now-dead `max_chunks_in_flight`, `chunk_timeout_ms`, and `max_chunk_retries` fields from `TransferConfig` — they were declared but never read after the Phase 0 sliding-window removal. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stress-testing the CLI surface found four real product issues and two
related minor wins. All fixes are TDD: a failing test was added first,
then the minimal change to make it green.
ISSUE 4 — ReconnectConfig::default retried forever
max_attempts: 0 (unlimited) meant a sender that lost its peer would
back off exponentially to 180s and never surface the "resume with: ..."
hint that the error branch was already wired to print.
- Default is now max_attempts: 5 (3+6+12+24+48s ≈ 1.5min total).
- New flag --max-reconnect-attempts on Send and Resume; pass 0 to opt
back into infinite retries.
- p2p-core/src/reconnect.rs: test_default_caps_at_5_attempts
ISSUE 1 — resume CLI rejected single-file transfers
resume.rs guarded `if !path.exists() || !path.is_dir()`, so the
README's documented `--path ./bigfile.bin` always errored with
"Folder path does not exist or is not a directory". send_path itself
has no folder requirement.
- Dropped the is_dir() half; renamed log + help text.
- p2p-cli/src/resume.rs: accepts_file_path + rejects_nonexistent_path
ISSUE 3 — no way to run two CLI processes on one host
Identity::load_or_generate was hardcoded to dirs::config_dir(); on
Windows that resolves via SHGetKnownFolderPath and ignores $APPDATA,
so two child processes always shared the on-disk identity. The relay
then correctly refused them as same-fingerprint impostors — blocking
live-CLI loopback testing of the relay path.
- Collapsed load_or_generate / load_or_generate_in into a single
load_or_generate(dir: Option<&Path>); None preserves the default.
- New global --identity-dir flag threaded through send/receive/
discover/resume; GUI calls pass None.
- p2p-core/src/identity.rs: distinct_dirs_yield_distinct_fingerprints
ISSUE 2 — `history` was never populated by the CLI
TransferHistory::add_record was only called from p2p-gui/operations.rs;
CLI's send/receive paths never recorded, so `p2p-transfer history`
always reported "No transfer history found".
- New helper p2p_core::history::record_transfer(record, path?) that
handles load-or-empty + append + persist in one call.
- Wired into send.rs (single-shot) and receive.rs (per-iter loop;
replaces run_event_loop in CLI only — GUI still uses it).
- p2p-core/src/history.rs: record_transfer_appends_and_persists
MINOR 1 — history concurrent writes clobbered each other
load → push → save races between co-located sender/receiver: a 16-task
stress test kept only 1 of 16 records.
- record_transfer now does the read-modify-write under an OS-level
advisory exclusive lock via fs2; runs on a blocking task so it
doesn't stall the async runtime.
- p2p-core/src/history.rs: record_transfer_concurrent_appends_dont_clobber
MINOR 2 — `history` output was hidden by `-v warn`
handle_history used info! for every display line, so anything below
info verbosity printed nothing despite a populated history file.
- Display lines moved to println! (stdout, unconditional). Logging-
style noise removed (emojis replaced with plain ASCII tags).
Test deltas:
baseline: 90 tests pass
after fixes: 95 tests pass (5 new TDD tests, 0 regressions)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three iterations of an end-to-end Bash harness that drives the built
binaries against each documented CLI surface:
stress.sh — broad sweep (T0-T10): direct send/receive, folder,
throttle, discover, nat-test (STUN + rendezvous self-loop),
rendezvous-mediated transfer, relay path, resume + history
stress_v2.sh — re-runs the v1 failures with corrected verbosity and
per-process identity dirs (uncovered the relay same-
fingerprint refusal and the resume-CLI is_dir bug)
stress_v3.sh — only the previously-failing tests, with corrected
expectations (24 MB file for the 2-s burst window,
receiver-kill so the sender persists state, etc.)
These are the harnesses used to find the four bugs fixed in 59cf3ca.
Run from repo root:
bash smoke/src/stress.sh
Logs land in target/tmp/stress*/<test>-{send,recv,rvz}.log for forensics.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`FileTransferSession::send_file` skips chunks already present in the
caller-supplied `completed_chunks` bitmap — so on a resume from N of M
chunks done, the sender opens (M - N) unidirectional streams, not M.
`receive_file` looped `while received < total_chunks` and unconditionally
awaited M streams, so the receiver waited forever for the N streams the
sender intentionally never opened.
The deadlock was masked by the Issue 1 guard (resume CLI rejected file
paths outright). Now that single-file resume works, the underlying
chunk-count drift surfaces — `smoke/src/stress_v4.sh::T10b` reproduced it.
Fix: `receive_file` now takes a separate `streams_to_receive` parameter.
The caller (`FolderTransferSession::receive_folder`) computes it from
`TransferInfo.resume_from.completed_chunks.len()`, which the sender
already populates on every transfer. The existing `total_chunks`
parameter stays as the bounds check on incoming `chunk_index` values.
TDD:
RED p2p-core/src/transfer_file.rs::tests::resume_with_skipped_chunks_does_not_deadlock
— loopback pair, sender supplies `completed_chunks=[0,1]`, receiver
times out at 5 s on the buggy code.
GREEN passes in 70 ms.
Verified live: stress_v4.sh T10b now PASSES end-to-end (sha256 match
after interrupt + resume of a single-file transfer).
Test deltas: 95 → 96 (one new test, zero regressions).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Builds on stress_v1-v3 with the capabilities that landed in the recent fixes: - --identity-dir per process to drive a real CLI relay loopback (T9) - --max-reconnect-attempts bound to make T10 finite (T10a/T10b) - resume with a file path (T10b — previously unreachable) - history populated by the CLI itself + readable at -v warn (T11) - concurrent CLI processes appending to history (T12, 4-pair parallel) Final run: 21/21 PASS on the current branch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e chunk_size source PR-1 of the post-review remediation plan. Addresses five review findings: * 1.3 — DEFAULT_CHUNK_SIZE is now the single source of truth. ConfigMessage::default and AppSettings::default both derived 64 KiB; sender CLI defaulted to 1 MiB so any session where the 64-KiB-default side won the handshake silently downgraded chunk size, neutralising the f07aae4 perf retune. Added regression test default_chunk_size_tests::config_message_default_matches_default_chunk_size. * 1.4 — history.rs no longer wipes the user's transfer log when a single bad byte appears in history.json. unwrap_or_default() on parse failure is replaced with a rename-aside-and-quarantine flow: corrupt bytes land in history.json.corrupt-<ts> and a fresh history is written. Regression test added. * 1.5 — history.rs append is now atomic and durable. Old code did seek(0)+set_len(0) +write_all in place with no sync_all, so a crash mid-write left an empty file and a clean exit could still lose the last record from page cache. New flow writes to history.json.tmp, fsyncs, then atomically renames; the OS-level exclusive lock is held on a sibling .lock file (stable identity across the rename of the data file). * 5.1 — TransferRecord::{new,complete,interrupt,fail} no longer panic on a pre-1970 SystemTime. unwrap() is replaced with unwrap_or(0); a bad clock now degrades the timestamp instead of crashing the receive loop. * 5.2 — fs2::FileExt::unlock failure after a successful write+fsync+rename is now a warn! rather than an Err return — the record is already durable; reporting Err would mislead callers about the outcome. AGENTS.md cleanup: stale "DEFAULT_CHUNK_SIZE = 65536" references replaced with 1 MiB plus a "single source of truth" note. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR-2 of the post-review remediation plan. Addresses Wave 1 items 1.1 and 1.2. * 1.1 — Multi-file folder resume no longer deadlocks when the sender skips files marked complete in a prior session. Before the fix the sender opened zero streams (and sent no FileChecksum) for indices in state.completed_files while the receiver iterated every entry in transfer_info.items and blocked in accept_uni() forever waiting for streams that would never arrive. New protocol field TransferInfo.completed_files carries the sender's skip set so the receiver can fast-forward in lock-step. Regression test multi_file_resume_with_completed_files_does_not_deadlock added with a 5 s timeout so a future re-introduction surfaces as a failed test, not a hang. * 1.2 — FolderTransferState now persists the negotiated ConfigMessage (chunk_size, compression_enabled, compression_level, adaptive_compression, bandwidth_limit) and exposes to_config_message() for resume. resume.rs uses that instead of ConfigMessage::default(); without this the .partial on disk (laid out under the original chunk_size) and the resumed session's ChunkWriter offsets would disagree, silently corrupting the file. The FolderTransferState::new signature now takes &ConfigMessage explicitly so the chunk_size source-of-truth is clear at every call site. Regression test state_remembers_negotiated_chunk_size_across_serde_roundtrip added. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…c, DoS cap PR-3 of the post-review remediation plan. Addresses Wave 1 items 1.6, 1.7, 1.8 and Wave 4 item 4.1. * 1.6 — receive_file now tracks distinct chunk_indices in a HashSet rather than counting raw stream arrivals. A buggy or hostile sender that re-opens the same chunk_index used to satisfy the count while a real chunk went missing (silent SHA-256 mismatch at end-of-transfer); the duplicate is now dropped with a warn and the loop continues until streams_to_receive DISTINCT indices have arrived. Regression test added. * 1.7 — ChunkWriter::new now takes expected_file_size and truncates an over-long .partial back to that length on open. Handles chunk_size drift between sessions and external truncation that left stale trailing bytes. receive_file passes total_chunks * chunk_size as the expected size, plus an up-front sanity check that streams_to_receive <= total_chunks. * 1.8 — write_chunk now calls sync_data (not just flush) before returning, so the bytes are durable BEFORE chunk_complete_callback fires and the resume state advertises the chunk as complete. Without this a crash between the write and finalize()'s sync_all would leave the resume metadata lying about which chunks are on disk. * 4.1 — receive_folder validates every peer-supplied file size against MAX_TRANSFER_FILE_SIZE (1 TiB) before opening a single stream. Without this a hostile peer could advertise an enormous size, computing a huge total_chunks and pinning the receiver in accept_uni() forever (KEEPALIVE_INTERVAL keeps the QUIC idle timeout from saving us). Unit test validate_file_size_rejects_absurd_values added. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…summary, re-accept PR-4 of the post-review remediation plan. Addresses Wave 2 (all four receive.rs regressions from commit 59cf3ca) plus the related Wave 3 item 3.2. * 2.1 — session.receive_to now takes an accept_decision callback. The CLI passes |info| accept_or_prompt(auto_accept, info), which prints the incoming transfer's filenames + size to stderr and reads y/N from stdin when --auto-accept is false. On Reject the receiver sends Message::Cancel and the sender returns Err(Cancelled) without opening any chunk streams. The GUI variant rejects when auto-accept is off (modal dialog can be wired later). Regression test receive_folder_reject_sends_cancel_and_returns_empty_summary added. * 2.2 — receive loop no longer matches Error::Network in the disconnect arm. That variant wraps tokio::fs::Error via #[from] std::io::Error, so disk failures (ENOSPC, EACCES on the output dir, broken symlink) used to be swallowed as if the peer cleanly hung up. They now propagate, hit the catch-all Err arm, get recorded as TransferStatus::Failed in history, and return a non-zero exit code. * 2.3 — receive_to / receive_folder return TransferSummary { root_name, files, bytes } populated from TransferInfo.items. The CLI records those filenames in history instead of the --output directory path. Regression test receive_folder_accept_returns_summary_with_per_file_list added. * 2.4 — P2PSession::reaccept() runs endpoint.accept() + handshake on the existing endpoint, so the receive CLI keeps listening on the same --port across peer disconnects instead of exiting. The send CLI's --max-reconnect-attempts no longer races against a vanished listener. Sender also handles a receiver that closes the connection before sending Ready as Error::Cancelled (graceful reject), not Error::Network. * 3.2 (bundled) — session.send_path returns TransferSummary, so the send CLI's history record carries every file in a folder transfer instead of just the folder name. Falls back to base_name for the single-file case. Removed Session::run_event_loop in favour of an inlined loop per surface (receive CLI, GUI). The two surfaces have different accept-policy and UI needs; a one-size-fits-all helper would obscure rather than help. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…flag PR-5 of the post-review remediation plan. Addresses Wave 3 items 3.1, 3.3, and 3.4. (3.2 already shipped in PR-4.) * 3.1 — derive_base_name() in the new p2p-cli::util module replaces the `path.file_name().unwrap()` in send.rs that panicked on `p2p-transfer send .` (or `..`, or any path ending in a separator). The helper falls back to canonicalize() when file_name() returns None, and to the path's own display form for filesystem roots. Unit tests derive_base_name_handles_dot_and_dotdot and derive_base_name_handles_plain_file_name added. * 3.3 — replaced the false "Transfer interrupted again. State has been saved." log in resume.rs with truthful "State persisted up to the most recent file boundary." The send_path layer persists state on every file completion (and on error/retry), so mid-file chunk progress lost on Ctrl+C is now correctly communicated to the user. The resume hint also echoes the --state-dir if one was supplied so copy-paste resume works. * 3.4 — new --state-dir flag on both `send` and `resume` subcommands. resolve_state_file() builds the canonical "transfer_<id>.json" path under the chosen directory (auto-created on demand) or falls back to the historical CWD-relative default. Resolves the "users who cd-ed between failure and resume got cryptic 'State file not found'" issue from the review. Unit tests resolve_state_file_honours_explicit_state_dir, resolve_state_file_defaults_to_cwd_when_state_dir_absent, and the integration-shaped finds_state_file_via_state_dir_flag_from_unrelated_cwd added. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…; atomic known_peers flush; cargo fmt Addresses Copilot review #4353138790: - protocol.rs: doc the actual `[u64 LE index | u8 flags | payload]` wire format and `FLAG_COMPRESSED = 0x01` semantics (not just `config.compression_enabled`). - traversal/mod.rs: doc now describes symmetric-NAT path as `want_relay = true` + Relay outcome (not the early `Error::HolePunchFailed` it claimed before). - known_peers.rs: `flush` now writes to `*.tmp`, `fsync`s, then renames — matches the module-doc crash-safety claim. - workspace-wide `cargo fmt` to unblock the Linux CI check that fail-cancelled Windows and macOS. The "auto_accept is silently ignored" finding was already fixed in 27c8191; the "DEFAULT_CHUNK_SIZE = 65536" finding in p2p-core/AGENTS.md was already fixed in a follow-up — both verified against current HEAD. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds scripts/deploy.py — a stdlib-only Python 3 installer for Ubuntu 24+ with install / uninstall / clean-build subcommands. Each step checks current state before acting (dpkg -s for apt deps, SHA256 compare for the binary, byte compare for the systemd unit) so the service is only restarted when something actually changed and re-runs don't drop in-flight pairings. `--prune-build` and `clean-build` reclaim disk by wiping <dest>/target/ after a successful deploy; a later install just rebuilds from scratch. README and AGENTS notes updated.
* progress: reset elapsed/ETA on the 0→real total_bytes transition so
bytes/sec doesn't include the interactive y/N wait.
* framing: map peer-close io::Error kinds (ConnectionAborted, etc.) to
Error::Disconnected so the receive loop sees a graceful EOF instead
of dying with "Network error: connection lost".
* cli: extract a single establish_session() helper in rendezvous.rs and
funnel send/receive/resume through it — kills the duplicated
`if is_rendezvous_mode { ... } else { ... }` blocks across handlers.
* receive: on disconnect, branch on the original pairing mode.
Rendezvous-paired sessions re-pair through the rendezvous (the QUIC
role is decided by a UUID compare post-pair, so reaccept() is
structurally wrong half the time). Direct-mode sessions still use
reaccept() to keep the user's --port stable.
* resume: flatten SessionParams into the Resume CLI command so resume
works cross-NAT via --rendezvous/--code. Drops the old --to flag in
favour of --peer (matches send/receive). No back-compat shim.
* test: tests/rendezvous_disconnect_resume_test.rs drives both fixes
end-to-end — receiver paired via rendezvous survives the first
sender disconnecting and accepts a second sender that arrives via
handle_resume() with --rendezvous/--code.
Doc updates for the new resume flags in README.md and AGENTS.md.
… perf Sweep across the workspace pruning code that no longer earns its keep and tightening the per-chunk hot path. Bumps PROTOCOL_VERSION to 3 + ALPN to p2pf/3 since the wire format loses the `capabilities` field from HelloMessage and DiscoveryBeacon (no compat shim, per project policy). Removed: - p2p-core: orphan transfer.rs stub, unused config.rs (TCP-era rot), Capabilities struct + intersect/has_compression negotiation, ConnectionRole enum (collapsed into initiator_target Option), P2PSession::establish string dispatch (replaced by typed connect/accept + parse_peer_addr / discover_one_peer helpers), is_alive/file_size stubs and dead FileTransferSession fields. - p2p-gui: styles.rs palette, 10+ never-emitted Message variants (StartReceive, ReceiveComplete/Failed, ListenerWaiting/Active, TransferStarted/InProgress/Completed/Error, ProgressUpdate, RefreshHistory, AdaptiveCompressionToggled) plus cascading dead state fields, fixed setup_receive `bytes_received = 0` TODO to thread the real cumulative byte count through run_gui_receive_loop. - Cargo: unused deps (log, env_logger, anyhow at root and core, indicatif/ local-ip-address/futures in core, bogus [package.metadata] uuid block, log from gui), unused profile.release-small. - smoke: collapsed stress.sh/_v2/_v3 iterations down to the v4 version (renamed to stress.sh). - docs: merged still-relevant content from .github/copilot-instructions.md (test pipeline, refactor/feature/bugfix workflow) into AGENTS.md and removed the copilot file. Hot-path perf (transfer_file.rs): - completed_chunks slice → HashSet for O(1) skip lookup on resume. - chunk header packed into one 9-byte write_all (was three calls). - uncompressed receive path no longer allocates payload.to_vec(). - ChunkWriter::finalize re-read buffer 64 KiB → 1 MiB. - Message::TransferInfo(Box<TransferInfo>) so enum stops paying max-variant cost on every recv. - Iterative VecDeque-based folder walk (was Box::pin per directory). - Hoisted chunk_count(file_size, chunk_size) helper (3 copies → 1). - Collapsed display_transfer_stats into a single info!-per-line flow. Architectural: - FolderTransferState embeds ConfigMessage directly instead of mirroring five scalars and rebuilding via to_config_message(). cargo build --release / test --all (119 pass, one pre-existing parallel- port flake in rendezvous_disconnect_resume_test) / clippy --all-targets --all-features -D warnings / fmt --check all green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Owner
Author
|
@claude review |
… test The rendezvous resume test (and any local-dev pairing through a loopback `rendezvousd`) was hitting Google STUN servers and getting back a NAT-mapped UDP port that the rendezvous then paired with a `127.0.0.1` source IP. The resulting punch target — `127.0.0.1` plus a STUN-mapped port — never reached the local socket, so the QUIC handshake timed out after 30s. macOS CI surfaced this every run; Linux coverage saw it intermittently under tarpaulin. - `establish_via_rendezvous`: when the rendezvous IP is loopback, skip STUN entirely and use the bound socket address as the public endpoint. The rendezvous still stamps the TCP source IP, so the paired target becomes `127.0.0.1:<actual-local-port>` — i.e. the peer's real socket. - `rendezvous_disconnect_resume_test`: now that STUN no longer takes 3–6s per side, the receiver/sender register race is exposed. Sleep 200ms after `spawn_receiver` so the receiver lands in the rendezvous waiter slot first, and 500ms before phase 2 so the receiver's post-disconnect re-register completes before the resume sender arrives. Test runtime drops from ~30s timeout to <1s. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Lands the four-phase QUIC rewrite plus the 16-finding security/robustness audit on
develop.p2p-rendezvouscrate +rendezvousdbinary;--rendezvous/--codeCLI flags; STUN-based NAT classification and UDP hole punching viarace_connect_and_accept.rendezvousd --relay-bindUDP forwarder for symmetric-NAT pairs; QUIC TLS still terminates end-to-end (the relay sees ciphertext only).Command::perform),nat-test --rendezvousself-loop punch.u64chunk indices, bounds-checkedchunk_index, hard receiver SHA-256, path-traversal sanitizer, mutual TLS withcross_check_fingerprintenforcement, address-validatedaccept_from, STUN tx-id check, rendezvous concurrency cap + TCP-sourced public IP, relay slot pre-binding + 65 KiB recv buffer + off-hot-path idle eviction, typedError::Disconnectedframing.AGENTS.md(including a newp2p-rendezvous/AGENTS.md) all updated.Per the project's "no backwards compatibility on redesigns" rule, the four phases and the audit fixes change wire formats and call sites in place — no compat shims.
Test plan
cargo build --workspacecleancargo clippy --workspace --all-targets -- -D warningscleancargo test --workspacegreen (65 p2p-core unit + 7 p2p-rendezvous unit + 2 p2p-gui unit + 3 integration + 3 doc tests)p2p-transferinstances on loopback exchange a multi-MB file; both SHA-256s matchrendezvousd(direct punch)rendezvousd --relay-bindwith both clients forced into relay mode🤖 Generated with Claude Code