Skip to content

feat(dkg): implement sync#311

Open
iamquang95 wants to merge 14 commits intomainfrom
iamquang95/dkg/sync
Open

feat(dkg): implement sync#311
iamquang95 wants to merge 14 commits intomainfrom
iamquang95/dkg/sync

Conversation

@iamquang95
Copy link
Copy Markdown
Collaborator

Should address #133

@iamquang95
Copy link
Copy Markdown
Collaborator Author

@claude

@claude
Copy link
Copy Markdown

claude bot commented Apr 3, 2026

Claude finished @iamquang95's task in 6m 16s —— View job


Code Review

  • Gather context and understand PR scope
  • Read changed files
  • Compare against Go reference implementation
  • Review sync module files
  • Review p2p proto changes
  • Compile final review

Summary

This PR implements the DKG step-synchronization protocol (/charon/dkg/sync/1.0.0/) in Rust, porting the Go Charon dkg/sync package. The overall structure is sound — Server/Client/Handler/Behaviour decomposition mirrors the Go architecture, and the proto wire format matches. There are a few correctness issues worth addressing before merge, plus some lower-priority observations.


Findings

[High] Step is updated for peers that fail validation

In handle_inbound_stream, after validate_request_with_public_key fails the code sets the server error and response error string, but then falls through to server.update_step(peer_id, message.step) and continues the loop to read the next message.

Evidence: crates/dkg/src/sync/handler.rs:365-416

if let Err(error) = protocol::validate_request_with_public_key(...) {
    // sets server error, response.error
} else {
    server.mark_connected(peer_id).await;
}

// ← this runs even after validation failure
if server.update_step(peer_id, message.step).await? {
    send_inbound_event(...);
}
// loop continues - next read will also come from the invalid peer

A peer that sends an invalid hash signature (wrong cluster) can still advance the step tracker and the loop keeps reading from that peer. The Go reference returns from the handler after an invalid message. The function should break (or return Err) immediately after setting the error and writing the error response. Fix this →


[Medium] Protocol ID has trailing slash — verify against Go

protocol.rs:11:

pub const PROTOCOL_NAME: libp2p::StreamProtocol = StreamProtocol::new("/charon/dkg/sync/1.0.0/");

The trailing / needs to exactly match what the Charon Go server advertises. libp2p protocol negotiation is byte-exact. If the Go side uses /charon/dkg/sync/1.0.0 (no trailing slash), all protocol negotiations will fail silently — the handler is never invoked, connections appear established but no sync traffic flows. Please confirm against the Go source. Fix this →


[Medium] await_all_at_step does not check started — can deadlock before Server::start()

await_all_connected guards against this with an is_started() check (server.rs:74), but await_all_at_step (server.rs:120) has no such guard. If called before start(), it spins on the notify but the started-notify never fires, so cancellation is the only exit. This likely doesn't matter in practice (DKG won't await steps before starting), but it's an asymmetry worth documenting or guarding. Fix this →


[Low] Redundant explicit arms before wildcard in on_connection_event

handler.rs:279-285:

ConnectionEvent::AddressChange(_)
| ConnectionEvent::LocalProtocolsChange(_)
| ConnectionEvent::RemoteProtocolsChange(_) => {}
ConnectionEvent::ListenUpgradeError(_) => {}
_ => {}    // ← makes the lines above unreachable

The named arms are unreachable because _ => {} catches everything. Since ConnectionEvent is #[non_exhaustive] you need the wildcard — the named arms can be removed. Fix this →


[Low] inbound_events_tx error logged at error! level, but handler owns both ends

handler.rs:426-428: If inbound_events_tx.send() fails it logs error!. Since the handler holds both the sender and receiver for the channel's lifetime, this can only happen after the handler is dropped — at which point the error is silently swallowed anyway. A debug! level or silent ignore would be less alarming. Fix this →


[Low] run_outbound_stream does not flush on shutdown

handler.rs:336-338: When shutdown=true the final MsgSync is sent and the function returns GracefulShutdown immediately after reading the response. The stream is then dropped. On the write path write_fixed_size_protobuf does flush, so the actual bytes make it to the peer — this is fine. Just confirming this is intentional and the inbound side reads the shutdown flag and terminates gracefully.


Parity Matrix

Component Go (charon/dkg/sync) Rust (pluto/crates/dkg/src/sync) Match Notes
Protocol ID /charon/dkg/sync/1.0.0 /charon/dkg/sync/1.0.0/ Unverified Trailing slash needs confirmation
Wire framing i64 LE length prefix i64 LE length prefix Yes write_fixed_size_protobuf
Proto schema MsgSync, MsgSyncResponse Generated from same .proto Yes Fields match
Step validation (initial) 0 or 1 0 or 1 Yes server.rs:227
Step validation (jump) current+1 or current+2 current+1 or current+2 Yes server.rs:218
Hash sig verification secp256k1 / libp2p sign secp256k1 / libp2p sign Yes protocol.rs:44
Version check major.minor exact full semver == Yes SemVer is major.minor only
Shutdown flow send shutdown flag, wait ACK send shutdown=true, read response Yes handler.rs:296-338
Error propagation (invalid msg) break loop continues loop No High severity, see above

Tests

Tests were not run (no cargo test permission). Unit tests cover:

  • update_step_rules_match_go — validates step acceptance rules
  • sync_round_trip_matches_go_shape — 3-node integration test across real TCP
  • active_client_requests_dial — behaviour poll emits Dial
  • connection_closed_keeps_client_state_until_last_connection — multi-connection tracking

Missing coverage:

  • Validation failure path (hash sig mismatch) — the high-severity bug above is untested
  • relay_reset reconnect path via on_dial_upgrade_error
  • Protocol negotiation failure (Unsupported variant)

Open Questions

  1. Trailing slash in protocol ID — Can you confirm against charon/dkg/sync/? The Go source either has or doesn't have a trailing slash and it must match byte-for-byte.
  2. Wire format — Was the i64 LE framing verified against the Charon Go dkg/sync package? (Other Charon p2p protocols use the standard varint framing; DKG sync is an exception worth calling out explicitly.)
  3. step=0 initial value — The client initializes step: AtomicI64::new(0) (client.rs:79) and begins sending step=0 immediately. The Go sync sends step=0 until the caller advances it. Is step 0 meant to mean "not yet started"? The test in mod.rs starts at step 0 but the example starts advancing from step 1 — this asymmetry should be documented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant