Follow-up to the Copilot review on #1775 (feat(agent): implement certificate renewal). The PR is merged as-is because none of the items below affect the happy path, but they should be cleaned up.
Items
[P0] Atomic write for renewed cert + CA — devolutions-agent/src/tunnel.rs:521
Why this matters: the current std::fs::write truncates in place. A crash, power loss, or full disk between the two writes leaves the client cert / CA file half-written; future mTLS handshakes then fail and the agent cannot recover remotely — admin has to re-enroll the machine on-site. Happy path is fine; only the failure mode is bad, and the failure mode is expensive.
[P1] Don't fail the connection on local CSR-prep errors — devolutions-agent/src/tunnel.rs:498
Why this matters: the doc comment on try_renew_certificate says only control-stream IO / protocol errors should be Err, but right now a transient local file read failure (e.g. permission blip) bubbles up to run_single_connection's Err branch and forces a backoff/reconnect on an otherwise healthy QUIC connection. Should be a recoverable "skip renewal, keep using existing cert" path.
[P2] Unit tests for new renewal helpers — devolutions-agent/src/enrollment.rs:248-318
[P3] Fix misleading comment — devolutions-agent/src/enrollment.rs:282
Original review
#1775 (review)
Follow-up to the Copilot review on #1775 (
feat(agent): implement certificate renewal). The PR is merged as-is because none of the items below affect the happy path, but they should be cleaned up.Items
[P0] Atomic write for renewed cert + CA —
devolutions-agent/src/tunnel.rs:521std::fs::write(cert_path, ...)/std::fs::write(ca_path, ...)calls with a write-temp-then-rename sequence (same directory, thenstd::fs::rename).0o600on the temp file before rename — match whatpersist_enrollment_responsealready does for the initial enrollment.Why this matters: the current
std::fs::writetruncates in place. A crash, power loss, or full disk between the two writes leaves the client cert / CA file half-written; future mTLS handshakes then fail and the agent cannot recover remotely — admin has to re-enroll the machine on-site. Happy path is fine; only the failure mode is bad, and the failure mode is expensive.[P1] Don't fail the connection on local CSR-prep errors —
devolutions-agent/src/tunnel.rs:498try_renew_certificate, replace the?onread_agent_name_from_certandgenerate_csr_from_existing_keywithmatcharms that log a warning and returnOk(None).Why this matters: the doc comment on
try_renew_certificatesays only control-stream IO / protocol errors should beErr, but right now a transient local file read failure (e.g. permission blip) bubbles up torun_single_connection'sErrbranch and forces a backoff/reconnect on an otherwise healthy QUIC connection. Should be a recoverable "skip renewal, keep using existing cert" path.[P2] Unit tests for new renewal helpers —
devolutions-agent/src/enrollment.rs:248-318is_cert_expiring: generate a short-lived rcgen cert, assert true above threshold and false below.read_agent_name_from_cert: round-trip a known CN.generate_csr_from_existing_key: assert the produced PEM parses back viarcgen::CertificateSigningRequestParams::from_pemand that the public key matches the input key.[P3] Fix misleading comment —
devolutions-agent/src/enrollment.rs:282read_agent_name_from_cert. The current text implies the gateway looks the agent up in its registry byCommonName, butAgentRegistryis keyed byagent_id(UUID, extracted from SANurn:uuid:). The real reason to reuse the CN in the renewal CSR is just to keep the issued cert's CN stable across renewals — the gateway ignores the CSR subject anyway.Original review
#1775 (review)