fix(e2e): strict wireserver validation — fail fast on unexpected curl exits#8580
fix(e2e): strict wireserver validation — fail fast on unexpected curl exits#8580r2k1 wants to merge 1 commit into
Conversation
… exits The previous validation retried for 1 minute, passing if curl eventually timed out (exit 28). This had two problems: 1. Silently accepted other "success-looking" exit codes (e.g. 0 = reachable) if they happened on the last poll iteration in earlier variants. 2. Retried through what is fundamentally a binary security check — any successful curl from a pod means the FORWARD DROP/REJECT rules are missing or wrong, which is a regression to surface immediately, not a transient condition to wait out. Changes: - Whitelist exit codes 28 (FORWARD DROP timeout) and 7 (FORWARD REJECT refused) as the only valid "wireserver blocked" signals. - Anything else fails loudly with full diagnostics: FORWARD chain, KUBE-FORWARD chain, iptables-save filter, and conntrack entries for the wireserver IP. - Retry the exec call only on transient kube-apiserver exec failures, never on the curl result itself — a single observation of an unexpected exit code is enough to fail the security check. This is strictly more defensive than the original (which only accepted exit 28) because it also accepts REJECT-based blocks, while failing on every other class of regression instead of swallowing them. Extracted from #8480. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR tightens the e2e security validator that ensures unprivileged pods cannot reach the Azure WireServer IP (168.63.129.16), changing it from “retry until timeout exit code appears” to “fail fast on any unexpected curl result,” while still retrying transient Kubernetes exec failures.
Changes:
- Accept curl exit codes
28(timeout/DROP) and7(connect failed/REJECT) as the only valid signals that WireServer is blocked. - Stop retrying based on curl outcomes; instead, fail immediately on any other curl exit code.
- Add richer failure diagnostics (FORWARD + KUBE-FORWARD chain, iptables-save filter excerpt, conntrack entries) when an unexpected exit code occurs.
| }, | ||
| } | ||
|
|
||
| allowedExitCodes := map[string]bool{"28": true, "7": true} |
There was a problem hiding this comment.
This would be easier to read with a list of allowable exit codes rather than a map. The constant time lookup benefit we get below doesn't seem worth it given the length of time these tests take to run.
There was a problem hiding this comment.
Converting to slice won't improve much. You will trade it for slightly more comlicated lookup.
Either way it isolated to function and doesn't seem important
What this PR does / why we need it:
Tightens the e2e wireserver-block validator so it fails fast on any unexpected curl exit code from an unprivileged pod, instead of retrying for a minute and only failing if no acceptable exit code ever showed up.
Why
validateWireServerBlockedis a security check: pods must not be able to reach the wireserver IP (168.63.129.16). The previous implementation retried for 1 minute, passing the check if curl eventually returned exit 28 (timeout). Two problems with that:What
This is strictly more defensive than the original (which only accepted exit 28) because it also accepts REJECT-based blocks, while failing on every other class of regression instead of swallowing them.
Scope
e2e/validation.goonly. Test-only change, no product code touched. Extracted from #8480.Which issue(s) this PR fixes:
N/A — test-only hardening, no linked issue.