fix(nico-dhcp): dual-name kea hook params + require operator IPs #2270
Open
shayan1995 wants to merge 2 commits into
The kea DHCP hook library (C++ in crates/dhcp/src/kea/loader.cc) reads parameters by their pre-rename names: carbide-api-url, carbide-nameservers, carbide-ntpserver, carbide-provisioning-server-ipv4, carbide-metrics-endpoint.

Commit a91aea0 renamed every key the chart writes to the nico-* form without rebuilding the binary. The result on any cluster running the modern chart with a pre-rename DHCP hook binary: getParameter returns null, the binary falls back to its hardcoded localhost defaults (api at [::1]:1079, nameservers/ntpserver at 127.0.0.1, etc.), and every hook callout fails with "tcp connect error: ConnectionRefused".

Symptom in the nico-dhcp pod log:

    ERROR rpc::forge_tls_client - error connecting client to forge api
      (url: https://[::1]:1079)
    INFO forge_http_connector::connector - connect error for [::1]:1079:
      ConnectError("tcp connect error", Os { code: 111, kind: ConnectionRefused })

The fix mirrors the dual-name pattern from PR NVIDIA#2249 (cert SANs and ExternalName Service aliases): the chart writes both spellings of every hook parameter with identical values. Old binaries find their carbide-* keys; rebuilt binaries find their nico-* keys; both pick up the operator-supplied value rather than the localhost defaults. Drop the carbide-* mirror once every consuming binary has been rebuilt to read nico-*. A template-side Go comment documents this.

The chart's `nameservers`, `ntpServer`, and `provisioningServer` defaults are also changed from "127.0.0.1" placeholders to obvious "REPLACE_WITH_..." strings. The old 127.0.0.1 fallback was a syntactically valid IP that produced a silently non-functional cluster when operators forgot to override it; the new placeholders are loud at kea startup, so a missed override is obvious. The helm-prereqs example site values file already overrides all three with real example VIPs, so a vanilla setup.sh run is unaffected.
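The failure mode and the dual-name fix described above can be modelled in a few lines. This is only a sketch in Rust, not the hook's actual C++ lookup or the chart's template: `lookup` and `render_dual` are hypothetical helpers, and the API URL `https://api.example.internal:1079` is an assumed operator value. Only the key names and the `[::1]:1079` default come from the commit message.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the hook's parameter lookup: returns the
/// configured value for `key`, or `default` when the key is absent.
fn lookup<'a>(params: &'a HashMap<String, String>, key: &str, default: &'a str) -> &'a str {
    params.get(key).map(String::as_str).unwrap_or(default)
}

/// Hypothetical model of the chart writing both spellings of one hook
/// parameter with identical values.
fn render_dual(suffix: &str, value: &str) -> HashMap<String, String> {
    let mut params = HashMap::new();
    for prefix in ["nico", "carbide"] {
        params.insert(format!("{prefix}-{suffix}"), value.to_string());
    }
    params
}

fn main() {
    let localhost_default = "https://[::1]:1079";
    // Assumed operator-supplied value, for illustration only.
    let operator_url = "https://api.example.internal:1079";

    // Chart after the rename, before this fix: only the nico-* key exists,
    // so a pre-rename binary asking for carbide-api-url gets the default.
    let mut renamed_only = HashMap::new();
    renamed_only.insert("nico-api-url".to_string(), operator_url.to_string());
    assert_eq!(
        lookup(&renamed_only, "carbide-api-url", localhost_default),
        localhost_default
    );

    // Chart with the dual-name pattern: both spellings resolve to the
    // operator value, whichever one the binary reads.
    let dual = render_dual("api-url", operator_url);
    assert_eq!(lookup(&dual, "carbide-api-url", localhost_default), operator_url);
    assert_eq!(lookup(&dual, "nico-api-url", localhost_default), operator_url);
}
```

Writing both keys on the chart side keeps the compatibility logic out of the binaries, which is why the carbide-* mirror can later be dropped without another code change.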
ade8660 to
3bd84c3
…_mode
The `model::site_explorer::EndpointExplorationReport` struct gained a
required `remediation_error` field upstream, and every other test
construction site in the repo (host_bmc_firmware_test.rs,
site_explorer.rs at 9 sites, etc.) was updated to set it. The
`host_bmc_report()` helper in preingestion_dpu_nic_mode.rs was missed,
leaving every CI build that includes this test failing with:
error[E0063]: missing field `remediation_error` in initializer of
`model::site_explorer::EndpointExplorationReport`
--> crates/api-core/src/tests/preingestion_dpu_nic_mode.rs:53:5
This is a pre-existing upstream issue (still present on main as of the
rebase that produced this branch). Fixing it here as a drive-by so the
chart change in the preceding commit can land — CI is otherwise blocked
on the same break for any branch built from current main.
Sets remediation_error to None to match the convention used by every
other call site.
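For readers unfamiliar with E0063, the shape of the fix looks roughly like the following. This is a heavily simplified sketch: the real `EndpointExplorationReport` has more fields, the `vendor` field here is a hypothetical placeholder, and the `Option<String>` payload type for `remediation_error` is an assumption (the commit message only says the field is set to None).

```rust
/// Simplified stand-in for model::site_explorer::EndpointExplorationReport.
/// Field set and types are assumptions made for illustration.
#[derive(Debug)]
struct EndpointExplorationReport {
    vendor: Option<String>,            // hypothetical pre-existing field
    remediation_error: Option<String>, // newly required field; payload type assumed
}

/// Test helper in the spirit of host_bmc_report(). Leaving out
/// `remediation_error` here is what triggers error[E0063], because Rust
/// struct literals must name every field unless `..` update syntax is used.
fn host_bmc_report() -> EndpointExplorationReport {
    EndpointExplorationReport {
        vendor: None,
        remediation_error: None, // matches the convention at other call sites
    }
}

fn main() {
    let report = host_bmc_report();
    assert!(report.remediation_error.is_none());
    println!("{report:?}");
}
```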
thossain-nv
approved these changes
Jun 5, 2026
Looks good, thanks @shayan1995
Type of Change
Related Issues (Optional)
Breaking Changes
Testing
`helm template` fails with the operator-facing error message; `helm lint` flags the same warnings.
Additional Notes