Conversation
@@ -0,0 +1,5 @@
tls_server_config:
  cert_file: "/etc/kubernetes/certs/kubeletserver.crt"
These paths aren't necessarily correct; they depend on whether kubelet serving certificate rotation is enabled. When it's disabled these paths are correct, but when it's enabled both cert_file and key_file should point to /var/lib/kubelet/pki/kubelet-server-current.pem.
Interesting. This is copy-pasted from the aks-vm-extension repo and matches what is on my node today. I don't see anywhere it's touched; it's just a static file.
root@aks-sys-41317600-vmss000000:/etc/node-exporter.d# cat web-config.yml
tls_server_config:
  cert_file: "/etc/kubernetes/certs/kubeletserver.crt"
  key_file: "/etc/kubernetes/certs/kubeletserver.key"
  client_auth_type: "RequireAndVerifyClientCert"
  client_ca_file: "/etc/kubernetes/certs/ca.crt"
I think we could address this in node-exporter-startup.sh: check for the existence of /var/lib/kubelet/pki/kubelet-server-current.pem and use it if it exists; if not, use /etc/kubernetes/certs/kubeletserver.crt.
That's tricky: kubelet-server-current.pem won't exist until kubelet requests it from the control plane after CSE exits, though we can check IMDS during CSE to see whether the feature itself will be enabled (like we currently do in configureKubeletServing).
Alrighty. Adjusted the service to wait for kubelet, and the startup script to check the IMDS cache file first; if that's not around, it calls IMDS itself. At the very end it checks whether /var/lib/kubelet/pki/kubelet-server-current.pem exists, in case everything else said no for some reason.
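A rough sketch of that selection flow; the cache path and the IMDS marker string here are illustrative assumptions, not the exact values from this PR:

ROTATION_PEM="/var/lib/kubelet/pki/kubelet-server-current.pem"
IMDS_CACHE="/opt/azure/containers/imds_cache.json"   # assumed cache location

rotation_enabled() {
    # Prefer the cache written during CSE; otherwise query IMDS directly.
    if [ -f "$IMDS_CACHE" ]; then
        grep -q "kubelet-serving-certificate-rotation" "$IMDS_CACHE"
    else
        curl -s -H "Metadata: true" \
            "http://169.254.169.254/metadata/instance?api-version=2021-02-01" \
            | grep -q "kubelet-serving-certificate-rotation"
    fi
}

CERT_FILE="" KEY_FILE=""
if rotation_enabled || [ -f "$ROTATION_PEM" ]; then
    # Rotated serving cert: one PEM holds both cert and key.
    CERT_FILE="$ROTATION_PEM"
    KEY_FILE="$ROTATION_PEM"
elif [ -f /etc/kubernetes/certs/kubeletserver.crt ]; then
    CERT_FILE="/etc/kubernetes/certs/kubeletserver.crt"
    KEY_FILE="/etc/kubernetes/certs/kubeletserver.key"
fi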
…ux/cloud-init/artifacts/node-exporter/baseline/etc/systemd/system/node-exporter-restart.path (outdated review thread, resolved)
…cker Packer file provisioner cannot upload directories with nested subdirectories. Replace directory upload with individual file entries for all node-exporter baseline files.
… vhd-image* locations, updated test to make sure skip is missing when we want
…sable, startup args issue, imds log, unit description
    version_info=$(node_exporter_extract_package_version "${package_json}" "ubuntu" "current")
fi

IFS=':' read -r NODE_EXPORTER_VERSION NODE_EXPORTER_REVISION NODE_EXPORTER_UBUNTU_VERSION <<< "${version_info}"
If jq/sed parsing ever fails (unexpected version format, missing JSON path, etc.), version_info can be empty and the subsequent IFS split will set NODE_EXPORTER_VERSION/REVISION to empty strings, leading to invalid download URLs/paths. Add a sanity check after computing version_info / after the split to fail fast with a clear error when parsing didn’t produce the expected fields.
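A minimal fail-fast guard along those lines, placed right after the split, using the variable names from the diff above (the placement and error message are suggestions):

if [ -z "${NODE_EXPORTER_VERSION}" ] || [ -z "${NODE_EXPORTER_REVISION}" ]; then
    # Parsing produced an empty or partial result; abort before building
    # an invalid download URL or install path from empty fields.
    echo "ERROR: failed to parse node-exporter version info (got '${version_info:-<empty>}')" >&2
    exit 1
fi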
# Skip for OSGuard, Flatcar, Kata, and Mariner (only AzureLinux 3.0 gets node-exporter)
if ! { isAzureLinuxOSGuard "$OS" "$OS_VARIANT" || isFlatcar "$OS" || grep -q "kata" <<< "$FEATURE_FLAGS" || isMariner "$OS"; }; then
    cpAndMode $NODE_EXPORTER_STARTUP_SRC $NODE_EXPORTER_STARTUP_DEST 755
The inline comment says "only AzureLinux 3.0 gets node-exporter", but this block also runs on Ubuntu (it only skips OSGuard/Flatcar/Kata/Mariner). Please update the comment to match the actual install/copy behavior so future readers don't assume Ubuntu is excluded.
# Skip check for OS variants that don't have node-exporter, but verify the skip file is NOT present
# Mariner/CBLMariner is skipped - only AzureLinux 3.0 gets node-exporter
if [ "$os_sku" = "AzureLinuxOSGuard" ] || [ "$os_sku" = "Flatcar" ] || [ "$os_sku" = "CBLMariner" ] || echo "$FEATURE_FLAGS" | grep -q "kata"; then
    local skip_file_check="/etc/node-exporter.d/skip_vhd_node_exporter"
The comment says Mariner/CBLMariner is skipped and "only AzureLinux 3.0 gets node-exporter", but this test (and the VHD build scripts) also expect node-exporter on Ubuntu. Please correct the comment to reflect the actual supported OSes to avoid confusion when adjusting the skip logic later.
# Skip for Flatcar, OSGuard, Kata, and Mariner (we only build AzureLinuxV3 now, mariner entry removed from components.json)
if isFlatcar "$OS" || isAzureLinuxOSGuard "$OS" "$OS_VARIANT" || [ "${IS_KATA}" = "true" ] || [ "$OS" = "$MARINER_OS_NAME" ]; then
    echo "Skipping node-exporter installation for ${OS} ${OS_VARIANT:-default} (IS_KATA=${IS_KATA})"
else
This comment implies node-exporter is only for AzureLinuxV3/and that the Mariner entry was removed, but the "node-exporter" component is also defined for Ubuntu in components.json and is installed on Ubuntu builds. Please update the comment so it doesn't contradict the actual behavior.
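One possible wording that would match the actual behavior in all three spots flagged above (a suggestion, not the PR's final text):

# Skip for OSGuard, Flatcar, Kata, and Mariner 2.0. node-exporter is
# installed on Ubuntu and AzureLinux 3.0 (the Mariner entry was removed
# from components.json, but the Ubuntu entry remains).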
| echo "WARNING: No kubelet serving certs found after ${WAIT_TIMEOUT}s, node-exporter will run without TLS. Restart the service after certs are available to enable TLS." | ||
| fi | ||
|
|
||
| # Configure TLS if we found valid cert paths | ||
| if [ -n "$CERT_FILE" ] && [ -n "$KEY_FILE" ]; then | ||
| cat > "$TLS_CONFIG_PATH" <<EOF | ||
| tls_server_config: | ||
| cert_file: "$CERT_FILE" | ||
| key_file: "$KEY_FILE" | ||
| client_auth_type: "RequireAndVerifyClientCert" | ||
| client_ca_file: "/etc/kubernetes/certs/ca.crt" | ||
| EOF | ||
| TLS_CONFIG_ARG="--web.config.file=${TLS_CONFIG_PATH}" | ||
| fi | ||
|
|
||
| ARGS=( | ||
| --web.listen-address="${NODE_IP}:19100" | ||
| --no-collector.wifi | ||
| --no-collector.hwmon | ||
| --collector.cpu.info | ||
| --collector.filesystem.mount-points-exclude="^/(dev|proc|sys|run/containerd/.+|var/lib/docker/.+|var/lib/kubelet/.+)($|/)" | ||
| --collector.netclass.ignored-devices="^(azv.*|veth.*|[a-f0-9]{15})$" | ||
| --collector.netclass.netlink | ||
| --collector.netdev.device-exclude="^(azv.*|veth.*|[a-f0-9]{15})$" | ||
| --no-collector.arp.netlink | ||
| ) | ||
|
|
||
| if [ -n "$TLS_CONFIG_ARG" ]; then | ||
| ARGS+=("$TLS_CONFIG_ARG") | ||
| fi | ||
|
|
||
| exec /opt/bin/node-exporter "${ARGS[@]}" |
node-exporter-startup.sh falls back to starting node-exporter without any TLS or client authentication if kubelet serving certs are not found within the wait timeout. In that case, node-exporter listens on ${NODE_IP}:19100 with plaintext HTTP, exposing detailed node metrics to any client that can reach that IP/port (e.g., pods or VNet peers), which enables host reconnaissance and information disclosure. To avoid this, consider failing or delaying service startup until TLS certs are available (or binding only to localhost in the fallback) so that node-exporter is never exposed unauthenticated on the network.
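A sketch of the loopback fallback this suggests, reusing the ARGS array from the excerpt above (a hardening idea, not the PR's current behavior):

if [ -n "$TLS_CONFIG_ARG" ]; then
    ARGS+=("$TLS_CONFIG_ARG")
else
    # No TLS material: don't expose plaintext metrics on the node IP.
    # Replace the first element (the node-IP listen address) with loopback;
    # restarting the service once certs land restores TLS on ${NODE_IP}:19100.
    ARGS=("--web.listen-address=127.0.0.1:19100" "${ARGS[@]:1}")
fi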
What this PR does / why we need it:
This adds node-exporter to the VHD build by default. At the end of the VHD build we systemctl disable the service to allow a fresh start during CSE, letting the node-exporter-startup script run and gather the node-specific details it needs.
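Roughly, the flow looks like this (the unit name is assumed from this PR's file layout):

# At the end of the VHD build: ship the unit and files, but leave it disabled
# so the captured image doesn't start node-exporter with stale node details.
systemctl disable node-exporter.service

# During CSE on the provisioned node: start it fresh; its ExecStart runs the
# node-exporter-startup.sh wrapper, which resolves NODE_IP and TLS cert paths.
systemctl enable --now node-exporter.service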
Which issue(s) this PR fixes:
Fixes #