collector: Add link_layer label to all per-port infiniband metrics #3666

Open
pallamidessi wants to merge 1 commit into prometheus:master from pallamidessi:infiniband-port-info-link-layer

@pallamidessi
Adds link_layer ("InfiniBand" or "Ethernet") as a label on every per-port metric exposed by the infiniband collector.

Context

Modern Mellanox/NVIDIA NICs (ConnectX-6/7, BlueField, B200 platforms) routinely present some ports in Ethernet link layer (RoCE or generic Ethernet) while others run pure InfiniBand. Both modes appear under /sys/class/infiniband/* and currently emit indistinguishable node_infiniband_* series, so operators running mixed fleets cannot easily scope alerts (link_downed_total, link_error_recovery_total, etc.) to a single link layer.

link_layer is already parsed by procfs: prometheus/procfs#700
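
For readers unfamiliar with the sysfs layout, here is a minimal sketch of where the value comes from. It uses only the Go standard library rather than the procfs parser, and the helper names `parseLinkLayer` and `readLinkLayer` are hypothetical, not part of procfs or node_exporter. It assumes each port exposes its link layer at `<device>/ports/<port>/link_layer` as a single line of text, as in the fixture path mentioned under Testing.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// parseLinkLayer strips the trailing newline (and any surrounding
// whitespace) from the raw contents of a link_layer file.
// Hypothetical helper for illustration only.
func parseLinkLayer(raw string) string {
	return strings.TrimSpace(raw)
}

// readLinkLayer reads <root>/<device>/ports/<port>/link_layer.
// Hypothetical helper; the real collector relies on procfs instead.
func readLinkLayer(root, device, port string) (string, error) {
	data, err := os.ReadFile(filepath.Join(root, device, "ports", port, "link_layer"))
	if err != nil {
		return "", err
	}
	return parseLinkLayer(string(data)), nil
}

func main() {
	const root = "/sys/class/infiniband"

	devices, err := os.ReadDir(root)
	if err != nil {
		// Hosts without RDMA devices have no such directory.
		fmt.Fprintln(os.Stderr, "cannot list infiniband devices:", err)
		return
	}
	for _, dev := range devices {
		ports, err := os.ReadDir(filepath.Join(root, dev.Name(), "ports"))
		if err != nil {
			continue // skip devices without a ports directory
		}
		for _, p := range ports {
			ll, err := readLinkLayer(root, dev.Name(), p.Name())
			if err != nil {
				continue // skip ports without a readable link_layer file
			}
			fmt.Printf("device=%s port=%s link_layer=%s\n", dev.Name(), p.Name(), ll)
		}
	}
}
```

On a host with no RDMA devices the program only prints an error to stderr; on a mixed fleet it would list one line per port with its link layer.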

Testing

Fixture i40iw0/ports/1/link_layer is set to "Ethernet" (iWARP runs over Ethernet by definition) so the e2e test exercises both link_layer values.

Alternative design

First-time contributor here. As an alternative design, we could add a new node_infiniband_port_info{device, port, link_layer} metric that could be used with a PromQL join; that would be a more additive, contained change.
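
To make the alternative concrete, here is a rough sketch of what one sample of such an info metric could look like in the text exposition format. The function `portInfoSeries` and the device name `mlx5_0` are hypothetical, and the sketch formats the line by hand with the standard library instead of using the Prometheus client library that the collector actually uses.

```go
package main

import "fmt"

// portInfoSeries renders one sample of the proposed info metric.
// Hypothetical helper for illustration: %q quoting is adequate for
// simple ASCII label values such as device names, but it is not a
// full implementation of exposition-format escaping.
func portInfoSeries(device, port, linkLayer string) string {
	return fmt.Sprintf(
		"node_infiniband_port_info{device=%q,port=%q,link_layer=%q} 1",
		device, port, linkLayer,
	)
}

func main() {
	// mlx5_0 is an assumed device name; i40iw0 is the fixture device.
	fmt.Println(portInfoSeries("mlx5_0", "1", "InfiniBand"))
	fmt.Println(portInfoSeries("i40iw0", "1", "Ethernet"))
}
```

With a metric of this shape, an alert could pull in the label through a join along the lines of `node_infiniband_link_downed_total * on (instance, device, port) group_left (link_layer) node_infiniband_port_info`, assuming the counter carries the same `device` and `port` labels. The trade-off is that every rule needs the join, whereas the label-on-every-metric approach in this PR lets rules filter directly.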

@discordianfish

…port metric exposed by the infiniband collector

`link_layer` is already parsed by [prometheus/procfs/sysfs into InfiniBandPort.LinkLayer](https://github.com/prometheus/procfs/blob/v0.20.1/sysfs/class_infiniband.go#L112)

Signed-off-by: Joseph Pallamidessi <joseph.pallamidessi@fluidstack.io>
@pallamidessi changed the title from "infiniband: Add link_layer label to all per-port metrics" to "collector: Add link_layer label to all per-port infiniband metrics" on May 20, 2026