[CI-1951] fix(podiprecovery): add Pod watch so recovery is level-triggered by coutinhop · Pull Request #4987 · tigera/operator

coutinhop · 2026-07-03T17:27:38Z

Recovery was edge-triggered on the Node watch alone: the reconcile fired only when a node's host IPs changed. On a KubeVirt VM reboot the node's new IP is reported promptly, but the node's host-networked pods are still restarting at that instant with empty status.podIPs, so the reconcile skips them. Seconds later a surviving pod comes back reporting its old, now-stale IP (Kubernetes never refreshes status.podIPs for a surviving hostNetwork pod) — but the node's host IP has already settled, no further Node event fires, and the stale pod is never re-evaluated. The earlier autoscaler-tick approach did not have this gap because it re-checked on every tick.
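To make the gap concrete, here is a minimal, self-contained model of the reboot timeline. Everything in it is hypothetical (the `Pod` and `Event` types, `podIPIsStale`, `flaggedStale`). It assumes Reconcile treats a host-networked pod as stale when any of its `status.podIPs` is not among the node's current host IPs, and skips pods whose `podIPs` is empty. The same events are replayed twice: once with Reconcile triggered only by Node events, and once triggered by Pod events as well.

```go
package main

// Pod is a hypothetical, pared-down view of a host-networked pod.
type Pod struct {
	Name        string
	HostNetwork bool     // spec.hostNetwork
	PodIPs      []string // status.podIPs
}

// Event is one step of a hypothetical timeline: either a Node update carrying
// the node's new host IPs, or a Pod update carrying the pod's new status.
type Event struct {
	NodeIPs []string
	Pod     *Pod
}

// podIPIsStale is an assumed version of the per-pod judgment: a pod with no
// IPs cannot be judged yet and is skipped; otherwise it is stale if any of its
// IPs is not one of the node's current host IPs.
func podIPIsStale(p Pod, nodeIPs []string) bool {
	if !p.HostNetwork || len(p.PodIPs) == 0 {
		return false
	}
	for _, podIP := range p.PodIPs {
		found := false
		for _, nodeIP := range nodeIPs {
			if podIP == nodeIP {
				found = true
				break
			}
		}
		if !found {
			return true
		}
	}
	return false
}

// upsert replaces the pod with the same name, or appends it if it is new.
func upsert(pods []Pod, p Pod) []Pod {
	for i := range pods {
		if pods[i].Name == p.Name {
			pods[i] = p
			return pods
		}
	}
	return append(pods, p)
}

// flaggedStale replays events and returns the pods Reconcile flagged as stale.
// Node events always trigger Reconcile; Pod events trigger it only when
// watchPods is true.
func flaggedStale(events []Event, watchPods bool) []string {
	var nodeIPs []string
	var pods []Pod
	flagged := map[string]bool{}
	reconcile := func() {
		for _, p := range pods {
			if podIPIsStale(p, nodeIPs) {
				flagged[p.Name] = true
			}
		}
	}
	for _, ev := range events {
		if ev.NodeIPs != nil {
			nodeIPs = ev.NodeIPs
			reconcile()
		}
		if ev.Pod != nil {
			pods = upsert(pods, *ev.Pod)
			if watchPods {
				reconcile()
			}
		}
	}
	var out []string
	for _, p := range pods {
		if flagged[p.Name] {
			out = append(out, p.Name)
		}
	}
	return out
}
```

Replaying a reboot (node at `10.0.0.5`; pod restarts with empty `podIPs`; node moves to `10.0.0.9`; pod returns still reporting `10.0.0.5`) flags nothing when only Node events trigger Reconcile, because the single Node-triggered run happens while `podIPs` is still empty. With Pod events also triggering Reconcile, the pod is flagged as soon as it reports its stale IP.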

Add a second watch on operator-managed host-networked Pods that re-enqueues a pod's node when the pod settles into a state where its status.podIPs can be judged — its IPs appear/change, or it becomes Ready. Both watches funnel into the same node-keyed, idempotent Reconcile, so recovery is now level-triggered on both inputs to its decision (node addresses and pod IPs) while staying event-driven — no return to polling.
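A minimal sketch of the two watches funnelling into one node-keyed queue. `podToNode` is named in the PR, but this signature is an assumption, and the other names (`PodState`, `settled`, `nodeQueue`, `onNodeUpdate`, `onPodUpdate`) are hypothetical. `nodeQueue` is a toy stand-in for a real work queue. It models only the property the design relies on: enqueueing the same node twice before it is processed yields one Reconcile, which is safe because Reconcile is idempotent.

```go
package main

// PodState is a hypothetical, minimal view of the fields the update predicate
// compares between the old and new pod objects.
type PodState struct {
	NodeName string   // spec.nodeName
	PodIPs   []string // status.podIPs
	Ready    bool     // PodReady condition is True
}

// sameIPs reports whether two IP lists are identical, element by element.
func sameIPs(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// settled reports whether an update moved the pod into a state where its
// status.podIPs can be judged: non-empty IPs appeared or changed, or the pod
// became Ready. IPs disappearing does not count as settling.
func settled(oldPod, newPod PodState) bool {
	if len(newPod.PodIPs) > 0 && !sameIPs(oldPod.PodIPs, newPod.PodIPs) {
		return true
	}
	return !oldPod.Ready && newPod.Ready
}

// podToNode maps a pod event to the key Reconcile works on: its node's name.
func podToNode(p PodState) string { return p.NodeName }

// nodeQueue is a toy de-duplicating queue keyed by node name.
type nodeQueue struct {
	order   []string
	pending map[string]bool
}

func newNodeQueue() *nodeQueue { return &nodeQueue{pending: map[string]bool{}} }

// Add enqueues a node unless it is unnamed or already pending.
func (q *nodeQueue) Add(node string) {
	if node == "" || q.pending[node] {
		return
	}
	q.pending[node] = true
	q.order = append(q.order, node)
}

// Drain hands each pending node to reconcile exactly once, in arrival order.
func (q *nodeQueue) Drain(reconcile func(node string)) {
	for _, n := range q.order {
		delete(q.pending, n)
		reconcile(n)
	}
	q.order = nil
}

// onNodeUpdate models the existing Node watch: a host IP change enqueues the node.
func onNodeUpdate(q *nodeQueue, nodeName string) { q.Add(nodeName) }

// onPodUpdate models the new Pod watch: a settled pod enqueues its node.
func onPodUpdate(q *nodeQueue, oldPod, newPod PodState) {
	if settled(oldPod, newPod) {
		q.Add(podToNode(newPod))
	}
}
```

For example, a Node update for `n1` followed by a Pod update on `n1` whose IPs just appeared results in a single Reconcile of `n1`, while a steady-state Pod update on another node enqueues nothing.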

The predicate is gated on the host-networked marker label plus spec.hostNetwork, so event volume stays limited to the handful of such pods cluster-wide.
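The gate can be sketched as a cheap check evaluated before any old/new comparison. The label key below is a placeholder (the marker label's actual key is not given here), and treating the label's presence, rather than a particular value, as the marker is an assumption. `passesGate` and `gated` are hypothetical names.

```go
package main

// hostNetworkedLabel is a hypothetical placeholder for the operator's
// host-networked marker label.
const hostNetworkedLabel = "example.com/host-networked"

// Pod is a hypothetical, minimal view of the fields the gate reads.
type Pod struct {
	Name        string
	Labels      map[string]string
	HostNetwork bool // spec.hostNetwork
}

// passesGate requires both the marker label and spec.hostNetwork;
// neither condition alone is enough.
func passesGate(p Pod) bool {
	_, labeled := p.Labels[hostNetworkedLabel]
	return labeled && p.HostNetwork
}

// gated filters a batch of pod events down to those that would reach the
// settle check, illustrating how the gate bounds event volume.
func gated(pods []Pod) []string {
	var names []string
	for _, p := range pods {
		if passesGate(p) {
			names = append(names, p.Name)
		}
	}
	return names
}
```

Of four pods (labeled and host-networked, labeled only, host-networked only, neither), only the first passes.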

Adds unit tests for the pod-settle predicate (create/update/delete, label and hostNetwork gating, IPs-appear and became-Ready transitions, steady-state no-op) and for the podToNode mapping.
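A table-driven sketch of the update-path cases listed above. The expected behavior for create and delete events is not described here, so this sketch covers only updates. `updateTriggers`, `updateCase`, `failingCases`, `defaultCases` and the `markerLabel` key are hypothetical. The gate-then-settle ordering, and applying the gate to the new object, are assumptions.

```go
package main

// markerLabel is a hypothetical placeholder for the host-networked marker label.
const markerLabel = "example.com/host-networked"

// Pod is a hypothetical, minimal view of the fields the predicate reads.
type Pod struct {
	NodeName    string
	Labels      map[string]string
	HostNetwork bool
	PodIPs      []string
	Ready       bool
}

func sameIPs(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// updateTriggers gates on the new object, then requires a settle transition.
func updateTriggers(oldPod, newPod Pod) bool {
	if _, ok := newPod.Labels[markerLabel]; !ok || !newPod.HostNetwork {
		return false
	}
	if len(newPod.PodIPs) > 0 && !sameIPs(oldPod.PodIPs, newPod.PodIPs) {
		return true
	}
	return !oldPod.Ready && newPod.Ready
}

// podToNode returns the node key a pod event is mapped to.
func podToNode(p Pod) string { return p.NodeName }

// updateCase is one row of the table.
type updateCase struct {
	name           string
	oldPod, newPod Pod
	want           bool
}

// defaultCases builds rows mirroring the categories listed in the PR.
func defaultCases() []updateCase {
	base := Pod{NodeName: "n1", Labels: map[string]string{markerLabel: ""}, HostNetwork: true}
	withIPs := base
	withIPs.PodIPs = []string{"10.0.0.5"}
	ready := withIPs
	ready.Ready = true
	unlabeled := withIPs
	unlabeled.Labels = nil
	podNet := withIPs
	podNet.HostNetwork = false
	return []updateCase{
		{"ips appear", base, withIPs, true},
		{"became ready", withIPs, ready, true},
		{"steady state", ready, ready, false},
		{"missing label", base, unlabeled, false},
		{"not hostNetwork", base, podNet, false},
	}
}

// failingCases runs each row and returns the names of rows that disagree.
func failingCases(cases []updateCase) []string {
	var failed []string
	for _, c := range cases {
		if updateTriggers(c.oldPod, c.newPod) != c.want {
			failed = append(failed, c.name)
		}
	}
	return failed
}
```

In a real `_test.go` file the same table would typically drive `t.Run` subtests; returning failing row names keeps this sketch runnable without the `testing` harness.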

Description

Release Note

TBD

For PR author

Tests for change.
If changing pkg/apis/, run make gen-files
If changing versions, run make gen-versions

For PR reviewers

A note for code reviewers - all pull requests must have the following:

Milestone set according to targeted release.
Appropriate labels:
- kind/bug if this is a bugfix.
- kind/enhancement if this is a new feature.
- enterprise if this PR applies to Calico Enterprise only.


coutinhop requested a review from caseydavenport July 3, 2026 17:27

coutinhop self-assigned this Jul 3, 2026

coutinhop requested a review from a team as a code owner July 3, 2026 17:27

marvin-tigera added this to the v1.44.0 milestone Jul 3, 2026

marvin-tigera added docs-pr-required release-note-required labels Jul 3, 2026

coutinhop added release-note-not-required docs-not-required and removed docs-pr-required release-note-required labels Jul 3, 2026

coutinhop wants to merge 1 commit into tigera:master from coutinhop:pedro-CI-1951-2
