feat(planner): add replace-pod task to NodeUpdate plan (closes #211) #216
Merged
Conversation
K8s native StatefulSet RollingUpdate refuses to delete pods that are not Ready, which deadlocks the rollout when seid is intentionally unready — the canonical case is halted at a chain upgrade height awaiting a binary swap. The NodeUpdate plan would loop forever in observe-image until a human ran `kubectl delete pod`.

Adds a replace-pod task between apply-service and observe-image. It lists pods owned by the StatefulSet, identifies any whose controller-revision-hash != sts.Status.UpdateRevision, and deletes them. The StatefulSet then recreates the pods at the new revision via its standard creation path, which doesn't gate on readiness.

Idempotent: pods at the update revision are skipped, terminating pods are skipped, a missing StatefulSet returns transient retry.

Tested:
- Stale-revision pod present → deleted, task complete
- Already at update revision → no-op, task complete
- Terminating pod → skipped (not double-deleted)
- StatefulSet missing → transient wait
- UpdateRevision empty (status not yet populated) → transient wait

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
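The selection logic this commit describes — skip terminating pods, skip pods already at the update revision, delete the rest — can be sketched roughly as follows. The types here are minimal stand-ins for the real `corev1.Pod` and `appsv1.StatefulSet` objects, so this illustrates the filtering rule, not the PR's actual code:

```go
package main

import "fmt"

// Pod is a minimal stand-in for corev1.Pod (hypothetical shape).
type Pod struct {
	Name              string
	Labels            map[string]string
	DeletionTimestamp *string // non-nil means the pod is already terminating
}

// stalePods returns pods that should be deleted: not terminating, carrying a
// controller-revision-hash label, and that hash differing from UpdateRevision.
func stalePods(pods []Pod, updateRevision string) []Pod {
	var stale []Pod
	for _, p := range pods {
		if p.DeletionTimestamp != nil {
			continue // terminating — don't double-delete
		}
		hash, ok := p.Labels["controller-revision-hash"]
		if !ok {
			continue // no hash label — don't classify as stale
		}
		if hash != updateRevision {
			stale = append(stale, p)
		}
	}
	return stale
}

func main() {
	ts := "2026-05-08T00:00:00Z"
	pods := []Pod{
		{Name: "node-0", Labels: map[string]string{"controller-revision-hash": "rev-old"}},
		{Name: "node-1", Labels: map[string]string{"controller-revision-hash": "rev-new"}},
		{Name: "node-2", Labels: map[string]string{"controller-revision-hash": "rev-old"}, DeletionTimestamp: &ts},
	}
	for _, p := range stalePods(pods, "rev-new") {
		fmt.Println(p.Name)
	}
	// prints: node-0
}
```

Note that node-2 is stale but terminating, so it is skipped rather than deleted again — the idempotence property the test list above exercises.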
- RBAC: add `delete` verb on pods (was get;list;watch only — the Delete call would fail with forbidden in production); regenerate role.yaml.
- ObservedGeneration gate: wait until the STS controller has observed our spec before consulting CurrentRevision/UpdateRevision; without this, stale revisions could cause spurious deletes.
- Multi-replica safeguard: fail Terminal if replicas > 1 (we'd violate reverse-ordinal RollingUpdate semantics; today every SeiNode is replicas=1).
- Owner-reference check: skip pods that match the selector by label but aren't actually owned by the StatefulSet.
- Skip pods missing the controller-revision-hash label entirely (they would otherwise be classified "stale" and deleted).
- Tests: add coverage for stale ObservedGeneration, missing hash label, non-owned pod, and multi-replica cases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
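The ObservedGeneration gate amounts to a staleness check before trusting the status revisions. A minimal sketch with a stand-in type (the real fields live on `appsv1.StatefulSet` and its `.Status`):

```go
package main

import "fmt"

// StatefulSet is a stand-in for the fields the gate consults (hypothetical shape).
type StatefulSet struct {
	Generation         int64  // spec generation we applied
	ObservedGeneration int64  // generation the STS controller has processed
	UpdateRevision     string // revision pods should converge to
}

// statusFresh reports whether Status revisions can be trusted: the controller
// must have observed the current spec and populated UpdateRevision. When it
// returns false the task should return a transient wait, not act on stale data.
func statusFresh(sts StatefulSet) bool {
	if sts.ObservedGeneration < sts.Generation {
		return false // controller hasn't seen our spec yet — retry later
	}
	if sts.UpdateRevision == "" {
		return false // status not yet populated — retry later
	}
	return true
}

func main() {
	fmt.Println(statusFresh(StatefulSet{Generation: 3, ObservedGeneration: 2, UpdateRevision: "rev-new"})) // false
	fmt.Println(statusFresh(StatefulSet{Generation: 3, ObservedGeneration: 3, UpdateRevision: "rev-new"})) // true
}
```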
- Trim the Execute docstring and remove what-comments inside the function body (per repo guidance: comments only when the WHY is non-obvious).
- Replace test literals "default" / "node-1" with package-local consts; drop unused UID/ChainID/Image to remove the goconst hits the PR introduced.
- Drop the unused string param from podForReplace; replace the ptrBool helper with an inline bool var (modernize).
- gofmt: spaces around the concat operator.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per Kubernetes-specialist review: with the previous default RollingUpdate strategy, replace-pod was working *around* the STS controller's "don't disturb unready pods" guardrail. The idiomatic K8s pattern for "controller manages pod lifecycle, StatefulSet manages identity" is OnDelete.

Sets `Spec.UpdateStrategy.Type = OnDeleteStatefulSetStrategyType` in the generated StatefulSet. With OnDelete:

- Pod replacement is *always* explicit, via replace-pod
- No race with STS rolling logic — STS won't touch pods on its own
- replace-pod becomes the natural counterpart to the spec change, not an escape hatch around a guardrail

Behavior is unchanged on first apply (no template change → no rollout). Existing pods continue running until the next image bump triggers replace-pod, exactly as today.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
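On the serialized apps/v1 StatefulSet, the strategy change amounts to this stanza (surrounding spec elided; field names as in the apps/v1 API):

```yaml
# apps/v1 StatefulSet (excerpt) — update strategy after this commit
spec:
  updateStrategy:
    type: OnDelete   # the STS controller never deletes pods itself; replace-pod does
```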
Force-pushed 62ba709 to 26b3ee8
…ses Bugbot race)

Tasks running in the same reconcile after apply-statefulset (which does SSA on the StatefulSet) were reading from the controller-runtime cached client, which lags etcd by milliseconds-to-unbounded. The freshness gate on (ObservedGeneration < Generation) doesn't fire when the entire cached object reflects pre-apply state — both fields are equal at the old value.

Result: replace-pod sees CurrentRevision == UpdateRevision (both old) and marks Complete with no deletion. observe-image sees UpdatedReplicas == Replicas (both pre-update) and stamps currentImage on the SeiNode. The plan completes green, the pod stays on the old image. Worse than pre-PR.

Adds APIReader (uncached, direct-to-API-server) to ExecutionConfig. replace-pod and observe-image use APIReader for their StatefulSet Get; all other reads stay on the cached client. APIReader is wired in cmd/main.go from mgr.GetAPIReader(). Test fixtures pass the same fake client for both fields (it satisfies both interfaces).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
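The shape of the fix — a config struct carrying both a cached reader and a direct API reader, with one fake satisfying both in tests — can be sketched with a stand-in interface (the real types are controller-runtime's `client.Client` and `client.Reader`; this is an illustration, not the PR's code):

```go
package main

import "fmt"

// Reader is a stand-in for controller-runtime's client.Reader (hypothetical shape).
type Reader interface {
	Get(name string) (string, error)
}

// ExecutionConfig carries both read paths: cached for most tasks, uncached
// (direct to the API server) for tasks that must see a just-applied object.
type ExecutionConfig struct {
	Client    Reader // cached — may lag etcd right after an SSA apply
	APIReader Reader // uncached — always fresh, at the cost of an API round trip
}

// fakeClient satisfies Reader, so tests can pass it for both fields.
type fakeClient struct{ objects map[string]string }

func (f fakeClient) Get(name string) (string, error) { return f.objects[name], nil }

func main() {
	fake := fakeClient{objects: map[string]string{"archive-0": "rev-new"}}
	cfg := ExecutionConfig{Client: fake, APIReader: fake}

	// replace-pod-style read: bypass the cache for the StatefulSet Get.
	rev, _ := cfg.APIReader.Get("archive-0")
	fmt.Println(rev)
	// prints: rev-new
}
```

Keeping all other reads on the cached client preserves the usual controller-runtime economy; only the two tasks racing their own apply pay the direct-read cost.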
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 6ed60e5.
bdchatham added a commit that referenced this pull request on May 9, 2026:
…e rollouts) (#217)

Pulls in #216 — NodeUpdate plans now include a `replace-pod` task that proactively deletes pods at the old StatefulSet revision. StatefulSets generated by this controller now use the OnDelete update strategy, so pod lifecycle is the SeiNode controller's responsibility (replace-pod).

This unblocks the chain-upgrade rollout pattern where seid is intentionally unready (halted at upgrade height) and native RollingUpdate refuses to disturb it.

Closes #211.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Closes #211.
Summary
K8s native StatefulSet `RollingUpdate` refuses to delete pods that are not Ready, which deadlocks the rollout when seid is intentionally unready — the canonical case is halted at a chain upgrade height awaiting a binary swap. The NodeUpdate plan would loop forever in `observe-image` until a human ran `kubectl delete pod`. We hit this on 2026-05-08 during pacific-1's archive-0 v6.3.1 → v6.4.3 swap; ~9000 reconcile cycles before manual unstick.

This PR inserts a new `replace-pod` task between `apply-service` and `observe-image` in the NodeUpdate plan. `replace-pod`:

- reads the StatefulSet's `Status.UpdateRevision`
- lists pods (via `Selector.MatchLabels`)
- deletes pods whose `controller-revision-hash` label != `UpdateRevision`

Idempotent: re-running on a converged StatefulSet (`CurrentRevision == UpdateRevision`) is a no-op completion. A missing StatefulSet or empty `UpdateRevision` returns transient wait so the executor retries on the next reconcile.

Files

- `internal/task/replace_pod.go` (new)
- `internal/task/replace_pod_test.go` (new)
- `internal/task/task.go` — `TaskTypeReplacePod` in deserializer registry
- `internal/planner/planner.go`
- `internal/planner/node_update_test.go`
- `CLAUDE.md`

Test plan

References

- `kubectl delete pod` unstuck

🤖 Generated with Claude Code
Note
Medium Risk
Changes rollout semantics for all SeiNode StatefulSets by switching to OnDelete updates and introducing controller-driven pod deletions; mistakes could cause stuck updates or unintended pod restarts. Also expands RBAC to allow pod deletions and adds cache-bypassing reads that could mask eventual consistency issues if misused.

Overview

Fixes SeiNode image rollouts that can deadlock when pods are intentionally NotReady by inserting a new `replace-pod` step into the NodeUpdate plan (apply-statefulset → apply-service → replace-pod → observe-image → mark-ready).

To support controller-driven rollouts, generated StatefulSets now use the OnDelete update strategy, the new `replace-pod` task deletes old-revision pods (single-replica only) to force recreation at the new revision, and `observe-image` (plus task execution wiring) now uses an uncached `APIReader` to read freshly-applied resources. RBAC is updated to grant pod `delete`, and tests/docs are updated accordingly.

Reviewed by Cursor Bugbot for commit aac07fa.
Note on rollback
If this PR is reverted in the controller, in-cluster StatefulSets will keep `updateStrategy: OnDelete` (server-side apply does not reset unset fields to defaults). The pre-PR controller has no `replace-pod` task, so image bumps would stall on halted/unready pods exactly as they did before this PR — recoverable, not wedging, but rollback does not auto-restore `RollingUpdate`. To fully revert, edit the StatefulSets manually:

`kubectl patch sts <name> -n <ns> --type=json -p '[{"op":"replace","path":"/spec/updateStrategy/type","value":"RollingUpdate"}]'`