Drain nodegroup: gracefully handle transient control plane failures (#7550) #8782

Open
abhishek-dalbanjan wants to merge 1 commit into eksctl-io:main from abhishek-dalbanjan:feat-leader-changed

@abhishek-dalbanjan

Description

Fixes #7550

When draining nodegroups, the Kubernetes API server may return transient HTTP 5xx errors such as "etcdserver: leader changed" or "raft proposal dropped" during leadership transitions. Previously, eksctl failed the entire drain operation when it encountered these errors.

This PR introduces an isTransientControlPlaneError check, called from isEvictionErrorRecoverable, that recognizes these string patterns as recoverable errors. As a result, eksctl retries evicting the pods instead of surfacing an unrecoverable failure to the user.

Checklist

  • Added tests that cover your change (if possible)
  • Added/modified documentation as required (such as the README.md, or the userdocs directory)
  • Manually tested
  • Made sure the title of the PR is a good description that can go into the release notes
  • (Core team) Added labels for change area (e.g. area/nodegroup) and kind (e.g. kind/improvement)

BONUS POINTS checklist: complete for good vibes and maybe prizes?! 🤯

  • Backfilled missing tests for code in same general area 🎉
  • Refactored something and made the world a better place 🌟

When draining nodegroups, the Kubernetes API server may return transient HTTP 5xx errors such as 'etcdserver: leader changed' or 'raft proposal dropped' during leadership transitions. This commit recognizes these string patterns as recoverable errors so that eksctl retries evicting the pods instead of surfacing an unrecoverable failure.

Fixes eksctl-io#7550

Signed-off-by: Abhishek Dalbanjan <abhishekdalbanjan1@gmail.com>
@github-actions

github-actions Bot commented Jul 1, 2026

Contributor

Hello Abhidalbanjan 👋 Thank you for opening a Pull Request in the eksctl project. The team will review the Pull Request and aim to respond within 1-10 business days. Meanwhile, please read about the Contribution and Code of Conduct guidelines here. You can find out more information about eksctl on our website.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] Gracefully handle transient failures "leader changed" from control plane instances

1 participant