feat: soft pod anti-affinity for broker, etcd, and proxy#153
Open
kamir wants to merge 3 commits into
Open
Conversation
…— BUG-0012 Operator-managed broker and etcd StatefulSets shipped with no anti-affinity, so all replicas could schedule onto one node and a single node loss took out the whole quorum. Add a preferred (soft) podAntiAffinity over each StatefulSet's own pod labels keyed on kubernetes.io/hostname. Soft so the single-node KIND demo still schedules every replica; on a multi-node cluster the scheduler spreads them. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…hook
PLAN-06 iter-7 E-14 / G-007. The chart-templated proxy is the only
multi-replica workload bp-001 controls through this chart (operator-
managed brokers + etcd are tracked in BUG-0012). Defaults match the
soft-anti-affinity shape we want everywhere:
* podAntiAffinity / preferredDuringSchedulingIgnoredDuringExecution,
weight 100, topologyKey kubernetes.io/hostname. Soft so single-
node KIND clusters still schedule both proxy replicas; flip to
requiredDuring in a multi-node production overlay.
* topologySpreadConstraints: [] by default; populate per-cluster
when multi-zone topology is available.
Templates extended:
* proxy-deployment.yaml: added topologySpreadConstraints hook
next to the existing affinity hook (both gated by `with`).
Chart bump 0.4.1 -> 0.4.2.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…etcd/proxy
Address review on the soft pod anti-affinity work.
Selector-drift fix: the proxy default podAntiAffinity selector was
hardcoded to app.kubernetes.io/name: kafscale-proxy in values.yaml.
Under nameOverride the pod label becomes <override>-proxy, so the
selector matched nothing and HA was silently disabled with no error.
Move the default into proxy-deployment.yaml, templated from the same
componentSelectorLabels helper that produces the pod labels, applied
only when proxy.affinity is empty so user overrides still win. The
selector now tracks the pod name under nameOverride/fullnameOverride.
Operator unit test (pkg/operator/antiaffinity_test.go): reconcile a
KafscaleCluster with the fake client, fetch the broker and etcd
StatefulSets, and assert the preferred podAntiAffinity term's label
selector EQUALS the pod-template labels and topologyKey ==
kubernetes.io/hostname. That equality is the property that makes the
rule spread its own pods.
Chart template assertion (test/chart/proxy-antiaffinity_test.sh +
make test-chart-antiaffinity): assert the rendered proxy anti-affinity
selector equals the rendered pod labels, including a --set
nameOverride=foo case that would have caught the drift, plus soft-only
and user-override checks.
Comment scrub: replace the internal tracker reference in
softPodAntiAffinity with a self-contained rationale.
Scope: drop the empty topologySpreadConstraints values hook (no default
behavior, no test); possible follow-up. Operator-set affinity is not
yet overridable via the CR (would need new CR API fields + CRD
regen); tracked as a follow-up.
Backward-compat: proxy.affinity default returns to {} and the chart
supplies the default; setting proxy.affinity replaces the default
rather than merging.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Add soft (preferred) pod anti-affinity so replicas of the same component avoid co-locating on one node:
pkg/operator), via a sharedsoftPodAntiAffinity(labels)that reuses each workload's pod-template label map as the selector.proxy-deployment.yaml, applied only whenproxy.affinityis empty.All terms are soft (
preferredDuringSchedulingIgnoredDuringExecution), so single-node clusters still schedule every replica. Flip to required in multi-node production by setting an explicit affinity.Why
Without it the scheduler can place all replicas of a component on one node, so a single node loss takes the whole quorum (broker or etcd) or the only ingress (proxy) at once, defeating the replication. Soft anti-affinity is the standard HA default and stays safe on single-node clusters.
Selector-drift fix (review follow-up)
The earlier revision hardcoded the proxy default anti-affinity selector to
app.kubernetes.io/name: kafscale-proxyinvalues.yaml. UndernameOverridethe proxy pod label becomes<override>-proxy, so the selector matched nothing: the rule was present but spread no pods, disabling HA with no error.The fix moves the default into
proxy-deployment.yamland templates the selector from the samecomponentSelectorLabelshelper that produces the pod labels. The selector now always matches the proxy's own pods, including undernameOverride/fullnameOverride. The operator path already used this property (the StatefulSet selector reuses the pod-template label map); the chart now matches that discipline.Test
Two concrete, runnable artefacts. No cluster required for either.
Operator unit test (
pkg/operator/antiaffinity_test.go): reconciles aKafscaleClusteragainst the fake client, fetches the broker and etcd StatefulSets, and asserts thatTemplate.Spec.Affinity.PodAntiAffinity.PreferredDuringSchedulingIgnoredDuringExecution[0].PodAffinityTerm.LabelSelector.MatchLabelsEQUALS the pod-template labels, andTopologyKey == "kubernetes.io/hostname", andThat equality is the property that makes the rule work: the selector targets the very pods it is meant to spread.
Chart template assertion (
test/chart/proxy-antiaffinity_test.sh, run viamake test-chart-antiaffinity): renders the proxy Deployment withhelm templateand asserts the anti-affinity selector EQUALS the rendered pod labels in the default case AND under--set nameOverride=foo(the case that would have caught the drift), plus soft-only and user-override checks.Both pass on this branch.
helm lint deploy/helm/kafscaleis clean andgo build ./...is green.Reproducible before/after (live evidence recipe)
A live multi-node capture is pending lab availability. In the meantime this recipe reproduces the effect on any host with kind:
For the real chart, the equivalent check is: install the chart on a 3-node cluster with
proxy.replicaCount=2(or a 3-brokerKafscaleCluster) and runkubectl get pods -o wide; the proxy replicas (and broker/etcd pods) land on distinct nodes where capacity allows.Scope decisions
topologySpreadConstraints: []values hook: it added surface with no default behavior and no test. Possible follow-up if a multi-zone default is wanted.Affinityfields onBrokerSpec/EtcdSpecplus CRD regeneration, which is out of scope for this single-purpose PR. Tracked as a follow-up.Backward-compat note
proxy.affinityreturns to a{}default and the chart now supplies the default soft anti-affinity in the template. Settingproxy.affinityREPLACES the chart default rather than merging with it. Users who want anti-affinity plus extra affinity rules should include the anti-affinity block in their override.Part of a small series upstreaming deployment-hardening deltas we currently carry.