Skip to content

feat: soft pod anti-affinity for broker, etcd, and proxy#153

Open
kamir wants to merge 3 commits into
KafScale:mainfrom
kamir:pr/anti-affinity
Open

feat: soft pod anti-affinity for broker, etcd, and proxy#153
kamir wants to merge 3 commits into
KafScale:mainfrom
kamir:pr/anti-affinity

Conversation

@kamir

@kamir kamir commented Jun 4, 2026

Copy link
Copy Markdown
Collaborator

What

Add soft (preferred) pod anti-affinity so replicas of the same component avoid co-locating on one node:

  • operator-managed broker and etcd StatefulSets (pkg/operator), via a shared softPodAntiAffinity(labels) that reuses each workload's pod-template label map as the selector.
  • proxy Deployment (chart): a default soft anti-affinity templated into proxy-deployment.yaml, applied only when proxy.affinity is empty.

All terms are soft (preferredDuringSchedulingIgnoredDuringExecution), so single-node clusters still schedule every replica. Flip to required in multi-node production by setting an explicit affinity.

Why

Without it the scheduler can place all replicas of a component on one node, so a single node loss takes the whole quorum (broker or etcd) or the only ingress (proxy) at once, defeating the replication. Soft anti-affinity is the standard HA default and stays safe on single-node clusters.

Selector-drift fix (review follow-up)

The earlier revision hardcoded the proxy default anti-affinity selector to app.kubernetes.io/name: kafscale-proxy in values.yaml. Under nameOverride the proxy pod label becomes <override>-proxy, so the selector matched nothing: the rule was present but spread no pods, disabling HA with no error.

The fix moves the default into proxy-deployment.yaml and templates the selector from the same componentSelectorLabels helper that produces the pod labels. The selector now always matches the proxy's own pods, including under nameOverride/fullnameOverride. The operator path already used this property (the StatefulSet selector reuses the pod-template label map); the chart now matches that discipline.

Test

Two concrete, runnable artefacts. No cluster required for either.

Operator unit test (pkg/operator/antiaffinity_test.go): reconciles a KafscaleCluster against the fake client, fetches the broker and etcd StatefulSets, and asserts that

  • Template.Spec.Affinity.PodAntiAffinity.PreferredDuringSchedulingIgnoredDuringExecution[0].PodAffinityTerm.LabelSelector.MatchLabels EQUALS the pod-template labels, and
  • TopologyKey == "kubernetes.io/hostname", and
  • the term is soft-only (no required term).

That equality is the property that makes the rule work: the selector targets the very pods it is meant to spread.

go test ./pkg/operator/... -run AntiAffinity -v

Chart template assertion (test/chart/proxy-antiaffinity_test.sh, run via make test-chart-antiaffinity): renders the proxy Deployment with helm template and asserts the anti-affinity selector EQUALS the rendered pod labels in the default case AND under --set nameOverride=foo (the case that would have caught the drift), plus soft-only and user-override checks.

make test-chart-antiaffinity

Both pass on this branch. helm lint deploy/helm/kafscale is clean and go build ./... is green.

Reproducible before/after (live evidence recipe)

A live multi-node capture is pending lab availability. In the meantime this recipe reproduces the effect on any host with kind:

# 3-node kind cluster
cat <<'EOF' > /tmp/kind-3node.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker
EOF
kind create cluster --name aa-demo --config /tmp/kind-3node.yaml

# BEFORE: a 3-replica Deployment with no anti-affinity can co-locate.
kubectl create deployment demo --image=registry.k8s.io/pause:3.9 --replicas=3
kubectl get pods -l app=demo -o wide          # observe replicas sharing nodes

# AFTER: add the same soft anti-affinity this PR ships and let the
# scheduler re-place. Replicas prefer distinct nodes.
kubectl patch deployment demo --type merge -p '{
  "spec": {"template": {"spec": {"affinity": {"podAntiAffinity": {
    "preferredDuringSchedulingIgnoredDuringExecution": [{
      "weight": 100,
      "podAffinityTerm": {
        "labelSelector": {"matchLabels": {"app": "demo"}},
        "topologyKey": "kubernetes.io/hostname"}}]}}}}}}'
kubectl rollout status deployment/demo
kubectl get pods -l app=demo -o wide          # replicas now spread across nodes

kind delete cluster --name aa-demo

For the real chart, the equivalent check is: install the chart on a 3-node cluster with proxy.replicaCount=2 (or a 3-broker KafscaleCluster) and run kubectl get pods -o wide; the proxy replicas (and broker/etcd pods) land on distinct nodes where capacity allows.

Scope decisions

  • Dropped the empty topologySpreadConstraints: [] values hook: it added surface with no default behavior and no test. Possible follow-up if a multi-zone default is wanted.
  • Operator-set affinity is not yet overridable via the CR. A clean passthrough would need new Affinity fields on BrokerSpec/EtcdSpec plus CRD regeneration, which is out of scope for this single-purpose PR. Tracked as a follow-up.

Backward-compat note

proxy.affinity returns to a {} default and the chart now supplies the default soft anti-affinity in the template. Setting proxy.affinity REPLACES the chart default rather than merging with it. Users who want anti-affinity plus extra affinity rules should include the anti-affinity block in their override.

Part of a small series upstreaming deployment-hardening deltas we currently carry.

kamir and others added 3 commits June 4, 2026 19:38
…— BUG-0012

Operator-managed broker and etcd StatefulSets shipped with no anti-affinity,
so all replicas could schedule onto one node and a single node loss took out
the whole quorum. Add a preferred (soft) podAntiAffinity over each
StatefulSet's own pod labels keyed on kubernetes.io/hostname. Soft so the
single-node KIND demo still schedules every replica; on a multi-node cluster
the scheduler spreads them.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…hook

PLAN-06 iter-7 E-14 / G-007. The chart-templated proxy is the only
multi-replica workload bp-001 controls through this chart (operator-
managed brokers + etcd are tracked in BUG-0012). Defaults match the
soft-anti-affinity shape we want everywhere:

  * podAntiAffinity / preferredDuringSchedulingIgnoredDuringExecution,
    weight 100, topologyKey kubernetes.io/hostname. Soft so single-
    node KIND clusters still schedule both proxy replicas; flip to
    requiredDuring in a multi-node production overlay.
  * topologySpreadConstraints: [] by default; populate per-cluster
    when multi-zone topology is available.

Templates extended:
  * proxy-deployment.yaml: added topologySpreadConstraints hook
    next to the existing affinity hook (both gated by `with`).

Chart bump 0.4.1 -> 0.4.2.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…etcd/proxy

Address review on the soft pod anti-affinity work.

Selector-drift fix: the proxy default podAntiAffinity selector was
hardcoded to app.kubernetes.io/name: kafscale-proxy in values.yaml.
Under nameOverride the pod label becomes <override>-proxy, so the
selector matched nothing and HA was silently disabled with no error.
Move the default into proxy-deployment.yaml, templated from the same
componentSelectorLabels helper that produces the pod labels, applied
only when proxy.affinity is empty so user overrides still win. The
selector now tracks the pod name under nameOverride/fullnameOverride.

Operator unit test (pkg/operator/antiaffinity_test.go): reconcile a
KafscaleCluster with the fake client, fetch the broker and etcd
StatefulSets, and assert the preferred podAntiAffinity term's label
selector EQUALS the pod-template labels and topologyKey ==
kubernetes.io/hostname. That equality is the property that makes the
rule spread its own pods.

Chart template assertion (test/chart/proxy-antiaffinity_test.sh +
make test-chart-antiaffinity): assert the rendered proxy anti-affinity
selector equals the rendered pod labels, including a --set
nameOverride=foo case that would have caught the drift, plus soft-only
and user-override checks.

Comment scrub: replace the internal tracker reference in
softPodAntiAffinity with a self-contained rationale.

Scope: drop the empty topologySpreadConstraints values hook (no default
behavior, no test); possible follow-up. Operator-set affinity is not
yet overridable via the CR (would need new CR API fields + CRD
regen); tracked as a follow-up.

Backward-compat: proxy.affinity default returns to {} and the chart
supplies the default; setting proxy.affinity replaces the default
rather than merging.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant