eks: support nodepool labels and taints#6744
Conversation
`aws ec2 import-key-pair --public-key-material` rejects raw OpenSSH public keys as "Invalid base64" when the value is passed as a CLI string. The CLI expects either base64-encoded bytes via the string form, or raw bytes via a fileb://<path> URI. PKB was passing the raw key string via cat -> --public-key-material=$keyfile, which produces the rejected case. Switch to fileb://<path> so the CLI reads the bytes directly. Without this fix, AwsKeyFileManager.ImportKeyfile raises PrepareException on every AWS provisioning, blocking EksCluster (and AwsVm) Create().
e374b59 to
2a4de67
Compare
Add _ParseTaint helper and update _RenderNodeGroupJson in BaseEksCluster to pass node_labels and node_taints through to eksctl's node group JSON. Taints are parsed from 'key=value:Effect' or 'key:Effect' strings into the dict format eksctl expects.
2a4de67 to
c20ce1b
Compare
|
What are some sample values y'all are setting this to? We've certainly use(d) selectors on pkb_nodepool or similar in places, so I can see some value.. but of course there are questions of "should this go in spec config, or in hardcoded values in yaml or pkb code?" which can sometimes be unfortunately arbitrary. I have been trying to put more useful info in the nodepool spec, so I could see it but would also appreciate examples/reasoning. |
These labels and taints should live in the nodePoolSpec because it guarantees enforcement at all times. While it's technically possible to label nodes after they come up, that can get sketchy under node churn — and in the benchmarks I've come across, node labeling happens once during provisioning rather than as an ongoing activity. The gVisor-installer DaemonSet selects on the label and tolerates the taint to install runsc and register it with containerd; the sandbox pods use the same selector/toleration so they land on properly-configured nodes. And I expect utility beyond this benchmark — any benchmark with heterogeneous nodepools could use these fields. That said — good question on where this should live, at least with respect to the Right now that runtime selector appears in both the nodepool spec and the hardcoded DaemonSet/RuntimeClass manifests, with nothing keeping them in sync; if the spec value changed, the manifests wouldn't follow. The generic node_labels/node_taints fields are the right primitive to expose, but this particular selector is really an internal invariant of how the benchmark wires gVisor together, so I'd rather single-source it than hand-maintain the same string in four places. Let me make a couple tweaks to #6740 with that in mind — I think the pool can rendezvous on the pkb_nodepool label that's already injected, with the toleration derived from node_taints, so the spec stays the source of truth. |
| logging.info('Successfully applied S3 PVC.') | ||
|
|
||
|
|
||
| def _ParseTaint(taint_str: str) -> dict[str, str]: |
There was a problem hiding this comment.
Could we add the matching GKE wiring in _AddNodeParamsToCmd as well? It would help symmetric setup for label/taint-based pinning for cross-cloud benchmarks.. You could merge node_labels
into --node-labels and --node-taints from node_taints in google_kubernetes_engine.py.
| self, nodepool: container.BaseNodePoolConfig | ||
| ) -> dict[str, Any]: | ||
| """Constructs the node group json dictionary.""" | ||
| labels = {'pkb_nodepool': nodepool.name} |
There was a problem hiding this comment.
This will work for eksctl setup, but #6748 uses eks create-group explicitly which cannot leverage this.
@ashishsuneja could we converge on the approach?
There was a problem hiding this comment.
Thanks Ajay — agree this is worth converging on. A few things that might help:
The kubernetes_management EKS path (#6748) already sets the pkb_nodepool label on every nodegroup — in both the eksctl-config path and the create-nodegroup payload — and already selects nodes by it (jsonpath on .metadata.labels.pkb_nodepool). So the label-based selection of single-node benchmarks need already works against nodegroups it provisions.
On the divergence you flagged: #6748 uses eks create-nodegroup directly rather than eksctl because the benchmark measures the managed-nodegroup create/upgrade/delete operations themselves — eksctl wraps those in CloudFormation, which adds provisioning latency we can't attribute to the control plane. So the two paths can't share the eksctl-config label mechanism directly, but they converge on the same pkb_nodepool contract.
Your broader point — shared nodepool-provisioning helpers so single-node benchmarks can reuse the logic — is the right direction, but it spans this PR, #6744, and the existing provisioning benchmark. Could we come to a conclusion maybe by having a short discussion or chat to agree on the shared interface before reshaping any one PR? I'm happy to align to whatever shared contract we land on.
Add support for
node_labelsandnode_taintson EKS node groups.What this adds
_ParseTaint(): parses taint strings inkey=value:Effectorkey:Effectformat into the dict structure eksctl expects in its node group JSON.BaseEksCluster._RenderNodeGroupJson(): mergesnodepool.node_labelsinto the labels dict and appends parsed taints whennodepool.node_taintsis set.These were written to simplify managing EKS clusters end-to-end with PKB for the
agent_sandboxbenchmark, but are neither required by nor exclusively useful for that benchmark -- any PKB benchmark that provisions EKS nodepools with label or taint requirements can use them.Also includes a fix for AWS key import to use the
fileb://URI scheme (--public-key-material=fileb://<path>), which avoids the AWS CLI's base64 interpretation of raw key material passed as a string argument.Tests
Added
testEksClusterNodepoolLabels,testEksClusterNodepoolTaints, andtestParseTainttoElasticKubernetesServiceTest. All 27 tests in the EKS test file pass.Thanks to @shubham-gaur-x for the inspiration and testing