agent_sandbox: load generator, metrics, and runnable benchmark #6740
geojaz wants to merge 11 commits into
Introduce the agent sandbox as a PKB resource modeled on the Kubernetes inference server pattern, replacing the prior linux_package shape. This change adds only the class/spec/registration skeleton plus the cloud-agnostic container_cluster wiring. The install logic and the benchmark are added in follow-up changes.

- BaseAgentSandbox resource and GetAgentSandbox factory, keyed on SANDBOX_TYPE so additional sandbox implementations can coexist.
- BaseAgentSandboxConfigSpec and AgentSandboxConfigDecoder, embeddable under container_cluster in a benchmark config.
- K8sAgentSandbox / K8sAgentSandboxConfigSpec: the Kubernetes (kubernetes-sigs/agent-sandbox) implementation stubs.
- KubernetesCluster constructs cluster.agent_sandbox and manages its lifecycle alongside cluster.inference_server.
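A factory keyed on SANDBOX_TYPE lets each implementation register itself under a type string and be looked up by that string. The sketch below shows one way to do that in plain Python. It does not reproduce PKB's own resource-registration machinery; the `__init_subclass__` registry, the `_registry` dict, the lookup-by-string signature of `GetAgentSandbox`, and the `'Kubernetes'` key value are all assumptions for illustration.

```python
from typing import Dict, Type


class BaseAgentSandbox:
  """Sketch of a sandbox base class whose subclasses self-register."""

  # Subclasses set this; an empty value means "do not register".
  SANDBOX_TYPE: str = ''
  # Shared registry (hypothetical): SANDBOX_TYPE -> implementation class.
  _registry: Dict[str, Type['BaseAgentSandbox']] = {}

  def __init_subclass__(cls, **kwargs):
    super().__init_subclass__(**kwargs)
    if cls.SANDBOX_TYPE:
      BaseAgentSandbox._registry[cls.SANDBOX_TYPE] = cls

  def __init__(self, spec):
    self.spec = spec


def GetAgentSandbox(sandbox_type: str) -> Type[BaseAgentSandbox]:
  """Returns the sandbox class registered for sandbox_type."""
  try:
    return BaseAgentSandbox._registry[sandbox_type]
  except KeyError:
    raise ValueError(
        f'No agent sandbox registered for {sandbox_type!r}') from None


class K8sAgentSandbox(BaseAgentSandbox):
  SANDBOX_TYPE = 'Kubernetes'  # Hypothetical key value.
```

Adding another implementation then only needs a new subclass with its own SANDBOX_TYPE; callers keep going through GetAgentSandbox.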
…b-specs and flags
Add the ControllerSpec, SandboxTemplateSpec, and SandboxWarmPoolSpec nested sub-specs and the agent_sandbox_* stack and controller-tuning flags (bridged via _ApplyFlags), and rename the old controller_ref flag to agent_sandbox_manifest_ref.
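Bridging flags into a spec usually means a command-line flag overrides the config value only when the user actually passed it, so flag defaults never clobber values from the benchmark config. A minimal sketch of that precedence rule, assuming the explicitly set flags are available as a dict; `ControllerSpec`'s fields, the flag names, and `apply_flags` are hypothetical and are not the PR's `_ApplyFlags`.

```python
import dataclasses
from typing import Any, Dict, Optional


@dataclasses.dataclass
class ControllerSpec:
  """Hypothetical controller sub-spec; field names are illustrative."""
  replicas: int = 1
  kube_api_qps: Optional[float] = None


# Hypothetical mapping from command-line flag name to sub-spec field.
_FLAG_TO_FIELD = {
    'agent_sandbox_controller_replicas': 'replicas',
    'agent_sandbox_controller_kube_api_qps': 'kube_api_qps',
}


def apply_flags(config_values: Dict[str, Any],
                explicit_flags: Dict[str, Any]) -> Dict[str, Any]:
  """Returns config values with explicitly passed flags taking precedence.

  explicit_flags holds only flags the user actually set on the command line,
  so flag defaults never overwrite values from the benchmark config.
  """
  merged = dict(config_values)
  for flag_name, field_name in _FLAG_TO_FIELD.items():
    if flag_name in explicit_flags:
      merged[field_name] = explicit_flags[flag_name]
  return merged


# Config sets replicas; the command line only sets the API QPS.
controller = ControllerSpec(**apply_flags(
    {'replicas': 2}, {'agent_sandbox_controller_kube_api_qps': 50.0}))
```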
…spec register
The concrete resource module must import its concrete spec module (as wg_serving_inference_server imports wg_serving_inference_server_spec) so that the agent_sandbox_* flags and K8sAgentSandboxConfigSpec are registered at runtime. Without that import, a real pkb.py run fails at flag parsing or config decoding, even though the unit tests (which import the spec module directly) pass.
Make the agent_sandbox benchmark run: a SandboxClaim load generator, the metrics it produces, the Run wiring that drives them, and a provision/prepare install split for fast iteration.

The load generator (agent_sandbox_loadgen.py) submits SandboxClaim custom resources at a target QPS through a single shared Kubernetes Watch stream (no per-claim polling). ClaimDriver handles create/watch with 429 retry and separate connection pools, LoadGenerator paces submission, and readiness is tracked with bounded concurrency. Claims reference the warm pool directly via spec.warmPoolRef (kubernetes-sigs/agent-sandbox#899 replaced sandboxTemplateRef/warmpool with a single warmPoolRef; the controller resolves the template through the warm pool), and the default manifest ref is bumped to the post-#899 main HEAD so the installed CRDs match.

The metrics module (agent_sandbox_metrics.py) computes startup-time percentiles, submit/completion QPS, peak concurrency, warm_served_fraction, error counts, and lifecycle/exec-duration percentiles from the recorded events.

The benchmark Run constructs the load generator from the load-shape flags, runs it, and converts the recorded events into PKB samples (the stub Run from the resource PR returned nothing).

Install is split across provision and prepare: provision installs only the cluster scaffolding (gVisor, CRDs, RBAC); the controller Deployment, sandbox template, and warm pool move to the prepare stage via a new K8sAgentSandbox.InstallWorkload. This lets the controller be reinstalled against an existing cluster with --run_stage=prepare to iterate on controller settings without recreating the cluster. Because the benchmark spec is pickled at provision and unpickled without re-applying flags, Prepare calls RefreshSpecFromFlags on a resume so that the controller, template, and warm pool config reflect the current command-line flags.

Note: --run_stage=provision alone no longer installs the controller; run provision,prepare for a full setup.
Adds the kubernetes Python client to requirements.txt, plus unit tests for the load generator, the metrics, and the provision/prepare split.
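Pacing submissions at a target QPS can be done open-loop: submission i is scheduled at a fixed offset i / qps from a single start time, instead of sleeping 1 / qps after each call, so per-call latency does not accumulate into drift. This is a sketch under that assumption; `run_paced` and its injectable `clock`/`sleep` parameters are hypothetical, and it does not model the PR's Watch stream, 429 retry, or bounded readiness tracking.

```python
import time
from typing import Callable, List


def run_paced(submit: Callable[[int], None], qps: float, total: int,
              clock: Callable[[], float] = time.monotonic,
              sleep: Callable[[float], None] = time.sleep) -> List[float]:
  """Calls submit(i) for i in range(total) at a fixed target rate.

  Submission i is scheduled at start + i / qps, measured from one start
  time, so a slow submit() does not push every later send back
  (open-loop pacing). Returns the actual send offsets in seconds.
  """
  if qps <= 0:
    raise ValueError('qps must be positive')
  start = clock()
  offsets = []
  for i in range(total):
    # Wait only if we are ahead of schedule; if behind, send immediately.
    delay = start + i / qps - clock()
    if delay > 0:
      sleep(delay)
    offsets.append(clock() - start)
    submit(i)
  return offsets
```

Injecting the clock and sleep functions keeps the pacing logic testable without real waiting.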
The gVisor scheduling selector and taint were duplicated as literal strings across the benchmark config, the installer DaemonSet, and the sandbox template, with nothing keeping them in sync. Untangle scheduling from runtime identity:

- Scheduling: select the sandbox nodepool via the pkb_nodepool label PKB already injects on every pool, and derive the pod toleration from a single taint constant. nodeSelector/tolerations are now injected in Python (like _configure_controller_manifest) instead of being hardcoded in the manifests.
- Runtime identity: runtimeClassName stays runsc, used only for the RuntimeClass, containerd registration, and the pod runtimeClassName.

PKB does not yet apply nodepool taints to nodes (that lands in a follow-up), so the canonical taint lives in a _SANDBOX_TAINT constant with a TODO to read it from the nodepool config once that wiring exists. The SandboxWarmPool is unchanged: it inherits scheduling from the SandboxTemplate podTemplate.
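The single-source idea can be sketched as a function that takes a pod template dict, selects the sandbox pool through the `pkb_nodepool` label, and derives the toleration from one taint string, while leaving `runtimeClassName` untouched. The nodepool name `'sandbox'`, the taint string value, the `key=value:Effect` parsing, and the helper names are assumptions for illustration; the real constant is `_SANDBOX_TAINT` in the PR.

```python
import copy
from typing import Any, Dict

# pkb_nodepool is the label PKB injects on every nodepool.
NODEPOOL_LABEL = 'pkb_nodepool'
# Hypothetical values; the real pool name and taint live in the PR.
SANDBOX_NODEPOOL = 'sandbox'
SANDBOX_TAINT = 'sandbox.gvisor=true:NoSchedule'


def toleration_from_taint(taint: str) -> Dict[str, str]:
  """Converts a 'key=value:Effect' or 'key:Effect' taint into a toleration."""
  key_value, effect = taint.rsplit(':', 1)
  if '=' in key_value:
    key, value = key_value.split('=', 1)
    return {'key': key, 'operator': 'Equal', 'value': value, 'effect': effect}
  return {'key': key_value, 'operator': 'Exists', 'effect': effect}


def inject_scheduling(pod_template: Dict[str, Any]) -> Dict[str, Any]:
  """Returns a copy of pod_template pinned to (and tolerating) the sandbox pool.

  runtimeClassName is deliberately left alone: runtime identity is separate
  from scheduling.
  """
  result = copy.deepcopy(pod_template)
  pod_spec = result.setdefault('spec', {})
  pod_spec.setdefault('nodeSelector', {})[NODEPOOL_LABEL] = SANDBOX_NODEPOOL
  pod_spec.setdefault('tolerations', []).append(
      toleration_from_taint(SANDBOX_TAINT))
  return result
```

Because the toleration is computed from the same string the nodes would be tainted with, the two cannot drift apart the way duplicated literals can.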
  """
  sandbox = benchmark_spec.container_cluster.agent_sandbox
  if sandbox is None:
    return
This can probably raise an error instead (in general we prefer failing benchmarks to silently continuing).
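Failing loudly here could look like the following sketch. `MissingAgentSandboxError` and `get_agent_sandbox` are hypothetical names; an existing PKB error type may be a better fit.

```python
class MissingAgentSandboxError(Exception):
  """Hypothetical error for a benchmark config without an agent_sandbox."""


def get_agent_sandbox(benchmark_spec):
  """Returns the cluster's agent sandbox, failing if none is configured."""
  sandbox = benchmark_spec.container_cluster.agent_sandbox
  if sandbox is None:
    raise MissingAgentSandboxError(
        'container_cluster.agent_sandbox is not set in the benchmark config')
  return sandbox
```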
      total=_TOTAL.value)
  driver = agent_sandbox_loadgen.ClaimDriver(
      namespace=spec.namespace,
      template_name=k8s_agent_sandbox._SANDBOX_NAME,
Great to see this is a hardcoded value; it really should be. If it ever needs to differ per run it could include a URI component, but it shouldn't be passed as a flag.
Requested change, though: make this a public variable (no leading underscore).
      a small urllib3 pool.
    - the exec-plugin bearer-token remap (see _register_bearer_token_auth).
  """
  from kubernetes import client  # pylint: disable=import-error,no-name-in-module
I don't like this as a new PKB requirement. Each new requirement adds some load time to our internal runs and memory on everyone's machines.
My high-level suggestion is to run this from VMs:
- We often run load generation from VMs, and that could be an option here, i.e. running the load generator entirely from VMs rather than from inside a cluster. This provides more isolation, and same-region but not same-cluster latencies, which may be more indicative of customer use cases (I'm not sure whether sandbox load comes from outside or inside a cluster for real customers).
- Similarly, a VM can be used simply to handle additional dependencies, i.e. put this in a data script, copy it to a runner VM, run it on that VM, and copy out the results.
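If the load generator ran on a runner VM, the "copy out the results" step needs a format PKB can parse with only the standard library, so the kubernetes client would be needed on the VM but not in PKB itself. A minimal sketch using JSON lines, assuming a hypothetical `ClaimEvent` record (its fields are illustrative, not the PR's event schema); the VM copy and remote-command steps are not shown.

```python
import dataclasses
import json
from typing import Iterable, List


@dataclasses.dataclass
class ClaimEvent:
  """Hypothetical per-claim record written by the load generator."""
  claim: str
  submitted_at: float
  ready_at: float
  error: str = ''


def dump_events(events: Iterable[ClaimEvent]) -> str:
  """Serializes events as JSON lines (would run on the runner VM)."""
  return ''.join(json.dumps(dataclasses.asdict(e)) + '\n' for e in events)


def load_events(text: str) -> List[ClaimEvent]:
  """Parses copied-out JSON lines back into events (would run in PKB)."""
  return [ClaimEvent(**json.loads(line)) for line in text.splitlines() if line]
```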
Otherwise:
- Justify it as a new requirement.
- Why is it imported inline rather than at the top of the module?
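One common reason for an inline import is to keep an optional dependency off the import path of every run, so only the code path that uses it pays the load time; whether that is the author's reason here is not stated. A sketch of that pattern with an actionable error when the package is missing; `import_optional` is a hypothetical helper.

```python
import importlib
import types


def import_optional(module_name: str, purpose: str) -> types.ModuleType:
  """Imports module_name on first use, with a clear error if it is missing.

  Importing inside a function keeps the dependency off the import path of
  every PKB run; only code paths that need it pay the load time.
  """
  try:
    return importlib.import_module(module_name)
  except ImportError as e:
    raise ImportError(
        f'{module_name} is required for {purpose}; install the package '
        f'that provides it.') from e
```

For example, `import_optional('kubernetes', 'the agent_sandbox load generator')` would return the kubernetes package, or explain what is missing.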
def percentile(values, pct):
Add pytype annotations in lots of places: https://google.github.io/pytype/
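Pytype checks standard PEP 484 annotations, so the request mostly means annotating signatures like the one above. An annotated percentile helper might look like this; the linear-interpolation method and the error handling are assumptions and may not match the PR's implementation.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], pct: float) -> float:
  """Returns the pct-th percentile of values using linear interpolation.

  Raises:
    ValueError: if values is empty or pct is outside [0, 100].
  """
  if not values:
    raise ValueError('percentile() of an empty sequence')
  if not 0 <= pct <= 100:
    raise ValueError(f'pct must be in [0, 100], got {pct}')
  ordered = sorted(values)
  # Fractional index into the sorted data, then interpolate between neighbors.
  rank = (len(ordered) - 1) * pct / 100
  lower = math.floor(rank)
  upper = math.ceil(rank)
  weight = rank - lower
  return ordered[lower] + (ordered[upper] - ordered[lower]) * weight
```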
Third in the stacked agent_sandbox series. This is the PR that makes the benchmark actually run.
Stack / merge order (each branch is cumulative on the one below):
Related (but not technically a dependency): #6741 adds the GKE provider options (per-nodepool node_labels/taints, etc.) this benchmark uses to run on GKE.
Because this is a cross-fork PR, the base has to be master, so until #6730 and #6732 merge, the diff here shows the whole stack. The only new commit in this PR is the top one (9 files, +1600/-33); that commit is the real review scope, and GitHub will narrow the diff as the lower PRs land.
What this adds
- Load generator (agent_sandbox_loadgen.py): submits SandboxClaim custom resources at a target QPS through a single shared Kubernetes Watch stream (no per-claim polling). ClaimDriver handles create/watch with 429 retry and separate connection pools, LoadGenerator paces submission, and readiness is tracked with bounded concurrency. Claims reference the warm pool directly via spec.warmPoolRef (kubernetes-sigs/agent-sandbox#899 replaced sandboxTemplateRef/warmpool with a single warmPoolRef; the controller resolves the template through the warm pool), and the default manifest ref is bumped to the post-#899 main HEAD so the installed CRDs match.
- Metrics (agent_sandbox_metrics.py): startup-time percentiles, submit/completion QPS, peak concurrency, warm_served_fraction, error counts, and lifecycle/exec-duration percentiles from the recorded events.
- Run wiring: the benchmark Run constructs the load generator from the load-shape flags, runs it, and converts the recorded events into PKB samples (the stub Run from the resource PR returned nothing).
- Provision/prepare split: provision installs only the cluster scaffolding (gVisor, CRDs, RBAC); the controller Deployment, sandbox template, and warm pool move to the prepare stage via K8sAgentSandbox.InstallWorkload. This lets the controller be reinstalled against an existing cluster with --run_stage=prepare to iterate on controller settings without recreating the cluster. Prepare calls RefreshSpecFromFlags on resume so the unpickled spec reflects the current flags. Note: --run_stage=provision alone no longer installs the controller; use provision,prepare for a full setup.
- Adds the kubernetes Python client to requirements.txt, plus unit tests for the load generator, the metrics, and the provision/prepare split.
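Why a resumed Prepare needs RefreshSpecFromFlags can be shown with plain pickle: unpickling restores the attribute values stored at provision time and does not rerun whatever derived them from flags. `SandboxSpec`, `refresh_from_flags`, and the flag name are hypothetical stand-ins for the PR's spec and RefreshSpecFromFlags.

```python
import dataclasses
import pickle
from typing import Dict


@dataclasses.dataclass
class SandboxSpec:
  """Hypothetical spec whose fields are derived from flags at construction."""
  warm_pool_replicas: int

  @classmethod
  def from_flags(cls, flags: Dict[str, int]) -> 'SandboxSpec':
    return cls(warm_pool_replicas=flags['agent_sandbox_warm_pool_replicas'])

  def refresh_from_flags(self, flags: Dict[str, int]) -> None:
    """Re-derives flag-backed fields; unpickling alone does not do this."""
    self.warm_pool_replicas = flags['agent_sandbox_warm_pool_replicas']


# Provision run: the spec is built from flags, then pickled.
saved = pickle.dumps(
    SandboxSpec.from_flags({'agent_sandbox_warm_pool_replicas': 5}))

# A later --run_stage=prepare run passes a different flag value.
new_flags = {'agent_sandbox_warm_pool_replicas': 20}
spec = pickle.loads(saved)
stale = spec.warm_pool_replicas  # Still the provision-time value.
spec.refresh_from_flags(new_flags)
```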
Scheduling single-source
The gVisor selector and taint were duplicated as literal strings across the
benchmark config, the installer DaemonSet, and the sandbox template, with
nothing keeping them in sync. This untangles scheduling from runtime identity:
- Scheduling: the sandbox nodepool is selected via the pkb_nodepool label PKB already injects on every pool, and the pod toleration is derived from a single taint constant. nodeSelector and tolerations are now injected in Python (same pattern as _configure_controller_manifest) instead of being hardcoded in the manifests.
- Runtime identity: runtimeClassName stays runsc, used only for the RuntimeClass, the containerd registration, and the pod's runtimeClassName. It is no longer reused as a node selector value.
Known gap until #6741 lands: PKB does not yet apply nodepool taints to the actual nodes (that wiring is in #6741). So on this branch the canonical taint lives in a _SANDBOX_TAINT constant with a TODO(#6741), and the taint is effectively a no-op on the nodes. Concretely: selection onto the sandbox pool via pkb_nodepool works today, but the fence (keeping other pods off the gVisor nodes) is not live yet, because nothing taints those nodes until #6741. That gap closes when #6741 wires node_taints and the constant is swapped for nodepool.node_taints. The SandboxWarmPool is unchanged; it inherits scheduling from the SandboxTemplate podTemplate.