agent_sandbox: Kubernetes agent sandbox resource implementation #6732

Open
geojaz wants to merge 9 commits into GoogleCloudPlatform:master from onix-net:geojaz/agent-sandbox-resource

Conversation

geojaz commented Jun 4, 2026

Collaborator

What

Second step of reshaping the agent sandbox into a PKB resource (follows the skeleton in #6730). This fills in the Kubernetes implementation: a config-driven spec, the install orchestration, the data manifests, and a stub benchmark that provisions the resource.

Stacked on #6730: review and merge that first. Until #6730 merges, the diff below also includes the skeleton commit; GitHub narrows it to this PR's own commits once #6730 lands.

Changes

  • K8sAgentSandboxConfigSpec (k8s_agent_sandbox_spec.py): config-driven, with nested controller, sandbox_template, and sandbox_warmpool sub-specs. The sandbox_template block models the upstream SandboxTemplateSpec: the pod shape is rendered, while template-level toggles such as network_policy_management, env_vars_injection_policy, and service are accepted and validated as stubs. The existing agent_sandbox_* flags are bridged via _ApplyFlags; the old controller_ref flag is renamed to agent_sandbox_manifest_ref.
  • K8sAgentSandbox (k8s_agent_sandbox.py): _Create orchestrates gVisor install, controller install, SandboxTemplate apply, and SandboxWarmPool install (private methods reading self.spec). _Delete is a no-op (the ephemeral cluster teardown reclaims the stack). The controller-manifest configuration and install helpers are ported from the prior linux_packages implementation.
  • Data manifests (data/agent_sandbox/): gVisor installer assets, and sandbox-template.yaml.j2 parameterized for runtime class, image, resources, and labels. The SandboxTemplate and SandboxWarmPool share a single fixed name.
  • Stub benchmark (agent_sandbox_benchmark.py): a container_cluster.agent_sandbox config that constructs a K8sAgentSandbox. Run returns no samples yet.
  • Tests (k8s_agent_sandbox_test.py): spec decode + flag overrides, controller-manifest injection, _Create orchestration, _Delete no-op, and benchmark-config construction.
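
The shape described in the list above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the class names carry a `Sketch` suffix, every field name and default value is an assumption, and `run_step` is a hypothetical stand-in for whatever actually applies manifests to the cluster.

```python
import dataclasses
from typing import Callable


@dataclasses.dataclass
class ControllerSpecSketch:
    manifest_ref: str = "main"  # assumed default, for illustration only


@dataclasses.dataclass
class SandboxTemplateSpecSketch:
    runtime_class: str = "gvisor"  # assumed value
    image: str = "example.invalid/sandbox:latest"  # placeholder image


@dataclasses.dataclass
class SandboxWarmPoolSpecSketch:
    replicas: int = 1  # assumed field


@dataclasses.dataclass
class AgentSandboxSpecSketch:
    """Parent spec holding the three nested sub-specs."""

    controller: ControllerSpecSketch = dataclasses.field(
        default_factory=ControllerSpecSketch
    )
    sandbox_template: SandboxTemplateSpecSketch = dataclasses.field(
        default_factory=SandboxTemplateSpecSketch
    )
    sandbox_warmpool: SandboxWarmPoolSpecSketch = dataclasses.field(
        default_factory=SandboxWarmPoolSpecSketch
    )


class AgentSandboxSketch:
    """Resource whose _Create runs the four install steps in order."""

    def __init__(self, spec: AgentSandboxSpecSketch, run_step: Callable[[str], None]):
        self.spec = spec
        self._run_step = run_step  # hypothetical hook; records or executes a step

    def _Create(self) -> None:
        self._InstallGvisor()
        self._InstallController()
        self._ApplySandboxTemplate()
        self._InstallWarmPool()

    def _Delete(self) -> None:
        """No-op: tearing down the ephemeral cluster reclaims the stack."""

    def _InstallGvisor(self) -> None:
        self._run_step(f"gvisor:{self.spec.sandbox_template.runtime_class}")

    def _InstallController(self) -> None:
        self._run_step(f"controller:{self.spec.controller.manifest_ref}")

    def _ApplySandboxTemplate(self) -> None:
        self._run_step(f"template:{self.spec.sandbox_template.image}")

    def _InstallWarmPool(self) -> None:
        self._run_step(f"warmpool:{self.spec.sandbox_warmpool.replicas}")
```

Passing `list.append` as `run_step` records the step order, which is roughly how the orchestration test could assert on it without a real cluster.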

Scope

Reviewable but not yet runnable end to end. The benchmark's Run (load generator + metrics) lands in the next PR, and the GKE/EKS/AKS nodepool changes that make it actually provision come after that. Cloud-agnostic: no provider changes here.

Follow-ups

Run load generation + metrics; then GKE, EKS, AKS provider support.

geojaz added 9 commits June 3, 2026 17:35
Introduce the agent sandbox as a PKB resource modeled on the kubernetes
inference server pattern, replacing the prior linux_package shape. This
change adds only the class/spec/registration skeleton plus the
cloud-agnostic container_cluster wiring. The install logic and the
benchmark are added in follow-up changes.

- BaseAgentSandbox resource and GetAgentSandbox factory, keyed on
  SANDBOX_TYPE so additional sandbox implementations can coexist.
- BaseAgentSandboxConfigSpec and AgentSandboxConfigDecoder, embeddable
  under container_cluster in a benchmark config.
- K8sAgentSandbox / K8sAgentSandboxConfigSpec: the Kubernetes
  (kubernetes-sigs/agent-sandbox) implementation stubs.
- KubernetesCluster constructs and lifecycles cluster.agent_sandbox
  alongside cluster.inference_server.
…b-specs and flags

Add ControllerSpec / SandboxTemplateSpec / SandboxWarmPoolSpec nested
sub-specs, the agent_sandbox_* stack and controller-tuning flags bridged
via _ApplyFlags, and rename the old controller_ref flag to
agent_sandbox_manifest_ref.
…spec register

The concrete resource module must import its concrete spec module (as
wg_serving_inference_server imports wg_serving_inference_server_spec) so
the agent_sandbox_* flags and K8sAgentSandboxConfigSpec register at
runtime. Without it, a real pkb.py run fails at flag parsing /
config decode even though unit tests (which import the spec module
directly) pass.
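
A minimal sketch of why the import matters, assuming a registry keyed on SANDBOX_TYPE that is filled in when a subclass body executes. The registry, the `__init_subclass__` hook, and all names below are simplified stand-ins, not PKB's actual registration mechanism; `import_k8s_spec_module` simulates the effect of importing the concrete spec module.

```python
_SPEC_REGISTRY = {}


class BaseSpecSketch:
    """Base class; subclasses register themselves when their body executes."""

    SANDBOX_TYPE = None

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if cls.SANDBOX_TYPE is not None:
            _SPEC_REGISTRY[cls.SANDBOX_TYPE] = cls


def get_spec_class(sandbox_type):
    """Looks up a registered spec class, failing if its module never ran."""
    try:
        return _SPEC_REGISTRY[sandbox_type]
    except KeyError:
        raise LookupError(f"no spec registered for {sandbox_type!r}") from None


def import_k8s_spec_module():
    """Stand-in for importing the concrete spec module.

    Executing the class statement is what triggers registration, which is
    why the resource module has to import the spec module explicitly.
    """

    class K8sSpecSketch(BaseSpecSketch):
        SANDBOX_TYPE = "kubernetes"

    return K8sSpecSketch
```

Until `import_k8s_spec_module()` runs, `get_spec_class("kubernetes")` raises `LookupError`. Unit tests that import the spec module directly never see that failure, which matches the behavior the commit message describes.
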
inference_server: (
    kubernetes_inference_server_spec.BaseInferenceServerConfigSpec | None
)
agent_sandbox: agent_sandbox_spec.BaseAgentSandboxConfigSpec | None

Collaborator

Inference server set the example here, but I'm not sure this is actually the correct location, as opposed to the root benchmark_spec.py. This approach creates circular dependency issues: agent_sandbox wants to reference a cluster (to call methods on it), while the cluster references agent_sandbox in order to create it.
I believe wg_serving_inference_server.py & kubernetes_inference_server.py get around this by having a parent/abstract service and a child, and/or by not using pytype at that top level. Either way, benchmark_spec.py is probably the right place for this.
See 4daab75 for how example_resource.py was added to benchmark_spec.py. The ConstructAgentSandbox call can also take a container_cluster in its init there.
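
One way to read this suggestion, as a rough sketch rather than the actual benchmark_spec.py code: the benchmark spec owns both objects and hands the cluster to the sandbox, so the cluster never needs to know about the sandbox. All class names and the `ApplyManifest` method below are hypothetical; only `ConstructAgentSandbox` and the `container_cluster` argument come from the comment above.

```python
class ClusterSketch:
    """Stand-in cluster; it holds no reference to any sandbox."""

    def __init__(self, name):
        self.name = name
        self.applied = []

    def ApplyManifest(self, path):  # hypothetical method
        self.applied.append(path)


class AgentSandboxSketch:
    """Sandbox that receives its cluster at construction time."""

    def __init__(self, container_cluster):
        self.cluster = container_cluster

    def Create(self):
        self.cluster.ApplyManifest("sandbox-template.yaml")


class BenchmarkSpecSketch:
    """Owns both resources, so the dependency points one way only."""

    def __init__(self):
        self.container_cluster = None
        self.agent_sandbox = None

    def ConstructContainerCluster(self):
        self.container_cluster = ClusterSketch("pkb-cluster")

    def ConstructAgentSandbox(self):
        if self.container_cluster is None:
            raise ValueError("construct the container cluster first")
        self.agent_sandbox = AgentSandboxSketch(self.container_cluster)
```

With this layout the sandbox module imports the cluster type, but the cluster module imports nothing sandbox-related.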

    'agent_sandbox_controller_otel_endpoint', None,
    'OTLP exporter endpoint when tracing is enabled.')
flags.DEFINE_boolean(
    'agent_sandbox_controller_leader_elect', False,

Collaborator

We should be able to set all of these variables through config overrides alone, e.g. --config_override=kubernetes_redis_memtier.container_cluster.agent_sandbox.image=yada; the flags are just a convenience. Convenience flags aren't bad to have, but it means we probably only need flags for the values we're most likely to change manually.
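
To illustrate how a dotted override path reaches a nested config value, here is a simplified sketch. It is not PKB's override parser: `apply_override` is a hypothetical helper, and it stores the value as a plain string without any type parsing.

```python
def apply_override(config: dict, override: str) -> None:
    """Applies one 'a.b.c=value' override to a nested dict, in place."""
    path, sep, value = override.partition("=")
    if not sep or not path:
        raise ValueError(f"expected 'dotted.path=value', got {override!r}")
    *parents, leaf = path.split(".")
    node = config
    for key in parents:
        # Create intermediate levels on demand so any path is addressable.
        node = node.setdefault(key, {})
    node[leaf] = value


config = {
    "kubernetes_redis_memtier": {
        "container_cluster": {"agent_sandbox": {"image": "default-image"}}
    }
}
apply_override(
    config,
    "kubernetes_redis_memtier.container_cluster.agent_sandbox.image=yada",
)
```

Because every spec field is reachable this way, a dedicated flag only adds value for fields that people change often by hand.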

config_values['runtime_class'] = flag_values.agent_sandbox_runtime_class


class SandboxWarmPoolSpec(spec.BaseSpec):

Collaborator

Can you split some of these into child PRs? I'm not sure, but this seems to implement the base sandbox plus a bunch of features. Each feature as an individual PR would be nice.

In general it seems like a ton of customization. We can likely hardcode many of these values.

Collaborator

Maybe I'm misinterpreting this (I note below they are all referenced by the parent spec)

#
# Targets nodes labelled sandbox.gke.io/runtime=runsc (the label the
# benchmark applies to the sandbox node pool).
apiVersion: apps/v1

Collaborator

This turned into mostly a "setup spec" PR, so I don't think the YAMLs are used, right? Push them to the PR where they are used.

2 participants