feat: substrate support for BYO and python runtimes for SandboxAgent CR#2043
jmhbh wants to merge 16 commits
Signed-off-by: JM Huibonhoa <jm.huibonhoa@solo.io>
Warning: Testing paused. Monthly snapshot limit reached. Update your plan for additional snapshots and to resume testing.
EItanya left a comment
Awesome job overall. I have some design questions here about getting rid of old templates and how that should work.
```go
// then surfaces as a gVisor "inconsistent private memory files on restore" error because the
// golden snapshot captures only the pause container. The Go static binary needs none of this.
// Keep in sync with the final-stage ENV block of python/Dockerfile.
func pythonRuntimeImageEnv() []corev1.EnvVar {
```
This feels hacky. Is there some way we can fix Substrate, or the gVisor implementation, to do this properly?
Yup, this can be fixed in Substrate. I can open a fix for this upstream.
Can you please link to that issue?
…tion is chosen. Also updated the config Secret to be based on a config hash: Substrate caches secret config, and because updates to the Secret happen in place while the Secret name doesn't change, stale config can be fetched.
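A minimal sketch of the config-hash naming idea from this commit, assuming the hash is computed over the rendered config JSON. `configHash`, `perHashSecretName`, the `<agent>-config-<hash>` format, and the 8-character truncation are hypothetical, not the PR's actual helpers.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// configHash returns a short, deterministic hash of the rendered agent config.
// Truncating to 8 hex characters is an assumption to keep names short.
func configHash(configJSON []byte) string {
	sum := sha256.Sum256(configJSON)
	return hex.EncodeToString(sum[:])[:8]
}

// perHashSecretName derives a Secret name that changes whenever the config changes,
// so a cache keyed on the Secret name never serves config from an older revision.
// The "<agent>-config-<hash>" format is hypothetical.
func perHashSecretName(agentName string, configJSON []byte) string {
	return agentName + "-config-" + configHash(configJSON)
}
```

Because each config change produces a new Secret instead of mutating the old one, a cache keyed on the Secret name cannot return stale data; the trade-off in this sketch is that superseded Secrets have to be cleaned up separately.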
Pull request overview
Adds full Agent Substrate support for SandboxAgent across declarative (Python/Go) and BYO runtimes, including config-hash based blue/green ActorTemplate handling and new Python runtime image variants.
Changes:
- Enable Substrate for BYO and Python declarative runtimes (UI + API validation + controller/runtime wiring).
- Introduce config-hash + desired-generation based ActorTemplate/actor selection to avoid stale goldens during config changes.
- Add “full” Python runtime/app images and controller digest plumbing to select slim vs full at runtime.
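The desired-generation + Ready preference for ActorTemplate selection can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's code: `actorTemplate`, its fields, and the fallback to the newest Ready template while a new golden is still building are all hypothetical.

```go
package main

import "sort"

// actorTemplate is a hypothetical, simplified view of a Substrate ActorTemplate:
// the generation it was stamped with and whether its golden snapshot is Ready.
type actorTemplate struct {
	Name       string
	Generation int64 // assumed desired-generation stamp
	Ready      bool
}

// resolveTemplate picks the template to route new actors to. Assumed policy:
//  1. a Ready template stamped with the desired generation wins;
//  2. otherwise fall back to the newest Ready template (the old "blue" side);
//  3. if nothing is Ready, report ok=false.
func resolveTemplate(templates []actorTemplate, desiredGen int64) (actorTemplate, bool) {
	ready := make([]actorTemplate, 0, len(templates))
	for _, t := range templates {
		if t.Ready {
			ready = append(ready, t)
		}
	}
	if len(ready) == 0 {
		return actorTemplate{}, false
	}
	// Newest generation first, so the fallback below picks the most recent Ready one.
	sort.Slice(ready, func(i, j int) bool { return ready[i].Generation > ready[j].Generation })
	for _, t := range ready {
		if t.Generation == desiredGen {
			return t, true
		}
	}
	return ready[0], true
}
```

In this sketch the fallback is what makes the blue/green swap safe: traffic stays on the old template until the new one reports Ready.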
Reviewed changes
Copilot reviewed 44 out of 44 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| ui/src/lib/sandboxAgentForm.ts | Updates Substrate support rules and runtime defaults in the UI. |
| ui/src/lib/tests/sandboxAgentForm.test.ts | Adjusts tests for updated Substrate support/runtime behavior. |
| ui/src/components/agent-form/ByoDeploymentFields.tsx | Adds byoCmd validation UI affordances. |
| ui/src/components/agent-form/agent-form-types.ts | Adds byoCmd to validation error shape. |
| ui/src/app/agents/new/page.tsx | Shows runtime selector on Substrate; adds Substrate-BYO cmd validation. |
| scripts/controller-digest-ldflags.sh | Adds Python “full” image digest embedding support. |
| python/packages/kagent-adk/src/kagent/adk/cli.py | Materializes Substrate env-injected config before loading config.json. |
| python/packages/kagent-adk/src/kagent/adk/_config_materialize.py | New helper to write env-injected config/token to files. |
| python/packages/kagent-adk/tests/unittests/test_config_materialize.py | Adds unit tests for config materialization behavior. |
| python/Dockerfile | Reworks slim Python ADK image to distroless with copied shared libs + venv. |
| python/Dockerfile.full | Adds full Python ADK image including sandbox runtime + tools venv. |
| python/Dockerfile.app | Documents tag-based base selection; minor formatting. |
| Makefile | Adds build targets/tags for full Python ADK + full app image and wires controller build. |
| helm/kagent-crds/templates/kagent.dev_sandboxagents.yaml | Removes CRD rule blocking BYO+Substrate. |
| go/api/config/crd/bases/kagent.dev_sandboxagents.yaml | Removes CRD rule blocking BYO+Substrate. |
| go/api/v1alpha2/sandboxagent_types.go | Removes BYO+Substrate XValidation restriction. |
| go/api/v1alpha2/agent_types.go | Removes substrate-specific runtime override helper. |
| go/api/v1alpha2/agent_spec_validation.go | Updates Substrate validation: allow Python, require BYO cmd. |
| go/api/v1alpha2/agent_spec_validation_test.go | Updates tests for new Substrate validation semantics. |
| go/api/v1alpha2/agent_runtime_test.go | Reworks tests to validate EffectiveDeclarativeRuntime behavior. |
| go/core/pkg/consts/annotations.go | Adds shared config-hash annotation constant. |
| go/core/internal/controller/translator/agent/manifest_builder.go | Plumbs config secret into sandbox backend build input; centralizes config-hash key. |
| go/core/pkg/sandboxbackend/backend.go | Adds ConfigSecret to sandbox backend BuildInput. |
| go/core/pkg/sandboxbackend/routing.go | Uses backend-specific prune types (vs watch types). |
| go/core/pkg/sandboxbackend/filter_translator_owned_test.go | Updates prune filtering tests for Substrate ActorTemplate lifecycle. |
| go/core/pkg/sandboxbackend/substrate/agents_backend.go | Stops generic pruning for Substrate; clones per-hash config Secret; readiness uses template resolution. |
| go/core/pkg/sandboxbackend/substrate/agent_lifecycle.go | Adds Python+BYO command support; stamps desired-generation; adds per-hash naming helpers. |
| go/core/pkg/sandboxbackend/substrate/lifecycle_shared.go | Adds ActorTemplate resolution helpers (desired-generation + Ready preference). |
| go/core/pkg/sandboxbackend/substrate/lifecycle_delete.go | Deletes goldens for all labeled templates on agent delete. |
| go/core/pkg/sandboxbackend/substrate/agent_actor.go | Resolves current ActorTemplate for chat; session actor IDs incorporate config hash. |
| go/core/pkg/sandboxbackend/substrate/actor_errors.go | Normalizes CreateActor “no free workers” handling. |
| go/core/pkg/sandboxbackend/substrate/config_hash_test.go | Adds tests for config-hash naming and template resolution. |
| go/core/pkg/sandboxbackend/substrate/agent_lifecycle_test.go | Expands tests for declarative (Go/Python) + BYO substrate commands/env. |
| go/core/pkg/app/app.go | Passes kube client into Substrate actor backend for template resolution. |
| go/core/internal/controller/sandboxagent_substrate.go | Centralizes “substrate configured” gating; updates delete path. |
| go/core/internal/controller/sandboxagent_controller.go | Gates substrate reconcile/watch on full substrate wiring. |
| go/core/internal/controller/translator/agent/deployments.go | Selects Python full image when SRT needed; removes substrate runtime override. |
| go/core/internal/controller/translator/agent/imageconfig_test.go | Adds tests for Python full image digest selection. |
| go/core/internal/controller/translator/agent/digest_testmain_external_test.go | Sets full Python digest in external test main. |
| go/core/internal/controller/translator/agent/remotemcpserver_tls_test.go | Switches to shared config-hash const. |
| go/core/internal/controller/translator/agent/testdata/outputs/*.json | Updates expected runtime image selection for skill/code cases. |
… image to be distroless
```diff
 		return err
 	}
-	if r.SubstrateLifecycle != nil {
+	if r.substrateConfigured() {
```
Can we get rid of this `r.substrateConfigured()` check in a follow-up? If Substrate is not configured, we shouldn't run this reconciler at all, since nothing will work. Right now the check is scattered everywhere.
I think it makes sense to split this into a separate Substrate controller. I can work on that as a follow-up.
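One way that follow-up could look, sketched with stdlib-only stand-ins: decide once at startup whether the Substrate reconciler is registered, rather than calling `r.substrateConfigured()` on every reconcile. `substrateWiring`, `registerFunc`, and `setupSubstrateController` are hypothetical names, not controller-runtime APIs.

```go
package main

// substrateWiring is a hypothetical bundle of everything the Substrate path needs.
type substrateWiring struct {
	Lifecycle   any // stand-in for the Substrate lifecycle client
	ActorClient any // stand-in for the Substrate actor backend
}

// configured reports whether every dependency is present; safe on a nil receiver.
func (w *substrateWiring) configured() bool {
	return w != nil && w.Lifecycle != nil && w.ActorClient != nil
}

// registerFunc stands in for registering a reconciler with a controller manager.
type registerFunc func(name string) error

// setupSubstrateController gates the whole reconciler at setup time. When wiring is
// incomplete it registers nothing, so Reconcile never needs a per-call check.
func setupSubstrateController(w *substrateWiring, register registerFunc) (bool, error) {
	if !w.configured() {
		return false, nil
	}
	if err := register("sandboxagent-substrate"); err != nil {
		return false, err
	}
	return true, nil
}
```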
…r is invoked that all actors for a given session are deleted, and ensure cleanup matches actors by owning template.
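The cleanup rule in this commit can be sketched as a filter, assuming each actor records its session ID and owning ActorTemplate. Since session actor IDs incorporate the config hash, a single session can map to several actors, so the sketch matches on session and owner template rather than on one precomputed actor ID. The `actor` type and `actorsToDelete` are hypothetical.

```go
package main

// actor is a hypothetical, simplified Substrate actor record.
type actor struct {
	ID        string
	SessionID string
	Template  string // name of the ActorTemplate that owns this actor
}

// actorsToDelete returns every actor for sessionID whose owning template belongs to
// this agent. One session may own several actors (one per config hash it ran under),
// and other agents' actors are excluded by the owner-template check.
func actorsToDelete(all []actor, sessionID string, agentTemplates map[string]bool) []actor {
	var out []actor
	for _, a := range all {
		if a.SessionID == sessionID && agentTemplates[a.Template] {
			out = append(out, a)
		}
	}
	return out
}
```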
Is there a strong reason for this MR to proceed without skill support? It seems to me that skills are one of the biggest advantages of the Python Substrate runtime, particularly since this was already trailblazed by the initial Google ax. I'm not opposed, but I want to understand what is blocking a workaround for the init container.
Hey @azabris1, thanks for your comment. We are planning to add skills, but will do so in a follow-up PR.
eb67e09 to 785fa5d
`KAGENT_CONFIG_JSON` secret key ref name, which is static. This can result in stale config being fetched. The secret key ref name now contains the config hash as a suffix, so each distinct config has a unique cache key.

Testing