feat: substrate support for BYO and python runtimes for SandboxAgent CR#2043

Open
jmhbh wants to merge 16 commits into main from feat/substrate-python-byo

@jmhbh jmhbh commented Jun 17, 2026

Contributor
  • Implements substrate support for BYO typed agents and the python runtime for declarative agents.
  • Fixes an existing issue where, after a config change, the previous golden snapshot was still being served. We now serve the newest golden snapshot, built with the updated config.
  • Fixes an issue where golden snapshot building could use stale config. Substrate caches config values keyed on the KAGENT_CONFIG_JSON secret key ref name, which was static, so stale config could be fetched. The secret key ref name now carries the config hash as a suffix, giving each distinct config a unique cache key.
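
To illustrate the cache-key fix in the last bullet, here is a minimal Go sketch of deriving a secret name from the config contents. The helper name `configSecretName`, the base name `agent-config`, the use of SHA-256, and the 8-character suffix are all assumptions made for this example; the PR's actual naming helpers may differ.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// configSecretName is a hypothetical helper (not taken from the PR). It appends
// a short hash of the config payload to a base name, so each distinct config
// maps to a distinct name and therefore to a distinct cache key.
func configSecretName(base string, configJSON []byte) string {
	sum := sha256.Sum256(configJSON)
	// Truncating to 8 hex characters is an arbitrary choice for this sketch.
	return fmt.Sprintf("%s-%s", base, hex.EncodeToString(sum[:])[:8])
}

func main() {
	a := configSecretName("agent-config", []byte(`{"provider":"openai"}`))
	b := configSecretName("agent-config", []byte(`{"provider":"anthropic"}`))
	fmt.Println(a)
	fmt.Println(b)
	fmt.Println("names differ:", a != b)
}
```

Because the name is a pure function of the config bytes, an unchanged config keeps its name (and any cached value stays valid), while any change produces a new name that cannot hit the old cache entry.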

Testing

  • Tested locally in a kind cluster. Below is an example conversation with a declarative sandbox agent using the python runtime where I switch between model providers to test config rollout.
[Screenshot 2026-06-22 at 4 27 29 PM]

Signed-off-by: JM Huibonhoa <jm.huibonhoa@solo.io>
@github-actions github-actions Bot added the enhancement New feature or request label Jun 17, 2026
chromatic-com Bot commented Jun 17, 2026

Warning

Testing paused

Monthly snapshot limit reached. Update your plan for additional snapshots and to resume testing.

@EItanya EItanya left a comment

Contributor

Awesome job overall. I have some design questions here about getting rid of old templates and how that should work.

Comment thread go/api/v1alpha2/agent_types.go Outdated
Comment thread python/Dockerfile Outdated
Comment thread go/core/internal/controller/sandboxagent_substrate.go Outdated
Comment thread go/core/internal/controller/sandboxagent_substrate.go Outdated
Comment thread go/core/pkg/sandboxbackend/substrate/agent_actor.go Outdated
Comment thread go/core/pkg/sandboxbackend/substrate/agent_lifecycle.go Outdated
// then surfaces as a gVisor "inconsistent private memory files on restore" error because the
// golden snapshot captures only the pause container. The Go static binary needs none of this.
// Keep in sync with the final-stage ENV block of python/Dockerfile.
func pythonRuntimeImageEnv() []corev1.EnvVar {

Contributor

This feels hacky. Is there some way we can fix substrate, or the gVisor implementation, to do this properly?

Contributor Author

Yup, this can be fixed in substrate. I can open a fix for this upstream.

Contributor

Can you please link to that issue?

Contributor Author

Signed-off-by: JM Huibonhoa <jm.huibonhoa@solo.io>
Comment thread Makefile
Comment thread go/api/v1alpha2/agent_types.go Outdated
Comment thread go/core/internal/controller/sandboxagent_substrate.go Outdated
Comment thread go/core/internal/controller/sandboxagent_substrate.go Outdated
Comment thread go/core/pkg/sandboxbackend/substrate/agent_actor.go Outdated

jmhbh added 5 commits June 18, 2026 15:49
Signed-off-by: JM Huibonhoa <jm.huibonhoa@solo.io>
Signed-off-by: JM Huibonhoa <jm.huibonhoa@solo.io>
Signed-off-by: JM Huibonhoa <jm.huibonhoa@solo.io>
Signed-off-by: JM Huibonhoa <jm.huibonhoa@solo.io>
…tion is chosen. Also updated the config secret name to be based on a config hash: substrate caches secret config, and because updates to the secret are made in place while the secret name doesn't change, stale config could be fetched.

Signed-off-by: JM Huibonhoa <jm.huibonhoa@solo.io>
@jmhbh jmhbh marked this pull request as ready for review June 22, 2026 20:30
@jmhbh jmhbh requested a review from supreme-gg-gg as a code owner June 22, 2026 20:30
Copilot AI review requested due to automatic review settings June 22, 2026 20:30

Copilot AI left a comment

Contributor

Pull request overview

Adds full Agent Substrate support for SandboxAgent across declarative (Python/Go) and BYO runtimes, including config-hash based blue/green ActorTemplate handling and new Python runtime image variants.

Changes:

  • Enable Substrate for BYO and Python declarative runtimes (UI + API validation + controller/runtime wiring).
  • Introduce config-hash + desired-generation based ActorTemplate/actor selection to avoid stale goldens during config changes.
  • Add “full” Python runtime/app images and controller digest plumbing to select slim vs full at runtime.
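
As a rough illustration of the desired-generation plus Ready preference mentioned in the second bullet, the following Go sketch picks which template to serve. The `template` type, the `resolveTemplate` function, the example names, and the exact fallback rules are assumptions made for this example; they are not taken from the PR's code.

```go
package main

import "fmt"

// template is a hypothetical, simplified stand-in for an ActorTemplate.
type template struct {
	Name              string
	DesiredGeneration int64
	Ready             bool
}

// resolveTemplate picks the template to serve under these assumed rules:
//  1. a Ready template stamped with the current generation wins;
//  2. otherwise the Ready template with the highest generation is used;
//  3. otherwise nothing is servable yet and found is false.
func resolveTemplate(candidates []template, current int64) (best template, found bool) {
	for _, t := range candidates {
		if !t.Ready {
			continue // never serve a template whose golden is not ready
		}
		if t.DesiredGeneration == current {
			return t, true
		}
		if !found || t.DesiredGeneration > best.DesiredGeneration {
			best, found = t, true
		}
	}
	return best, found
}

func main() {
	candidates := []template{
		{Name: "agent-aaaa1111", DesiredGeneration: 1, Ready: true},
		{Name: "agent-bbbb2222", DesiredGeneration: 2, Ready: false},
	}

	// The new template is still building, so the previous one keeps serving.
	t, ok := resolveTemplate(candidates, 2)
	fmt.Println(t.Name, ok)

	// Once the new template is Ready, it takes over.
	candidates[1].Ready = true
	t, ok = resolveTemplate(candidates, 2)
	fmt.Println(t.Name, ok)
}
```

Under these assumptions the previous template keeps serving only while its replacement is not Ready, which is one way to get blue/green behavior without serving a stale golden after the rollout completes.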

Reviewed changes

Copilot reviewed 44 out of 44 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| ui/src/lib/sandboxAgentForm.ts | Updates Substrate support rules and runtime defaults in the UI. |
| ui/src/lib/tests/sandboxAgentForm.test.ts | Adjusts tests for updated Substrate support/runtime behavior. |
| ui/src/components/agent-form/ByoDeploymentFields.tsx | Adds byoCmd validation UI affordances. |
| ui/src/components/agent-form/agent-form-types.ts | Adds byoCmd to validation error shape. |
| ui/src/app/agents/new/page.tsx | Shows runtime selector on Substrate; adds Substrate-BYO cmd validation. |
| scripts/controller-digest-ldflags.sh | Adds Python “full” image digest embedding support. |
| python/packages/kagent-adk/src/kagent/adk/cli.py | Materializes Substrate env-injected config before loading config.json. |
| python/packages/kagent-adk/src/kagent/adk/_config_materialize.py | New helper to write env-injected config/token to files. |
| python/packages/kagent-adk/tests/unittests/test_config_materialize.py | Adds unit tests for config materialization behavior. |
| python/Dockerfile | Reworks slim Python ADK image to distroless with copied shared libs + venv. |
| python/Dockerfile.full | Adds full Python ADK image including sandbox runtime + tools venv. |
| python/Dockerfile.app | Documents tag-based base selection; minor formatting. |
| Makefile | Adds build targets/tags for full Python ADK + full app image and wires controller build. |
| helm/kagent-crds/templates/kagent.dev_sandboxagents.yaml | Removes CRD rule blocking BYO+Substrate. |
| go/api/config/crd/bases/kagent.dev_sandboxagents.yaml | Removes CRD rule blocking BYO+Substrate. |
| go/api/v1alpha2/sandboxagent_types.go | Removes BYO+Substrate XValidation restriction. |
| go/api/v1alpha2/agent_types.go | Removes substrate-specific runtime override helper. |
| go/api/v1alpha2/agent_spec_validation.go | Updates Substrate validation: allow Python, require BYO cmd. |
| go/api/v1alpha2/agent_spec_validation_test.go | Updates tests for new Substrate validation semantics. |
| go/api/v1alpha2/agent_runtime_test.go | Reworks tests to validate EffectiveDeclarativeRuntime behavior. |
| go/core/pkg/consts/annotations.go | Adds shared config-hash annotation constant. |
| go/core/internal/controller/translator/agent/manifest_builder.go | Plumbs config secret into sandbox backend build input; centralizes config-hash key. |
| go/core/pkg/sandboxbackend/backend.go | Adds ConfigSecret to sandbox backend BuildInput. |
| go/core/pkg/sandboxbackend/routing.go | Uses backend-specific prune types (vs watch types). |
| go/core/pkg/sandboxbackend/filter_translator_owned_test.go | Updates prune filtering tests for Substrate ActorTemplate lifecycle. |
| go/core/pkg/sandboxbackend/substrate/agents_backend.go | Stops generic pruning for Substrate; clones per-hash config Secret; readiness uses template resolution. |
| go/core/pkg/sandboxbackend/substrate/agent_lifecycle.go | Adds Python+BYO command support; stamps desired-generation; adds per-hash naming helpers. |
| go/core/pkg/sandboxbackend/substrate/lifecycle_shared.go | Adds ActorTemplate resolution helpers (desired-generation + Ready preference). |
| go/core/pkg/sandboxbackend/substrate/lifecycle_delete.go | Deletes goldens for all labeled templates on agent delete. |
| go/core/pkg/sandboxbackend/substrate/agent_actor.go | Resolves current ActorTemplate for chat; session actor IDs incorporate config hash. |
| go/core/pkg/sandboxbackend/substrate/actor_errors.go | Normalizes CreateActor “no free workers” handling. |
| go/core/pkg/sandboxbackend/substrate/config_hash_test.go | Adds tests for config-hash naming and template resolution. |
| go/core/pkg/sandboxbackend/substrate/agent_lifecycle_test.go | Expands tests for declarative (Go/Python) + BYO substrate commands/env. |
| go/core/pkg/app/app.go | Passes kube client into Substrate actor backend for template resolution. |
| go/core/internal/controller/sandboxagent_substrate.go | Centralizes “substrate configured” gating; updates delete path. |
| go/core/internal/controller/sandboxagent_controller.go | Gates substrate reconcile/watch on full substrate wiring. |
| go/core/internal/controller/translator/agent/deployments.go | Selects Python full image when SRT needed; removes substrate runtime override. |
| go/core/internal/controller/translator/agent/imageconfig_test.go | Adds tests for Python full image digest selection. |
| go/core/internal/controller/translator/agent/digest_testmain_external_test.go | Sets full Python digest in external test main. |
| go/core/internal/controller/translator/agent/remotemcpserver_tls_test.go | Switches to shared config-hash const. |
| go/core/internal/controller/translator/agent/testdata/outputs/*.json | Updates expected runtime image selection for skill/code cases. |

Comment thread ui/src/components/agent-form/ByoDeploymentFields.tsx
Comment thread go/core/pkg/sandboxbackend/substrate/agent_lifecycle.go
Comment thread go/api/v1alpha2/agent_spec_validation.go
Comment thread python/Dockerfile
Comment thread python/Dockerfile.full Outdated
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Jun 22, 2026
jmhbh added 4 commits June 22, 2026 16:50
Signed-off-by: JM Huibonhoa <jm.huibonhoa@solo.io>
… image to be distroless

Signed-off-by: JM Huibonhoa <jm.huibonhoa@solo.io>
Signed-off-by: JM Huibonhoa <jm.huibonhoa@solo.io>
Signed-off-by: JM Huibonhoa <jm.huibonhoa@solo.io>
return err
}
if r.SubstrateLifecycle != nil {
if r.substrateConfigured() {

Contributor

Can we get rid of this r.substrateConfigured() in a follow-up? If substrate is not configured, we shouldn't run this reconciler at all, since nothing will work. Right now the check is scattered everywhere.

Contributor Author

I think it makes sense to split this out into a separate substrate controller; I can work on that as a follow-up.

jmhbh and others added 2 commits June 24, 2026 14:58
…r is invoked that all actors for a given session are deleted and ensure cleanup matches actor by owning template

Signed-off-by: JM Huibonhoa <jm.huibonhoa@solo.io>
@azabris1

Is there a strong reason for this MR to proceed without skill support? It seems to me that this is one of the biggest advantages of the python substrate runtime, particularly with this already trailblazed by initial Google ax. I'm not opposed, but I want to understand what the blocker is to a workaround for the init container.

jmhbh commented Jun 25, 2026

Contributor Author

> Is there a strong reason for this MR to proceed without skill support? It seems to me that this is one of the biggest advantages of the python substrate runtime, particularly with this already trailblazed by initial Google ax. I'm not opposed, but I want to understand what the blocker is to a workaround for the init container.

Hey @azabris1, thanks for your comment. We are planning to add skills, but will do so in a follow-up PR.

@jmhbh jmhbh force-pushed the feat/substrate-python-byo branch from eb67e09 to 785fa5d Compare June 25, 2026 22:59