feat: sync upstream NVIDIA/OpenShell v0.0.49 #9

Closed
Ladas wants to merge 157 commits into main from feat/sync-upstream-v0.0.49

Ladas commented May 27, 2026

Summary

Sync kagenti fork with NVIDIA/OpenShell upstream main (up to v0.0.49).

The fork was at v0.0.36-kagenti.8 (Apr 23). Upstream has released
13 versions since then (v0.0.37 through v0.0.49, May 26).

Key upstream changes

Merge result

Clean merge — no conflicts.

Next steps

After merge:

  • Tag new release (v0.0.49-kagenti.1)
  • Build and push new gateway + supervisor images
  • Update kagenti/kagenti Helm chart values.yaml with new tags
  • Verify policy normalization fixes #1669
  • Test agent policy with new wildcard handling #1647

Tracking: kagenti/kagenti#1695

Assisted-By: Claude Code

drew and others added 30 commits May 7, 2026 16:05
Signed-off-by: Drew Newberry <anewberry@nvidia.com>
Signed-off-by: Drew Newberry <anewberry@nvidia.com>
…lse) (NVIDIA#983)

Add opt-in support for Kubernetes user namespace isolation on sandbox
pods. When enabled, container UID 0 maps to an unprivileged host UID
and capabilities become namespaced, providing defense-in-depth for the
supervisor process.

Configuration is two-layered: a cluster-wide default via
OPENSHELL_ENABLE_USER_NAMESPACES (default false) and a per-sandbox
override via the new `user_namespaces` field on SandboxTemplate.

When user namespaces are active, the pod security context is extended
with SETUID, SETGID, and DAC_READ_SEARCH capabilities to match the
bounding-set requirements inside a user namespace.
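
The two-layer resolution described above can be sketched roughly as follows. This is an illustrative sketch, not the upstream implementation: the `SandboxTemplate` struct here is a hypothetical stand-in, and modelling the per-sandbox `user_namespaces` field as an `Option<bool>` is an assumption.

```rust
/// Hypothetical stand-in for the per-sandbox template (not the upstream type).
#[derive(Default)]
struct SandboxTemplate {
    /// Assumed shape: `None` means "inherit the cluster-wide default".
    user_namespaces: Option<bool>,
}

/// Resolve the effective setting: a per-sandbox value wins over the
/// cluster-wide default (OPENSHELL_ENABLE_USER_NAMESPACES).
fn user_namespaces_enabled(cluster_default: bool, template: &SandboxTemplate) -> bool {
    template.user_namespaces.unwrap_or(cluster_default)
}

/// Capabilities the commit message says are added to the pod security
/// context when user namespaces are active.
fn extra_capabilities(enabled: bool) -> Vec<&'static str> {
    if enabled {
        vec!["SETUID", "SETGID", "DAC_READ_SEARCH"]
    } else {
        Vec::new()
    }
}
```

With the cluster default left at `false`, a template carrying `Some(true)` opts that one sandbox in, while `None` inherits the default.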

Introduces SandboxPodParams struct to replace long argument lists on
sandbox_to_k8s_spec and sandbox_template_to_k8s.

Validated end-to-end on OCP 4.22 (K8s 1.35.3, CRI-O 1.35, RHEL
CoreOS, kernel 5.14) with full SSH tunnel and non-identity UID mapping.
Signed-off-by: Drew Newberry <anewberry@nvidia.com>
Signed-off-by: Jonas Toelke <jtoelke@nvidia.com>
* feat(providers): support sandbox provider attach lifecycle

Closes NVIDIA#1171

Adds sandbox provider list, attach, and detach API/CLI support while keeping provider policy and credential resolution derived from current sandbox attachments.

* fix(providers): refresh sandbox provider credentials

Adds provider environment revisions and generation-scoped sandbox credential snapshots so future SSH and exec launches pick up provider attach, detach, and credential updates without mutating already-running processes.

Also blocks provider deletion while attached to prevent stale sandbox provider references.

* fix(providers): serialize sandbox object mutations

* test(providers): cover sandbox provider attach lifecycle

* test(providers): accept versioned credential placeholders
Signed-off-by: Jonas Toelke <jtoelke@nvidia.com>
* fix(helm): derive grpcEndpoint from chart context

The chart hardcoded server.grpcEndpoint to
https://openshell.openshell.svc.cluster.local:8080, which only matched the
in-cluster Service DNS for the standard release name and namespace. A new
helper now builds <scheme>://<fullname>.<namespace>.svc.cluster.local:<port>
from chart context, picking the scheme from server.disableTls. An explicit
server.grpcEndpoint override is passed through verbatim.
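
The actual helper is a Helm template. The same derivation is sketched here in Rust purely to make the rule concrete. The function name and parameters are hypothetical, and mapping `disableTls = true` to `http` (otherwise `https`) is an assumption inferred from the description.

```rust
/// Sketch of the endpoint derivation; names are illustrative only.
fn grpc_endpoint(
    fullname: &str,
    namespace: &str,
    port: u16,
    disable_tls: bool,
    explicit_override: Option<&str>,
) -> String {
    // An explicit server.grpcEndpoint value is passed through verbatim.
    if let Some(endpoint) = explicit_override {
        return endpoint.to_string();
    }
    // Assumed mapping: TLS disabled -> http, otherwise https.
    let scheme = if disable_tls { "http" } else { "https" };
    format!("{}://{}.{}.svc.cluster.local:{}", scheme, fullname, namespace, port)
}
```

For the standard release name and namespace, `grpc_endpoint("openshell", "openshell", 8080, false, None)` reproduces the previously hardcoded `https://openshell.openshell.svc.cluster.local:8080`.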

* chore(scripts): validate k3d cluster name length early

helm-k3s-local.sh derives the cluster name from the current branch suffix.
Long branch names produced names exceeding k3d's 32-char cap and failed
deep inside k3d cluster create with a confusing validation error. cmd_create
now bails out before invoking docker/k3d with a copy-pasteable
HELM_K3S_CLUSTER_NAME override hint. Status, start, stop, delete, and help
remain unaffected so an over-long derived name does not block diagnostics.
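
The real check lives in a shell script. The fail-early idea is sketched below in Rust for illustration. The function name and the error wording are hypothetical. Only the 32-character cap and the `HELM_K3S_CLUSTER_NAME` override come from the commit message.

```rust
/// k3d's cluster name cap, as stated in the commit message.
const K3D_MAX_CLUSTER_NAME_LEN: usize = 32;

/// Reject an over-long derived name before any docker/k3d call is made.
/// The error text is illustrative, not the script's actual message.
fn validate_cluster_name(name: &str) -> Result<(), String> {
    let len = name.chars().count();
    if len > K3D_MAX_CLUSTER_NAME_LEN {
        return Err(format!(
            "cluster name '{}' is {} chars (max {}); set HELM_K3S_CLUSTER_NAME=<shorter-name> and retry",
            name, len, K3D_MAX_CLUSTER_NAME_LEN
        ));
    }
    Ok(())
}
```

Only the create path would call this, which matches the note that status, start, stop, delete, and help remain usable with an over-long derived name.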
…1280)

Signed-off-by: Drew Newberry <anewberry@nvidia.com>
…hardcoding (NVIDIA#1282)

Signed-off-by: Saurabh Agarwal <sauagarwa@redhat.com>
* docs(rfc): add agent-driven policy management

* docs(rfc): switch policy MVP to local API

* docs(rfc): clarify policy advisor skill and local logs

* feat(sandbox): add agent-driven policy proposal loop

* test(examples): add codex policy dogfood loop

* refactor(examples): make policy demo agent-agnostic

* refactor(examples): colocate policy validation harness

* docs(examples): add policy demo env sample

* docs(examples): use placeholder env example

* feat(sandbox): wire policy.local denials to OCSF JSONL log

Wires GET /v1/denials?last=N on the sandbox-local policy advisor API to read
recent OCSF JSONL events from /var/log/openshell-ocsf.YYYY-MM-DD.log, filter
to network/L7 denials (action_id=2, class_uid 4001/4002), and return a
compact summary newest-first. The default limit is 10, capped at 100. The read
runs inside spawn_blocking so file I/O does not block the policy.local handler.
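
A minimal sketch of the limit handling and the denial filter described above. Function names are hypothetical, and falling back to the default on an unparsable `last` value is an assumption. The commit message does not say how invalid input is handled.

```rust
const DEFAULT_LAST: usize = 10;
const MAX_LAST: usize = 100;

/// Extract `last=N` from a raw query string, defaulting to 10 and capping at 100.
fn parse_last(query: Option<&str>) -> usize {
    query
        .into_iter()
        .flat_map(|q| q.split('&'))
        .find_map(|pair| pair.strip_prefix("last="))
        .and_then(|value| value.parse::<usize>().ok())
        .unwrap_or(DEFAULT_LAST) // assumption: invalid or missing -> default
        .min(MAX_LAST)
}

/// Keep only network/L7 denials: action_id=2 with class_uid 4001 or 4002.
fn is_network_denial(class_uid: u32, action_id: u32) -> bool {
    action_id == 2 && matches!(class_uid, 4001 | 4002)
}
```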

Other cleanup:

- POST /v1/proposals now uses the typed grpc_client wrapper instead of
  raw_client, so accepted/rejected counts surface to the agent uniformly.
  Wrapper return type extended to the response struct.
- Drop the 'add_rule' snake_case alias in the proposal JSON; canonical form
  is camelCase 'addRule', matching the PolicyMergeOperation convention used
  elsewhere.
- skills/policy_advisor.md updated to match: documents the now-real
  /v1/denials?last=10 endpoint and uses 'addRule' consistently.
- skills.rs test asserts on the canonical 'addRule' phrase rather than the
  removed 'PolicyMergeOperation' substring.

* feat(cli): show L7 protocol/method/path in rule get output

format_endpoint() previously rendered only host:port, dropping protocol,
access, and the L7 rules array. That made openshell rule get text output
unable to distinguish a broad L4 grant from a method/path-scoped L7 REST
rule -- exactly the distinction a developer needs at approval time.

New rendering tags each endpoint with its enforcement layer and surfaces
allow/deny rules:

  bare L4:           api.example:443 [L4]
  L7 read-only:      api.example:443 [L7 rest, access=read-only]
  L7 method/path:    api.example:443 [L7 rest, allow PUT /v1/foo/bar]

Pure display change: no proto, gateway, or behavior changes. Unit test
covers all three rendering cases with synthetic fixtures.
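
The three renderings can be reproduced with a small sketch like the one below. The `Endpoint` and `Enforcement` types are hypothetical stand-ins for the real proto types, deny rules are left out, and joining multiple allow rules with a comma is an assumption.

```rust
/// Hypothetical model of an endpoint's enforcement layer.
enum Enforcement {
    L4,
    L7Access { protocol: String, access: String },
    /// Each rule is a (method, path) pair.
    L7Rules { protocol: String, allow: Vec<(String, String)> },
}

struct Endpoint {
    host: String,
    port: u16,
    enforcement: Enforcement,
}

/// Render `host:port` followed by a bracketed enforcement tag.
fn format_endpoint(ep: &Endpoint) -> String {
    let tag = match &ep.enforcement {
        Enforcement::L4 => "L4".to_string(),
        Enforcement::L7Access { protocol, access } => {
            format!("L7 {}, access={}", protocol, access)
        }
        Enforcement::L7Rules { protocol, allow } => {
            let rules: Vec<String> = allow
                .iter()
                .map(|(method, path)| format!("allow {} {}", method, path))
                .collect();
            format!("L7 {}, {}", protocol, rules.join(", "))
        }
    };
    format!("{}:{} [{}]", ep.host, ep.port, tag)
}
```

An `Enforcement::L4` endpoint for `api.example` on port 443 renders as `api.example:443 [L4]`, matching the first example line above.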

* refactor(examples): rewrite policy demo as Codex-default loop

Re-shape examples/agent-driven-policy-management/ to be a single, clean
end-to-end demonstration of the agent-driven policy loop. A Codex agent
inside an OpenShell sandbox attempts a GitHub Contents API write, hits a
structured 403 from the L7 proxy, reads the policy_advisor skill, drafts a
narrow addRule proposal via http://policy.local/v1/proposals, the host
auto-approves, the sandbox hot-reloads policy, and the agent's retry
succeeds. Whole loop runs in roughly two minutes.

Demo cleanup:

- Drop .env file ceremony. Defaults resolve from gh: owner via
  'gh api user --jq .login', repo defaults to 'openshell-policy-demo',
  token from gh auth token / GITHUB_TOKEN / GH_TOKEN. With gh auth login
  and codex login already done, 'bash demo.sh' Just Works.
- Codex-specific. Bootstraps ~/.codex/auth.json from credentials injected
  by the OpenShell provider, runs codex exec --sandbox danger-full-access
  (OpenShell is the actual security boundary; bwrap nesting cannot create
  user namespaces inside the sandbox container).
- Tighter narrative output: a single 'Preflight' step, a run summary banner
  before launch, an inline narration of what's happening inside the sandbox
  while we poll for the proposal (including the literal structured 403
  body the agent acts on), and an OCSF trace at the end filtered to the
  three events that tell the story (DENY, RELOAD, ALLOW).
- Replace Python heredoc templating with sed; uploads use the single-flag
  pattern (--upload "${PAYLOAD_DIR}:/sandbox") with files referenced at
  the basename-prefixed path that NVIDIA#952 / NVIDIA#1028 established.
- README documents the trust model honestly: structured rule is the
  contract, agent rationale is a hint, prover validation badge in
  progress per RFC 0001.

Move the deterministic no-LLM regression harness out of examples/ into
e2e/policy-advisor/ -- it was a parallel demo, not an example. Same loop
without the LLM, useful for iterating on the proxy and policy.local API.

* style(sandbox,cli): apply rustfmt

Whitespace-only fixups caught by mise run pre-commit. No functional change.

* perf(examples): cap Codex reasoning at 'low' in policy demo

The demo task is mechanical (one HTTP request, parse a structured 403,
post a JSON proposal, retry). Codex's default high-effort reasoning
roughly doubles the demo's wall time without improving outcomes; running
at 'low' lands the same minimal L7 grant in roughly half the time.

Override with DEMO_CODEX_REASONING=medium (or higher) to compare runs.

* fix(sandbox): harden policy.local denials endpoint

Three changes addressing review feedback before merging the agent-driven
policy management MVP:

- Distinguish "OCSF JSONL enabled, no denials" from "OCSF JSONL disabled,
  nothing to read." The endpoint now returns a `log_available` flag and an
  explanatory `note` when the log file is missing, so the in-sandbox agent
  can give the developer an accurate hint instead of a misleading empty
  list.
- Stop echoing the OCSF `message` field in the per-denial summary. The
  proxy's denial messages can include the request path with query string
  (e.g., `?access_token=...`); the structured `host`/`port`/`method`/
  `path`/`binary` fields carry everything the agent needs to draft a
  proposal, and `path` is sourced from `http_request.url.path` which
  already excludes the query string.
- Cap `read_request_body` at a 15s timeout. Bounds slowloris-style stalls
  from a misbehaving in-sandbox process. The proxy listener only accepts
  loopback connections so practical impact is small, but this is cheap
  defense-in-depth.

New tests cover the missing-log signal and the message-redaction guarantee.

* fix(examples): redact tokens in agent log tail and validate DEMO_FILE_DIR

Two small hardening passes on the policy management demo:

- `fail()` now pipes the agent log tail through a redactor that masks the
  GitHub token and Codex credential triple before printing. Codex itself is
  well-behaved about not echoing the token, but a misbehaving tool call
  could leak it; this is a final safety net before the log hits the
  developer's terminal (and any clipboard or chat history that follows).
- `validate_env` now regex-checks DEMO_FILE_DIR with the same allow-list
  the other path-shaped variables use. The value is interpolated through
  sed with `|` as the delimiter when rendering the agent task; rejecting
  unsupported characters keeps the templating predictable and stops a
  user-supplied value from breaking out into a shell context.

* refactor(sandbox): centralize policy.local routes and skill path

Addresses review feedback that the deny body's `next_steps` array and the
route table could drift apart. The route paths and skill location now live
as `pub const`s in `policy_local.rs` and feed both:

- the dispatcher in `route_request` that matches against them
- a new `agent_next_steps()` helper that builds the JSON the L7 deny body
  embeds

`l7/rest.rs::deny_response_body` calls `policy_local::agent_next_steps()`
instead of inlining the array, so adding or renaming a route is a one-line
change in `policy_local.rs` and the agent contract follows automatically.

* feat(sandbox): switch /v1/denials to shorthand log pass-through

Previously /v1/denials parsed `/var/log/openshell-ocsf.*.log` (OCSF JSONL)
and returned structured per-event objects. JSONL is opt-in via
`ocsf_json_enabled`, so the endpoint returned an empty list with a "log
not enabled" hint by default — agents had to navigate a setup step before
the inspect-recent-denials guidance was useful.

Switch to reading the shorthand log at `/var/log/openshell.*.log`, which
is always-on and the same human-readable format `openshell logs` displays.
The endpoint now returns raw shorthand lines (newest first) — the agent
reads them directly, no field parsing.

Tradeoffs:
- Removes the JSONL-on-by-default debate: shorthand is already on, no
  defaults change.
- Updating shorthand is a single-file change in this repo; no schema rev
  needed when we want to add fields.

Implementation:
- `read_recent_denial_lines` walks shorthand log files newest-first,
  filters lines with ` OCSF ` AND ` DENIED ` (the OCSF action label,
  uppercase, space-bounded).
- `collect_shorthand_log_files` matches `openshell.<date>.log`; the
  trailing dot in `SHORTHAND_LOG_PREFIX = "openshell."` excludes
  `openshell-ocsf.<date>.log` so JSONL-on doesn't bleed into responses.
- 4096-byte cap per surfaced line as defense against pathological inputs.
- Skill doc updated to reflect that `/v1/denials` returns raw shorthand
  lines, not structured fields.
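
The file-name match and the line filter described in the implementation notes might look roughly like this. It is a sketch under assumptions: `is_shorthand_log_file` is a hypothetical helper name, and only `SHORTHAND_LOG_PREFIX` and `is_ocsf_denial_line` are named in the commit message.

```rust
/// The trailing dot is what excludes `openshell-ocsf.<date>.log`.
const SHORTHAND_LOG_PREFIX: &str = "openshell.";

/// Match `openshell.<date>.log` with a non-empty date part (hypothetical helper).
fn is_shorthand_log_file(name: &str) -> bool {
    name.strip_prefix(SHORTHAND_LOG_PREFIX)
        .and_then(|rest| rest.strip_suffix(".log"))
        .map_or(false, |date| !date.is_empty())
}

/// A surfaced line must carry both space-bounded markers.
fn is_ocsf_denial_line(line: &str) -> bool {
    line.contains(" OCSF ") && line.contains(" DENIED ")
}
```

Because the prefix ends in a dot, `openshell-ocsf.2026-05-07.log` fails the prefix check at the hyphen, so enabling JSONL never changes what the endpoint returns.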

Defense-in-depth on query-string secrets:
- `redact_query_strings` strips `?<query>` to `?[redacted]` from each
  surfaced line. The L7 relay path emits OCSF events using
  `redacted_target` (secret-placeholder redaction), but the FORWARD deny
  path in `proxy.rs` populates `OcsfUrl::new("http", host, path, port)`
  and `.message(...)` with the raw request path — query string included.
  Stripping queries at the consumer guards `/v1/denials` regardless of
  whether the upstream emit sites are tightened. The on-disk log is not
  rewritten by this change; that is a separate hardening task tracked
  for the FORWARD path emit sites in proxy.rs.
- `truncate_at_char_boundary` is UTF-8 safe; redaction runs before
  truncation so a cut cannot slice mid-secret.
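
A simplified sketch of the redact-then-truncate ordering. The function names follow the commit message, but the bodies are illustrative: this version treats every `?` as the start of a query that runs until whitespace or `]`, which may over-redact compared with the real code. `surface_line` is a hypothetical wrapper.

```rust
/// Per-line cap from the commit message.
const MAX_LINE_BYTES: usize = 4096;

/// Replace `?<query>` with `?[redacted]`. Simplified: a query is assumed
/// to end at whitespace or a closing bracket.
fn redact_query_strings(line: &str) -> String {
    let mut out = String::with_capacity(line.len());
    let mut in_query = false;
    for ch in line.chars() {
        if in_query {
            if ch.is_whitespace() || ch == ']' {
                in_query = false;
                out.push(ch);
            }
            // Otherwise the character belongs to the query and is dropped.
        } else if ch == '?' {
            out.push_str("?[redacted]");
            in_query = true;
        } else {
            out.push(ch);
        }
    }
    out
}

/// Cut to at most `max_bytes` without splitting a UTF-8 code point.
fn truncate_at_char_boundary(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    let mut end = max_bytes;
    while !s.is_char_boundary(end) {
        end -= 1; // index 0 is always a boundary, so this terminates
    }
    &s[..end]
}

/// Redact first, then truncate, so a cut never lands inside a secret.
fn surface_line(line: &str) -> String {
    let redacted = redact_query_strings(line);
    truncate_at_char_boundary(&redacted, MAX_LINE_BYTES).to_string()
}
```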

Tests:
- `recent_denials_returns_newest_first_from_shorthand_lines` covers the
  happy path with mixed allowed/denied/non-OCSF lines.
- `recent_denials_skips_jsonl_log_files` confirms JSONL files don't
  surface even if present.
- `recent_denials_truncates_pathological_lines` covers the cap.
- `is_ocsf_denial_line_filters_correctly` covers the line-level filter.
- `redact_query_strings_removes_query_from_url_token` and
  `redact_query_strings_removes_query_in_reason_tag` cover the redaction
  in both URL token and `[reason:...]` contexts.
- `truncate_at_char_boundary_does_not_panic_on_multibyte` covers the
  UTF-8 safety.

* chore(sandbox): align proto inits with main's L7 GraphQL additions

Post-rebase fixups after NVIDIA#1083 (GraphQL L7 inspection) landed on main and
introduced new fields on the proto types this branch constructs:

- `crates/openshell-sandbox/src/l7/relay.rs`: two `deny_with_redacted_target`
  call sites (REST and GraphQL relay deny paths) now pass the
  `DenyResponseContext` argument that `rest::send_deny_response` expects.
  Both sites pass `host`, `port`, and `binary` from the existing
  `L7EvalContext`, matching the pattern used at the primary deny site.
- `crates/openshell-sandbox/src/policy_local.rs`: `L7Allow`, `L7DenyRule`,
  and `NetworkEndpoint` proto initializers now populate the new GraphQL
  and path-scoping fields with empty defaults. Agent-authored proposals
  via `policy.local` target REST/SQL/L4 today; GraphQL operation matching
  is set on the gateway side or via direct YAML, so empty defaults are
  correct here.

No behavior change. `cargo test -p openshell-sandbox --lib` (650 tests) and
`cargo clippy -p openshell-sandbox --lib --tests -- -D warnings` clean.

* feat(sandbox): gate agent policy proposals behind opt-in feature flag

The agent-driven policy proposal surface delivered by this PR (skill
install, `policy.local` API, `next_steps` array on L7 deny bodies) is
now opt-in via the new `agent_policy_proposals_enabled` setting. Default
false. Same shape as `providers_v2_enabled`: registered in
`openshell-core::settings`, sandbox-level, hot-toggleable via the
existing settings poll loop.

Why: the surface is a novel agent-controlled mutation point in every
sandbox. The per-proposal developer approval gate is a correctness
control, but it doesn't address "should this sandbox have an
agent-authoring API at all" — compliance teams may want that question
closed. The flag is the second gate.

Implementation:
- New registry entry + `AGENT_POLICY_PROPOSALS_ENABLED_KEY` constant in
  `openshell-core::settings`.
- `lib.rs`: process-wide `OnceLock<Arc<AtomicBool>>` mirroring the
  `OCSF_CTX` pattern. `agent_proposals_enabled()` is the single read
  point.
- Initial settings fetch added to `run_sandbox` so skill install honors
  the flag at startup (not just on the poll loop's first tick).
- Skill install in `run_sandbox` is gated on the flag.
- `policy_local::route_request` returns `404 feature_disabled` for all
  routes when the flag is off — including the otherwise-public
  `current_policy` and `denials` routes. When the surface is off it's
  off entirely.
- `policy_local::agent_next_steps` returns an empty array when the flag
  is off so deny bodies don't advertise routes that 404.
- Poll loop updates the atomic on each tick, lazily installs the skill
  on a false→true transition (no claw-back on true→false; stale skill
  on disk is harmless because route + next_steps gate on the live atom).
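
The gating pattern can be sketched as below. It keeps the `OnceLock<Arc<AtomicBool>>` shape and the `agent_proposals_enabled()` read point named above, but everything else is a simplification: the setter, the `next_steps` strings, and the status tuples are hypothetical, and only the two routes named in this PR are modelled.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, OnceLock};

/// Process-wide flag, initialised to the documented default (false).
static AGENT_PROPOSALS: OnceLock<Arc<AtomicBool>> = OnceLock::new();

fn proposals_flag() -> &'static Arc<AtomicBool> {
    AGENT_PROPOSALS.get_or_init(|| Arc::new(AtomicBool::new(false)))
}

/// Hypothetical setter, standing in for the settings poll loop's update.
fn set_agent_proposals_enabled(enabled: bool) {
    proposals_flag().store(enabled, Ordering::Relaxed);
}

/// Single read point for the live flag.
fn agent_proposals_enabled() -> bool {
    proposals_flag().load(Ordering::Relaxed)
}

/// Deny bodies advertise routes only while the flag is on.
fn agent_next_steps() -> Vec<&'static str> {
    if agent_proposals_enabled() {
        vec!["GET /v1/denials?last=10", "POST /v1/proposals"] // illustrative content
    } else {
        Vec::new()
    }
}

/// Every route answers 404 feature_disabled while the flag is off.
fn route_request(path: &str) -> (u16, &'static str) {
    if !agent_proposals_enabled() {
        return (404, "feature_disabled");
    }
    match path {
        "/v1/denials" | "/v1/proposals" => (200, "ok"),
        _ => (404, "not_found"),
    }
}
```

Because both the dispatcher and `agent_next_steps` read the same atomic, a true-to-false toggle takes effect on the next request even though the skill file stays on disk.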

Tests:
- Shared `test_helpers::ProposalsFlagGuard` mutex+atomic guard for the
  process-wide flag, used across `policy_local::tests` and
  `l7::rest::tests`.
- New: `agent_next_steps_returns_empty_when_flag_off`,
  `agent_next_steps_returns_full_array_when_flag_on`,
  `route_request_returns_feature_disabled_when_flag_off`.
- Updated existing tests that exercise the deny body or the route
  dispatcher to set the flag on first.
- Full sandbox lib test suite: 653 pass, clippy clean.

Demo and e2e:
- `examples/agent-driven-policy-management/demo.sh` and
  `e2e/policy-advisor/test.sh` now snapshot the prior global value of
  the setting, set it to true before sandbox creation (so the
  supervisor's initial poll picks it up), and restore on exit (delete
  if previously unset, otherwise write the prior value back).

Docs:
- RFC 0001 MVP-implementation note documents the flag, default, and
  intended soft-launch posture.

* test(policy-advisor): require proposal opt-in for e2e

* refactor(sandbox): group policy poll loop state

* test(e2e): isolate Kubernetes user namespace test

---------

Co-authored-by: John Myers <9696606+johntmyers@users.noreply.github.com>
…DIA#1287)

The Podman and Kubernetes compute drivers require
OPENSHELL_SSH_HANDSHAKE_SECRET to be set. This was introduced in
2e0afea ("feat(vm): derive guest rootfs from sandbox images (NVIDIA#957)"),
which exempted only the Docker and VM drivers from the check.

The Getting Started instructions in CONTRIBUTING.md didn't mention
the variable, so developers using Podman (the default on systems
where it is installed) hit an opaque configuration error on first run.

Add the export as a separate setup step with a comment explaining
which drivers require it.

Signed-off-by: Russell Bryant <russell.bryant@gmail.com>
The Podman path in the dev gateway script references the
build:docker:supervisor-sideload mise task, which was removed in
d8b8477 ("feat(rpm): add RPM packaging with Packit/COPR and GHA
release publishing (NVIDIA#1126)"). That commit consolidated the sideload
and standalone supervisor image builds into a single
build:docker:supervisor task.

Update the reference so `mise run gateway` works with the Podman
compute driver.

Signed-off-by: Russell Bryant <russell.bryant@gmail.com>
Signed-off-by: Drew Newberry <anewberry@nvidia.com>
…VIDIA#1298)

Disables automountServiceAccountToken in sandbox pods for security
hardening. Sandbox pods should not have access to the Kubernetes API
by default.

Adds test case to verify the pod spec includes the disabled setting.

Signed-off-by: Derek Carr <decarr@redhat.com>
Colima, Lima, Rancher Desktop, and OrbStack all run dockerd inside a
host VM. Their bridge gateway IP is reachable from inside containers
but not from the OpenShell server process running on the host, which is
the same constraint Docker Desktop has. The existing is_docker_desktop
check nevertheless rejects them, leaving callbacks routed to a bridge IP
that nothing on the host can listen on.

Detect these runtimes by daemon Name (Lima sets the VM hostname to
colima*, lima-*, rancher-desktop, orbstack) and supplemental labels
(dev.rancherdesktop.*, dev.orbstack.*), and route them through
host-gateway like Docker Desktop.
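
A rough sketch of the detection rule as described. The function name and signature are hypothetical, matching is done on exact, case-sensitive strings as an assumption, and labels are treated as plain strings.

```rust
/// Returns true when the Docker daemon appears to run inside a host VM
/// (Colima, Lima, Rancher Desktop, OrbStack), based on name or labels.
fn is_vm_backed_docker(daemon_name: &str, labels: &[&str]) -> bool {
    let name_match = daemon_name.starts_with("colima")
        || daemon_name.starts_with("lima-")
        || daemon_name == "rancher-desktop"
        || daemon_name == "orbstack";
    let label_match = labels.iter().any(|label| {
        label.starts_with("dev.rancherdesktop.") || label.starts_with("dev.orbstack.")
    });
    name_match || label_match
}
```

A caller would then treat a `true` result the same way as the Docker Desktop case and route callbacks through host-gateway.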

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
* feat(gpu): honor device IDs in Docker and Podman

Signed-off-by: Evan Lezar <elezar@nvidia.com>

* test(gpu): add Docker and Podman device selection e2e

Signed-off-by: Evan Lezar <elezar@nvidia.com>

* ci(gpu): run Docker GPU e2e workflow

Signed-off-by: Evan Lezar <elezar@nvidia.com>

---------

Signed-off-by: Evan Lezar <elezar@nvidia.com>
…1300)

* feat(k8s): support ImageVolumeSource for supervisor sideload

Add a config-driven switch between two supervisor binary delivery
methods: image-volume (default) mounts the supervisor OCI image
directly as a read-only volume (K8s >= v1.33), while init-container
preserves the existing emptyDir + copy-self pattern for older clusters.

Configurable via --supervisor-sideload-method CLI arg,
OPENSHELL_SUPERVISOR_SIDELOAD_METHOD env var, or Helm
supervisor.sideloadMethod value.

* feat(helm): auto-detect supervisor sideload method from cluster version

Use Helm .Capabilities.KubeVersion to choose the supervisor sideload
method automatically: image-volume on K8s >= v1.33, init-container on
older clusters. An explicit supervisor.sideloadMethod value overrides
auto-detection.

* fix(helm): raise auto-detect threshold to K8s v1.35

The ImageVolume feature gate is only enabled by default starting in
K8s v1.35, not v1.33 (where it is beta but off by default). Clusters
on v1.33-v1.34 can still opt in with an explicit sideloadMethod value.
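
The final selection rule across these three commits can be summarised in a short sketch. The chart implements this in Helm templating. The Rust types and function below are hypothetical and only mirror the decision logic.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum SideloadMethod {
    ImageVolume,
    InitContainer,
}

/// An explicit supervisor.sideloadMethod wins. Otherwise pick image-volume
/// only on Kubernetes >= 1.35, where ImageVolume is enabled by default.
fn select_sideload_method(
    kube_major: u32,
    kube_minor: u32,
    explicit: Option<SideloadMethod>,
) -> SideloadMethod {
    if let Some(method) = explicit {
        return method;
    }
    if (kube_major, kube_minor) >= (1, 35) {
        SideloadMethod::ImageVolume
    } else {
        SideloadMethod::InitContainer
    }
}
```

A v1.33 or v1.34 cluster with the feature gate turned on would pass `Some(SideloadMethod::ImageVolume)` to opt in explicitly.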
TaylorMutch and others added 28 commits May 22, 2026 06:02
Signed-off-by: Taylor Mutch <taylormutch@gmail.com>
* docs(rfc): add sandbox resource requirements proposal

Signed-off-by: Evan Lezar <elezar@nvidia.com>

* docs(rfc): finalize sandbox resource requirements

---------

Signed-off-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Taylor Mutch <taylormutch@gmail.com>
* fix(cli): add json output for policy get

* test(cli): cover policy get full json output

* fix(cli): address policy get json clippy

---------

Co-authored-by: John Myers <9696606+johntmyers@users.noreply.github.com>
* feat(providers): derive discovery from profiles

Signed-off-by: John Myers <9696606+johntmyers@users.noreply.github.com>

* fix(providers): keep v2 discovery profile-only

Signed-off-by: John Myers <9696606+johntmyers@users.noreply.github.com>

* docs(providers): update providers v2 behavior

Signed-off-by: John Myers <9696606+johntmyers@users.noreply.github.com>

* fix(providers): make github profile read-only

Signed-off-by: John Myers <9696606+johntmyers@users.noreply.github.com>

---------

Signed-off-by: John Myers <9696606+johntmyers@users.noreply.github.com>
* fix(homebrew): repair local driver bootstrap state

* fix(bootstrap): satisfy default SAN doc lint
Thread the gateway_insecure flag through gateway_add(), gateway_login(),
and all OIDC HTTP clients so that --gateway-insecure and
OPENSHELL_GATEWAY_INSECURE apply to OIDC discovery, token exchange, and
token refresh requests.

Previously, the flag only affected gRPC connections to the gateway. OIDC
HTTP clients (reqwest::get and http_client) always verified TLS
certificates, causing gateway registration and login to fail when the
OIDC issuer used a self-signed certificate (common on OpenShift with
edge-terminated routes).

Fixes NVIDIA#1534

Signed-off-by: Adel Zaalouk <azaalouk@redhat.com>
Signed-off-by: Piotr Mlocek <pmlocek@nvidia.com>
Bumps [docker/login-action](https://github.com/docker/login-action) from 4.1.0 to 4.2.0.
- [Release notes](https://github.com/docker/login-action/releases)
- [Commits](docker/login-action@4907a6d...650006c)

---
updated-dependencies:
- dependency-name: docker/login-action
  dependency-version: 4.2.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…IA#1545)

* chore(helm): add missing SPDX header to gateway-config template

* chore(scripts): remove helm templates from license header exclusions

The bypass had no known rationale. Removing it ensures the header
script covers deploy/helm/openshell/templates uniformly going forward.

Signed-off-by: mesutoezdil <mesudozdil@gmail.com>

---------

Signed-off-by: mesutoezdil <mesudozdil@gmail.com>
Signed-off-by: Piotr Mlocek <pmlocek@nvidia.com>
…1544)

* ci: pin azure/setup-helm and helm/kind-action to commit SHAs

* chore(python): add py.typed marker for PEP 561 compliance

* ci: use full semver in pinned action version comments

Signed-off-by: mesutoezdil <mesudozdil@gmail.com>

---------

Signed-off-by: mesutoezdil <mesudozdil@gmail.com>
…tes (NVIDIA#1526)

Extract repeated patterns into shared helpers:

- Add impl_builder_setters! macro to openshell-ocsf/builders that
  generates the identical severity(), status(), and message() setter
  methods present on all 7 OCSF event builders
- Add SandboxContext::apply_common_fields() to consolidate the
  four-line build() finalization (set_status, set_message, set_device,
  set_container) repeated in every builder
- Add driver_utils::sandbox_token_path() to centralize the XDG state
  path construction for sandbox JWT files used by both the Docker and
  Podman drivers
- Add driver_utils::build_capabilities_response() to eliminate the
  identical GetCapabilitiesResponse struct literal repeated across the
  Docker, Podman, and Kubernetes compute drivers
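
To illustrate the first bullet, a setter-generating macro might look like the sketch below. The macro name matches the commit message, but its body, the two builder structs, and the `u8`/`String` field types are assumptions made for the example.

```rust
/// Generate the shared `severity`, `status`, and `message` setters.
macro_rules! impl_builder_setters {
    ($builder:ty) => {
        impl $builder {
            pub fn severity(mut self, severity: u8) -> Self {
                self.severity = Some(severity);
                self
            }
            pub fn status(mut self, status: u8) -> Self {
                self.status = Some(status);
                self
            }
            pub fn message(mut self, message: impl Into<String>) -> Self {
                self.message = Some(message.into());
                self
            }
        }
    };
}

/// Hypothetical builders; the real crate has seven of these.
#[derive(Default)]
struct NetworkActivityBuilder {
    severity: Option<u8>,
    status: Option<u8>,
    message: Option<String>,
}

#[derive(Default)]
struct HttpActivityBuilder {
    severity: Option<u8>,
    status: Option<u8>,
    message: Option<String>,
}

impl_builder_setters!(NetworkActivityBuilder);
impl_builder_setters!(HttpActivityBuilder);
```

Each builder then supports the same chained calls, for example `NetworkActivityBuilder::default().severity(4).message("denied")`, without repeating the method bodies.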
…ror (NVIDIA#1547)

* fix(python): raise SandboxError instead of FileNotFoundError or KeyError

* fix(python): suppress exception chaining in SandboxError raises

Add `from None` to both `raise SandboxError(...)` calls inside `except
FileNotFoundError` blocks to satisfy ruff B904.
…elm-k3s-local (NVIDIA#1539)

macOS ships bash 3.2 which lacks mapfile/readarray. Replace all three
occurrences in configure_ghcr_credentials, cluster_has_image, and
cluster_image_platform with a portable while-read loop, consistent
with the fix applied to docker-build-image.sh in NVIDIA#1334.
Signed-off-by: Ann Marie Fred <afred@redhat.com>
This lets you run the dev gateway and sandbox with:

```
mise run gateway
# in another shell
mise run sandbox
```

Signed-off-by: Kris Hicks <khicks@nvidia.com>
… L4/L7 split (NVIDIA#1412)

* fix(sandbox): add mechanistic smoke test for L4 deny and document the L4/L7 split

The old smoke script exercised an L7 PUT which hung because the denial
aggregator is only wired to L4 CONNECT denies, not L7 enforcement.

Add mechanistic-smoke.sh which triggers an L4 deny, waits for the
aggregator to flush, and asserts a pending chunk appears under
openshell rule get --status pending.

Document the intentional L4-only scope of the mechanistic mapper in
architecture/sandbox.md.

Fixes NVIDIA#1333

Signed-off-by: mesutoezdil <mesudozdil@gmail.com>

* refactor(smoke): remove redundant variable inits and merge double step call

Signed-off-by: mesutoezdil <mesudozdil@gmail.com>

* fix(smoke): wire mechanistic smoke into mise and guard TMP_DIR

- Initialize TMP_DIR before trap to prevent unbound variable on early exit
- Add e2e:mechanistic-smoke mise task with gateway setup
- Document mechanistic smoke in policy-advisor README

* test(proxy): verify L4 deny enqueues a DenialEvent

Signed-off-by: mesutoezdil <mesudozdil@gmail.com>

* fix(proxy): remove unnecessary path qualifications in L4 denial smoke test

---------

Signed-off-by: mesutoezdil <mesudozdil@gmail.com>
Signed-off-by: Kris Hicks <khicks@nvidia.com>
Signed-off-by: Kris Hicks <khicks@nvidia.com>
NVIDIA#1585)

On kernels without Landlock (e.g. gVisor's sentry returns ENOSYS for
syscall 444), the previous best_effort path still logged "Applying
Landlock" + "Landlock ruleset built" events even though no enforcement
was happening. Probe at the top of `landlock::prepare` and short-circuit
with a single High-severity "Sandbox Unavailable" finding.

Signed-off-by: Davanum Srinivas <dsrinivas@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>

Ladas commented May 29, 2026

Closing per team discussion — holding fork sync until v0.6.0 release is stable. The correct process is: (1) Sync Fork on GitHub to update main, (2) rebase mvp onto main, (3) evaluate which kagenti patches are still needed. See docs/research/openshell-fork-analysis.md for the full analysis.

Ladas closed this May 29, 2026