feat: sync upstream NVIDIA/OpenShell v0.0.49 by Ladas · Pull Request #9 · kagenti/OpenShell

Ladas · 2026-05-27T13:36:11Z

Summary

Sync kagenti fork with NVIDIA/OpenShell upstream main (up to v0.0.49).

The fork was at v0.0.36-kagenti.8 (Apr 23). Upstream has released
13 versions since then (v0.0.37 through v0.0.49, May 26).

Key upstream changes

Policy normalization fixes (port vs ports — 🐛 Bugs around Openshell sandbox policy and normalization kagenti#1669)
Host wildcard validation improvements (🐛 Openshell sandboxes can't access internal cluster services kagenti#1647)
Draft policy recommendation system (SubmitPolicyAnalysis RPCs)
Custom provider profiles (ImportProviderProfiles)
WatchSandbox streaming (status + logs + events)
GPU device selection (gpu_device field)
Decoupled GPU baseline from network policy

Merge result

Clean merge — no conflicts.

Next steps

After merge:

Tag new release (v0.0.49-kagenti.1)
Build and push new gateway + supervisor images
Update kagenti/kagenti Helm chart values.yaml with new tags
Verify policy normalization fixes #1669
Test agent policy with new wildcard handling #1647

Tracking: kagenti/kagenti#1695

Assisted-By: Claude Code

…1252)

Signed-off-by: Drew Newberry <anewberry@nvidia.com>

…lse) (NVIDIA#983) Add opt-in support for Kubernetes user namespace isolation on sandbox pods. When enabled, container UID 0 maps to an unprivileged host UID and capabilities become namespaced, providing defense-in-depth for the supervisor process. Configuration is two-layered: a cluster-wide default via OPENSHELL_ENABLE_USER_NAMESPACES (default false) and a per-sandbox override via the new `user_namespaces` field on SandboxTemplate. When user namespaces are active, the pod security context is extended with SETUID, SETGID, and DAC_READ_SEARCH capabilities to match the bounding-set requirements inside a user namespace. Introduces SandboxPodParams struct to replace long argument lists on sandbox_to_k8s_spec and sandbox_template_to_k8s. Validated end-to-end on OCP 4.22 (K8s 1.35.3, CRI-O 1.35, RHEL CoreOS, kernel 5.14) with full SSH tunnel and non-identity UID mapping.
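The two-layer configuration above can be sketched in Python as follows. This is an illustration, not the Rust code from the PR: the function names and the accepted boolean spellings are assumptions. Only the environment variable name, the per-sandbox `user_namespaces` override, and the three added capabilities come from the commit message.

```python
from typing import Mapping, Optional, Sequence

# Capabilities the commit message says are added when user namespaces are active.
USERNS_EXTRA_CAPS = ("SETUID", "SETGID", "DAC_READ_SEARCH")


def cluster_default(env: Mapping[str, str]) -> bool:
    """Read the cluster-wide default (false when unset).

    The accepted spellings ("1", "true", "yes") are an assumption.
    """
    raw = env.get("OPENSHELL_ENABLE_USER_NAMESPACES", "")
    return raw.strip().lower() in {"1", "true", "yes"}


def user_namespaces_enabled(default: bool, template_override: Optional[bool]) -> bool:
    """A per-sandbox override wins when set; otherwise use the cluster default."""
    return default if template_override is None else template_override


def pod_capabilities(base_caps: Sequence[str], userns_active: bool) -> list:
    """Extend the pod's capability list only when user namespaces are active."""
    caps = list(base_caps)
    if userns_active:
        caps += [c for c in USERNS_EXTRA_CAPS if c not in caps]
    return caps
```

For example, `user_namespaces_enabled(cluster_default({}), True)` is `True`: the sandbox opts in even though the cluster default is off.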

…VIDIA#1257)

Signed-off-by: Drew Newberry <anewberry@nvidia.com>

Signed-off-by: Jonas Toelke <jtoelke@nvidia.com>

* feat(providers): support sandbox provider attach lifecycle

Closes NVIDIA#1171

Adds sandbox provider list, attach, and detach API/CLI support while keeping provider policy and credential resolution derived from current sandbox attachments.

* fix(providers): refresh sandbox provider credentials

Adds provider environment revisions and generation-scoped sandbox credential snapshots so future SSH and exec launches pick up provider attach, detach, and credential updates without mutating already-running processes. Also blocks provider deletion while attached to prevent stale sandbox provider references.

* fix(providers): serialize sandbox object mutations
* test(providers): cover sandbox provider attach lifecycle
* test(providers): accept versioned credential placeholders

Signed-off-by: Jonas Toelke <jtoelke@nvidia.com>

* fix(helm): derive grpcEndpoint from chart context

The chart hardcoded server.grpcEndpoint to https://openshell.openshell.svc.cluster.local:8080, which only matched the in-cluster Service DNS for the standard release name and namespace. A new helper now builds <scheme>://<fullname>.<namespace>.svc.cluster.local:<port> from chart context, picking the scheme from server.disableTls. An explicit server.grpcEndpoint override is passed through verbatim.

* chore(scripts): validate k3d cluster name length early

helm-k3s-local.sh derives the cluster name from the current branch suffix. Long branch names produced names exceeding k3d's 32-char cap and failed deep inside k3d cluster create with a confusing validation error. cmd_create now bails out before invoking docker/k3d with a copy-pasteable HELM_K3S_CLUSTER_NAME override hint. Status, start, stop, delete, and help remain unaffected so an over-long derived name does not block diagnostics.
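As a rough Python illustration of the two fixes above (the real logic lives in a Helm template helper and a shell script), the endpoint derivation and the early name check might look like this. The function names are hypothetical, and mapping `disableTls: true` to `http` is an assumption; the commit message only says the scheme is picked from `server.disableTls`.

```python
from typing import Optional

K3D_MAX_NAME_LEN = 32  # cap quoted in the commit message


def grpc_endpoint(fullname: str, namespace: str, port: int,
                  disable_tls: bool, override: Optional[str] = None) -> str:
    """Build <scheme>://<fullname>.<namespace>.svc.cluster.local:<port>.

    An explicit override is returned verbatim.
    """
    if override:
        return override
    scheme = "http" if disable_tls else "https"  # assumed mapping
    return f"{scheme}://{fullname}.{namespace}.svc.cluster.local:{port}"


def check_cluster_name(name: str) -> None:
    """Fail early, before docker/k3d is invoked, with an override hint."""
    if len(name) > K3D_MAX_NAME_LEN:
        raise ValueError(
            f"cluster name '{name}' is {len(name)} chars "
            f"(max {K3D_MAX_NAME_LEN}); set HELM_K3S_CLUSTER_NAME to a shorter name"
        )
```

With the standard release name and namespace, `grpc_endpoint("openshell", "openshell", 8080, False)` reproduces the previously hardcoded value.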

…1280) Signed-off-by: Drew Newberry <anewberry@nvidia.com>

…hardcoding (NVIDIA#1282) Signed-off-by: Saurabh Agarwal <sauagarwa@redhat.com>

* docs(rfc): add agent-driven policy management
* docs(rfc): switch policy MVP to local API
* docs(rfc): clarify policy advisor skill and local logs
* feat(sandbox): add agent-driven policy proposal loop
* test(examples): add codex policy dogfood loop
* refactor(examples): make policy demo agent-agnostic
* refactor(examples): colocate policy validation harness
* docs(examples): add policy demo env sample
* docs(examples): use placeholder env example
* feat(sandbox): wire policy.local denials to OCSF JSONL log

Wires GET /v1/denials?last=N on the sandbox-local policy advisor API to read recent OCSF JSONL events from /var/log/openshell-ocsf.YYYY-MM-DD.log, filter to network/L7 denials (action_id=2, class_uid 4001/4002), and return a compact summary newest-first. Default limit is 10, capped at 100. Ran inside spawn_blocking so file I/O does not block the policy.local handler.

Other cleanup:
- POST /v1/proposals now uses the typed grpc_client wrapper instead of raw_client, so accepted/rejected counts surface to the agent uniformly. Wrapper return type extended to the response struct.
- Drop the 'add_rule' snake_case alias in the proposal JSON; canonical form is camelCase 'addRule', matching the PolicyMergeOperation convention used elsewhere.
- skills/policy_advisor.md updated to match: documents the now-real /v1/denials?last=10 endpoint and uses 'addRule' consistently.
- skills.rs test asserts on the canonical 'addRule' phrase rather than the removed 'PolicyMergeOperation' substring.

* feat(cli): show L7 protocol/method/path in rule get output

format_endpoint() previously rendered only host:port, dropping protocol, access, and the L7 rules array. That made openshell rule get text output unable to distinguish a broad L4 grant from a method/path-scoped L7 REST rule -- exactly the distinction a developer needs at approval time.
New rendering tags each endpoint with its enforcement layer and surfaces allow/deny rules:

  bare L4:        api.example:443 [L4]
  L7 read-only:   api.example:443 [L7 rest, access=read-only]
  L7 method/path: api.example:443 [L7 rest, allow PUT /v1/foo/bar]

Pure display change: no proto, gateway, or behavior changes. Unit test covers all three rendering cases with synthetic fixtures.

* refactor(examples): rewrite policy demo as Codex-default loop

Re-shape examples/agent-driven-policy-management/ to be a single, clean end-to-end demonstration of the agent-driven policy loop. A Codex agent inside an OpenShell sandbox attempts a GitHub Contents API write, hits a structured 403 from the L7 proxy, reads the policy_advisor skill, drafts a narrow addRule proposal via http://policy.local/v1/proposals, the host auto-approves, the sandbox hot-reloads policy, and the agent's retry succeeds. Whole loop runs in roughly two minutes.

Demo cleanup:
- Drop .env file ceremony. Defaults resolve from gh: owner via 'gh api user --jq .login', repo defaults to 'openshell-policy-demo', token from gh auth token / GITHUB_TOKEN / GH_TOKEN. With gh auth login and codex login already done, 'bash demo.sh' Just Works.
- Codex-specific. Bootstraps ~/.codex/auth.json from credentials injected by the OpenShell provider, runs codex exec --sandbox danger-full-access (OpenShell is the actual security boundary; bwrap nesting cannot create user namespaces inside the sandbox container).
- Tighter narrative output: a single 'Preflight' step, a run summary banner before launch, an inline narration of what's happening inside the sandbox while we poll for the proposal (including the literal structured 403 body the agent acts on), and an OCSF trace at the end filtered to the three events that tell the story (DENY, RELOAD, ALLOW).
- Replace Python heredoc templating with sed; uploads use the single-flag pattern (--upload "${PAYLOAD_DIR}:/sandbox") with files referenced at the basename-prefixed path that NVIDIA#952 / NVIDIA#1028 established.
- README documents the trust model honestly: structured rule is the contract, agent rationale is a hint, prover validation badge in progress per RFC 0001.

Move the deterministic no-LLM regression harness out of examples/ into e2e/policy-advisor/ -- it was a parallel demo, not an example. Same loop without the LLM, useful for iterating on the proxy and policy.local API.

* style(sandbox,cli): apply rustfmt

Whitespace-only fixups caught by mise run pre-commit. No functional change.

* perf(examples): cap Codex reasoning at 'low' in policy demo

The demo task is mechanical (one HTTP request, parse a structured 403, post a JSON proposal, retry). Codex's default high-effort reasoning roughly doubles the demo's wall time without improving outcomes; running at 'low' lands the same minimal L7 grant in roughly half the time. Override with DEMO_CODEX_REASONING=medium (or higher) to compare runs.

* fix(sandbox): harden policy.local denials endpoint

Three changes addressing review feedback before merging the agent-driven policy management MVP:
- Distinguish "OCSF JSONL enabled, no denials" from "OCSF JSONL disabled, nothing to read." The endpoint now returns a `log_available` flag and an explanatory `note` when the log file is missing, so the in-sandbox agent can give the developer an accurate hint instead of a misleading empty list.
- Stop echoing the OCSF `message` field in the per-denial summary. The proxy's denial messages can include the request path with query string (e.g., `?access_token=...`); the structured `host`/`port`/`method`/`path`/`binary` fields carry everything the agent needs to draft a proposal, and `path` is sourced from `http_request.url.path` which already excludes the query string.
- Cap `read_request_body` at a 15s timeout.
Bounds slowloris-style stalls from a misbehaving in-sandbox process. The proxy listener only accepts loopback connections so practical impact is small, but this is cheap defense-in-depth.

New tests cover the missing-log signal and the message-redaction guarantee.

* fix(examples): redact tokens in agent log tail and validate DEMO_FILE_DIR

Two small hardening passes on the policy management demo:
- `fail()` now pipes the agent log tail through a redactor that masks the GitHub token and Codex credential triple before printing. Codex itself is well-behaved about not echoing the token, but a misbehaving tool call could leak it; this is a final safety net before the log hits the developer's terminal (and any clipboard or chat history that follows).
- `validate_env` now regex-checks DEMO_FILE_DIR with the same allow-list the other path-shaped variables use. The value is interpolated through sed with `|` as the delimiter when rendering the agent task; rejecting unsupported characters keeps the templating predictable and stops a user-supplied value from breaking out into a shell context.

* refactor(sandbox): centralize policy.local routes and skill path

Addresses review feedback that the deny body's `next_steps` array and the route table could drift apart. The route paths and skill location now live as `pub const`s in `policy_local.rs` and feed both:
- the dispatcher in `route_request` that matches against them
- a new `agent_next_steps()` helper that builds the JSON the L7 deny body embeds

`l7/rest.rs::deny_response_body` calls `policy_local::agent_next_steps()` instead of inlining the array, so adding or renaming a route is a one-line change in `policy_local.rs` and the agent contract follows automatically.

* feat(sandbox): switch /v1/denials to shorthand log pass-through

Previously /v1/denials parsed `/var/log/openshell-ocsf.*.log` (OCSF JSONL) and returned structured per-event objects.
JSONL is opt-in via `ocsf_json_enabled`, so the endpoint returned an empty list with a "log not enabled" hint by default -- agents had to navigate a setup step before the inspect-recent-denials guidance was useful.

Switch to reading the shorthand log at `/var/log/openshell.*.log`, which is always-on and the same human-readable format `openshell logs` displays. The endpoint now returns raw shorthand lines (newest first) -- the agent reads them directly, no field parsing.

Tradeoffs:
- Removes the JSONL-on-by-default debate: shorthand is already on, no defaults change.
- Updating shorthand is a single-file change in this repo; no schema rev needed when we want to add fields.

Implementation:
- `read_recent_denial_lines` walks shorthand log files newest-first, filters lines with ` OCSF ` AND ` DENIED ` (the OCSF action label, uppercase, space-bounded).
- `collect_shorthand_log_files` matches `openshell.<date>.log`; the trailing dot in `SHORTHAND_LOG_PREFIX = "openshell."` excludes `openshell-ocsf.<date>.log` so JSONL-on doesn't bleed into responses.
- 4096-byte cap per surfaced line as defense against pathological inputs.
- Skill doc updated to reflect that `/v1/denials` returns raw shorthand lines, not structured fields.

Defense-in-depth on query-string secrets:
- `redact_query_strings` strips `?<query>` to `?[redacted]` from each surfaced line. The L7 relay path emits OCSF events using `redacted_target` (secret-placeholder redaction), but the FORWARD deny path in `proxy.rs` populates `OcsfUrl::new("http", host, path, port)` and `.message(...)` with the raw request path -- query string included. Stripping queries at the consumer guards `/v1/denials` regardless of whether the upstream emit sites are tightened. The on-disk log is not rewritten by this change; that is a separate hardening task tracked for the FORWARD path emit sites in proxy.rs.
- `truncate_at_char_boundary` is UTF-8 safe; redaction runs before truncation so a cut cannot slice mid-secret.
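The filter, redaction, and truncation steps listed above can be sketched in Python as follows. This is not the Rust implementation: the helper names are borrowed from the commit message for readability, the regular expression is a guess at what counts as a query string, and the input is assumed to be a list of lines in oldest-first order.

```python
import re

MAX_LINE_BYTES = 4096  # per-line cap stated in the commit message

# "?" plus the query that follows, stopping at whitespace or "]" so that
# "[reason:...]" tags stay closed. The lookahead skips already-redacted text.
_QUERY = re.compile(r"\?(?!\[redacted\])[^\s\]]+")


def is_ocsf_denial_line(line: str) -> bool:
    """Keep only lines carrying both space-bounded markers."""
    return " OCSF " in line and " DENIED " in line


def redact_query_strings(line: str) -> str:
    return _QUERY.sub("?[redacted]", line)


def truncate_at_char_boundary(text: str, max_bytes: int) -> str:
    """Cut to at most max_bytes of UTF-8 without splitting a character."""
    data = text.encode("utf-8")
    if len(data) <= max_bytes:
        return text
    # errors="ignore" drops the incomplete trailing sequence, if any.
    return data[:max_bytes].decode("utf-8", errors="ignore")


def recent_denials(lines, last: int = 10) -> list:
    """Newest-first denial lines; redaction runs before truncation."""
    out = []
    for line in reversed(lines):
        if len(out) >= last:
            break
        if is_ocsf_denial_line(line):
            out.append(truncate_at_char_boundary(redact_query_strings(line), MAX_LINE_BYTES))
    return out
```

Running redaction first means the truncation step only ever sees `?[redacted]`, so a cut at the byte cap cannot expose part of a secret.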
Tests:
- `recent_denials_returns_newest_first_from_shorthand_lines` covers the happy path with mixed allowed/denied/non-OCSF lines.
- `recent_denials_skips_jsonl_log_files` confirms JSONL files don't surface even if present.
- `recent_denials_truncates_pathological_lines` covers the cap.
- `is_ocsf_denial_line_filters_correctly` covers the line-level filter.
- `redact_query_strings_removes_query_from_url_token` and `redact_query_strings_removes_query_in_reason_tag` cover the redaction in both URL token and `[reason:...]` contexts.
- `truncate_at_char_boundary_does_not_panic_on_multibyte` covers the UTF-8 safety.

* chore(sandbox): align proto inits with main's L7 GraphQL additions

Post-rebase fixups after NVIDIA#1083 (GraphQL L7 inspection) landed on main and introduced new fields on the proto types this branch constructs:
- `crates/openshell-sandbox/src/l7/relay.rs`: two `deny_with_redacted_target` call sites (REST and GraphQL relay deny paths) now pass the `DenyResponseContext` argument that `rest::send_deny_response` expects. Both sites pass `host`, `port`, and `binary` from the existing `L7EvalContext`, matching the pattern used at the primary deny site.
- `crates/openshell-sandbox/src/policy_local.rs`: `L7Allow`, `L7DenyRule`, and `NetworkEndpoint` proto initializers now populate the new GraphQL and path-scoping fields with empty defaults. Agent-authored proposals via `policy.local` target REST/SQL/L4 today; GraphQL operation matching is set on the gateway side or via direct YAML, so empty defaults are correct here.

No behavior change. `cargo test -p openshell-sandbox --lib` (650 tests) and `cargo clippy -p openshell-sandbox --lib --tests -- -D warnings` clean.

* feat(sandbox): gate agent policy proposals behind opt-in feature flag

The agent-driven policy proposal surface delivered by this PR (skill install, `policy.local` API, `next_steps` array on L7 deny bodies) is now opt-in via the new `agent_policy_proposals_enabled` setting. Default false.
Same shape as `providers_v2_enabled`: registered in `openshell-core::settings`, sandbox-level, hot-toggleable via the existing settings poll loop.

Why: the surface is a novel agent-controlled mutation point in every sandbox. The per-proposal developer approval gate is a correctness control, but it doesn't address "should this sandbox have an agent-authoring API at all" -- compliance teams may want that question closed. The flag is the second gate.

Implementation:
- New registry entry + `AGENT_POLICY_PROPOSALS_ENABLED_KEY` constant in `openshell-core::settings`.
- `lib.rs`: process-wide `OnceLock<Arc<AtomicBool>>` mirroring the `OCSF_CTX` pattern. `agent_proposals_enabled()` is the single read point.
- Initial settings fetch added to `run_sandbox` so skill install honors the flag at startup (not just on the poll loop's first tick).
- Skill install in `run_sandbox` is gated on the flag.
- `policy_local::route_request` returns `404 feature_disabled` for all routes when the flag is off -- including the otherwise-public `current_policy` and `denials` routes. When the surface is off it's off entirely.
- `policy_local::agent_next_steps` returns an empty array when the flag is off so deny bodies don't advertise routes that 404.
- Poll loop updates the atomic on each tick, lazily installs the skill on a false→true transition (no claw-back on true→false; stale skill on disk is harmless because route + next_steps gate on the live atom).

Tests:
- Shared `test_helpers::ProposalsFlagGuard` mutex+atomic guard for the process-wide flag, used across `policy_local::tests` and `l7::rest::tests`.
- New: `agent_next_steps_returns_empty_when_flag_off`, `agent_next_steps_returns_full_array_when_flag_on`, `route_request_returns_feature_disabled_when_flag_off`.
- Updated existing tests that exercise the deny body or the route dispatcher to set the flag on first.
- Full sandbox lib test suite: 653 pass, clippy clean.
Demo and e2e:
- `examples/agent-driven-policy-management/demo.sh` and `e2e/policy-advisor/test.sh` now snapshot the prior global value of the setting, set it to true before sandbox creation (so the supervisor's initial poll picks it up), and restore on exit (delete if previously unset, otherwise write the prior value back).

Docs:
- RFC 0001 MVP-implementation note documents the flag, default, and intended soft-launch posture.

* test(policy-advisor): require proposal opt-in for e2e
* refactor(sandbox): group policy poll loop state
* test(e2e): isolate Kubernetes user namespace test

---------

Co-authored-by: John Myers <9696606+johntmyers@users.noreply.github.com>
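The opt-in gate described in this commit can be modeled with a small Python toy, assuming a hypothetical `AgentProposals` class in place of the process-wide atomic. Only the two route paths, the `feature_disabled` response, the empty `next_steps` array, and the "install on false→true, no claw-back" behavior are taken from the commit message; the response shapes are simplified.

```python
# Route paths named in the commit message; the real route table has more entries.
ROUTES = ("/v1/proposals", "/v1/denials")


class AgentProposals:
    """Toy model of the feature-flag gate (hypothetical, not the Rust code)."""

    def __init__(self, enabled: bool = False) -> None:
        self.enabled = enabled
        self.skill_installed = False
        if enabled:  # initial settings fetch: honor the flag at startup
            self._install_skill()

    def _install_skill(self) -> None:
        self.skill_installed = True

    def on_poll(self, enabled: bool) -> None:
        """Settings poll tick: install lazily on false->true, never uninstall."""
        if enabled and not self.enabled and not self.skill_installed:
            self._install_skill()
        self.enabled = enabled

    def next_steps(self) -> list:
        """Deny bodies advertise routes only while the flag is on."""
        return list(ROUTES) if self.enabled else []

    def route(self, path: str) -> tuple:
        """Every route answers 404 feature_disabled while the flag is off."""
        if not self.enabled:
            return (404, "feature_disabled")
        if path in ROUTES:
            return (200, "ok")
        return (404, "not_found")
```

Because both `route` and `next_steps` read the live flag, a skill file left on disk after a true→false toggle has nothing to call.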

…DIA#1287) The Podman and Kubernetes compute drivers require OPENSHELL_SSH_HANDSHAKE_SECRET to be set. This was introduced in 2e0afea ("feat(vm): derive guest rootfs from sandbox images (NVIDIA#957)"), which exempted only the Docker and VM drivers from the check. The Getting Started instructions in CONTRIBUTING.md didn't mention the variable, so developers using Podman (the default on systems where it is installed) hit an opaque configuration error on first run. Add the export as a separate setup step with a comment explaining which drivers require it. Signed-off-by: Russell Bryant <russell.bryant@gmail.com>

The Podman path in the dev gateway script references the build:docker:supervisor-sideload mise task, which was removed in d8b8477 ("feat(rpm): add RPM packaging with Packit/COPR and GHA release publishing (NVIDIA#1126)"). That commit consolidated the sideload and standalone supervisor image builds into a single build:docker:supervisor task. Update the reference so `mise run gateway` works with the Podman compute driver. Signed-off-by: Russell Bryant <russell.bryant@gmail.com>

Signed-off-by: Drew Newberry <anewberry@nvidia.com>

…VIDIA#1298) Disables automountServiceAccountToken in sandbox pods for security hardening. Sandbox pods should not have access to the Kubernetes API by default. Adds test case to verify the pod spec includes the disabled setting. Signed-off-by: Derek Carr <decarr@redhat.com>

Colima, Lima, Rancher Desktop, and OrbStack all run dockerd inside a host VM. Their bridge gateway IP is reachable from inside containers but not from the OpenShell server process running on the host, the same constraint Docker Desktop has — yet the existing is_docker_desktop check rejects them, leaving callbacks routed at a bridge IP nothing on the host can listen on. Detect these runtimes by daemon Name (Lima sets the VM hostname to colima*, lima-*, rancher-desktop, orbstack) and supplemental labels (dev.rancherdesktop.*, dev.orbstack.*), and route them through host-gateway like Docker Desktop. Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
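A minimal Python sketch of the detection rule above, assuming the daemon `Name` and `Labels` values have already been read from the Docker daemon info. The function name, the case-insensitive comparison, and the treatment of `rancher-desktop` and `orbstack` as exact names are assumptions; the prefixes and label namespaces are the ones listed in the commit message.

```python
from typing import Iterable

NAME_PREFIXES = ("colima", "lima-")
NAME_EXACT = ("rancher-desktop", "orbstack")
LABEL_PREFIXES = ("dev.rancherdesktop.", "dev.orbstack.")


def is_vm_hosted_docker(daemon_name: str, labels: Iterable[str]) -> bool:
    """True when dockerd appears to run inside a host VM.

    Such runtimes should be routed through host-gateway, as Docker Desktop is.
    """
    name = daemon_name.lower()
    if name.startswith(NAME_PREFIXES) or name in NAME_EXACT:
        return True
    return any(label.startswith(LABEL_PREFIXES) for label in labels)
```

A plain Linux dockerd, for example `is_vm_hosted_docker("build-host", [])`, returns `False` and keeps using the bridge gateway IP.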

* feat(gpu): honor device IDs in Docker and Podman Signed-off-by: Evan Lezar <elezar@nvidia.com> * test(gpu): add Docker and Podman device selection e2e Signed-off-by: Evan Lezar <elezar@nvidia.com> * ci(gpu): run Docker GPU e2e workflow Signed-off-by: Evan Lezar <elezar@nvidia.com> --------- Signed-off-by: Evan Lezar <elezar@nvidia.com>

…1300)

* feat(k8s): support ImageVolumeSource for supervisor sideload

Add a config-driven switch between two supervisor binary delivery methods: image-volume (default) mounts the supervisor OCI image directly as a read-only volume (K8s >= v1.33), while init-container preserves the existing emptyDir + copy-self pattern for older clusters. Configurable via --supervisor-sideload-method CLI arg, OPENSHELL_SUPERVISOR_SIDELOAD_METHOD env var, or Helm supervisor.sideloadMethod value.

* feat(helm): auto-detect supervisor sideload method from cluster version

Use Helm .Capabilities.KubeVersion to choose the supervisor sideload method automatically: image-volume on K8s >= v1.33, init-container on older clusters. An explicit supervisor.sideloadMethod value overrides auto-detection.

* fix(helm): raise auto-detect threshold to K8s v1.35

The ImageVolume feature gate is only enabled by default starting in K8s v1.35, not v1.33 (where it is beta but off by default). Clusters on v1.33-v1.34 can still opt in with an explicit sideloadMethod value.
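The final auto-detection rule (after the threshold fix) can be expressed as a short Python function. The actual selection happens in the Helm chart via `.Capabilities.KubeVersion`; the function name and the validation of the explicit value here are illustrative assumptions.

```python
from typing import Optional

METHODS = ("image-volume", "init-container")
AUTO_THRESHOLD = (1, 35)  # ImageVolume is on by default from K8s v1.35


def sideload_method(kube_major: int, kube_minor: int,
                    explicit: Optional[str] = None) -> str:
    """Pick the supervisor sideload method; an explicit value always wins."""
    if explicit:
        if explicit not in METHODS:
            raise ValueError(f"unknown sideload method: {explicit}")
        return explicit
    if (kube_major, kube_minor) >= AUTO_THRESHOLD:
        return "image-volume"
    return "init-container"
```

A v1.34 cluster auto-detects `init-container`, but `sideload_method(1, 34, "image-volume")` still lets an operator opt in where the feature gate has been enabled manually.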

Signed-off-by: Taylor Mutch <taylormutch@gmail.com>

* docs(rfc): add sandbox resource requirements proposal Signed-off-by: Evan Lezar <elezar@nvidia.com> * docs(rfc): finalize sandbox resource requirements --------- Signed-off-by: Evan Lezar <elezar@nvidia.com>

Signed-off-by: Taylor Mutch <taylormutch@gmail.com>

* fix(cli): add json output for policy get * test(cli): cover policy get full json output * fix(cli): address policy get json clippy --------- Co-authored-by: John Myers <9696606+johntmyers@users.noreply.github.com>

* feat(providers): derive discovery from profiles Signed-off-by: John Myers <9696606+johntmyers@users.noreply.github.com> * fix(providers): keep v2 discovery profile-only Signed-off-by: John Myers <9696606+johntmyers@users.noreply.github.com> * docs(providers): update providers v2 behavior Signed-off-by: John Myers <9696606+johntmyers@users.noreply.github.com> * fix(providers): make github profile read-only Signed-off-by: John Myers <9696606+johntmyers@users.noreply.github.com> --------- Signed-off-by: John Myers <9696606+johntmyers@users.noreply.github.com>

* fix(homebrew): repair local driver bootstrap state * fix(bootstrap): satisfy default SAN doc lint

Thread the gateway_insecure flag through gateway_add(), gateway_login(), and all OIDC HTTP clients so that --gateway-insecure and OPENSHELL_GATEWAY_INSECURE apply to OIDC discovery, token exchange, and token refresh requests. Previously, the flag only affected gRPC connections to the gateway. OIDC HTTP clients (reqwest::get and http_client) always verified TLS certificates, causing gateway registration and login to fail when the OIDC issuer used a self-signed certificate (common on OpenShift with edge-terminated routes). Fixes NVIDIA#1534 Signed-off-by: Adel Zaalouk <azaalouk@redhat.com>

Signed-off-by: Piotr Mlocek <pmlocek@nvidia.com>

Bumps [docker/login-action](https://github.com/docker/login-action) from 4.1.0 to 4.2.0. - [Release notes](https://github.com/docker/login-action/releases) - [Commits](docker/login-action@4907a6d...650006c) --- updated-dependencies: - dependency-name: docker/login-action dependency-version: 4.2.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

…IA#1545) * chore(helm): add missing SPDX header to gateway-config template * chore(scripts): remove helm templates from license header exclusions The bypass had no known rationale. Removing it ensures the header script covers deploy/helm/openshell/templates uniformly going forward. Signed-off-by: mesutoezdil <mesudozdil@gmail.com> --------- Signed-off-by: mesutoezdil <mesudozdil@gmail.com>

Signed-off-by: Piotr Mlocek <pmlocek@nvidia.com>

…1544) * ci: pin azure/setup-helm and helm/kind-action to commit SHAs * chore(python): add py.typed marker for PEP 561 compliance * ci: use full semver in pinned action version comments Signed-off-by: mesutoezdil <mesudozdil@gmail.com> --------- Signed-off-by: mesutoezdil <mesudozdil@gmail.com>

…tes (NVIDIA#1526)

Extract repeated patterns into shared helpers:
- Add impl_builder_setters! macro to openshell-ocsf/builders that generates the identical severity(), status(), and message() setter methods present on all 7 OCSF event builders
- Add SandboxContext::apply_common_fields() to consolidate the four-line build() finalization (set_status, set_message, set_device, set_container) repeated in every builder
- Add driver_utils::sandbox_token_path() to centralize the XDG state path construction for sandbox JWT files used by both the Docker and Podman drivers
- Add driver_utils::build_capabilities_response() to eliminate the identical GetCapabilitiesResponse struct literal repeated across the Docker, Podman, and Kubernetes compute drivers

…ror (NVIDIA#1547)

* fix(python): raise SandboxError instead of FileNotFoundError or KeyError
* fix(python): suppress exception chaining in SandboxError raises

Add `from None` to both `raise SandboxError(...)` calls inside `except FileNotFoundError` blocks to satisfy ruff B904.
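The pattern this fix describes looks roughly like the sketch below. The `SandboxError` class here is a stand-in and `read_sandbox_file` is a hypothetical call site; the SDK's real class and the two affected functions are not shown on this page.

```python
class SandboxError(Exception):
    """Stand-in for the SDK's error type (assumed to subclass Exception)."""


def read_sandbox_file(path: str) -> str:
    """Hypothetical helper: surface a missing file as SandboxError."""
    try:
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    except FileNotFoundError:
        # "from None" suppresses the implicit chain, so callers see one
        # SandboxError rather than "During handling of the above exception...".
        raise SandboxError(f"sandbox file not found: {path}") from None
```

Ruff's B904 rule flags a bare `raise` of a new exception inside an `except` block; stating `from None` (or `from err`) makes the chaining decision explicit.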

…elm-k3s-local (NVIDIA#1539) macOS ships bash 3.2 which lacks mapfile/readarray. Replace all three occurrences in configure_ghcr_credentials, cluster_has_image, and cluster_image_platform with a portable while-read loop, consistent with the fix applied to docker-build-image.sh in NVIDIA#1334.

Signed-off-by: Ann Marie Fred <afred@redhat.com>

This makes it so you can run the dev gateway and sandbox with:

```
mise run gateway
# in another shell
mise run sandbox
```

Signed-off-by: Kris Hicks <khicks@nvidia.com>

… L4/L7 split (NVIDIA#1412)

* fix(sandbox): add mechanistic smoke test for L4 deny and document the L4/L7 split

The old smoke script exercised an L7 PUT which hung because the denial aggregator is only wired to L4 CONNECT denies, not L7 enforcement. Add mechanistic-smoke.sh which triggers an L4 deny, waits for the aggregator to flush, and asserts a pending chunk appears under openshell rule get --status pending. Document the intentional L4-only scope of the mechanistic mapper in architecture/sandbox.md.

Fixes NVIDIA#1333

Signed-off-by: mesutoezdil <mesudozdil@gmail.com>

* refactor(smoke): remove redundant variable inits and merge double step call

Signed-off-by: mesutoezdil <mesudozdil@gmail.com>

* fix(smoke): wire mechanistic smoke into mise and guard TMP_DIR

- Initialize TMP_DIR before trap to prevent unbound variable on early exit
- Add e2e:mechanistic-smoke mise task with gateway setup
- Document mechanistic smoke in policy-advisor README

* test(proxy): verify L4 deny enqueues a DenialEvent

Signed-off-by: mesutoezdil <mesudozdil@gmail.com>

* fix(proxy): remove unnecessary path qualifications in L4 denial smoke test

---------

Signed-off-by: mesutoezdil <mesudozdil@gmail.com>

Signed-off-by: Kris Hicks <khicks@nvidia.com>

NVIDIA#1585) On kernels without Landlock (e.g. gVisor's sentry returns ENOSYS for syscall 444), the previous best_effort path still logged "Applying Landlock" + "Landlock ruleset built" events even though no enforcement was happening. Probe at the top of `landlock::prepare` and short-circuit with a single High-severity "Sandbox Unavailable" finding. Signed-off-by: Davanum Srinivas <dsrinivas@nvidia.com>

Signed-off-by: Evan Lezar <elezar@nvidia.com>

Ladas · 2026-05-29T08:26:22Z

Closing per team discussion — holding fork sync until v0.6.0 release is stable. The correct process is: (1) Sync Fork on GitHub to update main, (2) rebase mvp onto main, (3) evaluate which kagenti patches are still needed. See docs/research/openshell-fork-analysis.md for the full analysis.

drew and others added 30 commits May 7, 2026 16:05

fix(installer): repair dev install package and service setup (NVIDIA#…

084c93b

…1252)

fix(docker): use supervisor image entrypoint path (NVIDIA#1259)

62619ee

Signed-off-by: Drew Newberry <anewberry@nvidia.com>

fix(vm): harden compute driver socket (NVIDIA#1248)

8ab5ee8

ci(release): run package release canaries (NVIDIA#1256)

52097f2

feat(install): add rpm dev installer support (NVIDIA#1262)

645b880

Signed-off-by: Drew Newberry <anewberry@nvidia.com>

feat(server): add generate-certs subcommand; replace alpine PKI hook (N…

a4efc0b

…VIDIA#1257)

fix(docs): constrain landing terminal height (NVIDIA#1269)

b74d24b

Signed-off-by: Drew Newberry <anewberry@nvidia.com>

ci(os-132): remove stale remote buildx mode (NVIDIA#1267)

3cfc915

Signed-off-by: Jonas Toelke <jtoelke@nvidia.com>

ci(os-132): remove obsolete shadow workflows (NVIDIA#1273)

31f0345

Signed-off-by: Jonas Toelke <jtoelke@nvidia.com>

fix(packaging): enable mTLS for local packages (NVIDIA#1271)

daa2a36

fix(installer): stop forcing Homebrew VM driver (NVIDIA#1277)

eec949d

fix(e2e): isolate kubernetes user namespace test (NVIDIA#1276)

b8e8743

fix(install): register local gateway before probing listener (NVIDIA#…

7ad823e

…1280) Signed-off-by: Drew Newberry <anewberry@nvidia.com>

fix(helm): derive sandboxNamespace from Release.Namespace instead of …

4041798

…hardcoding (NVIDIA#1282) Signed-off-by: Saurabh Agarwal <sauagarwa@redhat.com>

chore(installer): promote package install script (NVIDIA#1261)

529be37

fix(installer): guard incompatible v0.0.37 upgrades (NVIDIA#1294)

072f227

fix(docker): add SELinux labeling to bind mounts (NVIDIA#1291)

8d83776

docs(readme): add roadmap and RFC issue guidance (NVIDIA#1284)

4350482

Signed-off-by: Drew Newberry <anewberry@nvidia.com>

docs(rfc): move policy management RFC to 0002 (NVIDIA#1283)

ca63841

(feat) early snap support (NVIDIA#1238)

dfd4768

TaylorMutch and others added 28 commits May 22, 2026 06:02

fix(docker): use host-gateway callbacks on macOS (NVIDIA#1516)

68d4280

ci(e2e): load single-arch images into kind (NVIDIA#1518)

57b71c6

Signed-off-by: Taylor Mutch <taylormutch@gmail.com>

docs(rfc): add sandbox resource requirements proposal (NVIDIA#1360)

18988bd

* docs(rfc): add sandbox resource requirements proposal Signed-off-by: Evan Lezar <elezar@nvidia.com> * docs(rfc): finalize sandbox resource requirements --------- Signed-off-by: Evan Lezar <elezar@nvidia.com>

ci(canary): keep helm jwt secret generation enabled (NVIDIA#1521)

48333e5

Signed-off-by: Taylor Mutch <taylormutch@gmail.com>

fix(cli): add json output for policy get (NVIDIA#1410)

686b24d

* fix(cli): add json output for policy get * test(cli): cover policy get full json output * fix(cli): address policy get json clippy --------- Co-authored-by: John Myers <9696606+johntmyers@users.noreply.github.com>

docs: update NemoClaw/OpenClaw references (NVIDIA#1529)

603b3e2

ci: seed shared Rust caches from main (NVIDIA#1530)

521eccd

fix(release): build host Linux binaries with glibc floor (NVIDIA#1490)

0dc08a1

fix(homebrew): repair local driver bootstrap state (NVIDIA#1527)

7d38aa8

* fix(homebrew): repair local driver bootstrap state * fix(bootstrap): satisfy default SAN doc lint

ci: install cargo-zigbuild from release binaries (NVIDIA#1533)

fbd580b

ci(release): smoke test rpm artifacts on fedora (NVIDIA#1558)

c8d405c

Signed-off-by: Piotr Mlocek <pmlocek@nvidia.com>

ci(release): skip python rpm in gateway smoke test (NVIDIA#1559)

286ce7c

Signed-off-by: Piotr Mlocek <pmlocek@nvidia.com>

docs: add macOS compiler troubleshooting (NVIDIA#1569)

3460e5f

Signed-off-by: Ann Marie Fred <afred@redhat.com>

fix(gateway): configure local dev auth (NVIDIA#1575)

fa84e43

This makes it so you can run the dev gateway and sandbox with:

```
mise run gateway
# in another shell
mise run sandbox
```

Signed-off-by: Kris Hicks <khicks@nvidia.com>

docs: add Pi as supported sandbox (NVIDIA#1572)

9e5aee4

docs(readme): whitespace (NVIDIA#1578)

47d208c

Signed-off-by: Kris Hicks <khicks@nvidia.com>

fix(cli): replace outdated name reference (NVIDIA#1582)

2e03faf

Signed-off-by: Kris Hicks <khicks@nvidia.com>

fix(sandbox): decouple GPU baseline from network policy (NVIDIA#1524)

c9056bb

Signed-off-by: Evan Lezar <elezar@nvidia.com>

Ladas closed this May 29, 2026

Ladas wants to merge 157 commits into main from feat/sync-upstream-v0.0.49

20 participants
