feat: Kubernetes support on AppArmor-enabled host nodes

## Problem Statement

On Canonical Kubernetes clusters running on Ubuntu hosts (and possibly other Kubernetes distributions on other AppArmor-enabled hosts), OpenShell sandbox pods can receive `CAP_SYS_ADMIN` and `CAP_NET_ADMIN` and still fail during supervisor startup, because the runtime/default AppArmor profile blocks the mount operations used by `ip netns add`.

Observed on the local Canonical Kubernetes cluster:

- Kubernetes: `v1.32.13`
- OS image: `Ubuntu 24.04.4 LTS`
- Kernel: `6.17.0-29-generic`
- Runtime: `containerd://1.6.39`

A normal sandbox without a localhost AppArmor profile reached `CrashLoopBackOff` with:

```shell
Network namespace creation failed and proxy mode requires isolation.
Ensure CAP_NET_ADMIN and CAP_SYS_ADMIN are available and iproute2 is installed.
Error: /usr/sbin/ip netns add sandbox-66ed3353 failed: mount --make-shared /run/netns failed: Permission denied
```

A minimal pod with the same relevant capabilities and no localhost AppArmor profile reproduced the same kernel denial:

```shell
+ mkdir -p /run/netns
+ ip netns add aa-default
mount --make-shared /run/netns failed: Permission denied
```

A straightforward fix proves the basic direction: load a localhost AppArmor profile on each node and apply that profile to sandbox pods. That approach is not safe enough as-is for all Kubernetes users, because it requires both `Localhost/openshell-supervisor` and a privileged loader unconditionally. Non-AppArmor, SELinux-first, or restricted clusters could fail even though they could otherwise run OpenShell without this AppArmor-specific workaround.


## Proposed Design

Add conditional AppArmor support to the Kubernetes compute driver and Helm chart. The design splits into two pieces:

1. The Kubernetes driver decides whether sandbox pods should request a localhost AppArmor profile.
2. A node-local loader DaemonSet installs that profile onto nodes and advertises readiness through a node label.

### Runtime behavior

When AppArmor is effectively enabled for a sandbox, the driver should inject the following:

```yaml
securityContext:
  appArmorProfile:
    type: Localhost
    localhostProfile: openshell-supervisor
nodeSelector:
  openshell.ai/apparmor-supervisor: loaded
```

The node selector matters because Kubernetes requires `Localhost` profiles to already be loaded on the node where the pod lands. Kubernetes docs also note that the scheduler is not aware of loaded AppArmor profiles and recommend labeling nodes for profile availability.

This behavior should be controlled by configuration in the Kubernetes driver:

- `auto` (default): use the OpenShell AppArmor profile only when at least one schedulable node is known to have successfully loaded it; otherwise create the existing non-AppArmor sandbox pod spec.
- `required`: require a ready AppArmor node and fail sandbox creation with a clear precondition error if none exists.
- `disabled`: never request a localhost AppArmor profile.

This split gives the desired behavior across cluster types: clusters that need it can pick up AppArmor automatically, while clusters without working AppArmor support continue to use the current non-AppArmor pod spec unless the operator explicitly asks for fail-closed behavior.
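The three driver modes reduce to a small decision table. The shell sketch below is illustrative only: `resolve_apparmor` and its two arguments are hypothetical names, and it assumes the caller has already counted the schedulable nodes carrying the `openshell.ai/apparmor-supervisor=loaded` label. The real driver would make this decision in its own code.

```shell
#!/bin/sh
# Hypothetical helper: decide which pod spec the driver should emit.
#   $1 = driver AppArmor mode (auto | required | disabled)
#   $2 = count of schedulable nodes labeled openshell.ai/apparmor-supervisor=loaded
# Prints "apparmor" (inject the Localhost profile and node selector) or
# "plain" (existing non-AppArmor pod spec). Returns non-zero on a failed
# precondition or an unknown mode.
resolve_apparmor() {
  mode=$1
  ready_nodes=$2
  case "$mode" in
    disabled)
      echo plain ;;
    auto)
      if [ "$ready_nodes" -gt 0 ]; then echo apparmor; else echo plain; fi ;;
    required)
      if [ "$ready_nodes" -gt 0 ]; then
        echo apparmor
      else
        echo "precondition failed: no node has the AppArmor profile loaded" >&2
        return 1
      fi ;;
    *)
      echo "unknown AppArmor mode: $mode" >&2
      return 2 ;;
  esac
}
```

Only `required` can fail sandbox creation; `auto` and `disabled` always produce a pod spec.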

### Loader behavior

Add an AppArmor loader DaemonSet. It should use a dedicated shell-capable image rather than the distroless gateway image, and it should run under its own ServiceAccount and node-labeling RBAC. The loader DaemonSet should:

1. Run only where cluster policy allows privileged host access.
2. Load `/etc/apparmor.d/openshell-supervisor` on the host.
3. Verify `/sys/kernel/security/apparmor/profiles` contains `openshell-supervisor` after `apparmor_parser` succeeds.
4. Label the node `openshell.ai/apparmor-supervisor=loaded` only after both parser success and profile visibility verification.
5. Remove or clear that label on unsupported/failure paths.
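Steps 3 to 5 could be sketched in shell roughly as follows. This is a sketch under stated assumptions, not the loader itself: `profile_visible` and `label_action` are hypothetical helpers, and the sketch assumes each line of the profiles listing has the form `<name> (<mode>)`.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the named profile appears in an
# AppArmor profiles listing (normally /sys/kernel/security/apparmor/profiles).
# Assumes each line looks like "<name> (<mode>)".
profile_visible() {
  profiles_file=$1
  profile_name=$2
  [ -r "$profiles_file" ] || return 1
  grep -q "^${profile_name} (" "$profiles_file"
}

# Hypothetical helper: map the two loader checks to a label action.
#   $1 = exit status of apparmor_parser
#   $2 = path to the profiles listing
#   $3 = profile name
label_action() {
  if [ "$1" -eq 0 ] && profile_visible "$2" "$3"; then
    echo "label openshell.ai/apparmor-supervisor=loaded"
  else
    echo "unlabel openshell.ai/apparmor-supervisor"
  fi
}
```

The loader would then turn the printed action into a node patch, for example with `kubectl label node`. Keeping the decision separate from the patch means any failure path, including an unreadable profiles file, defaults to clearing the label.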

The loader DaemonSet has its own mode:

- `auto` (default): initially, this behaves the same as `required`. A future implementation could use Node Feature Discovery (NFD) to detect which nodes need the profile installed.
- `required`: try to install the profile on all nodes, regardless of NFD deployment.
- `disabled`: do not render the loader and do not attempt to install the profile.

The loader mode controls whether OpenShell tries to install the profile. The driver mode controls whether sandbox pods request the profile. The "just works" defaults should be `loader.mode=auto` and `driver.appArmorMode=auto`: clusters with AppArmor-enabled nodes get the profile when loading succeeds, while clusters where the loader cannot run fall back to the existing non-AppArmor pod spec. Setting `loader.mode=disabled` and `driver.appArmorMode=required` is guaranteed to fail. 

## Minimum AppArmor profile found during testing

Starting from a broad permissive profile for the supervisor, the profile can be stripped down for the current default sandbox path.

The following profile was verified end-to-end with a temporary gateway build that injected `appArmorProfile.type=Localhost` and `localhostProfile=openshell-research` into sandbox pods. `openshell sandbox create --name repro-aa-caps-setid --from base --no-auto-providers --no-tty -- /bin/sh -lc 'echo connected'` succeeded and printed `connected`.

```apparmor
#include <tunables/global>

profile openshell-supervisor flags=(attach_disconnected,mediate_deleted) {
  #include <abstractions/base>

  network,
  file,

  capability sys_admin,
  capability net_admin,
  capability sys_ptrace,
  capability syslog,
  capability setuid,
  capability setgid,

  mount options=(rw, bind) -> /run/netns/**,
  mount options=(rw, rbind) /run/netns/ -> /run/netns/,
  mount options=(rw, rbind) /run/netns -> /run/netns,
  mount options=(rw, rshared) -> /run/netns,
  mount options=(rw, rshared) -> /run/netns/,
  umount /run/netns/**,
}
```

Rules from the broader starting profile that were not required for the verified default path:

- `capability dac_override`
- `capability dac_read_search`
- `capability chown`
- `deny mount -> /proc/**`
- `deny mount -> /sys/**`
- `deny mount -> /dev/**`
- `mount options=(rw, rslave) -> /`
- `umount /sys/`
- `mount fstype=sysfs -> /sys/`
- `signal (send, receive) peer=@{profile_name}`
- `ptrace (trace, tracedby) peer=@{profile_name}`

Important caveats:

- `network,` is required for the actual sandbox, not just for the `ip netns` reproducer. Without it, the supervisor could create the namespace but failed to connect back to the gateway.
- `setuid` and `setgid` are required even when they are not explicitly in the Kubernetes `capabilities.add` list, because they are part of the default Linux capability set unless the pod drops them. Without them, the sandbox failed after namespace setup with `Invalid argument (os error 22)`.
- `dac_read_search` may be needed when Kubernetes user namespaces are enabled because the driver intentionally adds `DAC_READ_SEARCH` for cross-UID `/proc/<pid>/fd` inspection in that mode. The final implementation should either include that capability in the profile unconditionally or render the profile according to the configured sandbox capability set.
- This narrowing investigation only explored creating the sandbox. It did not explore running workloads in it, file sharing, GPU passthrough, or other hardware passthrough.


## Alternatives Considered

### Always apply `Localhost/openshell-supervisor`

Rejected. It fixes Ubuntu nodes with the profile loaded, but it breaks nodes where the profile is not loaded. Kubernetes rejects pods that request a missing localhost profile.

### Disable AppArmor support by default and document an opt-in

Rejected as the default because it keeps clusters with AppArmor-enabled nodes broken until the operator discovers the workaround. It remains useful as an explicit `disabled` mode for restricted clusters or operators who do not want any privileged profile loader.

### Node Feature Discovery integration

In theory, Node Feature Discovery (NFD) could be used to avoid running the privileged loader container on nodes where it is detectably unnecessary.

However, NFD by itself would only detect whether the node has AppArmor enabled, not whether or not our specific profile has been loaded.

Additionally, Node Feature Discovery is only compelling here if it reduces where the privileged loader pod is scheduled. Runtime NFD checks inside a loader pod do not materially reduce the security surface: the pod has already been scheduled with host access, and the loader's own local checks can already avoid mutating the host when AppArmor is inactive or unavailable. If using NFD requires adding CRD/discovery/`NodeFeature` read permissions to the loader ServiceAccount, it actually increases the RBAC surface of the loader pod.

Therefore, any NFD integration should meet these constraints:

1. NFD must not be required for correctness. The loader must still work without NFD by probing each node where it runs. NFD is primarily useful for clusters that already run it as trusted node inventory.
2. NFD should be consumed through node labels only. NFD should not expand the loader pod's runtime RBAC beyond `nodes get,patch`, which is the RBAC profile it already has.
3. NFD should be used for scheduler-level loader placement, not just runtime skip logic.

Thus, the flow would look something like:

1. If NFD CRDs are absent, Helm renders the loader DaemonSet without the `openshell.ai/apparmor-configured=true` node selector. The loader schedules broadly and self-detects AppArmor on each node.
2. If NFD CRDs are present, Helm renders the OpenShell `NodeFeatureRule` and renders the loader DaemonSet with a node selector or node affinity for `openshell.ai/apparmor-configured=true`.
3. If NFD later labels one or more nodes `openshell.ai/apparmor-configured=true`, the loader schedules on those nodes and performs the normal AppArmor self-check/load/verify path.
4. If NFD never labels any node `true` because AppArmor is unavailable, the loader remains unscheduled. In `auto` driver mode, sandbox creation falls back to the non-AppArmor pod spec because no node receives `openshell.ai/apparmor-supervisor=loaded`. In `required` driver mode, sandbox creation fails with a clear diagnostic.

#### Limitation: breaks `helm install/upgrade --wait`

In the above flow, if NFD is enabled but no nodes support AppArmor, the loader DaemonSet will remain Pending forever. This will cause `helm upgrade/install --wait` to fail. This limitation by itself is likely enough to block the use of NFD integration today. Potential future Helm development may enable this behavior: https://github.com/helm/helm/issues/12800

#### Limitation: NFD AppArmor detection

Today, as of `v0.18.3`, NFD does not provide any built-in AppArmor detection. That is not necessarily a blocker, because it does allow custom labels through `NodeFeatureRule`s. Those rules must work with the data NFD already exposes, and for our purposes they only need to avoid false negatives: the loader performs the final AppArmor detection, so a rule can safely include some extra nodes as long as it does not exclude nodes that really support AppArmor.

The most relevant built-in signals exposed today are kernel config values:

- `kernel.config.SECURITY_APPARMOR=y`
- `kernel.config.DEFAULT_SECURITY_APPARMOR=y`
- `kernel.config.LSM=landlock,lockdown,yama,integrity,apparmor`

These values are useful, but they still describe kernel configuration rather than the active boot state. The LSMs active for the current boot are exposed by `/sys/kernel/security/lsm`, and boot parameters can override the configured default LSM list. Still, nodes with `kernel.config.SECURITY_APPARMOR=n` categorically cannot enable AppArmor, so limiting the loader to nodes with `kernel.config.SECURITY_APPARMOR=y` is a safe coarse filter.
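The difference between configured and active state is what the loader's own self-check would have to cover. A minimal sketch of such a check is shown below; `lsm_active` is a hypothetical helper, and the sketch assumes the LSM file holds a single comma-separated list, as `/sys/kernel/security/lsm` does.

```shell
#!/bin/sh
# Hypothetical helper: succeed if an LSM name appears in a comma-separated
# list file such as /sys/kernel/security/lsm.
#   $1 = path to the LSM list file
#   $2 = LSM name to look for (for example "apparmor")
lsm_active() {
  lsm_file=$1
  lsm_name=$2
  [ -r "$lsm_file" ] || return 1
  # Command substitution strips any trailing newline; wrapping the list in
  # commas lets the pattern match whole entries only.
  case ",$(cat "$lsm_file")," in
    *",${lsm_name},"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A loader could call `lsm_active /sys/kernel/security/lsm apparmor` before attempting to load the profile, and skip the node without mutating it when the check fails.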

NFD also provides a SELinux-enabled feature. Today, SELinux and AppArmor are mutually exclusive, but future LSM stacking work means we should avoid making SELinux-specific assumptions central to correctness. The existence of the SELinux feature does, however, suggest that upstream may be willing to accept an AppArmor detection feature.

There is a more accurate alternative: NFD's `local` feature source can consume labels written by an external detector under `/etc/kubernetes/node-feature-discovery/features.d/`. An OpenShell detector could read `/sys/kernel/security/lsm` and `/sys/kernel/security/apparmor/profiles` and write `openshell.ai/apparmor-active=true`, which would be more accurate than kernel config matching. In practice, though, that means introducing another node-side detector with host access, which defeats much of the purpose of using NFD to reduce the privileged footprint of the AppArmor loader.

## Agent Investigation

- Loaded the `openshell-cli` skill for CLI workflows.
- Reproduced the current failure on the live Canonical Kubernetes cluster with a normal OpenShell sandbox and with a minimal `ip netns` pod.
- Wrote a permissive profile.
- Temporarily built a gateway image that injected `Localhost/openshell-research` into sandbox pods, imported it into the cluster containerd, and verified that the permissive profile makes sandbox creation work.
- Iteratively removed AppArmor profile rules and retested sandbox creation to identify the smaller profile above.
- Installed NFD `v0.18.3`, confirmed the default labels do not expose AppArmor directly, confirmed raw `NodeFeature` data contains AppArmor kernel config values, and verified a `NodeFeatureRule` can create an AppArmor-configuration label.

## Checklist

- [x] I've reviewed existing issues and the architecture docs
- [x] This is a design proposal, not a "please build this" request
