feat(seccomp): ExtraHandler: user-supplied syscall handlers #20
dzerik wants to merge 1 commit into multikernel:main
Branch updated from 5f2b730 to 71c5724.
Thanks for the PR! Two main issues:
Branch updated from b334ab3 to 1d0783d.
Adds a public extension point for downstream crates that need to
register their own seccomp-notification handlers alongside sandlock's
builtin chroot/cow/procfs/network/port_remap logic.
Motivation: downstream crates that want to intercept additional
syscalls in the same supervisor task as sandlock's builtins have no
clean way to do it today. The kernel accepts only one
SECCOMP_FILTER_FLAG_NEW_LISTENER filter per process, hence a single
notification listener, so a second supervisor cannot run alongside.
The only alternatives are forking sandlock or patching
notif::supervisor wholesale.
API:
- New type dispatch::ExtraHandler { syscall_nr, handler }.
- New entry Sandbox::run_with_extra_handlers(policy, cmd, extras).
- Existing Sandbox::run() delegates to it with empty extras — zero
behaviour change for current callers.
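A minimal sketch of the API shape, assuming simplified stand-ins. Only the ExtraHandler field names and the run / run_with_extra_handlers call shape come from this PR; NotifAction (cut down to two variants), HandlerFn (a closure over the syscall number alone), Policy, and the usize return value are hypothetical placeholders. The real handler receives a full seccomp notification, and the real entry point spawns the child.

```rust
/// Simplified stand-in for sandlock's NotifAction (hypothetical: the real
/// enum also has variants such as ReturnValue and Kill { sig, pgid }).
#[derive(Debug, PartialEq)]
pub enum NotifAction {
    Continue,
    Errno(i32),
}

/// Hypothetical handler type: a boxed closure over the syscall number only.
pub type HandlerFn = Box<dyn Fn(i64) -> NotifAction + Send + Sync>;

/// Field names follow the PR: ExtraHandler { syscall_nr, handler }.
pub struct ExtraHandler {
    pub syscall_nr: i64,
    pub handler: HandlerFn,
}

/// Hypothetical placeholder for the sandbox policy.
pub struct Policy;

pub struct Sandbox;

impl Sandbox {
    /// Existing entry point: delegates with an empty extras vector, so
    /// current callers go through the same code path as before.
    pub fn run(policy: &Policy, cmd: &[&str]) -> usize {
        Self::run_with_extra_handlers(policy, cmd, Vec::new())
    }

    /// New entry point. This sketch only reports how many extras it
    /// received; the real one spawns the child and runs the supervisor.
    pub fn run_with_extra_handlers(
        _policy: &Policy,
        _cmd: &[&str],
        extras: Vec<ExtraHandler>,
    ) -> usize {
        extras.len()
    }
}
```

Delegating from run() with Vec::new() is what keeps existing callers on the unchanged path: they simply register zero extras.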
Ordering contract (documented + tested):
- Builtins register first (chroot path normalization, COW, procfs, …).
- Extras appended last, in the Vec order.
- Chain stops at the first non-Continue result, so user handlers
cannot subvert builtin confinement.
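The ordering contract can be modeled in a few lines. This is a hypothetical in-process sketch, not sandlock's dispatch module. DispatchTable, the handler helper, and the tuple-based build_dispatch_table signature are illustrative assumptions, and handlers see only the syscall number.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum NotifAction {
    Continue,
    Errno(i32),
}

/// Hypothetical handler type: sees only the syscall number.
pub type HandlerFn = Box<dyn Fn(i64) -> NotifAction>;

/// Helper so closures get their argument type inferred.
pub fn handler(f: impl Fn(i64) -> NotifAction + 'static) -> HandlerFn {
    Box::new(f)
}

/// Per-syscall handler chains; registration order is preserved.
#[derive(Default)]
pub struct DispatchTable {
    chains: HashMap<i64, Vec<HandlerFn>>,
}

impl DispatchTable {
    pub fn register(&mut self, nr: i64, h: HandlerFn) {
        self.chains.entry(nr).or_default().push(h);
    }

    /// Runs the chain for `nr` in order and stops at the first
    /// non-Continue. If every handler continues, or none is registered,
    /// the syscall is allowed to continue.
    pub fn dispatch(&self, nr: i64) -> NotifAction {
        for h in self.chains.get(&nr).into_iter().flatten() {
            let action = h(nr);
            if action != NotifAction::Continue {
                return action;
            }
        }
        NotifAction::Continue
    }
}

/// Builtins are registered first, then extras in Vec order.
pub fn build_dispatch_table(
    builtins: Vec<(i64, HandlerFn)>,
    extras: Vec<(i64, HandlerFn)>,
) -> DispatchTable {
    let mut table = DispatchTable::default();
    for (nr, h) in builtins.into_iter().chain(extras) {
        table.register(nr, h);
    }
    table
}
```

Because extras are appended after builtins, a builtin that returns anything other than Continue ends the chain before any extra on that syscall runs.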
BPF coverage (this is what plumbs extras to the kernel):
- Sandbox::do_spawn collects the syscall numbers from extra_handlers
and threads them into the child via the new
ChildSpawnArgs.extra_syscalls field passed to context::confine_child.
- The child merges them into notif_syscalls(policy) before
bpf::assemble_filter, with sort + dedup so a syscall registered both
by a builtin and an extra produces a single JEQ.
- Without this step the kernel would never raise USER_NOTIF for a
syscall that has no builtin handler — the dispatch table would
receive nothing and the user handler would silently never fire.
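A sketch of the merge step, assuming syscall numbers are plain i64 values and modeling the filter as a list of simplified pseudo-instructions. Insn, merge_notif_syscalls, and assemble are hypothetical names. Real cBPF loads seccomp_data.nr and then uses JEQ jumps and RET instructions.

```rust
/// Simplified pseudo-instructions (hypothetical, not real cBPF encoding).
#[derive(Debug, PartialEq)]
pub enum Insn {
    /// "if nr == k, return SECCOMP_RET_USER_NOTIF"
    JeqNotif(i64),
    /// Fallthrough for syscalls not matched above (the real program
    /// continues with deny checks and a default action).
    RetAllow,
}

/// Merges builtin notif syscalls with the extras' syscall numbers.
/// sort + dedup guarantees each number appears exactly once.
pub fn merge_notif_syscalls(builtin: &[i64], extra: &[i64]) -> Vec<i64> {
    let mut all: Vec<i64> = builtin.iter().chain(extra).copied().collect();
    all.sort_unstable();
    all.dedup();
    all
}

/// Emits one JEQ per merged syscall, then the fallthrough.
pub fn assemble(notif: &[i64]) -> Vec<Insn> {
    let mut prog: Vec<Insn> = notif.iter().map(|&nr| Insn::JeqNotif(nr)).collect();
    prog.push(Insn::RetAllow);
    prog
}
```

A syscall that appears in both inputs yields a single JeqNotif entry. A syscall that appears only in the extras still gets one, which is exactly the case where the kernel would otherwise never raise USER_NOTIF.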
Default-deny bypass guard:
- The cBPF program emits notif JEQs before deny JEQs, so a syscall
present in both lists hits SECCOMP_RET_USER_NOTIF first. An extra
on a DEFAULT_DENY syscall would therefore convert a kernel-deny into
a user-supervised path, and a Continue from the handler would
silently bypass deny.
- Sandbox::run_with_extra_handlers now validates extras against the
policy's deny list at registration time via
dispatch::validate_extras_against_policy and returns
SandboxError::Child naming the offending syscall — no silent footgun.
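The bypass and the guard can be sketched with two hypothetical helpers. evaluate models first-match semantics with notif checks ahead of deny checks. validate_extras_against_deny stands in for dispatch::validate_extras_against_policy and returns a String where the real code returns SandboxError::Child. The syscall numbers used are arbitrary.

```rust
#[derive(Debug, PartialEq)]
pub enum FilterResult {
    UserNotif,
    Deny,
    Allow,
}

/// First-match evaluation with notif checks emitted before deny checks,
/// which is the ordering that makes the bypass possible.
pub fn evaluate(nr: i64, notif: &[i64], deny: &[i64]) -> FilterResult {
    if notif.contains(&nr) {
        FilterResult::UserNotif
    } else if deny.contains(&nr) {
        FilterResult::Deny
    } else {
        FilterResult::Allow
    }
}

/// Hypothetical stand-in for dispatch::validate_extras_against_policy:
/// rejects the first extra syscall that is also on the deny list.
pub fn validate_extras_against_deny(extras: &[i64], deny: &[i64]) -> Result<(), String> {
    match extras.iter().copied().find(|nr| deny.contains(nr)) {
        Some(nr) => Err(format!(
            "extra handler for syscall {nr} is on the deny list and would bypass it"
        )),
        None => Ok(()),
    }
}
```

Checking at registration time means the misconfiguration surfaces as an error before the child is spawned, rather than as a deny that silently stops applying.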
Internals:
- build_dispatch_table now takes Vec<ExtraHandler> and drains it into
register() calls after builtins.
- notif::supervisor signature extended to accept extras and pass them
through. sandbox.rs moves self.extra_handlers via std::mem::take
on spawn (HandlerFn is Box<dyn Fn> — not Clone).
- confine_child's seven positional parameters packed into
context::ChildSpawnArgs to keep the call site readable.
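A vector of boxed closures cannot be cloned, so the spawn path has to move it out of the sandbox. The sketch below is minimal and rests on a few assumptions. Sandbox is stripped down to one field. ChildSpawnArgs carries only extra_syscalls; the real struct also packs confine_child's other parameters, which are not listed here. Handlers return bool purely for brevity.

```rust
/// Hypothetical handler type. Like the real HandlerFn it is a Box<dyn Fn>,
/// so Vec<ExtraHandler> is not Clone.
pub type HandlerFn = Box<dyn Fn(i64) -> bool + Send + Sync>;

pub struct ExtraHandler {
    pub syscall_nr: i64,
    pub handler: HandlerFn,
}

/// Hypothetical subset of context::ChildSpawnArgs: only extra_syscalls
/// is named in the PR; the other packed parameters are omitted here.
pub struct ChildSpawnArgs {
    pub extra_syscalls: Vec<i64>,
}

pub struct Sandbox {
    pub extra_handlers: Vec<ExtraHandler>,
}

impl Sandbox {
    /// Moves the handlers out with std::mem::take (leaving an empty Vec),
    /// sends only the syscall numbers toward the child, and returns the
    /// handlers for the supervisor side.
    pub fn spawn(&mut self) -> (ChildSpawnArgs, Vec<ExtraHandler>) {
        let extras = std::mem::take(&mut self.extra_handlers);
        let args = ChildSpawnArgs {
            extra_syscalls: extras.iter().map(|e| e.syscall_nr).collect(),
        };
        (args, extras)
    }
}
```

The child only needs the numbers for the BPF filter, while the closures themselves stay with the supervisor.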
Docs:
- docs/extension-handlers.md: design rationale, security boundary,
panics policy, non-goals, downstream sketch. Adds §3.0 (BPF-filter
merge semantics) and §3.0.1 (default-deny bypass guard); corrects
the NotifAction variant table (ReturnValue, Kill { sig, pgid }).
- crates/sandlock-core/examples/openat_audit.rs: runnable example.
Tests:
- 4 unit tests in dispatch::extra_handler_tests (ctor, insertion
order, append-after-builtin, empty-extras nop).
- 7 integration tests under tests/integration/test_extra_handlers.rs
exercising the full kernel path:
* extra on SYS_uname (not intercepted by any builtin) returning
Errno(EACCES) reaches the guest;
* Continue lets the kernel resume the syscall;
* empty extras vector preserves baseline behaviour;
* cross-handler ordering: extra on SYS_openat fires after the
/proc-virtualization builtin returns Continue;
* registration on SYS_mount (DEFAULT_DENY) is rejected up-front
with a descriptive error;
* builtin non-Continue blocks extra: openat on /proc/1/cmdline is
rejected by the procfs builtin and is never observed by the
extra (path inspected via process_vm_readv), while a peer
openat on /etc/hostname is observed — proves the chain stops at
first non-Continue end-to-end through the kernel;
* chain of two extras on the same syscall: first returns Continue,
second returns Errno(EACCES); both counters increment in lockstep
(insertion order preserved) and the guest sees the EACCES.
- All 215 unit tests pass; the 178-test integration suite passes
modulo the pre-existing 54-test failure set observed on origin/main
(kernel/capability environment, unrelated to this change).
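The two-extras integration test relies on counters that both handlers update. What follows is a hypothetical in-process version of that check, and no kernel or seccomp is involved. It assumes handlers must be Send + Sync, so the counters use Arc<AtomicUsize>, and that EACCES is 13, as on Linux. two_extra_chain and dispatch are illustrative names.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum NotifAction {
    Continue,
    Errno(i32),
}

/// Assumption: handlers are Send + Sync, so shared state needs Arc + atomics.
pub type HandlerFn = Box<dyn Fn(i64) -> NotifAction + Send + Sync>;

/// EACCES on Linux.
const EACCES: i32 = 13;

/// Builds the chain from the test: the first extra counts and continues,
/// the second counts and fails with EACCES.
pub fn two_extra_chain(first: Arc<AtomicUsize>, second: Arc<AtomicUsize>) -> Vec<HandlerFn> {
    let mut chain: Vec<HandlerFn> = Vec::with_capacity(2);
    chain.push(Box::new(move |_: i64| {
        first.fetch_add(1, Ordering::SeqCst);
        NotifAction::Continue
    }));
    chain.push(Box::new(move |_: i64| {
        second.fetch_add(1, Ordering::SeqCst);
        NotifAction::Errno(EACCES)
    }));
    chain
}

/// Runs handlers in insertion order, stopping at the first non-Continue.
pub fn dispatch(chain: &[HandlerFn], nr: i64) -> NotifAction {
    for h in chain {
        let action = h(nr);
        if action != NotifAction::Continue {
            return action;
        }
    }
    NotifAction::Continue
}
```

If the order were reversed, the EACCES handler would end the chain first and the Continue handler's counter would stay at zero. Equal counts after N calls are therefore evidence that insertion order was preserved.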
Minor bump 0.6 → 0.7 suggested.
Signed-off-by: dzerik <dzerik@gmail.com>
Branch updated from 1d0783d to 431c207.
You were right on both counts: the missing BPF plumbing was the …
1. BPF plumbing
While wiring this up I noticed an adjacent footgun: the cBPF program emits …
2. Tests that actually exercise dispatch
The unit-level …
Cosmetic
Deliberately deferred
Diff stat: 8 files, ~1046 / -9. All 215 unit tests pass; integration …
Summary
Adds a public extension point for downstream crates that need to register their own seccomp-notification handlers alongside sandlock's builtin chroot/cow/procfs/network/port_remap logic.
Motivation. Downstream crates that want to intercept additional syscalls in the same supervisor task as sandlock's builtins have no clean way to do it today: one SECCOMP_FILTER_FLAG_NEW_LISTENER per process means a single listener, so a second supervisor cannot run alongside. The only alternative is forking sandlock or patching notif::supervisor wholesale.
API.
- New type dispatch::ExtraHandler { syscall_nr, handler }.
- New entry Sandbox::run_with_extra_handlers(policy, cmd, extras).
- Existing Sandbox::run() delegates to it with empty extras: zero behaviour change for current callers.
Ordering contract (documented + tested).
- Builtins register first (chroot path normalization, COW, procfs, …).
- Extras appended last, in the Vec order.
- Chain stops at first non-Continue: user handlers cannot subvert builtin confinement.
Docs.
- docs/extension-handlers.md: design rationale, security boundary, panics policy, non-goals, downstream sketch.
- crates/sandlock-core/examples/openat_audit.rs: runnable example.
Minor bump 0.6 → 0.7 suggested.
Test plan
- dispatch::extra_handler_tests (ctor, insertion order, append-after-builtin, empty-extras nop): passing
- openat_audit.rs runs against a python3 -c guest