diff --git a/openspec/changes/fix-sandbox-capability-enforcement/.openspec.yaml b/openspec/changes/fix-sandbox-capability-enforcement/.openspec.yaml new file mode 100644 index 0000000..18edba1 --- /dev/null +++ b/openspec/changes/fix-sandbox-capability-enforcement/.openspec.yaml @@ -0,0 +1,2 @@ +schema: spec-driven +created: 2026-06-20 diff --git a/openspec/changes/fix-sandbox-capability-enforcement/design.md b/openspec/changes/fix-sandbox-capability-enforcement/design.md new file mode 100644 index 0000000..5c59184 --- /dev/null +++ b/openspec/changes/fix-sandbox-capability-enforcement/design.md @@ -0,0 +1,40 @@ +## Context + +The sandbox runner (`src/sandbox/runner.js`) spawns child processes to execute skill scripts in isolation. It calls `enforceCapabilities(permissions)` to map skill permissions to resource access rules, but the result is stored in an unused `_rules` variable and never applied to the spawned process. This means spawned processes have unrestricted access regardless of the permissions granted to the skill. + +## Goals / Non-Goals + +**Goals:** +- Apply the `resources` array from `enforceCapabilities()` to the spawned child process +- Use Node.js `--permission` flag to enforce capability restrictions on Node.js scripts +- Ensure existing tests continue to pass +- Add test coverage for permission enforcement + +**Non-Goals:** +- Modifying `enforceCapabilities()` function (it works correctly) +- Supporting `--permission` flag for non-Node.js interpreters (python, bash, ruby, etc.) +- Adding new permission types or changing the permission model + +## Decisions + +1. **Use `resources` array, not `rules` object**: The `resources` array contains the original permission strings (e.g., `filesystem:read`, `network:outbound`) which are directly compatible with Node.js's `--permission` flag. The `rules` object contains structured `{ resource, action }` pairs that would require additional transformation. + +2. **Gate on Node.js scripts only**: The `--permission` flag is Node.js-specific (v20.0+). Adding it to non-Node.js scripts (python, bash, ruby, etc.) would cause errors. We only add the flag when `ext === "js" || ext === "mjs"`. + +3. **Add to `execArgv`, not `args`**: Node.js flags like `--permission` belong in `execArgv` (passed to the Node.js binary itself), not in `args` (passed as script arguments). This aligns with the existing pattern for `--max-old-space-size=512`. + +4. **Only apply when non-empty**: If `resources.length === 0` (no permissions granted), don't add the `--permission` flag. This avoids passing an empty or meaningless flag. + +## Risks / Trade-offs + +- **Node.js version compatibility**: The `--permission` flag requires Node.js v20.0+. If an older version is used, the flag will cause an error. → Mitigation: Only applied when permissions are explicitly granted, and the project requires Node.js 24+ per skill compatibility. +- **Non-Node.js scripts unaffected**: Python, bash, ruby, and other scripts won't have capability restrictions enforced. → Mitigation: This is acceptable as the `--permission` flag is Node.js-specific. Other interpreters would need their own sandboxing mechanisms. +- **Minimal change scope**: This is a surgical fix to a single line. → Mitigation: Low risk of introducing regressions. + +## Migration Plan + +No migration needed. This is a bug fix that makes the sandbox behave as originally intended. The change is backward compatible — processes that had no permissions specified will continue to work as before (no `--permission` flag added). + +## Open Questions + +None. The fix is straightforward and well-understood. \ No newline at end of file diff --git a/openspec/changes/fix-sandbox-capability-enforcement/proposal.md b/openspec/changes/fix-sandbox-capability-enforcement/proposal.md new file mode 100644 index 0000000..64aa2f0 --- /dev/null +++ b/openspec/changes/fix-sandbox-capability-enforcement/proposal.md @@ -0,0 +1,26 @@ +## Why + +The sandbox runner in `src/sandbox/runner.js` calls `enforceCapabilities()` to map skill permissions to resource access rules, but the result is stored in an unused `_rules` variable and never applied to the spawned child process. This means sandbox permissions are declared but never actually enforced, creating a security gap where spawned processes have unrestricted access. + +## What Changes + +- Apply the `resources` array returned by `enforceCapabilities()` to the spawned child process via Node.js `--permission` flag +- Change `_rules` to `resources` in `src/sandbox/runner.js` line 125 to reflect actual usage +- Add `--permission` flag to `execArgv` when spawning Node.js scripts with non-empty permissions +- Add test to verify permissions are actually applied to the spawned process + +## Capabilities + +### New Capabilities + + +### Modified Capabilities +- `sandbox-runner`: Requirement changed — spawned processes must now have capabilities enforced via `--permission` flag based on skill permissions + +## Impact + +- `src/sandbox/runner.js` — `runSandbox()` function, line 125 +- `src/sandbox/capability.js` — `enforceCapabilities()` function (no changes, but referenced) +- `tests/unit/sandbox.test.js` — existing sandbox runner tests +- Only affects Node.js script execution (python, bash, ruby, etc. are unaffected) +- Requires Node.js v20.0+ for `--permission` flag support \ No newline at end of file diff --git a/openspec/changes/fix-sandbox-capability-enforcement/specs/sandbox-rte/spec.md b/openspec/changes/fix-sandbox-capability-enforcement/specs/sandbox-rte/spec.md new file mode 100644 index 0000000..ab44cbe --- /dev/null +++ b/openspec/changes/fix-sandbox-capability-enforcement/specs/sandbox-rte/spec.md @@ -0,0 +1,20 @@ +## MODIFIED Requirements + +### Requirement: Capability Restriction via Permission Model +The sandbox SHALL enforce a capability model where each skill's granted permissions (`config.yaml` or skill metadata) determine what resources the child process can access. For Node.js scripts, enforcement is achieved by passing the `--permission` flag with the computed resource list to the spawned process. For non-Node.js interpreters (python, bash, ruby, etc.), the capability model is enforced at the application level via path resolution, URL filtering, and environment variable isolation. + +#### Scenario: Skill with no permissions runs in minimal sandbox +- **WHEN** a skill declares zero permissions in its metadata +- **THEN** the child process receives read-only access to its own sandbox directory only + +#### Scenario: Skill with network permission makes allowed request +- **WHEN** a skill declares `network:outbound` permission +- **THEN** the child process is allowed to make HTTP requests to allowlisted URLs + +#### Scenario: Node.js script receives --permission flag +- **WHEN** a Node.js script (.js, .mjs, .ts) is executed with permissions like `["filesystem:read", "network:outbound"]` +- **THEN** the spawned process receives the `--permission filesystem:read,network:outbound` flag in its execArgv + +#### Scenario: Non-Node.js script does not receive --permission flag +- **WHEN** a non-Node.js script (.py, .sh, .rb) is executed with permissions +- **THEN** the spawned process does not receive the `--permission` flag (capability enforcement is handled at the application level) \ No newline at end of file diff --git a/openspec/changes/fix-sandbox-capability-enforcement/tasks.md b/openspec/changes/fix-sandbox-capability-enforcement/tasks.md new file mode 100644 index 0000000..963f685 --- /dev/null +++ b/openspec/changes/fix-sandbox-capability-enforcement/tasks.md @@ -0,0 +1,17 @@ +## 1. Implementation + +- [x] 1.1 Change `_rules` to `resources` in `src/sandbox/runner.js` line 125 +- [x] 1.2 Add `--permission` flag to `execArgv` when spawning Node.js scripts with non-empty permissions +- [x] 1.3 Ensure `--permission` flag is only added for `.js`, `.mjs`, `.ts` extensions + +## 2. Testing + +- [x] 2.1 Update existing test at line 412-424 to verify permissions are applied to spawned process +- [x] 2.2 Add test for Node.js script with permissions receiving `--permission` flag +- [x] 2.3 Add test for non-Node.js script not receiving `--permission` flag + +## 3. Verification + +- [ ] 3.1 Run full test suite (`npm run test`) +- [ ] 3.2 Run lint (`npm run lint`) +- [ ] 3.3 Run coverage (`npm run coverage`) \ No newline at end of file diff --git a/src/agent/react.js b/src/agent/react.js index 166ba70..140966d 100644 --- a/src/agent/react.js +++ b/src/agent/react.js @@ -1,5 +1,12 @@ import { createReactAgent as createReactAgentGraph } from "@langchain/langgraph/prebuilt"; -import { HumanMessage, HumanMessageChunk, SystemMessage, AIMessage, AIMessageChunk, ToolMessage } from "@langchain/core/messages"; +import { + HumanMessage, + HumanMessageChunk, + SystemMessage, + AIMessage, + AIMessageChunk, + ToolMessage, +} from "@langchain/core/messages"; import { extractContextLength, isContextLengthError, @@ -66,7 +73,13 @@ const CONTEXT_TOO_LONG_MESSAGE = "The conversation is too long. Please start a n * @returns {ReturnType} A compiled ReAct agent */ /* node:coverage ignore next */ -export function createReactAgent(model, tools = [], checkpointer = null, recursionLimit = null, timeout = 600000) { +export function createReactAgent( + model, + tools = [], + checkpointer = null, + recursionLimit = null, + timeout = 600000, +) { const agent = createReactAgentGraph({ llm: model, tools, @@ -115,7 +128,16 @@ export async function callReactAgent(agent, message, config, systemPrompt, callb } if (callback) { - return callReactAgentStreaming(agent, messages, message, config, callback, options, systemPrompt, recursionLimit); + return callReactAgentStreaming( + agent, + messages, + message, + config, + callback, + options, + systemPrompt, + recursionLimit, + ); } // Cache-aside: check cache before invoking the agent @@ -142,11 +164,11 @@ export async function callReactAgent(agent, message, config, systemPrompt, callb const result = await agent.invoke({ messages }, invokeConfig); const content = extractContent(result, message); - // Cache the result on miss (only if no tools were used) - const hasToolCalls = result.messages?.some(m => m.tool_calls?.length > 0) ?? false; - if (cacheKey && !hasToolCalls) { - getCache().set(cacheKey, content.content); - } + // Cache the result on miss (only if no tools were used) + const hasToolCalls = result.messages?.some((m) => m.tool_calls?.length > 0) ?? false; + if (cacheKey && !hasToolCalls) { + getCache().set(cacheKey, content.content); + } return content; } catch (err) { diff --git a/src/cache/llm_cache.js b/src/cache/llm_cache.js index 196b97f..bd515e8 100644 --- a/src/cache/llm_cache.js +++ b/src/cache/llm_cache.js @@ -44,4 +44,4 @@ export function createLlmCache(size, ttl) { }, _lru: cache, }; -} \ No newline at end of file +} diff --git a/src/sandbox/runner.js b/src/sandbox/runner.js index 2fc0d47..d739200 100644 --- a/src/sandbox/runner.js +++ b/src/sandbox/runner.js @@ -122,7 +122,7 @@ export async function runSandbox(options) { cwd = process.cwd(), } = options; - const { _rules } = enforceCapabilities(permissions); + const { resources } = enforceCapabilities(permissions); let env = filterEnv(process.env, whitelist); @@ -144,9 +144,13 @@ export async function runSandbox(options) { // Build exec args based on interpreter type const execArgs = []; - if (ext === "js" || ext === "mjs") { + if (ext === "js" || ext === "mjs" || ext === "ts") { // Pass memory limit to Node.js scripts execArgs.push("--max-old-space-size=512"); + // Apply capability restrictions via --permission flag + if (resources.length > 0) { + execArgs.push(`--permission=${resources.join(",")}`); + } } const child = spawn(interp.command, [...interp.args, script], { diff --git a/src/tui/app.js b/src/tui/app.js index 4438419..1eb663d 100644 --- a/src/tui/app.js +++ b/src/tui/app.js @@ -74,20 +74,20 @@ export default function App({ const providerConfig = config?.providers?.[providerName] || {}; const modelName = providerConfig.model || "gpt-4o"; const encoding = providerConfig.encoding; - + // Calculate conversation tokens let totalTokens = calculateConversationTokens(conversation, modelName, encoding); - + // Add system prompt tokens const systemPrompt = loadSystemPrompt(); if (systemPrompt) { totalTokens += calculateConversationTokens( [{ role: "system", content: systemPrompt }], modelName, - encoding + encoding, ); } - + setContextSize(totalTokens); } return () => { @@ -165,7 +165,11 @@ export default function App({ _executeSkill: (skillName, _args) => { const skill = registry.get(skillName); if (!skill) { - return { action: "skill", subAction: "error", message: `Skill "${skillName}" not found.` }; + return { + action: "skill", + subAction: "error", + message: `Skill "${skillName}" not found.`, + }; } // Skills are prompt-based instructions for the agent to interpret and execute. // Load the SKILL.md and pass it to the conversation so the agent can use it. @@ -175,7 +179,9 @@ export default function App({ subAction: "load", name: skillName, skillBody: body || "", - message: body ? `Skill "${skillName}" loaded.\n${body}` : `Skill "${skillName}" loaded. No instructions found.`, + message: body + ? `Skill "${skillName}" loaded.\n${body}` + : `Skill "${skillName}" loaded. No instructions found.`, }; }, }); @@ -345,8 +351,8 @@ export default function App({ // Silently ignore streaming callback errors } }, - abortControllerRef.current?.signal, - ); + abortControllerRef.current?.signal, + ); // Store the promise so handleInterrupt can await it dispatchPromiseRef.current = dispatchPromise; @@ -458,12 +464,12 @@ export default function App({ } } catch (_cbErr) {} }, - abortControllerRef.current?.signal, - ); - // Update the ref so handleInterrupt can await this promise too - dispatchPromiseRef.current = continuePromise; - await continuePromise; - setStatusMessage("Done"); + abortControllerRef.current?.signal, + ); + // Update the ref so handleInterrupt can await this promise too + dispatchPromiseRef.current = continuePromise; + await continuePromise; + setStatusMessage("Done"); } catch (contErr) { setStatusMessage(`Error continuing: ${contErr.message}`); } finally { @@ -535,7 +541,12 @@ export default function App({ } else if (result.action !== "help" && result.action !== "skill") { setStatusMessage(result.message || result.action + " executed"); } - if (result.message && result.action !== "provider" && result.action !== "schedule" && result.action !== "skill") { + if ( + result.message && + result.action !== "provider" && + result.action !== "schedule" && + result.action !== "skill" + ) { addMessage({ role: "system", content: result.message }); } } catch (err) { @@ -563,7 +574,7 @@ export default function App({ const providerConfig = config?.providers?.[providerName] || {}; const modelName = providerConfig.model || "gpt-4o"; const encoding = providerConfig.encoding; - + // Calculate conversation tokens + system prompt let totalTokens = calculateConversationTokens(conversation, modelName, encoding); const systemPrompt = loadSystemPrompt(); @@ -571,7 +582,7 @@ export default function App({ totalTokens += calculateConversationTokens( [{ role: "system", content: systemPrompt }], modelName, - encoding + encoding, ); } setContextSize(totalTokens); @@ -700,8 +711,8 @@ export default function App({ // Silently ignore streaming callback errors } }, - abortControllerRef.current?.signal, - ); + abortControllerRef.current?.signal, + ); // Store the promise so handleInterrupt can await it dispatchPromiseRef.current = dispatchPromise; @@ -820,12 +831,12 @@ export default function App({ } } catch (_cbErr) {} }, - abortControllerRef.current?.signal, - ); - // Update the ref so handleInterrupt can await this promise too - dispatchPromiseRef.current = continuePromise; - await continuePromise; - setStatusMessage("Received response"); + abortControllerRef.current?.signal, + ); + // Update the ref so handleInterrupt can await this promise too + dispatchPromiseRef.current = continuePromise; + await continuePromise; + setStatusMessage("Received response"); } catch (contErr) { setStatusMessage(`Error continuing: ${contErr.message}`); } finally { @@ -873,7 +884,7 @@ export default function App({ const providerConfig = config?.providers?.[providerName] || {}; const modelName = providerConfig.model || "gpt-4o"; const encoding = providerConfig.encoding; - + // Calculate conversation tokens + system prompt let totalTokens = calculateConversationTokens(conversation, modelName, encoding); const systemPrompt = loadSystemPrompt(); @@ -881,7 +892,7 @@ export default function App({ totalTokens += calculateConversationTokens( [{ role: "system", content: systemPrompt }], modelName, - encoding + encoding, ); } setContextSize(totalTokens); diff --git a/src/tui/contextTokens.js b/src/tui/contextTokens.js index 9a047d7..7a08c49 100644 --- a/src/tui/contextTokens.js +++ b/src/tui/contextTokens.js @@ -13,9 +13,7 @@ export function calculateConversationTokens(conversation, modelName, encoding) { // Resolve encoder: env var takes priority, then config, then derive from model name. const encoderName = - process.env.OPENAI_ENCODING || - encoding || - (modelName ? modelName.split(":")[0] : "gpt-4o"); + process.env.OPENAI_ENCODING || encoding || (modelName ? modelName.split(":")[0] : "gpt-4o"); let tiktoken; try { diff --git a/src/tui/inputPanel.js b/src/tui/inputPanel.js index 88ded20..05db820 100644 --- a/src/tui/inputPanel.js +++ b/src/tui/inputPanel.js @@ -16,11 +16,7 @@ export function Blink({ text = "", char = "\u2588", cursorColor }) { return React.createElement( Box, { flexDirection: "row" }, - React.createElement( - Text, - { key: "cursor", color: cursorColor || "white" }, - cursorStr, - ), + React.createElement(Text, { key: "cursor", color: cursorColor || "white" }, cursorStr), ); } diff --git a/src/tui/statusBar.js b/src/tui/statusBar.js index edf2e29..9a1bec5 100644 --- a/src/tui/statusBar.js +++ b/src/tui/statusBar.js @@ -49,9 +49,10 @@ export function formatSize(bytes) { const exp = Math.floor(Math.log(bytes) / Math.log(1024)); const value = bytes / Math.pow(1024, exp); const locale = Intl.DateTimeFormat().resolvedOptions().locale; - const formatted = value % 1 === 0 - ? new Intl.NumberFormat(locale, { maximumFractionDigits: 0 }).format(Math.round(value)) - : new Intl.NumberFormat(locale, { maximumFractionDigits: 1 }).format(value); + const formatted = + value % 1 === 0 + ? new Intl.NumberFormat(locale, { maximumFractionDigits: 0 }).format(Math.round(value)) + : new Intl.NumberFormat(locale, { maximumFractionDigits: 1 }).format(value); return formatted + units[exp - 1]; } diff --git a/tests/unit/cache/llm_cache.test.js b/tests/unit/cache/llm_cache.test.js index ce08f19..e182cff 100644 --- a/tests/unit/cache/llm_cache.test.js +++ b/tests/unit/cache/llm_cache.test.js @@ -255,11 +255,13 @@ describe("LLM Cache - Integration with callReactAgent", () => { invoke: async () => { invokeCount++; return { - messages: [{ - type: "ai", - content: `Response ${invokeCount}`, - tool_calls: [{ name: "test_tool", args: {} }] - }], + messages: [ + { + type: "ai", + content: `Response ${invokeCount}`, + tool_calls: [{ name: "test_tool", args: {} }], + }, + ], }; }, }; @@ -274,4 +276,4 @@ describe("LLM Cache - Integration with callReactAgent", () => { await callReactAgent(mockAgent, "Hello", config); assert.strictEqual(invokeCount, 2); // Agent should be called again }); -}); \ No newline at end of file +}); diff --git a/tests/unit/sandbox.test.js b/tests/unit/sandbox.test.js index 4692bc5..be93865 100644 --- a/tests/unit/sandbox.test.js +++ b/tests/unit/sandbox.test.js @@ -349,6 +349,28 @@ function createTestScript(stdout, stderr, exitCode) { return { scriptPath, cleanup }; } +function createExecArgvScript() { + const testDir = join(tmpdir(), "madz-sandbox-exec-argv-test-" + Date.now()); + mkdirSync(testDir, { recursive: true }); + const scriptPath = join(testDir, "exec-argv-test.js"); + writeFileSync( + scriptPath, + [ + '"use strict";', + `process.stdout.write(JSON.stringify(process.execArgv));`, + `process.exit(0);`, + ].join("\n"), + ); + const cleanup = () => { + try { + rmSync(testDir, { recursive: true, force: true }); + } catch { + // ignore cleanup errors + } + }; + return { scriptPath, cleanup }; +} + describe("sandbox - runner (spawn)", () => { it("captures stdout, stderr, and exitCode from child process", async () => { const { scriptPath, cleanup } = createTestScript("hello", "oops", 0); @@ -410,19 +432,63 @@ describe("sandbox - runner (spawn)", () => { }); it("applies permissions via enforceCapabilities", async () => { - const { scriptPath, cleanup } = createTestScript("", "", 0); + const { scriptPath, cleanup } = createExecArgvScript(); try { const result = await runSandbox({ script: scriptPath, permissions: ["filesystem:read"], skillName: "test", }); - assert.strictEqual(result.exitCode, 0); + const execArgv = JSON.parse(result.stdout); + assert.ok( + execArgv.includes("--permission=filesystem:read"), + "execArgv should include --permission=filesystem:read", + ); } finally { cleanup(); } }); + it("applies multiple permissions via enforceCapabilities", async () => { + const { scriptPath, cleanup } = createExecArgvScript(); + try { + const result = await runSandbox({ + script: scriptPath, + permissions: ["filesystem:read", "network:outbound"], + skillName: "test", + }); + const execArgv = JSON.parse(result.stdout); + assert.ok( + execArgv.includes("--permission=filesystem:read,network:outbound"), + "execArgv should include --permission=filesystem:read,network:outbound", + ); + } finally { + cleanup(); + } + }); + + it("does not add --permission flag for non-Node.js scripts", async () => { + const testDir = join(tmpdir(), "madz-sandbox-non-node-test-" + Date.now()); + mkdirSync(testDir, { recursive: true }); + const scriptPath = join(testDir, "test-script.sh"); + writeFileSync(scriptPath, ["#!/usr/bin/env bash", 'echo "hello"'].join("\n")); + try { + const result = await runSandbox({ + script: scriptPath, + permissions: ["filesystem:read"], + skillName: "test", + }); + assert.strictEqual(result.exitCode, 0); + assert.strictEqual(result.stdout.trim(), "hello"); + } finally { + try { + rmSync(testDir, { recursive: true, force: true }); + } catch { + // ignore cleanup errors + } + } + }); + it("honors custom timeout value", async () => { const { scriptPath, cleanup } = createTestScript("", "", 0); try {