
feat: sandbox runtime and capability policy #1171

Open
AngeloDanducci wants to merge 6 commits into generative-computing:main from AngeloDanducci:ad-1021

Conversation

@AngeloDanducci
Contributor

Pull Request

Issue

Fixes #1021

Description

Allows more granular permissions to be applied during sandboxing via a capability policy.

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code was added
  • Ensure existing tests and GitHub automation pass (a maintainer will kick off the GitHub automation when the rest of the PR is populated)

Attribution

  • AI coding assistants used

Adding a new component, requirement, sampling strategy, or tool?

If your PR adds or modifies one of the types below, check the matching box. A checklist of type-specific review items will be posted as a comment.

  • Component
  • Requirement
  • Sampling Strategy
  • Tool

NOTE: Please ensure you have an issue that has been acknowledged by a core contributor and routed you to open a pull request against this repository. Otherwise, please open an issue before continuing with this pull request.

@AngeloDanducci AngeloDanducci requested a review from a team as a code owner May 27, 2026 14:01
@github-actions github-actions Bot added the enhancement New feature or request label May 27, 2026
@github-actions
Contributor

github-actions Bot commented May 27, 2026

This comment is managed by a bot. Editing it is fine — checking off boxes, adding notes — but please leave the HTML comment marker on the first line alone, otherwise checklist updates will break.

Requirement PR Checklist

Use this checklist when adding or modifying requirements in mellea/stdlib/requirements/.

Base Class

  • Extends appropriate base class:
    • Requirement - standard requirement
    • ALoraRequirement - uses specialized Intrinsic/Adapter for generation-based validation

Validation Logic

  • validation_fn defined (if using Python-based validation)
    • re-usable functionality within the validation_fn should be separated out into mellea/stdlib/tools/
  • validate returns a ValidationResult with
    • a thunk and context if using a backend to generate
    • a specific reason and score when possible

Integration

  • Requirement exported in mellea/stdlib/requirements/__init__.py or, if you are adding a library of requirements, from your sub-module
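To make the validation items concrete, here is a minimal, self-contained sketch of a reusable validation function that reports both a specific reason and a score. `ValidationResultSketch` and `max_length_validation_fn` are hypothetical stand-ins written only for illustration; they are not mellea's `ValidationResult` or any function from this PR.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ValidationResultSketch:
    """Hypothetical stand-in result: pass/fail plus an optional reason and score."""
    result: bool
    reason: Optional[str] = None
    score: Optional[float] = None


def max_length_validation_fn(limit: int) -> Callable[[str], ValidationResultSketch]:
    """Build a reusable length check that explains its verdict (hypothetical helper)."""

    def validate(output: str) -> ValidationResultSketch:
        length = len(output)
        return ValidationResultSketch(
            result=length <= limit,
            reason=f"output is {length} chars (limit {limit})",
            # 1.0 when within the limit, shrinking as the output overshoots it.
            score=min(1.0, limit / max(length, 1)),
        )

    return validate
```

Keeping the check as a plain factory function, rather than writing it inline inside a requirement, is what makes it easy to move into a shared tools module as the checklist suggests.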

Signed-off-by: AngeloDanducci <angelo.danducci.ii@ibm.com>
Signed-off-by: AngeloDanducci <angelo.danducci.ii@ibm.com>
Signed-off-by: AngeloDanducci <angelo.danducci.ii@ibm.com>
Signed-off-by: AngeloDanducci <angelo.danducci.ii@ibm.com>
@psschwei
Member

General question: if I wanted to test this out, how would I use it?

Contributor

@planetf1 planetf1 left a comment


Really clean redesign — the tier model is a big improvement over the old boolean flags, CapabilityPolicy with its honest ENFORCED_* separation is a nice touch, and make_execution_environment is exactly the right API shape. A few things below need fixing before merge; the rest are suggestions or noticed-in-passing nits.
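As a rough illustration of why a single tier can be cleaner than independent boolean flags, and of what an explicit "enforced" set buys, here is a hypothetical sketch. `Tier`, `ENFORCED_BY_TIER`, and `HypotheticalPolicy` are invented for this example, and the field names in the sets are illustrative; none of this reflects the actual `CapabilityPolicy` or `ENFORCED_*` definitions in the PR.

```python
import warnings
from dataclasses import dataclass, field
from enum import Enum


class Tier(Enum):
    # Hypothetical: one value instead of several independent booleans, so
    # contradictory combinations (e.g. "never execute" plus "use a container")
    # cannot be expressed.
    STATIC_ONLY = 1  # analyze the code, never execute it
    LOCAL = 2        # execute in the host interpreter
    SANDBOX = 3      # execute inside an isolated container


# Hypothetical: which policy fields each tier can actually guarantee.
ENFORCED_BY_TIER: dict[Tier, frozenset[str]] = {
    Tier.STATIC_ONLY: frozenset({"blocked_imports"}),
    Tier.LOCAL: frozenset({"blocked_imports", "timeout"}),
    Tier.SANDBOX: frozenset({"blocked_imports", "timeout", "network", "artifact_export_paths"}),
}


@dataclass(frozen=True)
class HypotheticalPolicy:
    tier: Tier
    requested: frozenset[str] = field(default_factory=frozenset)

    def unenforced(self) -> frozenset[str]:
        """Requested fields the selected tier cannot actually guarantee."""
        return self.requested - ENFORCED_BY_TIER[self.tier]

    def check(self) -> None:
        """Warn instead of silently ignoring fields the tier cannot enforce."""
        missing = self.unenforced()
        if missing:
            warnings.warn(f"{self.tier.name} tier does not enforce: {sorted(missing)}")
```

With this shape, asking the LOCAL tier for `network` isolation surfaces as an unenforced field rather than being quietly dropped.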

Comment thread mellea/stdlib/tools/interpreter.py Outdated
Comment thread mellea/stdlib/tools/interpreter.py Outdated
Comment thread mellea/stdlib/requirements/python_reqs.py
Comment thread mellea/stdlib/requirements/python_reqs.py
Comment thread mellea/stdlib/tools/execution_policy.py
Comment thread mellea/stdlib/tools/interpreter.py
Comment thread mellea/stdlib/tools/interpreter.py Outdated
Comment thread mellea/stdlib/tools/interpreter.py
Comment thread mellea/stdlib/requirements/python_reqs.py Outdated
Comment thread mellea/stdlib/requirements/python_reqs.py Outdated
@planetf1 planetf1 dismissed their stale review May 28, 2026 13:01

Posted in error — duplicate of review 4381122017. Please disregard this one.

Signed-off-by: AngeloDanducci <angelo.danducci.ii@ibm.com>
@AngeloDanducci
Contributor Author

Thanks for the feedback, @planetf1. I've addressed all of it in the most recent commit, along with a small change to help with the E2E tests.

@psschwei, if you want to test this, you can try a snippet like the one below, assuming you have Docker, Colima, or Podman running and the sandbox extra installed via uv sync:

from pathlib import Path
from mellea.stdlib.tools import LLMSandboxEnvironment, CapabilityPolicy

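policy = CapabilityPolicy(
    timeout=60,
    # Sandbox paths exported back to the host as artifacts after execution.
    artifact_export_paths=[Path("/output/result.txt")],
)

with LLMSandboxEnvironment(policy=policy) as env:
    # The code string below runs inside the sandbox, not on the host.
    result = env.execute("""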
import pathlib
pathlib.Path('/output').mkdir(exist_ok=True)
with open('/output/result.txt', 'w') as f:
    f.write('hello from the sandbox')
""")
    print("success:", result.success)
    print("artifacts:", result.artifacts)
    if result.artifacts:
        print("content:", result.artifacts[0].path.read_text())

You should see the exported artifact from the sandbox, as specified by the policy.
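For experimenting without a container runtime, the sketch below approximates two of the policy ideas from the snippet above (a timeout and an allowlist of exported artifacts) using only the Python standard library. `run_with_policy` and `RunResult` are hypothetical names; this is not `LLMSandboxEnvironment`, export paths here are relative to a scratch directory rather than absolute sandbox paths like `/output/result.txt`, and a subprocess in a temporary directory provides no real isolation.

```python
import shutil
import subprocess
import sys
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class RunResult:
    success: bool
    stdout: str
    artifacts: list[Path] = field(default_factory=list)


def run_with_policy(code: str, timeout: float, export: list[str], out_dir: Path) -> RunResult:
    """Hypothetical local stand-in: run `code` in a fresh scratch directory, stop it
    after `timeout` seconds, then copy back only the files listed in `export`."""
    with tempfile.TemporaryDirectory() as scratch:
        try:
            proc = subprocess.run(
                [sys.executable, "-c", code],
                cwd=scratch,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child process before re-raising.
            return RunResult(success=False, stdout="")

        artifacts = []
        # Only allowlisted paths are considered; everything else is deleted
        # along with the scratch directory.
        for rel in export:
            rel_path = Path(rel)
            # Refuse paths that could point outside the scratch directory.
            if rel_path.is_absolute() or ".." in rel_path.parts:
                continue
            src = Path(scratch) / rel_path
            if src.is_file():  # skip allowlisted paths the code never created
                dest = out_dir / rel_path
                dest.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(src, dest)
                artifacts.append(dest)
        return RunResult(success=proc.returncode == 0, stdout=proc.stdout, artifacts=artifacts)
```

A real container adds the filesystem and network isolation this sketch lacks; the point is only to show how a timeout and an export allowlist interact.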

Comment thread test/stdlib/requirements/test_reqlib_python.py
@planetf1
Contributor

Checked through all the fixes from my earlier review — everything looks good. The timeout regression, leak, legacy int shim, container ID fallback, unconditional warning, and truncation edge case are all addressed correctly.

I also ran some broader variations of your code snippet (capability matrix coverage, static import blocking, one-shot warning, local interpreter, artifact export), and they all behaved as expected. There was a Docker socket timeout on a couple of the sandbox tests, but that was down to my local environment rather than anything in the code.

Just the one test failure to sort out (see comment above), and then this is good to go.
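Static import blocking, one of the behaviors exercised in the review above, is commonly implemented by parsing the code with Python's `ast` module and inspecting import statements before anything runs. The sketch below shows that general technique; `blocked_imports` is a hypothetical helper, and the PR's own implementation may differ.

```python
import ast


def blocked_imports(code: str, blocked: set[str]) -> list[str]:
    """Return top-level module names imported by `code` that appear in `blocked`.

    Static check only: it inspects import statements without running the code,
    so dynamic imports such as __import__("os") are not caught.
    """
    found = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Absolute "from x import y" only; relative imports are skipped.
            names = [node.module]
        else:
            continue
        for name in names:
            root = name.split(".")[0]  # "os.path" -> "os"
            if root in blocked:
                found.append(root)
    return found
```

Because the check never executes the code, it is cheap and safe to run before any tier, but it should be paired with runtime enforcement since dynamic imports slip past it.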

Signed-off-by: AngeloDanducci <angelo.danducci.ii@ibm.com>
@AngeloDanducci AngeloDanducci enabled auto-merge May 29, 2026 14:58
@AngeloDanducci AngeloDanducci requested a review from planetf1 May 29, 2026 14:58

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Sandbox runtime & capability policy

3 participants