Feature Proposal: Add Data Sensitivity and Trust Boundary Metadata to ToolDefinition to Prevent "Lethal Trifecta" Architectures

**Problem Statement**

The current ToolDefinition object ([Section 3.2](https://aos.owasp.org/spec/instrument/specification/#32-tooldefinition-object)) focuses primarily on the functional schema of a tool—its parameters, types, and descriptions. However, it lacks native security metadata regarding data sensitivity classifications, trust boundaries, and side-effect profiles.

Without this context, runtime policy engines, agents, or harnesses using the OWASP AOS specification cannot programmatically detect or prevent high-risk design patterns, specifically:

- The "Lethal Trifecta" (Simon Willison): Combining exposure to untrusted input, access to sensitive data, and the ability to communicate externally or change state within the same execution context.

- The ["Agents Rule of Two"](https://ai.meta.com/blog/practical-ai-agent-security/) (Meta): An agent should not simultaneously handle untrusted input, access private data, and execute state-changing actions. Combining all three requires strict isolation or a Human-in-the-Loop (HITL) override.

If the specification doesn't provide a structured way to declare these properties, downstream tooling must rely on out-of-band registries, breaking the self-documenting goal of the Instrument Specification.

**Proposed Solution**

Introduce an optional `security_context` object into the `ToolDefinition` schema. This object should explicitly declare the tool's relationship to data classification, trust boundaries, and its impact profile.

```json
{
  "name": "send_customer_email",
  "description": "Sends an email update to the customer.",
  "parameters": { ... },
  
  "security_context": {
    "trust_boundary": "sink",
    "data_access": {
      "reads": ["pii", "internal"],
      "writes": ["external_network"]
    },
    "impact_profile": {
      "state_changing": true,
      "external_communication": true
    },
    "required_controls": [
      "human_in_the_loop"
    ]
  }
}
```
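
To illustrate how a consumer might read this optional field, here is a minimal Python sketch. The `SecurityContext` class and `parse_security_context` function are hypothetical names introduced for illustration and are not part of the AOS specification. The sketch assumes that a missing `security_context` means "no declaration" and that missing sub-fields default to empty or `false`. The `parameters` field is left out because the example above elides it.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class SecurityContext:
    """Hypothetical in-memory form of the proposed `security_context` object."""
    trust_boundary: Optional[str] = None      # e.g. "source" or "sink"
    reads: tuple[str, ...] = ()               # data classes the tool reads
    writes: tuple[str, ...] = ()              # destinations the tool writes to
    state_changing: bool = False
    external_communication: bool = False
    required_controls: tuple[str, ...] = ()


def parse_security_context(tool: dict[str, Any]) -> Optional[SecurityContext]:
    """Return the tool's security context, or None if the optional field is absent."""
    raw = tool.get("security_context")
    if raw is None:
        return None
    access = raw.get("data_access", {})
    impact = raw.get("impact_profile", {})
    return SecurityContext(
        trust_boundary=raw.get("trust_boundary"),
        reads=tuple(access.get("reads", [])),
        writes=tuple(access.get("writes", [])),
        state_changing=bool(impact.get("state_changing", False)),
        external_communication=bool(impact.get("external_communication", False)),
        required_controls=tuple(raw.get("required_controls", [])),
    )


# The example tool from this proposal, without its elided `parameters`.
send_customer_email = {
    "name": "send_customer_email",
    "description": "Sends an email update to the customer.",
    "security_context": {
        "trust_boundary": "sink",
        "data_access": {"reads": ["pii", "internal"], "writes": ["external_network"]},
        "impact_profile": {"state_changing": True, "external_communication": True},
        "required_controls": ["human_in_the_loop"],
    },
}

ctx = parse_security_context(send_customer_email)
```

Because the field is optional, existing tool definitions without it still parse, which keeps the change backward compatible.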

**How This Enables Threat Modeling & Policy Enforcement**

With this metadata baked into the specification, an intercepting security harness or gateway can deterministically evaluate the blast radius of a planned execution graph before invoking the model:

`Risk Score = f(Untrusted Input, Sensitive Data Read, State Change)`

If a session combines a tool labeled with `trust_boundary: "source"` (untrusted input) and another tool labeled with `state_changing: true` and `reads: ["pii"]`, the orchestration engine can flag a Lethal Trifecta violation and step up authorization or block execution outright.
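
The check described above could be sketched roughly as follows. This is an illustration under several assumptions, not a proposed reference implementation: the function names, the `SENSITIVE_CLASSES` set, the `fetch_web_page` tool, and the `"allow"` / `"step_up"` / `"block"` return values are all hypothetical. It also assumes that `external_communication` counts toward the third leg alongside `state_changing`, and that tools without a `security_context` contribute nothing (a stricter engine might treat missing metadata as high risk).

```python
from typing import Any, Iterable

# Assumption: which data classes count as sensitive is a deployment decision.
SENSITIVE_CLASSES = {"pii"}


def _acts(ctx: dict[str, Any]) -> bool:
    """True if the tool changes state or communicates externally."""
    impact = ctx.get("impact_profile", {})
    return bool(impact.get("state_changing") or impact.get("external_communication"))


def trifecta_legs(tools: Iterable[dict[str, Any]]) -> dict[str, bool]:
    """Report which of the three risk properties appear anywhere in the session."""
    legs = {"untrusted_input": False, "sensitive_read": False, "state_change": False}
    for tool in tools:
        ctx = tool.get("security_context") or {}  # missing metadata adds no legs
        if ctx.get("trust_boundary") == "source":
            legs["untrusted_input"] = True
        if SENSITIVE_CLASSES.intersection(ctx.get("data_access", {}).get("reads", [])):
            legs["sensitive_read"] = True
        if _acts(ctx):
            legs["state_change"] = True
    return legs


def evaluate_session(tools: Iterable[dict[str, Any]]) -> str:
    """Return "allow", "step_up", or "block" for the set of tools in a session."""
    tools = list(tools)
    if not all(trifecta_legs(tools).values()):
        return "allow"
    # All three legs are present. Assumed policy: step up only if every acting
    # tool already declares a human-in-the-loop control, otherwise block.
    acting = [t.get("security_context") or {} for t in tools]
    acting = [c for c in acting if _acts(c)]
    gated = all("human_in_the_loop" in c.get("required_controls", []) for c in acting)
    return "step_up" if gated else "block"


# Hypothetical tool that ingests untrusted content.
fetch_web_page = {"name": "fetch_web_page", "security_context": {"trust_boundary": "source"}}

# The example tool from this proposal, reduced to its security metadata.
send_customer_email = {
    "name": "send_customer_email",
    "security_context": {
        "trust_boundary": "sink",
        "data_access": {"reads": ["pii", "internal"], "writes": ["external_network"]},
        "impact_profile": {"state_changing": True, "external_communication": True},
        "required_controls": ["human_in_the_loop"],
    },
}

decision = evaluate_session([fetch_web_page, send_customer_email])
```

With these two tools in one session all three legs are present, and because `send_customer_email` declares `human_in_the_loop`, this sketch returns `"step_up"` rather than `"block"`. The same email tool on its own yields `"allow"`, since no untrusted source is in play.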
