Feature Proposal: Add Data Sensitivity and Trust Boundary Metadata to ToolDefinition to Prevent "Lethal Trifecta" Architectures #8

@ashayraut

**Problem Statement**

The current ToolDefinition object (Section 3.2) focuses primarily on the functional schema of a tool: its parameters, types, and descriptions. It has no native security metadata for data sensitivity classifications, trust boundaries, or side-effect profiles.

Without this context, runtime policy engines, agents, or harnesses using the OWASP AOS specification cannot programmatically detect or prevent high-risk design patterns, specifically:

  • The "Lethal Trifecta" (Simon Willison): Co-locating untrusted input processing, sensitive data access, and external/state-changing capabilities within the same execution context.

  • The "Agents Rule of Two" (Meta): An agent should not simultaneously handle untrusted input, access private data, and execute state-changing actions unless strict isolation or a Human-in-the-Loop (HITL) override is in place.

If the specification doesn't provide a structured way to declare these properties, downstream tooling must rely on out-of-band registries, breaking the self-documenting goal of the Instrument Specification.

**Proposed Solution**

Introduce an optional security_context object into the ToolDefinition schema. This object should explicitly declare the tool's relationship to data classification, trust boundaries, and its impact profile.

```json
{
  "name": "send_customer_email",
  "description": "Sends an email update to the customer.",
  "parameters": { ... },

  "security_context": {
    "trust_boundary": "sink",
    "data_access": {
      "reads": ["pii", "internal"],
      "writes": ["external_network"]
    },
    "impact_profile": {
      "state_changing": true,
      "external_communication": true
    },
    "required_controls": [
      "human_in_the_loop"
    ]
  }
}
```
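To illustrate how a harness might consume this object, here is a minimal Python sketch that parses `security_context` from a tool definition. The `SecurityContext` class, the `parse_security_context` helper, and the `KNOWN_TRUST_BOUNDARIES` set are hypothetical names introduced for this example. The proposal only shows the values `"sink"` and `"source"`, so the allowed set below is an assumption, not part of the specification.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Optional

# Assumption: the proposal only mentions "source" and "sink"; the full
# vocabulary for trust_boundary is not defined there.
KNOWN_TRUST_BOUNDARIES = {"source", "sink"}


@dataclass(frozen=True)
class SecurityContext:
    """Hypothetical in-memory form of the proposed security_context object."""
    trust_boundary: str
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()
    state_changing: bool = False
    external_communication: bool = False
    required_controls: tuple = ()


def parse_security_context(tool: dict[str, Any]) -> Optional[SecurityContext]:
    """Return the parsed security_context, or None when the tool omits it."""
    raw = tool.get("security_context")
    if raw is None:
        return None  # the field is optional in the proposal

    boundary = raw.get("trust_boundary")
    if boundary not in KNOWN_TRUST_BOUNDARIES:
        raise ValueError(f"unknown trust_boundary: {boundary!r}")

    access = raw.get("data_access", {})
    impact = raw.get("impact_profile", {})
    return SecurityContext(
        trust_boundary=boundary,
        reads=frozenset(access.get("reads", [])),
        writes=frozenset(access.get("writes", [])),
        state_changing=bool(impact.get("state_changing", False)),
        external_communication=bool(impact.get("external_communication", False)),
        required_controls=tuple(raw.get("required_controls", [])),
    )
```

Returning `None` for a missing object keeps the field optional, so existing tool definitions without security metadata would still load. A stricter harness could instead treat a missing object as the most restrictive case.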

**How This Enables Threat Modeling & Policy Enforcement**

With this metadata baked into the specification, an intercepting security harness or gateway can deterministically evaluate the blast radius of a planned execution graph before invoking the model:

Risk Score = f(Untrusted Input, Sensitive Data Read, State Change)

If a session combines a tool labeled with trust_boundary: "source" (untrusted input) and another tool labeled with state_changing: true and reads: ["pii"], the orchestration engine can flag a Lethal Trifecta violation and step up authorization or block execution outright.
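The session-level check described above could be sketched as follows. This is a rough illustration rather than a definitive policy engine: `ToolSecurity`, `trifecta_legs`, `evaluate_session`, and the `SENSITIVE_LABELS` set are hypothetical names, and scoring the risk as a simple count of the three properties present is an assumption, since the proposal does not define the scoring function.

```python
from dataclasses import dataclass

# Assumption: which data labels count as "sensitive" is a policy choice;
# only "pii" is named as such in the proposal.
SENSITIVE_LABELS = frozenset({"pii"})


@dataclass(frozen=True)
class ToolSecurity:
    """Hypothetical flattened view of one tool's security_context."""
    name: str
    trust_boundary: str
    reads: frozenset = frozenset()
    state_changing: bool = False
    external_communication: bool = False


def trifecta_legs(tools):
    """Report which of the three risk properties the session's tools cover."""
    return {
        "untrusted_input": any(t.trust_boundary == "source" for t in tools),
        "sensitive_read": any(t.reads & SENSITIVE_LABELS for t in tools),
        "state_change": any(
            t.state_changing or t.external_communication for t in tools
        ),
    }


def evaluate_session(tools):
    """Return (score, decision); the score counts how many properties are present."""
    legs = trifecta_legs(tools)
    score = sum(legs.values())
    # All three properties in one session is the Lethal Trifecta case.
    decision = "block_or_step_up" if score == 3 else "allow"
    return score, decision


# Usage: a web-fetch tool (untrusted source) plus the email tool from the example.
fetch = ToolSecurity(name="fetch_web_page", trust_boundary="source")
email = ToolSecurity(
    name="send_customer_email",
    trust_boundary="sink",
    reads=frozenset({"pii", "internal"}),
    state_changing=True,
    external_communication=True,
)
print(evaluate_session([fetch, email]))
print(evaluate_session([email]))
```

Because the check runs over the whole set of tools in a session, it catches combinations that no single tool definition reveals on its own. The `fetch_web_page` tool is invented for the usage example.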
