**Problem Statement**
The current ToolDefinition object (Section 3.2) focuses primarily on the functional schema of a tool—its parameters, types, and descriptions. However, it lacks native security metadata regarding data sensitivity classifications, trust boundaries, and side-effect profiles.
Without this context, runtime policy engines, agents, or harnesses using the OWASP AOS specification cannot programmatically detect or prevent high-risk design patterns, specifically:
- The "Lethal Trifecta" (Simon Willison): Co-locating untrusted input processing, sensitive data access, and external/state-changing capabilities within the same execution context.
- Violations of the "Agents Rule of Two" (Meta): An agent simultaneously handling untrusted input, accessing private data, and executing state-changing actions without strict isolation or Human-in-the-Loop (HITL) overrides.
If the specification doesn't provide a structured way to declare these properties, downstream tooling must rely on out-of-band registries, breaking the self-documenting goal of the Instrument Specification.
**Proposed Solution**
Introduce an optional security_context object into the ToolDefinition schema. This object should explicitly declare the tool's relationship to data classification, trust boundaries, and its impact profile.
    {
      "name": "send_customer_email",
      "description": "Sends an email update to the customer.",
      "parameters": { ... },
      "security_context": {
        "trust_boundary": "sink",
        "data_access": {
          "reads": ["pii", "internal"],
          "writes": ["external_network"]
        },
        "impact_profile": {
          "state_changing": true,
          "external_communication": true
        },
        "required_controls": [
          "human_in_the_loop"
        ]
      }
    }
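To illustrate how a consumer might read this object, here is a minimal Python sketch that parses `security_context` from a tool definition. The `SecurityContext` class, the `parse_security_context` helper, and the closed set of `trust_boundary` values are assumptions made for illustration; the proposal itself only shows the values "source" and "sink" and does not define an enumeration.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Assumption: only "source" and "sink" appear in the proposal; treating them
# as a closed set is an illustrative choice, not part of the specification.
KNOWN_TRUST_BOUNDARIES = {"source", "sink"}


@dataclass(frozen=True)
class SecurityContext:
    """Hypothetical typed view of the proposed security_context object."""
    trust_boundary: str
    reads: tuple = ()
    writes: tuple = ()
    state_changing: bool = False
    external_communication: bool = False
    required_controls: tuple = ()


def parse_security_context(tool: dict) -> Optional[SecurityContext]:
    """Return the parsed security_context, or None when the optional field is absent."""
    raw: Optional[dict] = tool.get("security_context")
    if raw is None:
        return None
    boundary: Any = raw.get("trust_boundary")
    if boundary not in KNOWN_TRUST_BOUNDARIES:
        raise ValueError(f"unknown trust_boundary: {boundary!r}")
    access = raw.get("data_access", {})
    impact = raw.get("impact_profile", {})
    return SecurityContext(
        trust_boundary=boundary,
        reads=tuple(access.get("reads", [])),
        writes=tuple(access.get("writes", [])),
        state_changing=bool(impact.get("state_changing", False)),
        external_communication=bool(impact.get("external_communication", False)),
        required_controls=tuple(raw.get("required_controls", [])),
    )


# The example tool from the proposal; "parameters" is elided there and left empty here.
send_customer_email = {
    "name": "send_customer_email",
    "description": "Sends an email update to the customer.",
    "parameters": {},
    "security_context": {
        "trust_boundary": "sink",
        "data_access": {"reads": ["pii", "internal"], "writes": ["external_network"]},
        "impact_profile": {"state_changing": True, "external_communication": True},
        "required_controls": ["human_in_the_loop"],
    },
}

ctx = parse_security_context(send_customer_email)
```

Because the field is optional, the sketch returns `None` for tools that omit it, leaving the handling of undeclared tools to the consuming policy engine.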
**How This Enables Threat Modeling & Policy Enforcement**
With this metadata baked into the specification, an intercepting security harness or gateway can deterministically evaluate the blast radius of a planned execution graph before invoking the model:
Risk Score = func (Untrusted input, Sensitive Data Read, State Change)
If a session combines a tool labeled with trust_boundary: "source" (untrusted input) and another tool labeled with state_changing: true and reads: ["pii"], the orchestration engine can flag a Lethal Trifecta violation and step up authorization or block execution outright.
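The session-level check described above could be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: the function names, the `SENSITIVE_LABELS` set, the hypothetical `fetch_web_page` tool, and the rule that a declared `human_in_the_loop` control downgrades a block to a step-up are all choices made for this example. Treating `external_communication` as equivalent to a state change is also an assumption.

```python
# Assumption: which read labels count as sensitive is deployment policy,
# not something the proposal defines.
SENSITIVE_LABELS = {"pii"}


def _context(tool: dict) -> dict:
    """Return the tool's security_context, or an empty dict if undeclared."""
    return tool.get("security_context") or {}


def _changes_state(tool: dict) -> bool:
    impact = _context(tool).get("impact_profile", {})
    # Assumption: external communication counts toward the third leg.
    return bool(impact.get("state_changing") or impact.get("external_communication"))


def trifecta_legs(tools: list) -> dict:
    """Report which of the three risk properties are present across a session's tools."""
    legs = {"untrusted_input": False, "sensitive_read": False, "state_change": False}
    for tool in tools:
        ctx = _context(tool)
        if ctx.get("trust_boundary") == "source":
            legs["untrusted_input"] = True
        if SENSITIVE_LABELS & set(ctx.get("data_access", {}).get("reads", [])):
            legs["sensitive_read"] = True
        if _changes_state(tool):
            legs["state_change"] = True
    return legs


def evaluate_session(tools: list) -> str:
    """Return "allow", "step_up", or "block" for the combined tool set."""
    if not all(trifecta_legs(tools).values()):
        return "allow"
    # All three legs are present. Step up only if every state-changing tool
    # declares a human-in-the-loop control; otherwise block.
    risky = [t for t in tools if _changes_state(t)]
    if all("human_in_the_loop" in _context(t).get("required_controls", []) for t in risky):
        return "step_up"
    return "block"


# Hypothetical untrusted-input tool, invented for this example.
fetch_web_page = {
    "name": "fetch_web_page",
    "security_context": {
        "trust_boundary": "source",
        "data_access": {"reads": ["public_web"], "writes": []},
        "impact_profile": {"state_changing": False, "external_communication": False},
        "required_controls": [],
    },
}

# The tool from the Proposed Solution, reduced to its security metadata.
send_customer_email = {
    "name": "send_customer_email",
    "security_context": {
        "trust_boundary": "sink",
        "data_access": {"reads": ["pii", "internal"], "writes": ["external_network"]},
        "impact_profile": {"state_changing": True, "external_communication": True},
        "required_controls": ["human_in_the_loop"],
    },
}

print(evaluate_session([send_customer_email]))                  # allow
print(evaluate_session([fetch_web_page, send_customer_email]))  # step_up
```

Tools without a `security_context` contribute nothing to any leg in this sketch; a stricter harness might instead treat undeclared tools as worst case.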