
Feat: 🛠️ Add client-executed tools & human-in-the-loop tool approval flow #932

Open
vinitkadam03 wants to merge 8 commits into prism-php:main from vinitkadam03:feat/client-executed-tools-and-tool-approval

Conversation

@vinitkadam03 vinitkadam03 commented Mar 1, 2026

Summary

This PR introduces two features: client-executed tools, which are executed by the client/caller rather than by Prism, and a tool approval flow for tools that require explicit user consent before server-side execution.


1. Client Executed Tools

Motivation

Client-executed tools enable scenarios where tool execution must happen on the client side, such as:

  • Interactive user input - Rendering forms, confirmations, or option selectors based on the tool call parameters passed by the LLM, then continuing the conversation with the user's selection (similar to how AI coding assistants ask clarifying questions during agentic workflows)
  • Browser automation - Controlling UI elements, clicking buttons, or navigating pages
  • Frontend-only operations - Accessing browser APIs, local storage, or device capabilities
  • Any tool where the server should not (or cannot) execute the logic

Behavior:

  • Client-executed tools are skipped during tool execution
  • Server-executed tools in the same request are still executed normally
  • When client-executed tools are detected, execution stops and control is returned to the caller
  • The LLM is not called for the next turn, allowing the client to execute the tool and continue the conversation
  • Response/stream ends with FinishReason::ToolCalls
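
The behavior listed above can be sketched in isolation. This is a minimal, self-contained illustration and not Prism's actual implementation: `SketchTool`, `SketchToolCall`, and `partitionToolCalls()` are hypothetical names invented for this example. It shows server-side tools running normally while client-executed calls are collected as pending, which is the signal to stop before the next LLM turn.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-ins for illustration only; these are not Prism classes.
final class SketchTool
{
    public function __construct(
        public readonly string $name,
        public readonly ?Closure $handler = null,
        public readonly bool $clientExecuted = false,
    ) {}
}

final class SketchToolCall
{
    /** @param array<string, mixed> $arguments */
    public function __construct(
        public readonly string $id,
        public readonly string $name,
        public readonly array $arguments = [],
    ) {}
}

/**
 * Runs server-side tools and collects client-executed calls as pending.
 *
 * @param  array<string, SketchTool>  $tools  keyed by tool name
 * @param  list<SketchToolCall>  $toolCalls
 * @return array{results: array<string, string>, pending: list<string>}
 */
function partitionToolCalls(array $tools, array $toolCalls): array
{
    $results = [];
    $pending = [];

    foreach ($toolCalls as $call) {
        $tool = $tools[$call->name] ?? throw new InvalidArgumentException("Unknown tool: {$call->name}");

        if ($tool->clientExecuted) {
            // Skipped on the server; the caller is expected to run it.
            $pending[] = $call->id;
            continue;
        }

        // String keys in the arguments array are unpacked as named arguments.
        $results[$call->id] = ($tool->handler)(...$call->arguments);
    }

    return ['results' => $results, 'pending' => $pending];
}

$tools = [
    'weather' => new SketchTool('weather', fn (string $city): string => "Sunny in {$city}"),
    'browser_action' => new SketchTool('browser_action', clientExecuted: true),
];

$outcome = partitionToolCalls($tools, [
    new SketchToolCall('call_1', 'weather', ['city' => 'Paris']),
    new SketchToolCall('call_2', 'browser_action', ['action' => 'click']),
]);

// A non-empty pending list means: stop here and hand control back to the caller.
$shouldStop = $outcome['pending'] !== [];
```

In this sketch the server result for `call_1` is still produced even though `call_2` is pending, matching the "server-executed tools in the same request are still executed normally" bullet.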

Usage Example

use Prism\Prism\Facades\Tool;

// Explicit declaration (recommended)
$clientTool = Tool::as('browser_action')
    ->for('Perform an action in the user\'s browser')
    ->withStringParameter('action', 'The action to perform')
    ->clientExecuted();

// Implicit declaration (also works - omit using())
$clientTool = Tool::as('browser_action')
    ->for('Perform an action in the user\'s browser')
    ->withStringParameter('action', 'The action to perform');

2. Tool Approval Flow

Motivation

Tool approval enables scenarios where the server can execute a tool but should only do so after explicit user consent:

  • Destructive operations — File deletion, database mutations, account changes
  • Sensitive actions — Payments, sending emails, publishing content
  • Compliance requirements — Audit trails requiring explicit user authorization
  • Any tool where the server has the handler but needs human-in-the-loop approval before execution

The flow is stateless and operates in two phases.

Phase 1: Approval Request (stream stops)

When the LLM calls an approval-required tool, Prism emits a ToolApprovalRequestEvent and stops — returning control to the client.

Event chain (streaming):

StreamStartEvent → StepStartEvent → ToolCallEvent → ToolApprovalRequestEvent → StepFinishEvent → StreamEndEvent(ToolCalls)

Phase 2: Approval Resolution (tool executes, LLM continues)

The client sends a new request with messages containing ToolApprovalResponses. Before making the HTTP call to the LLM, resolveToolApprovalsAndYieldEvents() executes approved tools and creates denial results for denied ones. The LLM then continues the conversation with the tool results.

Event chain (streaming):

StreamStartEvent → ToolResultEvent → StepStartEvent → TextStartEvent → TextDeltaEvent → TextCompleteEvent → StepFinishEvent → StreamEndEvent(Stop)

Key behaviors:

  • StreamStartEvent is emitted from approval resolution (before the HTTP call), so the client knows the stream is live before tool results arrive
  • No duplicate StreamStartEvent — once emitted, the StreamState suppresses it from the subsequent HTTP stream
  • tool-output-available arrives without a prior tool-input-available in Phase 2 (that was sent in Phase 1)
  • Tool calls without any approval response default to denial
  • Denied tools yield a failed ToolResultEvent with the denial reason
  • A merged ToolResultMessage (existing results + resolved results) is placed on the request before calling the LLM
┌─────────────────────────────────────────────────────────────────┐
│                     REQUEST 1                                   │
│                                                                 │
│  Frontend ──► Prism ──► LLM                                     │
│                          │                                      │
│                          ▼                                      │
│                   LLM responds with                             │
│                   tool_calls: [                                 │
│                     { name: "delete_file", args: {path:...} }   │
│                   ]                                             │
│                          │                                      │
│                          ▼                                      │
│                    callTools()                                  │
│                          │                                      │
│               Tool has requiresApproval()?                      │
│                    │              │                             │
│                   NO             YES                            │
│                    │              │                             │
│                    ▼              ▼                             │
│             Execute tool    Emit tool-approval-request          │
│              normally       (instead of executing)              │
│                    │         Set hasPendingToolCalls = true     │
│                    │              │                             │
│                    └──────────────┤                             │
│                                   ▼                             │
│         Add AssistantMessage (with toolCalls)                   │
│         Add ToolResultMessage (server results only)             │
│                                   │                             │
│                                   ▼                             │
│         hasPendingToolCalls? ──YES──► STOP. Return response.    │
│                                       Process ENDS.             │
└─────────────────────────────────────────────────────────────────┘

          ⏳ Frontend shows approval UI to user...
          ⏳ User approves or denies...

┌─────────────────────────────────────────────────────────────────┐
│                  REQUEST 2                                      │
│                                                                 │
│  Frontend sends: full message history + tool-approval-response  │
│                          │                                      │
│                          ▼                                      │
│               Prism inspects last AssistantMessage              │
│               Finds pending tool_calls needing approval         │
│                    │              │                             │
│                APPROVED        DENIED (or skipped)              │
│                    │              │                             │
│                    ▼              ▼                             │
│             Execute tool    Add denial as                       │
│                    │        ToolResultMessage                   │
│                    ▼              │                             │
│         Add ToolResultMessage     │                             │
│         with actual result        │                             │
│                    │              │                             │
│                    └──────┬───────┘                             │
│                           ▼                                     │
│                  Send to LLM for further processing             │
└─────────────────────────────────────────────────────────────────┘
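
The Request 2 branch of the diagram can be sketched as a single function. This is a simplified assumption of how resolution might look, not the code in this PR: `resolveApprovalsSketch()` and its array shapes are hypothetical, and decisions are keyed by tool call ID to keep the example short. It shows approved calls being executed, denied calls producing a failed result, and calls with no response defaulting to denial.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch of Phase 2 resolution; not Prism's resolveToolApprovals().
 *
 * @param  list<array{id: string, name: string, arguments: array<string, mixed>}>  $pendingCalls
 * @param  array<string, bool>  $decisions  tool call ID => approved?
 * @param  array<string, callable>  $handlers  tool name => handler
 * @return array<string, array{ok: bool, output: string}>  results keyed by tool call ID
 */
function resolveApprovalsSketch(array $pendingCalls, array $decisions, array $handlers): array
{
    $results = [];

    foreach ($pendingCalls as $call) {
        // A missing decision is treated as a denial.
        $approved = $decisions[$call['id']] ?? false;

        $results[$call['id']] = $approved
            ? ['ok' => true, 'output' => (string) $handlers[$call['name']](...$call['arguments'])]
            : ['ok' => false, 'output' => 'Tool execution was denied by the user.'];
    }

    return $results;
}

$results = resolveApprovalsSketch(
    pendingCalls: [
        ['id' => 'call_1', 'name' => 'delete_file', 'arguments' => ['path' => '/tmp/a.txt']],
        ['id' => 'call_2', 'name' => 'delete_file', 'arguments' => ['path' => '/tmp/b.txt']],
        ['id' => 'call_3', 'name' => 'delete_file', 'arguments' => ['path' => '/tmp/c.txt']],
    ],
    decisions: ['call_1' => true, 'call_2' => false], // call_3 has no response at all
    handlers: ['delete_file' => fn (string $path): string => "Deleted: {$path}"],
);
```

Only `call_1` runs its handler here; `call_2` and `call_3` both end up as failed results that would be sent to the LLM in place of real output.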

Usage Example

use Prism\Prism\Facades\Tool;

// Static approval (always requires approval)
$tool = Tool::as('delete_file')
    ->for('Delete a file from the filesystem')
    ->withStringParameter('path', 'File path to delete')
    ->using(fn (string $path): string => "Deleted: {$path}")
    ->requiresApproval();

// Dynamic approval (closure receives tool arguments)
$tool = Tool::as('transfer')
    ->for('Transfer money')
    ->withNumberParameter('amount', 'Amount to transfer')
    ->using(fn (float $amount): string => "Transferred {$amount}")
    ->requiresApproval(fn (array $args): bool => $args['amount'] > 1000);
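
One way to picture how a static flag and a dynamic closure could share a single code path is the sketch below. `approvalRequired()` is a hypothetical helper written for this example; how Prism stores and evaluates the rule internally is not shown in this PR description.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: evaluates either a fixed flag or a per-call closure.
 *
 * @param  array<string, mixed>  $args  arguments of the tool call being checked
 */
function approvalRequired(bool|Closure $rule, array $args): bool
{
    return $rule instanceof Closure ? (bool) $rule($args) : $rule;
}

$overLimit = fn (array $args): bool => $args['amount'] > 1000;

$static = approvalRequired(true, ['path' => '/tmp/test.txt']); // always asks
$small = approvalRequired($overLimit, ['amount' => 500]);      // executes without asking
$large = approvalRequired($overLimit, ['amount' => 1500]);     // asks first
```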

Phase 2 continuation (client sends approval responses):

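// The Prism facade and ToolApprovalRequest imports were missing from the original snippet.
// The ToolApprovalRequest namespace is assumed to match ToolApprovalResponse.
use Prism\Prism\Facades\Prism;
use Prism\Prism\ValueObjects\Messages\AssistantMessage;
use Prism\Prism\ValueObjects\Messages\ToolResultMessage;
use Prism\Prism\ValueObjects\Messages\UserMessage;
use Prism\Prism\ValueObjects\ToolApprovalRequest;
use Prism\Prism\ValueObjects\ToolApprovalResponse;
use Prism\Prism\ValueObjects\ToolCall;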

$response = Prism::text()
    ->using('openai', 'gpt-4o')
    ->withTools([$tool])
    ->withMaxSteps(3)
    ->withMessages([
        new UserMessage('Delete /tmp/test.txt'),
        new AssistantMessage(
            content: '',
            toolCalls: [
                new ToolCall(id: 'call_123', name: 'delete_file', arguments: ['path' => '/tmp/test.txt']),
            ],
            toolApprovalRequests: [
                new ToolApprovalRequest(approvalId: 'call_123', toolCallId: 'call_123'),
            ],
        ),
        new ToolResultMessage([], [
            new ToolApprovalResponse(approvalId: 'call_123', approved: true),
        ]),
    ])
    ->asStream();

Breaking Changes

None. This is a backward-compatible addition. Existing tools with handlers continue to work exactly as before. Tools without requiresApproval() or clientExecuted() are unaffected.

@vinitkadam03 vinitkadam03 marked this pull request as draft March 1, 2026 13:14
@vinitkadam03 vinitkadam03 changed the title Feat/client executed tools and tool approval Feat: 🛠️ Add client-executed tools & human-in-the-loop tool approval flow Mar 1, 2026
@vinitkadam03 vinitkadam03 force-pushed the feat/client-executed-tools-and-tool-approval branch 2 times, most recently from 7483218 to c1840a7 Compare March 1, 2026 13:20
vinitkadam03 commented Mar 1, 2026

fixes: #921

@vinitkadam03 vinitkadam03 marked this pull request as ready for review March 1, 2026 13:23
@vinitkadam03 vinitkadam03 marked this pull request as draft March 1, 2026 16:19
@vinitkadam03 vinitkadam03 force-pushed the feat/client-executed-tools-and-tool-approval branch 3 times, most recently from 0dc55ee to a04633b Compare March 1, 2026 16:53
@vinitkadam03 vinitkadam03 marked this pull request as ready for review March 1, 2026 17:03
@vinitkadam03 vinitkadam03 force-pushed the feat/client-executed-tools-and-tool-approval branch from a04633b to ac64d0a Compare March 2, 2026 07:02
@emiliopedrollo

Can this work with a flow that uses previous_response_id and prompt instead of the full message history?

vinitkadam03 commented Mar 8, 2026

> Can this work with a flow that uses previous_response_id and prompt instead of the full message history?

@emiliopedrollo I haven't tested with previous_response_id specifically, but it should work. Instead of the full message history, you'd pass the ID of the last response that completed normally (that is, one that did not end in tool calls and stop the flow), along with only the AssistantMessage and ToolResultMessage:

$response = Prism::text()
    ->using('openai', 'gpt-4o')
    ->withTools([$tool])
    ->withMessages([
        new AssistantMessage('', $previousResponse->toolCalls, [], $previousResponse->toolApprovalRequests),
        new ToolResultMessage([], [new ToolApprovalResponse(approvalId: 'call_123', approved: true)]),
    ])
    // ID of the last response that finished normally (e.g. with "stop"), not one that ended in tool calls.
    ->withProviderOptions(['previous_response_id' => $lastCompleteResponse->meta->id])
    ->asText();

The AssistantMessage is still needed even with previous_response_id, because the resolveToolApprovals logic uses it internally to match tool calls against approval responses and execute approved tools before sending the request. OpenAI's Responses API handles deduplication on its end.

sixlive commented Mar 9, 2026

I had Claude do a first pass at this. Can you take a look and let me know what you think? I'm hyped on this feature.


Bug

Anthropic Structured handler missing resolveToolApprovals

Anthropic\Handlers\Structured::handle() does not call $this->resolveToolApprovals($this->request) at the start, unlike every other Text handler. If a structured request with tools requires approval, Phase 2 will silently fail to resolve approvals.


Architecture

approvalId === toolCallId everywhere

ToolApprovalRequest has two fields: approvalId and toolCallId. But every single call site sets them to the same value:

new ToolApprovalRequest(approvalId: $tc->id, toolCallId: $tc->id)

If they're always identical, why have two fields? The PR description mentions "Vercel AI SDK format" but Prism isn't the Vercel AI SDK. Is there a real use case where they diverge? I'd love to hear it. Otherwise we should collapse them into a single identifier.

ToolResultMessage is being overloaded as an approval transport

The constructor now accepts $toolApprovalResponses:

public function __construct(
    public readonly array $toolResults = [],
    public readonly array $toolApprovalResponses = []
) {}

This conflates two responsibilities. A ToolResultMessage should contain tool results. Using it to also carry approval responses back makes the message semantics confusing. The resolveToolApprovalsAndYieldEvents method then has to do pretty complex surgery on messages (removing and re-adding ToolResultMessage instances). A dedicated message type would make the protocol clearer and the approval resolution code simpler.

bool &$hasPendingToolCalls passed by reference through multiple layers

A mutable boolean threaded through callTools -> callToolsAndYieldEvents -> filterServerExecutedToolCalls is hard to trace. Consider returning a result object instead:

readonly class ToolExecutionResult {
    public function __construct(
        public array $toolResults,
        public array $approvalRequests,
        public bool $hasPendingToolCalls,
    ) {}
}

clientExecuted() sets $this->fn = null which is the same as the default state

isClientExecuted() checks $this->fn === null, which is also the initial state of a new Tool() before using() is called. So a tool where someone forgot to call using() is indistinguishable from an intentionally client-executed tool. The "implicit declaration" path documented in the PR is more likely to mask bugs than help.

I'd suggest using a dedicated bool $clientExecuted flag. Then a tool with no handler and no clientExecuted() call can throw at validation time rather than silently behaving as client-executed.
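
A minimal sketch of that suggestion, assuming nothing about Prism's real Tool class: `FlaggedTool` and `ensureConfigured()` are hypothetical names used only here. The point is that the flag and the handler are tracked separately, so "no handler and no flag" becomes a detectable misconfiguration.

```php
<?php

declare(strict_types=1);

// Hypothetical class for illustration; Prism's Tool class has a different, larger API.
final class FlaggedTool
{
    private ?Closure $fn = null;

    private bool $clientExecuted = false;

    public function __construct(public readonly string $name) {}

    public function using(Closure $fn): self
    {
        $this->fn = $fn;

        return $this;
    }

    public function clientExecuted(): self
    {
        $this->clientExecuted = true;

        return $this;
    }

    public function isClientExecuted(): bool
    {
        // Depends only on the explicit flag, never on the handler being null.
        return $this->clientExecuted;
    }

    /** Fails fast when a tool has neither a handler nor the explicit flag. */
    public function ensureConfigured(): void
    {
        if ($this->fn === null && ! $this->clientExecuted) {
            throw new LogicException("Tool [{$this->name}] has no handler. Call using() or clientExecuted().");
        }
    }
}

$client = (new FlaggedTool('browser_action'))->clientExecuted();
$client->ensureConfigured(); // passes: intent was declared explicitly

$forgotten = new FlaggedTool('delete_file'); // using() was never called
$threw = false;

try {
    $forgotten->ensureConfigured();
} catch (LogicException) {
    $threw = true;
}
```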

Repeated boilerplate across all providers

Every provider Text handler now has this identical block:

$hasPendingToolCalls = false;
$approvalRequests = [];
$toolResults = $this->callTools($request->tools(), $toolCalls, $hasPendingToolCalls, $approvalRequests);

$toolApprovalRequests = array_map(
    fn (ToolCall $tc): ToolApprovalRequest => new ToolApprovalRequest(
        approvalId: $tc->id, toolCallId: $tc->id
    ),
    $approvalRequests,
);

This ToolCall[] -> ToolApprovalRequest[] mapping should live in callTools itself (or a helper on the trait). The providers shouldn't need to know about this transformation.


Security

Dynamic approval closures receive raw LLM arguments

->requiresApproval(fn (array $args): bool => $args['amount'] > 1000)

The $args come directly from LLM output (parsed JSON). The documentation should explicitly warn that these arguments are untrusted LLM output and should be validated before use in any sensitive context.
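
To make the risk concrete: with the closure above, a call that omits `amount` evaluates `null > 1000`, which is false in PHP, so the tool would run without approval. The sketch below shows one defensive alternative that fails closed. It is an illustration written for this comment, not code from the PR, and `$needsApproval` is a hypothetical name.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical approval predicate for a transfer tool.
 * Returns true when approval is required; anything unexpected also requires approval.
 *
 * @param  array<string, mixed>  $args  raw, untrusted arguments parsed from LLM output
 */
$needsApproval = function (array $args): bool {
    $amount = $args['amount'] ?? null;

    // Fail closed: a missing or non-numeric amount always goes to a human.
    if (! is_int($amount) && ! is_float($amount)) {
        return true;
    }

    // NAN, INF, zero, and negative amounts are also treated as suspicious.
    if (! is_finite((float) $amount) || $amount <= 0) {
        return true;
    }

    return $amount > 1000;
};
```

The closure could then be passed as `->requiresApproval($needsApproval)`, using the same builder call shown in the PR description.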


Style

ToolResultMessage default changed from required to optional

Making $toolResults default to [] enables new ToolResultMessage with no args (used in the approval flow). But it also means code that used to require tool results now silently accepts empty messages. This is a subtle contract change worth being intentional about.

Vercel SDK references in internal docblocks

/**
 * Vercel AI SDK compatible format: { approvalId, type: 'tool-approval-response', approved }
 */

Prism's internal value objects shouldn't couple their documentation to third-party protocols. If Vercel compatibility is a goal, note it in the docs rather than the class docblock.


Testing

Tests cover the happy paths well for client-executed tools, mixed tools, and streaming. I'd like to see coverage for:

  • Phase 2 approval resolution with mixed approved + denied tools in the same request
  • A tool with requiresApproval(fn ($args) => ...) where the closure returns false (tool should execute normally)
  • Approval resolution when the message history is malformed (no AssistantMessage, or ToolResultMessage before AssistantMessage)
  • The Structured handler path for approval tools
  • Concurrent tool execution where some tools require approval and some are concurrent-capable

Summary

Priority  Issue
Bug       Anthropic Structured handler missing resolveToolApprovals
High      Replace $fn === null check with explicit $clientExecuted flag
High      Collapse approvalId/toolCallId or justify the distinction
High      Extract approval boilerplate from provider handlers into the trait
Medium    Consider a dedicated approval message type instead of overloading ToolResultMessage
Medium    Replace bool &$hasPendingToolCalls with a result object
Medium    Add edge-case tests listed above
Low       Remove Vercel SDK references from internal docblocks
Low       Document that approval closure args are untrusted LLM output

vinitkadam03 commented Mar 10, 2026

Bug: Anthropic Structured handler missing resolveToolApprovals
Good catch! This was actually missing from all four Structured handlers (Anthropic, OpenAI, Gemini, OpenRouter), not just Anthropic. Added resolveToolApprovals at the top of handle() in each, along with Phase 2 tests to cover the flow.

High: Replace $fn === null check with explicit $clientExecuted flag
Addressed -- isClientExecuted() now checks a dedicated bool $clientExecuted flag. A tool where using() was forgotten is no longer silently treated as client-executed. It's also caught early via Tool::ensureRunnable() in toRequest(), so a misconfigured tool throws immediately at request build time with a clear message. Removed the "Implicit Declaration" section from the docs and updated all tests to use explicit ->clientExecuted().

High: Collapse approvalId/toolCallId or justify the distinction
Good call -- Initially I kept the approvalId the same as the toolCallId since it's already unique. But separating them gives us a clearer distinction -- the toolCallId identifies the LLM's tool call, while the approvalId identifies the approval request/response flow. They have different lifecycles and could diverge. approvalId is now generated independently via EventID::generate('apr'), producing a Prism-owned ID (e.g. apr_01J5X...) distinct from the LLM-provided toolCallId. This also future-proofs re-approval scenarios where the same toolCallId could have multiple approval requests with different approvalIds.

High: Extract approval boilerplate from provider handlers into the trait
Addressed as part of the approvalId separation changes -- callTools() now returns ToolApprovalRequest[] directly, so the array_map boilerplate is gone from all providers.

Medium: Consider a dedicated approval message type instead of overloading ToolResultMessage
Kept unified for now -- both approval responses and tool results relate to the same assistant turn's tool calls and occupy the same conversational slot. Approval responses are transient (they become tool results after resolution), and the resolution complexity comes from the workflow itself rather than the message type. Was thinking of renaming ToolResultMessage to ToolMessage but that will be a breaking change. Definitely open to revisiting if you think otherwise.

Medium: Replace bool &$hasPendingToolCalls with a result object
PHP generators do support return values, so technically feasible. That said, all three methods live in the CallsTools trait and $hasPendingToolCalls is only ever set in one place (filterServerExecutedToolCalls), so the scope stays contained. The refactor would touch 18+ handler files for a modest readability win. Keeping the pass-by-reference approach for now, but worth revisiting if the trait grows more complex. Open to revisiting this if you think otherwise.
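
For reference, the generator-return alternative might look roughly like the sketch below. It is not part of this PR: `ToolRunOutcome` and `runToolsSketch()` are hypothetical names, and events are plain strings to keep it short. It relies only on `Generator::getReturn()`, which is available once the generator has finished.

```php
<?php

declare(strict_types=1);

// Hypothetical result object, loosely mirroring the shape suggested in the review.
final class ToolRunOutcome
{
    /**
     * @param  array<string, string>  $toolResults
     * @param  list<string>  $pendingToolCallIds
     */
    public function __construct(
        public readonly array $toolResults,
        public readonly array $pendingToolCallIds,
    ) {}

    public function hasPendingToolCalls(): bool
    {
        return $this->pendingToolCallIds !== [];
    }
}

/**
 * Yields one event string per tool call, then returns the aggregate outcome.
 *
 * @param  array<string, bool>  $calls  tool call ID => needs approval?
 * @return Generator<int, string, mixed, ToolRunOutcome>
 */
function runToolsSketch(array $calls): Generator
{
    $results = [];
    $pending = [];

    foreach ($calls as $id => $needsApproval) {
        if ($needsApproval) {
            $pending[] = $id;
            yield "approval-request:{$id}";

            continue;
        }

        $results[$id] = 'done';
        yield "tool-result:{$id}";
    }

    return new ToolRunOutcome($results, $pending);
}

$generator = runToolsSketch(['call_1' => false, 'call_2' => true]);
$events = iterator_to_array($generator, false);
$outcome = $generator->getReturn(); // only valid after the generator has completed
```

The trade-off is the one described above: every caller has to drain the generator before reading the outcome, which is why this would touch each handler.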

Medium: Add edge-case tests
Two of the five were already covered (mixed approved+denied in Phase 2, and dynamic closure returning false). Added the missing three: malformed message history edge cases (no AssistantMessage, ToolResultMessage before AssistantMessage), and concurrent tools mixed with approval-required tools. The structured handler path is covered by provider-level integration tests across Anthropic, OpenAI, Gemini, and OpenRouter.

Low: Remove Vercel SDK references from internal docblocks
Removed!

Low: Document that approval closure args are untrusted LLM output
Fair thought, though this applies to all tool arguments in general, not just the approval closure. Feels like it might be out of scope for this PR, but happy to hear thoughts!

…th approval, and request payload validation fix
@vinitkadam03 vinitkadam03 force-pushed the feat/client-executed-tools-and-tool-approval branch from ebf2088 to aa3c746 Compare March 10, 2026 20:50