diff --git a/README.md b/README.md
index aace534..814f333 100644
--- a/README.md
+++ b/README.md
@@ -141,7 +141,83 @@ Because of their special behavior of being preserved on context window overflow,
 The Prompt API supports **tool use** via the `tools` option, allowing you to define external capabilities that a language model can invoke in a model-agnostic way. Each tool is represented by an object that includes an `execute` member that specifies the JavaScript function to be called. When the language model initiates a tool use request, the user agent calls the corresponding `execute` function and sends the result back to the model.
 
-Here’s an example of how to use the `tools` option:
+There are two tool use modes: with automatic execution (closed loop) and without automatic execution (open loop).
+
+Regardless of which mode is used, the session creation and appending signatures are the same. Here’s an example:
+
+```js
+const session = await LanguageModel.create({
+  initialPrompts: [
+    {
+      role: "system",
+      content: `You are a helpful assistant. You can use tools to help the user.`,
+    },
+  ],
+  tools: [
+    {
+      name: "getWeather",
+      description: "Get the weather in a location.",
+      inputSchema: {
+        type: "object",
+        properties: {
+          location: {
+            type: "string",
+            description: "The city to check for the weather condition.",
+          },
+        },
+        required: ["location"],
+      },
+    },
+  ],
+});
+```
+
+In this example, the `tools` array defines a `getWeather` tool, specifying its name, description, and input schema.
+
+Few-shot examples of tool use can be appended like so:
+
+```js
+await session.append([
+  {role: "user", content: "What is the weather in Seattle?"},
+  {role: "tool-call", content: {type: "tool-call", value: {callID: "get_weather_1", name: "getWeather", arguments: {location: "Seattle"}}}},
+  {role: "tool-response", content: {type: "tool-response", value: {callID: "get_weather_1", name: "getWeather", result: [{type: "object", value: {temperature: "55F", humidity: "67%"}}]}}},
+  {role: "assistant", content: "The temperature in Seattle is 55F and humidity is 67%."},
+]);
+```
+
+Note that `role` and `type` now also support `"tool-call"` and `"tool-response"`.
+`content.result` is a list of dictionaries with `type` and `value` members, where `type` is one of `{"text", "image", "audio", "object"}` and `value` is `any`.
+
+#### Open loop
+
+The open loop is enabled by specifying `"tool-call"` in `expectedOutputs` when the session is created.
+
+When a tool needs to be called, the API will return an object with `callID` (a unique identifier for this tool call), `name` (the name of the tool), and `arguments` (the inputs to the tool), and the client is expected to handle the tool execution and append the tool result back to the session. `arguments` is a dictionary conforming to the JSON input schema in the tool's declaration; if the input schema type is not `"object"`, the value will be wrapped in a key.
+
+Example:
+
+```js
+// `options` is the session creation options shown above, including the tool declarations.
+const sessionOptions = structuredClone(options);
+sessionOptions.expectedOutputs ??= [];
+sessionOptions.expectedOutputs.push({type: "tool-call"});
+const session = await LanguageModel.create(sessionOptions);
+
+let result = await session.prompt("What is the weather in Seattle?");
+if (result.type === "tool-call") {
+  if (result.name === "getWeather") {
+    // getWeather() here is the developer's own implementation of the tool.
+    const toolResult = getWeather(result.arguments.location);
+    result = await session.prompt([{role: "tool-response", content: {type: "tool-response", value: {callID: result.callID, name: result.name, result: [{type: "object", value: toolResult}]}}}]);
+  }
+} else {
+  console.log(result);
+}
+```
+
+Note that the tool-response is always required to immediately follow the tool-call generated by the model.
+
+#### Closed loop
+
+To enable automatic execution, add an `execute` function with each tool's implementation, and add a `toolUseConfig` to indicate that execution is enabled and to cap the number of tool calls invoked in a single generation:
 
 ```js
 const session = await LanguageModel.create({
@@ -171,13 +247,41 @@ const session = await LanguageModel.create({
       return JSON.stringify(await res.json());
     },
   }
-  ]
+  ],
+  toolUseConfig: {enabled: true},
 });
 
 const result = await session.prompt("What is the weather in Seattle?");
 ```
 
-In this example, the `tools` array defines a `getWeather` tool, specifying its name, description, input schema, and `execute` implementation. When the language model determines that a tool call is needed, the user agent invokes the `getWeather` tool's `execute()` function with the provided arguments and returns the result to the model, which can then incorporate it into its response.
+When the language model determines that a tool call is needed, the user agent invokes the `getWeather` tool's `execute()` function with the provided arguments and returns the result to the model, which can then incorporate it into its response.
+
+#### Do I need auto execution?
+
+In general, automatic execution is suitable for use cases where the model quality is good enough via prompt tuning. That can mean either that you can tolerate certain mistakes the model makes when making tool calls, or that the task is simple enough for the model to handle (e.g., just a few distinct tools, short and clean tool output, a short context window, etc.).
+
+On the other hand, the open loop allows more flexibility for intercepting at various points in the planner loop (the reason -> action -> observation loop), where you can inject your business logic programmatically.
+
+Here are a few patterns where the open loop would be useful (a combined sketch follows the list):
+
+1) Context management
+
+If your session goes through a long chain of content, and the previous tool results are no longer important or relevant for your use case, the open loop gives you the flexibility of editing and recreating the session in the middle of a tool call. You can manually compress and modify the history, and recreate a new session with less content.
+
+For example, for a shopping agent, your tool keeps track of a live shopping cart, but only the latest cart status is important. When there have been multiple rounds of cart updates, you might need to compress the tool call history to avoid exceeding the context window and to improve latency and quality.
+
+2) Conditional loop breaking
+
+If your business logic requires some determinism in certain critical states, the open loop allows you to exit the planner loop early and output a pre-determined action.
+
+For example, for a shopping agent, you might be required to get an explicit confirmation before placing the order. Whenever the tool `"place_order"` is called for the first time, you want to exit the planner loop immediately and display a verbatim message to the user.
+
+3) Conditional constraints
+
+In automatic execution, the planner loop decodes multiple times. If you need to supply constraints dynamically, you'd use the open loop API and control the planner loop yourself. Because the closed loop API runs the entire loop behind the scenes, it doesn't have a natural way to supply a different constraint for each LLM step.
+
+For example, you might want the model to always call tool `FOO` after tool `BAR` is called, or you might want the model to always generate text only, with some prefix, after tool `FOO` is called.
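+
+Putting these patterns together, here is a rough, illustrative sketch (not part of the proposed API surface) of what an open-loop planner loop could look like. `executeTool`, `confirmOrderWithUser`, and `compressHistory` are hypothetical application helpers, and the input quota check is just one possible trigger for recreating the session:
+
+```js
+async function runPlannerLoop(session, toolDeclarations, userMessage) {
+  let result = await session.prompt(userMessage);
+  while (result.type === "tool-call") {
+    // 2) Conditional loop breaking: always confirm before placing an order.
+    if (result.name === "place_order") {
+      return confirmOrderWithUser(result.arguments);
+    }
+    // Ordinary step: run the tool and feed the result back as a tool-response.
+    const toolResult = await executeTool(result.name, result.arguments);
+    result = await session.prompt([{
+      role: "tool-response",
+      content: {
+        type: "tool-response",
+        value: {
+          callID: result.callID,
+          name: result.name,
+          result: [{type: "object", value: toolResult}],
+        },
+      },
+    }]);
+    // 1) Context management: when the context is nearly full, rebuild the session
+    // from a compressed summary of the earlier tool calls (compressHistory would
+    // need to preserve any still-pending tool call).
+    if (session.inputUsage > 0.8 * session.inputQuota) {
+      session = await LanguageModel.create({
+        initialPrompts: await compressHistory(session),
+        tools: toolDeclarations,
+        expectedOutputs: [{type: "tool-call"}],
+      });
+    }
+  }
+  return result; // Plain text response; no further tool calls requested.
+}
+```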
+
 #### Concurrent tool use
diff --git a/index.bs b/index.bs
index 7f7bdb1..37c8ebc 100644
--- a/index.bs
+++ b/index.bs
@@ -39,8 +39,11 @@ interface LanguageModel : EventTarget {
   static Promise<Availability> availability(optional LanguageModelCreateCoreOptions options = {});
   static Promise<LanguageModelParams?> params();
 
+  // The return type of the prompt() method and similar methods.
+  typedef (DOMString or sequence<LanguageModelToolCall>) LanguageModelPromptResult;
+
   // These will throw "NotSupportedError" DOMExceptions if role = "system"
-  Promise<DOMString> prompt(
+  Promise<LanguageModelPromptResult> prompt(
     LanguageModelPrompt input,
     optional LanguageModelPromptOptions options = {}
   );
@@ -80,13 +83,11 @@ interface LanguageModelParams {
 callback LanguageModelToolFunction = Promise<any> (any... arguments);
 
 // A description of a tool call that a language model can invoke.
-dictionary LanguageModelTool {
+dictionary LanguageModelToolDeclaration {
   required DOMString name;
   required DOMString description;
   // JSON schema for the input parameters.
   required object inputSchema;
-  // The function to be invoked by user agent on behalf of language model.
-  required LanguageModelToolFunction execute;
 };
 
 dictionary LanguageModelCreateCoreOptions {
@@ -97,7 +98,7 @@ dictionary LanguageModelCreateCoreOptions {
 
   sequence<LanguageModelExpected> expectedInputs;
   sequence<LanguageModelExpected> expectedOutputs;
-  sequence<LanguageModelTool> tools;
+  sequence<LanguageModelToolDeclaration> tools;
 };
 
 dictionary LanguageModelCreateOptions : LanguageModelCreateCoreOptions {
@@ -148,16 +149,52 @@ dictionary LanguageModelMessageContent {
   required LanguageModelMessageValue value;
 };
 
-enum LanguageModelMessageRole { "system", "user", "assistant" };
+enum LanguageModelMessageRole { "system", "user", "assistant", "tool-call", "tool-response" };
 
-enum LanguageModelMessageType { "text", "image", "audio" };
+enum LanguageModelMessageType { "text", "image", "audio", "tool-call", "tool-response" };
 
 typedef (
   ImageBitmapSource
   or AudioBuffer
   or BufferSource
   or DOMString
+  or LanguageModelToolCall
+  or LanguageModelToolResponse
 ) LanguageModelMessageValue;
+
+// The definitions of the `LanguageModelToolCall` and `LanguageModelToolResponse` values.
+enum LanguageModelToolResultType { "text", "image", "audio", "object" };
+
+dictionary LanguageModelToolResultContent {
+  required LanguageModelToolResultType type;
+  required any value;
+};
+
+// Represents a tool call requested by the language model.
+dictionary LanguageModelToolCall {
+  required DOMString callID;
+  required DOMString name;
+  object arguments;
+};
+
+// Successful tool execution result.
+dictionary LanguageModelToolSuccess {
+  required DOMString callID;
+  required DOMString name;
+  required sequence<LanguageModelToolResultContent> result;
+};
+
+// Failed tool execution result.
+dictionary LanguageModelToolError {
+  required DOMString callID;
+  required DOMString name;
+  required DOMString errorMessage;
+};
+
+// The response from executing a tool call: either success or error.
+typedef (LanguageModelToolSuccess or LanguageModelToolError) LanguageModelToolResponse;
+

Prompt processing