Skip to content

section9-lab/SwiftHarnessAgent

Repository files navigation

SwiftHarnessAgent icon

SwiftHarnessAgent

A Swift agent runtime for embedding coding agents into macOS and iOS apps.

Swift 5.10 Platforms SwiftPM License: MIT GitHub Stars

A multi-step agent loop, sandboxed file & shell tools, skill loading, and context compaction — with first-class OpenAI and Anthropic backends. Mirrors the Anthropic / oh-my-pi tool conventions (read, edit, bash, todo_write, ask, task subagents) so prompts and skills port cleanly between harnesses.

Not an OpenAI SDK wrapper. If you want client.chat(...), use a thinner library. If you want a coding agent that reads files, edits code, and runs shell commands under your policy — keep reading.

import SwiftHarnessAgent

let workspace = URL(fileURLWithPath: ".")

let agent = AgentSDK(
    client: EchoClient(),                               // swap with OpenAIChatCompletionsClient / AnthropicMessagesClient
    modelName: "echo",
    workingDirectory: workspace,
    executionPolicy: ToolExecutionPolicy(workingDirectory: workspace)
)

let result = try await agent.run(prompt: "Read README.md and summarize")
print(result.finalText)

Architecture

SwiftHarnessAgent is a monorepo containing two Swift Package Manager products:

  • SwiftAISDK — provider-agnostic LLM client layer. Use this directly if you only want OpenAI / Anthropic API access without the agent runtime.
  • SwiftHarnessAgent — coding-agent runtime built on top of SwiftAISDK. Includes the multi-step loop, tools, skills, subagents, and compaction.

Both products live in the same Package.swift and share a single Package.resolved, following the pattern used by swift-collections and swift-async-algorithms.

SwiftAISDK

Provider-agnostic LLM client with rich content-block support:

  • LLMClient protocol — implement for any backend
  • LLMMessage — role + array of typed content blocks (text, image, reasoning, toolUse, toolResult, refusal)
  • LLMContentBlock — preserves reasoning signatures (Anthropic extended thinking), encrypted reasoning (OpenAI Responses), and tool-use metadata across turns
  • Built-in clients:
    • EchoClient — local stub, no API key needed
    • OpenAIChatCompletionsClient/v1/chat/completions (OpenAI, NVIDIA NIM, vLLM, Ollama, Together, Groq, etc.)
    • OpenAIResponsesClient/v1/responses (OpenAI's newer protocol with reasoning persistence and server-side state)
    • AnthropicMessagesClient/v1/messages with extended thinking + signature round-tripping

The SDK's content-block model is lossless — reasoning blocks with signatures (required for Anthropic extended-thinking + tool-use multi-turn correctness) and encrypted reasoning (OpenAI Responses) are preserved verbatim across turns.

SwiftHarnessAgent

Coding-agent runtime:

  • AgentSDK — main entry, assembles client + tools + policy + skills + compaction
  • AgentLoop — the multi-step reasoning loop
  • ToolExecutionPolicy — file allow-roots and bash sandboxing (disabled / sandboxed / unrestricted)
  • ReadTool / WriteTool / EditTool / BashTool
  • TodoStore + TodoWriteTool — phased task tracking with a live phasesStream()
  • AskTool — interactive user prompts via an AskHandler closure
  • TaskCoordinator + SubagentDefinition — parallel subagent fan-out
  • SkillLoader — load SKILL.md directories into reusable skill definitions
  • CompactionConfig — summarize older context to keep long histories bounded

Quick Start

1. Add the package to Package.swift:

dependencies: [
    .package(url: "https://github.com/section9-lab/SwiftHarnessAgent", from: "1.0.0")
]

2. Add the product to your target:

.target(
    name: "YourApp",
    dependencies: [
        .product(name: "SwiftHarnessAgent", package: "SwiftHarnessAgent")
        // or .product(name: "SwiftAISDK", package: "SwiftHarnessAgent") for client-only
    ]
)

3. Run the snippet aboveEchoClient needs no API key, so the agent boots immediately. Swap it for a real backend when you are ready (see Recipes).

How it differs

vs SwiftAgent (1amageek)

SwiftAgent is a SwiftUI-style declarative DSL for composing LLM workflows on top of Apple's FoundationModels. You describe pipelines as Step values inside a body and the framework synthesizes run(_:).

SwiftHarnessAgent is a coding-agent runtime — a multi-step loop with built-in file/edit/bash tools, sandboxing, skill loading, subagents, and context compaction.

You want to... Use
Compose declarative LLM pipelines (Transform / Map / Race / Gate) SwiftAgent
Ship on iOS 26 / macOS 26 with Apple FoundationModels first SwiftAgent
Embed a coding agent that reads & edits files and runs shell commands SwiftHarnessAgent
Target OpenAI or Anthropic as a first-class backend SwiftHarnessAgent
Ship today on iOS 17 / macOS 13, Swift 5.10 SwiftHarnessAgent
Reuse Anthropic / oh-my-pi tool conventions (skills, todos, subagents) SwiftHarnessAgent

The two are not really competitors — SwiftAgent treats LLMs as a declarative computation primitive; SwiftHarnessAgent treats them as the brain of an autonomous tool-using agent.

Recipes

OpenAI Chat Completions (and compatible endpoints)

let client = OpenAIChatCompletionsClient(
    baseURL: URL(string: "https://api.openai.com/v1")!,
    apiKey: ProcessInfo.processInfo.environment["OPENAI_API_KEY"]
)

let agent = AgentSDK(
    client: client,
    modelName: "gpt-4o",
    workingDirectory: workspace,
    executionPolicy: ToolExecutionPolicy(
        workingDirectory: workspace,
        bash: .disabled
    ),
    maxSteps: 8
)

Works with OpenAI, NVIDIA NIM, vLLM, sglang, Ollama's OpenAI shim, Together, Groq, and any other /v1/chat/completions endpoint.

OpenAI Responses API (with reasoning persistence)

let client = OpenAIResponsesClient(
    baseURL: URL(string: "https://api.openai.com/v1")!,
    apiKey: ProcessInfo.processInfo.environment["OPENAI_API_KEY"],
    includeEncryptedReasoning: true
)

let agent = AgentSDK(
    client: client,
    modelName: "gpt-5.2",
    workingDirectory: workspace,
    executionPolicy: ToolExecutionPolicy(workingDirectory: workspace)
)

The Responses API is OpenAI's newer protocol with reasoning persistence and server-side state. Set includeEncryptedReasoning: true to thread reasoning across turns (required for o1 / o3 / gpt-5 style models when you want reasoning continuity).

Anthropic Messages API (with extended thinking)

let client = AnthropicMessagesClient(
    apiKey: ProcessInfo.processInfo.environment["ANTHROPIC_API_KEY"],
    thinkingBudgetTokens: 10_000  // enable extended thinking with 10k token budget
)

let agent = AgentSDK(
    client: client,
    modelName: "claude-sonnet-4-6",
    workingDirectory: workspace,
    executionPolicy: ToolExecutionPolicy(workingDirectory: workspace, bash: .disabled)
)

Extended thinking blocks with signatures are automatically preserved across turns (required for Anthropic extended-thinking + tool-use multi-turn correctness).

Custom tool

struct CurrentTimeTool: AgentTool {
    let name = "current_time"
    let description = "Returns the current local time"
    let argumentSchemaJSON = #"{"type":"object","properties":{}}"#

    func run(argumentsJSON: String, context: ToolExecutionContext) async throws -> String {
        Date().formatted(date: .omitted, time: .standard)
    }
}

let agent = AgentSDK(
    client: client,
    modelName: "gpt-4o",
    tools: [CurrentTimeTool()],
    workingDirectory: workspace,
    executionPolicy: ToolExecutionPolicy(workingDirectory: workspace)
)

Skills from disk

let agent = AgentSDK(
    client: client,
    modelName: "gpt-4o",
    workingDirectory: workspace,
    executionPolicy: ToolExecutionPolicy(workingDirectory: workspace),
    skillsDirectories: [URL(fileURLWithPath: "/path/to/.skills")]
)

Todos, Ask, and Subagents (oh-my-pi parity)

Three opt-in tools. Each is enabled by passing its dependency to AgentSDK; otherwise it stays off and is not advertised to the model.

// todo_write — phased task tracking
let todoStore = TodoStore()
Task {
    for await phases in await todoStore.phasesStream() {
        print("phases:", phases.map(\.name))   // render to your UI
    }
}

// ask — clarifying questions
let askHandler: AskHandler = { questions in
    questions.map { q in
        AskAnswer(id: q.id, selections: [q.options[q.recommended ?? 0]])
    }
}

// task — parallel subagents
let explorer = SubagentDefinition(
    id: "explore",
    displayName: "Explorer",
    description: "Read-only investigator that returns compressed context",
    systemPrompt: "You are a read-only codebase scout. Return concise findings.",
    tools: [ReadTool()],
    maxSteps: 8
)

let coordinator = TaskCoordinator(
    definitions: [explorer],
    clientFactory: { _ in
        (
            OpenAIChatCompletionsClient(
                baseURL: URL(string: "https://api.openai.com/v1")!,
                apiKey: ProcessInfo.processInfo.environment["OPENAI_API_KEY"]
            ),
            "gpt-4o-mini"
        )
    },
    workingDirectory: workspace,
    executionPolicy: ToolExecutionPolicy(workingDirectory: workspace),
    maxConcurrency: 4
)

let agent = AgentSDK(
    client: client,
    modelName: "gpt-4o",
    workingDirectory: workspace,
    executionPolicy: ToolExecutionPolicy(workingDirectory: workspace),
    todoStore: todoStore,
    askHandler: askHandler,
    taskCoordinator: coordinator
)

Subagents inherit the parent's working directory and execution policy. Each task carries its own assignment; an optional context is shared across the batch. A failure in one task does not abort the rest — it is reported as a failed result.

Core Concepts

  • AgentSDK — assembles the client, model name, tools, skills, execution policy, working directory, and compaction settings into a runnable agent.
  • LLMClient — implement for your own backend, or use OpenAIChatCompletionsClient / OpenAIResponsesClient / AnthropicMessagesClient / EchoClient.
  • Tools — implement AgentTool to expose capabilities. Every tool runs with a ToolExecutionContext carrying the working directory and the effective execution policy.
  • ToolExecutionPolicy — separates file access (scoped via allowed roots) from shell execution (disabled, sandboxed via sandbox-exec, or unrestricted). read caps file size by default to keep tool output from blowing past the model's context window.
  • SkillLoader — scans directories for SKILL.md files and turns them into reusable skill definitions injected into the agent.
  • CompactionConfig — summarize older turns before context grows out of bounds, so long-running conversations stay tractable.

Parallel Tool Calls (Barrier Scheduler)

When parallelToolCalls: true is set in AgentLoopConfig, the agent loop uses a barrier-based scheduler inspired by oh-my-pi:

  • .shared tools (default — read, search, find, ast_grep) run concurrently within a batch.
  • .exclusive tools (write, edit, bash, todo_write) act as barriers: the pending shared batch drains first, then the exclusive tool runs alone, then accumulation resumes.
  • Results always return in the original tool_use order.
let config = AgentLoopConfig(
    workingDirectory: workspace,
    parallelToolCalls: true  // Enable barrier scheduler
)

Custom tools opt into the right mode via the concurrency property:

struct MyReadOnlyTool: AgentTool {
    // ...
    var concurrency: ToolConcurrency { .shared }    // default, can omit
}

struct MyWriteTool: AgentTool {
    // ...
    var concurrency: ToolConcurrency { .exclusive } // serialized
}

Non-Goals

  • Not a UI framework. Bring your own SwiftUI / AppKit / UIKit layer — TodoStore.phasesStream() and AskHandler exist precisely so the runtime stays headless.
  • Not a single-shot LLM SDK. For client.chat(...)-style calls, a thinner library will serve you better (or use SwiftAISDK directly).
  • Not a declarative DSL. If you want body { Transform; GenerateText; ... }, see SwiftAgent.
  • bash is macOS-only (uses sandbox-exec). On other platforms it raises an explicit error rather than silently downgrading sandboxing.

Testing

swift test
swift run SwiftHarnessAgentExample

All 67 tests pass.

Star History

Star History Chart

License

MIT — see LICENSE.

About

A Swift agent runtime for embedding coding agents into macOS and iOS apps.

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages