feat: add /swarm parallel agent-swarm orchestration #208
🦋 Changeset detected. Latest commit: a17cfee. The changes in this PR will be included in the next version bump. This PR includes changesets to release 3 packages.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fc5e4bf787
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
    if (typeof parsed !== 'object' || parsed === null) return null;

    const subtasksRaw = (parsed as { subtasks?: unknown }).subtasks;
    if (!Array.isArray(subtasksRaw) || subtasksRaw.length === 0) return null;
Enforce the planner's subtask cap
When the planner returns valid JSON with more than the prompted maximum of 6 subtasks, this accepts the entire array; SwarmCoordinator.runWave then iterates every entry and spawns a subagent for each one, only limiting concurrent workers to 4. In the common failure mode where the LLM ignores the cap or emits a large accidental list, /swarm can launch dozens of subagents and burn substantial time/tokens instead of retrying or rejecting the invalid plan. Please validate subtasksRaw.length <= 6 here (or truncate deliberately) before spawning workers.
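A minimal sketch of the validation this review asks for. Only the cap of 6 and the `subtasks` shape come from the PR; the names `parsePlan`, `Subtask`, and `MAX_SUBTASKS` are hypothetical. It rejects an oversized plan instead of truncating it, on the assumption that a `null` return triggers the existing planner retry:

```typescript
// Hypothetical names: MAX_SUBTASKS, Subtask, and parsePlan are illustrative only.
const MAX_SUBTASKS = 6; // the maximum the planner prompt asks for

interface Subtask {
  role: string;
  systemPrompt: string;
  prompt: string;
}

// Returns the parsed subtasks, or null when the plan is invalid.
function parsePlan(raw: string): Subtask[] | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof parsed !== 'object' || parsed === null) return null;

  const subtasksRaw = (parsed as { subtasks?: unknown }).subtasks;
  if (!Array.isArray(subtasksRaw) || subtasksRaw.length === 0) return null;
  // Reject oversized plans before any worker is spawned.
  if (subtasksRaw.length > MAX_SUBTASKS) return null;

  const subtasks: Subtask[] = [];
  for (const item of subtasksRaw) {
    if (typeof item !== 'object' || item === null) return null;
    const o = item as Record<string, unknown>;
    const role = o['role'];
    const systemPrompt = o['systemPrompt'];
    const prompt = o['prompt'];
    if (
      typeof role !== 'string' ||
      typeof systemPrompt !== 'string' ||
      typeof prompt !== 'string'
    ) {
      return null;
    }
    subtasks.push({ role, systemPrompt, prompt });
  }
  return subtasks;
}
```

Truncating to the first six entries would also bound the spend, but it silently discards part of a plan the model considered necessary, which is why this sketch rejects instead.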
        return;
      }
      try {
        await session.prompt(buildSwarmPrompt(task));
Handle sessions whose active tools lack Swarm
This directly prompts the current session to call Swarm, but resumed sessions created before this commit replay their old tools.set_active_tools record from the wire, so their active tool list does not include the newly added Swarm entry from agent.yaml. In that context /swarm <task> is accepted by the TUI but the model is asked to use a tool that is not exposed, so the command fails or devolves into normal chat; migrate old agent tool lists or check tool availability before sending this framed prompt.
…ride test and changeset
…clean up on reset
The stall-detection repeat key joined the tool name and canonical args with a literal NUL (0x00) separator. The control byte caused git to classify stall-hook.ts as binary, so diffs, blame, and code review on the file were opaque, which prevented confirming the test history for this feature.
Replace the NUL with a normal space (tool names are identifiers and never contain spaces, so keys stay collision-free) so the file is plain UTF-8 text and remains reviewable. Behavior is unchanged: the key still uniquely combines tool name and canonical args.
Verified by reverting the hook to a no-op stub to show the three stall-detection test files go red (the discriminating block, canonical-key, e2e turn-abort, and worker-stall-translation cases all fail), then restoring the real implementation to confirm they pass. This is the failing-first run that the prior atomic commit never recorded. Full suite: 5049 passed / 25 skipped; make typecheck clean.
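To illustrate the key construction described in this commit message, here is a small sketch. The helper names `canonicalize` and `repeatKey` are hypothetical, and the canonical form used here (JSON with sorted object keys) is an assumption; the commit only states that the key joins the tool name and canonical args with a space:

```typescript
// Hypothetical helpers; the real stall-hook.ts may canonicalize differently.

// Serialize a value as JSON with object keys sorted, so that argument
// objects differing only in key order produce the same string.
function canonicalize(value: unknown): string {
  if (value === undefined) return 'null';
  if (Array.isArray(value)) {
    return `[${value.map((v) => canonicalize(v)).join(',')}]`;
  }
  if (typeof value === 'object' && value !== null) {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalize(obj[k])}`);
    return `{${entries.join(',')}}`;
  }
  return JSON.stringify(value);
}

// Join with a plain space. Tool names are identifiers without spaces, so the
// first space always marks the boundary between name and args.
function repeatKey(toolName: string, args: unknown): string {
  return `${toolName} ${canonicalize(args)}`;
}
```

Because the separator is printable, the source file contains no control bytes and git treats it as text.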
…nerate/reassign/drop)
…on-aware recovery UI
The summary-continuation pass re-prompted any subagent whose first
summary was under 200 chars to "expand" it, then read back the
follow-up turn — replacing the original output rather than appending.
For swarm's structured-output subagents this was harmful: a reviser's
compact decision JSON (e.g. {"kind":"retry"}) is always under the
threshold, so the expand turn always fired and could replace the JSON
with prose, silently degrading the recovery loop into conservative
drops. It also taxed every short-but-complete handoff with an extra
turn.
Remove the heuristic entirely so a subagent's first summary is returned
as-is. The max-tokens truncation guard is unaffected.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cc9176b3d0
    const subtasksRaw = (parsed as { subtasks?: unknown }).subtasks;
    if (!Array.isArray(subtasksRaw) || subtasksRaw.length === 0) return null;
Enforce the advertised subtask cap
The planner prompt says to keep the plan to at most 6 subtasks, but this parser accepts any non-empty subtasks array. If the model returns dozens or hundreds of items, runWithRetries will execute every one of them (bounded only by concurrency), which can turn a single /swarm call into unexpected token/tool spend and a very long run instead of retrying/rejecting the invalid plan.
      try {
        await session.prompt(buildSwarmPrompt(task));
Route /swarm through the normal send lifecycle
This calls session.prompt directly, so the TUI never runs the normal sendMessageInternal setup (beginSessionRequest, streaming state, transcript entry, and queue handling). During the initial model latency before any SDK event arrives, the app still considers itself idle, so another user input or idle-only slash command can be accepted and race with the swarm turn instead of being blocked/queued like a normal prompt.
      await runChildTurnToCompletion(child, options.signal);
      result = lastAssistantText(child);
    }
    const result = lastAssistantText(child);
Preserve continuation for regular subagents
This now returns the first assistant message for every subagent, removing the previous follow-up that expanded summaries shorter than 200 characters. That may be useful for swarm planner/reviser JSON, but it also changes normal Agent/explore subagents: a terse answer such as “Done” is handed back to the parent without the bounded expansion turn, leaving the parent under-informed. Scope the raw-result behavior to the swarm/profileOverride path rather than all subagents.
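One way to scope the behavior as this review suggests is sketched below. It assumes the swarm path can be recognized by the presence of `profileOverride` on the spawn options; `SpawnOptions`, `shouldExpandSummary`, and `MIN_SUMMARY_CHARS` are hypothetical names, and only the 200-character threshold comes from the commit message:

```typescript
// Hypothetical names; only the 200-char threshold is taken from the PR text.
const MIN_SUMMARY_CHARS = 200;

interface SpawnOptions {
  // Assumed to be set only for swarm planner/reviser/worker subagents.
  profileOverride?: { systemPrompt: string; tools: string[] };
}

// Decide whether a subagent's first summary should get an "expand" turn.
function shouldExpandSummary(summary: string, options: SpawnOptions): boolean {
  // Structured-output swarm subagents return compact JSON on purpose:
  // never re-prompt them, or the JSON may be replaced with prose.
  if (options.profileOverride !== undefined) return false;
  // Regular Agent/explore subagents keep the bounded expansion turn.
  return summary.length < MIN_SUMMARY_CHARS;
}
```

With this split, a reviser's `{"kind":"retry"}` is returned as-is, while a terse "Done" from a regular subagent still gets one follow-up turn.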
    if (result.is_error === true) {
      this.swarmModel = applySwarmEvent(this.swarmModel, { t: 'cancelled' });
Preserve non-cancel swarm errors
This treats every Swarm error result as cancelled, but SwarmTool returns isError for ordinary failures too, such as planner JSON failures or synthesizer errors. Because swarm cards also skip rendering the tool result body, those failures are displayed as a cancelled dashboard with the actual Swarm failed: ... message hidden from the user; only genuine abort/cancel errors should take this path.
    // visible but mark it retrying (an in-flight, uncounted state) so the
    // re-spawn can collapse onto it. Carries no subagent id, so we match by
    // role against the most recent terminal/retrying row.
    const prior = findReusableRoleRow(model.workers, event.role);
Correlate retries by subtask, not role
When a plan contains two subtasks with the same role and both reach a terminal state, recovery events only carry the role to the reducer, so findReusableRoleRow can mark/re-key/drop the wrong row (the most recently inserted matching role) even though the coordinator emitted a distinct subtaskId. This makes the swarm dashboard inaccurate for duplicate-role plans; use the subtask identity or preserve a subtask-to-worker mapping instead of matching solely by role.
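The difference between the two matching strategies can be shown with a small reducer sketch. The `WorkerRow` shape and `markRetrying` function are hypothetical; the only assumption taken from the review is that the coordinator already emits a distinct `subtaskId` per subtask:

```typescript
// Hypothetical row shape and reducer helper, for illustration only.
type WorkerStatus = 'running' | 'done' | 'failed' | 'retrying';

interface WorkerRow {
  subtaskId: string;
  role: string;
  status: WorkerStatus;
}

// Mark exactly the row that belongs to the given subtask as retrying.
// Rows that merely share the same role are left untouched.
function markRetrying(workers: readonly WorkerRow[], subtaskId: string): WorkerRow[] {
  return workers.map((w): WorkerRow =>
    w.subtaskId === subtaskId ? { ...w, status: 'retrying' } : w,
  );
}
```

With two `reviewer` rows, a role-based lookup picks whichever was inserted last; keying on `subtaskId` removes that ambiguity.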
handleSwarmCommand called session.prompt directly, bypassing beginSessionRequest. streamingPhase therefore stayed 'idle' until the SDK turn.started event round-tripped back. During that startup window the UI showed no waiting state, and a fast follow-up message was dispatched as a second concurrent prompt and silently dropped by the core as agent_busy. Call beginSessionRequest() before prompting, which flips streamingPhase synchronously so the input gate closes immediately and the waiting pane shows, and call failSessionRequest() on a prompt rejection, mirroring sendSkillActivation / handleInitCommand.
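The ordering described in this commit can be sketched as follows. The `SwarmHost` and `PromptSession` interfaces and the body of `buildSwarmPrompt` are assumptions made for a self-contained example; only the function names `beginSessionRequest`, `failSessionRequest`, `session.prompt`, and `buildSwarmPrompt` appear in the PR:

```typescript
// Hypothetical interfaces; the real host and session types are richer.
interface SwarmHost {
  streamingPhase: 'idle' | 'waiting';
  beginSessionRequest(): void;
  failSessionRequest(reason: string): void;
}

interface PromptSession {
  prompt(text: string): Promise<void>;
}

// Placeholder framing; the real prompt text is not shown in the PR.
function buildSwarmPrompt(task: string): string {
  return `Use the Swarm tool for this task: ${task}`;
}

async function handleSwarmCommand(
  host: SwarmHost,
  session: PromptSession,
  task: string,
): Promise<void> {
  // Runs synchronously, before the first await, so the input gate is
  // already closed when control returns to the caller.
  host.beginSessionRequest();
  try {
    await session.prompt(buildSwarmPrompt(task));
  } catch (err) {
    // Reopen the gate if the prompt is rejected.
    host.failSessionRequest(err instanceof Error ? err.message : String(err));
  }
}
```

The key property is that an `async` function executes up to its first `await` synchronously, so the phase flips before any other input handler can run.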
The swarm card finalized every is_error tool result as 'cancelled' with a success-toned bullet, and the dashboard suppresses the result body, so ordinary failures (planner produced no valid plan, synthesizer error) rendered as a clean "cancelled" with the real "Swarm failed: ..." reason hidden from the user. SwarmTool now distinguishes a genuine cancel (ctx.signal aborted) from an ordinary failure: on a real failure it emits a 'failed' swarm progress event carrying the reason before returning the error result. The TUI adds a terminal 'failed' phase (error bullet, ' · failed' tag, and a "✗ reason" body line); finalizeSwarmModelIfNeeded only forces 'cancelled' when the model is not already 'failed', so a genuine abort still shows 'cancelled'.
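A compact sketch of the two decisions this commit describes: classifying an error on the tool side, and finalizing the phase on the TUI side. The function and type names below are hypothetical; the rule itself (an aborted signal means cancelled, anything else is a failure with a reason, and an existing 'failed' phase is never overwritten) follows the text above:

```typescript
// Hypothetical names; the real SwarmTool and TUI code are not shown in the PR.
type SwarmOutcome = { t: 'cancelled' } | { t: 'failed'; reason: string };

// Tool side: only an aborted signal counts as a genuine cancel.
function classifySwarmError(signal: AbortSignal, error: unknown): SwarmOutcome {
  if (signal.aborted) return { t: 'cancelled' };
  const reason = error instanceof Error ? error.message : String(error);
  return { t: 'failed', reason };
}

type SwarmPhase = 'running' | 'done' | 'failed' | 'cancelled';

// TUI side: an is_error result forces 'cancelled' unless a 'failed'
// progress event has already moved the model into the failed phase.
function finalizePhaseOnError(current: SwarmPhase): SwarmPhase {
  return current === 'failed' ? 'failed' : 'cancelled';
}
```

Emitting the failure as a progress event before returning the error result is what lets the dashboard show the reason even though it suppresses the result body.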
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a17cfeee2d
    this.swarmModel = applySwarmEvent(this.swarmModel, {
      t: 'done',
      succeeded: this.swarmModel.doneCount,
      failed: this.swarmModel.failedCount,
    });
Preserve completed swarm state on replay
When a completed /swarm turn is rendered from session history, replay only reconstructs the tool call/result from messages and does not replay the live tool.progress or subagent.* events that populated swarmModel; this fallback therefore finalizes an empty initial model as done with zero workers/counts. Since Swarm cards also suppress the normal tool result body, every resumed completed swarm shows an inaccurate 0 workers · 0✓ 0✗ dashboard instead of the actual worker outcome. Please either persist/replay the swarm progress state or fall back to rendering the result body when no worker rows were reconstructed.
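The second option mentioned in this review (fall back to the result body when no worker rows were reconstructed) could look roughly like this. `SwarmModel`, `WorkerRow`, and `renderSwarmCard` are hypothetical stand-ins for the real TUI types, and the summary line format is only modeled on the dashboard text quoted above:

```typescript
// Hypothetical, simplified model of the swarm dashboard state.
interface WorkerRow {
  role: string;
  ok: boolean;
}

interface SwarmModel {
  workers: WorkerRow[];
}

// Render the dashboard summary, or the raw tool result when replay
// produced no worker rows (for example when resuming from history).
function renderSwarmCard(model: SwarmModel | undefined, resultBody: string): string {
  if (model === undefined || model.workers.length === 0) {
    return resultBody;
  }
  const succeeded = model.workers.filter((w) => w.ok).length;
  const failed = model.workers.length - succeeded;
  return `${model.workers.length} workers · ${succeeded}✓ ${failed}✗`;
}
```

Persisting and replaying the progress events would give a richer card, but this fallback at least avoids showing a misleading zero-worker dashboard.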
    if (this.swarmModel !== undefined) {
      this.buildSwarmBody();
      return;
Move Swarm rendering out of ToolCallComponent
apps/kimi-code/AGENTS.md says new tool-result display should extend components/messages/tool-renderers/registry.ts and the corresponding renderer, and should not stack branches inside ToolCallComponent; this branch adds the Swarm-specific dashboard branch directly in the central component. That makes future tool-specific UI continue to accumulate in this already-large class instead of the documented renderer path, so please move the Swarm display behind a dedicated renderer/component boundary.
      // dropped as agent_busy.
      host.beginSessionRequest();
      try {
        await session.prompt(buildSwarmPrompt(task));
Show the swarm request in the transcript
This starts a real model turn but, unlike the normal send path, never appends the user's /swarm task to the live transcript before calling session.prompt. In a live session the user sees a Swarm tool card with no preceding user request, and after resume the replayed user message comes from the internal buildSwarmPrompt(...) wrapper instead of the command/task the user actually entered; add an explicit transcript entry for the swarm request before dispatching the prompt.
      typeof o['role'] !== 'string' ||
      typeof o['systemPrompt'] !== 'string' ||
      typeof o['prompt'] !== 'string'
Reject empty planner subtask fields
When the planner returns syntactically valid JSON but leaves role, systemPrompt, or prompt as an empty string, this parser accepts the plan instead of retrying, so the coordinator can spawn a swarm: worker with no role/instructions and synthesize arbitrary or useless output. Treat trimmed-empty required fields as invalid here, matching the stricter reviser parsing, so the existing planner retry handles malformed plans.
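A sketch of the stricter field check suggested here. The helper names `isNonEmptyString` and `isValidSubtask` are hypothetical; the three required fields are the ones visible in the diff:

```typescript
// Hypothetical helpers for illustration.

// True only for strings that still contain something after trimming.
function isNonEmptyString(value: unknown): value is string {
  return typeof value === 'string' && value.trim().length > 0;
}

// A subtask is usable only if role, systemPrompt, and prompt are all
// non-empty strings; anything else should make the plan invalid.
function isValidSubtask(item: unknown): boolean {
  if (typeof item !== 'object' || item === null) return false;
  const o = item as Record<string, unknown>;
  return (
    isNonEmptyString(o['role']) &&
    isNonEmptyString(o['systemPrompt']) &&
    isNonEmptyString(o['prompt'])
  );
}
```

Returning an invalid result for whitespace-only fields means the same retry that handles malformed JSON also covers a plan with blank instructions.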
Problem
Broad, parallelizable tasks (multi-file analysis, multi-angle research) run today as a single sequential agent loop. The existing Agent subagent tool can spawn parallel subagents, but there is no built-in orchestration that decomposes a task, fans out heterogeneous role-specialized workers, and synthesizes their results into one answer.
What changed
Adds a /swarm <task> command (Phase 1) that runs a task as a self-directed agent swarm, client-side, on top of the existing subagent primitives:
- /swarm command (apps/kimi-code): sends a swarm-framed prompt via session.prompt(), driving a new server-side Swarm tool.
- Swarm tool (agent-core): runs a code-driven SwarmCoordinator, which
  - spawns swarm:<role> worker subagents concurrently (concurrency cap 4), each with a dynamically generated role: a custom system prompt plus a sanitized read-only tool subset via a new profileOverride on SessionSubagentHost.spawn;
  - reports progress through ctx.onUpdate (existing tool-progress channel).
- The Swarm tool is registered only on non-sub agents (type !== 'sub'), and worker tool sets are filtered against a read-only allowlist (ALLOWED_WORKER_TOOLS), so a worker can never obtain Swarm/Agent and spawn a nested swarm.
Scope: Phase 1 is a single-wave swarm with read-only workers. Failed subtasks are recorded and surfaced in synthesis (no auto-retry/reassign); multi-wave coordination, write-capable workers with approval, and a dedicated TUI panel are deferred to later phases.
Tests: unit coverage for the plan parser, concurrency helper, coordinator (plan/parallel/synthesize, planning retry, abort propagation, tool-allowlist sanitization), and the Swarm tool + command. Full suite green.
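For readers unfamiliar with the two mechanisms named in the description, here is a minimal sketch of a concurrency-capped fan-out and an allowlist filter. `mapWithConcurrency` and `sanitizeWorkerTools` are hypothetical names, and the tool names placed in `ALLOWED_WORKER_TOOLS` below are invented placeholders; the PR does not list the real allowlist contents:

```typescript
// Hypothetical helper: run fn over items with at most `limit` calls in flight.
// Results keep the input order. A rejection from fn rejects the whole call,
// so callers that want to record failures should catch inside fn.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;

  // Each runner repeatedly claims the next unclaimed index.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i] as T, i);
    }
  }

  const runnerCount = Math.min(Math.max(limit, 1), items.length);
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}

// Placeholder allowlist: these tool names are assumptions, not the PR's list.
const ALLOWED_WORKER_TOOLS: ReadonlySet<string> = new Set(['Read', 'Grep', 'Glob']);

// Keep only allowlisted tools, so Swarm/Agent can never reach a worker.
function sanitizeWorkerTools(requested: readonly string[]): string[] {
  return requested.filter((tool) => ALLOWED_WORKER_TOOLS.has(tool));
}
```

A call such as `mapWithConcurrency(subtasks, 4, runWorker)` would then match the concurrency cap of 4 stated above, with `runWorker` being whatever function spawns one worker subagent.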