feat: add /swarm parallel agent-swarm orchestration #208

Open: RealKai42 wants to merge 27 commits into main from kaiyi/karachi


@RealKai42
Collaborator

Problem

Broad, parallelizable tasks (multi-file analysis, multi-angle research) run today as a single sequential agent loop. The existing Agent subagent tool can spawn parallel subagents, but there is no built-in orchestration that decomposes a task, fans out heterogeneous role-specialized workers, and synthesizes their results into one answer.

What changed

Adds a /swarm <task> command (Phase 1) that runs a task as a self-directed agent swarm, client-side, on top of the existing subagent primitives:

  • /swarm command (apps/kimi-code): sends a swarm-framed prompt via session.prompt(), driving a new server-side Swarm tool.
  • Swarm tool (agent-core) runs a code-driven SwarmCoordinator:
    1. spawns a planner subagent that emits a JSON decomposition plan (parsed + one retry);
    2. fans out swarm:<role> worker subagents concurrently (concurrency cap 4), each with a dynamically generated role — custom system prompt + a sanitized read-only tool subset via a new profileOverride on SessionSubagentHost.spawn;
    3. spawns a synthesizer subagent to merge worker outputs into the final answer.
  • Progress streams via ctx.onUpdate (existing tool-progress channel).
  • Recursion guard: the Swarm tool is registered only on non-sub agents (type !== 'sub'), and worker tool sets are filtered against a read-only allowlist (ALLOWED_WORKER_TOOLS), so a worker can never obtain Swarm/Agent and spawn a nested swarm.
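The allowlist half of the recursion guard can be sketched as a plain filter over the tool names a planner-generated role asks for. This is a minimal illustration, not the PR's code: `ALLOWED_WORKER_TOOLS` is named in the description, but its contents here (`Read`, `Grep`, `Glob`) and the `sanitizeWorkerTools` helper are assumptions.

```typescript
// Assumed contents: the PR names ALLOWED_WORKER_TOOLS but does not list its entries.
const ALLOWED_WORKER_TOOLS: ReadonlySet<string> = new Set(['Read', 'Grep', 'Glob']);

// Hypothetical helper: keep only allowlisted tools, preserving the requested order.
// Anything else (including Swarm and Agent) is dropped, so a worker cannot
// be handed a tool that would let it spawn a nested swarm.
function sanitizeWorkerTools(requested: readonly string[]): string[] {
  return requested.filter((name) => ALLOWED_WORKER_TOOLS.has(name));
}
```

Filtering against an allowlist, rather than removing known-dangerous names, means a tool added later is excluded from workers by default.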

Scope: Phase 1 is a single-wave swarm with read-only workers. Failed subtasks are recorded and surfaced in synthesis (no auto-retry/reassign); multi-wave coordination, write-capable workers with approval, and a dedicated TUI panel are deferred to later phases.
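A single wave with a concurrency cap and recorded (not retried) failures could look roughly like the sketch below. It is an illustration under assumptions: `runWithConcurrency` and the `Settled` type are hypothetical names, and the PR's actual concurrency helper may be structured differently.

```typescript
// Hypothetical result shape: a failed task is recorded instead of rethrown,
// so the synthesizer can be told which subtasks did not complete.
type Settled<T> = { ok: true; value: T } | { ok: false; error: unknown };

// Run `tasks` with at most `limit` in flight; results keep input order.
async function runWithConcurrency<T>(
  tasks: ReadonlyArray<() => Promise<T>>,
  limit: number,
): Promise<Settled<T>[]> {
  const results = new Array<Settled<T>>(tasks.length);
  let next = 0;

  // Each runner claims the next unclaimed index until none remain.
  async function runner(): Promise<void> {
    while (next < tasks.length) {
      const i = next++;
      const task = tasks[i];
      if (task === undefined) continue;
      try {
        results[i] = { ok: true, value: await task() };
      } catch (error) {
        results[i] = { ok: false, error };
      }
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, tasks.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

With `limit` set to 4 this matches the cap stated in the description. Note that the cap bounds concurrency only, not the total number of workers spawned.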

Tests: unit coverage for the plan parser, concurrency helper, coordinator (plan/parallel/synthesize, planning retry, abort propagation, tool-allowlist sanitization), and the Swarm tool + command. Full suite green.

@changeset-bot

changeset-bot Bot commented May 29, 2026

🦋 Changeset detected

Latest commit: a17cfee

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages

| Name | Type |
| --- | --- |
| @moonshot-ai/agent-core | Minor |
| @moonshot-ai/kimi-code | Minor |
| @moonshot-ai/migration-legacy | Patch |

@pkg-pr-new

pkg-pr-new Bot commented May 29, 2026

pnpm dlx https://pkg.pr.new/@moonshot-ai/kimi-code@a17cfee
npx https://pkg.pr.new/@moonshot-ai/kimi-code@a17cfee

commit: a17cfee

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fc5e4bf787

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

if (typeof parsed !== 'object' || parsed === null) return null;

const subtasksRaw = (parsed as { subtasks?: unknown }).subtasks;
if (!Array.isArray(subtasksRaw) || subtasksRaw.length === 0) return null;

P2: Enforce the planner's subtask cap

When the planner returns valid JSON with more than the prompted maximum of 6 subtasks, this accepts the entire array; SwarmCoordinator.runWave then iterates every entry and spawns a subagent for each one, only limiting concurrent workers to 4. In the common failure mode where the LLM ignores the cap or emits a large accidental list, /swarm can launch dozens of subagents and burn substantial time/tokens instead of retrying or rejecting the invalid plan. Please validate subtasksRaw.length <= 6 here (or truncate deliberately) before spawning workers.
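One way to apply the suggested check, sketched around the two lines quoted above. The `extractSubtasks` wrapper and the `MAX_SUBTASKS` constant are hypothetical names. The value 6 comes from the review comment, and rejecting rather than truncating is just one of the two options the review mentions.

```typescript
// Assumed constant: the review states the planner prompt caps plans at 6 subtasks.
const MAX_SUBTASKS = 6;

// Hypothetical wrapper: returns the raw subtask array, or null so the caller
// can fall back to its existing planner retry.
function extractSubtasks(parsed: unknown): unknown[] | null {
  if (typeof parsed !== 'object' || parsed === null) return null;

  const subtasksRaw = (parsed as { subtasks?: unknown }).subtasks;
  if (!Array.isArray(subtasksRaw) || subtasksRaw.length === 0) return null;

  // Reject over-long plans before any worker is spawned.
  if (subtasksRaw.length > MAX_SUBTASKS) return null;

  return subtasksRaw;
}
```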

Comment thread packages/agent-core/src/tools/builtin/collaboration/swarm.ts Outdated
return;
}
try {
await session.prompt(buildSwarmPrompt(task));

P2: Handle sessions whose active tools lack Swarm

This directly prompts the current session to call Swarm, but resumed sessions created before this commit replay their old tools.set_active_tools record from the wire, so their active tool list does not include the newly added Swarm entry from agent.yaml. In that context /swarm <task> is accepted by the TUI but the model is asked to use a tool that is not exposed, so the command fails or devolves into normal chat; migrate old agent tool lists or check tool availability before sending this framed prompt.
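The "check tool availability" option might look like the sketch below. Everything about the session shape is an assumption: `SessionLike`, `activeToolNames`, and `dispatchSwarm` are hypothetical, since the PR does not show how the TUI reads a session's active tool list.

```typescript
// Hypothetical session shape; the real SDK surface may differ.
interface SessionLike {
  activeToolNames(): readonly string[];
  prompt(text: string): Promise<void>;
}

type SwarmDispatch = { sent: true } | { sent: false; reason: string };

// Refuse to send the swarm-framed prompt when the Swarm tool is not exposed,
// instead of asking the model to call a tool it cannot see.
async function dispatchSwarm(session: SessionLike, framedPrompt: string): Promise<SwarmDispatch> {
  if (session.activeToolNames().indexOf('Swarm') === -1) {
    return { sent: false, reason: 'Swarm tool is not active in this session' };
  }
  await session.prompt(framedPrompt);
  return { sent: true };
}
```

The caller can surface `reason` in the TUI so a resumed pre-upgrade session gets a clear message rather than a silent fallback to normal chat.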

RealKai42 added 18 commits May 29, 2026 17:36
The stall-detection repeat key joined the tool name and canonical args
with a literal NUL (0x00) separator. The control byte caused git to
classify stall-hook.ts as binary, so diffs, blame, and code review on
the file were opaque — which prevented confirming the test history for
this feature. Replace the NUL with a normal space (tool names are
identifiers and never contain spaces, so keys stay collision-free) so
the file is plain UTF-8 text and remains reviewable.

Behavior is unchanged: the key still uniquely combines tool name and
canonical args. Verified by reverting the hook to a no-op stub to show
the three stall-detection test files go red (the discriminating block,
canonical-key, e2e turn-abort, and worker-stall-translation cases all
fail), then restoring the real implementation to confirm they pass. This
supplies the failing-first evidence that the prior atomic commit never recorded.

Full suite: 5049 passed / 25 skipped; make typecheck clean.
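A rough sketch of the repeat key described in the commit message above, with a space separator in place of the NUL byte. The helper names are hypothetical and the canonicalization shown (sorting top-level keys only) is an assumption; the real logic in stall-hook.ts may differ.

```typescript
// Hypothetical canonicalizer: sorts top-level keys so equivalent argument
// objects serialize to the same string. Nested objects are not reordered here.
function canonicalArgs(args: Record<string, unknown>): string {
  const sorted: Record<string, unknown> = {};
  for (const key of Object.keys(args).sort()) {
    sorted[key] = args[key];
  }
  return JSON.stringify(sorted);
}

// Tool names are identifiers without spaces, so the first space in the key
// always marks the boundary between the name and the serialized arguments.
function repeatKey(toolName: string, args: Record<string, unknown>): string {
  return `${toolName} ${canonicalArgs(args)}`;
}
```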
The summary-continuation pass re-prompted any subagent whose first
summary was under 200 chars to "expand" it, then read back the
follow-up turn — replacing the original output rather than appending.

For swarm's structured-output subagents this was harmful: a reviser's
compact decision JSON (e.g. {"kind":"retry"}) is always under the
threshold, so the expand turn always fired and could replace the JSON
with prose, silently degrading the recovery loop into conservative
drops. It also taxed every short-but-complete handoff with an extra
turn.

Remove the heuristic entirely so a subagent's first summary is returned
as-is. The max-tokens truncation guard is unaffected.
@RealKai42
Collaborator Author

@codex

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cc9176b3d0

Comment on lines +24 to +25
const subtasksRaw = (parsed as { subtasks?: unknown }).subtasks;
if (!Array.isArray(subtasksRaw) || subtasksRaw.length === 0) return null;

P2: Enforce the advertised subtask cap

The planner prompt says to keep the plan to at most 6 subtasks, but this parser accepts any non-empty subtasks array. If the model returns dozens or hundreds of items, runWithRetries will execute every one of them (bounded only by concurrency), which can turn a single /swarm call into unexpected token/tool spend and a very long run instead of retrying/rejecting the invalid plan.

Comment on lines +26 to +27
try {
await session.prompt(buildSwarmPrompt(task));

P2: Route /swarm through the normal send lifecycle

This calls session.prompt directly, so the TUI never runs the normal sendMessageInternal setup (beginSessionRequest, streaming state, transcript entry, and queue handling). During the initial model latency before any SDK event arrives, the app still considers itself idle, so another user input or idle-only slash command can be accepted and race with the swarm turn instead of being blocked/queued like a normal prompt.

await runChildTurnToCompletion(child, options.signal);
result = lastAssistantText(child);
}
const result = lastAssistantText(child);

P2: Preserve continuation for regular subagents

This now returns the first assistant message for every subagent, removing the previous follow-up that expanded summaries shorter than 200 characters. That may be useful for swarm planner/reviser JSON, but it also changes normal Agent/explore subagents: a terse answer such as “Done” is handed back to the parent without the bounded expansion turn, leaving the parent under-informed. Scope the raw-result behavior to the swarm/profileOverride path rather than all subagents.
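The scoping the review asks for could be expressed as a small predicate. This is a sketch under assumptions: `shouldExpandSummary` and `SpawnOptions` are hypothetical names. The 200-character threshold comes from the commit message, and `profileOverride` is the spawn option named in the PR description.

```typescript
// Threshold taken from the commit message describing the removed heuristic.
const MIN_SUMMARY_CHARS = 200;

// Hypothetical options shape; only the presence of profileOverride matters here.
interface SpawnOptions {
  profileOverride?: unknown;
}

// Decide whether a short first summary should trigger the "expand" follow-up turn.
function shouldExpandSummary(summary: string, options: SpawnOptions): boolean {
  // Swarm subagents spawned with a profileOverride return structured output
  // such as compact JSON, so their first summary is handed back as-is.
  if (options.profileOverride !== undefined) return false;
  return summary.length < MIN_SUMMARY_CHARS;
}
```

Under this split a reviser's `{"kind":"retry"}` is returned untouched, while a regular subagent's terse "Done" still gets the bounded expansion turn.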

Comment on lines +684 to +685
if (result.is_error === true) {
this.swarmModel = applySwarmEvent(this.swarmModel, { t: 'cancelled' });

P2: Preserve non-cancel swarm errors

This treats every Swarm error result as cancelled, but SwarmTool returns isError for ordinary failures too, such as planner JSON failures or synthesizer errors. Because swarm cards also skip rendering the tool result body, those failures are displayed as a cancelled dashboard with the actual Swarm failed: ... message hidden from the user; only genuine abort/cancel errors should take this path.

// visible but mark it retrying (an in-flight, uncounted state) so the
// re-spawn can collapse onto it. Carries no subagent id, so we match by
// role against the most recent terminal/retrying row.
const prior = findReusableRoleRow(model.workers, event.role);

P3: Correlate retries by subtask, not role

When a plan contains two subtasks with the same role and both reach a terminal state, recovery events only carry the role to the reducer, so findReusableRoleRow can mark/re-key/drop the wrong row (the most recently inserted matching role) even though the coordinator emitted a distinct subtaskId. This makes the swarm dashboard inaccurate for duplicate-role plans; use the subtask identity or preserve a subtask-to-worker mapping instead of matching solely by role.
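Matching on subtask identity instead of role could be sketched as follows. The `WorkerRow` shape and the `markRetrying` helper are hypothetical, and the sketch assumes the recovery event can carry the `subtaskId` the coordinator already emits.

```typescript
type WorkerStatus = 'running' | 'done' | 'failed' | 'retrying';

// Hypothetical dashboard row; subtaskId is assumed to be available on the event.
interface WorkerRow {
  subtaskId: string;
  role: string;
  status: WorkerStatus;
}

// Mark exactly the terminal row for this subtask as retrying.
// Rows that merely share the role are left untouched.
function markRetrying(workers: readonly WorkerRow[], subtaskId: string): WorkerRow[] {
  return workers.map((row): WorkerRow => {
    const terminal = row.status === 'done' || row.status === 'failed';
    return row.subtaskId === subtaskId && terminal ? { ...row, status: 'retrying' } : row;
  });
}
```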

RealKai42 added 2 commits May 30, 2026 00:51
handleSwarmCommand called session.prompt directly, bypassing
beginSessionRequest. streamingPhase therefore stayed 'idle' until the
SDK turn.started event round-tripped back, leaving a startup window in
which a fast follow-up message was dispatched as a second concurrent
prompt and silently dropped by the core as agent_busy, and in which the
UI showed no waiting state.

Call beginSessionRequest() before prompting — flipping streamingPhase
synchronously so the input gate closes immediately and the waiting pane
shows — and failSessionRequest() on a prompt rejection, mirroring
sendSkillActivation / handleInitCommand.
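The ordering described in the commit message above can be sketched like this. The interfaces and the body of `buildSwarmPrompt` are assumptions for illustration. Only the names `beginSessionRequest`, `failSessionRequest`, `session.prompt`, and `buildSwarmPrompt` come from the PR, and the argument passed to `failSessionRequest` is a guess.

```typescript
// Hypothetical host surface; the real signatures are not shown in the PR.
interface SwarmHost {
  beginSessionRequest(): void;
  failSessionRequest(error: unknown): void;
}

interface PromptSession {
  prompt(text: string): Promise<void>;
}

// Placeholder framing; the actual swarm prompt wording is not shown in the PR.
function buildSwarmPrompt(task: string): string {
  return `Use the Swarm tool to complete this task:\n${task}`;
}

async function handleSwarmCommand(host: SwarmHost, session: PromptSession, task: string): Promise<void> {
  // Mark the request as started synchronously, before any await, so the
  // input gate closes before a follow-up message can be dispatched.
  host.beginSessionRequest();
  try {
    await session.prompt(buildSwarmPrompt(task));
  } catch (error) {
    // A rejected prompt must reopen the gate instead of leaving the UI waiting.
    host.failSessionRequest(error);
  }
}
```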
The swarm card finalized every is_error tool result as 'cancelled' with
a success-toned bullet, and the dashboard suppresses the result body, so
ordinary failures (planner produced no valid plan, synthesizer error)
rendered as a clean "cancelled" with the real "Swarm failed: ..." reason
hidden from the user.

SwarmTool now distinguishes a genuine cancel (ctx.signal aborted) from an
ordinary failure: on a real failure it emits a 'failed' swarm progress
event carrying the reason before returning the error result. The TUI adds
a terminal 'failed' phase (error bullet, ' · failed' tag, and a "✗ reason"
body line); finalizeSwarmModelIfNeeded only forces 'cancelled' when the
model is not already 'failed', so a genuine abort still shows 'cancelled'.
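A reduced model of the failed-versus-cancelled distinction described in the commit message above, as a sketch. The event and model shapes are simplified assumptions: the real `applySwarmEvent` handles more event types, and `finalizeOnErrorResult` is a hypothetical stand-in for `finalizeSwarmModelIfNeeded`.

```typescript
type SwarmPhase = 'running' | 'done' | 'failed' | 'cancelled';

// Simplified model; the real one also tracks workers and counts.
interface SwarmModel {
  phase: SwarmPhase;
  reason?: string; // set only when phase is 'failed'
}

type SwarmEvent =
  | { t: 'failed'; reason: string }
  | { t: 'cancelled' }
  | { t: 'done' };

function applySwarmEvent(model: SwarmModel, event: SwarmEvent): SwarmModel {
  switch (event.t) {
    case 'failed':
      return { ...model, phase: 'failed', reason: event.reason };
    case 'cancelled':
      return { ...model, phase: 'cancelled' };
    case 'done':
      return { ...model, phase: 'done' };
  }
}

// On an is_error tool result, force 'cancelled' only when no 'failed'
// progress event was seen first, so a real failure keeps its reason.
function finalizeOnErrorResult(model: SwarmModel): SwarmModel {
  return model.phase === 'failed' ? model : applySwarmEvent(model, { t: 'cancelled' });
}
```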

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a17cfeee2d

Comment on lines +693 to +697
this.swarmModel = applySwarmEvent(this.swarmModel, {
t: 'done',
succeeded: this.swarmModel.doneCount,
failed: this.swarmModel.failedCount,
});

P2: Preserve completed swarm state on replay

When a completed /swarm turn is rendered from session history, replay only reconstructs the tool call/result from messages and does not replay the live tool.progress or subagent.* events that populated swarmModel; this fallback therefore finalizes an empty initial model as done with zero workers/counts. Since Swarm cards also suppress the normal tool result body, every resumed completed swarm shows an inaccurate 0 workers · 0✓ 0✗ dashboard instead of the actual worker outcome. Please either persist/replay the swarm progress state or fall back to rendering the result body when no worker rows were reconstructed.
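The second option, falling back to the result body, could be sketched as a small decision function. All names and shapes here are hypothetical; the sketch only assumes the card knows how many worker rows were reconstructed and has the stored tool result text.

```typescript
// Hypothetical view of a replayed Swarm card.
interface ReplayedSwarmCard {
  workerRows: number; // rows rebuilt from progress events; 0 on plain history replay
  resultBody: string; // the Swarm tool result text stored in session history
}

type SwarmCardBody =
  | { kind: 'dashboard' }
  | { kind: 'result'; text: string };

// Prefer the dashboard when worker rows exist. Otherwise show the stored
// result text rather than an empty "0 workers" dashboard.
function chooseSwarmCardBody(card: ReplayedSwarmCard): SwarmCardBody {
  if (card.workerRows === 0 && card.resultBody.trim().length > 0) {
    return { kind: 'result', text: card.resultBody };
  }
  return { kind: 'dashboard' };
}
```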

Comment on lines +1748 to +1750
if (this.swarmModel !== undefined) {
this.buildSwarmBody();
return;

P2: Move Swarm rendering out of ToolCallComponent

apps/kimi-code/AGENTS.md says new tool-result display should extend components/messages/tool-renderers/registry.ts and the corresponding renderer, and should not stack branches inside ToolCallComponent; this branch adds the Swarm-specific dashboard branch directly in the central component. That makes future tool-specific UI continue to accumulate in this already-large class instead of the documented renderer path, so please move the Swarm display behind a dedicated renderer/component boundary.

// dropped as agent_busy.
host.beginSessionRequest();
try {
await session.prompt(buildSwarmPrompt(task));

P2: Show the swarm request in the transcript

This starts a real model turn but, unlike the normal send path, never appends the user's /swarm task to the live transcript before calling session.prompt. In a live session the user sees a Swarm tool card with no preceding user request, and after resume the replayed user message comes from the internal buildSwarmPrompt(...) wrapper instead of the command/task the user actually entered; add an explicit transcript entry for the swarm request before dispatching the prompt.

Comment on lines +33 to +35
typeof o['role'] !== 'string' ||
typeof o['systemPrompt'] !== 'string' ||
typeof o['prompt'] !== 'string'

P2: Reject empty planner subtask fields

When the planner returns syntactically valid JSON but leaves role, systemPrompt, or prompt as an empty string, this parser accepts the plan instead of retrying, so the coordinator can spawn a swarm: worker with no role/instructions and synthesize arbitrary or useless output. Treat trimmed-empty required fields as invalid here, matching the stricter reviser parsing, so the existing planner retry handles malformed plans.
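A sketch of the stricter per-subtask check. The three field names come from the quoted lines; the `Subtask` interface, `isNonEmptyString`, and `parseSubtask` are hypothetical names, and returning `null` is assumed to feed the existing planner retry.

```typescript
interface Subtask {
  role: string;
  systemPrompt: string;
  prompt: string;
}

// Type guard: a string that still has content after trimming whitespace.
function isNonEmptyString(value: unknown): value is string {
  return typeof value === 'string' && value.trim().length > 0;
}

// Returns a validated subtask, or null so the caller can treat the plan as malformed.
function parseSubtask(raw: unknown): Subtask | null {
  if (typeof raw !== 'object' || raw === null) return null;
  const o = raw as Record<string, unknown>;

  const role = o['role'];
  const systemPrompt = o['systemPrompt'];
  const prompt = o['prompt'];

  if (!isNonEmptyString(role) || !isNonEmptyString(systemPrompt) || !isNonEmptyString(prompt)) {
    return null;
  }
  return { role, systemPrompt, prompt };
}
```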
