Skip to content

feat(subagent): add API key pool for parallel subagent execution#2369

Open
Liewzheng wants to merge 23 commits into
MoonshotAI:mainfrom
Liewzheng:dev-feat-subagent-optimized
Open

feat(subagent): add API key pool for parallel subagent execution#2369
Liewzheng wants to merge 23 commits into
MoonshotAI:mainfrom
Liewzheng:dev-feat-subagent-optimized

Conversation

@Liewzheng
Copy link
Copy Markdown

@Liewzheng Liewzheng commented May 26, 2026

Related Issue

Closes #2368

Description

1. API Key Pool for Parallel Subagent Execution

This PR introduces APIKeyPool (src/kimi_cli/llm_key_pool.py), a round-robin API key allocator designed for parallel subagent execution.

  • Key Collection: APIKeyPool.from_env("KIMI_API_KEY") collects keys from KIMI_API_KEY, KIMI_API_KEY_1, KIMI_API_KEY_2, … up to KIMI_API_KEY_99. A pool is only created when ≥2 keys are found; otherwise it returns None and subagents fall back to the root runtime's key.
  • Round-Robin Allocation: acquire() returns the next key in rotation, ensuring concurrent subagents are distributed across different API keys instead of hammering a single key's rate-limit quota.

2. KeyPoolKimi Wrapper

Added KeyPoolKimi (src/kimi_cli/llm.py), a thin wrapper around the Kosong Kimi provider:

  • Forwards all public attributes and methods (generate, with_thinking, with_generation_kwargs, with_extra_body, etc.)
  • On retryable errors (429, 500, 503), on_retryable_error() swaps in the next key from the pool and recreates the HTTP client
  • Provides a provider property for unwrapping when type checks are needed

3. Subagent User-Agent Attribution

SubagentBuilder now injects a subagent-specific User-Agent header:

User-Agent: KimiCLI/1.44.0 (subagent: coder)

This allows the Kimi backend to distinguish root agent calls from subagent calls for monitoring and attribution.

4. Foreground Concurrency Limit

Added _max_foreground_concurrency() in src/kimi_cli/tools/agent/__init__.py:

  • When a key pool is configured, the limit is max(1, int(key_count * 0.8))
  • Without a key pool, it falls back to max(1, int(background.max_running_tasks * 0.8))
  • The 20% headroom ensures each subagent has a high probability of receiving a fresh key and reduces rate-limit collisions
  • When the limit is reached, new foreground subagent requests are rejected with a ToolError instead of blocking

5. Configurable Foreground Timeout

Changed the default foreground subagent timeout from "no limit" to 300 seconds (5 minutes):

  • Explicit timeout parameter still takes highest priority
  • KIMI_FOREGROUND_AGENT_TIMEOUT environment variable overrides the default (set to 0 to disable)
  • Invalid env values log a warning and fall back to the 300s default

6. OAuth Compatibility Fix

Fixed oauth.py _apply_access_token() which had a hard isinstance(chat_provider, Kimi) assertion. When KeyPoolKimi wraps the provider, the assertion failed and crashed the subagent. The fix unwraps KeyPoolKimi before the type check:

chat_provider = runtime.llm.chat_provider
if isinstance(chat_provider, KeyPoolKimi):
    chat_provider = chat_provider.provider
assert isinstance(chat_provider, Kimi), "Expected Kimi chat provider"

7. KIMI_MODEL_THINKING_KEEP Compatibility Fix

The thinking_keep check also used a bare isinstance(chat_provider, Kimi) assertion. When key pooling is enabled, the provider is wrapped in KeyPoolKimi, so the check never matched and the extra body was never injected. Fixed by unwrapping before the type check.

8. clone_llm_with_model_alias Fix

When model_alias=None but api_key_override, key_pool, or extra_headers were passed, the function incorrectly returned the original LLM unchanged. This meant subagents could not receive a distinct API key when no model override was requested. Fixed to recreate the LLM from stored provider_config/model_config when override parameters are present.

9. Logger UnboundLocalError Fix

Removed a local from kimi_cli.utils.logging import logger inside create_llm that shadowed the module-level import. When api_key_override was set but oauth was None, the local import was skipped and the subsequent logger.info(...) call raised UnboundLocalError.

10. Default HTTP Timeout for OpenAI Client

Added a default httpx.Timeout(connect=10, read=300, write=30, pool=30) in openai_common.py to prevent indefinite API hangs when no timeout is configured.

11. Subagent Live Visualization

Enhanced the shell UI to show subagent progress in real time:

  • Step counter with elapsed time (Step 2 ⠋ 12s)
  • Live preview of the subagent's streamed text output (last 3 lines, max 800 chars)
  • _LiveView handles TextPart and StepBegin events from the subagent wire stream

Breaking Changes

  • Default foreground subagent timeout: Previously foreground subagents had no timeout (ran until completion). Now the default is 300 seconds. Users who rely on long-running foreground subagents should set KIMI_FOREGROUND_AGENT_TIMEOUT=0 or pass an explicit timeout parameter.
Category Files
Core src/kimi_cli/llm.py, src/kimi_cli/llm_key_pool.py, src/kimi_cli/app.py, src/kimi_cli/soul/agent.py
Subagent src/kimi_cli/subagents/builder.py, src/kimi_cli/tools/agent/__init__.py
Auth src/kimi_cli/auth/oauth.py
UI src/kimi_cli/ui/shell/visualize/_blocks.py, src/kimi_cli/ui/shell/visualize/_live_view.py
Deps packages/kosong/src/kosong/chat_provider/openai_common.py
Tests tests/core/test_llm_key_pool.py, tests/core/test_key_pool_kimi.py, tests/core/test_foreground_concurrency.py, tests/core/test_foreground_agent_timeout.py, tests/core/test_subagent_builder.py, tests/tools/test_tool_schemas.py, tests/core/test_default_agent.py, tests/ui/test_usage.py
Docs CHANGELOG.md, docs/en/configuration/env-vars.md, docs/en/customization/agents.md, docs/en/release-notes/changelog.md, docs/zh/configuration/env-vars.md, docs/zh/customization/agents.md, docs/zh/release-notes/changelog.md

Testing

  • Added test_llm_key_pool.py: verifies APIKeyPool.from_env key collection, round-robin ordering, and empty-pool behavior
  • Added test_key_pool_kimi.py: verifies KeyPoolKimi attribute forwarding and on_retryable_error key rotation
  • Added test_foreground_concurrency.py: verifies concurrency cap enforcement and ToolError rejection when limit is reached
  • Added test_foreground_agent_timeout.py: verifies env var override, default 300s, 0 = no limit, and invalid env fallback
  • Updated test_subagent_builder.py: adjusted mock signature for clone_llm_with_model_alias
  • Updated test_tool_schemas.py: refreshed inline snapshot for new timeout default
  • Updated test_default_agent.py: updated agent description assertion
  • Updated test_usage.py: adjusted for rich 14.x grey23 empty-segment filtering

Checklist

  • I have read the CONTRIBUTING document.
  • I have linked the related issue, if any.
  • I have added tests that prove my fix is effective or that my feature works.
  • I have run make gen-changelog to update the changelog.
  • I have run make gen-docs to update the user documentation.

Open in Devin Review

@Liewzheng Liewzheng changed the title Dev feat subagent optimized feat(subagent): add API key pool for parallel subagent execution May 26, 2026
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 2 potential issues.

View 6 additional findings in Devin Review.

Open in Devin Review

Comment on lines +24 to +26
api_key_override: str | None = None
if self._root_runtime.key_pool is not None:
api_key_override = self._root_runtime.key_pool.acquire()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 SubagentBuilder injects KIMI pool keys into non-Kimi provider API calls

In SubagentBuilder.build_builtin_instance, when a key pool exists (runtime.key_pool is not None), a key is unconditionally acquired from the pool and passed as api_key_override regardless of the provider type. The pool only contains KIMI_API_KEY* values. If the subagent's effective model resolves to a non-Kimi provider (e.g., the user sets model to an OpenAI or Anthropic model alias in the Agent tool call), clone_llm_with_model_alias at src/kimi_cli/llm.py:417-425 passes this KIMI key to create_llm, where resolved_api_key = api_key_override is applied unconditionally at src/kimi_cli/llm.py:228. The resulting provider (OpenAI, Anthropic, etc.) would authenticate with an invalid KIMI API key, causing all API calls to fail with authentication errors.

Prompt for agents
The key pool acquisition in SubagentBuilder.build_builtin_instance should only happen when the effective provider is a Kimi provider. Currently, api_key_override is set from the pool unconditionally. The fix should check whether the provider that will be used for this subagent is of type 'kimi' before acquiring a key from the pool.

Approach: Before acquiring from the pool, determine the effective provider type. If effective_model is None, the subagent inherits the root runtime's provider — check self._root_runtime.llm.provider_config.type. If effective_model is not None, look up the model in config to get its provider type. Only acquire from the pool if the provider type is 'kimi'.

Relevant code paths:
- src/kimi_cli/subagents/builder.py:20-26 (key acquisition)
- src/kimi_cli/llm.py:228-240 (api_key_override applied unconditionally)
- src/kimi_cli/llm.py:252-253 (KeyPoolKimi wrapping only in kimi case)
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Copy link
Copy Markdown
Author

@Liewzheng Liewzheng May 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 0546f6ce.

SubagentBuilder.build_builtin_instance now determines the effective provider type before acquiring from the key pool:

  • If effective_model is None, it checks runtime.llm.provider_config.type
  • If effective_model is set, it looks up the model in config.models, then resolves its provider from config.providers, and reads provider.type
  • Only when the resolved provider type is "kimi" does it call key_pool.acquire() and pass both api_key_override and key_pool to clone_llm_with_model_alias
  • For any non-kimi provider, both values stay None and the subagent falls back to the provider's own API key

Added tests:

  • test_builder_skips_key_pool_for_non_kimi_provider — root runtime LLM has openai_legacy provider → no key injected
  • test_builder_skips_key_pool_when_effective_model_is_non_kimieffective_model resolves to a non-kimi model → no key injected
  • test_builder_uses_key_pool_for_kimi_provider — kimi provider → key is acquired as before

Comment thread src/kimi_cli/tools/agent/__init__.py Outdated
Comment on lines +181 to +191
max_concurrent = _max_foreground_concurrency(self._runtime)
running = _count_running_foreground(self._runtime)
if running >= max_concurrent:
return ToolError(
message=(
f"Too many foreground subagents are already running "
f"({running}/{max_concurrent}). Please wait for one to finish "
f"before starting another."
),
brief="Concurrency limit reached",
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 TOCTOU race in foreground concurrency check allows exceeding the limit

The foreground concurrency check at src/kimi_cli/tools/agent/__init__.py:181-191 reads the count of running_foreground instances, but the subagent's status is not updated to running_foreground until inside runner.run(req) (after an await at line 204-205). When the LLM issues multiple concurrent Agent tool calls in a single step (dispatched via asyncio.gather), all coroutines can pass the concurrency check before any of them marks its status as running_foreground. For example, with a limit of 4, the model could issue 6 parallel Agent calls and all 6 would pass the check (seeing 0 running), exceeding the intended limit.

Prompt for agents
The concurrency check at lines 181-191 of src/kimi_cli/tools/agent/__init__.py suffers from a time-of-check-to-time-of-use (TOCTOU) race because the subagent status is only set to running_foreground inside runner.run(req), which is awaited after the check. Multiple concurrent Agent tool calls can all pass the check before any of them updates the store.

Possible fix: Eagerly create the subagent instance and set its status to running_foreground BEFORE the await, similar to how _run_in_background marks running_background synchronously before dispatching (lines 280-286). This would require creating the instance record in the foreground path as well, and rolling it back on failure. Alternatively, use a simple asyncio counter (an integer incremented before the await and decremented after) to track in-flight foreground subagents.
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 0546f6ce.

The root cause was that _count_running_foreground only counted instances whose status was already "running_foreground", but that status was only set inside runner.run — after multiple await points. When the LLM issued concurrent Agent calls via asyncio.gather, every coroutine passed the check before any of them updated the store.

Fix: Eagerly create the instance and set its status to "running_foreground" before the await runner.run:

  1. ForegroundSubagentRunner._prepare_instance is now a synchronous public method prepare_instance
  2. AgentTool.__call__ calls prepare_instance and update_instance(status="running_foreground") right after the concurrency check, before await runner.run(req, prepared)
  3. runner.run accepts an optional prepared argument so it skips the internal prepare step

This mirrors the background-task pattern already used in _run_in_background, where the status is marked synchronously before dispatching the async work.

Added tests:

  • test_agent_tool_eagerly_sets_running_foreground_before_await — asserts that when runner.run is entered, the instance status is already "running_foreground"
  • test_concurrent_agent_calls_respect_limit_after_toctou_fix — spawns a hanging foreground runner, yields control, then verifies the second concurrent Agent call is rejected by the limit

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0546f6ce13

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/kimi_cli/llm.py
Comment on lines +252 to +253
if key_pool is not None:
chat_provider = KeyPoolKimi(chat_provider, key_pool)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve Kimi provider identity in key-pool wrapper

Wrapping Kimi with KeyPoolKimi changes llm.chat_provider from a Kimi instance to a different type, which breaks downstream isinstance(..., Kimi) paths used outside the two patched callsites. For example, ReadMedia only uses Kimi's server-side video upload when that check passes (src/kimi_cli/tools/file/read_media.py), so keyed subagents now fall back to inlined data URLs for videos, which can bloat prompts and fail for larger media. Make the wrapper transparent to Kimi-specific checks (or update all Kimi type checks to unwrap) to avoid this regression.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 69deedd6.

Added KeyPoolKimi unwrapping in the two remaining isinstance(..., Kimi) checks that were missed in the initial PR:

  • src/kimi_cli/tools/file/read_media.py:121 — Video upload now correctly uses the server-side files.upload_video API even when the provider is wrapped by KeyPoolKimi
  • src/kimi_cli/wire/server.py:625 — Web UI User-Agent suffix injection now works correctly through the wrapper

Pattern matches the existing unwraps in llm.py (thinking_keep) and auth/oauth.py (token refresh).

Comment thread src/kimi_cli/tools/agent/__init__.py Outdated
Comment on lines +53 to +54
if runtime.key_pool is not None:
return max(1, int(runtime.key_pool.key_count * 0.8))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Limit key-pool concurrency only for Kimi executions

_max_foreground_concurrency applies the key-pool cap whenever runtime.key_pool exists, regardless of the active subagent provider. Since runtime.key_pool is initialized from KIMI_API_KEY* env vars globally, non-Kimi sessions can be throttled incorrectly (e.g., OpenAI subagents being capped by unrelated Kimi key count). This should be conditioned on the effective provider for the pending foreground run, not just pool presence.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 69deedd6.

_max_foreground_concurrency now takes an optional provider_type parameter:

  • provider_type == "kimi" + key_pool present → max(1, int(key_pool.key_count * 0.8))
  • Any other provider type (or no key pool) → max(1, int(max_running_tasks * 0.8))

AgentTool.__call__ resolves the effective provider type (from params.model if overridden, otherwise from runtime.llm.provider_config.type) and passes it into the concurrency check. Non-Kimi subagents are no longer throttled by an unrelated Kimi key count.

Updated tests:

  • test_key_pool_based_limit_for_kimi / test_key_pool_single_key_fallback_to_one_for_kimi — explicit provider_type="kimi"
  • test_key_pool_ignored_for_non_kimi — key pool exists but non-kimi provider uses max_running_tasks cap

Comment thread src/kimi_cli/tools/agent/__init__.py Outdated
Comment on lines +206 to +208
prepared = runner.prepare_instance(req)
agent_id = prepared.record.agent_id
store.update_instance(agent_id, status="running_foreground")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Handle foreground prepare failures inside AgentTool error path

runner.prepare_instance(req) is now executed before entering the try block, so validation failures (unknown subagent type, invalid resume id, resume of a running instance) bypass the tool's ToolError handling and surface as uncaught exceptions to the generic tool runtime wrapper. This regresses user-facing error quality and consistency compared with the prior path. Move preparation into the guarded block (or catch around it) so these expected failures return structured ToolErrors.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 69deedd6.

runner.prepare_instance(req) and the subsequent update_instance(status="running_foreground") are now back inside the try block, so validation failures (unknown subagent type, invalid resume id, resume of a running instance) are caught by the generic except Exception handler and converted to structured ToolErrors instead of surfacing as uncaught exceptions.

agent_id is pre-initialized to None before the try block; the except handlers only touch the store when agent_id is not None, ensuring a failed prepare_instance does not trigger a secondary NameError or UnboundLocalError.

The eager status update (TOCTOU fix from 0546f6ce) is preserved because prepare_instance + update_instance still execute synchronously before the await runner.run.

@chatgpt-codex-connector
Copy link
Copy Markdown

💡 Codex Review

"Creating LLM with overridden API key {prefix}... for provider {provider_type}",
prefix=resolved_api_key[:16],
provider_type=provider.type,

P1 Badge Remove API key prefix logging from LLM creation

This log line emits the first 16 characters of the active API key whenever a subagent LLM is created with api_key_override. Since session logs are persisted and often shared for debugging, this leaks credential material that was previously masked elsewhere (******) and creates an avoidable secret-exposure risk. Avoid logging any part of API keys.


provider_type = _resolve_effective_provider_type(self._runtime, params.model)

P2 Badge Compute provider type from resume target before capping concurrency

The concurrency cap is derived from params.model, but resume calls typically pass model=None. In that case this path falls back to the root runtime provider, so resuming a non-Kimi instance can be incorrectly throttled by the Kimi key-pool cap (or vice versa), causing spurious Concurrency limit reached errors. Resolve provider type from the resumed instance's stored launch spec before applying _max_foreground_concurrency.


source = { registry = "https://mirrors.aliyun.com/pypi/simple/" }

P1 Badge Keep uv.lock registry URLs on the default PyPI index

This change rewrites locked package sources from https://pypi.org/simple to https://mirrors.aliyun.com/pypi/simple/ across the lockfile. Because uv sync --locked installs from lockfile URLs, this introduces a region-specific external dependency and can break CI/developer environments that cannot reliably reach that mirror even though dependency versions were unchanged.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@Liewzheng
Copy link
Copy Markdown
Author

💡 Codex Review

"Creating LLM with overridden API key {prefix}... for provider {provider_type}",
prefix=resolved_api_key[:16],
provider_type=provider.type,

P1 Badge Remove API key prefix logging from LLM creation
This log line emits the first 16 characters of the active API key whenever a subagent LLM is created with api_key_override. Since session logs are persisted and often shared for debugging, this leaks credential material that was previously masked elsewhere (******) and creates an avoidable secret-exposure risk. Avoid logging any part of API keys.

provider_type = _resolve_effective_provider_type(self._runtime, params.model)

P2 Badge Compute provider type from resume target before capping concurrency
The concurrency cap is derived from params.model, but resume calls typically pass model=None. In that case this path falls back to the root runtime provider, so resuming a non-Kimi instance can be incorrectly throttled by the Kimi key-pool cap (or vice versa), causing spurious Concurrency limit reached errors. Resolve provider type from the resumed instance's stored launch spec before applying _max_foreground_concurrency.

source = { registry = "https://mirrors.aliyun.com/pypi/simple/" }

P1 Badge Keep uv.lock registry URLs on the default PyPI index
This change rewrites locked package sources from https://pypi.org/simple to https://mirrors.aliyun.com/pypi/simple/ across the lockfile. Because uv sync --locked installs from lockfile URLs, this introduces a region-specific external dependency and can break CI/developer environments that cannot reliably reach that mirror even though dependency versions were unchanged.

ℹ️ About Codex in GitHub

💡 Remove API key prefix logging from LLM creation

Fixed in 2fb02e9c.

Removed the prefix=resolved_api_key[:16] parameter from the logger.info call in create_llm. The log line now only records the provider type, not any portion of the API key.

# Before
logger.info(
    "Creating LLM with overridden API key {prefix}... for provider {provider_type}",
    prefix=resolved_api_key[:16],
    provider_type=provider.type,
)

# After
logger.info(
    "Creating LLM with overridden API key for provider {provider_type}",
    provider_type=provider.type,
)

💡 Compute provider type from resume target before capping concurrency

Fixed in 2fb02e9c.

AgentTool.__call__ now resolves the effective model before computing the concurrency cap:

  • If params.resume is set, it looks up the stored instance and uses launch_spec.model_override or launch_spec.effective_model
  • If params.model is set (and not resuming), it uses the explicit model override
  • Otherwise it falls back to params.model (which is None, causing _resolve_effective_provider_type to use the root runtime's provider)

This ensures that resuming a non-Kimi subagent is not incorrectly throttled by the Kimi key-pool limit.


💡 Keep uv.lock registry URLs on the default PyPI index

Fixed in 2fb02e9c.

Restored uv.lock to the origin/main version. The aliyun mirror URLs were an unintentional side-effect of local uv sync configuration; the PR itself does not modify any dependencies, so uv.lock should not change.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7b026853e2

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/kimi_cli/tools/agent/__init__.py Outdated
if isinstance(exc.__cause__, asyncio.CancelledError):
logger.warning("Foreground agent timed out after {t}s", t=timeout)
if agent_id is not None:
store.update_instance(agent_id, status="idle")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep timed-out foreground agents marked as killed

When asyncio.wait_for times out, runner.run is cancelled and already records the instance as "killed", but this handler immediately overwrites it to "idle". That state mismatch makes interrupted runs look like normal idle instances in status views/metrics and can mislead follow-up automation that relies on subagent status semantics.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 4e8d6ac1.

Removed the store.update_instance(agent_id, status="idle") line from the TimeoutError handler in AgentTool.__call__.

When asyncio.wait_for cancels runner.run on timeout, the runner's internal except asyncio.CancelledError block already marks the instance as "killed" (and re-raises). Overwriting that to "idle" in the outer handler made interrupted runs appear as normal idle instances in status views. The instance now correctly stays "killed".

Comment thread src/kimi_cli/subagents/builder.py Outdated
logger.info(
"Subagent {agent_id} assigned API key {prefix}... (pool size {n})",
agent_id=agent_id,
prefix=api_key_override[:16],
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Remove API key material from subagent assignment logs

This log line writes the first 16 characters of each assigned API key to plaintext logs. Even partial key disclosure is sensitive in shared CI/support log pipelines and creates avoidable credential leakage risk; key assignment should be logged without exposing any key bytes.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 4e8d6ac1.

Removed the prefix=api_key_override[:16] parameter from the logger.info call in SubagentBuilder.build_builtin_instance.

# Before
logger.info(
    "Subagent {agent_id} assigned API key {prefix}... (pool size {n})",
    agent_id=agent_id,
    prefix=api_key_override[:16],
    n=self._root_runtime.key_pool.key_count,
)

# After
logger.info(
    "Subagent {agent_id} assigned API key from pool (pool size {n})",
    agent_id=agent_id,
    n=self._root_runtime.key_pool.key_count,
)

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4e8d6ac18e

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/kimi_cli/tools/file/read_media.py Outdated
@@ -119,7 +119,12 @@ async def _read_media(self, path: KaosPath, file_type: FileType) -> ToolReturnVa
case "video":
data = await path.read_bytes()
if (llm := self._runtime.llm) and isinstance(llm.chat_provider, Kimi):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Unwrap KeyPoolKimi before the Kimi video check

The current diff still checks llm.chat_provider for Kimi before the new unwrap runs, so when key pooling wraps the provider as KeyPoolKimi this branch is skipped and Kimi subagents fall back to embedding video data URLs instead of using files.upload_video. Fresh evidence after the earlier thread is that the unwrap was added inside the branch, but the outer guard remains on the wrapped object; move the unwrap before this isinstance check.

Useful? React with 👍 / 👎.

Comment thread src/kimi_cli/llm.py
async def generate(self, *args: Any, **kwargs: Any) -> Any:
return await self._provider.generate(*args, **kwargs)

def on_retryable_error(self, error: BaseException) -> bool:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Rotate pooled keys before retrying status errors

This new wrapper only rotates keys when on_retryable_error is called, but _run_with_connection_recovery currently invokes that hook only for APIConnectionError/APITimeoutError; 429/500/503 APIStatusErrors are retried by tenacity with the same client and same exhausted key. In the rate-limit scenario this feature targets, a subagent that receives HTTP 429 will keep retrying the same key even though the pool has other keys available, so status-error retries need to trigger the same rotation path.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1bddb1c3f7

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

# Enforce foreground concurrency limit (80% of system capacity).
# For resumed instances, derive the provider type from the stored launch spec
# so that non-Kimi resumed subagents are not capped by an unrelated key pool.
effective_model = params.model
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Resolve default subagent model before applying Kimi cap

For a new foreground launch with no explicit model, this leaves effective_model as None, so _resolve_effective_provider_type falls back to the root provider even though prepare_instance later records req.model or type_def.default_model and SubagentBuilder.resolve_effective_model will run the subagent on its type default. In a Kimi-root session with a key pool, a subagent type whose default model is OpenAI/Anthropic is therefore treated as Kimi for the foreground key-pool cap and can be rejected at the much lower Kimi limit before it ever starts.

Useful? React with 👍 / 👎.

Comment thread src/kimi_cli/tools/agent/__init__.py Outdated
effective_model = launch.model_override or launch.effective_model
provider_type = _resolve_effective_provider_type(self._runtime, effective_model)
max_concurrent = _max_foreground_concurrency(self._runtime, provider_type)
running = _count_running_foreground(self._runtime)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Count only matching-provider foreground runs for the Kimi cap

running counts every running_foreground instance, but max_concurrent may be the Kimi key-pool capacity for the pending request. When an OpenAI/default-model foreground subagent is already running and a Kimi subagent is launched with a small key pool (for example two keys gives a limit of one), the Kimi request is rejected as 1/1 even though no Kimi key is currently in use. The count needs to be scoped to the same effective provider/key-pool class as the cap being enforced.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 681ee768c4

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

if timeout is not None:
return await asyncio.wait_for(runner.run(req), timeout=timeout)
return await runner.run(req)
return await asyncio.wait_for(runner.run(req, prepared), timeout=timeout)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Clear pre-run timeouts from running_foreground

When the foreground timeout fires while runner.run() is still in its pre-execution preparation (for example prepare_soul() awaiting explore-agent git context collection), the runner has not reached its except asyncio.CancelledError block that marks the instance killed. Because this code now marks the instance running_foreground before the wait_for, the timeout handler returns without changing that status, leaving the subagent counted as running and causing later Agent calls to hit the concurrency limit until restart. Mark the record killed/failed here when it is still running_foreground.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a2b1f2c9a9

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/kimi_cli/llm.py Outdated
Comment on lines +145 to +148
async def generate(self, *args: Any, **kwargs: Any) -> Any:
try:
return await self._provider.generate(*args, **kwargs)
except APIStatusError as exc:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Rotate keys for streaming status retries

When a retryable Kimi status error is raised while consuming the streaming response, this try has already returned the KimiStreamedMessage, so the wrapper never calls on_retryable_error; KimiStreamedMessage._convert_stream_response converts OpenAIError/httpx.HTTPError into APIStatusError during async for, and KimiSoul will retry 429/500/503/504 with the same exhausted client/key. Fresh evidence after the earlier status-retry fix is that only stream creation is covered here, not iteration, so pooled subagents can still hammer one key on streaming server/rate-limit failures.

Useful? React with 👍 / 👎.

@chatgpt-codex-connector
Copy link
Copy Markdown

💡 Codex Review

chat_provider.client.api_key = access_token or fallback_api_key

P2 Badge Preserve pooled API keys during OAuth refresh

When a logged-in Kimi configuration also sets KIMI_API_KEY_* for the new pool, each subagent is cloned with a pooled api_key_override, but KimiSoul.run() calls ensure_fresh() before the turn and this line overwrites the wrapped Kimi client's key with the shared OAuth access token. In that environment the first request from every supposedly pooled subagent uses the same OAuth credential, so the new parallel-key distribution is bypassed until a retry happens; skip applying OAuth tokens to key-pooled/API-key-overridden clients or mark those clones as API-key auth.


old_key = getattr(self._provider, "_api_key", None)
if old_key is not None:
self._key_pool.record_failure(old_key)

P2 Badge Reset pooled key failures after successful calls

With key health tracking enabled, failures are recorded here but no success path ever calls APIKeyPool.reset_key() (the only other reference is in tests). In long-running sessions with occasional transient 429/5xx responses, isolated failures are treated as consecutive forever, so a key can progress to 5-minute or 30-minute cooldowns even after many successful uses, unnecessarily shrinking the pool and lowering subagent capacity; reset the key after a successful request/stream for the key that was actually used.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Liewzheng added 15 commits May 27, 2026 20:50
- Introduce APIKeyPool for round-robin key rotation across subagents
- Add KeyPoolKimi wrapper to retry with fresh keys on rate limits
- Inject subagent type into User-Agent for backend attribution
- Add foreground concurrency limit (80% of key count or task slots)
- Add configurable foreground timeout via KIMI_FOREGROUND_AGENT_TIMEOUT
- Fix oauth.py assertion for KeyPoolKimi-wrapped providers
…l concurrency to kimi; guard prepare_instance failures
- read_media.py: unwrap KeyPoolKimi BEFORE isinstance(Kimi) guard so
  video uploads correctly use server-side files.upload_video API.
- llm.py: intercept APIStatusError (429/500/503) inside KeyPoolKimi.generate()
  to rotate the pooled key BEFORE re-raising for tenacity retry.
- tests: add parametrized tests for retryable status codes and a negative
  test for non-retryable codes (400).
… for kimi cap

- Resolve default subagent model before applying Kimi cap: when no explicit
  model is given, look up the built-in type's default_model so that a kimi
  default correctly triggers the key-pool cap.
- Count only matching-provider foreground runs for the Kimi cap:
  _count_running_foreground now optionally filters by provider_type so that
  openai/anthropic/etc. subagents do not consume the kimi key-pool quota.
- Add tests for provider-filtered counting and default-model resolution.
When asyncio.wait_for times out while runner.run() is still in its
pre-execution preparation (e.g. prepare_soul awaiting explore-agent git
context collection), the runner's except asyncio.CancelledError block
that marks the instance 'killed' may not have run yet.  Because we now
eagerly mark the instance 'running_foreground' before wait_for, the
TimeoutError handler must clean up the status when it is still
'running_foreground', otherwise the subagent remains counted as running
and later Agent calls hit the concurrency limit.

- In AgentTool.__call__, when TimeoutError.__cause__ is CancelledError,
  check if the instance is still 'running_foreground' and mark it 'killed'.
- Add test_timeout_cleans_running_foreground_status to verify the cleanup.
Centralize key rotation for retryable APIStatusError (429/500/502/503/504)
in KimiSoul._run_with_connection_recovery instead of KeyPoolKimi.generate.

- KeyPoolKimi.generate no longer catches APIStatusError; this prevents double
  rotation when generate() fails and the error is also caught by the soul.
- _run_with_connection_recovery now treats retryable status errors the same
  way as connection errors: calls chat_provider.on_retryable_error once per
  tenacity attempt, then re-enters with _status_retried=True.
- This covers both initial generate() failures AND mid-stream failures during
  async iteration, which were previously missed by the generate() wrapper.

Tests:
- Update test_step_status_error to expect recovery_calls == 1.
- Add test_step_mid_stream_status_error_triggers_recovery with a
  RetryableChatProvider that yields partial stream then errors, verifying
  on_retryable_error is called before the retry.
- Remove obsolete KeyPoolKimi.generate APIStatusError tests.
…i_provider

Add a public  utility in  and replace the
four manual unwrap sites (oauth.py, read_media.py, server.py, llm.py's
thinking_keep check) so that future  checks do not
need to remember to unwrap the wrapper.
…APIKeyPool

APIKeyPool now tracks per-key failure state:
- record_failure(key) applies exponential cooldown: 30s -> 5min -> 30min
- acquire() skips keys that are in cooldown, falling back to round-robin
  only when every key is cooling down
- reset_key(key) clears the failure state

KeyPoolKimi.on_retryable_error records the old key's failure before
rotating to the next one, so exhausted keys are not immediately reused.

Tests cover cooldown skipping, all-cooldown fallback, exponential
progression, and reset behaviour.
When a key pool is active, _apply_access_token must not overwrite
chat_provider.client.api_key, because the pool (not OAuth) manages
the key. Overwriting would silently replace the pooled key with the
OAuth token, breaking subsequent rotation.
When a key's cooldown period expires and it is re-acquired, reset its
consecutive_failures counter back to zero. Without this, recovered keys
start from higher cooldown tiers on their next failure, even though they
have demonstrated recovery.
@Liewzheng Liewzheng force-pushed the dev-feat-subagent-optimized branch from ed11ebb to ee7452c Compare May 27, 2026 13:01
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ee7452cb80

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/kimi_cli/auth/oauth.py Outdated
Comment on lines +1092 to +1093
if runtime.key_pool is not None:
return
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Only skip OAuth refresh for wrapped pooled providers

When a user is logged in with the managed OAuth Kimi provider and also sets multiple KIMI_API_KEY* values for subagents, the root runtime.key_pool is non-null even though the root runtime.llm.chat_provider was created before the pool and is not a KeyPoolKimi. In that setup a 401 refresh updates the token cache but this early return leaves the root client's api_key on the expired token, so the retry keeps failing; gate this on the active chat provider actually being the key-pool wrapper rather than on the global pool existing.

Useful? React with 👍 / 👎.

Comment thread src/kimi_cli/llm_key_pool.py Outdated
# Cooldown expired — reset the key to healthy.
self._states[key] = _KeyState()
return key
# All keys in cooldown — fall back to the current slot.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Advance the fallback slot when all keys are cooling down

If every key is in cooldown, the loop wraps _index back to its original value and this fallback returns that same key without advancing it. Subsequent retries while the pool is still fully cooling down will therefore hammer one key instead of round-robin falling back across the pool, which defeats the rotation behavior exactly after widespread 429/5xx failures.

Useful? React with 👍 / 👎.

Liewzheng added 2 commits May 27, 2026 21:24
When every key is in cooldown, the fallback path returned the same key
without advancing _index. Subsequent acquire() calls would hammer that
single key instead of round-robin falling back across the pool.
Gating _apply_access_token on runtime.key_pool is too broad: a user may
have a key pool configured for subagents while the root runtime still
uses an unwrapped OAuth-based provider. In that case a 401 refresh would
update the token cache but leave the root client's api_key on the expired
token, causing retries to keep failing.

Now we check isinstance(runtime.llm.chat_provider, KeyPoolKimi) so the
skip only applies when the active provider is actually managed by the pool.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 53efcfde23

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

| `resume` | string | 可选的 Agent 实例 ID,用于恢复现有实例 |
| `run_in_background` | bool | 是否在后台运行,默认 false |
| `timeout` | int | 超时时间(秒),范围 30–3600。前台默认无超时(运行到完成),后台默认 15 分钟;超时后任务会被停止 |
| `timeout` | int | 超时时间(秒),范围 30–3600。前台默认 300 秒(或由 `KIMI_FOREGROUND_AGENT_TIMEOUT` 指定),后台默认 15 分钟;超时后任务会被停止。设置为 `0` 可关闭前台超时限制 |
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Align the documented timeout escape hatch with the schema

When users follow this new table and pass timeout: 0 to Agent to disable the foreground timeout, the tool call is rejected before it reaches _resolve_foreground_timeout because Params.timeout is constrained with ge=30 in src/kimi_cli/tools/agent/__init__.py. The 0 escape hatch currently only works through KIMI_FOREGROUND_AGENT_TIMEOUT, so either allow 0 in the parameter schema or remove this instruction from the docs.

Useful? React with 👍 / 👎.

The docs stated that passing timeout: 0 disables the foreground timeout,
but Params.timeout was constrained with ge=30, so Pydantic rejected the
value before it ever reached _resolve_foreground_timeout.

- Change ge=30 to ge=0 in the Field constraint
- Treat params_timeout == 0 as 'disabled' (same semantics as
  KIMI_FOREGROUND_AGENT_TIMEOUT=0)
- Update inline snapshots in test_default_agent and test_tool_schemas
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4e4267f6fd

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

"KIMI_FOREGROUND_AGENT_TIMEOUT. Background: default from config "
"(15min), max 3600s (1hr). The agent is stopped if it exceeds "
"this limit."
),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reject zero timeout for background agents

When run_in_background=true, this relaxed schema now accepts timeout: 0, but the background path passes that value through to BackgroundAgentRunner, which treats any non-None timeout as asyncio.wait_for(..., timeout=0) and immediately marks the task timed out. Since 0 is only documented/handled as the foreground timeout escape hatch, background Agent calls can now create tasks that fail instantly instead of being rejected or using the configured background default.

Useful? React with 👍 / 👎.

The previous fix allowed timeout=0 in the schema for foreground agents
(to disable the timeout), but background tasks passed that 0 through to
BackgroundAgentRunner, which treats any non-None timeout as
asyncio.wait_for(..., timeout=0) and immediately marks the task timed out.

Zero timeout is now explicitly rejected in the background path with a
clear ToolError, since background tasks must have a positive timeout.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 100b9bcbcd

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/kimi_cli/tools/agent/__init__.py Outdated
Comment on lines +227 to +231
if params.resume:
record = store.get_instance(params.resume)
if record is not None:
launch = record.launch_spec
effective_model = launch.model_override or launch.effective_model
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Respect resume model overrides when applying provider caps

When resuming an agent with an explicit model override, this branch overwrites the already-validated params.model with the stored launch spec, but ForegroundSubagentRunner.run() later does run the resumed agent with req.model. In a Kimi-key-pool session this can reject an OpenAI/Anthropic override under the Kimi cap, or let a Kimi override bypass that cap, because the concurrency check classifies the pending run using the old stored provider rather than the model that will actually execute.

Useful? React with 👍 / 👎.

When resuming an agent with an explicit model override, the concurrency
check was using the stored launch_spec's old model instead of the override,
which could reject an OpenAI/Anthropic override under the Kimi cap, or let
a Kimi override bypass that cap.

Fix: only fall back to the stored launch spec when no override is given.

Also fixes _count_running_foreground to filter by provider type for ALL
providers, not just kimi. Previously non-kimi caps were incorrectly
consumed by unrelated-provider subagents because the filter was only
applied when provider_type == 'kimi'.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 60d70eb6dc

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +89 to +90
instance_provider = _resolve_effective_provider_type(runtime, r.launch_spec.effective_model)
if instance_provider == provider_type:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Count resumed override runs under the executing provider

Fresh evidence after the resume-override fix is that running counts still classify each active instance from the persisted launch_spec.effective_model, but ForegroundSubagentRunner.run() applies a resume model override only to a local launch_spec and SubagentStore.update_instance() never persists that override. If an old OpenAI/default instance is resumed with a Kimi model while a Kimi key pool cap is active, this count ignores the running Kimi execution, so subsequent Kimi foreground agents can bypass the key-pool concurrency limit.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cb47dfcbbb

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/kimi_cli/tools/agent/__init__.py Outdated
Comment on lines +240 to +244
store.update_instance(
params.resume,
launch_spec=replace(
launch,
model_override=effective_model,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Defer persisting resume model override until launch succeeds

When resuming with an explicit model, this updates the stored launch spec before the request has passed the foreground concurrency check or prepare_instance's running-instance validation. If the override call is rejected because the provider cap is full, or because the target agent is already running_foreground/running_background, no new execution happens but the instance metadata is still changed, so later resumes inherit a model that was never successfully launched and active-provider counting can be skewed for an already-running agent. Persist the override only after the resume is accepted, or roll it back on these early failure paths.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bcd5f0bf1b

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

max_concurrent = _max_foreground_concurrency(self._runtime, provider_type)
# Count only matching-provider runs so that a cap for one provider is not
# consumed by subagents running on a different provider.
running = _count_running_foreground(self._runtime, provider_type)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep the non-Kimi foreground cap global

When the pending run is not using the Kimi key-pool cap, this still filters the running count by provider type while max_concurrent comes from the global background.max_running_tasks capacity. In a mixed-provider session, for example with max_running_tasks=4 (foreground cap 3), three OpenAI foreground agents can already be running and an Anthropic foreground launch will see running == 0 and be allowed, bypassing the intended global foreground limit. Scope the count by provider only when enforcing the Kimi key-pool cap; otherwise count all running_foreground instances.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Issue: Foreground subagents exhaust single API key rate limit, causing 429 errors and hangs

1 participant