feat: Anthropic CoT controls by odilitime · Pull Request #417 · elizaOS/cloud

odilitime · 2026-03-27T04:12:43Z

Align check-types-split with packages/db, packages/lib, and UI components; exclude Storybook stories from the UI chunk.
Add check-types:tests to CI; add root scripts for UI, agent-server, and gateway typechecks.
packages/ui: path aliases for @/lib, @/db, @/app, @/packages/ui; styled-jsx attrs for <style jsx>.
packages/lib: JWKS uses jose KeyLike; runtime-factory imports plugin-sql default with typed adapter factory.
gateway-discord: explicit discord.js handler types, connectBot characterId, sendTyping guard, listener map cast.
services/agent-server: requireInternalAuth accepts Elysia string/number status.
Tests: mocks, RequestInit/fetch preconnect, UUID casts, toMatchObject, financial addCredits, Playwright APIRequestContext, and related fixes.
New anthropic-thinking provider helper and unit test.

Note: services/operator (Pepr) is not part of the main workspace typecheck.
Made-with: Cursor

Note

Medium Risk
Changes how multiple API routes reserve credits and set maxOutputTokens/providerOptions when calling Anthropic models, which can affect billing accuracy and request acceptance. Also alters rate-limit behavior in development via a new multiplier env var, so misconfiguration could unexpectedly throttle or relax limits outside prod.

Overview
Enables per-agent Anthropic extended thinking by reading user_characters.settings.anthropicThinkingBudgetTokens, falling back to ANTHROPIC_COT_BUDGET, and optionally clamping with ANTHROPIC_COT_BUDGET_MAX; introduces shared helpers in packages/lib/providers/anthropic-thinking.ts to resolve budgets and deep-merge providerOptions without clobbering existing gateway/google settings.
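
The resolution order in this overview (per-agent setting, then env default, then optional cap) can be sketched roughly as below. This is an illustration only, not the code in packages/lib/providers/anthropic-thinking.ts: the name `resolveThinkingBudget` and the `BudgetInputs` shape are hypothetical, and treating a result of 0 as "thinking off" is an assumption.

```typescript
// Hypothetical sketch: names and shapes are illustrative, not the PR's actual API.
interface BudgetInputs {
  agentBudget?: number; // from user_characters.settings.anthropicThinkingBudgetTokens
  envDefault?: number; // parsed ANTHROPIC_COT_BUDGET
  envMax?: number; // parsed ANTHROPIC_COT_BUDGET_MAX (optional cap)
}

function resolveThinkingBudget({ agentBudget, envDefault, envMax }: BudgetInputs): number {
  // The per-agent setting wins; otherwise fall back to the deploy-wide default.
  const requested = agentBudget ?? envDefault ?? 0;
  // Clamp only when a cap is configured.
  const capped = envMax !== undefined ? Math.min(requested, envMax) : requested;
  // Assumption: 0 (or anything below it) means extended thinking stays off.
  return capped > 0 ? capped : 0;
}
```

Under these assumptions an agent asking for 8000 tokens with a cap of 5000 resolves to 5000, and an agent with no setting inherits the env default.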

Updates MCP/A2A agent chat (and several v1 AI endpoints) to apply the resolved thinking budget, adjust maxOutputTokens to satisfy Anthropic constraints, and include the budget in credit reservation/estimation; background services (SEO/promo/app builder) explicitly disable CoT (budget=0) to preserve temperature behavior. Separately, rate limiting now uses production-equivalent numeric presets in dev with an optional RATE_LIMIT_MULTIPLIER, CI adds test-project typechecking, and the PR includes targeted test/mocking hardening (notably Milady pricing mocks) plus new docs (docs/anthropic-cot-budget.md, docs/unit-testing-milady-mocks.md) and an engineering CHANGELOG.md.
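
For the maxOutputTokens adjustment and credit reservation mentioned above, one plausible shape is sketched here. Anthropic's API requires `max_tokens` to be greater than the thinking budget and bills thinking tokens as output; how the PR actually computes the values is not shown in this thread, so both helper names and the "add the budget on top" arithmetic are assumptions.

```typescript
// Hypothetical helpers; the PR's real arithmetic may differ.
function effectiveMaxOutputTokens(requestedMax: number, thinkingBudget: number): number {
  if (thinkingBudget <= 0) return requestedMax; // CoT disabled: nothing to adjust
  // Keep the caller's visible-output allowance on top of the thinking budget,
  // which also guarantees max tokens stays greater than the budget.
  return requestedMax + thinkingBudget;
}

function tokensToReserve(inputTokens: number, requestedMax: number, thinkingBudget: number): number {
  // Thinking tokens are billed as output, so the reservation covers them too.
  return inputTokens + effectiveMaxOutputTokens(requestedMax, thinkingBudget);
}
```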

Written by Cursor Bugbot for commit 9272304. This will update automatically on new commits.

Summary by CodeRabbit

New Features
- Per-agent Anthropic “extended thinking” token budgets for MCP/A2A chats; model calls now honor per-agent settings with env-derived default and optional max cap.
Documentation
- Added docs and roadmap entries explaining Anthropic extended-thinking configuration and unit-testing guidance.
Bug Fixes
- Rate-limit presets unified for consistent numeric behavior across environments.
Chores
- CI/typecheck expanded to include test project type checks; example env entries documented.

vercel · 2026-03-27T04:12:48Z

Project	Deployment	Actions	Updated (UTC)
eliza-cloud-v2	Error		Apr 3, 2026 0:19am

coderabbitai · 2026-03-27T04:12:51Z

Walkthrough

Adds Anthropic “extended thinking” (CoT) budget support with per-agent settings and env fallbacks/caps, provider-options merge helpers, integrations into AI request calls, test and type hardening, Milady test helper refactor, rate-limit and config updates, and docs/changelog entries.

Changes

Cohort / File(s)	Summary
Anthropic Core & Types `packages/lib/providers/anthropic-thinking.ts`, `packages/lib/providers/cloud-provider-options.ts`, `packages/lib/providers/index.ts`, `packages/lib/providers/types.ts`	New Anthropic CoT utilities: env parsing, per-character parsing, resolution with optional max clamp, provider-options deep-merge helpers, exported types, and re-exports.
Env & Validator `.env.example`, `packages/lib/config/env-validator.ts`	Documented `ANTHROPIC_COT_BUDGET` / `ANTHROPIC_COT_BUDGET_MAX` in example; added validation entries and parsing behavior.
Route Integrations (MCP / A2A / Chat / Messages / Responses) `app/api/agents/[id]/mcp/route.ts`, `app/api/agents/[id]/a2a/route.ts`, `app/api/v1/chat/completions/route.ts`, `app/api/v1/chat/route.ts`, `app/api/v1/messages/route.ts`, `app/api/v1/responses/route.ts`	Handlers now parse/resolve CoT budgets (per-agent or env), compute effective max tokens, and spread merged Anthropic provider options into `streamText`/`generateText` calls.
Tooling & Generation `app/api/mcp/tools/generation.ts`, `packages/lib/api/a2a/skills.ts`, `app/api/v1/generate-image/route.ts`	Image/text generation calls updated to use model constants and merge Google modalities with Anthropic CoT via helper functions.
Service / App Integrations `packages/lib/services/*` (app-promotion, promotion-assets, seo, twitter-automation, app-builder-ai-sdk, etc.)	Spreads Anthropic CoT provider options into various generate/stream calls; some calls explicitly disable CoT via budget=0.
Provider/Gateway Merge Helpers `packages/lib/providers/anthropic-thinking.ts` helpers used across gateway/responses and provider option flows	Deep-merge helpers preserve existing `gateway` / `google` fields while adding Anthropic `thinking` fragments.
Rate Limiting `packages/lib/middleware/rate-limit.ts`	Introduces `RATE_LIMIT_MULTIPLIER`, freezes presets, uses multiplier-based `maxRequests`, and skips dev validation when disabled.
JWKS & Auth Typing `packages/lib/auth/jwks.ts`, `services/agent-server/src/routes.ts`	Replaced `JoseCryptoKey` alias with `KeyLike` in JWKS APIs; relaxed `requireInternalAuth` status typing.
Test Infrastructure & Typing Hardening many `packages/tests/...` files and `packages/tests/helpers/*`	Widespread test typing fixes, fixtures, helper exports, mock improvements, `global.fetch` casts via `unknown`, and helper `mockMiladyPricingMinimumDepositForRouteTests`.
Milady Billing Tests Refactor `packages/tests/helpers/mock-milady-pricing-for-route-tests.ts`, `packages/tests/unit/z-milady-billing-route.test.ts`, related milady tests	Introduces shared Milady mock helper (spreads real pricing), moves billing test to `z-*`, and centralizes DB/mock registration for stable ordering.
E2E / Integration Adjustments `packages/tests/e2e/setup-server.ts`, `packages/tests/integration/*`	Improved server log handling, resilient managed fetch with preconnect preservation, and several test assertion/fixture updates.
CI / Scripts / TS Config `.github/workflows/tests.yml`, `package.json`, `packages/scripts/check-types-split.ts`, `tsconfig.test.json`, `packages/ui/tsconfig.json`, `services/agent-server/package.json`	Added extra typecheck steps, split typecheck changes, new typecheck scripts, tsconfig path/exclude updates, and minor script edits.
Docs & Changelog `README.md`, `CHANGELOG.md`, `docs/anthropic-cot-budget.md`, `docs/unit-testing-milady-mocks.md`, `docs/ROADMAP.md`, `packages/content/changelog.mdx`	Adds Anthropic CoT documentation, test HOWTOs, roadmap notes, and changelog entries.
UI Typings `packages/ui/src/styled-jsx.d.ts`	Adds React module augmentation for `StyleHTMLAttributes<T>` with `jsx` and `global` props.
Misc. Small Fixes & Fixtures `packages/lib/eliza/runtime-factory.ts`, `packages/services/gateway-discord/src/gateway-manager.ts`, `packages/tests/fixtures/mcp-test-character.ts`	DB adapter import wrapper, stronger Discord typings and safe sendTyping guard, and test fixture ID update.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant Route as MCP/A2A Route
    participant Resolver as CharacterResolver
    participant Anthropic as AnthropicThinking
    participant LLM as LLMProvider

    Client->>Route: POST /api/agents/{id}/mcp (chat)
    Route->>Resolver: resolve character record
    Resolver-->>Route: character (includes settings)
    Route->>Anthropic: parseThinkingBudgetFromCharacterSettings(settings)
    Anthropic-->>Route: agentThinkingBudget?
    Route->>Anthropic: resolveAnthropicThinkingBudgetTokens(model, env, agentBudget)
    Anthropic-->>Route: effectiveBudget / providerOptions fragment
    Route->>LLM: streamText({ model, prompt, maxOutputTokens, ...providerOptions })
    LLM-->>Route: streamed response
    Route-->>Client: stream completion

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Anthropic budgets weave and flow,
Merged options gently grow,
Tests steadied, types aligned,
Milady mocks now well-defined,
Docs sing what changes know. ✨

claude · 2026-03-27T04:16:49Z

Code Review

This PR adds Anthropic extended thinking (ANTHROPIC_COT_BUDGET) support across all AI routes, fixes TypeScript type errors in tests/packages, adds a check-types:tests CI step, and hardens the Discord gateway with explicit handler types. The implementation is generally solid with good test coverage for the new helper.

Issues:

Silent temperature stripping when thinking is enabled (medium)

app-promotion-assets.ts, app-promotion.ts, twitter-automation/app-automation.ts, and seo.ts pass explicit temperature alongside mergeProviderOptions. When ANTHROPIC_COT_BUDGET is set, @ai-sdk/anthropic silently strips temperature/topP/topK (noted in the v1/chat completions comment). Callers relying on temperature 0.7/0.8 will silently get default sampling - an invisible behavior change. Consider logging a warning at startup when both are configured.
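
A warning along the lines suggested here could look like the sketch below. The function name and message text are hypothetical, and it checks a single call site rather than running at startup, which is a simplification.

```typescript
// Hypothetical helper: returns a warning message when an explicit temperature
// would be dropped because extended thinking is active, otherwise undefined.
function temperatureIgnoredWarning(thinkingBudget: number, temperature?: number): string | undefined {
  if (thinkingBudget > 0 && temperature !== undefined) {
    return `temperature=${temperature} is ignored while extended thinking is enabled (budget=${thinkingBudget})`;
  }
  return undefined;
}

const warning = temperatureIgnoredWarning(2048, 0.7);
if (warning) console.warn(warning);
```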

mergeProviderOptions silently overwrites unknown provider keys (low-medium)

The function deep-merges only gateway, anthropic, and google. Any other provider key present in both a and b (e.g. openai, azure) is overwritten by the shallow spread without merging. Worth a comment flagging this limitation.
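
The limitation can be reproduced with a small stand-in. This sketch re-creates the described behavior (shallow spread plus a one-level merge for three known keys); it is not the PR's `mergeProviderOptions`, and the option values are made up for illustration.

```typescript
type ProviderOptions = Record<string, Record<string, unknown>>;

// Keys that get a one-level merge, per the review comment.
const DEEP_MERGED_KEYS = ["gateway", "anthropic", "google"] as const;

// Hypothetical re-creation of the described behavior, not the PR's code.
function mergeProviderOptionsSketch(a: ProviderOptions, b: ProviderOptions): ProviderOptions {
  const merged: ProviderOptions = { ...a, ...b }; // shallow: b's top-level keys replace a's
  for (const key of DEEP_MERGED_KEYS) {
    const left = a[key];
    const right = b[key];
    if (left && right) merged[key] = { ...left, ...right };
  }
  return merged;
}

const mergedExample = mergeProviderOptionsSketch(
  { gateway: { order: ["anthropic"] }, openai: { fromA: true } },
  { gateway: { tag: "demo" }, openai: { fromB: true } },
);
// gateway keeps both fields; openai keeps only what b supplied.
console.log(mergedExample);
```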

Rate limit change may break local development (medium)

Removing the isDevelopment override drops STANDARD from 10,000 to 60 req/min in local dev. Developers running hot-reload tests or UI workflows will now hit production limits backed only by in-memory storage. Consider keeping separate dev limits or documenting the change prominently.

runtime-factory.ts default-import cast hides the type (low)

The default import is immediately cast to a manually-written signature. If the upstream @elizaos/plugin-sql signature changes, this cast will silently pass wrong types. Worth a comment explaining the workaround and a TODO to remove it when upstream fixes the type export.

Definite assignment in test teardown (low)

In x402-topup.test.ts, afterAll accesses referralsServiceForTest etc. with definite-assignment assertions and no null guards. If beforeAll throws before all three are assigned, afterAll will also throw and may swallow the original failure. The previous null-guard pattern was safer.

Positives:

parseAnthropicCotBudgetFromEnv is thorough: strict digit regex, safe-integer check, and solid edge-case test coverage.
Extracting model strings to named constants before passing to mergeProviderOptions improves readability.
Discord sendTyping guard is correct - DMChannel/PartialGroupDMChannel lack that method.
Excluding Storybook stories from check-types-split is a good cleanup.
Switching to toMatchObject in server-wallets tests is the correct fix for partial-shape assertions.

Rename milady billing route test to z-milady-billing-route for script ordering, use queue-backed db mocks re-registered in beforeEach, and add mockMiladyPricingMinimumDepositForRouteTests so route tests do not replace MILADY_PRICING with only MINIMUM_DEPOSIT (which broke billing thresholds in-process). Update package.json bulk/special unit paths accordingly. Made-with: Cursor

Extend anthropic-thinking handling, add cloud-provider-options, update provider types and index exports, and thread safe model params through v1 chat, completions, messages, responses, generate-image, MCP/A2A, and internal services. Expand anthropic-thinking unit tests. Made-with: Cursor

Add engineering docs (unit-testing-milady-mocks, anthropic-cot-budget), root CHANGELOG.md, README links and accurate test script table, ROADMAP and site changelog entries. Expand JSDoc and .env.example with rationale for ANTHROPIC_COT_BUDGET and partial MILADY_PRICING mock pitfalls. Made-with: Cursor

claude · 2026-03-27T08:07:02Z

Code Review - PR 417

This PR is a broad type-fix/quality pass with one new functional feature (Anthropic extended thinking via ANTHROPIC_COT_BUDGET). The changes are generally clean. A few things worth discussing:

ISSUES

Rate limiting: dev/prod unification is a breaking DX change (medium)

File: packages/lib/middleware/rate-limit.ts

Removing the isDevelopment branch means local dev now hits the same strict limits as production (60/min STANDARD, 10/min STRICT, 5 per 5-min CRITICAL) backed only by in-memory storage. Developers iterating locally will now hit 429s immediately. The old 10000 req/window dev ceiling existed precisely to avoid this.

The log message change says "same numeric limits as production" explicitly. Was this intentional? If so, it should be called out in the PR description. If not, the isDevelopment guard should be restored.
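
The PR summary mentions an optional RATE_LIMIT_MULTIPLIER for relaxing limits in dev. A minimal sketch of how a multiplier could scale frozen presets is below; the preset numbers come from this comment, while the function names, the window values in milliseconds, and the fallback to 1 on invalid input are assumptions.

```typescript
// Hypothetical sketch; not the code in packages/lib/middleware/rate-limit.ts.
const PRESETS = Object.freeze({
  STANDARD: { windowMs: 60_000, maxRequests: 60 },
  STRICT: { windowMs: 60_000, maxRequests: 10 },
  CRITICAL: { windowMs: 300_000, maxRequests: 5 },
});

function parseMultiplier(raw: string | undefined): number {
  const n = Number(raw);
  // Assumption: unset, non-numeric, or non-positive values fall back to 1.
  return raw !== undefined && Number.isFinite(n) && n > 0 ? n : 1;
}

function scaledMaxRequests(preset: keyof typeof PRESETS, rawMultiplier: string | undefined): number {
  const scaled = Math.floor(PRESETS[preset].maxRequests * parseMultiplier(rawMultiplier));
  return Math.max(1, scaled); // never scale a limit down to zero
}
```

With a multiplier of 100, STANDARD becomes 6000 requests per minute, which is in the same range as the old dev ceiling without reintroducing a separate dev branch.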

Definite assignment (!) removes null-safety in teardown (low)

File: packages/tests/integration/x402-topup.test.ts

Using ! on the three service vars silences TS but means if beforeAll throws partway through, afterAll will crash accessing properties on uninitialized vars. The original null-guards were load-bearing. Consider keeping them or initializing with no-op stubs.

mergeProviderOptions shallow merge - worth a comment (low)

File: packages/lib/providers/anthropic-thinking.ts

The merge is one level deep for anthropic/google/gateway. For the thinking key specifically: if base has { thinking: { type: "disabled" } } and COT helper adds { thinking: { type: "enabled", budgetTokens: N } }, the second replaces the first entirely. This is the desired behavior, but a short comment would prevent future contributors from trying to fix it.

Google responseModalities leaks into non-Google paths (nit)

File: app/api/v1/generate-image/route.ts

The else branch in isOpenAIModel ? mergeAnthropicCot : mergeGoogleImage covers everything that is not OpenAI. Future Anthropic image models would fall into the Google path and pick up responseModalities unnecessarily. Low risk now, worth noting when new providers are added.

Runtime factory type cast is fragile (nit)

File: packages/lib/eliza/runtime-factory.ts

Casting the default import manually bypasses TS module resolution. A comment explaining the upstream typing gap would help future audits.

POSITIVES

sendTyping guard: "sendTyping" in channel is the right type-narrowing approach.
UUID fixture: changing "test-agent-simple-001" to a valid UUID format prevents silent failures.
APIRequestContext in Playwright fixture: removing the ReturnType gymnastics is cleaner.
preconnect binding on fetchWithServer: correctly preserving fetch.preconnect.
Unit tests for anthropic-thinking.ts: good boundary coverage (0, invalid, whitespace, non-anthropic model, merge collision).
Storybook exclusion from check-types-split: correct - stories use Storybook types not in the main tsconfig.
isChatInputCommand() guard: proper discord.js type guard instead of duck-typing is a correctness improvement.
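
The `in`-based narrowing praised in the first item works as sketched here. The channel interfaces are simplified stand-ins, not discord.js types.

```typescript
// Simplified stand-ins for channel types; not the discord.js definitions.
interface TypingChannel {
  id: string;
  sendTyping(): Promise<void>;
}
interface PlainChannel {
  id: string;
}
type AnyChannel = TypingChannel | PlainChannel;

async function safeSendTyping(channel: AnyChannel): Promise<boolean> {
  // `in` narrows the union to the member that declares sendTyping.
  if ("sendTyping" in channel) {
    await channel.sendTyping();
    return true;
  }
  return false; // channel type has no typing indicator; skip quietly
}
```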

Overall this is a solid type cleanup pass. The rate-limit change is the most impactful behavioral difference and should be confirmed as intentional before merge.

claude · 2026-03-27T08:09:17Z

PR Review: fix(types): test project, split check-types, UI package, gateways

Overall this is well-structured work with clear documentation of design decisions. A few items worth addressing before merge.

Bugs / Correctness

x402-topup.test.ts: afterAll null safety regression - The refactor replaced null-guarded service patches with definite assignment (!) but the afterAll cleanup now runs unconditionally. If beforeAll throws before the service variables are assigned, afterAll will throw on property access, masking the root failure and potentially breaking test teardown for other suites. Restore optional chaining or null guards in afterAll even if the variable declarations no longer need the union type.
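
A guarded teardown in the spirit of this suggestion might look like the following. The `PatchedService` interface and `restoreAll` helper are hypothetical; the real test patches specific service methods that are not shown in this thread.

```typescript
// Hypothetical shape for a service whose methods were patched in beforeAll.
interface PatchedService {
  restore(): void;
}

// Restores whatever was actually assigned and skips the rest, so a failure
// partway through beforeAll does not turn into a second error in afterAll.
function restoreAll(services: Array<PatchedService | undefined>): number {
  let restored = 0;
  for (const service of services) {
    if (!service) continue; // beforeAll threw before this one was assigned
    service.restore();
    restored++;
  }
  return restored;
}
```

In the test file this would be called from `afterAll` with the three service variables, which can then keep a `| undefined` type instead of a definite-assignment assertion.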

Behaviour Change Worth Calling Out

Rate limit presets: dev-mode unlimited -> production limits for local development - packages/lib/middleware/rate-limit.ts removes the isDevelopment pattern and always uses production-level limits (STANDARD: 60/min, STRICT: 10/min, etc.). The only difference now is in-memory vs Redis backing. This will hit local developers on standard routes once they exceed 60 req/min. If intentional, worth a note in dev setup docs.

Minor Issues

mergeProviderOptions silently drops unknown provider keys - The merge explicitly deep-merges gateway, anthropic, and google, but the initial spread means any other provider key present in both inputs will be shallow-overwritten by the second. Worth a comment so the next person adding a provider (mistral, cohere) knows to add it to the explicit merge block.
runtime-factory.ts: default import workaround is not grouped - The import declaration and the const re-cast are separated by an unrelated import. Grouping them or adding a comment would help future readers understand why the named binding does not work here.
ANTHROPIC_COT_BUDGET parse throws at request time, not startup - parseAnthropicCotBudgetFromEnv throws for invalid values and is called per-request, so a misconfigured env var produces 500s rather than a startup failure. Consider a module-level one-time parse or document that env-validator must run at startup.
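
For the last item, a strict parse combined with an eager one-time read could be sketched as below. The function names are hypothetical, and trimming whitespace and treating a blank value as "unset" are assumptions about the intended behavior, not a copy of `parseAnthropicCotBudgetFromEnv`.

```typescript
// Hypothetical strict parser: digits only, must be a safe integer.
function parseBudget(raw: string | undefined): number | undefined {
  const trimmed = raw?.trim();
  if (!trimmed) return undefined; // unset or blank: not configured (assumption)
  if (!/^\d+$/.test(trimmed)) {
    throw new Error(`ANTHROPIC_COT_BUDGET must be a non-negative integer, got "${raw}"`);
  }
  const value = Number(trimmed);
  if (!Number.isSafeInteger(value)) {
    throw new Error("ANTHROPIC_COT_BUDGET is too large");
  }
  return value;
}

// Parses once when called (for example at module load), so a bad value fails
// at startup; the returned getter never throws on the request path.
function loadBudgetAtStartup(env: Record<string, string | undefined>): () => number | undefined {
  const value = parseBudget(env["ANTHROPIC_COT_BUDGET"]);
  return () => value;
}
```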

What is Good

mockMiladyPricingMinimumDepositForRouteTests cleanly solves the Bun mock pollution problem; docs/unit-testing-milady-mocks.md is a valuable future reference.
Deep-merge helpers correctly preserve existing gateway.order and google.responseModalities when adding anthropic.thinking - the test suite covers the key combinations well.
Discord gateway: sendTyping guard and isChatInputCommand() check are correct defensive fixes.
Playwright APIRequestContext type fix removes the as-any cast cleanly.
fetch.preconnect binding on test fetch wrappers is a correct TypeScript conformance fix.
check-types:tests CI step closes a real coverage gap.

Summary: The null-safety regression in x402-topup.test.ts afterAll is the only real bug. The rate-limit behaviour change is worth explicit acknowledgement. Everything else is minor or informational.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

✅ Fixed: Spread order lets options override deploy-wide thinking config
- Swapped the spread order so mergeAnthropicCotProviderOptions comes after options, ensuring deploy-wide ANTHROPIC_COT_BUDGET cannot be overridden by caller-supplied options.

Or push these changes by commenting:

@cursor push 99924fb2eb

Preview (99924fb2eb)

diff --git a/packages/lib/api/a2a/skills.ts b/packages/lib/api/a2a/skills.ts
--- a/packages/lib/api/a2a/skills.ts
+++ b/packages/lib/api/a2a/skills.ts
@@ -91,8 +91,8 @@
         role: m.role as "user" | "assistant" | "system",
         content: m.content,
       })),
+      ...options,
       ...mergeAnthropicCotProviderOptions(model),
-      ...options,
     });
 
     let fullText = "";

packages/lib/api/a2a/skills.ts

Read user_characters.settings.anthropicThinkingBudgetTokens for agent chat; add ANTHROPIC_COT_BUDGET_MAX cap and resolution helpers. Wire MCP and A2A routes; extend tests, env-validator, and .env.example. Update CHANGELOG, site changelog, README, ROADMAP, anthropic-cot-budget doc, and route/module comments with WHYs (owner-controlled settings vs untrusted request params). Made-with: Cursor

…ridden Move mergeAnthropicCotProviderOptions spread after options spread in executeSkillChatCompletion to ensure the deploy-wide ANTHROPIC_COT_BUDGET configuration takes precedence over caller-supplied options. This aligns with all other call sites in the codebase which consistently place mergeAnthropicCotProviderOptions last to enforce operator-controlled thinking budgets.
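
Why spread order matters here: in an object literal, later spreads overwrite earlier ones for the same top-level key. The sketch below assumes the helper returns an object with a `providerOptions` key, which is not confirmed by this thread; the option values are illustrative.

```typescript
type CallOptions = { temperature?: number; providerOptions?: Record<string, unknown> };

// Caller-supplied options that try to turn thinking off.
const callerOptions: CallOptions = {
  temperature: 0.7,
  providerOptions: { anthropic: { thinking: { type: "disabled" } } },
};

// Stand-in for what the deploy-wide helper is assumed to return.
const deployWide: CallOptions = {
  providerOptions: { anthropic: { thinking: { type: "enabled", budgetTokens: 2048 } } },
};

// Before the fix: caller options come last and replace the operator's setting.
const callerWins = { ...deployWide, ...callerOptions };
// After the fix: the operator's setting comes last and takes precedence.
const operatorWins = { ...callerOptions, ...deployWide };

console.log(JSON.stringify(callerWins.providerOptions));
console.log(JSON.stringify(operatorWins.providerOptions));
```

Unrelated caller fields such as `temperature` survive in both orders because only the colliding `providerOptions` key is replaced.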

odilitime · 2026-03-27T08:15:00Z

@cursor push 99924fb

…ridden Move mergeAnthropicCotProviderOptions spread after options spread in executeSkillChatCompletion to ensure the deploy-wide ANTHROPIC_COT_BUDGET configuration takes precedence over caller-supplied options. This aligns with all other call sites in the codebase which consistently place mergeAnthropicCotProviderOptions last to enforce operator-controlled thinking budgets. Applied via @cursor push command

Copilot

Pull request overview

Aligns repo typechecking and test typings across the workspace, introduces deploy-scoped Anthropic extended-thinking configuration helpers, and tightens/clarifies several gateway + test mocks/casts to reduce order-dependent failures.

Changes:

Split/expand typecheck coverage (tests project typecheck in CI, new per-package typecheck scripts, updated tsconfigs/paths).
Add Anthropic COT budget env + providerOptions merge helpers and thread them through gateway-backed routes/services.
Stabilize/retarget numerous unit/integration tests (mocks, env handling, casting, and fixture shapes), including Milady pricing mock helper + renamed cron test.

Reviewed changes

Copilot reviewed 100 out of 101 changed files in this pull request and generated 3 comments.

File	Description
tsconfig.test.json	Adds UI package path aliases and refines include/exclude for test typecheck.
services/agent-server/src/routes.ts	Broadens `set.status` typing for Elysia compatibility.
services/agent-server/package.json	Adds `typecheck` script.
packages/ui/tsconfig.json	Adds cross-workspace path aliases and excludes Storybook stories from UI typecheck.
packages/ui/src/styled-jsx.d.ts	Augments React style attributes for `styled-jsx`.
packages/tests/unit/z-milady-billing-route.test.ts	Reworks Milady billing cron test mocking to be order-stable across the suite.
packages/tests/unit/x402/facilitator-service.test.ts	Adjusts casts for stricter typing.
packages/tests/unit/wallet-auth.test.ts	Fixes mock typing/resolved values.
packages/tests/unit/waifu-bridge.test.ts	Uses a typed mutable env view for `NODE_ENV` writes.
packages/tests/unit/v1-milaidy-provision-route.test.ts	Uses shared Milady pricing mock helper.
packages/tests/unit/token-agent-linkage.test.ts	Refactors validation test into scenario loop with strict checks.
packages/tests/unit/security-validations.test.ts	Avoids `{}` prototype typing issues in pollution assertions.
packages/tests/unit/referrals-service.test.ts	Uses `toMatchObject` to reduce brittle equality.
packages/tests/unit/provisioning-jobs.test.ts	Fixes Vitest mock restore typing and fetch assignment typing.
packages/tests/unit/provisioning-jobs-followups.test.ts	Uses `toMatchObject` for partial comparisons.
packages/tests/unit/privy-sync.test.ts	Uses `toMatchObject` for hydrated user comparisons.
packages/tests/unit/pr385-round5-fixes.test.ts	Aligns fake service signature with optional id param.
packages/tests/unit/performance-optimizations.test.ts	Adds typed helpers to avoid repeated unsafe access to provider snapshots.
packages/tests/unit/oauth/token-cache.test.ts	Rewrites date normalization branch for clearer typing.
packages/tests/unit/oauth/provider-registry.test.ts	Adjusts optional chaining/non-null assertions in provider routes.
packages/tests/unit/oauth/oauth-service.test.ts	Tightens typings for provider routes/platform loops/status handling.
packages/tests/unit/oauth/errors.test.ts	Stabilizes enum value typing for expectations.
packages/tests/unit/milaidy-sandbox-bridge-security.test.ts	Fixes fetch typing and backup fixture shape updates.
packages/tests/unit/milaidy-pairing-token-route.test.ts	Stabilizes env setup/teardown for base domain; removes module mock.
packages/tests/unit/milaidy-agent-routes-followups.test.ts	Uses shared Milady pricing mock helper.
packages/tests/unit/milady-sandbox-service.test.ts	Introduces typed sandbox provider helper and updates sandbox fixture fields.
packages/tests/unit/milady-create-routes.test.ts	Uses shared Milady pricing mock helper.
packages/tests/unit/mcp-twitter-tools.test.ts	Adds typed OAuth fixtures + handler result typing; normalizes token result shape.
packages/tests/unit/mcp-proxy-affiliate-pricing.test.ts	Fixes fetch typing cast.
packages/tests/unit/mcp-lib.test.ts	Tightens auth context typing and assertions.
packages/tests/unit/mcp-hubspot-tools.test.ts	Adds typed OAuth fixtures/token helper and fixes fetch typing casts.
packages/tests/unit/mcp-google-tools.test.ts	Adds typed OAuth fixtures/handler result typing; fixes fetch typing casts.
packages/tests/unit/internal-jwt-auth.test.ts	Uses typed env writes for `NODE_ENV`.
packages/tests/unit/field-encryption.test.ts	Adds typed rows and mock signatures to match repository interfaces.
packages/tests/unit/evm-rpc-proxy-route.test.ts	Fixes fetch typing casts.
packages/tests/unit/engagement-metrics/compute-metrics-cron.test.ts	Aligns mock signatures with implementation params.
packages/tests/unit/engagement-metrics/admin-metrics-api.test.ts	Aligns mock signature with implementation params.
packages/tests/unit/eliza-app/whatsapp-auth.test.ts	Casts string literals for stricter typing compatibility.
packages/tests/unit/eliza-app/telegram-ux-helpers.test.ts	Fixes fetch typing cast.
packages/tests/unit/docker-ssh-cloud-deploy.test.ts	Adjusts logger redact import strategy.
packages/tests/unit/database-url.test.ts	Uses typed env record for safe env mutation + asserts.
packages/tests/unit/compat-envelope.test.ts	Updates Milady sandbox fixture with new billing fields.
packages/tests/unit/api/route-test-helpers.ts	Improves `routeParams` typing and fixes `File` BlobPart typing.
packages/tests/unit/api/openapi-catalog.test.ts	Refactors test structure and simplifies OPTIONS invocation.
packages/tests/unit/anthropic-thinking.test.ts	Adds unit tests for new Anthropic thinking env + merge helpers.
packages/tests/unit/admin-service-pricing-route.test.ts	Tightens auth mock typing and request construction.
packages/tests/runtime/performance.test.ts	Fixes UUID typing casts for agentId.
packages/tests/runtime/mcp-assistant-trending.test.ts	Updates debug rendering mode and failure handling structure.
packages/tests/runtime/integration/runtime-factory/oauth-cache-invalidation.test.ts	Removes `any` for MCP settings lookup with typed cast.
packages/tests/runtime/integration/performance/runtime-creation.test.ts	Fixes UUID typing casts for agentId.
packages/tests/runtime/integration/message-handler/mcp-tools.test.ts	Updates debug rendering mode and failure handling structure.
packages/tests/playwright/fixtures/auth.fixture.ts	Uses `APIRequestContext` instead of inferred request type and removes `any` call.
packages/tests/load-env.ts	Uses typed env write for `NODE_ENV`.
packages/tests/integration/x402-topup.test.ts	Refactors service patch typing and restores methods after suite.
packages/tests/integration/webhooks-e2e.test.ts	Wraps fetch to auto-sign webhooks and preserves `fetch.preconnect`.
packages/tests/integration/unified-oauth-api.test.ts	Tightens `it` wrapper signature.
packages/tests/integration/services/users.service.test.ts	Adds fixture guard and tightens nullable handling.
packages/tests/integration/services/users-join-regression.test.ts	Adds runtime guard for relation shape.
packages/tests/integration/services/organizations.service.test.ts	Simplifies membership assertion.
packages/tests/integration/server-wallets.test.ts	Updates mocked returning rows and loosens assertion to `toMatchObject`.
packages/tests/integration/mcp-registry.test.ts	Tightens test wrapper signature.
packages/tests/integration/financial/credits-budget-flow.test.ts	Removes `source` field from addCredits call in test.
packages/tests/integration/financial/concurrent-operations.test.ts	Removes `source` field from addCredits call in test.
packages/tests/helpers/mock-milady-pricing-for-route-tests.ts	Adds helper to safely override Milady MINIMUM_DEPOSIT without clobbering constants.
packages/tests/helpers/index.ts	Re-exports the new Milady pricing mock helper.
packages/tests/fixtures/mcp-test-character.ts	Updates `simpleTestCharacter` id to UUID-like value.
packages/tests/e2e/setup-server.ts	Wraps fetch to ensure server availability and preserves `fetch.preconnect`.
packages/services/gateway-discord/src/gateway-manager.ts	Tightens discord.js handler typing, adds characterId routing, guards sendTyping, and listener map casts.
packages/scripts/check-types-split.ts	Updates typecheck chunk selection and excludes stories from temp configs.
packages/lib/services/twitter-automation/app-automation.ts	Threads new Anthropic COT merge helper into gateway generateText calls.
packages/lib/services/seo.ts	Threads new Anthropic COT merge helper into SEO Claude calls.
packages/lib/services/app-promotion.ts	Threads new Anthropic COT merge helper into promo copy generation.
packages/lib/services/app-promotion-assets.ts	Uses merge helpers for google image modalities + Anthropic COT for copy generation.
packages/lib/services/app-builder-ai-sdk.ts	Threads new Anthropic COT merge helper into streamText calls.
packages/lib/providers/types.ts	Broadens `providerOptions` typing to shared JSON-serializable merged options.
packages/lib/providers/index.ts	Exports new anthropic-thinking helpers.
packages/lib/providers/cloud-provider-options.ts	Adds shared merged providerOptions type alias.
packages/lib/providers/anthropic-thinking.ts	Adds env parsing + merge helpers for Anthropic extended thinking.
packages/lib/middleware/rate-limit.ts	Makes preset numeric limits consistent across dev/prod; clarifies logging/docs.
packages/lib/eliza/runtime-factory.ts	Changes plugin-sql adapter import to default + casted factory signature.
packages/lib/config/env-validator.ts	Adds validation rule for `ANTHROPIC_COT_BUDGET`.
packages/lib/auth/jwks.ts	Updates jose KeyLike typings for cached keys.
packages/lib/api/a2a/skills.ts	Threads merge helpers into skill chat + image generation paths.
packages/content/changelog.mdx	Adds Mar 27, 2026 entry for Anthropic COT + Milady testing fixes.
package.json	Adds per-package typecheck scripts and updates unit test split scripts for renamed file.
docs/unit-testing-milady-mocks.md	Documents Bun `mock.module` pitfalls and Milady pricing mock strategy.
docs/anthropic-cot-budget.md	Documents `ANTHROPIC_COT_BUDGET` behavior and providerOptions merge rationale.
docs/ROADMAP.md	Records completed items for COT budget + Milady test stabilization.
app/api/v1/responses/route.ts	Uses gateway+Anthropic merge helper for providerOptions injection.
app/api/v1/messages/route.ts	Threads merge helper into non-stream and stream AI SDK calls.
app/api/v1/generate-image/route.ts	Uses merge helpers for google image modalities and optional Anthropic COT.
app/api/v1/chat/route.ts	Threads merge helper into chat streaming call.
app/api/v1/chat/completions/route.ts	Threads merge helper into streaming and non-streaming calls.
app/api/v1/admin/service-pricing/tests/route.integration.test.ts	Standardizes NextRequest construction via URL base.
app/api/mcp/tools/generation.ts	Threads merge helpers into MCP generation (text + image) tools.
app/api/agents/[id]/mcp/route.ts	Threads merge helper into tool-call streaming.
app/api/agents/[id]/a2a/route.ts	Threads merge helper into A2A streaming.
README.md	Adds links to new “WHY” engineering docs and clarifies `bun run test` staging.
CHANGELOG.md	Adds Keep-a-Changelog style engineering changelog entries for these changes.
.github/workflows/tests.yml	Adds `check-types:tests` to CI.
.env.example	Documents optional `ANTHROPIC_COT_BUDGET` env var.


Copilot · 2026-03-27T08:15:54Z

packages/lib/eliza/runtime-factory.ts

+const createDatabaseAdapter = createDatabaseAdapterDefault as (
+  config: { postgresUrl: string },
+  agentId: UUID,
+) => IDatabaseAdapter;


@elizaos/plugin-sql/node is being imported as a default export and then cast to a callable adapter factory. This bypasses type safety and can hide a runtime mismatch if the default export isn’t actually callable (or if its signature changes). Consider updating the local declare module "@elizaos/plugin-sql/node" typings to correctly model the default export (e.g., a callable function with any attached properties like schema) and then import it with the accurate type, rather than casting unknown here.

Suggested change

-const createDatabaseAdapter = createDatabaseAdapterDefault as (
-  config: { postgresUrl: string },
-  agentId: UUID,
-) => IDatabaseAdapter;
+type CreateDatabaseAdapterFn = (
+  config: { postgresUrl: string },
+  agentId: UUID,
+) => IDatabaseAdapter;
+
+function ensureCreateDatabaseAdapter(fn: unknown): CreateDatabaseAdapterFn {
+  if (typeof fn !== "function") {
+    throw new TypeError(
+      'Default export from "@elizaos/plugin-sql/node" is not a callable database adapter factory',
+    );
+  }
+  return fn as CreateDatabaseAdapterFn;
+}
+
+const createDatabaseAdapter = ensureCreateDatabaseAdapter(
+  createDatabaseAdapterDefault,
+);

Dismissed: Comment targets line 21 but identifier schema not found in file — code may have been removed or renamed

Fixed in 6455e14.

Dismissed: Line 36-45 now have type checking and casting with validation before using the default export

Copilot · 2026-03-27T08:15:54Z

packages/lib/config/env-validator.ts

+  ANTHROPIC_COT_BUDGET: {
+    required: false,
+    description:
+      "Default Anthropic extended-thinking token budget when a character omits settings.anthropicThinkingBudgetTokens. Unset or 0 disables unless the character sets a positive budget",
+    validate: (value: string) => {
+      const trimmed = value.trim();
+      if (trimmed === "") {
+        return false;
+      }
+      if (!/^\d+$/.test(trimmed)) {
+        return false;
+      }
+      const n = Number.parseInt(trimmed, 10);
+      return n >= 0 && n <= Number.MAX_SAFE_INTEGER;
+    },
+    errorMessage:
+      "Must be a non-negative integer string (0 = off; when enabling, Anthropic often expects at least ~1024)",
+  },


ANTHROPIC_COT_BUDGET parsing in anthropic-thinking.ts throws on invalid non-empty values, but the env validator treats invalid optional values as warnings (startup continues). That means a mis-set budget can turn into request-time exceptions across any route that spreads mergeAnthropicCotProviderOptions(...). Either make invalid values for this variable a hard validation error (fail fast at startup), or change the parsing/merge path to degrade safely (e.g., log once and treat as "off").

Fixed in fe10e6b.

Fixed in 18bfb11.

No change in this file this run; manual review if still needed.
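The second option raised in this thread (degrade safely instead of throwing at request time) could look roughly like the sketch below. `parseCotBudgetOrOff` and its warn-once flag are hypothetical names chosen for illustration; this is not the helper in anthropic-thinking.ts, and treating an empty string as "off" is an assumption.

```typescript
// Hypothetical helper: the name and the warn-once behaviour are assumptions
// for illustration, not the implementation in anthropic-thinking.ts.
let warnedAboutInvalidBudget = false;

function parseCotBudgetOrOff(
  raw: string | undefined,
  warn: (msg: string) => void = console.warn,
): number | null {
  if (raw === undefined) return null; // unset: thinking stays off
  const trimmed = raw.trim();
  if (trimmed === "") return null; // assumption: empty is treated as unset

  // Same shape check as the env validator: decimal digits only.
  const digitsOnly = /^\d+$/.test(trimmed);
  const n = digitsOnly ? Number.parseInt(trimmed, 10) : Number.NaN;
  if (!digitsOnly || !Number.isSafeInteger(n)) {
    if (!warnedAboutInvalidBudget) {
      warnedAboutInvalidBudget = true; // log once per process, not per request
      warn(`Ignoring invalid ANTHROPIC_COT_BUDGET=${JSON.stringify(raw)}; treating as off`);
    }
    return null;
  }
  return n > 0 ? n : null; // 0 explicitly disables thinking
}
```

With this shape a mis-set value costs one warning and disables thinking, rather than failing every route that spreads the merge helper. Failing fast at startup remains the stricter alternative.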

Copilot · 2026-03-27T08:15:55Z

packages/tests/unit/docker-ssh-cloud-deploy.test.ts

 // Importing via the file-system path with a cache-buster query param
 // guarantees we always get the real implementation regardless of mocks.
-import { redact } from "../../lib/utils/logger.ts?_real";
+import { redact } from "../../lib/utils/logger";


The comment above says this import uses a cache-buster query param to avoid mock.module("@/lib/utils/logger") poisoning, but the import no longer includes any cache-buster. If other unit files mock the logger module earlier in the same Bun process, this test may again receive the mocked module and lose the real redact export. Either restore the cache-buster (or another unmocked import strategy) or update the comment and ensure redact is always available in the mocked logger module(s).

Fixed in fe10e6b.

Dismissed: Line 16 still includes the cache-buster query param ?v=docker-ssh-test as required

odilitime · 2026-03-27T08:16:48Z

@coderabbitai review

coderabbitai · 2026-03-27T08:16:53Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Iteration 1 prr-fix:prrc_kwdop6btdm6yyjtn

Iteration 1 prr-fix:prrc_kwdop6btdm6yyjt7

Changes: - rate-limit.ts: PR Review: fix(types): test project, split check-types, UI package, gateways ...

odilitime · 2026-03-27T08:25:33Z

@coderabbitai review

coderabbitai · 2026-03-27T08:25:38Z

✅ Actions performed

Review triggered.


claude · 2026-03-27T08:27:29Z

Code Review — PR #417: Anthropic CoT Controls

Overall the approach is sound: centralizing thinking-budget resolution in a single helper, keeping budgets off request bodies, and wiring env defaults through a type-safe merge layer. A few issues worth addressing before merge.

🐛 Potential Bugs

1. Temperature conflict in seo.ts and app-promotion.ts

Anthropic's API rejects requests with temperature ≠ 1 when extended thinking is enabled. The PR comment in chat/completions/route.ts says "@ai-sdk/anthropic strips temp/topP/topK when thinking is on", but seo.ts and app-promotion.ts are services that call generateText (not streamText), and both hardcode a temperature other than 1:

// seo.ts
const { text } = await generateText({
  model: gateway.languageModel(modelId),
  temperature: 0.3,           // ← will cause API error when ANTHROPIC_COT_BUDGET is set
  ...mergeAnthropicCotProviderOptions(modelId),
  ...

// app-promotion.ts
await generateText({
  model: gateway.languageModel(promoModel),
  temperature: 0.7,           // ← same risk
  ...mergeAnthropicCotProviderOptions(promoModel),

If the AI SDK Gateway path does not strip temperature for generateText the way the comment implies, this will surface as a 400 from the Anthropic API when ANTHROPIC_COT_BUDGET is non-zero. Recommend either (a) adding a comment that documents which SDK version guarantees the stripping, or (b) using temperature: 1 at these call sites when thinking is active.
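For the "temperature: 1 when thinking is active" option, a minimal sketch is below. `temperatureForCall` is a hypothetical helper, not something in this PR, and it assumes the resolved budget is `null` or `0` when thinking is off.

```typescript
// Hypothetical helper: picks the temperature to send for a call site that
// normally wants a non-default value (0.3 for SEO, 0.7 for promo copy).
function temperatureForCall(
  desired: number,
  thinkingBudgetTokens: number | null,
): number {
  const thinkingActive =
    thinkingBudgetTokens !== null && thinkingBudgetTokens > 0;
  // With thinking on, fall back to 1 so the request does not depend on the
  // SDK stripping the parameter; otherwise keep the call site's own value.
  return thinkingActive ? 1 : desired;
}
```

A call site would then pass `temperature: temperatureForCall(0.3, budget)`. The trade-off is that enabling a budget silently changes sampling behavior, which may be why disabling CoT outright on these services is the simpler choice.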

2. mergeGoogleImageModalitiesWithAnthropicCot now applied to ALL non-OpenAI image models

In generate-image/route.ts:

const cotOpts = isOpenAIModel
  ? mergeAnthropicCotProviderOptions(imageModel)    // returns {} for OpenAI — no-op
  : mergeGoogleImageModalitiesWithAnthropicCot(imageModel);  // applies google.responseModalities to ANY non-OpenAI model

Before this PR, google.responseModalities was only set in the two Google-specific streamConfig paths. After this change it's applied to every non-OpenAI model at the call site. If the image route ever serves an Anthropic or other non-Google image model, it will receive an unexpected google provider option. The AI SDK likely ignores it, but it changes observable behavior. The isOpenAIModel ? ... : ... branch would be more precise as isGoogleModel ? mergeGoogleImageModalitiesWithAnthropicCot : mergeAnthropicCotProviderOptions.
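The narrower branch suggested above can be sketched as follows. The two merge helpers are passed in as parameters so the example is self-contained; `isGoogleModel`, `imageCallOptions`, and the `google/` prefix check are assumptions about how model IDs are shaped, not code from this PR.

```typescript
type CallOptions = {
  providerOptions?: Record<string, Record<string, unknown>>;
};

// Assumption: gateway-style IDs of the form "provider/model".
function isGoogleModel(modelId: string): boolean {
  return modelId.startsWith("google/");
}

// Only Google models receive google.responseModalities; everything else
// (OpenAI, Anthropic, ...) gets the CoT-only merge, which is a no-op for
// models that do not support thinking.
function imageCallOptions(
  modelId: string,
  googleWithCot: (id: string) => CallOptions,
  cotOnly: (id: string) => CallOptions,
): CallOptions {
  return isGoogleModel(modelId) ? googleWithCot(modelId) : cotOnly(modelId);
}
```

In the route, `googleWithCot` would be mergeGoogleImageModalitiesWithAnthropicCot and `cotOnly` would be mergeAnthropicCotProviderOptions.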

⚠️ Breaking Change for Developers

3. Rate-limit preset change removes dev relaxation

// Before
maxRequests: isDevelopment ? 10000 : 60,

// After
maxRequests: 60,

Local developers and CI integration tests that exercise HTTP API routes will now hit production-level rate limits (60 req/min STANDARD). Unit tests that spin up a real server or use fetch against local routes can fail intermittently. This is documented in the preset comments, but it's a behavioral change that will surprise contributors who run bun run test:integration locally. Worth calling out explicitly in the PR description or flagging in CONTRIBUTING.md.

🔍 Code Quality

4. mergeProviderOptions only deep-merges 3 known provider keys

if (a?.gateway && b?.gateway) { out.gateway = { ...a.gateway, ...b.gateway }; }
if (a?.anthropic && b?.anthropic) { out.anthropic = { ...a.anthropic, ...b.anthropic }; }
if (a?.google && b?.google) { out.google = { ...a.google, ...b.google }; }

Any other provider-specific keys (openai, vertex, bedrock, etc.) fall through to the { ...a, ...b } shallow spread, where b's top-level key silently overwrites a's. This is fine for current usage but worth a comment noting the limitation so future callers don't get surprised when adding new provider options.
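If the limitation ever becomes a problem, one way to generalize is to merge one level deep for every provider key rather than a fixed list. This is a sketch only; `mergeProviderOptionsGeneric` is a hypothetical name and the `ProviderOptions` type is a simplified stand-in for the PR's shared type alias.

```typescript
type ProviderOptions = Record<string, Record<string, unknown>>;

// One-level deep merge across all provider keys. Values nested below the
// provider level (for example anthropic.thinking) are still replaced
// wholesale by the later argument.
function mergeProviderOptionsGeneric(
  a: ProviderOptions | undefined,
  b: ProviderOptions | undefined,
): ProviderOptions {
  const out: ProviderOptions = {};
  for (const source of [a, b]) {
    if (!source) continue;
    for (const [provider, options] of Object.entries(source)) {
      // Copy into a fresh object so neither input is mutated.
      out[provider] = { ...(out[provider] ?? {}), ...options };
    }
  }
  return out;
}
```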

5. parsePositiveIntStrict accepts and returns 0

The function is named "strict positive" but returns 0 (since /^\d+$/ matches "0" and parseInt("0") = 0). Callers then do if (n <= 0) return null after the call. The naming and intent are slightly mismatched — a comment clarifying that "positive" here means "non-negative" would reduce confusion.

6. parseThinkingBudgetFromCharacterSettings silently truncates floats

const n = Math.trunc(raw);  // 1.9 → 1, 0.5 → 0

A stored value of 0.5 becomes 0 (thinking off), which may surprise an agent owner who intended a positive budget. Since the docs and interface say "integer ≥ 0", consider rejecting non-integers explicitly (!Number.isInteger(raw)) rather than silently truncating, or documenting that truncation occurs.
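A stricter variant that rejects non-integers instead of truncating might look like this. `parseBudgetSettingStrict` is a hypothetical name; returning `undefined` for invalid input follows the fallback behavior described elsewhere in this thread, so the env default would apply.

```typescript
// Hypothetical strict parser for settings.anthropicThinkingBudgetTokens.
// Returns undefined (fall back to the env default) for anything that is
// not a non-negative integer, rather than truncating 0.5 to 0.
function parseBudgetSettingStrict(raw: unknown): number | undefined {
  if (typeof raw !== "number") return undefined;
  // Number.isInteger is false for NaN, Infinity, and fractional values.
  if (!Number.isInteger(raw) || raw < 0) return undefined;
  return raw;
}
```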

✅ Strengths

The single resolution path in resolveAnthropicThinkingBudgetTokens is clean and easy to audit.
ANTHROPIC_COT_BUDGET_MAX as an operator ceiling over per-agent settings is a good safety valve.
failOnInvalid: true in env-validator.ts ensures bad config is caught at startup rather than at request time.
Test coverage for parse functions, merge helpers, and edge cases (0 disables, max clamps) is thorough.
Deep-merging providerOptions rather than shallow-spreading is the correct approach.
The CHANGELOG.md and docs/anthropic-cot-budget.md docs are well-written and explain the design rationale clearly.

Iteration 4 prr-fix:ic-4180968930-2

app/api/v1/chat/route.ts

claude · 2026-04-02T23:38:29Z

Code Review: feat: Anthropic CoT controls

Overall this is well-structured work with clear security reasoning (budgets must not come from untrusted callers). The core library in packages/lib/providers/anthropic-thinking.ts is solid. A few issues need attention before merge.

Bug: `anthropic-thinking.test.ts` imports a non-existent module

The test file added at the repository root imports from ./src/lib/anthropic-thinking:

import {
  validateBudgetTokens,
  getThinkingConfig,
  buildThinkingParam,
  supportsExtendedThinking,
  ...
} from "./src/lib/anthropic-thinking";

That path does not exist — the actual implementation is at packages/lib/providers/anthropic-thinking.ts and exports different functions (resolveAnthropicThinkingBudgetTokens, mergeAnthropicCotProviderOptions, etc.). The test imports validateBudgetTokens, getThinkingConfig, buildThinkingParam, ThinkingConfig, and CharacterThinkingSettings which are not exported from the real module.

This test will fail at import time. It appears to be a stale artifact from a different design iteration that was never reconciled with the final implementation. It should either be deleted or rewritten to test the actual exports.

Credit reservation inconsistency between MCP and A2A routes

In app/api/agents/[id]/a2a/route.ts the credit estimate uses 500 + effectiveThinkingBudget as base:

// "Add thinking budget to base output estimate (500 tokens)"
const maxOutputTokens = effectiveThinkingBudget != null ? 500 + effectiveThinkingBudget : undefined;
const baseCost = await estimateRequestCost(model, fullMessages, maxOutputTokens);

But in app/api/agents/[id]/mcp/route.ts the same estimate uses DEFAULT_MIN_OUTPUT_TOKENS (4096):

const baseOutputTokens = DEFAULT_MIN_OUTPUT_TOKENS; // 4096
const estimatedOutputTokens = effectiveThinkingBudget != null
  ? baseOutputTokens + effectiveThinkingBudget
  : baseOutputTokens;

The 500-token base in A2A significantly underestimates output (and potentially underbills) relative to MCP whenever a thinking budget is set; without one, maxOutputTokens is left undefined. The DEFAULT_BASE_OUTPUT_TOKENS = 4096 exported from app/api/a2a/route.ts even has a comment saying it should be imported here, but it isn't used.

`DEFAULT_BASE_OUTPUT_TOKENS` export is misleading

app/api/a2a/route.ts exports:

export const DEFAULT_BASE_OUTPUT_TOKENS = 4096;
// Note: This constant should be imported by @/lib/api/a2a handlers for credit reservation calculations.

But agents/[id]/a2a/route.ts never imports it, and uses 500 instead. The comment is aspirational but the constant is never referenced anywhere. Either use it or remove the export and comment.

Rate limit change is a silent breaking change for developers

The old behavior gave dev environments 10000 req/window (effectively unlimited). The new behavior applies production-level limits by default in dev, requiring explicit opt-in via RATE_LIMIT_MULTIPLIER. This is not mentioned in .env.example or the PR description prominently enough — devs running integration tests locally will start getting 429s with no obvious explanation.

Recommend adding RATE_LIMIT_MULTIPLIER=100 to .env.example with a comment, or logging a more actionable warning when in dev mode with the default multiplier (1).
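For readers unfamiliar with the new behavior, the multiplier logic as described in this thread can be sketched as below. `effectiveMaxRequests` is a hypothetical function; the integer-only parsing and the fallback to the base limit on bad input are assumptions, not the code in rate-limit.ts.

```typescript
type Env = Record<string, string | undefined>;

// Applies RATE_LIMIT_MULTIPLIER to a preset's maxRequests outside production.
function effectiveMaxRequests(base: number, env: Env): number {
  // The multiplier is ignored in production so limits cannot be relaxed there.
  if (env.NODE_ENV === "production") return base;

  const raw = env.RATE_LIMIT_MULTIPLIER?.trim();
  if (raw === undefined || !/^\d+$/.test(raw)) return base; // default multiplier of 1
  const multiplier = Number.parseInt(raw, 10);
  return multiplier >= 1 ? base * multiplier : base;
}
```

Under these assumptions, `RATE_LIMIT_MULTIPLIER=100` turns the STANDARD preset's 60 requests per window into 6000 in development, while production stays at 60.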

Minor: redundant `cotBudget ?? undefined` in `chat/route.ts`

...mergeAnthropicCotProviderOptions(selectedModel, process.env, cotBudget ?? undefined),

cotBudget is already number | null. null ?? undefined evaluates to undefined, which is the same as passing undefined directly. resolveAnthropicThinkingBudgetTokens already handles undefined as "fall back to env". This double-call pattern works but is confusing — you resolve the budget, then pass it back only to re-resolve it internally. Consider passing the already-resolved budget directly to a lower-level function, or just pass undefined without the null check.

`supportsExtendedThinking` test gap

The existing test covers claude-3-haiku and claude-3-opus, but does not test claude-3-7-sonnet which is one of the supported models in EXTENDED_THINKING_MODEL_PATTERNS. Given this function gates whether thinking tokens are billed, a missing pattern would silently skip CoT and cause subtle underbilling.

Positive notes

The security rationale (budgets from stored character settings, not request bodies) is correct and well-documented.
parseThinkingBudgetFromCharacterSettings handles invalid types gracefully with fallback to undefined.
Env validation with failOnInvalid: true provides early detection of misconfiguration.
mergeProviderOptions correctly deep-merges nested keys to avoid clobbering gateway/google/anthropic options.
Rate limit Object.freeze prevents accidental mutation of security-critical thresholds at runtime.
RATE_LIMIT_MULTIPLIER being ignored in production is a good safety guard.

claude · 2026-04-02T23:39:41Z

PR Review: feat: Anthropic CoT controls

Good overall approach — putting the thinking budget on the character record rather than in API request bodies is the right call for trust boundary reasons, and the resolver hierarchy (per-agent → env default → env max cap) is clean. Unit test coverage in packages/tests/unit/anthropic-thinking.test.ts is solid. A few issues to address:

Bug (High) — A2A route missing maxOutputTokens when CoT is active

File: app/api/agents/[id]/a2a/route.ts (~line 341)

The MCP route correctly sets maxOutputTokens before calling streamText (using Math.max(DEFAULT_MIN_OUTPUT_TOKENS, effectiveThinkingBudget)), but the A2A handleChat only spreads mergeAnthropicCotProviderOptions without setting maxOutputTokens. Anthropic API rejects requests where max_tokens < budget_tokens when thinking is enabled — this will produce API errors for any A2A agent with a thinking-enabled model/budget.

The A2A route also uses only 500 tokens as the base for credit reservation (500 + effectiveThinkingBudget) while MCP uses 4096, causing systematic underbilling on A2A.

Bug (Medium) — computeEffectiveMaxTokens leaves no room for response tokens

File: app/api/v1/chat/completions/route.ts (~line 43)

When requestMaxTokens is undefined and CoT is active, the function returns Math.max(0, cotBudget) = exactly cotBudget. Since Anthropic counts thinking tokens against max_tokens, with e.g. cotBudget=8192 and no client-supplied max_tokens, the model has zero tokens left for actual response content. Should add a minimum output buffer (e.g. Math.max(requestMaxTokens ?? 0, cotBudget + 1000)).
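A sketch of the suggested fix follows. The function name matches the one under review, but the body and the `responseBuffer` parameter (defaulting to the 1000 tokens used in the example above) are assumptions rather than the route's actual code.

```typescript
// Sketch of computeEffectiveMaxTokens with room left for the visible
// response. responseBuffer is a tunable assumption.
function computeEffectiveMaxTokens(
  requestMaxTokens: number | undefined,
  cotBudget: number | null,
  responseBuffer = 1000,
): number | undefined {
  // Thinking off: pass the caller's value through untouched.
  if (cotBudget === null || cotBudget <= 0) return requestMaxTokens;
  // Thinking on: thinking tokens count against max_tokens, so the limit
  // must cover the whole budget plus a buffer for the answer itself.
  return Math.max(requestMaxTokens ?? 0, cotBudget + responseBuffer);
}
```

With `cotBudget = 8192` and no client-supplied limit this yields 9192 instead of 8192, so the response is no longer starved.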

Stale test file at repo root

File: anthropic-thinking.test.ts (new file at root)

This file imports from ./src/lib/anthropic-thinking and tests functions (validateBudgetTokens, getThinkingConfig, buildThinkingParam) that do not exist in the actual implementation at packages/lib/providers/anthropic-thinking.ts. The real tests are correctly at packages/tests/unit/anthropic-thinking.test.ts. This root-level file looks like an earlier draft — it will fail at import resolution and should be deleted.

Dead exported constant

File: app/api/a2a/route.ts (~line 32)

DEFAULT_BASE_OUTPUT_TOKENS = 4096 is exported with a comment saying it "should be imported by @/lib/api/a2a handlers" but nothing imports it. The A2A agent route uses an inline 500 literal instead. Wire it up or remove it.

Minor — RATE_LIMIT_DISABLED semantic change is undocumented

File: packages/lib/middleware/rate-limit.ts (~line 44)

RATE_LIMIT_DISABLED=true now only skips startup validation warning — actual rate limiting is still enforced. The variable name implies it disables limits. A comment in .env.example would prevent operator confusion.

Positive notes

Resolution logic in resolveAnthropicThinkingBudgetTokens is clean and well-tested.
Explicitly passing budget=0 to disable CoT on temperature-sensitive routes (SEO, promotion, Twitter, App Builder) is the right defensive pattern.
mergeProviderOptions doing deep-merge on gateway/anthropic/google keys avoids silently clobbering existing provider config.
Object.freeze on computed rate limit presets is good runtime defense-in-depth.
mockMiladyPricingMinimumDepositForRouteTests is the right fix for partial mock contamination.

Iteration 1 prr-fix:ic-4180483329-5

Iteration 1 prr-fix:prrc_kwdop6btdm60pqio

claude · 2026-04-02T23:48:53Z

PR Review findings - see below

claude · 2026-04-02T23:49:05Z

PR Review: feat: Anthropic CoT controls

Overall this is well-structured work. Good separation of concerns in anthropic-thinking.ts, correct security rationale (budgets live on character records, not request bodies), and thorough documentation. A few concrete issues to address before merging.

BUGS

/messages route missing response-buffer in maxOutputTokens

In both handleNonStream and handleStream (app/api/v1/messages/route.ts):

const effectiveMaxTokens = cotBudget ? Math.max(request.max_tokens ?? 0, cotBudget) : request.max_tokens;

This only ensures max_tokens >= budgetTokens. Anthropic requires room for the actual response ON TOP of the thinking budget (thinking tokens count against max_tokens). Every other route in this PR adds a buffer:

/chat: Math.max(DEFAULT_MIN_OUTPUT_TOKENS, cotBudget + DEFAULT_MIN_OUTPUT_TOKENS)
/chat/completions: Math.max(requestMaxTokens ?? MIN_RESPONSE_TOKENS, cotBudget + MIN_RESPONSE_TOKENS)
MCP: Math.max(DEFAULT_MIN_OUTPUT_TOKENS, effectiveThinkingBudget) + DEFAULT_MIN_OUTPUT_TOKENS

/messages will produce API rejections when ANTHROPIC_COT_BUDGET is set and request.max_tokens is close to cotBudget.

Test file at root tests a non-existent API

anthropic-thinking.test.ts (repo root) imports from ./src/lib/anthropic-thinking and tests validateBudgetTokens, getThinkingConfig, buildThinkingParam - none of which exist in packages/lib/providers/anthropic-thinking.ts. The actual exports are resolveAnthropicThinkingBudgetTokens, parseAnthropicCotBudgetFromEnv, etc.

This test either targets a file not in this PR, or was written against an earlier API. Either way it does not cover the shipped code. The docs reference packages/tests/unit/anthropic-thinking.test.ts but that path is not in the diff.

A2A credit reservation uses 500-token response buffer

// app/api/agents/[id]/a2a/route.ts
const maxOutputTokens = effectiveThinkingBudget != null ? 500 + effectiveThinkingBudget : undefined;

MCP uses DEFAULT_MIN_OUTPUT_TOKENS (4096) as its buffer; A2A uses 500. DEFAULT_BASE_OUTPUT_TOKENS = 4096 is exported from app/api/a2a/route.ts with a comment saying handlers should import it, but handleChat in the same file uses hardcoded 500. A2A thinking requests will be systematically underreserved vs MCP.

POTENTIAL ISSUES

Rate-limit change breaks existing local dev setups

Previous: dev mode got 10,000 req/window (effectively unlimited).
New: production-level limits by default; RATE_LIMIT_MULTIPLIER must be set explicitly to restore high limits.

Reasonable change, but silent breakage for existing .env.local. A note in CHANGELOG.md pointing devs to add RATE_LIMIT_MULTIPLIER=1000 to .env.local would prevent confusion.

supportsExtendedThinking test coverage gaps

Tests only cover claude-3-5-sonnet, claude-3-opus, claude-3.5-sonnet. The model patterns also include claude-3-7-sonnet, claude-sonnet-4, claude-opus-4 - none tested. These are the newest and most likely production models.

MINOR NITS

Stray blank line before const result = await generateText in non-streaming path of chat/completions/route.ts
DEFAULT_BASE_OUTPUT_TOKENS is exported from app/api/a2a/route.ts with a note saying handlers should import it, but handleChat in the same file uses hardcoded 500
cotBudget ?? undefined in chat/route.ts causes a redundant env re-read inside mergeAnthropicCotProviderOptions; passing the already-resolved value directly avoids it

SUMMARY

Core architecture: Good
Security model: Correct - untrusted callers cannot raise budgets
/messages maxOutputTokens: Bug - missing response buffer, will hit Anthropic API rejections
Root test file: Bug - tests non-existent API, does not cover shipped code
A2A vs MCP reserve inconsistency: Risk - underbilling for A2A thinking requests
Dev rate-limit change: Risk - silent breakage for existing local setups without migration note

Iteration 3 prr-fix:ic-4181043091-0

cursor · 2026-04-02T23:50:45Z

app/api/v1/messages/route.ts

+  const cotBudget = resolveAnthropicThinkingBudgetTokens(model, process.env);
+  // Passing resolved budget to mergeAnthropicCotProviderOptions short-circuits its internal resolution
+  const cotOptions = cotBudget != null ? mergeAnthropicCotProviderOptions(model, process.env, cotBudget) : {};
+  const effectiveMaxTokens = cotBudget ? Math.max(request.max_tokens ?? 0, cotBudget) : request.max_tokens;


Missing response token buffer in effectiveMaxTokens calculation

High Severity

effectiveMaxTokens is computed as Math.max(request.max_tokens ?? 0, cotBudget) without adding any buffer for actual response generation. The chat/completions route correctly uses Math.max(requestMaxTokens ?? MIN_RESPONSE_TOKENS, cotBudget + MIN_RESPONSE_TOKENS) (adding 4096). When request.max_tokens is unset, all tokens go to thinking with zero left for the response, producing empty output.

Additional Locations (2)

app/api/v1/messages/route.ts#L798-L799

packages/lib/api/a2a/skills.ts#L96-L97

Fixed in 9272304.

cursor · 2026-04-02T23:50:46Z

packages/lib/providers/anthropic-thinking.ts

+  /claude-3-opus/, // Claude 3 Opus
+  /claude-sonnet-4/, // Claude Sonnet 4
+  /claude-opus-4/, // Claude Opus 4
+];


Claude 3.5 Sonnet and 3 Opus incorrectly flagged as thinking-capable

High Severity

EXTENDED_THINKING_MODEL_PATTERNS includes claude-3[.-]5-sonnet and claude-3-opus, but per Anthropic docs, extended thinking is only supported starting from Claude 3.7 Sonnet, Claude Sonnet 4, Claude Opus 4, and Claude Haiku 4.5. Enabling thinking for Claude 3.5 Sonnet or Claude 3 Opus will cause Anthropic API errors, breaking requests when ANTHROPIC_COT_BUDGET is set.
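A pattern list restricted to the families named above might look like the sketch below. The regexes and the lowercase normalization are assumptions written for this illustration, and the list should be checked against Anthropic's current model documentation before use.

```typescript
// Assumed pattern list covering only the families named in this comment.
const EXTENDED_THINKING_MODEL_PATTERNS: RegExp[] = [
  /claude-3[.-]7-sonnet/, // Claude 3.7 Sonnet
  /claude-sonnet-4/, // Claude Sonnet 4
  /claude-opus-4/, // Claude Opus 4
  /claude-haiku-4[.-]5/, // Claude Haiku 4.5
];

function supportsExtendedThinking(modelId: string): boolean {
  const id = modelId.toLowerCase();
  return EXTENDED_THINKING_MODEL_PATTERNS.some((pattern) => pattern.test(id));
}
```

Because this function gates both the request shape and billing, a table-driven test with one positive case per pattern and negative cases for the 3.5 and 3 families would cover the gaps noted in the earlier reviews.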

claude · 2026-04-02T23:54:46Z

Code Review: PR #417 — Anthropic CoT Controls

Overall this is a well-designed feature with solid documentation and clear reasoning for design choices (e.g., keeping thinking budgets on character records, not in request bodies). Below are the issues I found, ordered by severity.

Critical

1. A2A route: streamText never receives maxOutputTokens

app/api/agents/[id]/a2a/route.ts resolves effectiveThinkingBudget and uses it in credit estimation, but the streamText call only spreads mergeAnthropicCotProviderOptions — it never passes maxOutputTokens:

const result = await streamText({
  model: gateway.languageModel(model),
  messages: fullMessages,
  ...mergeAnthropicCotProviderOptions(model, process.env, agentThinkingBudget),
  // maxOutputTokens is missing
});

When thinking is enabled, Anthropic's API requires max_tokens (maxOutputTokens in the AI SDK) to be larger than the thinking budget. Without it, requests can fail when a character has a large thinking budget set. The MCP route correctly calculates and passes maxOutputTokens; A2A should do the same.

2. A2A credit estimation uses 500-token base, contradicting its own comment

In A2A handleChat:

// Add thinking budget to base output estimate (500 tokens) to match MCP route behavior
const maxOutputTokens = effectiveThinkingBudget != null ? 500 + effectiveThinkingBudget : undefined;

The comment says "to match MCP route behavior" but MCP uses DEFAULT_MIN_OUTPUT_TOKENS = 4096. The 500 here is the old pre-PR MCP default — not the current one. This causes systematic underbilling on A2A vs MCP for the same agent:

Route	Base tokens	Effect on credit reserve
MCP	4096	Correct
A2A	500	~1/8 of MCP for same agent/model

Fix: use DEFAULT_MIN_OUTPUT_TOKENS (or a shared constant) instead of the magic 500.

High Priority

3. MIN_RESPONSE_BUFFER in /v1/messages is 4x smaller than other routes

const MIN_RESPONSE_BUFFER = 1000; // messages/route.ts

/v1/chat, /v1/chat/completions, and MCP all use 4096 as the response buffer. With a large thinking budget (e.g., 8192 tokens), the messages route only reserves 1000 tokens for actual response generation — tight enough to cause truncation. Should be 4096 to match, or the difference should be explicitly justified.

4. Token buffer constants are scattered and inconsistent

DEFAULT_MIN_OUTPUT_TOKENS, MIN_RESPONSE_TOKENS, MIN_RESPONSE_BUFFER are defined independently in at least 4 files with values 500, 1000, and 4096. A single shared constant (e.g., in packages/lib/providers/anthropic-thinking.ts) would prevent this class of divergence.
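One possible shape for that shared module is sketched here. The constant name is taken from the MCP route; both helper names are hypothetical, and the arithmetic follows the MCP reservation formula quoted in the earlier review.

```typescript
// Hypothetical shared exports for token buffers, so A2A, MCP, and the v1
// routes stop defining 500 / 1000 / 4096 independently.
const DEFAULT_MIN_OUTPUT_TOKENS = 4096;

/** Output tokens to assume when reserving credits for a request. */
function estimateOutputTokensForReservation(
  thinkingBudget: number | null,
): number {
  return DEFAULT_MIN_OUTPUT_TOKENS + (thinkingBudget ?? 0);
}

/** maxOutputTokens to send; undefined leaves the SDK default in place. */
function maxOutputTokensForRequest(
  thinkingBudget: number | null,
): number | undefined {
  if (thinkingBudget === null || thinkingBudget <= 0) return undefined;
  return thinkingBudget + DEFAULT_MIN_OUTPUT_TOKENS;
}
```

Routes that honor a caller-supplied max_tokens would still need their own `Math.max` on top of `maxOutputTokensForRequest`, but the buffer value would come from one place.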

Medium

5. anthropic-thinking.test.ts is in the repo root

The file anthropic-thinking.test.ts imports from ./src/lib/anthropic-thinking but sits in the repo root rather than packages/tests/unit/. It may not be picked up by the test:unit runner and is easy to overlook.

6. supportsExtendedThinking may not cover Claude 3.7 / 4.x models

The function uses regex patterns to detect supported models. Claude 3.7 Sonnet and Claude 4.x variants support extended thinking as of 2026-04-02. If those patterns are not in the list, thinking will silently be disabled for users on newer models. Worth auditing against Anthropic's current model lineup.

7. DEFAULT_BASE_OUTPUT_TOKENS in app/api/a2a/route.ts is exported but unused

export const DEFAULT_BASE_OUTPUT_TOKENS = 4096;
// JSDoc: "This constant should be imported by @/lib/api/a2a handlers"

handleChat uses the magic number 500 instead. Either wire this up or remove it.

Positive Notes

Keeping thinking budgets on character records (not request bodies) is the right call and is well documented.
mergeAnthropicCotProviderOptions / mergeGoogleImageModalitiesWithAnthropicCot cleanly avoid clobbering existing providerOptions.
Fail-fast env validation for ANTHROPIC_COT_BUDGET / ANTHROPIC_COT_BUDGET_MAX is correct.
mockMiladyPricingMinimumDepositForRouteTests properly solves the module mock pollution problem.
Docs and WHY comments throughout are thorough and helpful.

Summary

The two critical issues (missing maxOutputTokens in A2A streamText + 500-token base discrepancy) should block merge — they can cause API errors and systematic underbilling. The MIN_RESPONSE_BUFFER=1000 in /v1/messages is worth fixing before ship too. Everything else is polish.

Iteration 1 prr-fix:prrc_kwdop6btdm60pwi0

Iteration 2 prr-fix:ic-4180539773-2

claude · 2026-04-03T00:08:17Z

PR Review: Anthropic CoT Controls - see full review below

app/api/v1/chat/completions/route.ts

claude · 2026-04-03T00:08:46Z

Code Review

Overall this is solid work — the security model (thinking budgets from character settings, not request bodies) is sound and well-documented, the merge helpers are clean, and the test infrastructure fixes address real Bun flakiness. I found a few bugs and inconsistencies worth addressing before merge.

Bug: 4-arg call to `mergeAnthropicCotProviderOptions` in `chat/completions`

File: app/api/v1/chat/completions/route.ts (~line 546)

The function signature is (modelId, env, agentThinkingBudgetTokens?: number) — three params. The call site passes four:

mergeAnthropicCotProviderOptions(model, process.env, {}, cotBudget)

{} maps to agentThinkingBudgetTokens; cotBudget (fourth) is silently ignored. At runtime {} <= 0 is false (NaN comparison), so the resolved budget is the object {} — likely producing a broken thinking param or a type error inside the Anthropic SDK.

Every other call site in this PR correctly passes three args. Fix:

const cotOptions = cotBudget != null ? mergeAnthropicCotProviderOptions(model, process.env, cotBudget) : {};
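The reason the stray `{}` slips past a `<= 0` check can be shown in a few lines, along with a defensive guard. `normalizeBudget` is a hypothetical helper for illustration; the type checker should already flag the 4-argument call, so this only matters where types are bypassed.

```typescript
// Hypothetical guard: refuses non-numeric input before applying the
// "<= 0 means off" rule.
function normalizeBudget(value: unknown): number | null {
  if (typeof value !== "number" || !Number.isFinite(value)) return null;
  return value > 0 ? value : null;
}

// Why a bare "<= 0" check is not enough: a plain object coerces to NaN,
// and every ordered comparison involving NaN evaluates to false.
const stray: unknown = {};
const looksDisabled = (stray as number) <= 0;
```

Here `looksDisabled` is `false`, so code that only tests `<= 0` would carry the object forward as if it were a positive budget, whereas `normalizeBudget(stray)` returns `null`.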

Bug: A2A `handleChat` computes `maxOutputTokens` for billing but never passes it to `streamText`

File: app/api/agents/[id]/a2a/route.ts

The MCP route correctly spreads ...(maxOutputTokens && { maxOutputTokens }) into streamText. A2A computes a maxOutputTokens for estimateRequestCost but the streamText call only spreads mergeAnthropicCotProviderOptions with no maxOutputTokens. Anthropic requires max_tokens >= budgetTokens when thinking is enabled; omitting it will likely produce a 400.

Bug: A2A response buffer is 500 tokens — far too small

File: app/api/agents/[id]/a2a/route.ts

const maxOutputTokens = effectiveThinkingBudget != null ? 500 + effectiveThinkingBudget : undefined;

The MCP route uses DEFAULT_MIN_OUTPUT_TOKENS (4096) as the response buffer. 500 tokens is barely a paragraph — thinking will consume its budget and the actual response may be cut off. Use the same 4096 buffer.

Also: DEFAULT_BASE_OUTPUT_TOKENS = 4096 in app/api/a2a/route.ts and DEFAULT_MIN_OUTPUT_TOKENS = 4096 in app/api/agents/[id]/mcp/route.ts are the same constant with different names. These should be one shared export.

Bug: Test file imports a module that doesn't exist in this PR

File: anthropic-thinking.test.ts (repo root)

import { validateBudgetTokens, getThinkingConfig, buildThinkingParam, supportsExtendedThinking }
  from "./src/lib/anthropic-thinking";

The module added by this PR is packages/lib/providers/anthropic-thinking.ts and it exports supportsExtendedThinking, parseAnthropicCotBudgetFromEnv, resolveAnthropicThinkingBudgetTokens, etc. — not validateBudgetTokens, getThinkingConfig, or buildThinkingParam. No ./src/lib/anthropic-thinking file appears in the diff, so this test will fail at import time. The CHANGELOG references packages/tests/unit/anthropic-thinking.test.ts as the intended location — the file appears to have landed at the wrong path with the wrong import.

Minor: `Math.max` guard in `chat/route.ts` is always a no-op

const effectiveMaxOutputTokens = cotBudget
  ? Math.max(DEFAULT_MIN_OUTPUT_TOKENS, cotBudget + DEFAULT_MIN_OUTPUT_TOKENS)
  : undefined;

cotBudget + DEFAULT_MIN_OUTPUT_TOKENS is always greater than DEFAULT_MIN_OUTPUT_TOKENS for any positive budget, so Math.max never selects the left side. Simplify to cotBudget + DEFAULT_MIN_OUTPUT_TOKENS, or follow computeEffectiveMaxTokens in chat/completions which also respects a caller-supplied max_tokens.

Minor: `RATE_LIMIT_DISABLED` is undocumented and misleadingly named

The new early-return for RATE_LIMIT_DISABLED=true only skips the startup validation log — actual rate limiting still runs. The variable isn't in .env.example or the env validator. A developer who sets it expecting to disable limits will be confused. Either document the real scope or rename (e.g. RATE_LIMIT_SKIP_STARTUP_CHECK).

Missing test: Claude 3.7 Sonnet

EXTENDED_THINKING_MODEL_PATTERNS covers claude-3[.-]7-sonnet but the test suite has no cases for it. Worth adding parity with the 3.5 cases.
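
A minimal sketch of what the missing 3.7 cases could look like. The pattern list below is a stand-in that reproduces only the claude-3[.-]7-sonnet pattern quoted above; the real EXTENDED_THINKING_MODEL_PATTERNS is module-private and may contain more entries. matchesThinkingPattern is a hypothetical local substitute for supportsExtendedThinking, and case-insensitive matching is an assumption.

```typescript
// Stand-in for the module-private pattern list; only the 3.7 pattern quoted
// in the review is reproduced (assumption: matching is case-insensitive).
const SKETCH_THINKING_PATTERNS: RegExp[] = [/claude-3[.-]7-sonnet/i];

// Hypothetical local equivalent of supportsExtendedThinking.
function matchesThinkingPattern(modelId: string): boolean {
  return SKETCH_THINKING_PATTERNS.some((pattern) => pattern.test(modelId));
}

// Cases worth mirroring in the real suite: dot and dash separators,
// with and without a provider prefix, plus one negative case.
const claude37Cases: Array<[string, boolean]> = [
  ["claude-3.7-sonnet", true],
  ["claude-3-7-sonnet-20250219", true],
  ["anthropic/claude-3.7-sonnet", true],
  ["claude-3-5-sonnet-20241022", false],
];

for (const [modelId, expected] of claude37Cases) {
  if (matchesThinkingPattern(modelId) !== expected) {
    throw new Error(`unexpected result for ${modelId}`);
  }
}
```

In the real test file these rows would call supportsExtendedThinking directly instead of the stand-in.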

Must-fix before merge:

Fix 4-arg mergeAnthropicCotProviderOptions call in chat/completions/route.ts
Pass maxOutputTokens to A2A streamText when thinking is active
Fix A2A response buffer from 500 → 4096 and consolidate the duplicate constant
Fix anthropic-thinking.test.ts path and imports to target the actual new module

claude · 2026-04-03T00:08:59Z

PR Review: Anthropic CoT Controls (4 bugs, 1 test gap, 2 quality issues)

BUG 1 - chat/completions/route.ts: Extra arg discards cotBudget. mergeAnthropicCotProviderOptions(model, process.env, {}, cotBudget) passes {} as agentThinkingBudgetTokens (type error) and ignores cotBudget (4th arg). Fix: mergeAnthropicCotProviderOptions(model, process.env, cotBudget).

BUG 2 - agents/[id]/a2a/route.ts: maxOutputTokens is computed for credit reservation but never passed to streamText. The Anthropic API requires max_tokens > budget_tokens when thinking is active, so it will reject CoT-enabled A2A requests whose budget is not below the default max tokens. Also, the base is 500 here vs 4096 everywhere else, despite a comment claiming MCP parity.

BUG 3 - packages/lib/api/a2a/skills.ts: Math.max(options.maxTokens ?? 4096, cotBudget) leaves zero tokens for actual response when cotBudget >= maxTokens. All other routes use cotBudget + MIN_RESPONSE_TOKENS. Fix: Math.max(options.maxTokens ?? 4096, cotBudget + 4096).

BUG 4 - chat/completions/route.ts and chat/route.ts: ...(effectiveMaxTokens && {...}) silently drops 0. Use != null.

TEST GAP - anthropic-thinking.test.ts (root) imports ./src/lib/anthropic-thinking and tests validateBudgetTokens, getThinkingConfig, buildThinkingParam -- none of which exist in packages/lib/providers/anthropic-thinking.ts. Leftover prototype tests, not covering actual production code. The packages/tests/unit/anthropic-thinking.test.ts mentioned in docs is absent from this diff.

QUALITY - 4096 appears under four different constant names (MIN_RESPONSE_TOKENS, MIN_RESPONSE_BUFFER, DEFAULT_MIN_OUTPUT_TOKENS, DEFAULT_BASE_OUTPUT_TOKENS) plus a bare literal in skills.ts. Extract a single shared constant.

QUALITY - messages/route.ts comment 'short-circuits internal resolution' is misleading; passing resolved cotBudget as agentThinkingBudgetTokens just runs resolveAnthropicThinkingBudgetTokens twice (idempotent but not a short-circuit).

Iteration 1 prr-fix:ic-4180719820-4 prr-fix:ic-4181094158-0

Explains reasoning for dismissed issues inline in code

cursor · 2026-04-03T00:17:10Z

app/api/agents/[id]/a2a/route.ts

    const result = await streamText({
      model: gateway.languageModel(model),
      messages: fullMessages,
+      ...mergeAnthropicCotProviderOptions(model, process.env, agentThinkingBudget),


A2A route missing maxOutputTokens for thinking-enabled streamText

High Severity

The per-agent A2A route enables Anthropic extended thinking via mergeAnthropicCotProviderOptions but never passes maxOutputTokens to streamText. The Anthropic API requires max_tokens > budget_tokens when thinking is enabled. The MCP route correctly computes and passes maxOutputTokens, but the A2A route omits it entirely, which will cause API rejections when the thinking budget reaches or exceeds the SDK's default max tokens. Additionally, the cost estimation base is hardcoded to 500 instead of using 4096 like the MCP route, leading to systematic underbilling.

Additional Locations (1)

app/api/agents/[id]/a2a/route.ts#L306-L308

cursor · 2026-04-03T00:17:10Z

packages/lib/api/a2a/skills.ts

+    // Resolve CoT budget once to compute both provider options and maxOutputTokens
+    const cotBudget = resolveAnthropicThinkingBudgetTokens(model, process.env);
+    // When thinking is enabled, maxOutputTokens must be >= budgetTokens or Anthropic API rejects
+    const effectiveMaxTokens = cotBudget != null ? Math.max(options.maxTokens ?? 4096, cotBudget) : options.maxTokens;


Skills route missing response buffer in maxOutputTokens calculation

Medium Severity

The effectiveMaxTokens calculation uses Math.max(options.maxTokens ?? 4096, cotBudget) without adding a buffer for actual response generation. When cotBudget exceeds the base (e.g. 10000), thinking consumes all available tokens leaving zero for the response. Every other route adds + 4096 (or MIN_RESPONSE_TOKENS) on top of the thinking budget to ensure room for the actual output.
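
To make the "zero tokens left" claim concrete, here is a small worked sketch comparing the quoted formula with the buffered one. The function names are hypothetical, and 4096 is assumed as the buffer because that is the value the reviews cite for the other routes.

```typescript
// Assumed buffer size; the reviews cite 4096 as the value used elsewhere.
const RESPONSE_BUFFER_TOKENS = 4096;

// Formula as quoted from skills.ts: nothing is added on top of the budget.
function maxTokensWithoutBuffer(callerMax: number | undefined, cotBudget: number): number {
  return Math.max(callerMax ?? 4096, cotBudget);
}

// Suggested fix: always leave a response buffer above the thinking budget.
function maxTokensWithBuffer(callerMax: number | undefined, cotBudget: number): number {
  return Math.max(callerMax ?? 4096, cotBudget + RESPONSE_BUFFER_TOKENS);
}

// Tokens left for the visible answer if thinking spends its full budget.
function responseHeadroom(maxTokens: number, cotBudget: number): number {
  return maxTokens - cotBudget;
}

const sampleBudget = 10000;
// 10000 - 10000 = 0 tokens left for the reply
const headroomBefore = responseHeadroom(maxTokensWithoutBuffer(undefined, sampleBudget), sampleBudget);
// 14096 - 10000 = 4096 tokens left for the reply
const headroomAfter = responseHeadroom(maxTokensWithBuffer(undefined, sampleBudget), sampleBudget);
```

A caller-supplied maxTokens that is already larger than budget + buffer is left unchanged by the buffered version.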

cursor · 2026-04-03T00:17:10Z

app/api/v1/chat/completions/route.ts

  abortSignal: AbortSignal | undefined,
  timeoutMs: number,
  settleReservation: (actualCost: number) => Promise<void>,
+  cotBudget: number | null,


Unused cotBudget parameter in streaming request handler

Low Severity

The cotBudget parameter is passed to handleStreamingRequest but never referenced inside the function body. Only cotOptions and effectiveMaxTokens are used. The non-streaming handler correctly omits this parameter. This is dead code that adds confusion about whether streaming has additional CoT logic.

claude · 2026-04-03T00:17:25Z

Code Review

Overall this is a well-structured PR. The core anthropic-thinking.ts module is clearly documented, properly validated, and the security model (budget comes from server-side character settings, never from client request bodies) is sound. A few issues worth addressing:

Issues

1. Stale root-level test file

anthropic-thinking.test.ts at the repo root imports from ./src/lib/anthropic-thinking — a path that doesn't exist. The real unit tests live at packages/tests/unit/anthropic-thinking.test.ts. This root file looks like a leftover artifact and will fail (or silently be excluded from test runs). It should be removed.

2. Null-check inconsistency in mcp/route.ts

Lines 347–349 use != null:

const estimatedOutputTokens = effectiveThinkingBudget != null
  ? baseOutputTokens + effectiveThinkingBudget
  : baseOutputTokens;

But line 379 uses a bare truthy check:

const maxOutputTokens = effectiveThinkingBudget
  ? Math.max(...) + DEFAULT_MIN_OUTPUT_TOKENS
  : undefined;

Since resolveAnthropicThinkingBudgetTokens returns positive integers or null (never 0), both forms are safe today. But the inconsistency makes the intent unclear — prefer != null throughout for explicitness and forward safety.

3. mergeProviderOptions silently clobbers unknown provider keys

The doc comment acknowledges this: keys outside gateway, anthropic, google are shallow-merged (i.e., the extra value wins). If a new call site passes e.g. openai in base and anthropic in extra, the openai key is preserved only because { ...a, ...b } keeps it. But if both base and extra have an openai key, base.openai is silently dropped. Consider adding a test that covers overlapping non-listed keys to document this boundary, or assert it throws in that case if clobbering is unacceptable.
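
The boundary described here could be pinned down with a test along these lines. This is a hypothetical re-implementation written only from the behavior described in this comment (one-level deep merge for gateway, anthropic, google; shallow merge otherwise), not the PR's actual mergeProviderOptions. The openai fields and the sendReasoning key are illustrative values.

```typescript
type ProviderOptions = Record<string, Record<string, unknown>>;

// Keys that receive a one-level deep merge, per the review's description.
const DEEP_MERGED_KEYS = ["gateway", "anthropic", "google"];

// Hypothetical re-implementation for illustration; not the PR's code.
function mergeProviderOptionsSketch(
  base: ProviderOptions | undefined,
  extra: ProviderOptions | undefined,
): ProviderOptions {
  // Shallow merge first: for any key present in both, `extra` wins outright.
  const merged: ProviderOptions = { ...(base ?? {}), ...(extra ?? {}) };
  // Then repair the listed keys so nested fields from `base` survive.
  for (const key of DEEP_MERGED_KEYS) {
    const fromBase = base?.[key];
    const fromExtra = extra?.[key];
    if (fromBase && fromExtra) {
      merged[key] = { ...fromBase, ...fromExtra };
    }
  }
  return merged;
}

const mergedOptions = mergeProviderOptionsSketch(
  { anthropic: { sendReasoning: true }, openai: { user: "u1", store: true } },
  { anthropic: { thinking: { type: "enabled", budgetTokens: 2048 } }, openai: { user: "u2" } },
);
// mergedOptions.anthropic keeps both sendReasoning and thinking.
// mergedOptions.openai is { user: "u2" }: base.openai.store was dropped.
```

If dropping base.openai is unacceptable, the same test would instead assert that the merge throws or deep-merges for unlisted keys.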

4. Rate limit multiplier is frozen at module load

const rateLimitMultiplier = getRateLimitMultiplier(); // evaluated once
export const RateLimitPresets = { STANDARD: { maxRequests: 60 * rateLimitMultiplier }, ... };

On Vercel serverless this is fine (cold-start per deployment). On long-lived Node processes (e.g. gateway-discord, agent-server), a change to RATE_LIMIT_MULTIPLIER won't take effect without a restart. This is acceptable behavior but worth a note in .env.example alongside the new variable.
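
A sketch of the difference between the frozen value and a lazily computed one, for anyone weighing whether long-lived processes need the latter. readRateLimitMultiplier is a hypothetical stand-in for getRateLimitMultiplier; its parsing rules (positive finite number, otherwise 1) are assumptions, and env is passed in explicitly so the example does not depend on process.env.

```typescript
type EnvLike = Record<string, string | undefined>;

// Hypothetical parser: a positive finite number, else 1 (no scaling).
// Production ignores the multiplier, matching the behavior noted in this review.
function readRateLimitMultiplier(env: EnvLike): number {
  if (env.NODE_ENV === "production") return 1;
  const parsed = Number(env.RATE_LIMIT_MULTIPLIER);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : 1;
}

const sketchEnv: EnvLike = { NODE_ENV: "development", RATE_LIMIT_MULTIPLIER: "2" };

// Frozen: computed once, like a module-level constant.
const frozenStandardMax = 60 * readRateLimitMultiplier(sketchEnv);

// Lazy: recomputed on each call, so later env changes are visible.
function lazyStandardMax(env: EnvLike): number {
  return 60 * readRateLimitMultiplier(env);
}

// Simulate an operator changing the variable while the process keeps running.
sketchEnv.RATE_LIMIT_MULTIPLIER = "5";
// frozenStandardMax is still 120; lazyStandardMax(sketchEnv) now returns 300.
```

The frozen form is simpler and matches the PR; the lazy form only matters if a restart-free config change is a requirement.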

5. RATE_LIMIT_DISABLED in validateRateLimitConfig — misleading early return

The comment says "actual rate limiting is still enforced" when RATE_LIMIT_DISABLED=true in non-prod. The early return on lines 47–49 skips only the startup log, not the rate limit itself, which is correct. But readers could misread the variable as disabling rate limiting entirely. Rename the env var to RATE_LIMIT_STARTUP_WARN_DISABLED or add a clearer inline comment to avoid this confusion.

Minor / Nits

parseAnthropicCotBudgetFromEnv returns null for n === 0 (line 94) while parseThinkingBudgetFromCharacterSettings returns the raw 0 — the asymmetry is intentional (character 0 flows into resolveAnthropicThinkingBudgetTokens which then returns null), but this two-step path is easy to break. A comment on parseAnthropicCotBudgetFromEnv noting that 0 from env == disabled (vs 0 from settings flowing through resolve) would help future readers.
mergeAnthropicCotProviderOptions is documented as "equivalent to mergeProviderOptions(undefined, anthropicThinkingProviderOptions(...))" — and that's literally what it does. The extra wrapper adds a call stack level but is harmless. Fine to keep as an ergonomic alias.
EXTENDED_THINKING_MODEL_PATTERNS is module-private but referenced in unit tests via the exported supportsExtendedThinking. When new Claude models ship, this list will need updating — consider a // Keep in sync with Anthropic docs comment pointing to the capability table.
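
The first nit (0 from env vs 0 from character settings) is easier to follow as code. The sketch below is a guess at the shape of the two-step path, written from the descriptions in this thread; the function names, validation rules, and precedence order are assumptions rather than the PR's implementation.

```typescript
// Hypothetical stand-ins for the PR's parsers; names and details are assumed.
function parseEnvBudgetSketch(raw: string | undefined): number | null {
  if (raw === undefined || !/^\d+$/.test(raw.trim())) return null;
  const n = Number(raw.trim());
  return n === 0 ? null : n; // 0 from env means "disabled" right here
}

function parseSettingsBudgetSketch(settings: Record<string, unknown>): number | undefined {
  const raw = settings.anthropicThinkingBudgetTokens;
  if (typeof raw !== "number" || !Number.isFinite(raw) || raw < 0) return undefined;
  return Math.trunc(raw); // 0 is returned as-is and interpreted later
}

// Assumed precedence: character settings first, env as fallback, then the cap.
// An explicit 0 from settings disables thinking even when env sets a budget.
function resolveBudgetSketch(
  settingsBudget: number | undefined,
  envBudget: number | null,
  maxCap: number | null,
): number | null {
  const chosen = settingsBudget ?? envBudget;
  if (chosen == null || chosen === 0) return null;
  return maxCap != null ? Math.min(chosen, maxCap) : chosen;
}
```

The asymmetry is visible in the return types: the env parser can never produce 0, while the settings parser can, and only the resolver turns that 0 into null.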

Security ✅

Budget is resolved entirely from server-side state (character.settings + env). Client request bodies cannot inject a thinking budget. This is correctly enforced.
ANTHROPIC_COT_BUDGET_MAX gives operators a hard cap regardless of character JSON — good defense-in-depth.
getRateLimitMultiplier() is a no-op in NODE_ENV=production, preventing accidental staging config leakage.

Test Coverage ✅

packages/tests/unit/anthropic-thinking.test.ts (438 lines) covers all exported functions with edge cases. Fix the stale root-level file and coverage is solid.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 8 total unresolved issues (including 7 from previous reviews).

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

cursor · 2026-04-03T00:18:58Z

app/api/v1/chat/route.ts

+    const cotBudget = resolveAnthropicThinkingBudgetTokens(selectedModel, process.env);
+    const effectiveMaxOutputTokens = cotBudget
+      ? Math.max(DEFAULT_MIN_OUTPUT_TOKENS, cotBudget + DEFAULT_MIN_OUTPUT_TOKENS)
+      : undefined;


Inconsistent nullish checks for CoT budget across routes

Medium Severity

The cotBudget variable is checked with truthiness (cotBudget ? and effectiveMaxOutputTokens ?) instead of != null. While resolveAnthropicThinkingBudgetTokens returns null (not 0) for disabled, this pattern is inconsistent with other routes (e.g. chat/completions uses != null checks). More critically, if cotBudget is null, the call to mergeAnthropicCotProviderOptions at line 308 passes cotBudget ?? undefined — but if cotBudget were somehow 0, the truthiness check would skip setting maxOutputTokens while still passing 0 to the merge function.
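
The difference between the two checks only shows up at 0, which the following sketch isolates. StreamOptionsSketch and both helper names are hypothetical; they stand in for the object passed to streamText.

```typescript
interface StreamOptionsSketch {
  model: string;
  maxOutputTokens?: number;
}

// Truthy spread: silently omits the field when the value is 0.
function withTruthySpread(
  model: string,
  maxOutputTokens: number | null | undefined,
): StreamOptionsSketch {
  return { model, ...(maxOutputTokens && { maxOutputTokens }) };
}

// Nullish spread: omits the field only for null or undefined.
function withNullishSpread(
  model: string,
  maxOutputTokens: number | null | undefined,
): StreamOptionsSketch {
  return { model, ...(maxOutputTokens != null && { maxOutputTokens }) };
}

// withTruthySpread("m", 0)   -> { model: "m" }
// withNullishSpread("m", 0)  -> { model: "m", maxOutputTokens: 0 }
// Both omit the field for null and undefined.
```

Since the resolver is described as never returning 0, both forms behave the same today; the nullish form simply keeps that true if the contract ever changes.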

claude · 2026-04-03T00:21:03Z

Code Review: PR #417 — Anthropic CoT Controls

Overall this is a well-structured feature. The core architecture is solid: budget lives on the character record (not in request bodies), the env-based fallback/cap chain is clearly reasoned, and the explicit disable (budget=0) for background services is the right call. A few bugs and inconsistencies need attention before merge.

🔴 Critical

Test file wrong import path — packages/tests/unit/anthropic-thinking.test.ts imports from ./src/lib/anthropic-thinking but the module lives at @/lib/providers/anthropic-thinking. This test will fail to compile/run.

Also, anthropic-thinking.test.ts at the repo root imports from ./src/lib/anthropic-thinking — same issue, different file. If both test files cover the same module, consider consolidating them.

🟠 High — A2A route under-bills extended thinking

app/api/agents/[id]/a2a/route.ts, credit reservation (around the estimateRequestCost call):

// A2A: only 500 base tokens added
const maxOutputTokens = effectiveThinkingBudget != null ? 500 + effectiveThinkingBudget : undefined;

The comment says "to match MCP route behavior" but MCP uses DEFAULT_MIN_OUTPUT_TOKENS = 4096, not 500. For a 1000-token budget, A2A reserves ~1500 tokens while MCP reserves ~5096. A2A will systematically under-bill extended thinking.

Fix: use the same formula as MCP — Math.max(DEFAULT_MIN_OUTPUT_TOKENS, effectiveThinkingBudget) + DEFAULT_MIN_OUTPUT_TOKENS — or import the shared constant.

Also: MCP passes maxOutputTokens to streamText(), but A2A does not. Anthropic requires max_tokens > budget_tokens when thinking is enabled. Without this, A2A calls with thinking enabled may get API rejections or fall back silently.

🟡 Medium — MCP token calculation inconsistency

app/api/agents/[id]/mcp/route.ts — two different formulas for the same effective budget:

// Credit reservation:
estimatedOutputTokens = baseOutputTokens + effectiveThinkingBudget   // 4096 + budget

// maxOutputTokens for API call:
maxOutputTokens = Math.max(DEFAULT_MIN_OUTPUT_TOKENS, effectiveThinkingBudget) + DEFAULT_MIN_OUTPUT_TOKENS
// = Math.max(4096, budget) + 4096

For budget < 4096, reservation underestimates relative to what's actually sent to Anthropic. These should use the same formula.
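
A quick worked comparison of the two quoted formulas, to show where they diverge. The function names are hypothetical; the constant value is the 4096 quoted above.

```typescript
const MCP_MIN_OUTPUT_TOKENS = 4096; // value quoted in the review

// Credit reservation formula as quoted: base + budget.
function reservedOutputTokens(budget: number): number {
  return MCP_MIN_OUTPUT_TOKENS + budget;
}

// maxOutputTokens formula as quoted: max(min, budget) + min.
function requestedMaxOutputTokens(budget: number): number {
  return Math.max(MCP_MIN_OUTPUT_TOKENS, budget) + MCP_MIN_OUTPUT_TOKENS;
}

// Positive gap = output tokens the API call may use beyond what was reserved.
function reservationGap(budget: number): number {
  return requestedMaxOutputTokens(budget) - reservedOutputTokens(budget);
}

// budget 1000:  8192 requested vs 5096 reserved  -> gap 3096
// budget 4096:  8192 requested vs 8192 reserved  -> gap 0
// budget 10000: 14096 requested vs 14096 reserved -> gap 0
```

The gap closes exactly at budget = 4096, so only small budgets are affected.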

🟡 Medium — Rate limit dev mode behavior change is a silent breaking change

packages/lib/middleware/rate-limit.ts: The old message was "Development mode: Rate limits relaxed (10000 req/window)". The new message says dev mode now uses production-equivalent numeric limits. This is a meaningful behavioral change for developers. It's not reflected in the PR description or CHANGELOG. Should be clearly called out and RATE_LIMIT_MULTIPLIER env var documented in .env.example.

🟡 Medium — Duplicated token calculation across v1 routes

The pattern:

const cotBudget = resolveAnthropicThinkingBudgetTokens(model, process.env);
const effectiveMaxTokens = cotBudget != null
  ? Math.max(MIN_RESPONSE_BUFFER, cotBudget + MIN_RESPONSE_BUFFER)
  : undefined;

...is copy-pasted into /v1/chat/route.ts, /v1/messages/route.ts, /v1/responses/route.ts, and /v1/chat/completions/route.ts with slight variations. The MCP/A2A routes will have this too. A computeEffectiveMaxOutputTokens(model, env, agentBudget?) helper in anthropic-thinking.ts would eliminate drift.
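
One possible shape for that helper, as a sketch only. It deviates from the signature suggested above by taking an already-resolved budget rather than (model, env, agentBudget), which keeps the example self-contained; the constant name is hypothetical, and honoring a larger caller-supplied cap is an assumption modeled on the computeEffectiveMaxTokens behavior mentioned elsewhere in this thread.

```typescript
// Single shared constant instead of several 4096 aliases (name is hypothetical).
const SHARED_MIN_RESPONSE_TOKENS = 4096;

function computeEffectiveMaxOutputTokensSketch(
  cotBudget: number | null,
  callerMaxTokens?: number,
): number | undefined {
  // Thinking disabled: leave the caller's value (or the SDK default) untouched.
  if (cotBudget == null) return callerMaxTokens;
  // Thinking enabled: cover the full budget plus room for the visible reply.
  const floor = cotBudget + SHARED_MIN_RESPONSE_TOKENS;
  // A larger caller-supplied cap is respected; a smaller one is raised to the floor.
  return callerMaxTokens != null ? Math.max(callerMaxTokens, floor) : floor;
}

// computeEffectiveMaxOutputTokensSketch(null)         -> undefined
// computeEffectiveMaxOutputTokensSketch(2000)         -> 6096
// computeEffectiveMaxOutputTokensSketch(2000, 500)    -> 6096
// computeEffectiveMaxOutputTokensSketch(2000, 20000)  -> 20000
```

Using the same return value for both credit reservation and the streamText call would also remove the reservation mismatch raised for the MCP route.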

🟡 Medium — `messages` route silently inflates `max_tokens`

app/api/v1/messages/route.ts (around the effectiveMaxTokens calculation):

When the caller explicitly provides request.max_tokens and thinking is enabled, the route inflates max_tokens silently to satisfy Anthropic's constraint (max_tokens > budget_tokens). This is necessary, but the inflated value is not reflected in the response or logged. Callers inspecting the response usage field will see a higher token usage than their requested cap, which is a billing surprise. At minimum, this should be logged at debug level.
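
One way to surface the inflation is to have the calculation report whether it raised the caller's cap, so the route can log it. This is a sketch under assumptions: the names are hypothetical, the 4096 default buffer is taken from the other routes, and the message format is illustrative.

```typescript
interface MaxTokensDecision {
  maxTokens: number;
  inflated: boolean; // true when the caller's cap was raised
}

function decideMaxTokens(
  requestedMaxTokens: number,
  cotBudget: number | null,
  responseBuffer = 4096,
): MaxTokensDecision {
  // Thinking disabled: the caller's cap is used as-is.
  if (cotBudget == null) return { maxTokens: requestedMaxTokens, inflated: false };
  const required = cotBudget + responseBuffer;
  const maxTokens = Math.max(requestedMaxTokens, required);
  return { maxTokens, inflated: maxTokens > requestedMaxTokens };
}

// Builds a debug message for the route's logger; returns null when nothing changed.
function describeInflation(requested: number, decision: MaxTokensDecision): string | null {
  if (!decision.inflated) return null;
  return `max_tokens raised from ${requested} to ${decision.maxTokens} to fit thinking budget`;
}
```

The route would pass the message to whatever debug logger it already uses; returning the flag rather than logging inside the helper keeps the calculation easy to test.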

🟢 Low / Nit

app/api/v1/chat/route.ts: cotBudget ?? undefined is redundant — cotBudget is already number | null, and undefined and null are both falsy in this context. Just pass cotBudget.
packages/lib/config/env-validator.ts: ANTHROPIC_COT_BUDGET_MAX accepts "0" as syntactically valid (passes regex), but parseAnthropicCotBudgetMaxFromEnv returns null for "0" (treats it as "no cap"). This is confusing — the validator should either reject "0" explicitly, or the comment in .env.example should say "0" means no cap (currently it says "Optional... caps any effective budget").
packages/lib/providers/anthropic-thinking.ts: Math.trunc() on anthropicThinkingBudgetTokens means 3.9 becomes 3, not 4. This is a valid choice but worth a brief comment since fractional tokens aren't a natural input.
DEFAULT_BASE_OUTPUT_TOKENS exported from app/api/a2a/route.ts but the JSDoc says "This constant should be imported by @/lib/api/a2a handlers." It's currently only exported, not consumed by anything in the diff. Is this wired up, or is it dead code?

✅ What's good

The security model is sound: thinking budget comes from stored character settings, not request bodies. ANTHROPIC_COT_BUDGET_MAX provides a hard cap that untrusted callers can't bypass.
Explicit budget=0 in SEO/promotion/app-builder services is the right design — temperature-sensitive services should never accidentally inherit thinking behavior from env config.
mergeAnthropicCotProviderOptions deep-merging into existing providerOptions (preserving gateway/google keys) is the correct approach.
supportsExtendedThinking() gate before enabling thinking is correctly placed.
The documentation in docs/anthropic-cot-budget.md clearly explains the why.

vercel bot had a problem deploying to Preview March 27, 2026 04:15 Failure

odilitime added 3 commits March 27, 2026 08:02

odilitime marked this pull request as ready for review March 27, 2026 08:06

Copilot AI review requested due to automatic review settings March 27, 2026 08:06

odilitime changed the title ~~fix(types): test project, split check-types, UI package, gateways~~ feat: Anthropic CoT controls Mar 27, 2026

Copilot started reviewing on behalf of odilitime March 27, 2026 08:07

vercel bot had a problem deploying to Preview March 27, 2026 08:08 Failure

cursor bot reviewed Mar 27, 2026

packages/lib/api/a2a/skills.ts (outdated)

odilitime and others added 2 commits March 27, 2026 08:13

Copilot AI reviewed Mar 27, 2026

vercel bot had a problem deploying to Preview March 27, 2026 08:18 Failure

odilitime added 3 commits March 27, 2026 08:23

packages: improve error handling (env-validator.ts)

9b2bed4

Iteration 1 prr-fix:prrc_kwdop6btdm6yyjtn

packages: add tests for docker-ssh-cloud-deploy.test

d1ba7cb

Iteration 1 prr-fix:prrc_kwdop6btdm6yyjt7

packages: add tests for rate-limit (rate-limit.ts)

fe10e6b

Changes: - rate-limit.ts: PR Review: fix(types): test project, split check-types, UI package, gateways ...

app: fix thinking budget (route.ts)

5adbbbc

Iteration 4 prr-fix:ic-4180968930-2

vercel bot had a problem deploying to Preview April 2, 2026 23:38 Failure

cursor bot reviewed Apr 2, 2026

app/api/v1/chat/route.ts

odilitime added 2 commits April 2, 2026 23:45

packages: Minor: Misleading no-op comment in app-promotion-assets.ts

a89ce40

Iteration 1 prr-fix:ic-4180483329-5

app: Missing response token capacity in maxOutputTokens calculation

ded69de

Iteration 1 prr-fix:prrc_kwdop6btdm60pqio

vercel bot had a problem deploying to Preview April 2, 2026 23:47 Failure

app: add tests for route (route.ts)

b439d33

Iteration 3 prr-fix:ic-4181043091-0

cursor bot reviewed Apr 2, 2026

vercel bot had a problem deploying to Preview April 2, 2026 23:52 Failure

odilitime added 2 commits April 3, 2026 00:03

app: Missing response token buffer in effectiveMaxTokens calculation

3a16c32

Iteration 1 prr-fix:prrc_kwdop6btdm60pwi0

app: fix issues in app (route.ts)

6b2d336

Iteration 2 prr-fix:ic-4180539773-2

vercel bot had a problem deploying to Preview April 3, 2026 00:06 Failure

cursor bot reviewed Apr 3, 2026

app/api/v1/chat/completions/route.ts (outdated)

app: fix issues in app (route.ts)

9d4777a

Iteration 1 prr-fix:ic-4180719820-4 prr-fix:ic-4181094158-0

vercel bot had a problem deploying to Preview April 3, 2026 00:14 Failure

docs: add review dismissal comments

9272304

Explains reasoning for dismissed issues inline in code

cursor bot reviewed Apr 3, 2026

vercel bot had a problem deploying to Preview April 3, 2026 00:19 Failure

lalalune merged commit 0384bcb into dev Apr 4, 2026
107 of 112 checks passed

-const createDatabaseAdapter = createDatabaseAdapterDefault as (
-  config: { postgresUrl: string },
-  agentId: UUID,
-) => IDatabaseAdapter;
+type CreateDatabaseAdapterFn = (
+  config: { postgresUrl: string },
+  agentId: UUID,
+) => IDatabaseAdapter;
+function ensureCreateDatabaseAdapter(fn: unknown): CreateDatabaseAdapterFn {
+  if (typeof fn !== "function") {
+    throw new TypeError(
+      'Default export from "@elizaos/plugin-sql/node" is not a callable database adapter factory',
+    );
+  }
+  return fn as CreateDatabaseAdapterFn;
+}
+const createDatabaseAdapter = ensureCreateDatabaseAdapter(
+  createDatabaseAdapterDefault,
+);
