Skip to content

fix(mcp): set isError on tool failures + surface request_id/retry/capabilities/ignored_fields, correct anon-stack TTL#44

Merged
mastermanas805 merged 1 commit into
masterfrom
fix/mcp-error-contract-render
Jun 10, 2026
Merged

fix(mcp): set isError on tool failures + surface request_id/retry/capabilities/ignored_fields, correct anon-stack TTL#44
mastermanas805 merged 1 commit into
masterfrom
fix/mcp-error-contract-render

Conversation

@mastermanas805

Copy link
Copy Markdown
Member

Live-cohort dogfood found error-contract + rendering gaps that hurt agent integration. Five fixes in one PR, branched fresh off master (built on #43's deploy-id schema + dispatch-path test patterns).

FIX 1 (P1) — tool failures never set isError

Symptom: every API-side failure (402/404/409/503/501) returned as a NORMAL successful CallToolResult whose text merely said instanode.dev error (…). grep isError src/ = 0 hits. A host/agent could not distinguish failure from success.
Enumeration: grep -n "return textResult(formatError(err));" src/index.ts → 29 sites (all tool catch blocks).
Fix: shared errorResult(err) helper (src/index.ts) wraps formatError and sets isError: true. All 29 catch blocks now return errorResult(err). textResult(text, isError=false) only sets the flag when asked, so SUCCESS results never carry it. A 402/403 upgrade/permission gate IS a failure (op did not happen) → isError:true, with the agent_action text preserved by formatError.
Sites found / touched: 29 / 29.
Live verified: dispatch-path test asserts result.isError === true on a 404, a 402 tier-gate, and an auth-required failure; !== true on success.

FIX 2 (P1) — error text drops request_id and retry_after_seconds

Symptom: grep request_id src/ = 0; the raw envelope carried request_id (support correlation) + sometimes retry_after_seconds (backoff).
Fix: ApiError (src/client.ts) now carries requestId + retryAfterSeconds, lifted off the envelope in a single shared apiErrorFromEnvelope() used by BOTH request<T> and requestMultipart<T> (no drift). formatError (src/index.ts) appends Retry after: <n>s (when present) and Request ID: <id> (always, last line), preserving the headline + agent_action + upgrade/claim URLs.

FIX 3 (P2) — get_capabilities rendered less than its description promised

Fix: the render loop (src/index.ts get_capabilities) now emits resource count: <svc n, …> (resource_count_limit) and durability: RPO <n>m, RTO <n>m (rpo/rto). Backups were already rendered. Only fields present in the live /api/v1/capabilities response are rendered (verified against the live endpoint).

FIX 4 (P2) — stale anon-stack TTL prose (24h → 6h)

Symptom: create_stack description + README said anon stack TTL is "24h" (×4). Real value is 6h (api/internal/handlers/stack.go anonymousStackTTL = 6 * time.Hour, label "6h"; api PR #214).
Fix: corrected every anon-STACK 24h → 6h in src/index.ts (create_stack desc), src/client.ts (3 sites), README.md (2 sites). The anon RESOURCE TTL (db/cache/etc.) is correctly LEFT at 24h — only the STACK TTL is 6h.

FIX 5 (P3) — ignored_fields echo

Fix: ProvisionResultBase.ignored_fields?: string[] + a shared appendIgnoredFields() wired into the 7 resource-provision tools, surfacing dropped unknown params (api #283) so an agent learns its hallucinated params had no effect. No-op on a clean request.


Tests

  • New test/error-contract-unit.test.ts: drives isError through the REAL registered-tool dispatch (server._registeredTools[name].handler(args), the fix(deploy): validate deployment id as 8-hex app_id, not UUID #43 pattern — not a handler bypass); asserts request_id in rendered error text; capabilities render; ignored_fields render; and a prose assertion that no source string pairs stack with a 24h TTL claim.
  • test/client-unit.test.ts: covers the envelope helper's request_id + retry_after_seconds truthy + defensive (malformed) branches.
  • test/mock-api.ts: error envelope now stamps request_id (+ optional retry_after_seconds), mirroring the real api.
  • Added the new test to package.json test/test:nocov and coverage.yml's lcov list (for the 100%-patch diff-cover gate).

Local: npm run build clean; npm test495 pass / 0 fail; client.js 100% line coverage, index.js only uncovered = the pre-existing server.connect listen path.


Follow-ups (flagged, NOT built) — 3 missing tools the dogfood noted

  • get_deploy_logsGET /deploy/{id}/logs (build/runtime log retrieval through MCP).
  • stack-redeploy tool → a POST /stacks/{slug}/redeploy wrapper (today MCP can create + get + update-env a stack but not redeploy it).
  • resource-credentials re-fetchGET /api/v1/resources/:id/credentials (pairs with the cli resource creds verb being added in parallel).

🤖 Generated with Claude Code

…abilities/ignored_fields, correct anon-stack TTL

Live-cohort dogfood found error-contract + rendering gaps that hurt agent
integration. Five fixes in one PR:

FIX 1 (P1) — tool failures now set isError:true. Every API-side failure
(402/404/409/503/501) was returned as a NORMAL successful CallToolResult whose
text merely said "instanode.dev error (...)", so a host could not distinguish
failure from success. Per the MCP spec a tool-execution failure is reported IN
the result via isError:true. Added a shared errorResult(err) helper that wraps
formatError and sets the flag; all ~29 tool catch blocks now return
errorResult(err) instead of textResult(formatError(err)). SUCCESS results never
set it. A 402/403 upgrade/permission gate IS a failure (the op did not happen) →
isError:true, with the agent_action text preserved.

FIX 2 (P1) — error text now includes request_id (always) + retry_after_seconds
(when present). The raw envelope carried both for support correlation /
backoff; formatError dropped them. ApiError now carries requestId +
retryAfterSeconds, lifted off the envelope in a single shared
apiErrorFromEnvelope() used by both request<T> and requestMultipart<T>.

FIX 3 (P2) — get_capabilities now renders resource_count_limit + RPO/RTO
(it already rendered backups), matching its tool description's promise.

FIX 4 (P2) — corrected the anon-STACK TTL prose from 24h to 6h (the real value;
api PR #214 anonymousStackTTL) across src + README. The anon RESOURCE TTL
(db/cache/etc.) is correctly left at 24h.

FIX 5 (P3) — provision responses now echo ignored_fields (api #283) via a
shared appendIgnoredFields() on the 7 resource-provision tools, so an agent
learns its hallucinated params were dropped.

Tests: new test/error-contract-unit.test.ts drives isError through the real
registered-tool dispatch path (handler), asserts request_id in rendered text,
capabilities render, ignored_fields render, and a no-"24h"-stack-TTL prose
assertion; client-unit covers the envelope helper's request_id/retry branches.
495/495 pass; client.js 100% line coverage. Added the new test to the CI test
list + the coverage.yml lcov list for the 100%-patch diff-cover gate.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@mastermanas805 mastermanas805 enabled auto-merge (squash) June 10, 2026 20:16
@mastermanas805 mastermanas805 merged commit 2e5692c into master Jun 10, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant