Feat/multi provider merge #1
Open
yjhjstz wants to merge 33 commits into
The startup preconnect fired a TCP+TLS handshake to api.anthropic.com even when CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC / DISABLE_TELEMETRY was set, leaking client IP and session timing. Gate it like every other telemetry sink already does.
It seems you are using me, but OPENAI_API_KEY is not set in Variables/Secrets for this repo. See the README for more information.
DeepSeek's automatic prompt cache keys on the full request body bytes. Anthropic's metadata.user_id is a JSON blob that we populate with the current session_id — which is freshly generated every CLI launch. Result: every new session entered the API with a unique body prefix, forcing cache_creation on each first request and producing zero cache_read_input_tokens across sessions. Pin the session_id field in metadata.user_id to the fixed literal 'claude-code-ds'. Real telemetry / analytics paths still call getSessionId() directly and get the live id — only the wire-level metadata is stabilized. Verified: identical 'say only: ping' requests across two separate CLI launches now show cache_read_input_tokens=15872 on the second call (99.6% hit rate, ~99% cost reduction at Pro discount prices).
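A minimal sketch of why a per-launch session_id defeats a cache keyed on request bytes, and how pinning restores a stable prefix. The `buildWireUserId` helper and the `device_id` field are hypothetical names used only for illustration; the `'claude-code-ds'` literal is the one value taken from the commit message.

```typescript
// Illustrative only: the real metadata.user_id layout is not shown in this PR.
const PINNED_SESSION_ID = "claude-code-ds";

function buildWireUserId(
  fields: Record<string, string>,
  liveSessionId: string,
  pin: boolean,
): string {
  // Only the wire-level value is pinned; callers keep the live id for telemetry.
  return JSON.stringify({
    ...fields,
    session_id: pin ? PINNED_SESSION_ID : liveSessionId,
  });
}

// Two separate CLI launches with different live session ids.
const launchA = buildWireUserId({ device_id: "dev-1" }, "session-aaa", true);
const launchB = buildWireUserId({ device_id: "dev-1" }, "session-bbb", true);
console.log(launchA === launchB); // true: identical bytes, so the prefix can be reused
```

With `pin` set to false the two strings differ, which is the situation that forced cache_creation on every first request.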
… tool_result (CC-1215) When extended thinking + tool_use appear in the same turn under ACP, claude.ts yields two AssistantMessages sharing one message.id and StreamingToolExecutor inserts a tool_result between them. The backward walk used to skip past the tool_result and merge the two assistants, producing duplicate tool_use IDs. ensureToolResultPairing then stripped them, leaving orphaned tool_results and consecutive user messages → API 400. Stop the backward walk at any non-assistant message. Remove the now-unused isToolResultMessage helper. Ref: claude-code-best/claude-code@b62b384
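The corrected backward walk can be sketched roughly as follows. The `Msg` shape and `findMergeTarget` name are simplified assumptions, not the actual types in claude.ts; the point is only that the walk gives up at the first non-assistant message instead of skipping past it.

```typescript
// Simplified, hypothetical message shape for illustration.
type Msg =
  | { role: "assistant"; id: string }
  | { role: "user"; isToolResult: boolean };

// Returns the index of an earlier assistant message that an incoming chunk
// with `incomingId` may merge into, or -1 if no merge is allowed.
function findMergeTarget(messages: readonly Msg[], incomingId: string): number {
  for (let i = messages.length - 1; i >= 0; i--) {
    const m = messages[i];
    // Stop at any non-assistant message (e.g. an interleaved tool_result).
    if (m === undefined || m.role !== "assistant") return -1;
    if (m.id === incomingId) return i;
  }
  return -1;
}

const withToolResult: Msg[] = [
  { role: "assistant", id: "msg_1" },
  { role: "user", isToolResult: true },
];
console.log(findMergeTarget(withToolResult, "msg_1")); // -1: no merge across the tool_result
```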
GLM-5.2 supports up to 128K (131072) output tokens, but the model fell through to the catch-all else branch (32K/64K). Add an explicit branch so the Z.AI endpoint's full output capacity is usable. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Auto mode (transcript-classifier auto-approval) was unreachable on the GLM endpoint because of two gates:
- modelSupportsAutoMode only allowed claude-*-4-6 for external/firstParty providers, so GLM model IDs never qualified.
- The tengu_auto_mode_config kill-switch is never served on the GLM endpoint, so enabledState defaulted to 'disabled', tripping the circuit breaker and blocking canEnterAuto/carousel availability.
Allow glm-5 and above in both gates (glm-5, glm-5.2, glm-6, …), keeping the 'disabled' circuit-breaker default for real Anthropic models. The TRANSCRIPT_CLASSIFIER build flag is already enabled in scripts/build.ts.
Claude-Session: https://claude.ai/code/session_012h5pf4zSDdtzKhtSqdCJ8J
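One way to express "glm-5 and above" is a major-version check on the model ID. This is a sketch under assumptions: the `isGlm5OrAbove` name and the exact ID format (`glm-<major>[.<minor>]`) are illustrative, and the real gate may be written differently.

```typescript
// Hypothetical helper: accepts glm-5, glm-5.2, glm-6, ... and rejects older majors.
function isGlm5OrAbove(model: string): boolean {
  const match = /^glm-(\d+)(?:\.\d+)?/i.exec(model);
  if (match === null) return false;
  // Number(undefined) is NaN, which fails the comparison and returns false.
  return Number(match[1]) >= 5;
}

for (const id of ["glm-5", "glm-5.2", "glm-6", "glm-4.6", "claude-opus-4-6"]) {
  console.log(id, isGlm5OrAbove(id));
}
```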
Binary build now produces dist/claude-<branch> (e.g. dist/claude-glm) instead of dist/claude, so per-branch artifacts don't overwrite each other. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
GLM's /anthropic endpoint does not validate the cch client-attestation header, so computing the xxHash64 body hash and sending the placeholder only adds CPU cost and a useless HTTP header. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace opt-out (isEnvDefinedFalsy) with opt-in (isEnvTruthy) for CLAUDE_CODE_ATTRIBUTION_HEADER. Header is now off unless explicitly set to 1/true/yes/on. Drops GrowthBook dependency from this path.
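The difference between the two gates shows up when the variable is unset. The sketch below re-implements both checks as standalone functions; the accepted truthy values come from the commit message, while the falsy set (`0/false/no/off`) and the function names are assumptions.

```typescript
const TRUTHY_VALUES = new Set(["1", "true", "yes", "on"]);
const FALSY_VALUES = new Set(["0", "false", "no", "off"]); // assumed mirror of the truthy set

function normalize(value: string | undefined): string | undefined {
  return value === undefined ? undefined : value.trim().toLowerCase();
}

// Old behavior (opt-out): header is sent unless the variable is explicitly falsy.
function headerEnabledOptOut(value: string | undefined): boolean {
  const v = normalize(value);
  return !(v !== undefined && FALSY_VALUES.has(v));
}

// New behavior (opt-in): header is sent only when the variable is explicitly truthy.
function headerEnabledOptIn(value: string | undefined): boolean {
  const v = normalize(value);
  return v !== undefined && TRUTHY_VALUES.has(v);
}

console.log(headerEnabledOptOut(undefined), headerEnabledOptIn(undefined)); // true false
```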
When running against a third-party endpoint the first-party Anthropic telemetry pipeline doesn't apply. Defaulting to 'no-telemetry' lets the remaining gate-checks (isAnalyticsDisabled, isFeedbackSurveyDisabled in services/analytics/config.ts) short-circuit without requiring users to set DISABLE_TELEMETRY=1 manually. Users can still opt up to 'essential-traffic' via CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1 to suppress the larger set of non-telemetry network calls (auto-update, MCP registry prefetch, model-capabilities fetch, etc.).
Two related changes:
1. /config TUI now shows an 'Auto-memory' toggle right under 'Auto-compact'. The autoMemoryEnabled setting was already wired up in supportedSettings.ts and honored at runtime by src/memdir/paths.ts:isAutoMemoryEnabled(), but the only places to flip it were the /memory file selector or a hand-edited settings.json.
2. Default flips from 'enabled' to 'disabled'. The auto-memory section injects ~3,145 fixed tokens into every system prompt; with it disabled, a minimal -p call measured 32% fewer total context tokens (15,938 -> 10,909). Users running against a third-party endpoint are usually here for the cheap-API-key experience and won't benefit from the memory-persistence machinery that the section instructs.
Existing escape hatches still work:
- settings.json autoMemoryEnabled: true (per project/user)
- CLAUDE_CODE_DISABLE_AUTO_MEMORY=0 env var (highest priority)
- /config -> Auto-memory toggle (interactive)
…eepSeekProvider helpers
Recognizes deepseek mode via explicit CLAUDE_CODE_USE_DEEPSEEK=1 env var. Note: Implicit detection via ANTHROPIC_BASE_URL or DEEPSEEK_* envs is intentionally avoided — multiple firstParty-specific code paths (betas, thinking config, preflight) would need to be updated to handle a new provider value, and silently switching the provider on existing ANTHROPIC_BASE_URL users breaks the existing Anthropic-compatible gateway flow that already works for DeepSeek's /anthropic endpoint. isDeepSeekBaseUrl() helper exported for future use by callers that want to detect deepseek mode from the ANTHROPIC_BASE_URL env var without triggering the provider switch.
DeepSeek's API caches request prefixes server-side. Tool schemas appear early in the request body, so keeping their order stable across requests maximizes cache hit rate. Different orderings (e.g. due to dynamic tool loading or feature flags) would otherwise break the cache. Adapted from QingJ01/DeepSeekCode c22d46b.
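A minimal sketch of one way to keep tool order stable: sort by tool name before serializing. Sorting by name is an assumption made for illustration; the commit message only says the order is kept stable, not which key is used, and `stableToolOrder` is a hypothetical name.

```typescript
interface ToolSchema {
  name: string;
  description: string;
}

// Returns a new array in a deterministic order, independent of load order.
function stableToolOrder<T extends { name: string }>(tools: readonly T[]): T[] {
  return [...tools].sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
}

const loadedFirst: ToolSchema[] = [
  { name: "Read", description: "read a file" },
  { name: "Bash", description: "run a command" },
];
const loadedSecond: ToolSchema[] = [loadedFirst[1]!, loadedFirst[0]!];

// Both load orders serialize to the same bytes, so the cached prefix is reusable.
console.log(
  JSON.stringify(stableToolOrder(loadedFirst)) === JSON.stringify(stableToolOrder(loadedSecond)),
); // true
```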
DeepSeek V4 controls thinking depth entirely via CLAUDE_CODE_EFFORT_LEVEL (server-side); budget_tokens and adaptive thinking are ignored. Send a single minimal thinking param so the SDK still expects thinking blocks in the response. Adapted from QingJ01/DeepSeekCode 28856df. Temperature handling is already correct (only sent when thinking is disabled).
…ting)
- countTokensWithAPI / countMessagesTokensWithAPI / countTokensViaHaikuFallback now return null unconditionally: DeepSeek has no /count_tokens endpoint and no Haiku fallback model.
- roughTokenCountEstimation switches from content.length to Buffer.byteLength (UTF-8). DeepSeek's tokenizer is byte-pair on UTF-8 bytes, so CJK chars consume ~3 bytes each. The old char-count under-estimates Chinese content by ~3x, causing premature context-exhausted warnings.
Adapted from QingJ01/DeepSeekCode e49fdec.
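The gap between character count and UTF-8 byte count can be seen in a short sketch. It uses `TextEncoder` rather than `Buffer.byteLength` so it runs outside Node as well; both measure UTF-8 bytes. The `bytesPerToken = 4` ratio and the function names are assumptions, not values taken from the PR.

```typescript
// UTF-8 byte length of a string.
function utf8ByteLength(content: string): number {
  return new TextEncoder().encode(content).length;
}

// Rough estimate: bytes divided by an assumed average bytes-per-token ratio.
function roughTokenEstimate(content: string, bytesPerToken = 4): number {
  return Math.round(utf8ByteLength(content) / bytesPerToken);
}

const cjk = "你好";
console.log(cjk.length);          // 2 UTF-16 code units
console.log(utf8ByteLength(cjk)); // 6 bytes, i.e. 3 bytes per character
```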
- 402 'Insufficient Balance': friendly message in formatAPIError, classify as 'insufficient_balance', and never retry (account top-up required).
- 429 Rate Limit: simple zh-CN message (DeepSeek lacks Anthropic-specific unified-rate-limit headers) and always retry with exponential backoff.
- extractDeepSeekTraceId() helper to pull x-ds-trace-id from error headers for debugging.
Adapted from QingJ01/DeepSeekCode 5496466.
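The retry policy for these two status codes can be sketched as a pure decision function. The `decideRetry` name, the 500 ms base delay, and the 32 s cap are illustrative assumptions; the PR states only that 402 never retries and 429 always retries with exponential backoff.

```typescript
type RetryDecision =
  | { retry: false; reason: string }
  | { retry: true; delayMs: number };

function decideRetry(
  status: number,
  attempt: number, // 0-based attempt counter
  baseDelayMs = 500, // assumed base delay
  maxDelayMs = 32_000, // assumed cap
): RetryDecision {
  // 402: retrying cannot help until the account is topped up.
  if (status === 402) return { retry: false, reason: "insufficient_balance" };
  // 429: always retry, doubling the delay each attempt up to the cap.
  if (status === 429) {
    return { retry: true, delayMs: Math.min(baseDelayMs * Math.pow(2, attempt), maxDelayMs) };
  }
  return { retry: false, reason: "not_handled_in_this_sketch" };
}

console.log(decideRetry(402, 0)); // { retry: false, reason: 'insufficient_balance' }
console.log(decideRetry(429, 3)); // { retry: true, delayMs: 4000 }
```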
Generic-but-deepseek-specific behavior adjustments (this branch targets DeepSeek exclusively, so all changes apply unconditionally):
- Friendly 422 error message: include nested message hint about tool definitions / message format.
- Friendly timeout error: explain DeepSeek queue-saturation cause and suggest lowering effort level.
- Classify HTTP 422 as 'invalid_parameters'.
- validateModel: known DeepSeek models (deepseek-v4-pro, deepseek-v4-flash) always pass; unrecognized model names pass with a warning explaining the server will silently remap them to deepseek-v4-flash.
- model.tsx: surface validateModel's warning to the user via onDone().
- aliases: add 'pro' and 'flash' as canonical aliases (resolved via ANTHROPIC_DEFAULT_*_MODEL envs).
- context.ts: use getSessionStartDate() (already present) instead of getLocalISODate() so DeepSeek's server-side prefix cache survives across midnight boundaries within a session.
Skipped from upstream cd3a58b: the sanitizeDeepSeekContentBlocks [ERROR] prefix logic, which depends on helpers not yet present in this fork's claude.ts.
Adapted from QingJ01/DeepSeekCode cd3a58b.
Previous commit allowed unknown model names with only a warning, but DeepSeek's API silently remaps anything unrecognized to deepseek-v4-flash rather than returning 404. A warning is too easy to miss when the user genuinely typo'd 'deepsek-v4-pro' and wonders why their flash-tier requests are slower than expected. Reject hard with a clear error listing valid models. Also drop the dead sideQuery-based API validation path entirely — DeepSeek never returns 404 for model names, so it's unreachable. Adapted from QingJ01/DeepSeekCode 818e6a3.
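A hard allowlist check along these lines might look like the sketch below. The two model names are the ones listed in this PR; the function name, return shape, and error wording are assumptions.

```typescript
const KNOWN_DEEPSEEK_MODELS: readonly string[] = ["deepseek-v4-pro", "deepseek-v4-flash"];

type ModelValidation = { valid: true } | { valid: false; error: string };

// Rejects anything outside the allowlist, since the server would otherwise
// silently remap an unknown name instead of returning 404.
function validateDeepSeekModel(name: string): ModelValidation {
  if (KNOWN_DEEPSEEK_MODELS.includes(name)) return { valid: true };
  return {
    valid: false,
    error: `Unknown model '${name}'. Valid models: ${KNOWN_DEEPSEEK_MODELS.join(", ")}`,
  };
}

console.log(validateDeepSeekModel("deepsek-v4-pro"));
```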
DeepSeek silently ignores the is_error: true flag on tool_result content blocks, so the model has no way to detect that a tool call failed and will treat the (often confusing) error text as a normal observation. Add a prefix-injection pass after normalizeMessagesForAPI: when a tool_result has is_error=true, prepend a literal '[ERROR] Tool execution failed:' text block to the content. Walks nested blocks recursively so cached histories with nested tool_results are handled. Adapted from QingJ01/DeepSeekCode cd3a58b (the part that was skipped in the earlier port because it depended on helpers not yet present).
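The prefix-injection pass can be sketched as a recursive map over content blocks. The `Block` type here is a simplified assumption (real tool_result content may also be a plain string), and `injectErrorPrefix` is a hypothetical name; only the prefix literal comes from the commit message.

```typescript
type Block =
  | { type: "text"; text: string }
  | { type: "tool_result"; tool_use_id: string; is_error?: boolean; content: Block[] };

const ERROR_PREFIX = "[ERROR] Tool execution failed:";

// Returns new blocks; any tool_result with is_error=true gets a leading text block.
function injectErrorPrefix(blocks: readonly Block[]): Block[] {
  return blocks.map((block): Block => {
    if (block.type !== "tool_result") return block;
    // Recurse first so nested tool_results are handled too.
    const inner = injectErrorPrefix(block.content);
    if (block.is_error !== true) return { ...block, content: inner };
    const prefix: Block = { type: "text", text: ERROR_PREFIX };
    return { ...block, content: [prefix, ...inner] };
  });
}

const failed: Block[] = [
  {
    type: "tool_result",
    tool_use_id: "toolu_1",
    is_error: true,
    content: [{ type: "text", text: "ENOENT: no such file" }],
  },
];
console.log(JSON.stringify(injectErrorPrefix(failed), null, 2));
```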
Combined port of QingJ01/DeepSeekCode fad575f + c608652 + 4ba8eca,
adapted: since this branch targets DeepSeek exclusively, all currency
formatting is unconditionally ¥ (no isDeepSeekCurrency() gate).
- modelCost.ts:
- DeepSeek V4 Pro pricing table (¥3/¥6 discounted, ¥12/¥24 full price
via DEEPSEEK_USE_FULL_PRICE=1 — discount window ends 2026-05-31).
- DeepSeek V4 Flash pricing table (¥1/¥2).
- getDeepSeekProCostTier() helper.
- getModelCosts(): route deepseek-v4-pro through tier helper.
- formatPrice(): emit '¥' with 3 decimal places for sub-0.1 prices
(cache reads at ¥0.025 would otherwise round to ¥0.03).
- DEFAULT_UNKNOWN_MODEL_COST: COST_DEEPSEEK_FLASH (was COST_TIER_5_25).
- cost-tracker.ts:
- formatCost(): switch '$' -> '¥'.
- formatTotalCost(): append 'Cache hit rate' and 'Cache savings' lines.
DeepSeek's prompt cache cuts input cost ~120x, so hit rate is the
headline cost driver; surface it next to the totals.
- costHook.ts: drop the hasConsoleBillingAccess() gate — always print the
session cost summary at exit.
- commands/cost/cost.ts: drop the Claude.ai-subscription branch — always
return formatTotalCost() (no overage/subscription concept for DeepSeek).
- CostThresholdDialog.tsx: drop the 'You've spent $5 on the Anthropic
API' literal; generic 'significant amount on API calls' instead.
- screens/REPL.tsx: raise the cost-threshold-reached trigger from 5 (USD)
to 35 (CNY), matching the user-perceived '~5 USD' notification level.
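The formatPrice() change listed above can be illustrated with a small sketch. The 0.1 threshold and 3-decimal rule come from the commit message; the `formatYuan` name and the 2-decimal default for larger prices are assumptions.

```typescript
// Sub-0.1 prices keep 3 decimals so small cache-read prices are not rounded away.
function formatYuan(price: number): string {
  const digits = price < 0.1 ? 3 : 2; // 2 decimals for larger prices is assumed
  return `¥${price.toFixed(digits)}`;
}

console.log(formatYuan(0.025)); // ¥0.025
console.log(formatYuan(3));     // ¥3.00
```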
Snapshot of 2026-05-12 prices from both vendors' pricing pages, with multiplier views and a realistic per-request cost table calibrated to the actual cache-hit pattern observed after the session_id-pinning fix. Includes caveats about the V4 Pro discount expiring 2026-05-31, missing Opus-class equivalents on DeepSeek, and capability gaps in DeepSeek's Anthropic-compatible endpoint.
The flag controls a 'cch=00000' placeholder in the x-anthropic-billing-header plus an xxHash64 body-integrity computation. DeepSeek's /anthropic endpoint does not validate this header — every byte we send is dead weight. With CLAUDE_CODE_ATTRIBUTION_HEADER=0 in .env.deepseek the header is suppressed anyway, but disabling the flag also drops the xxhash-wasm initialization on the hot path and trims 2 source-patching ops at build time. Verified: identical 'say only: ping' requests across two CLI launches still produce cache_read_input_tokens=15872 (commit 283678a's session_id pinning remains effective).
MAX_STATUS_CHARS caps the 'git status' output that gets injected into every system prompt. At 2000 chars (~500 tokens) on a dirty repo it dominated the project-context section; cutting it to 1000 chars (~250 tokens) saves cache-write cost on first turn without hiding the information — the truncation message tells the model to run 'git status' via BashTool if it needs the full output. Also fix the truncation message which still said 'exceeds 2k characters' even though the threshold had already been reduced.
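A sketch of the truncation, with the message derived from the constant so the two cannot drift apart again. The message wording and the `truncateGitStatus` name are hypothetical; only MAX_STATUS_CHARS and the 1000-character value come from the commit message.

```typescript
const MAX_STATUS_CHARS = 1000;

function truncateGitStatus(status: string, maxChars: number = MAX_STATUS_CHARS): string {
  if (status.length <= maxChars) return status;
  // The limit in the notice is interpolated, not hardcoded.
  const notice =
    `\n... (truncated because it exceeds ${maxChars} characters; ` +
    `run "git status" via BashTool for the full output)`;
  return status.slice(0, maxChars) + notice;
}

const longStatus = " M file.ts\n".repeat(200); // 2200 characters
console.log(truncateGitStatus(longStatus).length > MAX_STATUS_CHARS); // true: body plus notice
```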
This branch can run any backend behind ANTHROPIC_BASE_URL (DeepSeek today). The /thinking toggle description hardcoded 'Claude will think / respond...' which is misleading when the actual model is deepseek-v4-pro. Generic 'Model' label is accurate regardless of provider. Adapted from QingJ01/DeepSeekCode 5adf400 — the rest of that rebrand commit is either pure 'Claude' -> 'DeepSeek' branding, dead code under ClaudeAI-only paths, or logo whitespace trimming.
Drop the USER_TYPE === 'ant' gate on the env var so external builds (e.g. DeepSeek backend) can point the auto-mode classifier at any model without going through GrowthBook config.
Three fixes needed to make `--permission-mode auto` actually work with DeepSeek as both main and classifier model:
- betas.ts: extend modelSupportsAutoMode external allowlist to accept ^deepseek- model names alongside the existing Claude family.
- permissionSetup.ts: flip AUTO_MODE_ENABLED_DEFAULT from 'disabled' to 'enabled'. With telemetry/GrowthBook stubbed on this branch, tengu_auto_mode_config never resolves and the default was kicking every session out of auto via the circuit-breaker path.
- yoloClassifier.ts: drop `type: 'custom'` from YOLO_CLASSIFIER_TOOL_SCHEMA. DeepSeek's /anthropic endpoint returns 400 for unknown tool types; other tools in the codebase already omit this field.
Verified end-to-end: `ls /tmp` passes the classifier and `curl https://...` triggers deny; both are correct behaviors with deepseek-v4-flash as the classifier model (set via CLAUDE_CODE_AUTO_MODE_MODEL).
On platforms without a vendored rg binary (e.g. Android/Termux) or incomplete installs, getRipgrepConfig() returned a builtin path that didn't exist, causing ENOENT on spawn. Now fall back to system rg on PATH (spawning the bare name to prevent PATH hijacking), and carry a `note` surfaced via getRipgrepStatus(). When no rg is available at all, preserve the historical ENOENT path with an explanatory note.
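The fallback order can be sketched as a pure function with the filesystem checks passed in as booleans. The `RipgrepChoice` shape, the `chooseRipgrep` name, and the note wording are assumptions and do not reflect the actual getRipgrepConfig() signature.

```typescript
interface RipgrepChoice {
  command: string;
  source: "builtin" | "system";
  note?: string;
}

function chooseRipgrep(
  builtinPath: string,
  builtinExists: boolean,
  systemRgFound: boolean,
): RipgrepChoice {
  if (builtinExists) return { command: builtinPath, source: "builtin" };
  if (systemRgFound) {
    // Spawn the bare name, as the commit message describes.
    return { command: "rg", source: "system", note: "vendored rg missing; using system rg" };
  }
  // Nothing available: keep the historical path so spawn fails with ENOENT, plus a note.
  return { command: builtinPath, source: "builtin", note: "no rg found; spawn will fail with ENOENT" };
}

console.log(chooseRipgrep("/opt/app/vendor/rg", false, true));
```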
Four high-risk changes from the deepseek branch were unconditional and would have broken Anthropic/GLM runtime behavior:
- claude.ts thinking: restore adaptive-vs-budget selection for Anthropic/GLM; DeepSeek keeps the simplified budget_tokens path (effort-level-driven).
- claude.ts [ERROR] prefix: only inject into tool_result content for DeepSeek (it ignores is_error); Anthropic/GLM handle is_error correctly.
- withRetry.ts 429: DeepSeek always retries (no subscriber tiers); Anthropic retains the ClaudeAI subscriber gate.
- validateModel.ts: DeepSeek uses a known-models allowlist (API silently remaps unknown names); all other providers restore main's API-based probe.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The deepseek branch relaxed the Anthropic auto-mode allowlist regex from 'claude-(opus|sonnet)-4-6' to 'claude-(opus|sonnet)-4', which incorrectly enables auto mode for older Claude models (4-1, 4-5) that don't support it. Restore the '-6' anchor while keeping the /^deepseek-/ addition. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
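The effect of the missing '-6' suffix is easy to reproduce. The two Claude patterns and the /^deepseek-/ pattern below are the ones quoted in the commit message; the wrapper function name and the example model IDs are hypothetical.

```typescript
const RELAXED = /claude-(opus|sonnet)-4/;    // pattern from the deepseek branch
const RESTORED = /claude-(opus|sonnet)-4-6/; // pattern restored by this commit
const DEEPSEEK = /^deepseek-/;

// Hypothetical wrapper combining the restored Claude pattern with the DeepSeek addition.
function supportsAutoModeSketch(model: string): boolean {
  return RESTORED.test(model) || DEEPSEEK.test(model);
}

console.log(RELAXED.test("claude-sonnet-4-5"));           // true: the regression
console.log(supportsAutoModeSketch("claude-sonnet-4-5")); // false
console.log(supportsAutoModeSketch("claude-opus-4-6"));   // true
console.log(supportsAutoModeSketch("deepseek-v4-flash")); // true
```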
Reverting the APIProvider type extension: adding 'glm'/'deepseek' as distinct provider values broke every `=== 'firstParty'` gate in the codebase (modelSupportsAutoMode, shouldIncludeFirstPartyOnlyBetas, etc.), which would silently disable features for GLM/DeepSeek users.
GLM and DeepSeek both ride Anthropic-compatible firstParty endpoints (ANTHROPIC_BASE_URL + new Anthropic(...)), so they ARE firstParty from the SDK perspective. Keep the APIProvider type unchanged and detect the specific backend via env flags:
- isGLMProvider(): CLAUDE_USE_GLM=1
- isDeepSeekProvider(): CLAUDE_USE_DEEPSEEK=1
These gate only the model-aware adaptations (thinking, [ERROR] prefix, 429 retry, validateModel), not the provider routing itself.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
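A sketch of backend detection layered on top of an unchanged provider value. The env variable names are the ones given in this commit message; the strict `=== "1"` comparison, the `Backend` label type, and the DeepSeek-first precedence when both flags are set are assumptions.

```typescript
type Env = Record<string, string | undefined>;

function isGLMProvider(env: Env): boolean {
  return env["CLAUDE_USE_GLM"] === "1";
}

function isDeepSeekProvider(env: Env): boolean {
  return env["CLAUDE_USE_DEEPSEEK"] === "1";
}

// Hypothetical label used only for model-aware adaptations.
// Provider routing is untouched and stays 'firstParty' for all three.
type Backend = "anthropic" | "glm" | "deepseek";

function detectBackend(env: Env): Backend {
  if (isDeepSeekProvider(env)) return "deepseek"; // precedence is an assumption
  if (isGLMProvider(env)) return "glm";
  return "anthropic";
}

console.log(detectBackend({ CLAUDE_USE_GLM: "1" })); // glm
```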
P0 fixes from code review:
1. claude.ts getAPIMetadata: the pinned session_id 'claude-code-ds' was unconditional, clobbering Anthropic session telemetry with a fake stable value. Now only applies when isDeepSeekProvider() || isGLMProvider(); real Anthropic keeps getSessionId().
2. validateModel.ts: remove unused `warning?` field from the return type; it was added by the deepseek branch but never populated by any code path, and no caller destructures it.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
fcd21b8 to
0227be7
Compare
This reverts commit 649d1f8.
No description provided.