
Feat/multi provider merge #1

Open
yjhjstz wants to merge 33 commits into main from feat/multi-provider-merge

yjhjstz commented Jun 24, 2026

No description provided.

The startup preconnect fired a TCP+TLS handshake to api.anthropic.com
even when CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC / DISABLE_TELEMETRY
was set, leaking client IP and session timing. Gate it like every other
telemetry sink already does.
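Here is a minimal sketch of that gate. It assumes a hypothetical `shouldPreconnect` helper that receives the environment as a plain object. The PR does not show the project's own env parsing, so the rule that empty, `0`, and `false` count as unset is also an assumption.

```typescript
// Hypothetical env type and helper; the PR does not show the real parsing.
type Env = Record<string, string | undefined>;

// Assumption: a variable counts as "set" unless it is empty, "0", or "false".
function isSet(value: string | undefined): boolean {
  if (value === undefined) return false;
  const v = value.trim().toLowerCase();
  return v !== '' && v !== '0' && v !== 'false';
}

// Skip the startup TCP+TLS preconnect to api.anthropic.com whenever the
// user has opted out of nonessential traffic or telemetry.
function shouldPreconnect(env: Env): boolean {
  return !(
    isSet(env.CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC) ||
    isSet(env.DISABLE_TELEMETRY)
  );
}

console.log(shouldPreconnect({})); // true
console.log(shouldPreconnect({ DISABLE_TELEMETRY: '1' })); // false
```

In the CLI this would be called with `process.env` before the preconnect is issued.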
cr-gpt Bot commented Jun 24, 2026

Seems you are using me but OPENAI_API_KEY isn't set in this repo's Variables/Secrets. You can follow the README for more information.

yjhjstz and others added 28 commits June 24, 2026 20:21
DeepSeek's automatic prompt cache keys on the full request body bytes.
Anthropic's metadata.user_id is a JSON blob that we populate with the
current session_id — which is freshly generated every CLI launch.
Result: every new session entered the API with a unique body prefix,
forcing cache_creation on each first request and producing zero
cache_read_input_tokens across sessions.

Pin the session_id field in metadata.user_id to the fixed literal
'claude-code-ds'. Real telemetry / analytics paths still call
getSessionId() directly and get the live id — only the wire-level
metadata is stabilized.

Verified: identical 'say only: ping' requests across two separate CLI
launches now show cache_read_input_tokens=15872 on the second call
(99.6% hit rate, ~99% cost reduction at Pro discount prices).
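Here is a minimal sketch of the pinning. It assumes `metadata.user_id` is built with `JSON.stringify` and carries a `device_id` field alongside `session_id`. The other fields and the helper name `buildMetadataUserId` are hypothetical. The `pinForPromptCache` flag stands in for the later `isDeepSeekProvider() || isGLMProvider()` gate.

```typescript
const PINNED_SESSION_ID = 'claude-code-ds';

// Hypothetical builder for the wire-level metadata.user_id JSON string.
// Only session_id is taken from the commit; device_id is illustrative.
function buildMetadataUserId(
  deviceId: string,
  liveSessionId: string,
  pinForPromptCache: boolean,
): string {
  return JSON.stringify({
    device_id: deviceId,
    // A per-launch id makes every request body unique and defeats a cache
    // keyed on body bytes, so pin it on the wire when caching matters.
    session_id: pinForPromptCache ? PINNED_SESSION_ID : liveSessionId,
  });
}

const launch1 = buildMetadataUserId('dev-1', 'session-aaa', true);
const launch2 = buildMetadataUserId('dev-1', 'session-bbb', true);
console.log(launch1 === launch2); // true: identical bytes across launches
```

The serialized bytes no longer vary per launch, so the body prefix the cache keys on stays stable. Telemetry still reads the live id separately.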
… tool_result (CC-1215)

When extended thinking + tool_use appear in the same turn under ACP, claude.ts
yields two AssistantMessages sharing one message.id and StreamingToolExecutor
inserts a tool_result between them. The backward walk used to skip past the
tool_result and merge the two assistants, producing duplicate tool_use IDs.
ensureToolResultPairing then stripped them, leaving orphaned tool_results and
consecutive user messages → API 400.

Stop the backward walk at any non-assistant message. Remove the now-unused
isToolResultMessage helper.

Ref: claude-code-best/claude-code@b62b384
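Here is a simplified sketch of the fixed backward walk. It uses hypothetical message shapes (`Msg`, `messageId`, string `blocks`) rather than the real claude.ts types.

```typescript
// Simplified, hypothetical message shapes (not the real claude.ts types).
type Msg =
  | { type: 'assistant'; messageId: string; blocks: string[] }
  | { type: 'user'; blocks: string[] };

// Walk backward from `index` looking for an earlier chunk of the same
// assistant message to merge into. Any non-assistant message (such as a
// user message carrying a tool_result) ends the walk, so chunks separated
// by a tool_result are never merged.
function findMergeTarget(messages: Msg[], index: number): number {
  const current = messages[index];
  if (current === undefined || current.type !== 'assistant') return -1;
  for (let i = index - 1; i >= 0; i--) {
    const candidate = messages[i];
    if (candidate.type !== 'assistant') return -1;
    if (candidate.messageId === current.messageId) return i;
  }
  return -1;
}

// thinking + tool_use, then the inserted tool_result, then the rest of the
// same message.id: the last chunk must not merge across the tool_result.
const turn: Msg[] = [
  { type: 'assistant', messageId: 'msg_1', blocks: ['thinking', 'tool_use:toolu_1'] },
  { type: 'user', blocks: ['tool_result:toolu_1'] },
  { type: 'assistant', messageId: 'msg_1', blocks: ['text'] },
];
console.log(findMergeTarget(turn, 2)); // -1
```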
GLM-5.2 supports up to 128K (131072) output tokens, but the model
fell through to the catch-all else branch (32K/64K). Add an explicit
branch so the Z.AI endpoint's full output capacity is usable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Auto mode (transcript-classifier auto-approval) was unreachable on the
GLM endpoint because of two gates:

- modelSupportsAutoMode only allowed claude-*-4-6 for external/firstParty
  providers, so GLM model IDs never qualified.
- The tengu_auto_mode_config kill-switch is never served on the GLM
  endpoint, so enabledState defaulted to 'disabled' — tripping the
  circuit breaker and blocking canEnterAuto/carousel availability.

Allow glm-5 and above in both gates (glm-5, glm-5.2, glm-6, …), keeping
the 'disabled' circuit-breaker default for real Anthropic models. The
TRANSCRIPT_CLASSIFIER build flag is already enabled in scripts/build.ts.

Claude-Session: https://claude.ai/code/session_012h5pf4zSDdtzKhtSqdCJ8J

Binary build now produces dist/claude-<branch> (e.g. dist/claude-glm)
instead of dist/claude, so per-branch artifacts don't overwrite each other.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

GLM's /anthropic endpoint does not validate the cch client-attestation
header, so computing the xxHash64 body hash and sending the placeholder
only adds CPU cost and a useless HTTP header.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace opt-out (isEnvDefinedFalsy) with opt-in (isEnvTruthy) for
CLAUDE_CODE_ATTRIBUTION_HEADER. Header is now off unless explicitly
set to 1/true/yes/on. Drops GrowthBook dependency from this path.
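Here is a sketch contrasting the two semantics. The function names mirror the helpers named above, but their bodies are assumptions. So are the exact accepted spellings (trimmed and case-insensitive).

```typescript
// Accepted spellings are assumptions for illustration.
const TRUTHY = ['1', 'true', 'yes', 'on'];
const FALSY = ['0', 'false', 'no', 'off'];

function normalize(v: string | undefined): string | undefined {
  return v?.trim().toLowerCase();
}

// Opt-out style: only an explicit falsy value counts; unset is "not disabled".
function isEnvDefinedFalsy(v: string | undefined): boolean {
  const n = normalize(v);
  return n !== undefined && FALSY.includes(n);
}

// Opt-in style: only an explicit truthy value counts; unset is "off".
function isEnvTruthy(v: string | undefined): boolean {
  const n = normalize(v);
  return n !== undefined && TRUTHY.includes(n);
}

// Old: header on unless explicitly disabled. New: off unless explicitly enabled.
const oldHeaderEnabled = (v: string | undefined): boolean => !isEnvDefinedFalsy(v);
const newHeaderEnabled = (v: string | undefined): boolean => isEnvTruthy(v);

console.log(oldHeaderEnabled(undefined), newHeaderEnabled(undefined)); // true false
```

The only behavioral difference is the unset case, which is exactly what the flip targets.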
When running against a third-party endpoint, the first-party Anthropic
telemetry pipeline doesn't apply. Defaulting to 'no-telemetry' lets the
remaining gate-checks (isAnalyticsDisabled, isFeedbackSurveyDisabled in
services/analytics/config.ts) short-circuit without requiring users to
set DISABLE_TELEMETRY=1 manually.

Users can still opt up to 'essential-traffic' via
CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1 to suppress the larger set of
non-telemetry network calls (auto-update, MCP registry prefetch,
model-capabilities fetch, etc.).
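Here is a sketch of how the default could be resolved. It assumes a hypothetical `getTrafficLevel` that receives a precomputed `isThirdPartyEndpoint` flag, since how the endpoint is classified is not shown. For brevity it treats only `1` as enabling. `analyticsOff` is a simplified stand-in, not the real `isAnalyticsDisabled`.

```typescript
type TrafficLevel = 'default' | 'no-telemetry' | 'essential-traffic';
type Env = Record<string, string | undefined>;

// Hypothetical resolver; only "1" is treated as enabling (assumption).
function getTrafficLevel(env: Env, isThirdPartyEndpoint: boolean): TrafficLevel {
  // Explicit opt-up to the stricter level always wins.
  if (env.CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC === '1') return 'essential-traffic';
  // Third-party endpoints now default to no-telemetry without DISABLE_TELEMETRY.
  if (env.DISABLE_TELEMETRY === '1' || isThirdPartyEndpoint) return 'no-telemetry';
  return 'default';
}

// Downstream gates only inspect the level, not the endpoint.
function analyticsOff(level: TrafficLevel): boolean {
  return level !== 'default';
}

console.log(getTrafficLevel({}, true)); // "no-telemetry"
```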
Two related changes:

1. /config TUI now shows an 'Auto-memory' toggle right under
   'Auto-compact'. The autoMemoryEnabled setting was already wired up
   in supportedSettings.ts and honored at runtime by
   src/memdir/paths.ts:isAutoMemoryEnabled(), but the only places to
   flip it were the /memory file selector or hand-edited settings.json.

2. Default flips from 'enabled' to 'disabled'. The auto-memory section
   injects ~3,145 fixed tokens into every system prompt; on a minimal -p
   call, disabling it cut total context from 15,938 to 10,909 tokens (~32%).
   Users running against a third-party endpoint are usually here for the
   cheap-API-key experience and won't benefit from the memory-persistence
   machinery that the section instructs. Existing escape hatches still work:
   - settings.json autoMemoryEnabled: true (per project/user)
   - CLAUDE_CODE_DISABLE_AUTO_MEMORY=0 env var (highest priority)
   - /config -> Auto-memory toggle (interactive)
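Here is a sketch of the precedence listed above: env var first, then settings.json, then the new `false` default. The function name and the accepted env values are assumptions. The /config toggle is modeled as simply writing `autoMemoryEnabled`.

```typescript
interface Settings {
  autoMemoryEnabled?: boolean;
}
type Env = Record<string, string | undefined>;

// Hypothetical resolver mirroring the escape hatches listed in the commit.
function isAutoMemoryEnabledSketch(env: Env, settings: Settings): boolean {
  const flag = env.CLAUDE_CODE_DISABLE_AUTO_MEMORY;
  if (flag === '1' || flag === 'true') return false; // env explicitly disables
  if (flag === '0' || flag === 'false') return true; // env explicitly enables
  return settings.autoMemoryEnabled ?? false; // new default: disabled
}

console.log(isAutoMemoryEnabledSketch({}, {})); // false
```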
Recognizes deepseek mode via explicit CLAUDE_CODE_USE_DEEPSEEK=1 env var.

Note: Implicit detection via ANTHROPIC_BASE_URL or DEEPSEEK_* envs is
intentionally avoided — multiple firstParty-specific code paths (betas,
thinking config, preflight) would need to be updated to handle a new
provider value, and silently switching the provider on existing
ANTHROPIC_BASE_URL users breaks the existing Anthropic-compatible
gateway flow that already works for DeepSeek's /anthropic endpoint.

isDeepSeekBaseUrl() helper exported for future use by callers that want
to detect deepseek mode from the ANTHROPIC_BASE_URL env var without
triggering the provider switch.

DeepSeek's API caches request prefixes server-side. Tool schemas appear
early in the request body, so keeping their order stable across requests
maximizes cache hit rate. Different orderings (e.g. due to dynamic tool
loading or feature flags) would otherwise break the cache.

Adapted from QingJ01/DeepSeekCode c22d46b.
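Here is a minimal sketch. It assumes tools are sorted by name, since the commit does not say which key is used, and it uses a simplified `ToolSchema` shape.

```typescript
interface ToolSchema {
  name: string;
  description: string;
  input_schema: object;
}

// Return a copy sorted by name so the serialized tool list (an early part
// of the request body) is byte-identical regardless of registration order.
function stableToolOrder(tools: readonly ToolSchema[]): ToolSchema[] {
  return [...tools].sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
}
```

Comparing with `<` and `>` orders by UTF-16 code units, so the result does not depend on the machine's locale the way `localeCompare` can.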
DeepSeek V4 controls thinking depth entirely via CLAUDE_CODE_EFFORT_LEVEL
(server-side); budget_tokens and adaptive thinking are ignored. Send a
single minimal thinking param so the SDK still expects thinking blocks
in the response.

Adapted from QingJ01/DeepSeekCode 28856df. Temperature handling is
already correct (only sent when thinking is disabled).
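Here is a sketch of the request shape. It assumes the standard Anthropic Messages API `thinking` parameter. Using 1024, the smallest `budget_tokens` the Anthropic API accepts, as the "minimal" value is an assumption. So are the helper names.

```typescript
type ThinkingParam = { type: 'enabled'; budget_tokens: number } | { type: 'disabled' };

interface SamplingParams {
  thinking: ThinkingParam;
  temperature?: number;
}

// Assumed minimal budget; DeepSeek reportedly ignores it and uses effort level.
const MIN_THINKING_BUDGET = 1024;

function buildDeepSeekThinking(thinkingEnabled: boolean): ThinkingParam {
  // Send one fixed, minimal param only so the response includes thinking blocks.
  return thinkingEnabled
    ? { type: 'enabled', budget_tokens: MIN_THINKING_BUDGET }
    : { type: 'disabled' };
}

function buildSamplingParams(thinkingEnabled: boolean, temperature: number): SamplingParams {
  const params: SamplingParams = { thinking: buildDeepSeekThinking(thinkingEnabled) };
  // Temperature is only sent when thinking is disabled.
  if (!thinkingEnabled) params.temperature = temperature;
  return params;
}
```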
…ting)

- countTokensWithAPI / countMessagesTokensWithAPI / countTokensViaHaikuFallback
  now return null unconditionally — DeepSeek has no /count_tokens endpoint and
  no Haiku fallback model.
- roughTokenCountEstimation switches from content.length to Buffer.byteLength
  (UTF-8). DeepSeek's tokenizer is byte-pair on UTF-8 bytes, so CJK chars
  consume ~3 bytes each. The old char-count under-estimates Chinese content
  by ~3x, causing premature context-exhausted warnings.

Adapted from QingJ01/DeepSeekCode e49fdec.
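Here is a sketch of the two estimators. It assumes a 4-bytes-per-token divisor, a common heuristic that the commit does not confirm. `TextEncoder` provides the UTF-8 length; for ordinary text, Node's `Buffer.byteLength(text, 'utf8')` gives the same count.

```typescript
// Assumed heuristic divisor; the real constant is not shown in the commit.
const BYTES_PER_TOKEN = 4;

// Old approach: UTF-16 code units, which under-counts CJK text.
function roughTokensByChars(text: string): number {
  return Math.round(text.length / BYTES_PER_TOKEN);
}

// New approach: UTF-8 bytes; each common CJK character is 3 bytes.
function roughTokensByBytes(text: string): number {
  return Math.round(new TextEncoder().encode(text).length / BYTES_PER_TOKEN);
}

const chinese = '中文'.repeat(100); // 200 chars, 600 UTF-8 bytes
console.log(roughTokensByChars(chinese), roughTokensByBytes(chinese)); // 50 150
```

For the Chinese sample, the byte-based estimate is 3x the character-based one. That is the under-count the commit describes. For ASCII text the two agree.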
- 402 'Insufficient Balance': friendly message in formatAPIError, classify
  as 'insufficient_balance', and never retry (account top-up required).
- 429 Rate Limit: simple zh-CN message (DeepSeek lacks Anthropic-specific
  unified-rate-limit headers) and always retry with exponential backoff.
- extractDeepSeekTraceId() helper to pull x-ds-trace-id from error headers
  for debugging.

Adapted from QingJ01/DeepSeekCode 5496466.
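Here is a sketch of the status handling, with hypothetical names. The backoff base and cap, the bounded retry budget, and retrying 5xx are assumptions. The 402/429 rules and the `x-ds-trace-id` header come from the commit.

```typescript
type ErrorKind = 'insufficient_balance' | 'rate_limit' | 'other';

function classifyStatus(status: number): ErrorKind {
  if (status === 402) return 'insufficient_balance';
  if (status === 429) return 'rate_limit';
  return 'other';
}

function shouldRetry(status: number, attempt: number, maxRetries: number): boolean {
  if (attempt > maxRetries) return false; // assumed overall retry budget
  const kind = classifyStatus(status);
  if (kind === 'insufficient_balance') return false; // needs an account top-up
  if (kind === 'rate_limit') return true; // always retry 429
  return status >= 500; // assumption for everything else
}

// Exponential backoff: base * 2^(attempt-1), capped (constants are assumed).
function backoffMs(attempt: number, baseMs = 500, maxMs = 32_000): number {
  return Math.min(maxMs, baseMs * 2 ** (attempt - 1));
}

// HTTP header names are case-insensitive, so normalize before comparing.
function extractTraceId(headers: Record<string, string | undefined>): string | undefined {
  for (const key of Object.keys(headers)) {
    if (key.toLowerCase() === 'x-ds-trace-id') return headers[key];
  }
  return undefined;
}
```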
DeepSeek-specific behavior adjustments, applied unconditionally because
this branch targets DeepSeek exclusively:

- Friendly 422 error message: include nested message hint about
  tool definitions / message format.
- Friendly timeout error: explain DeepSeek queue-saturation cause and
  suggest lowering effort level.
- Classify HTTP 422 as 'invalid_parameters'.
- validateModel: known DeepSeek models (deepseek-v4-pro, deepseek-v4-flash)
  always pass; unrecognized model names pass with a warning explaining
  the server will silently remap them to deepseek-v4-flash.
- model.tsx: surface validateModel's warning to the user via onDone().
- aliases: add 'pro' and 'flash' as canonical aliases (resolved via
  ANTHROPIC_DEFAULT_*_MODEL envs).
- context.ts: use getSessionStartDate() (already present) instead of
  getLocalISODate() so DeepSeek's server-side prefix cache survives
  across midnight boundaries within a session.

Skipped from upstream cd3a58b: the sanitizeDeepSeekContentBlocks
[ERROR] prefix logic, which depends on helpers not yet present in this
fork's claude.ts.

Adapted from QingJ01/DeepSeekCode cd3a58b.

Previous commit allowed unknown model names with only a warning, but
DeepSeek's API silently remaps anything unrecognized to deepseek-v4-flash
rather than returning 404. A warning is too easy to miss when the user
genuinely typo'd 'deepsek-v4-pro' and then wonders why their requests
are being served by the flash tier.

Reject hard with a clear error listing valid models. Also drop the dead
sideQuery-based API validation path entirely — DeepSeek never returns
404 for model names, so it's unreachable.

Adapted from QingJ01/DeepSeekCode 818e6a3.
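Here is a minimal sketch of the hard rejection. It assumes a hypothetical `validateDeepSeekModel` that returns a discriminated result, because the real `validateModel` signature is not shown.

```typescript
const KNOWN_DEEPSEEK_MODELS = ['deepseek-v4-pro', 'deepseek-v4-flash'] as const;

type ValidationResult = { valid: true } | { valid: false; error: string };

// Reject unknown names locally: the API would silently remap them to
// deepseek-v4-flash instead of returning 404, so a remote probe can't help.
function validateDeepSeekModel(model: string): ValidationResult {
  if ((KNOWN_DEEPSEEK_MODELS as readonly string[]).includes(model)) {
    return { valid: true };
  }
  return {
    valid: false,
    error: `Unknown model '${model}'. Valid models: ${KNOWN_DEEPSEEK_MODELS.join(', ')}`,
  };
}
```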
DeepSeek silently ignores the is_error: true flag on tool_result content
blocks, so the model has no way to detect that a tool call failed and
will treat the (often confusing) error text as a normal observation.

Add a prefix-injection pass after normalizeMessagesForAPI: when a
tool_result has is_error=true, prepend a literal '[ERROR] Tool
execution failed:' text block to the content. Walks nested blocks
recursively so cached histories with nested tool_results are handled.

Adapted from QingJ01/DeepSeekCode cd3a58b (the part that was skipped in
the earlier port because it depended on helpers not yet present).
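Here is a sketch of the prefix pass over simplified content-block types, a hypothetical subset of the Anthropic SDK types. It recurses into nested `tool_result` content and returns new objects instead of mutating its input. The commit does not say whether the original mutates in place.

```typescript
type TextBlock = { type: 'text'; text: string };
type ToolResultBlock = {
  type: 'tool_result';
  tool_use_id: string;
  is_error?: boolean;
  content: string | ContentBlock[];
};
type ContentBlock = TextBlock | ToolResultBlock;

const ERROR_PREFIX = '[ERROR] Tool execution failed:';

function addErrorPrefix(blocks: ContentBlock[]): ContentBlock[] {
  return blocks.map((block): ContentBlock => {
    if (block.type !== 'tool_result') return block;
    // Recurse first so nested tool_results in cached histories are handled.
    const nested: string | ContentBlock[] =
      typeof block.content === 'string' ? block.content : addErrorPrefix(block.content);
    if (!block.is_error) return { ...block, content: nested };
    // Failed result: prepend an explicit text block the model cannot miss.
    const asBlocks: ContentBlock[] =
      typeof nested === 'string' ? [{ type: 'text', text: nested }] : nested;
    const prefix: TextBlock = { type: 'text', text: ERROR_PREFIX };
    return { ...block, content: [prefix, ...asBlocks] };
  });
}
```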
Combined port of QingJ01/DeepSeekCode fad575f + c608652 + 4ba8eca,
adapted: since this branch targets DeepSeek exclusively, all currency
formatting is unconditionally ¥ (no isDeepSeekCurrency() gate).

- modelCost.ts:
  - DeepSeek V4 Pro pricing table (¥3/¥6 discounted, ¥12/¥24 full price
    via DEEPSEEK_USE_FULL_PRICE=1 — discount window ends 2026-05-31).
  - DeepSeek V4 Flash pricing table (¥1/¥2).
  - getDeepSeekProCostTier() helper.
  - getModelCosts(): route deepseek-v4-pro through tier helper.
  - formatPrice(): emit '¥' with 3 decimal places for sub-0.1 prices
    (cache reads at ¥0.025 would otherwise round to ¥0.03).
  - DEFAULT_UNKNOWN_MODEL_COST: COST_DEEPSEEK_FLASH (was COST_TIER_5_25).

- cost-tracker.ts:
  - formatCost(): switch '$' -> '¥'.
  - formatTotalCost(): append 'Cache hit rate' and 'Cache savings' lines.
    DeepSeek's prompt cache cuts input cost ~120x, so hit rate is the
    headline cost driver; surface it next to the totals.

- costHook.ts: drop the hasConsoleBillingAccess() gate — always print the
  session cost summary at exit.

- commands/cost/cost.ts: drop the Claude.ai-subscription branch — always
  return formatTotalCost() (no overage/subscription concept for DeepSeek).

- CostThresholdDialog.tsx: drop the 'You've spent $5 on the Anthropic
  API' literal; generic 'significant amount on API calls' instead.

- screens/REPL.tsx: raise the cost-threshold-reached trigger from 5 (USD)
  to 35 (CNY), matching the user-perceived '~5 USD' notification level.
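Here is a sketch of the formatting rule and a cache-hit-rate calculation. The commit only states the sub-0.1 rule. Two decimals at or above ¥0.1 are an assumption, and so is defining hit rate as cache reads over all prompt tokens.

```typescript
// '¥' with 3 decimals below 0.1, else 2 decimals (the latter is assumed).
function formatPrice(yuan: number): string {
  return yuan < 0.1 ? `¥${yuan.toFixed(3)}` : `¥${yuan.toFixed(2)}`;
}

interface Usage {
  input_tokens: number;
  cache_read_input_tokens: number;
  cache_creation_input_tokens: number;
}

// Assumed definition: fraction of all prompt tokens served from cache.
function cacheHitRate(u: Usage): number {
  const total = u.input_tokens + u.cache_read_input_tokens + u.cache_creation_input_tokens;
  return total === 0 ? 0 : u.cache_read_input_tokens / total;
}

console.log(formatPrice(0.025)); // "¥0.025"
```

With `toFixed(2)`, `0.025` would print as `¥0.03`. That is the rounding the commit avoids.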
Snapshot of 2026-05-12 prices from both vendors' pricing pages, with
multiplier views and a realistic per-request cost table calibrated to
the actual cache-hit pattern observed after the session_id-pinning fix.

Includes caveats about the V4 Pro discount expiring 2026-05-31, missing
Opus-class equivalents on DeepSeek, and capability gaps in DeepSeek's
Anthropic-compatible endpoint.

The flag controls a 'cch=00000' placeholder in the x-anthropic-billing-header
plus an xxHash64 body-integrity computation. DeepSeek's /anthropic endpoint
does not validate this header — every byte we send is dead weight. With
CLAUDE_CODE_ATTRIBUTION_HEADER=0 in .env.deepseek the header is suppressed
anyway, but disabling the flag also drops the xxhash-wasm initialization
on the hot path and trims 2 source-patching ops at build time.

Verified: identical 'say only: ping' requests across two CLI launches still
produce cache_read_input_tokens=15872 (commit 283678a's session_id pinning
remains effective).

MAX_STATUS_CHARS caps the 'git status' output that gets injected into
every system prompt. At 2000 chars (~500 tokens) on a dirty repo it
dominated the project-context section; cutting it to 1000 chars (~250
tokens) saves cache-write cost on first turn without hiding the
information — the truncation message tells the model to run 'git status'
via BashTool if it needs the full output.

Also fix the truncation message which still said 'exceeds 2k characters'
even though the threshold had already been reduced.
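Here is a sketch of the truncation in which the note is derived from the constant, so the message cannot drift from the threshold again. The note's wording is hypothetical.

```typescript
const MAX_STATUS_CHARS = 1000;

// Truncate long `git status` output and tell the model how to get the rest.
function truncateGitStatus(status: string): string {
  if (status.length <= MAX_STATUS_CHARS) return status;
  return (
    status.slice(0, MAX_STATUS_CHARS) +
    `\n... (truncated because it exceeds ${MAX_STATUS_CHARS} characters; ` +
    'run "git status" via BashTool for the full output)'
  );
}
```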
This branch can run any backend behind ANTHROPIC_BASE_URL (DeepSeek
today). The /thinking toggle description hardcoded 'Claude will think /
respond...' which is misleading when the actual model is deepseek-v4-pro.
Generic 'Model' label is accurate regardless of provider.

Adapted from QingJ01/DeepSeekCode 5adf400 — the rest of that rebrand
commit is either pure 'Claude' -> 'DeepSeek' branding, dead code under
ClaudeAI-only paths, or logo whitespace trimming.

Drop the USER_TYPE === 'ant' gate on the env var so external builds
(e.g. DeepSeek backend) can point the auto-mode classifier at any
model without going through GrowthBook config.

Three fixes needed to make `--permission-mode auto` actually work with
DeepSeek as both main and classifier model:

- betas.ts: extend modelSupportsAutoMode external allowlist to accept
  ^deepseek- model names alongside the existing Claude family.
- permissionSetup.ts: flip AUTO_MODE_ENABLED_DEFAULT from 'disabled' to
  'enabled'. With telemetry/GrowthBook stubbed on this branch,
  tengu_auto_mode_config never resolves and the default was kicking
  every session out of auto via the circuit-breaker path.
- yoloClassifier.ts: drop `type: 'custom'` from YOLO_CLASSIFIER_TOOL_SCHEMA.
  DeepSeek's /anthropic endpoint returns 400 for unknown tool types;
  other tools in the codebase already omit this field.

Verified end-to-end: `ls /tmp` passes classifier, `curl https://...`
triggers deny — both correct behaviors with deepseek-v4-flash as the
classifier model (set via CLAUDE_CODE_AUTO_MODE_MODEL).

On platforms without a vendored rg binary (e.g. Android/Termux) or
incomplete installs, getRipgrepConfig() returned a builtin path that
didn't exist, causing ENOENT on spawn. Now fall back to system rg on
PATH (spawning the bare name to prevent PATH hijacking), and carry a
`note` surfaced via getRipgrepStatus(). When no rg is available at all,
preserve the historical ENOENT path with an explanatory note.
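Here is a sketch of the resolution order. The filesystem check and the PATH lookup are injected as parameters for testability. The real `getRipgrepConfig` signature is not shown, and the note strings are hypothetical.

```typescript
interface RipgrepConfig {
  command: string;
  note?: string;
}

function resolveRipgrep(
  builtinPath: string,
  exists: (path: string) => boolean,
  systemRgOnPath: boolean,
): RipgrepConfig {
  // 1. Vendored binary present: use it as before.
  if (exists(builtinPath)) return { command: builtinPath };
  // 2. Fall back to the bare name 'rg', resolved from PATH at spawn time.
  if (systemRgOnPath) {
    return { command: 'rg', note: `bundled rg missing at ${builtinPath}; using system rg` };
  }
  // 3. Nothing available: keep the old path (spawn fails with ENOENT),
  //    but attach a note explaining why.
  return { command: builtinPath, note: 'no ripgrep found (bundled or on PATH)' };
}
```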
Four high-risk changes from the deepseek branch were unconditional and
would have broken Anthropic/GLM runtime behavior:

- claude.ts thinking: restore adaptive-vs-budget selection for Anthropic/GLM;
  DeepSeek keeps the simplified budget_tokens path (effort-level-driven).
- claude.ts [ERROR] prefix: only inject into tool_result content for DeepSeek
  (it ignores is_error); Anthropic/GLM handle is_error correctly.
- withRetry.ts 429: DeepSeek always retries (no subscriber tiers); Anthropic
  retains the ClaudeAI subscriber gate.
- validateModel.ts: DeepSeek uses a known-models allowlist (API silently
  remaps unknown names); all other providers restore main's API-based probe.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
yjhjstz and others added 3 commits June 24, 2026 20:30
The deepseek branch relaxed the Anthropic auto-mode allowlist regex from
'claude-(opus|sonnet)-4-6' to 'claude-(opus|sonnet)-4', which incorrectly
enables auto mode for older Claude models (4-1, 4-5) that don't support it.
Restore the '-6' anchor while keeping the /^deepseek-/ addition.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
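Here is a sketch of the combined allowlist described in these commits: Claude `-4-6`, `^deepseek-`, and glm-5 or newer. The GLM version regex, the lowercasing, and the function name are assumptions.

```typescript
const CLAUDE_AUTO_MODE = /claude-(opus|sonnet)-4-6/; // '-6' anchor restored
const DEEPSEEK = /^deepseek-/;
const GLM = /^glm-(\d+)(?:\.\d+)?/; // assumed way to read the major version

function modelSupportsAutoModeSketch(model: string): boolean {
  const m = model.toLowerCase();
  if (CLAUDE_AUTO_MODE.test(m) || DEEPSEEK.test(m)) return true;
  const glm = GLM.exec(m);
  // glm-5, glm-5.2, glm-6, ... qualify; glm-4.x does not.
  return glm !== null && Number(glm[1]) >= 5;
}
```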
Reverting the APIProvider type extension: adding 'glm'/'deepseek' as
distinct provider values broke every `=== 'firstParty'` gate in the
codebase (modelSupportsAutoMode, shouldIncludeFirstPartyOnlyBetas, etc.),
which would silently disable features for GLM/DeepSeek users.

GLM and DeepSeek both ride Anthropic-compatible firstParty endpoints
(ANTHROPIC_BASE_URL + new Anthropic(...)), so they ARE firstParty from
the SDK perspective. Keep the APIProvider type unchanged and detect the
specific backend via env flags:

- isGLMProvider(): CLAUDE_USE_GLM=1
- isDeepSeekProvider(): CLAUDE_USE_DEEPSEEK=1

These gate only the model-aware adaptations (thinking, [ERROR] prefix,
429 retry, validateModel), not the provider routing itself.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
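Here is a sketch of env-flag detection, with the environment passed in for testability; the real helpers presumably read `process.env`. Which adaptations each flag toggles follows the commit messages in this PR. The `Adaptations` shape and the truthy parsing are hypothetical.

```typescript
type Env = Record<string, string | undefined>;

// Assumed parsing: "1" or "true" (case-insensitive) turns a flag on.
const isOn = (v: string | undefined): boolean => v === '1' || v?.toLowerCase() === 'true';

// APIProvider stays 'firstParty'; these flags only pick model-aware tweaks.
function isGLMProvider(env: Env): boolean {
  return isOn(env.CLAUDE_USE_GLM);
}
function isDeepSeekProvider(env: Env): boolean {
  return isOn(env.CLAUDE_USE_DEEPSEEK);
}

interface Adaptations {
  simplifiedThinking: boolean;
  errorPrefix: boolean;
  alwaysRetry429: boolean;
  allowlistValidation: boolean;
  pinSessionId: boolean;
}

function adaptationsFor(env: Env): Adaptations {
  const ds = isDeepSeekProvider(env);
  return {
    simplifiedThinking: ds,
    errorPrefix: ds,
    alwaysRetry429: ds,
    allowlistValidation: ds,
    pinSessionId: ds || isGLMProvider(env), // cache-stable metadata for both
  };
}
```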
P0 fixes from code review:

1. claude.ts getAPIMetadata: the pinned session_id 'claude-code-ds' was
   unconditional, clobbering Anthropic session telemetry with a fake stable
   value. Now only applies when isDeepSeekProvider() || isGLMProvider();
   real Anthropic keeps getSessionId().

2. validateModel.ts: remove unused `warning?` field from the return type —
   it was added by the deepseek branch but never populated by any code path,
   and no caller destructures it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
yjhjstz force-pushed the feat/multi-provider-merge branch from fcd21b8 to 0227be7 on June 24, 2026 12:35