fix: improve CI mode 401 error UX #432

Open
MattBro wants to merge 7 commits into main from fix/ci-401-error-ux

Contributor

@MattBro MattBro commented May 4, 2026

Problem

When the wizard hits a 401 from the Claude agent in CI mode, it currently emits "Claude Code auth is conflicting with the wizard. Please try again after logging out: claude auth logout" regardless of the actual cause. That advice only applies when there's a real local conflict (ANTHROPIC_* env var or apiKeyHelper in ~/.claude/settings.json). For users running CI mode with a --api-key that's the wrong token type or missing scopes, the message is misleading and sends them down the wrong path.

A second issue: --api-key accepts any string and silently runs to a 401 if the user passes a non-PAT token (e.g. a pha_ OAuth access token from a partner integration, or a phc_ project key). There's no early signal that the wrong token type was passed.
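
To make the prefixes above concrete, here is a minimal sketch of how a key could be classified before any request is made. The prefix meanings (phx_ personal API key, pha_ OAuth access token, phc_ project key) come from this PR; the type and function names are hypothetical and are not taken from the wizard's code.

```typescript
// Hypothetical helper: classify a PostHog key by its prefix.
// Only the prefix meanings come from the PR; all names here are illustrative.
type KeyKind = "personal" | "oauth-access-token" | "project-key" | "unknown";

function classifyApiKey(key: string): KeyKind {
  if (key.startsWith("phx_")) return "personal";
  if (key.startsWith("pha_")) return "oauth-access-token";
  if (key.startsWith("phc_")) return "project-key";
  return "unknown";
}

// Returns a warning for any non-phx_ key, or null when the prefix looks right.
// The caller is expected to print the warning and keep going (warn, don't fail).
function apiKeyPrefixWarning(key: string): string | null {
  switch (classifyApiKey(key)) {
    case "personal":
      return null;
    case "oauth-access-token":
      return "--api-key looks like an OAuth access token (pha_); a personal API key (phx_) is expected.";
    case "project-key":
      return "--api-key looks like a project key (phc_); a personal API key (phx_) is expected.";
    default:
      return "--api-key has an unexpected prefix; a personal API key (phx_) is expected.";
  }
}
```

Returning a string rather than throwing keeps the check advisory, which matches the "run continues" behaviour described in this PR.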

Changes

  1. Smarter 401 message. At 401 time, re-run checkAllSettingsConflicts(options.installDir). Only emit the "Claude Code auth conflicting / claude auth logout" hint when a real conflict is detected. Otherwise emit a CI-specific message listing realistic causes: wrong token prefix (pha_/phc_ instead of phx_), missing llm_gateway:read scope, expired key, region mismatch. Always include the verbose log path so users can dig deeper. Both the CI logger and the TUI overlay use the same logic.
  2. --api-key prefix warning. Non-phx_ values now produce a visible warning identifying pha_ as an OAuth access token and phc_ as a project key. The run continues — defensive, not gatekeeping.
  3. Verbose log additions. initializeAgent now logs the API key prefix (phx_***), the gateway URL, and the result of checkAllSettingsConflicts. The 401 handler also logs the conflict-check verdict. Per-message SDK JSON dumps already capture upstream response bodies, so no separate body-logging entry was added.
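
The branching in item 1 can be sketched roughly as follows. This is an illustration only: `build401Message` and its parameters are hypothetical, `hasSettingsConflict` stands in for whatever verdict `checkAllSettingsConflicts` returns, and the message wording is paraphrased from this description rather than copied from the diff.

```typescript
// Hypothetical sketch of the 401 message selection described in item 1.
// `hasSettingsConflict` is assumed to be derived from checkAllSettingsConflicts;
// the function name, signature, and wording are assumptions for illustration.
function build401Message(hasSettingsConflict: boolean, verboseLogPath: string): string {
  const lines: string[] = [];
  if (hasSettingsConflict) {
    // Only suggest logging out when a real local conflict was detected.
    lines.push(
      "Claude Code auth is conflicting with the wizard.",
      "Please try again after logging out: claude auth logout",
    );
  } else {
    // Otherwise list the realistic CI-mode causes named in the PR.
    lines.push(
      "The LLM Gateway rejected the API key (401). Likely causes:",
      "  - wrong token prefix (pha_/phc_ instead of phx_)",
      "  - missing llm_gateway:read scope",
      "  - expired key",
      "  - region mismatch",
    );
  }
  // The verbose log path is included in both branches.
  lines.push(`Verbose log: ${verboseLogPath}`);
  return lines.join("\n");
}
```

Keeping the selection in one pure function is one way to let both the CI logger and the TUI overlay share the same logic.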

Test plan

I'm an agent (Claude Opus 4.7). Automated checks:

  • pnpm test — 589/589 pass
  • pnpm run lint — 0 errors (only pre-existing warnings)
  • pnpm run build and typecheck — clean

No manual end-to-end testing.

Follow-up (not in this PR)

src/utils/setup-utils.ts:495 requests introspection in the interactive OAuth scope set. Per PostHog/posthog#56835, introspection is not a grantable OAuth scope — it's the RFC 7662 token-introspection endpoint, exposed as introspection_endpoint per RFC 8414. Either the AS silently drops it or the wizard's interactive OAuth quietly errors with invalid_scope. Either way, it's stale and should be removed. Leaving as a separate cleanup since this PR is scoped to CI 401 UX.

LLM context

Authored by Claude Opus 4.7 via Claude Code. Driven by a real partner-integration 401 where a pha_ OAuth access token was rejected by the LLM Gateway and the wizard's "Claude Code auth conflicting" message sent the user down the wrong debugging path. Companion PR in posthog/posthog adds the missing scopes to ALLOWED_PROVISIONING_SCOPES. This wizard PR makes the wizard more honest about what's actually wrong when a 401 hits.


github-actions Bot commented May 4, 2026

🧙 Wizard CI

Run the Wizard CI and test your changes against wizard-workbench example apps by replying with a GitHub comment using one of the following commands:

Test all apps:

  • /wizard-ci all

Test all apps in a directory:

  • /wizard-ci basic-integration
  • /wizard-ci misc
  • /wizard-ci revenue

Test an individual app:

  • /wizard-ci basic-integration/android
  • /wizard-ci basic-integration/angular
  • /wizard-ci basic-integration/astro
  • /wizard-ci basic-integration/django
  • /wizard-ci basic-integration/fastapi
  • /wizard-ci basic-integration/flask
  • /wizard-ci basic-integration/javascript-node
  • /wizard-ci basic-integration/javascript-web
  • /wizard-ci basic-integration/laravel
  • /wizard-ci basic-integration/next-js
  • /wizard-ci basic-integration/nuxt
  • /wizard-ci basic-integration/python
  • /wizard-ci basic-integration/rails
  • /wizard-ci basic-integration/react-native
  • /wizard-ci basic-integration/react-router
  • /wizard-ci basic-integration/sveltekit
  • /wizard-ci basic-integration/swift
  • /wizard-ci basic-integration/tanstack-router
  • /wizard-ci basic-integration/tanstack-start
  • /wizard-ci basic-integration/vue
  • /wizard-ci misc/quack-quack
  • /wizard-ci revenue/stripe

Results will be posted here when complete.

@MattBro MattBro marked this pull request as ready for review May 4, 2026 16:32
@MattBro MattBro requested a review from a team May 4, 2026 16:32
@gewenyu99
Collaborator

Unrelated aside, I feel like you might know. What is the exact scope we need rn for the Wizard to run? I don't think we've got this written down anywhere. I'll definitely want to write this down.

The minimum is llm_gateway:read right? And ideally we also give them write for dashboards, insights, and queries.

Contributor Author

MattBro commented May 4, 2026

Strict minimum to boot:

  • user:read for /api/users/@me/
  • project:read for /api/projects/{id}/
  • llm_gateway:read for Claude agent calls

Without those three the wizard crashes before the agent runs.

Minimum to be useful:

  • query:read — so the agent can verify events flow via HogQL after instrumentation

Current set in setup-utils.ts:495 (after #433 drops stale introspection):

  • dashboard:write
  • insight:write
  • health_issue:read

dashboard:write and insight:write are for the Claude agent's MCP tools, not the wizard's own API calls. health_issue:read is for the PostHog doctor flow at src/lib/workflows/posthog-doctor/fetch.ts:12 (/api/environments/{id}/health_issues/).

Authoritative source for "what scopes does the wizard need" should be the union of:

  1. The wizard repo's direct API calls (user, project, health_issue endpoints)
  2. required_scopes on the MCP tools the wizard's agent flow actually invokes (derive from services/mcp/src/tools/* in posthog/posthog)

#434 makes this declarative: wizard fetches scopes_supported from AS metadata at startup and intersects with a known-needed set. The intersection is the answer to your question, in code.
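
As a rough illustration of the intersection idea (not the code in #434), the sketch below filters a known-needed list against the `scopes_supported` array from the authorization server's RFC 8414 metadata. The needed list is assembled from the scopes named in this comment; `NEEDED_SCOPES` and `scopesToRequest` are hypothetical names, and fetching the metadata is left out.

```typescript
// Scopes named in this thread as needed by the wizard.
// Assumed list for illustration; the real source of truth lives in the repo.
const NEEDED_SCOPES = [
  "user:read",
  "project:read",
  "llm_gateway:read",
  "query:read",
  "dashboard:write",
  "insight:write",
  "health_issue:read",
] as const;

// Keep only the needed scopes that the authorization server advertises in
// `scopes_supported`, preserving the order of the needed list.
function scopesToRequest(
  scopesSupported: readonly string[],
  needed: readonly string[] = NEEDED_SCOPES,
): string[] {
  const supported = new Set(scopesSupported);
  return needed.filter((scope) => supported.has(scope));
}
```

With this shape, a value such as introspection never reaches the request because it is not in the needed list, and a needed scope the server does not advertise is dropped instead of causing an invalid_scope error.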

Collaborator

@gewenyu99 gewenyu99 left a comment

Thanks for doing this Matt!

logToFile(
'API key prefix:',
config.posthogApiKey
? `${config.posthogApiKey.slice(0, 4)}***`
Collaborator

This should be fine but gonna check with @sarahxsanders. This won't trigger YARA right?

Context for you Matt, we have some pattern matching rules to prevent the agent from doing dumb stuff, like dumping API keys in logs. So checking to see if this will get yoked by Yara

Collaborator

nope this shouldn't trigger it! the rules trigger on the shape of a full key
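
For reference, the masking under discussion keeps only the first four characters of the key. A minimal standalone sketch follows; `maskApiKey` is a hypothetical name, and the "(none)" fallback is an assumption because the diff excerpt above is cut off before its else branch.

```typescript
// Hypothetical helper mirroring the slice(0, 4) expression in the diff excerpt.
// The "(none)" fallback is an assumption; the original else branch is not shown.
function maskApiKey(key: string | undefined): string {
  return key ? `${key.slice(0, 4)}***` : "(none)";
}
```

For a key beginning with phx_, this yields phx_***, which carries the prefix but none of the secret portion.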

Comment thread bin.ts Outdated
}
// Warn (don't fail) on unexpected key prefix — `phx_` is the personal
// API key the LLM Gateway expects. We don't hard-fail because future
// PostHog releases may introduce new prefixes.
Collaborator

"PostHog releases may introduce new prefixes."
Curious about this one, is this a known plan or speculative?

Contributor Author

Speculative - dropped that line in ee52ea1.

MattBro and others added 5 commits May 6, 2026 13:07
The wizard previously reported every 401 as "Claude Code auth is
conflicting with the wizard," even when no settings.json or
managed-settings conflict existed. CI users hitting a 401 from a bad PAT
prefix (e.g. passing pha_ OAuth tokens or phc_ project keys), a missing
llm_gateway:read scope, an expired key, or a region mismatch were sent
down the wrong debug path.

Re-runs checkAllSettingsConflicts at error time and only suggests
'claude auth logout' when a real conflict is detected. Otherwise lists
the realistic CI-mode 401 causes. Always surfaces the verbose log path.

Also warns (does not fail) on unexpected --api-key prefix in CI mode and
adds a few diagnostic lines to the verbose log: API key prefix, gateway
URL, and the conflict check result.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The ci flag was plumbed through showAuthError → store → session but
never read anywhere. Removing it; both CI and TUI overlays already
branch on hasSettingsConflict alone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Vincent (Wen Yu) Ge <29069505+gewenyu99@users.noreply.github.com>
A leftover markdown triple-backtick after the line 225 string literal
broke tsdown's parser ("Cannot assign to this expression").

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@MattBro MattBro force-pushed the fix/ci-401-error-ux branch from 63aee6f to ba6511f Compare May 6, 2026 17:07
3 participants