Skip to content

bug(slack): unauthenticated file fetch returns HTML, gets forwarded to model as image, poisons session #776

@howie

Description

@howie

Description

When the Slack handler fetches a file URL from files.slack.com, it does not validate the response content-type or magic bytes before forwarding to the model API. We observed an image/png-labeled payload sent to Anthropic whose decoded bytes were Slack workspace login HTML (~55KB), consistent with the bot token lacking the files:read OAuth scope so Slack served the login page in place of the file binary. openab base64-encoded those HTML bytes, labeled them with the Slack-reported MIME, and forwarded to Anthropic, which rejected with 400 invalid_request_error "Could not process image".

Two failure modes:

  1. Confusing UX: the user sees an Anthropic 400 that looks like an Anthropic-side image format problem, not a Slack auth/scope problem.
  2. Session poisoning: the bad payload is persisted into the claude-agent-acp session JSONL on PVC. Subsequent messages in the same Slack thread resume the session and replay the bad image block as part of conversation history, so Anthropic re-rejects on every turn until the JSONL is manually deleted.

Reproduction

  1. Configure a Slack app for openab without files:read scope. Verify with:

    kubectl exec deployment/openab-claude -- sh -c \
      'curl -sS -D - -o /dev/null -X POST https://slack.com/api/auth.test \
         -H "Authorization: Bearer $SLACK_BOT_TOKEN" | grep -i "^x-oauth-scopes:"'

    The response header should not contain a comma-separated files:read token.

  2. As a Slack user, upload any image to the bot in a thread.

  3. Bot reply:

    :warning: Internal Error (code: -32603)
    Internal error: API Error: 400 {"type":"error","error":{"type":"invalid_request_error","message":"Could not process image"},"request_id":"req_011..."}
    
  4. Add files:read scope to the Slack app, reinstall, rotate the token, redeploy, and verify the new scope is in x-oauth-scopes from step 1. Now upload another image to the same Slack thread: same 400.

  5. Recover by deleting the session JSONL inside the pod:

    kubectl exec deployment/openab-claude -- bash -c \
      'grep -lE "Could not process image" /home/node/.claude/projects/-home-node/*.jsonl | xargs rm -v'

Expected Behavior

When openab fetches a file URL from Slack and the response is not a valid image, it should:

  1. Detect the failure synchronously, by inspecting the HTTP Content-Type header and/or the response body's magic bytes, before base64-encoding for the model API.
  2. Fail fast with a user-actionable error, e.g. "I couldn't access that image — make sure the bot has the files:read OAuth scope." Don't forward unverified bytes to the model.
  3. Not persist the unverified payload to claude session history, so that recovering from misconfiguration is "fix the config" without an out-of-band PVC cleanup.

End-to-end success criterion: after the operator adds files:read and rotates the token, uploading an image in the same Slack thread succeeds on the next message, with no manual session deletion.

Actual Behavior

  1. openab fetches files.slack.com/... with the bot token. We did not capture the raw HTTP response in our logs; based on the bytes that ended up in the conversation block we infer Slack returned the workspace login HTML page in place of the file binary.
  2. openab does not visibly validate Content-Type or magic bytes. It reads the body, base64-encodes it, and labels it with the MIME from the Slack file metadata (image/png in our case).
  3. openab forwards to Anthropic; Anthropic decodes, sees magic 3c 21 44 4f 43 54 59 50 (<!DOCTYP) instead of PNG 89 50 4e 47 0d 0a 1a 0a, and returns 400.
  4. claude-agent-acp persists the user message (containing the bad image block) into ~/.claude/projects/-home-node/<session>.jsonl regardless of API success.
  5. Subsequent messages in the same Slack thread resume that session and resend the bad image block as part of history. Anthropic 400s again.

Evidence

A. The forwarded payload was HTML, not PNG

Decoded base64 of an image/png payload sent to Anthropic from a poisoned session in our cluster (Anthropic request_id req_011CarWAQQ11f3C1jzLbPkgP):

field value
claimed media_type image/png
base64 length 74,124 chars
decoded bytes 55,592
magic bytes (first 8) 3c 21 44 4f 43 54 59 50 (<!DOCTYP)
last 120 bytes ...slack-www-hhvm-main-iad-ya7nx4dnuum1/ 2026-05-08 21:55:25/...</body></html>

PNG magic should be 89 50 4e 47 0d 0a 1a 0a. The bytes are HTML; the last-byte tail matches Slack's web-page footer. We did not capture the raw fetch response, so the "Slack returned HTML" link is inferred — sanitized fetch logs from anyone who can reproduce in their cluster would tighten this.

B. The payload was replayed from session history on subsequent turns

Same session ID 774ec817-7ace-4e47-81a9-c875962ef720 (claude-agent-acp session JSONL on PVC):

  • Line 6 (06:14 UTC, original incident): user message with an image block whose base64 decodes to the HTML above. Anthropic 400 (request_id req_011CarWAQQ11f3C1jzLbPkgP).
  • Line 54 (~10 hours later, 08:04 UTC): user message in the same Slack thread containing only text ("你可看到上面的圖片了嗎?") and sender_context — no image block. Anthropic 400 again (request_id req_011Carea29tJDW...).
  • Line 58 (08:06 UTC): same pattern, text-only user message ("can you read the image ?"). Anthropic 400 (request_id req_011CarefyzaStqh7dZPAd3j5).
  • Line 60 immediately after: synthetic assistant message recording the 400. isApiErrorMessage: true, apiErrorStatus: 400, model <synthetic>.

i.e. text-only user turns failed with the same Could not process image error, which is consistent with the bad image block at line 6 being replayed from JSONL on every turn. After we deleted that JSONL on the PVC, a fresh thread/session in the same channel was unblocked.

Suggested Fix

Two layers of defense, both worth doing:

1. Validate before forwarding (primary)

After fetching from Slack, before base64-encoding for the model API:

  • Check the HTTP Content-Type header from the Slack response. Require the type to be in the model's supported set (for Anthropic Vision: image/png, image/jpeg, image/gif, image/webp). Generic image/* is too broad — image/svg+xml, image/heic, image/avif, image/tiff will be rejected by the model anyway.
  • Check the response body's magic bytes. Whitelist the full signatures, not just the first nibble:
    • PNG: 89 50 4e 47 0d 0a 1a 0a
    • JPEG: ff d8 ff (followed by e0/e1/e2/...)
    • GIF: 47 49 46 38 37 61 or 47 49 46 38 39 61
    • WebP: 52 49 46 46 ?? ?? ?? ?? 57 45 42 50
  • If either check fails, surface a specific error to the user (e.g. "I couldn't access that image — does the bot have the files:read OAuth scope?") and skip the model call.

2. Don't persist unverified bytes to session history (secondary)

Persist the user-turn image block to the claude session JSONL only after the model call returns 200, not before. This way a 4xx from the model leaves the session in the same shape it was before the failed turn, and the operator's normal "fix config and try again" path works without manual JSONL cleanup.

This second defense matters even if (1) is in place: a real-but-corrupted upload can pass magic-byte check and still be rejected by the model, and once the bad block is in history the thread continues failing.

Severity / Impact

  • Severity: high for any operator who hits the misconfiguration. The user-facing error blames Anthropic; root cause is a Slack scope; once a thread is poisoned, every reply in it is broken; cluster operator intervention (pod exec, file deletion) is required to recover.
  • Surface: any openab Slack ingest path where the bot token may not have files:read (which is not enforced or warned about by the chart's required-scopes section in the helm install NOTES — chart 0.8.2 NOTES does list files:read as required, but operators commonly miss it because pod startup succeeds and Slack returns 200 instead of 401 on the file fetch).
  • Discord ingest: not tested — Discord has different auth/file URL semantics, but the same "fetched bytes forwarded without validation" code path may apply if shared.

Regression / version scope

  • Reproduced on chart 0.8.2 with image ghcr.io/openabdev/openab-claude:latest (digest sha256:3d2017efbf1ab9702a9e3e7eaccb50e1147488cad04517ba50f29f63fd07d0ab at the time of repro).
  • Earlier versions not tested. Whether this is a recent regression or always present is unknown to us; if maintainers know when image ingest was introduced, that bounds the regression range.

Workaround state

Operators can:

  1. Add files:read to all bots' Slack apps and reinstall / rotate tokens (prevents new failures).
  2. Delete poisoned session JSONLs on the PVC (recovers existing broken threads). For us, grep -lE "Could not process image" /home/node/.claude/projects/-home-node/*.jsonl | xargs rm -v was sufficient — we did not need to touch thread_map.json or .claude/sessions/.

We have documented the operator-side runbook at https://github.com/heyu-ai/openab-workspace/blob/main/docs/runbook.md (search "Slack 整合問題"). It covers diagnosis, remediation, and the PVC-cleanup recovery, but it is operator workaround, not an upstream fix.

Environment

  • openab chart 0.8.2
  • Image digests at time of repro:
    • claude: ghcr.io/openabdev/openab-claude@sha256:3d2017efbf1ab9702a9e3e7eaccb50e1147488cad04517ba50f29f63fd07d0ab
    • codex: ghcr.io/openabdev/openab-codex@sha256:4cc5f6fcf6983f57cdcd29560f96ab0d4b144c4fd4dc96d65a04f229f865a2f0
    • gemini: ghcr.io/openabdev/openab-gemini@sha256:f157b30aecfba8fac873a04087e836ba45b5ac164ca09f58ac9959c716cb8bd8
  • Three agents enabled (claude / codex / gemini); all three Slack apps had identical OAuth scope sets missing files:read before remediation
  • Cluster: OrbStack-built-in K8s, single-tenant dev environment

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingp1High — address this sprint

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions