bug(slack): unauthenticated file fetch returns HTML, gets forwarded to model as image, poisons session #776

## Description

When the Slack handler fetches a file URL from `files.slack.com`, it does not validate the response `Content-Type` or magic bytes before forwarding the body to the model API. We observed an `image/png`-labeled payload sent to Anthropic whose decoded bytes were Slack workspace login HTML (~55KB). This is consistent with the bot token lacking the `files:read` OAuth scope, so that Slack served the login page in place of the file binary. openab base64-encoded those HTML bytes, labeled them with the Slack-reported MIME type, and forwarded them to Anthropic, which rejected the request with `400 invalid_request_error "Could not process image"`.

Two failure modes:

1. **Confusing UX**: the user sees an Anthropic 400 that looks like an Anthropic-side image format problem, not a Slack auth/scope problem.
2. **Session poisoning**: the bad payload is persisted into the claude-agent-acp session JSONL on PVC. Subsequent messages in the same Slack thread resume the session and replay the bad image block as part of conversation history, so Anthropic re-rejects on every turn until the JSONL is manually deleted.

## Reproduction

1. Configure a Slack app for openab without `files:read` scope. Verify with:

   ```bash
   kubectl exec deployment/openab-claude -- sh -c \
     'curl -sS -D - -o /dev/null -X POST https://slack.com/api/auth.test \
        -H "Authorization: Bearer $SLACK_BOT_TOKEN" | grep -i "^x-oauth-scopes:"'
   ```

   For this reproduction, the `x-oauth-scopes` response header should not list `files:read` among its comma-separated scopes.

2. As a Slack user, upload any image to the bot in a thread.

3. Bot reply:

   ```
   :warning: Internal Error (code: -32603)
   Internal error: API Error: 400 {"type":"error","error":{"type":"invalid_request_error","message":"Could not process image"},"request_id":"req_011..."}
   ```

4. Add `files:read` scope to the Slack app, reinstall, rotate the token, redeploy, and verify the new scope is in `x-oauth-scopes` from step 1. Now upload another image to the **same Slack thread**: same 400.

5. Recover by deleting the session JSONL inside the pod:

   ```bash
   kubectl exec deployment/openab-claude -- bash -c \
     'grep -lE "Could not process image" /home/node/.claude/projects/-home-node/*.jsonl | xargs rm -v'
   ```
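
The `grep` in step 1 only prints the header, and a plain substring match on `files:read` would also accept unrelated scopes such as `remote_files:read`. An exact-token check can be sketched as follows. This is a minimal sketch: the function name `has_scope` is hypothetical, and it assumes the header value is a comma-separated list as in the step 1 output.

```bash
# has_scope SCOPE
# Reads raw HTTP response headers on stdin and succeeds only if the
# x-oauth-scopes header lists SCOPE as an exact, comma-separated token.
has_scope() {
  tr -d '\r' | awk -v want="$1" '
    tolower($0) ~ /^x-oauth-scopes:/ {
      sub(/^[^:]*:[ \t]*/, "")      # drop the header name
      sub(/[ \t]+$/, "")            # drop trailing whitespace
      n = split($0, scopes, /[ \t]*,[ \t]*/)
      for (i = 1; i <= n; i++) if (scopes[i] == want) found = 1
    }
    END { exit !found }'
}

# Usage sketch (same request as step 1):
#   curl -sS -D - -o /dev/null -X POST https://slack.com/api/auth.test \
#     -H "Authorization: Bearer $SLACK_BOT_TOKEN" | has_scope files:read \
#     && echo "files:read present" || echo "files:read missing"
```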

## Expected Behavior

When openab fetches a file URL from Slack and the response is not a valid image, it should:

1. **Detect the failure synchronously**, by inspecting the HTTP `Content-Type` header and/or the response body's magic bytes, before base64-encoding for the model API.
2. **Fail fast with a user-actionable error**, e.g. "I couldn't access that image — make sure the bot has the `files:read` OAuth scope." Don't forward unverified bytes to the model.
3. **Not persist the unverified payload to claude session history**, so that recovering from misconfiguration is "fix the config" without an out-of-band PVC cleanup.

End-to-end success criterion: after the operator adds `files:read` and rotates the token, uploading an image in the same Slack thread succeeds on the next message, with no manual session deletion.

## Actual Behavior

1. openab fetches `files.slack.com/...` with the bot token. We did not capture the raw HTTP response in our logs; based on the bytes that ended up in the conversation block we infer Slack returned the workspace login HTML page in place of the file binary.
2. openab does not visibly validate Content-Type or magic bytes. It reads the body, base64-encodes it, and labels it with the MIME from the Slack file metadata (`image/png` in our case).
3. openab forwards to Anthropic; Anthropic decodes, sees magic `3c 21 44 4f 43 54 59 50` (`<!DOCTYP`) instead of PNG `89 50 4e 47 0d 0a 1a 0a`, and returns 400.
4. claude-agent-acp persists the user message (containing the bad image block) into `~/.claude/projects/-home-node/<session>.jsonl` regardless of API success.
5. Subsequent messages in the same Slack thread resume that session and resend the bad image block as part of history. Anthropic 400s again.

## Evidence

### A. The forwarded payload was HTML, not PNG

Decoded base64 of an `image/png` payload sent to Anthropic from a poisoned session in our cluster (Anthropic request_id `req_011CarWAQQ11f3C1jzLbPkgP`):

| field | value |
|------|-------|
| claimed media_type | `image/png` |
| base64 length | 74,124 chars |
| decoded bytes | 55,592 |
| magic bytes (first 8) | `3c 21 44 4f 43 54 59 50` (`<!DOCTYP`) |
| last 120 bytes | `...slack-www-hhvm-main-iad-ya7nx4dnuum1/ 2026-05-08 21:55:25/...</body></html>` |

PNG magic should be `89 50 4e 47 0d 0a 1a 0a`. The bytes are HTML, and the trailing bytes match Slack's web-page footer. We did not capture the raw fetch response, so the "Slack returned HTML" link is inferred; sanitized fetch logs from anyone who can reproduce this in their cluster would tighten it.
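
For anyone checking their own payloads, the magic-byte row in the table can be reproduced from a base64 string with standard tools. This is a minimal sketch, assuming GNU coreutils `base64` and POSIX `od`; the function name `first_bytes_hex` is hypothetical. The sizes in the table are also self-consistent: 74,124 base64 characters decode to at most 74,124 / 4 × 3 = 55,593 bytes, and a single trailing `=` padding character would give the reported 55,592.

```bash
# first_bytes_hex
# Reads a base64 string on stdin and prints the first 8 decoded bytes
# as space-separated lowercase hex.
first_bytes_hex() {
  # od pads its output with spaces; word-splitting via `set --` normalizes it.
  set -- $(base64 -d 2>/dev/null | od -An -tx1 -N8)
  echo "$*"
}

# Usage sketch: printf '%s' "$PAYLOAD_B64" | first_bytes_hex
# A real PNG prints 89 50 4e 47 0d 0a 1a 0a; an HTML page starting with
# "<!DOCTYP" prints 3c 21 44 4f 43 54 59 50.
```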

### B. The payload was replayed from session history on subsequent turns

Same session ID `774ec817-7ace-4e47-81a9-c875962ef720` (claude-agent-acp session JSONL on PVC):

- Line 6 (06:14 UTC, original incident): `user` message with an `image` block whose base64 decodes to the HTML above. Anthropic 400 (request_id `req_011CarWAQQ11f3C1jzLbPkgP`).
- Line 54 (~10 hours later, 08:04 UTC): user message in the same Slack thread containing **only text** ("你可看到上面的圖片了嗎?", in English: "Can you see the image above?") and `sender_context`, with no image block. Anthropic 400 again (request_id `req_011Carea29tJDW...`).
- Line 58 (08:06 UTC): same pattern, text-only user message ("can you read the image ?"). Anthropic 400 (request_id `req_011CarefyzaStqh7dZPAd3j5`).
- Line 60 immediately after: synthetic assistant message recording the 400. `isApiErrorMessage: true`, `apiErrorStatus: 400`, model `<synthetic>`.

In other words, text-only user turns failed with the same `Could not process image` error, which is consistent with the bad image block at line 6 being replayed from the JSONL on every turn. After we deleted that JSONL on the PVC, a fresh thread/session in the same channel was unblocked.
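
Grepping for the error string finds sessions that have already failed. A complementary check, sketched below, looks for the poisoned block itself: base64 encodes the 9 ASCII bytes `<!DOCTYPE` as the 12 characters `PCFET0NUWVBF`, so a session file containing a string that starts with that prefix is carrying an HTML document where binary data was expected. The function name `find_html_payloads` is hypothetical, and the sketch assumes the payload is stored in the JSONL as a JSON string value (directly after a `"`), which is an assumption about the claude-agent-acp layout. A page starting with lower-case `<!doctype` would need a different prefix.

```bash
# find_html_payloads DIR
# Lists *.jsonl files in DIR containing a JSON string that starts with
# the base64 encoding of "<!DOCTYPE" (assumed marker of an HTML payload).
# Returns non-zero if no file matches.
find_html_payloads() {
  grep -l '"PCFET0NUWVBF' "$1"/*.jsonl 2>/dev/null
}

# Usage sketch, using the session directory from the reproduction steps:
#   find_html_payloads /home/node/.claude/projects/-home-node
```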

## Suggested Fix

Two layers of defense, both worth doing:

### 1. Validate before forwarding (primary)

After fetching from Slack, before base64-encoding for the model API:

- Check the HTTP `Content-Type` header from the Slack response. Require the type to be in the model's supported set (for Anthropic Vision: `image/png`, `image/jpeg`, `image/gif`, `image/webp`). Generic `image/*` is too broad — `image/svg+xml`, `image/heic`, `image/avif`, `image/tiff` will be rejected by the model anyway.
- Check the response body's magic bytes. Whitelist the full signatures, not just the first byte or two:
  - PNG: `89 50 4e 47 0d 0a 1a 0a`
  - JPEG: `ff d8 ff` (followed by `e0`/`e1`/`e2`/...)
  - GIF: `47 49 46 38 37 61` or `47 49 46 38 39 61`
  - WebP: `52 49 46 46 ?? ?? ?? ?? 57 45 42 50`
- If either check fails, surface a specific error to the user (e.g. "I couldn't access that image — does the bot have the `files:read` OAuth scope?") and skip the model call.
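
The magic-byte check in the list above can be sketched in a few lines. This is illustrative only: openab's actual fetch path is not shown in this report, the function name `sniff_image_type` is hypothetical, and the sketch works on a file on disk using POSIX `od` rather than on an in-memory response body.

```bash
# sniff_image_type FILE
# Prints the MIME type implied by FILE's leading bytes and returns 0,
# or prints nothing and returns 1 if no supported signature matches.
sniff_image_type() {
  # First 12 bytes as contiguous lowercase hex (12 bytes covers the WebP tag).
  hex=$(od -An -tx1 -N12 < "$1" | tr -d ' \n')
  case $hex in
    89504e470d0a1a0a*)            echo image/png ;;
    ffd8ff*)                      echo image/jpeg ;;
    474946383761*|474946383961*)  echo image/gif ;;
    52494646????????57454250*)    echo image/webp ;;   # RIFF <size> WEBP
    *)                            return 1 ;;
  esac
}
```

A caller would compare the sniffed type against the HTTP `Content-Type` and the Slack-reported MIME, and skip the model call when `sniff_image_type` fails. For the payload in Evidence A it fails, because the body starts with `3c 21 44 4f`.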

### 2. Don't persist unverified bytes to session history (secondary)

Persist the user-turn image block to the claude session JSONL **only after** the model call returns 200, not before. This way a 4xx from the model leaves the session in the same shape it was before the failed turn, and the operator's normal "fix config and try again" path works without manual JSONL cleanup.

This second defense matters even if (1) is in place: a real-but-corrupted upload can pass magic-byte check and still be rejected by the model, and once the bad block is in history the thread continues failing.
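
The ordering described in this section is sketched below in shell, purely to make the invariant concrete. claude-agent-acp's real persistence code is not shown in this report; the function name `commit_turn_on_success`, the one-JSON-object-per-line history file, and the model call being a command that reads the turn on stdin and exits non-zero on a 4xx are all assumptions for illustration.

```bash
# commit_turn_on_success HISTORY_FILE TURN_JSON CMD [ARGS...]
# Pipes TURN_JSON to CMD (a stand-in for the model call). The turn is
# appended to HISTORY_FILE only if CMD exits 0, so a rejected turn
# leaves the history file unchanged.
commit_turn_on_success() {
  history_file=$1
  turn=$2
  shift 2
  if printf '%s\n' "$turn" | "$@"; then
    printf '%s\n' "$turn" >> "$history_file"
  else
    return 1
  fi
}
```

With this ordering, a turn that the model rejects never reaches the history file, so the next message in the thread replays only turns that previously succeeded.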

## Severity / Impact

- **Severity**: high for any operator who hits the misconfiguration. The user-facing error blames Anthropic; root cause is a Slack scope; once a thread is poisoned, every reply in it is broken; cluster operator intervention (pod exec, file deletion) is required to recover.
- **Surface**: any openab Slack ingest path where the bot token may lack `files:read`. The chart 0.8.2 `helm install` NOTES do list `files:read` as required, but nothing enforces the scope or warns when it is missing: pod startup succeeds, and Slack appears to return 200 with a login page rather than 401 on the file fetch, so operators commonly miss it.
- **Discord ingest**: not tested — Discord has different auth/file URL semantics, but the same "fetched bytes forwarded without validation" code path may apply if shared.

## Regression / version scope

- Reproduced on chart 0.8.2 with image `ghcr.io/openabdev/openab-claude:latest` (digest `sha256:3d2017efbf1ab9702a9e3e7eaccb50e1147488cad04517ba50f29f63fd07d0ab` at the time of repro).
- Earlier versions not tested. Whether this is a recent regression or always present is unknown to us; if maintainers know when image ingest was introduced, that bounds the regression range.

## Workaround state

Operators can:

1. Add `files:read` to all bots' Slack apps and reinstall / rotate tokens (prevents new failures).
2. Delete poisoned session JSONLs on the PVC (recovers existing broken threads). For us, `grep -lE "Could not process image" /home/node/.claude/projects/-home-node/*.jsonl | xargs rm -v` was sufficient — we did not need to touch `thread_map.json` or `.claude/sessions/`.
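
The `xargs rm -v` one-liner is irreversible and, with GNU `xargs`, still invokes `rm` with no arguments when `grep` matches nothing. A more cautious variant moves matching files aside so they can be inspected or restored. This is a minimal sketch; the function name `quarantine_poisoned` and the quarantine directory are hypothetical.

```bash
# quarantine_poisoned SESSION_DIR QUARANTINE_DIR
# Moves every *.jsonl in SESSION_DIR that contains the Anthropic error
# string into QUARANTINE_DIR instead of deleting it.
quarantine_poisoned() {
  mkdir -p "$2" || return 1
  for f in "$1"/*.jsonl; do
    [ -f "$f" ] || continue          # glob matched nothing
    if grep -q "Could not process image" "$f"; then
      mv "$f" "$2"/ && echo "quarantined: $f"
    fi
  done
}

# Usage sketch (the second path is a hypothetical location on the PVC):
#   quarantine_poisoned /home/node/.claude/projects/-home-node /home/node/quarantine
```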

We have documented the operator-side runbook at https://github.com/heyu-ai/openab-workspace/blob/main/docs/runbook.md (search for "Slack 整合問題", i.e. "Slack integration issues"). It covers diagnosis, remediation, and the PVC-cleanup recovery, but it is an operator workaround, not an upstream fix.

## Environment

- openab chart 0.8.2
- Image digests at time of repro:
  - claude: `ghcr.io/openabdev/openab-claude@sha256:3d2017efbf1ab9702a9e3e7eaccb50e1147488cad04517ba50f29f63fd07d0ab`
  - codex: `ghcr.io/openabdev/openab-codex@sha256:4cc5f6fcf6983f57cdcd29560f96ab0d4b144c4fd4dc96d65a04f229f865a2f0`
  - gemini: `ghcr.io/openabdev/openab-gemini@sha256:f157b30aecfba8fac873a04087e836ba45b5ac164ca09f58ac9959c716cb8bd8`
- Three agents enabled (claude / codex / gemini); all three Slack apps had identical OAuth scope sets missing `files:read` before remediation
- Cluster: OrbStack-built-in K8s, single-tenant dev environment

