Skip to content

feat(gateway): Google Chat attachment support (image / file / audio + STT)#762

Open
canyugs wants to merge 3 commits into
openabdev:mainfrom
canyugs:feat/gateway-googlechat-attachments
Open

feat(gateway): Google Chat attachment support (image / file / audio + STT)#762
canyugs wants to merge 3 commits into
openabdev:mainfrom
canyugs:feat/gateway-googlechat-attachments

Conversation

@canyugs
Copy link
Copy Markdown
Contributor

@canyugs canyugs commented May 6, 2026

Summary

Adds inbound attachment support to the Google Chat gateway adapter (image, text file, audio) using the PR #731 base64 pattern, plus completes the audio → STT path in OAB core for all Custom Gateway adapters (Feishu, Google Chat, …).

User attaches file in Google Chat
  │
  ▼
Webhook → Gateway parses message.attachment[]
  │  returns 200 immediately (background task)
  ▼
Gateway tokio::spawn:
  - SA-token GET /v1/media/{resourceName}?alt=media (30s timeout)
  - image: resize 1200px JPEG75 + base64
  - text: 5×512KB whitelist + base64
  - audio: forward as-is + base64
  ▼
GatewayEvent.content.attachments → core
  │
  ▼  src/gateway.rs decodes per attachment_type:
  - "image"     → ContentBlock::Image  (already in main)
  - "text_file" → ContentBlock::Text   (already in main)
  - "audio"     → stt::transcribe → ContentBlock::Text  (NEW)
  │
  ▼
Agent sees image / text / voice-transcript and replies

Changes

File Change Description
gateway/src/adapters/googlechat.rs MOD GoogleChatMessage.attachment[] parsing + GoogleChatMediaRef enum (Image/File/Audio). download_googlechat_image / _file / _audio via Media API + SA token. parse_attachments branches on source (skips DRIVE_FILE) + contentType. Webhook handler spawns background task for downloads so it returns 200 within Google Chat's 30 s deadline. Per-request 30 s timeout. Text file caps: 5 count / 1 MB total (matches Discord/Slack).
src/gateway.rs MOD Add "audio" branch in attachment → ContentBlock conversion. Decodes base64 → stt::transcribe (gated on stt_config.enabled) → wraps transcript as [Voice message transcript]: …. Falls back to [Voice message — transcription failed for X] so STT failures aren't silent. Threads stt_config through GatewayParams.
src/main.rs MOD Pass cfg.stt.clone() to GatewayParams.stt_config.
docs/google-chat.md MOD Document inbound attachment behavior, size limits, and Drive limitations.

Key Design Decisions

  1. PR feat(gateway): feishu image and text file attachment support #731 base64 pattern — Google Chat Media API requires SA-token auth that the agent doesn't have; gateway downloads, compresses, and base64-encodes before sending over the existing WebSocket attachment channel. No new HTTP server needed (vs. the closed PR feat(gateway): support images and audio for LINE/Telegram #726 MediaStore proxy).

  2. Webhook returns 200 immediately — Multi-attachment downloads can exceed Google Chat's 30 s webhook deadline. Webhook handler does sync parsing then tokio::spawn for downloads + event emit, so Google Chat won't retry.

  3. Per-request 30 s timeout — A hung Media API connection can no longer block the spawned task indefinitely.

  4. Text file caps match Discord/SlackTEXT_FILE_COUNT_CAP = 5 + TEXT_TOTAL_CAP = 1 MB. Text files concatenate into the agent prompt and need an aggregate cap; image and audio are independent and have only per-file size caps. (Feishu currently caps nothing — that's an existing gap in the Feishu adapter, not a model to copy.)

  5. STT in core, not gatewaystt::transcribe already exists for Discord/Slack via download_and_transcribe. Custom Gateway audio just needed a "audio" branch in src/gateway.rs to bridge the same pipeline. No new STT infrastructure; only reuse.

  6. STT failure fallback — Returns [Voice message — transcription failed for X] as a ContentBlock::Text instead of silently dropping audio, so the agent can prompt the user to re-send.

  7. Drive-linked files skippedDRIVE_FILE source needs separate Drive API integration; left for a follow-up. UPLOADED_CONTENT (the common path) works.

Testing

E2E tested on k3s VPS with Google Chat workspace:

Scenario Result
Single image (red/blue/green PNG) PASS
Image + text → agent identifies color PASS — "Blue."
Markdown text file (.md) → agent summarizes PASS — correct one-sentence summary
Empty text + image only PASS — event still forwarded
Multiple images (2 PNG) → "Blue, green" PASS — attachment_count=2
.bin (non-whitelist extension) PASS — skipped (attachment_count=0)
48 MB PNG > 10 MB limit PASS — Content-Length early reject
Audio (.m4a) + STT via Whisper PASS — full transcript routed to agent
Smoke test after webhook bg refactor PASS — single image still replies "Blue."
cargo test — gateway 65 / core 238 PASS

Not Yet Supported

  • Drive-linked attachmentsDRIVE_FILE source skipped; needs Drive API + token.
  • Outbound media (bot sends image / file to user) — needs GatewayReply schema extension.
  • Streaming download — 25 MB audio is buffered in full; high-concurrency OOM is a known follow-up.

Breaking Changes

None. GatewayParams adds an stt_config field but cfg.stt already has Default, and the "audio" arm is gated on stt_config.enabled (default false).

Discord Discussion URL

https://discord.com/channels/1491295327620169908/1501153334042824764

🤖 Generated with Claude Code

canyugs and others added 3 commits May 6, 2026 22:28
Implements image / text file / audio download from Google Chat via
Media API + service account token, following the PR openabdev#731 base64 pattern.

Changes:
- GoogleChatMessage: parse attachment[] array (Attachment / AttachmentDataRef / DriveDataRef structs)
- GoogleChatMediaRef enum: Image / File / Audio variants for typed dispatch
- parse_attachments(): branches on contentType prefix, skips DRIVE_FILE source
- download_googlechat_image(): resize → 1200px JPEG q75, max 10MB, GIF preserved
- download_googlechat_file(): text extension whitelist (.txt/.md/.py/...), max 512KB
- download_googlechat_audio(): forwarded as-is for core STT pipeline, max 25MB
- media_url(): percent-encode resource_name as path segment
- webhook handler: parses attachments, async-downloads via adapter token, populates Content.attachments
- Empty-text events with attachments are now forwarded (previously dropped)
- Tests: 11 new (parse, download success/skip/oversized, URL encoding)

Refs: openabdev#731 (Feishu pattern)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extends src/gateway.rs attachment handling to transcribe audio attachments
via the existing STT pipeline (previously only Discord/Slack adapters
went through download_and_transcribe; Custom Gateway adapters got no
audio path even though stt::transcribe was available).

When a gateway adapter (Feishu, Google Chat, etc.) sends an Attachment
with attachment_type = "audio", core now:
1. Decodes base64 → audio bytes
2. Calls stt::transcribe with the configured SttConfig
3. Wraps the transcript as a ContentBlock::Text:
   "[Voice message transcript]: ..."

The audio branch is gated on stt_config.enabled — if STT is disabled in
config, audio attachments fall through unchanged (same as before).

Threads stt_config through GatewayParams and run_gateway_adapter.

This closes the audio attachment gap left by the (now-closed) PR openabdev#726
without re-introducing the HTTP MediaStore proxy approach. Pairs with
the Google Chat adapter audio download (separate PR) — once both land,
Google Chat voice/audio attachments work end-to-end.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Addresses #4 must-fix items:

#1+#2 Webhook timeout safety:
- Spawn background tokio task for attachment downloads so the webhook
  returns 200 within Google Chat's 30s deadline regardless of how long
  downloads take.
- Add 30s per-request timeout to all Media API GET calls — a single
  hung connection can no longer stall the download task indefinitely.
- Refactor event emission into send_googlechat_event helper to share
  between sync (no-attachment) and async (background download) paths.

#4 Text file caps (matches Discord/Slack):
- TEXT_FILE_COUNT_CAP = 5: skip text files past the 5th with a warning.
- TEXT_TOTAL_CAP = 1 MB: skip text files that would push the running
  aggregate past 1 MB with a warning.
- Image and audio attachments are not capped (same as Discord/Slack).

#6 STT silent failure:
- When stt::transcribe returns None, push a fallback ContentBlock::Text
  ("[Voice message — transcription failed for ...]") so the agent
  knows a voice message was attempted and can ask the user to re-send.
  Previously the failure was silent and confusing.

Skipped from issue #4: #3 (streaming download), #5 (cross-adapter
refactor — adapters stay independent per design), #7-openabdev#10 (cosmetic).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@canyugs canyugs requested a review from thepagent as a code owner May 6, 2026 17:58
Copilot AI review requested due to automatic review settings May 6, 2026 17:58
@github-actions github-actions Bot added the pending-screening PR awaiting automated screening label May 6, 2026
@shaun-agent
Copy link
Copy Markdown
Contributor

OpenAB PR Screening

This is auto-generated by the OpenAB project-screening flow for context collection and reviewer handoff.
Click 👍 if you find this useful. Human review will be done within 24 hours. We appreciate your support and contribution 🙏

Screening report ## Intent

PR #762 adds inbound attachment handling to the Google Chat gateway so users can send images, text files, and audio messages to OpenAB agents from Google Chat. It also fills a core gateway gap: custom gateway adapters can now convert inbound audio attachments into STT transcripts before passing them to the agent.

The operator-visible problem is that Google Chat users currently lose attachment context, and audio attachments from custom gateways do not reach the existing transcription path. This makes Google Chat less capable than Discord/Slack for multimodal or voice workflows.

Feat

This is primarily a feature PR with a small core integration fix.

Behavioral change:

  • Google Chat webhook events parse message.attachment[].
  • Uploaded Google Chat media is downloaded by the gateway with service-account auth.
  • Images are resized/compressed and forwarded as base64.
  • Whitelisted text files are capped, base64 encoded, and forwarded.
  • Audio files are forwarded as base64 and transcribed in core when STT is enabled.
  • Google Chat webhook handling returns 200 quickly, with media download and event emission moved into a background task.
  • Drive-linked files are explicitly skipped.
  • Docs describe supported attachment behavior and limits.

Who It Serves

Primary beneficiaries:

  • Google Chat end users who want to send images, text files, and voice messages to agents.
  • Deployers running OpenAB in Google Chat workspaces.
  • Agent runtime operators who need consistent custom gateway attachment behavior.
  • Maintainers trying to reduce adapter-specific gaps between Google Chat, Discord, Slack, and Feishu.

Rewritten Prompt

Implement inbound attachment support for the Google Chat gateway adapter.

Parse Google Chat message.attachment[] for uploaded media, skip unsupported Drive-linked attachments, and download supported media using service-account authenticated Google Chat Media API requests. Return the webhook response promptly and perform media download plus gateway event emission asynchronously so Google Chat does not retry due to webhook timeout.

Support:

  • Images: download, enforce size limits, resize to max 1200px, JPEG encode, base64 attach as image.
  • Text files: allow only safe text extensions/content types, enforce count and aggregate size caps, base64 attach as text_file.
  • Audio: download within limits, base64 attach as audio.

In core gateway attachment conversion, add support for audio attachments by decoding base64 and invoking the existing STT transcription path when STT is enabled. Convert successful transcripts to text content blocks and emit an explicit fallback text block when transcription fails.

Add focused tests for attachment parsing, caps, unsupported sources/types, core audio conversion, STT disabled behavior, and webhook fast-return behavior where practical. Update Google Chat docs with supported types, limits, and unsupported Drive files.

Merge Pitch

This is worth advancing because it closes a major Google Chat parity gap and reuses the existing gateway attachment channel instead of introducing a new media-serving subsystem. The PR also improves the shared custom gateway path by making audio attachments useful beyond Google Chat.

Risk profile is moderate. The high-risk areas are the large googlechat.rs change, background task lifecycle/error visibility, memory use from buffering media, and correctness of Google Chat media auth/download behavior. Reviewers will likely focus on whether the gateway-owned background work is observable, bounded, and testable enough before merge.

Best-Practice Comparison

Relevant OpenClaw principles:

  • Gateway-owned scheduling: Relevant. The gateway owns the delayed media download and event emission after webhook acknowledgement.
  • Durable job persistence: Partially relevant. This PR uses tokio::spawn, so work can be lost if the process exits after returning 200.
  • Isolated executions: Somewhat relevant. Media handling is separated into background work, but not isolated as durable jobs.
  • Explicit delivery routing: Relevant. Attachments are converted into typed gateway event content before core handling.
  • Retry/backoff and run logs: Relevant gap. The PR mentions timeouts and fallback behavior, but not durable retries or structured run logs for failed background media processing.

Relevant Hermes Agent principles:

  • Gateway daemon tick model: Mostly not relevant unless OpenAB wants scheduled or recoverable background gateway work.
  • File locking to prevent overlap: Not directly relevant for per-webhook media downloads.
  • Atomic writes for persisted state: Not relevant unless the background task is made durable.
  • Fresh session per scheduled run: Not relevant.
  • Self-contained prompts for scheduled tasks: Not relevant.

The main best-practice tension is that returning 200 before media processing is correct for Google Chat webhook behavior, but without durable persistence it trades retry avoidance for possible silent event loss during crashes or deploys.

Implementation Options

Option 1: Conservative merge with review hardening
Keep the current base64 gateway approach and background task model. Require targeted tests, stronger logging, clearer error paths, and validation of limits/auth behavior before merge. Defer durable media jobs and streaming downloads.

Option 2: Balanced gateway media job abstraction
Keep base64 attachments, but introduce a small internal media-processing job abstraction inside the gateway. Add structured run logs, bounded concurrency, explicit timeout/error reporting, and a clean test surface. Still avoid persistent storage for now.

Option 3: Durable media processing queue
After webhook parse, persist a media-processing job before returning 200. A gateway worker processes jobs with retry/backoff, logs outcomes, and emits events only after successful attachment processing. This aligns more closely with OpenClaw/Hermes durability patterns.

Option 4: MediaStore/proxy architecture
Revisit a media store or proxy design where the gateway stores fetched media and sends references instead of base64 payloads. Core or agents retrieve media through controlled URLs or internal APIs. This is broader and likely requires schema/API changes.

Comparison Table

Option Speed to ship Complexity Reliability Maintainability User impact Fit for OpenAB right now
1. Conservative merge with hardening High Low-Medium Medium Medium High Strong
2. Gateway media job abstraction Medium Medium Medium-High High High Strong
3. Durable media processing queue Low High High Medium-High High Medium
4. MediaStore/proxy architecture Low High High Medium High Weak-Medium

Recommendation

Advance this PR toward merge using Option 1, with a reviewer checklist focused on bounded background execution, observability, test coverage, and memory limits.

This is the right next step because the feature solves a real Google Chat usability gap without requiring a new media architecture. The PR already follows the existing base64 attachment pattern and keeps STT in core, which is the cleaner boundary.

Recommended follow-up split:

  • Merge Google Chat uploaded attachment support and core audio-to-STT bridging first.
  • Open separate issues for durable gateway media jobs, streaming downloads, Drive-linked files, and outbound media replies.
  • Consider Option 2 after merge if more adapters start needing the same background media-processing behavior.

@JARVIS-coding-Agent
Copy link
Copy Markdown
Contributor

Code Review Summary

Overall: Architecture is sound, test coverage is solid. A few items need addressing before merge.


Merge Requirements

1. Clarify media_url() path encoding (potential blocker)

media_url() percent-encodes the entire resourceName including /%2F. Google Chat Media API uses GET /v1/media/{resourceName=**} (reserved expansion), which expects slashes to be preserved.

If attachmentDataRef.resourceName is a path-style value like spaces/X/messages/Y/attachments/Z, this will 404. If it's an opaque token (e.g. AATT_xxx), current implementation is fine.

→ Please confirm the actual format of resourceName from webhook payloads.

2. Remove "env" from TEXT_EXTS whitelist

Users accidentally uploading .env files (containing API keys, DB passwords) would have their secrets exposed in the agent prompt context. Risk outweighs benefit.

3. Add fallback for base64 decode failure (src/gateway.rs:676)

STT failure has a proper fallback message, but base64 decode failure silently drops the audio with no log and no user-facing indication. Should add warn! + a fallback text block (e.g. [Voice message — decode failed for {filename}]).

4. Add fallback when STT is disabled (src/gateway.rs:676)

When stt_config.enabled = false, audio attachments fall through to _ => {} — silently discarded. This contradicts the PR's stated design of "STT failures aren't silent." Suggest adding [Voice message received — STT not enabled] or similar.

5. Concurrency cap for background download tasks (follow-up issue required)

Each attachment webhook spawns an unbounded tokio::spawn with no concurrency semaphore. A high-traffic room or attacker can stack large downloads (10MB images + 25MB audio) and exhaust memory. The 30s timeout only prevents hung connections, not volume. Please open a follow-up issue to add an adapter-level semaphore limiting concurrent media downloads.


Minimum merge criteria

  • Author confirms resourceName format → determines if perf: cache dependency build layer in Dockerfile #1 is a bug
  • Remove "env" from TEXT_EXTS
  • base64 decode failure: add warn! + fallback block
  • STT disabled: add explicit fallback text for audio
  • Open follow-up issue for concurrency cap

Reviewed by: JARVIS, FRIDAY, ULTRON (multi-agent group review)

@masami-agent
Copy link
Copy Markdown
Contributor

PR Review: #762

Reviewed at commit: 3b95c0a

Summary

  • Problem: Google Chat adapter lacks inbound attachment support (image, text file, audio)
  • Approach: Gateway downloads via Media API → compress/base64 → existing WebSocket attachment channel; STT reuses core stt::transcribe pipeline
  • Risk level: Medium

Core Assessment

  1. Problem clearly stated: ✅ — detailed architecture diagram, design decisions, and test matrix in PR description
  2. Approach appropriate: ✅ — follows PR feat(gateway): feishu image and text file attachment support #731 base64 pattern, no new HTTP server needed
  3. Alternatives considered: ✅ — explicitly compared against closed PR feat(gateway): support images and audio for LINE/Telegram #726 MediaStore proxy approach
  4. Best approach for now: ✅ — minimal change, reuses existing STT infrastructure

Findings

🟡 NIT

N1: media_url percent-encodes / — may not match Google API {+name} URI template semantics

  • Where: gateway/src/adapters/googlechat.rs, media_url() function
  • What: Google Media API uses /v1/media/{+name}?alt=media where {+name} is a reserved expansion (RFC 6570) that preserves / unencoded. The current implementation encodes all non-unreserved characters including /.
  • Why it matters: If attachmentDataRef.resourceName ever contains / (as the {+name} template suggests it might), the encoded %2F could cause a 404. E2E tests pass, indicating the current resourceName values are likely opaque tokens without /, or Google tolerates %2F.
  • Suggestion: Either preserve / in the encoding match arm, or add a comment explaining why full encoding is intentional and safe given the observed resourceName format.

N2: Audio attachment silently dropped when stt_config.enabled == false

  • Where: src/gateway.rs, "audio" if stt_config.enabled => match arm
  • What: When the guard fails, the audio attachment falls through to _ => {} with no log.
  • Why it matters: Operators may not realize audio attachments are being discarded.
  • Suggestion: Add a tracing::debug! in the _ => {} arm (or a separate "audio" arm) when an audio attachment is skipped due to STT being disabled.

N3: gateway/Cargo.lock version bump 0.1.00.4.0 without corresponding Cargo.toml change in diff

  • Where: gateway/Cargo.lock
  • What: The lock file shows openab-gateway version changed to 0.4.0, but no gateway/Cargo.toml change appears in this diff.
  • Suggestion: Confirm gateway/Cargo.toml already has version = "0.4.0" on the PR branch (likely from a prior commit or base branch).

N4: Minor allocation in no-attachment path

  • Where: gateway/src/adapters/googlechat.rs, webhook handler
  • What: text is converted to String via .to_string() before checking whether attachments exist. In the no-attachment fast path, this allocation is unnecessary.
  • Suggestion: Move the .to_string() to just before the tokio::spawn block. Minor perf nit, not blocking.

🟢 INFO

  • Excellent test coverage — unit tests for parse_attachments, wiremock integration tests for all three download functions, size rejection test, and extension whitelist test.
  • STT failure fallback is well-designed — returns [Voice message — transcription failed for X] instead of silently dropping, allowing the agent to prompt the user.
  • File aggregate cap logic (TEXT_FILE_COUNT_CAP + TEXT_TOTAL_CAP) is correctly implemented as match arm expressions.
  • stt::transcribe call parameters exactly match the existing function signature.
  • #[allow(clippy::too_many_arguments)] on send_googlechat_event is acceptable; struct refactor can be a follow-up.

Verdict: APPROVE

High-quality PR with thorough documentation, comprehensive E2E testing, and clean reuse of existing infrastructure. All findings are minor nits — no blockers.

@obrutjack
Copy link
Copy Markdown
Collaborator

PR Review (Stage 2 — Second Review): #762

Summary

  • Problem: Google Chat adapter lacks inbound attachment support (image, text file, audio)
  • Approach: Gateway downloads via Media API → compress/base64 → existing WebSocket attachment channel; STT reuses core stt::transcribe pipeline
  • Risk level: Medium

Prior Review Verification (JARVIS + Masami)

Finding Correct? My Assessment
JARVIS #1 / Masami N1: media_url() encodes / as %2F See 🟡 below — needs clarification
JARVIS #2: "env" in TEXT_EXTS Upgraded to 🔴 — security risk
JARVIS #3: base64 decode failure silent drop 🔴 — contradicts stated design goal
JARVIS #4: STT disabled silent discard 🟡 — should log
JARVIS #5: concurrency cap Follow-up issue is fine
Masami N3: Cargo.lock 0.1.0→0.4.0 Needs confirmation
Masami N4: allocation nit Non-blocking

Findings

🔴 Critical: "env" in TEXT_EXTS whitelist — credential exposure risk

  • Where: gateway/src/adapters/googlechat.rs, TEXT_EXTS constant
  • What: .env files typically contain API keys, database passwords, and tokens. Allowing them through the whitelist means a user who accidentally uploads a .env file will have its entire content base64-encoded and injected into the agent prompt context.
  • Why it matters: Agent prompt context may be logged, stored, or visible to other users in shared workspaces. This is a data leak vector.
  • Fix: Remove "env" from TEXT_EXTS.

🔴 Critical: base64 decode failure in audio path — silent data loss

  • Where: src/gateway.rs:676, inside "audio" if stt_config.enabled => arm
  • What: When base64::decode fails, the if let Ok(bytes) simply does not match, and the entire audio attachment is silently dropped — no log, no fallback text block.
  • Why it matters: The PR description explicitly states "STT failures aren't silent" as a design goal. A base64 decode failure is functionally equivalent to an STT failure from the user's perspective — their voice message vanishes without trace.
  • Fix:
"audio" if stt_config.enabled => {
    use base64::Engine;
    match base64::engine::general_purpose::STANDARD.decode(&att.data) {
        Ok(bytes) => {
            // existing STT logic...
        }
        Err(e) => {
            tracing::warn!(filename = %att.filename, error = %e, "gateway audio base64 decode failed");
            extra_blocks.push(ContentBlock::Text {
                text: format!("[Voice message — decode failed for {}]", att.filename),
            });
        }
    }
}

🟡 Minor: media_url() encoding — needs clarification

  • Where: gateway/src/adapters/googlechat.rs, media_url() function
  • What: The function percent-encodes ALL non-unreserved characters including /. Google Chat Media API uses GET /v1/media/{+name} (RFC 6570 reserved expansion = slashes should be preserved).
  • Analysis: E2E tests pass, suggesting resourceName values from Google Chat webhooks are likely opaque tokens without /, or Google tolerates %2F. The unit test uses a synthetic value with /.
  • Ask: Please confirm the actual resourceName format from a real webhook payload. If it's always an opaque token (no slashes), add a comment explaining this assumption.

🟡 Minor: STT disabled — audio silently discarded

  • Where: src/gateway.rs, _ => {} catch-all after "audio" if stt_config.enabled =>
  • What: When stt_config.enabled == false, audio attachments fall through to _ => {} with zero indication.
  • Fix: Add tracing::debug! or a fallback text block so operators know audio was received but STT is disabled.

🟡 Minor: Text file aggregate cap — readability

  • Where: gateway/src/adapters/googlechat.rs, File match arm in background task
  • What: The nested if let Some(ref att) = result { if size > cap { None } else { result } } pattern works correctly but is easy to misread.
  • Suggestion: Consider restructuring for clarity (not blocking).

🟢 Info (Positive Observations)

  • Excellent test coverage — wiremock integration tests for all three download paths, size rejection, extension whitelist, and attachment parsing.
  • send_googlechat_event extraction is a clean refactor.
  • Background task pattern (return 200 immediately, spawn downloads) is correct for Google Chat's 30s webhook deadline.
  • STT failure fallback design is solid.
  • Cargo.lock version bump (0.1.0 → 0.4.0) — please confirm gateway/Cargo.toml already has this version on the PR branch.

Verdict: REQUEST CHANGES

Two fixes required before approve:

  1. Remove "env" from TEXT_EXTS (security)
  2. Add base64 decode failure fallback in audio path (matches your own stated design goal)

Everything else is minor / non-blocking. Overall this is a high-quality PR with solid architecture and thorough testing. 👏

@canyugs Please address the two 🔴 items above. The rest are suggestions you can take or leave. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pending-contributor pending-screening PR awaiting automated screening

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants