fix(uploads): attach compiled binary for AI-generated docs, not source #5266

Merged
waleedlatif1 merged 7 commits into staging from fix/gmail-pdf-attachment-corrupt
Jun 29, 2026

Conversation

@waleedlatif1 (Collaborator)

Summary

  • AI-generated documents (pdf/docx/pptx/xlsx) created in Chat are stored as their generation source, with the rendered binary kept in a separate content-addressed artifact store. Read/preview/share paths swap in the binary, but attachment/upload/provider paths downloaded the raw source, so a generated PDF emailed via Gmail (and ~34 other tools) arrived as the generator script renamed to .pdf (issue #5260, "Mothership Chat PDF Attachment Sent via Gmail Is Corrupted").
  • Added a shared resolveServableDocBytes resolver + downloadServableFileFromStorage wrapper; the file-serve route now delegates to the same resolver so the serve and attachment paths resolve identically.
  • Migrated ~34 attachment/upload/parse tool routes + the LLM provider attachment path to the servable download. Media-only tools (image/audio/video) and source-editing paths (file/manage compress/decompress) intentionally keep the raw download.
  • Surface a retryable 409 (shared docNotReadyResponse) when a doc's artifact is still compiling, instead of shipping source bytes.

Root cause / when introduced

  • Not an encoding bug and not a regression — a pure omission. The divergence began when Chat's file-generation tools started storing generated docs as source (the read path was updated; attachment/upload paths were not). Real uploaded files (which carry %PDF/ZIP magic) were always unaffected.
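
The distinction drawn above between real uploads and stored generation source comes down to the leading bytes of the file. A minimal sketch of such a check, assuming the `%PDF` header for PDFs and the ZIP local-file signature `PK\x03\x04` for docx/pptx/xlsx (which are ZIP containers). The names `looksLikeCompiledBinary`, `startsWith`, and `ascii` are hypothetical and not taken from the PR:

```typescript
// Hypothetical helper, not the PR's actual code.
type DocExt = "pdf" | "docx" | "pptx" | "xlsx";

const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46]; // "%PDF"
const ZIP_MAGIC = [0x50, 0x4b, 0x03, 0x04]; // "PK\x03\x04"

// True when `bytes` begins with every byte of `magic`.
function startsWith(bytes: Uint8Array, magic: number[]): boolean {
  if (bytes.length < magic.length) return false;
  for (let i = 0; i < magic.length; i++) {
    if (bytes[i] !== magic[i]) return false;
  }
  return true;
}

// Real uploads carry the format's magic bytes; generation source does not.
function looksLikeCompiledBinary(bytes: Uint8Array, ext: DocExt): boolean {
  return startsWith(bytes, ext === "pdf" ? PDF_MAGIC : ZIP_MAGIC);
}

// Small helper for building ASCII byte arrays in examples.
function ascii(text: string): Uint8Array {
  const out = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) out[i] = text.charCodeAt(i);
  return out;
}
```

Under this sketch, `looksLikeCompiledBinary(ascii("%PDF-1.7"), "pdf")` is true, while a script stored under a `.pdf` name fails the check and would need the compiled artifact instead.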

Type of Change

  • Bug fix

Testing

  • New doc-servable.test.ts covering every resolver branch (passthrough / artifact-load / not-ready-throws / isolated-vm-compile / non-doc).
  • Updated agiloft + brex route tests for the new return shape.
  • Full affected suite green (149 tests), tsc clean, biome clean, check:api-validation passes.

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

AI-generated documents (pdf/docx/pptx/xlsx) created in Chat are stored as
their generation source, with the rendered binary in a separate
content-addressed artifact store. Read/preview paths swap in the binary, but
attachment/upload/provider paths downloaded the raw source — so a generated
PDF emailed via Gmail (and 30+ other tools) arrived as the generator script
renamed .pdf.

- Add shared resolveServableDocBytes resolver + downloadServableFileFromStorage
  wrapper; the file-serve route now delegates to the same resolver so the two
  paths resolve identically.
- Migrate ~34 attachment/upload/parse tool routes + the LLM provider attachment
  path to the servable download; media-only tools and source-editing paths keep
  the raw download intentionally.
- Surface a retryable 409 (shared docNotReadyResponse) when a doc artifact is
  still compiling instead of shipping source.
@cursor

cursor Bot commented Jun 29, 2026

PR Summary

Medium Risk
Wide but mechanical refactor across many integration routes; behavior change for generated docs is intentional, with new 409 handling and post-resolve size limits that could affect edge cases if artifacts differ greatly from stored metadata size.

Overview
Fixes attachments and uploads sending generation source (e.g. Python) instead of the real PDF/DOCX/PPTX/XLSX binary for Chat-generated docs, while preview/serve already used the compiled artifact.

Centralizes resolution in resolveServableDocBytes (moved out of the file-serve route) and adds downloadServableFileFromStorage plus shared docNotReadyResponse (409 when the artifact is still compiling). The serve route now delegates to the same resolver so serve and tool paths stay aligned.

Migrates ~34 tool routes (email, cloud uploads, parse, A2A, Teams/Slack, etc.), LLM provider file uploads, and file/manage text extraction to the servable download; several routes now size-check resolved bytes and use the resolved contentType for MIME types.

Adds doc-servable.test.ts and updates Agiloft/Brex mocks for the { buffer, contentType } return shape.

Reviewed by Cursor Bugbot for commit a1a69cb.

Comment thread apps/sim/app/api/tools/slack/utils.ts
@greptile-apps

greptile-apps Bot (Contributor) commented Jun 29, 2026

Greptile Summary

This PR fixes a bug where AI-generated documents (PDF/DOCX/PPTX/XLSX) created in Chat were sent as raw generation source (Python/JS script bytes) instead of the compiled binary artifact when attached via email, chat, or upload tools. The fix introduces a shared resolveServableDocBytes resolver and downloadServableFileFromStorage wrapper, then migrates ~34 tool routes and the LLM provider attachment path to use it.

  • Core resolver (doc-compile.ts): resolveServableDocBytes checks magic bytes to distinguish real uploads from generated-doc sources, loads the content-addressed compiled artifact, throws a retryable DocCompileUserError when the artifact is still building (E2B regime), or falls back to isolated-vm compilation.
  • Unified servable download (file-utils.server.ts): downloadServableFileFromStorage wraps downloadFileFromStorage with the resolver for doc-type files, re-validates the maxBytes limit against the (potentially larger) compiled artifact, and returns the correct binary content type alongside the buffer.
  • 409 surface (servable-file-response.ts): docNotReadyResponse helper returns a consistent 409 JSON response for DocCompileUserError, added to all affected tool route catch blocks.
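
The branch order described for the core resolver can be sketched as follows. This is an illustration under stated assumptions, not the code in doc-compile.ts: every name is hypothetical, dependencies are injected so the example runs on its own, and the real resolver is presumably async.

```typescript
// Illustrative only: names, types, and the synchronous shape are assumptions.
interface ResolverDeps {
  isCompiledBinary: (bytes: Uint8Array) => boolean; // magic-byte check
  loadArtifact: (source: Uint8Array) => Uint8Array | undefined; // artifact-store lookup
  artifactBuiltElsewhere: boolean; // true when an external sandbox builds artifacts
  compileInProcess: (source: Uint8Array) => Uint8Array; // fallback compile
}

// Stand-in for the PR's retryable DocCompileUserError.
class DocNotReadyError extends Error {
  readonly retryable = true;
  constructor(message: string) {
    super(message);
    this.name = "DocNotReadyError";
  }
}

function resolveServable(raw: Uint8Array, isDoc: boolean, deps: ResolverDeps): Uint8Array {
  if (!isDoc) return raw; // non-doc files are served as stored
  if (deps.isCompiledBinary(raw)) return raw; // real upload: passthrough
  const artifact = deps.loadArtifact(raw);
  if (artifact) return artifact; // swap source for the compiled artifact
  if (deps.artifactBuiltElsewhere) {
    // Artifact is still compiling: fail loudly rather than ship source bytes.
    throw new DocNotReadyError("Document is still compiling, retry shortly");
  }
  return deps.compileInProcess(raw); // last resort: compile on demand
}
```

The point of a single function like this is that the serve route and every attachment path share one decision, so they cannot drift apart again.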

Confidence Score: 5/5

Safe to merge. The bug fix is well-scoped, the new resolver is the single source of truth for both the file-serve route and all attachment paths, and all previous feedback has been addressed.

The core resolver logic is sound: magic-byte passthrough correctly handles real uploaded binaries for every affected format, DocCompileUserError is surfaced consistently as a 409 across all 34+ tool routes, and the post-resolve maxBytes re-check guards against compiled artifacts larger than their source. All previously flagged gaps (XLSX test coverage, empty content type fallback, Slack/Teams/UptimeRobot 409 handling) have been closed in follow-up commits.

No files require special attention. The new shared files carry the most logic and are well-tested.

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/sim/lib/copilot/tools/server/files/doc-compile.ts | Adds resolveServableDocBytes as the single resolver for binary-vs-source decisions; correctly handles magic-byte passthrough, artifact lookup, E2B not-ready guard, and isolated-vm fallback. |
| apps/sim/lib/uploads/utils/file-utils.server.ts | Adds downloadServableFileFromStorage with a cheap doc-extension pre-filter, dynamic imports of the resolver, and a post-resolve maxBytes re-check for compiled artifacts larger than their source. |
| apps/sim/lib/uploads/utils/servable-file-response.ts | New shared helper converting DocCompileUserError to a consistent 409 JSON response; returns null for other error types so callers continue their own error handling. |
| apps/sim/lib/copilot/tools/server/files/doc-servable.test.ts | New test file covering all resolver branches including binary passthrough, artifact swap, DocCompileUserError, isolated-vm compile, and all XLSX edge cases. |
| apps/sim/providers/file-attachments.server.ts | Migrates downloadFileForUpload to downloadServableFileFromStorage so LLM provider attachment paths now attach compiled binaries. |
| apps/sim/app/api/files/serve/[...path]/route.ts | File-serve route delegates to resolveServableDocBytes instead of duplicating logic. raw=1 bypass preserved. Significant code reduction with identical behavior. |
| apps/sim/app/api/tools/file/manage/route.ts | extractUserFileTextContent now uses servable download so PDF text extraction receives compiled binary. Compress/decompress paths retain downloadFileFromStorage. |
| apps/sim/app/api/tools/slack/utils.ts | Migrated to downloadServableFileFromStorage; route-level docNotReadyResponse added in send-message/route.ts. |
| apps/sim/tools/microsoft_teams/server-utils.ts | Migrated to downloadServableFileFromStorage with MAX_TEAMS_FILE_SIZE guard post-resolve. Route-level 409 handling added in both write_channel and write_chat. |
| apps/sim/app/api/tools/gmail/send/route.ts | Migrated to servable download with 409 handling and post-resolve total-size check against Gmail's 25MB limit. |

Reviews (6): Last reviewed commit: "fix(sendgrid): reject attachments exceed..."

Comment thread apps/sim/lib/copilot/tools/server/files/doc-servable.test.ts
Comment thread apps/sim/lib/uploads/utils/file-utils.server.ts
…sends

The slack send-message and teams write_channel/write_chat routes call
download helpers that can throw DocCompileUserError while a generated doc is
still compiling. Map it to the shared docNotReadyResponse 409 (matching the
other migrated tool routes) instead of a generic 500. The provider attachment
path is internal LLM execution (no HTTP response), so it intentionally
propagates the typed error.
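
The mapping this commit describes could look roughly like the sketch below. It is a simplified illustration, not the PR's servable-file-response.ts: the classes and types are stand-ins, responses are plain objects rather than framework responses, and the `retryable` body field is an assumption about the payload.

```typescript
// Stand-in for the PR's DocCompileUserError (the real class is defined elsewhere).
class DocCompileUserError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "DocCompileUserError";
  }
}

// Simplified response shape for the sketch.
interface JsonResponse {
  status: number;
  body: Record<string, unknown>;
}

// Returns a 409 for the not-ready error, or null so callers keep their own handling.
function docNotReadyResponse(err: unknown): JsonResponse | null {
  if (err instanceof Error && err.name === "DocCompileUserError") {
    return { status: 409, body: { error: err.message, retryable: true } };
  }
  return null;
}

// Hypothetical route body: `download` stands in for the servable download call.
function handleSend(download: () => Uint8Array): JsonResponse {
  try {
    const bytes = download();
    return { status: 200, body: { sentBytes: bytes.length } };
  } catch (err) {
    const notReady = docNotReadyResponse(err);
    if (notReady) return notReady; // retryable conflict, not a server fault
    return { status: 500, body: { error: "Internal server error" } };
  }
}
```

Returning `null` for unrelated errors keeps the helper a one-line addition to each existing catch block, which fits a mechanical migration across many routes.
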
@vercel

vercel Bot commented Jun 29, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped Jun 29, 2026 7:21pm

…lsx tests

Address review findings:
- uptimerobot create-psp/update-psp now map DocCompileUserError to the shared
  409 (Greptile + Cursor flagged the gap alongside slack/teams).
- downloadServableFileFromStorage returns the extension-derived MIME
  (getMimeTypeFromExtension) for non-doc files instead of an empty string when
  userFile.type is unset.
- Add resolveServableDocBytes tests for the three xlsx branches (binary ZIP
  passthrough, not-ready throw under E2B+beta, no-workspaceId raw passthrough).
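
The content-type fallback in the second item can be illustrated with a small sketch. The lookup table, the `mimeFromExtension` stand-in for getMimeTypeFromExtension, and the `application/octet-stream` default are all assumptions made for the example:

```typescript
// Hypothetical lookup table; the real helper likely covers many more extensions.
const MIME_BY_EXT: Record<string, string> = {
  pdf: "application/pdf",
  png: "image/png",
  txt: "text/plain",
  xlsx: "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
};

// Stand-in for getMimeTypeFromExtension; the default value is an assumption.
function mimeFromExtension(filename: string): string {
  const dot = filename.lastIndexOf(".");
  const ext = dot >= 0 ? filename.slice(dot + 1).toLowerCase() : "";
  return MIME_BY_EXT[ext] ?? "application/octet-stream";
}

// Prefer the stored type; fall back to the extension when it is unset or empty.
function resolveContentType(file: { name: string; type?: string }): string {
  return file.type && file.type.length > 0 ? file.type : mimeFromExtension(file.name);
}
```

With this shape, a file named `notes.txt` whose stored type is unset resolves to `text/plain` rather than an empty string.
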
@waleedlatif1 (Collaborator, Author)

@greptile

@waleedlatif1 (Collaborator, Author)

@cursor review

Comment thread apps/sim/tools/microsoft_teams/server-utils.ts
Size limits were checked against userFile.size (source metadata) before
resolution, but a generated doc resolves to a larger compiled binary — so a
small-source doc could pass the pre-check yet exceed the service limit. Add a
post-resolution check on the actual resolved bytes (mirroring docusign/vanta)
across gmail send/draft/edit-draft, smtp, outlook send/draft, telegram, sftp,
and teams; the cheap source pre-check stays as an early reject.
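
A minimal sketch of the two-stage check this commit describes: a cheap early reject on stored metadata, then a second check on the bytes actually resolved. `StoredFile`, `attachWithLimit`, and the injected `resolve` callback are hypothetical names used only for illustration:

```typescript
// `size` is the stored source metadata, which may understate the compiled binary.
interface StoredFile {
  name: string;
  size: number;
}

function attachWithLimit(
  file: StoredFile,
  resolve: (f: StoredFile) => Uint8Array, // stands in for the servable download
  maxBytes: number
): Uint8Array {
  // Cheap pre-check: avoids downloading files that are already too large.
  if (file.size > maxBytes) {
    throw new Error(`${file.name} exceeds the ${maxBytes}-byte limit`);
  }
  const resolved = resolve(file);
  // Post-resolution check: the compiled binary can be larger than its source.
  if (resolved.length > maxBytes) {
    throw new Error(`${file.name} resolves to ${resolved.length} bytes, over the ${maxBytes}-byte limit`);
  }
  return resolved;
}
```

The pre-check alone is not enough because it measures the generator script, not the PDF or DOCX that actually gets sent.
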
@waleedlatif1 (Collaborator, Author)

@greptile

@waleedlatif1 (Collaborator, Author)

@cursor review

@cursor cursor Bot left a comment

✅ Bugbot reviewed your changes and found no new issues!

Reviewed by Cursor Bugbot for commit d1b3230.

@waleedlatif1 (Collaborator, Author)

@greptile

@waleedlatif1 (Collaborator, Author)

@cursor review

Comment thread apps/sim/app/api/tools/sftp/upload/route.ts
The SFTP batch upload checked each resolved file against the 100MB cap
individually, so multiple resolved attachments could each pass while their
combined size exceeded the limit. Accumulate resolved bytes across the loop
and reject once the running total exceeds the cap.
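
The running-total fix can be sketched as below. The function name and the in-memory list of already-resolved buffers are assumptions for the example; the actual route resolves each file inside its upload loop:

```typescript
// Accepts resolved buffers until their combined size would exceed the cap.
function collectWithinCap(resolvedFiles: Uint8Array[], capBytes: number): Uint8Array[] {
  let totalBytes = 0;
  const accepted: Uint8Array[] = [];
  for (const bytes of resolvedFiles) {
    totalBytes += bytes.length;
    // Compare the running total, not just this file, against the cap.
    if (totalBytes > capBytes) {
      throw new Error(`Combined attachment size ${totalBytes} exceeds the ${capBytes}-byte limit`);
    }
    accepted.push(bytes);
  }
  return accepted;
}
```

For example, with a cap of 100 bytes, three 40-byte files each pass a per-file check, but the third pushes the total to 120 and is rejected here.
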
@waleedlatif1 (Collaborator, Author)

@greptile

@waleedlatif1 (Collaborator, Author)

@cursor review

Comment thread apps/sim/app/api/tools/sendgrid/send-mail/route.ts
…resolved bytes

SendGrid had no attachment-size guard, so a generated doc resolving to a large
compiled binary could be sent and fail opaquely at the API. Add a post-resolution
total-size check (30MB, SendGrid's documented message limit) matching the
gmail/smtp/outlook routes.
@waleedlatif1 (Collaborator, Author)

@greptile

@waleedlatif1 (Collaborator, Author)

@cursor review

@cursor cursor Bot left a comment

✅ Bugbot reviewed your changes and found no new issues!

Reviewed by Cursor Bugbot for commit a1a69cb.

@waleedlatif1 waleedlatif1 merged commit 950c260 into staging Jun 29, 2026
16 checks passed
@waleedlatif1 waleedlatif1 deleted the fix/gmail-pdf-attachment-corrupt branch June 29, 2026 20:01