-
-
Notifications
You must be signed in to change notification settings - Fork 23.6k
fix: correct file input field mapping for PDF and other file uploads #5633
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix: correct file input field mapping for PDF and other file uploads #5633
Conversation
Fix a copy-paste bug where fileInputFieldFromExt was incorrectly used instead of fileInputFieldFromMimeType in the fallback condition. This caused PDF files to be processed with TextLoader instead of PDFLoader, resulting in 'Unable to upload documents' errors. The bug affected file uploads across multiple features: - Agent file attachments (createAttachment) - Vector store uploads (upsertVector) - Chatflow file inputs (buildChatflow) - Document store uploads (documentstore) Why it failed for PDFs but not Word/Excel: - .docx and .xlsx extensions are properly mapped in mapExtToInputField() - .pdf extension was not mapped, so it fell to the MIME type fallback - The fallback incorrectly assigned the ext value instead of MIME value - This caused PDFs to use 'txtFile' instead of 'pdfFile' loader Fixes: PDF upload failures in Agentflows with full file upload enabled Related: CVE-2025-61687 file validation improvements
Summary of ChangesHello @vellanki-santhosh, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request rectifies a critical bug that prevented PDF files from being uploaded and processed correctly within Agentflows. By correcting a subtle but impactful logical error in the file type mapping mechanism, the system now accurately assigns the appropriate file loader based on MIME type, ensuring robust handling of various document formats. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
The pull request effectively addresses the critical bug where PDF and other files were incorrectly mapped, leading to processing failures. The fix correctly prioritizes MIME type mapping when extension-based mapping defaults to 'txtFile'. The changes are consistent across all affected files and directly resolve the reported issue.
| if (fileInputFieldFromExt !== 'txtFile') { | ||
| fileInputField = fileInputFieldFromExt | ||
| } else if (fileInputFieldFromMimeType !== 'txtFile') { | ||
| fileInputField = fileInputFieldFromExt | ||
| fileInputField = fileInputFieldFromMimeType | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fix correctly addresses the bug by using fileInputFieldFromMimeType in the fallback condition. To improve readability and prevent similar copy-paste errors in the future, consider refactoring this conditional logic. The current structure, while fixed, could still be prone to misinterpretation or accidental reintroduction of the bug. A more sequential approach to determining fileInputField would enhance maintainability.
let fileInputField = mapExtToInputField(fileExtension);
if (fileInputField === 'txtFile') {
fileInputField = mapMimeTypeToInputField(file.mimetype);
}| if (fileInputFieldFromExt !== 'txtFile') { | ||
| fileInputField = fileInputFieldFromExt | ||
| } else if (fileInputFieldFromMimeType !== 'txtFile') { | ||
| fileInputField = fileInputFieldFromExt | ||
| fileInputField = fileInputFieldFromMimeType | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change correctly applies the fix for the file input field mapping. Similar to the comment in packages/server/src/services/documentstore/index.ts, consider refactoring this conditional logic for improved clarity and maintainability. The suggested pattern simplifies the decision-making process, making it less susceptible to future errors.
let fileInputField = mapExtToInputField(fileExtension);
if (fileInputField === 'txtFile') {
fileInputField = mapMimeTypeToInputField(file.mimetype);
}| if (fileInputFieldFromExt !== 'txtFile') { | ||
| fileInputField = fileInputFieldFromExt | ||
| } else if (fileInputFieldFromMimeType !== 'txtFile') { | ||
| fileInputField = fileInputFieldFromExt | ||
| fileInputField = fileInputFieldFromMimeType | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The correction here is accurate and resolves the bug. For better code robustness and to prevent recurrence of this type of error, I recommend refactoring this if/else if block into a more linear assignment, as suggested in the other files. This enhances the clarity of the fallback logic.
let fileInputField = mapExtToInputField(fileExtension);
if (fileInputField === 'txtFile') {
fileInputField = mapMimeTypeToInputField(file.mimetype);
}| if (fileInputFieldFromExt !== 'txtFile') { | ||
| fileInputField = fileInputFieldFromExt | ||
| } else if (fileInputFieldFromMimeType !== 'txtFile') { | ||
| fileInputField = fileInputFieldFromExt | ||
| fileInputField = fileInputFieldFromMimeType | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change correctly implements the fix for the file input field mapping. To enhance code clarity and reduce the potential for similar logical errors, consider refactoring this conditional block. A more direct assignment with a fallback check can make the intent clearer and the code more maintainable.
let fileInputField = mapExtToInputField(fileExtension);
if (fileInputField === 'txtFile') {
fileInputField = mapMimeTypeToInputField(file.mimetype);
}Apply code review suggestion to use clearer linear fallback instead
of nested conditionals. This makes the intent more obvious and
prevents future copy-paste errors.
Changes:
- Refactored mapping logic in all 4 affected files to use:
let fileInputField = mapExtToInputField(fileExtension)
if (fileInputField === 'txtFile') {
fileInputField = mapMimeTypeToInputField(file.mimetype)
}
- Added comprehensive regression tests to verify PDF, Word, and Excel
file mappings work correctly with the new logic
- Tests ensure .pdf files correctly map to 'pdfFile' loader via MIME
type fallback
The simplified logic is more maintainable and self-documenting:
1. Try file extension mapping first
2. If unrecognized (returns 'txtFile'), fall back to MIME type
Co-authored-by: Gemini Code Assist <code-assist@google.com>
✅ Updates AppliedThank you @gemini Code Assist for the review suggestion! I've refactored the code to use the cleaner linear fallback pattern. 🔄 Changes Made1. Refactored mapping logic (all 4 files): // Before (nested conditionals - prone to copy-paste errors):
let fileInputField = 'txtFile'
if (fileInputFieldFromExt !== 'txtFile') {
fileInputField = fileInputFieldFromExt
} else if (fileInputFieldFromMimeType !== 'txtFile') {
fileInputField = fileInputFieldFromMimeType
}
// After (clear linear fallback):
let fileInputField = mapExtToInputField(fileExtension)
if (fileInputField === 'txtFile') {
fileInputField = mapMimeTypeToInputField(file.mimetype)
}This makes the intent crystal clear:
🧪 Regression Tests AddedCreated comprehensive test suite at
📊 Test CoverageThe new tests specifically cover the bug scenario:
Ready for maintainer review and workflow approval! 🚀 |

🐛 Bug Fix
Fixes the "Unable to upload documents" error when uploading PDF files to Agentflows with full file upload enabled.
🔍 Root Cause
Copy-paste bug in file type mapping logic where
fileInputFieldFromExtwas incorrectly used instead offileInputFieldFromMimeTypein the fallback condition. This caused PDF files to be processed withTextLoaderinstead ofPDFLoader.Affected code pattern:
🔧 Changes
Fixed file input field assignment in 4 files:
packages/server/src/utils/createAttachment.ts- Agent file attachmentspackages/server/src/utils/upsertVector.ts- Vector store uploadspackages/server/src/utils/buildChatflow.ts- Chatflow file inputspackages/server/src/services/documentstore/index.ts- Document store uploads📝 Why PDFs Failed but Word/Excel Worked
.docxand.xlsxextensions are properly mapped inmapExtToInputField().pdfextension was not mapped, so it fell to the MIME type fallbacktxtFileloader instead ofpdfFileloader✅ Testing
Tested with:
Teaching Large Language Models an Unseen Language on the Fly.pdfExpected behavior: PDF files should now be correctly processed using PDFLoader with proper configuration (including
pdfUsageandlegacyBuildsettings).📚 Related