fix: inject TextFile content as text on non-multimodal models#5962
fix: inject TextFile content as text on non-multimodal models#5962felipebridge wants to merge 1 commit into
Conversation
`_process_message_files()` previously raised a `ValueError` for *any* file attached to a non-vision model, including `TextFile` objects whose content is pure text and requires no visual understanding. Distinguish `TextFile` from genuinely visual inputs (ImageFile, PDFFile, AudioFile, VideoFile): * `TextFile` on a non-multimodal model → call `read_text()` and append the decoded content to the message `content` field, then strip the `files` key. The model never sees the attachment as a binary blob. * Any other file type on a non-multimodal model → keep raising `ValueError` with the original message, directing the user to a vision-capable model. `TextFile` is already exported from `crewai_files`; the import is added alongside the existing `format_multimodal_content` import inside the `try/except ImportError` guard. Fixes crewAIInc#5137
📝 WalkthroughWalkthroughThis PR implements graceful TextFile handling for non-multimodal LLMs. Instead of rejecting all file attachments when a model lacks vision capabilities, BaseLLM now extracts text content from TextFile objects and injects it directly into message text, allowing text-only models to process text files while still rejecting image attachments. A new regression test suite validates the injection behavior and error cases. ChangesTextFile Content Injection for Non-Multimodal Models
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
lib/crewai/tests/llms/test_multimodal.py (1)
390-447: ⚡ Quick winAdd regression coverage for dict-shaped
filespayloads.The new tests only exercise list-based
files. Given the issue repro usesinput_filesas a mapping, add a direct dict-based case to lock this contract.✅ Suggested test
class TestTextFileNonMultimodalInjection: @@ def test_text_file_injected_into_message_content(self) -> None: @@ assert "Summarise this:" in result[0]["content"] + + def test_text_file_dict_payload_injected(self) -> None: + """Dict-based files payload is also injected for non-vision models.""" + llm = self._make_non_vision_llm() + msg: dict = { + "role": "user", + "content": "Summarise this:", + "files": {"file": TextFile(source=b"dict path text")}, + } + + result = llm._process_message_files([msg]) + + assert "files" not in result[0] + assert "dict path text" in result[0]["content"]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/crewai/tests/llms/test_multimodal.py` around lines 390 - 447, Add a test that covers dict-shaped files payloads for _process_message_files: create a msg where "files" is a mapping (e.g., {"a.txt": TextFile(source=b"dict file")}) and call llm = self._make_non_vision_llm(); result = llm._process_message_files([msg]); assert the text was injected into result[0]["content"], assert "files" key is removed, and for multimodal error cases do a similar mapping with ImageFile to assert ValueError from _process_message_files; this locks the contract for dict-based input alongside the existing list-based tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@lib/crewai/src/crewai/llms/base_llm.py`:
- Around line 743-748: The code assumes msg.get("files") is list-like before
checking TextFile, which breaks when callers pass a dict or single file (e.g.,
input_files={"file": TextFile(...)}) causing TextFile checks to fail; update the
logic around the files variable (where files: list[Any] = msg.get("files") is
set and used) to normalize files into a list first — if files is a dict, replace
it with list(files.values()), if it's a single File-like object (not
list/tuple/set), wrap it in [files] — then proceed with the existing text_files
= [f for f in files if isinstance(f, TextFile)] and non_text_files handling.
- Around line 758-761: The current code unconditionally drops msg["files"] even
when msg["content"] is non-string, which discards attached TextFile text; update
the logic around msg, injected and files so you only remove/pop "files" when
you've actually merged their text into the message. Concretely, when
msg.get("content") is a str, keep the existing injection behavior and then pop
"files"; when content is not a str, iterate msg.get("files", []) and for any
TextFile instances extract their text and append/merge it into msg["content"]
(or set a new string content combining existing non-string content and the
extracted text) before popping only those files you consumed; if there are
non-TextFile attachments, leave msg["files"] untouched. Ensure you reference
msg, injected, "files" and the TextFile class when implementing the fix.
---
Nitpick comments:
In `@lib/crewai/tests/llms/test_multimodal.py`:
- Around line 390-447: Add a test that covers dict-shaped files payloads for
_process_message_files: create a msg where "files" is a mapping (e.g., {"a.txt":
TextFile(source=b"dict file")}) and call llm = self._make_non_vision_llm();
result = llm._process_message_files([msg]); assert the text was injected into
result[0]["content"], assert "files" key is removed, and for multimodal error
cases do a similar mapping with ImageFile to assert ValueError from
_process_message_files; this locks the contract for dict-based input alongside
the existing list-based tests.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 21f8d49f-9200-451b-8b9d-ff57b0697d32
📒 Files selected for processing (2)
lib/crewai/src/crewai/llms/base_llm.pylib/crewai/tests/llms/test_multimodal.py
| files: list[Any] = msg.get("files") or [] | ||
| if not files: | ||
| continue | ||
| text_files = [f for f in files if isinstance(f, TextFile)] | ||
| non_text_files = [f for f in files if not isinstance(f, TextFile)] | ||
| if non_text_files: |
There was a problem hiding this comment.
Normalize files input before TextFile type checks.
At Line 743, this path assumes list-like file objects. If files arrives as a dict (the issue repro uses input_files={"file": TextFile(...)}), iteration happens over keys and TextFile detection fails, causing a false ValueError.
💡 Proposed fix
- files: list[Any] = msg.get("files") or []
- if not files:
+ raw_files = msg.get("files")
+ if not raw_files:
continue
- text_files = [f for f in files if isinstance(f, TextFile)]
- non_text_files = [f for f in files if not isinstance(f, TextFile)]
+ files = (
+ list(raw_files.values())
+ if isinstance(raw_files, dict)
+ else list(raw_files)
+ )
+ text_files = [f for f in files if isinstance(f, TextFile)]
+ non_text_files = [f for f in files if not isinstance(f, TextFile)]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@lib/crewai/src/crewai/llms/base_llm.py` around lines 743 - 748, The code
assumes msg.get("files") is list-like before checking TextFile, which breaks
when callers pass a dict or single file (e.g., input_files={"file":
TextFile(...)}) causing TextFile checks to fail; update the logic around the
files variable (where files: list[Any] = msg.get("files") is set and used) to
normalize files into a list first — if files is a dict, replace it with
list(files.values()), if it's a single File-like object (not list/tuple/set),
wrap it in [files] — then proceed with the existing text_files = [f for f in
files if isinstance(f, TextFile)] and non_text_files handling.
| existing = msg.get("content", "") | ||
| if isinstance(existing, str): | ||
| msg["content"] = f"{existing}\n\n{injected}".strip() | ||
| msg.pop("files", None) |
There was a problem hiding this comment.
Avoid dropping text attachments when content is non-string.
At Line 759, injection only happens for string content; at Line 761 files is still removed. That silently discards attached TextFile content for non-string message bodies.
💡 Proposed fix
if text_files:
injected = "\n\n".join(f.read_text() for f in text_files)
existing = msg.get("content", "")
- if isinstance(existing, str):
- msg["content"] = f"{existing}\n\n{injected}".strip()
+ if not isinstance(existing, str):
+ raise ValueError(
+ "Non-multimodal models require string message content "
+ "when injecting TextFile attachments."
+ )
+ msg["content"] = f"{existing}\n\n{injected}".strip()
msg.pop("files", None)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| existing = msg.get("content", "") | |
| if isinstance(existing, str): | |
| msg["content"] = f"{existing}\n\n{injected}".strip() | |
| msg.pop("files", None) | |
| existing = msg.get("content", "") | |
| if not isinstance(existing, str): | |
| raise ValueError( | |
| "Non-multimodal models require string message content " | |
| "when injecting TextFile attachments." | |
| ) | |
| msg["content"] = f"{existing}\n\n{injected}".strip() | |
| msg.pop("files", None) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@lib/crewai/src/crewai/llms/base_llm.py` around lines 758 - 761, The current
code unconditionally drops msg["files"] even when msg["content"] is non-string,
which discards attached TextFile text; update the logic around msg, injected and
files so you only remove/pop "files" when you've actually merged their text into
the message. Concretely, when msg.get("content") is a str, keep the existing
injection behavior and then pop "files"; when content is not a str, iterate
msg.get("files", []) and for any TextFile instances extract their text and
append/merge it into msg["content"] (or set a new string content combining
existing non-string content and the extracted text) before popping only those
files you consumed; if there are non-TextFile attachments, leave msg["files"]
untouched. Ensure you reference msg, injected, "files" and the TextFile class
when implementing the fix.
Summary
Fixes #5137
_process_message_files()previously raised aValueErrorfor any file attached to a non-vision model, includingTextFileobjects whose content is pure text requiring no visual understanding.Root cause
The guard was a blunt
any(msg.get("files")...)check that did not distinguish between file types.Fix
Separate
TextFileinstances from genuinely visual inputs (ImageFile,PDFFile,AudioFile,VideoFile):TextFileon a non-multimodal model → callread_text()and append the decoded content to the messagecontentfield, then strip thefileskey. The model receives plain text with no binary blobs.ValueErrorwith the original message directing the user to a vision-capable model.TextFileis already exported fromcrewai_files; it is added alongside the existingformat_multimodal_contentimport inside thetry/except ImportErrorguard so the overall feature-flag logic is unchanged.Tests
Five new regression tests in
TestTextFileNonMultimodalInjection:ImageFileon non-multimodal still raisesTextFile+ImageFilestill raises (because of the image)Summary by CodeRabbit