
fix: inject TextFile content as text on non-multimodal models #5962

Open
felipebridge wants to merge 1 commit into crewAIInc:main from felipebridge:fix/text-file-injection-non-multimodal

Conversation

@felipebridge commented May 28, 2026

Summary

Fixes #5137

_process_message_files() previously raised a ValueError for any file attached to a non-vision model, including TextFile objects whose content is pure text requiring no visual understanding.

Root cause

The guard was a blunt any(msg.get("files")...) check that did not distinguish between file types.

Fix

Separate TextFile instances from the non-text inputs that genuinely need multimodal support (ImageFile, PDFFile, AudioFile, VideoFile):

  • TextFile on a non-multimodal model → call read_text() and append the decoded content to the message content field, then strip the files key. The model receives plain text with no binary blobs.
  • Any other file type on a non-multimodal model → keep raising ValueError with the original message directing the user to a vision-capable model.
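The branching described above can be sketched as follows. This is a minimal, self-contained approximation, not the PR's actual diff: `TextFile` and `ImageFile` here are hypothetical stand-in classes (the real ones live in `crewai_files`), `process_message_files` and `supports_multimodal` are illustrative names for `_process_message_files()` and the model's capability check, and the error text is illustrative rather than crewAI's exact message.

```python
from dataclasses import dataclass
from typing import Any


# Hypothetical stand-ins for crewai_files.TextFile / ImageFile.
@dataclass
class TextFile:
    source: bytes

    def read_text(self, encoding: str = "utf-8") -> str:
        # Decode the raw bytes into a string.
        return self.source.decode(encoding)


@dataclass
class ImageFile:
    source: bytes


def process_message_files(
    messages: list[dict[str, Any]], supports_multimodal: bool
) -> list[dict[str, Any]]:
    """Inline TextFile content for text-only models; reject anything else."""
    if supports_multimodal:
        return messages  # multimodal formatting path is out of scope here

    for msg in messages:
        # Assumes a list-shaped "files" value and string "content".
        files = msg.get("files") or []
        if not files:
            continue  # nothing attached: pass through unchanged
        non_text = [f for f in files if not isinstance(f, TextFile)]
        if non_text:
            # Any image/PDF/audio/video attachment still fails loudly.
            # (Illustrative error text, not crewAI's exact message.)
            raise ValueError(
                "Model does not support multimodal input. "
                "Use a vision-capable model."
            )
        injected = "\n\n".join(f.read_text() for f in files)
        existing = msg.get("content", "")
        msg["content"] = f"{existing}\n\n{injected}".strip()
        msg.pop("files", None)  # the model only ever sees plain text
    return messages
```

Note that the non-text check runs before any mutation, so a mixed TextFile + ImageFile message is rejected without being partially rewritten. The sketch deliberately assumes list-shaped `files` and string `content`, the two assumptions the review below questions.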

TextFile is already exported from crewai_files; its import is added alongside the existing format_multimodal_content import inside the try/except ImportError guard, so the overall feature-flag logic is unchanged.
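The guard mentioned above follows the usual optional-dependency pattern. A minimal sketch, assuming the names shown: `HAS_CREWAI_FILES` and `is_text_file` are hypothetical, and the real guard in base_llm.py may be structured differently. The short-circuit in `is_text_file` matters: when the import fails, `TextFile` is `None`, and `isinstance(obj, None)` would itself raise `TypeError`.

```python
# Sketch of an optional-dependency guard; the exact crewAI code may differ.
try:
    from crewai_files import TextFile, format_multimodal_content

    HAS_CREWAI_FILES = True
except ImportError:
    # crewai_files is not installed: disable the file-handling feature.
    TextFile = None  # type: ignore[assignment,misc]
    format_multimodal_content = None  # type: ignore[assignment]
    HAS_CREWAI_FILES = False


def is_text_file(obj: object) -> bool:
    """Return True only when crewai_files is available and obj is a TextFile."""
    # Short-circuit first so isinstance() never receives None as a type.
    return HAS_CREWAI_FILES and isinstance(obj, TextFile)
```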

Tests

Five new regression tests in TestTextFileNonMultimodalInjection:

  • text file content injected into message body
  • multiple text files all injected
  • ImageFile on non-multimodal still raises
  • mixed TextFile + ImageFile still raises (because of the image)
  • messages without files pass through unmodified

Summary by CodeRabbit

  • Bug Fixes
    • Text files can now be used with text-only language models—their contents are automatically extracted and integrated into messages
    • Image files and other non-text attachments continue to be properly rejected for non-multimodal models
    • Mixed text and image file attachments are still rejected for non-multimodal models, because of the image

@coderabbitai Bot commented May 28, 2026

📝 Walkthrough


This PR implements graceful TextFile handling for non-multimodal LLMs. Instead of rejecting all file attachments when a model lacks vision capabilities, BaseLLM now extracts text content from TextFile objects and injects it directly into the message text, allowing text-only models to process text files while still rejecting image and other non-text attachments. A new regression test suite validates the injection behavior and the error cases.

Changes

TextFile Content Injection for Non-Multimodal Models

  • TextFile import and non-multimodal content injection (lib/crewai/src/crewai/llms/base_llm.py): Extended the conditional crewai_files import to include TextFile. Refactored the non-multimodal path of _process_message_files from a blanket file rejection to per-message iteration: TextFile content is read and appended to the message content, the files field is removed, and non-text attachments still raise ValueError.
  • Regression tests for TextFile non-multimodal handling (lib/crewai/tests/llms/test_multimodal.py): Added the TestTextFileNonMultimodalInjection test class covering non-multimodal model scenarios (gpt-3.5-turbo, litellm): validates TextFile injection into message content, verifies ImageFile rejection, confirms the mixed-file error, and ensures pass-through of messages without files.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

A TextFile walks through modeless door,
Its words injected, content soars!
No vision needed, just plain text,
The crew moves forward, all are blessed. 🐰✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description Check: ✅ Passed. Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title accurately summarizes the main fix: injecting TextFile content as text for non-multimodal models, which directly addresses the linked issue.
  • Linked Issues check: ✅ Passed. The PR implementation fully addresses issue #5137 requirements: TextFile content is injected as text for non-multimodal models, other file types still raise ValueError, and regression tests verify the behavior.
  • Out of Scope Changes check: ✅ Passed. All changes are scoped to fix TextFile handling on non-multimodal models and add corresponding regression tests, with no unrelated modifications.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 87.50%, which is sufficient. The required threshold is 80.00%.


@coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
lib/crewai/tests/llms/test_multimodal.py (1)

390-447: ⚡ Quick win

Add regression coverage for dict-shaped files payloads.

The new tests only exercise list-based files. Given the issue repro uses input_files as a mapping, add a direct dict-based case to lock this contract.

✅ Suggested test
 class TestTextFileNonMultimodalInjection:
@@
     def test_text_file_injected_into_message_content(self) -> None:
@@
         assert "Summarise this:" in result[0]["content"]
+
+    def test_text_file_dict_payload_injected(self) -> None:
+        """Dict-based files payload is also injected for non-vision models."""
+        llm = self._make_non_vision_llm()
+        msg: dict = {
+            "role": "user",
+            "content": "Summarise this:",
+            "files": {"file": TextFile(source=b"dict path text")},
+        }
+
+        result = llm._process_message_files([msg])
+
+        assert "files" not in result[0]
+        assert "dict path text" in result[0]["content"]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/crewai/tests/llms/test_multimodal.py` around lines 390 - 447, Add a test
that covers dict-shaped files payloads for _process_message_files: create a msg
where "files" is a mapping (e.g., {"a.txt": TextFile(source=b"dict file")}) and
call llm = self._make_non_vision_llm(); result =
llm._process_message_files([msg]); assert the text was injected into
result[0]["content"], assert "files" key is removed, and for multimodal error
cases do a similar mapping with ImageFile to assert ValueError from
_process_message_files; this locks the contract for dict-based input alongside
the existing list-based tests.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 21f8d49f-9200-451b-8b9d-ff57b0697d32

📥 Commits

Reviewing files that changed from the base of the PR and between 2148c7e and 6244523.

📒 Files selected for processing (2)
  • lib/crewai/src/crewai/llms/base_llm.py
  • lib/crewai/tests/llms/test_multimodal.py

Comment on lines +743 to +748
files: list[Any] = msg.get("files") or []
if not files:
    continue
text_files = [f for f in files if isinstance(f, TextFile)]
non_text_files = [f for f in files if not isinstance(f, TextFile)]
if non_text_files:

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Normalize files input before TextFile type checks.

At Line 743, this path assumes list-like file objects. If files arrives as a dict (the issue repro uses input_files={"file": TextFile(...)}), iteration happens over keys and TextFile detection fails, causing a false ValueError.

💡 Proposed fix
-                files: list[Any] = msg.get("files") or []
-                if not files:
+                raw_files = msg.get("files")
+                if not raw_files:
                     continue
-                text_files = [f for f in files if isinstance(f, TextFile)]
-                non_text_files = [f for f in files if not isinstance(f, TextFile)]
+                files = (
+                    list(raw_files.values())
+                    if isinstance(raw_files, dict)
+                    else list(raw_files)
+                )
+                text_files = [f for f in files if isinstance(f, TextFile)]
+                non_text_files = [f for f in files if not isinstance(f, TextFile)]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/crewai/src/crewai/llms/base_llm.py` around lines 743 - 748, The code
assumes msg.get("files") is list-like before checking TextFile, which breaks
when callers pass a dict or single file (e.g., input_files={"file":
TextFile(...)}) causing TextFile checks to fail; update the logic around the
files variable (where files: list[Any] = msg.get("files") is set and used) to
normalize files into a list first — if files is a dict, replace it with
list(files.values()), if it's a single File-like object (not list/tuple/set),
wrap it in [files] — then proceed with the existing text_files = [f for f in
files if isinstance(f, TextFile)] and non_text_files handling.
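The normalization this comment asks for can be isolated in a small helper. A sketch under assumptions: `normalize_files` is a hypothetical name and `TextFile` below is a stand-in class, not the `crewai_files` one. Unlike the proposed diff, whose `list(raw_files)` branch would try to iterate a bare file object, this version also wraps a single file, as the agent prompt suggests.

```python
from collections.abc import Mapping
from typing import Any


class TextFile:
    """Hypothetical stand-in for crewai_files.TextFile."""

    def __init__(self, source: bytes) -> None:
        self.source = source


def normalize_files(raw: Any) -> list[Any]:
    """Coerce a message's "files" value into a flat list of file objects.

    Handles None, a mapping such as {"file": TextFile(...)}, a
    list/tuple/set, or a single bare file object.
    """
    if raw is None:
        return []
    if isinstance(raw, Mapping):
        # Iterating a dict yields its keys, so take the values explicitly.
        return list(raw.values())
    if isinstance(raw, (list, tuple, set)):
        return list(raw)
    # A bare file object: wrap it rather than trying to iterate it.
    return [raw]
```

With this in place, the existing `text_files` / `non_text_files` comprehensions can run unchanged over `normalize_files(msg.get("files"))`.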

Comment on lines +758 to +761
existing = msg.get("content", "")
if isinstance(existing, str):
    msg["content"] = f"{existing}\n\n{injected}".strip()
msg.pop("files", None)

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid dropping text attachments when content is non-string.

At Line 759, injection only happens for string content; at Line 761 files is still removed. That silently discards attached TextFile content for non-string message bodies.

💡 Proposed fix
                 if text_files:
                     injected = "\n\n".join(f.read_text() for f in text_files)
                     existing = msg.get("content", "")
-                    if isinstance(existing, str):
-                        msg["content"] = f"{existing}\n\n{injected}".strip()
+                    if not isinstance(existing, str):
+                        raise ValueError(
+                            "Non-multimodal models require string message content "
+                            "when injecting TextFile attachments."
+                        )
+                    msg["content"] = f"{existing}\n\n{injected}".strip()
                     msg.pop("files", None)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/crewai/src/crewai/llms/base_llm.py` around lines 758 - 761, The current
code unconditionally drops msg["files"] even when msg["content"] is non-string,
which discards attached TextFile text; update the logic around msg, injected and
files so you only remove/pop "files" when you've actually merged their text into
the message. Concretely, when msg.get("content") is a str, keep the existing
injection behavior and then pop "files"; when content is not a str, iterate
msg.get("files", []) and for any TextFile instances extract their text and
append/merge it into msg["content"] (or set a new string content combining
existing non-string content and the extracted text) before popping only those
files you consumed; if there are non-TextFile attachments, leave msg["files"]
untouched. Ensure you reference msg, injected, "files" and the TextFile class
when implementing the fix.
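The agent prompt's "merge rather than drop" option can be sketched as an alternative to the reviewer's raise-on-non-string diff. This assumes OpenAI-style messages, where `content` is either a string or a list of parts such as `{"type": "text", "text": "..."}`; whether crewAI's non-string content always takes that shape is an assumption, and `inject_text` is a hypothetical helper name. It also assumes the caller has already rejected non-text attachments, so every file in the message was consumed.

```python
from typing import Any


def inject_text(msg: dict[str, Any], injected: str) -> None:
    """Merge injected TextFile text into msg["content"] without losing it.

    Assumes OpenAI-style messages: content is a plain string or a list of
    content parts such as {"type": "text", "text": "..."}.
    """
    existing = msg.get("content", "")
    if existing is None:
        existing = ""
    if isinstance(existing, str):
        msg["content"] = f"{existing}\n\n{injected}".strip()
    elif isinstance(existing, list):
        # Append the file text as an extra text part instead of dropping it.
        msg["content"] = [*existing, {"type": "text", "text": injected}]
    else:
        # Unknown shape: fail loudly rather than silently discard the text.
        raise ValueError(
            f"Unsupported message content type: {type(existing).__name__}"
        )
    # Remove attachments only after their text has been merged; the caller
    # is assumed to have rejected non-text files already.
    msg.pop("files", None)
```

Because `files` is popped only on the success paths, an unsupported content shape leaves the message untouched, which addresses the silent-discard concern in both suggestions.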


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] / [HELP] "Model does not support multimodal input [...] Use a vision-capable model" for TextFile input

1 participant