improve multimodal image preprocessing with max_image_pixels auto-resize #1309

Merged
shihaobai merged 2 commits into main from wzj_dev
May 14, 2026

@hiworldwzj
Collaborator

No description provided.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a --max_image_pixels parameter to limit the pixel count of input images, automatically resizing them if they exceed the threshold (defaulting to ~4K resolution). The implementation includes updates to the CLI, documentation, and multimodal processing logic. Feedback highlights several critical areas for improvement: a potential infinite loop if the pixel limit is set to zero, a possible ZeroDivisionError when handling zero-pixel inputs, and performance inefficiencies caused by redundant image decoding and multiple thread pool executions. Additionally, the hardcoded JPEG conversion may lead to data loss for transparent images, and the high-quality optimization settings could introduce unnecessary CPU overhead.

Comment on lines +309 to +313
while new_w * new_h > max_image_pixels:
    if new_w >= new_h:
        new_w = max(1, new_w - 1)
    else:
        new_h = max(1, new_h - 1)

high

If max_image_pixels is set to 0, this loop never terminates: once both dimensions reach 1, new_w * new_h stays at 1, which is still greater than 0, and max(1, new_w - 1) keeps the value at 1. Ensure max_image_pixels is at least 1 before entering this loop.
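
A minimal sketch of the guard described above, wrapped around the loop from the diff. The function name `shrink_to_fit` and the choice to raise `ValueError` are assumptions for illustration, not the PR's actual code.

```python
def shrink_to_fit(new_w: int, new_h: int, max_image_pixels: int) -> tuple[int, int]:
    """Trim the longer side one pixel at a time until the area fits the budget."""
    # Hypothetical guard: with a budget below 1, a 1x1 image can never
    # satisfy the loop condition, so reject the value up front.
    if max_image_pixels < 1:
        raise ValueError("max_image_pixels must be at least 1")
    # Same loop as in the diff; it now always terminates because a 1x1
    # image (area 1) fits any budget of 1 or more.
    while new_w * new_h > max_image_pixels:
        if new_w >= new_h:
            new_w = max(1, new_w - 1)
        else:
            new_h = max(1, new_h - 1)
    return new_w, new_h
```

Validating the flag once at startup (for example, where the CLI argument is parsed) would serve the same purpose and keep the hot path free of the check.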

Comment on lines +165 to +174
src_w, src_h = await loop.run_in_executor(_IMAGE_VERIFY_POOL, _verify_image_bytes, img_data)
# 2) Resize (or no-op) after verification.
img_data, resized_w, resized_h = await loop.run_in_executor(
    _IMAGE_VERIFY_POOL,
    _resize_image_bytes_if_needed,
    img_data,
    src_w,
    src_h,
    max_image_pixels,
)

medium

The current implementation performs image verification and resizing in two separate run_in_executor calls. This is inefficient because _verify_image_bytes already decodes the image (via image.load()), and _resize_image_bytes_if_needed decodes it again (via Image.open()). Additionally, large image bytes are passed between the event loop and the thread pool twice.

Consider combining these operations into a single helper function to avoid redundant decoding and overhead.
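
A rough sketch of the single-call shape suggested here, using only the standard library. The decoder is a toy stand-in (it reads width and height from the first two bytes) so the example runs without an imaging library; `_POOL`, `_decode`, `_verify_and_resize`, and `preprocess` are hypothetical names and do not reflect the PR's code.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the PR's _IMAGE_VERIFY_POOL.
_POOL = ThreadPoolExecutor(max_workers=2)


def _decode(img_data: bytes) -> tuple[int, int]:
    """Toy decoder: width and height are the first two bytes.

    A real helper would open and load the image exactly once here.
    """
    if len(img_data) < 2 or img_data[0] == 0 or img_data[1] == 0:
        raise ValueError("invalid image data")
    return img_data[0], img_data[1]


def _verify_and_resize(img_data: bytes, max_image_pixels: int) -> tuple[bytes, int, int]:
    """Verify and, if needed, resize in one worker call with one decode."""
    w, h = _decode(img_data)  # single decode shared by both steps
    if w * h <= max_image_pixels:
        return img_data, w, h  # no-op path hands back the original bytes
    scale = (max_image_pixels / (w * h)) ** 0.5
    # The max(1, ...) floor can overshoot the budget for extreme aspect
    # ratios; a follow-up shrink step is omitted from this sketch.
    new_w, new_h = max(1, int(w * scale)), max(1, int(h * scale))
    # Toy "re-encode": rewrite the two header bytes with the new size.
    return bytes([new_w, new_h]) + img_data[2:], new_w, new_h


async def preprocess(img_data: bytes, max_image_pixels: int) -> tuple[bytes, int, int]:
    loop = asyncio.get_running_loop()
    # One run_in_executor call, so the payload is handed to the pool once.
    return await loop.run_in_executor(_POOL, _verify_and_resize, img_data, max_image_pixels)
```

With this shape the event loop awaits one future per image instead of two, and the decoded image can be reused for the resize rather than reopened from bytes.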

if old_pixels <= max_image_pixels:
    return src_w, src_h

scale = (max_image_pixels / old_pixels) ** 0.5

medium

Potential ZeroDivisionError if old_pixels is 0. While _verify_image_bytes should catch invalid images, src_w and src_h can be 0 if provided via the image_size type (lines 148-149).

Suggested change
scale = (max_image_pixels / old_pixels) ** 0.5
if old_pixels <= max_image_pixels or old_pixels == 0:
    return src_w, src_h
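
A small sketch of the guarded size computation, for illustration only. Note that with the early return shown in the snippet, a zero `old_pixels` reaches the division only when `max_image_pixels` is negative, so the explicit check mainly documents intent. The name `scaled_size` and the use of `int()` truncation are assumptions; the PR's rounding may differ.

```python
def scaled_size(src_w: int, src_h: int, max_image_pixels: int) -> tuple[int, int]:
    """Return a size whose area is scaled toward max_image_pixels.

    Assumes the caller has already validated max_image_pixels >= 1.
    """
    old_pixels = src_w * src_h
    # Explicit zero check first, so the division below can never see 0
    # even if the budget comparison is changed later.
    if old_pixels == 0 or old_pixels <= max_image_pixels:
        return src_w, src_h
    # Scale both sides by the same factor so the aspect ratio is kept.
    scale = (max_image_pixels / old_pixels) ** 0.5
    new_w = max(1, int(src_w * scale))
    new_h = max(1, int(src_h * scale))
    return new_w, new_h
```

For example, a 4000 x 3000 input with a budget of 3,000,000 pixels gives a scale of 0.5 and a target of 2000 x 1500.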

resized_image = image.resize((new_w, new_h), resampling).convert("RGB")

buffer = BytesIO()
resized_image.save(buffer, format="JPEG", quality=96, optimize=True)

medium

Hardcoding format="JPEG" with quality=96 and optimize=True may be suboptimal:

  1. If the input is a PNG with transparency, convert("RGB") drops the alpha channel, so transparent regions typically come out black.
  2. optimize=True adds an extra encoder pass, which can be CPU-intensive for a real-time server.
  3. quality=96 is very high; 90 is usually sufficient for VLM tasks and results in smaller payloads.
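
One possible way to act on these points is to pick the encoder settings from the image mode instead of hardcoding them. The sketch below is pure Python and only returns keyword arguments; `choose_save_params` and its `has_transparency_info` flag are hypothetical, and the PNG fallback is an assumption about what the project would want, not something the PR does.

```python
def choose_save_params(mode: str, has_transparency_info: bool = False) -> dict:
    """Pick save settings from a Pillow-style mode string.

    mode follows Pillow's naming ("RGB", "RGBA", "LA", "P", ...).
    has_transparency_info stands for a palette image whose info dict
    carries a "transparency" entry.
    """
    has_alpha = mode in ("RGBA", "LA", "PA") or (mode == "P" and has_transparency_info)
    if has_alpha:
        # Keep the alpha channel by staying lossless instead of
        # flattening to RGB for JPEG.
        return {"format": "PNG"}
    # Opaque image: JPEG at the quality suggested in the review, without
    # the extra optimize pass. Modes such as "P" would still need
    # convert("RGB") before a JPEG save.
    return {"format": "JPEG", "quality": 90}
```

With Pillow this could be used as `image.save(buffer, **choose_save_params(image.mode, "transparency" in image.info))`. Whether downstream consumers accept PNG payloads is a separate question the PR discussion does not settle.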

@shihaobai shihaobai merged commit 45e8cca into main May 14, 2026
1 check passed
@shihaobai shihaobai deleted the wzj_dev branch May 14, 2026 12:01