Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a --max_image_pixels parameter to limit the pixel count of input images, automatically resizing them if they exceed the threshold (defaulting to ~4K resolution). The implementation includes updates to the CLI, documentation, and multimodal processing logic. Feedback highlights several critical areas for improvement: a potential infinite loop if the pixel limit is set to zero, a possible ZeroDivisionError when handling zero-pixel inputs, and performance inefficiencies caused by redundant image decoding and multiple thread pool executions. Additionally, the hardcoded JPEG conversion may lead to data loss for transparent images, and the high-quality optimization settings could introduce unnecessary CPU overhead.
| while new_w * new_h > max_image_pixels: | ||
| if new_w >= new_h: | ||
| new_w = max(1, new_w - 1) | ||
| else: | ||
| new_h = max(1, new_h - 1) |
| src_w, src_h = await loop.run_in_executor(_IMAGE_VERIFY_POOL, _verify_image_bytes, img_data) | ||
| # 2) Resize (or no-op) after verification. | ||
| img_data, resized_w, resized_h = await loop.run_in_executor( | ||
| _IMAGE_VERIFY_POOL, | ||
| _resize_image_bytes_if_needed, | ||
| img_data, | ||
| src_w, | ||
| src_h, | ||
| max_image_pixels, | ||
| ) |
There was a problem hiding this comment.
The current implementation performs image verification and resizing in two separate run_in_executor calls. This is inefficient because _verify_image_bytes already decodes the image (via image.load()), and _resize_image_bytes_if_needed decodes it again (via Image.open()). Additionally, large image bytes are passed between the event loop and the thread pool twice.
Consider combining these operations into a single helper function to avoid redundant decoding and overhead.
| if old_pixels <= max_image_pixels: | ||
| return src_w, src_h | ||
|
|
||
| scale = (max_image_pixels / old_pixels) ** 0.5 |
There was a problem hiding this comment.
Potential ZeroDivisionError if old_pixels is 0. While _verify_image_bytes should catch invalid images, src_w and src_h can be 0 if provided via the image_size type (lines 148-149).
| scale = (max_image_pixels / old_pixels) ** 0.5 | |
| if old_pixels <= max_image_pixels or old_pixels == 0: | |
| return src_w, src_h |
| resized_image = image.resize((new_w, new_h), resampling).convert("RGB") | ||
|
|
||
| buffer = BytesIO() | ||
| resized_image.save(buffer, format="JPEG", quality=96, optimize=True) |
There was a problem hiding this comment.
Hardcoding format="JPEG" and quality=96 with optimize=True might be suboptimal.
- If the input was a PNG with transparency,
convert("RGB")will result in a black background, and the alpha channel will be lost. optimize=Truecan be CPU-intensive for a real-time server.quality=96is very high;90is usually sufficient for VLM tasks and results in smaller payloads.
No description provided.