feat: add ImgEdit benchmark with edit type subsets #517

Merged
davidberenstein1957 merged 16 commits into feat/add-partiprompts-benchmark-to-pruna from feat/add-imgedit-benchmark
Feb 27, 2026

@davidberenstein1957 (Member) commented Jan 31, 2026

Closes #510

Summary

  • Add ImgEdit benchmark for image editing evaluation with 8 edit type subsets
  • Fetch instructions and judge prompts from GitHub (PKU-YuanGroup/ImgEdit)
  • Support subset filtering: replace, add, remove, adjust, extract, style, background, compose
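The subset filtering listed above can be pictured roughly as follows. This is a minimal sketch, not the code from this PR: the record layout and the `filter_by_edit_type` helper are hypothetical, and only the eight edit type names are taken from the summary.

```python
from typing import Optional

# The eight edit types listed in the summary.
EDIT_TYPES = (
    "replace", "add", "remove", "adjust",
    "extract", "style", "background", "compose",
)


def filter_by_edit_type(records: list[dict], subset: Optional[str] = None) -> list[dict]:
    """Return all records, or only those whose "subset" field matches.

    Hypothetical helper for illustration; pruna's real filter may differ.
    """
    if subset is None:
        return list(records)
    if subset not in EDIT_TYPES:
        raise ValueError(f"Unknown edit type {subset!r}; expected one of {EDIT_TYPES}")
    return [record for record in records if record["subset"] == subset]


# Made-up records, for illustration only.
records = [
    {"text": "Replace the cat with a dog.", "subset": "replace"},
    {"text": "Add a hat to the man.", "subset": "add"},
]
replace_only = filter_by_edit_type(records, "replace")
```

Rejecting unknown names early is one possible design choice; it surfaces a typo in the subset name immediately rather than silently returning an empty selection.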

Usage

from pruna.data import PrunaDataModule

# Load all edit types
dm = PrunaDataModule.from_string("ImgEdit")

# Load specific edit type
dm = PrunaDataModule.from_string("ImgEdit", subset="replace")

Test plan

  • PrunaDataModule.from_string("ImgEdit") works
  • Subset filter works for all 8 edit types
  • Auxiliaries include subset, image_id, judge_prompt fields
  • Docstring tests pass
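As a rough illustration of the auxiliary fields named in the test plan, the sketch below joins instructions with per-edit-type judge prompts. Everything about the input layout is an assumption: the `build_records` helper, the `prompt` and `edit_type` keys, and the sample data are hypothetical and are not taken from the ImgEdit repository or from this PR.

```python
def build_records(instructions: dict, judge_prompts: dict) -> list[dict]:
    """Attach subset, image_id, and judge_prompt to each instruction.

    Assumes (hypothetically) that `instructions` maps an image id to a dict
    with "prompt" and "edit_type" keys, and that `judge_prompts` maps an
    edit type to its judge prompt text.
    """
    records = []
    for image_id, entry in instructions.items():
        edit_type = entry["edit_type"]
        records.append(
            {
                "text": entry["prompt"],
                "subset": edit_type,
                "image_id": image_id,
                # Fall back to an empty prompt if no judge prompt exists for this type.
                "judge_prompt": judge_prompts.get(edit_type, ""),
            }
        )
    return records


# Made-up inputs, for illustration only.
instructions = {"0001": {"prompt": "Replace the cat with a dog.", "edit_type": "replace"}}
judge_prompts = {"replace": "Rate how well the object was replaced."}
records = build_records(instructions, judge_prompts)
```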

Closes #510

- Add setup_imgedit_dataset in datasets/prompt.py
- Support subset filter (replace, add, remove, adjust, extract, style, background, compose)
- Fetch instructions and judge prompts from GitHub (PKU-YuanGroup/ImgEdit)
- Register ImgEdit in base_datasets
- Add BenchmarkInfo entry with accuracy metric, task_type image_edit
- Add test for loading with subset filter

Co-authored-by: Cursor <cursoragent@cursor.com>

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

davidberenstein1957 and others added 3 commits January 31, 2026 17:19
…nting

- Rename subset parameter to category in setup_imgedit_dataset
- Add empty dataset guard before ds.select([0])
- Fix trailing newlines linting issue
- Update tests to use category parameter

Co-authored-by: Cursor <cursoragent@cursor.com>
Prevents crash when category filter produces empty dataset.

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
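The empty dataset guard mentioned in the commits above could look something like the sketch below. It uses a plain sequence as a stand-in for the dataset, and both the `first_sample_or_raise` name and the choice to raise a `ValueError` are assumptions, since the PR page does not show how the guard reacts.

```python
from typing import Sequence


def first_sample_or_raise(samples: Sequence[dict], category: str) -> dict:
    """Return the first sample, failing clearly if the filter left nothing.

    Hypothetical stand-in for guarding a call such as ds.select([0]),
    which would fail on an empty dataset.
    """
    if len(samples) == 0:
        raise ValueError(f"No ImgEdit samples found for category {category!r}.")
    return samples[0]


# Made-up data, for illustration only.
sample = first_sample_or_raise([{"text": "Add a hat to the man."}], "add")
```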
@github-actions

This PR has been inactive for 10 days and is now marked as stale.

…hmark pattern

- Resolve conflicts in prompt.py and test_datamodule.py
- Refactor setup_imgedit_dataset: ImgEditCategory, fraction, test_sample_size, _prepare_test_only_prompt_dataset
- Add ImgEdit to BENCHMARK_CATEGORY_CONFIG

Made-with: Cursor
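The refactor above names `ImgEditCategory`, `fraction`, and `test_sample_size` without showing them. The sketch below is one plausible reading, not the PR's code: it assumes `ImgEditCategory` is a `Literal` over the eight edit types, that `fraction` keeps a leading share of the samples, and that `test_sample_size` caps the final count. The `select_test_samples` helper is hypothetical.

```python
from typing import Literal, Optional, get_args

# Assumed shape of the category type; the eight names come from the PR summary.
ImgEditCategory = Literal[
    "replace", "add", "remove", "adjust",
    "extract", "style", "background", "compose",
]


def select_test_samples(
    samples: list,
    fraction: float = 1.0,
    test_sample_size: Optional[int] = None,
) -> list:
    """Keep a leading fraction of samples, then cap at test_sample_size (assumed semantics)."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in the interval (0, 1]")
    # Keep at least one sample when the input is non-empty.
    keep = max(1, int(len(samples) * fraction)) if samples else 0
    selected = samples[:keep]
    if test_sample_size is not None:
        selected = selected[:test_sample_size]
    return selected


half = select_test_samples(list(range(10)), fraction=0.5)
capped = select_test_samples(list(range(10)), fraction=0.5, test_sample_size=2)
```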
@davidberenstein1957 davidberenstein1957 changed the base branch from main to feat/add-partiprompts-benchmark-to-pruna February 27, 2026 09:08
name="imgedit",
display_name="ImgEdit",
description="Image editing benchmark with 8 edit types for evaluating editing capabilities.",
metrics=["accuracy"],

@davidberenstein1957 (Member, Author) commented:

The metric here should be img_edit_score, but that metric is not implemented in pruna.
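To make the reviewed entry easier to read in context, here is a hedged sketch of what a complete entry might look like. The `BenchmarkInfo` dataclass below is a hypothetical stand-in, not pruna's actual class; the field values come from the snippet under review and from the commit message (`task_type` image_edit).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BenchmarkInfo:
    """Hypothetical stand-in for pruna's BenchmarkInfo; the real fields may differ."""

    name: str
    display_name: str
    description: str
    metrics: list[str] = field(default_factory=list)
    task_type: str = ""


imgedit_info = BenchmarkInfo(
    name="imgedit",
    display_name="ImgEdit",
    description="Image editing benchmark with 8 edit types for evaluating editing capabilities.",
    # Per the review comment, img_edit_score is the intended metric,
    # but it is not implemented in pruna, so accuracy is listed instead.
    metrics=["accuracy"],
    task_type="image_edit",
)
```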

@davidberenstein1957 davidberenstein1957 force-pushed the feat/add-partiprompts-benchmark-to-pruna branch from 3be835c to 7ebb4cd Compare February 27, 2026 10:26