feat: add DPG (Descriptive Prompt Generation) benchmark #509
Conversation
…mpts benchmark

- Introduced `from_benchmark` method in `PrunaDataModule` to create instances from benchmark classes.
- Added `Benchmark`, `BenchmarkEntry`, and `BenchmarkRegistry` classes for managing benchmarks.
- Implemented `PartiPrompts` benchmark for text-to-image generation with various categories and challenges.
- Created utility function `benchmark_to_datasets` to convert benchmarks into datasets compatible with `PrunaDataModule`.
- Added integration tests for benchmark functionality and data module interactions.
…filtering

- Remove heavy benchmark abstraction (`Benchmark` class, registry, adapter, 24 subclasses)
- Extend `setup_parti_prompts_dataset` with `category` and `num_samples` params
- Add `BenchmarkInfo` dataclass for metadata (metrics, description, subsets)
- Switch PartiPrompts to `prompt_with_auxiliaries_collate` to preserve Category/Challenge
- Merge tests into `test_datamodule.py`

Reduces 964 lines to 128 lines (87% reduction).

Co-authored-by: Cursor <cursoragent@cursor.com>
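The `BenchmarkInfo` dataclass mentioned above could look roughly like the following. This is a minimal sketch based only on the commit message; the field names and defaults are assumptions, not the actual Pruna implementation.

```python
from dataclasses import dataclass, field

@dataclass
class BenchmarkInfo:
    """Lightweight metadata describing a benchmark dataset (illustrative sketch)."""

    description: str                  # human-readable summary of the benchmark
    metrics: list[str]                # registered metric names, e.g. ["clip_score"]
    task_type: str = "text_to_image"  # task family the benchmark targets
    subsets: list[str] = field(default_factory=list)  # optional category subsets

# Example entry; the values here are illustrative, not the repo's real metadata.
parti = BenchmarkInfo(
    description="PartiPrompts text-to-image prompts",
    metrics=["clip_score", "clipiqa"],
    subsets=["Abstract", "Animals"],
)
print(parti.task_type)  # text_to_image
```

Using a plain dataclass keeps the metadata declarative and avoids the class hierarchy the previous design required.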
Document all dataclass fields per Numpydoc PR01, with the summary on a new line per GL01.

Co-authored-by: Cursor <cursoragent@cursor.com>
- Add `list_benchmarks()` to filter benchmarks by task type
- Add `get_benchmark_info()` to retrieve benchmark metadata
- Add COCO, ImageNet, WikiText to the `benchmark_info` registry

Co-authored-by: Cursor <cursoragent@cursor.com>
Update benchmark metrics to match registered names:

- `clip` -> `clip_score`
- `clip_iqa` -> `clipiqa`
- Remove unimplemented `top5_accuracy`

Co-authored-by: Cursor <cursoragent@cursor.com>
- Add `setup_oneig_text_rendering_dataset` in `datasets/prompt.py`
- Register OneIGTextRendering in `base_datasets`
- Add `BenchmarkInfo` entry with `clip_score`, `clipiqa` metrics
- Auxiliaries include `text_content` for OCR evaluation
- Add test for loading and auxiliaries

Co-authored-by: Cursor <cursoragent@cursor.com>
- Add `setup_oneig_alignment_dataset` in `datasets/prompt.py`
- Support category filter (Anime_Stylization, Portrait, General_Object)
- Register OneIGAlignment in `base_datasets`
- Add `BenchmarkInfo` entry with accuracy metric, task_type text_generation
- Auxiliaries include questions, dependencies, category
- Add test for loading with category filter

Co-authored-by: Cursor <cursoragent@cursor.com>
- Add `setup_dpg_dataset` in `datasets/prompt.py`
- Support category filter (entity, attribute, relation, global, other)
- Register DPG in `base_datasets`
- Add `BenchmarkInfo` entry with accuracy metric, task_type text_generation
- Auxiliaries include questions, category_broad
- Add test for loading with category filter

Co-authored-by: Cursor <cursoragent@cursor.com>
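The category filtering described for `setup_dpg_dataset` amounts to selecting rows whose broad category matches, optionally truncating to a sample budget. The helper below is a self-contained sketch of that idea; the function name, row shape, and sample prompts are assumptions for illustration, not the repo's actual code.

```python
def filter_by_category(rows, category=None, num_samples=None):
    """Keep rows whose `category_broad` matches, optionally capped at num_samples."""
    kept = [r for r in rows if category is None or r["category_broad"] == category]
    return kept[:num_samples] if num_samples is not None else kept

# Toy rows mimicking DPG prompts with a category_broad auxiliary (made-up data).
rows = [
    {"prompt": "a red cube on a table", "category_broad": "entity"},
    {"prompt": "two dogs beside a tree", "category_broad": "relation"},
    {"prompt": "a shiny metallic sphere", "category_broad": "attribute"},
]
print(len(filter_by_category(rows, category="entity")))  # 1
```

Passing `category=None` returns all rows, which keeps the unfiltered path identical to loading the full dataset.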
Cursor Bugbot has reviewed your changes and found 3 potential issues.
- Fix `task_type` from text_generation to text_to_image for DPG, OneIGAlignment, and OneIGTextRendering
- Remove unused imports in the test file

Co-authored-by: Cursor <cursoragent@cursor.com>
This PR has been inactive for 10 days and is now marked as stale. |
Closes #516
Summary
Usage
Test plan
`PrunaDataModule.from_string("DPG")` works