How should this method be optimized to target multiple test datasets, including GSM8K and MMLU? #3

@anaivebird

Description

Is this method only suitable for fine-tuning on GSM8K, and not for optimizing across all metrics? We've found that while GSM8K performance remains strong, other metrics such as MMLU drop off sharply.
