Add Response Model Support for Model Adapters #49

Merged
cemde merged 7 commits into main from feature/add-instructor-library
Mar 22, 2026

@cemde cemde commented Mar 22, 2026

Description

MASEval's simulators (tool, user, agentic user) relied on hand-rolled JSON extraction
and manual retry loops to coerce LLM outputs into structured formats. This was fragile:
each simulator reimplemented the same parse-validate-retry logic, and malformed JSON
was a recurring source of flaky benchmark runs.

This PR introduces the instructor library as a core dependency and threads its
response_model support through the entire model adapter stack. Key changes:

  • ModelAdapter.chat() gains a response_model parameter. Pass any Pydantic
    BaseModel class and get a validated instance back in
    ChatResponse.structured_response. All four provider adapters (OpenAI, Anthropic,
    Google GenAI, LiteLLM) implement _structured_chat() using instructor-patched clients.

  • Simulators are simplified. LLMSimulator.__call__ now delegates structured output
    handling to instructor via the adapter's response_model support, replacing ~150 lines
    of manual JSON parsing and retry logic with a single chat() call. Each simulator
    declares its expected schema as a class-level _response_model attribute.

  • New maseval.core.instructor module provides create_instructor_client() for
    wrapping provider SDK clients, and flatten_model_schema() for generating
    provider-compatible JSON schemas from Pydantic models (used by Tau2's tool parameter
    generation, replacing its own _flatten_schema()).
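
As a rough illustration of the parse-validate-retry pattern that the response_model
path takes off the simulators' hands, here is a minimal, dependency-free sketch. It is
not MASEval's or instructor's implementation: a stdlib dataclass stands in for a
Pydantic BaseModel, and every name in it (ToolResult, SketchResponse, validate, chat,
make_flaky_llm) is hypothetical. Only the idea of passing a schema class and reading a
validated instance from structured_response is taken from the description above.

```python
import json
from dataclasses import dataclass, fields
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class ToolResult:
    """Hypothetical schema; a dataclass standing in for a Pydantic BaseModel."""
    status: str
    value: int


@dataclass
class SketchResponse:
    """Hypothetical stand-in for a chat response carrying a structured result."""
    content: str
    structured_response: Optional[Any] = None


def validate(schema: type, raw: str) -> Any:
    """Parse raw JSON and check field names and types against the dataclass."""
    data = json.loads(raw)  # JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = {f.name: f.type for f in fields(schema)}
    if set(data) != set(expected):
        raise ValueError(f"expected fields {sorted(expected)}, got {sorted(data)}")
    for name, typ in expected.items():
        if not isinstance(data[name], typ):
            raise ValueError(f"field {name!r} is not of type {typ.__name__}")
    return schema(**data)


def chat(
    llm: Callable[[str], str],
    prompt: str,
    response_model: Optional[type] = None,
    max_retries: int = 2,
) -> SketchResponse:
    """Call the model; if a schema is given, retry until the output validates."""
    last_error: Optional[Exception] = None
    for _ in range(max_retries + 1):
        # On a retry, feed the validation error back to the model.
        text = prompt if last_error is None else (
            f"{prompt}\nPrevious output was invalid: {last_error}"
        )
        raw = llm(text)
        if response_model is None:
            return SketchResponse(content=raw)
        try:
            parsed = validate(response_model, raw)
            return SketchResponse(content=raw, structured_response=parsed)
        except ValueError as exc:
            last_error = exc
    raise ValueError(f"no valid output after {max_retries + 1} attempts: {last_error}")


def make_flaky_llm() -> Tuple[Callable[[str], str], List[str]]:
    """Fake model: the first reply is truncated JSON, the second is valid."""
    replies = iter(['{"status": "ok", "value": ', '{"status": "ok", "value": 3}'])
    calls: List[str] = []

    def llm(prompt: str) -> str:
        calls.append(prompt)
        return next(replies)

    return llm, calls


llm, calls = make_flaky_llm()
response = chat(llm, "Simulate the tool call.", response_model=ToolResult)
print(response.structured_response, "after", len(calls), "calls")
```

With this shape, a caller only declares the schema and reads structured_response; the
retry and error feedback live in one place rather than in each simulator.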

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Code quality improvement (refactoring, formatting, etc.)

Checklist

Contribution

Documentation

  • Added/updated docstrings for new/modified functions as instructed in CONTRIBUTING.md
  • Updated relevant documentation in docs/ (if applicable)
  • Tag GitHub issue with this PR (if applicable)

Changelog

  • Added entry to CHANGELOG.md under [Unreleased] section
    • Use Added section for new features
    • Use Changed section for modifications to existing functionality
    • Use Fixed section for bug fixes
    • Use Removed section for deprecated/removed features
  • OR this is a documentation-only change (no changelog needed)

Example:
- Support for multi-agent tracing (PR:#123)

Architecture (if applicable)

  • Core/Interface separation: Changes in maseval/core/ do NOT import from maseval/interface/
  • Dependencies: New core dependencies added sparingly; framework integrations go to optional dependencies

Additional Notes

github-actions bot commented Mar 22, 2026

Coverage report

File                                        Lines missing
maseval/__init__.py
maseval/benchmark/tau2/tau2.py
maseval/benchmark/tau2/domains/base.py
maseval/core/instructor.py                  89
maseval/core/model.py
maseval/core/simulator.py                   224
maseval/interface/inference/anthropic.py    388-390
maseval/interface/inference/google_genai.py 334-336
maseval/interface/inference/litellm.py      220-221
maseval/interface/inference/openai.py       304-306
Project Total

This report was generated by python-coverage-comment-action

@cemde marked this pull request as ready for review March 22, 2026 21:46
@cemde added the enhancement, interface, and core labels Mar 22, 2026
@cemde merged commit ff413c8 into main Mar 22, 2026
Labels

  • core: In regards to the core package `maseval/core`
  • enhancement: New feature or request
  • interface: regarding the `maseval/interface` subpackage.
