Add Response Model Support for Model Adapters (#49)
Merged
## Description
MASEval's simulators (tool, user, agentic user) relied on hand-rolled JSON extraction and manual retry loops to coerce LLM outputs into structured formats. This was fragile: each simulator reimplemented the same parse-validate-retry logic, and failures from malformed JSON were a recurring source of flaky benchmark runs.
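For context, the pattern being removed looked roughly like the following. This is a hypothetical sketch, not the actual MASEval code; the function and parameter names are illustrative:

```python
import json

def call_with_retries(llm, prompt, validate, max_retries=3):
    """Sketch of the pre-PR pattern: call the LLM, scrape a JSON object
    out of the raw reply text, validate it, and retry on any failure."""
    last_err = None
    for _ in range(max_retries):
        raw = llm(prompt)
        # Hand-rolled extraction: hope the reply contains a JSON object.
        start, end = raw.find("{"), raw.rfind("}")
        if start == -1 or end <= start:
            last_err = ValueError("no JSON object found in reply")
            continue
        try:
            data = json.loads(raw[start:end + 1])
            validate(data)  # schema check; raises ValueError on mismatch
            return data
        except ValueError as err:  # includes json.JSONDecodeError
            last_err = err         # malformed or invalid -> try again
    raise RuntimeError(f"LLM never produced valid JSON: {last_err}")
```

Every simulator carried a variant of this loop; the PR collapses them into a single validated `chat()` call.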
This PR introduces the instructor library as a core dependency and threads its `response_model` support through the entire model adapter stack. Key changes:
- `ModelAdapter.chat()` gains a `response_model` parameter. Pass any Pydantic `BaseModel` class and get a validated instance back in `ChatResponse.structured_response`. All four provider adapters (OpenAI, Anthropic, Google GenAI, LiteLLM) implement `_structured_chat()` using instructor-patched clients.
- Simulators are simplified. `LLMSimulator.__call__` now delegates structured output handling to instructor via the adapter's `response_model` support, replacing ~150 lines of manual JSON parsing and retry logic with a single `chat()` call. Each simulator declares its expected schema as a class-level `_response_model` attribute.
- New `maseval.core.instructor` module provides `create_instructor_client()` for wrapping provider SDK clients, and `flatten_model_schema()` for generating provider-compatible JSON schemas from Pydantic models (used by Tau2's tool parameter generation, replacing its own `_flatten_schema()`).

## Type of Change
## Checklist

### Contribution

### Documentation

- `docs/` (if applicable)

### Changelog

- `CHANGELOG.md` under `[Unreleased]` section.
  Example: `- Support for multi-agent tracing (PR: #123)`

### Architecture (if applicable)

- `maseval/core/` do NOT import from `maseval/interface/`

### Additional Notes
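To illustrate the `response_model` contract the adapters now expose, here is a minimal, self-contained sketch. All names are stand-ins: a plain dataclass substitutes for the Pydantic `BaseModel`, and a fake adapter substitutes for the real provider adapters, which route through `_structured_chat()` and instructor-patched clients rather than parsing JSON directly:

```python
import json
from dataclasses import dataclass
from typing import Any, Optional, Type

@dataclass
class ChatResponse:
    """Stand-in for MASEval's chat response carrying the validated object."""
    text: str
    structured_response: Optional[Any] = None

class FakeAdapter:
    """Sketch of the new ModelAdapter.chat(response_model=...) contract."""

    def __init__(self, canned_reply: str):
        self.canned_reply = canned_reply

    def chat(self, messages, response_model: Optional[Type] = None) -> ChatResponse:
        if response_model is None:
            return ChatResponse(text=self.canned_reply)
        # Real adapters delegate this step to an instructor-patched client;
        # here we just parse the canned reply and instantiate the schema.
        data = json.loads(self.canned_reply)
        return ChatResponse(
            text=self.canned_reply,
            structured_response=response_model(**data),
        )

@dataclass
class UserReply:
    """Example of a simulator's class-level _response_model schema."""
    message: str
    done: bool

adapter = FakeAdapter('{"message": "thanks, bye", "done": true}')
resp = adapter.chat(
    [{"role": "user", "content": "hi"}],
    response_model=UserReply,
)
```

Callers get a typed object back in `structured_response` instead of re-parsing raw text, which is what lets each simulator shrink to a single `chat()` call.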