test: add unit and integration tests for stdlib components (#817)#830
Merged
planetf1 merged 3 commits intogenerative-computing:mainfrom Apr 14, 2026
Merged
Conversation
…e-computing#817) 32 tests across 3 files covering react components, react framework orchestrator, and unit_test_eval. Framework tests use ScriptedBackend fake to exercise real aact/call_tools paths without LLM inference.
Contributor
|
The PR description has been updated. Please fill out the template for your PR to be reviewed. |
pytest's default import mode treats both test/stdlib/components/test_react.py and test/stdlib/frameworks/test_react.py as the same module 'test_react', causing a collection error in CI. Rename the framework-level file to avoid the collision.
ajbozarth
reviewed
Apr 13, 2026
Contributor
ajbozarth
left a comment
There was a problem hiding this comment.
I didn't have bandwidth to do a deep review, but I did run the tests and have Claude review, which gave a couple inline comments:
ran uv run pytest test/stdlib/components/ test/stdlib/frameworks/:
100 passed, 1 skipped, 1 xpassed, 15 warnings in 159.23s (0:02:39)All new tests passed
…e-computing#817) Address review feedback on PR generative-computing#830. Fix a bug where examples with no user messages caused inputs/targets/input_ids to fall out of alignment, extend tests to cover the misalignment scenario, and move a local import to module level.
ajbozarth
approved these changes
Apr 14, 2026
Contributor
ajbozarth
left a comment
There was a problem hiding this comment.
Changes LGTM and Claude approve so I'm going to give an approve on this since in general more tests are better than none,
Merged
via the queue into
generative-computing:main
with commit Apr 14, 2026
75062f4
8 checks passed
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
test: add unit and integration tests for stdlib components (#817)
Type of PR
Description
Adds 32 tests across 3 files for
stdlib/components/react,stdlib/frameworks/react, andstdlib/components/unit_test_eval. These are part of the broader #726 testing epic to boost unit and integration coverage for pure logic and reduce reliance on e2e tests.Bug fix:
from_json_filelist misalignmentReview feedback identified a bug in
TestBasedEval.from_json_file()where examples without user messages causedinputs,targets, andinput_idsto fall out of sync. The eval runner (cli/eval/runner.py) does positional lookups across these lists, so misalignment would silently score predictions against the wrong reference targets. Fixed with an earlycontinueto skip examples that have no user messages, with a new test to cover the scenario.Coverage improvement
All three modules went from 0% to near-full coverage without any e2e infrastructure:
mellea/stdlib/frameworks/react.pymellea/stdlib/components/react.pymellea/stdlib/components/unit_test_eval.pyApproach: ScriptedBackend for orchestrator testing
The
react()framework orchestrator is tricky to test — it coordinates a loop ofaact()→_call_tools()→ checkfinal_answer. Previously the only options were mock-heavy unit tests (patchingmfuncs.aact+mfuncs._call_tools, which mirrors internal call structure and breaks on any refactor) or full e2e tests against a real LLM backend (slow, non-deterministic, needs Ollama/API keys).Instead, these tests use a lightweight
ScriptedBackendfake that subclassesBackendand overrides_generate_from_context()to return scripted responses. The key insight is that realaact(), real_call_tools(), realMelleaToolfunctions all run — only LLM inference is faked. This means the tests exercise the actual pipeline (context management, plugin hooks, tool execution) and are robust to internal refactors ofreact().These are marked
@pytest.mark.integration(not unit) because they wire multiple real components together and mock only at the outermost boundary.Test breakdown
test/stdlib/components/test_react.pyReactInitiatorformat/parts/parse,ReactThoughtformat/parse, finalizer override edge casetest/stdlib/components/test_unit_test_eval.pyformat_for_llm,set_judge_contextbranching (0/1/N targets),from_json_filewith edge cases (multi-user messages, mixed user messages, non-assistant filtering)test/stdlib/frameworks/test_react.pyTesting