test: add unit and integration tests for stdlib components (#817) by planetf1 · Pull Request #830 · generative-computing/mellea

planetf1 · 2026-04-13T12:13:13Z


Type of PR

Bug Fix
New Feature
Documentation
Other

Description

Link to Issue: Fixes #817 (test: unit tests for stdlib components (react, unit_test_eval))

Adds 32 tests across 3 files for `stdlib/components/react`, `stdlib/frameworks/react`, and `stdlib/components/unit_test_eval`. They are part of the broader #726 testing epic, which aims to boost unit and integration coverage of pure logic and reduce reliance on e2e tests.

Bug fix: `from_json_file` list misalignment

Review feedback identified a bug in `TestBasedEval.from_json_file()`: examples without user messages caused `inputs`, `targets`, and `input_ids` to fall out of sync. The eval runner (`cli/eval/runner.py`) does positional lookups across these lists, so the misalignment would silently score predictions against the wrong reference targets. The fix adds an early `continue` that skips examples with no user messages, and a new test covers the scenario.
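To make the failure mode concrete, here is a minimal Python sketch of how parallel lists drift when an example has no user message, and how an early `continue` keeps them aligned. The `load_examples` function, the `id`/`messages`/`role`/`content` schema, the `skip_empty` switch, and joining multiple user turns with newlines are all assumptions for illustration; this is not the actual `from_json_file()` code.

```python
from typing import Any


def load_examples(
    examples: list[dict[str, Any]], skip_empty: bool = True
) -> tuple[list[str], list[list[str]], list[str]]:
    """Hypothetical sketch (not mellea's from_json_file) that builds three
    parallel lists which a runner later indexes by position."""
    inputs: list[str] = []
    targets: list[list[str]] = []
    input_ids: list[str] = []
    for example in examples:
        messages = example.get("messages", [])
        user_msgs = [m["content"] for m in messages if m["role"] == "user"]
        if skip_empty and not user_msgs:
            # The fix: skip the whole example so no list gains an entry.
            continue
        if user_msgs:
            # Assumption: multiple user turns are joined into one input.
            inputs.append("\n".join(user_msgs))
        # Only assistant messages become reference targets.
        targets.append([m["content"] for m in messages if m["role"] == "assistant"])
        input_ids.append(example["id"])
    return inputs, targets, input_ids
```

With `skip_empty=False`, an example that has only system and assistant messages contributes a target and an ID but no input, so every later input is paired with the previous example's target: the kind of silent positional mismatch the runner would suffer.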

Coverage improvement

All three modules went from 0% to near-full coverage without any e2e infrastructure:

Module	Before	After
`mellea/stdlib/frameworks/react.py`	0%	100%
`mellea/stdlib/components/react.py`	0%	97% (1 uncovered line: warning log path)
`mellea/stdlib/components/unit_test_eval.py`	0%	98% (2 uncovered lines)

Approach: ScriptedBackend for orchestrator testing

The `react()` framework orchestrator is tricky to test: it coordinates a loop of `aact()` → `_call_tools()` → check `final_answer`. Previously the only options were mock-heavy unit tests (patching `mfuncs.aact` and `mfuncs._call_tools`, which mirrors the internal call structure and breaks on any refactor) or full e2e tests against a real LLM backend (slow, non-deterministic, and dependent on Ollama or API keys).

Instead, these tests use a lightweight `ScriptedBackend` fake that subclasses `Backend` and overrides `_generate_from_context()` to return scripted responses. The key insight is that the real `aact()`, the real `_call_tools()`, and real `MelleaTool` functions all run; only LLM inference is faked. The tests therefore exercise the actual pipeline (context management, plugin hooks, tool execution) and stay robust to internal refactors of `react()`.
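The pattern can be sketched in plain Python: a base class whose public `generate()` does shared bookkeeping and delegates inference to an abstract `_generate_from_context()` hook, plus a `ScriptedBackend` that overrides only that hook. The `Backend` class, the list-of-strings context, `generate()`, and `seen_contexts` are simplified stand-ins invented for this sketch; mellea's real `Backend` has a different and richer interface.

```python
from abc import ABC, abstractmethod
from collections import deque


class Backend(ABC):
    """Hypothetical stand-in for a backend base class (not mellea's Backend)."""

    def generate(self, context: list[str]) -> str:
        # Public entry point: shared bookkeeping runs here for every backend.
        output = self._generate_from_context(context)
        context.append(output)  # the surrounding pipeline still updates context
        return output

    @abstractmethod
    def _generate_from_context(self, context: list[str]) -> str:
        """Subclasses implement the actual inference."""


class ScriptedBackend(Backend):
    """Returns pre-scripted responses in order instead of calling an LLM."""

    def __init__(self, responses: list[str]) -> None:
        self._responses = deque(responses)
        self.seen_contexts: list[list[str]] = []

    def _generate_from_context(self, context: list[str]) -> str:
        # Record a snapshot of what the "model" was shown on this turn.
        self.seen_contexts.append(list(context))
        if not self._responses:
            raise RuntimeError("ScriptedBackend ran out of scripted responses")
        return self._responses.popleft()
```

Because only the hook is replaced, everything above it (here, the context update in `generate()`) still runs for real. The trade-off is that the script must supply as many responses as the loop takes turns, which doubles as a check: running out signals an unexpected extra model call.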

These framework tests are marked `@pytest.mark.integration` (not unit) because they wire multiple real components together and mock only at the outermost boundary.
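For orientation, the control flow those tests drive (one model turn, then tool execution, then a `final_answer` check, repeated until a step budget runs out) can be sketched as follows. `react_loop`, `ToolCall`, the `act` callable standing in for `aact()`, and the `RuntimeError` on budget exhaustion are hypothetical; the real `react()` signature and its exhaustion behavior may differ.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ToolCall:
    """Hypothetical tool-call record: a tool name plus keyword arguments."""
    name: str
    args: dict[str, Any] = field(default_factory=dict)


def react_loop(
    act: Callable[[list[str]], list[ToolCall]],
    tools: dict[str, Callable[..., str]],
    budget: int = 5,
) -> str:
    """Illustrative act -> call tools -> check final_answer loop (not mellea's react())."""
    context: list[str] = []
    for _ in range(budget):
        calls = act(context)  # stands in for aact(): one model turn
        for call in calls:  # stands in for _call_tools(): real tool functions run
            result = tools[call.name](**call.args)
            if call.name == "final_answer":
                return result  # loop termination
            # Non-final tool: record the observation and continue.
            context.append(f"{call.name} -> {result}")
    raise RuntimeError(f"no final answer within budget={budget}")
```

Passing a scripted `act` (for example, an iterator of pre-built tool calls) exercises termination, budget exhaustion, and non-final tool continuation while the tool functions themselves run unmodified.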

Test breakdown

File	Tests	Level	What's covered
`test/stdlib/components/test_react.py`	10	Unit	`ReactInitiator` format/parts/parse, `ReactThought` format/parse, finalizer override edge case
`test/stdlib/components/test_unit_test_eval.py`	16	Unit	Pydantic validation, `format_for_llm`, `set_judge_context` branching (0/1/N targets), `from_json_file` with edge cases (multi-user messages, mixed user messages, non-assistant filtering)
`test/stdlib/frameworks/test_react.py`	7	Integration	Loop termination, budget exhaustion, non-final tool continuation, model_options tool merge, format branch, multi-tool rejection, context type guard

Testing

Tests added to the respective file if code was changed
New code has 100% coverage if code was added
Ensure existing tests and GitHub automation pass (a maintainer will kick off the GitHub automation when the rest of the PR is populated)

…e-computing#817) 32 tests across 3 files covering react components, react framework orchestrator, and unit_test_eval. Framework tests use ScriptedBackend fake to exercise real aact/call_tools paths without LLM inference.

github-actions · 2026-04-13T12:13:31Z

The PR description has been updated. Please fill out the template for your PR to be reviewed.

pytest's default import mode treats both test/stdlib/components/test_react.py and test/stdlib/frameworks/test_react.py as the same module 'test_react', causing a collection error in CI. Rename the framework-level file to avoid the collision.
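One way to catch this kind of clash before CI does is a small check for test files that share a basename. This is a hypothetical helper, not part of the PR: it only looks at `test_*.py` files in directories without an `__init__.py` (the case where pytest's default `prepend` import mode imports a test file by its bare basename) and ignores other collection patterns.

```python
from collections import defaultdict
from pathlib import Path


def find_colliding_test_modules(root: Path) -> dict[str, list[Path]]:
    """Return test-module basenames that appear in more than one directory."""
    by_name: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("test_*.py")):
        if (path.parent / "__init__.py").exists():
            # Package-style directories get dotted module names; skipped here.
            continue
        by_name[path.stem].append(path)
    # Keep only basenames that would map to the same top-level module.
    return {name: paths for name, paths in by_name.items() if len(paths) > 1}
```

Renaming the framework-level file to `test_react_framework.py`, as this PR did, makes the check come back empty. pytest also supports avoiding the clash by adding `__init__.py` files or by running with `--import-mode=importlib`.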

ajbozarth

I didn't have bandwidth to do a deep review, but I did run the tests and had Claude review the changes, which produced a couple of inline comments:

Ran `uv run pytest test/stdlib/components/ test/stdlib/frameworks/`:

100 passed, 1 skipped, 1 xpassed, 15 warnings in 159.23s (0:02:39)

All new tests passed

test/stdlib/components/test_unit_test_eval.py

test/stdlib/frameworks/test_react_framework.py

…e-computing#817) Address review feedback on PR generative-computing#830. Fix a bug where examples with no user messages caused inputs/targets/input_ids to fall out of alignment, extend tests to cover the misalignment scenario, and move a local import to module level.

ajbozarth

Changes LGTM and Claude approves, so I'm going to approve this, since in general more tests are better than none.

github-actions bot added the testing label Apr 13, 2026

planetf1 marked this pull request as ready for review April 13, 2026 12:15

planetf1 requested a review from a team as a code owner April 13, 2026 12:15

planetf1 requested review from AngeloDanducci and ajbozarth April 13, 2026 12:15

ajbozarth reviewed Apr 13, 2026


test/stdlib/components/test_unit_test_eval.py (resolved)

test/stdlib/frameworks/test_react_framework.py (outdated, resolved)

planetf1 enabled auto-merge April 14, 2026 09:54

ajbozarth approved these changes Apr 14, 2026


planetf1 added this pull request to the merge queue Apr 14, 2026

Merged via the queue into generative-computing:main with commit 75062f4 Apr 14, 2026
8 checks passed

planetf1 deleted the test/stdlib-components-817 branch April 14, 2026 19:07

planetf1 merged 3 commits into generative-computing:main from planetf1:test/stdlib-components-817
