Skip to content

test: add unit and integration tests for stdlib components (#817)#830

Merged
planetf1 merged 3 commits intogenerative-computing:mainfrom
planetf1:test/stdlib-components-817
Apr 14, 2026
Merged

test: add unit and integration tests for stdlib components (#817)#830
planetf1 merged 3 commits intogenerative-computing:mainfrom
planetf1:test/stdlib-components-817

Conversation

@planetf1
Copy link
Copy Markdown
Contributor

@planetf1 planetf1 commented Apr 13, 2026

test: add unit and integration tests for stdlib components (#817)

Type of PR

  • Bug Fix
  • New Feature
  • Documentation
  • Other

Description

Adds 32 tests across 3 files for stdlib/components/react, stdlib/frameworks/react, and stdlib/components/unit_test_eval. These are part of the broader #726 testing epic to boost unit and integration coverage for pure logic and reduce reliance on e2e tests.

Bug fix: from_json_file list misalignment

Review feedback identified a bug in TestBasedEval.from_json_file() where examples without user messages caused inputs, targets, and input_ids to fall out of sync. The eval runner (cli/eval/runner.py) does positional lookups across these lists, so misalignment would silently score predictions against the wrong reference targets. Fixed with an early continue to skip examples that have no user messages, with a new test to cover the scenario.

Coverage improvement

All three modules went from 0% to near-full coverage without any e2e infrastructure:

Module Before After
mellea/stdlib/frameworks/react.py 0% 100%
mellea/stdlib/components/react.py 0% 97% (1 uncovered line: warning log path)
mellea/stdlib/components/unit_test_eval.py 0% 98% (2 uncovered lines)

Approach: ScriptedBackend for orchestrator testing

The react() framework orchestrator is tricky to test — it coordinates a loop of aact()_call_tools() → check final_answer. Previously the only options were mock-heavy unit tests (patching mfuncs.aact + mfuncs._call_tools, which mirrors internal call structure and breaks on any refactor) or full e2e tests against a real LLM backend (slow, non-deterministic, needs Ollama/API keys).

Instead, these tests use a lightweight ScriptedBackend fake that subclasses Backend and overrides _generate_from_context() to return scripted responses. The key insight is that real aact(), real _call_tools(), real MelleaTool functions all run — only LLM inference is faked. This means the tests exercise the actual pipeline (context management, plugin hooks, tool execution) and are robust to internal refactors of react().

These are marked @pytest.mark.integration (not unit) because they wire multiple real components together and mock only at the outermost boundary.

Test breakdown

File Tests Level What's covered
test/stdlib/components/test_react.py 10 Unit ReactInitiator format/parts/parse, ReactThought format/parse, finalizer override edge case
test/stdlib/components/test_unit_test_eval.py 16 Unit Pydantic validation, format_for_llm, set_judge_context branching (0/1/N targets), from_json_file with edge cases (multi-user messages, mixed user messages, non-assistant filtering)
test/stdlib/frameworks/test_react.py 7 Integration Loop termination, budget exhaustion, non-final tool continuation, model_options tool merge, format branch, multi-tool rejection, context type guard

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code as added
  • Ensure existing tests and github automation passes (a maintainer will kick off the github automation when the rest of the PR is populated)

…e-computing#817)

32 tests across 3 files covering react components, react framework
orchestrator, and unit_test_eval. Framework tests use ScriptedBackend
fake to exercise real aact/call_tools paths without LLM inference.
@github-actions
Copy link
Copy Markdown
Contributor

The PR description has been updated. Please fill out the template for your PR to be reviewed.

@planetf1 planetf1 marked this pull request as ready for review April 13, 2026 12:15
@planetf1 planetf1 requested a review from a team as a code owner April 13, 2026 12:15
pytest's default import mode treats both test/stdlib/components/test_react.py
and test/stdlib/frameworks/test_react.py as the same module 'test_react',
causing a collection error in CI. Rename the framework-level file to avoid
the collision.
Copy link
Copy Markdown
Contributor

@ajbozarth ajbozarth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't have bandwidth to do a deep review, but I did run the tests and have Claude review, which gave a couple inline comments:

ran uv run pytest test/stdlib/components/ test/stdlib/frameworks/:

100 passed, 1 skipped, 1 xpassed, 15 warnings in 159.23s (0:02:39)

All new tests passed

…e-computing#817)

Address review feedback on PR generative-computing#830. Fix a bug where examples with no
user messages caused inputs/targets/input_ids to fall out of alignment,
extend tests to cover the misalignment scenario, and move a local import
to module level.
@planetf1 planetf1 enabled auto-merge April 14, 2026 09:54
Copy link
Copy Markdown
Contributor

@ajbozarth ajbozarth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM and Claude approve so I'm going to give an approve on this since in general more tests are better than none,

@planetf1 planetf1 added this pull request to the merge queue Apr 14, 2026
Merged via the queue into generative-computing:main with commit 75062f4 Apr 14, 2026
8 checks passed
@planetf1 planetf1 deleted the test/stdlib-components-817 branch April 14, 2026 19:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

test: unit tests for stdlib components (react, unit_test_eval)

2 participants