test: cover malformed citation edge-case warning branches (#827) by planetf1 · Pull Request #1165 · generative-computing/mellea

planetf1 · 2026-05-27T07:09:07Z

Summary

The Granite 3.2 and 3.3 citation parsers have ~15 logging.WARNING branches that fire when a model emits malformed output — unclosed <co> tags, non-numeric citation IDs, duplicate citation IDs in the same response, nested control tokens. Every one of those branches was a silent continue with no test coverage; a regression there would ship unnoticed, and real models do emit malformed <co> blocks in production.

This PR extends test_granite32_output.py (+127 lines) and test_granite33_output.py (+25 lines) with targeted warning-branch tests. Each test drives a module-private helper directly with adversarial input, asserts the expected logging.WARNING is emitted via caplog, and asserts the function returns the documented safe default rather than crashing. No production code is changed.

Coverage delta: granite32 78.4% → 83.5%, granite33 83.1% → 84.2%.

Branches covered

Function	Trigger	File
`_parse_citations_text`	no `<co>` tags in input	g32
`_parse_citations_text`	inner Document regex misses	g32
`_parse_citations_text`	embedded `\nDocument N` in citation text	g32
`_get_docs_from_citations`	non-numeric doc_id	g32
`_get_docs_from_citations`	non-numeric citation_id	g32
`_add_citation_response_spans`	citation ID absent from response	g32
`_add_citation_response_spans`	citation at position 0 of first sentence	g32
`_add_citation_response_spans`	duplicate citation ID across two sentences	g32
`_parse_citations_text`	no `N:` numeric pattern	g33
`_validate_response`	nested `CITE_START`	g33
`_validate_response`	tag count ≠ citation_info length	g33
`_get_docs_from_citations`	non-numeric doc_id	g33

Test plan

uv run pytest test/formatters/granite/test_granite32_output.py test/formatters/granite/test_granite33_output.py -v — 70 passed, 0 failed
uv run pytest test/formatters/granite/ -m "not qualitative and not slow" — all 216 pass, no regressions
All new tests run in <6 s; no backend, Ollama, or qualitative markers needed

Where this fits

Part of epic #726 (testing strategy overhaul), sub-issue #827. PR #818 added the baseline ~200 unit tests; this is PR A of a two-part #827 workstream (edge-case warning branches). PR B — granite-common base/types.py port + answer_relevance_* fixtures — is deferred to a follow-up per the split endorsed in the issue.

Kept as draft pending resolution of discussion #1166 — Does Granite 4.x need its own formatter, or does it use Granite33InputProcessor/Granite33OutputProcessor?. The tests here are correct regardless of the outcome, but the answer may inform whether a granite4 formatter (and corresponding tests) should be bundled with PR B.

…-computing#827) Extends the granite32 and granite33 output-processor unit tests with warning-branch coverage for malformed <co> citation tags and Granite 3.3 control tokens that were previously untested. Branches now covered: - _parse_citations_text: no <co> tags, regex-miss on inner Document pattern, nested Document token in citation text (granite32) - _parse_citations_text: no numeric N: pattern (granite33) - _get_docs_from_citations: non-numeric doc_id warning, non-numeric citation_id warning (granite32); non-numeric doc_id warning (granite33) - _add_citation_response_spans: citation ID absent from response, citation at position 0 of first sentence, duplicate citation ID in two sentences (granite32) - _validate_response: nested CITE_START, citation count mismatch (granite33) Coverage delta: granite32 78.4%→83.5%, granite33 83.1%→84.2%. This is PR A of the two-part generative-computing#827 workstream. PR B (granite-common base/types port + answer_relevance fixtures) is deferred. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

- Broaden _get_docs_from_citations signature to str | None; production code already handles None via the `if not docs` guard, but the type annotation was narrower than the runtime contract. - Add `assert result.tool_calls is not None` before len() and subscript in test_tool_call_parsing (granite32 and granite33); tool_calls is typed list[ToolCall] | None on AssistantMessage. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

planetf1 · 2026-05-28T19:51:31Z

Moved to ready for review. This was held as draft pending Discussion #1166 — whether Granite 4.x needs its own formatter. The tests here cover Granite 3.2/3.3 warning branches only and are correct regardless of how that discussion resolves; any Granite 4.x work would land in PR B.

planetf1 · 2026-05-28T19:57:48Z

Reverting to draft pending a design decision surfaced in Discussion #1166.

@jakelorocco confirmed there that the Granite32/Granite33 InputProcessor/OutputProcessor classes are not actually used by Mellea — the framework routes all models through the Intrinsics input/output processors instead. A quick grep confirms nothing in core, backends, stdlib, cli, or docs/examples ever imports or instantiates these classes; they are exported in mellea/formatters/granite/__init__.py but are otherwise dead code in the current execution paths.

This doesn't make the tests incorrect, but it does mean they cover warning branches that no production path currently reaches. Before asking reviewers to spend time here, the right call is to first settle in #1166 whether this code is intended to be wired up eventually, deprecated, or removed. Holding as draft until that's resolved.

github-actions Bot added the testing label May 27, 2026

planetf1 marked this pull request as ready for review May 28, 2026 19:47

planetf1 requested a review from a team as a code owner May 28, 2026 19:47

planetf1 marked this pull request as draft May 28, 2026 19:57

ajbozarth assigned planetf1 and unassigned planetf1 May 29, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test: cover malformed citation edge-case warning branches (#827)#1165

test: cover malformed citation edge-case warning branches (#827)#1165
planetf1 wants to merge 2 commits into
generative-computing:mainfrom
planetf1:worktree-issue-827

planetf1 commented May 27, 2026 •

edited

Loading

Uh oh!

planetf1 commented May 28, 2026

Uh oh!

planetf1 commented May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

planetf1 commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Branches covered

Test plan

Where this fits

Uh oh!

planetf1 commented May 28, 2026

Uh oh!

planetf1 commented May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

planetf1 commented May 27, 2026 •

edited

Loading