Skip to content

test: cover malformed citation edge-case warning branches (#827)#1165

Draft
planetf1 wants to merge 2 commits into
generative-computing:mainfrom
planetf1:worktree-issue-827
Draft

test: cover malformed citation edge-case warning branches (#827)#1165
planetf1 wants to merge 2 commits into
generative-computing:mainfrom
planetf1:worktree-issue-827

Conversation

@planetf1
Copy link
Copy Markdown
Contributor

@planetf1 planetf1 commented May 27, 2026

Summary

The Granite 3.2 and 3.3 citation parsers have ~15 logging.WARNING branches that fire when a model emits malformed output — unclosed <co> tags, non-numeric citation IDs, duplicate citation IDs in the same response, nested control tokens. Every one of those branches was a silent continue with no test coverage; a regression there would ship unnoticed, and real models do emit malformed <co> blocks in production.

This PR extends test_granite32_output.py (+127 lines) and test_granite33_output.py (+25 lines) with targeted warning-branch tests. Each test drives a module-private helper directly with adversarial input, asserts the expected logging.WARNING is emitted via caplog, and asserts the function returns the documented safe default rather than crashing. No production code is changed.

Coverage delta: granite32 78.4% → 83.5%, granite33 83.1% → 84.2%.

Branches covered

Function Trigger File
_parse_citations_text no <co> tags in input g32
_parse_citations_text inner Document regex misses g32
_parse_citations_text embedded \nDocument N in citation text g32
_get_docs_from_citations non-numeric doc_id g32
_get_docs_from_citations non-numeric citation_id g32
_add_citation_response_spans citation ID absent from response g32
_add_citation_response_spans citation at position 0 of first sentence g32
_add_citation_response_spans duplicate citation ID across two sentences g32
_parse_citations_text no N: numeric pattern g33
_validate_response nested CITE_START g33
_validate_response tag count ≠ citation_info length g33
_get_docs_from_citations non-numeric doc_id g33

Test plan

  • uv run pytest test/formatters/granite/test_granite32_output.py test/formatters/granite/test_granite33_output.py -v — 70 passed, 0 failed
  • uv run pytest test/formatters/granite/ -m "not qualitative and not slow" — all 216 pass, no regressions
  • All new tests run in <6 s; no backend, Ollama, or qualitative markers needed

Where this fits

Part of epic #726 (testing strategy overhaul), sub-issue #827. PR #818 added the baseline ~200 unit tests; this is PR A of a two-part #827 workstream (edge-case warning branches). PR B — granite-common base/types.py port + answer_relevance_* fixtures — is deferred to a follow-up per the split endorsed in the issue.

Kept as draft pending resolution of discussion #1166Does Granite 4.x need its own formatter, or does it use Granite33InputProcessor/Granite33OutputProcessor?. The tests here are correct regardless of the outcome, but the answer may inform whether a granite4 formatter (and corresponding tests) should be bundled with PR B.

…-computing#827)

Extends the granite32 and granite33 output-processor unit tests with
warning-branch coverage for malformed <co> citation tags and Granite 3.3
control tokens that were previously untested.

Branches now covered:
- _parse_citations_text: no <co> tags, regex-miss on inner Document pattern,
  nested Document token in citation text (granite32)
- _parse_citations_text: no numeric N: pattern (granite33)
- _get_docs_from_citations: non-numeric doc_id warning, non-numeric citation_id
  warning (granite32); non-numeric doc_id warning (granite33)
- _add_citation_response_spans: citation ID absent from response, citation at
  position 0 of first sentence, duplicate citation ID in two sentences (granite32)
- _validate_response: nested CITE_START, citation count mismatch (granite33)

Coverage delta: granite32 78.4%→83.5%, granite33 83.1%→84.2%.

This is PR A of the two-part generative-computing#827 workstream.  PR B (granite-common
base/types port + answer_relevance fixtures) is deferred.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
- Broaden _get_docs_from_citations signature to str | None; production
  code already handles None via the `if not docs` guard, but the type
  annotation was narrower than the runtime contract.
- Add `assert result.tool_calls is not None` before len() and subscript
  in test_tool_call_parsing (granite32 and granite33); tool_calls is
  typed list[ToolCall] | None on AssistantMessage.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
@planetf1 planetf1 marked this pull request as ready for review May 28, 2026 19:47
@planetf1 planetf1 requested a review from a team as a code owner May 28, 2026 19:47
@planetf1
Copy link
Copy Markdown
Contributor Author

Moved to ready for review. This was held as draft pending Discussion #1166 — whether Granite 4.x needs its own formatter. The tests here cover Granite 3.2/3.3 warning branches only and are correct regardless of how that discussion resolves; any Granite 4.x work would land in PR B.

@planetf1 planetf1 marked this pull request as draft May 28, 2026 19:57
@planetf1
Copy link
Copy Markdown
Contributor Author

Reverting to draft pending a design decision surfaced in Discussion #1166.

@jakelorocco confirmed there that the Granite32/Granite33 InputProcessor/OutputProcessor classes are not actually used by Mellea — the framework routes all models through the Intrinsics input/output processors instead. A quick grep confirms nothing in core, backends, stdlib, cli, or docs/examples ever imports or instantiates these classes; they are exported in mellea/formatters/granite/__init__.py but are otherwise dead code in the current execution paths.

This doesn't make the tests incorrect, but it does mean they cover warning branches that no production path currently reaches. Before asking reviewers to spend time here, the right call is to first settle in #1166 whether this code is intended to be wired up eventually, deprecated, or removed. Holding as draft until that's resolved.

@ajbozarth ajbozarth assigned planetf1 and unassigned planetf1 May 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants