test: cover malformed citation edge-case warning branches (#827) #1165
planetf1 wants to merge 2 commits into
…-computing#827)

Extends the granite32 and granite33 output-processor unit tests with warning-branch coverage for malformed `<co>` citation tags and Granite 3.3 control tokens that were previously untested.

Branches now covered:

- `_parse_citations_text`: no `<co>` tags, regex miss on inner Document pattern, nested Document token in citation text (granite32)
- `_parse_citations_text`: no numeric `N:` pattern (granite33)
- `_get_docs_from_citations`: non-numeric `doc_id` warning, non-numeric `citation_id` warning (granite32); non-numeric `doc_id` warning (granite33)
- `_add_citation_response_spans`: citation ID absent from response, citation at position 0 of first sentence, duplicate citation ID in two sentences (granite32)
- `_validate_response`: nested `CITE_START`, citation count mismatch (granite33)

Coverage delta: granite32 78.4% → 83.5%, granite33 83.1% → 84.2%.

This is PR A of the two-part generative-computing#827 workstream. PR B (granite-common base/types port + answer_relevance fixtures) is deferred.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
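To picture the two `_validate_response` branches listed in this commit (nested `CITE_START`, citation count mismatch), here is a minimal sketch. It is not the granite33 implementation: `validate_response`, its signature, its boolean return, and the token strings assigned to `CITE_START`/`CITE_END` are all hypothetical, chosen only to show what "warn and fall back instead of raising" looks like for these two inputs.

```python
import logging

logger = logging.getLogger("granite33_sketch")  # hypothetical logger name

# Hypothetical token strings; the real CITE_START/CITE_END constants may differ.
CITE_START = "<|start_of_cite|>"
CITE_END = "<|end_of_cite|>"


def validate_response(response: str, expected_citations: int) -> bool:
    """Hypothetical validator: log a WARNING and return False on malformed markup."""
    depth = 0   # citation spans currently open
    opened = 0  # total citation spans seen
    i = 0
    while i < len(response):
        if response.startswith(CITE_START, i):
            if depth > 0:
                # Branch 1: a CITE_START inside an already-open span
                logger.warning("Nested CITE_START at offset %d", i)
                return False
            depth += 1
            opened += 1
            i += len(CITE_START)
        elif response.startswith(CITE_END, i):
            depth = max(depth - 1, 0)
            i += len(CITE_END)
        else:
            i += 1
    if opened != expected_citations:
        # Branch 2: span count disagrees with the parsed citation list
        logger.warning(
            "Citation count mismatch: %d spans vs %d citations", opened, expected_citations
        )
        return False
    return True
```

A test for either branch would feed the malformed string in directly and check both the WARNING record and the `False` fallback.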
- Broaden `_get_docs_from_citations` signature to `str | None`; production code already handles `None` via the `if not docs` guard, but the type annotation was narrower than the runtime contract.
- Add `assert result.tool_calls is not None` before `len()` and subscript in `test_tool_call_parsing` (granite32 and granite33); `tool_calls` is typed `list[ToolCall] | None` on `AssistantMessage`.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
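Both changes in this commit keep annotations honest about `None`. The sketch below illustrates them with simplified, hypothetical stand-ins: `get_docs_from_citations`, `ToolCall`, and `AssistantMessage` here are not the real helper or classes, and the helper's body is invented. Only the `str | None` parameter with an `if not docs` guard, and the `assert ... is not None` narrowing of `list[ToolCall] | None`, mirror what the commit describes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ToolCall:  # hypothetical stand-in for the real ToolCall type
    name: str


@dataclass
class AssistantMessage:  # hypothetical stand-in; only the tool_calls typing mirrors the PR
    content: str
    tool_calls: list[ToolCall] | None = None


def get_docs_from_citations(docs: str | None) -> list[str]:
    """Hypothetical helper whose annotation matches its runtime contract."""
    if not docs:  # one guard covers both None and ""
        return []
    return [line for line in docs.splitlines() if line.strip()]


def first_tool_name(result: AssistantMessage) -> str:
    # Without this assert, a type checker rejects len() and [0] on an Optional list;
    # after it, tool_calls is narrowed to list[ToolCall].
    assert result.tool_calls is not None
    assert len(result.tool_calls) == 1
    return result.tool_calls[0].name
```

In a test, the `assert` doubles as a runtime check: a parser regression that leaves `tool_calls` as `None` fails with an `AssertionError` instead of a `TypeError` from `len()`.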
Moved to ready for review. This was held as a draft pending Discussion #1166 (whether Granite 4.x needs its own formatter). The tests here cover only the Granite 3.2/3.3 warning branches and are correct regardless of how that discussion resolves; any Granite 4.x work would land in PR B.
Reverting to draft pending a design decision surfaced in Discussion #1166. @jakelorocco confirmed there that the […] This doesn't make the tests incorrect, but it does mean they cover warning branches that no production path currently reaches. Before asking reviewers to spend time here, the right call is to settle in #1166 whether this code is intended to be wired up eventually, deprecated, or removed. Holding as a draft until that's resolved.
Summary
The Granite 3.2 and 3.3 citation parsers have ~15 `logging.WARNING` branches that fire when a model emits malformed output: unclosed `<co>` tags, non-numeric citation IDs, duplicate citation IDs in the same response, nested control tokens. Every one of those branches was a silent `continue` with no test coverage; a regression there would ship unnoticed, and real models do emit malformed `<co>` blocks in production.

This PR extends `test_granite32_output.py` (+127 lines) and `test_granite33_output.py` (+25 lines) with targeted warning-branch tests. Each test drives a module-private helper directly with adversarial input, asserts that the expected `logging.WARNING` is emitted (via `caplog`), and asserts that the function returns the documented safe default rather than crashing. No production behavior is changed; the only production edit is widening the `_get_docs_from_citations` type annotation to `str | None`.

Coverage delta: granite32 78.4% → 83.5%, granite33 83.1% → 84.2%.
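The test pattern described above (call a private helper with adversarial input, check the WARNING via `caplog`, check the safe default) can be sketched as follows. The real helpers' signatures are not shown in this PR description, so `parse_citations_text`, the `granite_sketch` logger, and the `<co>...</co>` / `Document N:` input format are hypothetical stand-ins; only `caplog.at_level`, `caplog.text`, and `caplog.records` are pytest's actual fixture API.

```python
import logging
import re

# Hypothetical logger and input format, chosen only to illustrate the test pattern.
logger = logging.getLogger("granite_sketch")

_CO_BLOCK = re.compile(r"<co>(.*?)</co>", re.DOTALL)
_DOC_REF = re.compile(r"Document (\d+):")


def parse_citations_text(citations_text: str) -> list[dict]:
    """Hypothetical stand-in for a module-private citation parser.

    Malformed input is logged at WARNING and skipped; the function never raises.
    """
    blocks = _CO_BLOCK.findall(citations_text)
    if not blocks:
        logger.warning("No <co> tags found in citations text")
        return []  # documented safe default
    parsed = []
    for block in blocks:
        match = _DOC_REF.search(block)
        if match is None:
            logger.warning("No 'Document N:' reference in citation block: %r", block)
            continue  # skip this block, keep going
        parsed.append({"doc_id": int(match.group(1)), "text": block[match.end():].strip()})
    return parsed


# pytest tests: `caplog` is pytest's built-in log-capture fixture.
def test_no_co_tags_warns(caplog):
    with caplog.at_level(logging.WARNING, logger="granite_sketch"):
        result = parse_citations_text("plain text, no citation markup")
    assert "No <co> tags" in caplog.text  # the warning branch fired
    assert result == []                   # safe default, no exception


def test_regex_miss_warns_and_skips(caplog):
    with caplog.at_level(logging.WARNING, logger="granite_sketch"):
        result = parse_citations_text("<co>no document reference here</co>")
    assert any(r.levelno == logging.WARNING for r in caplog.records)
    assert result == []
```

Asserting on both the log record and the return value is what distinguishes "the branch fired and recovered" from "the branch was skipped" or "the function crashed".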
Branches covered
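| Module | Function | Malformed input |
|---|---|---|
| granite32 | `_parse_citations_text` | no `<co>` tags in input |
| granite32 | `_parse_citations_text` | regex miss on inner Document pattern |
| granite32 | `_parse_citations_text` | nested `\nDocument N` in citation text |
| granite32 | `_get_docs_from_citations` | non-numeric `doc_id` |
| granite32 | `_get_docs_from_citations` | non-numeric `citation_id` |
| granite32 | `_add_citation_response_spans` | citation ID absent from response |
| granite32 | `_add_citation_response_spans` | citation at position 0 of first sentence |
| granite32 | `_add_citation_response_spans` | duplicate citation ID in two sentences |
| granite33 | `_parse_citations_text` | no `N:` numeric pattern |
| granite33 | `_validate_response` | nested `CITE_START` |
| granite33 | `_validate_response` | citation count mismatch |
| granite33 | `_get_docs_from_citations` | non-numeric `doc_id` |

Test plan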
- `uv run pytest test/formatters/granite/test_granite32_output.py test/formatters/granite/test_granite33_output.py -v`: 70 passed, 0 failed
- `uv run pytest test/formatters/granite/ -m "not qualitative and not slow"`: all 216 pass, no regressions

Where this fits
Part of epic #726 (testing strategy overhaul), sub-issue #827. PR #818 added the baseline ~200 unit tests; this is PR A of a two-part #827 workstream (edge-case warning branches). PR B (granite-common `base/types.py` port + `answer_relevance_*` fixtures) is deferred to a follow-up, per the split endorsed in the issue.