fix(ollama): soft-fail on empty response in generate_from_raw#1161
Open
planetf1 wants to merge 2 commits into
Open
fix(ollama): soft-fail on empty response in generate_from_raw#1161planetf1 wants to merge 2 commits into
planetf1 wants to merge 2 commits into
Conversation
0a8a0ff to
6fbe1ce
Compare
…nerative-computing#599) Ollama returns HTTP 200 with an empty `response` field when the first sampled token is EOS (runner.go:546-552). This is real but vanishingly rare — 4 400 requests across 1 100 trials showed zero occurrences once the primary cause (return_exceptions=True in asyncio.gather, PR generative-computing#1163) was removed. This PR adds belt-and-braces handling for that genuine-but-rare path: warm the model in isolation runs, detect an HTTP-200-with-empty-body response and log a warning rather than silently returning an empty string. The stale xfail on test_generate_from_raw (which was passing cleanly after generative-computing#1163) is removed. Three unit tests cover the soft-fail branch directly without needing a live Ollama server. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
372a8ef to
42d840e
Compare
…issues Lesson from generative-computing#599 investigation: return_exceptions=True silently converts exceptions to empty values in batch backends. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Fixes #599
generate_from_rawwas intermittently returning empty-string results. After investigation (see #16326 which we have now closed), the root cause was in the mellea client:asyncio.gather(..., return_exceptions=True)silently converted any exception — timeout,ResponseError, connection reset while the model was loading — intoModelOutputThunk(value=""). This was fixed separately in #1163 (droppedreturn_exceptions=True).This PR adds a second, belt-and-braces layer for the rare but real case where Ollama genuinely returns HTTP 200 with
response: ""anddone: True(e.g. EOS sampled as the first token — runner.go:546–552 in Ollama, where a TODO comment acknowledges the issue). In 4400 requests across warm and cold-load conditions we never observed this path, but the code path exists and is worth handling explicitly.mellea/backends/ollama.py— adds anelif response.done and not response.response and not response.thinking:branch. Logs a warning and attaches aRuntimeErrorand the rawGenerateResponsetogenerate_log.extrafor operator inspection. Theand not response.thinkingguard avoids false-positives on thinking models that return an empty.responsealongside a non-empty.thinkingfield. Does not re-introducereturn_exceptions=True— that removal from #1163 is preserved.test/backends/test_ollama.py— adds a module-scoped_ensure_model_warmautouse fixture so that running this file in isolation (outside the full conftest warm-up path) doesn't start cold. Removes the@pytest.mark.xfailfromtest_generate_from_raw— the original failure was thereturn_exceptions=Truebug, now fixed; the test has passed consistently across 1000+ trials.test/backends/test_ollama_unit.py— unit tests for the new branch (no live Ollama required): empty done-response soft-fails withRuntimeErrorattached; thinking-model response withresponse=""is not flagged; one empty slot in a batch of three doesn't discard the other two results.Testing