
fix(ollama): soft-fail on empty response in generate_from_raw #1161

Open
planetf1 wants to merge 2 commits into generative-computing:main from planetf1:worktree-issue-599




planetf1 (Contributor) commented May 27, 2026

Description

Fixes #599

`generate_from_raw` was intermittently returning empty-string results. Investigation (see #16326, which we have now closed) traced the root cause to the mellea client: with `asyncio.gather(..., return_exceptions=True)`, any exception (a timeout, a `ResponseError`, a connection reset while the model was loading) came back as an ordinary result and was silently turned into `ModelOutputThunk(value="")`. This was fixed separately in #1163, which dropped `return_exceptions=True`.
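To make the failure mode concrete, here is a minimal, self-contained sketch of the pattern. It is not mellea's actual code: `fake_generate`, `batch_swallowing` and `batch_propagating` are hypothetical names, and the `isinstance(r, str) else ""` conversion stands in for whatever turned the exception into `ModelOutputThunk(value="")`.

```python
import asyncio


async def fake_generate(i: int) -> str:
    # Hypothetical stand-in for one backend request (not mellea code).
    # Slot 1 simulates a timeout while the model is loading.
    if i == 1:
        raise asyncio.TimeoutError("simulated timeout while the model loads")
    return f"result-{i}"


async def batch_swallowing(n: int) -> list:
    # Buggy pattern: with return_exceptions=True, gather returns the exception
    # object as an ordinary result, and this naive conversion maps it to "".
    results = await asyncio.gather(
        *(fake_generate(i) for i in range(n)), return_exceptions=True
    )
    return [r if isinstance(r, str) else "" for r in results]


async def batch_propagating(n: int) -> list:
    # Fixed pattern: without return_exceptions, gather re-raises the first
    # exception to the caller instead of returning a value for that slot.
    return await asyncio.gather(*(fake_generate(i) for i in range(n)))


print(asyncio.run(batch_swallowing(3)))  # ['result-0', '', 'result-2']
try:
    asyncio.run(batch_propagating(3))
except asyncio.TimeoutError as exc:
    print("raised:", exc)  # raised: simulated timeout while the model loads
```

The empty string in slot 1 is indistinguishable from a genuine empty completion, which is why the bug looked like Ollama returning empty bodies.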

This PR adds a second, belt-and-braces layer for the rare but real case where Ollama genuinely returns HTTP 200 with `response: ""` and `done: True`, for example when EOS is sampled as the first token (see `runner.go:546–552` in Ollama, where a TODO comment acknowledges the issue). In 4400 requests across warm and cold-load conditions we never observed this path, but the code path exists and is worth handling explicitly.

- `mellea/backends/ollama.py`: adds an `elif response.done and not response.response and not response.thinking:` branch that logs a warning and attaches a `RuntimeError` and the raw `GenerateResponse` to `generate_log.extra` for operator inspection. The `and not response.thinking` guard avoids false positives on thinking models, which can return an empty `.response` alongside a non-empty `.thinking` field. The branch does not reintroduce `return_exceptions=True`; its removal in #1163 is preserved.
- `test/backends/test_ollama.py`: adds a module-scoped `_ensure_model_warm` autouse fixture so that running this file in isolation (outside the full conftest warm-up path) does not start with a cold model. Removes the `@pytest.mark.xfail` from `test_generate_from_raw`: the original failure was the `return_exceptions=True` bug, now fixed, and the test has passed consistently across 1000+ trials.
- `test/backends/test_ollama_unit.py`: unit tests for the new branch (no live Ollama required):
  - an empty done-response soft-fails with a `RuntimeError` attached;
  - a thinking-model response with `response=""` is not flagged;
  - one empty slot in a batch of three does not discard the other two results.
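The soft-fail branch and the three unit-test scenarios can be sketched roughly as follows. This assumes a simplified response object carrying only `response`, `done` and `thinking`; `FakeGenerateResponse`, `FakeGenerateLog`, `handle_response` and the `extra` key names are hypothetical stand-ins, not the mellea or Ollama client types.

```python
import logging
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

logger = logging.getLogger("ollama_empty_response_sketch")


@dataclass
class FakeGenerateResponse:
    # Hypothetical stand-in for the Ollama client's GenerateResponse,
    # modelling only the three fields the new branch reads.
    response: str
    done: bool
    thinking: Optional[str] = None


@dataclass
class FakeGenerateLog:
    # Hypothetical stand-in for mellea's generate log; only `extra` is modelled.
    extra: Dict[str, Any] = field(default_factory=dict)


def handle_response(resp: FakeGenerateResponse, log: FakeGenerateLog) -> str:
    # Soft-fail: flag a finished response with neither text nor thinking,
    # but still return the (empty) text so other batch slots are unaffected.
    if resp.done and not resp.response and not resp.thinking:
        logger.warning("Ollama returned done=True with an empty response")
        # The key names below are illustrative, not mellea's actual keys.
        log.extra["error"] = RuntimeError("Ollama returned an empty response")
        log.extra["raw_response"] = resp
    return resp.response


# The three unit-test scenarios, expressed as one batch of three slots.
batch = [
    FakeGenerateResponse(response="hello", done=True),
    FakeGenerateResponse(response="", done=True),  # genuine empty: flagged
    FakeGenerateResponse(response="", done=True, thinking="reasoning..."),  # not flagged
]
logs = [FakeGenerateLog() for _ in batch]
texts = [handle_response(r, log) for r, log in zip(batch, logs)]
print(texts)  # ['hello', '', '']
print([("error" in log.extra) for log in logs])  # [False, True, False]
```

Attaching diagnostics instead of raising is what keeps the one empty slot from discarding its two neighbours.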

Testing

```bash
# Unit tests (no Ollama server required)
uv run pytest test/backends/test_ollama_unit.py -v
# 19 passed

# 1000-trial soak: 4000 requests, 0 empty responses
uv run --with httpx python /tmp/hammer1000.py 1000
# DONE: 1000 trials, 4000 requests, 0 empty responses, 398s elapsed

# Verify xfail removal: the test now passes cleanly
uv run pytest test/backends/test_ollama.py::test_generate_from_raw -v
# 1 passed
```
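The soak script itself (`/tmp/hammer1000.py`) is not part of this PR, so the following is only a hypothetical harness in the same spirit, assuming four concurrent requests per trial (1000 trials, 4000 requests). It takes an injectable async `request` callable so it can run offline; against a live server, that callable might POST to Ollama's `/api/generate` endpoint with `"stream": false` via `httpx.AsyncClient` and return the `response` field of the JSON body.

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict


async def soak(
    request: Callable[[], Awaitable[str]], trials: int, per_trial: int = 4
) -> Dict[str, float]:
    # Each trial fires `per_trial` concurrent requests and counts empty bodies.
    # The default of 4 matches 1000 trials / 4000 requests reported above.
    empty = 0
    start = time.monotonic()
    for _ in range(trials):
        results = await asyncio.gather(*(request() for _ in range(per_trial)))
        empty += sum(1 for text in results if text == "")
    return {
        "trials": trials,
        "requests": trials * per_trial,
        "empty": empty,
        "elapsed_s": round(time.monotonic() - start, 1),
    }


# Offline usage with a fake request that returns "" on every 10th call.
calls = 0


async def fake_request() -> str:
    global calls
    calls += 1
    return "" if calls % 10 == 0 else "ok"


stats = asyncio.run(soak(fake_request, trials=25))
print(stats["requests"], stats["empty"])  # 100 10
```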

github-actions bot added the bug label May 27, 2026
planetf1 changed the title from "fix(ollama): raise on empty response from generate_from_raw" to "fix(ollama): soft-fail on empty response in generate_from_raw" May 27, 2026
planetf1 force-pushed the worktree-issue-599 branch from 0a8a0ff to 6fbe1ce May 27, 2026 06:48
planetf1 marked this pull request as ready for review May 27, 2026 06:49
planetf1 requested a review from a team as a code owner May 27, 2026 06:49
…nerative-computing#599)

Ollama returns HTTP 200 with an empty `response` field when the first
sampled token is EOS (runner.go:546-552). This is real but vanishingly
rare: 4,400 requests across 1,100 trials showed zero occurrences once
the primary cause (return_exceptions=True in asyncio.gather, PR generative-computing#1163)
was removed.

This PR adds belt-and-braces handling for that genuine-but-rare path:
it warms the model when the test file runs in isolation, and it detects
an HTTP 200 response with an empty body and logs a warning rather than
silently returning an empty string. The stale xfail on
test_generate_from_raw (which has passed cleanly since generative-computing#1163)
is removed.

Three unit tests cover the soft-fail branch directly without needing a
live Ollama server.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
planetf1 force-pushed the worktree-issue-599 branch from 372a8ef to 42d840e May 28, 2026 18:28
…issues

Lesson from the generative-computing#599 investigation: with
return_exceptions=True, asyncio.gather returns exceptions as ordinary
results, which batch backends can silently convert into empty values.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
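If a batch backend does want per-slot isolation (one failed request should not abort the others), `return_exceptions=True` can be kept safely as long as exceptions stay visible instead of being mapped to empty values. The sketch below is a general pattern under that assumption, not what #1163 implemented (that PR dropped `return_exceptions=True`); `SlotResult`, `fake_generate` and `batch_explicit` are hypothetical names.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SlotResult:
    # Each slot carries either a value or the exception that replaced it.
    value: Optional[str] = None
    error: Optional[BaseException] = None


async def fake_generate(i: int) -> str:
    # Hypothetical request; slot 1 simulates a connection reset.
    if i == 1:
        raise ConnectionResetError("simulated reset")
    return f"result-{i}"


async def batch_explicit(n: int) -> List[SlotResult]:
    # return_exceptions=True keeps one failure from aborting the batch, so
    # every returned item is checked for BaseException (which also covers
    # CancelledError) instead of being treated as a value.
    results = await asyncio.gather(
        *(fake_generate(i) for i in range(n)), return_exceptions=True
    )
    return [
        SlotResult(error=r) if isinstance(r, BaseException) else SlotResult(value=r)
        for r in results
    ]


for slot in asyncio.run(batch_explicit(3)):
    print(slot.value, repr(slot.error))
```

The design choice is that a failure is carried as data the caller must inspect, so it can never masquerade as a legitimate empty completion.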

Labels

bug Something isn't working


Development

Successfully merging this pull request may close these issues.

Ollama returns empty-body responses under sustained load in generate_from_raw
