feat: streaming support in `m serve` OpenAI API server #823
markstur wants to merge 4 commits into generative-computing:main
Conversation
Fixes: generative-computing#822 Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
The PR description has been updated. Please fill out the template for your PR to be reviewed.
Added streaming support w/ setting system_fingerprint.

Make it consistent. We are currently just setting it to None but now it is consistent for future use.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
```python
        Server-sent event payload strings representing OpenAI-compatible chat
        completion chunks, including the terminating ``[DONE]`` event.
    """
    from cli.serve.models import (
```
There's a layering concern here worth discussing. mellea/helpers/ sits in the library layer, with cli/ as a consumer — so an import in this direction inverts that relationship. If someone imports mellea.helpers.openai_compatible_helpers outside the cli/ context, this will raise an ImportError at call time.
Two paths forward:
- Move `stream_chat_completion_chunks` into `cli/serve/` (perhaps `cli/serve/streaming.py`) — since it's really CLI-specific glue, it arguably belongs there anyway.
- Move `ChatCompletionChunk`, `ChatCompletionChunkChoice`, and `ChatCompletionChunkDelta` into `mellea/helpers/` alongside `CompletionUsage`, removing the need to reach into `cli/` at all.
Option 1 is probably the simpler of the two.
```python
previous_length = 0
while not output.is_computed():
    new_content = await output.astream()
    previous_length += len(new_content)
```
previous_length is incremented here but doesn't appear to be read anywhere afterwards — worth double-checking whether this was intentional.
It would also be helpful to clarify the contract of astream(): does it return only the new fragment since the last call, or the full accumulated text so far? The answer changes what this function should do with the value. The helper-level tests in test_serve_streaming.py mock it to return deltas, while the endpoint-level tests (e.g. test_streaming_response_format) mock it to return accumulated text — so the two sets of tests appear to be working from different assumptions about the API. Might be worth aligning them.
```python
error_response = OpenAIErrorResponse(
    error=OpenAIError(message=f"Streaming error: {e!s}", type="server_error")
)
yield f"data: {error_response.model_dump_json()}\n\n"
```
It looks like the error path exits without emitting data: [DONE]\n\n. Most SSE clients — including the official openai Python SDK — wait for that sentinel to consider the stream closed, so they'll block until timeout if an exception is raised during streaming.
Adding yield "data: [DONE]\n\n" after the error chunk yield should do it.
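A minimal sketch of the fixed error path, using a plain dict in place of the PR's `OpenAIErrorResponse` model (`error_events` and `collect` are illustrative names):

```python
import asyncio
import json
from collections.abc import AsyncIterator


async def error_events(exc: Exception) -> AsyncIterator[str]:
    # Emit the error chunk, then the [DONE] sentinel so SSE clients unblock.
    payload = {
        "error": {"message": f"Streaming error: {exc!s}", "type": "server_error"}
    }
    yield f"data: {json.dumps(payload)}\n\n"
    yield "data: [DONE]\n\n"


async def collect(agen):
    return [event async for event in agen]


events = asyncio.run(collect(error_events(ValueError("boom"))))
print(events)
```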
| "temperature": ModelOption.TEMPERATURE, | ||
| "max_tokens": ModelOption.MAX_NEW_TOKENS, | ||
| "seed": ModelOption.SEED, | ||
| "stream": ModelOption.STREAM, |
One thing to be aware of: because ChatCompletionRequest.stream defaults to False, this mapping means every non-streaming request will now forward ModelOption.STREAM: False to the backend, even when streaming was never requested. That's a quiet change from the previous behaviour, where stream was simply ignored.
Backends that don't recognise ModelOption.STREAM may handle the unexpected key in unexpected ways. A couple of options to consider:
- Return `"stream"` to the excluded-fields set and handle it separately, as `stream_options` is handled just above.
- Only inject `STREAM` when `request.stream` is `True`, leaving the non-streaming path as it was.
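The second option might look like this sketch (`build_model_options` and the `STREAM` constant are stand-ins for the PR's actual mapping code):

```python
from typing import Any

STREAM = "stream"  # stand-in for ModelOption.STREAM


def build_model_options(request: dict[str, Any]) -> dict[str, Any]:
    """Map request fields to backend model options (sketch)."""
    options: dict[str, Any] = {}
    if request.get("stream"):
        # Inject STREAM only when streaming was actually requested,
        # leaving the non-streaming path exactly as before.
        options[STREAM] = True
    return options


print(build_model_options({"stream": False}))
print(build_model_options({"stream": True}))
```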
planetf1 left a comment
noticed a few things to tighten up.
Minor, but looking at the tests: the TestStreamingEndpoint tests (e.g. line 166) are marked @pytest.mark.asyncio and declared async def but contain no await — TestClient is synchronous and doesn't need the marker. TestStreamingHelpers is fine.
I suspect it will still work, but it implies the wrong behaviour?
```python
yield f"data: {chunk.model_dump_json()}\n\n"

# Include usage in final chunk if requested via stream_options
# Default to True (include usage) for backward compatibility
```
Note that the OpenAI spec default is False.
From the OpenAI API reference at https://platform.openai.com/docs/api-reference/chat-streaming, the ChatCompletionChunk.usage field is documented as:
"An optional field that will only be present when you set stream_options: {"include_usage": true} in your request."
```python
# included in the final streaming chunk. Defaults to True (include usage)
# when not specified for backward compatibility. For non-streaming requests
# (stream=False), usage is always included regardless of this parameter.
stream_options: dict[str, Any] | None = None
```
In OpenAI this is a Pydantic-typed object, but I presume you chose a plain dict so you don't do any enforcement/checks in case those options change in future — understandable (though it could mean typos are missed). I'm OK either way.
```
# In another terminal, test with the non-streaming client
python docs/examples/m_serve/client.py

### Streaming
```
I think this is meant to be a heading - needs to have the code fencing terminated in the previous section and started again here (for bash)
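Something like the fragment below would do it; the streaming client path shown is a placeholder, not a file from the PR:

````markdown
# In another terminal, test with the non-streaming client
python docs/examples/m_serve/client.py
```

### Streaming

```bash
# streaming client example (placeholder path)
python docs/examples/m_serve/client.py
```
````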
```python
async def serve(
    input: list[ChatMessage],
    requirements: list[str] | None = None,
    model_options: dict | None = None,
```

Suggested change:

```suggestion
    model_options: dict[str, Any] | None = None,
```
```python
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
```python
choices=[
    ChatCompletionChunkChoice(
        index=0,
        delta=ChatCompletionChunkDelta(role="assistant", content=""),
```

Suggested change:

```suggestion
        delta=ChatCompletionChunkDelta(role="assistant", content=None),
```
I believe this is more in line with how OpenAI sends it
```python
choices=[
    ChatCompletionChunkChoice(
        index=0,
        delta=ChatCompletionChunkDelta(content=""),
```

Suggested change:

```suggestion
        delta=ChatCompletionChunkDelta(content=None),
```
same as above
Type of PR

Misc

Description

Add OpenAI API compatible support for streaming in `m serve` app.

Testing