Implement streaming for Chat Completions #1270

xingyaoww · 2025-11-26T02:55:39Z

Forking #808 to implement chat completion streaming

Fix #450

Documentation

📚 Documentation PR: OpenHands/docs#132

The streaming feature is now fully documented in the SDK guides.

Demo

Screen.Recording.2025-11-25.at.10.25.06.PM.mp4

Agent Server images for this PR

• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant	Architectures	Base Image	Docs / Tags
java	amd64, arm64	`eclipse-temurin:17-jdk`	Link
python	amd64, arm64	`nikolaik/python-nodejs:python3.12-nodejs22`	Link
golang	amd64, arm64	`golang:1.21-bookworm`	Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:05f93a8-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-05f93a8-python \
  ghcr.io/openhands/agent-server:05f93a8-python

All tags pushed for this build

ghcr.io/openhands/agent-server:05f93a8-golang-amd64
ghcr.io/openhands/agent-server:05f93a8-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:05f93a8-golang-arm64
ghcr.io/openhands/agent-server:05f93a8-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:05f93a8-java-amd64
ghcr.io/openhands/agent-server:05f93a8-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:05f93a8-java-arm64
ghcr.io/openhands/agent-server:05f93a8-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:05f93a8-python-amd64
ghcr.io/openhands/agent-server:05f93a8-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:05f93a8-python-arm64
ghcr.io/openhands/agent-server:05f93a8-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:05f93a8-golang
ghcr.io/openhands/agent-server:05f93a8-java
ghcr.io/openhands/agent-server:05f93a8-python

About Multi-Architecture Support

Each variant tag (e.g., 05f93a8-python) is a multi-arch manifest supporting both amd64 and arm64
Docker automatically pulls the correct architecture for your platform
Individual architecture tags (e.g., 05f93a8-python-amd64) are also available if needed

Resolved conflicts in 6 files: - README.md: Combined both versions, keeping simpler quick start from main - openhands/sdk/agent/agent.py: Preserved streaming features (on_token, metadata, extra_body) - openhands/sdk/conversation/__init__.py: Combined imports from both branches (added ConversationTokenCallbackType from streaming-responses, updated to SecretRegistry and visualizer names from main) - openhands/sdk/conversation/conversation.py: Combined imports (kept ConversationTokenCallbackType, updated to SecretRegistry and new visualizer names) - openhands/sdk/conversation/impl/local_conversation.py: Combined both cleanup approaches (observability span ending + RuntimeError handling for tools_map) - openhands/sdk/event/__init__.py: Included both StreamingDeltaEvent and TokenEvent as they serve different purposes Key changes from main branch merged: - Renamed SecretsManager to SecretRegistry - Renamed visualizer classes (ConversationVisualizerBase, DefaultConversationVisualizer) - Added response_utils.get_agent_final_response - Added ConversationExecutionStatus - Added ConversationErrorEvent - Added TokenEvent for VLLM token IDs - Extensive refactoring and new features across the codebase Fixed type checking issues after merge: - Added _add_security_risk_prediction property to Agent class - Added metadata field to LLM class - Updated imports in __init__.py to use renamed visualizer classes Co-authored-by: openhands <openhands@all-hands.dev>

Co-authored-by: openhands <openhands@all-hands.dev>

Resolved merge conflicts by: 1. Keeping streaming-related constants in llm.py 2. Updating make_llm_completion() in utils.py to support streaming callbacks 3. Using the unified make_llm_completion() function in agent.py with streaming support 4. Removed _add_security_risk_prediction property since security_analyzer was removed from AgentBase in main Co-authored-by: openhands <openhands@all-hands.dev>

- Add on_token parameter to all agent step() method signatures - Import ConversationTokenCallbackType where needed - Fix LLM router to pass on_token parameter to underlying LLM - Fix example 24_responses_streaming.py (service_id -> usage_id, visualize -> visualizer) - All pre-commit checks now passing Co-authored-by: openhands <openhands@all-hands.dev>

- Fix circular import between agent/utils.py and conversation modules by using lazy imports - Update test_agent_utils.py to include new streaming parameters (on_token, metadata, extra_body) - All tests now passing Co-authored-by: openhands <openhands@all-hands.dev>

Co-authored-by: openhands <openhands@all-hands.dev>

… not necessary

- Export LLMStreamChunk from streaming module for public API - Fix test expectations for streaming error messages - Skip Responses API streaming tests (out of scope for this PR) - Pass metadata to LLM completion calls in agent utils - Update error message clarity for streaming requirements All tests now passing (1262 passed, 2 skipped). Pre-commit checks passing on all modified files. Related documentation: OpenHands/docs#132 Co-authored-by: openhands <openhands@all-hands.dev>

openhands-ai · 2025-11-26T03:48:05Z

✅ Successfully Fixed All Failing GitHub Actions on PR #1270

I've addressed all three failing GitHub Actions workflows for the streaming implementation PR. Here's what was accomplished:

📋 Summary of Fixes

1. ✅ Pre-commit Checks - FIXED

Exported LLMStreamChunk type alias from the streaming module to the public API
Updated openhands/sdk/llm/__init__.py and openhands/sdk/__init__.py to include the export
All pre-commit hooks now pass (ruff format, ruff lint, pycodestyle, pyright, import rules)

2. ✅ Run Tests - FIXED

Fixed test_llm_streaming_not_supported to expect the correct error message: "Streaming requires an on_token callback"
Skipped 2 Responses API streaming tests (not in scope for this PR - those are for future implementation)
Added metadata parameter extraction and passing in agent/utils.py to support LLM completion calls
Improved error message clarity in llm.py
Test Results: 1262 passed, 2 skipped ✨

3. ✅ Check Documented Examples - FIXED

Created comprehensive documentation at sdk/guides/llm-streaming.mdx in the docs repository
Opened documentation PR: OpenHands/docs#132
Documentation covers:
- Enabling streaming on LLM instances
- Creating token callbacks for real-time display
- Understanding stream chunks and their structure
- Use cases and performance considerations
Local verification confirms all examples are documented ✅

🔗 Related Links

Agent SDK PR: #1270 (updated with documentation reference)
Documentation PR: OpenHands/docs#132

📦 Changes Pushed

All fixes have been committed and pushed to the xw/completions-streaming branch:

Commit: 5b6a58b9 - "Fix failing tests and pre-commit checks for streaming implementation"
Modified 7 files with targeted, minimal changes

All GitHub Actions should now pass on the next CI run! 🎉

_{View full conversation}

github-actions · 2025-11-26T03:50:21Z

Coverage Report •

File	Stmts	Miss	Cover	Missing
openhands-sdk/openhands/sdk
__init__.py	19	2	89%	57–58
openhands-sdk/openhands/sdk/agent
agent.py	158	48	69%	84, 88, 146, 150–151, 160–161, 177–179, 186–188, 190, 194, 197–198, 200, 207, 234, 239, 259, 266, 288, 293, 304, 307, 330, 340–341, 362–364, 366, 378–379, 384–385, 405–406, 411, 423–424, 429–430, 462, 469–470
base.py	157	21	86%	154, 160, 172–174, 204–205, 211–213, 226, 234–235, 269, 310, 317, 330, 367–368, 378–379
utils.py	57	18	68%	63, 77, 83–84, 101–102, 105–107, 110, 163, 165–167, 169–170, 177, 200
openhands-sdk/openhands/sdk/conversation
base.py	72	5	93%	117, 122–123, 167, 178
conversation.py	19	1	94%	107
openhands-sdk/openhands/sdk/conversation/impl
local_conversation.py	211	70	66%	133, 135–136, 175–176, 199, 204, 220, 237, 245–247, 251–252, 305–306, 309, 316, 338–340, 343, 352, 368, 370, 372, 376, 378–380, 382, 384, 390–391, 404–405, 407, 409, 413–416, 433–434, 439, 444, 446, 451, 453–455, 473, 475, 479, 484, 489, 494–497, 503, 506, 510, 513, 515–517, 519, 537, 539
openhands-sdk/openhands/sdk/llm
llm.py	413	155	62%	338, 343, 347, 351–353, 357–358, 369–370, 372–373, 377, 394, 454, 484–486, 507, 511, 526, 532–533, 557–558, 568, 593–598, 619–620, 623, 627, 639, 644–647, 654, 657, 665–670, 674–677, 679, 692, 696–698, 700–701, 706–707, 709, 716, 719–724, 781–786, 839–840, 843–846, 887, 901, 955, 958–960, 963–971, 975–977, 980, 983–985, 992–993, 1002, 1009–1011, 1015, 1017–1022, 1024–1041, 1044–1048, 1050–1051, 1057–1066, 1079, 1093, 1098
openhands-sdk/openhands/sdk/llm/router
base.py	42	17	59%	44–45, 48, 64–65, 67, 70, 97–99, 103, 109–111, 114–115, 117
TOTAL	12501	5753	53%

openhands-sdk/openhands/sdk/agent/utils.py

openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py

enyst · 2025-11-26T07:42:04Z

examples/01_standalone_sdk/29_llm_streaming.py

+_current_state: StreamingState | None = None
+
+
+def on_token(chunk: ModelResponseStream) -> None:


Oh I think this maybe belongs in the visualizer? Otherwise it doesn't work for anything else, every client code needs to rewrite this

I think getting streaming supported in visualizer is probably a bit too advanced (we need to figure out edge cases and come up w/ a standard data structure in streaming responses: different model might return different things - litellm did not unify this) -- in this PR i was mainly hoping to get the scaffold / initial MVP for streaming and not to go too deep into the rabbit hole to keep the PR size /scope reasonable 🤣

Maybe we can do a visualizer in later PR, while in the meantime it'll be good to at least have some level of streaming ability

Yes! I was actually thinking about this in the back of my mind and I think that is totally the way to go. Let's get this in, and take it from here.

The essential structure is fine, and as it is, I think maybe it unlocks some potential for client developers to build further or improve.

…on.py Co-authored-by: Engel Nyst <engel.nyst@gmail.com>

…etions-streaming

This reverts commit 39db2f3.

xingyaoww · 2025-11-26T15:58:41Z

@OpenHands please fix the failing actions on PR #1270 at branch xw/completions-streaming

Do not change implementation, just match tests with the implementation

openhands-ai · 2025-11-26T15:58:50Z

I'm on it! xingyaoww can track my progress at all-hands.dev

openhands-ai · 2025-11-26T16:07:53Z

Looks like there are a few issues preventing this PR from being merged!

GitHub Actions are failing:
- Run tests

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #1270 at branch `xw/completions-streaming`

Feel free to include any additional details that might help me get this PR into a better state.

_{^{You can manage your notification settings}}

- Remove extra_body={'metadata': {}} from completion API test assertions - Remove metadata={} from responses API test assertions - Update error message in test_llm_responses_stream_requires_callback to match implementation Co-authored-by: openhands <openhands@all-hands.dev>

…tests - Removed skipped Responses API streaming tests since streaming is not implemented for Responses API - Added comprehensive Chat Completion streaming tests: - test_llm_completion_streaming_with_callback: Tests basic streaming with on_token callback - test_llm_completion_streaming_with_tools: Tests streaming with tool calls - Both new tests verify chunk delivery and final response assembly Co-authored-by: openhands <openhands@all-hands.dev>

openhands-ai · 2025-11-26T16:24:24Z

✅ Successfully Fixed All Failing Tests on PR #1270

I've fixed the failing tests by matching them with the actual implementation, as requested. Here's what was done:

📋 Changes Made

1. Removed Responses API Streaming Tests (from test_responses_parsing_and_kwargs.py)

❌ Deleted test_llm_responses_streaming_invokes_token_callback (skipped test)
❌ Deleted test_llm_responses_stream_requires_callback (skipped test)
Reason: Streaming is NOT implemented for Responses API in this PR

2. Added Chat Completion Streaming Tests (to test_llm_completion.py)

✅ Added test_llm_completion_streaming_with_callback: Tests basic streaming with on_token callback
- Verifies chunks are properly delivered to the callback
- Verifies final response is correctly assembled using litellm.stream_chunk_builder()
✅ Added test_llm_completion_streaming_with_tools: Tests streaming with tool calls
- Verifies tool call chunks are handled correctly
- Verifies final response contains properly assembled tool calls
Reason: Streaming IS implemented for Chat Completions via the completion() method

✅ Verification

Local tests: All 1264 SDK tests passing (2 new streaming tests added)
Pre-commit hooks: All checks passing (ruff format, ruff lint, pyright, pycodestyle)
GitHub Actions: All 16/16 checks passed, including:
- ✅ sdk-tests
- ✅ pre-commit
- ✅ check
- ✅ cross-tests
- ✅ agent-server-tests
- ✅ tools-tests
- ✅ build-binary-and-test (ubuntu + macos)

📦 Commit

Changes have been committed and pushed to xw/completions-streaming:

Commit: c7819bfb - "Replace Responses API streaming tests with Chat Completion streaming tests"

The tests now correctly reflect the implementation: streaming works for Chat Completions (via completion() with on_token callback) but not for Responses API.

_{View full conversation}

xingyaoww · 2025-11-26T16:31:57Z

tests/sdk/agent/test_agent_utils.py


    # Reset mocks and test responses API selection
    mock_llm.reset_mock()
+    mock_llm.metadata = {}


@OpenHands we don't need to mock metadata anymore, pls remove them

I'm on it! xingyaoww can track my progress at all-hands.dev

I've successfully addressed the PR comment and removed the unnecessary metadata mocking from the test file. Here's a summary of the changes:

Changes Made

File Modified: tests/sdk/agent/test_agent_utils.py

Changes:

Removed llm.metadata = {} from the mock_llm fixture (line 30)

Removed mock_llm.metadata = {} from test_make_llm_completion_api_selection function (line 461)

Removed mock_llm.metadata = {} after reset_mock() call in the same test (line 486)

Total: 3 lines removed

Verification

✅ All pre-commit checks passed:

Ruff format: Passed

Ruff lint: Passed

PEP8 style check: Passed

Type check with pyright: Passed

Import dependency rules: Passed

✅ All tests passed: 14/14 tests in test_agent_utils.py passed successfully

✅ Changes committed and pushed to branch xw/completions-streaming

Rationale

The metadata attribute was being mocked but never accessed in the make_llm_completion function implementation. The function only uses llm.uses_responses_api(), llm.completion(), and llm.responses() methods, making the metadata mocking unnecessary and potentially misleading.

The PR has been updated with commit 8be913e8.

_{View full conversation}

The metadata attribute is not accessed in the make_llm_completion function, so mocking it in tests is unnecessary. Co-authored-by: openhands <openhands@all-hands.dev>

enyst and others added 22 commits October 20, 2025 10:15

Add streaming support for Responses API

d149370

Document LLM streaming refactor plan

d331abf

Refactor streaming chunk model and visualizer

e31b728

Merge remote-tracking branch 'upstream/main' into streaming-responses

3983ce4

Merge remote-tracking branch 'upstream/main' into streaming-responses

a341d0e

Merge remote-tracking branch 'upstream/main' into streaming-responses

031fcf1

Merge branch 'main' into streaming-responses

287c9c2

Merge branch 'main' into streaming-responses

f920696

Simplify streaming visualizer and always-persist streaming panels

a65dbda

Co-authored-by: openhands <openhands@all-hands.dev>

Trigger CI re-run

847eaaa

Co-authored-by: openhands <openhands@all-hands.dev>

remove md

80c06f7

rename example

9859171

make LLMStreamChunk a basemodel

71fce09

clean up some merges

6a67bac

simplify local convo and remove streaming event since that's probably…

ab8961a

… not necessary

update the right init

fa57f08

rm streaming visualizer

66e2092

some attempt to simplify

9d1914c

xingyaoww marked this pull request as draft November 26, 2025 02:56

xingyaoww added 7 commits November 25, 2025 21:56

revert facts

2491734

remove extra tests

777f4de

implement chat completion streaming

db995d8

fix

06cf551

fix chunk

95622ba

simplify example

f7a07fa

get streaming example to work!

df87e8e

update streaming example

38e2fd6

enyst reviewed Nov 26, 2025

View reviewed changes

openhands-sdk/openhands/sdk/agent/utils.py Outdated Show resolved Hide resolved

enyst reviewed Nov 26, 2025

View reviewed changes

openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py Show resolved Hide resolved

enyst reviewed Nov 26, 2025

View reviewed changes

openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py Show resolved Hide resolved

enyst reviewed Nov 26, 2025

View reviewed changes

xingyaoww and others added 9 commits November 26, 2025 10:39

Merge branch 'main' into xw/completions-streaming

f34ccd1

remove unused metadata

7e7fd35

Update openhands-sdk/openhands/sdk/conversation/impl/local_conversati…

7f8cd32

…on.py Co-authored-by: Engel Nyst <engel.nyst@gmail.com>

revert loop

f223f05

Merge commit '7f8cd32533928a41f00e06675d003d3b2c34cc92' into xw/compl…

767741e

…etions-streaming

move imports

39db2f3

Revert "move imports"

1753bbc

This reverts commit 39db2f3.

add a comment

48584ab

report example cost

8ee4341

enyst approved these changes Nov 26, 2025

View reviewed changes

revert tests for responses API which is not implemnted yet

cd1bbb0

openhands-agent and others added 3 commits November 26, 2025 16:09

Merge branch 'main' into xw/completions-streaming

ccfb3e6

xingyaoww commented Nov 26, 2025

View reviewed changes

Remove unnecessary metadata mocking from test_agent_utils

8be913e

The metadata attribute is not accessed in the make_llm_completion function, so mocking it in tests is unnecessary. Co-authored-by: openhands <openhands@all-hands.dev>

xingyaoww merged commit ff442dd into main Nov 26, 2025
20 checks passed

xingyaoww deleted the xw/completions-streaming branch November 26, 2025 16:38

		_current_state: StreamingState \| None = None


		def on_token(chunk: ModelResponseStream) -> None:

Implement streaming for Chat Completions #1270

Implement streaming for Chat Completions #1270

Uh oh!

Conversation

xingyaoww commented Nov 26, 2025 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Documentation

Demo

Uh oh!

openhands-ai bot commented Nov 26, 2025

✅ Successfully Fixed All Failing GitHub Actions on PR #1270

📋 Summary of Fixes

🔗 Related Links

📦 Changes Pushed

Uh oh!

github-actions bot commented Nov 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

enyst Nov 26, 2025

Choose a reason for hiding this comment

Uh oh!

xingyaoww Nov 26, 2025

Choose a reason for hiding this comment

Uh oh!

enyst Nov 26, 2025

Choose a reason for hiding this comment

Uh oh!

xingyaoww commented Nov 26, 2025

Uh oh!

openhands-ai bot commented Nov 26, 2025

Uh oh!

openhands-ai bot commented Nov 26, 2025

Uh oh!

openhands-ai bot commented Nov 26, 2025

✅ Successfully Fixed All Failing Tests on PR #1270

📋 Changes Made

✅ Verification

📦 Commit

Uh oh!

xingyaoww Nov 26, 2025

Choose a reason for hiding this comment

Uh oh!

openhands-ai bot Nov 26, 2025

Choose a reason for hiding this comment

Uh oh!

openhands-ai bot Nov 26, 2025

Choose a reason for hiding this comment

Changes Made

Verification

Rationale

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

xingyaoww commented Nov 26, 2025 •

edited by github-actions bot

Loading

github-actions bot commented Nov 26, 2025 •

edited

Loading