Implement streaming for Chat Completions #1270
Merged
Commits (46)
d149370  Add streaming support for Responses API (enyst)
d331abf  Document LLM streaming refactor plan (enyst)
e31b728  Refactor streaming chunk model and visualizer (enyst)
3983ce4  Merge remote-tracking branch 'upstream/main' into streaming-responses (enyst)
a341d0e  Merge remote-tracking branch 'upstream/main' into streaming-responses (enyst)
031fcf1  Merge remote-tracking branch 'upstream/main' into streaming-responses (enyst)
287c9c2  Merge branch 'main' into streaming-responses (enyst)
21bcaa5  Merge main branch into streaming-responses (openhands-agent)
f920696  Merge branch 'main' into streaming-responses (enyst)
a65dbda  Simplify streaming visualizer and always-persist streaming panels (enyst)
27f9653  Merge main into streaming-responses and resolve conflicts (openhands-agent)
dbbd0cf  Fix merge conflicts and type errors after merging main (openhands-agent)
7ac405d  Fix circular import and update tests for streaming API (openhands-agent)
847eaaa  Trigger CI re-run (openhands-agent)
80c06f7  remove md (xingyaoww)
9859171  rename example (xingyaoww)
71fce09  make LLMStreamChunk a basemodel (xingyaoww)
6a67bac  clean up some merges (xingyaoww)
ab8961a  simplify local convo and remove streaming event since that's probably… (xingyaoww)
fa57f08  update the right init (xingyaoww)
66e2092  rm streaming visualizer (xingyaoww)
9d1914c  some attempt to simplify (xingyaoww)
2491734  revert facts (xingyaoww)
777f4de  remove extra tests (xingyaoww)
db995d8  implement chat completion streaming (xingyaoww)
06cf551  fix (xingyaoww)
95622ba  fix chunk (xingyaoww)
f7a07fa  simplify example (xingyaoww)
df87e8e  get streaming example to work! (xingyaoww)
d7734c6  ignore warnings (xingyaoww)
5b6a58b  Fix failing tests and pre-commit checks for streaming implementation (openhands-agent)
38e2fd6  update streaming example (xingyaoww)
f34ccd1  Merge branch 'main' into xw/completions-streaming (xingyaoww)
7e7fd35  remove unused metadata (xingyaoww)
7f8cd32  Update openhands-sdk/openhands/sdk/conversation/impl/local_conversati… (xingyaoww)
f223f05  revert loop (xingyaoww)
767741e  Merge commit '7f8cd32533928a41f00e06675d003d3b2c34cc92' into xw/compl… (xingyaoww)
39db2f3  move imports (xingyaoww)
1753bbc  Revert "move imports" (xingyaoww)
48584ab  add a comment (xingyaoww)
8ee4341  report example cost (xingyaoww)
cd1bbb0  revert tests for responses API which is not implemnted yet (xingyaoww)
ca4418e  Fix failing tests to match streaming implementation (openhands-agent)
c7819bf  Replace Responses API streaming tests with Chat Completion streaming … (openhands-agent)
ccfb3e6  Merge branch 'main' into xw/completions-streaming (xingyaoww)
8be913e  Remove unnecessary metadata mocking from test_agent_utils (openhands-agent)
The PR adds a new streaming example (new file, 131 lines):

```python
import os
import sys
from typing import Literal

from pydantic import SecretStr

from openhands.sdk import (
    Conversation,
    get_logger,
)
from openhands.sdk.llm import LLM
from openhands.sdk.llm.streaming import ModelResponseStream
from openhands.tools.preset.default import get_default_agent


logger = get_logger(__name__)


api_key = os.getenv("LLM_API_KEY") or os.getenv("OPENAI_API_KEY")
if not api_key:
    raise RuntimeError("Set LLM_API_KEY or OPENAI_API_KEY in your environment.")

model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929")
base_url = os.getenv("LLM_BASE_URL")
llm = LLM(
    model=model,
    api_key=SecretStr(api_key),
    base_url=base_url,
    usage_id="stream-demo",
    stream=True,
)

agent = get_default_agent(llm=llm, cli_mode=True)


# Define streaming states
StreamingState = Literal["thinking", "content", "tool_name", "tool_args"]
# Track state across on_token calls for boundary detection
_current_state: StreamingState | None = None


def on_token(chunk: ModelResponseStream) -> None:
    """
    Handle all types of streaming tokens including content,
    tool calls, and thinking blocks with dynamic boundary detection.
    """
    global _current_state

    choices = chunk.choices
    for choice in choices:
        delta = choice.delta
        if delta is not None:
            # Handle thinking blocks (reasoning content)
            reasoning_content = getattr(delta, "reasoning_content", None)
            if isinstance(reasoning_content, str) and reasoning_content:
                if _current_state != "thinking":
                    if _current_state is not None:
                        sys.stdout.write("\n")
                    sys.stdout.write("THINKING: ")
                    _current_state = "thinking"
                sys.stdout.write(reasoning_content)
                sys.stdout.flush()

            # Handle regular content
            content = getattr(delta, "content", None)
            if isinstance(content, str) and content:
                if _current_state != "content":
                    if _current_state is not None:
                        sys.stdout.write("\n")
                    sys.stdout.write("CONTENT: ")
                    _current_state = "content"
                sys.stdout.write(content)
                sys.stdout.flush()

            # Handle tool calls
            tool_calls = getattr(delta, "tool_calls", None)
            if tool_calls:
                for tool_call in tool_calls:
                    tool_name = (
                        tool_call.function.name if tool_call.function.name else ""
                    )
                    tool_args = (
                        tool_call.function.arguments
                        if tool_call.function.arguments
                        else ""
                    )
                    if tool_name:
                        if _current_state != "tool_name":
                            if _current_state is not None:
                                sys.stdout.write("\n")
                            sys.stdout.write("TOOL NAME: ")
                            _current_state = "tool_name"
                        sys.stdout.write(tool_name)
                        sys.stdout.flush()
                    if tool_args:
                        if _current_state != "tool_args":
                            if _current_state is not None:
                                sys.stdout.write("\n")
                            sys.stdout.write("TOOL ARGS: ")
                            _current_state = "tool_args"
                        sys.stdout.write(tool_args)
                        sys.stdout.flush()


conversation = Conversation(
    agent=agent,
    workspace=os.getcwd(),
    token_callbacks=[on_token],
)

story_prompt = (
    "Tell me a long story about LLM streaming, write it to a file, "
    "make sure it has multiple paragraphs. "
)
conversation.send_message(story_prompt)
print("Token Streaming:")
print("-" * 100 + "\n")
conversation.run()

cleanup_prompt = (
    "Thank you. Please delete the streaming story file now that I've read it, "
    "then confirm the deletion."
)
conversation.send_message(cleanup_prompt)
print("Token Streaming:")
print("-" * 100 + "\n")
conversation.run()

# Report cost
cost = llm.metrics.accumulated_cost
print(f"EXAMPLE_COST: {cost}")
```
Review comment:
Oh, I think this maybe belongs in the visualizer? Otherwise it doesn't work for anything else; every client would need to rewrite this logic.
Reply:
I think getting streaming supported in the visualizer is probably a bit too advanced: we'd need to figure out edge cases and come up with a standard data structure for streaming responses, since different models may return different things and litellm did not unify this. In this PR I was mainly hoping to get the scaffold / initial MVP for streaming, without going too deep into the rabbit hole, to keep the PR size and scope reasonable 🤣

Maybe we can do a visualizer in a later PR; in the meantime it's good to at least have some level of streaming ability.
Reply:
Yes! I was actually thinking about this in the back of my mind, and I think that is totally the way to go. Let's get this in and take it from here.

The essential structure is fine, and as it stands, I think it unlocks some potential for client developers to build on and improve.
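As one sketch of that "build further" direction: the boundary-detection logic from the example could be packaged as a reusable callback class, so client code registers one object instead of re-implementing `on_token`. This is a hypothetical client-side helper, not part of the SDK; it assumes only the `ModelResponseStream` fields the example above already relies on (`choices[].delta.content`, `.reasoning_content`, `.tool_calls`).

```python
import sys

from openhands.sdk.llm.streaming import ModelResponseStream


class StreamPrinter:
    """Hypothetical reusable token callback: prints labeled streaming
    sections (thinking / content / tool calls) with boundary detection."""

    def __init__(self) -> None:
        self._state: str | None = None

    def _emit(self, state: str, label: str, text: str) -> None:
        # Print the section label once per state transition, then raw tokens.
        if self._state != state:
            if self._state is not None:
                sys.stdout.write("\n")
            sys.stdout.write(label)
            self._state = state
        sys.stdout.write(text)
        sys.stdout.flush()

    def __call__(self, chunk: ModelResponseStream) -> None:
        for choice in chunk.choices:
            delta = choice.delta
            if delta is None:
                continue
            reasoning = getattr(delta, "reasoning_content", None)
            if isinstance(reasoning, str) and reasoning:
                self._emit("thinking", "THINKING: ", reasoning)
            content = getattr(delta, "content", None)
            if isinstance(content, str) and content:
                self._emit("content", "CONTENT: ", content)
            for tool_call in getattr(delta, "tool_calls", None) or []:
                if tool_call.function.name:
                    self._emit("tool_name", "TOOL NAME: ", tool_call.function.name)
                if tool_call.function.arguments:
                    self._emit("tool_args", "TOOL ARGS: ", tool_call.function.arguments)


# Usage mirrors the example file:
#   Conversation(agent=agent, workspace=os.getcwd(), token_callbacks=[StreamPrinter()])
```

Since the per-provider delta shapes are not unified (as noted above), the class sticks to defensive `getattr` access, the same pattern the merged example uses.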