### `llm_streaming_refactor_plan.md` (13 additions, 16 deletions)
```diff
@@ -74,16 +74,12 @@ Keeping the raw LiteLLM payload inside each `LLMStreamChunk` means we do **not**
 
 ## Visualization strategy
 
-1. **Track a hierarchy per conversation event.** When a LiteLLM stream begins we emit a placeholder `MessageEvent` (assistant message) or `ActionEvent` (function call). Each `LLMStreamChunk` should include a `response_id`/`item_id` so we can map to the owning conversation event:
-   - `output_text` → existing `MessageEvent` for the assistant response.
-   - `reasoning_summary_*` → reasoning area inside `MessageEvent`.
-   - `function_call_arguments_*` → arguments area inside `ActionEvent`.
-2. **Use `Live` per section.** For each unique `(conversation_event_id, part_kind, item_id)` create a Rich `Live` instance that updates with concatenated text. When the part is terminal, stop the `Live` and leave the final text in place.
-3. **Avoid newlines unless emitted by the model.** We’ll join chunks using plain string concatenation and only add newline characters when the delta contains `\n` or when we intentionally insert separators (e.g., between tool JSON arguments).
-4. **Segregate sections:**
-   - `Reasoning:` header per `MessageEvent`. Each new reasoning item gets its own `Live` line under that message.
-   - `Assistant:` body for natural language output, appended inside the message panel.
-   - `Function Arguments:` block under each action panel, streaming JSON incrementally.
+We will leave the existing `ConversationVisualizer` untouched for default/legacy usage and introduce a new `StreamingConversationVisualizer` that renders deltas directly inside the final panels:
+
+1. **Create/update per-response panels.** The first chunk for a `(response_id, output_index)` pair creates (or reuses) a panel for the assistant message or tool call and immediately starts streaming into it.
+2. **Route text into semantic sections.** Assistant text, reasoning summaries, function-call arguments, tool output, and refusals each update their own section inside the panel.
+3. **Use Rich `Live` when interactive.** In a real terminal we keep the panel on screen and update it in place; when the console is not interactive (tests, logging) we fall back to static updates.
+4. **Leave the panel in place when finished.** When the final chunk arrives we stop updating but keep the panel visible; the subsequent `MessageEvent`/`ActionEvent` is suppressed to avoid duplicate re-rendering.
 
 ## Implementation roadmap
 
```
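A minimal sketch of how the four new points could hang together, assuming a chunk type shaped roughly like the plan describes (the `LLMStreamChunk` dataclass, `on_chunk`, and `is_final` below are illustrative stand-ins, not settled API):

```python
from dataclasses import dataclass

from rich.console import Console, Group
from rich.live import Live
from rich.panel import Panel
from rich.text import Text


@dataclass
class LLMStreamChunk:
    """Stand-in for the real chunk type; field names are illustrative."""
    response_id: str
    output_index: int
    part_kind: str  # e.g. "output_text", "reasoning_summary", "function_call_arguments"
    delta: str
    is_final: bool = False


class StreamingConversationVisualizer:
    """One panel per (response_id, output_index); one section per part_kind."""

    def __init__(self, console: Console | None = None) -> None:
        self.console = console or Console()
        # (response_id, output_index) -> {part_kind: accumulated text}
        self._sections: dict[tuple[str, int], dict[str, str]] = {}
        # Rich allows one active Live per console, so this sketch assumes
        # a single in-flight response at a time.
        self._live: Live | None = None

    def on_chunk(self, chunk: LLMStreamChunk) -> None:
        key = (chunk.response_id, chunk.output_index)
        sections = self._sections.setdefault(key, {})
        # Plain concatenation: no newlines unless the delta itself contains them.
        sections[chunk.part_kind] = sections.get(chunk.part_kind, "") + chunk.delta

        panel = self._render_panel(key, sections)
        if self.console.is_terminal:
            if self._live is None:
                self._live = Live(panel, console=self.console, refresh_per_second=12)
                self._live.start()
            else:
                self._live.update(panel)
            if chunk.is_final:
                self._live.stop()  # stops refreshing; the final panel stays on screen
                self._live = None
        elif chunk.is_final:
            # Non-interactive console (tests, logging): print the finished panel once.
            self.console.print(panel)

    def _render_panel(self, key: tuple[str, int], sections: dict[str, str]) -> Panel:
        body = Group(
            *(Text.assemble((f"{kind}: ", "bold"), text)
              for kind, text in sections.items())
        )
        return Panel(body, title=f"response {key[0]} / output {key[1]}")
```

Step 4's suppression then reduces to checking whether an arriving `MessageEvent`/`ActionEvent` corresponds to a key this visualizer has already rendered.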
```diff
@@ -95,11 +91,11 @@ Keeping the raw LiteLLM payload inside each `LLMStreamChunk` means we do **not**
    - When we enqueue the initial `MessageEvent`/`ActionEvent`, cache a lookup (e.g., `inflight_streams[(response_id, output_index)] = conversation_event_id`).
    - Update the `LocalConversation` token callback wrapper to attach the resolved conversation event ID onto the `LLMStreamChunk` before emitting/persisting.
 
-3. **Visualizer rewrite**
-   - Maintain `self._stream_views[(conversation_event_id, part_kind, item_id)] = LiveState` where `LiveState` wraps buffer, style, and a `Live` instance.
-   - On streaming updates: update buffer, `live.update(Text(buffer, style=...))` without printing newlines.
-   - On final chunk: stop `Live`, render final static text, and optionally record in conversation state for playback.
-   - Ensure replay (when visualizer processes stored events) converts stored parts into final text as well.
+3. **Streaming visualizer**
+   - Implement `StreamingConversationVisualizer` with lightweight session tracking (keyed by response/output) that owns Rich panels for streaming sections.
+   - Stream updates into the same panel that will remain visible after completion; use `Live` only when running in an interactive terminal.
+   - Suppress duplicate rendering when the final `MessageEvent`/`ActionEvent` arrives, since the streamed panel already contains the content.
+   - Provide a factory helper (e.g., `create_streaming_visualizer`) for callers that want the streaming experience.
 
 4. **Persistence / tests**
    - Update tests to ensure:
```
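Step 2 of the roadmap (the in-flight lookup plus the callback wrapper) is small enough to sketch as well. `StreamRegistry`, `wrap_token_callback`, and the `conversation_event_id` attribute are hypothetical names; only the `inflight_streams`-style `(response_id, output_index)` keying comes from the plan:

```python
import uuid
from typing import Callable


class StreamRegistry:
    """Maps an in-flight (response_id, output_index) to its conversation event ID."""

    def __init__(self) -> None:
        self._inflight: dict[tuple[str, int], str] = {}

    def register(self, response_id: str, output_index: int) -> str:
        # Called when the placeholder MessageEvent/ActionEvent is enqueued.
        event_id = str(uuid.uuid4())
        self._inflight[(response_id, output_index)] = event_id
        return event_id

    def resolve(self, response_id: str, output_index: int) -> str | None:
        return self._inflight.get((response_id, output_index))

    def finish(self, response_id: str, output_index: int) -> None:
        self._inflight.pop((response_id, output_index), None)


def wrap_token_callback(
    registry: StreamRegistry, emit: Callable[["LLMStreamChunk"], None]
) -> Callable[["LLMStreamChunk"], None]:
    """Attach the owning conversation event ID to each chunk before emitting."""

    def callback(chunk: "LLMStreamChunk") -> None:
        # Assumed attribute: the plan only says the ID is attached to the chunk.
        chunk.conversation_event_id = registry.resolve(
            chunk.response_id, chunk.output_index
        )
        emit(chunk)
        if chunk.is_final:
            registry.finish(chunk.response_id, chunk.output_index)

    return callback
```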
```diff
@@ -117,5 +113,6 @@ Keeping the raw LiteLLM payload inside each `LLMStreamChunk` means we do **not**
 - [ ] Refactor classifier to output `LLMStreamChunk` objects with clear `part_kind`.
 - [ ] Track in-flight conversation events so parts know their owner.
 - [ ] Replace print-based visualizer streaming with `Live` blocks per section.
-- [ ] Extend unit tests to cover multiple messages, reasoning segments, and tool calls.
+- [ ] Extend unit tests to cover multiple messages, reasoning segments, tool calls, and the new streaming visualizer.
+- [ ] Update the standalone streaming example to wire in the streaming visualizer helper.
 - [ ] Manually validate with long streaming example to confirm smooth in-place updates.
```
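For the new checklist item about the standalone example, the wiring might look like this, reusing the sketches above; `create_streaming_visualizer` is the factory name the plan proposes, while the surrounding call shape is assumed:

```python
from rich.console import Console


def create_streaming_visualizer(
    console: Console | None = None,
) -> "StreamingConversationVisualizer":
    """Factory helper named in the plan; the signature here is a guess."""
    return StreamingConversationVisualizer(console=console or Console())


# Hypothetical wiring for the standalone streaming example.
visualizer = create_streaming_visualizer()
registry = StreamRegistry()
on_chunk = wrap_token_callback(registry, visualizer.on_chunk)
# `on_chunk` would then be passed wherever LocalConversation accepts its
# token callback, so streamed chunks land in the live panels above.
```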