feat(cerebras, xai): expose reasoning_format for parity with reasoning_effort #6106
anshulkulhari7 wants to merge 3 commits into
…g_effort Reasoning models such as gpt-oss-120b served by Cerebras or xAI (Grok) stream their thinking tokens as part of the response. Without a way to suppress them, the TTS pipeline reads the model's internal monologue aloud. Add a reasoning_format parameter to the OpenAI-compatible LLM, the Cerebras LLM, and LLM.with_x_ai, mirroring how reasoning_effort is plumbed. Because reasoning_format is a provider-specific body field (not an OpenAI SDK argument), it is forwarded via extra_body. Accepted values are 'parsed', 'raw', and 'hidden'. Closes livekit#5989
The with_cerebras() factory accepted reasoning_effort but not reasoning_format, so users could not pass it even though the LLM docstring lists Cerebras (gpt-oss-120b) as a supported provider. Add the parameter to the signature and forward it to LLM(), matching with_x_ai() and the standalone Cerebras plugin.
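A minimal sketch of the factory-forwarding pattern this commit describes. The `LLM` dataclass is a hypothetical stand-in, not the plugin's real class, and the Cerebras base URL is an assumed OpenAI-compatible endpoint. The only point is that `with_cerebras()` passes `reasoning_format` straight through to the constructor, just as it already does for `reasoning_effort`.

```python
from dataclasses import dataclass
from typing import Literal, Optional

# Same alias as the one named in the PR description.
ReasoningFormat = Literal["parsed", "raw", "hidden"]


@dataclass
class LLM:
    """Hypothetical stand-in for the plugin LLM; only the fields needed here."""

    model: str
    base_url: str
    reasoning_effort: Optional[str] = None
    reasoning_format: Optional[ReasoningFormat] = None

    @classmethod
    def with_cerebras(
        cls,
        *,
        model: str,
        reasoning_effort: Optional[str] = None,
        reasoning_format: Optional[ReasoningFormat] = None,
    ) -> "LLM":
        # The factory adds no logic of its own: both reasoning knobs are forwarded as-is.
        return cls(
            model=model,
            base_url="https://api.cerebras.ai/v1",  # assumed OpenAI-compatible endpoint
            reasoning_effort=reasoning_effort,
            reasoning_format=reasoning_format,
        )
```

Usage would look like `LLM.with_cerebras(model="gpt-oss-120b", reasoning_format="hidden")`. Leaving the argument out keeps it `None`, so existing callers are unaffected.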
🚩 Caller-provided extra_body in extra_kwargs is silently overridden by opts.extra_body
Pre-existing behavior: if a caller passes extra_kwargs={"extra_body": {...}} to chat() AND the LLM was constructed with an extra_body option, lines 975-980 first apply extra_kwargs then unconditionally overwrite extra["extra_body"] with self._opts.extra_body. This means the caller's extra_body is silently lost. This is not introduced by this PR (it's pre-existing), but the new reasoning_format feature makes it more likely users will interact with extra_body indirectly. Currently no callers in the codebase appear to hit this conflict, but it could surprise external users.
(Refers to lines 975-980)
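A small reproduction of the ordering the bot describes, plus one possible non-destructive alternative. Both functions are hypothetical simplifications of the request-building code at lines 975-980, which is not shown here. The assumptions are that the overwrite fires whenever an `extra_body` option was configured, and that in the merged version per-call keys should win on conflicts. That precedence is a design choice, not something the thread settles.

```python
from typing import Any, Optional


def build_request_current(
    extra_kwargs: dict[str, Any], opts_extra_body: Optional[dict[str, Any]]
) -> dict[str, Any]:
    """Reported order: apply caller kwargs first, then overwrite extra_body unconditionally."""
    extra: dict[str, Any] = {}
    extra.update(extra_kwargs)
    if opts_extra_body is not None:
        extra["extra_body"] = opts_extra_body  # the caller's extra_body is silently dropped here
    return extra


def build_request_merged(
    extra_kwargs: dict[str, Any], opts_extra_body: Optional[dict[str, Any]]
) -> dict[str, Any]:
    """Possible fix: merge both dicts into a new one; per-call keys override configured ones."""
    extra: dict[str, Any] = dict(extra_kwargs)
    caller_body = extra_kwargs.get("extra_body") or {}
    if opts_extra_body or caller_body:
        extra["extra_body"] = {**(opts_extra_body or {}), **caller_body}
    return extra
```

Because `build_request_merged` builds a fresh `extra_body` dict, neither the configured option nor the caller's dict is mutated.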
``api_key`` must be set to your OpenAI API key, either using the argument or by setting the
``OPENAI_API_KEY`` environmental variable.

``reasoning_format`` controls how reasoning models (e.g. ``gpt-oss-120b`` served by
nit: reword the description. It makes it sound like gpt-oss-120b can be served by xAI.
Reworded in 258fde2 — now reads "gpt-oss-120b on Cerebras, or Grok on xAI", so it no longer implies xAI serves gpt-oss-120b.
``reasoning_format`` controls how reasoning models (e.g. ``gpt-oss-120b`` served by
Cerebras or xAI) return their thinking tokens. Set it to ``"hidden"`` or ``"parsed"`` to
How do we handle parsed? If we are parsing it, there should be a way to expose it to the end user.
Good catch — right now the plugin doesn't surface the parsed reasoning. ChoiceDelta only has role/content/tool_calls/extra (no dedicated reasoning field), and the OpenAI stream handler doesn't read a reasoning/reasoning_content field, so with reasoning_format="parsed" the separated reasoning would currently be dropped.
My instinct is to surface it through delta.extra (the same provider-extra channel already used for xAI encrypted reasoning) rather than adding a new field — but how would you prefer reasoning to be exposed to the end user? Happy to wire it up that way in this PR, or drop "parsed" here and do the exposure as a focused follow-up if you'd rather keep this one to the request-param plumbing.
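A rough sketch of the `delta.extra` option proposed in this reply. `ChoiceDelta` here is a hypothetical dataclass that carries only the four fields named in the thread; it is not the framework's real type. The raw field names `reasoning` and `reasoning_content` are also assumptions drawn from the reply, so check each provider's docs for the key it actually sends when `reasoning_format="parsed"`.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ChoiceDelta:
    """Hypothetical stand-in with the fields named in the thread."""

    role: Optional[str] = None
    content: Optional[str] = None
    tool_calls: list[Any] = field(default_factory=list)
    extra: dict[str, Any] = field(default_factory=dict)


# Keys a provider might use for separated reasoning (assumption; verify per provider).
REASONING_KEYS = ("reasoning", "reasoning_content")


def to_choice_delta(raw_delta: dict[str, Any]) -> ChoiceDelta:
    """Map one raw streamed delta to a ChoiceDelta, routing reasoning into `extra`."""
    delta = ChoiceDelta(
        role=raw_delta.get("role"),
        content=raw_delta.get("content"),
        tool_calls=list(raw_delta.get("tool_calls") or []),
    )
    for key in REASONING_KEYS:
        text = raw_delta.get(key)
        if text:
            # Kept out of `content`, so a TTS stage that only reads content never speaks it.
            delta.extra["reasoning"] = text
            break
    return delta
```

Normalizing to a single `extra["reasoning"]` key would give end users one place to look regardless of which raw field name the provider used.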
…b on Cerebras, Grok on xAI)
Summary
Closes #5989.
Reasoning models such as `gpt-oss-120b` on Cerebras, or Grok on xAI, stream their thinking tokens as part of the chat-completions response. Today there is no way to tell those providers to keep that internal monologue out of the message `content`, so the TTS pipeline ends up reading the model's raw reasoning aloud.

Both providers support a `reasoning_format` request field (`"parsed"`, `"raw"`, `"hidden"`) that controls where reasoning ends up. This PR exposes it on the relevant LLM constructors, mirroring exactly how `reasoning_effort` is already plumbed.

Changes
`livekit-plugins-openai` (`llm.py`)
- New `ReasoningFormat = Literal["parsed", "raw", "hidden"]` type.
- Add `reasoning_format` to `_LLMOptions`, the `LLM.__init__` signature, and the `LLM.with_x_ai(...)` factory (xAI/Grok chat completions path).
- In `chat()`, forward it to the request. Because `reasoning_format` is a provider-specific body field and not an OpenAI SDK keyword argument (passing it top-level raises `TypeError: unexpected keyword argument`), it is merged into `extra_body`, the same mechanism already used for OpenRouter-specific body fields. The user-supplied `extra_body` dict is never mutated in place.

`livekit-plugins-cerebras` (`llm.py`)
- Add `reasoning_format` to the Cerebras `LLM.__init__` and forward it to the base class.
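The `extra_body` merge described in the list above can be sketched as a pure function. `with_reasoning_format` is a hypothetical helper name, not the plugin's actual code. In this sketch an explicit `reasoning_format` argument overrides a same-named key already in the user's `extra_body`. That precedence is an assumption, not something the PR states.

```python
from typing import Any, Literal, Optional

ReasoningFormat = Literal["parsed", "raw", "hidden"]


def with_reasoning_format(
    extra_body: Optional[dict[str, Any]], reasoning_format: Optional[ReasoningFormat]
) -> Optional[dict[str, Any]]:
    """Return the extra_body to send, adding reasoning_format only when it is set.

    A new dict is built instead of writing into the caller's, so a user-supplied
    extra_body (possibly shared across LLM instances) is never mutated in place.
    """
    if reasoning_format is None:
        return extra_body  # unset: the request is left exactly as before
    return {**(extra_body or {}), "reasoning_format": reasoning_format}
```

Returning the original object untouched when the option is unset is what keeps the change backward-compatible. No new key appears in existing requests.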
The change is additive and backward-compatible: when `reasoning_format` is not set, nothing is added to the request.

How `reasoning_format` reaches the API

`reasoning_effort` is a named param in the OpenAI Python SDK (`AsyncCompletions.create`), so it is passed top-level. `reasoning_format` is not, so it must travel through `extra_body`. Verified empirically against `openai==2.40.0`:

Accepted values follow the Cerebras reasoning docs (`parsed` / `raw` / `hidden`); xAI/Grok uses the same field.

Testing
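The top-level vs. `extra_body` distinction can be checked without network access or the SDK itself. `create` below is a hypothetical stand-in that, like the SDK method described above, declares its keyword parameters explicitly and has no `**kwargs`. Python therefore rejects an unknown name such as `reasoning_format` with a `TypeError` before any request is built, while the same key passes through `extra_body`. Treating `extra_body` as a verbatim merge into the JSON body is a simplification assumed here.

```python
from typing import Any, Optional


def create(
    *,
    model: str,
    messages: list[dict[str, Any]],
    reasoning_effort: Optional[str] = None,
    extra_body: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Hypothetical SDK-like method: explicit keyword params only, no **kwargs."""
    body: dict[str, Any] = {"model": model, "messages": messages}
    if reasoning_effort is not None:
        body["reasoning_effort"] = reasoning_effort  # a named param, sent top-level
    body.update(extra_body or {})  # extra_body keys are merged into the request body
    return body


def rejects_top_level_reasoning_format() -> bool:
    """True if passing reasoning_format as a keyword argument raises TypeError."""
    try:
        create(model="gpt-oss-120b", messages=[], reasoning_format="hidden")  # type: ignore[call-arg]
    except TypeError:
        return True
    return False
```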
Added `tests/test_plugin_reasoning_format.py` (unit tests, no live keys; nothing is mocked, the tests only construct the LLM and inspect the request kwargs the stream would send):
- `reasoning_format` set on the Cerebras `LLM` lands in `extra_body` of the outgoing request.
- `reasoning_format` set via `LLM.with_x_ai(...)` lands in `extra_body`.
- When unset, no `reasoning_format` is added.

Quality gates on the changed files:
- `uv run ruff check`: All checks passed
- `uv run ruff format --check`: already formatted
- `uv run mypy -p livekit.plugins.openai -p livekit.plugins.cerebras` (strict): Success: no issues found
- `uv run pytest --unit` gate: no new failures (the pre-existing harness errors in `tests/concurrency.py` reproduce identically on a clean `upstream/main` checkout).

AI-assisted: this change was prepared with AI assistance and reviewed by the author.