Skip to content

feat(cerebras, xai): expose reasoning_format for parity with reasoning_effort#6106

Open
anshulkulhari7 wants to merge 3 commits into
livekit:mainfrom
anshulkulhari7:feat/5989-reasoning-format
Open

feat(cerebras, xai): expose reasoning_format for parity with reasoning_effort#6106
anshulkulhari7 wants to merge 3 commits into
livekit:mainfrom
anshulkulhari7:feat/5989-reasoning-format

Conversation

@anshulkulhari7

Copy link
Copy Markdown
Contributor

Summary

Closes #5989.

Reasoning models such as gpt-oss-120b served by Cerebras or xAI (Grok) stream their thinking tokens as part of the chat-completions response. Today there is no way to tell those providers to keep that internal monologue out of the message content, so the TTS pipeline ends up reading the model's raw reasoning aloud.

Both providers support a reasoning_format request field ("parsed", "raw", "hidden") that controls where reasoning ends up. This PR exposes it on the relevant LLM constructors, mirroring exactly how reasoning_effort is already plumbed.

Changes

  • livekit-plugins-openai (llm.py)
    • Add a ReasoningFormat = Literal["parsed", "raw", "hidden"] type.
    • Add reasoning_format to _LLMOptions, the LLM.__init__ signature, and the LLM.with_x_ai(...) factory (xAI/Grok chat completions path).
    • In chat(), forward it to the request. Because reasoning_format is a provider-specific body field and not an OpenAI SDK keyword argument (passing it top-level raises TypeError: unexpected keyword argument), it is merged into extra_body — the same mechanism already used for OpenRouter-specific body fields. The user-supplied extra_body dict is never mutated in place.
  • livekit-plugins-cerebras (llm.py)
    • Add reasoning_format to the Cerebras LLM.__init__ and forward it to the base class.

The change is additive and backward-compatible: when reasoning_format is not set, nothing is added to the request.

How reasoning_format reaches the API

reasoning_effort is a named param in the OpenAI Python SDK (AsyncCompletions.create), so it is passed top-level. reasoning_format is not, so it must travel through extra_body. Verified empirically against openai==2.40.0:

top-level reasoning_format: TypeError -> AsyncCompletions.create() got an unexpected keyword argument 'reasoning_format'
extra_body call body has reasoning_format: True

Accepted values follow the Cerebras reasoning docs (parsed / raw / hidden); xAI/Grok uses the same field.

Testing

Added tests/test_plugin_reasoning_format.py (unit, no live keys — mocks nothing beyond constructing the LLM and inspecting the request kwargs the stream would send):

  • reasoning_format set on Cerebras LLM lands in extra_body of the outgoing request.
  • reasoning_format set via LLM.with_x_ai(...) lands in extra_body.
  • When unset, no reasoning_format is added.
$ uv run pytest tests/test_plugin_reasoning_format.py --unit
tests/test_plugin_reasoning_format.py::test_cerebras_reasoning_format_in_request PASSED
tests/test_plugin_reasoning_format.py::test_cerebras_reasoning_format_omitted_by_default PASSED
tests/test_plugin_reasoning_format.py::test_xai_reasoning_format_in_request PASSED
3 passed

Quality gates on the changed files:

  • uv run ruff check — All checks passed
  • uv run ruff format --check — already formatted
  • uv run mypy -p livekit.plugins.openai -p livekit.plugins.cerebras (strict) — Success: no issues found
  • Full uv run pytest --unit gate: no new failures (the pre-existing harness errors in tests/concurrency.py reproduce identically on a clean upstream/main checkout).

AI-assisted: this change was prepared with AI assistance and reviewed by the author.

…g_effort

Reasoning models such as gpt-oss-120b served by Cerebras or xAI (Grok) stream
their thinking tokens as part of the response. Without a way to suppress them,
the TTS pipeline reads the model's internal monologue aloud.

Add a reasoning_format parameter to the OpenAI-compatible LLM, the Cerebras
LLM, and LLM.with_x_ai, mirroring how reasoning_effort is plumbed. Because
reasoning_format is a provider-specific body field (not an OpenAI SDK
argument), it is forwarded via extra_body. Accepted values are 'parsed',
'raw', and 'hidden'.

Closes livekit#5989
@anshulkulhari7 anshulkulhari7 requested a review from a team as a code owner June 15, 2026 09:58
devin-ai-integration[bot]

This comment was marked as resolved.

The with_cerebras() factory accepted reasoning_effort but not
reasoning_format, so users could not pass it even though the LLM
docstring lists Cerebras (gpt-oss-120b) as a supported provider.
Add the parameter to the signature and forward it to LLM(), matching
with_x_ai() and the standalone Cerebras plugin.

@devin-ai-integration devin-ai-integration Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

Open in Devin Review

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚩 Caller-provided extra_body in extra_kwargs is silently overridden by opts.extra_body

Pre-existing behavior: if a caller passes extra_kwargs={"extra_body": {...}} to chat() AND the LLM was constructed with an extra_body option, lines 975-980 first apply extra_kwargs then unconditionally overwrite extra["extra_body"] with self._opts.extra_body. This means the caller's extra_body is silently lost. This is not introduced by this PR (it's pre-existing), but the new reasoning_format feature makes it more likely users will interact with extra_body indirectly. Currently no callers in the codebase appear to hit this conflict, but it could surprise external users.

(Refers to lines 975-980)

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

``api_key`` must be set to your OpenAI API key, either using the argument or by setting the
``OPENAI_API_KEY`` environmental variable.

``reasoning_format`` controls how reasoning models (e.g. ``gpt-oss-120b`` served by

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: reword the description. it makes it sound like gpt-oss-120b can be served by xAI

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reworded in 258fde2 — now reads "gpt-oss-120b on Cerebras, or Grok on xAI", so it no longer implies xAI serves gpt-oss-120b.

``OPENAI_API_KEY`` environmental variable.

``reasoning_format`` controls how reasoning models (e.g. ``gpt-oss-120b`` served by
Cerebras or xAI) return their thinking tokens. Set it to ``"hidden"`` or ``"parsed"`` to

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do we handle parsed? if we are parsing it, there should be a way to expose it to the end user.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — right now the plugin doesn't surface the parsed reasoning. ChoiceDelta only has role/content/tool_calls/extra (no dedicated reasoning field), and the OpenAI stream handler doesn't read a reasoning/reasoning_content field, so with reasoning_format="parsed" the separated reasoning would currently be dropped.

My instinct is to surface it through delta.extra (the same provider-extra channel already used for xAI encrypted reasoning) rather than adding a new field — but how would you prefer reasoning to be exposed to the end user? Happy to wire it up that way in this PR, or drop "parsed" here and do the exposure as a focused follow-up if you'd rather keep this one to the request-param plumbing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Expose reasoning_format for Cerebras and xAI (Grok) — parity with reasoning_effort

2 participants