feat(server): add /v1/responses (OpenAI Responses API) for Codex CLI #91

Open

audreyt wants to merge 1 commit into antirez:main from audreyt:feat/v1-responses


Conversation

audreyt commented May 12, 2026

Implements the Responses API endpoint that Codex CLI (and other modern OpenAI tooling) speaks instead of /v1/chat/completions. The wire format is documented in OpenAI's Responses API; this implementation has been iterated against the Codex CLI binary's SSE parser shape until no remaining schema gaps were found.

Request parsing (parse_responses_request, parse_responses_input):

  • Accepts the typed input array (message, function_call, function_call_output, reasoning, custom_tool_call(_output), local_shell_call(_output), web_search_call(_output), tool_search_call(_output), image_generation_call(_output), compaction, context_compaction).
  • Maps hosted-tool history to function_call/function_call_output so prior actions survive across turns; rejects unknown item types and non-completed status with 400 to avoid silent context loss.
  • Strict content-array parsing: only string|null|array of recognized text blocks (input_text/output_text/text/summary_text/reasoning_text); rejects non-text modalities (input_image/file/audio) instead of accepting an empty prompt.
  • Merges adjacent function_call items into the preceding assistant message so text + tool-call turns render as a single assistant block.
  • Honors reasoning.effort (incl. "minimal"/"none") and gates reasoning summary surface on reasoning.summary opt-in.
  • Rejects previous_response_id, conversation, and forced tool_choice explicitly (constrained decoding / persisted state not supported).
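The strict content-array rule above can be sketched in Python (the helper name and shapes here are hypothetical illustrations, not this PR's actual server code):

```python
# Sketch of the strict content parsing rule: accept string | null | array
# of recognized text blocks; reject anything else (maps to HTTP 400).
TEXT_BLOCKS = {"input_text", "output_text", "text", "summary_text", "reasoning_text"}

def content_to_text(content):
    """Collapse a Responses-API `content` field to plain text, or raise."""
    if content is None:
        return ""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        parts = []
        for block in content:
            btype = block.get("type")
            if btype not in TEXT_BLOCKS:
                # e.g. input_image / input_file / input_audio -> reject
                raise ValueError(f"unsupported content block: {btype}")
            parts.append(block.get("text", ""))
        return "".join(parts)
    raise ValueError("content must be a string, null, or an array")
```

Rejecting rather than silently dropping non-text blocks is what prevents the "empty prompt" failure mode the description mentions.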

Output (responses_sse_*, responses_final_response):

  • Emits the full streaming lifecycle: response.created, output_item.added/.done, reasoning_summary_part.added/.done, reasoning_summary_text.delta/.done, content_part.added/.done, output_text.delta/.done, function_call_arguments.delta/.done, response.completed.
  • Branches the terminal event by finish reason: response.failed for errors and response.incomplete with reason "max_tokens" for length.
  • Every event carries sequence_number; every output_text part carries annotations:[]; function_call output_item.added ships with an empty arguments string (full args arrive via function_call_arguments.done and output_item.done), and item ids are stable across added/done.
  • Tracks whether </think> was actually observed so a truncated stream marks the reasoning item incomplete instead of "completed".
  • Recovers gracefully when the DSML tool parse fails after the model was suppressed at the tool marker: the suppressed tail is flushed as additional output_text deltas so the streamed message matches output_item.done.
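The event ordering and sequence_number invariant for a plain text turn can be sketched as a generator (event names as listed above; the emitter itself is a hypothetical illustration, not this PR's implementation):

```python
import itertools

def stream_events(text_deltas):
    """Yield a minimal Responses-API streaming lifecycle for one text item."""
    seq = itertools.count()

    def ev(kind, **data):
        # Every event carries a monotonically increasing sequence_number.
        return {"type": kind, "sequence_number": next(seq), **data}

    yield ev("response.created")
    # Item id stays stable across .added and .done.
    yield ev("response.output_item.added", item={"id": "msg_0", "type": "message"})
    # output_text parts always carry annotations: [].
    yield ev("response.content_part.added",
             part={"type": "output_text", "text": "", "annotations": []})
    for d in text_deltas:
        yield ev("response.output_text.delta", delta=d)
    yield ev("response.output_text.done", text="".join(text_deltas))
    yield ev("response.content_part.done")
    yield ev("response.output_item.done", item={"id": "msg_0"})
    yield ev("response.completed")
```

A failed or truncated generation would end with response.failed or response.incomplete in place of response.completed, per the branching described above.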

Tested by 25 rounds of /codex:adversarial-review against the same client this is meant to feed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

simonw commented May 12, 2026

Just came here to feature-request this because I wanted to try Codex CLI.

With this patch applied I think we should be able to add this to ~/.codex/config.toml:

```toml
[model_providers.ds4]
name = "DS4"
base_url = "http://127.0.0.1:8000/v1"
wire_api = "responses"
stream_idle_timeout_ms = 1000000
```

And then run:

```shell
codex --model deepseek-v4-flash -c model_provider=ds4
```


fry69 commented May 12, 2026

FYI: issue #22 also contains a branch with a Responses API implementation (but no PR).

mitsuhiko (Contributor) commented

I think it's a bit unfortunate if this ends up with many different protocol versions in the core server because it will be tricky to support them all at the same level of quality without blowing up the complexity. I wonder if it would not be better to have a separate proxy that translates to one common protocol.

