feat(server): add /v1/responses (OpenAI Responses API) for Codex CLI #91

Open

audreyt wants to merge 1 commit into antirez:main from audreyt:feat/v1-responses


Conversation

audreyt commented May 12, 2026

Implements the Responses API endpoint that Codex CLI (and other modern OpenAI tooling) speaks instead of /v1/chat/completions. The wire format is documented in OpenAI's Responses API; this implementation has been iterated against the Codex CLI binary's SSE parser shape until no remaining schema gaps were found.

Request parsing (parse_responses_request, parse_responses_input):

  • Accepts the typed input array (message, function_call, function_call_output, reasoning, custom_tool_call(_output), local_shell_call(_output), web_search_call(_output), tool_search_call(_output), image_generation_call(_output), compaction, context_compaction).
  • Maps hosted-tool history to function_call/function_call_output so prior actions survive across turns; rejects unknown item types and non-completed status with 400 to avoid silent context loss.
  • Strict content-array parsing: only string|null|array of recognized text blocks (input_text/output_text/text/summary_text/reasoning_text); rejects non-text modalities (input_image/file/audio) instead of accepting an empty prompt.
  • Merges adjacent function_call items into the preceding assistant message so text + tool-call turns render as a single assistant block.
  • Honors reasoning.effort (incl. "minimal"/"none") and gates reasoning summary surface on reasoning.summary opt-in.
  • Rejects previous_response_id, conversation, and forced tool_choice explicitly (constrained decoding / persisted state not supported).
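The strict content-array rule above can be sketched in Python (the helper name and shapes here are hypothetical illustrations, not this PR's actual server code):

```python
# Sketch of the strict content parsing rule: accept string | null | array
# of recognized text blocks; reject anything else (maps to HTTP 400).
TEXT_BLOCKS = {"input_text", "output_text", "text", "summary_text", "reasoning_text"}

def content_to_text(content):
    """Collapse a Responses-API `content` field to plain text, or raise."""
    if content is None:
        return ""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        parts = []
        for block in content:
            btype = block.get("type")
            if btype not in TEXT_BLOCKS:
                # e.g. input_image / input_file / input_audio -> reject
                raise ValueError(f"unsupported content block: {btype}")
            parts.append(block.get("text", ""))
        return "".join(parts)
    raise ValueError("content must be a string, null, or an array")
```

Rejecting rather than silently dropping non-text blocks is what prevents the "empty prompt" failure mode the description mentions.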

Output (responses_sse_*, responses_final_response):

  • Emits the full streaming lifecycle: response.created, output_item.added/.done, reasoning_summary_part.added/.done, reasoning_summary_text.delta/.done, content_part.added/.done, output_text.delta/.done, function_call_arguments.delta/.done, response.completed.
  • Branches the terminal event by finish reason: response.failed for errors and response.incomplete with reason "max_tokens" for length.
  • Every event carries sequence_number; every output_text part carries annotations:[]; function_call output_item.added ships with an empty arguments string (full args arrive via function_call_arguments.done and output_item.done), and item ids are stable across added/done.
  • Tracks whether </think> was actually observed so a truncated stream marks the reasoning item incomplete instead of "completed".
  • Recovers gracefully when the DSML tool parse fails after the model was suppressed at the tool marker: the suppressed tail is flushed as additional output_text deltas so the streamed message matches output_item.done.
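The event ordering and sequence_number invariant for a plain text turn can be sketched as a generator (event names as listed above; the emitter itself is a hypothetical illustration, not this PR's implementation):

```python
import itertools

def stream_events(text_deltas):
    """Yield a minimal Responses-API streaming lifecycle for one text item."""
    seq = itertools.count()

    def ev(kind, **data):
        # Every event carries a monotonically increasing sequence_number.
        return {"type": kind, "sequence_number": next(seq), **data}

    yield ev("response.created")
    # Item id stays stable across .added and .done.
    yield ev("response.output_item.added", item={"id": "msg_0", "type": "message"})
    # output_text parts always carry annotations: [].
    yield ev("response.content_part.added",
             part={"type": "output_text", "text": "", "annotations": []})
    for d in text_deltas:
        yield ev("response.output_text.delta", delta=d)
    yield ev("response.output_text.done", text="".join(text_deltas))
    yield ev("response.content_part.done")
    yield ev("response.output_item.done", item={"id": "msg_0"})
    yield ev("response.completed")
```

A failed or truncated generation would end with response.failed or response.incomplete in place of response.completed, per the branching described above.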

Tested by 25 rounds of /codex:adversarial-review against the same client this is meant to feed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

simonw commented May 12, 2026

Just came here to feature-request this because I wanted to try Codex CLI.

With this patch applied I think we should be able to add this to ~/.codex/config.toml:

```toml
[model_providers.ds4]
name = "DS4"
base_url = "http://127.0.0.1:8000/v1"
wire_api = "responses"
stream_idle_timeout_ms = 1000000
```

And then run:

```shell
codex --model deepseek-v4-flash -c model_provider=ds4
```


fry69 commented May 12, 2026

FYI: issue #22 also contains a branch with a Responses API implementation (but no PR).

mitsuhiko (Contributor) commented

I think it's a bit unfortunate if this ends up with many different protocol versions in the core server because it will be tricky to support them all at the same level of quality without blowing up the complexity. I wonder if it would not be better to have a separate proxy that translates to one common protocol.

