Add Codex CLI backend and reasoning-effort support #3
Conversation
|
Can you have opus be the planner? And 5.4 be the worker? |
|
hey, thanks for this! just a question, is there any reason you're using it this way? I feel it is built for exactly use cases like this. Also, |
|
Yes, I'm using opus as the planner and missing streaming on my codex worker. Submit your PR |
|
Do you think it makes sense to have a separate reasoning level for the verifier? I'm thinking gpt-5.4 high for the worker with xhigh for the verifier might be useful and cost-effective. In my experience, xhigh will often run into timeouts and context-length limits on some hard prompts... Verification is easier, though, and we want to be more sure it is correct. |
|
@rnbguy Thanks. The main reason was scope and integration cost. openprover already had a fairly simple "one call in / one result out" CLI wrapper shape for Claude, with per-call archiving, subprocess isolation, and no long-lived backend process. I agree with your point about the tradeoff, though: I have not reviewed your refactor in detail yet, but that direction seems very plausible. If maintainers prefer the |
|
@jjoshua2 I think that makes sense. Conceptually, verifier_reasoning_effort seems reasonable for exactly the reason you mentioned: worker search can be broad/expensive, verifier prompts are usually narrower, and spending more reasoning budget on verification than on generation can be a good tradeoff. So I could definitely see value in something like --verifier-reasoning-effort, maybe later even --verifier-model. That being said, if the verifier is mostly returning true both before and after the change, then the main bottleneck may not be effort alone. It may also mean we need a stricter verifier prompt/protocol, because extra reasoning only helps if the verifier is actually incentivized to search for flaws rather than mostly confirm. Still, your result is useful signal. I would be in favor of treating verifier effort as a separate setting rather than coupling it to the worker long term. |
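A minimal sketch of how a decoupled verifier effort could look, with the verifier falling back to the worker's level when unset. The flag names and defaults here are hypothetical illustrations, not the project's actual interface:

```python
import argparse

# Effort levels mirror the ones discussed in this thread (xhigh, etc.);
# the exact set is an assumption for illustration.
EFFORT_LEVELS = ["low", "medium", "high", "xhigh"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="openprover-sketch")
    parser.add_argument("--reasoning-effort", choices=EFFORT_LEVELS, default="high")
    # Hypothetical flag: defaults to None so we can tell "unset" apart
    # from an explicit choice.
    parser.add_argument("--verifier-reasoning-effort", choices=EFFORT_LEVELS, default=None)
    return parser


def resolve_efforts(argv: list[str]) -> tuple[str, str]:
    """Return (worker_effort, verifier_effort) for the given CLI args."""
    args = build_parser().parse_args(argv)
    worker = args.reasoning_effort
    # The verifier decouples from the worker only when set explicitly.
    verifier = args.verifier_reasoning_effort or worker
    return worker, verifier
```

With this shape, `--reasoning-effort high --verifier-reasoning-effort xhigh` gives the cost profile discussed above: broad-but-cheaper worker search, stricter verification.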
|
I've actually had it find a lot of mistakes now that I've been using xhigh on the verifier for 20 rounds. I only tested 5 without it, so with my small sample size it has helped.
|
# Conflicts:
#	README.md
#	openprover/cli.py
#	openprover/llm/__init__.py
#	openprover/llm/claude.py
#	openprover/prover.py
|
I have pushed changes to resolve the conflicts that were blocking the merge |
|
@idrassi do you want to collaborate on the proof constant limit improvements for 838 that I found? |
|
@jjoshua2 Sorry for the late answer. No issue with collaborating; it is just that I cannot guarantee bandwidth because of other activities. How do you want to manage the collaboration? Maybe a private repo to exchange datasets and progress? |

Closes #2.
Summary
This PR adds OpenAI Codex CLI as a new OpenProver backend and separates backend selection from model selection, so Codex can use explicit model ids such as `gpt-5.4` and `gpt-5.2`. It also adds reasoning-effort support for the Claude and Codex backends.
This branch has also been merged with the current `master`, and the Codex changes were reconciled with the latest CLI/prover updates.
What changed
- New `CodexClient` based on `codex exec --json`
- New `--provider`, `--planner-provider`, and `--worker-provider` flags
- Codex model selection via `--provider codex --model <model>` and `--model codex:<model>`
- New `--reasoning-effort`, `--planner-reasoning-effort`, and `--worker-reasoning-effort` flags
- `literature_search`
Notes
- `--model codex` selects the Codex backend and uses the Codex CLI default configured model
- `exec --json` does not stream partial assistant text, so Codex soft interrupt is advisory and preserves the current response instead of truncating it
- After the merge with `master`, this PR now coexists with the current backend set and CLI behavior
Validation
Tested with targeted pytest coverage and Ubuntu Linux 24.04 / WSL smoke runs, including:
and Lean formalization / verification flow:
```
python -m openprover --theorem examples/cauchy_schwarz.md \
  --lean-project ~/mathlib4 \
  --lean-theorem examples/cauchy_schwarz.lean \
  --proof runs/for-any-real-numbers-a-1-ldots-a-n-20260324-113109/PROOF.md \
  --model codex:gpt-5.4 \
  --headless \
  --reasoning-effort xhigh
```
Also re-validated after merging the latest `master` with:
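The non-streaming behavior noted above can be illustrated with a small sketch of consuming JSONL output where only complete messages appear. The event field names (`"type"`, `"text"`) are assumptions for illustration, not the actual `codex exec --json` schema:

```python
import json


def final_assistant_text(jsonl_lines):
    """Return the last complete assistant message from JSONL output.

    Sketch only: the event shape is assumed, not taken from the real
    Codex CLI output. Because no partial assistant text is streamed,
    the whole response arrives as a single event; a soft interrupt
    therefore keeps the last complete response rather than truncating
    a partially streamed one.
    """
    text = None
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("type") == "assistant_message":
            text = event.get("text")
    return text
```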