perf(evaluate): optimize reference product evaluation with concurrency #4
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 58099d3a7d
```python
with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
    evaluated = list(executor.map(process_case, suite["cases"]))
```
Stop scheduling cases after the first adapter failure
In a multi-case reference-product suite where the endpoint times out or returns an invalid contract, this `executor.map` call submits every case in the suite before any result is inspected. Even when `list(...)` re-raises the first failure, leaving the `with` block calls `executor.shutdown(wait=True)`, which waits for every queued HTTP call to finish. This regresses the fail-closed path: instead of stopping at the first bad response, the run can spend up to one full timeout per batch of 10 workers, and it keeps sending later prompts even after an earlier case has already made the run invalid.
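One possible fix, sketched under assumptions: keep at most `max_workers` cases in flight and submit the next case only after an earlier one succeeds, so the first exception stops all further scheduling. `evaluate_fail_fast` is a hypothetical helper (not code from this PR), and it assumes `process_case` raises on a timeout or an invalid contract. It swaps `executor.map` for a bounded sliding window built on `concurrent.futures.wait(..., return_when=FIRST_COMPLETED)`.

```python
import concurrent.futures
import itertools
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def evaluate_fail_fast(
    process_case: Callable[[T], R],
    cases: Sequence[T],
    max_workers: int = 10,
) -> List[R]:
    """Hypothetical helper: evaluate cases concurrently, scheduling nothing new after a failure.

    Assumes process_case raises on an adapter failure. At most `max_workers`
    cases are in flight at once; when one raises, its exception propagates,
    unsubmitted cases are never sent, and the `with` block only waits for
    calls that were already in flight. Results keep the order of `cases`.
    """
    results: List[R] = [None] * len(cases)  # type: ignore[list-item]
    remaining = iter(enumerate(cases))
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Fill the window with the first `max_workers` cases.
        pending = {
            executor.submit(process_case, case): index
            for index, case in itertools.islice(remaining, max_workers)
        }
        while pending:
            done, _ = concurrent.futures.wait(
                pending, return_when=concurrent.futures.FIRST_COMPLETED
            )
            for future in done:
                index = pending.pop(future)
                # .result() re-raises the case's exception, leaving the loop
                # before any further case is submitted.
                results[index] = future.result()
                # Replace the finished case with the next unsubmitted one, if any.
                for next_index, next_case in itertools.islice(remaining, 1):
                    pending[executor.submit(process_case, next_case)] = next_index
    return results
```

Compared with `executor.map`, this adds a little bookkeeping but bounds the damage: when a failure surfaces, at most `max_workers - 1` other calls can still be running (a thread that has already started a call cannot be cancelled), and no case after them is sent.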
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This PR speeds up `reference_product.py`, which previously evaluated suite cases one at a time. It uses `concurrent.futures.ThreadPoolExecutor` with 10 workers to run the per-case evaluations concurrently.
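The change relies on two properties of `executor.map`: calls overlap while each waits on I/O, and results come back in input order, so the output matches the sequential loop case by case. The sketch below illustrates this under stated assumptions; `process_case` here is a hypothetical stand-in that simulates an HTTP call with `time.sleep`, not the real adapter.

```python
import concurrent.futures
import time
from typing import Dict, List

Case = Dict[str, str]


def process_case(case: Case) -> Case:
    """Hypothetical stand-in for one reference-product HTTP call."""
    time.sleep(0.05)  # simulate network latency
    return {"id": case["id"], "status": "ok"}


def evaluate_sequential(cases: List[Case]) -> List[Case]:
    # Previous behavior: one case at a time.
    return [process_case(case) for case in cases]


def evaluate_concurrent(cases: List[Case], max_workers: int = 10) -> List[Case]:
    # executor.map yields results in input order, not completion order,
    # so the output lines up with evaluate_sequential case by case.
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(process_case, cases))
```

Because each call spends most of its time waiting on the network, threads overlap that waiting even under the GIL; a CPU-bound `process_case` would not see the same speedup.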