Skip to content

Conversation

@willccbb
Copy link
Member

@willccbb willccbb commented Jan 3, 2026

Description

Introduces proper support for rollouts to include final_env_response in state, which can be set in env_response, resulting in early termination mid-step.

We treat these as cosmetic/metadata, accessible by Rubric classes + for logging, but not included in trajectory (as token representations are never materialized).

Resolves #671 ; replaces #673

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Testing

  • All existing tests pass when running uv run pytest locally.
  • New tests have been added to cover the changes

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Additional Notes


Note

Implements a clean termination path for multi-turn rollouts when the environment produces a final message.

  • Core changes: add state["final_env_response"] (initialized in Environment.init_state), new stop condition has_final_env_response, rollout loop skips extra model call when set, and _render_completion appends the final env message for scoring while keeping it out of the trajectory
  • Wordle: on game completion, sets final_env_response and returns it; minor typing/casting cleanups
  • Docs: refine MultiTurnEnv example, make ToolEnv sample tool async, and add a "Final Environment Responses" section describing the pattern and its effects
  • Tests: add coverage for early stop and completion inclusion; mark env package tests as slow; minor warning-filter tweak in pyproject.toml

Written by Cursor Bugbot for commit 6fcf399. This will update automatically on new commits. Configure here.

@willccbb willccbb marked this pull request as ready for review January 3, 2026 04:11
@willccbb willccbb changed the title final_messages pattern; updated docs/tests/example final_env_response pattern; updated docs/tests/example Jan 3, 2026
@willccbb willccbb merged commit 2d532a2 into main Jan 3, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Stop condition checked before tool execution causes one extra model turn

3 participants