
chore(deps-dev): update trl requirement from >=0.8 to >=1.0.0#29

Open
dependabot[bot] wants to merge 1 commit into main from dependabot/pip/trl-gte-1.0.0

Conversation


dependabot[bot] commented on behalf of github on Apr 11, 2026

Updates the requirements on trl to permit the latest version.

Release notes

Sourced from trl's releases.

v1.0.0

Read our blog post for an overview of TRL v1.

Features

Asynchronous GRPO

Asynchronous GRPO decouples generation from the gradient update loop by offloading rollouts to an external vLLM server. Generation runs in parallel while training continues, eliminating idle GPU time and improving hardware utilization.

from trl.experimental.async_grpo import AsyncGRPOTrainer
from trl.rewards import accuracy_reward
from datasets import load_dataset
dataset = load_dataset("trl-lib/DeepMath-103K", split="train")
trainer = AsyncGRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    reward_funcs=accuracy_reward,
    train_dataset=dataset,
)
trainer.train()

by @qgallouedec in huggingface/trl#5293

Variational Sequence-Level Soft Policy Optimization (VESPO)

VESPO addresses training instability in off-policy RL caused by policy staleness, asynchronous updates, and train-inference mismatches. Rather than relying on heuristic token-level clipping (GRPO) or sequence-length normalization (GSPO), VESPO derives a principled reshaping kernel from a variational framework. In practice, this yields a smooth, asymmetric Gamma weighting function that gracefully suppresses extreme sequence-level importance weights without introducing length bias. It can be enabled via the loss_type parameter of GRPOConfig:

from trl import GRPOConfig, GRPOTrainer
trainer = GRPOTrainer(
    model="Qwen/Qwen3-0.6B",
    args=GRPOConfig(loss_type="vespo"),
    ...
)

by @casinca in huggingface/trl#5199
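To make the idea concrete, here is an illustrative sketch of a Gamma-shaped reweighting, not TRL's exact kernel (the true VESPO weighting is derived from the variational framework above); the `alpha` and `tau` parameters are hypothetical knobs chosen only to show the qualitative behavior:

```python
import math

def gamma_reshape(w, alpha=2.0, tau=1.0):
    """Illustrative Gamma-density-shaped reweighting of a sequence-level
    importance weight w: grows for small/moderate weights but decays
    exponentially for large ones, suppressing extreme outliers."""
    return (w ** (alpha - 1)) * math.exp(-w / tau)

# Moderate importance weights are kept; extreme ones are smoothly damped.
for w in (0.5, 1.0, 4.0, 16.0):
    print(f"w={w:5.1f} -> reshaped={gamma_reshape(w):.6f}")
```

Unlike a hard clip, the damping here is smooth and asymmetric: small weights are barely touched, while very large weights are driven toward zero rather than pinned to a ceiling.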

Divergence Proximal Policy Optimization (DPPO)

... (truncated)

Commits
  • f3e9ac1 Release: v1.0 (#5409)
  • e8d5dfc Add second version of Qwen 3.5 chat template to chat_template_utils (#5405)
  • 71ff6a2 Add HF_TOKEN environment variable to workflow files (#5397)
  • 1ee3975 Add vLLM inference to the Base Self-Distillation Trainer (#5388)
  • 79e6e79 Move disable_config=True from generate to GenerationConfig (#5384)
  • 83d68dd chore: update pr_template_check.yml (#5393)
  • 4cb7ab1 Enhance PR template check to exclude reopened PRs from first-time contributor...
  • 32a40bf Enforce PR template for first-time contributors and document AI usage policy ...
  • 8e69b68 Mark test_rloo[fsdp2] as xfail for transformers 5.4.0 (#5387)
  • c264266 Remove deprecated TRACKIO_SPACE_ID env var from all scripts (#5365)
  • Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Updates the requirements on [trl](https://github.com/huggingface/trl) to permit the latest version.
- [Release notes](https://github.com/huggingface/trl/releases)
- [Changelog](https://github.com/huggingface/trl/blob/main/RELEASE.md)
- [Commits](huggingface/trl@v0.8.0...v1.0.0)

---
updated-dependencies:
- dependency-name: trl
  dependency-version: 1.0.0
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>
dependabot[bot] added the dependencies (Pull requests that update a dependency file) and python (Pull requests that update python code) labels on Apr 11, 2026

greptile-apps Bot commented Apr 11, 2026

Greptile Summary

This PR bumps the trl lower bound from >=0.8 to >=1.0.0. The code in src/hippofloop/training/trainer.py passes a deprecated keyword argument to SFTTrainer that was removed in TRL v1.0.0, so every training run will raise a TypeError until the call site is updated (see inline comment).

Confidence Score: 4/5

Not safe to merge without fixing the removed SFTTrainer argument in trainer.py, which breaks all training runs under the new TRL version.

One P1 defect: the dependency bump removes an API the codebase actively uses, causing an immediate runtime error on every training invocation. The fix is a one-word rename and carries low overall risk once addressed.

src/hippofloop/training/trainer.py — the SFTTrainer call uses a removed argument name that must be updated.

Important Files Changed

Filename Overview
pyproject.toml Bumps trl lower bound from >=0.8 to >=1.0.0; the new minimum version removes the deprecated tokenizer argument from SFTTrainer, which the codebase still uses.
src/hippofloop/training/trainer.py Passes old tokenizer keyword to SFTTrainer, which was removed in TRL v1.0.0 (now processing_class); breaks training at runtime under the new dependency constraint.
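If the project needs to tolerate both old and new trl versions for a while, one hedged option is a version-tolerant call site that picks the keyword by introspecting the constructor. The `SFTTrainer` class below is a stand-in stub with the trl>=1.0.0 signature so the sketch is self-contained; in trainer.py it would be the real import from trl:

```python
import inspect

class SFTTrainer:
    """Stand-in stub mimicking the trl>=1.0.0 signature, where the old
    `tokenizer` keyword was replaced by `processing_class`."""
    def __init__(self, model, processing_class=None, train_dataset=None):
        self.processing_class = processing_class

def make_trainer(model, tokenizer, train_dataset=None):
    # Choose the keyword name the installed SFTTrainer actually accepts.
    params = inspect.signature(SFTTrainer.__init__).parameters
    kwarg = "processing_class" if "processing_class" in params else "tokenizer"
    return SFTTrainer(model, train_dataset=train_dataset, **{kwarg: tokenizer})

trainer = make_trainer("base-model", "my-tokenizer")
print(trainer.processing_class)
```

The simpler alternative, if trl<1.0 support is dropped outright (as this PR's constraint implies), is just the one-word rename at the call site.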

Sequence Diagram

sequenceDiagram
    participant User
    participant UnslothTrainer
    participant TRL as SFTTrainer (trl v1.0.0)

    User->>UnslothTrainer: train(train_data, val_data)
    UnslothTrainer->>TRL: SFTTrainer(model, old_kwarg, ...)
    TRL-->>UnslothTrainer: TypeError: unexpected keyword argument
    Note over TRL: Removed in v1.0.0 — use processing_class param

Comments Outside Diff (1)

  1. src/hippofloop/training/trainer.py, line 99

    P1 Removed argument: use processing_class instead

    The tokenizer parameter was deprecated in TRL v0.12.0 and removed in v1.0.0. Under the new trl>=1.0.0 constraint, this line raises TypeError: __init__() got an unexpected keyword argument 'tokenizer' at runtime, breaking every training run. Rename the keyword to processing_class.
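The failure mode and the one-word fix can be sketched as follows; `SFTTrainerV1` is a local stub mimicking the trl>=1.0.0 signature (not trl's real class), so the behavior can be shown without installing trl:

```python
class SFTTrainerV1:
    """Stub with the trl>=1.0.0 signature: the `tokenizer` keyword is
    gone, replaced by `processing_class`."""
    def __init__(self, model, processing_class=None, train_dataset=None):
        self.processing_class = processing_class

# Old call site: accepted by trl<1.0, raises TypeError on trl>=1.0.
try:
    SFTTrainerV1(model="base-model", tokenizer="tok")
except TypeError as e:
    print(e)  # ...got an unexpected keyword argument 'tokenizer'

# Fixed call site: only the keyword name changes; the value is unchanged.
trainer = SFTTrainerV1(model="base-model", processing_class="tok")
```

The same tokenizer object is still passed; only the parameter name at the `SFTTrainer(...)` call in trainer.py needs to change.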

Reviews (1): Last reviewed commit: "chore(deps-dev): update trl requirement ..."

