
chore(deps-dev): update trl requirement from >=0.8 to >=1.0.0#29

Open
dependabot[bot] wants to merge 1 commit into main from dependabot/pip/trl-gte-1.0.0

Conversation


dependabot[bot] commented on behalf of github on Apr 11, 2026

Updates the requirements on trl to permit the latest version.

Release notes

Sourced from trl's releases.

v1.0.0

Read our blog post for an overview of TRL v1.

Features

Asynchronous GRPO

Asynchronous GRPO decouples generation from the gradient update loop by offloading rollouts to an external vLLM server. Generation runs in parallel while training continues, eliminating idle GPU time and improving hardware utilization.

from trl.experimental.async_grpo import AsyncGRPOTrainer
from trl.rewards import accuracy_reward
from datasets import load_dataset
dataset = load_dataset("trl-lib/DeepMath-103K", split="train")
trainer = AsyncGRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    reward_funcs=accuracy_reward,
    train_dataset=dataset,
)
trainer.train()

by @qgallouedec in huggingface/trl#5293

Variational Sequence-Level Soft Policy Optimization (VESPO)

VESPO addresses training instability in off-policy RL caused by policy staleness, asynchronous updates, and train-inference mismatches. Rather than relying on heuristic token-level clipping (GRPO) or sequence-length normalization (GSPO), VESPO derives a principled reshaping kernel from a variational framework. In practice, this yields a smooth, asymmetric Gamma weighting function that gracefully suppresses extreme sequence-level importance weights without introducing length bias. It can be enabled via the loss_type parameter of GRPOConfig:

from trl import GRPOConfig, GRPOTrainer
trainer = GRPOTrainer(
    model="Qwen/Qwen3-0.6B",
    args=GRPOConfig(loss_type="vespo"),
    ...
)

by @casinca in huggingface/trl#5199
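To make the idea concrete, here is an illustrative sketch of a Gamma-shaped reweighting, not TRL's exact kernel (the true VESPO weighting is derived from the variational framework above); the `alpha` and `tau` parameters are hypothetical knobs chosen only to show the qualitative behavior:

```python
import math

def gamma_reshape(w, alpha=2.0, tau=1.0):
    """Illustrative Gamma-density-shaped reweighting of a sequence-level
    importance weight w: grows for small/moderate weights but decays
    exponentially for large ones, suppressing extreme outliers."""
    return (w ** (alpha - 1)) * math.exp(-w / tau)

# Moderate importance weights are kept; extreme ones are smoothly damped.
for w in (0.5, 1.0, 4.0, 16.0):
    print(f"w={w:5.1f} -> reshaped={gamma_reshape(w):.6f}")
```

Unlike a hard clip, the damping here is smooth and asymmetric: small weights are barely touched, while very large weights are driven toward zero rather than pinned to a ceiling.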

Divergence Proximal Policy Optimization (DPPO)

... (truncated)

Commits
  • f3e9ac1 Release: v1.0 (#5409)
  • e8d5dfc Add second version of Qwen 3.5 chat template to chat_template_utils (#5405)
  • 71ff6a2 Add HF_TOKEN environment variable to workflow files (#5397)
  • 1ee3975 Add vLLM inference to the Base Self-Distillation Trainer (#5388)
  • 79e6e79 Move disable_config=True from generate to GenerationConfig (#5384)
  • 83d68dd chore: update pr_template_check.yml (#5393)
  • 4cb7ab1 Enhance PR template check to exclude reopened PRs from first-time contributor...
  • 32a40bf Enforce PR template for first-time contributors and document AI usage policy ...
  • 8e69b68 Mark test_rloo[fsdp2] as xfail for transformers 5.4.0 (#5387)
  • c264266 Remove deprecated TRACKIO_SPACE_ID env var from all scripts (#5365)
  • Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Updates the requirements on [trl](https://github.com/huggingface/trl) to permit the latest version.
- [Release notes](https://github.com/huggingface/trl/releases)
- [Changelog](https://github.com/huggingface/trl/blob/main/RELEASE.md)
- [Commits](huggingface/trl@v0.8.0...v1.0.0)

---
updated-dependencies:
- dependency-name: trl
  dependency-version: 1.0.0
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>
dependabot[bot] added the dependencies (Pull requests that update a dependency file) and python (Pull requests that update python code) labels on Apr 11, 2026

greptile-apps Bot commented Apr 11, 2026

Greptile Summary

This PR bumps the trl lower bound from >=0.8 to >=1.0.0. The code in src/hippofloop/training/trainer.py passes a deprecated keyword argument to SFTTrainer that was removed in TRL v1.0.0, so every training run will raise a TypeError until the call site is updated (see inline comment).

Confidence Score: 4/5

Not safe to merge without fixing the removed SFTTrainer argument in trainer.py, which breaks all training runs under the new TRL version.

One P1 defect: the dependency bump removes an API the codebase actively uses, causing an immediate runtime error on every training invocation. The fix is a one-word rename and carries low overall risk once addressed.

src/hippofloop/training/trainer.py — the SFTTrainer call uses a removed argument name that must be updated.

Important Files Changed

Filename Overview
pyproject.toml Bumps trl lower bound from >=0.8 to >=1.0.0; the new minimum version removes the deprecated tokenizer argument from SFTTrainer, which the codebase still uses.
src/hippofloop/training/trainer.py Passes old tokenizer keyword to SFTTrainer, which was removed in TRL v1.0.0 (now processing_class); breaks training at runtime under the new dependency constraint.
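If the project needs to tolerate both old and new trl versions for a while, one hedged option is a version-tolerant call site that picks the keyword by introspecting the constructor. The `SFTTrainer` class below is a stand-in stub with the trl>=1.0.0 signature so the sketch is self-contained; in trainer.py it would be the real import from trl:

```python
import inspect

class SFTTrainer:
    """Stand-in stub mimicking the trl>=1.0.0 signature, where the old
    `tokenizer` keyword was replaced by `processing_class`."""
    def __init__(self, model, processing_class=None, train_dataset=None):
        self.processing_class = processing_class

def make_trainer(model, tokenizer, train_dataset=None):
    # Choose the keyword name the installed SFTTrainer actually accepts.
    params = inspect.signature(SFTTrainer.__init__).parameters
    kwarg = "processing_class" if "processing_class" in params else "tokenizer"
    return SFTTrainer(model, train_dataset=train_dataset, **{kwarg: tokenizer})

trainer = make_trainer("base-model", "my-tokenizer")
print(trainer.processing_class)
```

The simpler alternative, if trl<1.0 support is dropped outright (as this PR's constraint implies), is just the one-word rename at the call site.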

Sequence Diagram

sequenceDiagram
    participant User
    participant UnslothTrainer
    participant TRL as SFTTrainer (trl v1.0.0)

    User->>UnslothTrainer: train(train_data, val_data)
    UnslothTrainer->>TRL: SFTTrainer(model, old_kwarg, ...)
    TRL-->>UnslothTrainer: TypeError: unexpected keyword argument
    Note over TRL: Removed in v1.0.0 — use processing_class param

Comments Outside Diff (1)

  1. src/hippofloop/training/trainer.py, line 99

    P1 Removed argument: use processing_class instead

    The tokenizer parameter was deprecated in TRL v0.12.0 and removed in v1.0.0. Under the new trl>=1.0.0 constraint, this line raises TypeError: __init__() got an unexpected keyword argument 'tokenizer' at runtime, breaking every training run. Rename the keyword to processing_class.
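The failure mode and the one-word fix can be sketched as follows; `SFTTrainerV1` is a local stub mimicking the trl>=1.0.0 signature (not trl's real class), so the behavior can be shown without installing trl:

```python
class SFTTrainerV1:
    """Stub with the trl>=1.0.0 signature: the `tokenizer` keyword is
    gone, replaced by `processing_class`."""
    def __init__(self, model, processing_class=None, train_dataset=None):
        self.processing_class = processing_class

# Old call site: accepted by trl<1.0, raises TypeError on trl>=1.0.
try:
    SFTTrainerV1(model="base-model", tokenizer="tok")
except TypeError as e:
    print(e)  # ...got an unexpected keyword argument 'tokenizer'

# Fixed call site: only the keyword name changes; the value is unchanged.
trainer = SFTTrainerV1(model="base-model", processing_class="tok")
```

The same tokenizer object is still passed; only the parameter name at the `SFTTrainer(...)` call in trainer.py needs to change.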

Reviews (1): Last reviewed commit: "chore(deps-dev): update trl requirement ..."

