Fix RL LR-schedule regression: default schedule length to max_train_steps by py4 · Pull Request #4225 · AI-Hypercomputer/maxtext

py4 · 2026-06-22T20:19:32Z

What

PR #4029 made get_optimizer (post_train/rl) size the LR warmup/decay schedule from learning_rate_schedule_steps, with a <= 0 fallback to max_train_steps for the default (-1) case. That fallback is dead code: MaxTextConfig.set_derived_and_validate_values rewrites learning_rate_schedule_steps == -1 to steps (base.yml default 150_001) before get_optimizer runs. So a default RL run sizes warmup to 0.1 * 150_001 = 15_000 steps. On a 500-step run the LR never finishes warming up and is ~300x too low at the same step. rl.yml and rl_mt_jt.yml inherit these defaults and are affected.
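The arithmetic behind the "~300x" figure can be sketched as follows. This is a minimal illustration, not the code in get_optimizer: it assumes a plain linear warmup over the first 10% of the schedule, and `warmup_lr`, `peak_lr`, and `warmup_fraction` are hypothetical names introduced for the example.

```python
def warmup_lr(step, schedule_steps, peak_lr=1.0, warmup_fraction=0.1):
  """Linear-warmup LR at `step` (hypothetical helper; schedule shape is an assumption)."""
  warmup_steps = int(warmup_fraction * schedule_steps)
  if warmup_steps <= 0:
    return peak_lr
  # Ramp linearly from 0 to peak_lr, then hold at peak_lr.
  return peak_lr * min(1.0, step / warmup_steps)


# Schedule sized from `steps` (base.yml default) vs. from a 500-step RL run.
regressed = warmup_lr(step=50, schedule_steps=150_001)  # warmup spans 15_000 steps
intended = warmup_lr(step=50, schedule_steps=500)       # warmup spans 50 steps
ratio = intended / regressed
```

Under these assumptions the intended schedule reaches peak LR at step 50, while the regressed one is still at 50/15_000 of peak, which is where the factor of roughly 300 comes from.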

Fix

RL's run length is max_train_steps (num_batches * num_iterations * train_fraction * num_epoch), not steps (a pretraining concept). Default the schedule to max_train_steps, and honor learning_rate_schedule_steps only when it diverges from steps (the validator makes them equal exactly when the user left it unset). This restores correct default behavior while preserving the deliberate-override capability #4029 added. The change is RL-local; pretrain/SFT/DPO use create_learning_rate_schedule, where steps is the real run length, and are unaffected.
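The selection rule described above could look roughly like this. It is a sketch of the logic only: `resolve_schedule_steps` is a hypothetical name, and the extra `> 0` guard is an assumption added so the helper also behaves sensibly on configs that never went through the validator.

```python
def resolve_schedule_steps(learning_rate_schedule_steps, steps, max_train_steps):
  """Pick the LR schedule length for an RL run (hypothetical helper)."""
  # The validator has already rewritten -1 to `steps`, so equality with
  # `steps` is treated as "the user left the knob unset".
  user_overrode = (
      learning_rate_schedule_steps > 0  # assumption: guard for unvalidated configs
      and learning_rate_schedule_steps != steps
  )
  return learning_rate_schedule_steps if user_overrode else max_train_steps


default_case = resolve_schedule_steps(150_001, steps=150_001, max_train_steps=500)
override_case = resolve_schedule_steps(2_000, steps=150_001, max_train_steps=500)
```

One consequence of this rule as sketched: an explicit value that happens to equal `steps` cannot be told apart from the unset case and falls back to max_train_steps.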

Tests

Adds tests/post_training/unit/rl_lr_schedule_test.py, which builds the config through the real pyconfig.initialize_pydantic path so the validator runs. The existing TestGetOptimizer uses a SimpleNamespace, which bypasses the validator and is why the regression shipped.
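Why a SimpleNamespace-based test could not catch this can be shown with a toy model. Everything here is illustrative: `toy_validator` stands in for the rewrite that set_derived_and_validate_values performs, and `schedule_steps_post_4029` mimics only the `<= 0` fallback described above, not the real get_optimizer.

```python
from types import SimpleNamespace


def toy_validator(raw):
  """Stand-in for the config validator: rewrites -1 to `steps` (assumed behavior)."""
  cfg = SimpleNamespace(**raw)
  if cfg.learning_rate_schedule_steps == -1:
    cfg.learning_rate_schedule_steps = cfg.steps
  return cfg


def schedule_steps_post_4029(cfg, max_train_steps):
  """Mimics the `<= 0` fallback to max_train_steps (hypothetical helper)."""
  if cfg.learning_rate_schedule_steps <= 0:
    return max_train_steps
  return cfg.learning_rate_schedule_steps


raw = {"steps": 150_001, "learning_rate_schedule_steps": -1}

# A hand-built namespace skips validation, so the fallback appears to work.
mocked_result = schedule_steps_post_4029(SimpleNamespace(**raw), max_train_steps=500)

# The validated config never carries -1, so the fallback is unreachable.
validated_result = schedule_steps_post_4029(toy_validator(raw), max_train_steps=500)
```

In this toy setup the mocked config yields 500 while the validated one yields 150_001, which mirrors how a test on the mocked path can pass while the production path regresses.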

Verified on CPU against post-#4029 code: the regression test fails before the fix (LR reached only 1.08e-08 by step 55 ... learning_rate_schedule_steps in effect=150001) and passes after; the full rl_utils_test.py (28 tests) stays green.

Checklist

Restores pre-#4029 default behavior (schedule tracks the RL run length)
Preserves explicit learning_rate_schedule_steps override
New CPU-only unit test added (tests/post_training/unit/rl_lr_schedule_test.py)
No change to non-RL paths (pretrain/SFT/DPO unaffected)
pyink and pylint clean

google-cla · 2026-06-22T20:20:01Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

codecov · 2026-06-22T20:26:44Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

…teps

PR AI-Hypercomputer#4029 made get_optimizer (post_train/rl) size the LR warmup/decay schedule from learning_rate_schedule_steps, with a `<= 0` fallback to max_train_steps for the default (-1) case. That fallback is dead code: MaxTextConfig's validator (set_derived_and_validate_values) rewrites learning_rate_schedule_steps == -1 to `steps` (base.yml default 150_001) before get_optimizer runs. So a default RL run sizes warmup to 0.1 * 150_001 = 15_000 steps; on a 500-step run the LR never finishes warming up and is ~300x too low at the same step. rl.yml and rl_mt_jt.yml inherit these defaults and are affected.

RL's run length is max_train_steps (num_batches * num_iterations * train_fraction * num_epoch), not `steps` (a pretraining concept). Default the schedule to max_train_steps and honor learning_rate_schedule_steps only when it diverges from `steps` (the validator makes them equal exactly when the user left it unset), which preserves the deliberate-override capability AI-Hypercomputer#4029 added. The change is RL-local; pretrain/SFT/DPO use create_learning_rate_schedule and are unaffected.

Adds tests/post_training/unit/rl_lr_schedule_test.py, which builds the config through the real pyconfig path so the validator runs. The existing TestGetOptimizer uses a SimpleNamespace, which bypasses the validator and is why the regression shipped.

py4 requested review from A9isha, NicoGrande, NuojCheng, RissyRan, SurbhiJainUSC, abhinavclemson, aireenmei, bvandermoon, darisoy, dipannita08, gagika, gobbleturk, hengtaoguo, igorts-git, jiangjy1982, khatwanimohit, richjames0, shralex, suexu1025 and vipannalla as code owners June 22, 2026 20:19

py4 force-pushed the fix/rl-lr-schedule-steps branch from 6df89c2 to 016a47f Compare June 22, 2026 20:38

py4 wants to merge 1 commit into AI-Hypercomputer:main from py4:fix/rl-lr-schedule-steps
