💡 Codex Review
CloverLM/quartet2/python/quartet2/linear.py
Lines 695 to 696 in bb81554
When WUSH is enabled, weight is first transformed (apply_block_transform) and then quantized, but this branch reuses the cached weight_amax whenever it is set. In training, register_optimizer_hook populates weight_abs_max from the untransformed parameter, so after the first optimizer step the forward path quantizes transformed weights with a stale/unrelated max value. This mis-scales FP4 quantization and can materially distort training results whenever --wush is on.
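A minimal sketch of the failure mode and the fix. All names here (`forward_weight`, `quantize_fp4_sketch`, the list-based "tensor") are illustrative stand-ins, not the repo's actual API; the point is only that an amax cached from the untransformed parameter must not be reused after the block transform:

```python
def quantize_fp4_sketch(values, amax):
    """Toy stand-in for FP4 quantization: scale by amax, snap to a few
    symmetric levels, and clip -- mimics how a wrong amax mis-scales."""
    scale = max(amax, 1e-8)
    # With a stale (too small) amax, values saturate at the clip boundary.
    return [max(-3, min(3, round(v / scale * 3))) / 3 * scale for v in values]

def forward_weight(weight, cached_amax=None, wush=False, block_transform=None):
    # Pattern under review: transform first (WUSH), then quantize.
    w = block_transform(weight) if (wush and block_transform) else weight
    # Fix sketched here: once the weight has been transformed, the amax
    # cached from the untransformed parameter is stale -- recompute it
    # from the transformed tensor instead of reusing the cache.
    if wush or cached_amax is None:
        amax = max(abs(v) for v in w)
    else:
        amax = cached_amax
    return quantize_fp4_sketch(w, amax)
```

With `wush=True`, the cached value is ignored and the scale matches the transformed weight; reusing `cached_amax` instead would clip every transformed value to `±cached_amax`, which is the mis-scaling the review describes.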
  current_batch = checkpoint_dict["checkpoint"].train_batch
- if current_batch > 1 and ((current_batch % args.checkpoint_freq == 0) or (current_batch == last_batch)):
+ if args.checkpoint_freq != utils.INF and current_batch > 1 and ((current_batch % args.checkpoint_freq == 0) or (current_batch == last_batch)):
Preserve final checkpoint save when checkpoint_freq is INF
This condition now gates all checkpoint writes behind args.checkpoint_freq != utils.INF. Because the parser default is utils.INF, runs with default settings no longer save even the final-step checkpoint (current_batch == last_batch), which breaks expected resumability/artifact persistence for normal training invocations.
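A hedged sketch of a condition that keeps the final-batch save while still skipping periodic saves under the INF default. The function name is hypothetical and `float("inf")` stands in for `utils.INF`; this is one way to restructure the check, not the repo's actual code:

```python
INF = float("inf")  # stand-in for utils.INF, the parser default

def should_save_checkpoint(current_batch: int, last_batch: int,
                           checkpoint_freq: float) -> bool:
    """Decide whether to write a checkpoint at this batch."""
    if current_batch <= 1:
        return False
    if current_batch == last_batch:
        # Always persist the final checkpoint, even when no periodic
        # frequency was configured (checkpoint_freq == INF).
        return True
    # Periodic saves apply only when a finite frequency is set; the
    # INF guard also keeps the modulo from ever running on infinity.
    return checkpoint_freq != INF and current_batch % checkpoint_freq == 0
```

Checking `current_batch == last_batch` before the frequency guard is what preserves resumability for default-settings runs.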
ClimbMix5B training status (2026-04-26 UTC):
Summary
Note: this branch preserves the WUSH/train changes that were already present in the working tree before the GridFlip quantizer work.
Validation
uv pip install --python .venv/bin/python -e ./quartet2
CUDA_VISIBLE_DEVICES=0 .venv/bin/python -m pytest quartet2/test/test_qutlass_backend.py -q -> 10 passed
CUDA_VISIBLE_DEVICES=0 .venv/bin/python -m pytest quartet2/test/test_linear.py -k "compile_fwd or result_fwd or autocast" -q -> 3 passed, 7 deselected
git diff --check
Benchmark Smoke
4096x4096 weight quantization wall time:
2048x2048 matmul wall time: