Add IS evaluator #432

tscholak · 2025-12-21T05:21:39Z

No description provided.

Adds a new evaluator type that computes forward KL divergence by comparing student log-probs against pre-computed teacher log-probs from a HuggingFace dataset of traces. The evaluator bypasses Fast-LLM's data pipeline and loads traces directly, making it suitable for monitoring distillation quality during training.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
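As a rough illustration of the forward KL comparison described in this commit message, the sketch below estimates KL(teacher || student) as the mean difference between teacher and student log-probs. It assumes the traces were sampled from the teacher and that each trace carries one sequence-level log-prob per model; the function name `forward_kl_estimate` is hypothetical and is not taken from the PR.

```python
from typing import Sequence


def forward_kl_estimate(
    teacher_logprobs: Sequence[float],
    student_logprobs: Sequence[float],
) -> float:
    """Monte Carlo estimate of KL(teacher || student).

    Assumption: each entry is the total log-prob of one trace that was
    sampled from the teacher, scored by the teacher and the student.
    """
    if len(teacher_logprobs) != len(student_logprobs):
        raise ValueError("teacher and student must score the same traces")
    if not teacher_logprobs:
        raise ValueError("at least one trace is required")
    # E_{x ~ teacher}[log p_teacher(x) - log p_student(x)], averaged over traces.
    total = sum(t - s for t, s in zip(teacher_logprobs, student_logprobs))
    return total / len(teacher_logprobs)


if __name__ == "__main__":
    teacher = [-10.0, -12.5, -8.0]
    student = [-11.0, -12.5, -9.5]
    print(forward_kl_estimate(teacher, student))
```

A value near zero would suggest the student assigns the teacher's traces about the same probability as the teacher does; larger values suggest a wider gap.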

Replace HuggingFace wrapper with native Fast-LLM inference path:

- Use InferenceRunner for forward passes instead of HF model wrapper
- Create LanguageModelBatch from trace data with proper padding
- Handle variable-length sequences via TokenSample lengths
- Use preprocess_batch for attention mask handling

This approach works for all model types including linear attention.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add max_sequence_length config field (defaults to model's position embedding limit)
- Skip traces exceeding max length with warning and count
- Set global_logits=True for correct tensor-parallel behavior
- Report number of skipped traces in output

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
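The skip-and-count behavior for over-length traces could look roughly like the sketch below. It is a minimal illustration that assumes each trace is a plain list of token ids; the helper name `filter_traces_by_length` is hypothetical and does not come from the PR.

```python
import logging
from typing import List, Sequence, Tuple

logger = logging.getLogger(__name__)


def filter_traces_by_length(
    traces: Sequence[Sequence[int]],
    max_sequence_length: int,
) -> Tuple[List[List[int]], int]:
    """Drop traces longer than max_sequence_length and count the drops."""
    kept: List[List[int]] = []
    skipped = 0
    for index, tokens in enumerate(traces):
        if len(tokens) > max_sequence_length:
            skipped += 1
            # Warn per skipped trace so the drop is visible in the logs.
            logger.warning(
                "Skipping trace %d: length %d exceeds max_sequence_length %d",
                index,
                len(tokens),
                max_sequence_length,
            )
            continue
        kept.append(list(tokens))
    return kept, skipped


if __name__ == "__main__":
    kept, skipped = filter_traces_by_length([[1, 2, 3], [1, 2, 3, 4, 5], [7]], 3)
    print(f"kept {len(kept)} traces, skipped {skipped}")
```

Returning the skipped count alongside the kept traces lets the caller include it in the evaluator output, as the last bullet describes.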

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add full support for TP, SP, PP, and DP parallelism modes
- Use training's sequence_length instead of separate max_sequence_length
- Use GPTBatchConfig for proper SP sequence splitting
- Add HuggingFace dataset sharding for efficient DP distribution
- Add all_reduce across data_group and pipeline_group
- Fix device mismatch bug (move targets to GPU)
- Use AttentionKwargs.sequence_first constant

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Store raw logits unconditionally when global_logits=True in _logits_cross_entropy_forward_backward, fixing ForwardKL evaluation during distillation training where targets is never None.

Also cleaned up ForwardKL evaluator:

- Use GPTInferenceRunner instead of generic InferenceRunner
- Add shuffle with configurable seed for reproducibility
- Add split/seed config fields (replaced task field)
- Proper padding via get_padding() and from_documents()
- Remove memory tracking tooling, keep gc.collect cleanup

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Replace forward KL with importance-weighted accuracy and effective sample size
- Shard by problem_id hash (not trace index) so each rank gets complete problems
- Add TraceTensors dataclass with smart constructors (empty, from_traces)
- Vectorize log prob computation using F.cross_entropy with completion mask
- Add _scatter_logsumexp for numerically stable grouped reductions
- Use allreduce_scalar for cleaner distributed reduction
- Pre-tensorize all trace data for efficient batch slicing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
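To make the importance-sampling metrics in this commit message concrete, here is a pure-Python sketch of per-problem importance-weighted accuracy and effective sample size, plus a stable hash for assigning whole problems to shards. It is not the PR's implementation: the PR describes a vectorized version with F.cross_entropy and _scatter_logsumexp, while this sketch loops over groups. The `Trace` fields, the function names, the self-normalized weighting, and the choice of SHA-256 are all assumptions made for illustration, and finite log-probs are assumed.

```python
import hashlib
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Trace:
    """Hypothetical trace record: one teacher-generated attempt at a problem."""
    problem_id: str
    teacher_logprob: float
    student_logprob: float
    correct: bool


def shard_index(problem_id: str, num_shards: int) -> int:
    """Map a problem to a shard with a process-independent hash,
    so every trace of the same problem lands on the same rank."""
    digest = hashlib.sha256(problem_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(values))) for finite inputs."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def importance_weighted_metrics(traces: Sequence[Trace]) -> Dict[str, Dict[str, float]]:
    """Per-problem self-normalized importance-weighted accuracy and ESS."""
    log_weights: Dict[str, List[float]] = defaultdict(list)
    correct: Dict[str, List[bool]] = defaultdict(list)
    for trace in traces:
        # log w = log p_student(trace) - log p_teacher(trace)
        log_weights[trace.problem_id].append(trace.student_logprob - trace.teacher_logprob)
        correct[trace.problem_id].append(trace.correct)

    metrics: Dict[str, Dict[str, float]] = {}
    for problem_id, lw in log_weights.items():
        log_norm = logsumexp(lw)
        normalized = [math.exp(v - log_norm) for v in lw]
        accuracy = sum(w for w, ok in zip(normalized, correct[problem_id]) if ok)
        # ESS = (sum w)^2 / sum w^2, computed in log space.
        ess = math.exp(2.0 * log_norm - logsumexp([2.0 * v for v in lw]))
        metrics[problem_id] = {"accuracy": accuracy, "ess": ess}
    return metrics


if __name__ == "__main__":
    traces = [
        Trace("p1", teacher_logprob=-5.0, student_logprob=-5.0, correct=False),
        Trace("p1", teacher_logprob=-5.0, student_logprob=-5.0 + math.log(3.0), correct=True),
    ]
    print(importance_weighted_metrics(traces))
    print(shard_index("p1", num_shards=4))
```

Working in log space avoids overflow when student and teacher log-probs differ by large amounts over long sequences. Hashing the problem id rather than the trace index keeps each problem's group intact on one rank, which the per-problem normalization needs.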

tscholak and others added 5 commits December 21, 2025 03:29

Make max_sequence_length mandatory with default 2048

96baac6

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

tscholak requested a review from oleksost December 21, 2025 05:21

tscholak changed the base branch from main to feature/cache-refactor-and-qwen2 December 21, 2025 05:21

tscholak and others added 2 commits December 22, 2025 22:36

tscholak changed the title ~~Add forward kl evaluator~~ Add IS evaluator Dec 24, 2025

2 participants
