transformerless_lm: continuous self-distillation + cycle checkpointing #5

Draft
RandomCoder-lab wants to merge 1 commit into master from claude/find-claude-md-arn0F

@RandomCoder-lab
Owner

Summary

Follow-up to PR #4 (omniweight loss on training data). With the train/inference asymmetry closed, the natural question is whether the self-distillation ratchet (active_base ← seed + appended best refined outputs) keeps compounding beyond the bounded 6-cycle window. This PR makes the loop unbounded and resumable after Ctrl-C.

Changes

Two new flags on train_self_recursive.py:

  • --continuous — replaces for cycle in range(n_cycles) with an unbounded loop. n_cycles still controls steps_per_cycle = args.steps // n_cycles so per-cycle training budget stays calibrated; the cycle counter just keeps going. K-shrink schedule clamps to K_min once global_step exceeds args.steps, which is the standard end state of the curriculum.

  • --checkpoint PATH — serializes the entire distillation state every cycle:

    • model state_dict + FibAdamW optimizer state
    • active_base tensor (the growing self-distilled corpus)
    • cycle counter, global_step, best_creativity, best_val/step
    • cycle_summary, rejection counters, best_refined_seq

    Atomic write via tmp + os.replace so an interrupt mid-save can't corrupt the file. If the checkpoint exists at startup, training resumes from saved_cycle + 1 with active_base fully intact.
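
The tmp + os.replace pattern and the resume rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in train_self_recursive.py: the function names `save_checkpoint_atomic` and `load_start_cycle` are hypothetical, `pickle` stands in for whatever serializer the script actually uses, the state dict is reduced to a few of the listed fields, and starting a fresh run at cycle 0 is an assumption.

```python
import os
import pickle
import tempfile


def save_checkpoint_atomic(state: dict, path: str) -> None:
    """Write `state` to `path` so a reader only ever sees a complete file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the target directory: os.replace is only
    # atomic when source and destination are on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(state, f)  # assumption: stand-in serializer
            f.flush()
            os.fsync(f.fileno())   # push bytes to disk before the rename
        # Atomic swap: `path` is either the old file or the new one.
        os.replace(tmp_path, path)
    except BaseException:
        # BaseException also covers KeyboardInterrupt (Ctrl-C) mid-save.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_start_cycle(path: str):
    """Return (state, start_cycle); a missing file means a fresh run."""
    if not os.path.exists(path):
        return None, 0  # assumption: fresh runs start at cycle 0
    with open(path, "rb") as f:
        state = pickle.load(f)
    # Resume from the cycle after the last one that was fully saved.
    return state, state["cycle"] + 1
```

Because the rename is the last step, an interrupt at any earlier point leaves the previous checkpoint untouched, which is what makes re-running the same command safe.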

Behavior

  • Default (no flags): byte-identical to the v88 + omniweight-loss bounded 6-cycle run.
  • --continuous --checkpoint X.pt: runs until interrupted and saves after every cycle. Stopping with Ctrl-C and re-running the same command resumes the run.
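
The difference between the bounded and continuous loops, and the K_min clamp, might look something like the sketch below. The helper names `cycle_indices`, `k_for_step`, and `run`, the linear shape of the K-shrink schedule, and the values k_max=8 and k_min=2 are all assumptions made for illustration; the PR only states that K clamps to K_min once global_step exceeds args.steps and that steps_per_cycle = args.steps // n_cycles.

```python
import itertools


def cycle_indices(n_cycles: int, continuous: bool):
    """Bounded run: range(n_cycles). Continuous run: count forever."""
    return itertools.count() if continuous else range(n_cycles)


def k_for_step(global_step: int, total_steps: int, k_max: int, k_min: int) -> int:
    """Hypothetical linear K-shrink; holds at k_min past total_steps."""
    frac = min(global_step / total_steps, 1.0)  # clamp schedule progress
    return max(k_min, round(k_max - frac * (k_max - k_min)))


def run(steps: int, n_cycles: int, continuous: bool, demo_limit: int):
    """Record (cycle, global_step, K) per cycle; demo_limit stops the demo."""
    steps_per_cycle = steps // n_cycles  # per-cycle budget is unchanged
    global_step = 0
    log = []
    for cycle in cycle_indices(n_cycles, continuous):
        if cycle >= demo_limit:
            break  # stand-in for Ctrl-C in a real continuous run
        log.append((cycle, global_step, k_for_step(global_step, steps, 8, 2)))
        global_step += steps_per_cycle
    return log
```

With `--steps 6000 --n-cycles 6`, each cycle gets 1000 steps in both modes; in continuous mode every cycle from the seventh onward simply trains at K_min.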

Usage

# Forever self-distillation with omniweight loss
python3 train_self_recursive.py --omniweight-loss \
    --continuous --checkpoint omniweight_distill.pt

# Stop with Ctrl-C, resume by re-running the same command:
python3 train_self_recursive.py --omniweight-loss \
    --continuous --checkpoint omniweight_distill.pt
# Output: "resuming from checkpoint: omniweight_distill.pt"
#         "resumed at cycle N, active_base=M tokens, best_creativity=X"

Test plan

  • Smoke: 2 cycles → Ctrl-C → resume → verify cycle counter, active_base size, best_creativity all restored
  • Long run: continuous + omniweight-loss with --steps 6000 --n-cycles 6 and let it ride 20+ cycles, log creativity trajectory
  • Compare cycle-by-cycle creativity in continuous mode vs the bounded v89 numbers (cycle 1 mean 0.6460, cycle 2 mean 0.6426 from the killed run)

Generated by Claude Code

After PR #4 closed the train/inference omniweight asymmetry, the
natural follow-up: don't stop at cycle 6. The active_base ratchet
(seed + appended best refined outputs) is exactly the kind of process
where compounding past a fixed budget might find regimes the 6-cycle
window can't reach.

--continuous: replaces `for cycle in range(n_cycles)` with an
unbounded loop. n_cycles still controls steps_per_cycle (args.steps
// n_cycles) so per-cycle training budget stays calibrated; the cycle
counter just keeps going. K-shrink schedule clamps to K_min once
global_step exceeds args.steps, which is the standard end state of
the curriculum anyway.

--checkpoint PATH: serializes the entire distillation state every
cycle (model state_dict, FibAdamW optimizer state, active_base,
cycle counter, global_step, best_creativity, best_val/step,
cycle_summary, rejection counters, best_refined_seq). Atomic write
via tmp+os.replace so an interrupt mid-save can't corrupt the file.
If the checkpoint exists at startup, training resumes from the
saved cycle+1 with the active_base fully intact -- the ratchet picks
up exactly where it stopped.

Default behavior unchanged: omitting both flags reproduces the
v88 + omniweight-loss bounded 6-cycle run.

Run a forever-distillation with omniweight-loss:
  python3 train_self_recursive.py --omniweight-loss \
      --continuous --checkpoint omniweight_distill.pt

Resume after Ctrl-C: re-run the same command. The checkpoint state
is restored and training continues at start_cycle (saved cycle+1).