Hi, thanks for the great work!
I'd like to confirm the intended workflow for train_adapter. Per the paper, the pipeline reads as:
specialists → DAgger student (generalist) → adapter
So I expected the adapter to be fine-tuned on top of the DAgger generalist student. However, the released code seems to support loading only from a PPO specialist teacher (logs/track/), not from a DAgger student (logs/dagger/).
The pretrained-checkpoint path is hardcoded to logs/track/ in train_adapter.py:
load_root = Path(WANDB_PATH_LOG) / "track" / args.load_exp_name / "checkpoints"
Even if I patch the path to logs/dagger/, the loader only accepts orbax-format numbered subdirs:
ckpts = [p for p in ckpt_root.glob("*") if p.is_dir() and p.name.isdigit()]
but train_dagger only saves PyTorch .pth files plus an ONNX export — no orbax checkpoint is ever written.
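To make the mismatch concrete, here is a minimal sketch that applies the same numbered-subdirectory filter as the loader and also lists the files that filter skips. `find_checkpoints` is a hypothetical helper written for this issue, not something from the repo, and the dictionary keys are my own naming.

```python
from pathlib import Path


def find_checkpoints(ckpt_root: Path) -> dict[str, list[Path]]:
    """Group the entries under ckpt_root by the kind of checkpoint they look like.

    Hypothetical helper for illustration only; not part of the released code.
    """
    entries = list(ckpt_root.glob("*"))

    # Same filter as the loader quoted above: numbered subdirectories only,
    # sorted here by step number rather than lexicographically.
    numbered_dirs = sorted(
        (p for p in entries if p.is_dir() and p.name.isdigit()),
        key=lambda p: int(p.name),
    )

    # Plain files that the filter above never sees.
    pth_files = sorted(p for p in entries if p.is_file() and p.suffix == ".pth")
    onnx_files = sorted(p for p in entries if p.is_file() and p.suffix == ".onnx")

    return {"numbered_dirs": numbered_dirs, "pth": pth_files, "onnx": onnx_files}
```

If train_dagger writes only .pth and ONNX files as described, `numbered_dirs` would come back empty for a DAgger checkpoint directory, so patching the path alone would not be enough.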
Could you clarify whether the adapter is intended to be fine-tuned from the teacher policy (PPO specialist) or the student policy (DAgger generalist)? Thanks!