Should the adapter be trained on top of the teacher (PPO specialist) or the student (DAgger generalist)? #28

Hi, thanks for the great work!

I'd like to confirm the intended workflow for train_adapter. As I read the paper, the pipeline is:
specialists  →  DAgger student (generalist)  →  adapter
So I expected the adapter to be fine-tuned on top of the DAgger generalist student. However, the released code appears to support loading only from a PPO specialist teacher (logs/track/), not from a DAgger student (logs/dagger/).

The pretrained-checkpoint path is hardcoded to logs/track/ in train_adapter.py:
load_root = Path(WANDB_PATH_LOG) / "track" / args.load_exp_name / "checkpoints"

Even if I patch the path to logs/dagger/, the loader only accepts orbax-format numbered subdirs:
ckpts = [p for p in ckpt_root.glob("*") if p.is_dir() and p.name.isdigit()]

but train_dagger only saves PyTorch .pth files plus an ONNX export; no orbax checkpoint is ever written.
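
To make the mismatch concrete, here is a minimal sketch that runs the same directory filter in isolation. The directory layout, the experiment name "exp", and the file names (model_1000.pth, policy.onnx) are placeholders I made up for illustration; they are not necessarily what train_dagger or the PPO training script actually write.

```python
import tempfile
from pathlib import Path


def find_numbered_ckpts(ckpt_root: Path) -> list[Path]:
    """Apply the filter quoted above: keep only subdirectories with all-digit names."""
    ckpts = [p for p in ckpt_root.glob("*") if p.is_dir() and p.name.isdigit()]
    return sorted(ckpts, key=lambda p: int(p.name))


def demo() -> tuple[list[str], list[str]]:
    """Build two throwaway checkpoint folders and run the filter on each."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical layout mimicking logs/track/<exp>/checkpoints/<step>/
        track = Path(tmp) / "track" / "exp" / "checkpoints"
        for step in ("1000", "2000"):
            (track / step).mkdir(parents=True)

        # Hypothetical layout mimicking logs/dagger/<exp>/checkpoints/ with flat files
        dagger = Path(tmp) / "dagger" / "exp" / "checkpoints"
        dagger.mkdir(parents=True)
        (dagger / "model_1000.pth").touch()  # placeholder file name
        (dagger / "policy.onnx").touch()     # placeholder file name

        return (
            [p.name for p in find_numbered_ckpts(track)],
            [p.name for p in find_numbered_ckpts(dagger)],
        )


track_found, dagger_found = demo()
print(track_found)   # numbered step directories pass the filter
print(dagger_found)  # flat .pth / .onnx files are filtered out
```

So with only the path patched, the loader would see an empty checkpoint list for a DAgger run, regardless of how the .pth files are named.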

Could you clarify whether the adapter is intended to be fine-tuned from the teacher policy (PPO specialist) or the student policy (DAgger generalist)? Thanks!
