
Should the adapter be trained on top of the teacher (PPO specialist) or the student (DAgger generalist)? #28

@benexyw-arch

Hi, thanks for the great work!

I'd like to confirm the intended workflow for train_adapter. Per the paper, the pipeline reads as:
specialists → DAgger student (generalist) → adapter
So I expected the adapter to fine-tune on top of the DAgger generalist student. But the released code seems to only support loading from a PPO specialist teacher (logs/track/), not from a DAgger student (logs/dagger/).

The pretrained-checkpoint path is hardcoded to logs/track/ in train_adapter.py:
load_root = Path(WANDB_PATH_LOG) / "track" / args.load_exp_name / "checkpoints"
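
For illustration, one way the stage directory could be made selectable instead of hardcoded. This is only a sketch: the `--load_stage` flag and the `resolve_load_root` helper are hypothetical names, not part of the released code, and `WANDB_PATH_LOG` is set to a placeholder value here.

```python
import argparse
from pathlib import Path

# Placeholder value; in the repo this constant is defined elsewhere.
WANDB_PATH_LOG = "logs"


def resolve_load_root(log_root: str, stage: str, exp_name: str) -> Path:
    """Build <log_root>/<stage>/<exp_name>/checkpoints (hypothetical helper)."""
    return Path(log_root) / stage / exp_name / "checkpoints"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("--load_exp_name", required=True)
    # Hypothetical flag; "track" as the default keeps the current behaviour.
    parser.add_argument("--load_stage", choices=["track", "dagger"], default="track")
    return parser


# Example invocation with an explicit argument list (experiment name is made up).
args = build_parser().parse_args(["--load_exp_name", "my_exp", "--load_stage", "dagger"])
load_root = resolve_load_root(WANDB_PATH_LOG, args.load_stage, args.load_exp_name)
print(load_root)
```

Changing the path alone would not be enough, since the checkpoint format also has to match what the loader expects.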

Even if I patch the path to logs/dagger/, the loader only accepts orbax-format numbered subdirs:
ckpts = [p for p in ckpt_root.glob("*") if p.is_dir() and p.name.isdigit()]

but train_dagger only saves PyTorch .pth files plus an ONNX export — no orbax checkpoint is ever written.
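
The mismatch can be reproduced in isolation. The sketch below applies the same filter to a temporary directory. The file names (`model_1000.pth`, `policy.onnx`) and the `100` step directory are invented for illustration and may not match what train_dagger or the PPO trainer actually write. The sort is added only to make the result deterministic.

```python
import tempfile
from pathlib import Path


def find_numbered_ckpts(ckpt_root: Path) -> list[Path]:
    # Same filter as quoted above: only directories with all-digit names pass.
    return sorted(
        (p for p in ckpt_root.glob("*") if p.is_dir() and p.name.isdigit()),
        key=lambda p: int(p.name),
    )


with tempfile.TemporaryDirectory() as tmp:
    # Assumed layout of a DAgger run: plain files, no numbered subdirectories.
    dagger_dir = Path(tmp) / "dagger"
    dagger_dir.mkdir()
    (dagger_dir / "model_1000.pth").touch()
    (dagger_dir / "policy.onnx").touch()

    # Assumed layout of a PPO run: one numbered step directory.
    track_dir = Path(tmp) / "track"
    (track_dir / "100").mkdir(parents=True)

    dagger_ckpts = [p.name for p in find_numbered_ckpts(dagger_dir)]
    track_ckpts = [p.name for p in find_numbered_ckpts(track_dir)]

print(dagger_ckpts)  # []
print(track_ckpts)   # ['100']
```

Under these assumptions the filter finds nothing in the DAgger directory, so the adapter has no checkpoint to load even after the path is patched.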

Could you clarify whether the adapter is intended to be fine-tuned from the teacher policy (PPO specialist) or the student policy (DAgger generalist)? Thanks!
