[Feature] Add BC Loss for behavior cloning #3667
ParamThakkar123 wants to merge 28 commits into pytorch:main
This reverts commit 1f6f327.
- Add BCLoss module in torchrl/objectives/bc.py
  - Supports both stochastic and deterministic policies
  - Auto-detects policy type based on log_prob output
  - Configurable loss functions for deterministic policies
- Add comprehensive tests in test/objectives/test_bc.py
- Update documentation and module exports
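The dispatch described in this commit message can be illustrated with a minimal, dependency-free sketch. It is not the code from torchrl/objectives/bc.py: the helper names (`bc_loss`, `ToyGaussianPolicy`, `toy_deterministic_policy`) are hypothetical, plain Python floats stand in for tensors, and detecting a stochastic policy by the presence of a `log_prob` method is an assumption about what "auto-detects policy type based on log_prob output" might look like.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def mse(pred: Vector, target: Vector) -> float:
    """Mean squared error over action dimensions."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def l1(pred: Vector, target: Vector) -> float:
    """Mean absolute error over action dimensions."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


class ToyGaussianPolicy:
    """Hypothetical stochastic policy: unit-variance Gaussian centred on the observation."""

    def log_prob(self, obs: Vector, action: Vector) -> float:
        # Sum of per-dimension log-densities of N(obs_i, 1) evaluated at action_i.
        return sum(
            -0.5 * (a - o) ** 2 - 0.5 * math.log(2 * math.pi)
            for a, o in zip(action, obs)
        )


def toy_deterministic_policy(obs: Vector) -> List[float]:
    """Hypothetical deterministic policy: doubles each observation entry."""
    return [2.0 * o for o in obs]


def bc_loss(
    policy,
    obs: Vector,
    expert_action: Vector,
    deterministic_loss: Callable[[Vector, Vector], float] = mse,
) -> float:
    """Behavior cloning loss for a single (observation, expert action) pair."""
    # Assumption: a policy exposing a callable `log_prob` is treated as stochastic.
    log_prob = getattr(policy, "log_prob", None)
    if callable(log_prob):
        # Stochastic case: negative log-likelihood of the expert action.
        return -log_prob(obs, expert_action)
    # Deterministic case: regression loss between predicted and expert action.
    return deterministic_loss(policy(obs), expert_action)


nll = bc_loss(ToyGaussianPolicy(), [0.0, 0.0], [0.0, 0.0])
mse_loss = bc_loss(toy_deterministic_policy, [1.0, 2.0], [0.0, 4.0])
l1_loss = bc_loss(toy_deterministic_policy, [1.0, 2.0], [0.0, 4.0], deterministic_loss=l1)
```

Passing the regression loss as a callable mirrors the "configurable loss functions" bullet; a real implementation would operate on batched tensors and reduce over the batch.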
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3667
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEVs
There are 1 currently active SEVs. If your PR is affected, please view them below:
❌ 6 New Failures, 1 Cancelled Job
As of commit 06b7d0b with merge base 33475e3:
NEW FAILURES - The following jobs have failed:
CANCELLED JOB - The following job was cancelled. Please retry:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
|
| Prefix | Label Applied | Example |
|---|---|---|
| [BugFix] | BugFix | [BugFix] Fix memory leak in collector |
| [Feature] | Feature | [Feature] Add new optimizer |
| [Doc] or [Docs] | Documentation | [Doc] Update installation guide |
| [Refactor] | Refactoring | [Refactor] Clean up module imports |
| [CI] | CI | [CI] Fix workflow permissions |
| [Test] or [Tests] | Tests | [Tests] Add unit tests for buffer |
| [Environment] or [Environments] | Environments | [Environments] Add Gymnasium support |
| [Data] | Data | [Data] Fix replay buffer sampling |
| [Performance] or [Perf] | Performance | [Performance] Optimize tensor ops |
| [BC-Breaking] | bc breaking | [BC-Breaking] Remove deprecated API |
| [Deprecation] | Deprecation | [Deprecation] Mark old function |
| [Quality] | Quality | [Quality] Fix typos and add codespell |
Note: Common variations like singular/plural are supported (e.g., [Doc] or [Docs]).
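
As a rough illustration of the prefix-to-label rule in the table above, the mapping could be sketched as follows. This is not the labeling bot's actual implementation; the names `PREFIX_TO_LABEL` and `label_for_title` are hypothetical, and exact, case-sensitive matching of the bracketed prefix is an assumption.

```python
import re
from typing import Optional

# Prefix (without brackets) -> label, copied from the table above.
PREFIX_TO_LABEL = {
    "BugFix": "BugFix",
    "Feature": "Feature",
    "Doc": "Documentation",
    "Docs": "Documentation",
    "Refactor": "Refactoring",
    "CI": "CI",
    "Test": "Tests",
    "Tests": "Tests",
    "Environment": "Environments",
    "Environments": "Environments",
    "Data": "Data",
    "Performance": "Performance",
    "Perf": "Performance",
    "BC-Breaking": "bc breaking",
    "Deprecation": "Deprecation",
    "Quality": "Quality",
}

# Matches a leading "[...]" prefix at the start of a PR title.
_PREFIX_RE = re.compile(r"^\[([^\]]+)\]")


def label_for_title(title: str) -> Optional[str]:
    """Return the label for a PR title, or None if no known prefix is present."""
    match = _PREFIX_RE.match(title.strip())
    if match is None:
        return None
    # Assumption: prefixes are matched exactly and case-sensitively.
    return PREFIX_TO_LABEL.get(match.group(1))


feature_label = label_for_title("[Feature] Add BC Loss for behavior cloning")
```

Under these assumptions, this PR's title resolves to the `Feature` label, and a title without a bracketed prefix resolves to no label.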
|
I haven't run pre-commit on this because the pre-commit setup fails in my environment due to some compatibility issues.
|
Thanks for the work, eager to use this. I just happened to read this and saw that you use [...]. I'll make sure to put it to the test in the coming days/weeks.
|
Sure, I will make this correction 🫡
|
@ParamThakkar123 also, look into ensuring that the MSE path works well and the module structure is clean. Running a quick reproduction script against the three standard BC scenarios surfaces some issues worth fixing before merge. Thanks for the PR! |
|
Thanks for the insights @theap06, I will look into it and fix this 🫡
class BCLoss(LossModule):
    """Behavior Cloning Loss Module.

    Implements behavior cloning loss for both stochastic and deterministic policies.
Let's add the ref to Arxiv if we have it
This is the paper:
"Integrating Behavior Cloning and Reinforcement Learning for Improved
Performance in Dense and Sparse Reward Environments"
https://arxiv.org/abs/1910.04281
|
On it |
|
@vmoens I've implemented all the fixes requested in the reviews 🫡
|
@vmoens The SOTA tests and one unit test seem to fail, but those failures seem unrelated to my changes.
Summary
This PR implements the BC Loss module for behavior cloning as requested in issue #3635.
Changes
Features
Tests
All tests pass including:
Closes #3635