[Feature] Add BC Loss for behavior cloning by ParamThakkar123 · Pull Request #3667 · pytorch/rl

ParamThakkar123 · 2026-04-23T09:04:17Z

Summary

This PR implements the BC Loss module for behavior cloning as requested in issue #3635.

Changes

torchrl/objectives/bc.py: New BCLoss module that supports both stochastic and deterministic policies
torchrl/objectives/init.py: Add BCLoss to module exports
test/objectives/test_bc.py: Comprehensive test suite covering all functionality
docs/source/reference/objectives_other.rst: Add BCLoss to documentation

Features

Auto-detects policy type based on whether actor outputs log_prob
For stochastic policies: minimizes -E[log π(a_expert | s)]
For deterministic policies: minimizes distance(a_pred, a_expert) with configurable loss functions (l1, l2, smooth_l1)
Follows standard LossModule pattern with proper keys, dispatch, and reduction
Integrates cleanly with existing offline RL stack

Tests

All tests pass including:

Forward/backward passes for both policy types
Different loss functions and reduction modes
Training convergence verification
Custom key configurations

Closes #3635

This reverts commit 1f6f327.

- Add BCLoss module in torchrl/objectives/bc.py - Supports both stochastic and deterministic policies - Auto-detects policy type based on log_prob output - Configurable loss functions for deterministic policies - Add comprehensive tests in test/objectives/test_bc.py - Update documentation and module exports

pytorch-bot · 2026-04-23T09:04:22Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3667

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

Rolling out OSDC (ARC) runners on pull & trunk workflows in PyTorch main

❌ 6 New Failures, 1 Cancelled Job

As of commit 06b7d0b with merge base 33475e3 ():

NEW FAILURES - The following jobs have failed:

Continuous Benchmark (PR) / CPU Pytest benchmark (gh)
Workflow failed! Resource not accessible by integration
Libs Tests on Linux / unittests-sklearn (3.10, 12.8) / linux-job (gh)
test/libs/test_datasets.py::TestOpenML::test_data[magic]
Lint / lint-done (gh)
Process completed with exit code 1.
Lint / python-source-and-configs / linux-job (gh)
torchrl/objectives/bc.py:8:1: F401 'typing.Any' imported but unused
SOTA Tests on Linux / tests (3.10, 13.0) / linux-job (gh)
RuntimeError: Command docker exec -t 6a6063217e87644aa470d871af1f5c34e3048948dbe3bbd0d67c3ec8d64ad543 /exec failed with exit code 1
Unit-tests on Linux / tests-olddeps (3.10, 11.8) / linux-job (gh)
test/test_modules.py::TestMultiAgent::test_multiagent_reset_mlp[True-False-3]

CANCELLED JOB - The following job was cancelled. Please retry:

Continuous Benchmark (PR) / GPU Pytest benchmark (gh)
Workflow failed! Cannot read properties of undefined (reading 'ops')

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-04-23T09:04:28Z

⚠️ PR Title Label Error

PR title must start with a label prefix in brackets (e.g., [BugFix]).

Current title: Add BC Loss for behavior cloning

Supported Prefixes (case-sensitive)

Your PR title must start with exactly one of these prefixes:

Prefix	Label Applied	Example
`[BugFix]`	BugFix	`[BugFix] Fix memory leak in collector`
`[Feature]`	Feature	`[Feature] Add new optimizer`
`[Doc]` or `[Docs]`	Documentation	`[Doc] Update installation guide`
`[Refactor]`	Refactoring	`[Refactor] Clean up module imports`
`[CI]`	CI	`[CI] Fix workflow permissions`
`[Test]` or `[Tests]`	Tests	`[Tests] Add unit tests for buffer`
`[Environment]` or `[Environments]`	Environments	`[Environments] Add Gymnasium support`
`[Data]`	Data	`[Data] Fix replay buffer sampling`
`[Performance]` or `[Perf]`	Performance	`[Performance] Optimize tensor ops`
`[BC-Breaking]`	bc breaking	`[BC-Breaking] Remove deprecated API`
`[Deprecation]`	Deprecation	`[Deprecation] Mark old function`
`[Quality]`	Quality	`[Quality] Fix typos and add codespell`

Note: Common variations like singular/plural are supported (e.g., [Doc] or [Docs]).

ParamThakkar123 · 2026-04-23T16:30:36Z

I haven't run the pre-commit on this because the pre-commit setup fails in my env with some compat issues

Xmaster6y · 2026-04-23T17:32:22Z

Thanks for the work, eager to use this.

I just happen to read this and saw that you use log_prob key as a switch and it seems fragile to me. In some modules the key can be action_log_prob, or custom in general. Maybe we want something more explicit and compute dist / log probs in the loss directly using something closer to CQLLoss.actor_bc_loss.

I'll make sure to put it to the tests in the coming days/weeks.

ParamThakkar123 · 2026-04-23T19:25:18Z

Sure I will make this correction to it 🫡 .

theap06 · 2026-04-25T00:30:54Z

@ParamThakkar123 also, look into ensuring that the MSE path works well and the module structure is clean. Running a quick reproduction script against the three standard BC scenarios surfaces some issues worth fixing before merge. Thanks for the PR!

import torch, torch.nn as nn
from tensordict import TensorDict
from tensordict.nn import TensorDictModule, ProbabilisticTensorDictModule, ProbabilisticTensorDictSequential
from torchrl.modules import Actor
from torchrl.modules.distributions import NormalParamExtractor, TanhNormal
from torchrl.objectives.bc import BCLoss

torch.manual_seed(0)

# Failure 1 — cross_entropy not in string dispatch
actor = Actor(nn.Linear(4, 4))
loss = BCLoss(actor, loss_function='cross_entropy')
td = TensorDict({'observation': torch.randn(8,4), 'action': torch.randint(0,4,(8,)).long()}, [8])
loss(td)
# ValueError: Unsupported loss_function: cross_entropy

# Failure 2 — integer action labels crash
loss2 = BCLoss(Actor(nn.Linear(4, 4)))
td2 = TensorDict({'observation': torch.randn(8,4), 'action': torch.randint(0,4,(8,)).long()}, [8])
loss2(td2)
# RuntimeError: size of tensor a (4) must match tensor b (8)

# Failure 3 — stochastic actor silently computes MSE instead of NLL
net = nn.Sequential(nn.Linear(4, 8), NormalParamExtractor())
mod = TensorDictModule(net, in_keys=['observation'], out_keys=['loc','scale'])
stoch = ProbabilisticTensorDictSequential(mod,
    ProbabilisticTensorDictModule(['loc','scale'], ['action'], TanhNormal, return_log_prob=True))
loss3 = BCLoss(stoch)
td3 = TensorDict({'observation': torch.randn(8,4), 'action': torch.randn(8,4)}, [8])
print(loss3(td3)['loss_bc'].item())   # 0.9426  — MSE, always >= 0
# Real NLL for same inputs: 68.21 — completely different signal

ParamThakkar123 · 2026-04-25T07:12:00Z

Thanks for the insights @theap06 , I will look into it and fix this 🫡

vmoens · 2026-04-25T08:18:35Z

+class BCLoss(LossModule):
+    """Behavior Cloning Loss Module.
+
+    Implements behavior cloning loss for both stochastic and deterministic policies.


Let's add the ref to Arxiv if we have it

This is the paper:

"Integrating Behavior Cloning and Reinforcement Learning for Improved
Performance in Dense and Sparse Reward Environments"
https://arxiv.org/abs/1910.04281

ParamThakkar123 · 2026-04-25T20:54:19Z

On it

ParamThakkar123 · 2026-04-25T21:01:03Z

@vmoens implemented all the fixes as per reviews 🫡

vmoens

LGTM thanks!

ParamThakkar123 · 2026-04-27T06:03:51Z

@vmoens SOTA and one Unit tests seems to fail but those seem unrelated to my changes

ParamThakkar123 added 24 commits January 20, 2026 00:23

Fixed MultiSyncCollector set_seed and split_trajs issue

1f6f327

Merge branch 'main' of https://github.com/pytorch/rl

e2aaf6b

Revert "Fixed MultiSyncCollector set_seed and split_trajs issue"

40642d5

This reverts commit 1f6f327.

Merge branch 'main' of https://github.com/pytorch/rl

efdc89c

Merge branch 'main' of https://github.com/pytorch/rl

628f44b

Merge branch 'main' of https://github.com/pytorch/rl

a476a77

Merge branch 'main' of https://github.com/pytorch/rl

0f565c5

Merge branch 'main' of https://github.com/pytorch/rl

7fb086b

Merge branch 'main' of https://github.com/pytorch/rl

ff72793

Added Support for index_select in TensorSpec

69001ed

Merge branch 'main' of https://github.com/pytorch/rl

4ab13be

rebase

2e8face

Merge branch 'main' of https://github.com/pytorch/rl

56e1529

Merge branch 'main' of https://github.com/pytorch/rl

ba6a19f

Merge branch 'main' of https://github.com/pytorch/rl

8be545b

Merge branch 'main' of https://github.com/pytorch/rl

54abe29

Merge branch 'main' of https://github.com/pytorch/rl

78dd00a

Merge branch 'main' of https://github.com/pytorch/rl

94fe080

Merge branch 'main' of https://github.com/pytorch/rl

1619008

Merge branch 'main' of https://github.com/pytorch/rl

3aeaefe

Rebase fixes

5b76682

Merge branch 'main' of https://github.com/pytorch/rl

6371981

Merge branch 'main' of https://github.com/pytorch/rl

eb4a19f

meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 23, 2026

github-actions Bot added Documentation Improvements or additions to documentation Objectives labels Apr 23, 2026

ParamThakkar123 changed the title ~~Add BC Loss for behavior cloning~~ [Feature] Add BC Loss for behavior cloning Apr 23, 2026

github-actions Bot added the Feature New feature label Apr 23, 2026

Add BC Loss for behavior cloning (pytorch#3635)

19096b8

github-actions Bot added Modules Integrations/torch_geometric Integrations labels Apr 23, 2026

ParamThakkar123 added 2 commits April 25, 2026 12:48

Fixed BC loss failures

e34b86a

Merge branch 'main' of https://github.com/pytorch/rl into bc_loss

c420336

vmoens reviewed Apr 25, 2026

View reviewed changes

Fixes

06b7d0b

ParamThakkar123 requested a review from vmoens April 25, 2026 21:01

vmoens approved these changes Apr 26, 2026

View reviewed changes

Conversation

ParamThakkar123 commented Apr 23, 2026

Summary

Changes

Features

Tests

Uh oh!

pytorch-bot Bot commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3667

❗ 1 Active SEVs

❌ 6 New Failures, 1 Cancelled Job

Uh oh!

github-actions Bot commented Apr 23, 2026

⚠️ PR Title Label Error

Supported Prefixes (case-sensitive)

Uh oh!

ParamThakkar123 commented Apr 23, 2026

Uh oh!

Xmaster6y commented Apr 23, 2026

Uh oh!

ParamThakkar123 commented Apr 23, 2026

Uh oh!

theap06 commented Apr 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ParamThakkar123 commented Apr 25, 2026

Uh oh!

Uh oh!

vmoens Apr 25, 2026

Choose a reason for hiding this comment

Uh oh!

ParamThakkar123 Apr 25, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ParamThakkar123 commented Apr 25, 2026

Uh oh!

ParamThakkar123 commented Apr 25, 2026

Uh oh!

vmoens left a comment

Choose a reason for hiding this comment

Uh oh!

ParamThakkar123 commented Apr 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

pytorch-bot Bot commented Apr 23, 2026 •

edited

Loading

theap06 commented Apr 25, 2026 •

edited

Loading