Puffer doesn't beat my simple env - credit assignment testing by eitanporat · Pull Request #478 · PufferAI/PufferLib

eitanporat · 2026-01-30T19:27:30Z

Summary

- Simple environment where the agent must pick action 0 on step 1 to win
- Episode terminates at step 128; reward is given only at termination
- 2 discrete actions, so a random policy wins 50% of the time
- PufferLib with bptt_horizon=64 does not solve the credit assignment problem for 128-step episodes

Results:

- Agent that always picks action 0: 100%
- Random agent: 50%
- PufferLib with bptt_horizon=64: 50%
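For readers who want to reproduce the two baselines without PufferLib, here is a minimal pure-Python sketch of the environment as described above. It is not the code in this PR: the class name `Action0Env`, the step-counter observation, the reward value of 1.0, and the `win_rate` helper are all assumptions made for illustration.

```python
import random


class Action0Env:
    """Hypothetical stand-in for the environment described in the summary."""

    EPISODE_LENGTH = 128  # episode terminates at step 128
    NUM_ACTIONS = 2       # 2 discrete actions

    def reset(self):
        self.t = 0
        self.won = False
        return self.t  # assumption: the observation is just the step counter

    def step(self, action):
        # Only the action taken on the first step decides the outcome.
        if self.t == 0:
            self.won = (action == 0)
        self.t += 1
        done = self.t >= self.EPISODE_LENGTH
        # Reward is given only at termination (1.0 is an assumed value).
        reward = 1.0 if (done and self.won) else 0.0
        return self.t, reward, done


def win_rate(policy, episodes=1000):
    """Fraction of episodes won by `policy`, a callable obs -> action."""
    env = Action0Env()
    wins = 0.0
    for _ in range(episodes):
        obs = env.reset()
        done = False
        while not done:
            obs, reward, done = env.step(policy(obs))
        wins += reward
    return wins / episodes


always_zero = lambda obs: 0
rng = random.Random(0)
random_policy = lambda obs: rng.randrange(Action0Env.NUM_ACTIONS)
```

Under these assumptions, `win_rate(always_zero)` is 1.0 and `win_rate(random_policy)` lands near 0.5, matching the first two rows of the results.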

The core issue is that truncated BPTT cuts off gradients at segment boundaries, preventing credit from flowing back to early actions. With bptt_horizon=64 and 128-step episodes, the decisive action on step 1 and the terminal reward at step 128 fall in different segments. A potential fix would be to perform two forward passes per rollout: the first to collect experiences, and the second (after seeing more of the trajectory) to compute improved bootstrap value estimates at segment boundaries. This would allow the value function to incorporate information beyond the BPTT horizon without requiring full backpropagation through the entire episode.
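The gradient cut-off can be illustrated on a toy scalar recurrence. This is a sketch under stated assumptions, not PufferLib's training code: it uses the recurrence h_t = w * h_{t-1} + x_t, takes the loss to be the final hidden state, and assumes the hidden state is detached at every multiple of the horizon. The function name `input_gradients` is hypothetical. It only tracks the gradient path through the recurrent state and says nothing about credit carried by value estimates.

```python
def input_gradients(num_steps, horizon, w=1.0):
    """Gradient of the final hidden state with respect to each input x_t
    for h_t = w * h_{t-1} + x_t, with the hidden state detached at the
    start of every `horizon`-step segment (truncated BPTT)."""
    grads = [0.0] * num_steps
    dh = 1.0  # d(loss)/d(h_last), with loss = h_last
    # Walk backward through time, as backpropagation would.
    for t in range(num_steps - 1, -1, -1):
        grads[t] = dh  # d(h_t)/d(x_t) = 1
        if t % horizon == 0:
            # Segment boundary: h_{t-1} was detached, so nothing flows
            # further back from here.
            dh = 0.0
        else:
            dh *= w
    return grads


truncated = input_gradients(num_steps=128, horizon=64)
full = input_gradients(num_steps=128, horizon=128)
```

Here `truncated[0]` is 0.0 while `full[0]` is 1.0: with a 64-step horizon the first input receives no gradient from the end of a 128-step episode, whereas a horizon covering the whole episode lets it through.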

I leave this as an open problem for other contributors.

Simple environment where agent must pick action 0 on step 1 to win. Episode terminates at step 128, reward given only at termination.

- 2 discrete actions
- 50% random baseline
- Tests long-horizon credit assignment with BPTT

Add Action0 environment for credit assignment testing

8af716f


eitanporat wants to merge 1 commit into PufferAI:3.0 from eitanporat:action0-env
