
Conversation

@alexunderch
Contributor

alexunderch commented Dec 14, 2025

Hey! This isn't the main PR, so it might be deleted later, but to summarise:

  1. Replaced Sonnet's initialisation with native PyTorch's. The JAX and PyTorch implementations had different network structures; now they share a common MLP parametrisation with LayerNorm after the base layers (a sketch follows right after this list).
  2. Rewrote the JAX implementation using flax.nnx to match the PyTorch and TF implementations.
  3. Deleted all the TF and tensorflow_datasets code to simplify things.
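
A minimal sketch of what such a shared parametrisation could look like (illustrative only, not the exact code in this PR; the layer sizes and the exact placement of LayerNorm are my assumptions):

```python
# Illustrative only: one reading of "a common MLP parametrisation with
# LayerNorm after the base layers" -- each hidden (base) layer is followed by
# LayerNorm and ReLU, with a plain linear output layer on top.
import torch.nn as nn


class MLP(nn.Module):

  def __init__(self, input_size, hidden_sizes, output_size):
    super().__init__()
    layers = []
    sizes = [input_size] + list(hidden_sizes)
    for in_size, out_size in zip(sizes[:-1], sizes[1:]):
      layers += [nn.Linear(in_size, out_size), nn.LayerNorm(out_size), nn.ReLU()]
    layers.append(nn.Linear(sizes[-1], output_size))
    self.net = nn.Sequential(*layers)

  def forward(self, x):
    return self.net(x)
```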

However,

  1. The PyTorch and JAX versions differ in their loss coefficients: in PyTorch/TF it's $\sqrt{t}$, whereas in JAX it's $\frac{t}{2T}$.
  2. The JAX implementation used to apply action masking; however, as noted in the PyTorch implementation, it adds little value because only legal actions are selected during traversal.
  3. The JAX implementation is initialised with the PyTorch parameters to ensure reproducibility.

The goal is to make sure that both implementations converge to similar values at comparable rates -- that would allow us to delete the TF implementation.

@fuyuan-li
Contributor

At first glance, I like your t/2T in the JAX implementation! I think this is what the original paper proposed.

Thank you! I'll follow up with more.

@alexunderch
Contributor Author

alexunderch commented Dec 15, 2025

@fuyuan-li, I found out that $\frac{t}{2T}$ is specific to Linear CFR, which shouldn't work with non-linear MLPs, so I decided to stick with $\sqrt{t}$ as in the original TF implementation.
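
To make the weighting concrete, here is a minimal sketch of an iteration-weighted loss (illustrative only; the function name and tensor shapes are my assumptions, not the PR's code):

```python
# Each sample carries the CFR iteration t at which it was collected; the
# per-sample squared error is weighted by sqrt(t), as in the original TF
# implementation. Using t / (2 * T) instead would correspond to the Linear
# CFR weighting discussed above.
import torch


def weighted_advantage_loss(predictions, targets, iterations):
  weights = torch.sqrt(iterations.float())
  per_sample_error = ((predictions - targets) ** 2).mean(dim=-1)
  return (weights * per_sample_error).mean()
```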

The only remaining difference is the masking used in the TF2 and JAX implementations -- I'm not really sure it's beneficial, but I've kept it for consistency.

@alexunderch
Contributor Author

@fuyuan-li, I added a print_nash_convs argument to DeepCFRSolver to print the exploitability at each optimisation step.

It can hurt performance, but it helps with tracking progress.
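
Usage could look roughly like this (a sketch: the constructor arguments other than print_nash_convs are assumptions about the solver's signature, not taken from this PR):

```python
# Rough usage sketch; argument names besides `print_nash_convs` are assumed.
import pyspiel
from open_spiel.python.pytorch import deep_cfr

game = pyspiel.load_game("kuhn_poker")
solver = deep_cfr.DeepCFRSolver(
    game,
    num_iterations=100,
    num_traversals=40,
    print_nash_convs=True,  # print exploitability at each optimisation step
)
solver.solve()
```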

@fuyuan-li
Contributor

Quick update @alexunderch: the refactored PyTorch implementation does not converge like the original one, using the same hyperparameter set as in #1287.

Will dig in and keep you posted!

@alexunderch
Contributor Author

@fuyuan-li, I erroneously had a stray ReLU in the network. Sorry!
Both implementations should converge now.

@alexunderch
Contributor Author

@lanctot, both versions now converge on Kuhn; see #1406.

However, without additional code improvements (e.g. jitting the buffer, which currently causes constant recompilation because its size keeps changing), the JAX implementation is roughly 25 times slower than the PyTorch one. Maybe we should tackle that as a separate issue.
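
To illustrate the recompilation point (a standalone toy example, not the PR's buffer code): jax.jit specialises a function to the shapes of its inputs, so a buffer that grows between calls triggers a fresh trace and compile every time.

```python
# The traced print fires once per distinct input shape, so a buffer that
# changes size on every call is recompiled on every call.
import jax
import jax.numpy as jnp


@jax.jit
def buffer_mean(buffer):
  print("tracing for shape", buffer.shape)
  return buffer.mean()


for size in (1, 2, 3):
  buffer_mean(jnp.ones((size,)))  # three shapes -> three compilations
```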

When the Leduc results are in, let me know if you want to merge, and I'll clean some things up.

@fuyuan-li
Contributor

@alexunderch, probably a late update -- yes, confirmed here too: both implementations converged on Kuhn poker! Exploitability drops to 0.05 and the policy value converges to the theoretical value (-0.06 for player 0), tested with several random seeds (for both the PyTorch and JAX versions).

The only reason I didn't run simulations over 30+ seeds is that JAX is running very slowly -- 70 minutes for one simulation on Kuhn poker.

Another update -- a Leduc simulation (PyTorch implementation) is running:

  • On the one result so far, exploitability drops to 0.6 and the policy value goes to -0.12 for player 0.

Very happy to continue working on the JAX implementation's performance issue too (otherwise I doubt simulations over multiple seeds are feasible). Do you think it's a good time (i.e., is the working code ready?) to start that new thread? Let me know!

@alexunderch
Contributor Author

@fuyuan-li, thank you for the testing. Good that the results are reproducible. The newest commit should roughly halve the computation time (on my Mac, at least). Further improvement, I think, will require some additional modifications to the ReservoirBuffer.

I reckon it's up to Dr. Lanctot whether he is okay with a slower but consistent-looking implementation, or whether he thinks we need comparable running times.

We can keep all the discussion in this thread, if that works for you.

@alexunderch
Contributor Author

@fuyuan-li, check it out now. The JAX version should be only 4-6 times slower than PyTorch.

That's partly because of jax-ml/jax#16587 (in the append_to_reservior function) and because I allocate the whole buffer right away.
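
Roughly, the pre-allocation idea looks like this (a sketch with made-up names, not the code in this PR): the storage is created at full capacity up front so jitted code always sees the same shapes, and updates go through functional .at[].set indexing (roughly the place where the overhead mentioned above would show up).

```python
# Sketch only: fixed-capacity storage allocated up front, with standard
# reservoir sampling on top. The names init_reservoir / add_to_reservoir are
# illustrative, not the identifiers used in the PR.
from functools import partial

import jax
import jax.numpy as jnp


@partial(jax.jit, static_argnames=("capacity", "element_size"))
def init_reservoir(capacity, element_size):
  # Full-size buffer from the start, so shapes never change under jit.
  return jnp.zeros((capacity, element_size)), jnp.zeros((), dtype=jnp.int32)


@partial(jax.jit, static_argnames=("capacity",))
def add_to_reservoir(data, num_seen, element, rng, capacity):
  # Standard reservoir sampling: once full, keep the new element with
  # probability capacity / (num_seen + 1).
  idx = jax.random.randint(rng, (), 0, num_seen + 1)
  write_idx = jnp.where(num_seen < capacity, num_seen, jnp.minimum(idx, capacity - 1))
  should_write = (num_seen < capacity) | (idx < capacity)
  new_row = jnp.where(should_write, element, data[write_idx])
  # Functional update; roughly where the copy overhead discussed above bites.
  return data.at[write_idx].set(new_row), num_seen + 1
```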

@lanctot
Collaborator

lanctot commented Dec 18, 2025

Hi guys, great work on this. I'm super impressed to see the community collaboration here!

@lanctot
Collaborator

lanctot commented Dec 18, 2025

> I reckon it's up to Dr. Lanctot whether he is okay with a slower but consistent-looking implementation, or whether he thinks we need comparable running times.

No strong preferences here, I'm mostly happy to see that we can retain these implementations thanks to both of you working on this.

I'm OK with somewhat different running times if one of them is just faster / more efficient. How cryptic is it? I still want it to be readable. So as long as you have enough comments explaining any non-obvious code, I think it's OK to have a more efficient version that is slightly inconsistent with the other one.

@alexunderch
Contributor Author

alexunderch commented Dec 18, 2025

The PyTorch implementation hasn't really changed in terms of readability.
In the JAX implementation, I replaced the buffer with a set of functions and made the training loop jittable. It should still be readable.
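
For illustration, a jittable training step in this style could look like the following (a generic sketch assuming an optax optimiser and a loss that is a pure function of a parameter pytree; it is not the PR's actual code):

```python
# Everything the step needs is passed in as arrays/pytrees, so jax.jit
# compiles it once and reuses it every iteration.
import jax
import optax


def make_train_step(loss_fn, optimizer):

  @jax.jit
  def train_step(params, opt_state, batch):
    loss, grads = jax.value_and_grad(loss_fn)(params, batch)
    updates, opt_state = optimizer.update(grads, opt_state, params)
    params = optax.apply_updates(params, updates)
    return params, opt_state, loss

  return train_step
```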

If @fuyuan-li soon reports that their testing is fine, I can clean up the TF and reference implementations, add a couple of comments, and we should be good to go.

P.S. If we continue with refactoring, I will replace the networks and buffers with the corresponding utility imports.

@fuyuan-li
Contributor

fuyuan-li commented Dec 19, 2025

Thank you @alexunderch and @lanctot
A few quick comments:

  1. Convergence and consistency on Kuhn in both PyTorch and JAX are confirmed.
  2. Convergence on Leduc in PyTorch, based on 40 simulations: exploitability comes down to 0.67 on average, and the policy value (for player 0) arrives at -0.14 (std 0.01) across the 40 simulations. (I consider that convergence.)
  3. The Leduc convergence run in JAX is in progress. Re @lanctot ("How cryptic is it?"): about 65 minutes per simulation with the default parameters. (By "default hyperparameters" I mean the hyperparameter sets that were confirmed to converge for PyTorch and Kuhn.)
  4. @alexunderch: a small glitch in your JAX implementation: for the jit decorator on init_reservoir(), do you want to change @jax.jit(static_argnames=("capacity",)) to @partial(jax.jit, static_argnames=("capacity",))? I updated it locally to get the simulation running, but I don't think I need to submit a PR for this -- I'll defer to you to update the branch HEAD to keep things simple.
  5. Given the running time on Leduc in JAX, I expect to get results this weekend (it's running now), but feel free to go ahead if we are all happy with the Kuhn convergence results. I'll come back to log the simulation results when they're ready (regardless of whether this PR is already merged).

@alexunderch
Contributor Author

@fuyuan-li, thank you for the updates. Yes, I will update it on my end. Once everything is confirmed, we can merge.

Just out of interest, could you compare GPU performance? As I mentioned, because of the array copying, JAX may be slower than NumPy on CPU; on paper it should be much faster on GPU. No code modifications needed, just install the CUDA versions...

@alexunderch
Contributor Author

alexunderch commented Jan 13, 2026

@lanctot, I merged as you told me to, but there is that scipy installation error again (for ARM...).

@lanctot
Collaborator

lanctot commented Jan 13, 2026

> @lanctot, I merged as you told me to, but there is that scipy installation error again (for ARM...).

Can you try removing scipy from python_extra_deps.sh? All instances of it. Since it's in requirements.txt, we shouldn't need it in python_extra_deps.sh.

@lanctot
Collaborator

lanctot commented Jan 13, 2026

> @lanctot, I merged as you told me to, but there is that scipy installation error again (for ARM...).
>
> Can you try removing scipy from python_extra_deps.sh? All instances of it. Since it's in requirements.txt, we shouldn't need it in python_extra_deps.sh.

Never mind... that won't do anything. It seems to be failing during the installation via requirements.txt.

I'm not sure what's going on.

Let me trigger a custom wheels test on master... done: https://github.com/google-deepmind/open_spiel/actions/runs/20957953304 (let's see whether this one fails for the same reason or not).

@lanctot
Collaborator

lanctot commented Jan 13, 2026

Wait, it appears that you and @visheshrwl are running into the same problem (he's working on #1426). So my guess is that the ARM wheels are broken due to the scipy upgrade a few days ago. I expect my wheels test to fail. Can you try commenting out the Linux arm64 tests in wheels.yml?

lanctot mentioned this pull request Jan 13, 2026
@alexunderch
Contributor Author

alexunderch commented Jan 13, 2026

I think it's a cool solution, but let's try a different thing first: OpenBLAS.
It's marked as a solution for both Windows and Linux in scipy/scipy#21562.

@lanctot
Collaborator

lanctot commented Jan 13, 2026

@lanctot
Collaborator

lanctot commented Jan 13, 2026

> I think it's a cool solution, but let's try a different thing first: OpenBLAS. It's marked as a solution for both Windows and Linux in scipy/scipy#21562.

Sorry, I don't have any more time to work on this. I will disable them for now and open a bug. Feel free to work on it independently. We can't leave master in a state where the CI doesn't work.

@alexunderch
Contributor Author

If it doesn't work, I'll revert the change and just disable the wheels.

@alexunderch
Contributor Author

alexunderch commented Jan 13, 2026

There was a conflict because of the PR you've just merged; I fixed it. Technically, it doesn't change anything.

I've also updated it with your latest comments.

@alexunderch
Contributor Author

It's not only ARM: https://github.com/google-deepmind/open_spiel/actions/runs/20958630257/job/60230374482

@alexunderch
Contributor Author

Wait @lanctot,

we use manylinux2014, which is based on CentOS 7.

The EOL of that distribution was around 2024, no? pypa/cibuildwheel#1772

Maybe we should update it? When I ran with manylinux_2_28, that wheel seemed to pass.

@lanctot
Collaborator

lanctot commented Jan 13, 2026

> It's not only ARM: https://github.com/google-deepmind/open_spiel/actions/runs/20958630257/job/60230374482

Yeah, extended the disabling: #1443

@lanctot
Collaborator

lanctot commented Jan 13, 2026

> Wait @lanctot,
>
> we use manylinux2014, which is based on CentOS 7.
>
> The EOL of that distribution was around 2024, no? pypa/cibuildwheel#1772
>
> Maybe we should update it? When I ran with manylinux_2_28, that wheel seemed to pass.

Yes, we should do that, but it will likely raise a few other issues that we should deal with separately. I don't think it's the cause of the current failure.

@alexunderch
Contributor Author

Should we merge #1443 here?

@alexunderch
Contributor Author

Quoting Google's AI:

> The Crash: Compiling SciPy 1.17.0 from source requires C++17 headers and a modern version of glibc to handle the new ARPACK/PROPACK C-conversions. manylinux2014 is based on CentOS 7 (glibc 2.17), which is too old. The build fails during the C++ compilation of SciPy, long before it ever gets to your "disabled" tests.

@lanctot
Collaborator

lanctot commented Jan 13, 2026

Note that there are conflicts that need to be resolved; I can't rerun the tests until they are resolved.

@alexunderch
Contributor Author

I'm really sorry, but I found out that I can't just run the tests in my own fork without bothering you...

I think I've made the Linux tests (Ubuntu and ARM) pass. I'll send you the jobs; maybe you'd want to revert patch-81 and patch-82, because they won't be needed.

@lanctot
Collaborator

lanctot commented Jan 13, 2026

It's not a problem... let's just do it separately :)

@lanctot
Collaborator

lanctot commented Jan 13, 2026

Replaced by #1445

lanctot closed this Jan 13, 2026