
fix: data race on last_storage_ptr_ cache in gil_safe_call_once_and_store#6087

Merged
henryiii merged 1 commit into pybind:master from henryiii:fix/gil-safe-call-once-atomic-cache
Jun 24, 2026

@henryiii

Collaborator

🤖 AI text below 🤖

Problem

In include/pybind11/gil_safe_call_once.h, the PYBIND11_HAS_SUBINTERPRETER_SUPPORT branch of gil_safe_call_once_and_store caches the per-interpreter storage pointer in T *last_storage_ptr_ as a fast path for the single-interpreter case.

This plain pointer is:

  • written in call_once_and_store_result() (inside the std::call_once lambda) and in get_stored(), and
  • read in get_stored().

Under free-threaded CPython (Py_GIL_DISABLED), gil_scoped_acquire provides no mutual exclusion, so these concurrent unsynchronized reads/writes of a plain pointer are a C++ data race (undefined behavior).

Additionally, get_stored() loaded the cached pointer before calling is_last_storage_valid(). The writer publishes the pointer (last_storage_ptr_) before setting the validity flag (is_initialized_by_at_least_one_interpreter_), so a reader must observe the flag first and only then load the pointer to get correct acquire/release ordering. With the old order, a reader could load the pointer before it was published, then see the flag already set, and return the stale value.
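The ordering argument above can be illustrated with a minimal sketch. The names cached_ptr, is_valid, publish, try_get, and run_demo are hypothetical stand-ins chosen for this example, not pybind11 identifiers, and the real code has more state than two globals. The sketch only shows the pattern: the writer stores the pointer and then the flag, the reader checks the flag and then loads the pointer.

```cpp
#include <atomic>
#include <thread>

// Hypothetical stand-ins for the cached pointer and the validity flag.
std::atomic<int *> cached_ptr{nullptr};
std::atomic_bool is_valid{false};

// Writer side: publish the pointer first, then raise the flag.
// Both stores use the default memory_order_seq_cst.
void publish(int *storage) {
    cached_ptr.store(storage);
    is_valid.store(true);
}

// Reader side: observe the flag first, and only then load the pointer.
// If the flag is seen as true, the earlier pointer store is visible too.
int *try_get() {
    if (!is_valid.load()) {
        return nullptr; // nothing published yet; a caller would take the slow path
    }
    return cached_ptr.load();
}

// Runs one writer and one reader thread and returns the value the reader saw.
int run_demo() {
    static int storage = 42;
    int seen = 0;
    std::thread reader([&seen] {
        int *p = nullptr;
        while ((p = try_get()) == nullptr) {
            std::this_thread::yield(); // spin until the writer has published
        }
        seen = *p;
    });
    std::thread writer([] { publish(&storage); });
    writer.join();
    reader.join(); // join makes the reader's write to `seen` visible here
    return seen;
}
```

Calling run_demo() from a test or a main function exercises both threads. Swapping the two loads in try_get() would reintroduce the window described above, where the pointer is read before the flag.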

Fix

  • Make last_storage_ptr_ a std::atomic<T *> (<atomic> is already included in this branch). Default seq_cst operations pair with the existing atomic validity flag.
  • Restructure get_stored() to check is_last_storage_valid() first, and only load last_storage_ptr_ on the fast path; on the slow path, look up the per-interpreter storage and update the cache as before.
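As a rough sketch of the shape these two changes give the cache, assuming a much simplified model: the class cached_store_sketch, the integer interp_id key, the slots_ map, and the seen_multiple_ flag are all hypothetical and invented for this example. In particular, the validity condition used here is an assumption, and the map is unsynchronized, so this illustrates the fast path / slow path structure only and is not a drop-in for the real header.

```cpp
#include <atomic>
#include <unordered_map>

// Hypothetical, simplified model of a per-interpreter store with a cached
// "last storage" pointer. Not pybind11's actual class.
template <typename T>
class cached_store_sketch {
public:
    // Registers storage for one interpreter and publishes it to the cache.
    void call_once_and_store(int interp_id, T *storage) {
        if (!slots_.empty() && slots_.count(interp_id) == 0) {
            seen_multiple_.store(true); // a second interpreter disables the fast path
        }
        slots_[interp_id] = storage;
        last_storage_ptr_.store(storage); // publish the pointer first
        initialized_.store(true);         // then set the validity flag
    }

    // Checks validity first; loads the cached pointer only on the fast path.
    T *get_stored(int interp_id) {
        if (is_last_storage_valid()) {
            return last_storage_ptr_.load(); // fast path
        }
        T *ptr = slots_.at(interp_id);  // slow path: per-interpreter lookup
        last_storage_ptr_.store(ptr);   // update the cache as before
        return ptr;
    }

private:
    // Assumed validity rule for this sketch: initialized, single interpreter.
    bool is_last_storage_valid() const {
        return initialized_.load() && !seen_multiple_.load();
    }

    std::unordered_map<int, T *> slots_;          // unsynchronized in this sketch
    std::atomic<T *> last_storage_ptr_{nullptr};  // was a plain T * before the fix
    std::atomic_bool initialized_{false};
    std::atomic_bool seen_multiple_{false};
};
```

With a single registered interpreter, get_stored() returns the cached pointer without touching the map. Once a second one is registered, every call goes through the lookup.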

The change is minimal and behavior-preserving on the non-free-threaded build.

Not addressed here

The separate embedded finalize / re-init staleness concern (a stale last_storage_ptr_ surviving interpreter finalize + re-init within the same process) is not fixed by this PR. It is tracked separately in the issue.

Verification

Compiled a scratch translation unit including <pybind11/gil_safe_call_once.h> and instantiating pybind11::gil_safe_call_once_and_store<int> with c++ -std=c++17 -fsyntax-only against Python 3.14 (the subinterpreter branch is active by default for Python >= 3.12). It compiles cleanly, and clang-format has been applied.

Part of #6084

@henryiii henryiii mentioned this pull request Jun 11, 2026
5 tasks
@henryiii henryiii marked this pull request as ready for review June 15, 2026 19:08
…tore

Under free-threaded CPython (Py_GIL_DISABLED) the GIL provides no mutual
exclusion, so the plain pointer last_storage_ptr_ was read and written
concurrently without synchronization, a C++ data race. Make it a
std::atomic<T *>.

Also reorder get_stored() to check is_last_storage_valid() before loading
the cached pointer. The writer publishes the pointer before setting the
validity flag, so the flag must be observed first for correct
acquire/release ordering.

Assisted-by: ClaudeCode:claude-fable-5
@henryiii henryiii force-pushed the fix/gil-safe-call-once-atomic-cache branch from 3f3c573 to faf4b5d Compare June 17, 2026 12:14

@rwgk rwgk left a comment

Collaborator


gpt-5.5:

I reviewed the change in gil_safe_call_once.h: making last_storage_ptr_ atomic and loading it only after the validity flag is observed matches the intended seq-cst publish/observe ordering. The change stays scoped to the subinterpreter-support branch and preserves the existing slow-path behavior.

The functional bug being fixed is a C++ data race that matters when Py_GIL_DISABLED is active, because the GIL no longer serializes reads/writes to last_storage_ptr_.

On normal GIL builds, the same code is compiled in the subinterpreter-support branch on Python 3.12+, but the existing GIL discipline already prevents the problematic concurrent unsynchronized access in the intended usage. So the atomic pointer and reordered load are mostly harmless correctness hardening there, not a user-visible behavioral fix.

One nuance: the reorder in get_stored() is conceptually tied to the atomic publish protocol. Once last_storage_ptr_ becomes atomic, checking is_initialized_by_at_least_one_interpreter_ before reading the pointer is the right way to make the memory ordering argument valid. So both changes belong together, but the reason they are needed is free-threaded Python.

@henryiii henryiii merged commit 085b660 into pybind:master Jun 24, 2026
86 checks passed
@henryiii henryiii deleted the fix/gil-safe-call-once-atomic-cache branch June 24, 2026 01:48
@github-actions github-actions Bot added the "needs changelog" (Possibly needs a changelog entry) label Jun 24, 2026