
DuckDBPyResult destructor releases GIL before Python C deallocator completes — single-threaded + concurrent both trigger #456

@S11al

Description

Summary

DuckDBPyResult::~DuckDBPyResult releases the GIL via py::gil_scoped_release BEFORE result.reset() / current_chunk.reset() finish cleaning up the pybind-owned Python references in the result graph. When those references' tp_free callbacks invoke PyObject_Free inside the GIL-released window, PyObject_Free accesses _PyRuntime.obmalloc state without a valid PyThreadState and crashes Python.

Empirically the bug fires in single-threaded code (not just under concurrent execute on shared parent connections) and on both DuckDB 1.4.4 LTS and 1.5.2.

Environment

  • OS: Windows Server 2022 (build 10.0.20348.4648), Windows 11 also affected
  • Python: 3.12.9 (cpython, official build, python.exe from python.org)
  • DuckDB Python binding: tested on duckdb==1.4.4 AND duckdb==1.5.2
  • Host: dedicated server, 64 GB RAM (not memory-bound), AMD64

Three empirically-observed Windows fault surfaces (same root)

All three share the same root cause (the destructor releases the GIL, then pybind's tp_free hits PyObject_Free without a thread state), but the corruption surfaces as three distinct Windows fault types depending on which heap structure it hits:

Path 1
  • Exception code: 0xC0000005 EXCEPTION_ACCESS_VIOLATION
  • Faulting site: python312.dll + 0x45886 (= PyObject_Free + 0x46)
  • Process outcome: terminated immediately
  • Diagnostic capture: WER LocalDumps (DumpType=1 mini or DumpType=2 full); SEH-visible

Path 2
  • Exception code: 0xC0000409 STATUS_STACK_BUFFER_OVERRUN
  • Faulting site: ucrtbase.dll + 0x7caee (__fastfail thunk)
  • Process outcome: terminated immediately
  • Diagnostic capture: bypasses normal SEH; WER captures sometimes; schtasks Last Result shows 0xC0000374 STATUS_HEAP_CORRUPTION

Path 3
  • Exception code: none raised
  • Faulting site: wedged inside con.execute("COMMIT") (DuckDB native code)
  • Process outcome: process stays ALIVE; writer thread spins at 100% CPU while holding Python-level locks, starving the other threads
  • Diagnostic capture: py-spy native + threads stack trace

Each path is reproducible. Path 3 is the silent-killer surface — the listener appears alive in schtasks /Query (Status: Running) but is functionally dead.
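The WER LocalDumps capture referenced for Paths 1 and 2 is not enabled by default; it is switched on via the registry (a standard, Microsoft-documented mechanism, nothing DuckDB-specific). A minimal sketch, with the dump folder and count chosen as illustrative values:

```
Windows Registry Editor Version 5.00

; DumpType: 1 = mini dump, 2 = full dump
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps]
"DumpFolder"="C:\\CrashDumps"
"DumpCount"=dword:0000000a
"DumpType"=dword:00000002
```

With this in place, Path 1 and (sometimes) Path 2 terminations leave a .dmp under the dump folder even when no debugger is attached.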

Minimal reproducer (Path 2)

This crashes DuckDB 1.5.2 after ~1m52s of single-threaded execution on Windows 10/11 + Python 3.12:

# Save as repro.py; run as:
#   python -X utf8 -u repro.py
import duckdb
import faulthandler
import random
import string

faulthandler.enable()

con = duckdb.connect("scratch.db", read_only=False)
con.execute("SET memory_limit='4GB'")
con.execute("SET threads=4")
con.execute("SET preserve_insertion_order=false")

# 40-column table with PK + 3 secondary indexes (matches a real workload)
COLS = [("channel", "VARCHAR"), ("message_id", "BIGINT"),
        ("ts", "TIMESTAMP"), ("token_address", "VARCHAR"),
        ("token_name", "VARCHAR"), ("chain", "VARCHAR"),
        ("parser_version", "VARCHAR"), ("raw_text_sha256", "VARCHAR")] + [
        (f"col_d_{i}", "DOUBLE") for i in range(12)] + [
        (f"col_b_{i}", "BOOLEAN") for i in range(3)] + [
        (f"col_v_{i}", "VARCHAR") for i in range(17)]

col_sql = ", ".join(f"{n} {t}" for n, t in COLS)
con.execute(f"CREATE TABLE t ({col_sql}, PRIMARY KEY (channel, message_id))")
con.execute("CREATE INDEX idx_token_ts ON t (chain, token_address, ts)")
con.execute("CREATE INDEX idx_channel_ts ON t (channel, ts)")
con.execute("CREATE INDEX idx_wallet ON t (col_v_3)")

col_names = ", ".join(n for n, _ in COLS)
placeholders = ", ".join(["?"] * len(COLS))

def gen_row(i):
    return [
        f"channel_{i % 48}", i,
        f"2026-05-13 14:00:{(i % 60):02d}",
        f"0x{i:040x}"[:42], f"token_{i % 1000}",
        random.choice(["ETH", "BSC", "SOL"]), "v1",
        "".join(random.choices(string.hexdigits, k=64)),
    ] + [random.random() for _ in range(12)] \
      + [random.choice([True, False]) for _ in range(3)] \
      + [f"v_{i % 1000}_{j}" for j in range(17)]

# 50K-row batch via Python tuples — triggers the bug
rows = [gen_row(i) for i in range(50_000)]
con.executemany(
    f"INSERT INTO t ({col_names}) VALUES ({placeholders}) "
    f"ON CONFLICT (channel, message_id) DO NOTHING",
    rows,
)
print("did not crash — surprise!")

Expected outcome:

  • ~1-3 minutes wall time, then Fatal Python error: PyEval_SaveThread: the function must be called with the GIL held, ... (the current Python thread state is NULL)
  • WER captures dump at ucrtbase.dll+0x7caee (Path 2)

Empirical bypass — Arrow ingestion

Replacing the executemany(SQL, py_tuples) pattern with PyArrow Table + con.register('name', table) + INSERT FROM bypasses the bug:

import pyarrow as pa

# Build Arrow table from the same data (no per-cell pybind py::* wrapping)
arrow_table = pa.Table.from_pydict({
    "channel": [...], "message_id": [...], "ts": [...],
    # ... all 40 columns as native numpy/arrow arrays ...
})
con.register("_incoming", arrow_table)
try:
    con.execute(
        "INSERT INTO t SELECT * FROM _incoming "
        "ON CONFLICT (channel, message_id) DO NOTHING"
    )
finally:
    con.unregister("_incoming")

Empirical comparison (identical hardware, identical DuckDB 1.5.2 build, identical workload shape):

  • executemany path: CRASHED at 1m52s wall (single-threaded)
  • Arrow register + INSERT FROM path: SURVIVED 30 min of MULTI-threaded stress (3 worker threads, 28,448 batches, 39 large merges, 585 COUNT(*) polls). Zero faults.

The Arrow path survived at least ~16x longer on the same workload. This strongly suggests the bug is specific to code paths that wrap Python primitives (py::str / py::int_ / py::float_) into the DuckDBPyResult graph; Arrow reads from native C++ ArrowArray structures and avoids those wrappers entirely.
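For workloads that already produce row tuples (like gen_row above), the per-batch conversion to the column-name → values dict that pa.Table.from_pydict expects is a plain transpose. A minimal sketch; the helper name is my own, not part of any DuckDB or PyArrow API:

```python
def rows_to_columns(rows, col_names):
    """Transpose a list of row sequences into a column-name -> list-of-values
    dict, the shape that pa.Table.from_pydict accepts."""
    columns = {name: [] for name in col_names}
    for row in rows:
        for name, value in zip(col_names, row):
            columns[name].append(value)
    return columns

# Example: two rows, three columns
cols = rows_to_columns(
    [("eth", 1, 0.5), ("bsc", 2, 0.7)],
    ["chain", "message_id", "score"],
)
# cols == {"chain": ["eth", "bsc"], "message_id": [1, 2], "score": [0.5, 0.7]}
```

The dict can then be handed to pa.Table.from_pydict and registered as shown above, keeping the hot write path free of per-cell Python-object wrapping.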

Suspected root cause location

Per source inspection of https://github.com/duckdb/duckdb-python (historically tools/pythonpkg/src/pyresult.cpp, or its equivalent in the current binding):

DuckDBPyResult::~DuckDBPyResult() {
    // ...
    py::gil_scoped_release release;   // <-- GIL released here
    result.reset();                   // <-- pybind-owned py::* members freed AFTER GIL release
    current_chunk.reset();            // <-- same problem
}

If result or current_chunk reference any object whose destructor calls into the Python C API (PyObject_Free, Py_DECREF, etc.), that call now executes without a valid PyThreadState → SIGSEGV / heap corruption / wedge.

Suggested fix

Move the GIL release AFTER pybind-owned Python references are dropped:

DuckDBPyResult::~DuckDBPyResult() {
    {
        // Drop all pybind-owned references BEFORE releasing GIL
        result.reset();
        current_chunk.reset();
    }
    // Now safe to release GIL for any remaining native-only cleanup
    py::gil_scoped_release release;
    // ... native-only destruction work ...
}

Alternatively, don't release the GIL in the destructor at all. The destructor is invoked from Python (via Py_DECREF when the refcount reaches 0), which already holds the GIL. The release was presumably an optimization to let other Python threads run during heavy native cleanup; the safer pattern is to release only inside specific known-safe scopes.

Workaround (interim, for production users)

Until an upstream fix lands: replace all con.executemany(SQL, py_tuples) and per-row con.execute(SQL, [params]) write paths with pa.Table + con.register + INSERT ... SELECT from the registered table. This empirically bypasses all three fault surfaces.
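A drop-in shape for that workaround might look like the following sketch. The helper and staging-view names are illustrative, not part of the DuckDB API; the connection and a pre-built Arrow table are passed in, so the helper itself needs no imports:

```python
def insert_via_register(con, target_table, arrow_table, conflict_cols,
                        staging_name="_incoming"):
    """Register an Arrow table as a view and bulk-insert from it,
    avoiding the per-row executemany path entirely."""
    con.register(staging_name, arrow_table)
    try:
        con.execute(
            f"INSERT INTO {target_table} SELECT * FROM {staging_name} "
            f"ON CONFLICT ({', '.join(conflict_cols)}) DO NOTHING"
        )
    finally:
        # Always drop the staging view, even if the INSERT raises.
        con.unregister(staging_name)
```

Usage mirrors the earlier snippet: insert_via_register(con, "t", arrow_table, ["channel", "message_id"]). Keeping register/unregister in one place makes it easy to swap this single call site back to executemany once the upstream fix ships.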

For SELECT result handling on LARGE result sets, prefer .arrow() / .fetchnumpy() over .fetchall(). Small result sets (<100 rows) empirically appear NOT to trigger the bug; presumably the destructor holds too few Python references for the corruption to land.

Diagnostic checklist for users encountering this

  1. Crash with Fatal Python error: PyEval_SaveThread: ... (the current Python thread state is NULL) AND fault offset in python312.dll near PyObject_Free? → Path 1
  2. Crash with no obvious Python traceback, exit code 0xC0000374 (heap corruption)? → Path 2 (look for ucrtbase.dll+0x7caee in WER dump if captured)
  3. Process alive at 100% CPU, no log output for minutes, py-spy stack shows wedge inside con.execute(...)? → Path 3
  4. Same workload survives if you replace executemany/execute(SQL, [params]) with Arrow register + INSERT FROM? → Confirms this bug class

Attachments / references

  • Mini dump from a Path 1 reproducer: available on request (large)
  • Mini dump from a Path 2 reproducer: available on request (~2 MB)
  • py-spy stack from a Path 3 reproducer: available on request

Happy to attach any of these to the issue if maintainers want them.

Severity

For applications doing per-row INSERT/UPDATE via the DuckDB Python binding on Windows + Python 3.12, this is catastrophic: the application chronically crashes or hangs in production. We observed an MTBF of 1-2 hours in a production listener. The Arrow workaround restores stability immediately.

The same root cause produces three distinct Windows fault surfaces, which makes diagnosis non-obvious: operators see crashes with different exit codes, plus hangs, and treat them as separate bugs.


(Reported by an external operator who debugged this in production on 2026-05-13. Cross-reference: linked detailed RCA in a follow-up comment if maintainers request.)
