
DuckDBPyResult destructor releases GIL before Python C deallocator completes — single-threaded + concurrent both trigger #456

@S11al

Description

Summary

DuckDBPyResult::~DuckDBPyResult releases the GIL via py::gil_scoped_release BEFORE result.reset() / current_chunk.reset() finish cleaning up the pybind-owned Python references in the result graph. When those references' tp_free callbacks invoke PyObject_Free inside the GIL-released window, PyObject_Free accesses _PyRuntime.obmalloc state without a valid PyThreadState and crashes Python.

Empirically the bug fires in single-threaded code (not just under concurrent execute on shared parent connections) and on both DuckDB 1.4.4 LTS and 1.5.2.

Environment

  • OS: Windows Server 2022 (build 10.0.20348.4648), Windows 11 also affected
  • Python: 3.12.9 (cpython, official build, python.exe from python.org)
  • DuckDB Python binding: tested on duckdb==1.4.4 AND duckdb==1.5.2
  • Host: dedicated server, 64 GB RAM (not memory-bound), AMD64

Three empirically-observed Windows fault surfaces (same root)

All three share the same root cause (the destructor releases the GIL, then pybind's tp_free hits PyObject_Free without a thread state), but the corruption surfaces as three distinct Windows fault types depending on which heap structure it hits:

Path 1
  • Exception code: 0xC0000005 EXCEPTION_ACCESS_VIOLATION
  • Faulting site: python312.dll + 0x45886 (= PyObject_Free + 0x46)
  • Process outcome: terminated immediately
  • Diagnostic capture: WER LocalDumps (DumpType=1 mini or DumpType=2 full); SEH-visible

Path 2
  • Exception code: 0xC0000409 STATUS_STACK_BUFFER_OVERRUN
  • Faulting site: ucrtbase.dll + 0x7caee (__fastfail thunk)
  • Process outcome: terminated immediately
  • Diagnostic capture: bypasses normal SEH; WER captures sometimes; schtasks Last Result shows 0xC0000374 STATUS_HEAP_CORRUPTION

Path 3
  • Exception code: none raised
  • Faulting site: wedged inside con.execute("COMMIT") (DuckDB native code)
  • Process outcome: process stays ALIVE; writer thread spins at 100% CPU while holding Python-level locks, starving the other threads
  • Diagnostic capture: py-spy native + threads stack trace

Each path is reproducible. Path 3 is the silent-killer surface — the listener appears alive in schtasks /Query (Status: Running) but is functionally dead.
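The WER LocalDumps capture referenced for Paths 1 and 2 is not enabled by default; it is switched on via the registry (a standard, Microsoft-documented mechanism, nothing DuckDB-specific). A minimal sketch, with the dump folder and count chosen as illustrative values:

```
Windows Registry Editor Version 5.00

; DumpType: 1 = mini dump, 2 = full dump
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps]
"DumpFolder"="C:\\CrashDumps"
"DumpCount"=dword:0000000a
"DumpType"=dword:00000002
```

With this in place, Path 1 and (sometimes) Path 2 terminations leave a .dmp under the dump folder even when no debugger is attached.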

Minimal reproducer (Path 2)

This crashes DuckDB 1.5.2 after ~1m52s of single-threaded execution on Windows 10/11 + Python 3.12:

# Save as repro.py; run as:
#   python -X utf8 -u repro.py
import duckdb
import faulthandler
import random
import string

faulthandler.enable()

con = duckdb.connect("scratch.db", read_only=False)
con.execute("SET memory_limit='4GB'")
con.execute("SET threads=4")
con.execute("SET preserve_insertion_order=false")

# 40-column table with PK + 3 secondary indexes (matches a real workload)
COLS = [("channel", "VARCHAR"), ("message_id", "BIGINT"),
        ("ts", "TIMESTAMP"), ("token_address", "VARCHAR"),
        ("token_name", "VARCHAR"), ("chain", "VARCHAR"),
        ("parser_version", "VARCHAR"), ("raw_text_sha256", "VARCHAR")] + [
        (f"col_d_{i}", "DOUBLE") for i in range(12)] + [
        (f"col_b_{i}", "BOOLEAN") for i in range(3)] + [
        (f"col_v_{i}", "VARCHAR") for i in range(17)]

col_sql = ", ".join(f"{n} {t}" for n, t in COLS)
con.execute(f"CREATE TABLE t ({col_sql}, PRIMARY KEY (channel, message_id))")
con.execute("CREATE INDEX idx_token_ts ON t (chain, token_address, ts)")
con.execute("CREATE INDEX idx_channel_ts ON t (channel, ts)")
con.execute("CREATE INDEX idx_wallet ON t (col_v_3)")

col_names = ", ".join(n for n, _ in COLS)
placeholders = ", ".join(["?"] * len(COLS))

def gen_row(i):
    return [
        f"channel_{i % 48}", i,
        f"2026-05-13 14:00:{(i % 60):02d}",
        f"0x{i:040x}"[:42], f"token_{i % 1000}",
        random.choice(["ETH", "BSC", "SOL"]), "v1",
        "".join(random.choices(string.hexdigits, k=64)),
    ] + [random.random() for _ in range(12)] \
      + [random.choice([True, False]) for _ in range(3)] \
      + [f"v_{i % 1000}_{j}" for j in range(17)]

# 50K-row batch via Python tuples — triggers the bug
rows = [gen_row(i) for i in range(50_000)]
con.executemany(
    f"INSERT INTO t ({col_names}) VALUES ({placeholders}) "
    f"ON CONFLICT (channel, message_id) DO NOTHING",
    rows,
)
print("did not crash — surprise!")

Expected outcome:

  • ~1-3 minutes wall time, then Fatal Python error: PyEval_SaveThread: the function must be called with the GIL held, ... (the current Python thread state is NULL)
  • WER captures dump at ucrtbase.dll+0x7caee (Path 2)

Empirical bypass — Arrow ingestion

Replacing the executemany(SQL, py_tuples) pattern with PyArrow Table + con.register('name', table) + INSERT FROM bypasses the bug:

import pyarrow as pa

# Build Arrow table from the same data (no per-cell pybind py::* wrapping)
arrow_table = pa.Table.from_pydict({
    "channel": [...], "message_id": [...], "ts": [...],
    # ... all 40 columns as native numpy/arrow arrays ...
})
con.register("_incoming", arrow_table)
try:
    con.execute(
        "INSERT INTO t SELECT * FROM _incoming "
        "ON CONFLICT (channel, message_id) DO NOTHING"
    )
finally:
    con.unregister("_incoming")

Empirical comparison (identical hardware, identical DuckDB 1.5.2 build, identical workload shape):

  • executemany path: CRASHED at 1m52s wall (single-threaded)
  • Arrow register + INSERT FROM path: SURVIVED 30 min of MULTI-threaded stress (3 worker threads, 28,448 batches, 39 large merges, 585 COUNT(*) polls). Zero faults.

The Arrow path survived at least ~16x longer on the same workload. This strongly suggests the bug is specific to code paths that wrap Python primitives (py::str / py::int_ / py::float_) into the DuckDBPyResult graph; Arrow reads from native C++ ArrowArray structures and avoids those wrappers entirely.
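For workloads that already produce row tuples (like gen_row above), the per-batch conversion to the column-name → values dict that pa.Table.from_pydict expects is a plain transpose. A minimal sketch; the helper name is my own, not part of any DuckDB or PyArrow API:

```python
def rows_to_columns(rows, col_names):
    """Transpose a list of row sequences into a column-name -> list-of-values
    dict, the shape that pa.Table.from_pydict accepts."""
    columns = {name: [] for name in col_names}
    for row in rows:
        for name, value in zip(col_names, row):
            columns[name].append(value)
    return columns

# Example: two rows, three columns
cols = rows_to_columns(
    [("eth", 1, 0.5), ("bsc", 2, 0.7)],
    ["chain", "message_id", "score"],
)
# cols == {"chain": ["eth", "bsc"], "message_id": [1, 2], "score": [0.5, 0.7]}
```

The dict can then be handed to pa.Table.from_pydict and registered as shown above, keeping the hot write path free of per-cell Python-object wrapping.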

Suspected root cause location

Per source inspection of https://github.com/duckdb/duckdb-python (historically tools/pythonpkg/src/pyresult.cpp, or its equivalent in the current binding):

DuckDBPyResult::~DuckDBPyResult() {
    // ...
    py::gil_scoped_release release;   // <-- GIL released here
    result.reset();                   // <-- pybind-owned py::* members freed AFTER GIL release
    current_chunk.reset();            // <-- same problem
}

If result or current_chunk reference any object whose destructor calls into the Python C API (PyObject_Free, Py_DECREF, etc.), that call now executes without a valid PyThreadState → SIGSEGV / heap corruption / wedge.

Suggested fix

Move the GIL release AFTER pybind-owned Python references are dropped:

DuckDBPyResult::~DuckDBPyResult() {
    {
        // Drop all pybind-owned references BEFORE releasing GIL
        result.reset();
        current_chunk.reset();
    }
    // Now safe to release GIL for any remaining native-only cleanup
    py::gil_scoped_release release;
    // ... native-only destruction work ...
}

Alternatively, don't release the GIL in the destructor at all. The destructor is invoked from Python (via Py_DECREF when the refcount reaches 0), which already holds the GIL. The release was presumably an optimization to let other Python threads run during heavy native cleanup; the safer pattern is to release only inside specific known-safe scopes.

Workaround (interim, for production users)

Until an upstream fix lands: replace all con.executemany(SQL, py_tuples) and per-row con.execute(SQL, [params]) write paths with pa.Table + con.register + INSERT ... SELECT from the registered table. This empirically bypasses all three fault surfaces.
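A drop-in shape for that workaround might look like the following sketch. The helper and staging-view names are illustrative, not part of the DuckDB API; the connection and a pre-built Arrow table are passed in, so the helper itself needs no imports:

```python
def insert_via_register(con, target_table, arrow_table, conflict_cols,
                        staging_name="_incoming"):
    """Register an Arrow table as a view and bulk-insert from it,
    avoiding the per-row executemany path entirely."""
    con.register(staging_name, arrow_table)
    try:
        con.execute(
            f"INSERT INTO {target_table} SELECT * FROM {staging_name} "
            f"ON CONFLICT ({', '.join(conflict_cols)}) DO NOTHING"
        )
    finally:
        # Always drop the staging view, even if the INSERT raises.
        con.unregister(staging_name)
```

Usage mirrors the earlier snippet: insert_via_register(con, "t", arrow_table, ["channel", "message_id"]). Keeping register/unregister in one place makes it easy to swap this single call site back to executemany once the upstream fix ships.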

For SELECT result handling on LARGE result sets, prefer .arrow() / .fetchnumpy() over .fetchall(). Small result sets (<100 rows) empirically appear NOT to trigger the bug; presumably the destructor holds too few Python references for the corruption to land.

Diagnostic checklist for users encountering this

  1. Crash with Fatal Python error: PyEval_SaveThread: ... (the current Python thread state is NULL) AND fault offset in python312.dll near PyObject_Free? → Path 1
  2. Crash with no obvious Python traceback, exit code 0xC0000374 (heap corruption)? → Path 2 (look for ucrtbase.dll+0x7caee in WER dump if captured)
  3. Process alive at 100% CPU, no log output for minutes, py-spy stack shows wedge inside con.execute(...)? → Path 3
  4. Same workload survives if you replace executemany/execute(SQL, [params]) with Arrow register + INSERT FROM? → Confirms this bug class

Attachments / references

  • Mini dump from a Path 1 reproducer: available on request (large)
  • Mini dump from a Path 2 reproducer: available on request (~2 MB)
  • py-spy stack from a Path 3 reproducer: available on request

Happy to attach any of these to the issue if maintainers want them.

Severity

For applications doing per-row INSERT/UPDATE via the DuckDB Python binding on Windows + Python 3.12, this is catastrophic: the application chronically crashes or hangs in production. We observed an MTBF of 1-2 hours in a production listener. The Arrow workaround restores stability immediately.

The same root cause produces three distinct Windows fault surfaces, which makes diagnosis non-obvious: operators see crashes with different exit codes, plus hangs, and treat them as separate bugs.


(Reported by an external operator who debugged this in production on 2026-05-13. Cross-reference: linked detailed RCA in a follow-up comment if maintainers request.)
