Skip to content

Conversation

@sigmaSd
Copy link
Contributor

@sigmaSd sigmaSd commented Oct 23, 2025

This PR is a +10 hour debugging with + 200 requests to claude, The motivation is I had a "stress" program that discovered many issues, and so this pr:

This commit resolves memory management issues with Python callbacks, including segfaults, premature callback destruction, and resource leaks.

Callbacks had multiple lifetime management issues:

  1. Callbacks were freed prematurely when JS PyObject wrappers were GC'd, even though Python still held references (causing segfaults)
  2. The same Python object could have multiple PyObject wrappers, each registering with refregistry and causing duplicate Py_DecRef calls
  3. Resource leaks occurred when callbacks were never properly cleaned up
  4. Auto-created callbacks (from JS functions) had no cleanup mechanism
  5. Handle python callbacks that are never passed to python

Implement PyCapsule-based callback cleanup:

  1. PyCapsule destructor: When creating a callback, we:

    • Create a PyCapsule containing the callback's handle
    • Use the capsule as the m_self parameter of PyCFunction_NewEx
    • Register a destructor that Python calls when freeing the PyCFunction
    • In the destructor, remove the callback from callbackRegistry and destroy it
  2. Separate lifetime tracking: Callbacks use callbackRegistry instead of refregistry. They are NOT registered with refregistry because:

    • The initial reference from PyCFunction_NewEx is sufficient
    • Cleanup happens via the capsule destructor, not FinalizationRegistry
    • This prevents artificial refcount inflation
  3. Handle-based deduplication: Track registered handles in a Set (not PyObject instances) to prevent duplicate registration when multiple PyObject wrappers exist for the same Python object

  4. Auto-created callback cleanup: callbackCleanupRegistry handles callbacks created from raw JS functions that are never passed to Python

  • src/python.ts: Implement PyCapsule-based callback cleanup
  • src/symbols.ts: Add PyCapsule_New and PyCapsule_GetPointer symbols
  • test/test_with_gc.ts: Add proper cleanup to tests

All existing tests pass with resource sanitization on

This commit resolves memory management issues with Python callbacks,
including segfaults, premature callback destruction, and resource leaks.

Callbacks had multiple lifetime management issues:
1. Callbacks were freed prematurely when JS PyObject wrappers were GC'd,
   even though Python still held references (causing segfaults)
2. The same Python object could have multiple PyObject wrappers, each
   registering with refregistry and causing duplicate Py_DecRef calls
3. Resource leaks occurred when callbacks were never properly cleaned up
4. Auto-created callbacks (from JS functions) had no cleanup mechanism

Implement PyCapsule-based callback cleanup:

1. **PyCapsule destructor**: When creating a callback, we:
   - Create a PyCapsule containing the callback's handle
   - Use the capsule as the m_self parameter of PyCFunction_NewEx
   - Register a destructor that Python calls when freeing the PyCFunction
   - In the destructor, remove the callback from callbackRegistry and destroy it

2. **Separate lifetime tracking**: Callbacks use callbackRegistry instead of
   refregistry. They are NOT registered with refregistry because:
   - The initial reference from PyCFunction_NewEx is sufficient
   - Cleanup happens via the capsule destructor, not FinalizationRegistry
   - This prevents artificial refcount inflation

3. **Handle-based deduplication**: Track registered handles in a Set (not
   PyObject instances) to prevent duplicate registration when multiple
   PyObject wrappers exist for the same Python object

4. **Auto-created callback cleanup**: callbackCleanupRegistry handles
   callbacks created from raw JS functions that are never passed to Python

- src/python.ts: Implement PyCapsule-based callback cleanup
- src/symbols.ts: Add PyCapsule_New and PyCapsule_GetPointer symbols
- test/test_with_gc.ts: Add proper cleanup to tests

All existing tests pass with resource sanitization on
…tration

Callbacks are now only added to callbackRegistry when .owned is called
(i.e., when actually passed to Python), not during creation. This allows
callbackCleanupRegistry to properly clean up callbacks that are created
but never passed to Python.

Changes:
- Store Callback reference in PyObject.#callback during creation
- Register with callbackRegistry in .owned getter when callback is passed to Python
- Add test case for auto-created callbacks that are never passed to Python
@sigmaSd
Copy link
Contributor Author

sigmaSd commented Oct 23, 2025

This is a mini version of the stress test: ( an important part is spamming GC)

import {
  kw,
  NamedArgument,
  type PyObject,
  python,
} from "jsr:@denosaurs/python@0.4.6";

import {
  type Adw1_ as Adw_,
  DenoGLibEventLoop,
  type Gtk4_ as Gtk_,
} from "jsr:@sigma/gtk-py@0.7.0";

const gi = python.import("gi");
gi.require_version("Gtk", "4.0");
gi.require_version("Adw", "1");
const Gtk: Gtk_.Gtk = python.import("gi.repository.Gtk");
const Adw: Adw_.Adw = python.import("gi.repository.Adw");
const GLib = python.import("gi.repository.GLib");
const gcp = python.import("gc");
const el = new DenoGLibEventLoop(GLib); // this is important so setInterval works (by unblockig deno async event loop)

const gcInterval = setInterval(() => {
  gcp.collect();
  gc();
}, 100);

class MainWindow extends Gtk.ApplicationWindow {
  #state = false;
  #f?: PyObject;
  constructor(kwArg: NamedArgument) {
    // deno-lint-ignore no-explicit-any
    super(kwArg as any);
    this.set_default_size(300, 150);
    this.set_title("Awaker");
    this.connect("close-request", () => {
      el.stop();
      clearInterval(gcInterval);
      return false;
    });

    const button = Gtk.ToggleButton(
      new NamedArgument("label", "OFF"),
    );
    const f = python.callback(this.onClick);
    button.connect("clicked", f);
    const vbox = Gtk.Box(
      new NamedArgument("orientation", Gtk.Orientation.VERTICAL),
    );
    vbox.append(button);
    this.set_child(vbox);
  }

  // deno-lint-ignore no-explicit-any
  onClick = (_: any, button: Gtk_.ToggleButton) => {
    this.#state = !this.#state;
    (this.#state) ? button.set_label("ON") : button.set_label("OFF");
  };
}

class App extends Adw.Application {
  #win: MainWindow | undefined;
  constructor(kwArg: NamedArgument) {
    super(kwArg);
    this.connect("activate", this.onActivate);
  }
  onActivate = python.callback((_kwarg, app: Gtk_.Application) => {
    new MainWindow(new NamedArgument("application", app)).present();
  });
}

const app = new App(kw`application_id=${"com.example.com"}`);
app.register();
app.activate();
el.start();

@sigmaSd
Copy link
Contributor Author

sigmaSd commented Oct 23, 2025

The above example is what motivated this:
If you run the above code with deno run -A --v8-flags=--expose-gc it will segfault, because this pyobject const f = python.callback(this.onClick); is Gced by JS because it have no way of know that python still needs it,

I couldn't make this test more minimal or make it a standalone test with no gtk involved, but the point being in a relativey complex project, JS might not see that some callbacks are still needed by python

So to fix this I see 2 solutions:

  • either never free callbacks (by saving them to a global object)
  • or what this PR does, use python capsule to detect when python doesn't need the object anymore and synchronize the 2 Gcs

The pr seems to work well so far

Copy link
Member

@eliassjogreen eliassjogreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow thanks! I looked through the code and can't see anything that's obviously wrong, but it's as you say not the simplest issue nor piece of code... Well documented and easy to follow too! <3

@sigmaSd
Copy link
Contributor Author

sigmaSd commented Oct 25, 2025

Thanks I added a small cleanup, turns out bun tests runs with each commit that's pretty cool, I think its ready to merge

@sigmaSd sigmaSd marked this pull request as draft November 1, 2025 21:05
@sigmaSd
Copy link
Contributor Author

sigmaSd commented Nov 1, 2025

ok unfortunately I tested more complex stuff and I get a segfault when misusing some python libraries both in the latest version of this library and in this PR, here is a breakdown in case someone-else want to tackle this in the future

to reproduce the segfault:

image

it segfaults when calling eq after a while (reproducible 99% of the time)

by commenting out these 2 lines

py.Py_DecRef(args);
it doesn't segfault anymore, but it leaks callbacks, which defeat the whole point of this PR

so what happened , the gist seems to be that z3 internal bindings to c++ and python make this complexity 2 folds, anyhow here is my discussion with claude https://claude.ai/share/09c47928-4fcb-497a-9a48-44c18f4d51d3

this is the last summery of it

  Summary of what we know for certain:
  
  The crash: Segfault at cell 15 when calling cell.__eq__(intVal) during blocking constraints loop
  The fix: Creating fresh Z3 variables with z3.Int(cellNames[i]) instead of reusing cells[i] prevents the crash
  The root cause clue: When first cell is made unconstrained, we get AttributeError: 'ArithRef' object has no attribute 'ctx' instead of segfault - proving Z3 objects lose their context after solver.check()
  The Py_DecRef observation: Commenting out py.Py_DecRef(args) and py.Py_DecRef(kwargs) in the call() method prevents the segfault
  What this means: The decrefs are freeing Python objects that Z3's internal C++ code still holds references to, creating dangling pointers that cause segfaults when accessed later
  
  Most likely explanation: Z3 variables become invalid after solver.check() modifies internal state. Reusing them + the argument cleanup causes Z3 to access freed memory when it tries to perform operations on stale AST nodes.

So I'm going to open a pr that makes auto-callbacks just leak always , so we can just document that its a convenience function and if the user want to control memory they should use python.callback

@sigmaSd sigmaSd mentioned this pull request Nov 2, 2025
@sigmaSd
Copy link
Contributor Author

sigmaSd commented Nov 2, 2025

closed in favor of #89

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants