_hal: pyhalitem holds stale halunion pointer after hal_exit (use-after-free crash on Ubuntu 24.04) #4063

@grandixximo

Summary

_hal.component returns pyhalitem Python wrappers (from newpin, newparam, getpin). Each wrapper's internal halitem.u pointer is copied by value at creation time and points into HAL shared memory. When the owning component exits (comp.exit() / hal_exit()), HAL detaches its shared memory via rtapi_shmem_delete, but the pyhalitem objects stay alive in Python with u pointers that now reference unmapped pages. Any subsequent read or write on those pins through _hal dereferences the freed memory: a SIGSEGV on glibc 2.39 (Ubuntu 24.04), a silent garbage read on older runtimes.

It is a use-after-free in the caller ("you closed the door and walked through it") but it is invisible from Python: no exception, no warning, just a SIGSEGV.

Where

  • src/hal/halmodule.cc pyhal_pin_new: pypin->pin = *pin; snapshots halitem by value. No back-ref to the owning halobject.
  • src/hal/halmodule.cc pyhal_read_common / pyhal_write_common: dereference item->u unconditionally.
  • src/hal/halmodule.cc pyhal_exit_impl: tears down the component without invalidating outstanding pyhalitem wrappers.
  • src/hal/hal_lib.c:354: on last hal_exit, rtapi_shmem_delete actually detaches the pages, which is when the bug turns into a hard SIGSEGV.
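
The missing back-reference described above can be illustrated with a toy model in pure Python. This is only a sketch of the general idea (a handle that checks its owner before touching storage), not the actual halmodule.cc code or a proposed patch. Component, PinHandle, and ComponentExitedError are hypothetical names, and a dict stands in for the HAL shared memory segment.

```python
class ComponentExitedError(RuntimeError):
    """Raised when a pin handle is used after its component has exited."""


class Component:
    """Toy stand-in for a HAL component; a dict models the shared memory."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self._shmem = {}  # pin name -> value

    def newpin(self, pin_name, initial=0):
        if not self.alive:
            raise ComponentExitedError(f"component {self.name!r} has exited")
        self._shmem[pin_name] = initial
        # The handle keeps a back-reference to its owner instead of a raw
        # snapshot of where the value lives.
        return PinHandle(self, pin_name)

    def exit(self):
        # Models hal_exit: the backing storage goes away, and every
        # outstanding handle becomes invalid at the same moment.
        self.alive = False
        self._shmem = None


class PinHandle:
    """Toy pin wrapper that validates its owner on every access."""

    def __init__(self, owner, pin_name):
        self._owner = owner
        self._name = pin_name

    def _storage(self):
        if not self._owner.alive:
            raise ComponentExitedError(
                f"pin {self._name!r}: component {self._owner.name!r} has exited"
            )
        return self._owner._shmem

    def get(self):
        return self._storage()[self._name]

    def set(self, value):
        self._storage()[self._name] = value
```

With a guard of this shape, a read after exit surfaces as a Python exception that the caller can see and handle, rather than a dereference of an unmapped page.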

Reproducer

Branch debug/4054-segfault on grandixximo/linuxcnc carries an instrumented halmodule.cc that logs u, ownership, and a mincore() mapping check on every pyhal_read_common. Running the qtdragon ui-smoke test from #4054 in an ubuntu:24.04 Docker container yields:

[268780.891937 pid=12426] PYHAL_EXIT_CALLED halobject=0x... hal_id=84 name=qtdragon
  Python stack:
    File "/work/bin/qtvcp", line 527, in shutdown
      HAL.exit()
[268780.892578] hal_exit name=qtdragon hal_id=84 map_size=59
[268780.892604] hal_exit marked 59 u's as dead for hal_id=84
[268780.900233] READ u=0x7f4537a97fb0 known=1 alive=0 u_mapped=0
                                                      ^^^^^^^^^^^^ shmem gone
[268780.900240] READ UNSAFE - aborting deref

u_mapped=0 is mincore() reporting the page no longer mapped. Without the safety guard the deref is a SIGSEGV. Happy to share the instrumented diff and the docker repro script.
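
For readers unfamiliar with the mincore() trick, here is a rough sketch of such a mapping probe, written in Python with ctypes rather than taken from the instrumented branch. It assumes Linux, where mincore() fails with ENOMEM when the range contains unmapped pages. is_mapped and probe_demo are hypothetical helper names. A probe like this is a diagnostic aid only, since the address could be remapped by something else between the check and the access.

```python
import ctypes
import errno
import mmap
import os

# Assumption: Linux, with a libc that exports mmap, munmap and mincore.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.mincore.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                          ctypes.POINTER(ctypes.c_ubyte)]
_libc.mincore.restype = ctypes.c_int
_libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                       ctypes.c_int, ctypes.c_int, ctypes.c_long]
_libc.mmap.restype = ctypes.c_void_p
_libc.munmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
_libc.munmap.restype = ctypes.c_int

PAGE = mmap.PAGESIZE


def is_mapped(addr, length=1):
    """Return True if every page covering [addr, addr + length) is mapped."""
    start = addr & ~(PAGE - 1)           # mincore needs a page-aligned start
    span = addr + length - start
    vec = (ctypes.c_ubyte * ((span + PAGE - 1) // PAGE))()
    if _libc.mincore(start, span, vec) == 0:
        return True                      # mapped (resident or not)
    err = ctypes.get_errno()
    if err == errno.ENOMEM:
        return False                     # at least one page is unmapped
    raise OSError(err, os.strerror(err))


def probe_demo():
    """Map three pages, unmap the middle one, and probe it before and after."""
    base = _libc.mmap(None, 3 * PAGE,
                      mmap.PROT_READ | mmap.PROT_WRITE,
                      mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS, -1, 0)
    if base is None or base == ctypes.c_void_p(-1).value:
        raise OSError(ctypes.get_errno(), "mmap failed")
    middle = base + PAGE
    before = is_mapped(middle, 8)
    _libc.munmap(middle, PAGE)           # leaves a one-page hole
    after = is_mapped(middle, 8)
    _libc.munmap(base, PAGE)             # clean up the two outer pages
    _libc.munmap(middle + PAGE, PAGE)
    return before, after
```

The demo unmaps the middle of three pages so that the hole is unlikely to be reused by an unrelated allocation before the second probe.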

Real-world callers that hit this

  • qtvcp's QPin.REGISTRY: its update_all QTimer ticks once more after HAL.exit() and reads dead pins. Worked around for qtdragon in #4062 ("qtvcp: stop QPin update timer before hal_exit to prevent shutdown SIGSEGV"), which stops the timer first. Other qtvcp screens (gmoccapy, axis, touchy) follow similar lifecycle patterns and may carry the same latent bug.
  • Any other code that keeps a long-lived Python list of pin handles across process shutdown or a component lifecycle change.
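
The caller-side workaround amounts to an ordering rule: stop whatever polls the pins, then exit the component. The sketch below models that rule without Qt or HAL. PinPoller and shutdown are hypothetical names, and the poller is a plain object standing in for a timer-driven updater, not the actual qtvcp code.

```python
class PinPoller:
    """Toy stand-in for a periodic updater that reads registered pins."""

    def __init__(self):
        self._readers = []
        self._running = False

    def register(self, reader):
        # reader is any zero-argument callable that reads one pin.
        self._readers.append(reader)

    def start(self):
        self._running = True

    def stop(self):
        self._running = False

    def tick(self):
        # A stopped poller must not touch any pin.
        if not self._running:
            return []
        return [reader() for reader in self._readers]


def shutdown(poller, exit_component):
    """Stop polling first, then tear the component down."""
    poller.stop()
    exit_component()
```

Any tick that arrives after shutdown() returns without reading, so no handle is touched once the component is gone.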
