dialog: fix use-after-free and race in cluster replication #3860

Open
NormB wants to merge 1 commit into OpenSIPS:master from NormB:fix/dialog-cluster-sipi-crash

NormB commented Mar 31, 2026

Summary

Fix three bugs triggered when SIP-I messages with binary ISUP data are replicated across a dialog cluster with reinvite pinging enabled.

  • use-after-free in dlg_replicated_create: after _link_dlg_unsafe() links the dialog into the hash table, DLG_BIN_POP failures jumped to pre_linking_error, which calls destroy_dlg() without unlinking and so leaves a dangling pointer in the hash chain
  • TOCTOU race in write_dialog_vars: the read lock was released between the sizing pass and the write pass, allowing a concurrent store_dlg_value() to corrupt the buffer
  • OOB read in strip_esc: *(c+1) is read past the end of the string when the last byte is a backslash
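
To illustrate the first bullet, here is a minimal sketch of the two-label error path on a toy singly linked hash chain. All `toy_*` names and the `fail_before` / `fail_after` switches are hypothetical stand-ins, not the real OpenSIPS types or the actual patch. Only the label names `pre_linking_error` and `post_linking_error` come from this PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins (hypothetical): not the real OpenSIPS dialog structures. */
struct toy_dlg {
    struct toy_dlg *next;
    int id;
};

struct toy_entry {
    struct toy_dlg *first;   /* head of one hash chain */
};

/* Push the dialog onto the front of the chain. */
static void toy_link(struct toy_entry *e, struct toy_dlg *d)
{
    d->next = e->first;
    e->first = d;
}

/* Remove the dialog from the chain if it is present. */
static void toy_unlink(struct toy_entry *e, struct toy_dlg *d)
{
    struct toy_dlg **pp;

    for (pp = &e->first; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == d) {
            *pp = d->next;
            d->next = NULL;
            return;
        }
    }
}

/* Returns 0 on success, -1 on failure.
 * fail_before / fail_after simulate a decode failure before or after
 * the dialog has been linked into the chain. */
static int toy_replicated_create(struct toy_entry *e, int id,
                                 int fail_before, int fail_after)
{
    struct toy_dlg *d;

    assert(e != NULL);

    d = calloc(1, sizeof *d);
    if (d == NULL)
        return -1;
    d->id = id;

    if (fail_before)
        goto pre_linking_error;

    toy_link(e, d);

    if (fail_after)
        goto post_linking_error;

    return 0;

post_linking_error:
    /* The dialog is reachable through the chain: unlink it first. */
    toy_unlink(e, d);
    /* fall through */
pre_linking_error:
    /* Nothing else points at the dialog any more, so freeing is safe. */
    free(d);
    return -1;
}
```

Jumping to `pre_linking_error` after `toy_link()` would free a node that is still reachable from `e->first`, which is the dangling pointer described above. In this sketch, the separate label ensures the chain never holds a freed node.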

Reproduction

2-node cluster, 16 workers, reinvite_ping_interval=5, 300 CPS with multipart SDP+ISUP bodies and concurrent re-INVITEs. Unpatched: SIGSEGV in free_dlg_dlg() → shm_free(0xabcdefedabcdefed) (freed-memory poison). Patched: same load, zero crashes.

Closes #3858

Fix three bugs triggered when SIP-I messages with binary ISUP data
are replicated across a dialog cluster with reinvite pinging enabled.

1. dlg_replicated_create: after _link_dlg_unsafe() links the dialog
   into the hash table, subsequent DLG_BIN_POP failures jumped to
   pre_linking_error which calls destroy_dlg() without unlinking.
   This leaves a dangling pointer in the hash chain — other workers
   dereference freed memory (GPF). Add post_linking_error label that
   calls unlink_unsafe_dlg() before destroy.

2. write_dialog_vars: the read lock on vals_lock was released between
   the sizing pass and the write pass. A concurrent store_dlg_value()
   (e.g. from persist_reinvite_pinging storing multipart SDP+ISUP
   bodies) can modify the vals list in between, causing a buffer
   overflow and corrupted serialization. Hold the read lock through
   both passes.

3. strip_esc: when len==1 and *c is backslash, *(c+1) reads one byte
   past the string. Add len>1 guard.
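
The guard from item 3 can be sketched on a length-delimited buffer as follows. This is an illustration only: `toy_strip_esc` is a hypothetical function, and its unescape rule (drop a backslash when another byte follows, keep a lone trailing backslash) is an assumption, not the behavior of the real strip_esc.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-place unescape over a buffer that is NOT NUL-terminated.
 * Returns the new length of the buffer contents. */
static size_t toy_strip_esc(char *buf, size_t len)
{
    char *c = buf;   /* read cursor */
    char *w = buf;   /* write cursor, never ahead of the read cursor */

    assert(buf != NULL || len == 0);

    while (len > 0) {
        if (*c == '\\' && len > 1) {
            /* At least one more byte remains, so c[1] is in bounds. */
            *w++ = c[1];
            c += 2;
            len -= 2;
        } else {
            /* Ordinary byte, or a lone trailing backslash: copy it through. */
            *w++ = *c++;
            len--;
        }
    }
    return (size_t)(w - buf);
}
```

Without the `len > 1` check, the final iteration on a buffer ending in a backslash would read `c[1]`, one byte past the end.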

Closes OpenSIPS#3858

NormB requested a review from liviuchircu on March 31, 2026 17:39
NormB marked this pull request as ready for review on April 1, 2026 11:33
Linked issue: [CRASH] Clusterer + SIP-I + Dialog (sharing)

1 participant