10.11 mdev 38843 new#5310
Issue: When an applier fails to apply a write set and its rollback also fails, log_dummy_write_set() was skipped, leaving commit order stuck and locking the cluster (fixed in wsrep-lib). Solution: Add a 3-node test that injects an applier rollback failure via simulate_rollback_failure_in_applier and verifies the node loses the inconsistency vote and disconnects instead of hanging.
Issue: After losing an inconsistency vote the node leaves the primary component and an applier can be torn down with an active transaction. client_state::close() then calls transaction::after_statement(), which asserts m_local, aborting the server in debug builds. Solution: Bump wsrep-lib to route high priority transactions through after_applying() in close() instead of after_statement().
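The lockup described in the first issue can be pictured with a toy model of strict commit ordering. This is only a sketch under stated assumptions: `CommitOrder`, `can_enter`, `leave` and `successor_can_proceed` are hypothetical names for illustration, not wsrep-lib or Galera types. The only thing taken from the description is that skipping log_dummy_write_set() leaves commit order stuck.

```cpp
// Toy model only: every name here is hypothetical, not a wsrep-lib type.
struct CommitOrder {
    long long last_left = 0;  // highest seqno that has left commit order

    // A write set may enter only once its direct predecessor has left.
    bool can_enter(long long seqno) const { return seqno == last_left + 1; }

    // Leaving releases the slot for the next seqno; out-of-order calls are ignored.
    void leave(long long seqno) {
        if (can_enter(seqno)) last_left = seqno;
    }
};

// Returns true if seqno 3 can proceed after seqno 2 fails both apply and rollback.
bool successor_can_proceed(bool log_dummy_on_failure) {
    CommitOrder order;
    order.leave(1);  // seqno 1 applied and committed normally

    // seqno 2: apply failed and the rollback failed as well.
    if (log_dummy_on_failure) {
        order.leave(2);  // stands in for the effect of log_dummy_write_set()
    }
    return order.can_enter(3);
}
```

In this model `successor_can_proceed(false)` is false, so every later write set waits behind seqno 2 indefinitely, while `successor_can_proceed(true)` is true because the failed seqno still gives up its slot.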
Code Review
This pull request addresses MDEV-38843, where a brute-force applier failure on a node could cause a complete cluster lockup. It introduces a new test case simulating this scenario, updates the wsrep-lib submodule, and adds a DBUG injection to simulate rollback failures in the applier. Feedback on the changes suggests explicitly casting the volatile m_thd->killed enum to int when passing it to the variadic WSREP_WARN function to prevent potential compiler warnings or undefined behavior.
if (ret)
  WSREP_WARN("Wsrep_high_priority_service::rollback: trans_rollback "
             "returned %d for thd %lu (killed=%d, seqno=%lld)",
             ret, thd_get_thread_id(m_thd), m_thd->killed,
             (long long) wsrep_thd_trx_seqno(m_thd));
The member m_thd->killed is a volatile enum of type THD::killed_state. Passing it directly to a variadic function like WSREP_WARN (which uses printf formatting) relies on the enum's promoted type matching the %d specifier. Some compilers flag this under -Wformat, and the call is only well-defined if the promoted type is int. Casting it explicitly with static_cast<int> makes the argument type match the format specifier, which keeps the call portable and silences the warning. Additionally, ensure %lu is used for thd_get_thread_id(), as it returns unsigned long.
if (ret)
  WSREP_WARN("Wsrep_high_priority_service::rollback: trans_rollback "
             "returned %d for thd %lu (killed=%d, seqno=%lld)",
             ret, thd_get_thread_id(m_thd), static_cast<int>(m_thd->killed),
             (long long) wsrep_thd_trx_seqno(m_thd));
References
- Use the %lu format specifier for thd_get_thread_id() as it returns unsigned long.
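The cast suggested in this comment can be tried outside the server with a minimal standalone sketch. Everything here is an assumption for illustration: `example_killed_state`, `FakeThd` and `format_rollback_warning` are hypothetical stand-ins, the enum values are made up, and `std::snprintf` replaces WSREP_WARN so the formatted result can be inspected.

```cpp
#include <cstdio>
#include <string>

// Hypothetical stand-in for THD::killed_state; the values are illustrative only.
enum example_killed_state { EXAMPLE_NOT_KILLED = 0, EXAMPLE_KILLED = 6 };

// Hypothetical stand-in for the parts of THD that the warning reads.
struct FakeThd {
    volatile example_killed_state killed = EXAMPLE_NOT_KILLED;
    unsigned long thread_id = 0;
};

// Formats the warning text, casting the volatile enum to int so the
// argument type matches %d exactly; %lu matches the unsigned long id.
std::string format_rollback_warning(const FakeThd& thd, int ret, long long seqno) {
    char buf[160];
    std::snprintf(buf, sizeof buf,
                  "trans_rollback returned %d for thd %lu (killed=%d, seqno=%lld)",
                  ret, thd.thread_id, static_cast<int>(thd.killed), seqno);
    return std::string(buf);
}
```

The cast reads the volatile member once and passes a plain int, so each specifier in the format string has an argument of exactly the type it names.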