10.11 mdev 38843 new#5310

Open
hemantdangi-gc wants to merge 2 commits into MariaDB:10.11 from mariadb-corporation:10.11_MDEV-38843_new

Conversation

@hemantdangi-gc

Contributor

No description provided.

Issue: When an applier fails to apply a write set and its rollback also
fails, log_dummy_write_set() was skipped, leaving commit order stuck and
locking the cluster (fixed in wsrep-lib).

Solution: Add a 3-node test that injects an applier rollback failure via
simulate_rollback_failure_in_applier and verifies the node loses the
inconsistency vote and disconnects instead of hanging.

Issue: After losing an inconsistency vote the node leaves the primary
component and an applier can be torn down with an active transaction.
client_state::close() then calls transaction::after_statement(), which
asserts m_local, aborting the server in debug builds.

Solution: Bump wsrep-lib to route high priority transactions through
after_applying() in close() instead of after_statement().
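
The routing described in the second commit message can be pictured with a toy model. This is a minimal sketch, not wsrep-lib code: `ClientStateSketch`, `Mode`, and `last_hook` are hypothetical names invented for illustration, and only the hook names `close()`, `after_statement()`, and `after_applying()` come from the commit message. The assertion inside `after_statement()` stands in for the `m_local` assertion mentioned above.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the client mode; not the wsrep-lib type.
enum class Mode { local, high_priority };

// Toy model of a client state that is closed while a transaction is active.
struct ClientStateSketch {
  Mode mode;
  bool transaction_active;
  std::string last_hook;  // records which cleanup hook ran, for inspection

  // Models a hook that is only valid for local transactions.
  void after_statement() {
    assert(mode == Mode::local);
    last_hook = "after_statement";
  }

  // Models the cleanup hook intended for applier (high priority) work.
  void after_applying() { last_hook = "after_applying"; }

  // Picks the cleanup hook by mode so an applier never reaches the
  // local-only assertion in after_statement().
  void close() {
    if (!transaction_active) {
      last_hook = "none";
      return;
    }
    if (mode == Mode::high_priority)
      after_applying();
    else
      after_statement();
    transaction_active = false;
  }
};
```

In this model, closing a high priority client with an active transaction ends in `after_applying()`, while a local client still goes through `after_statement()`. The real library's state handling is more involved than this sketch suggests.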

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request addresses MDEV-38843, where a brute-force applier failure on a node could cause a complete cluster lockup. It introduces a new test case simulating this scenario, updates the wsrep-lib submodule, and adds a DBUG injection to simulate rollback failures in the applier. Feedback on the changes suggests explicitly casting the volatile m_thd->killed enum to int when passing it to the variadic WSREP_WARN function to prevent potential compiler warnings or undefined behavior.

Comment on lines +387 to +391
  if (ret)
    WSREP_WARN("Wsrep_high_priority_service::rollback: trans_rollback "
               "returned %d for thd %lu (killed=%d, seqno=%lld)",
               ret, thd_get_thread_id(m_thd), m_thd->killed,
               (long long) wsrep_thd_trx_seqno(m_thd));

Contributor

medium

The member m_thd->killed is a volatile enum of type THD::killed_state. Passing a volatile enum directly to a variadic function like WSREP_WARN (which uses printf formatting) can trigger compiler warnings (such as -Wformat or warnings regarding passing volatile-qualified objects to variadic functions) or lead to undefined behavior on some platforms. Casting it explicitly to int using static_cast<int> ensures type safety, portability, and prevents potential compiler warnings. Additionally, ensure %lu is used for thd_get_thread_id() as it returns unsigned long.

  if (ret)
    WSREP_WARN("Wsrep_high_priority_service::rollback: trans_rollback "
               "returned %d for thd %lu (killed=%d, seqno=%lld)",
               ret, thd_get_thread_id(m_thd), static_cast<int>(m_thd->killed),
               (long long) wsrep_thd_trx_seqno(m_thd));
References
  1. Use the %lu format specifier for thd_get_thread_id() as it returns unsigned long.
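
The suggested cast can be tried outside the server with a small standalone sketch. Everything here is a stand-in: `killed_state_sketch`, its values, `ThdSketch`, and `format_rollback_warning` are hypothetical names, and `std::snprintf` replaces `WSREP_WARN` on the assumption that both take printf-style format strings. The point is only to show each argument matched to its format specifier, with the volatile enum converted to `int` explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical stand-in for the server's kill-state enum; the real type and
// its values are defined in the MariaDB sources and are not reproduced here.
enum killed_state_sketch { NOT_KILLED_SKETCH = 0, KILL_QUERY_SKETCH = 6 };

// Hypothetical stand-in for the fields of THD that the warning reads.
struct ThdSketch {
  volatile killed_state_sketch killed;
  unsigned long thread_id;
};

// Builds the warning text with every argument matching its specifier:
// %d <- int, %lu <- unsigned long, %d <- int (explicit cast), %lld <- long long.
std::string format_rollback_warning(int ret, const ThdSketch &thd,
                                    long long seqno) {
  char buf[256];
  int n = std::snprintf(
      buf, sizeof buf,
      "trans_rollback returned %d for thd %lu (killed=%d, seqno=%lld)",
      ret, thd.thread_id, static_cast<int>(thd.killed), seqno);
  // snprintf returns the length it wanted to write; check it fit the buffer.
  assert(n > 0 && static_cast<std::size_t>(n) < sizeof buf);
  return std::string(buf);
}
```

For example, `format_rollback_warning(1, ThdSketch{KILL_QUERY_SKETCH, 42UL}, 7LL)` yields `trans_rollback returned 1 for thd 42 (killed=6, seqno=7)`.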
