MDEV-39600 Reduce contention in buf_flush_ahead()#5078
Conversation
When calling buf_flush_ahead() with furious=false, the function updates the async target buf_flush_async_lsn and, if necessary notifies the page cleaner thread to start flushing more eagerly, to avoid a long stall later when checkpoint will be required. When many threads call buf_flush_ahead() with furious=false, if the update of buf_flush_async_lsn is gated inside buf_pool.flush_list_mutex, this can cause significant contention, just to deliver the new buf_flush_async_lsn value and a potential notification to the page cleaner thread. This commit enables, for the non furious case, lock-free update of buf_flush_async_lsn and lock-free check of the page cleaner idle flag, to avoid the contention on buf_pool.flush_list_mutex. The lock is acquired by the CAS-loop winner thread and only after the page cleaner is observed idle. Since the the reads are not protected by the mutex, there is a chance that the decision to not signal the page cleaner thread is wrong. In the worst case though, under sustained redo-log pressure, eventually a wakeup signal will be delivered to the page cleaner thread anyway, so the risk of a long stall is limited.
|
|
There was a problem hiding this comment.
Pull request overview
This PR reduces contention in buf_flush_ahead() by making the non-furious async flush target update lock-free while preserving the mutex-protected furious checkpoint path.
Changes:
- Splits page cleaner idle state into a separate relaxed atomic flag.
- Changes non-furious
buf_flush_ahead()to updatebuf_flush_async_lsnvia CAS. - Adds
compare_exchange_weak()toAtomic_relaxed.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
storage/innobase/include/buf0buf.h |
Separates page cleaner idle state from page_cleaner_status. |
storage/innobase/buf/buf0flu.cc |
Implements lock-free async LSN bumping and updates page-cleaner wakeup handling. |
include/my_atomic_wrapper.h |
Adds weak compare-exchange support to the atomic wrapper. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| /* Non-furious: lock-free monotonic-max via CAS-loop. */ | ||
| lsn_t cur= buf_flush_async_lsn; | ||
| while (cur < lsn) | ||
| if (buf_flush_async_lsn.compare_exchange_weak(cur, lsn)) | ||
| goto bumped; | ||
| return; | ||
|
|
||
| bumped: | ||
| /* In non-furious mode, concurrent writes to the log will remain | ||
| possible, and we are gently requesting buf_flush_page_cleaner() | ||
| to do more work to avoid a later call with furious=true. | ||
| We will only wake the buf_flush_page_cleaner() from an indefinite | ||
| my_cond_wait(), but we will not disturb the regular 1-second sleep. | ||
|
|
||
| To reduce contention for a gentle request case, the idle flag of | ||
| buf_pool is read outside of a buf_pool.flush_list_mutex critical | ||
| section, allowing early return. | ||
|
|
||
| If the decision to return early was (unlikely) wrong, subsequent | ||
| wake-side paths (including further buf_flush_ahead() calls under | ||
| redo-log pressure) are expected to perform the wake eventually. */ | ||
| if (!buf_pool.page_cleaner_idle()) | ||
| return; | ||
|
|
||
| /* Signal under buf_pool.flush_list_mutex. */ | ||
| mysql_mutex_lock(&buf_pool.flush_list_mutex); | ||
| goto wake; |
There was a problem hiding this comment.
Code Review
This pull request refactors the page cleaner's idle state management by replacing a bitmask flag with an atomic boolean and optimizes the buf_flush_ahead function using a lock-free CAS loop for non-furious LSN updates. These changes aim to reduce mutex contention by allowing lock-free idle status checks. However, the refactoring introduced a critical compilation error because the wake label was removed while goto wake statements remain in the code. The feedback highlights the need to restore this label to maintain the shared signaling logic.
| @@ -2133,16 +2137,38 @@ ATTRIBUTE_COLD void buf_flush_ahead(lsn_t lsn, bool furious) noexcept | |||
| buf_pool.page_cleaner_set_idle(false); | |||
There was a problem hiding this comment.
The wake label is missing in the refactored buf_flush_ahead function, which will cause a compilation error because goto wake is used later in the non-furious path (line 2171). The label should be placed here to allow both paths to share the signaling logic.
wake:
buf_pool.page_cleaner_set_idle(false);|
|
||
| /* Signal under buf_pool.flush_list_mutex. */ | ||
| mysql_mutex_lock(&buf_pool.flush_list_mutex); | ||
| goto wake; |
There was a problem hiding this comment.
When calling buf_flush_ahead() with furious=false, the function updates the async target buf_flush_async_lsn and, if necessary notifies the page cleaner thread to start flushing more eagerly, to avoid a long stall later when checkpoint will be required.
When many threads call buf_flush_ahead() with furious=false, if the update of buf_flush_async_lsn is gated inside
buf_pool.flush_list_mutex, this can cause significant contention, just to deliver the new buf_flush_async_lsn value and a potential notification to the page cleaner thread.
This commit enables, for the non furious case, lock-free update of buf_flush_async_lsn and lock-free check of the page cleaner idle flag, to avoid the contention on buf_pool.flush_list_mutex. The lock is acquired by the CAS-loop winner thread and only after the page cleaner is observed idle.
Since the the reads are not protected by the mutex, there is a chance that the decision to not signal the page cleaner thread is wrong. In the worst case though, under sustained redo-log pressure, eventually a wakeup signal will be delivered to the page cleaner thread anyway, so the risk of a long stall is limited.