Skip to content

Conversation

@OxBat
Copy link

@OxBat OxBat commented Jan 27, 2026

Summary

I identified a critical algorithmic defect in AsyncAppender::setBufferSize.
The appender uses a Ring Buffer with modulo arithmetic (index = counter % size).
When the buffer size changes, the modulus changes, invalidating the mapping of existing data.

The Crash:
If the buffer contains wrapped data, the reader calculates indices based on the new size. These indices point to slots that are logically empty (containing nullptr from std::vector::resize). Dereferencing them causes a SEGFAULT.

Technical Analysis

The append method uses a lock-free write path relying on eventCount.
setBufferSize simply called resize(), which breaks the index = count % size invariant for existing data.

Remediation

The patch implements a "Drain and Re-align" strategy:

  1. Creates a new aligned buffer.
  2. Copies pending committed events linearly (starting at index 0).
  3. Resets atomic counters (dispatchedCount, commitCount, eventCount) to match the new linear layout.

Note: This fix assumes the caller pauses logging before resizing (Quiescence), which is required anyway as append() does not hold the mutex during writes.

The setBufferSize method failed to re-align data indices when changing
the modulo (buffer size). This caused readers to access invalid slots
(nullptr), leading to SIGSEGV.
This patch implements a drain-and-reset strategy to safely realign
pending events into the new buffer and reset atomic counters.
@swebb2066
Copy link
Contributor

Note: This fix assumes the caller pauses logging before resizing (Quiescence), which is required anyway as append() does not hold the mutex during writes.

So why is your fix is any better than the current implementation?

@OxBat
Copy link
Author

OxBat commented Jan 28, 2026

Thanks for the challenge @swebb2066.

The difference is fundamental: The current implementation is mathematically broken even in a single-threaded (quiescent) scenario.

Here is exactly why the current code causes a deterministic crash regardless of concurrency, based on AsyncAppender::dispatch:

The Mathematical Flaw
The reader logic at Line 519 relies on a strict invariant:

// Line 519 in AsyncAppender::dispatch
auto index = priv->dispatchedCount % priv->buffer.size();

If priv->buffer.size() changes via setBufferSize (Line 448), the result of the modulus changes, but the data remains in the old slots. This invalidates the mapping immediately.

Scenario (Current Code Failure):

  1. Initial State: BufferSize = 10. dispatchedCount = 15.
    • The data is physically stored at index 5 (15 % 10).
  2. Action: We call setBufferSize(20).
    • Line 448: priv->buffer.resize(20) expands the vector. The existing data stays at index 5. Indices 10-19 contain default-constructed/empty events.
  3. The Crash: The dispatcher thread resumes at Line 519.
    • It calculates: 15 % 20 = 15.
    • It accesses buffer[15].
    • Result: It reads an empty/invalid event instead of the real data at index 5. This leads to data corruption or a crash when dereferencing event.

My Fix (Drain & Re-align)
My patch fixes this by "linearizing" the buffer during resize. It moves the data from the old slots (e.g., index 5) to index 0, 1, 2... and resets the atomic counters to match the new linear layout (dispatchedCount becomes 0).

  1. It moves data from slot 5 to slot 0.
  2. It resets dispatchedCount to 0.
  3. Line 519 becomes: 0 % 20 = 0. Success.

Conclusion:

  • Current Code: Broken by design. It fails 100% of the time if the buffer has wrapped, even if logging is paused.
  • My Fix: Correct by design. It works reliably as long as the caller respects the quiescence contract.

@rm5248
Copy link
Contributor

rm5248 commented Jan 29, 2026

Are you able to create a test to validate this behavior? If it is reliable a test case to validate that it is broken or not would be helpful.

@OxBat
Copy link
Author

OxBat commented Jan 29, 2026

Agreed, that's a fair point. Since the issue is just the modulo logic getting out of sync with the data layout, the bug is deterministic. We can reproduce it reliably without race conditions.

Here is a standalone test case. On master, this asserts or crashes because close() tries to drain from empty slots after the resize. With the fix, the data is realigned and everything passes.

You can add this to src/test/cpp/asyncappendertestcase.cpp:

void testBufferResizeWraparound()
{
    int initialSize = 10;
    int newSize = 20;

    auto asyncAppender = std::make_shared<AsyncAppender>();
    asyncAppender->setBufferSize(initialSize);
    
    auto vectorAppender = std::make_shared<VectorAppender>();
    asyncAppender->addAppender(vectorAppender);
    
    Pool p;
    asyncAppender->activateOptions(p);

    LoggerPtr root = Logger::getRootLogger();

    // 1. warmup: push 5 events and let them drain completely.
    // this moves the internal read head (dispatchedCount) to 5.
    for (int i = 0; i < 5; ++i) {
        auto event = std::make_shared<spi::LoggingEvent>(
            root->getName(), Level::getInfo(), LOG4CXX_STR("Warmup"), 
            spi::LocationInfo::getLocationUnavailable());
        asyncAppender->append(event, p);
    }
    
    // simple spin-wait to ensure dispatcher caught up
    for (int i = 0; i < 100000 && vectorAppender->getVector().size() < 5; ++i) {
        std::this_thread::yield();
    }
    LOGUNIT_ASSERT_EQUAL((size_t)5, vectorAppender->getVector().size());

    // 2. wrap-around: fill the buffer (10 items).
    // since we started at index 5, the data wraps: [10-14, 5-9]
    for (int i = 0; i < 10; ++i) {
        auto event = std::make_shared<spi::LoggingEvent>(
            root->getName(), Level::getInfo(), LOG4CXX_STR("CrashTest"), 
            spi::LocationInfo::getLocationUnavailable());
        asyncAppender->append(event, p);
    }

    // 3. resize while wrapped.
    // without the fix, this leaves data at the old indices, but the read logic 
    // uses the new size modulo, pointing to empty slots.
    asyncAppender->setBufferSize(newSize);

    // 4. trigger drain.
    // master: reads invalid memory/nullptr. 
    // fix: correctly reads realigned data.
    asyncAppender->close(); 
    
    LOGUNIT_ASSERT_EQUAL((size_t)15, vectorAppender->getVector().size());
}

I'm happy to include this test directly in the PR if that helps the review.

@rm5248
Copy link
Contributor

rm5248 commented Jan 29, 2026

yes, please add this to the PR

@swebb2066
Copy link
Contributor

swebb2066 commented Jan 31, 2026

as long as the caller respects the quiescence contract

I believe setBufferSize() could wait for dispatch thread quiescence after changing bufferSize to zero for the duration of the lock guarded changes (bufferSize would need to be an atomic variable)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants