williamwutq · Jun 11, 2026 · Jun 10, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -10,9 +10,15 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Added
 
 - `BStack::process_gen` (Rust) / `bstack_process_gen` (C) (`set` + `atomic`): generator/callback-driven primitive that acquires the write lock once and holds it across a sequence of dependent reads ending in at most one mutating operation (`Write`, `Swap`, `Push`, or `Pop`), which always ends the sequence. Closes the ABA window that a `get_batched_gen` (read, release lock) + `cas` (re-acquire, compare, write) pairing would otherwise leave open for allocator-mutex-free pop-style algorithms — see `examples/atomic_linked_list.rs` / `examples/atomic_linked_list.c` for a worked free-list push/pop demonstration.
-- `BStackGenOp<'a>` (Rust) / `bstack_gen_op_t` (C) (`set` + `atomic`): non-exhaustive enum (Rust) / tagged union (C) of operations yielded by `process_gen`'s closure/callback — `Read { offset, buf }`, `Len { out }`, `Write { offset, data }`, `Swap { a_offset, b_offset, len }`, `Push { data }`, and `Pop { buf }`. `Write`, `Swap`, `Push`, and `Pop` are the only mutating variants — exactly one is permitted per call, and any one of them ends the sequence immediately; `Read`/`Len` do not end the sequence. The Rust enum derives `Debug` (intentionally not `PartialEq`/`Eq`/`Hash` — see the type's doc comment).
+- `BStackGenOp<'a>` (Rust) / `bstack_gen_op_t` (C) (`set` + `atomic`): non-exhaustive enum (Rust) / tagged union (C) of operations yielded by `process_gen`'s closure/callback — `Read { offset, buf }`, `Len { out }`, `Write { offset, data }`, `Swap { a_offset, b_offset, len }`, `Push { data }`, `Pop { buf }`, and `Discard { len }` (Rust; in C, a `Pop` with a `NULL` destination buffer). `Write`, `Swap`, `Push`, `Pop`, and `Discard` are the only mutating variants — at most one is permitted per call, and any one of them ends the sequence immediately; `Read`/`Len` do not end the sequence. The Rust enum derives `Debug` (intentionally not `PartialEq`/`Eq`/`Hash` — see the type's doc comment).
 - `BStackGenOp::Push { data }` / `BSTACK_GEN_PUSH` and `BStackGenOp::Pop { buf }` / `BSTACK_GEN_POP` (Rust + C, `set` + `atomic`): in-sequence equivalents of `push`/`pop` for `process_gen` — `Push` appends `data` and `Pop` removes the last `buf.len()` bytes into `buf`, growing/shrinking the payload. Like `Write` and `Swap`, exactly one of `Write`/`Swap`/`Push`/`Pop` is permitted per call and any one of them ends the sequence immediately. `Pop` errors if it would remove more than the current payload or shrink it below the locked length.
 - `BStackGenOp::Len { out }` / `BSTACK_GEN_LEN` (Rust + C, `set` + `atomic`): writes the current logical payload size into `out` and, unlike the mutating variants, does not end the sequence — the in-sequence equivalent of `len`, useful when a later step's offset or length depends on the current payload size.
+- `BStackGenOp::Discard { len }` (Rust) / `BSTACK_GEN_POP` with a `NULL` `u.pop.buf` (C) (`set` + `atomic`): removes the last `len` bytes from the end of the file without reading them back, shrinking the payload and ending the sequence — the in-sequence, buffer-free equivalent of `discard` and the counterpart of `Pop`. Useful for truncating a tail whose size is only known once earlier `Read`s/`Len` have resolved, without allocating a throwaway buffer. In Rust this is a dedicated variant (slices cannot be null); in C it is expressed idiomatically as a `Pop` whose destination pointer is `NULL`. Errors on the same conditions as `Pop`.
+
+### Changed
+
+- **`SlabBStackAllocator` and `CheckedSlabBStackAllocator` — `alloc` / `dealloc` / `realloc` are now lock-free under the `atomic` feature** (`alloc` + `set` features): The allocator-level `Mutex` that previously serialised free-list push/pop is gone from these paths. Free-list pop now drives a single `BStack::process_gen` sequence (read `free_head`, read the popped block's `next`, advance `free_head` — all under one held `BStack` write lock, closing the ABA window a `get`/`cas` pair would leave open); free-list push splices a single block or a whole freed run onto the head with one `BStack::cross_exchange`; tail grow/shrink use `BStack::try_extend_zeros` / `BStack::try_discard` (atomic check-and-act under `BStack`'s own write lock). `SlabBStackAllocator` drops its allocator-level `Mutex` entirely and is `Sync` purely through `BStack`'s interior mutability. `CheckedSlabBStackAllocator` retains a `Mutex` solely for `recover` (see below); none of `alloc` / `dealloc` / `realloc` take it. The on-disk format is unchanged — no magic-number bump.
+- **`CheckedSlabBStackAllocator::recover` runs under its own mutex** (`alloc` + `set` features, `atomic`): the `Mutex` is held for the full call solely to keep recovery single-flight, preventing two concurrent runs from reclaiming the same leaked block twice. The scan itself (free-list walk, arena classification, and its one optional tail discard) runs as a single `BStack::process_gen` sequence, so the `BStack` write lock — not the `Mutex` — serialises it against the lock-free `alloc` / `dealloc` / `realloc`. Ordinary `alloc` / `dealloc` / `realloc` never take the `Mutex`.
 
 ## [0.2.4] - 2026-06-07
 

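Reviewer sketch (not part of the patch): a minimal in-memory model of the held-lock free-list pop that the `### Changed` entry above describes. It assumes a plain `Vec` and a `std::sync::Mutex` in place of the file-backed `BStack` and its write lock. `FreeList`, `State`, `SENTINEL`, `drain_concurrently`, and `all_unique` are hypothetical names chosen for illustration; none of this is the `bstack` API, and the real allocator may differ in detail.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical end-of-list marker (an assumption, not the crate's value).
const SENTINEL: u64 = u64::MAX;

/// Toy stand-in for the backing store: `next[i]` is block i's next pointer.
struct State {
    free_head: u64,
    next: Vec<u64>,
}

/// The single `Mutex` plays the role of the one write lock that is held
/// across the whole read-read-write sequence.
struct FreeList {
    inner: Mutex<State>,
}

impl FreeList {
    /// Builds the chain 0 -> 1 -> ... -> n-1 -> SENTINEL.
    fn with_blocks(n: usize) -> Self {
        let next = (0..n)
            .map(|i| if i + 1 < n { (i + 1) as u64 } else { SENTINEL })
            .collect();
        let free_head = if n == 0 { SENTINEL } else { 0 };
        FreeList { inner: Mutex::new(State { free_head, next }) }
    }

    /// Pop: read head, read head's next, advance head, all under one lock
    /// acquisition, so no other push/pop can run between the steps.
    fn pop(&self) -> Option<u64> {
        let mut s = self.inner.lock().unwrap();
        let head = s.free_head; // step 0: read free_head
        if head == SENTINEL {
            return None; // empty list: nothing is written
        }
        let next = s.next[head as usize]; // step 1: read head's next
        s.free_head = next; // step 2: the one terminating write
        Some(head)
    }

    /// Push: link the block to the old head and publish it as the new head.
    fn push(&self, block: u64) {
        let mut s = self.inner.lock().unwrap();
        let old_head = s.free_head;
        s.next[block as usize] = old_head;
        s.free_head = block;
    }
}

/// Pops from several threads until the list is empty and returns every
/// block that was handed out.
fn drain_concurrently(list: Arc<FreeList>, threads: usize) -> Vec<u64> {
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let list = Arc::clone(&list);
            thread::spawn(move || {
                let mut got = Vec::new();
                while let Some(block) = list.pop() {
                    got.push(block);
                }
                got
            })
        })
        .collect();
    handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
}

/// True when no block appears twice, i.e. no double allocation occurred.
fn all_unique(blocks: &[u64]) -> bool {
    let seen: HashSet<u64> = blocks.iter().copied().collect();
    seen.len() == blocks.len()
}
```

Because every `pop` and `push` in this model takes the same lock for its whole duration, the head cannot change and cycle back between a pop's first read and its write. Draining a list from several threads with `drain_concurrently` and checking the result with `all_unique` is one way to exercise that property.
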
diff --git a/PLANNED.md b/PLANNED.md
@@ -180,66 +180,6 @@ The same change applies to the corresponding methods on `BStackGuardedSlice`, an
 
 ---
 
-## Lock-free free list in `SlabBStackAllocator` and `CheckedSlabBStackAllocator`
-
-**Feature flag:** `atomic`
-**Breaking change:** No (internal implementation change only)
-
-### Motivation
-
-Under the `atomic` feature, `SlabBStackAllocator` guards every free-list mutation with an internal `Mutex<()>`. This mutex serialises `push_free_block`, `push_free_blocks`, and `pop_free_block` across threads, preventing concurrent alloc/dealloc from racing on `free_head`. While correct, the mutex is a point of contention: all threads allocating or deallocating single-block regions must queue behind it, even though the underlying `BStack` already provides atomic compound operations — `cross_exchange` for a lock-free push, and `get_batched_gen` + `cas` for a compare-and-swap pop — that could serve the same role without an allocator-level lock.
-
-The goal is to remove the `Mutex<()>` entirely and replace every free-list path with sequences of BStack primitives that are safe under concurrent `&self` access.
-
-`CheckedSlabBStackAllocator` carries the same `Mutex<()>` and the same free-list pop/push structure as `SlabBStackAllocator`, so whatever solution is adopted here applies to it by extension with no additional design work.
-
-### Design
-
-#### Push: lock-free prepend via `cross_exchange`
-
-To push block `b` (at payload offset `b_addr`) onto the free list:
-
-1. **Plant a self-pointer placeholder.** Write `b_addr` as a little-endian `u64` into the first eight bytes of `b` — i.e., call `stack.set(b_addr, b_addr.to_le_bytes())`. This seeds the slot that will become `b->next` with a safe, in-bounds value.
-
-2. **Atomically splice `b` in as the new head.** Call `stack.cross_exchange(b_addr, FREE_HEAD_OFFSET, 8)`. `cross_exchange` swaps the eight bytes at `b_addr` with the eight bytes at `FREE_HEAD_OFFSET` under a single write lock. Before the call the slot at `b_addr` holds `b_addr` and the slot at `FREE_HEAD_OFFSET` holds the current head `H`; after the call `FREE_HEAD_OFFSET` holds `b_addr` (b is now head) and `b_addr` holds `H` (b's next pointer is the old head). The self-pointer written in step 1 is never observed by any reader: `cross_exchange` atomically replaces it with `H` at the same moment it publishes `b` as the new head.
-
-The call to `set` in step 1 and the call to `cross_exchange` in step 2 are not jointly atomic — a crash between them leaves the self-pointer sitting in `b->next` with `free_head` still pointing to the old head. This leaks `b` rather than corrupting the list, matching the crash-safety class already documented for `push_free_block`.
-
-Push is inherently race-free without any allocator-level mutex: even if two threads push concurrently, each `cross_exchange` is atomic with respect to the other, and each thread's block ends up correctly linked into the list (though their relative order at the head is not deterministic).
-
-#### Pop: single-lock dependent sequence via `process_gen`
-
-To pop the head block from the free list:
-
-1. **Run the whole pop as one `process_gen` sequence.** Drive `stack.process_gen` through a small state machine:
-   - *Step 0* — issue `Read { offset: FREE_HEAD_OFFSET, buf: head_buf }` to read the current head pointer.
-   - *Step 1* — once `head_buf` is populated, parse `head_val`. If `head_val == SENTINEL`, the list is empty: return `None`, ending the sequence with nothing popped — fall through to the tail-extension branch. Otherwise remember `head_val` and issue `Read { offset: head_val, buf: next_buf }` to read that block's `next` pointer.
-   - *Step 2* — issue `Write { offset: FREE_HEAD_OFFSET, data: next_buf }`, replacing the head with the next block and ending the sequence. The caller now owns `head_val`.
-
-The crucial property is that `process_gen` acquires the BStack write lock *before* the first read and holds it, unreleased, across every subsequent read and the terminating write. The read of `free_head`, the read of its `next` pointer, and the write that advances `free_head` all happen as one indivisible critical section — not as separate lock acquisitions that another thread's operations could interleave between.
-
-#### Why a CAS-based design would be unsafe, and how `process_gen` avoids it
-
-A more "obvious" design would pair `get_batched_gen` (read `head` and `head->next` under a read lock, then release it) with `cas` (re-acquire the write lock, compare, and conditionally write `free_head`). That pairing leaves a race window between releasing the read lock and acquiring the write lock — and the ABA problem exploits exactly that window: `free_head` can return to the same byte value it held at read time even though the list structure underneath has completely changed.
-
-**Concrete example.** Suppose the free list is `head → H0 → H1 → H2 → …`:
-
-1. **Thread A** reads `head = H0` and `H0->next = H1`, then releases its read lock.
-2. **Thread B** pops H0 (`free_head`: H0 → H1). H0 is now live.
-3. **Thread B** pops H1 (`free_head`: H1 → H2). H1 is now live.
-4. **Thread B** deallocates H0 — push: writes `H0->next = H2` (the current head), then sets `head = H0`. The free list is now `H0 → H2 → …`.
-5. **Thread A** fires its CAS: re-reads `FREE_HEAD_OFFSET`, sees `H0` — matching what it read in step 1 — and writes `H1`. The CAS "succeeds".
-
-`free_head` is now `H1`, but H1 is still live (allocated to Thread B in step 3): the next `alloc` hands it out a second time — a silent double allocation that no error reports. A no-retry-on-failure policy would not help here; the corruption comes from a *successful* CAS, one that cannot distinguish "head is still H0 because nothing changed" from "head is H0 again because it cycled back".
-
-`process_gen` makes this scenario structurally impossible rather than merely improbable. Because Thread A would hold the *same* write lock continuously from its first read of `free_head` through its final write, none of Thread B's steps — pop H0, pop H1, push H0 back — could execute in between; every one of them needs that same write lock and so simply blocks until Thread A's whole sequence, including the terminating write, has completed. There is no window in which `free_head` can change value and cycle back, so the byte-value re-comparison that CAS relies on — and that ABA defeats — is unnecessary. No generation counter, no on-disk format change, and no retry policy are needed; the allocator only has to drive the read-read-write sequence through `process_gen`, which already owns the single lock acquisition that makes it atomic.
-
-### Open questions
-
-- **Batch push under concurrency.** `push_free_blocks` currently reads `free_head`, builds the entire linked chain in a buffer, then writes the chain and updates `free_head` in two separate calls — not atomic with respect to concurrent pushes. A lock-free batch push requires either (a) holding the allocator mutex for batch operations only, (b) building the chain tail-to-head and using the same `cross_exchange` trick, or (c) a new BStack primitive. The single-block push via `cross_exchange` is already safe; the batch case needs its own solution before the mutex can be removed entirely.
-
----
-
 ## Typed region and I/O parameter types
 
 **Feature flag:** None (additive API surface)