Feat/w8 zero alloc consume #149

Draft
lxsaah wants to merge 17 commits into main from feat/w8-zero-alloc-consume

lxsaah commented Jun 21, 2026

Summary

After 036 W1 removed the per-message Box<dyn Any> from the connector spine, exactly one AimDB-added per-message heap allocation remained on the in-process consume path: the Pin<Box<dyn Future>> that the object-erased BufferReader::recv() constructed on every call. An async fn on an erased trait isn't object-safe without boxing the future — so the box was pure deadweight imposed by the trait signature, not part of the dyn trade-off.

This PR removes it by converting the reader SPI from boxed-future async to an object-safe poll interface, restoring async fn recv().await ergonomics through a thin, allocation-free handle.

Result: 0 AimDB-added heap allocations per message on the in-process consume path (was 1), verified by the new aimdb-bench B0 suite across all three buffer profiles.

The core change

aimdb-core SPI (buffer/traits.rs):

// before — heap-allocates a future box on every call:
fn recv(&mut self) -> Pin<Box<dyn Future<Output = Result<T, DbError>> + Send + '_>>;
// after — object-safe, allocation-free:
fn poll_recv(&mut self, cx: &mut Context<'_>) -> Poll<Result<T, DbError>>;

try_recv is unchanged. The same swap applies to the remote-access JsonBufferReader (recv_json → poll_recv_json).
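To illustrate why the poll-based signature works through a trait object without a per-call box, here is a minimal sketch. Only the `poll_recv` signature is taken from this PR; the `DbError` variant, `QueueReader`, `FlagWaker`, and `poll_erased` are hypothetical stand-ins, not AimDB types.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Stand-in error type (hypothetical; the real `DbError` lives in aimdb-core).
#[derive(Debug, PartialEq)]
pub enum DbError {
    Closed,
}

/// Poll-based reader SPI: no `async fn` and no generic methods, so `dyn` works.
pub trait BufferReader<T> {
    fn poll_recv(&mut self, cx: &mut Context<'_>) -> Poll<Result<T, DbError>>;
}

/// Toy single-consumer queue reader, for illustration only.
pub struct QueueReader<T> {
    items: VecDeque<T>,
    waker: Option<Waker>,
}

impl<T> QueueReader<T> {
    pub fn new() -> Self {
        Self { items: VecDeque::new(), waker: None }
    }

    /// Producer side: enqueue a value and wake a parked consumer, if any.
    pub fn push(&mut self, value: T) {
        self.items.push_back(value);
        if let Some(w) = self.waker.take() {
            w.wake();
        }
    }
}

impl<T> BufferReader<T> for QueueReader<T> {
    fn poll_recv(&mut self, cx: &mut Context<'_>) -> Poll<Result<T, DbError>> {
        match self.items.pop_front() {
            Some(v) => Poll::Ready(Ok(v)),
            None => {
                // Remember who to wake once a value arrives.
                self.waker = Some(cx.waker().clone());
                Poll::Pending
            }
        }
    }
}

/// Test waker that records whether it was woken.
pub struct FlagWaker(pub AtomicBool);

impl Wake for FlagWaker {
    fn wake(self: Arc<Self>) {
        self.0.store(true, Ordering::SeqCst);
    }
}

/// Dynamic dispatch through `dyn`: the call returns a `Poll` by value,
/// so no future object is created for it.
pub fn poll_erased<T>(
    reader: &mut dyn BufferReader<T>,
    cx: &mut Context<'_>,
) -> Poll<Result<T, DbError>> {
    reader.poll_recv(cx)
}
```

Polling an empty `QueueReader` through `poll_erased` yields `Poll::Pending` and stores the waker; a later `push` wakes it, and the next poll returns the value.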

New consumer-facing handles (buffer/reader.rs) restore the ergonomic surface:

  • buffer::Reader<T> (and buffer::JsonReader under remote-access) wrap the erased reader and expose async fn recv(), implemented once via core::future::poll_fn: core-only, no_std-clean, zero-allocation, no unsafe.
  • Consumer::subscribe, TypedRecord::subscribe, and AimDb::subscribe now return Reader<T> instead of Box<dyn BufferReader<T> + Send>.
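The handle described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in buffer/reader.rs: the trait and `DbError` are simplified stand-ins, `Ticker` is a hypothetical toy reader, and `block_on` is a tiny thread-parking executor included only so the example runs without a runtime (the sketch uses `std` for that reason, while the handle itself only needs `poll_fn`).

```rust
use std::future::{poll_fn, Future};
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Stand-in error type (hypothetical).
#[derive(Debug, PartialEq)]
pub enum DbError {
    Closed,
}

/// Simplified stand-in for the object-safe reader SPI.
pub trait BufferReader<T> {
    fn poll_recv(&mut self, cx: &mut Context<'_>) -> Poll<Result<T, DbError>>;
}

/// Consumer-facing handle: owns the erased reader, exposes `recv().await`.
pub struct Reader<T> {
    inner: Box<dyn BufferReader<T> + Send>,
}

impl<T> Reader<T> {
    pub fn new(inner: Box<dyn BufferReader<T> + Send>) -> Self {
        Self { inner }
    }

    /// The returned future only borrows `self`; `poll_fn` does not box.
    pub async fn recv(&mut self) -> Result<T, DbError> {
        poll_fn(|cx| self.inner.poll_recv(cx)).await
    }
}

/// Toy reader: returns Pending once before each value, then closes at `limit`.
pub struct Ticker {
    next: u32,
    limit: u32,
    armed: bool,
}

impl Ticker {
    pub fn new(limit: u32) -> Self {
        Self { next: 0, limit, armed: false }
    }
}

impl BufferReader<u32> for Ticker {
    fn poll_recv(&mut self, cx: &mut Context<'_>) -> Poll<Result<u32, DbError>> {
        if self.next >= self.limit {
            return Poll::Ready(Err(DbError::Closed));
        }
        if !self.armed {
            // Simulate "no data yet" and ask to be polled again.
            self.armed = true;
            cx.waker().wake_by_ref();
            return Poll::Pending;
        }
        self.armed = false;
        let value = self.next;
        self.next += 1;
        Poll::Ready(Ok(value))
    }
}

/// Waker that unparks the thread running `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal executor for the example (allocates one Arc for its waker;
/// that cost belongs to this toy executor, not to `Reader`).
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}
```

With `let mut reader: Reader<u32> = Reader::new(Box::new(Ticker::new(2)));`, successive `block_on(reader.recv())` calls yield the two values and then the `Closed` error. The boxing happens once, when the reader is constructed, rather than on each `recv`.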

Adapter implementations

| Adapter | Mechanism | Allocation |
| --- | --- | --- |
| Tokio (broadcast / watch) | broadcast/watch expose no public poll API, so the reader round-trips its receiver through a single reused ReusableBoxFuture, re-armed after each Ready | 1 box per subscriber lifetime, 0 per message |
| Tokio (Mailbox) | Notify replaced by an explicit, deduplicated waker list beside the slot; push drains and wakes | 0 per message |
| Embassy | Drives embassy-sync's public poll methods directly: Subscriber::poll_next_message, watch::Receiver::poll_changed, Channel::poll_receive | 0 per message, no new unsafe |
| WASM | The old WasmRecvFuture::poll body moves verbatim into poll_recv (the box existed solely to satisfy the old trait signature) | 0 per message |
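As a rough illustration of the Mailbox approach (a slot with an explicit waker list beside it), consider the sketch below. It is not the tokio adapter's code: deduplicating by a caller-supplied `reader_id` is an assumption made for this example, and `Mailbox`, `State`, and `CountingWaker` are hypothetical names.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

struct State<T> {
    slot: Option<T>,
    // At most one entry per reader id (the dedup key is an assumption).
    wakers: Vec<(usize, Waker)>,
}

/// Single-slot, latest-value-wins mailbox.
pub struct Mailbox<T> {
    state: Mutex<State<T>>,
}

impl<T> Mailbox<T> {
    pub fn new() -> Self {
        Self { state: Mutex::new(State { slot: None, wakers: Vec::new() }) }
    }

    /// Overwrite the slot, then drain and wake every registered reader.
    /// `drain(..)` keeps the Vec's capacity, so re-registration can reuse it.
    /// Waking while holding the lock is a simplification for this sketch.
    pub fn push(&self, value: T) {
        let mut st = self.state.lock().unwrap();
        st.slot = Some(value);
        for (_, waker) in st.wakers.drain(..) {
            waker.wake();
        }
    }

    /// Take the value if present; otherwise register this reader's waker,
    /// replacing any earlier registration for the same reader.
    pub fn poll_recv(&self, reader_id: usize, cx: &mut Context<'_>) -> Poll<T> {
        let mut st = self.state.lock().unwrap();
        if let Some(value) = st.slot.take() {
            return Poll::Ready(value);
        }
        if let Some(i) = st.wakers.iter().position(|(id, _)| *id == reader_id) {
            st.wakers[i].1 = cx.waker().clone();
        } else {
            st.wakers.push((reader_id, cx.waker().clone()));
        }
        Poll::Pending
    }

    /// Number of readers currently registered for a wake-up.
    pub fn waiting(&self) -> usize {
        self.state.lock().unwrap().wakers.len()
    }
}

/// Test waker that counts how many times it was woken.
pub struct CountingWaker(pub AtomicUsize);

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}
```

Polling twice as the same reader leaves a single entry in the list, which is the point of deduplication: repeated Pending polls do not grow the list, and `push` wakes each waiting reader once.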

try_recv on the tokio broadcast/watch readers polls the reused future with Waker::noop(): Ready means a value/error is available now, and Pending means the buffer is empty, which preserves the prior semantics. The profiling reader memoizes pending_since so a Pending wait on the producer is not counted as consumer processing time.
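The poll-once idea behind this try_recv can be sketched generically. The helper below is hypothetical (`try_now` and `Noop` are not AimDB names). It builds its no-op waker from the stable `Wake` trait, which allocates an `Arc`; on toolchains that provide `Waker::noop()`, that call avoids the allocation.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing when woken.
struct Noop;

impl Wake for Noop {
    fn wake(self: Arc<Self>) {}
}

/// Poll `fut` exactly once without arranging a real wake-up.
/// `Some(v)` means the future was ready now; `None` means it was pending.
pub fn try_now<F>(fut: Pin<&mut F>) -> Option<F::Output>
where
    F: Future + ?Sized,
{
    let waker = Waker::from(Arc::new(Noop));
    let mut cx = Context::from_waker(&waker);
    match fut.poll(&mut cx) {
        Poll::Ready(value) => Some(value),
        Poll::Pending => None,
    }
}
```

A future that has returned Ready must not be polled again, which is why a reused future has to be re-armed after each Ready before the next poll.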

The embassy poll methods are small, additive public wrappers added to the vendored embassy-sync (Channel::poll_receive is already public; this gives pubsub/watch the matching method). This replaces the initial W8 cut's hand-rolled no_std ReusableBoxFuture, deleting ~80 lines of raw-pointer unsafe.

Benchmarking infrastructure (design 038)

New host-only aimdb-bench crate (excluded from default-members):

  • B0 — allocation count (headline + gate): a counting #[global_allocator] in dedicated bench binaries measures allocs/message. Committed baselines show 0.0 allocs/msg across all three tokio profiles. CI wiring is advisory/report-only for now; a hard gate is a documented follow-up (038 §6).
  • B1/B2 — latency & throughput for the tokio and embassy-host adapters (trend-only on shared CI runners).
  • B3 — on-target cycles via the new examples/embassy-bench-stm32h5 (STM32H5, embassy runtime).
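For readers unfamiliar with the B0 technique, a counting global allocator can be sketched as below. This is an illustrative assumption about the general approach, not the `CountingAllocator` in aimdb-bench: the per-thread counter and the helper names `allocs_during` and `allocs_per_message` are choices made for this example.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;

thread_local! {
    // Per-thread counter; const-initialised so reading it never allocates.
    static ALLOCS: Cell<u64> = const { Cell::new(0) };
}

/// Forwards to the system allocator and counts allocation calls.
pub struct CountingAllocator;

unsafe impl GlobalAlloc for CountingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // `try_with` avoids a panic if the thread-local is already torn down.
        let _ = ALLOCS.try_with(|c| c.set(c.get() + 1));
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
    // The default `realloc`/`alloc_zeroed` call `alloc`, so they are counted too.
}

#[global_allocator]
static GLOBAL: CountingAllocator = CountingAllocator;

/// Run `f` and return its result plus the allocations made on this thread.
pub fn allocs_during<R>(f: impl FnOnce() -> R) -> (R, u64) {
    let before = ALLOCS.with(|c| c.get());
    let out = f();
    let after = ALLOCS.with(|c| c.get());
    (out, after - before)
}

/// Average allocations per message over `messages` calls of `step`.
pub fn allocs_per_message(messages: u64, mut step: impl FnMut(u64)) -> f64 {
    let (_, total) = allocs_during(|| {
        for i in 0..messages {
            step(i);
        }
    });
    total as f64 / messages as f64
}
```

A steady-state loop that only touches preallocated storage reports 0.0, while any `Box::new` inside the measured closure shows up in the count. Keeping the allocator in a dedicated binary, as the PR does, keeps it out of production builds.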

Breaking changes

  • SPI break (adapter authors only): BufferReader::recv → poll_recv; JsonBufferReader::recv_json → poll_recv_json. Object safety is preserved.
  • Subscribe return type: Box<dyn BufferReader<T> + Send> → Reader<T>.
  • Source-compatible for consumers: subscribe().recv().await is unchanged at every call site — examples and aimdb-pro compile without edits. Holders of a concrete adapter reader wrap it once: Reader::new(Box::new(reader)).
  • Connector SPI unchanged (BYOC-stable): SerializedReader::recv keeps its boxed RecvSerializedFuture; only the inner per-message box is eliminated.

Full inventory in aimdb-core/CHANGELOG.md.

Verification

| Check | Result |
| --- | --- |
| make all (full matrix: build + tests + clippy -D warnings + fmt, incl. wasm32 & thumbv7em cross-compiles) | ✅ pass |
| aimdb-core (std+metrics, std+profiling, no_std+alloc+remote-access, remote::) | ✅ pass |
| tokio adapter buffer unit tests (poll_recv/try_recv × broadcast/watch/mailbox, lag, interleaved drain, multi-reader) | ✅ 51/51 |
| embassy adapter (host unit + doctests) | ✅ 13/13 + doctests |
| aimdb-bench --benches build · fmt-check (5 changed crates) | ✅ pass · clean |
| B0 allocations/message (committed baseline) | ✅ 0.0 ×3 profiles |

⚠️ Merge gate — embassy submodule

The _external/embassy submodule is pinned to a fork commit that adds the public poll_next_message / poll_changed wrappers, which the embassy adapter depends on (the workspace compiles embassy-sync from the submodule path). The corresponding upstream embassy PR is pending.

Until it merges, the submodule points at a fork branch rather than upstream, so CI on this PR will not pass the submodule checkout. Merge sequence:

  1. Land the upstream embassy-sync poll-methods PR.
  2. Repoint _external/embassy to the merged upstream SHA (and .gitmodules back to embassy-rs/embassy).
  3. Merge this PR.

This is the single hard gate on merge; all code, tests, docs, and downstream compatibility are otherwise ready.

lxsaah added 14 commits June 16, 2026 20:13
- Introduced `aimdb-bench` crate for benchmarking AimDB with various profiles.
- Implemented allocation counting benchmarks (B0) using `CountingAllocator`.
- Added latency benchmarks (B1) to measure push-to-receive latency.
- Developed throughput benchmarks (B2) for steady-state performance.
- Created pipeline benchmarks for both allocation (B0-Pipeline) and runner-driven throughput (B-Runner-Pipeline).
- Established workload profiles for telemetry, state, and command messages.
- Results are serialized to JSON for easy analysis and comparison.
… SPI

- Replace async recv method with poll_recv for object safety and zero allocations.
- Remove WasmRecvFuture and its associated heap allocation, simplifying the reader implementation.
- Update documentation to reflect the new design and performance metrics.
- Ensure compatibility with existing consumer-facing API by maintaining async recv method.
- Introduce measurement program to validate allocation and performance improvements across different buffer profiles.
- Implemented `b2_throughput_embassy.rs` to measure steady-state throughput using the Embassy buffer backend.
- Added baseline data for allocation metrics in `b0_alloc_embassy.json`.
- Created cycle profiling baseline for STM32H563ZI in `b3_cycles_stm32h5.json`.
- Updated `lib.rs` to include new Embassy benchmarks and profiles.
- Introduced `profiles_embassy.rs` for buffer constructors tailored for the Embassy adapter.
- Set up STM32H563ZI example with necessary configurations and dependencies in `Cargo.toml`.
- Added README documentation for the STM32H563ZI example, detailing usage and results.
- Implemented a build script for linking and memory configuration in `build.rs`.
- Created a flash script for easy deployment to the STM32H563ZI board.
- Established a Rust toolchain file for consistent development environment.
- Developed main benchmarking logic in `main.rs` to measure cycles and allocations for various buffer profiles.
- Updated comments and documentation in `b_alloc_pipeline.rs` to clarify the measurement scope and execution details.
- Improved clarity in `b_runner_pipeline.rs` regarding the benchmarking process and its scope.
- Simplified and clarified the `alloc.rs` documentation, emphasizing the isolation of the counting allocator from production code.
- Revised `lib.rs` documentation to highlight the non-production nature of the benchmarking infrastructure.
- Enhanced comments in `profiles_embassy.rs` to better explain the behavior of Embassy buffers and the importance of lazy subscriber registration.
- Added a new design document `038-aimdb-bench-crate-design.md` outlining the structured benchmarking infrastructure for AimDB.
- Introduced `039-proof-artifact-and-story-roadmap.md` to detail the sequencing of proof artifacts and story publication.
@lxsaah lxsaah self-assigned this Jun 21, 2026