Issue #58 added automatic stage profiling for .source(), .tap(), and .link() (RFC docs/design/014-M6-stage-profiling.md). Transforms were intentionally scoped out and left as a stub. This issue finishes the job: make .transform() and .transform_join() first-class stage-profiling participants.
Why this matters (user-visible symptom)
The get_stage_profiling MCP tool reports a bottleneck = the stage with the highest average wall-clock time, iterating whatever the snapshot returns (tools/aimdb-mcp/src/tools/profiling.rs:63). Because transforms never appear in the snapshot, a transform that is the real bottleneck is invisible, and the tool silently flags the slowest of the remaining source/tap/link stages instead. This is a correctness gap in a shipped tool's output, not just a missing metric.
Current state — verified 2026-06-29
Already wired (transform-aware, do not redo):
RecordProfilingMetrics::set_stage_name() already handles StageKind::Transform → transforms vec (aimdb-core/src/profiling/record_profiling.rs:113-123). Naming will work as soon as a real index is stored (see below).
reset_all() already chains transforms (record_profiling.rs:126-136).
- The MCP tool already iterates the snapshot generically and computes the bottleneck by max-avg (
tools/aimdb-mcp/src/tools/profiling.rs:63-87), so once the snapshot emits transform entries it picks them up for free — no MCP change needed.
Still missing (the actual work):
transforms: Vec<StageEntry> is a dead stub: #[allow(dead_code)] at record_profiling.rs:36.
- No
push_transform() and no transforms() accessor (only push_source/tap/link and sources()/taps()/links() exist).
snapshot() iterates only source/tap/link (aimdb-core/src/profiling/info.rs:53-65) — transforms are never reported.
.transform() / .transform_join() store last_stage = Some((StageKind::Transform, 0)) with a hardcoded 0 (aimdb-core/src/typed_api.rs:562 and :579), so .with_name(...) silently no-ops past the first transform.
run_single_transform calls transform_fn(&input_value, &mut state) untimed (aimdb-core/src/transform/single.rs:186) — no Clock/StageMetrics.
- The join task output producer is not instrumented.
Goal
get_stage_profiling returns a stage_type: "transform" entry per registered transform, with the same call_count / avg / min / max shape as the other stages.
.with_name("...") after a .transform* registration is honored and surfaces in the MCP output.
- Existing source / tap / link behavior is unchanged.
- Off by default, zero-cost when the
profiling feature is disabled, still builds on no_std + alloc + Embassy.
Where to implement — mirror existing code
This is a "follow the .tap() / .source() pattern" task. The four moving parts and their existing analogues:
| Part |
Mirror this |
Add for transform |
| Registration |
push_tap() (record_profiling.rs:60) |
push_transform(); remove #[allow(dead_code)]; add a transforms() accessor mirroring taps() (record_profiling.rs:103) |
Real index in last_stage |
.tap() stores (StageKind::Tap, idx) (typed_api.rs:478) |
In transform / transform_raw / transform_join / transform_join_raw, call push_transform() and store the returned idx instead of 0 |
| Spawn-time metric attach |
collect_producer_future does producer.set_profiling(entry.metrics.clone(), clock) (typed_record.rs:1056); collect_consumer_futures does the consumer equivalent (:1020) |
In collect_transform_futures (typed_record.rs:763), the output Producer<T> is created at :782 and passed to build_fn at :784 — attach set_profiling there, gated on #[cfg(feature = "profiling")] |
| Reporting |
snapshot() tuple list (info.rs:55-59) |
Add ("transform", self.transforms()) to the iteration (order: sources → taps → links → transforms) |
Timing semantics (decided — no open question for the contributor)
Single-input transforms (.transform() / .transform_raw()): time the closure itself — "time per input value processed."
- Thread an
Option<Arc<StageMetrics>> through create_single_transform_descriptor → build_fn → run_single_transform (transform/single.rs:112 / :146).
- Wrap the
transform_fn(&input_value, &mut state) call at single.rs:186 with Clock::now() before/after; record on both Some(output) and None (a None still consumed an input).
Multi-input joins (.transform_join() / .transform_join_raw()): the Design 027 task-model handler owns its own event loop, so we can't time per-trigger work without intruding into user code. Instead, time the interval between successive Producer::produce() calls on the output producer — the same trick ProducerProfilingState already uses for .source(). Interpretation: "average time between successive join outputs" (a throughput proxy). This falls out of the collect_transform_futures attach point above for free, since the join also produces through that Producer<T>.
Rejected alternative: using producer-cadence for single transforms too (simpler/unified). It conflates closure time with input-wait time, so single transforms get explicit per-call timing instead.
Suggested implementation order
push_transform() + transforms() accessor + drop #[allow(dead_code)] (record_profiling.rs). Unit-test it like the existing push_assigns_sequential_indices test.
- Real
idx in last_stage for all four .transform* entry points (typed_api.rs). Unit-test that .with_name() lands on the right transform index (naming already works via set_stage_name).
- Add transforms to
snapshot() (info.rs).
- Instrument
run_single_transform (per-call Clock timing) + thread metrics through the descriptor.
- Attach
ProducerProfilingState in collect_transform_futures for the join output cadence.
- Tests + example + RFC update.
Tasks
How to verify
# Unit + integration (feature is off by default — turn it on explicitly)
cargo test -p aimdb-core --features profiling
cargo test -p aimdb-tokio-adapter --features profiling
# Zero-cost-when-off + full workspace gate
make check
# Embassy cross-compile (thumbv7em-none-eabihf); already gates embassy-runtime,profiling
make test-embedded
# Eyeball the MCP output end-to-end: run examples/remote-access-demo with the
# profiling feature and call the get_stage_profiling tool; confirm a
# stage_type:"transform" entry appears and is selected as bottleneck when slowest.
Acceptance criteria
- With
--features profiling, get_stage_profiling for a record with a transform returns a stage_type: "transform" entry containing call_count, avg_time_ns, min_time_ns, max_time_ns, plus the name set via .with_name(...).
- When the transform is the slowest stage,
bottleneck in the MCP response points at it with the human-readable recommendation string.
- Without the
profiling feature: no overhead, no API change (verify make check and a default-features build).
- Embassy
thumbv7em-none-eabihf build still passes with the feature on.
- All existing tests continue to pass.
Out of scope
- CPU-time tracking (still wall-clock only).
- Histogram / percentile metrics.
- Per-input-port timing for joins (single aggregate "output cadence" only — per-port timing would mean instrumenting the fan-in forwarders, a separate piece of work if it turns out to be needed).
Prerequisites / difficulty
Mid-sized. Comfortable Rust + async, plus care with #[cfg(feature = "profiling")] gating and no_std + alloc constraints (no std-only types on the core path). Every piece has a working source/tap/link analogue to copy, and there are no unresolved design decisions — timing semantics and the stage ordering are settled above.
Related
Issue #58 added automatic stage profiling for
.source(),.tap(), and.link()(RFCdocs/design/014-M6-stage-profiling.md). Transforms were intentionally scoped out and left as a stub. This issue finishes the job: make.transform()and.transform_join()first-class stage-profiling participants.Why this matters (user-visible symptom)
The
get_stage_profilingMCP tool reports abottleneck= the stage with the highest average wall-clock time, iterating whatever the snapshot returns (tools/aimdb-mcp/src/tools/profiling.rs:63). Because transforms never appear in the snapshot, a transform that is the real bottleneck is invisible, and the tool silently flags the slowest of the remaining source/tap/link stages instead. This is a correctness gap in a shipped tool's output, not just a missing metric.Current state — verified 2026-06-29
Already wired (transform-aware, do not redo):
RecordProfilingMetrics::set_stage_name()already handlesStageKind::Transform→transformsvec (aimdb-core/src/profiling/record_profiling.rs:113-123). Naming will work as soon as a real index is stored (see below).reset_all()already chainstransforms(record_profiling.rs:126-136).tools/aimdb-mcp/src/tools/profiling.rs:63-87), so once the snapshot emits transform entries it picks them up for free — no MCP change needed.Still missing (the actual work):
transforms: Vec<StageEntry>is a dead stub:#[allow(dead_code)]atrecord_profiling.rs:36.push_transform()and notransforms()accessor (onlypush_source/tap/linkandsources()/taps()/links()exist).snapshot()iterates only source/tap/link (aimdb-core/src/profiling/info.rs:53-65) — transforms are never reported..transform()/.transform_join()storelast_stage = Some((StageKind::Transform, 0))with a hardcoded0(aimdb-core/src/typed_api.rs:562and:579), so.with_name(...)silently no-ops past the first transform.run_single_transformcallstransform_fn(&input_value, &mut state)untimed (aimdb-core/src/transform/single.rs:186) — noClock/StageMetrics.Goal
get_stage_profilingreturns astage_type: "transform"entry per registered transform, with the samecall_count/avg/min/maxshape as the other stages..with_name("...")after a.transform*registration is honored and surfaces in the MCP output.profilingfeature is disabled, still builds onno_std + alloc+ Embassy.Where to implement — mirror existing code
This is a "follow the
.tap()/.source()pattern" task. The four moving parts and their existing analogues:push_tap()(record_profiling.rs:60)push_transform(); remove#[allow(dead_code)]; add atransforms()accessor mirroringtaps()(record_profiling.rs:103)last_stage.tap()stores(StageKind::Tap, idx)(typed_api.rs:478)transform/transform_raw/transform_join/transform_join_raw, callpush_transform()and store the returnedidxinstead of0collect_producer_futuredoesproducer.set_profiling(entry.metrics.clone(), clock)(typed_record.rs:1056);collect_consumer_futuresdoes the consumer equivalent (:1020)collect_transform_futures(typed_record.rs:763), the outputProducer<T>is created at:782and passed tobuild_fnat:784— attachset_profilingthere, gated on#[cfg(feature = "profiling")]snapshot()tuple list (info.rs:55-59)("transform", self.transforms())to the iteration (order: sources → taps → links → transforms)Timing semantics (decided — no open question for the contributor)
Single-input transforms (
.transform()/.transform_raw()): time the closure itself — "time per input value processed."Option<Arc<StageMetrics>>throughcreate_single_transform_descriptor→build_fn→run_single_transform(transform/single.rs:112/:146).transform_fn(&input_value, &mut state)call atsingle.rs:186withClock::now()before/after; record on bothSome(output)andNone(aNonestill consumed an input).Multi-input joins (
.transform_join()/.transform_join_raw()): the Design 027 task-model handler owns its own event loop, so we can't time per-trigger work without intruding into user code. Instead, time the interval between successiveProducer::produce()calls on the output producer — the same trickProducerProfilingStatealready uses for.source(). Interpretation: "average time between successive join outputs" (a throughput proxy). This falls out of thecollect_transform_futuresattach point above for free, since the join also produces through thatProducer<T>.Suggested implementation order
push_transform()+transforms()accessor + drop#[allow(dead_code)](record_profiling.rs). Unit-test it like the existingpush_assigns_sequential_indicestest.idxinlast_stagefor all four.transform*entry points (typed_api.rs). Unit-test that.with_name()lands on the right transform index (naming already works viaset_stage_name).snapshot()(info.rs).run_single_transform(per-callClocktiming) + thread metrics through the descriptor.ProducerProfilingStateincollect_transform_futuresfor the join output cadence.Tasks
RecordProfilingMetrics::push_transform()+transforms()accessor + remove#[allow(dead_code)]snapshot()includes transforms insources → taps → links → transformsorder.transform*entry points register a stage and store the real index inlast_stageOption<Arc<StageMetrics>>throughrun_single_transform; wrap thetransform_fncall withClockmeasurement (record onSomeandNone)ProducerProfilingStateto the producer incollect_transform_futuresso join output cadence is timedwith_namelands on the right index; adjacent records don't cross-talkaimdb-tokio-adapter/tests/: source → transform → tap pipeline; assertmin ≤ avg ≤ maxon the transform stage and that the slowest-by-design stage is reported asbottleneckexamples/remote-access-demo: add one transform-driven record soget_stage_profilingreturns astage_type: "transform"entry; update the README stage tabledocs/design/014-M6-stage-profiling.md"Implementation notes" to describe transform timing semantics (per-call for single, output-interval for join)How to verify
Acceptance criteria
--features profiling,get_stage_profilingfor a record with a transform returns astage_type: "transform"entry containingcall_count,avg_time_ns,min_time_ns,max_time_ns, plus the name set via.with_name(...).bottleneckin the MCP response points at it with the human-readable recommendation string.profilingfeature: no overhead, no API change (verifymake checkand a default-features build).thumbv7em-none-eabihfbuild still passes with the feature on.Out of scope
Prerequisites / difficulty
Mid-sized. Comfortable Rust + async, plus care with
#[cfg(feature = "profiling")]gating andno_std + allocconstraints (nostd-only types on the core path). Every piece has a working source/tap/link analogue to copy, and there are no unresolved design decisions — timing semantics and the stage ordering are settled above.Related
.transform()follow-up left explicitly out of scope by Integrate Stage Profiling #58.docs/design/014-M6-stage-profiling.mdno_stdTransform API) defines the join task model this builds on.