feat: add topology-aware adaptive controls#282
Conversation
|
Important Review skippedReview was skipped due to path filters ⛔ Files ignored due to path filters (1)
CodeRabbit blocks several paths by default. You can override this behavior by explicitly including those paths in the path filters. For example, including ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Enterprise Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
WalkthroughAdds topology-aware adaptive controls for convergence, hint load-shedding, and tool-plan drift across the adaptive stack, and updates the supporting config surfaces, runtime wiring, tests, docs, and one pre-commit hook. ChangesTopology-aware adaptive controls
Pre-commit ty hook path pattern fix
Sequence Diagram(s)ACG convergence pathsequenceDiagram
participant AcgLearner
participant Backend
participant ConvergenceDetector
participant HotCache
AcgLearner->>Backend: load cached stability and observations
Backend-->>AcgLearner: cached state
AcgLearner->>ConvergenceDetector: record_epoch(betti, drift, error)
ConvergenceDetector-->>AcgLearner: stability.converged
AcgLearner->>Backend: store updated observations and stability
AcgLearner->>HotCache: update cached stability
Hint governor pathsequenceDiagram
participant AdaptiveHintsIntercept
participant HintGovernor
participant GeometricGovernor
AdaptiveHintsIntercept->>HintGovernor: allow(latency_sensitivity)
HintGovernor->>GeometricGovernor: should_trigger(deviation)
GeometricGovernor-->>HintGovernor: allow / deny
HintGovernor-->>AdaptiveHintsIntercept: inject / skip AgentHints
Tool plan drift pathsequenceDiagram
participant ToolParallelismLearner
participant DriftDetector
participant HotCache
ToolParallelismLearner->>DriftDetector: update(cohort_feature_vector)
DriftDetector-->>ToolParallelismLearner: drift decision
alt drift detected
ToolParallelismLearner->>HotCache: replace stored plan with empty execution plan
else drift not detected
ToolParallelismLearner->>HotCache: load existing execution plan
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/adaptive/tests/unit/storage_tests.rs (1)
82-94: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winAssert
convergedin stability round-trip tests.The fixture now includes
converged, but the round-trip test only checksstable_prefix_lengthandtotal_observations. Add an assertion forloaded_stability.convergedto lock this storage contract, since runtime behavior reads this field.Proposed test assertion
assert_eq!(loaded_stability.stable_prefix_length, 1); assert_eq!(loaded_stability.total_observations, 3); + assert!(!loaded_stability.converged);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/adaptive/tests/unit/storage_tests.rs` around lines 82 - 94, In the round-trip test that uses the sample_stability fixture, add an assertion to verify that loaded_stability.converged equals the value set in the sample_stability function (which is false). This assertion should be added alongside the existing assertions for stable_prefix_length and total_observations to ensure the converged field is properly persisted and loaded during the serialization round-trip.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/adaptive-topology/src/convergence.rs`:
- Around line 150-156: The stability_window parameter in the new method is only
being clamped to a minimum value using max(MIN_STABILITY_WINDOW), but is not
clamped to a maximum value. Since the RingBuffer instances (betti_history,
drift_history, error_history) have a fixed capacity of MAX_HISTORY (32), if
stability_window exceeds this capacity, methods like is_betti_stable and
is_drift_decreasing will never satisfy their length checks, making
topology-based convergence unreachable. Apply both minimum and maximum clamping
to stability_window by using a method that clamps it between
MIN_STABILITY_WINDOW and MAX_HISTORY.
- Line 166: The `epoch` field is typed as `u32` and the increment operation on
line 166 (self.epoch += 1) can cause integer overflow and wraparound in
long-running detectors, breaking epoch-based gating and diagnostics. Change the
type of the `epoch` field from `u32` to `u64` throughout the convergence module
to prevent wraparound and ensure reliable epoch tracking for the lifetime of the
detector.
In `@crates/adaptive-topology/src/geometry.rs`:
- Around line 252-257: The pruning_ratio method is counting true values across
the entire active_mask array, but only the first self.counts[0] entries
represent actual populated level-0 blocks. When the mask contains true entries
beyond self.counts[0], this causes the calculation to produce invalid ratios.
Restrict the active count calculation to only the first self.counts[0] elements
of active_mask by using slice notation to iterate over
active_mask[..self.counts[0]] instead of the full active_mask.
In `@crates/adaptive-topology/src/manifold.rs`:
- Around line 91-100: The assertion in the `new` method uses `D * tau` as the
validation threshold, but the actual embedding definition only requires `(D - 1)
* tau + 1` samples. Update the assertion logic to use the correct formula `(D -
1) * tau + 1` instead of `D * tau`, and also apply the same correction to the
embed readiness threshold check around lines 121-123 to ensure consistency and
avoid unnecessarily delaying valid embeddings.
- Around line 239-246: The issue in the compute_betti_0 function is that nodes
are not marked as visited when pushed onto the stack, only when popped, causing
the same neighbor to be pushed multiple times. Under dense connectivity, this
wastes the fixed-size stack space and can cause the stack to overflow, skipping
reachable nodes and producing incorrect Betti number calculations. Fix this by
marking a neighbor as visited immediately when it is pushed onto the stack in
the section where stack[stack_top] = neighbor is executed, rather than deferring
the visited marking until pop-time, to prevent duplicate pushes of the same
neighbor.
In `@crates/adaptive/src/acg_learner.rs`:
- Around line 185-208: The issue is that store_stability is being called before
store_observations, which means if observation storage fails after stability is
marked as converged, the next run will skip observation repair permanently. To
fix this, locate all places where store_stability and store_observations are
called together (including the instances at lines 223-229 and 235-237 mentioned
in the comment), and reorder these calls so that store_observations is always
called before store_stability. This ensures that if observation storage fails,
the profile won't yet be marked as converged and can be retried on the next run.
In `@python/nemo_relay/_native.pyi`:
- Around line 1235-1236: The DriftDetector.update method stub currently accepts
Sequence[float] for the centroid parameter, but the native binding requires a
fixed 3-element array. Change the centroid parameter type from Sequence[float]
to a fixed-size type representation (such as a tuple of exactly three floats) to
match the native binding's expectations. Additionally, update the docstring to
explicitly clarify that the centroid must be a 3-dimensional coordinate with
exactly three float values.
---
Outside diff comments:
In `@crates/adaptive/tests/unit/storage_tests.rs`:
- Around line 82-94: In the round-trip test that uses the sample_stability
fixture, add an assertion to verify that loaded_stability.converged equals the
value set in the sample_stability function (which is false). This assertion
should be added alongside the existing assertions for stable_prefix_length and
total_observations to ensure the converged field is properly persisted and
loaded during the serialization round-trip.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 8b2fd9ec-2173-410e-a7fa-73fc991c831b
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (66)
.pre-commit-config.yamlATTRIBUTIONS-Rust.mdCargo.tomlcrates/adaptive-topology/Cargo.tomlcrates/adaptive-topology/README.mdcrates/adaptive-topology/src/convergence.rscrates/adaptive-topology/src/drift.rscrates/adaptive-topology/src/geometry.rscrates/adaptive-topology/src/governor.rscrates/adaptive-topology/src/lib.rscrates/adaptive-topology/src/manifold.rscrates/adaptive-topology/src/topology.rscrates/adaptive/Cargo.tomlcrates/adaptive/README.mdcrates/adaptive/benches/convergence_bench.rscrates/adaptive/src/acg/stability.rscrates/adaptive/src/acg_learner.rscrates/adaptive/src/adaptive_hints_intercept.rscrates/adaptive/src/config.rscrates/adaptive/src/lib.rscrates/adaptive/src/plugin_component.rscrates/adaptive/src/runtime/features.rscrates/adaptive/src/runtime/validation.rscrates/adaptive/src/tool_parallelism_learner.rscrates/adaptive/tests/integration/runtime_integration_tests.rscrates/adaptive/tests/integration/tool_parallelism_plan_tests.rscrates/adaptive/tests/integration/topology_convergence_tests.rscrates/adaptive/tests/unit/acg/economics_internal_tests.rscrates/adaptive/tests/unit/acg/economics_policy_tests.rscrates/adaptive/tests/unit/acg/multi_breakpoint_tests.rscrates/adaptive/tests/unit/acg_component_tests.rscrates/adaptive/tests/unit/adaptive_hints_intercept_tests.rscrates/adaptive/tests/unit/cache_diagnostics_tests.rscrates/adaptive/tests/unit/config_tests.rscrates/adaptive/tests/unit/intercepts_tests.rscrates/adaptive/tests/unit/plugin_component_tests.rscrates/adaptive/tests/unit/runtime_features_tests.rscrates/adaptive/tests/unit/runtime_tests.rscrates/adaptive/tests/unit/storage_memory_internal_tests.rscrates/adaptive/tests/unit/storage_tests.rscrates/adaptive/tests/unit/tool_parallelism_learner_tests.rscrates/adaptive/tests/unit/types_tests.rscrates/node/adaptive.d.tscrates/node/adaptive.jscrates/node/tests/adaptive_tests.mjscrates/python/Cargo.tomlcrates/python/src/lib.rscrates/python/src/py_adaptive_topology.rscrates/python/tests/coverage/py_storage_coverage_tests.rsdocs/adaptive-plugin/about.mdxdocs/adaptive-plugin/acg.mdxdocs/adaptive-plugin/adaptive-hints.mdxdocs/adaptive-plugin/configuration.mdxgo/nemo_relay/adaptive.gogo/nemo_relay/adaptive/adaptive.gogo/nemo_relay/adaptive_test.gopython/nemo_relay/__init__.pypython/nemo_relay/__init__.pyipython/nemo_relay/_native.pyipython/nemo_relay/adaptive.pypython/nemo_relay/adaptive.pyipython/nemo_relay/adaptive_topology.pypython/nemo_relay/adaptive_topology.pyipython/tests/test_adaptive.pypython/tests/test_adaptive_config.pypython/tests/test_adaptive_topology.py
| pub fn new(epsilon: f64, stability_window: usize) -> Self { | ||
| Self { | ||
| betti_history: RingBuffer::new(), | ||
| drift_history: RingBuffer::new(), | ||
| error_history: RingBuffer::new(), | ||
| stability_window: stability_window.max(MIN_STABILITY_WINDOW), | ||
| epsilon: sanitize_positive(epsilon, DEFAULT_EPSILON), |
There was a problem hiding this comment.
Clamp stability_window to ring-buffer capacity.
At Line 155, only a minimum clamp is applied. If callers pass a value larger than MAX_HISTORY (32), is_betti_stable and is_drift_decreasing can never satisfy their length checks, so topology-based convergence becomes unreachable.
Proposed fix
- stability_window: stability_window.max(MIN_STABILITY_WINDOW),
+ stability_window: stability_window.clamp(MIN_STABILITY_WINDOW, MAX_HISTORY),🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/adaptive-topology/src/convergence.rs` around lines 150 - 156, The
stability_window parameter in the new method is only being clamped to a minimum
value using max(MIN_STABILITY_WINDOW), but is not clamped to a maximum value.
Since the RingBuffer instances (betti_history, drift_history, error_history)
have a fixed capacity of MAX_HISTORY (32), if stability_window exceeds this
capacity, methods like is_betti_stable and is_drift_decreasing will never
satisfy their length checks, making topology-based convergence unreachable.
Apply both minimum and maximum clamping to stability_window by using a method
that clamps it between MIN_STABILITY_WINDOW and MAX_HISTORY.
| self.betti_history.push(betti); | ||
| self.drift_history.push(sanitize_non_negative(drift)); | ||
| self.error_history.push(sanitize_non_negative(error)); | ||
| self.epoch += 1; |
There was a problem hiding this comment.
Prevent epoch wraparound in long-running detectors.
Line 166 uses u32 += 1. This can overflow and wrap, which can break downstream epoch-based gating and diagnostics.
Proposed fix
- self.epoch += 1;
+ self.epoch = self.epoch.saturating_add(1);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/adaptive-topology/src/convergence.rs` at line 166, The `epoch` field
is typed as `u32` and the increment operation on line 166 (self.epoch += 1) can
cause integer overflow and wraparound in long-running detectors, breaking
epoch-based gating and diagnostics. Change the type of the `epoch` field from
`u32` to `u64` throughout the convergence module to prevent wraparound and
ensure reliable epoch tracking for the lifetime of the detector.
| pub fn pruning_ratio(&self, active_mask: &[bool; MAX_BLOCKS]) -> f64 { | ||
| if self.counts[0] == 0 { | ||
| return 0.0; | ||
| } | ||
| let active = active_mask.iter().filter(|&&x| x).count(); | ||
| 1.0 - (active as f64 / self.counts[0] as f64) |
There was a problem hiding this comment.
Restrict pruning math to populated level-0 blocks.
At Line 256, active is counted across all MAX_BLOCKS. For caller-provided masks, true entries beyond self.counts[0] can produce invalid ratios (including negatives). Count only the tracked prefix.
Proposed fix
- let active = active_mask.iter().filter(|&&x| x).count();
+ let active = active_mask[..self.counts[0]]
+ .iter()
+ .filter(|&&x| x)
+ .count();📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| pub fn pruning_ratio(&self, active_mask: &[bool; MAX_BLOCKS]) -> f64 { | |
| if self.counts[0] == 0 { | |
| return 0.0; | |
| } | |
| let active = active_mask.iter().filter(|&&x| x).count(); | |
| 1.0 - (active as f64 / self.counts[0] as f64) | |
| pub fn pruning_ratio(&self, active_mask: &[bool; MAX_BLOCKS]) -> f64 { | |
| if self.counts[0] == 0 { | |
| return 0.0; | |
| } | |
| let active = active_mask[..self.counts[0]] | |
| .iter() | |
| .filter(|&&x| x) | |
| .count(); | |
| 1.0 - (active as f64 / self.counts[0] as f64) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/adaptive-topology/src/geometry.rs` around lines 252 - 257, The
pruning_ratio method is counting true values across the entire active_mask
array, but only the first self.counts[0] entries represent actual populated
level-0 blocks. When the mask contains true entries beyond self.counts[0], this
causes the calculation to produce invalid ratios. Restrict the active count
calculation to only the first self.counts[0] elements of active_mask by using
slice notation to iterate over active_mask[..self.counts[0]] instead of the full
active_mask.
| /// Panics if `D * tau` exceeds `EMBED_BUFFER_CAPACITY`. This is a | ||
| /// programming error, not a runtime failure. | ||
| pub fn new(tau: usize) -> Self { | ||
| let tau = if tau == 0 { DEFAULT_TAU } else { tau }; | ||
| assert!( | ||
| D * tau <= EMBED_BUFFER_CAPACITY, | ||
| "TimeDelayEmbedder D * tau ({}) exceeds EMBED_BUFFER_CAPACITY ({})", | ||
| D * tau, | ||
| EMBED_BUFFER_CAPACITY | ||
| ); |
There was a problem hiding this comment.
Embed readiness threshold is too strict for tau > 1.
Line 121 uses D * tau, but the embedding definition in Line 74 only needs samples through x(t-(D-1)tau), i.e. (D - 1) * tau + 1. This delays valid embeddings and over-restricts constructor validation.
Proposed fix
- assert!(
- D * tau <= EMBED_BUFFER_CAPACITY,
- "TimeDelayEmbedder D * tau ({}) exceeds EMBED_BUFFER_CAPACITY ({})",
- D * tau,
- EMBED_BUFFER_CAPACITY
- );
+ let required = if D == 0 {
+ 0
+ } else {
+ (D - 1).saturating_mul(tau).saturating_add(1)
+ };
+ assert!(
+ required <= EMBED_BUFFER_CAPACITY,
+ "TimeDelayEmbedder required samples ({}) exceed EMBED_BUFFER_CAPACITY ({})",
+ required,
+ EMBED_BUFFER_CAPACITY
+ );
@@
- let required = D * self.tau;
+ let required = if D == 0 {
+ 0
+ } else {
+ (D - 1).saturating_mul(self.tau).saturating_add(1)
+ };
if self.buffer_len < required {
return None;
}Also applies to: 121-123
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/adaptive-topology/src/manifold.rs` around lines 91 - 100, The
assertion in the `new` method uses `D * tau` as the validation threshold, but
the actual embedding definition only requires `(D - 1) * tau + 1` samples.
Update the assertion logic to use the correct formula `(D - 1) * tau + 1`
instead of `D * tau`, and also apply the same correction to the embed readiness
threshold check around lines 121-123 to ensure consistency and avoid
unnecessarily delaying valid embeddings.
da2470a to
17488c8
Compare
Signed-off-by: teerth sharma <teerths57@gmail.com>
17488c8 to
24ed6f2
Compare
There was a problem hiding this comment.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/adaptive/tests/unit/storage_tests.rs (1)
243-253: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick winAssert the new
convergedfield in the stability round-trip test.Line 244 loads the full stability record, but the test never checks
converged. Add an explicit assertion so storage regressions on this field are caught.Suggested diff
assert_eq!(loaded_stability.stable_prefix_length, 1); assert_eq!(loaded_stability.total_observations, 3); + assert!(!loaded_stability.converged);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/adaptive/tests/unit/storage_tests.rs` around lines 243 - 253, The stability round-trip test loads the full stability record into the loaded_stability variable but does not assert the converged field, which could allow storage regressions on this field to go undetected. Add an explicit assertion immediately after the existing stability assertions (after the assertion for loaded_stability.total_observations) to verify the converged field has the expected value, ensuring the field is properly persisted and retrieved during the round-trip.
♻️ Duplicate comments (2)
crates/adaptive-topology/src/manifold.rs (2)
91-100:⚠️ Potential issue | 🟠 Major | ⚡ Quick winUse the mathematically correct embedding readiness threshold.
Line 91 and Line 121 use
D * tau, but the embedding in Line 74 ([x(t), x(t-tau), ..., x(t-(D-1)tau)]) only requires(D - 1) * tau + 1samples. Current logic rejects valid embeddings fortau > 1.Proposed fix
pub fn new(tau: usize) -> Self { let tau = if tau == 0 { DEFAULT_TAU } else { tau }; + let required = if D == 0 { + 0 + } else { + (D - 1).saturating_mul(tau).saturating_add(1) + }; assert!( - D * tau <= EMBED_BUFFER_CAPACITY, - "TimeDelayEmbedder D * tau ({}) exceeds EMBED_BUFFER_CAPACITY ({})", - D * tau, + required <= EMBED_BUFFER_CAPACITY, + "TimeDelayEmbedder required samples ({}) exceed EMBED_BUFFER_CAPACITY ({})", + required, EMBED_BUFFER_CAPACITY ); @@ pub fn embed(&self) -> Option<ManifoldPoint<D>> { - let required = D * self.tau; + let required = if D == 0 { + 0 + } else { + (D - 1).saturating_mul(self.tau).saturating_add(1) + }; if self.buffer_len < required { return None; }Also applies to: 121-123
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/adaptive-topology/src/manifold.rs` around lines 91 - 100, The embedding readiness threshold is mathematically incorrect. The embedding pattern requires (D - 1) * tau + 1 samples, not D * tau. Update the assertion in the new method (where tau is validated) to use (D - 1) * tau + 1 instead of D * tau. Also apply the same fix to line 121-123 which has the same incorrect threshold check. This correction will allow valid embeddings to proceed instead of rejecting them unnecessarily when tau is greater than 1.
225-246:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMark vertices visited at push-time in DFS to avoid duplicate stack entries.
In Line 239-Line 246, neighbors are marked visited only after pop. That allows duplicate pushes for the same vertex, which can consume the fixed stack and skip reachable vertices, producing incorrect
β₀.Proposed fix
components += 1; let mut stack = [0usize; MAX_GRAPH_POINTS]; let mut stack_top = 1; stack[0] = start; + visited[start] = true; while stack_top > 0 { stack_top -= 1; let current = stack[stack_top]; - if visited[current] { - continue; - } - visited[current] = true; - - for (neighbor, is_visited) in visited.iter().enumerate().take(self.point_count) { - if !is_visited - && self.are_neighbors(current, neighbor) - && stack_top < MAX_GRAPH_POINTS - { + for neighbor in 0..self.point_count { + if !visited[neighbor] && self.are_neighbors(current, neighbor) { + visited[neighbor] = true; stack[stack_top] = neighbor; stack_top += 1; } } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/adaptive-topology/src/manifold.rs` around lines 225 - 246, The DFS traversal marks vertices as visited only after they are popped from the stack (when visited[current] is set to true), which allows the same vertex to be pushed onto the stack multiple times before it gets processed. This causes duplicate stack entries that can overflow the fixed-size stack and skip reachable vertices, resulting in incorrect component counting. Move the visited marking to happen when pushing neighbors onto the stack rather than when popping them. In the neighbor iteration loop where neighbors are added to the stack (around lines 239-246), set visited[neighbor] = true immediately before pushing the neighbor onto the stack to prevent duplicate pushes of the same vertex.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/adaptive-topology/README.md`:
- Around line 33-35: The code example block in the README.md file starts
immediately after the "## Example" heading without a complete-sentence
introduction. Add a descriptive sentence between the heading and the opening
triple backticks that explains what the code example demonstrates, following the
documentation style guideline that requires every code block to be introduced
with a complete sentence.
In `@crates/adaptive-topology/src/convergence.rs`:
- Around line 237-239: In the betti_score calculation within the
convergence_score method, the variations count is being normalized by
self.stability_window when it should be normalized by the actual number of
transitions. Since variations are counted from window[..count].windows(2), which
produces count - 1 adjacent pairs, change the denominator in the betti_score
assignment from self.stability_window to (count - 1) to correctly normalize the
variation ratio.
In `@crates/adaptive-topology/src/drift.rs`:
- Around line 40-60: Add a validation check at the beginning of the update
method to ensure all values in the centroid array are finite (not NaN or Inf).
If any non-finite value is detected, return early with a drift value of 0.0 to
prevent non-finite values from being written into the internal state fields
(previous, expected, velocity). This guards against NaN values persisting in the
drift and velocity calculations, which would disable meaningful drift checks.
In `@crates/adaptive/tests/unit/adaptive_hints_intercept_tests.rs`:
- Around line 361-362: The current assertion only verifies that the agent hints
header is absent from the request, but it does not verify that the agent hints
data is also absent from the request body. To harden the test, add a negative
assertion that checks the request body does not contain the nvext.agent_hints
field after the existing header check on line 361. This ensures that if a
regression accidentally injects nvext.agent_hints in the body while skipping the
header, the test will catch it and fail as expected.
In `@crates/adaptive/tests/unit/runtime_tests.rs`:
- Around line 322-375: The test
validate_config_reports_invalid_topology_numeric_fields should verify
field-level diagnostics for all invalid fields within components, not just
component-level diagnostics. Currently it checks for component paths like
"adaptive_hints.governor" and "acg.convergence", but it should also assert that
diagnostics exist for specific invalid fields within those components. For the
acg.convergence and convergence components which each have multiple invalid
fields (epsilon and stability_window), add additional assertions to the
report.diagnostics iteration to verify that field-level diagnostics are reported
for paths like "acg.convergence.epsilon", "acg.convergence.stability_window",
"convergence.epsilon", and "convergence.stability_window" to ensure all invalid
numeric fields are being validated and prevent silent validator regressions.
In `@docs/adaptive-plugin/about.mdx`:
- Around line 39-40: The documentation uses inconsistent terminology for the
same concept: "topology-aware" appears in one location while "topology-inspired"
appears in another location (in the line mentioning "topology-inspired
signals"). Standardize on a single term throughout the document by identifying
all instances of both "topology-aware" and "topology-inspired" and replacing
them with one consistent term. Ensure the chosen term is applied uniformly
across the entire file to maintain clarity and avoid ambiguity for readers.
In `@docs/adaptive-plugin/configuration.mdx`:
- Around line 221-233: The Rust example for tool_parallelism configuration uses
ToolParallelismComponentConfig::default() implicitly, while the TOML, Python,
and Node examples all explicitly demonstrate the drift configuration. Update the
line setting adaptive.tool_parallelism to explicitly define the drift field
configuration instead of relying on the default() method, ensuring the Rust
example maintains parity with the other language examples in the documentation.
---
Outside diff comments:
In `@crates/adaptive/tests/unit/storage_tests.rs`:
- Around line 243-253: The stability round-trip test loads the full stability
record into the loaded_stability variable but does not assert the converged
field, which could allow storage regressions on this field to go undetected. Add
an explicit assertion immediately after the existing stability assertions (after
the assertion for loaded_stability.total_observations) to verify the converged
field has the expected value, ensuring the field is properly persisted and
retrieved during the round-trip.
---
Duplicate comments:
In `@crates/adaptive-topology/src/manifold.rs`:
- Around line 91-100: The embedding readiness threshold is mathematically
incorrect. The embedding pattern requires (D - 1) * tau + 1 samples, not D *
tau. Update the assertion in the new method (where tau is validated) to use (D -
1) * tau + 1 instead of D * tau. Also apply the same fix to line 121-123 which
has the same incorrect threshold check. This correction will allow valid
embeddings to proceed instead of rejecting them unnecessarily when tau is
greater than 1.
- Around line 225-246: The DFS traversal marks vertices as visited only after
they are popped from the stack (when visited[current] is set to true), which
allows the same vertex to be pushed onto the stack multiple times before it gets
processed. This causes duplicate stack entries that can overflow the fixed-size
stack and skip reachable vertices, resulting in incorrect component counting.
Move the visited marking to happen when pushing neighbors onto the stack rather
than when popping them. In the neighbor iteration loop where neighbors are added
to the stack (around lines 239-246), set visited[neighbor] = true immediately
before pushing the neighbor onto the stack to prevent duplicate pushes of the
same vertex.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: ccb1a467-92d2-498a-ae76-c078c7204100
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (66)
.pre-commit-config.yamlATTRIBUTIONS-Rust.mdCargo.tomlcrates/adaptive-topology/Cargo.tomlcrates/adaptive-topology/README.mdcrates/adaptive-topology/src/convergence.rscrates/adaptive-topology/src/drift.rscrates/adaptive-topology/src/geometry.rscrates/adaptive-topology/src/governor.rscrates/adaptive-topology/src/lib.rscrates/adaptive-topology/src/manifold.rscrates/adaptive-topology/src/topology.rscrates/adaptive/Cargo.tomlcrates/adaptive/README.mdcrates/adaptive/benches/convergence_bench.rscrates/adaptive/src/acg/stability.rscrates/adaptive/src/acg_learner.rscrates/adaptive/src/adaptive_hints_intercept.rscrates/adaptive/src/config.rscrates/adaptive/src/lib.rscrates/adaptive/src/plugin_component.rscrates/adaptive/src/runtime/features.rscrates/adaptive/src/runtime/validation.rscrates/adaptive/src/tool_parallelism_learner.rscrates/adaptive/tests/integration/runtime_integration_tests.rscrates/adaptive/tests/integration/tool_parallelism_plan_tests.rscrates/adaptive/tests/integration/topology_convergence_tests.rscrates/adaptive/tests/unit/acg/economics_internal_tests.rscrates/adaptive/tests/unit/acg/economics_policy_tests.rscrates/adaptive/tests/unit/acg/multi_breakpoint_tests.rscrates/adaptive/tests/unit/acg_component_tests.rscrates/adaptive/tests/unit/adaptive_hints_intercept_tests.rscrates/adaptive/tests/unit/cache_diagnostics_tests.rscrates/adaptive/tests/unit/config_tests.rscrates/adaptive/tests/unit/intercepts_tests.rscrates/adaptive/tests/unit/plugin_component_tests.rscrates/adaptive/tests/unit/runtime_features_tests.rscrates/adaptive/tests/unit/runtime_tests.rscrates/adaptive/tests/unit/storage_memory_internal_tests.rscrates/adaptive/tests/unit/storage_tests.rscrates/adaptive/tests/unit/tool_parallelism_learner_tests.rscrates/adaptive/tests/unit/types_tests.rscrates/node/adaptive.d.tscrates/node/adaptive.jscrates/node/tests/adaptive_tests.mjscrates/python/Cargo.tomlcrates/python/src/lib.rscrates/python/src/py_adaptive_topology.rscrates/python/tests/coverage/py_storage_coverage_tests.rsdocs/adaptive-plugin/about.mdxdocs/adaptive-plugin/acg.mdxdocs/adaptive-plugin/adaptive-hints.mdxdocs/adaptive-plugin/configuration.mdxgo/nemo_relay/adaptive.gogo/nemo_relay/adaptive/adaptive.gogo/nemo_relay/adaptive_test.gopython/nemo_relay/__init__.pypython/nemo_relay/__init__.pyipython/nemo_relay/_native.pyipython/nemo_relay/adaptive.pypython/nemo_relay/adaptive.pyipython/nemo_relay/adaptive_topology.pypython/nemo_relay/adaptive_topology.pyipython/tests/test_adaptive.pypython/tests/test_adaptive_config.pypython/tests/test_adaptive_topology.py
| let variations = window[..count].windows(2).filter(|w| w[0] != w[1]).count(); | ||
| let betti_score = 1.0 - (variations as f64 / self.stability_window as f64); | ||
| score += BETTI_STABILITY_WEIGHT * betti_score; |
There was a problem hiding this comment.
Normalize Betti variation by transition count, not window length.
Line [238] divides by self.stability_window, but variations are counted across adjacent pairs (count - 1 transitions). This systematically overstates Betti stability in convergence_score.
Suggested fix
let variations = window[..count].windows(2).filter(|w| w[0] != w[1]).count();
-let betti_score = 1.0 - (variations as f64 / self.stability_window as f64);
+let transitions = count.saturating_sub(1).max(1);
+let betti_score = 1.0 - (variations as f64 / transitions as f64);
score += BETTI_STABILITY_WEIGHT * betti_score;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/adaptive-topology/src/convergence.rs` around lines 237 - 239, In the
betti_score calculation within the convergence_score method, the variations
count is being normalized by self.stability_window when it should be normalized
by the actual number of transitions. Since variations are counted from
window[..count].windows(2), which produces count - 1 adjacent pairs, change the
denominator in the betti_score assignment from self.stability_window to (count -
1) to correctly normalize the variation ratio.
| pub fn update(&mut self, centroid: &[f64; D]) -> f64 { | ||
| let drift = if self.has_previous { | ||
| l2_distance(&self.expected, centroid) | ||
| } else { | ||
| 0.0 | ||
| }; | ||
|
|
||
| if self.has_previous { | ||
| for (d, vel) in self.velocity.iter_mut().enumerate().take(D) { | ||
| *vel = centroid[d] - self.previous[d]; | ||
| } | ||
| } | ||
|
|
||
| for (d, exp) in self.expected.iter_mut().enumerate().take(D) { | ||
| *exp = centroid[d] + self.velocity[d]; | ||
| } | ||
|
|
||
| self.previous = *centroid; | ||
| self.has_previous = true; | ||
|
|
||
| drift |
There was a problem hiding this comment.
Guard update against non-finite centroids to prevent persistent NaN state.
At Line [40], non-finite centroid values are accepted and then written into previous/expected/velocity. After that, drift and velocity calculations can stay NaN and effectively disable meaningful drift checks.
Suggested fix
pub fn update(&mut self, centroid: &[f64; D]) -> f64 {
+ if centroid.iter().any(|coord| !coord.is_finite()) {
+ self.reset();
+ return f64::INFINITY;
+ }
+
let drift = if self.has_previous {
l2_distance(&self.expected, centroid)
} else {
0.0
};🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/adaptive-topology/src/drift.rs` around lines 40 - 60, Add a validation
check at the beginning of the update method to ensure all values in the centroid
array are finite (not NaN or Inf). If any non-finite value is detected, return
early with a drift value of 0.0 to prevent non-finite values from being written
into the internal state fields (previous, expected, velocity). This guards
against NaN values persisting in the drift and velocity calculations, which
would disable meaningful drift checks.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Teerth Sharma <teerths57@gmail.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/adaptive-plugin/about.mdx`:
- Around line 39-40: The file docs/adaptive-plugin/about.mdx is missing the
required SPDX license header at the top. Add the SPDX header using JSX comment
delimiters {/* ... */} at the very beginning of the file before the YAML
frontmatter. The header should include the SPDX-FileCopyrightText line with the
copyright notice and the SPDX-License-Identifier line set to Apache-2.0, wrapped
in the JSX comment syntax as specified in the review comment.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 289cbccd-452e-4a63-9da2-9d593edfa1a4
📒 Files selected for processing (1)
docs/adaptive-plugin/about.mdx
📜 Review details
🧰 Additional context used
📓 Path-based instructions (12)
{docs/**,README.md,CONTRIBUTING.md}
📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)
{docs/**,README.md,CONTRIBUTING.md}: For docs-only changes, run targeted checks only if commands, package names, or examples changed. Usejust docsfor docs-site builds andjust docs-linkcheckwhen links changed
Run docs site build withjust docs
Files:
docs/adaptive-plugin/about.mdx
{docs/**,README.md,CONTRIBUTING.md,**/*.md}
📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)
Run docs link validation with
just docs-linkcheckwhen links change
Files:
docs/adaptive-plugin/about.mdx
{docs/**,README.md}
📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)
Verify README and docs entry points still match current package names and paths for large or public-facing changes
Files:
docs/adaptive-plugin/about.mdx
{docs/**,examples/**,README.md}
📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)
Verify examples still run with documented commands for large or public-facing changes
Files:
docs/adaptive-plugin/about.mdx
{docs/**,README.md,**/Cargo.toml,**/package.json,**/*.md}
📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)
Ensure renamed public surfaces are reflected consistently in manifests and docs for large or public-facing changes
Files:
docs/adaptive-plugin/about.mdx
**/*.{md,mdx,py,sh,yaml,yml,toml,json}
📄 CodeRabbit inference engine (.agents/skills/contribute-docs/SKILL.md)
Keep package names, repo references, and build commands current
Files:
docs/adaptive-plugin/about.mdx
**/*.mdx
📄 CodeRabbit inference engine (.agents/skills/contribute-docs/SKILL.md)
In MDX files, top-of-file comments must use JSX comment delimiters: {/* to open and */} to close. Do not use HTML comments for MDX SPDX headers.
MDX top-of-file SPDX comments must use {/* ... */} delimiters instead of HTML comment delimiters (Must-Fix)
Files:
docs/adaptive-plugin/about.mdx
**/*.{html,md,mdx}
📄 CodeRabbit inference engine (CONTRIBUTING.md)
Include SPDX license header in HTML and Markdown files using HTML comment syntax
Files:
docs/adaptive-plugin/about.mdx
docs/**/*.{md,mdx}
📄 CodeRabbit inference engine (CONTRIBUTING.md)
Update embedded documentation snippets, patch docs, and binding-support notes if examples or supported bindings changed
Files:
docs/adaptive-plugin/about.mdx
docs/**
📄 CodeRabbit inference engine (CONTRIBUTING.md)
Run
just docsor./scripts/build-docs.sh htmlto regenerate ignored Fern API reference pages before validation for documentation site changes
Files:
docs/adaptive-plugin/about.mdx
{docs/**,README.md,CONTRIBUTING.md,RELEASING.md,SECURITY.md}
⚙️ CodeRabbit configuration file
{docs/**,README.md,CONTRIBUTING.md,RELEASING.md,SECURITY.md}: Review documentation for technical accuracy against the current API, command correctness, and consistency across language bindings.
Flag stale examples, missing SPDX headers where required, and instructions that no longer match CI or pre-commit behavior.
Files:
docs/adaptive-plugin/about.mdx
**
⚙️ CodeRabbit configuration file
**:AGENTS.md
This file provides guidance to agents, including Claude Code and OpenAI Codex, when working in this repository.
Project Overview
NeMo Relay is a multi-language agent runtime framework for execution scopes, lifecycle events, middleware, plugins, and observability around tool and LLM calls. The core runtime is Rust. Primary supported bindings are Rust, Python, and Node.js. Go, WebAssembly, and the raw C FFI are experimental and source-first.
The shared runtime model is:
- Scope stacks decide where work belongs and which scope-local behavior is visible.
- Middleware registries decide what guardrails and intercepts run around managed calls.
- Plugins install reusable runtime behavior from configuration.
- Events record runtime behavior in ATOF form.
- Subscribers and exporters consume events in-process or export them to ATIF, OpenTelemetry, OpenInference, or other backends.
Repository Structure
The repository layout separates the Rust runtime, language bindings, documentation,
integration patches, and agent-facing skills.crates/ core/ # Rust core runtime crate, published as nemo-relay adaptive/ # Adaptive runtime primitives and plugin components python/ # PyO3 native extension for the Python package ffi/ # Raw C ABI layer used by downstream bindings such as Go node/ # NAPI Node.js binding and JavaScript/TypeScript entry points wasm/ # wasm-bindgen WebAssembly binding and JS wrappers python/ nemo_relay/ # Python wrapper package: scopes, tools, LLM, middleware, typed helpers, plugins, adaptive helpers tests/ # Python tests go/ nemo_relay/ # Experimental Go CGo binding and tests fern/ # Fern documentation site scripts/ # Stable wrappers and helper scripts; build/test/docs entry points live in justfile third_party/ # P...
Files:
docs/adaptive-plugin/about.mdx
🔇 Additional comments (2)
docs/adaptive-plugin/about.mdx (2)
39-40: Terminology consistency already addressed.The past review flagged inconsistency between "topology-aware" and "topology-inspired" at line 39. The current code correctly uses "topology-aware signals stabilize," which aligns with terminology used earlier in the document and throughout the adaptive plugin docs. This concern has been resolved.
39-40: New documentation bullets are technically accurate and well-motivated.
- Line 39: "Stop adaptive learning once observed topology-aware signals stabilize" directly reflects convergence detection via Betti-number stability (convergence detector in PR).
- Line 40: "Invalidate stale tool parallelism plans when cohort shape drifts" aligns with drift-based plan invalidation behavior implemented in tool-parallelism learner.
Both bullets are clear, actionable, and grounded in the topology-aware adaptive feature set.
willkill07
left a comment
There was a problem hiding this comment.
Without a design document and clear measurable and observable improvements, I am extremely reluctant to spend the time to review this PR.
Additionally:
- no tests should be embedded in source files
- no new crates should be added if the behavior is internal only (which it sounds like adaptive-topology is
Signed-off-by: Teerth Sharma <teerths57@gmail.com>
|
Addressed the review concerns in
Validation run locally:
|
Signed-off-by: Teerth Sharma <teerths57@gmail.com>
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/adaptive/src/tool_parallelism_learner.rs (1)
73-75:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winDrift invalidation should use an inclusive threshold.
At Line 74, using
>misses invalidation when drift is exactly at threshold.Suggested fix
- Ok(drift > config.threshold) + Ok(drift >= config.threshold)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/adaptive/src/tool_parallelism_learner.rs` around lines 73 - 75, In the return statement within the method that updates drift using detector.update(¢roid), change the comparison operator from > to >= when comparing drift against config.threshold. This ensures that invalidation occurs when drift equals the threshold value exactly, not just when it exceeds it. Update the condition Ok(drift > config.threshold) to use the inclusive >= operator instead.
♻️ Duplicate comments (1)
crates/adaptive/src/acg_learner.rs (1)
186-206:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPersisting stability before observations can strand profiles in a false converged state.
At Line 227, converged stability is stored before observations (Line 235). If observation storage fails, the next run can hit the converged fast-path at Line 186 and skip repair permanently.
Suggested fix
- if existing_stability + if existing_stability .as_ref() .map(|stability| stability.converged) .unwrap_or(false) + && existing.is_some() { @@ - if self.record_stability_epoch(&profile_key, &stability_result)? { - stability_result.converged = true; - } - - backend - .store_stability(&profile_key, &stability_result) - .await?; + let converged_now = self.record_stability_epoch(&profile_key, &stability_result)?; @@ backend .store_observations(&profile_key, &observations_vec) .await?; + + if converged_now { + stability_result.converged = true; + } + backend + .store_stability(&profile_key, &stability_result) + .await?;Also applies to: 226-236
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/adaptive/src/acg_learner.rs` around lines 186 - 206, The issue is that stability state is being persisted before observations are persisted, creating a race condition where if observation persistence fails, the converged flag in stability remains true, causing the fast-path at the beginning of the function (checking if existing_stability is converged) to incorrectly skip repair on the next run. Fix this by reversing the order of persistence operations so that observations are stored first before stability is persisted, ensuring that if observation storage fails, the converged state remains false and repair will be attempted on the next execution.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/adaptive/src/topology.rs`:
- Around line 135-147: The convergence check in the is_converged method
currently uses a short-circuit OR operator that can declare convergence based on
insufficient evidence, and the is_error_converged method only checks if the last
single error value is below epsilon rather than verifying stability over a
configured window. To fix this, modify is_error_converged to check that the last
N error samples in error_history are all consistently below epsilon threshold to
guarantee convergence over a window period, rather than checking only the latest
point. Additionally, change the logic operator in is_converged from OR to AND to
ensure that error convergence, betti stability, and drift decrease all must be
satisfied together before declaring the topology converged.
In `@crates/adaptive/tests/unit/topology_tests.rs`:
- Around line 27-33: The test function error_below_epsilon_converges expects
convergence after only a single low-error epoch, which may not align with the
intended window-based stability check. If the ConvergenceDetector requires a
stability window (indicated by the 3 parameter in ConvergenceDetector::new(0.01,
3)), then the test needs to record multiple consecutive epochs with low error
values before asserting is_converged() returns true. Either call record_epoch
multiple times with the same low-error pattern to fill the window, or change the
assertion to expect false convergence after a single epoch to properly test
window-based behavior.
---
Outside diff comments:
In `@crates/adaptive/src/tool_parallelism_learner.rs`:
- Around line 73-75: In the return statement within the method that updates
drift using detector.update(¢roid), change the comparison operator from > to
>= when comparing drift against config.threshold. This ensures that invalidation
occurs when drift equals the threshold value exactly, not just when it exceeds
it. Update the condition Ok(drift > config.threshold) to use the inclusive >=
operator instead.
---
Duplicate comments:
In `@crates/adaptive/src/acg_learner.rs`:
- Around line 186-206: The issue is that stability state is being persisted
before observations are persisted, creating a race condition where if
observation persistence fails, the converged flag in stability remains true,
causing the fast-path at the beginning of the function (checking if
existing_stability is converged) to incorrectly skip repair on the next run. Fix
this by reversing the order of persistence operations so that observations are
stored first before stability is persisted, ensuring that if observation storage
fails, the converged state remains false and repair will be attempted on the
next execution.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 2c6e25e2-9b5b-49f7-8fac-0fcddc8494b7
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (11)
crates/adaptive/Cargo.tomlcrates/adaptive/src/acg_learner.rscrates/adaptive/src/adaptive_hints_intercept.rscrates/adaptive/src/lib.rscrates/adaptive/src/tool_parallelism_learner.rscrates/adaptive/src/topology.rscrates/adaptive/tests/unit/topology_tests.rsdocs/adaptive-plugin/about.mdxdocs/adaptive-plugin/acg.mdxdocs/adaptive-plugin/configuration.mdxdocs/adaptive-plugin/topology-aware-controls-design.mdx
💤 Files with no reviewable changes (1)
- crates/adaptive/Cargo.toml
📜 Review details
🧰 Additional context used
📓 Path-based instructions (26)
{docs/**,README.md,CONTRIBUTING.md}
📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)
{docs/**,README.md,CONTRIBUTING.md}: For docs-only changes, run targeted checks only if commands, package names, or examples changed. Usejust docsfor docs-site builds andjust docs-linkcheckwhen links changed
Run docs site build withjust docs
Files:
docs/adaptive-plugin/about.mdxdocs/adaptive-plugin/topology-aware-controls-design.mdxdocs/adaptive-plugin/acg.mdxdocs/adaptive-plugin/configuration.mdx
{docs/**,README.md,CONTRIBUTING.md,**/*.md}
📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)
Run docs link validation with
just docs-linkcheckwhen links change
Files:
docs/adaptive-plugin/about.mdxdocs/adaptive-plugin/topology-aware-controls-design.mdxdocs/adaptive-plugin/acg.mdxdocs/adaptive-plugin/configuration.mdx
{docs/**,README.md}
📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)
Verify README and docs entry points still match current package names and paths for large or public-facing changes
Files:
docs/adaptive-plugin/about.mdxdocs/adaptive-plugin/topology-aware-controls-design.mdxdocs/adaptive-plugin/acg.mdxdocs/adaptive-plugin/configuration.mdx
{docs/**,examples/**,README.md}
📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)
Verify examples still run with documented commands for large or public-facing changes
Files:
docs/adaptive-plugin/about.mdxdocs/adaptive-plugin/topology-aware-controls-design.mdxdocs/adaptive-plugin/acg.mdxdocs/adaptive-plugin/configuration.mdx
{docs/**,README.md,**/Cargo.toml,**/package.json,**/*.md}
📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)
Ensure renamed public surfaces are reflected consistently in manifests and docs for large or public-facing changes
Files:
docs/adaptive-plugin/about.mdxdocs/adaptive-plugin/topology-aware-controls-design.mdxdocs/adaptive-plugin/acg.mdxdocs/adaptive-plugin/configuration.mdx
**/*.{md,mdx,py,sh,yaml,yml,toml,json}
📄 CodeRabbit inference engine (.agents/skills/contribute-docs/SKILL.md)
Keep package names, repo references, and build commands current
Files:
docs/adaptive-plugin/about.mdxdocs/adaptive-plugin/topology-aware-controls-design.mdxdocs/adaptive-plugin/acg.mdxdocs/adaptive-plugin/configuration.mdx
**/*.mdx
📄 CodeRabbit inference engine (.agents/skills/contribute-docs/SKILL.md)
In MDX files, top-of-file comments must use JSX comment delimiters: {/* to open and */} to close. Do not use HTML comments for MDX SPDX headers.
MDX top-of-file SPDX comments must use {/* ... */} delimiters instead of HTML comment delimiters (Must-Fix)
Files:
docs/adaptive-plugin/about.mdxdocs/adaptive-plugin/topology-aware-controls-design.mdxdocs/adaptive-plugin/acg.mdxdocs/adaptive-plugin/configuration.mdx
**/*.{html,md,mdx}
📄 CodeRabbit inference engine (CONTRIBUTING.md)
Include SPDX license header in HTML and Markdown files using HTML comment syntax
Files:
docs/adaptive-plugin/about.mdxdocs/adaptive-plugin/topology-aware-controls-design.mdxdocs/adaptive-plugin/acg.mdxdocs/adaptive-plugin/configuration.mdx
docs/**/*.{md,mdx}
📄 CodeRabbit inference engine (CONTRIBUTING.md)
Update embedded documentation snippets, patch docs, and binding-support notes if examples or supported bindings changed
Files:
docs/adaptive-plugin/about.mdxdocs/adaptive-plugin/topology-aware-controls-design.mdxdocs/adaptive-plugin/acg.mdxdocs/adaptive-plugin/configuration.mdx
docs/**
📄 CodeRabbit inference engine (CONTRIBUTING.md)
Run
just docsor./scripts/build-docs.sh htmlto regenerate ignored Fern API reference pages before validation for documentation site changes
Files:
docs/adaptive-plugin/about.mdxdocs/adaptive-plugin/topology-aware-controls-design.mdxdocs/adaptive-plugin/acg.mdxdocs/adaptive-plugin/configuration.mdx
{docs/**,README.md,CONTRIBUTING.md,RELEASING.md,SECURITY.md}
⚙️ CodeRabbit configuration file
{docs/**,README.md,CONTRIBUTING.md,RELEASING.md,SECURITY.md}: Review documentation for technical accuracy against the current API, command correctness, and consistency across language bindings.
Flag stale examples, missing SPDX headers where required, and instructions that no longer match CI or pre-commit behavior.
Files:
docs/adaptive-plugin/about.mdxdocs/adaptive-plugin/topology-aware-controls-design.mdxdocs/adaptive-plugin/acg.mdxdocs/adaptive-plugin/configuration.mdx
**
⚙️ CodeRabbit configuration file
**:AGENTS.md
This file provides guidance to agents, including Claude Code and OpenAI Codex, when working in this repository.
Project Overview
NeMo Relay is a multi-language agent runtime framework for execution scopes, lifecycle events, middleware, plugins, and observability around tool and LLM calls. The core runtime is Rust. Primary supported bindings are Rust, Python, and Node.js. Go, WebAssembly, and the raw C FFI are experimental and source-first.
The shared runtime model is:
- Scope stacks decide where work belongs and which scope-local behavior is visible.
- Middleware registries decide what guardrails and intercepts run around managed calls.
- Plugins install reusable runtime behavior from configuration.
- Events record runtime behavior in ATOF form.
- Subscribers and exporters consume events in-process or export them to ATIF, OpenTelemetry, OpenInference, or other backends.
Repository Structure
The repository layout separates the Rust runtime, language bindings, documentation,
integration patches, and agent-facing skills.crates/ core/ # Rust core runtime crate, published as nemo-relay adaptive/ # Adaptive runtime primitives and plugin components python/ # PyO3 native extension for the Python package ffi/ # Raw C ABI layer used by downstream bindings such as Go node/ # NAPI Node.js binding and JavaScript/TypeScript entry points wasm/ # wasm-bindgen WebAssembly binding and JS wrappers python/ nemo_relay/ # Python wrapper package: scopes, tools, LLM, middleware, typed helpers, plugins, adaptive helpers tests/ # Python tests go/ nemo_relay/ # Experimental Go CGo binding and tests fern/ # Fern documentation site scripts/ # Stable wrappers and helper scripts; build/test/docs entry points live in justfile third_party/ # P...
Files:
docs/adaptive-plugin/about.mdxdocs/adaptive-plugin/topology-aware-controls-design.mdxcrates/adaptive/tests/unit/topology_tests.rsdocs/adaptive-plugin/acg.mdxdocs/adaptive-plugin/configuration.mdxcrates/adaptive/src/topology.rscrates/adaptive/src/adaptive_hints_intercept.rscrates/adaptive/src/tool_parallelism_learner.rscrates/adaptive/src/lib.rscrates/adaptive/src/acg_learner.rs
**/*.rs
📄 CodeRabbit inference engine (.agents/skills/add-binding-feature/SKILL.md)
Use
snake_casenaming convention for Rust identifiers (e.g.,nemo_relay_tool_call)
**/*.rs: Any Rust change must runjust test-rust
Any Rust change must runcargo fmt --all
Any Rust change must runcargo clippy --workspace --all-targets -- -D warnings
**/*.rs: Runcargo fmt --allfor all FFI work since it is Rust work
Runjust test-rustto validate FFI changes
Runcargo clippy --workspace --all-targets -- -D warningsto enforce strict linting on FFI workWhen Rust files changed as part of Go work, also run
cargo fmt --all,just test-rust, andcargo clippy --workspace --all-targets -- -D warnings
**/*.rs: Runcargo fmt --allwhen Rust files are changed as part of Node work
Runcargo clippy --workspace --all-targets -- -D warningswhen Rust files are changed as part of Node work
Runjust test-rustwhen Rust files are changed as part of Node work
**/*.rs: Runcargo fmt --allto format all Rust code
Runcargo clippy --workspace --all-targets -- -D warningsto enforce all clippy lints as errors
**/*.rs: Runcargo fmt --allwhen Rust files changed as part of WebAssembly work
Runcargo clippy --workspace --all-targets -- -D warningswhen Rust files changed as part of WebAssembly work
**/*.rs: If any Rust code changed, always runjust test-rust
If any Rust code changed, also runcargo fmt --all
If any Rust code changed, also runcargo clippy --workspace --all-targets -- -D warnings
Run Rust formatting withcargo fmt --all
Run Rust linting withcargo clippy --workspace --all-targets -- -D warnings
**/*.rs: Usecargo fmtfor Rust code formatting
Runcargo clippy -- -D warningsto lint Rust code and treat all warnings as errors
Use Rust snake_case naming convention for Rust identifiers
Include SPDX license header in all Rust source files using double-slash comment syntax
Validate Rust code withuv run pre-commit run --all-filesto enforce cargo fmt formatting check, cargo clippy lints, and cargo deny aud...
Files:
crates/adaptive/tests/unit/topology_tests.rscrates/adaptive/src/topology.rscrates/adaptive/src/adaptive_hints_intercept.rscrates/adaptive/src/tool_parallelism_learner.rscrates/adaptive/src/lib.rscrates/adaptive/src/acg_learner.rs
{crates/adaptive/**,python/nemo_relay/adaptive.py,python/nemo_relay/plugin.py,go/nemo_relay/adaptive/**,go/nemo_relay/!(adaptive)/**,**/node/**,**/wasm/**}
📄 CodeRabbit inference engine (.agents/skills/maintain-optimizer/SKILL.md)
Keep adaptive surface in sync across crates/adaptive, shared plugin behavior in core and bindings, Python adaptive/plugin wrappers in python/nemo_relay/adaptive.py and python/nemo_relay/plugin.py, Go adaptive helpers under go/nemo_relay/adaptive plus shared plugin helpers in go/nemo_relay, and Node/WebAssembly adaptive helpers and plugin wrappers
Files:
crates/adaptive/tests/unit/topology_tests.rscrates/adaptive/src/topology.rscrates/adaptive/src/adaptive_hints_intercept.rscrates/adaptive/src/tool_parallelism_learner.rscrates/adaptive/src/lib.rscrates/adaptive/src/acg_learner.rs
{crates/adaptive/**,python/nemo_relay/plugin.py,go/nemo_relay/**,**/node/**,**/wasm/**}
📄 CodeRabbit inference engine (.agents/skills/maintain-optimizer/SKILL.md)
{crates/adaptive/**,python/nemo_relay/plugin.py,go/nemo_relay/**,**/node/**,**/wasm/**}: Maintain consistent plugin lifecycle across all language bindings (Python, Go, Node/WebAssembly, and Rust)
Keep plugin context surfaces aligned across all language implementations
Files:
crates/adaptive/tests/unit/topology_tests.rscrates/adaptive/src/topology.rscrates/adaptive/src/adaptive_hints_intercept.rscrates/adaptive/src/tool_parallelism_learner.rscrates/adaptive/src/lib.rscrates/adaptive/src/acg_learner.rs
{crates/adaptive/**/*.rs,**/*test*.{rs,py,go,ts,js},**/*adaptive*test*.{rs,py,go,ts,js},docs/plugins/adaptive/**}
📄 CodeRabbit inference engine (.agents/skills/maintain-optimizer/SKILL.md)
Maintain documented and tested validation and report behavior for adaptive surfaces
Files:
crates/adaptive/tests/unit/topology_tests.rscrates/adaptive/src/topology.rscrates/adaptive/src/adaptive_hints_intercept.rscrates/adaptive/src/tool_parallelism_learner.rscrates/adaptive/src/lib.rscrates/adaptive/src/acg_learner.rs
**/{Cargo.toml,**/*.rs}
📄 CodeRabbit inference engine (.agents/skills/maintain-packaging/SKILL.md)
Maintain consistency between Rust package names in
Cargo.tomland their actual usage across the codebase
Files:
crates/adaptive/tests/unit/topology_tests.rscrates/adaptive/src/topology.rscrates/adaptive/src/adaptive_hints_intercept.rscrates/adaptive/src/tool_parallelism_learner.rscrates/adaptive/src/lib.rscrates/adaptive/src/acg_learner.rs
**/*.{h,hpp,c,cpp,rs}
📄 CodeRabbit inference engine (.agents/skills/maintain-packaging/SKILL.md)
Ensure FFI header and library naming follows consistent conventions across platform-specific builds
Files:
crates/adaptive/tests/unit/topology_tests.rscrates/adaptive/src/topology.rscrates/adaptive/src/adaptive_hints_intercept.rscrates/adaptive/src/tool_parallelism_learner.rscrates/adaptive/src/lib.rscrates/adaptive/src/acg_learner.rs
{crates/core,crates/adaptive}/**/*
📄 CodeRabbit inference engine (.agents/skills/prepare-pr/SKILL.md)
Changes to
crates/coreorcrates/adaptivemust run the full language matrix
Files:
crates/adaptive/tests/unit/topology_tests.rscrates/adaptive/src/topology.rscrates/adaptive/src/adaptive_hints_intercept.rscrates/adaptive/src/tool_parallelism_learner.rscrates/adaptive/src/lib.rscrates/adaptive/src/acg_learner.rs
**/*.{rs,toml}
📄 CodeRabbit inference engine (.agents/skills/rename-surfaces/SKILL.md)
Update Rust crate names and module prefixes during coordinated rename operations
Files:
crates/adaptive/tests/unit/topology_tests.rscrates/adaptive/src/topology.rscrates/adaptive/src/adaptive_hints_intercept.rscrates/adaptive/src/tool_parallelism_learner.rscrates/adaptive/src/lib.rscrates/adaptive/src/acg_learner.rs
crates/{core,adaptive}/**
📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)
If
crates/coreorcrates/adaptivechanged, run the full matrix across Rust, Python, Go, Node.js, and WebAssembly
Files:
crates/adaptive/tests/unit/topology_tests.rscrates/adaptive/src/topology.rscrates/adaptive/src/adaptive_hints_intercept.rscrates/adaptive/src/tool_parallelism_learner.rscrates/adaptive/src/lib.rscrates/adaptive/src/acg_learner.rs
**/*.{rs,py,js,ts,tsx,jsx,go,sh,toml,yaml,yml,md}
📄 CodeRabbit inference engine (AGENTS.md)
Keep SPDX headers on source, docs, scripts, and configuration files. The project is Apache-2.0.
Files:
crates/adaptive/tests/unit/topology_tests.rscrates/adaptive/src/topology.rscrates/adaptive/src/adaptive_hints_intercept.rscrates/adaptive/src/tool_parallelism_learner.rscrates/adaptive/src/lib.rscrates/adaptive/src/acg_learner.rs
**/*.{rs,py,go,js,ts,tsx}
📄 CodeRabbit inference engine (AGENTS.md)
Follow binding naming conventions: Rust and Python use
snake_case, C FFI exports prefixednemo_relay_, Go usesPascalCasefor public APIs, Node.js usescamelCase.
Files:
crates/adaptive/tests/unit/topology_tests.rscrates/adaptive/src/topology.rscrates/adaptive/src/adaptive_hints_intercept.rscrates/adaptive/src/tool_parallelism_learner.rscrates/adaptive/src/lib.rscrates/adaptive/src/acg_learner.rs
crates/**/*.rs
📄 CodeRabbit inference engine (AGENTS.md)
crates/**/*.rs: Keep async behavior on the existing tokio-based model. Bindings should preserve callback and future lifetimes rather than blocking or hiding async work unexpectedly.
UseJson = serde_json::Valuein Rust-facing runtime APIs for JSON payload handling.
Files:
crates/adaptive/tests/unit/topology_tests.rscrates/adaptive/src/topology.rscrates/adaptive/src/adaptive_hints_intercept.rscrates/adaptive/src/tool_parallelism_learner.rscrates/adaptive/src/lib.rscrates/adaptive/src/acg_learner.rs
crates/{core,adaptive}/**/*.rs
⚙️ CodeRabbit configuration file
crates/{core,adaptive}/**/*.rs: Review the Rust runtime for async correctness, scope isolation, middleware ordering, and event lifecycle regressions.
Pay close attention to task-local/thread-local scope propagation, callback lifetimes, stream finalization, and root_uuid isolation.
Public API changes should preserve existing behavior unless tests and docs show the intended migration path.
Files:
crates/adaptive/tests/unit/topology_tests.rscrates/adaptive/src/topology.rscrates/adaptive/src/adaptive_hints_intercept.rscrates/adaptive/src/tool_parallelism_learner.rscrates/adaptive/src/lib.rscrates/adaptive/src/acg_learner.rs
{crates/**/tests/**,python/tests/**,go/nemo_relay/**/*_test.go}
⚙️ CodeRabbit configuration file
{crates/**/tests/**,python/tests/**,go/nemo_relay/**/*_test.go}: Tests should cover the behavior promised by the changed API surface, including error paths and cross-request isolation where relevant.
Prefer assertions on lifecycle events, scope stacks, middleware ordering, and binding parity over shallow smoke tests.
Files:
crates/adaptive/tests/unit/topology_tests.rs
🔇 Additional comments (7)
docs/adaptive-plugin/topology-aware-controls-design.mdx (1)
1-83: LGTM!docs/adaptive-plugin/configuration.mdx (2)
202-235: Rust example now properly demonstrates topology-aware configuration.The Rust example imports are extended with
ConvergenceConfigandDriftConfig(line 205), and the initialization now explicitly setstool_parallelism.drift,adaptive_hints.governor,acg.convergence, and top-levelconvergencefields (lines 221-235). This matches the Python and Node.js examples' completeness and addresses the prior guidance about explicit cross-language parity.
249-268: TOML example already includes all topology-aware control blocks with correct defaults and cross-language parity.The TOML configuration snippet (lines 74-110) contains all three topology-aware control sections documented in the field table:
[components.config.tool_parallelism.drift],[components.config.adaptive_hints.governor], and[components.config.acg.convergence]. Field names and default values align with the Python, Node.js, and Rust examples. No changes needed.docs/adaptive-plugin/acg.mdx (1)
224-238: No changes needed; convergence validation rules are correctly documented in the "Common Validation Failures" section.docs/adaptive-plugin/about.mdx (1)
1-64: SPDX header is correctly formatted.The MDX file includes the required SPDX header with proper JSX comment delimiters
{/* ... */}after the YAML frontmatter. No action needed.crates/adaptive/src/adaptive_hints_intercept.rs (1)
137-150: LGTM!Also applies to: 183-196, 223-227, 248-258
crates/adaptive/src/lib.rs (1)
34-34: Please confirm the required Rust validation commands were run for this change set.I only see explicit test counts in the PR context; please confirm
just test-rust,cargo fmt --all, andcargo clippy --workspace --all-targets -- -D warningswere executed successfully.As per coding guidelines, "If any Rust code changed, always run
just test-rust; also runcargo fmt --all; also runcargo clippy --workspace --all-targets -- -D warnings."Source: Coding guidelines
Signed-off-by: Teerth Sharma <teerths57@gmail.com>
|
@willkill07 @lvojtku ready for re-review when you have bandwidth. The requested changes have been addressed on the current head
Validation for the latest docs move: |
willkill07
left a comment
There was a problem hiding this comment.
Additionally, the design doc isn't something that should be published as documentation.
I'm still trying to understand the benefit besides "just because we can"
There was a problem hiding this comment.
This is not a design document showing benefits. You state measurable Outcomes but do not provide any samples to demonstrate usefulness.
There was a problem hiding this comment.
Addressed in 012a96643d566113e7d53c00505638c611ea07e6 by adding a Sample Evidence section to this design doc.
It now includes concrete samples for each control:
- ACG convergence: benchmark fixture with
observations-to-decision: without=50, with=3 - tool drift: sample cohort centroid movement from
[1, 2, 0.0, 2]to[2, 3, 0.4, 3], showing stale plan invalidation - hint governor: sample learned hint shed at
latency_sensitivity = 2.0underepsilon = 10.0, while manual override at11.0still emits hints
I also ran cd fern && npx fern check --warnings; it passes with only the expected unauthenticated redirects warning.
There was a problem hiding this comment.
Addressed in 131ffbc60948a0bcc2d8b8d813c8283f7eeb48d6.
Changes made:
- moved the design note out of the published Fern docs tree to
.github/designs/topology-aware-adaptive-controls.md - removed the public docs link/page from
docs/adaptive-plugin/about.mdxanddocs/adaptive-plugin/configuration.mdx - rewrote the design note around benefit gates rather than capability: each control now has a concrete gate for when it should stay disabled versus when it is useful
- kept the sample evidence in the internal note, including the benchmark sample
observations-to-decision: without=50, with=3
Docs validation still passes with cd fern && npx fern check --warnings.
There was a problem hiding this comment.
On it will give measure samples as many i can on my local device
Signed-off-by: Teerth Sharma <teerths57@gmail.com>
Signed-off-by: Teerth Sharma <teerths57@gmail.com>
Signed-off-by: teerth sharma <teerths57@gmail.com>
But were these actual agents running or just synthetic? Synthetic benchmarks don't mean anything (to me) here. We care about realistic agent behavior and traces with multiple turns, drift, etc. Of course if your data has duplicate requests you will be able to optimize. That's not what I'm asking for here. |
Testing with gemini Right Now I understand the goal much better now |
|
Gemini is not supported with NeMo Relay at this point in time, so I'm not sure how you expect to do validation. Supported Harnesses:
|
Using Gemini api externally to create prompts (will help with accutal usage )and then feed it into the relay with open code is my plan thanks for the input |
|
@teerthsharma Some record of experiment setup seems crucial to include here so I can validate/understand what is being covered. |
Will be included with the tests results shall I include a csv of logs as well? (Testing time remains testing atleast 1000 cycle) |
|
I would like to share the results from a recent local 100-prompt agent trace run validating our ACG convergence behavior. For this test, I ran a sandbox replay where Gemma/LiteRT generated 100 workload prompts across 10 prompt families. The prompt set was executed twice through NeMo Relay Agent scopes using
Run Results
Key Takeaways
|
The p95 latency is included for no-degradation visibility only; the run used local/live opencode behavior, so latency is noisy. |
Because the baseline doesn't do any convergence...
A 10 second delta feels like more than just noise to me. It shows that across 100 LLM calls, the average added latency was 100ms. This seems unacceptable? Also, what does Pass/Fail mean? Nothing should fail here as there is nothing fallible. |
Yes N/A , p95 latency is under investigation as for pass or fail that was a way to show no degradation in pipeline and newer results will show different wording scheme for such |
I'm confused by this. How could the baseline show degradation? What is your measurement of degradation? |
Pass/fail means external opencode subprocess completion, not Relay correctness, so I will remove or relabel that column. |
|
ACG learner stability cache PR Under the 100‑profile repeated workload, pool utilisation peaked at ~60%, so there’s headroom for larger clusters. No deadlocks or hung workers were observed across all 200 runs. +0.526 ms delta with this which seems acceptable it includes the pool dispatch, cache lookup, and short‑circuit logic. For converged profiles, the savings are much larger: we skip reading the full observation window (which can be thousands of records) and skip the rewrite step entirely. The +0.5 ms is only paid once per profile at startup; subsequent convergence checks are near‑zero. But The stability cache TTL is currently tied to stability_window (5). If a profile converges and then the underlying data distribution shifts before the window expires, we might reuse stale stability. I’ve added a conservative guard: if the profile’s last observed timestamp is older than 2 * stability_window, we treat it as stale and fall through to the repair path. That logic is already in the PR. |
|
@teerthsharma I need some more information here. Was each run multiple LLM calls? did you measure the entire E2E time of each run or just look at when the workers completed? Where are the actual results? At this point, I am leaning towards this not being substantial enough of an improvement to warrant the additional 5k line maintenance cost. From what you've disclosed so far, there does not appear to be a notable improvement compared to the existing behavior. |
|
I have results for each run i can investigate for any specific thing if required but |
|
Even I relax and don't care about any line count changes consider the following: An external contributor introduces new code which doesn't show clear benefit. |
Right sir , I should redesign this , find some accutally valuable numbers, thanks for the guidance making it a draft for now . Thanks for the time |
Signed-off-by: teerth <teerths57@gmail.com>
|
I updated the topology-aware ACG work to focus on reusable agent structure, not exact repeated prompts. use case: a moderation or triage agent often runs the same workflow over changing inputs. For example, every forum-post review may follow the same scaffold: check whether moderator attention is needed, look for policy/quality signals, explain why, and suggest the next action. The post content changes each time, but the review workflow stays stable. The old convergence logic treated that changing post content as instability. The new behavior lets ACG recognize that the reusable workflow is stable while the individual work item varies. Technical changes:
Local sandbox evidence:
Important limitation: I am not claiming end-to-end opencode latency improved. That run was contaminated by ordering/time decay and a Windows critical-battery sleep event. If you want latency evidence, I can run a cleaner approved follow-up: either a dry-run opencode test to isolate Relay/ACG overhead, or an interleaved A/B run so baseline and topology execute task-by-task under the same conditions. |
|
Treating stable prompt prefix / reusable agent structure as the convergence target is much more defensible than requiring exact prompt repetition, and the explicit reopen-on-topology-change behavior addresses the main safety concern with over-sticky convergence. Also appreciate the correction on latency: not claiming an e2e win from a contaminated run is the right call. If you do a follow-up measurement, the stronger one would be the interleaved A/B run, not the dry-run:
Resuming automated review now. ✏️ Learnings added
✅ Action performedReviews resumed. |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
crates/adaptive/tests/unit/adaptive_hints_intercept_tests.rs (1)
386-389: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick winAssert header injection on the manual-override branch.
This branch only verifies the body payload.
inject_agent_hintsalso writesAGENT_HINTS_HEADER_KEY, so a header-only regression would still pass here.Proposed test hardening
assert_eq!( manual_request.content["nvext"]["agent_hints"]["latency_sensitivity"], serde_json::json!(11.0) ); + assert!(manual_request.headers.get(AGENT_HINTS_HEADER_KEY).is_some());As per path instructions, "Tests should cover the behavior promised by the changed API surface, including error paths and cross-request isolation where relevant."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/adaptive/tests/unit/adaptive_hints_intercept_tests.rs` around lines 386 - 389, The manual-override test only checks the request body and misses the header behavior added by inject_agent_hints, so it can still pass if AGENT_HINTS_HEADER_KEY is not written. Update this test to also assert the expected header is present on the manual_request, using AGENT_HINTS_HEADER_KEY and the same manual-override setup, so the branch verifies both body and header injection.Source: Path instructions
crates/adaptive/src/acg_learner.rs (1)
224-285: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick winBest-profile selection is order-dependent when cached and non-cached profiles mix.
The cached branch compares against
best_aggregate_stability, but the non-cached branch compares againstbest_profile_seed. If a stronger cached profile is seen first, the first non-cached profile always overwrites it becausebest_profile_seedis stillNone. Sincegrouped_observationsis aHashMap, the aggregate winner becomes nondeterministic and can regress the agent-level stability cached later in this method.Suggested fix
- let replace_best = best_profile_seed - .as_ref() - .map(|(_, current)| { + let replace_best = best_aggregate_stability + .as_ref() + .map(|current| { ( stability_result.stable_prefix_length, stability_result.total_observations, ) > (current.stable_prefix_length, current.total_observations) }) .unwrap_or(true); if replace_best { best_profile_seed = Some((observations_vec.clone(), stability_result.clone())); best_aggregate_stability = Some(stability_result.clone()); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/adaptive/src/acg_learner.rs` around lines 224 - 285, The best-profile selection logic in ACG learner is split between the cached path and the non-cached path, which makes the aggregate winner depend on iteration order. Update the selection in acg_learner.rs so both branches compare against the same shared “best” state instead of using best_aggregate_stability for cached profiles and best_profile_seed for fresh profiles; use the existing symbols best_profile_seed, best_aggregate_stability, and stability_result to centralize the comparison. Ensure a cached profile cannot be overwritten just because the first non-cached profile is encountered, and keep the winner deterministic across HashMap iteration order.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/designs/topology-aware-adaptive-controls.md:
- Around line 93-118: The Mermaid diagram under the Architecture Flow heading
starts without an introductory sentence, so add a short complete lead-in
sentence immediately before the fenced mermaid block in the
topology-aware-adaptive-controls.md section. Keep the existing diagram content
intact and ensure the new sentence clearly introduces the flowchart, matching
the documentation style used elsewhere in the document.
In `@ATTRIBUTIONS-Rust.md`:
- Around line 36414-36418: Fix the markdownlint regressions in the attribution
entries by updating the affected blocks around the strum entry and the other
referenced attribution block. Add the required blank lines around the heading
and the fenced code block, and change the fence to use a text info string so the
content satisfies MD022, MD031, and MD040. Use the existing attribution section
structure to keep the formatting consistent across these entries.
In `@crates/adaptive/src/acg_learner.rs`:
- Around line 97-114: The drift calculation in the ACG learner is currently a
binary flag based only on whether stable_prefix_length is zero, which can
trigger convergence too early. Update the logic in acg_learner.rs around the
drift/stable_prefix_len computation to derive drift from the unstable fraction
of the profile instead of a 0.0/1.0 switch, using the available stability fields
in the learner flow so the drift gate reflects how much of the suffix remains
unstable.
In `@crates/adaptive/tests/unit/acg_learner_tests.rs`:
- Around line 445-480: The empty-observation repair test in acg_learner_tests.rs
only checks that a repaired observation is written back, but it should also
verify the stale converged state is cleared. Update
acg_learner_repairs_converged_stability_with_empty_observations to assert the
observation stored by SeedObservationBackend after AcgLearner::process_run no
longer has converged set to true, matching the behavior already expected by the
neighboring missing-observation repair case.
---
Outside diff comments:
In `@crates/adaptive/src/acg_learner.rs`:
- Around line 224-285: The best-profile selection logic in ACG learner is split
between the cached path and the non-cached path, which makes the aggregate
winner depend on iteration order. Update the selection in acg_learner.rs so both
branches compare against the same shared “best” state instead of using
best_aggregate_stability for cached profiles and best_profile_seed for fresh
profiles; use the existing symbols best_profile_seed, best_aggregate_stability,
and stability_result to centralize the comparison. Ensure a cached profile
cannot be overwritten just because the first non-cached profile is encountered,
and keep the winner deterministic across HashMap iteration order.
In `@crates/adaptive/tests/unit/adaptive_hints_intercept_tests.rs`:
- Around line 386-389: The manual-override test only checks the request body and
misses the header behavior added by inject_agent_hints, so it can still pass if
AGENT_HINTS_HEADER_KEY is not written. Update this test to also assert the
expected header is present on the manual_request, using AGENT_HINTS_HEADER_KEY
and the same manual-override setup, so the branch verifies both body and header
injection.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: bcb7c090-5d4f-4500-8584-8801f2beb8ff
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (12)
.github/designs/topology-aware-adaptive-controls.mdATTRIBUTIONS-Rust.mdcrates/adaptive/src/acg_learner.rscrates/adaptive/src/topology.rscrates/adaptive/tests/integration/topology_convergence_tests.rscrates/adaptive/tests/unit/acg_learner_tests.rscrates/adaptive/tests/unit/adaptive_hints_intercept_tests.rscrates/adaptive/tests/unit/runtime_tests.rscrates/adaptive/tests/unit/storage_tests.rscrates/adaptive/tests/unit/topology_tests.rsdocs/adaptive-plugin/about.mdxdocs/adaptive-plugin/configuration.mdx
💤 Files with no reviewable changes (2)
- docs/adaptive-plugin/about.mdx
- docs/adaptive-plugin/configuration.mdx
📜 Review details
🧰 Additional context used
📓 Path-based instructions (27)
**/*.rs
📄 CodeRabbit inference engine (.agents/skills/add-binding-feature/SKILL.md)
Use
snake_casenaming convention for Rust identifiers (e.g.,nemo_relay_tool_call)
**/*.rs: Any Rust change must runjust test-rust
Any Rust change must runcargo fmt --all
Any Rust change must runcargo clippy --workspace --all-targets -- -D warnings
**/*.rs: Runcargo fmt --allfor all FFI work since it is Rust work
Runjust test-rustto validate FFI changes
Runcargo clippy --workspace --all-targets -- -D warningsto enforce strict linting on FFI workWhen Rust files changed as part of Go work, also run
cargo fmt --all,just test-rust, andcargo clippy --workspace --all-targets -- -D warnings
**/*.rs: Runcargo fmt --allwhen Rust files are changed as part of Node work
Runcargo clippy --workspace --all-targets -- -D warningswhen Rust files are changed as part of Node work
Runjust test-rustwhen Rust files are changed as part of Node work
**/*.rs: Runcargo fmt --allto format all Rust code
Runcargo clippy --workspace --all-targets -- -D warningsto enforce all clippy lints as errors
**/*.rs: Runcargo fmt --allwhen Rust files changed as part of WebAssembly work
Runcargo clippy --workspace --all-targets -- -D warningswhen Rust files changed as part of WebAssembly work
**/*.rs: If any Rust code changed, always runjust test-rust
If any Rust code changed, also runcargo fmt --all
If any Rust code changed, also runcargo clippy --workspace --all-targets -- -D warnings
Run Rust formatting withcargo fmt --all
Run Rust linting withcargo clippy --workspace --all-targets -- -D warnings
**/*.rs: Usecargo fmtfor Rust code formatting
Runcargo clippy -- -D warningsto lint Rust code and treat all warnings as errors
Use Rust snake_case naming convention for Rust identifiers
Include SPDX license header in all Rust source files using double-slash comment syntax
Validate Rust code withuv run pre-commit run --all-filesto enforce cargo fmt formatting check, cargo clippy lints, and cargo deny aud...
Files:
crates/adaptive/tests/unit/storage_tests.rscrates/adaptive/tests/unit/adaptive_hints_intercept_tests.rscrates/adaptive/tests/integration/topology_convergence_tests.rscrates/adaptive/tests/unit/runtime_tests.rscrates/adaptive/tests/unit/acg_learner_tests.rscrates/adaptive/tests/unit/topology_tests.rscrates/adaptive/src/topology.rscrates/adaptive/src/acg_learner.rs
{crates/adaptive/**,python/nemo_relay/adaptive.py,python/nemo_relay/plugin.py,go/nemo_relay/adaptive/**,go/nemo_relay/!(adaptive)/**,**/node/**,**/wasm/**}
📄 CodeRabbit inference engine (.agents/skills/maintain-optimizer/SKILL.md)
Keep adaptive surface in sync across crates/adaptive, shared plugin behavior in core and bindings, Python adaptive/plugin wrappers in python/nemo_relay/adaptive.py and python/nemo_relay/plugin.py, Go adaptive helpers under go/nemo_relay/adaptive plus shared plugin helpers in go/nemo_relay, and Node/WebAssembly adaptive helpers and plugin wrappers
Files:
crates/adaptive/tests/unit/storage_tests.rscrates/adaptive/tests/unit/adaptive_hints_intercept_tests.rscrates/adaptive/tests/integration/topology_convergence_tests.rscrates/adaptive/tests/unit/runtime_tests.rscrates/adaptive/tests/unit/acg_learner_tests.rscrates/adaptive/tests/unit/topology_tests.rscrates/adaptive/src/topology.rscrates/adaptive/src/acg_learner.rs
{crates/adaptive/**,python/nemo_relay/plugin.py,go/nemo_relay/**,**/node/**,**/wasm/**}
📄 CodeRabbit inference engine (.agents/skills/maintain-optimizer/SKILL.md)
{crates/adaptive/**,python/nemo_relay/plugin.py,go/nemo_relay/**,**/node/**,**/wasm/**}: Maintain consistent plugin lifecycle across all language bindings (Python, Go, Node/WebAssembly, and Rust)
Keep plugin context surfaces aligned across all language implementations
Files:
crates/adaptive/tests/unit/storage_tests.rscrates/adaptive/tests/unit/adaptive_hints_intercept_tests.rscrates/adaptive/tests/integration/topology_convergence_tests.rscrates/adaptive/tests/unit/runtime_tests.rscrates/adaptive/tests/unit/acg_learner_tests.rscrates/adaptive/tests/unit/topology_tests.rscrates/adaptive/src/topology.rscrates/adaptive/src/acg_learner.rs
{crates/adaptive/**/*.rs,**/*test*.{rs,py,go,ts,js},**/*adaptive*test*.{rs,py,go,ts,js},docs/plugins/adaptive/**}
📄 CodeRabbit inference engine (.agents/skills/maintain-optimizer/SKILL.md)
Maintain documented and tested validation and report behavior for adaptive surfaces
Files:
crates/adaptive/tests/unit/storage_tests.rscrates/adaptive/tests/unit/adaptive_hints_intercept_tests.rscrates/adaptive/tests/integration/topology_convergence_tests.rscrates/adaptive/tests/unit/runtime_tests.rscrates/adaptive/tests/unit/acg_learner_tests.rscrates/adaptive/tests/unit/topology_tests.rscrates/adaptive/src/topology.rscrates/adaptive/src/acg_learner.rs
**/{Cargo.toml,**/*.rs}
📄 CodeRabbit inference engine (.agents/skills/maintain-packaging/SKILL.md)
Maintain consistency between Rust package names in
Cargo.tomland their actual usage across the codebase
Files:
crates/adaptive/tests/unit/storage_tests.rscrates/adaptive/tests/unit/adaptive_hints_intercept_tests.rscrates/adaptive/tests/integration/topology_convergence_tests.rscrates/adaptive/tests/unit/runtime_tests.rscrates/adaptive/tests/unit/acg_learner_tests.rscrates/adaptive/tests/unit/topology_tests.rscrates/adaptive/src/topology.rscrates/adaptive/src/acg_learner.rs
**/*.{h,hpp,c,cpp,rs}
📄 CodeRabbit inference engine (.agents/skills/maintain-packaging/SKILL.md)
Ensure FFI header and library naming follows consistent conventions across platform-specific builds
Files:
crates/adaptive/tests/unit/storage_tests.rscrates/adaptive/tests/unit/adaptive_hints_intercept_tests.rscrates/adaptive/tests/integration/topology_convergence_tests.rscrates/adaptive/tests/unit/runtime_tests.rscrates/adaptive/tests/unit/acg_learner_tests.rscrates/adaptive/tests/unit/topology_tests.rscrates/adaptive/src/topology.rscrates/adaptive/src/acg_learner.rs
{crates/core,crates/adaptive}/**/*
📄 CodeRabbit inference engine (.agents/skills/prepare-pr/SKILL.md)
Changes to
crates/coreorcrates/adaptivemust run the full language matrix
Files:
crates/adaptive/tests/unit/storage_tests.rscrates/adaptive/tests/unit/adaptive_hints_intercept_tests.rscrates/adaptive/tests/integration/topology_convergence_tests.rscrates/adaptive/tests/unit/runtime_tests.rscrates/adaptive/tests/unit/acg_learner_tests.rscrates/adaptive/tests/unit/topology_tests.rscrates/adaptive/src/topology.rscrates/adaptive/src/acg_learner.rs
**/*.{rs,toml}
📄 CodeRabbit inference engine (.agents/skills/rename-surfaces/SKILL.md)
Update Rust crate names and module prefixes during coordinated rename operations
Files:
crates/adaptive/tests/unit/storage_tests.rscrates/adaptive/tests/unit/adaptive_hints_intercept_tests.rscrates/adaptive/tests/integration/topology_convergence_tests.rscrates/adaptive/tests/unit/runtime_tests.rscrates/adaptive/tests/unit/acg_learner_tests.rscrates/adaptive/tests/unit/topology_tests.rscrates/adaptive/src/topology.rscrates/adaptive/src/acg_learner.rs
crates/{core,adaptive}/**
📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)
If
crates/coreorcrates/adaptivechanged, run the full matrix across Rust, Python, Go, Node.js, and WebAssembly
Files:
crates/adaptive/tests/unit/storage_tests.rscrates/adaptive/tests/unit/adaptive_hints_intercept_tests.rscrates/adaptive/tests/integration/topology_convergence_tests.rscrates/adaptive/tests/unit/runtime_tests.rscrates/adaptive/tests/unit/acg_learner_tests.rscrates/adaptive/tests/unit/topology_tests.rscrates/adaptive/src/topology.rscrates/adaptive/src/acg_learner.rs
**/*.{rs,py,js,ts,tsx,jsx,go,sh,toml,yaml,yml,md}
📄 CodeRabbit inference engine (AGENTS.md)
Keep SPDX headers on source, docs, scripts, and configuration files. The project is Apache-2.0.
Files:
crates/adaptive/tests/unit/storage_tests.rscrates/adaptive/tests/unit/adaptive_hints_intercept_tests.rscrates/adaptive/tests/integration/topology_convergence_tests.rscrates/adaptive/tests/unit/runtime_tests.rscrates/adaptive/tests/unit/acg_learner_tests.rscrates/adaptive/tests/unit/topology_tests.rscrates/adaptive/src/topology.rscrates/adaptive/src/acg_learner.rsATTRIBUTIONS-Rust.md
**/*.{rs,py,go,js,ts,tsx}
📄 CodeRabbit inference engine (AGENTS.md)
Follow binding naming conventions: Rust and Python use
snake_case, C FFI exports prefixednemo_relay_, Go usesPascalCasefor public APIs, Node.js usescamelCase.
Files:
crates/adaptive/tests/unit/storage_tests.rscrates/adaptive/tests/unit/adaptive_hints_intercept_tests.rscrates/adaptive/tests/integration/topology_convergence_tests.rscrates/adaptive/tests/unit/runtime_tests.rscrates/adaptive/tests/unit/acg_learner_tests.rscrates/adaptive/tests/unit/topology_tests.rscrates/adaptive/src/topology.rscrates/adaptive/src/acg_learner.rs
crates/**/*.rs
📄 CodeRabbit inference engine (AGENTS.md)
crates/**/*.rs: Keep async behavior on the existing tokio-based model. Bindings should preserve callback and future lifetimes rather than blocking or hiding async work unexpectedly.
UseJson = serde_json::Valuein Rust-facing runtime APIs for JSON payload handling.
Files:
crates/adaptive/tests/unit/storage_tests.rscrates/adaptive/tests/unit/adaptive_hints_intercept_tests.rscrates/adaptive/tests/integration/topology_convergence_tests.rscrates/adaptive/tests/unit/runtime_tests.rscrates/adaptive/tests/unit/acg_learner_tests.rscrates/adaptive/tests/unit/topology_tests.rscrates/adaptive/src/topology.rscrates/adaptive/src/acg_learner.rs
**
⚙️ CodeRabbit configuration file
**:AGENTS.md
This file provides guidance to agents, including Claude Code and OpenAI Codex, when working in this repository.
Project Overview
NeMo Relay is a multi-language agent runtime framework for execution scopes, lifecycle events, middleware, plugins, and observability around tool and LLM calls. The core runtime is Rust. Primary supported bindings are Rust, Python, and Node.js. Go, WebAssembly, and the raw C FFI are experimental and source-first.
The shared runtime model is:
- Scope stacks decide where work belongs and which scope-local behavior is visible.
- Middleware registries decide what guardrails and intercepts run around managed calls.
- Plugins install reusable runtime behavior from configuration.
- Events record runtime behavior in ATOF form.
- Subscribers and exporters consume events in-process or export them to ATIF, OpenTelemetry, OpenInference, or other backends.
Repository Structure
The repository layout separates the Rust runtime, language bindings, documentation,
integration patches, and agent-facing skills.crates/ core/ # Rust core runtime crate, published as nemo-relay adaptive/ # Adaptive runtime primitives and plugin components python/ # PyO3 native extension for the Python package ffi/ # Raw C ABI layer used by downstream bindings such as Go node/ # NAPI Node.js binding and JavaScript/TypeScript entry points wasm/ # wasm-bindgen WebAssembly binding and JS wrappers python/ nemo_relay/ # Python wrapper package: scopes, tools, LLM, middleware, typed helpers, plugins, adaptive helpers tests/ # Python tests go/ nemo_relay/ # Experimental Go CGo binding and tests fern/ # Fern documentation site scripts/ # Stable wrappers and helper scripts; build/test/docs entry points live in justfile third_party/ # P...
Files:
crates/adaptive/tests/unit/storage_tests.rscrates/adaptive/tests/unit/adaptive_hints_intercept_tests.rscrates/adaptive/tests/integration/topology_convergence_tests.rscrates/adaptive/tests/unit/runtime_tests.rscrates/adaptive/tests/unit/acg_learner_tests.rscrates/adaptive/tests/unit/topology_tests.rscrates/adaptive/src/topology.rscrates/adaptive/src/acg_learner.rsATTRIBUTIONS-Rust.md
crates/{core,adaptive}/**/*.rs
⚙️ CodeRabbit configuration file
crates/{core,adaptive}/**/*.rs: Review the Rust runtime for async correctness, scope isolation, middleware ordering, and event lifecycle regressions.
Pay close attention to task-local/thread-local scope propagation, callback lifetimes, stream finalization, and root_uuid isolation.
Public API changes should preserve existing behavior unless tests and docs show the intended migration path.
Files:
crates/adaptive/tests/unit/storage_tests.rscrates/adaptive/tests/unit/adaptive_hints_intercept_tests.rscrates/adaptive/tests/integration/topology_convergence_tests.rscrates/adaptive/tests/unit/runtime_tests.rscrates/adaptive/tests/unit/acg_learner_tests.rscrates/adaptive/tests/unit/topology_tests.rscrates/adaptive/src/topology.rscrates/adaptive/src/acg_learner.rs
{crates/**/tests/**,python/tests/**,go/nemo_relay/**/*_test.go}
⚙️ CodeRabbit configuration file
{crates/**/tests/**,python/tests/**,go/nemo_relay/**/*_test.go}: Tests should cover the behavior promised by the changed API surface, including error paths and cross-request isolation where relevant.
Prefer assertions on lifecycle events, scope stacks, middleware ordering, and binding parity over shallow smoke tests.
Files:
crates/adaptive/tests/unit/storage_tests.rscrates/adaptive/tests/unit/adaptive_hints_intercept_tests.rscrates/adaptive/tests/integration/topology_convergence_tests.rscrates/adaptive/tests/unit/runtime_tests.rscrates/adaptive/tests/unit/acg_learner_tests.rscrates/adaptive/tests/unit/topology_tests.rs
{.github/**,.gitlab-ci.yml,.pre-commit-config.yaml,justfile,scripts/**}
⚙️ CodeRabbit configuration file
{.github/**,.gitlab-ci.yml,.pre-commit-config.yaml,justfile,scripts/**}: Review automation changes for reproducibility, pinned versions where appropriate, secret handling, and consistency with the documented validation matrix.
Pay attention to commands that need generated native artifacts, FFI libraries, or platform-specific environment variables.
Files:
.github/designs/topology-aware-adaptive-controls.md
**/*.{md,rst,html,txt}
📄 CodeRabbit inference engine (.agents/skills/review-doc-style/assets/nvidia-style-brand-terminology.md)
**/*.{md,rst,html,txt}: Always spellNVIDIAin all caps. Do not useNvidia,nvidia,nVidia,nVIDIA, orNV.
Usean NVIDIAbefore a noun because the name starts with an 'en' sound.
Do not add a registered trademark symbol afterNVIDIAwhen referring to the company.
Use trademark symbols with product names only when the document type or legal guidance requires them.
Verify official capitalization, spacing, and hyphenation for product names.
Precede NVIDIA product names withNVIDIAon first mention when it is natural and accurate.
Do not rewrite product names for grammar or title-case rules.
Preserve third-party product names according to the owner's spelling.
Include the company name and full model qualifier on first use when it helps identify the model.
Preserve the official capitalization and punctuation of model names.
Use shorter family names only after the full name is established.
Spell out a term on first use and put the acronym in parentheses unless the acronym is widely understood by the intended audience.
Use the acronym on later mentions after it has been defined.
For long documents, reintroduce the full term if readers might lose context.
Form plurals of acronyms withs, not an apostrophe, such asGPUs.
In headings, common acronyms can remain abbreviated. Spell out the term in the first or second sentence of the body.
Common terms such asCPU,GPU,PC,API, andUIusually do not need to be spelled out for developer audiences.
Files:
ATTRIBUTIONS-Rust.md
**/*.{md,rst,html}
📄 CodeRabbit inference engine (.agents/skills/review-doc-style/assets/nvidia-style-brand-terminology.md)
Link the first mention of a product name when the destination helps the reader.
Files:
ATTRIBUTIONS-Rust.md
**/*.md
📄 CodeRabbit inference engine (.agents/skills/contribute-integration/SKILL.md)
Documentation must be updated if activation or usage changed
**/*.md: Use title case consistently in technical documentation headings
Avoid quotation marks, ampersands, and exclamation marks in headings
Keep product, event, research, and whitepaper names in their official title case
Use title case for table headers
Do not force social-media sentence case into technical docs
Format code elements, commands, parameters, package names, and expressions in monospace
Format directories, file names, and paths in monospace using backticks
Use angle brackets inside monospace for variables inside paths, such as/home/<username>/.login
Format error messages and strings in quotation marks, keeping literal code strings in code formatting when clearer
Format UI buttons, menus, fields, and labels in bold
Use angle brackets between UI labels for menu paths, such as File > Save As
Use italics for new terms on first use, sparingly and only when introducing the term
Use italics for publication titles
Format keyboard shortcuts in plain text, such as Press Ctrl+Alt+Delete
Use owner/repo link text for GitHub repositories, preferring[NVIDIA/NeMo](link)over prose references like 'the GitHub repo'
Introduce every code block with a complete sentence
Do not make a code block complete the grammar of the previous sentence
Do not continue a sentence after a code block
Use syntax highlighting when the format supports it for code blocks
Avoid the word 'snippet' unless the surrounding docs already use it as a term of art
Keep inline method, function, and class references consistent with nearby docs, omitting empty parentheses for prose readability when no call is shown
Use descriptive anchor text that matches the destination title when possible for links
Avoid raw URLs in running text
Avoid generic anchor text such as 'here,' 'this page,' and 'read more'
Include acronyms in link text when a linked term includes an acronym
Do not link long sentences or multiple sentences
Avoid links ...
Files:
ATTRIBUTIONS-Rust.md
**/{docs,examples,**/*.md,*.patch,*.diff,.github,*.sh,*.yaml,*.yml}
📄 CodeRabbit inference engine (.agents/skills/rename-surfaces/SKILL.md)
Update documentation, examples, CI configuration, and patch artifacts when performing rename operations
Files:
ATTRIBUTIONS-Rust.md
**/*.{md,rst,txt}
📄 CodeRabbit inference engine (.agents/skills/review-doc-style/assets/nvidia-style-guide.md)
Spell
NVIDIAin all caps. Do not useNvidia,nvidia, orNV.
Files:
ATTRIBUTIONS-Rust.md
**/*.{md,rst}
📄 CodeRabbit inference engine (.agents/skills/review-doc-style/assets/nvidia-style-guide.md)
**/*.{md,rst}: Format commands, code elements, expressions, package names, file names, and paths as inline code.
Use descriptive link text. Avoid raw URLs and weak anchors such as "here" or "read more."
Use title case consistently for technical documentation headings.
Introduce code blocks, lists, tables, and images with complete sentences.
Write procedures as imperative steps. Keep steps parallel and split long procedures into smaller tasks.
Prefer active voice, present tense, short sentences, contractions, and plain English.
Usecanfor possibility and reservemayfor permission.
Useafterfor temporal relationships instead ofonce.
Preferrefer tooverseewhen the wording points readers to another resource.
Avoid culture-specific idioms, unnecessary Latinisms, jokes, and marketing exaggeration in technical docs.
Spell out months in body text, avoid ordinal dates, and use clear time zones.
Spell out whole numbers from zero through nine unless they are technical values, parameters, versions, or UI values.
Use numerals for 10 or greater and include commas in thousands.
Do not add trademark symbols to learning-oriented docs unless the source, platform, or legal guidance explicitly requires them.
Files:
ATTRIBUTIONS-Rust.md
{docs/**,README.md,CONTRIBUTING.md,**/*.md}
📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)
Run docs link validation with
just docs-linkcheckwhen links change
Files:
ATTRIBUTIONS-Rust.md
{docs/**,README.md,**/Cargo.toml,**/package.json,**/*.md}
📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)
Ensure renamed public surfaces are reflected consistently in manifests and docs for large or public-facing changes
Files:
ATTRIBUTIONS-Rust.md
**/*.{md,mdx,py,sh,yaml,yml,toml,json}
📄 CodeRabbit inference engine (.agents/skills/contribute-docs/SKILL.md)
Keep package names, repo references, and build commands current
Files:
ATTRIBUTIONS-Rust.md
**/*.{html,md,mdx}
📄 CodeRabbit inference engine (CONTRIBUTING.md)
Include SPDX license header in HTML and Markdown files using HTML comment syntax
Files:
ATTRIBUTIONS-Rust.md
ATTRIBUTIONS-Rust.md
📄 CodeRabbit inference engine (.agents/skills/update-project-version/SKILL.md)
Regenerate
ATTRIBUTIONS-Rust.mdwith./scripts/generate_attributions.sh rustif Cargo metadata changed and committed attribution files must stay fresh
Files:
ATTRIBUTIONS-Rust.md
🪛 LanguageTool
.github/designs/topology-aware-adaptive-controls.md
[grammar] ~48-~48: Ensure spelling is correct
Context: ... tool cohort shape changes sharply. - Shed learned adaptive hints below a configur...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🪛 markdownlint-cli2 (0.22.1)
ATTRIBUTIONS-Rust.md
[warning] 36414-36414: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 36417-36417: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Above
(MD022, blanks-around-headings)
[warning] 36417-36417: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 36418-36418: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
[warning] 36418-36418: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 36443-36443: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 36446-36446: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Above
(MD022, blanks-around-headings)
[warning] 36446-36446: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 36447-36447: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
[warning] 36447-36447: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (2)
crates/adaptive/tests/unit/adaptive_hints_intercept_tests.rs (1)
362-368: LGTM!crates/adaptive/src/acg_learner.rs (1)
69-82: 📐 Maintainability & Code QualityPlease confirm the required validation matrix ran for this adaptive Rust change.
This PR touches
crates/adaptive, so the repo guidance calls forjust test-rust,cargo fmt --all,cargo clippy --workspace --all-targets -- -D warnings, and the full language matrix before review.
As per coding guidelines, "Ifcrates/coreorcrates/adaptivechanged, run the full matrix across Rust, Python, Go, Node.js, and WebAssembly."Also applies to: 164-312
Source: Coding guidelines
There was a problem hiding this comment.
Caution
Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
crates/adaptive/tests/unit/adaptive_hints_intercept_tests.rs (1)
386-389: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick winAssert header injection on the manual-override branch.
This branch only verifies the body payload.
inject_agent_hintsalso writesAGENT_HINTS_HEADER_KEY, so a header-only regression would still pass here.Proposed test hardening
assert_eq!( manual_request.content["nvext"]["agent_hints"]["latency_sensitivity"], serde_json::json!(11.0) ); + assert!(manual_request.headers.get(AGENT_HINTS_HEADER_KEY).is_some());As per path instructions, "Tests should cover the behavior promised by the changed API surface, including error paths and cross-request isolation where relevant."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/adaptive/tests/unit/adaptive_hints_intercept_tests.rs` around lines 386 - 389, The manual-override test only checks the request body and misses the header behavior added by inject_agent_hints, so it can still pass if AGENT_HINTS_HEADER_KEY is not written. Update this test to also assert the expected header is present on the manual_request, using AGENT_HINTS_HEADER_KEY and the same manual-override setup, so the branch verifies both body and header injection.Source: Path instructions
crates/adaptive/src/acg_learner.rs (1)
224-285: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick winBest-profile selection is order-dependent when cached and non-cached profiles mix.
The cached branch compares against
best_aggregate_stability, but the non-cached branch compares againstbest_profile_seed. If a stronger cached profile is seen first, the first non-cached profile always overwrites it becausebest_profile_seedis stillNone. Sincegrouped_observationsis aHashMap, the aggregate winner becomes nondeterministic and can regress the agent-level stability cached later in this method.Suggested fix
- let replace_best = best_profile_seed - .as_ref() - .map(|(_, current)| { + let replace_best = best_aggregate_stability + .as_ref() + .map(|current| { ( stability_result.stable_prefix_length, stability_result.total_observations, ) > (current.stable_prefix_length, current.total_observations) }) .unwrap_or(true); if replace_best { best_profile_seed = Some((observations_vec.clone(), stability_result.clone())); best_aggregate_stability = Some(stability_result.clone()); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/adaptive/src/acg_learner.rs` around lines 224 - 285, The best-profile selection logic in ACG learner is split between the cached path and the non-cached path, which makes the aggregate winner depend on iteration order. Update the selection in acg_learner.rs so both branches compare against the same shared “best” state instead of using best_aggregate_stability for cached profiles and best_profile_seed for fresh profiles; use the existing symbols best_profile_seed, best_aggregate_stability, and stability_result to centralize the comparison. Ensure a cached profile cannot be overwritten just because the first non-cached profile is encountered, and keep the winner deterministic across HashMap iteration order.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/designs/topology-aware-adaptive-controls.md:
- Around line 93-118: The Mermaid diagram under the Architecture Flow heading
starts without an introductory sentence, so add a short complete lead-in
sentence immediately before the fenced mermaid block in the
topology-aware-adaptive-controls.md section. Keep the existing diagram content
intact and ensure the new sentence clearly introduces the flowchart, matching
the documentation style used elsewhere in the document.
In `@ATTRIBUTIONS-Rust.md`:
- Around line 36414-36418: Fix the markdownlint regressions in the attribution
entries by updating the affected blocks around the strum entry and the other
referenced attribution block. Add the required blank lines around the heading
and the fenced code block, and change the fence to use a text info string so the
content satisfies MD022, MD031, and MD040. Use the existing attribution section
structure to keep the formatting consistent across these entries.
In `@crates/adaptive/src/acg_learner.rs`:
- Around line 97-114: The drift calculation in the ACG learner is currently a
binary flag based only on whether stable_prefix_length is zero, which can
trigger convergence too early. Update the logic in acg_learner.rs around the
drift/stable_prefix_len computation to derive drift from the unstable fraction
of the profile instead of a 0.0/1.0 switch, using the available stability fields
in the learner flow so the drift gate reflects how much of the suffix remains
unstable.
In `@crates/adaptive/tests/unit/acg_learner_tests.rs`:
- Around line 445-480: The empty-observation repair test in acg_learner_tests.rs
only checks that a repaired observation is written back, but it should also
verify the stale converged state is cleared. Update
acg_learner_repairs_converged_stability_with_empty_observations to assert the
observation stored by SeedObservationBackend after AcgLearner::process_run no
longer has converged set to true, matching the behavior already expected by the
neighboring missing-observation repair case.
---
Outside diff comments:
In `@crates/adaptive/src/acg_learner.rs`:
- Around line 224-285: The best-profile selection logic in ACG learner is split
between the cached path and the non-cached path, which makes the aggregate
winner depend on iteration order. Update the selection in acg_learner.rs so both
branches compare against the same shared “best” state instead of using
best_aggregate_stability for cached profiles and best_profile_seed for fresh
profiles; use the existing symbols best_profile_seed, best_aggregate_stability,
and stability_result to centralize the comparison. Ensure a cached profile
cannot be overwritten just because the first non-cached profile is encountered,
and keep the winner deterministic across HashMap iteration order.
In `@crates/adaptive/tests/unit/adaptive_hints_intercept_tests.rs`:
- Around line 386-389: The manual-override test only checks the request body and
misses the header behavior added by inject_agent_hints, so it can still pass if
AGENT_HINTS_HEADER_KEY is not written. Update this test to also assert the
expected header is present on the manual_request, using AGENT_HINTS_HEADER_KEY
and the same manual-override setup, so the branch verifies both body and header
injection.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: bcb7c090-5d4f-4500-8584-8801f2beb8ff
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (12)
.github/designs/topology-aware-adaptive-controls.mdATTRIBUTIONS-Rust.mdcrates/adaptive/src/acg_learner.rscrates/adaptive/src/topology.rscrates/adaptive/tests/integration/topology_convergence_tests.rscrates/adaptive/tests/unit/acg_learner_tests.rscrates/adaptive/tests/unit/adaptive_hints_intercept_tests.rscrates/adaptive/tests/unit/runtime_tests.rscrates/adaptive/tests/unit/storage_tests.rscrates/adaptive/tests/unit/topology_tests.rsdocs/adaptive-plugin/about.mdxdocs/adaptive-plugin/configuration.mdx
💤 Files with no reviewable changes (2)
- docs/adaptive-plugin/about.mdx
- docs/adaptive-plugin/configuration.mdx
📜 Review details
🔇 Additional comments (2)
crates/adaptive/tests/unit/adaptive_hints_intercept_tests.rs (1)
362-368: LGTM!crates/adaptive/src/acg_learner.rs (1)
69-82: 📐 Maintainability & Code QualityPlease confirm the required validation matrix ran for this adaptive Rust change.
This PR touches
crates/adaptive, so the repo guidance calls forjust test-rust,cargo fmt --all,cargo clippy --workspace --all-targets -- -D warnings, and the full language matrix before review.
As per coding guidelines, "Ifcrates/coreorcrates/adaptivechanged, run the full matrix across Rust, Python, Go, Node.js, and WebAssembly."Also applies to: 164-312
Source: Coding guidelines
🛑 Comments failed to post (4)
.github/designs/topology-aware-adaptive-controls.md (1)
93-118: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Introduce the Mermaid block with a complete sentence.
The diagram starts immediately under the heading. Add a short lead-in sentence before the code block so this section follows the Markdown doc rules consistently.
As per coding guidelines, "Introduce every code block with a complete sentence."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/designs/topology-aware-adaptive-controls.md around lines 93 - 118, The Mermaid diagram under the Architecture Flow heading starts without an introductory sentence, so add a short complete lead-in sentence immediately before the fenced mermaid block in the topology-aware-adaptive-controls.md section. Keep the existing diagram content intact and ensure the new sentence clearly introduces the flowchart, matching the documentation style used elsewhere in the document.Source: Coding guidelines
ATTRIBUTIONS-Rust.md (1)
36414-36418: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Fix the markdownlint regressions in these attribution entries.
Both inserted blocks need blank lines around the headings and fenced code, and the fences should be tagged as
textto satisfy MD022/MD031/MD040.Also applies to: 36443-36447
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 36414-36414: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below(MD022, blanks-around-headings)
[warning] 36417-36417: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Above(MD022, blanks-around-headings)
[warning] 36417-36417: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below(MD022, blanks-around-headings)
[warning] 36418-36418: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
[warning] 36418-36418: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ATTRIBUTIONS-Rust.md` around lines 36414 - 36418, Fix the markdownlint regressions in the attribution entries by updating the affected blocks around the strum entry and the other referenced attribution block. Add the required blank lines around the heading and the fenced code block, and change the fence to use a text info string so the content satisfies MD022, MD031, and MD040. Use the existing attribution section structure to keep the formatting consistent across these entries.Source: Linters/SAST tools
crates/adaptive/src/acg_learner.rs (1)
97-114: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Compute drift from the unstable fraction instead of a binary flag.
This collapses every nonzero stable prefix to
0.0drift. A profile with one stable block and a large unstable suffix will satisfy the drift gate as quickly as a nearly fully stable profile, which makes convergence fire too early.Suggested fix
- let drift = if stability.stable_prefix_length == 0 { - 1.0 - } else { - 0.0 - }; + let unstable_spans = total_spans.saturating_sub(stability.stable_prefix_length); + let drift = if total_spans == 0 { + 1.0 + } else { + unstable_spans as f64 / total_spans as f64 + };🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/adaptive/src/acg_learner.rs` around lines 97 - 114, The drift calculation in the ACG learner is currently a binary flag based only on whether stable_prefix_length is zero, which can trigger convergence too early. Update the logic in acg_learner.rs around the drift/stable_prefix_len computation to derive drift from the unstable fraction of the profile instead of a 0.0/1.0 switch, using the available stability fields in the learner flow so the drift gate reflects how much of the suffix remains unstable.crates/adaptive/tests/unit/acg_learner_tests.rs (1)
445-480: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Also assert that empty-observation repair clears the stale
convergedflag.This test only verifies that one observation is written back. If repair still re-saved
converged = true, the regression would pass here even though the neighboring missing-observation test correctly expects learning to reopen.Suggested fix
let repaired_observations = backend .load_observations(&learning_key) .await .unwrap() .expect("empty observations should be repaired instead of trusting converged stability"); assert_eq!(repaired_observations.len(), 1); + + let repaired_stability = backend + .load_stability(&learning_key) + .await + .unwrap() + .expect("repaired stability should be stored"); + assert!( + !repaired_stability.converged, + "a repaired single-observation profile should re-enter normal convergence" + );As per path instructions, "Tests should cover the behavior promised by the changed API surface, including error paths and cross-request isolation where relevant."
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.#[tokio::test(flavor = "current_thread")] async fn acg_learner_repairs_converged_stability_with_empty_observations() { let learner = AcgLearner::new_with_convergence( "agent-a", 20, StabilityThresholds::default(), Some(ConvergenceConfig { enabled: true, epsilon: 0.001, stability_window: 3, }), ); let request = sample_request("gpt-4o", "Stable system", "Stable prompt"); let learning_key = derive_acg_learning_key("agent-a", &request); let seed_observation = build_prompt_ir(&request).unwrap(); let mut stale_stability = analyze_stability( std::slice::from_ref(&seed_observation), &StabilityThresholds::default(), ); stale_stability.converged = true; let backend = SeedObservationBackend::new(&learning_key, Vec::new()); backend.seed_stability(&learning_key, stale_stability); learner .process_run(&sample_run(vec![request]), &backend, &empty_cache()) .await .unwrap(); let repaired_observations = backend .load_observations(&learning_key) .await .unwrap() .expect("empty observations should be repaired instead of trusting converged stability"); assert_eq!(repaired_observations.len(), 1); let repaired_stability = backend .load_stability(&learning_key) .await .unwrap() .expect("repaired stability should be stored"); assert!( !repaired_stability.converged, "a repaired single-observation profile should re-enter normal convergence" ); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/adaptive/tests/unit/acg_learner_tests.rs` around lines 445 - 480, The empty-observation repair test in acg_learner_tests.rs only checks that a repaired observation is written back, but it should also verify the stale converged state is cleared. Update acg_learner_repairs_converged_stability_with_empty_observations to assert the observation stored by SeedObservationBackend after AcgLearner::process_run no longer has converged set to true, matching the behavior already expected by the neighboring missing-observation repair case.Source: Path instructions
I don't see anything presented in your comments above here. Questions:
|
Thanks, fair questions.
|
Yes, I clearly mentioned this already. This should stay in draft until you have a compelling story with clear measured benefit and no latency degradation. |
Overview
This pull request adds optional topology-aware controls to the existing
nemo-relay-adaptiveruntime. The controls are internal adaptive behavior, not a new public topology package: the separatenemo-relay-adaptive-topologycrate has been removed, and the proposed Pythonnemo_relay.adaptive_topologyprimitive surface has been removed.Reviewer-facing design note:
.github/designs/topology-aware-adaptive-controls.md.This design note is intentionally not published in Fern documentation. It records the goals, non-goals, benefit gates, observable signals, sample evidence, and rollout expectations for the internal controls.
Details
This change keeps topology-aware behavior inside the adaptive plugin contract:
crates/adaptive/src/topology.rsas an internal module with the bounded primitives needed by the adaptive learners:ConvergenceDetector,DriftDetector,GeometricGovernor, andBettiNumbers.crates/adaptive-topologyworkspace crate and its Cargo workspace dependency.crates/adaptive/tests/unit/topology_tests.rs..github/designs/.The three measurable controls are:
Sample evidence captured in the design note:
Local validation:
Validation notes:
could not start locally because
uvfailed while removing.venv\lib64with WindowsAccess is denied (os error 5)before test collection.could not start in this Windows shell because
justcould not findcygpathto translate the recipe shebang path. Running Fern directly passedfern check --warnings; direct broken-link validation still reports missing ignored generated API reference pages because the docs API-reference generation step could not be run throughjust docsin this shell.Where should the reviewer start?
Start with the design and scope reduction:
Then review the runtime integration points:
Then check binding and docs parity for configuration-only public surfaces:
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
New Features
Bug Fixes
Documentation