feat: [DSM-142] Add ColdStats aggregate to CanisterStates#10286
Open
alin-at-dfinity wants to merge 4 commits into
Open
feat: [DSM-142] Add ColdStats aggregate to CanisterStates#10286alin-at-dfinity wants to merge 4 commits into
ColdStats aggregate to CanisterStates#10286alin-at-dfinity wants to merge 4 commits into
Conversation
Maintains a small `ColdStats` aggregate over the canisters in the `cold` pool, updated incrementally on every transition into / out of `cold`. This lets the "touch every canister" aggregate queries — `total_compute_allocation`, `total_canister_memory_usage`, `memory_taken`, `callback_count`, `guaranteed_response_message_memory_taken`, `best_effort_message_memory_taken` — run in `O(|hot|)` instead of `O(|all canisters|)`, which is the primary motivation for the hot/cold split on subnets with a long tail of idle canisters. The aggregates are derived (not persisted) and are reconstructed by `CanisterStates::new` on checkpoint load. `debug_assert_invariants` (now also runs an `O(|cold|)` recompute and compares against the live aggregate) ensures every mutating method keeps them in sync, and the `ColdStats` struct stays module-private — callers always reach the totals through the public aggregator methods on `CanisterStates`. `MemoryTaken`'s fields are bumped from private to `pub(crate)` so that `CanisterStates::memory_taken` can construct the struct directly, keeping `MemoryTaken` in its current home in `replicated_state.rs`. `CanisterStates::memory_taken` itself is `pub(crate)` and will be wired up to `ReplicatedState::memory_taken` in the next PR; an `#[allow(dead_code)]` keeps the build warning-free until then. Aggregator behaviour is exercised by two new tests (`memory_aggregators_combine_hot_and_cold`, `callback_count_combines_hot_and_cold`) and the bookkeeping discipline is exercised by an extended set of `*_updates_cold_stats*` tests covering `insert`, `remove`, `try_cool*`, `for_each_mut`, `try_for_each_mut`, and `retain`. Co-authored-by: Cursor <cursoragent@cursor.com>
…ry. Rename raw_memory to execution_memory, so it better matches the equivalent MemoryTaken field. Update documentation and tests.
A canister can satisfy `CanisterState::is_cold()` while still holding a guaranteed-response slot reservation: `is_cold()` only requires empty input/output *messages* (the pool count) and no unexpired best-effort callback, both of which are independent of whether the canister has in-flight guaranteed-response requests. A canister that has pushed a guaranteed-response request that's already been moved to an outgoing stream still keeps the input-slot reservation for the eventual response, which contributes `MAX_RESPONSE_COUNT_BYTES` to its `guaranteed_response_message_memory_usage()`. The previous commit dropped this field from `ColdStats` on the assumption it was always zero. It isn't, and the consequence is that `guaranteed_response_message_memory_taken()` quietly under-reports subnet-wide memory: promoting a cold canister with a reservation to `hot` (e.g. on the next `get_mut`) makes the subnet total jump up out of nowhere, breaking conservation invariants in downstream code (stream handler `debug_assert!`s, in particular). Restore the field and the corresponding `add`/`sub` bookkeeping, fold it into `guaranteed_response_message_memory_taken`, `total_canister_memory_usage`, and `memory_taken`, and add a focused test (`cold_canister_with_guaranteed_response_reservation_is_aggregated`) exercising the case via `push_output_request` followed by draining the output queue. Best-effort message memory remains hot-only: an unexpired best-effort callback forces the canister into `hot`, and any expired best-effort callback shows up as a pending input which also forces `hot`. Co-authored-by: Cursor <cursoragent@cursor.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Maintains a small
ColdStatsaggregate over the canisters in thecoldpool, updated incrementally on every transition into / out ofcold. This lets the "touch every canister" aggregate queries —total_compute_allocation,total_canister_memory_usage,memory_taken,callback_count,guaranteed_response_message_memory_taken,best_effort_message_memory_taken— run inO(|hot|)instead ofO(|all canisters|), which is the primary motivation for the hot/cold split on subnets with a long tail of idle canisters.The aggregates are derived (not persisted) and are reconstructed by
CanisterStates::newon checkpoint load.debug_assert_invariantsensures every mutating method keeps them in sync, and theColdStatsstruct stays module-private — callers always reach the totals through the public aggregator methods onCanisterStates.MemoryTaken's fields are bumped from private topub(crate)so thatCanisterStates::memory_takencan construct the struct directly, keepingMemoryTakenin its current home inreplicated_state.rs.CanisterStates::memory_takenitself ispub(crate)and will be wired up toReplicatedState::memory_takenin the next PR.