feat: Add ComponentStatus model in Flow — derive, persist, and surface per-component readiness#2264
kunzhao-nv wants to merge 8 commits into
156e122 to 0e415ad
Introduces the Flow-side representation of a component's operability,
derived from core's per-component state machine:
- Phase: coarse lifecycle bucket shared by compute, nvswitch, and
power shelf (Unknown / Initializing / Ready / InUse / Error /
Deleting).
- ComponentStatus: Phase + Reason + BlockedOperations, with
IsReady / Blocks helpers.
- MapComponentStatus: per-type translation from core's raw
controller_state to ComponentStatus.
Compute uses ManagedHostState's path-form Display ("Ready",
"Assigned/...", etc.); switch and power shelf use the serde-tagged
JSON form ({"state":"ready"}). Unrecognized inputs map to
PhaseUnknown so downstream gating fails closed.
No callers yet; integration follows in subsequent commits.
Signed-off-by: Kun Zhao <kunzhao@nvidia.com>
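The model and mapper described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: the string-valued `Phase`, the `OperationType` type, the helper names `mapComputeState` / `mapTaggedState`, and the specific state-to-phase choices (for example `Assigned/...` mapping to InUse) are assumptions made for the example. What it does take from the commit message is the shape (Phase + Reason + BlockedOperations), the two input forms (path-form Display for compute, serde-tagged JSON for switch and power shelf), and the rule that unrecognized input maps to PhaseUnknown.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Phase is the coarse lifecycle bucket shared by all component types.
type Phase string

const (
	PhaseUnknown      Phase = "Unknown"
	PhaseInitializing Phase = "Initializing"
	PhaseReady        Phase = "Ready"
	PhaseInUse        Phase = "InUse"
	PhaseError        Phase = "Error"
	PhaseDeleting     Phase = "Deleting"
)

// OperationType names a mutating operation. Hypothetical for this sketch.
type OperationType string

// ComponentStatus is Phase + Reason + BlockedOperations.
type ComponentStatus struct {
	Phase             Phase
	Reason            string
	BlockedOperations []OperationType
}

// IsReady assumes that only PhaseReady counts as ready.
func (s ComponentStatus) IsReady() bool { return s.Phase == PhaseReady }

// Blocks reports whether op appears in BlockedOperations.
func (s ComponentStatus) Blocks(op OperationType) bool {
	for _, b := range s.BlockedOperations {
		if b == op {
			return true
		}
	}
	return false
}

// mapComputeState handles the path-form Display ("Ready", "Assigned/...").
// The Assigned -> InUse mapping is an assumption of this sketch.
func mapComputeState(raw string) ComponentStatus {
	head, _, _ := strings.Cut(raw, "/")
	switch head {
	case "Ready":
		return ComponentStatus{Phase: PhaseReady}
	case "Assigned":
		return ComponentStatus{Phase: PhaseInUse, Reason: raw}
	default:
		// Unrecognized input maps to Unknown so gating fails closed.
		return ComponentStatus{Phase: PhaseUnknown, Reason: "unrecognized state: " + raw}
	}
}

// mapTaggedState handles the serde-tagged JSON form ({"state":"ready"}).
func mapTaggedState(raw string) ComponentStatus {
	var tagged struct {
		State string `json:"state"`
	}
	if err := json.Unmarshal([]byte(raw), &tagged); err != nil {
		return ComponentStatus{Phase: PhaseUnknown, Reason: "unparseable state"}
	}
	switch tagged.State {
	case "ready":
		return ComponentStatus{Phase: PhaseReady}
	default:
		return ComponentStatus{Phase: PhaseUnknown, Reason: "unrecognized state: " + tagged.State}
	}
}

func main() {
	for _, raw := range []string{"Ready", "Assigned/Example", "SomethingNew"} {
		st := mapComputeState(raw)
		fmt.Printf("compute %q -> phase=%s ready=%t\n", raw, st.Phase, st.IsReady())
	}
	st := mapTaggedState(`{"state":"ready"}`)
	fmt.Printf("switch -> phase=%s\n", st.Phase)
}
```

Mapping everything unrecognized to a single Unknown phase keeps the mapper total: a new Core state never panics or silently reads as Ready.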
Adds the wire types for the new component status concept:
- Phase enum mirroring pkg/types.Phase (Unknown / Initializing /
Ready / InUse / Error / Deleting).
- ComponentStatus message with phase, reason, and blocked_operations.
- Component.status field (= 9) carrying the live status of each
component returned by Flow's inventory APIs.
Regenerates flow.pb.go, flow_grpc.pb.go, and docs/grpc-api.{md,html}.
Server code does not populate the field yet — wired up in subsequent
commits.
Signed-off-by: Kun Zhao <kunzhao@nvidia.com>
Adds a jsonb `status` column to the component table to persist Flow's
per-component ComponentStatus (phase, reason, blocked_operations)
computed by the inventory loop from core's controller_state. Single
jsonb so the shape can evolve without further DDL.
Signed-off-by: Kun Zhao <kunzhao@nvidia.com>
Wires the per-type ComponentStatus mapper into the inventory loop and
writes the result to the new component.status column on every cycle.
- nicoapi: FindSwitchControllerStates / FindPowerShelfControllerStates
expose the raw controller_state string Core returns for switches and
power shelves (compute already carried it on MachineDetail.State).
Mock helpers (SetSwitchControllerState / SetPowerShelfControllerState)
follow the existing rack-id pattern.
- model.Component: SetStatusByComponentID writes status by external_id.
- inventorysync: syncMachineStatuses uses the pre-fetched MachineDetail
map (no extra RPC); syncSwitchStatuses and syncPowershelfStatuses each
add one nicoapi round-trip. persistComponentStatuses centralises the
delta-detect-and-write pattern.
- pkg/types: ComponentStatus.Equal lets the delta check avoid pointless
writes (struct contains a slice and is therefore not == comparable).
Status is only written when it actually changes.
Signed-off-by: Kun Zhao <kunzhao@nvidia.com>
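The delta-detect-and-write pattern mentioned above can be sketched as follows. This is an illustration under assumptions, not the PR's `persistComponentStatuses`: the function name `persistChanged`, the map-based inputs, the `write` callback, and the order-sensitive slice comparison in `Equal` are all hypothetical. It shows why an explicit `Equal` is needed (a struct containing a slice cannot be compared with `==`) and how it lets unchanged rows be skipped.

```go
package main

import "fmt"

// ComponentStatus is simplified here; field types are assumptions.
type ComponentStatus struct {
	Phase             string
	Reason            string
	BlockedOperations []string
}

// Equal compares field by field. The slice comparison is order-sensitive,
// which is an assumption of this sketch.
func (s ComponentStatus) Equal(o ComponentStatus) bool {
	if s.Phase != o.Phase || s.Reason != o.Reason {
		return false
	}
	if len(s.BlockedOperations) != len(o.BlockedOperations) {
		return false
	}
	for i := range s.BlockedOperations {
		if s.BlockedOperations[i] != o.BlockedOperations[i] {
			return false
		}
	}
	return true
}

// persistChanged calls write only for components whose computed status
// differs from the stored one (or has no stored status yet). It returns
// the number of writes performed.
func persistChanged(
	stored, computed map[string]ComponentStatus,
	write func(id string, s ComponentStatus) error,
) (int, error) {
	writes := 0
	for id, next := range computed {
		if prev, ok := stored[id]; ok && prev.Equal(next) {
			continue // unchanged: skip the write
		}
		if err := write(id, next); err != nil {
			return writes, err
		}
		writes++
	}
	return writes, nil
}

func main() {
	stored := map[string]ComponentStatus{
		"machine-1": {Phase: "Ready"},
		"switch-1":  {Phase: "Initializing"},
	}
	computed := map[string]ComponentStatus{
		"machine-1": {Phase: "Ready"}, // unchanged
		"switch-1":  {Phase: "Ready"}, // changed
	}
	n, err := persistChanged(stored, computed, func(id string, s ComponentStatus) error {
		fmt.Printf("write %s -> %s\n", id, s.Phase)
		return nil
	})
	fmt.Println("writes:", n, "err:", err)
}
```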
Carry the persisted ComponentStatus through the DB → domain → proto
chain so ListComponents / GetComponent callers see the Flow-derived
view of operability.
- domain Component (pkg/inventoryobjects/component) gains an optional
Status pointer; nil means "no status computed yet".
- dao.ComponentFrom copies model.Component.Status through.
- protobuf.ComponentTo populates the new pb.Component.status field via
the new ComponentStatusTo / PhaseTo converters.
- Added operationTypeFromTypesTo for the types.OperationType → proto
enum mapping (kept separate from the existing OperationTypeToProto,
which converts from taskcommon.TaskType).
No reverse converter: status is read-only over the API surface.
Signed-off-by: Kun Zhao <kunzhao@nvidia.com>
ReadinessGate holds mutating component operations until every target
component's persisted ComponentStatus permits them. Inventory sync
already writes the status, so the gate reads it from the component
table instead of polling Core's state-machine endpoints on every
iteration.
Layout in this commit:
- gate.go defines the Gate / StatusReader interfaces and DBGate, the
production polling loop. Permissive on missing status (fail-open on
transient gaps); the rack-scoped form resolves rack -> host components.
- db_reader.go implements StatusReader against bun.IDB by reading the
component table.
- gate_test.go drives DBGate with an in-memory fake reader: covers
empty / nil-gate short-circuits, ready / missing / blocking /
partial-blocking / op-scoped, transition-mid-poll, dedup, context
cancellation, and the rack delegation path.
Production call sites are not migrated in this commit.
Signed-off-by: Kun Zhao <kunzhao@nvidia.com>
Call sites in compute/nvswitch/powershelf managers have Core (external)
component / rack IDs as []string, which is what flows through the
Temporal task target. Rekey the gate to match so future callers don't
have to convert at every call site.
- StatusReader / Gate methods now take []string. DBReader joins by
component.external_id (string) and component.rack_id (uuid parsed from
string).
- MemReader is a new exported in-memory StatusReader for test packages
outside readiness; it mirrors nicoapi.NewMockClient so manager tests
can build a realistic gate without spinning up a DB. The
package-internal gate tests keep their own fakeReader so they can count
poll iterations.
- Test IDs renamed away from words like "ready"/"blocked" to avoid
matching substrings of the gate's own log / error messages.
Signed-off-by: Kun Zhao <kunzhao@nvidia.com>
Mirrors the convention already used by rack, nvldomain, and
task_schedule: a single updated_at column stamped by the shared
set_updated_at trigger on every UPDATE. This gives callers one
freshness signal for the row regardless of which field changed
(power_state, firmware_version, status, description, ...), rather than
per-field timestamps.
Signed-off-by: Kun Zhao <kunzhao@nvidia.com>
0e415ad to be6abe4
TruffleHog Secret Scan: no secrets or credentials found. Last updated: 2026-06-05 21:15:24 UTC | Commit: be6abe4
Description
Adds `ComponentStatus`, Flow's unified view of a component's operability, derived from Core's per-type state machine (compute `ManagedHostState`, switch `SwitchControllerState`, power-shelf `PowerShelfControllerState`). Inventory sync recomputes it every cycle, stores it as a `jsonb` column on `component`, and the gRPC API surfaces it on every `Component` response.
`ComponentStatus` has three fields: phase, reason, and blocked_operations.
Type of Change
Related Issues (Optional)
Breaking Changes
Testing
Additional Notes