Spec: deterministic network health scoring thresholds, formulas, and reason codes #122

@DarinShapiro

Description

Parent epic: #121

Problem

The project needs a deterministic scoring model for Thread mesh health that can be computed in the backend, validated in replay, and reused by both the UI and the AI layer. Today, there is no exact definition of what counts as healthy, under-redundant, unstable, or bottleneck-prone.

Goal

Define the scoring specification in enough detail that two separate implementations would compute materially the same answers from the same source data.

Deliverables

  1. Define the atomic metrics.
  2. Define normalization and weighting.
  3. Define thresholds for healthy / warning / critical states.
  4. Define confidence and freshness penalties.
  5. Define explanation-ready reason codes.
  6. Define example calculations on representative topologies.

Required spec sections

1. Source signals

Specify which backend observations may feed the score, such as:

  • strongest available RSSI
  • LQI
  • TX retry rate or similar instability counters
  • packet-loss or error indicators if available
  • observation freshness / staleness
  • bidirectional asymmetry
  • parent-child relationship quality
  • topology role and node type

2. Edge-quality score

Define:

  • per-signal normalization to 0..1
  • weighting rules
  • hard caps or penalties for stale data
  • penalties for one-way or highly asymmetric links
  • classification buckets for strong / usable / weak / unknown

The spec should include default threshold bands, for example:

  • strong edge
  • usable edge
  • weak edge
  • unknown / stale edge
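As a starting point for discussion, the edge-quality steps above could be sketched as follows. Every number here (the RSSI range of -100 to -50 dBm, the 0..255 LQI scale, the 0.5/0.3/0.2 weights, the 600-second staleness limit, the 10 dB asymmetry gap, and the 0.7/0.4 bucket cut-offs) is a placeholder assumption, not a value this issue has agreed on. The names `EdgeObservation` and `edge_quality` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# All constants below are illustrative assumptions, not agreed spec values.
RSSI_FLOOR_DBM, RSSI_CEIL_DBM = -100.0, -50.0
STALE_AFTER_S = 600.0
ASYMMETRY_GAP_DB = 10.0
ASYMMETRY_FACTOR = 0.8
STRONG_MIN, USABLE_MIN = 0.7, 0.4


@dataclass
class EdgeObservation:
    rssi_dbm: Optional[float]          # strongest available RSSI
    lqi: Optional[int]                 # assumed 0..255 scale
    retry_rate: Optional[float]        # fraction of TX attempts retried, 0..1
    age_s: float                       # seconds since the observation
    reverse_rssi_dbm: Optional[float]  # RSSI seen from the other end, if known


def clamp01(x: float) -> float:
    return max(0.0, min(1.0, x))


def edge_quality(obs: EdgeObservation) -> Tuple[float, str]:
    """Return (score in 0..1, bucket) for one directed edge observation."""
    # Hard cap: stale or missing data is classified as unknown.
    if obs.rssi_dbm is None or obs.age_s > STALE_AFTER_S:
        return 0.0, "unknown"

    # Per-signal normalization to 0..1, weighted over the signals present.
    rssi_norm = clamp01((obs.rssi_dbm - RSSI_FLOOR_DBM) / (RSSI_CEIL_DBM - RSSI_FLOOR_DBM))
    parts = [(0.5, rssi_norm)]
    if obs.lqi is not None:
        parts.append((0.3, clamp01(obs.lqi / 255.0)))
    if obs.retry_rate is not None:
        parts.append((0.2, 1.0 - clamp01(obs.retry_rate)))
    score = sum(w * v for w, v in parts) / sum(w for w, _ in parts)

    # Penalty for one-way or highly asymmetric links.
    if obs.reverse_rssi_dbm is None:
        score *= ASYMMETRY_FACTOR
    elif abs(obs.rssi_dbm - obs.reverse_rssi_dbm) > ASYMMETRY_GAP_DB:
        score *= ASYMMETRY_FACTOR

    # Freshness penalty: linear decay down to half weight at the staleness limit.
    score *= 1.0 - 0.5 * (obs.age_s / STALE_AFTER_S)

    if score >= STRONG_MIN:
        return score, "strong"
    if score >= USABLE_MIN:
        return score, "usable"
    return score, "weak"
```

Renormalizing the weights over the signals that are actually present keeps the score comparable when a backend does not report LQI or retry counters. Whether missing signals should instead lower the confidence score is an open design question.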

3. Router-health score

Define exact metrics for routers, including:

  • strong-neighbor count
  • usable-neighbor count
  • alternate-path count
  • best path quality
  • articulation risk / bridge dependence
  • load headroom where data exists

Include the default policy target that a healthy router should have at least two strong router neighbors, while still allowing the scoring system to degrade gracefully when the network is small.
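One possible way to express the "at least two strong router neighbors" target while degrading gracefully on small networks is to cap the target at what the network can physically provide. The function name `router_redundancy_score` and the choice to not penalize a single-router network are assumptions made for illustration only.

```python
def router_redundancy_score(strong_neighbors: int, total_routers: int, target: int = 2) -> float:
    """Score 0..1 for how close a router is to the strong-neighbor target.

    The target is capped by the number of other routers that exist, so a
    two-router network is not penalized for being unable to reach two
    strong neighbors per router.
    """
    achievable = min(target, max(total_routers - 1, 0))
    if achievable == 0:
        # Assumption: a lone router has no redundancy to measure, so it
        # is not penalized here; partition risk would be reported elsewhere.
        return 1.0
    return min(strong_neighbors, achievable) / achievable
```

For example, a router with one strong neighbor scores 0.5 in a five-router mesh but 1.0 in a two-router mesh, because one neighbor is the best that mesh can offer.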

4. End-device health score

Define exact metrics for sleepy or end devices, including:

  • parent quality
  • parent stability
  • retry/error trend if available
  • whether the parent itself is fragile or overloaded
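A minimal sketch of how these end-device inputs might combine is shown below. The 0.7/0.3 weights, the 24-hour parent-change window, and the rule that a fragile parent caps the child's score are all assumptions for illustration; `end_device_health` is a hypothetical name.

```python
def end_device_health(parent_edge_score: float,
                      parent_changes_24h: int,
                      parent_router_health: float) -> float:
    """Score 0..1 for a sleepy or end device.

    parent_edge_score:    edge-quality score of the link to the parent (0..1)
    parent_changes_24h:   how many times the device switched parents recently
    parent_router_health: router-health score of the current parent (0..1)
    """
    # Parent stability: 1.0 with no recent changes, falling as changes increase.
    stability = 1.0 / (1.0 + max(parent_changes_24h, 0))

    base = 0.7 * parent_edge_score + 0.3 * stability

    # A fragile or overloaded parent caps the child's score: a perfectly
    # linked device under a failing parent can reach at most 0.5.
    cap = 0.5 + 0.5 * parent_router_health
    return min(base, cap)
```

Using a cap rather than a weighted term keeps a strong parent link from masking a parent that is itself at risk.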

5. Network-wide health score

Define how to aggregate node and edge metrics into:

  • overall network score
  • redundancy score
  • path-diversity score
  • bottleneck score
  • partition-risk score
  • confidence score
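Partition risk and bottleneck analysis both depend on finding routers whose removal would split the mesh. The sketch below uses the standard depth-first low-link method for articulation points over an undirected graph of usable-or-better router links. The ratio used in `partition_risk_score` is an assumed placeholder formula, and both function names are hypothetical.

```python
from collections import defaultdict
from typing import Hashable, Iterable, Set, Tuple


def articulation_points(edges: Iterable[Tuple[Hashable, Hashable]]) -> Set[Hashable]:
    """Return nodes whose removal disconnects the undirected graph."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)

    disc, low, result = {}, {}, set()
    counter = 0

    def dfs(node, parent):
        nonlocal counter
        disc[node] = low[node] = counter
        counter += 1
        children = 0
        # Sorted traversal keeps the visit order reproducible across runs.
        for nxt in sorted(graph[node]):
            if nxt == parent:
                continue
            if nxt in disc:
                # Back edge: node can reach an earlier-discovered ancestor.
                low[node] = min(low[node], disc[nxt])
            else:
                children += 1
                dfs(nxt, node)
                low[node] = min(low[node], low[nxt])
                # Non-root node is a cut vertex if a child cannot climb above it.
                if parent is not None and low[nxt] >= disc[node]:
                    result.add(node)
        # The DFS root is a cut vertex only if it has more than one DFS child.
        if parent is None and children > 1:
            result.add(node)

    for node in sorted(graph):
        if node not in disc:
            dfs(node, None)
    return result


def partition_risk_score(edges: Iterable[Tuple[Hashable, Hashable]]) -> float:
    """Assumed formula: 1.0 means no articulation routers, lower means more."""
    edge_list = list(edges)
    nodes = {n for e in edge_list for n in e}
    if not nodes:
        return 1.0
    return 1.0 - len(articulation_points(edge_list)) / len(nodes)
```

The recursive form is adequate for mesh sizes in the tens of routers; a much larger graph would call for an iterative traversal.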

6. Reason codes and remediation hooks

For every degraded state, define machine-readable reason codes such as:

  • LOW_ROUTER_REDUNDANCY
  • WEAK_BRIDGE_EDGE
  • HIGH_BOTTLENECK_CENTRALITY
  • MARGINAL_PARENT_LINK
  • STALE_LINK_DATA

Each reason code should carry enough structured context for the UI and AI to explain the condition.
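To make "enough structured context" concrete, a reason code could be modeled as a small immutable record that serializes deterministically. The field names (`code`, `severity`, `subject_id`, `context`) and the example context keys are assumptions for illustration, not a schema defined by this issue.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class ReasonCode:
    code: str          # e.g. "WEAK_BRIDGE_EDGE"
    severity: str      # assumed values: "warning" or "critical"
    subject_id: str    # node or edge identifier the reason applies to
    context: Dict[str, Any] = field(default_factory=dict)


def reason_to_json(reason: ReasonCode) -> str:
    """Serialize with sorted keys so replay fixtures can compare output byte for byte."""
    return json.dumps(asdict(reason), sort_keys=True)


example = ReasonCode(
    code="LOW_ROUTER_REDUNDANCY",
    severity="warning",
    subject_id="router-a",  # hypothetical identifier
    context={"strong_neighbors": 1, "target": 2},
)
```

Carrying the measured value and the threshold together in `context` would let the UI and the AI layer phrase the explanation without recomputing anything.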

7. Validation examples

Include at least:

  • a healthy small mesh
  • a mesh with one weak bridge between two router clusters
  • a mesh with one articulation router
  • a mesh where end devices are healthy even though router redundancy is mediocre
  • a stale-data scenario
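A replay fixture for the "one weak bridge between two router clusters" case might look like the sketch below. It uses a brute-force check (remove each edge and test connectivity), which is simple to verify by hand. The router names, the bucket labels attached to each edge, and the function names are all hypothetical, and the fixture assumes the mesh is connected to begin with.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def is_connected(nodes: Set[str], edges: List[Edge]) -> bool:
    """True if every node is reachable from an arbitrary start node."""
    if not nodes:
        return True
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    start = min(nodes)
    seen, stack = {start}, [start]
    while stack:
        cur = stack.pop()
        for nxt in adj[cur]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen == nodes


def weak_bridges(edge_buckets: Dict[Edge, str]) -> List[Edge]:
    """Edges that are classified weak and whose removal splits the mesh."""
    edges = sorted(edge_buckets)
    nodes = {n for e in edges for n in e}
    found = []
    for edge in edges:
        remaining = [e for e in edges if e != edge]
        if edge_buckets[edge] == "weak" and not is_connected(nodes, remaining):
            found.append(edge)
    return found


# Two three-router clusters joined only by the weak R3-R4 link.
WEAK_BRIDGE_FIXTURE: Dict[Edge, str] = {
    ("R1", "R2"): "strong", ("R1", "R3"): "strong", ("R2", "R3"): "strong",
    ("R3", "R4"): "weak",
    ("R4", "R5"): "strong", ("R4", "R6"): "strong", ("R5", "R6"): "strong",
}
```

With this fixture, `weak_bridges(WEAK_BRIDGE_FIXTURE)` returns `[("R3", "R4")]`, which is the edge a `WEAK_BRIDGE_EDGE` reason code would be expected to name.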

Acceptance criteria

  • The issue body or linked design doc contains explicit formulas or pseudocode for each score.
  • Thresholds are concrete enough for test fixtures, not just descriptive prose.
  • The spec distinguishes router health, end-device health, and overall mesh health.
  • The spec defines machine-readable reason codes for degraded health.
  • The spec is suitable for replay-based test fixture generation.

Out of scope

  • Final HTTP endpoint naming
  • UI visual design
  • Floorplan-aware hypothetical placement simulation beyond defining the inputs it would need later
