
Epic: deterministic network health scoring and recommendation framework #121

@DarinShapiro


Problem

The project can expose topology, node status, and radio/link observations, but it does not yet produce a deterministic, auditable measure of overall Thread mesh health. That leaves both the UI and the AI layer without a stable basis for answering questions like:

  • Is the network healthy right now?
  • Which routers are under-redundant?
  • Which links are weak but structurally important?
  • Where would adding one more mains-powered router improve resilience the most?

Goal

Introduce a backend-owned network health model that computes:

  • edge quality scores for observed router and parent links
  • node-level resilience and redundancy metrics
  • network-wide health and bottleneck metrics
  • ranked improvement opportunities and placement candidates

The AI layer should consume these computed facts and explain them. It should not derive core health scores from raw payloads on its own.
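To make this split concrete, here is a hypothetical shape for the backend-owned result that the AI layer would consume. Every type and field name is a placeholder until the written API contract exists. The point is that scores arrive precomputed and carry evidence ids, and that serialization is key-sorted so replay runs can be diffed byte for byte.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class EdgeHealth:
    a: str
    b: str
    quality: float     # 0..1, deterministic backend score
    confidence: float  # 0..1, lowered for stale or one-sided evidence


@dataclass(frozen=True)
class Recommendation:
    kind: str            # hypothetical values, e.g. "add_router", "weak_bridge_edge"
    priority: int
    evidence: list[str]  # node/edge ids the AI layer can cite, never recompute


@dataclass(frozen=True)
class NetworkHealthReport:
    model_version: str   # lets replay validation pin a scoring-model revision
    network_score: float
    edges: list[EdgeHealth] = field(default_factory=list)
    articulation_routers: list[str] = field(default_factory=list)
    recommendations: list[Recommendation] = field(default_factory=list)

    def to_json(self) -> str:
        # sort_keys keeps the output stable across runs for replay diffing.
        return json.dumps(asdict(self), sort_keys=True)
```

The AI layer would receive only `to_json()` output and explain it, which keeps every score traceable to one backend code path.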

Why this matters

  • Gives users a fast, interpretable answer to overall mesh health.
  • Lets the UI visualize weak links, bottlenecks, and redundancy gaps.
  • Lets AI provide evidence-grounded recommendations such as adding a Thread router between two weakly connected regions.
  • Creates a durable contract for replay validation and future floorplan-aware placement work.

Scope

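This epic covers two foundational deliverables:

  • a written scoring model with explicit thresholds and formulas
  • an API contract with example payloads for current health and placement recommendations

Follow-on implementation issues should build on those artifacts rather than inventing ad hoc scoring in multiple places.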

Initial model outline

Edge quality

Use observed link evidence such as:

  • strongest available RSSI
  • LQI
  • TX retry rate or equivalent instability signals
  • observation freshness
  • directional symmetry when both sides are known
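Here is a minimal sketch of how the edge evidence above could fold into one quality score plus a separate confidence value. The weights (0.4/0.3/0.3), the -100 to -50 dBm RSSI window, the 20 dB asymmetry penalty, the 300 s freshness half-life, and the `LinkObservation` / `edge_quality` names are hypothetical placeholders, not agreed thresholds. The sketch also assumes raw 802.15.4 LQI on a 0..255 scale.

```python
from dataclasses import dataclass
from typing import Optional

RSSI_FLOOR_DBM = -100.0        # hypothetical: at or below this, RSSI term is 0
RSSI_CEIL_DBM = -50.0          # hypothetical: at or above this, RSSI term is 1
FRESHNESS_HALF_LIFE_S = 300.0  # hypothetical: confidence halves every 5 minutes


def _clamp01(x: float) -> float:
    return max(0.0, min(1.0, x))


@dataclass(frozen=True)
class LinkObservation:
    rssi_fwd_dbm: Optional[float]  # as heard by side A, None if unknown
    rssi_rev_dbm: Optional[float]  # as heard by side B, None if unknown
    lqi: int                       # assumed raw 802.15.4 LQI, 0..255
    tx_retry_rate: float           # fraction of TX attempts retried, 0..1
    age_s: float                   # seconds since the observation


def edge_quality(obs: LinkObservation) -> tuple[float, float]:
    """Return (quality, confidence), both in [0, 1]."""
    readings = [r for r in (obs.rssi_fwd_dbm, obs.rssi_rev_dbm) if r is not None]
    if not readings:
        return 0.0, 0.0  # no radio evidence at all
    rssi = max(readings)  # strongest available RSSI
    rssi_term = _clamp01((rssi - RSSI_FLOOR_DBM) / (RSSI_CEIL_DBM - RSSI_FLOOR_DBM))
    lqi_term = _clamp01(obs.lqi / 255.0)
    retry_term = 1.0 - _clamp01(obs.tx_retry_rate)
    quality = 0.4 * rssi_term + 0.3 * lqi_term + 0.3 * retry_term
    if len(readings) == 2:
        # Penalize asymmetric links: up to 0.2 for a gap of 20 dB or more.
        quality -= 0.2 * _clamp01(abs(readings[0] - readings[1]) / 20.0)
    # Freshness decays confidence, not quality, so staleness is tracked separately.
    confidence = 0.5 ** (max(obs.age_s, 0.0) / FRESHNESS_HALF_LIFE_S)
    if len(readings) == 1:
        confidence *= 0.8  # one-sided evidence is less certain
    return _clamp01(quality), confidence
```

Keeping freshness in a separate confidence value, instead of folding it into quality, lets the network-level stale-data penalty be applied once instead of being double-counted on every edge.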

Node health

For routers, compute:

  • strong-neighbor count
  • usable-neighbor count
  • alternate-path count
  • best path quality toward leader or border-router-adjacent backbone
  • bottleneck / route concentration risk
  • child or load headroom where available
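One way to compute several of the router metrics above is sketched below. It assumes the mesh is available as a symmetric map of router id to {neighbor id: edge quality in [0, 1]}, and the `STRONG` / `USABLE` thresholds are hypothetical. Best path quality is computed here as a widest (max-bottleneck) path. This technique is a substitution chosen because a route is only as good as its weakest hop. The alternate-path value counts usable neighbors that can still reach the leader once the router itself is removed, which is a cheap proxy and not a count of node-disjoint paths.

```python
import heapq
from collections import deque

STRONG = 0.7  # hypothetical strong-link threshold on the 0..1 edge score
USABLE = 0.4  # hypothetical usable-link threshold

Graph = dict[str, dict[str, float]]  # router -> {neighbor: edge quality}


def neighbor_counts(g: Graph, node: str) -> tuple[int, int]:
    """Return (strong-neighbor count, usable-neighbor count)."""
    qualities = list(g.get(node, {}).values())
    strong = sum(1 for q in qualities if q >= STRONG)
    usable = sum(1 for q in qualities if q >= USABLE)
    return strong, usable


def best_path_quality(g: Graph, src: str, dst: str) -> float:
    """Widest path: the best achievable weakest-hop quality from src to dst."""
    best = {src: 1.0}
    heap = [(-1.0, src)]  # max-heap via negated bottleneck values
    while heap:
        neg_width, u = heapq.heappop(heap)
        width = -neg_width
        if u == dst:
            return width
        if width < best.get(u, 0.0):
            continue  # stale heap entry
        for v, q in g.get(u, {}).items():
            cand = min(width, q)
            if cand > best.get(v, 0.0):
                best[v] = cand
                heapq.heappush(heap, (-cand, v))
    return 0.0  # dst unreachable


def alternate_next_hops(g: Graph, node: str, leader: str) -> int:
    """Usable neighbors that still reach the leader with `node` removed."""
    count = 0
    for nbr, q in g.get(node, {}).items():
        if q < USABLE:
            continue
        seen = {node, nbr}  # marking `node` seen forces paths around it
        frontier = deque([nbr])
        while frontier:
            u = frontier.popleft()
            if u == leader:
                count += 1
                break
            for v, qv in g.get(u, {}).items():
                if qv >= USABLE and v not in seen:
                    seen.add(v)
                    frontier.append(v)
    return count
```

A router with `alternate_next_hops(...) == 1` has exactly one usable way toward the leader, which is the under-redundant case the Problem section asks about.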

For end devices, compute:

  • parent quality
  • parent stability
  • signal margin / reliability trend where available
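The sketch below covers parent stability and signal-margin trend for end devices. It assumes the backend keeps a time series of (timestamp, parent id, link margin) samples per child. `ParentSample` and both function names are hypothetical. The trend is an ordinary least-squares slope over samples with the current parent only, so a recent parent switch does not read as a sudden margin change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParentSample:
    t_s: float        # observation time in seconds
    parent_id: str    # router the child was attached to
    margin_db: float  # link margin to that parent (assumed available)


def parent_changes_per_hour(samples: list[ParentSample]) -> float:
    """Parent switches per hour across the observed window."""
    ordered = sorted(samples, key=lambda s: s.t_s)
    if len(ordered) < 2:
        return 0.0
    changes = sum(1 for a, b in zip(ordered, ordered[1:]) if a.parent_id != b.parent_id)
    span_h = (ordered[-1].t_s - ordered[0].t_s) / 3600.0
    return changes / span_h if span_h > 0 else float(changes)


def margin_trend_db_per_hour(samples: list[ParentSample]) -> float:
    """Least-squares slope of margin over time, current parent only."""
    if not samples:
        return 0.0
    latest = max(samples, key=lambda s: s.t_s)
    pts = [(s.t_s / 3600.0, s.margin_db) for s in samples if s.parent_id == latest.parent_id]
    n = len(pts)
    if n < 2:
        return 0.0  # not enough history with this parent
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    if sxx == 0.0:
        return 0.0  # all samples share one timestamp
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    return sxy / sxx
```

A negative slope with a still-acceptable margin is a useful early warning that plain parent quality alone would not show.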

Network health

Compute mesh-wide metrics such as:

  • percent of routers meeting the target strong-neighbor threshold
  • path diversity / alternate path coverage
  • articulation-point and bridge-edge risk
  • weakest-cut risk for partitions or near-partitions
  • bottleneck concentration on one or two routers
  • stale-data and confidence penalties
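Articulation routers and bridge edges can both be found in one depth-first pass with Tarjan's low-link method. The sketch below assumes the same router adjacency map as the router metrics and hypothetical `STRONG` and `TARGET_STRONG_NEIGHBORS` values. It treats every edge present in the map as a link, so weak edges should be filtered out first if they should not count. Recursion is safe here because a Thread partition has at most 32 active routers.

```python
from typing import Optional

STRONG = 0.7                 # hypothetical strong-edge threshold
TARGET_STRONG_NEIGHBORS = 2  # hypothetical per-router target

Graph = dict[str, dict[str, float]]  # router -> {neighbor: edge quality}


def articulation_points_and_bridges(g: Graph) -> tuple[set[str], list[tuple[str, str]]]:
    """Tarjan low-link: routers and edges whose loss splits the mesh."""
    disc: dict[str, int] = {}
    low: dict[str, int] = {}
    points: set[str] = set()
    bridges: list[tuple[str, str]] = []
    timer = 0

    def dfs(u: str, parent: Optional[str]) -> None:
        nonlocal timer
        disc[u] = low[u] = timer
        timer += 1
        children = 0
        for v in g.get(u, {}):
            if v not in disc:
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                # Non-root u is a cut vertex if v's subtree cannot climb above u.
                if parent is not None and low[v] >= disc[u]:
                    points.add(u)
                # Edge u-v is a bridge if v's subtree cannot even reach u.
                if low[v] > disc[u]:
                    bridges.append((min(u, v), max(u, v)))
            elif v != parent:
                # Already-visited non-parent neighbor: a back edge.
                low[u] = min(low[u], disc[v])
        # The DFS root is a cut vertex only with two or more DFS children.
        if parent is None and children > 1:
            points.add(u)

    for node in g:
        if node not in disc:
            dfs(node, None)
    return points, sorted(bridges)


def pct_routers_meeting_target(g: Graph) -> float:
    """Percent of routers with at least the target number of strong neighbors."""
    if not g:
        return 0.0
    ok = sum(
        1 for nbrs in g.values()
        if sum(1 for q in nbrs.values() if q >= STRONG) >= TARGET_STRONG_NEIGHBORS
    )
    return 100.0 * ok / len(g)
```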

Recommendation engine inputs

Rank opportunities from computed facts such as:

  • routers with fewer than target strong neighbors
  • weak bridge edges between clusters
  • articulation routers with high dependency
  • groups of devices depending on poor parent quality
  • candidate placements that would improve alternate-path coverage
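For placement candidates, one simple deterministic proxy is to count how many bridge edges disappear when a hypothetical router with predicted links is added. The sketch assumes three things: an adjacency map of usable links only, candidates supplied as {candidate id: set of existing routers it is expected to hear}, and brute-force bridge detection (remove each edge, recount components), which is affordable at Thread's router count. `rank_placements` and its gain metric are illustrative and are not the agreed recommendation formula.

```python
from collections import deque

Adjacency = dict[str, set[str]]  # router -> routers it has a usable link to


def _components(adj: Adjacency, skip: frozenset = frozenset()) -> int:
    """Count connected components, optionally ignoring one undirected edge."""
    seen: set[str] = set()
    count = 0
    for start in adj:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v in seen or frozenset((u, v)) == skip:
                    continue
                seen.add(v)
                queue.append(v)
    return count


def bridge_count(adj: Adjacency) -> int:
    """Brute force: an edge is a bridge if removing it adds a component."""
    base = _components(adj)
    edges = {frozenset((u, v)) for u in adj for v in adj[u]}
    return sum(1 for e in edges if _components(adj, e) > base)


def rank_placements(adj: Adjacency, candidates: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Rank hypothetical new routers by how many bridge edges they remove."""
    base = bridge_count(adj)
    ranked = []
    for name, predicted_links in candidates.items():
        trial = {u: set(vs) for u, vs in adj.items()}  # copy; input is untouched
        trial[name] = set(predicted_links)
        for v in predicted_links:
            trial[v].add(name)
        ranked.append((name, base - bridge_count(trial)))
    # Highest gain first; ties broken by id so output stays deterministic.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked
```

A negative gain, as for a candidate that hears only one router, flags a placement that would add a new single point of failure instead of removing one.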

Non-goals

  • Full floorplan-aware RF simulation in this first pass
  • LLM-owned scoring logic
  • UI-specific formatting logic in the backend response model

Related issues

Acceptance criteria

  • There is a written, reviewable scoring model with explicit thresholds and formulas.
  • There is a written, reviewable API contract with example payloads for both current health and placement recommendations.
  • The model cleanly separates deterministic backend computation from AI explanation.
  • The design is strong enough to support replay-based validation before live deployment work begins.
