Idea
Allow multiple NeuralDrive instances on a network to discover each other and pool their compute resources. Adding more USB-booted nodes would scale inference capacity — more GPUs, more VRAM, more concurrent requests.
Open Questions
- Discovery: mDNS/Avahi auto-discovery vs manual node registration? (See the discovery sketch after this list.)
- Load balancing vs distributed inference: Simple request-level load balancing (route whole requests to nodes with capacity) is straightforward. Distributed inference across nodes (tensor parallelism over the network) is much harder and may not be practical over commodity networking.
- Model placement: Which node holds which model? Replicate popular models across nodes, or dedicate nodes to specific models?
- Coordination: Does one node act as a primary/coordinator, or is it fully peer-to-peer?
- API surface: Should the cluster present a single unified API endpoint, or does each node remain independently addressable?
- State: Shared model registry? Synchronized config? Or fully independent nodes behind a load balancer?
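To make the auto-discovery option concrete, here is a minimal sketch using the python-zeroconf package (pip install zeroconf), which can both announce a node and browse for peers over mDNS. Everything specific in it is an assumption for illustration, not an existing NeuralDrive interface: the `_neuraldrive._tcp.local.` service type, the instance name, port 8484, and the hard-coded LAN address are all placeholders.

```python
# Minimal mDNS discovery sketch using the python-zeroconf package
# (pip install zeroconf). The service type, instance name, port, and
# LAN address below are placeholder assumptions for illustration.
import socket

from zeroconf import ServiceBrowser, ServiceInfo, Zeroconf

SERVICE_TYPE = "_neuraldrive._tcp.local."  # hypothetical service type


class NodeListener:
    """Tracks peers as they appear and disappear on the LAN."""

    def __init__(self):
        self.peers = {}  # name -> (address, port)

    def add_service(self, zc, type_, name):
        info = zc.get_service_info(type_, name)
        if info:
            addr = info.parsed_addresses()[0]
            self.peers[name] = (addr, info.port)
            print(f"discovered {name} at {addr}:{info.port}")

    def remove_service(self, zc, type_, name):
        self.peers.pop(name, None)
        print(f"lost {name}")

    def update_service(self, zc, type_, name):
        pass  # required by recent zeroconf versions; no-op here


zc = Zeroconf()

# Announce this node. The address and port are placeholders; a real node
# would advertise its actual inference API endpoint.
info = ServiceInfo(
    SERVICE_TYPE,
    f"node-1.{SERVICE_TYPE}",
    addresses=[socket.inet_aton("192.168.1.50")],
    port=8484,
)
zc.register_service(info)

# Browse for other nodes announcing the same service type.
browser = ServiceBrowser(zc, SERVICE_TYPE, NodeListener())

try:
    input("Discovering peers; press Enter to exit...\n")
finally:
    zc.unregister_service(info)
    zc.close()
```

Manual registration would replace the browse step with a static peer list in config; the two could also coexist, with mDNS seeding defaults that manually registered entries override.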
Initial Thought
A simple load-balancer approach (Caddy upstream pool, round-robin or least-connections across discovered nodes) would be the fastest path to something useful. More sophisticated distributed inference could come later.
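To make the load-balancer path concrete, a Caddyfile along these lines could front the pool: one cluster-facing port, least-connections balancing across nodes, and active health checks so dead nodes drop out of rotation. The node hostnames, port 8080, and the `/health` path are placeholder assumptions; in practice the upstream list would be generated from whatever discovery mechanism is chosen.

```
# Hypothetical Caddyfile sketch. Hostnames, ports, and the /health
# path are placeholders, not an existing NeuralDrive convention.
:8080 {
	reverse_proxy node-a.local:8080 node-b.local:8080 node-c.local:8080 {
		lb_policy least_conn
		health_uri /health
		health_interval 10s
	}
}
```

This would also answer the API-surface question in the simplest way: the cluster presents a single endpoint while each node remains independently addressable underneath.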
Status
Needs design work before implementation. Logging this idea for future consideration.