On-demand automatic model loading/unloading (VRAM management) #14

@eshork

Description

Summary

Add intelligent automatic VRAM management for Ollama models — preloading models into VRAM before they're requested and evicting them when VRAM pressure is high.

Current Behavior

Ollama loads models into VRAM on first inference request (cold start adds several seconds). Models stay resident until Ollama's built-in LRU eviction kicks them out. Users have no control over which models are warm or how VRAM is budgeted.

Desired Behavior

  • Warm model list: User designates models that should always be loaded in VRAM when the system is idle
  • Preloading: On boot or after a model download, warm models are automatically loaded without waiting for a chat request
  • VRAM budget: User sets a VRAM reservation (e.g. "keep 2 GB free") so the system doesn't over-commit
  • Priority-based eviction: When VRAM is needed, evict least-recently-used non-warm models first; warm models are evicted last (see the sketch after this list)
  • TUI integration: Show VRAM budget and warm model configuration in the Models screen or a dedicated VRAM management screen
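
A minimal sketch of the eviction ordering described above, assuming some in-process bookkeeping of loaded models (the `LoadedModel` dataclass and `pick_eviction_candidate` helper are hypothetical names, not existing code):

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class LoadedModel:
    name: str
    last_used: float  # unix timestamp of the model's last inference request
    size_vram: int    # bytes, as reported by GET /api/ps


def pick_eviction_candidate(loaded: list[LoadedModel],
                            warm: set[str]) -> LoadedModel | None:
    """Pick the next model to unload: non-warm models first, least recently
    used first; a warm model is only returned once nothing else is loaded."""
    if not loaded:
        return None
    # False sorts before True, so non-warm models come first; within each
    # group the oldest last_used wins.
    return min(loaded, key=lambda m: (m.name in warm, m.last_used))
```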

Technical Notes

  • Ollama's POST /api/generate with keep_alive controls per-model VRAM residency (see the sketch after these notes)
  • GET /api/ps shows currently loaded models with VRAM usage
  • nvidia-smi provides total/used VRAM for budget enforcement
  • Could be implemented as a lightweight background service or within the TUI's periodic refresh loop
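
A minimal sketch of the calls involved, assuming Ollama's default local endpoint; the helper names (`preload`, `unload`, `loaded_models`, `free_vram_mib`) are hypothetical, and a real implementation would need error handling and multi-GPU awareness:

```python
import subprocess
import requests

OLLAMA = "http://localhost:11434"


def preload(model: str) -> None:
    # A generate request with no prompt and keep_alive=-1 loads the model
    # and keeps it resident until it is explicitly unloaded.
    requests.post(f"{OLLAMA}/api/generate",
                  json={"model": model, "keep_alive": -1})


def unload(model: str) -> None:
    # keep_alive=0 asks Ollama to unload the model immediately.
    requests.post(f"{OLLAMA}/api/generate",
                  json={"model": model, "keep_alive": 0})


def loaded_models() -> list[dict]:
    # GET /api/ps lists currently loaded models, including size_vram.
    return requests.get(f"{OLLAMA}/api/ps").json().get("models", [])


def free_vram_mib() -> int:
    # nvidia-smi reports total/used memory; this reads the first GPU only.
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.total,memory.used",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    total, used = (int(x.strip()) for x in out.splitlines()[0].split(","))
    return total - used
```

A background service (or the TUI's periodic refresh loop) could poll loaded_models() and free_vram_mib() on a timer, preload any missing warm models when headroom allows, and unload the candidate chosen by the eviction policy above when the configured reservation is breached.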

Related

  • Manual load/unload is being added to the TUI Models screen (current sprint)
  • VRAM display is being added to the Dashboard (current sprint)

Priority

Nice to have — the manual load/unload covers the immediate need. This is the automated layer on top.
