pool

Minimal host roster + remote exec + one-shot bootstrap. Single hexa file, zero deps, state at ~/.pool/pool.json.

Install

# 1. Install hexa-lang (gives you `hexa` + `hx` package manager)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/dancinlab/hexa-lang/main/install.sh)"

# 2. Install pool
hx install pool

hx wires the pool shim into ~/.hx/bin/ (must be on PATH).

Verbs

pool add <name> <user@host> [--sudo]   # register a host (ssh-probes `uname -s` for OS)
pool rm  <name>                         # remove from roster
pool list [--json] [--fast]             # roster + live cpu/ram/gpu/disk per host (--fast skips probe; --json embeds usage)
pool on  <name> <cmd...>                # ssh + exec on host
pool on  all <cmd...> [--jobs N] [--timeout S]  # parallel fanout to every shared host (skips [ded]; generic, zero domain knowledge)
pool status                             # live reachability probe (each host)
pool refresh                            # re-probe os for every host
pool init                               # bootstrap every shared host with all features (skips [ded])
pool clean <name>|all [--dry-run] [--deep]  # run disk-cleanup now — safe tier; --deep adds model caches + docker
pool clean on|off <name>                # exclude / include a host in disk-cleanup
pool desc <name> [text...]              # set a host description (shown in `pool list`)

`pool init` features

pool init runs every bootstrap feature on every shared host in order (skipping any host with "shared": false). Adding a feature is one labels.push(...) / scripts.push(...) pair in cmd_init inside bin/pool.hexa.

feature	linux	macos
tailscale (headless)	`curl tailscale.com/install.sh \| sudo sh`	`brew install tailscale` + `brew services start tailscale` (CLI, not the GUI cask — avoids menubar conflicts)
`/etc/cron.daily/disk-cleanup`	the two-tier `pool clean` script, logged to `/var/log/pool-cleanup.log`. Safe tier daily (journalctl vacuum · apt autoremove+clean · snap disabled-revs · `/var/{log,crash,tmp}` · `/tmp` · `~/.cache/{uv,pip,puppeteer,mathlib,thumbnails}`); deep tier when disk ≥ 85% (`~/.cache/{huggingface,torch,npm,…}` · `docker system prune`). Uniform 3-day `mtime` retention.	skipped (newsyslog + systemd-tmpfiles equivalents builtin)
no-sleep	skipped (Linux servers don't display-sleep)	`pmset -a displaysleep=0 sleep=0 disksleep=0` + screensaver `idleTime=0` — FileVault Macs sever sshd-reachable interfaces on display lock, breaking long-running headless builds
persistent-journal	`mkdir /var/log/journal` + restart journald (post-mortem logs survive reboot)	skipped (macOS uses unified log, persistent by default)
panic-recovery	`kernel.panic=10` + `kernel.panic_on_oom=0` via `/etc/sysctl.d/60-pool-recovery.conf` — kernel auto-reboots 10s after panic instead of hanging	skipped (Darwin kernel auto-restarts on panic)
earlyoom	`apt install earlyoom` + enable — userspace OOM-killer fires at <10% free RAM+swap, converting kernel hangs into clean per-process SIGKILL	skipped (macOS has Jetsam built in)
swapfile	32 GB `/swapfile-pool` + `/etc/fstab` entry + `vm.swappiness=10` — only if existing swap < 30 GB	skipped (macOS manages dynamic swap automatically)
watchdog	`softdog` + systemd `RuntimeWatchdogSec=30s` (auto-reset on kernel hang)	skipped (no equivalent userspace API)

tailscale up still needs interactive auth — pool init only installs.

The OOM-resilience block (persistent-journal · panic-recovery · earlyoom · swapfile · watchdog) was added after a host-OOM-on-concurrent-heavy-workload incident: a 17 GB ckpt eval + a 17 GB HF upload dispatched to the same 30 GB host in parallel saturated RAM, the kernel OOM-killer hung, and the host needed a physical reboot. Each feature is one independent line of defense — together they convert that failure mode into "earlyoom kills the lower-priority job; if it still escalates, swap absorbs the spike; if the kernel still hangs, the watchdog auto-resets within 30 s; persistent journal preserves the evidence; panic-recovery reboots in 10 s instead of forever." The structural dispatcher fix (a pool reservation/commitment ledger) is a separate roadmap item.

pool clean

pool clean <name>|all runs disk-cleanup immediately — no waiting for the daily cron — and reports freed space (df before/after):

tier	when	targets
safe	always	journalctl vacuum · `apt-get autoremove`/`clean` · snap disabled revisions · `/var/{log,crash,tmp}` · `/tmp` · `~/.cache/{uv,pip,puppeteer,mathlib,thumbnails}`
deep	`--deep`, or disk ≥ 85%	model caches `~/.cache/{huggingface,torch,torch_extensions,npm,yarn,go-build}` · `docker system prune`

--dry-run counts what each tier would remove and deletes nothing. Uniform 3-day mtime retention — atime is unreliable on noatime / relatime mounts.

pool clean off <name> excludes a host: pool clean skips it and pool init installs no cron hook there. For dedicated, non-shared machines (the clean: false roster key).

State

~/.pool/pool.json:

{
  "hosts": [
    {"name": "ubu-1",     "ssh": "ubu-1",                   "sudo": true, "os": "linux"},
    {"name": "ubu-2",     "ssh": "ubu-2",                   "sudo": true, "os": "linux"},
    {"name": "mini",      "ssh": "mini",                    "sudo": true, "os": "macos"},
    {"name": "pi5-akida", "ssh": "ubuntu@192.168.50.155",   "sudo": true, "os": "linux",
     "shared": false, "clean": false, "description": "anima — dedicated Akida neuromorphic host"}
  ]
}

Override path with POOL_STATE=<path>. Optional per-host keys:

key	default	effect
`description`	—	free text · shown in `pool list`
`shared`	`true`	when `false`: `[ded]` marker in `pool list`; excluded from `pool on all` / `pool init` / `pool clean all`. Explicit `pool on <name>` / `pool clean <name>` still works.
`clean`	`true`	when `false`: excluded from `pool clean` and the `disk-cleanup` feature in `pool init` (orthogonal to `shared` — a shared host can opt out of disk-cleanup only)

`pool list`

pool list probes every host concurrently (8-wide async fan-out, ~8s probe timeout) and renders load · ram · gpu · disk inline. Shared hosts get a blank flag column; dedicated (shared: false) hosts get a [ded] marker:

                                  load  ram    gpu  disk      sudo  desc
mini              mini    macos   0.42  8/16G  -    100/512G  sudo
ubu-1             ubu-1   linux   0.01  3/30G  0%   546/915G  sudo
ubu-2             ubu-2   linux   4.02  4/30G  0%   721/915G  sudo  shared CI runner
pi5-akida  [ded]  ubuntu@192…     -     -      -    -         sudo  anima — dedicated Akida host

--fast skips the probe for a quick roster-only view (no ssh). pool list --json embeds the probe result as a per-host usage object:

{"name": "ubu-1", "ssh": "ubu-1", "os": "linux",
 "usage": {"load": "0.01", "ram": "3/30G", "gpu": "0%", "disk": "546/915G"}}

Locale-independent (LC_ALL=C forced on remote). Linux free -g parsed by row/column position. macOS uses vm_stat + sysctl hw.memsize. GPU = nvidia-smi utilization when present, - otherwise.

Roadmap

See TODO.md (dispatcher-side reservation ledger · future verbs).

License

MIT.

Name		Name	Last commit message	Last commit date
Latest commit History 41 Commits
.github/workflows		.github/workflows
bin		bin
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CLAUDE.md		CLAUDE.md
FORK_GUARD.md		FORK_GUARD.md
INBOX.log.md		INBOX.log.md
INBOX.md		INBOX.md
LATTICE_POLICY.md		LATTICE_POLICY.md
LICENSE		LICENSE
POOL.log.md		POOL.log.md
POOL.md		POOL.md
README.md		README.md
TODO.md		TODO.md
hexa.toml		hexa.toml
install.hexa		install.hexa
project.tape		project.tape

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

pool

Install

Verbs

`pool init` features

pool clean

State

`pool list`

Roadmap

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

pool

Install

Verbs

pool init features

pool clean

State

pool list

Roadmap

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

`pool init` features

`pool list`

Packages