feat(research): decentralized auto-research — KB, groupauth membership, obol research CLI (GB10-validated) by bussyjd · Pull Request #639 · ObolNetwork/obol-stack

bussyjd · 2026-06-13T15:56:54Z

Decentralized auto-research subsystem

Publish a research program (an objective plus an acceptance criterion), let workers on other obol-stacks opt in over the open internet, have them contribute results to a private collective knowledge base, and distribute rewards in proportion to validated improvement. Everything is composed from layers the stack already has (escrow, tunnels, the controller-never-signs invariant), plus one small, self-contained new package.

This was built and validated live on real hardware, running a real nanoGPT-style val_bpb-minimization training workload end to end: two NVIDIA GB10 DGX Sparks acted as remote workers, and the owner published from a laptop over a real Cloudflare tunnel (no Tailscale).

What's in it

| Area | Path | Role |
| --- | --- | --- |
| Membership | `internal/research/groupauth/` | RFC 8628 device-authorization flow, re-implemented dependency-free (stdlib, in-memory). The owner's `Approve(groupID, userCode)` is the membership decision that keeps the KB private. Issues one-time, SHA-256-hashed member tokens (`obol-research-mt-…`) and provides `VerifyToken` / `Revoke`. |
| Knowledge base | `internal/research/kb/` | Append-only result log: KEEP-if-beats-champion acceptance, `impact = Δmetric` (directional), champion promotion, first-verified-wins de-dup, and by-impact proportional payouts. |
| Server | `internal/research/server/` | HTTP surface: device-auth endpoints plus KB endpoints, guarded by `member()` / `ownerOnly()` middleware. Open-membership programs auto-approve. |
| CLI | `cmd/obol/research.go` | `obol research publish \| approve \| status`. `publish` stands up the host server and a Cloudflare quick tunnel, and writes program state to `$CONFIG/research/<name>.json`. |
| Skill | `internal/embed/skills/research-program/` | `SKILL.md` + `scripts/worker.py`: the off-chain runbook plus a stdlib-only worker that joins a program, runs the (pluggable) train script, and submits its metric. |
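The Membership row says tokens are stored only as SHA-256 hashes and can be verified or revoked. A minimal sketch of that storage pattern follows, assuming "one-time" means the plaintext token is returned once at issuance and only its digest is kept. Only the `obol-research-mt-` prefix comes from the PR; the random hex suffix and every name here (`memberTokens`, `issue`, `verify`, `revoke`) are hypothetical and are not groupauth's actual API.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// tokenPrefix is the member-token prefix from the PR; the random hex suffix
// and all names below are assumptions for this sketch.
const tokenPrefix = "obol-research-mt-"

// memberTokens keeps only SHA-256 digests, never plaintext tokens.
type memberTokens struct {
	mu     sync.Mutex
	hashes map[string]string // hex(sha256(token)) -> member ID
}

func newMemberTokens() *memberTokens {
	return &memberTokens{hashes: make(map[string]string)}
}

func digest(token string) string {
	sum := sha256.Sum256([]byte(token))
	return hex.EncodeToString(sum[:])
}

// issue mints a random token; the caller sees the plaintext exactly once.
func (s *memberTokens) issue(memberID string) (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	token := tokenPrefix + hex.EncodeToString(buf)
	s.mu.Lock()
	defer s.mu.Unlock()
	s.hashes[digest(token)] = memberID
	return token, nil
}

// verify hashes the presented token and looks up the digest.
func (s *memberTokens) verify(token string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	id, ok := s.hashes[digest(token)]
	return id, ok
}

// revoke removes the digest so the token stops verifying.
func (s *memberTokens) revoke(token string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.hashes, digest(token))
}

func main() {
	store := newMemberTokens()
	tok, err := store.issue("spark1")
	if err != nil {
		panic(err)
	}
	id, ok := store.verify(tok)
	fmt.Println(id, ok) // spark1 true
	store.revoke(tok)
	_, ok = store.verify(tok)
	fmt.Println(ok) // false
}
```

Because only digests are stored, a dump of the store does not reveal usable tokens, and revocation is a single map delete.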

Design

- Declarative and domain-agnostic. A program is `{objective, criteria{metric, direction, accept, threshold}, baseline, pool, split}`. Because `metric` is an arbitrary string, a new domain needs no schema change, which stays true to the open auto-research pattern: a metric to minimize or maximize plus a KEEP rule.
- Decentralization lives in the workers, not the store. There is one owner-hosted, authoritative KB; workers can run anywhere. Cross-stack workers authenticate through the public device-auth endpoints exposed over the owner's tunnel.
- Private by membership, never by obscurity. The KB rides a public-but-membership-gated route class, the same shape as `/services/*` and x402 ForwardAuth with the credential swapped from a payment to a member token. It is never on the hostname-restricted internal routes and never open: it sits exactly in the gated tier that the tunnel-security rules allow.
- Rewards. First-verified-wins de-dup plus payouts proportional to validated impact (`split: by-impact`). Settlement reuses the existing non-custodial escrow leg; the controller never signs and holds no keys.
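The program schema and the KEEP rule can be sketched as below. The field list mirrors the program shape described above; the Go types, the `"minimize"`/`"maximize"` direction values, the `"beats-champion"` accept value, and treating `threshold` as a minimum improvement margin are all assumptions, since the PR does not spell them out. `impact` is signed so that a positive value always means "better", which is one way to make `impact = Δmetric` directional.

```go
package main

import "fmt"

// Criteria mirrors the fields named in the PR. The concrete types, the
// "minimize"/"maximize" values and reading Threshold as a minimum
// improvement margin are assumptions for this sketch.
type Criteria struct {
	Metric    string  // arbitrary, e.g. "val_bpb"
	Direction string  // "minimize" or "maximize"
	Accept    string  // acceptance rule; only beats-champion is sketched
	Threshold float64 // assumed minimum improvement required to KEEP
}

// Program is the declarative research program.
type Program struct {
	Objective string
	Criteria  Criteria
	Baseline  float64
	Pool      float64
	Split     string // e.g. "by-impact"
}

// impact is the directional improvement of value over ref:
// positive always means "better", whichever way the metric points.
func impact(c Criteria, ref, value float64) float64 {
	if c.Direction == "maximize" {
		return value - ref
	}
	return ref - value
}

// keep applies KEEP-if-beats-champion: the result is kept only when its
// directional improvement over the current champion exceeds Threshold.
func keep(c Criteria, champion, value float64) (bool, float64) {
	d := impact(c, champion, value)
	return d > c.Threshold, d
}

func main() {
	p := Program{
		Objective: "minimize validation bits per byte",
		Criteria:  Criteria{Metric: "val_bpb", Direction: "minimize", Accept: "beats-champion"},
		Baseline:  3.0, // hypothetical; implied by the reported spark1 impact
		Pool:      100,
		Split:     "by-impact",
	}
	champion := p.Baseline
	// 2.3 is a made-up submission that does not beat the champion.
	for _, v := range []float64{2.192544, 2.3, 2.110563} {
		ok, d := keep(p.Criteria, champion, v)
		fmt.Printf("%s=%.6f keep=%v impact=%.6f\n", p.Criteria.Metric, v, ok, d)
		if ok {
			champion = v // champion promotion
		}
	}
}
```

Measuring impact against the current champion (not the baseline) is what rewards each step of progress once, rather than paying every later result for the same early gain.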

Live validation (real hardware, no tailscale)

- The owner published from this machine via a real Cloudflare quick tunnel.
- Workers spark1 and spark2 (NVIDIA GB10 DGX Spark) joined an open-membership program over the open internet and ran the real training workload.
- Result: spark2 became champion (val_bpb 2.110563); spark1 reached 2.192544. Because impact is measured against the champion at the time of submission, spark1's earlier improvement (impact 0.807456) outweighs spark2's (impact 0.081981), so the by-impact split is spark1 90.78 / spark2 9.22 OBOL.
- The worker doubles as a record of real-world GB10 adaptations (FlashAttention-3 → SDPA; eager execution because torch.compile/Triton can't target sm_121; batch and eval-token sizing). In other words, "the agent edits the train script" is the literal workflow.

Composability / blast radius

- Self-contained. The Go packages import only `internal/research/*` and `internal/config`, with no dependency on the in-flight escrow/bounty PRs, so this PR can be reviewed and merged independently.
- `cmd/obol/main.go`: a single line registers `researchCommand(cfg)`.
- Design notes and the peer report/pitch live in `plans/` (gitignored) and are intentionally left out of this PR, which is code only.

Tests

- `go build ./...` is clean.
- `go test ./internal/research/...` is green (groupauth, kb, server).

Security invariants honored

- KB and membership endpoints live only in the gated route class: they are reachable through the tunnel only behind membership gating and are never added to the internal hostname-restricted routes.
- The controller/owner server never signs or holds payout keys; settlement is canonical on-chain via the existing escrow leg.

…embership layer

Grounds a ServiceBounty-adjacent 'decentralized auto-research' narrative in the AutoScientists framework (karpathy/autoresearch): publish a research ID, workers opt in, build a PRIVATE collective knowledge base around favourable hypotheses, and distribute rewards on accepted KEEPs, with membership coordinated at the OAuth level.

- internal/research/groupauth/: RFC 8628 device-authorization flow adapted dependency-free from the d-inference coordinator. The owner's Approve() is the membership decision that keeps the KB private to the group; it issues one-time SHA-256-hashed member tokens, and VerifyToken gates the KB. Full code→approve→token→verify + expiry/revoke tests.
- plans/decentralized-auto-research.md: thin declarative schema (ResearchProgram = TASK.md frontmatter: metric/direction/accept; reuses EvaluatorEnrollment, evaluator market, x402-escrow) + the private (never-tunnelled) collective-KB service. Schemas kept minimal and true to autoresearch; LAUNCH.md-style operational hooks stay off-chain.

… research CLI, worker

Implements the decentralized auto-research design (plans/decentralized-auto-research.md):

- internal/research/kb: collective KB + acceptance (autoresearch KEEP) + impact + by-impact/champion-takes-all payout. First-verified-wins. Tested.
- internal/research/server: owner-hosted HTTP, with device-auth (groupauth) endpoints + membership-gated KB (/task, /champion, /results, /status). Owner-or-member gate. Tested.
- cmd/obol research publish|approve|status: host server + Cloudflare quick tunnel so remote runners join over the open internet (no tailscale); state file for approve/status.
- internal/embed/skills/research-program: SKILL.md runbook + scripts/worker.py (stdlib runner: device-login → run real karpathy/autoresearch train.py → post val_bpb).

…rk1+spark2

Live smoke PASSED: owner published nanogpt-valbpb on this machine over a real Cloudflare tunnel (no tailscale); spark1 + spark2 (NVIDIA GB10) joined the private group and ran real karpathy/autoresearch nanoGPT training, submitting val_bpb over the open internet:

- Champion: spark2 val_bpb=2.110563
- #1 spark1 2.192544, impact 0.807456
- #2 spark2 2.110563, impact 0.081981
- by-impact payout: spark1 90.78 OBOL, spark2 9.22 OBOL

Real-world train.py/config adaptations the worker applies (autoresearch's own loop is 'agent edits train.py'), each a diagnosed hardware fit:

- FlashAttention-3 → torch SDPA (FA3 has no GB10 sm_121 kernel image)
- eager, not torch.compile (GB10 is cuda-cap 12.1 > torch max 12.0; Triton/ptxas can't assemble inductor kernels)
- shrink EVAL_TOKENS (eager eval over 21M tokens was the hang)
- shrink model + DEVICE_BATCH_SIZE 128→8 (OOM on memory-pressured GPUs)
- TOTAL_BATCH_SIZE 2**19→2**14 so grad_accum=1 (~35s/step → ~1s/step)
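The last batch-size change can be checked with the gradient-accumulation arithmetic. This assumes the usual nanoGPT-style relation between the total batch, the per-device batch and the sequence length T, with T = 2048 tokens; T is an assumption here, chosen because it is the value that makes the PR's grad_accum=1 come out exactly. Under it, shrinking the device batch alone would have left 32 micro-steps per optimizer step:

```latex
\begin{aligned}
\text{grad\_accum} &= \frac{\text{TOTAL\_BATCH\_SIZE}}{\text{DEVICE\_BATCH\_SIZE} \times T}, \qquad T = 2048 \text{ (assumed)} \\
\frac{2^{19}}{128 \times 2048} &= 2 \quad \text{(original config)} \\
\frac{2^{19}}{8 \times 2048} &= 32 \quad \text{(device batch shrunk only)} \\
\frac{2^{14}}{8 \times 2048} &= 1 \quad \text{(after TOTAL\_BATCH\_SIZE } 2^{19} \to 2^{14}\text{)}
\end{aligned}
```

Going from 32 micro-steps to one per optimizer step is consistent with the reported drop from roughly 35 s to roughly 1 s per step.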

bussyjd added 3 commits June 14, 2026 09:54

bussyjd force-pushed the feat/decentralized-auto-research branch from 6355cc6 to c6c2be2 June 14, 2026 05:56

bussyjd mentioned this pull request on Jun 14, 2026 in #641, feat(dataset): decentralized fine-tuning — dataset subsystem (P2–P6) (open)

bussyjd wants to merge 3 commits into main from feat/decentralized-auto-research
