
feat(research): decentralized auto-research — KB, groupauth membership, obol research CLI (GB10-validated)#639

Open
bussyjd wants to merge 3 commits into main from feat/decentralized-auto-research

bussyjd commented Jun 13, 2026

Decentralized auto-research subsystem

Publish a research program (an objective + an acceptance criterion), let workers on other obol-stacks opt in over the open internet, have them contribute results to a private collective knowledge base, and distribute rewards proportional to validated improvement — all composed from layers the stack already has (escrow, tunnels, the controller-never-signs invariant), with one small new self-contained package.

This was built and validated live on real hardware, running a real nanoGPT-style val_bpb-minimization training workload end-to-end: two NVIDIA GB10 DGX Sparks served as remote workers, and the owner published from a laptop over a real Cloudflare tunnel (no tailscale).

What's in it

| Area | Path | Role |
|---|---|---|
| Membership | `internal/research/groupauth/` | RFC 8628 device-authorization flow, re-implemented dependency-free (stdlib, in-memory). The owner's `Approve(groupID, userCode)` is the membership decision that keeps the KB private. One-time, SHA-256-hashed member tokens (`obol-research-mt-…`), plus `VerifyToken` / `Revoke`. |
| Knowledge base | `internal/research/kb/` | Append-only result log. KEEP-if-beats-champion acceptance, impact = Δmetric (directional), champion promotion, first-verified-wins de-dup, by-impact proportional payouts. |
| Server | `internal/research/server/` | HTTP surface: device-auth endpoints + KB endpoints, `member()` / `ownerOnly()` middleware. Open-membership programs auto-approve. |
| CLI | `cmd/obol/research.go` | `obol research publish` / `approve` / `status`. `publish` stands up the host server and a Cloudflare quick tunnel, and writes program state to `$CONFIG/research/<name>.json`. |
| Skill | `internal/embed/skills/research-program/` | `SKILL.md` + `scripts/worker.py`: the off-chain runbook plus a stdlib-only worker that joins a program, runs the (pluggable) train script, and submits its metric. |
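
The membership flow in the table can be sketched as below. This is a hypothetical, in-memory illustration, not the `groupauth` package: `Start`, `Poll`, the struct layout, the error values and the `demo-mt-` token prefix are assumptions, and the real `Approve` also takes a `groupID`. Expiry, polling intervals and locking are omitted. It shows the core idea: the worker holds a secret device code, the owner approves a short user code, and the member token is returned once while only its SHA-256 hash is kept.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

var ErrPending = errors.New("authorization_pending")
var ErrGrant = errors.New("invalid_grant")

type request struct {
	approved bool
	issued   bool
}

// Store is a hypothetical, single-goroutine sketch; a real server would guard
// these maps with a sync.Mutex. Plaintext member tokens are never stored.
type Store struct {
	byUser map[string]*request // user code -> request (what the owner approves)
	byDev  map[string]*request // device code -> request (what the worker polls with)
	hashes map[string]bool     // SHA-256 hex digests of issued member tokens
}

func NewStore() *Store {
	return &Store{byUser: map[string]*request{}, byDev: map[string]*request{}, hashes: map[string]bool{}}
}

func randHex(n int) string {
	b := make([]byte, n)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return hex.EncodeToString(b)
}

func hashToken(tok string) string {
	sum := sha256.Sum256([]byte(tok))
	return hex.EncodeToString(sum[:])
}

// Start gives the worker a secret device code and a short user code for the owner.
func (s *Store) Start() (deviceCode, userCode string) {
	req := &request{}
	deviceCode, userCode = randHex(16), randHex(4)
	s.byDev[deviceCode], s.byUser[userCode] = req, req
	return deviceCode, userCode
}

// Approve is the owner's membership decision.
func (s *Store) Approve(userCode string) error {
	req, ok := s.byUser[userCode]
	if !ok {
		return ErrGrant
	}
	req.approved = true
	return nil
}

// Poll returns the plaintext member token exactly once, after approval.
func (s *Store) Poll(deviceCode string) (string, error) {
	req, ok := s.byDev[deviceCode]
	switch {
	case !ok || req.issued:
		return "", ErrGrant
	case !req.approved:
		return "", ErrPending
	}
	req.issued = true
	tok := "demo-mt-" + randHex(16)
	s.hashes[hashToken(tok)] = true
	return tok, nil
}

func (s *Store) VerifyToken(tok string) bool { return s.hashes[hashToken(tok)] }

func (s *Store) Revoke(tok string) { delete(s.hashes, hashToken(tok)) }

func main() {
	s := NewStore()
	dev, user := s.Start()
	_, err := s.Poll(dev)
	fmt.Println(err) // authorization_pending
	_ = s.Approve(user)
	tok, _ := s.Poll(dev)
	fmt.Println(s.VerifyToken(tok)) // true
	s.Revoke(tok)
	fmt.Println(s.VerifyToken(tok)) // false
}
```

Storing only the hash means a leaked state file does not leak usable member tokens; verification simply re-hashes the presented token.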

Design

  • Declarative & domain-agnostic. A program is {objective, criteria{metric, direction, accept, threshold}, baseline, pool, split}. metric is an arbitrary string, so any domain lands without a schema change — true to the open auto-research pattern (a metric to minimize/maximize + a KEEP rule).
  • Decentralization is in the workers, not the store. One owner-hosted authoritative KB; workers are anywhere. Cross-stack workers authenticate via the public device-auth endpoints over the owner's tunnel.
  • Private by membership, never by obscurity. The KB rides a public but membership-gated route class (the same shape as /services/* and x402 ForwardAuth, with the credential swapped from a payment to a member token). It is never an internal, hostname-restricted route and never fully open; it sits exactly in the gated tier that the tunnel-security rules allow.
  • Rewards. First-verified-wins de-dup + payout proportional to validated impact (split: by-impact). Settlement reuses the existing non-custodial escrow leg; the controller never signs and holds no keys.
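
The program shape and the KEEP rule above can be sketched in Go. This is a hypothetical, in-memory illustration: the struct names, JSON tags, the de-dup `key` and the `Submit` method are assumptions rather than the `internal/research/kb` API, and only a strict beats-champion rule is modeled. The baseline of 3.0 is inferred from the logged impact (3.0 - 2.192544 = 0.807456) and is not stated in the PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Criteria mirrors criteria{metric, direction, accept, threshold} (hypothetical tags).
type Criteria struct {
	Metric    string  `json:"metric"`    // free-form string, e.g. "val_bpb"
	Direction string  `json:"direction"` // "minimize" or "maximize"
	Accept    string  `json:"accept"`    // only a beats-champion rule is modeled here
	Threshold float64 `json:"threshold"` // improvement must exceed this
}

// Program mirrors {objective, criteria, baseline, pool, split}.
type Program struct {
	Objective string   `json:"objective"`
	Criteria  Criteria `json:"criteria"`
	Baseline  float64  `json:"baseline"`
	Pool      float64  `json:"pool"`
	Split     string   `json:"split"`
}

type Result struct {
	Worker string
	Value  float64
	Impact float64 // directional delta over the champion at submission time
}

// KB is a toy stand-in for the append-only result log.
type KB struct {
	Program  Program
	Champion float64
	Log      []Result
	seen     map[string]bool // hypothetical de-dup key, e.g. a hash of the submitted artifact
}

func NewKB(p Program) *KB {
	return &KB{Program: p, Champion: p.Baseline, seen: map[string]bool{}}
}

// improvement is positive when candidate beats champion in the program's direction.
func improvement(c Criteria, champion, candidate float64) float64 {
	if c.Direction == "maximize" {
		return candidate - champion
	}
	return champion - candidate
}

// Submit applies KEEP-if-beats-champion with first-verified-wins de-dup.
// A kept result is appended to the log and promoted to champion.
func (kb *KB) Submit(worker, key string, value float64) (Result, bool) {
	if kb.seen[key] {
		return Result{}, false // an equivalent result was already verified first
	}
	d := improvement(kb.Program.Criteria, kb.Champion, value)
	if d <= kb.Program.Criteria.Threshold {
		return Result{}, false // DISCARD: does not beat the champion by enough
	}
	kb.seen[key] = true
	r := Result{Worker: worker, Value: value, Impact: d}
	kb.Log = append(kb.Log, r)
	kb.Champion = value
	return r, true
}

func main() {
	raw := `{"objective": "minimize validation bits per byte",
	  "criteria": {"metric": "val_bpb", "direction": "minimize", "accept": "beats-champion", "threshold": 0},
	  "baseline": 3.0, "pool": 100, "split": "by-impact"}`
	var p Program
	if err := json.Unmarshal([]byte(raw), &p); err != nil {
		panic(err)
	}
	kb := NewKB(p)
	for _, s := range []struct {
		worker, key string
		value       float64
	}{
		{"spark1", "run-a", 2.192544},
		{"spark2", "run-b", 2.110563},
		{"spark1", "run-c", 2.15}, // worse than the new champion: rejected
	} {
		r, ok := kb.Submit(s.worker, s.key, s.value)
		fmt.Printf("%s kept=%v impact=%.6f\n", s.worker, ok, r.Impact)
	}
}
```

In this sketch impact is scored against the champion at submission time, not against the baseline, so a later result is credited only with the increment it adds.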

Live validation (real hardware, no tailscale)

  • The owner published the program from a laptop via a real Cloudflare quick tunnel.
  • Workers spark1 + spark2 (NVIDIA GB10 DGX Spark) joined an open-membership program over the open internet and ran the real training workload.
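  • Result: spark2 is the champion (val_bpb 2.110563); spark1 reached 2.192544. The by-impact split pays spark1 90.78 OBOL and spark2 9.22 OBOL: spark1's kept result carried most of the validated improvement (impact 0.807456 vs 0.081981), so it earns most of the pool even though spark2 holds the champion slot.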
  • The worker doubles as the record of real-world GB10 adaptations (FlashAttention-3 → SDPA; eager execution where torch.compile/Triton can't target sm_121; batch/eval-token sizing) — i.e. "the agent edits the train script" is the literal workflow.
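
The by-impact split above can be reproduced with plain proportional arithmetic, payout = pool × impact / Σ impact. This sketch assumes that is the whole rule (no rounding or fees); `byImpact` is a hypothetical helper, not the PR's function, and the baseline (3.0) and pool (100 OBOL) are inferred from the logged numbers rather than stated.

```go
package main

import "fmt"

// Kept is one accepted result and its directional impact.
type Kept struct {
	Worker string
	Impact float64
}

// byImpact splits pool in proportion to each worker's summed impact.
func byImpact(pool float64, kept []Kept) map[string]float64 {
	total := 0.0
	for _, k := range kept {
		total += k.Impact
	}
	out := make(map[string]float64)
	if total <= 0 {
		return out // nothing validated, nothing paid
	}
	for _, k := range kept {
		out[k.Worker] += pool * k.Impact / total
	}
	return out
}

func main() {
	// Inferred, not stated in the PR: baseline 3.0 (3.0 - 2.192544 = 0.807456, the
	// logged spark1 impact) and a pool of 100 OBOL (90.78 + 9.22).
	const baseline, pool = 3.0, 100.0
	spark1, spark2 := 2.192544, 2.110563
	kept := []Kept{
		{"spark1", baseline - spark1}, // first KEEP: improves on the baseline
		{"spark2", spark1 - spark2},   // second KEEP: improves on spark1 as champion
	}
	pay := byImpact(pool, kept)
	fmt.Printf("spark1 %.2f OBOL, spark2 %.2f OBOL\n", pay["spark1"], pay["spark2"])
	// prints: spark1 90.78 OBOL, spark2 9.22 OBOL
}
```

Total impact is 0.889437, so spark1's share is 0.807456 / 0.889437 ≈ 0.9078 of the pool.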

Composability / blast radius

  • Self-contained. The Go packages import only internal/research/* and internal/config, with zero dependency on the in-flight escrow/bounty PRs, so this PR can be reviewed and merged independently.
  • cmd/obol/main.go: one line registers researchCommand(cfg).
  • Design notes + the peer report/pitch live in plans/ (gitignored) and are intentionally not part of this PR — code only.

Tests

  • go build ./... clean.
  • go test ./internal/research/... green (groupauth, kb, server).

Security invariants honored

  • KB/membership endpoints live only in the gated route class — never exposed to the tunnel beyond membership gating, never added to internal hostname-restricted routes.
  • Controller/owner server never signs or holds payout keys; settlement is on-chain canonical via the existing escrow leg.
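
The gated route class can be illustrated with a standard-library sketch of `member()` / `ownerOnly()`-style middleware. The PR names these middlewares, but the bearer-token header, the `verifier` callbacks, the status codes, the response body and the `/approve` route here are assumptions; in the real server the member check would presumably call groupauth's `VerifyToken`.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// verifier reports whether a bearer token is valid for some role (hypothetical).
type verifier func(token string) bool

func bearer(r *http.Request) string {
	h := r.Header.Get("Authorization")
	if !strings.HasPrefix(h, "Bearer ") {
		return ""
	}
	return strings.TrimPrefix(h, "Bearer ")
}

// member admits the owner or any approved member: the gated route class.
func member(isOwner, isMember verifier, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		tok := bearer(r)
		if tok == "" || !(isOwner(tok) || isMember(tok)) {
			http.Error(w, "membership required", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// ownerOnly admits only the owner, e.g. for approving join requests.
func ownerOnly(isOwner verifier, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		tok := bearer(r)
		if tok == "" {
			http.Error(w, "authentication required", http.StatusUnauthorized)
			return
		}
		if !isOwner(tok) {
			http.Error(w, "owner only", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func newMux(isOwner, isMember verifier) *http.ServeMux {
	mux := http.NewServeMux()
	champion := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, `{"worker":"spark2"}`) // placeholder body
	})
	approve := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusNoContent)
	})
	mux.Handle("/champion", member(isOwner, isMember, champion))
	mux.Handle("/approve", ownerOnly(isOwner, approve)) // hypothetical route name
	return mux
}

// status issues an in-process request and returns the HTTP status code.
func status(h http.Handler, path, tok string) int {
	req := httptest.NewRequest(http.MethodGet, path, nil)
	if tok != "" {
		req.Header.Set("Authorization", "Bearer "+tok)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	isOwner := func(t string) bool { return t == "owner-secret" }
	isMember := func(t string) bool { return t == "member-token" }
	mux := newMux(isOwner, isMember)
	fmt.Println(status(mux, "/champion", ""))             // 401
	fmt.Println(status(mux, "/champion", "member-token")) // 200
	fmt.Println(status(mux, "/approve", "member-token"))  // 403
	fmt.Println(status(mux, "/approve", "owner-secret"))  // 204
}
```

Because every KB handler is wrapped before it is registered, nothing reachable through the tunnel is ever open, and no handler needs signing keys.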

bussyjd added 3 commits June 14, 2026 09:54
…embership layer

Grounds a ServiceBounty-adjacent 'decentralized auto-research' narrative in
the AutoScientists framework (karpathy/autoresearch): publish a research ID,
workers opt in, build a PRIVATE collective knowledge base around favourable
hypotheses, rewards distributed on accepted KEEPs — with membership
coordinated at the OAuth level.

- internal/research/groupauth/: RFC 8628 device-authorization flow adapted
  dependency-free from the d-inference coordinator. The owner's Approve() is
  the membership decision that keeps the KB private to the group; issues
  one-time SHA-256-hashed member tokens; VerifyToken gates the KB. Full
  code→approve→token→verify + expiry/revoke tests.
- plans/decentralized-auto-research.md: thin declarative schema
  (ResearchProgram = TASK.md frontmatter: metric/direction/accept; reuses
  EvaluatorEnrollment, evaluator market, x402-escrow) + the private
  (never-tunnelled) collective-KB service. Schemas kept minimal and true to
  autoresearch; LAUNCH.md-style operational hooks stay off-chain.
… research CLI, worker

Implements the decentralized auto-research design (plans/decentralized-auto-research.md):
- internal/research/kb: collective KB + acceptance (autoresearch KEEP) + impact +
  by-impact/champion-takes-all payout. First-verified-wins. Tested.
- internal/research/server: owner-hosted HTTP — device-auth (groupauth) endpoints +
  membership-gated KB (/task,/champion,/results,/status). Owner-or-member gate. Tested.
- cmd/obol research publish|approve|status: host server + Cloudflare quick tunnel so
  remote runners join over the open internet (no tailscale); state file for approve/status.
- internal/embed/skills/research-program: SKILL.md runbook + scripts/worker.py
  (stdlib runner: device-login → run real karpathy/autoresearch train.py → post val_bpb).
…rk1+spark2

Live smoke PASSED: owner published nanogpt-valbpb on this machine over a real
Cloudflare tunnel (no tailscale); spark1 + spark2 (NVIDIA GB10) joined the
private group and ran real karpathy/autoresearch nanoGPT training, submitting
val_bpb over the open internet:
  Champion: spark2 val_bpb=2.110563
  #1 spark1 2.192544 impact 0.807456 ; #2 spark2 2.110563 impact 0.081981
  by-impact payout: spark1 90.78 OBOL, spark2 9.22 OBOL

Real-world train.py/config adaptations the worker applies (autoresearch's
own loop is 'agent edits train.py'), each a diagnosed hardware fit:
  - FlashAttention-3 → torch SDPA (FA3 has no GB10 sm_121 kernel image)
  - eager, not torch.compile (GB10 is cuda-cap 12.1 > torch max 12.0; Triton/
    ptxas can't assemble inductor kernels)
  - shrink EVAL_TOKENS (eager eval over 21M tokens was the hang)
  - shrink model + DEVICE_BATCH_SIZE 128→8 (OOM on memory-pressured GPUs)
  - TOTAL_BATCH_SIZE 2**19→2**14 so grad_accum=1 (~35s/step → ~1s/step)
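
The batch-size change can be checked with a short calculation. It assumes train.py derives gradient accumulation as TOTAL_BATCH_SIZE / (DEVICE_BATCH_SIZE · T), where T is the tokens per sequence; under that assumption, the reported grad_accum=1 at 2**14 with DEVICE_BATCH_SIZE 8 implies T = 2048.

```math
\begin{aligned}
\text{grad\_accum} &= \frac{\text{TOTAL\_BATCH\_SIZE}}{\text{DEVICE\_BATCH\_SIZE}\cdot T}, \qquad T = \frac{2^{14}}{8} = 2048 \\
\text{before: } \frac{2^{19}}{8 \cdot 2048} &= 32, \qquad \text{after: } \frac{2^{14}}{8 \cdot 2048} = 1
\end{aligned}
```

So shrinking DEVICE_BATCH_SIZE alone would have left 32 micro-steps per optimizer step; also cutting TOTAL_BATCH_SIZE removes accumulation entirely, which is roughly in line with the reported ~35s → ~1s per step.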