feat(research): decentralized auto-research — KB, groupauth membership, obol research CLI (GB10-validated)#639
Open
bussyjd wants to merge 3 commits into
…embership layer

Grounds a ServiceBounty-adjacent 'decentralized auto-research' narrative in the AutoScientists framework (karpathy/autoresearch): publish a research ID, let workers opt in, build a PRIVATE collective knowledge base around favourable hypotheses, and distribute rewards on accepted KEEPs, with membership coordinated at the OAuth level.

- internal/research/groupauth/: RFC 8628 device-authorization flow, adapted dependency-free from the d-inference coordinator. The owner's Approve() is the membership decision that keeps the KB private to the group. The package issues one-time SHA-256-hashed member tokens, and VerifyToken gates the KB. Includes full code→approve→token→verify tests, plus expiry/revoke tests.
- plans/decentralized-auto-research.md: a thin declarative schema (ResearchProgram = TASK.md frontmatter: metric/direction/accept; reuses EvaluatorEnrollment, the evaluator market and x402-escrow) plus the private (never-tunnelled) collective-KB service. Schemas are kept minimal and true to autoresearch; LAUNCH.md-style operational hooks stay off-chain.
… research CLI, worker

Implements the decentralized auto-research design (plans/decentralized-auto-research.md):

- internal/research/kb: collective KB + acceptance (autoresearch KEEP) + impact + by-impact/champion-takes-all payout. First-verified-wins. Tested.
- internal/research/server: owner-hosted HTTP serving the device-auth (groupauth) endpoints and the membership-gated KB (/task, /champion, /results, /status). Owner-or-member gate. Tested.
- cmd/obol research publish|approve|status: hosts the server behind a Cloudflare quick tunnel so remote runners can join over the open internet (no tailscale); a state file backs approve/status.
- internal/embed/skills/research-program: SKILL.md runbook + scripts/worker.py (stdlib runner: device-login → run real karpathy/autoresearch train.py → post val_bpb).
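The owner-or-member gate in front of the KB routes could look roughly like the net/http sketch below. The route names come from the commit message. Everything else is an assumption, not the server package's actual code: the Bearer-header credential, the /approve route, the status codes, and the verify callback, which stands in for a groupauth-style VerifyToken.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// bearer extracts an "Authorization: Bearer <token>" credential, or "".
func bearer(r *http.Request) string {
	h := r.Header.Get("Authorization")
	if !strings.HasPrefix(h, "Bearer ") {
		return ""
	}
	return strings.TrimPrefix(h, "Bearer ")
}

// isOwner compares in constant time and never matches an empty token.
func isOwner(tok, ownerToken string) bool {
	return tok != "" && subtle.ConstantTimeCompare([]byte(tok), []byte(ownerToken)) == 1
}

// ownerOnly admits only the program owner (e.g. for approving members).
func ownerOnly(ownerToken string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !isOwner(bearer(r), ownerToken) {
			http.Error(w, "owner only", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// member admits the owner or any token that verify accepts.
func member(ownerToken string, verify func(string) bool, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		tok := bearer(r)
		if !isOwner(tok, ownerToken) && (tok == "" || !verify(tok)) {
			http.Error(w, "membership required", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func newMux(ownerToken string, verify func(string) bool) *http.ServeMux {
	ok := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) { fmt.Fprintln(w, "ok") })
	mux := http.NewServeMux()
	for _, p := range []string{"/task", "/champion", "/results", "/status"} {
		mux.Handle(p, member(ownerToken, verify, ok))
	}
	mux.Handle("/approve", ownerOnly(ownerToken, ok)) // hypothetical owner route
	return mux
}

// statusFor issues an in-process request and returns the HTTP status code.
func statusFor(h http.Handler, path, tok string) int {
	req := httptest.NewRequest(http.MethodGet, path, nil)
	if tok != "" {
		req.Header.Set("Authorization", "Bearer "+tok)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	mux := newMux("owner-secret", func(t string) bool { return t == "member-token" })
	fmt.Println(statusFor(mux, "/champion", ""), statusFor(mux, "/champion", "member-token"),
		statusFor(mux, "/approve", "member-token"), statusFor(mux, "/approve", "owner-secret"))
}
```

Keeping the gate as plain middleware means the KB handlers themselves stay unaware of how membership was granted.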
…rk1+spark2

Live smoke PASSED. The owner published nanogpt-valbpb on this machine over a real Cloudflare tunnel (no tailscale). spark1 + spark2 (NVIDIA GB10) then joined the private group, ran real karpathy/autoresearch nanoGPT training, and submitted val_bpb over the open internet:

Champion: spark2 val_bpb=2.110563
#1 spark1 2.192544 impact 0.807456
#2 spark2 2.110563 impact 0.081981
by-impact payout: spark1 90.78 OBOL, spark2 9.22 OBOL

The worker applies the following real-world train.py/config adaptations, each a diagnosed hardware fit (autoresearch's own loop is 'agent edits train.py'):

- FlashAttention-3 → torch SDPA (FA3 has no GB10 sm_121 kernel image)
- eager, not torch.compile (GB10 is CUDA compute capability 12.1, above torch's maximum of 12.0; Triton/ptxas can't assemble inductor kernels)
- shrink EVAL_TOKENS (eager eval over 21M tokens was the hang)
- shrink model + DEVICE_BATCH_SIZE 128→8 (OOM on memory-pressured GPUs)
- TOTAL_BATCH_SIZE 2**19→2**14 so grad_accum=1 (~35s/step → ~1s/step)
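The sketch below works through the KEEP/impact/payout arithmetic and reproduces the smoke-test numbers above. It rests on several assumptions that are not stated in the PR:

- Impact is measured against the champion at acceptance time.
- The baseline is 3.0, implied by 2.192544 + 0.807456.
- The pool is 100 OBOL, implied by the 90.78 / 9.22 split.

ChangeID as the first-verified-wins key, the threshold semantics, and all type and method names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// Result is one verified submission; ChangeID is a hypothetical de-dup key
// (e.g. a hash of the train.py edit).
type Result struct {
	Worker   string
	ChangeID string
	Value    float64
}

// Kept is an accepted (KEEP) result and the impact it was credited with.
type Kept struct {
	Worker string
	Value  float64
	Impact float64
}

// KB is a hypothetical collective knowledge base for one program.
type KB struct {
	Minimize  bool    // direction: true for metrics like val_bpb
	Champion  float64 // current best value; starts at the baseline
	Threshold float64 // assumed: minimum improvement required to KEEP
	Kept      []Kept
	seen      map[string]bool
}

func NewKB(minimize bool, baseline, threshold float64) *KB {
	return &KB{Minimize: minimize, Champion: baseline, Threshold: threshold, seen: map[string]bool{}}
}

// Submit applies the KEEP rule. Impact is the directional improvement over the
// current champion, and a kept result is promoted to champion. The first
// verified submission of a ChangeID wins; later ones earn nothing.
func (kb *KB) Submit(r Result) (Kept, bool) {
	if kb.seen[r.ChangeID] {
		return Kept{}, false
	}
	impact := r.Value - kb.Champion
	if kb.Minimize {
		impact = -impact
	}
	if impact <= kb.Threshold {
		return Kept{}, false // DISCARD: not enough improvement
	}
	k := Kept{Worker: r.Worker, Value: r.Value, Impact: impact}
	kb.seen[r.ChangeID] = true
	kb.Champion = r.Value
	kb.Kept = append(kb.Kept, k)
	return k, true
}

// ByImpact splits pool across workers in proportion to their summed impact.
func (kb *KB) ByImpact(pool float64) map[string]float64 {
	total := 0.0
	for _, k := range kb.Kept {
		total += k.Impact
	}
	out := map[string]float64{}
	if total <= 0 {
		return out
	}
	for _, k := range kb.Kept {
		out[k.Worker] += pool * k.Impact / total
	}
	return out
}

// ChampionTakesAll pays the whole pool to the holder of the final champion.
func (kb *KB) ChampionTakesAll(pool float64) map[string]float64 {
	if len(kb.Kept) == 0 {
		return map[string]float64{}
	}
	return map[string]float64{kb.Kept[len(kb.Kept)-1].Worker: pool}
}

func round2(x float64) float64 { return math.Round(x*100) / 100 }

func main() {
	// Baseline 3.0 and pool 100 are inferred from the reported numbers (assumption).
	kb := NewKB(true, 3.0, 0)
	kb.Submit(Result{"spark1", "edit-a", 2.192544})
	kb.Submit(Result{"spark2", "edit-b", 2.110563})
	kb.Submit(Result{"spark1", "edit-a", 2.10}) // hypothetical lucky re-run of a kept edit
	pay := kb.ByImpact(100)
	for _, w := range []string{"spark1", "spark2"} {
		fmt.Printf("%s %.2f OBOL\n", w, round2(pay[w]))
	}
	fmt.Printf("champion %.6f\n", kb.Champion)
}
```

Under these assumptions spark1 receives 100 × 0.807456 / (0.807456 + 0.081981) ≈ 90.78 OBOL and spark2 ≈ 9.22 OBOL, matching the reported split. The re-run of edit-a beats the champion but is still discarded by the de-dup rule.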
Branch updated from 6355cc6 to c6c2be2.
Decentralized auto-research subsystem
Publish a research program (an objective + an acceptance criterion), let workers on other obol-stacks opt in over the open internet, have them contribute results to a private collective knowledge base, and distribute rewards proportional to validated improvement — all composed from layers the stack already has (escrow, tunnels, the controller-never-signs invariant), with one small new self-contained package.
This was built and validated live on real hardware (two NVIDIA GB10 DGX Sparks as remote workers; the owner published from a laptop over a real Cloudflare tunnel, no tailscale), running a real nanoGPT-style val_bpb-minimization training workload end-to-end.

What's in it
- internal/research/groupauth/: Approve(groupID, userCode) is the membership decision that keeps the KB private. One-time SHA-256-hashed member tokens (obol-research-mt-…), VerifyToken / Revoke.
- internal/research/kb/: impact = Δmetric (directional), champion promotion, first-verified-wins de-dup, by-impact proportional payouts.
- internal/research/server/: member() / ownerOnly() middleware. Open-membership programs auto-approve.
- cmd/obol/research.go: obol research publish | approve | status. publish stands up the host server + a Cloudflare quick tunnel and writes program state to $CONFIG/research/<name>.json.
- internal/embed/skills/research-program/: SKILL.md + scripts/worker.py, the off-chain runbook plus a stdlib-only worker that joins a program, runs the (pluggable) train script, and submits its metric.

Design
- {objective, criteria{metric, direction, accept, threshold}, baseline, pool, split}. metric is an arbitrary string, so any domain lands without a schema change. This stays true to the open auto-research pattern (a metric to minimize/maximize + a KEEP rule).
- /services/* and x402 ForwardAuth, credential swapped payment→member token). It is never hostname-restricted-internal and never open: it sits exactly in the gated tier the tunnel-security rules allow.
- split: by-impact). Settlement reuses the existing non-custodial escrow leg; the controller never signs and holds no keys.

Live validation (real hardware, no tailscale)
- spark1 + spark2 (NVIDIA GB10 DGX Spark) joined an open-membership program over the open internet and ran the real training workload.
- Champion: spark2 (val_bpb 2.110563); spark1 2.192544. By-impact split → spark1 90.78 / spark2 9.22 OBOL.
- torch.compile/Triton can't target sm_121; batch/eval-token sizing), i.e. "the agent edits the train script" is the literal workflow.

Composability / blast radius
- internal/research/* + internal/config: zero dependency on the in-flight escrow/bounty PRs, so this reviews and merges independently.
- cmd/obol/main.go: one line registers researchCommand(cfg).
- Design docs live in plans/ (gitignored) and are intentionally not part of this PR: code only.

Tests
- go build ./... clean.
- go test ./internal/research/... green (groupauth, kb, server).

Security invariants honored