Open
63 commits
92429a8
research(connectome): initial 8-doc deep-dive on RuVector as embodied…
Apr 22, 2026
757f4fa
feat(examples): connectome-fly SOTA example + ADR-154
Apr 22, 2026
7a83adf
feat(examples): connectome-fly SOTA closure — SIMD + GPU + AC-3 split…
Apr 22, 2026
b8373a9
docs(connectome-fly): align ADR-154 + README with shipped state
Apr 22, 2026
bd26c4e
bench(connectome-fly): measure SIMD saturated-regime speedup — 1.013×…
Apr 22, 2026
b805d71
feat(connectome-fly): sparse-Fiedler observer path for N > 1024
Apr 22, 2026
cf21327
feat(connectome-fly): FlyWire v783 ingest module + fixture tests
Apr 22, 2026
a3cca1c
feat(connectome-fly): Opt D — delay-sorted CSR for saturated-regime s…
Apr 22, 2026
c0f5696
merge: feat/connectome-flywire-ingest — FlyWire v783 ingest module
Apr 22, 2026
1b3f034
merge: feat/observer-sparse-fiedler — sparse Fiedler detector for N >…
Apr 22, 2026
2f24252
merge: feat/lif-delay-sorted-csr — Opt D delay-sorted CSR (1.0× satur…
Apr 22, 2026
98273a2
docs(connectome-fly): consolidate 3-agent swarm — FlyWire + sparse-Fi…
Apr 22, 2026
3a6b70d
bench(connectome-fly): measured — sparse-Fiedler threshold drop is a …
Apr 22, 2026
3c2377f
feat(observer): adaptive detect cadence — first ≥ 2× saturated-regime…
Apr 22, 2026
316f59f
feat(connectome-fly): streaming FlyWire loader + degree-stratified nu…
Apr 22, 2026
04cb48e
feat(observer): incremental Fiedler accumulator — ADR-154 §16 lever 3
Apr 22, 2026
af12679
feat(observer): Lanczos-with-full-reorthog for sparse Fiedler at path…
Apr 22, 2026
d369e7a
feat(connectome-fly): DiskANN/Vamana motif index — AC-2 target
Apr 22, 2026
15ffe86
merge: feat/observer-incremental-fiedler — incremental Fiedler accumu…
Apr 22, 2026
fd39d10
merge: feat/observer-lanczos-fiedler — Lanczos sparse Fiedler for pat…
Apr 22, 2026
fe059b8
merge: feat/analysis-diskann-motif — Vamana motif index for AC-2
Apr 22, 2026
994d61f
Revert "merge: feat/observer-lanczos-fiedler — Lanczos sparse Fiedler…
Apr 22, 2026
a7a5ee5
Revert "merge: feat/analysis-diskann-motif — Vamana motif index for A…
Apr 22, 2026
c8759bd
Revert "merge: feat/observer-incremental-fiedler — incremental Fiedle…
Apr 22, 2026
247adef
docs(adr-154): §13 follow-up roll-up + §17 nine-discovery table
Apr 22, 2026
70794b6
docs(adr-154): 10th measurement-driven discovery — SDPA encoder is pr…
Apr 22, 2026
7000311
feat(analysis): multi-level Louvain baseline — 11th discovery (over-m…
Apr 22, 2026
1874d01
docs: name the project — Connectome OS
Apr 22, 2026
d06e80f
feat(analysis): rate-histogram motif encoder + A/B vs SDPA — ADR-154 …
Apr 22, 2026
31d3d71
merge: feat/analysis-rate-encoder — 12th discovery: encoder axis rule…
Apr 22, 2026
02ebdd1
docs(adr-154): §17 discovery #12 — encoder axis empirically ruled out…
Apr 22, 2026
0430231
feat(analysis): raster-regime labels test — 13th discovery, labels ax…
Apr 22, 2026
8f59197
feat(analysis): Leiden refinement phase — ADR-154 §13 Leiden-pairing
Apr 22, 2026
07cbb8d
feat(connectome-fly): Part-3 exotic-application scaffolding — embodim…
Apr 22, 2026
f58f0c9
merge: feat/analysis-leiden — Leiden refinement phase delivers perfec…
Apr 22, 2026
7d949ed
feat(lif): canonical in-bucket ordering + cross-path determinism enve…
Apr 22, 2026
3571ed8
bench(connectome-fly): post-sort median = 1.67s (+6.4%) — honest reco…
Apr 23, 2026
753db36
perf(lif): lazy-skip length-1 buckets in drain_due — measured null at…
Apr 23, 2026
6cc6f79
docs(adr-154 §14): two new risk-register rows from this iteration's f…
Apr 23, 2026
17cdbcb
feat(analysis): CPM-Leiden first cut — null at scale, 16th discovery …
Apr 23, 2026
484427c
feat(leiden): weight-normalized CPM — ARI=1.000 planted SBM, 17th dis…
Apr 23, 2026
78df97b
test(leiden-cpm): full-partition ARI — CPM at γ=2 scores 0.393 vs 0.1…
Apr 23, 2026
1f085dc
test(leiden-cpm): fine-γ sweep refines peak to ARI=0.425 @ γ∈{2.25, 2.5}
Apr 23, 2026
cfdcb8b
test(ac-3a): wire full-partition ARI — greedy beats Leiden, discovery…
Apr 23, 2026
6cf5246
test(leiden-cpm): seed-sweep reproducibility — CPM wins 5/5 at mean 3…
Apr 23, 2026
d691643
feat(analysis): CPM N-scaling sweep — 22nd discovery, 4× headline is …
Apr 23, 2026
4171706
feat(analysis): per-scale γ peak sweep — 23rd discovery, N=512 beats …
Apr 23, 2026
236f3e1
feat(analysis): smaller-N + fine-γ sweep — 24th discovery, new ceilin…
Apr 23, 2026
75b0ede
feat(analysis): CPM-specific refinement tested and ruled out — 25th d…
Apr 23, 2026
fe35308
feat(ui): Vite + three.js scaffold for Connectome OS demo UI
Apr 23, 2026
77dab54
fix(ui): THREE-not-defined + favicon 404
Apr 23, 2026
b9d3810
feat(analysis): N=512 module-count sweep — 26th discovery, 0.599 new …
Apr 23, 2026
5e32933
feat(ui): real Rust backend streaming live LIF data to browser (SSE)
Apr 23, 2026
b4d3ea4
feat(analysis): cross-scale constant-density sweep — 27th discovery
Apr 23, 2026
7e682ea
feat(analysis): hub-fraction + density sweeps at N=1024 — 28th & 29th…
Apr 23, 2026
b1e34fc
feat(analysis): fine 2-D grid at N=512 — 30th discovery, new best 0.671
Apr 23, 2026
6ec5679
feat(ui_server): FlyWire v783 TSV substrate + smoke-tested end-to-end
Apr 23, 2026
dd73067
feat(flywire): Princeton CSV loader + live 115k-neuron fly brain
Apr 23, 2026
011b181
feat(ui): VITE_BASE env var for GitHub Pages deploy
Apr 23, 2026
5f96915
feat(ui): welcome modal with intro, 3-card tutorial, and GitHub link
Apr 23, 2026
65f2908
fix(ui): welcome modal opens on every page load
Apr 23, 2026
cdcb2bb
feat(ui): mock simulator fallback when the Rust backend is offline
Apr 23, 2026
b04ea1f
feat(ui): dedicated fly simulation view (data-view="fly-sim")
Apr 23, 2026
24 changes: 24 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 2 additions & 0 deletions Cargo.toml
@@ -194,6 +194,8 @@ members = [
"examples/real-eeg-analysis",
# Multi-seizure cross-patient analysis: all 7 chb01 seizures
"examples/real-eeg-multi-seizure",
# Connectome-driven embodied brain demonstrator (ADR-154)
"examples/connectome-fly",
]
resolver = "2"

520 changes: 520 additions & 0 deletions docs/adr/ADR-154-connectome-embodied-brain-example.md

Large diffs are not rendered by default.

215 changes: 215 additions & 0 deletions docs/research/connectome-ruvector/00-master-plan.md

Large diffs are not rendered by default.

229 changes: 229 additions & 0 deletions docs/research/connectome-ruvector/01-architecture.md

Large diffs are not rendered by default.

230 changes: 230 additions & 0 deletions docs/research/connectome-ruvector/02-connectome-layer.md
@@ -0,0 +1,230 @@
# 02 - Connectome Layer: FlyWire Ingest and Graph Schema

> Framing reminder: this is a graph-native embodied connectome runtime. No upload, no consciousness claims. See `./00-master-plan.md` §1 and `./07-positioning.md`.

## 1. Purpose

Specify the node and edge schema for a Drosophila whole-brain connectome persisted in `ruvector-graph`, the ingest pipeline from FlyWire's public release, and the cost/throughput envelope. Consumers: `./03-neural-dynamics.md` reads this schema to wire the LIF kernel; `./05-analysis-layer.md` reads it to mount mincut/sparsifier/coherence.

## 2. Source dataset: FlyWire

FlyWire is the community-proofread adult female Drosophila melanogaster brain connectome derived from serial-section electron microscopy. The v783 release (Dorkenwald et al., 2024, Nature; Matsliah et al., 2024, Nature) provides 139,255 neurons and approximately 54.5 million chemical synapses, along with neurotransmitter identity predictions for ~130M synapses (consolidated to per-edge aggregates). Key tables published:

- `neurons.csv` — per-neuron metadata (id, super-class, class, sub-class, cell-type, hemilineage, side, nerve, soma position).
- `connections.csv` — pre→post pairs with synapse count, neuropil, predicted neurotransmitter.
- `classification.csv` — cell-type assignments with community-voted labels.
- `meshes/` — per-neuron triangle meshes (optional for morphology hashing).
- `nt_predictions.csv` — per-synapse NT predictions (ACh, Glu, GABA, DA, 5-HT, OA, histamine).

The Janelia hemibrain (`v1.2.1`, Scheffer et al., 2020) covers roughly half the brain (~25K neurons) with higher manual proof-reading density. FlyWire is the primary source; hemibrain is kept as a cross-validation target (see `./06-prior-art.md` §Hemibrain).

The 2024 Nature whole-fly-brain LIF paper is the ground-truth proof that behavior — feeding, grooming, and sensorimotor transformations — can emerge from a FlyWire-scale LIF model with no trained parameters. Our schema must preserve every feature that paper depended on: cell-type, neurotransmitter, synapse count per edge, and neuropil labels.

## 3. Graph schema

We use the `ruvector-graph` labeled property graph with typed nodes and edges. The schema is versioned (`schema_version = "connectome/2026.04"`) and stored in the graph root properties for replay.

### 3.1 Node: Neuron

```rust
pub struct Neuron {
pub id: NeuronId, // u64, stable FlyWire root_id
pub dataset: DatasetId, // FlyWire | Hemibrain | Custom
pub dataset_version: String, // e.g., "flywire-v783"
pub super_class: SuperClass, // Central | OpticLobe | Ascending | Descending | Motor | SensoryPeriph
pub class: Option<ClassId>, // e.g., "Kenyon cell"
pub sub_class: Option<String>,
pub cell_type: Option<CellTypeId>,
pub hemilineage: Option<String>,
pub side: Side, // Left | Right | Center | Bilateral
pub region: RegionId, // interned neuropil label (MB, EB, FB, LAL, ...)
pub soma_xyz: Option<[f32; 3]>,
pub neurotransmitter: NT, // ACh|Glu|GABA|DA|5-HT|OA|Hist|Unknown
pub nt_confidence: f32, // [0,1]
pub morphology_hash: Option<u64>, // LSH over skeleton or mesh
pub flags: NeuronFlags, // bitflags: Motor, Sensory, ProofEdited, Flagged, ...
}
```

`NeuronId` is a 64-bit value, made globally unique across datasets by mixing the `(dataset, flywire_root_id)` pair through SipHash. The `flags` bitfield is the hinge for layers 2-4: it carries `Motor`, `Sensory`, `VisualPR` (photoreceptor), `Chemosensory`, `Mechanosensory`, `Chordotonal`, and similar bits. `BodySim` and `DynamicsEngine` key off these flags when routing sensory injection and motor readout.
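As a rough illustration of the id mixing described above, the sketch below hashes a `(dataset, root_id)` pair into a single `u64`. It is an assumption-laden stand-in: `neuron_id` is a hypothetical helper, the `DatasetId` enum is reconstructed from the field comment in `Neuron`, and `std`'s `DefaultHasher` is used in place of a pinned SipHash. The standard library does not guarantee that hasher's algorithm across Rust releases, so persisted ids would need a fixed SipHash implementation with fixed keys.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical dataset tag; variants follow the `dataset` field comment.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum DatasetId {
    FlyWire,
    Hemibrain,
    Custom,
}

/// Mix `(dataset, root_id)` into one 64-bit id.
///
/// Assumption: `DefaultHasher::new()` (fixed keys, deterministic within a
/// given Rust release) stands in for a pinned SipHash. Do not persist ids
/// produced this way across toolchain versions.
pub fn neuron_id(dataset: DatasetId, root_id: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    dataset.hash(&mut hasher);
    root_id.hash(&mut hasher);
    hasher.finish()
}
```

The same `root_id` under two different `DatasetId` values hashes to different ids, which is the property the cross-dataset schema relies on.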

Interning: `RegionId`, `CellTypeId`, and `ClassId` are `u32` indices into intern tables stored as properties on the graph root. This keeps each `Neuron` under 120 bytes.
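A minimal sketch of such an intern table follows, assuming labels arrive as strings during ingest. The `Interner` type and its methods are illustrative names, not part of `ruvector-graph`.

```rust
use std::collections::HashMap;

/// Minimal string interner: maps each distinct label to a dense `u32` index.
#[derive(Default)]
pub struct Interner {
    ids: HashMap<String, u32>,
    labels: Vec<String>,
}

impl Interner {
    /// Return the existing index for `label`, or assign the next free one.
    pub fn intern(&mut self, label: &str) -> u32 {
        if let Some(&id) = self.ids.get(label) {
            return id;
        }
        let id = self.labels.len() as u32;
        self.ids.insert(label.to_owned(), id);
        self.labels.push(label.to_owned());
        id
    }

    /// Resolve an index back to its label, if it was ever assigned.
    pub fn resolve(&self, id: u32) -> Option<&str> {
        self.labels.get(id as usize).map(String::as_str)
    }
}
```

Interning "MB" and then "EB" yields indices 0 and 1; interning "MB" again returns 0, so every `Neuron` stores a 4-byte index rather than a heap string.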

### 3.2 Edge: Synapse

```rust
pub struct Synapse {
pub pre: NeuronId,
pub post: NeuronId,
pub neuropil: RegionId,
pub nt: NT,
pub sign: i8, // +1 excitatory, -1 inhibitory, 0 unknown/graded
pub weight: f32, // initial effective weight (count * gain)
pub count: u32, // raw synapse count from FlyWire
pub delay_ms: f32, // estimated axonal + synaptic delay
pub confidence: f32, // [0,1]
pub weight_source: WeightSource, // Explicit | NtDefault | MorphologyEst
pub edge_flags: EdgeFlags, // Gap, Electrical, Recurrent, LongRange, ...
}
```

`sign` is derived from `nt`: ACh and Glu default to +1 in central brain circuits and GABA to -1; Glu in the optic lobe is frequently +1, with known local exceptions. Where the sign cannot be safely inferred we set `sign = 0` and `weight_source = NtDefault`; the LIF kernel treats these edges as excitatory for the first pass and exposes the set for sensitivity analysis.

Estimating `delay_ms` is a hard problem: FlyWire does not publish conduction delays. We estimate it as `delay_ms = base + k * soma_distance_microns` with `base ≈ 1.0 ms` and `k ≈ 0.003 ms/µm` (fly axonal conduction of roughly 300 µm/ms, i.e. about 1/300 ms per µm), clamped to `[0.5, 20.0]` ms. Where neuron meshes are absent we fall back to `delay_ms = 2.0`. The field is explicit so it can be recalibrated per region without a schema change.
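The delay estimate can be sketched as a single function. The constant names and the `Option` input are assumptions made for illustration; only the formula, the clamp range, and the 2.0 ms fallback come from the text.

```rust
/// Fixed synaptic component of the delay, in milliseconds.
pub const BASE_MS: f32 = 1.0;
/// Conduction term, in milliseconds per micron (about 1 / 300 µm/ms).
pub const K_MS_PER_UM: f32 = 0.003;
/// Delay used when no distance estimate is available.
pub const FALLBACK_MS: f32 = 2.0;

/// Estimate the axonal + synaptic delay for one edge.
///
/// `soma_distance_um` is `None` when geometry is missing for either neuron
/// (hypothetical representation; the loader may encode this differently).
pub fn estimate_delay_ms(soma_distance_um: Option<f32>) -> f32 {
    match soma_distance_um {
        Some(d) => (BASE_MS + K_MS_PER_UM * d).clamp(0.5, 20.0),
        None => FALLBACK_MS,
    }
}
```

A 300 µm separation gives about 1.9 ms, and anything beyond roughly 6,333 µm saturates at the 20 ms ceiling. With `base = 1.0` and non-negative distances the 0.5 ms floor never binds; it only matters if `base` is later recalibrated below 0.5.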

`EdgeFlags::Gap` marks electrical synapses (from gap-junction datasets where available; sparse in FlyWire but non-zero). `EdgeFlags::Recurrent` is set after a topological pass so layer 2 can optimize event handling for strongly connected components.

### 3.3 Hyperedges: motifs

`ruvector-graph::Hyperedge` captures discovered motifs (winner-take-all triplets, feedforward inhibition triads, reciprocal pairs). Populated by layer 4. Schema:

```rust
pub struct Motif {
pub kind: MotifKind, // WTA | FFI | Reciprocal | Custom(u32)
pub members: Vec<NeuronId>,
pub confidence: f32,
pub discovered_at: Time,
pub supporting_edges: Vec<EdgeId>,
}
```

Motifs are side-channels, not part of the runtime dynamics. They exist so analyses survive restarts and so `./05-analysis-layer.md` can index them in AgentDB.

### 3.4 Indexes

Required secondary indexes on the graph:

- `by_region: RegionId → Vec<NeuronId>` (scan by neuropil).
- `by_class: ClassId → Vec<NeuronId>`.
- `by_nt: NT → Vec<NeuronId>`.
- `motor_neurons: HashSet<NeuronId>` (flags bit test cached).
- `sensory_by_modality: Modality → Vec<NeuronId>`.
- `outgoing_csr: CSR<NeuronId, EdgeId>` (hot path for event dispatch in layer 2).
- `incoming_csr: CSR<NeuronId, EdgeId>` (for backward push / analysis).

`ruvector-graph::index` already supports property indexes; the CSR pair is a derived view materialized at ingest and refreshed on mutation.
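To make the derived CSR view concrete, here is a minimal counting-sort construction of the outgoing index. It assumes neurons have already been remapped to dense indices `0..node_count` and that edge ids fit in `u32`; the `Csr` type and `build_outgoing_csr` are illustrative names, not the `ruvector-graph` API.

```rust
/// Compressed sparse row view over outgoing edges.
pub struct Csr {
    /// `offsets[n]..offsets[n + 1]` is the range of `edge_ids` leaving node `n`.
    pub offsets: Vec<usize>,
    /// Edge ids grouped by source node, in original edge order within a node.
    pub edge_ids: Vec<u32>,
}

/// Build the CSR in two passes. `sources[e]` is the dense index of the
/// presynaptic node of edge `e` (assumed to be `< node_count`).
pub fn build_outgoing_csr(node_count: usize, sources: &[usize]) -> Csr {
    // Pass 1: count out-degree per node, shifted by one slot.
    let mut offsets = vec![0usize; node_count + 1];
    for &src in sources {
        offsets[src + 1] += 1;
    }
    // Prefix sum turns counts into start offsets.
    for n in 0..node_count {
        offsets[n + 1] += offsets[n];
    }
    // Pass 2: scatter each edge id into its node's range.
    let mut cursor = offsets.clone();
    let mut edge_ids = vec![0u32; sources.len()];
    for (edge, &src) in sources.iter().enumerate() {
        edge_ids[cursor[src]] = edge as u32;
        cursor[src] += 1;
    }
    Csr { offsets, edge_ids }
}

impl Csr {
    /// Edge ids leaving `node`, as a contiguous slice.
    pub fn outgoing(&self, node: usize) -> &[u32] {
        &self.edge_ids[self.offsets[node]..self.offsets[node + 1]]
    }
}
```

For `sources = [2, 0, 2, 1, 0]` and three nodes, `outgoing(0)` is `[1, 4]`, `outgoing(1)` is `[3]`, and `outgoing(2)` is `[0, 2]`. The incoming CSR is the same construction keyed on the postsynaptic index.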

## 4. Ingest pipeline

```
FlyWire release (csv + meshes)
                       │
                       ▼
┌─────────────────────────────────────────────┐
│ flywire-loader                              │  (Rust, streaming CSV, no Python)
│ · validate schema                           │
│ · intern region/type                        │
│ · predict sign/delay                        │
│ · hash morphology                           │
└──────────────────────┬──────────────────────┘
                       ▼
┌─────────────────────────────────────────────┐
│ graph_writer                                │  (ruvector-graph transactions)
│ · batched Node insert                       │
│ · batched Edge insert (CSR-friendly order)  │
│ · build indexes                             │
│ · materialize CSR                           │
└──────────────────────┬──────────────────────┘
                       ▼
┌─────────────────────────────────────────────┐
│ agentdb_embedder                            │  (per-neuron vector)
│ · ONNX MiniLM L6 v2                         │
│ · DiskANN index                             │
└──────────────────────┬──────────────────────┘
                       ▼
rvf on-disk snapshot (dataset_hash captured)
```

The loader is a new Rust binary under `crates/ruvector-connectome/src/bin/flywire-loader.rs`. It streams FlyWire CSVs through `csv` + `serde`, builds `Neuron` / `Synapse` records, looks up interned IDs, and emits batched transactions into `GraphDB`. Batch size is 10K edges per transaction to keep WAL writes amortized.
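The batching step can be sketched independently of the CSV and graph crates. In the version below, `ingest_in_batches` and its `commit` callback are hypothetical stand-ins for whatever transaction API `GraphDB` exposes; only the 10K batch size is taken from the text.

```rust
/// Edges per transaction, as stated for the loader.
pub const BATCH_SIZE: usize = 10_000;

/// Drain `edges` into batches of at most `batch_size` records and call
/// `commit` once per batch. Returns the number of commits issued.
///
/// `commit` is a placeholder for a real transaction write.
pub fn ingest_in_batches<T, I, F>(edges: I, batch_size: usize, mut commit: F) -> usize
where
    I: IntoIterator<Item = T>,
    F: FnMut(&[T]),
{
    assert!(batch_size > 0, "batch_size must be positive");
    let mut batch = Vec::with_capacity(batch_size);
    let mut commits = 0;
    for edge in edges {
        batch.push(edge);
        if batch.len() == batch_size {
            commit(&batch);
            batch.clear(); // keep the allocation, drop the contents
            commits += 1;
        }
    }
    // Flush the final partial batch so no trailing edges are lost.
    if !batch.is_empty() {
        commit(&batch);
        commits += 1;
    }
    commits
}
```

Because the input is any iterator, the same function works over a streaming CSV reader without holding the full edge list in memory; 25,000 records produce three commits of 10,000, 10,000, and 5,000.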

Neurotransmitter → sign mapping table:

| NT | Default sign | Notes |
|---|---|---|
| ACh | +1 | Typical fast excitation in Drosophila central brain |
| Glu | +1 in most central circuits; context-dependent in optic lobe | Flagged for per-region override |
| GABA | -1 | Fast inhibition |
| DA | 0 (neuromodulatory) | Weight propagates via slow pool, not fast LIF |
| 5-HT | 0 (neuromodulatory) | Same |
| OA | 0 (neuromodulatory) | Same |
| Histamine | -1 | Photoreceptor output |
| Unknown | 0 | `weight_source = NtDefault`, excitatory fallback for v1 |

Neuromodulators are *not* routed through the event-driven LIF dispatcher in v1; they are aggregated into slower per-region concentration fields (see `./03-neural-dynamics.md` §Neuromodulation).
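The mapping table and the v1 routing rule can be written down as a small lookup. This is a sketch only: the `Nt` enum and function names are illustrative, and the per-region Glu override mentioned in the table is deliberately left out.

```rust
/// Neurotransmitter classes from the mapping table (illustrative enum).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Nt {
    ACh,
    Glu,
    Gaba,
    Da,
    Serotonin, // 5-HT
    Oa,
    Histamine,
    Unknown,
}

/// Default sign stored on the edge: +1 excitatory, -1 inhibitory,
/// 0 neuromodulatory or unknown. Per-region overrides are not modeled here.
pub fn default_sign(nt: Nt) -> i8 {
    match nt {
        Nt::ACh | Nt::Glu => 1,
        Nt::Gaba | Nt::Histamine => -1,
        Nt::Da | Nt::Serotonin | Nt::Oa | Nt::Unknown => 0,
    }
}

/// True for transmitters routed to the slow per-region pool in v1.
pub fn is_neuromodulatory(nt: Nt) -> bool {
    matches!(nt, Nt::Da | Nt::Serotonin | Nt::Oa)
}

/// Sign applied by the fast LIF path in v1, or `None` when the edge is not
/// dispatched there. Unknown transmitters fall back to excitatory.
pub fn fast_path_sign(nt: Nt) -> Option<i8> {
    if is_neuromodulatory(nt) {
        return None;
    }
    Some(match default_sign(nt) {
        0 => 1,
        sign => sign,
    })
}
```

Keeping `default_sign` separate from `fast_path_sign` preserves the stored `sign = 0` for later sensitivity analysis while still giving the kernel a usable value.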

### 4.1 Morphology hashing

`morphology_hash` is an optional 64-bit LSH fingerprint of the per-neuron mesh or skeleton, built with an adapted version of `ruvector-cnn`'s locality-sensitive hashing pipeline. The hash lets AgentDB answer "neurons morphologically similar to X" without re-running mesh comparison. For v1 we can skip meshes and derive the hash from the tuple `(cell_type, region, side, hemilineage)` — crude, but useful until proper mesh embeddings are available.

## 5. Scale and cost analysis

### 5.1 Raw record sizes

| Kind | Fields | Bytes/record (packed) |
|---|---|---|
| Neuron | id, flags, enums, soma, NT, morph hash | ~112 |
| Synapse | pre, post, neuropil, NT, sign, weight, count, delay, conf, flags | ~56 |
| Motif | kind, 4-8 members, confidence | ~128 |

### 5.2 Totals (v1 FlyWire v783)

- Neurons: 139,255 × 112 B ≈ **15.6 MB** raw.
- Synapses (consolidated to per-edge): ~50 M × 56 B ≈ **2.8 GB** raw.
- CSR indexes (outgoing + incoming): ~2 × (139K × 8 B + 50M × 12 B) ≈ **1.2 GB**.
- Embeddings: 139K × 384 × f32 ≈ **214 MB**; with INT8 DiskANN ≈ **53 MB**.
- Motif store: bounded; target <100 MB.

Total on-disk budget: **~5 GB** for a full replay bundle. That is trivially SSD-resident and fits on the pi-brain node class (ADR-150). RAM working set for a run: ~3-4 GB with CSR warm plus LIF state (see `./03-neural-dynamics.md` §Memory).
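The arithmetic behind these totals is easy to reproduce. The sketch below recomputes each line item from the record sizes in §5.1, using decimal megabytes and gigabytes; the struct and function names are illustrative, and the 100 MB motif cap is the stated target rather than a measured figure.

```rust
/// Neuron count for FlyWire v783.
pub const NEURONS: u64 = 139_255;
/// Approximate per-edge synapse count used in the budget.
pub const EDGES: u64 = 50_000_000;

/// Byte counts for each storage component.
pub struct Budget {
    pub neurons: u64,
    pub synapses: u64,
    pub csr: u64,
    pub embeddings_f32: u64,
    pub embeddings_int8: u64,
}

/// Recompute the line items from the per-record sizes.
pub fn storage_budget() -> Budget {
    Budget {
        neurons: NEURONS * 112,
        synapses: EDGES * 56,
        // Two CSR views, each with 8 B per node offset and 12 B per edge.
        csr: 2 * (NEURONS * 8 + EDGES * 12),
        embeddings_f32: NEURONS * 384 * 4,
        embeddings_int8: NEURONS * 384,
    }
}

/// Sum all components plus a cap for the motif store.
pub fn total_bytes(b: &Budget, motif_cap: u64) -> u64 {
    b.neurons + b.synapses + b.csr + b.embeddings_f32 + b.embeddings_int8 + motif_cap
}
```

With a 100 MB motif cap the sum comes to about 4.39 GB, which sits inside the ~5 GB budget with some headroom for manifests and intern tables.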

### 5.3 Ingest throughput

A single-threaded loader on a modern laptop CPU should hit ~150K edges/s in Rust streaming CSV mode (bounded by CSV parsing, not graph writes). At 50M edges that is roughly 333 s, or **~5-6 minutes**, for the full connectome. GraphDB transaction batching in `ruvector-graph` can absorb this without WAL blowup; we set `batch_size = 10_000` and use `IsolationLevel::ReadCommitted` during bulk ingest to avoid holding a global lock.

Consistency check after ingest:

- `node_count == published_node_count` (exact).
- `edge_count` within ±1% of published (synapse consolidation varies).
- Every `pre` and `post` id referenced in edges resolves to a `Neuron` (no dangling FKs).
- Every NT prediction has `0.0 <= confidence && confidence <= 1.0`.
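The checks above can be sketched as one pass over summary statistics collected during ingest. `IngestStats` and `check_ingest` are hypothetical names, and the dangling-reference check is assumed to mean edge endpoints that do not resolve to a `Neuron`.

```rust
/// Summary statistics gathered while loading (illustrative shape).
pub struct IngestStats {
    pub node_count: u64,
    pub edge_count: u64,
    /// Edge endpoints that did not resolve to a loaded neuron.
    pub dangling_refs: u64,
    pub min_confidence: f32,
    pub max_confidence: f32,
}

/// Return one message per failed check; an empty vector means the ingest
/// passed all four.
pub fn check_ingest(
    stats: &IngestStats,
    published_nodes: u64,
    published_edges: u64,
) -> Vec<&'static str> {
    let mut failures = Vec::new();
    if stats.node_count != published_nodes {
        failures.push("node_count differs from published count");
    }
    // Within ±1%: |actual - published| * 100 <= published, in integers.
    if stats.edge_count.abs_diff(published_edges) * 100 > published_edges {
        failures.push("edge_count outside ±1% of published count");
    }
    if stats.dangling_refs != 0 {
        failures.push("edges reference neurons that were not loaded");
    }
    // Written as a negated conjunction so NaN confidences also fail.
    if !(stats.min_confidence >= 0.0 && stats.max_confidence <= 1.0) {
        failures.push("confidence outside [0, 1]");
    }
    failures
}
```

Returning every failure rather than stopping at the first makes the loader's manifest more useful when several invariants break at once.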

### 5.4 Query envelope

| Query | Expected latency (warm cache) |
|---|---|
| `neuron(id)` | <5 µs |
| `outgoing(id)` via CSR | O(deg), no fixed bound; p99 ~20 µs at average degree |
| `by_region(region)` | 0.5-2 ms for largest neuropils |
| `by_nt(nt)` | <1 ms |
| Motif lookup (indexed) | <1 ms |
| Vector neighbor "neurons morphologically similar" via DiskANN | <10 ms @ k=50 |
| Full adjacency snapshot for sparsifier rebuild | ~200 ms single-thread, ~50 ms with rayon |

These numbers are inside the budget the architecture doc (`./01-architecture.md`) sets for a 25 Hz control-rate closed-loop run.

## 6. Incremental updates

Proof-reading and new FlyWire releases will change edges. The loader supports delta mode:

- `--since v780 --until v783` ingests only new/changed edges.
- Each neuron and edge carries `source_version`; queries can filter by version.
- `ruvector-mincut` and `ruvector-sparsifier` both support dynamic insert/delete; a FlyWire delta triggers incremental updates rather than full rebuild.

## 7. Cross-dataset support

Hemibrain is the obvious cross-validation target. The schema already supports multiple datasets because `NeuronId` is `(dataset, flywire_root_id)`-hashed. Loader: `hemibrain-loader` mirrors `flywire-loader` over `neuPrint`-exported CSVs. OpenWorm's *C. elegans* connectome (302 neurons, ~7K synapses) trivially fits and is useful as a sanity test bed (one run completes in seconds).

## 8. Data governance

- FlyWire citation attached to every replay bundle manifest.
- No proprietary data. No re-distribution of FlyWire meshes beyond project-internal storage.
- Edges with `nt_confidence < 0.5` are flagged with `EdgeFlags::LowConfidenceNT` so analyses can exclude them.
- Loader emits a manifest: `{dataset_version, loader_version, ingest_utc, node_count, edge_count, schema_hash}` so every downstream run is traceable.

## 9. Open questions for Phase 1

1. Should dendritic compartments (reduced multi-compartment per neuron) be modeled here or pushed into layer 2 state? The schema supports it via synthetic child nodes but doubles node count. Recommendation: defer to v2; use `ruvector-nervous-system::dendrite` in layer 2 for coincidence detection without schema changes.
2. Should gap junctions be distinct hyperedges or regular edges with `EdgeFlags::Gap`? We pick flags for simpler ingest; revisit if electrical coupling is required for a target behavior.
3. Neuromodulatory edges — keep as synapses or route to a separate region-level diffusion field? We keep them as synapses with `sign = 0` and let layer 2 route them to the slow pool.
4. Morphology hash provider — pure `(cell_type, region, side)` crude, or real mesh embedding from `ruvector-cnn`? Start crude, upgrade in M2.

See `./03-neural-dynamics.md` for how the LIF kernel consumes this schema, and `./05-analysis-layer.md` for the analyses that depend on the CSR indexes specified here.