Skip to content

pablo-chacon/RadBot

Repository files navigation

RadBot

Private, compounding R&D reasoning engine. Runs entirely on your hardware. No cloud. No external memory. No vendor dependencies.

Beta License Python

Read the Manifesto

RadBot is a local AI agent architecture for serious research and development work. It combines a local LLM with a network of MCP containers, a three-lane censorship-resistant search system, a compounding cross-domain knowledge base, and self-improving retrieval — packaged for single-command deployment on consumer hardware.

Beta notice: RadBot v0.1.0-beta is functional and architecturally stable. Cross-domain R&D validation is ongoing. Report issues via GitHub Issues.


Why RadBot

R&D capability has historically required institutional infrastructure, teams, and budgets. RadBot removes those constraints.

The architecture is efficient by design — not by compromise:

  • The model is air-gapped. It never touches the internet directly
  • Three search lanes find what surface search filters out
  • The knowledge base compounds across sessions and projects
  • Cross-domain fusion detects emergent conclusions at domain intersections
  • Background workers continuously improve retrieval without manual curation
  • Voluntary BTFS sharing lets researchers distribute findings without gatekeepers

Quick Start

git clone https://github.com/pablo-chacon/RadBot
cd RadBot

# 1. Configure environment
cp .env.example .env
# Edit .env — at minimum set:
#   POSTGRES_PASSWORD  (must change from default)
#   SEARXNG_SECRET_KEY (run: openssl rand -hex 32)

# 2. Deploy
./deploy.sh

Open http://localhost:7860

The deploy script runs scripts/check_env.sh automatically and will refuse to start if unsafe defaults are detected.

CPU-only:

./deploy.sh --cpu

Teardown:

./deploy.sh --down    # stop containers, keep data
./deploy.sh --reset   # stop containers, destroy all data

Architecture

System Overview

╔══════════════════════════════════════════════════════════════════════════╗
║                            HOST MACHINE                                  ║
║                                                                          ║
║  ┌───────────────────────────────────────────────────────────────────┐   ║
║  │                     radbot-internal network                        │   ║
║  │                    (no external routing)                           │   ║
║  │                                                                    │   ║
║  │  ┌──────────┐   ┌──────────────────────────────────────────────┐  │   ║
║  │  │  Ollama  │◄──│                Orchestrator                  │  │   ║
║  │  │  :11434  │   │            (model + UI  :7860)               │  │   ║
║  │  └──────────┘   └────────────────────┬─────────────────────────┘  │   ║
║  │                                      │                             │   ║
║  │            ┌─────────────────────────┼──────────────────────┐     │   ║
║  │            │                         │                      │     │   ║
║  │      ┌─────▼──────┐         ┌────────▼──────┐    ┌─────────▼──┐  │   ║
║  │      │ mcp-memory │         │  Three-Lane   │    │ Synthesis  │  │   ║
║  │      │ (DB layer) │         │    Search     │    │            │  │   ║
║  │      └─────┬──────┘         │ surface+tor   │    └────────────┘  │   ║
║  │            │                │ +ooni         │                    │   ║
║  │      ┌─────▼──────┐         └───────┬───────┘                    │   ║
║  │      │ PostgreSQL │                 │                             │   ║
║  │      │ + pgvector │    ┌────────────▼────────────────────────┐   │   ║
║  │      ├────────────┤    │         Processing Lanes             │   │   ║
║  │      │  KMeans    │    │  mcp-fetch  mcp-text  mcp-struct    │   │   ║
║  │      │  XGBoost   │    │  mcp-geo    mcp-code                │   │   ║
║  │      └────────────┘    └─────────────────────────────────────┘   │   ║
║  └──────────────────────────────────────────────┬────────────────────┘   ║
║                                                 │                        ║
║  ┌──────────────────────────────────────────────▼────────────────────┐   ║
║  │                      radbot-egress network                         │   ║
║  │    searxng   mcp-search   mcp-fetch   mcp-tor   mcp-ooni           │   ║
║  │    mcp-btfs  tor (sidecar)                                         │   ║
║  └──────────────────────────────────────────────┬────────────────────┘   ║
╚═════════════════════════════════════════════════╪════════════════════════╝
                                                  │
                                               Internet

Three-Lane Search

RadBot searches from three distinct vantage points simultaneously:

mcp-search     Surface web via SearXNG (aggregates Google, Bing,
               DDG, Brave, arXiv, Semantic Scholar simultaneously)
               + DuckDuckGo fallback

mcp-tor        Full internet via Tor network — geo-neutral,
               uncensored, supports .onion addresses

mcp-ooni       OONI censorship measurement — detects what is
               being filtered, where, and by whom

               delta_score = surface vs Tor divergence
               high delta = suppression signal on that topic

Results from all three lanes feed synthesis. XGBoost learns which lanes produce reliable signal for which query domains over time.

Data Flow

User prompt
      │
      ▼
Orchestrator (model)
      │
      ├── loads project conclusions + principles from mcp-memory
      ├── determines relevant processing lanes
      │
      ▼
Parallel dispatch
  ┌───────────┬───────────┬───────────┬──────────┐
  │ mcp-search│  mcp-tor  │ mcp-ooni  │mcp-fetch │ ...
  └─────┬─────┴─────┬─────┴─────┬─────┴────┬─────┘
        │           │           │          │
        └───────────┴─────┬─────┴──────────┘
                          │
                          ▼
                      Synthesis
                 lane agreement scores
                 delta / suppression signal
                 combined confidence
                 conflict flagging
                          │
                          ▼
                    Orchestrator
                 resolves conflicts
                 derives conclusions
                 stores via mcp-memory
                          │
                          ▼
                   Response to user

Knowledge Compounding Loop

Session
  │
  ├── retrieves conclusions (project-scoped, XGBoost-weighted)
  ├── retrieves principles (cross-project, injected into prompt)
  │
  ▼
Model reasons with grounded context
  │
  ▼
New conclusions stored
  │
  ├── KMeans clusters conclusions (auto-K, silhouette-selected)
  │   detects cross-domain clusters → fusion candidates
  │
  ├── XGBoost reranker retrained on chunk usage feedback
  │   learns project-specific retrieval patterns
  │
  └── principle candidates checked
        if confirmed across N projects → promoted to principles
        principles injected into system prompt next session

Cross-Domain Fusion

Project: biology        Project: robotics
conclusions             conclusions
      │                       │
      └──────────┬────────────┘
                 │
         fuse_projects("BioMechanics")
                 │
         queries BOTH knowledge bases
         simultaneously
                 │
         detects emergent conclusions —
         findings that exist at neither
         parent domain alone
         emergent = TRUE
                 │
         strongest principle candidates
         in the system

The database is the intelligence. Containers are generic processors. No new research domain requires new containers — it requires a new project and sessions that compound knowledge into the existing schema.

Network Isolation

radbot-internal (internal: true — zero external routing)
  postgres        ollama          mcp-memory
  mcp-text        mcp-struct      mcp-geo
  mcp-code        synthesis       kmeans
  xgboost         orchestrator

radbot-egress (external routing)
  searxng         mcp-search      mcp-fetch
  mcp-tor         mcp-ooni        mcp-btfs
  tor (sidecar)

The model is permanently air-gapped. All external access is containerized, logged, and auditable.


Knowledge Layer

Schema Overview

projects ──► project_domains     (biology, robotics, chemistry...)
  │       └► project_relationships + fusion_conclusions
  │
  └── sessions
        ├── pages
        │     └── chunks          (embedded, flagged, XGBoost-reranked)
        ├── conclusions            (confidence-rated, supersedable)
        ├── query_lanes            (synthesis audit trail)
        └── search_lane_results   (surface/tor/ooni per query)

principle_candidates  ←  promoted when confirmed across N projects
principles            ←  injected into system prompt at session start
knowledge_clusters    ←  KMeans output, auto-updated
conclusion_weights    ←  XGBoost scores, read at query time
btfs_packages         ←  published / imported voluntary sharing
btfs_conclusions      ←  external cited conclusions, provenance preserved

Confidence Levels

Level Meaning
speculative Logical but no evidence yet
plausible Consistent with known evidence, not yet tested
supported Evidence exists, causal chain is coherent
verified Tested, reproducible, chain confirmed

Principle Promotion

Conclusion confirmed in Project A  ─┐
Conclusion confirmed in Project B  ─┼─► principle_candidates
Conclusion confirmed in Project C  ─┘        │
                                    threshold met?
                                              │
                                              ▼
                                       principles table
                                              │
                                              ▼
                                    injected into system prompt
                                    of every future project

BTFS Voluntary Knowledge Sharing

RadBot is private by default. Nothing leaves your machine unless you choose.

When a researcher decides their findings are worth sharing, mcp-btfs packages conclusions and principles and seeds them to the BitTorrent DHT network under domain topic keys:

publish_findings(project_id, domains=["biology", "biochemistry"])
  → packages conclusions + verified principles only
  → signs with content hash for integrity verification
  → announces under radbot:biology on DHT
  → returns content_hash for reference

discover_packages(domains=["biology"])
  → searches DHT for published packages
  → returns metadata, does not import

import_package(content_hash, trust_level="reference")
  → fetches package, verifies integrity
  → imports as cited external source
  → never merged as own conclusions
  → provenance permanently marked

Raw data, session logs, and embeddings are never shared. The distributed network has no central server — packages survive as long as anyone seeds them.


MCP Container Pattern

Every container follows the same contract. Adding a processing lane requires no understanding of the full system.

class YourServer(RadBotMCPServer):
    def _register_tools(self):
        return [{
            "name":        "your_tool",
            "description": "What this does. Model reads this.",
            "parameters":  {"query": "string"},
            "handler":     self.your_tool,
        }]

    def your_tool(self, query, context=None):
        return do_your_thing(query)

if __name__ == "__main__":
    YourServer().serve()

The base class handles logging, error wrapping, tool dispatch, health endpoint, and FastAPI serving. See docs/adding-mcp-container.md.


Deployment

Single Command

./deploy.sh              # GPU mode, full stack
./deploy.sh --cpu        # CPU-only
./deploy.sh --down       # stop, keep data
./deploy.sh --reset      # stop, destroy data

Hardware Requirements

Tier GPU VRAM System RAM Storage Model Response time
Minimum 8GB 16GB 256GB SSD Qwen3-14B Q4 30-60s
Recommended 16-24GB 32GB 512GB NVMe Qwen3-30B-A3B Q4 8-20s
Over-recommended 24GB+ 64GB 1TB NVMe Qwen3-30B Q8 4-10s
CPU-only 64GB 512GB NVMe Qwen3-14B Q4 2-5 min

The recommended tier runs Qwen3-30B-A3B — a Mixture of Experts model activating ~3B parameters per token while reasoning at 30B depth. Efficient by model architecture on top of an efficient system architecture.

Deployment Targets

RadBot runs anywhere Docker runs. Same deploy.sh everywhere.

Local machine          fully private, zero cost
Home server            always-on, LAN accessible
VPS (Hetzner, etc.)    accessible anywhere, your instance
On-prem server         team deployment, shared knowledge base

Remote Access

Tailscale (recommended)    zero-config private network, no open ports
Nginx + Caddy              reverse proxy with automatic TLS
Direct bind                0.0.0.0:7860 on trusted network

Editor Integration

Ollama exposes a full OpenAI-compatible API at :11434/v1. Any editor accepting a custom OpenAI endpoint works:

{ "openaiBaseUrl": "http://localhost:11434/v1" }

Continue.dev is the recommended choice over Copilot — Ollama-native, nothing routed externally.


Configuration

All behavior is controlled via .env:

# Model
OLLAMA_MODEL=qwen3:30b-a3b

# Database
POSTGRES_PASSWORD=your_password_here        # required — change default

# Search
SEARXNG_SECRET_KEY=                         # required — openssl rand -hex 32

# Knowledge compounding
PRINCIPLE_MIN_PROJECTS=2                    # projects to confirm a principle
RERANKER_COLD_START_SESSIONS=5             # sessions before XGBoost activates

# Background workers
KMEANS_POLL_INTERVAL=60                    # seconds between cluster checks
KMEANS_TRIGGER_THRESHOLD=10               # new conclusions before re-cluster
XGBOOST_POLL_INTERVAL=120                 # seconds between reranker checks

# Censorship measurement
OONI_PROBE_ENABLED=false                   # set true to run active probes
OONI_PROBE_CONTRIBUTE=false               # set true to contribute to dataset

# BTFS voluntary sharing
BTFS_ANNOUNCE=true                         # announce packages on DHT

Project Structure

RadBot/
├── .env.example
├── .gitignore
├── deploy.sh
├── docker-compose.yml
├── README.md
├── MANIFESTO.md
├── CHANGELOG.md
├── LICENSE.md
│
├── containers/
│   ├── shared/              ← base class, DB, models, exceptions
│   ├── orchestrator/        ← UI + agent loop (port 7860)
│   ├── mcp-memory/          ← sole database access layer
│   ├── mcp-tor/             ← uncensored search + fetch via Tor
│   ├── mcp-ooni/            ← censorship measurement
│   ├── mcp-btfs/            ← voluntary knowledge sharing
│   ├── tor/                 ← Tor daemon sidecar
│   ├── searxng/             ← self-hosted meta-search
│   ├── synthesis/           ← parallel lane aggregation
│   ├── kmeans/              ← background clustering worker
│   ├── xgboost/             ← background reranker worker
│   └── processing/
│       ├── mcp-search/      ← surface search (SearXNG + DDG)
│       ├── mcp-fetch/       ← page retrieval + transparent processing
│       ├── mcp-text/        ← NLP lane
│       ├── mcp-struct/      ← structured data lane
│       ├── mcp-geo/         ← geospatial lane
│       └── mcp-code/        ← code analysis lane
│
├── db/
│   └── init/                ← schema, indexes, seed (auto-runs on boot)
│
├── docs/
│   ├── architecture.md
│   ├── MANIFESTO.md
│   └── adding-mcp-container.md
│
├── scripts/
│   └── check_env.sh
│
├── utility/
│   ├── healthcheck.sh
│   └── teardown.sh
│
└── volumes/                 ← gitignored, persisted data
    ├── postgres/
    ├── ollama/
    ├── workspace/
    ├── repos/
    └── btfs/

Design Principles

Privacy by topology — the model is air-gapped from the internet. External access is containerized and auditable. Privacy is enforced by network policy, not by trust.

Efficiency by architecture — parallel processing lanes, MoE model selection, hybrid BM25 + vector retrieval, XGBoost reranking. Lightweight not by cutting capability but by designing each layer to do exactly its job.

Compounding knowledge — conclusions build on conclusions. Patterns verified across projects become principles. Principles feed future reasoning. The system gets more capable the more it is used.

The database is the intelligence — containers are generic processors. Domain specialization lives entirely in the data. No new research domain requires new infrastructure.

Censorship resistance — three search lanes with suppression delta measurement. High surface-vs-Tor divergence is itself a research signal. Knowledge that exists is reachable.

Voluntary distribution — private by default, shareable by choice. BTFS packages contain conclusions and principles only. Provenance is always preserved. No central server to take down.

Extensible by design — every MCP container follows the same pattern. Adding a new processing lane is a template-following exercise, not custom engineering.


Legal Disclaimer

Read the full disclaimer before deploying RadBot.

RadBot is open-source software provided for lawful research, educational, and personal productivity purposes. Use is entirely at your own risk.

Key Points

No warranty. RadBot is provided "as is" without warranty of any kind. The author(s) make no representations regarding correctness, reliability, security, or fitness for any purpose.

AI outputs. RadBot interfaces with locally deployed AI models that may produce inaccurate, incomplete, biased, or otherwise unreliable outputs. No output constitutes professional advice of any kind including legal, medical, financial, scientific, or engineering advice.

Tor integration. Use of Tor may be restricted or prohibited in certain jurisdictions. The operator is solely responsible for determining whether Tor use is lawful in their location before enabling it.

OONI integration. Active OONI probing is disabled by default. When enabled, it transmits measurement data governed by OONI's own policies. The operator is responsible for compliance with applicable law before enabling.

BTFS distribution. Content published to the BitTorrent DHT network is outside the operator's control once distributed. The operator is solely responsible for the legality of any content they choose to publish.

Unrestricted model. The absence of content filtering is not authorization to produce harmful outputs. The operator is solely responsible for all use of AI capabilities.

Operator responsibility. The person deploying RadBot is solely and exclusively responsible for legal compliance in their jurisdiction, all data processed, all access granted, all outputs generated, and all consequences of use.

Limitation of liability. To the maximum extent permitted by applicable law, the author(s) accept no liability of any kind for any damages, losses, legal consequences, or regulatory actions arising from use of RadBot.

Indemnification. By using RadBot, the operator agrees to indemnify and hold harmless the author(s) from any claims, liabilities, or costs arising from their use.

Prohibited uses include but are not limited to: unlawful activities, unauthorized system access, CSAM, weapons of mass destruction development, targeted harassment, and election manipulation.

The full disclaimer in DISCLAIMER.md governs all use. By deploying RadBot you acknowledge having read it and accept sole responsibility for your use.


License

MIT

About

Private local R&D agent with compounding knowledge base, cross-domain fusion, and self-healing retrieval. Runs on consumer hardware. No cloud. No telemetry.

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors