CFAdv

CFAdv is a context compiler for LLMs. It ingests files in a wide range of formats, scores and selects the most relevant content under a token budget, and assembles provider-ready packets for OpenAI, Anthropic, Ollama, and compatible APIs.

CFAdv builds on context-fusion and adds an attention fusion layer (AttnRes-inspired) that reorders the selected context by query relevance, so the most useful content appears first in the prompt.

Features

  • Multiformat ingestion: text, PDF, DOCX, CSV, JSON, images (OCR), code, Markdown
  • Normalization: uniform ContextBlock objects with token counts, trust and freshness scores
  • Task-specific compact representations: QA, code, agent, and universal variants
  • Utility and risk scoring: relevance, trust, freshness, structure, diversity, hallucination proxy
  • Multi-objective planner: value density + token + latency + cacheability ranking
  • Attention-based fusion: query-dependent softmax weighting inspired by AttnRes (arXiv:2603.15031)
  • Two-level block attention: intra-block ranking + cross-block mean-pooled ordering
  • Canonical IR and delta fusion: ContextPacket, CacheSegment, incremental ContextDelta
  • Dedup and fingerprinting: exact + near-duplicate collapse with provenance retention
  • Multi-provider adapters: OpenAI, Anthropic, Ollama, and OpenAI-compatible APIs
  • Provider-aware compilation: chat, qa, code, agent packers with mode-aware system prompts
  • Cache-aware assembly: stable/dynamic segment split for reuse across repeated turns
  • MCP server: expose CFAdv tools and resources over MCP
  • Framework integrations: retriever wrappers for LangChain and LlamaIndex
  • Precompute pipeline: fingerprints, summaries, token stats, compact variants, features
  • Compression pipeline: JSON minify, schema prune, citation compaction
  • Ablation studies: identify which context blocks contribute most to outcomes
  • Memory management: persistent storage with compaction and retention policies
  • Web UI: local browser app to run and inspect pipeline outputs

Quick Start

Installation

pip install context-portfolio-optimizer

For development:

git clone https://github.com/rotsl/CFAdv.git
cd CFAdv
make install-dev

Copy .env.example to .env and fill in your API keys:

cp .env.example .env

Basic Usage

from context_portfolio_optimizer import PipelineRunner

runner = PipelineRunner()
result = runner.run(["document.pdf", "code.py"], budget=3000)

print(result["context"])  # Optimized, attention-ranked context string
print(result["stats"])    # Processing statistics

CLI Usage

# Ingest and display content
cpo ingest ./data

# Run full optimization pipeline
cpo run ./data --budget 3000 --query "Summarize architecture" --output context.txt \
  --provider openai --model gpt-5-mini --mode chat --profile openai_chat

# Plan context for a task
cpo plan "Summarize these documents" --budget 5000

# Compile provider-ready packet (qa/code/agent/chat modes)
cpo compile ./data \
  --task "Answer with citations" \
  --provider openai \
  --model gpt-5-mini \
  --mode qa \
  --budget 4000 \
  --compression light \
  --delta

# Precompute artifacts for latency reduction
cpo precompute ./data --store-dir .cpo_cache/precompute --semantic-dedup

# Run MCP-style server
cpo serve-mcp --host <host> --port 8765

# Inspect cache + precompute store
cpo inspect-cache

# Run ablation study
cpo ablate ./data --budget 3000

# Launch local visualization UI
cpo ui --host <host> --port 8080

Provider/client mapping:

  • ChatGPT / OpenAI API: --provider openai with OPENAI_API_KEY
  • Claude AI / Claude API: --provider anthropic with ANTHROPIC_API_KEY
  • Local models with Ollama: --provider ollama (no cloud key required)
  • OpenAI-compatible APIs (Grok, DeepSeek, etc.): --provider openai_compatible with OPENAI_COMPAT_BASE_URL
  • MCP clients: cpo serve-mcp --host <host> --port <port>
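
A minimal .env sketch covering the providers above (key names are taken from this mapping; values are placeholders):

# .env — set only the keys for the providers you use
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
# For OpenAI-compatible endpoints (Grok, DeepSeek, etc.)
OPENAI_COMPAT_BASE_URL=https://api.example.com/v1
# Ollama runs locally; no key required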

What Most Users Need

  1. README.md for setup and commands
  2. .env for provider API keys
  3. configs/ for provider and budget config overrides (optional)
  4. examples/gui_input/ for quick GUI test inputs
  5. CLI commands: run, compile, ui, serve-mcp

Usage Flowcharts

Normal User Path (Chat + Agent)

flowchart TD
    A[Install CFAdv] --> B[Add API keys in .env]
    B --> C{Pick workflow}
    C --> D[Chat workflow]
    C --> E[Agent workflow]
    D --> D1[Choose model: gpt-5-mini or claude-sonnet-4-6 or local ollama]
    E --> E1[Choose agentic model: claude-sonnet-4-6 or gpt-5-mini or tool-using model]
    D1 --> F[Run cpo compile or cpo run]
    E1 --> F
    F --> G[Provider adapter builds request]
    G --> H[Model response + citations + context stats]

Developer Path (Build + Evaluate)

flowchart TD
    A[Prepare corpus] --> B[Run cpo precompute]
    B --> C[Run benchmarks and tests]
    C --> D{Serve path}
    D --> E[CLI and app integration]
    D --> F[Web UI]
    E --> G{Runtime mode}
    F --> G
    G --> H[Chat or QA packer]
    G --> I[Agent packer + delta fusion]
    H --> J[Provider adapter]
    I --> J
    J --> K[OpenAI or Anthropic or Ollama or compatible]
    K --> L[Track token, latency, and cache metrics]

Architecture

CFAdv uses a middleware pipeline:

Ingest → Normalize → Canonical IR → Precompute → Dedup/Fingerprint
→ Query Classify → Candidate Retrieval → Fast Rerank → Budget Planner
→ Context Compression → Attention Fusion → Delta Fusion → Provider Adapter → Cache-Aware Assemble

  1. Ingest: Extract content from multiple file formats
  2. Normalize: Convert to uniform ContextBlock objects
  3. Represent: Generate alternative compact representations per block
  4. Precompute: Persist compact variants, token stats, retrieval features, and fingerprints
  5. Retrieve: Query classify → top-100 lexical retrieval → top-20/25 rerank
  6. Plan: Multi-objective latency-aware representation selection under token budget
  7. Fuse: Query-dependent attention ranking (AttnRes-inspired) + ContextDelta for agent turns
  8. Assemble: Build cache segments and canonical ContextPacket
  9. Compile: Build provider-specific request-ready payloads
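
The stages map directly onto the package layout under src/context_portfolio_optimizer/. A toy sketch of the data flow, with stub stages standing in for the real components (everything here, including notes.txt, is illustrative; the shipped pipeline is driven by PipelineRunner):

def ingest(paths):
    return [open(p, encoding="utf-8").read() for p in paths]

def normalize(texts):
    # Stand-in for ContextBlock construction: id, text, rough token count
    return [{"id": i, "text": t, "tokens": len(t.split())} for i, t in enumerate(texts)]

def plan(blocks, budget):
    # Stand-in for the budget planner: pack cheapest blocks first
    selected, used = [], 0
    for b in sorted(blocks, key=lambda b: b["tokens"]):
        if used + b["tokens"] <= budget:
            selected.append(b)
            used += b["tokens"]
    return selected

def assemble(blocks):
    return "\n\n".join(b["text"] for b in blocks)

context = assemble(plan(normalize(ingest(["notes.txt"])), budget=3000))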

See docs/architecture.md for full component detail and docs/attention_fusion.md for the attention fusion design.

Supported Formats

Format       Extensions                       Dependencies
Text         .txt, .log                       —
Documents    .pdf                             pdfminer.six
             .docx                            python-docx
Structured   .csv, .tsv                       pandas
             .json, .jsonl                    —
Images       .png, .jpg, .tiff                Pillow, pytesseract
Code         .py, .js, .ts, .go, .rs, etc.    tree-sitter (optional)
Markdown     .md                              —

Configuration

Copy configs/default.yaml or create your own config.yaml:

budget:
  instructions: 1000
  retrieval: 3000
  memory: 2000
  examples: 1500
  tool_trace: 1000
  output_reserve: 1000

scoring:
  utility_weights:
    retrieval: 0.25
    trust: 0.20
    freshness: 0.15
    structure: 0.15
    diversity: 0.15
    token_cost: -0.10

provider:
  name: anthropic
  model: claude-sonnet-4-6

features:
  use_attention_fusion: true
  attention_temperature: 1.0

Available providers: openai, anthropic, ollama, openai_compatible.
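
The utility_weights combine linearly: one plausible reading (an assumption, not the exact scoring code) is a dot product between a block's per-signal scores and the configured weights, with token_cost weighted negatively so heavier blocks are penalized:

weights = {"retrieval": 0.25, "trust": 0.20, "freshness": 0.15,
           "structure": 0.15, "diversity": 0.15, "token_cost": -0.10}

# Hypothetical per-signal scores for one block, each in [0, 1]
scores = {"retrieval": 0.9, "trust": 0.8, "freshness": 0.5,
          "structure": 0.6, "diversity": 0.4, "token_cost": 0.7}

utility = sum(weights[k] * scores[k] for k in weights)
print(round(utility, 2))  # 0.54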

Algorithm

CFAdv formulates context selection as a multi-objective knapsack problem:

maximize Σ(
    w_u * utility_i
  - w_r * risk_i
  - w_t * token_cost_i
  - w_l * latency_cost_i
  + w_c * cacheability_i
  + w_d * diversity_i
) * z_i

subject to:
    Σ(token_i * z_i) <= token_budget
    z_i ∈ {0, 1}

After selection, contexts are reordered by query-dependent attention weights (softmax over cosine similarity between the query embedding and each context embedding), so the most relevant content appears first. See docs/algorithm.md.
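
A minimal sketch of the selection step, using a greedy value-density heuristic as a stand-in for the full multi-objective solver (blocks and scores are made up; "value" folds together the weighted objective terms above):

def select(blocks, token_budget):
    # Greedy knapsack stand-in: rank by value per token, then pack in order
    ranked = sorted(blocks, key=lambda b: b["value"] / max(b["tokens"], 1), reverse=True)
    chosen, used = [], 0
    for b in ranked:
        if used + b["tokens"] <= token_budget:
            chosen.append(b)
            used += b["tokens"]
    return chosen

blocks = [
    {"id": "a", "value": 0.9, "tokens": 400},
    {"id": "b", "value": 0.5, "tokens": 120},
    {"id": "c", "value": 0.4, "tokens": 800},
]
print([b["id"] for b in select(blocks, token_budget=600)])  # ['b', 'a']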

Attention Fusion

CFAdv adds AttentionContextFusion and BlockAttentionFusion on top of the base planner, inspired by Block Attention Residuals (AttnRes, arXiv:2603.15031):

  • Each context is embedded with bow_embedding (64-dim, L2-normalized, vocabulary-aware)
  • Query-to-context cosine similarity scores are computed and passed through temperature-scaled softmax
  • Contexts are reordered by descending weight, so highest relevance appears first
  • BlockAttentionFusion applies the same hierarchy to named blocks (system / history / retrieval / tools), using mean-pooled embeddings as block representatives for cross-block ranking
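
A minimal sketch of the reordering described above, with plain numpy vectors standing in for the 64-dim bow_embedding output:

import numpy as np

def attention_order(query_vec, context_vecs, temperature=1.0):
    # Cosine similarity -> temperature-scaled softmax -> descending order
    q = query_vec / np.linalg.norm(query_vec)
    C = context_vecs / np.linalg.norm(context_vecs, axis=1, keepdims=True)
    logits = (C @ q) / temperature
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                 # softmax attention weights
    return np.argsort(-weights)              # context indices, most relevant first

rng = np.random.default_rng(0)
order = attention_order(rng.normal(size=64), rng.normal(size=(5, 64)))

# BlockAttentionFusion applies the same ranking across blocks, with each block
# represented by the mean of its member embeddings, e.g. members.mean(axis=0).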

See docs/attention_fusion.md for the full design and formulas.

Precompute Workflow

cpo precompute ./data --store-dir .cpo_cache/precompute --semantic-dedup

Stores fingerprints, summaries, compact variants, and retrieval features in .cpo_cache/precompute. Use --precomputed-only in run/compile to avoid regeneration on cache hits.
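
Exact-duplicate fingerprints of this kind can be as simple as a hash over normalized text. A minimal sketch (the hash choice and normalization here are assumptions; the shipped scheme lives in the dedup/ package):

import hashlib

def fingerprint(text: str) -> str:
    # Whitespace- and case-normalized content maps to a single key
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

assert fingerprint("Hello  world") == fingerprint("hello world\n")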

Chat Mode vs Agent Mode

Mode    Packing strategy
chat    Concise context for standard conversation prompts
qa      Extractive evidence + citation-first packing
code    Signatures, changed regions, dependency-focused packing
agent   Working-memory and constraint deltas with optional incremental fusion

Compression Pipeline

Compression levels (none, light, medium, aggressive) apply:

  • Citation map compaction (Source URI[id])
  • JSON minification
  • Schema field pruning for structured payloads
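
The JSON minification pass is the simplest of the three; a minimal sketch of that step alone:

import json

payload = {"title": "Report", "body": "...", "meta": {"pages": 12}}
minified = json.dumps(payload, separators=(",", ":"))
print(minified)  # {"title":"Report","body":"...","meta":{"pages":12}}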

Delta Fusion

Use --delta with run or compile to compute incremental packet changes across turns:

  • added blocks, updated blocks, removed blocks, unchanged block IDs
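
A minimal sketch of the delta computation, assuming packets expose blocks as {block_id: fingerprint} maps (the shipped ContextDelta lives in the fusion/ package):

def context_delta(prev, curr):
    # Diff two {block_id: fingerprint} maps into the four change sets
    added     = [b for b in curr if b not in prev]
    updated   = [b for b in curr if b in prev and curr[b] != prev[b]]
    removed   = [b for b in prev if b not in curr]
    unchanged = [b for b in curr if b in prev and curr[b] == prev[b]]
    return {"added": added, "updated": updated,
            "removed": removed, "unchanged": unchanged}

prev = {"sys": "f1", "doc1": "f2", "doc2": "f3"}
curr = {"sys": "f1", "doc1": "f9", "doc3": "f4"}
print(context_delta(prev, curr))
# {'added': ['doc3'], 'updated': ['doc1'], 'removed': ['doc2'], 'unchanged': ['sys']}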

Cache-Aware Assembly

Each packet splits into stable and dynamic segments:

  • stable: task/system instructions, citation maps, cacheable blocks
  • dynamic: non-cacheable or volatile blocks

Enables reuse across repeated chat/agent turns and lowers effective prompt churn.
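
A minimal sketch of the split, assuming each block carries a cacheable flag (the real segmentation lives in the caching/ and assembly/ packages):

def split_segments(blocks):
    # Stable segment is reusable across turns; dynamic is rebuilt per turn
    stable  = [b for b in blocks if b.get("cacheable")]
    dynamic = [b for b in blocks if not b.get("cacheable")]
    return stable, dynamic

blocks = [
    {"id": "system", "cacheable": True},
    {"id": "citations", "cacheable": True},
    {"id": "user_turn", "cacheable": False},
]
stable, dynamic = split_segments(blocks)
# stable -> system, citations; dynamic -> user_turn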

Examples

python examples/multiformat_ingestion_demo.py  # multi-format ingestion
python examples/rag_context_optimizer.py       # RAG-optimized context selection
python examples/memory_compaction_demo.py      # memory management
python examples/ablation_demo.py               # ablation studies

make examples  # run all four

See examples/EXAMPLE_RESULTS.md for latest run outputs.

Web UI

cpo ui --host <host> --port 8080
# or
make ui

Open http://<host>:8080 to:

  1. Choose Input Mode (Directory or File list) and enter a path (e.g. ./examples/gui_input)
  2. Set Task Mode (chat, qa, code, agent) and enter a query
  3. Set Budget (token budget)
  4. Pick Provider and Model (default: anthropic / claude-sonnet-4-6)
  5. Click Run Pipeline

The results panel shows run stats, representation usage, selected blocks, context preview, and model answer.

Improvements over context-fusion

CFAdv is built on context-fusion and adds:

Capability                                         context-fusion   CFAdv
Multiformat ingestion, normalization, scoring      ✓                ✓
Knapsack budget planner + BM25 retrieval           ✓                ✓
Compact representations, delta fusion, providers   ✓                ✓
Query-dependent context ordering                   —                ✓
Two-level block attention hierarchy                —                ✓
Vocabulary-aware 64-dim embeddings (L2-norm)       —                ✓
docs/attention_fusion.md                           —                ✓
Test count                                         ~49              72

For a detailed side-by-side, see docs/comparison.md.


Benchmarks

make benchmark          # tiny eval (baseline vs cf_uniform vs cf_attention)
make benchmark-weights  # same with attention weight detail
make benchmark-api      # live Anthropic API benchmark (requires .env)
make benchmark-all      # all local benchmarks

Latest tiny benchmark (2026-03-21, local deterministic):

Mode          Avg tokens   Success   vs baseline
baseline      99.0         100%      —
cf_uniform    3.7          100%      −96.3%
cf_attention  3.7          100%      −96.3%

Latest Claude API benchmark (2026-03-21, claude-sonnet-4-6):

Mode            Avg context tokens   Success
with_cfadv      10.3                 100%
without_cfadv   947.0                100%

Context-token reduction with CFAdv: 98.9%

Tiny benchmark — context tokens (lower is better)
With CFAdv    3.7   | █
Without CFAdv 99.0  | ████████████████████████

Claude API — context tokens (lower is better)
With CFAdv    10.3  | █
Without CFAdv 947.0 | ████████████████████████████████████████

See benchmarks/BENCHMARK_RESULTS.md, benchmarks/BENCHMARK_API_RESULTS.md, and benchmarks/BENCHMARK_SUPPLEMENTAL_RESULTS.md for full per-task detail.

Testing

make test           # run full suite
make test-cov       # with coverage report
make test-integration

Latest run (2026-03-21): 72 passed, 0 failed. See tests/TEST_RESULTS.md.

Coverage highlights: attention_fusion.py 83%, planner.py 95%, bm25.py 97%, registry.py 98%.

Validation Snapshot

Latest local smoke checks (2026-03-21):

  • pipeline: cpo run ./docs --budget 600 --query "Summarize key architecture points" — passed
  • GUI: cpo ui --host <host> --port 8081 — HTML served, /api/run responded with JSON

Development

make bootstrap      # first-time setup
make install-dev    # install package + dev tools + pre-commit hooks
make lint           # ruff check
make format         # ruff format
make type-check     # mypy
make all-checks     # format + lint + type-check + test
make build          # build sdist + wheel
make docs           # build MkDocs site

Project Structure

CFAdv/
├── README.md
├── CITATION.cff
├── CONTRIBUTING.md
├── SECURITY.md
├── pyproject.toml
├── Makefile
├── requirements.txt
├── requirements-dev.txt
├── .env.example
├── src/context_portfolio_optimizer/
│   ├── ingestion/          # File loaders (text, PDF, DOCX, CSV, JSON, image, code)
│   ├── normalization/      # ContextBlock building
│   ├── representations/    # Compact representation variants
│   ├── retrieval/          # BM25 + reranker + query classifier
│   ├── scoring/            # Utility and risk models
│   ├── allocation/         # Budget + knapsack + multi-objective planner
│   ├── dedup/              # Fingerprinting + duplicate collapse
│   ├── compression/        # JSON/citation/schema compression
│   ├── caching/            # Cache segment and packet cache
│   ├── fusion/             # Attention fusion + delta computation
│   ├── assembly/           # Provider-aware packet compiler
│   ├── ir/                 # Canonical ContextPacket IR
│   ├── providers/          # Provider adapters + registry
│   ├── precompute/         # Offline precompute pipeline + bow_embedding
│   ├── orchestration/      # Pipeline runner
│   ├── memory/             # Memory storage + compaction
│   ├── agents/             # Agent loop support
│   ├── integrations/       # LangChain / LlamaIndex wrappers
│   ├── mcp_server/         # MCP-style server
│   ├── web_ui.py           # Local visualization server
│   └── cli.py              # Command-line interface (`cpo`)
├── configs/                # Provider and runtime YAML configs
├── docs/                   # Architecture, algorithm, attention_fusion, comparison, CLI
├── benchmarks/             # Benchmark runners + result reports
├── examples/               # Demo scripts + GUI input samples
└── tests/                  # Test suite (72 tests)

Local-only artifacts excluded by .gitignore: .env, virtualenvs, caches, coverage outputs.

Citation

@software{r2026cfadv,
  author       = {Rohan R},
  title        = {CFAdv},
  year         = {2026},
  url          = {https://github.com/rotsl/CFAdv},
  version      = {0.1.0},
  orcid        = {0009-0005-9225-1775}
}

License

Apache-2.0. See LICENSE for details.

Contributing

See CONTRIBUTING.md for guidelines.

Roadmap

  • Additional file format support (EPUB, HTML)
  • Learned utility models from feedback
  • Distributed processing for large datasets
  • Tighter integration with popular RAG frameworks

Acknowledgments

CFAdv builds on ideas from information retrieval and operations research. The attention fusion module is inspired by Block Attention Residuals (AttnRes, arXiv:2603.15031).


CFAdv: less context, more signal.
