Composable vector retrieval with SQL.
flexvec is a Python library that reshapes vector search scores before selection. Suppress a topic, weight by recency, spread across subtopics, project a direction through embedding space — all in one SQL statement. Runs in-process on any SQLite database. No server, no index.
```shell
pip install flexvec
```

Any SQLite database with an embedding column works.
```sql
CREATE TABLE chunks (
    id TEXT PRIMARY KEY,
    content TEXT,
    embedding BLOB  -- float32, L2-normalized
);
```

Load embeddings into memory once. Every query after that is a matmul.
```python
import sqlite3
from flexvec import VectorCache, register_vec_ops, execute, get_embed_fn

db = sqlite3.connect("my.db")
cache = VectorCache()
cache.load_from_db(db, "chunks", "embedding", "id")
register_vec_ops(db, {"chunks": cache}, get_embed_fn())
```

Write SQL. flexvec handles the vector math behind the scenes.
```python
rows = execute(db, """
    SELECT v.id, v.score, c.content
    FROM vec_ops('similar:authentication patterns') v
    JOIN chunks c ON v.id = c.id
    ORDER BY v.score DESC LIMIT 5
""")
```

Find authentication patterns without drowning in deployment and testing discussions.
```sql
SELECT v.id, v.score, c.content
FROM vec_ops(
    'similar:authentication patterns
     diverse suppress:deployment suppress:testing',
    'SELECT id FROM chunks WHERE length(content) > 200') v
JOIN chunks c ON v.id = c.id
ORDER BY v.score DESC LIMIT 10
```

`suppress:` pushes deployment and testing content out of the results. `diverse` spreads across subtopics instead of returning ten variations of the same match. The pre-filter scopes to chunks over 200 characters — cutting out noise before anything gets scored.
Find the session where you actually fixed that OOM error — not just the logs.
```sql
SELECT k.id, k.rank, v.score, c.content
FROM keyword('OOM') k
JOIN vec_ops('similar:memory limit debugging worker crash fix') v ON k.id = v.id
JOIN chunks c ON k.id = c.id
ORDER BY v.score DESC LIMIT 10
```

`keyword('OOM')` finds every chunk containing the term. `vec_ops()` scores by relevance to debugging and fixing. The JOIN keeps only chunks that match both — exact term plus semantic relevance.
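Stripped of SQL, the hybrid pattern is an intersection: keyword hits filtered and then ranked by vector score. A self-contained sketch with toy data and made-up embedding values (not the flexvec API):

```python
import numpy as np

chunks = {
    "a": "OOM killed the worker at 3am",
    "b": "deploy notes mention OOM once",
    "c": "raised the memory limit, crash fixed",
}
# Toy 2-d embeddings standing in for real ones (hypothetical values).
emb = {
    "a": np.array([0.9, 0.1]),
    "b": np.array([0.2, 0.9]),
    "c": np.array([0.95, 0.05]),
}
query = np.array([1.0, 0.0])  # direction of "memory limit debugging ..."

keyword_hits = {cid for cid, text in chunks.items() if "OOM" in text}
scored = {cid: float(v @ query) for cid, v in emb.items()}

# Keep only chunks matching both signals, ranked by vector score.
hybrid = sorted(keyword_hits, key=lambda cid: -scored[cid])
# hybrid == ["a", "b"]: "c" is semantically closest but lacks the exact term.
```

Chunk "c" would win a pure vector search; requiring the literal term keeps the result anchored to the session that actually mentions OOM.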
Tokens reshape scores. They compose freely in a single string.
| token | what it does |
|---|---|
| `similar:TEXT` | search for this concept |
| `suppress:TEXT` | push this topic out of results (stackable) |
| `diverse` | spread across subtopics instead of ten versions of the same answer |
| `decay:N` | favor recent content — N-day half-life |
| `centroid:id1,id2` | "more like these" — search from the average of examples |
| `from:A to:B` | find content along a conceptual arc |
| `pool:N` | how many candidates to score (default 500) |
`'similar:auth diverse suppress:oauth decay:7'` — four operations, one query.
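The `decay:N` token describes ordinary half-life weighting. A sketch under the assumption that scores get multiplied by an exponential age factor (flexvec's exact formula may differ):

```python
import numpy as np

def decay_weight(age_days, half_life_days):
    # Exponential half-life: the weight halves every `half_life_days`.
    return 0.5 ** (np.asarray(age_days, dtype=float) / half_life_days)

ages = np.array([0, 7, 14, 28])  # days since each chunk was written
w = decay_weight(ages, 7)        # decay:7
# w == [1.0, 0.5, 0.25, 0.0625]: a four-week-old chunk keeps 1/16 of its score
```

Multiplying similarity scores by `w` before the top-k cut is what makes recency a reshaping operation rather than a hard filter: old content can still win if its raw match is strong enough.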
Every query runs three phases in one SQL statement.
SQL pre-filter → numpy modulation → SQL compose
- SQL pre-filter narrows what enters scoring — by date, type, length, or any SQL expression.
- numpy modulation scores candidates and reshapes the score array with tokens before selection.
- SQL compose joins results back to your tables for grouping, filtering, or reranking.
The database is never modified. Results materialize as a temp table that SQL composes over.
No index. Brute-force matmul on a numpy matrix.
| corpus | matmul | full pipeline |
|---|---|---|
| 250K | 5ms | 19ms |
| 500K | 7ms | 37ms |
| 1M | 17ms | 82ms |
128 dimensions, Nomic Embed v1.5 (Matryoshka). Pre-filtering narrows candidates before the matmul — scoped queries run in single-digit ms.
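Brute force at this scale is one matrix-vector product plus a partial sort. A sketch of the core operation on a smaller corpus so it runs instantly (`argpartition` finds the top-k in O(n) without sorting the whole score array):

```python
import numpy as np

rng = np.random.default_rng(1)
n, dim, k = 50_000, 128, 10  # smaller than the benchmark corpora, same shape of work

matrix = rng.normal(size=(n, dim)).astype(np.float32)
matrix /= np.linalg.norm(matrix, axis=1, keepdims=True)  # L2-normalize once at load
q = matrix[42]  # use a known row as the query, so the top hit is exact

scores = matrix @ q                       # cosine similarity via one matmul
top = np.argpartition(scores, -k)[-k:]    # unordered top-k indices in O(n)
top = top[np.argsort(scores[top])[::-1]]  # order just the k winners
# top[0] == 42: the query row matches itself with score ~1.0
```

Because the matrix is L2-normalized up front, the dot product *is* the cosine similarity, which is why the whole scoring phase collapses into a single matmul.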
```shell
pip install flexvec              # core (numpy only)
pip install flexvec[embed]       # + ONNX embedder
pip install flexvec[embed,graph] # everything
```

- arXiv paper — architecture and evaluation
- flex — search and retrieval for AI agents (uses flexvec)
- getflex.dev
MIT · Python 3.10+ · SQLite · numpy