NLWeb supports PostgreSQL with the pgvector extension for vector similarity search. This provides a powerful and scalable option for storing and retrieving vector embeddings using standard SQL database technology.
- PostgreSQL database (version 11 or higher recommended)
- pgvector extension installed in the database
- A table with the following schema (or compatible) - note that this should be done for you when you first load data.
CREATE TABLE documents (
id TEXT PRIMARY KEY, -- Document ID (URL or other unique identifier)
url TEXT NOT NULL, -- URL of the document
name TEXT NOT NULL, -- Name of the document (title or similar)
schema_json JSONB NOT NULL, -- JSON schema of the document
site TEXT NOT NULL, -- Site or domain of the document
embedding vector(1536) NOT NULL -- Vector embedding (adjust dimension to match your model)
);
-- Create a vector index for faster similarity searches
CREATE INDEX IF NOT EXISTS embedding_cosine_idx
ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 200);Update the .env file with this information after you have deployed your Postgres database:
# If using Postgres connection string
POSTGRES_CONNECTION_STRING="postgresql://<HOST>:<PORT>/<DATABASE>?user=<USERNAME>&sslmode=require"
POSTGRES_PASSWORD="<PASSWORD>"Configure PostgreSQL in the config_retrieval.yaml file:
preferred_endpoint: postgres # Set this to use PostgreSQL as default
endpoints:
postgres:
# Database connection details
api_endpoint_env: POSTGRES_CONNECTION_STRING # Database connection details (i.e. "postgresql://<HOST>:<PORT>/<DATABASE>?user=<USERNAME>&sslmode=require")
# Password for authentication
api_key_env: POSTGRES_PASSWORD
index_name: documents
# Specify the database type
db_type: postgres
NOTE: If you are using Azure Postgres Flexible server make sure you have vector extension allow-listed
To setup your PostgreSQL configuration, run the following setup script:
In the python directory run
# Setup the Postgres server
python misc/postgres_load.pyYou can provide credentials directly or via environment variables (recommended for security).
The following will be automatically installed when you run the Setup Schema or call the Postgres Client:
psycopg- The PostgreSQL adapter for Python (psycopg3)psycopg[binary]- Binary dependencies for psycopgpsycopg[pool]- Connection pooling supportpgvector- Support for pgvector operations (vector types and indexing)
The PostgreSQL vector client implements the full VectorDBClientInterface and supports all standard operations:
- Vector similarity search
- Document upload with vector embeddings
- URL-based document lookup
- Site-specific filtering
- Document deletion