nimaboubanian/text2query-code

LLM Text-to-Query

Convert natural language to SQL queries using local LLMs. The models are evaluated against the TPC-H benchmark.

Quick Start

# Start all services
docker compose up -d

# Enter interactive mode
docker compose exec app text2query

# Start benchmark
docker compose --profile benchmark up --build orchestrator

Configuration

All user-configurable settings are at the top of compose.yml in the x-config block:

x-config: &config
  DEFAULT_MODEL:       "qwen2.5-coder:7b"     # SQL generation model
  FRONTDESK_MODEL:     "qwen2.5:3b"           # Intent routing model
  BENCHMARK_MODELS:    "llama3.2:3b,qwen2.5-coder:7b"
  BENCHMARK_NUM_SEEDS: "1"                    # Number of repetitions per query for more reliable results

After changing models, recreate the Ollama container to pull them:

docker compose up -d --force-recreate ollama
docker compose logs -f ollama   # watch download progress

Mini Database

A simple e-commerce database (customers, products, orders) loads automatically for testing.

Example queries: "What are the customers' names?", "Top 3 best-selling products", "Show customers who spent more than $500 total"

Reset with docker compose down -v.

REPL Commands

| Command | Description |
| --- | --- |
| /help | Show available commands |
| /schema | Display the database schema |
| /sql | Display the SQL query |
| /model | List available models and the active one |
| /model <name> | Switch to a different model |
| /quit | Exit |

External Database

Edit DATABASE_URL in the app service in compose.yml and remove the postgres dependency:

environment:
  <<: *config
  DATABASE_URL: postgresql://user:pass@192.168.1.10:5432/mydb

For databases on the Docker host, add extra_hosts: ["host.docker.internal:host-gateway"] to the app service.
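
Before restarting the app service, it can help to check that the new DATABASE_URL splits into the parts you expect. The following is a minimal sketch using only the Python standard library; the helper name describe_database_url is hypothetical and is not part of this project.

```python
from urllib.parse import urlsplit


def describe_database_url(url: str) -> dict:
    """Split a PostgreSQL URL into its parts so typos are easy to spot.

    Hypothetical helper for illustration; not part of text2query.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("postgresql", "postgres"):
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    return {
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port or 5432,  # 5432 is PostgreSQL's default port
        "database": parts.path.lstrip("/"),
    }


if __name__ == "__main__":
    # The example URL from the compose.yml snippet above.
    print(describe_database_url("postgresql://user:pass@192.168.1.10:5432/mydb"))
```

The password is deliberately left out of the returned dictionary so the result can be printed or logged safely.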

Benchmark

docker compose --profile benchmark up --build orchestrator

Runs a three-phase TPC-H pipeline: Setup (data generation, schema loading) → Generation (LLM query generation and execution) → Analysis (similarity metrics, reports, archiving).

Evaluation Metrics

| Metric | Purpose |
| --- | --- |
| Result F1 | Primary correctness: did the query produce the right data? |
| AST Similarity | Structural closeness of SQL to reference |
| BLEU / Token Jaccard | Token-level overlap between queries |
| Clause Scores | Per-clause breakdown (SELECT, WHERE, etc.) |
| Composite | Weighted aggregate: F1 (50%), AST (30%), BLEU (20%) |
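
As an illustration of how the Composite weighting and a token-level overlap could be computed, here is a minimal sketch. The function names, the whitespace tokenization, and the assumption that every score lies in [0, 1] are illustrative choices; the benchmark's own implementation may differ.

```python
# Weights taken from the Composite row: F1 50%, AST 30%, BLEU 20%.
WEIGHTS = {"f1": 0.5, "ast": 0.3, "bleu": 0.2}


def token_jaccard(generated_sql: str, reference_sql: str) -> float:
    """Jaccard overlap of lowercase, whitespace-separated tokens (assumed tokenization)."""
    generated = set(generated_sql.lower().split())
    reference = set(reference_sql.lower().split())
    if not generated and not reference:
        return 1.0  # two empty queries are treated as identical
    return len(generated & reference) / len(generated | reference)


def composite_score(result_f1: float, ast_similarity: float, bleu: float) -> float:
    """Weighted sum of the three component scores, each assumed to be in [0, 1]."""
    return (
        WEIGHTS["f1"] * result_f1
        + WEIGHTS["ast"] * ast_similarity
        + WEIGHTS["bleu"] * bleu
    )


if __name__ == "__main__":
    print(token_jaccard("select a from t", "select b from t"))
    print(composite_score(result_f1=1.0, ast_similarity=0.5, bleu=0.0))
```

Because Result F1 carries half the weight, a query that returns the right data still scores at least 0.5 even when its SQL text looks nothing like the reference.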

Multi-Seed Mode

Set BENCHMARK_NUM_SEEDS in x-config to run each query multiple times with different random seeds for statistical robustness (mean, std, 95% CI).
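
The per-query statistics mentioned above (mean, std, 95% CI) can be sketched as follows. This assumes a normal approximation with z = 1.96 and the sample standard deviation; the benchmark may compute its interval differently (for example with a t-distribution, which gives wider intervals for a small number of seeds). The function name summarize_scores is hypothetical.

```python
import math
import statistics


def summarize_scores(scores: list[float], z: float = 1.96) -> dict:
    """Mean, sample standard deviation, and an approximate 95% CI across seeds."""
    if not scores:
        raise ValueError("at least one score is required")
    mean = statistics.fmean(scores)
    if len(scores) < 2:
        # A single seed gives no spread estimate, so the interval collapses.
        return {"mean": mean, "std": 0.0, "ci95": (mean, mean)}
    std = statistics.stdev(scores)
    half_width = z * std / math.sqrt(len(scores))
    return {"mean": mean, "std": std, "ci95": (mean - half_width, mean + half_width)}


if __name__ == "__main__":
    # Hypothetical composite scores of one query under three seeds.
    print(summarize_scores([0.6, 0.8, 1.0]))
```

With BENCHMARK_NUM_SEEDS set to "1" there is only one observation per query, which is why the sketch returns a zero-width interval in that case.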

Multi-Model Mode

Set (or uncomment) BENCHMARK_MODELS in x-config to a comma-separated list of model names to compare up to 3 models side by side. Output includes per-model reports plus comparison.md and results.csv.
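
BENCHMARK_MODELS is a comma-separated string, so it has to be split into individual model names at some point. Here is a minimal sketch of that step; the function name parse_benchmark_models is hypothetical, and rejecting more than 3 models with an error is an assumption, since the project may handle that case differently.

```python
import os

MAX_MODELS = 3  # the README states that up to 3 models can be compared


def parse_benchmark_models(raw: str) -> list[str]:
    """Split a comma-separated model list, ignoring blanks and extra spaces."""
    models = [name.strip() for name in raw.split(",") if name.strip()]
    if len(models) > MAX_MODELS:
        # Assumed behaviour: fail loudly instead of silently dropping models.
        raise ValueError(f"at most {MAX_MODELS} models are supported, got {len(models)}")
    return models


if __name__ == "__main__":
    # Falls back to the value shown in the x-config example.
    raw = os.environ.get("BENCHMARK_MODELS", "llama3.2:3b,qwen2.5-coder:7b")
    print(parse_benchmark_models(raw))
```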

GPU Acceleration

Pass a compose override file with an additional -f flag; all settings from compose.yml are preserved.

NVIDIA (Container Toolkit required):

docker compose -f compose.yml -f compose.nvidia.yml up -d

AMD (ROCm drivers required):

docker compose -f compose.yml -f compose.amd.yml up -d

Development

cd app && uv sync --extra dev
uv run pytest -v            # run all 99 tests

Project Structure

app/src/text2query/
  core/config.py          # Centralized configuration (env vars)
  llm/service.py          # Ollama streaming + SQL extraction
  llm/prompts.py          # Prompt templates
  database/executor.py    # SQL execution → DataFrame
  database/schema.py      # Schema introspection
  cli/repl.py             # Interactive REPL
  cli/frontdesk.py        # Intent classification + summarization
  cli/style.py            # Nord-themed TUI
  benchmark/              # TPC-H benchmark pipeline
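
To give a feel for the "SQL extraction" step listed for llm/service.py, here is a minimal sketch of pulling a query out of a model response that may wrap it in a Markdown code fence. This is an assumption about what that step involves, not the project's actual code, and extract_sql is a hypothetical name.

```python
import re

FENCE = "`" * 3  # a Markdown code fence, built here to keep this example readable

# Matches an optional "sql" language tag followed by the fenced body.
_FENCED_SQL = re.compile(
    FENCE + r"(?:sql)?\s*\n(.*?)" + FENCE,
    re.DOTALL | re.IGNORECASE,
)


def extract_sql(response: str) -> str:
    """Return the first fenced SQL block, or the whole response if none is fenced."""
    match = _FENCED_SQL.search(response)
    sql = match.group(1) if match else response
    sql = sql.strip().rstrip(";").strip()
    # Normalise to exactly one trailing semicolon; empty input stays empty.
    return sql + ";" if sql else ""


if __name__ == "__main__":
    reply = "Here is the query:\n" + FENCE + "sql\nSELECT name FROM customers;\n" + FENCE
    print(extract_sql(reply))
```

Falling back to the raw response keeps the sketch usable with models that answer with bare SQL, at the cost of passing through any surrounding prose unchanged.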

About

A Docker-based microservices project that converts natural language questions to SQL queries using local LLMs.
