Built as a 6th Semester NLP Project using classical Natural Language Processing techniques in Python.

Transform natural language into executable SQL: fully offline, privacy-first, and zero API cost.
QueryGenie was developed as part of a 6th Semester Natural Language Processing project, with a strong focus on implementing core NLP concepts from scratch using Python rather than relying on large language models.
This project emphasizes:
- Practical application of NLP pipelines
- Classical ML over black-box APIs
- Explainability and transparency in language understanding

Team members:
- Gautam N Chipkar
- Ameena Naik
- Asma Inamdar
- Jasmine Mulla

QueryGenie is a Natural Language Interface to Database (NLIDB) that enables users to query structured databases using plain English or voice.
Unlike LLM-based systems, it is built entirely using:
- Python-based NLP pipeline
- Classical Machine Learning (scikit-learn)
- Rule-based linguistic processing

This makes the system:
- Fully offline
- Zero-cost
- Lightweight and fast
- Privacy-preserving

"Show students who scored more than 80"

    SELECT * FROM STUDENT WHERE MARKS > 80;

- No APIs, no cloud
- Runs locally
- Suitable for secure environments
- Intent Classification
- Entity Extraction
- Semantic Similarity Matching
- Query results visualization
- Debug panel with:
  - Intent + confidence
  - Extracted entities
  - Generated SQL
- Speech-to-text query support
This project demonstrates key NLP concepts typically covered in a semester course:
- Tokenization
- Lowercasing
- Stopword handling (implicit: TF-IDF down-weights very common words)
- TF-IDF Vectorization
  - Converts text into a numerical feature space
- Logistic Regression (Supervised Learning)
  - Maps user queries to predefined intents
- Cosine Similarity
  - Handles ambiguous or unseen queries
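
The vectorization and similarity steps listed above can be sketched in plain Python. This is a minimal illustration, not the project's code: the README lists scikit-learn for these steps, whereas this stand-in computes smoothed TF-IDF weights and cosine similarity by hand so that it runs with the standard library alone. The example sentences, intent labels, and the `nearest_intent` helper are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical labelled examples; the project's real training data is not shown.
EXAMPLES = [
    ("show all students", "select_all"),
    ("list every student", "select_all"),
    ("how many students are there", "count"),
    ("count the students", "count"),
    ("average marks of students", "average"),
    ("what is the mean score", "average"),
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_idf(docs):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

def tfidf(tokens, idf):
    """Sparse TF-IDF vector as a dict; terms outside the vocabulary are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    return {t: c * idf[t] for t, c in tf.items()}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

TOKENIZED = [tokenize(text) for text, _ in EXAMPLES]
IDF = build_idf(TOKENIZED)
VECTORS = [tfidf(tokens, IDF) for tokens in TOKENIZED]

def nearest_intent(query):
    """Return (intent, similarity) of the most similar labelled example."""
    q = tfidf(tokenize(query), IDF)
    scores = [cosine(q, v) for v in VECTORS]
    best = max(range(len(scores)), key=scores.__getitem__)
    return EXAMPLES[best][1], scores[best]
```

Calling `nearest_intent("How many students do we have?")` selects the `count` intent with this toy data, because the query shares its rarest terms with the counting example. A Logistic Regression classifier would typically be trained on the same kind of TF-IDF features, with a similarity lookup like this one serving as a fallback for queries the classifier is unsure about.
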
- Regex-based pattern matching
- Extracts:
  - Numerical values (e.g., 80)
  - Conditions (>, <, =)
  - Limits (Top N queries)
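
A rough sketch of regex-based extraction for the three entity types above, using only Python's `re` module. The phrase list in `COMPARATORS`, the `extract_entities` function, and the shape of the returned dictionary are assumptions made for illustration; the patterns in `entity_extractor.py` may differ.

```python
import re

# Hypothetical phrase-to-operator table; the project's actual patterns are not shown.
COMPARATORS = [
    (r"\b(?:more than|greater than|above|over)\b|>", ">"),
    (r"\b(?:less than|below|under)\b|<", "<"),
    (r"\b(?:equal to|exactly)\b|=", "="),
]

def extract_entities(query):
    """Pull a top-N limit, a comparison operator, and a numeric value from a query."""
    text = query.lower()
    entities = {"limit": None, "operator": None, "value": None}

    # "top 3" or "first 5": treat the number as a row limit, not a filter value,
    # and remove it so it is not picked up again as the comparison value.
    limit = re.search(r"\b(?:top|first)\s+(\d+)\b", text)
    if limit:
        entities["limit"] = int(limit.group(1))
        text = text[:limit.start()] + text[limit.end():]

    # The first comparator phrase or symbol found decides the operator.
    for pattern, symbol in COMPARATORS:
        if re.search(pattern, text):
            entities["operator"] = symbol
            break

    # The first remaining number becomes the comparison value.
    number = re.search(r"\d+(?:\.\d+)?", text)
    if number:
        entities["value"] = float(number.group())

    return entities
```

For the README's own example, `extract_entities("Show students who scored more than 80")` yields the operator `>` and the value `80.0`, while "Top 3 performers" yields only a limit of 3.
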
- Maps structured intent + entities → SQL templates
- Converts SQL results into readable responses
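
The template mapping and response step could look roughly like the following. The intent names, the template strings, and the `build_sql` and `describe` helpers are hypothetical; only the `STUDENT` table and `MARKS` column come from the example query earlier in this README. The sketch assumes values are passed as bound parameters (`?`) rather than pasted into the SQL string.

```python
# Hypothetical intent names and templates; STUDENT and MARKS come from the example query.
SQL_TEMPLATES = {
    "select_all": "SELECT * FROM STUDENT",
    "count": "SELECT COUNT(*) FROM STUDENT",
    "average": "SELECT AVG(MARKS) FROM STUDENT",
    "filter": "SELECT * FROM STUDENT WHERE MARKS {operator} ?",
    "top_n": "SELECT * FROM STUDENT ORDER BY MARKS DESC LIMIT ?",
}
ALLOWED_OPERATORS = {">", "<", "="}

def build_sql(intent, entities):
    """Return (sql, params); values are bound as parameters, never concatenated."""
    template = SQL_TEMPLATES[intent]
    if intent == "filter":
        operator = entities["operator"]
        # Only whitelisted operators may be substituted into the template.
        if operator not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported operator: {operator!r}")
        return template.format(operator=operator), (entities["value"],)
    if intent == "top_n":
        return template, (int(entities["limit"]),)
    return template, ()

def describe(intent, rows):
    """Turn raw result rows into a short sentence."""
    if intent == "count":
        return f"There are {rows[0][0]} students."
    if intent == "average":
        value = rows[0][0]
        return "No marks recorded." if value is None else f"The average marks are {value:.2f}."
    if not rows:
        return "No matching students found."
    return f"Found {len(rows)} matching student(s)."
```

With these assumptions, `build_sql("filter", {"operator": ">", "value": 80})` returns the statement `SELECT * FROM STUDENT WHERE MARKS > ?` together with the parameter tuple `(80,)`.
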
    flowchart LR
    A[User Input] --> B(TF-IDF Vectorization)
    B --> C(Intent Classifier)
    C --> D(Entity Extraction)
    D --> E(SQL Generator)
    E --> F(Database Execution)
    F --> G(Response Generator)
    G --> H(UI Output)
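
One way to wire the stages of the flowchart together is to pass each stage in as a callable and collect everything the debug panel needs in a single result object. `PipelineResult` and `run_pipeline` are hypothetical names used for this sketch, and the actual orchestration in `app.py` may be organised differently.

```python
from dataclasses import dataclass, field

@dataclass
class PipelineResult:
    """What the UI and debug panel need from one query."""
    intent: str
    confidence: float
    entities: dict
    sql: str
    rows: list = field(default_factory=list)
    response: str = ""

def run_pipeline(query, classify, extract, generate, execute, respond):
    """Chain the stages in the order shown in the flowchart."""
    intent, confidence = classify(query)   # vectorization + intent classifier
    entities = extract(query)              # entity extraction
    sql = generate(intent, entities)       # SQL generator
    rows = execute(sql)                    # database execution
    response = respond(intent, rows)       # response generator
    return PipelineResult(intent, confidence, entities, sql, rows, response)
```

Keeping the stages as plain callables means each one can be swapped or tested in isolation, for example by passing a stub `execute` that returns fixed rows instead of touching the database.
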
| Component | Technology | Purpose |
|---|---|---|
| Text Vectorization | TF-IDF | Converts natural language into numerical features |
| Similarity Engine | Cosine Similarity | Handles ambiguous queries via semantic matching |
| Intent Classification | Logistic Regression (scikit-learn) | Predicts user intent from query |
| Entity Extraction | Regex (re) | Extracts conditions, values, and limits |
| NLP Pipeline | Tokenization, Feature Engineering, Slot Filling | End-to-end language understanding workflow |
| Response Generation | Rule-based NLG | Converts SQL output into readable responses |

| Component | Technology | Purpose |
|---|---|---|
| Language | Python 3 | Core implementation language |
| Backend Logic | Modular Python Scripts | Handles pipeline orchestration |
| UI Framework | Streamlit | Interactive frontend + debug interface |
| Database | SQLite3 | Local query execution engine |
| Data Handling | Pandas | Data processing and formatting |
| Voice Input | SpeechRecognition | Converts speech to text |
| Security | Template-based SQL | Reduces SQL injection risk by restricting queries to fixed templates |
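
To illustrate why fixed templates help against injection, here is a small sketch using Python's built-in `sqlite3` module with a bound parameter. The in-memory table, its `NAME` and `MARKS` columns, the sample rows, and the `students_above` helper are invented for the example; the real `student.db` schema may differ.

```python
import sqlite3

def students_above(conn, threshold):
    """Run a fixed statement; the threshold is bound as data, not spliced into SQL."""
    sql = "SELECT NAME, MARKS FROM STUDENT WHERE MARKS > ? ORDER BY NAME"
    return conn.execute(sql, (threshold,)).fetchall()

# In-memory database with made-up rows; the real student.db schema may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (NAME TEXT, MARKS INTEGER)")
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?)",
    [("Asha", 91), ("Ravi", 78), ("Meera", 85)],
)

rows = students_above(conn, 80)

# A hostile "value" is treated as a plain string, so it cannot change the statement.
hostile = students_above(conn, "80; DROP TABLE STUDENT")
remaining = conn.execute("SELECT COUNT(*) FROM STUDENT").fetchone()[0]
```

Because the user-supplied value never becomes part of the SQL text, the table still holds all three rows after the hostile call.
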

    QueryGenie/
    │
    ├── app.py
    ├── intent_classifier.py
    ├── entity_extractor.py
    ├── sql_generator.py
    ├── response_generator.py
    ├── speech_handler.py
    ├── sql.py
    ├── student.db
    ├── requirements.txt
    └── README.md

Clone the repository and create a virtual environment:

    git clone https://github.com/gee-46/querygenie.git
    cd querygenie
    python -m venv venv

Activate:

    venv\Scripts\activate # Windows
    source venv/bin/activate # macOS/Linux

Install the dependencies, run sql.py, and start the app:

    pip install -r requirements.txt
    python sql.py
    streamlit run app.py

Example queries:

- "Show all students"
- "How many students are there?"
- "Top 3 performers"
- "Average marks"
- "Students scoring above 80"

Use cases:
- Academic NLP demonstrations
- Database querying without SQL knowledge
- Offline enterprise tools
- Learning end-to-end NLP pipelines

Current limitations:
- Single-table schema
- Limited intent set
- No advanced NER (yet)

Planned improvements:
- spaCy-based Named Entity Recognition
- Multi-table JOIN support
- Offline speech models (Whisper/Vosk)
- Data visualization

Gautam N Chipkar (GitHub: https://github.com/gee-46)
- Star
- Fork
- Build
MIT License

This project shows that practical NLP systems can be built using Python and classical techniques, without relying on expensive APIs or large models.

Explainable. Offline. Academic. Practical.