Machine Learning predictions served live through the browser
Trained ensemble models taken out of Jupyter notebooks and deployed behind a Django web app — fill in a form, get an instant prediction. No notebooks, no code, just results in real time
RealTime ML demonstrates the full lifecycle of a practical machine learning project — from raw data to a clickable prediction in the browser:
Data exploration → feature engineering → model benchmarking → ensemble selection → serialization → live web deployment
It ships with two independent prediction services built on two different ML problem types, both powered by stacking ensembles chosen after benchmarking multiple algorithms against simpler baselines.
| 🩺 Service | 🎯 Problem Type | ❓ Question It Answers |
|---|---|---|
| Heart Disease Detection | Classification | Given a patient's clinical attributes, is heart disease likely present? |
| Fuel Consumption Prediction | Regression | Given a car's specifications, what is its estimated fuel economy (MPG)? |
- 🧩 Ensemble modeling — Voting and Stacking ensembles benchmarked against tuned individual models (Logistic Regression, SVM, k-NN, Decision Trees, Random Forest, and more).
- ⚙️ Production-style inference — preprocessing (Min–Max scaling + one-hot encoding) is reproduced exactly at request time so live predictions match training.
- 🔍 Systematic tuning — every candidate model optimized with
GridSearchCVand compared on consistent metrics (precision/accuracy vs. R²). - 💾 Decoupled training & serving — models and scalers serialized with
pickleand loaded by the web app, so serving needs no retraining.
Web
- Python 3.11.5
- Django 5.0.2
- SQLite
Machine Learning & Data
- scikit-learn 1.2.1
- pandas 2.2.0
- NumPy 1.26.4
- Pillow 10.2.0
- pickle for serialization
Notebooks & Visualization
- Jupyter Notebook
- Matplotlib
- Seaborn
Datasets
heart.csv— clinical heart-disease data (classification)auto-mpg.csv— automobile fuel-economy data (regression)
- Two live ML services behind a single web app — classification and regression.
- Interactive web forms collecting the exact features each model expects.
- Real-time inference — user input is converted to a DataFrame, preprocessed, and scored on every request.
- Automatic preprocessing pipeline:
- Min–Max scaling of continuous heart-disease features via a pre-fitted scaler (
min_max_scaler.pkl). - One-hot encoding of categorical inputs (sex, chest-pain type, ECG, exercise angina, ST slope) reconstructed to match the training schema.
- Min–Max scaling of continuous heart-disease features via a pre-fitted scaler (
- Pre-trained, serialized models loaded from
.pklfiles — no retraining to serve predictions. - Human-readable results on a dedicated page (e.g. "Heart disease detected!" or "The estimated gas consumption is: 27.431 MPG").
# 1. Clone the repository
git clone https://github.com/HarshTanwar1/RealTime_ML.git
cd RealTime_MLpython3 -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activatepip install numpy==1.26.4 pandas==2.2.0 pillow==10.2.0 django==5.0.2 scikit-learn==1.2.1python manage.py migratepython manage.py runserverThen open http://127.0.0.1:8000/ in your browser, pick a service, fill in the form, and get a live prediction. 🎉
⚠️ Version note: the models were pickled with scikit-learn 1.2.1. A different version may trigger unpickling warnings or errors — matching the version is recommended
RealTime_ML-main/
├── manage.py # Django entry point
├── RealTime_ML/ # Django project config (settings, urls, wsgi/asgi)
├── ML/ # Main Django app (views & urls — the inference logic)
├── Templates/ # HTML templates (index, classification, regression, prediction)
├── Classification/ # Heart disease notebook + heart.csv
├── Regression/ # MPG prediction notebook + auto-mpg.csv
├── stacking_classifier_model.pkl # Trained heart-disease ensemble
├── stacking_regressor_model.pkl # Trained MPG ensemble
├── min_max_scaler.pkl # Fitted scaler for classification features
└── db.sqlite3 # Django database
- Data exploration — loaded and inspected both datasets in Jupyter, handled missing values, and studied feature distributions with Matplotlib/Seaborn.
- Feature engineering — one-hot encoded categoricals (
pd.get_dummies) and Min–Max scaled continuous features for the heart-disease data; cleaned and prepared the auto-mpg data. - Model benchmarking — trained and tuned a range of algorithms with
GridSearchCV:- Classification: Gaussian Naive Bayes, Logistic Regression, SVM, k-NN, Decision Tree.
- Regression: Random Forest, Linear Regression, Decision Tree, ElasticNet, Lasso, k-NN.
- Ensembling — compared Voting and Stacking ensembles built from the best models, using
GridSearchCVto select the optimalfinal_estimator. - Model selection & serialization — picked the best stacking ensemble for each task and saved the models (and scaler) with
pickle. - Web integration — built a Django app that loads the models, reproduces the exact preprocessing at request time, runs inference, and renders results.
- Bridging the notebook-to-production gap — preprocessing must be replicated exactly at inference (same scaler, same one-hot column order) or predictions silently break.
- Ensemble methods in practice — Voting vs. Stacking, and why combining diverse base learners under a meta-estimator can beat any single tuned model.
- Systematic hyperparameter tuning with
GridSearchCVand fair comparison across algorithms using consistent metrics. - Model persistence — serializing and reloading models/scalers so training and serving stay decoupled.
- Django fundamentals — routing, templating, form handling, CSRF — and how to wire an inference step into the request/response cycle.
- End-to-end ownership — covering the full path from raw CSV to a clickable prediction.
- 🔌 Expose a REST API (Django REST Framework) so the models can power other apps, not just the built-in forms.
- 🛡️ Input validation & error handling for missing, out-of-range, or malformed submissions.
- 🐳 Containerize with Docker for reproducible, environment-independent deployment.
- 🎨 Improve UI/UX — move CSS to static files, add a framework, surface prediction confidence/probabilities, improve accessibility.
- 🗄️ Persist predictions to the database for history, analytics, and model monitoring.
- 📈 Monitoring & retraining pipeline — track input drift and automate periodic retraining.
- ⚡ Load models once at startup (and from a versioned registry) instead of reading
.pklfiles on every request.
⭐ If you found this project interesting, consider giving it a star!