Hi there! I built this platform to bridge the gap between raw customer data and actionable business insights. In the real world, data doesn't just sit there; it tells a story. My goal was to build a system that listens to that story and predicts the next chapter.
This isn't just a collection of scripts; it's a production-grade architecture demonstrating how ML models can be deployed as microservices, complete with real-time APIs, interactive dashboards, and automated pipelines.
I implemented three distinct machine learning approaches to tackle different aspects of customer behavior:
- **Customer Lifetime Value (CLV) Forecasting**
  - Technique: Gradient Boosting (XGBoost)
  - Why? CLV data is often non-linear and zero-inflated. XGBoost handles these irregularities beautifully while offering high predictive power.
  - Outcome: Estimates how much value a customer will bring over their lifespan.
- **Churn Risk Detection**
  - Technique: Random Forest Classifier
  - Why? I needed a robust model that resists overfitting and provides interpretable feature importances (so we know *why* someone is leaving).
  - Outcome: Flags at-risk customers before they leave, enabling proactive retention.
- **Behavioral Segmentation**
  - Technique: K-Means Clustering
  - Why? Unsupervised learning lets us discover natural groupings in the data without predefined labels.
  - Outcome: Identifies 5 distinct personas (e.g., "Loyal Regulars", "High-Value Engaged") for targeted marketing.
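The segmentation step can be sketched in a few lines of scikit-learn. The RFM-style features and the random data below are illustrative stand-ins (not the project's generator); only the 5-cluster K-Means mirrors the approach described above:

```python
# Sketch: K-Means behavioral segmentation on RFM-style features.
# Data here is random stand-in data, not the platform's synthetic generator.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Fake recency/frequency/monetary features for 500 customers.
X = np.column_stack([
    rng.exponential(30, 500),    # recency (days since last purchase)
    rng.poisson(4, 500),         # frequency (orders per month)
    rng.pareto(2.0, 500) * 100,  # monetary (heavy-tailed spend)
])

# Scale first: K-Means uses Euclidean distance, so raw dollar values
# would dominate the day/count features otherwise.
X_scaled = StandardScaler().fit_transform(X)
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42).fit(X_scaled)
labels = kmeans.labels_  # one of 5 persona ids per customer
```

Each cluster center (in original units, after inverse-scaling) is what you would inspect to name personas like "Loyal Regulars".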
I designed this as a modular system to mimic a real-world microservices environment:
```mermaid
graph TD
    Data[Raw Data] --> ETL[ETL & Feature Eng.]
    ETL --> Models[Model Training Pipeline]
    Models --> Registry[Model Registry]
    Registry --> API[FastAPI Service]
    API --> Dashboard[Plotly Dash UI]
    API --> External[External Apps]
```
- Data Layer: Synthetic data generator that mimics real-world distributions (Pareto principles in spending, etc.).
- ML Layer: Scikit-learn & XGBoost pipelines with automated feature engineering.
- Serving Layer: FastAPI for high-performance, asynchronous inference.
- Presentation Layer: Plotly Dash for interactive, business-intelligence-grade visualizations.
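To make the data layer concrete, here is a minimal sketch of a synthetic generator with a Pareto-style heavy tail in spending, so a small share of customers drives most revenue. Column names and distribution parameters are illustrative, not the project's actual schema:

```python
# Minimal stand-in for the synthetic data layer: heavy-tailed spend.
import numpy as np
import pandas as pd

def generate_customers(n: int, seed: int = 7) -> pd.DataFrame:
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "age": rng.integers(18, 75, n),
        "tenure_days": rng.integers(1, 2000, n),
        "recency": rng.exponential(20, n).round(1),
        "frequency": rng.poisson(4, n).astype(float),
        # Pareto(alpha=1.5), shifted/scaled so spend starts around $10:
        # a few customers end up with very large values (the 80/20 shape).
        "monetary": (rng.pareto(1.5, n) + 1) * 10,
    })

df = generate_customers(1000)
# Share of total spend held by the top 20% of spenders.
top_share = df["monetary"].nlargest(200).sum() / df["monetary"].sum()
```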
```bash
# Clone the repo
git clone https://github.com/probablynotnmp/predictive-analytics-platform.git
cd predictive-analytics-platform

# Install dependencies
pip install -r requirements.txt
```

Run the training pipeline to generate data and train the models from scratch:

```bash
python src/models/model_trainer.py
```

You'll see the training logs as it engineers features and optimizes the models.
Fire up the API and Dashboard:
```bash
# Terminal 1: API
uvicorn src.api.main:app --reload --port 8000

# Terminal 2: Dashboard
python src/visualization/dashboard.py
```

Visit the Dashboard at http://localhost:8050, and you're done!
I believe good code documents itself. The API ships with interactive documentation (FastAPI's auto-generated Swagger UI).
Once running, visit: http://localhost:8000/docs
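The endpoint can also be called programmatically. Below is a stdlib-only client sketch; `predict_comprehensive` is a hypothetical helper (not part of the repo), and the payload uses only the fields shown in the sample request, since the full schema lives in `/docs`:

```python
# Hypothetical client for the running API (stdlib only, no requests needed).
import json
from urllib.request import Request, urlopen

# Fields mirror the sample request in this README; full schema is at /docs.
payload = {
    "age": 28,
    "account_type": "Premium",
    "tenure_days": 450,
    "recency": 5,
    "frequency": 4.2,
    "monetary": 1200.0,
    "email_open_rate": 85.5,
}

def predict_comprehensive(payload: dict,
                          base_url: str = "http://localhost:8000") -> dict:
    """POST the payload to the API and return the parsed JSON response."""
    req = Request(
        f"{base_url}/predict/comprehensive",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(req, timeout=10) as resp:
        return json.loads(resp.read())

# result = predict_comprehensive(payload)  # requires the API from Terminal 1
```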
Sample Prediction Request:
```
POST /predict/comprehensive
{
  "age": 28,
  "account_type": "Premium",
  "tenure_days": 450,
  "recency": 5,
  "frequency": 4.2,
  "monetary": 1200.0,
  "email_open_rate": 85.5,
  ...
}
```

Sample Output:
```json
{
  "clv": {
    "customer_lifetime_value": 2450.50,
    "clv_segment": "Medium Value",
    "confidence": "high"
  },
  "churn": {
    "churn_probability": 0.1250,
    "churn_prediction": 0,
    "risk_category": "Low Risk",
    "top_risk_factors": [
      "recency",
      "frequency",
      "monetary"
    ]
  },
  "segment": {
    "segment_id": 1,
    "segment_name": "Loyal Regulars",
    "distance_to_center": 1.2345,
    "recommended_actions": [
      "Send loyalty reward",
      "Offer annual plan discount"
    ]
  }
}
```

- Python 3.10: For modern type hinting and performance.
- FastAPI: Chosen over Flask for its native async support and automatic Pydantic validation.
- Plotly Dash: Preferred over Tableau for this project to keep the entire stack Python-native and version-controllable.
- Docker: Containerized so "works on my machine" holds on every machine.
If I were to take this to production, I would:
- Replace the CSV storage with PostgreSQL or Snowflake.
- Implement MLflow for experiment tracking.
- Add Redis caching for the prediction endpoints to reduce latency.
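The caching idea boils down to keying each prediction by a hash of its payload and reusing the stored response. A sketch with an in-process dict standing in for Redis; in production, redis-py `get`/`setex` calls (with a TTL) would replace the dict so all API instances share one cache:

```python
# Sketch: response caching keyed by a stable hash of the request payload.
# The dict is an in-process stand-in for Redis.
import hashlib
import json

cache: dict[str, dict] = {}

def cache_key(payload: dict) -> str:
    # sort_keys makes logically equal payloads hash identically,
    # regardless of field order.
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()

def predict_with_cache(payload: dict, model_fn) -> dict:
    key = cache_key(payload)
    if key not in cache:          # with Redis: r.get(key) / r.setex(key, ttl, ...)
        cache[key] = model_fn(payload)
    return cache[key]

# Demo: a fake model that records how often it is actually invoked.
calls = []
def fake_model(p):
    calls.append(p)
    return {"churn_probability": 0.12}

r1 = predict_with_cache({"age": 28, "recency": 5}, fake_model)
r2 = predict_with_cache({"recency": 5, "age": 28}, fake_model)  # cache hit
```

The second call never reaches the model: same payload, same key, cached answer.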