Hi there! I built this platform to bridge the gap between raw customer data and actionable business insights. In the real world, data doesn't just sit there; it tells a story. My goal was to build a system that listens to that story and predicts the next chapter.
This isn't just a collection of scripts; it's a production-grade architecture demonstrating how ML models can be deployed as microservices, complete with real-time APIs, interactive dashboards, and automated pipelines.
I implemented three distinct machine learning approaches to tackle different aspects of customer behavior:
- **Customer Lifetime Value (CLV) Forecasting**
  - Technique: Gradient Boosting (XGBoost)
  - Why? CLV data is often non-linear and zero-inflated. XGBoost handles these irregularities beautifully while offering high predictive power.
  - Outcome: Estimates how much value a customer will bring over their lifespan.
- **Churn Risk Detection**
  - Technique: Random Forest Classifier
  - Why? I needed a robust model that resists overfitting and provides interpretable feature importances (so we know *why* someone is leaving).
  - Outcome: Flags at-risk customers before they leave, enabling proactive retention.
- **Behavioral Segmentation**
  - Technique: K-Means Clustering
  - Why? Unsupervised learning lets us discover natural groupings in the data without predefined labels.
  - Outcome: Identifies 5 distinct personas (e.g., "Loyal Regulars", "High-Value Engaged") for targeted marketing.
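The segmentation step can be sketched in a few lines of scikit-learn. The RFM-style features and the random data below are illustrative stand-ins (not the project's generator); only the 5-cluster K-Means mirrors the approach described above:

```python
# Sketch: K-Means behavioral segmentation on RFM-style features.
# Data here is random stand-in data, not the platform's synthetic generator.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Fake recency/frequency/monetary features for 500 customers.
X = np.column_stack([
    rng.exponential(30, 500),    # recency (days since last purchase)
    rng.poisson(4, 500),         # frequency (orders per month)
    rng.pareto(2.0, 500) * 100,  # monetary (heavy-tailed spend)
])

# Scale first: K-Means uses Euclidean distance, so raw dollar values
# would dominate the day/count features otherwise.
X_scaled = StandardScaler().fit_transform(X)
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42).fit(X_scaled)
labels = kmeans.labels_  # one of 5 persona ids per customer
```

Each cluster center (in original units, after inverse-scaling) is what you would inspect to name personas like "Loyal Regulars".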
I designed this as a modular system to mimic a real-world microservices environment:
```mermaid
graph TD
    Data[Raw Data] --> ETL[ETL & Feature Eng.]
    ETL --> Models[Model Training Pipeline]
    Models --> Registry[Model Registry]
    Registry --> API[FastAPI Service]
    API --> Dashboard[Plotly Dash UI]
    API --> External[External Apps]
```
- Data Layer: Synthetic data generator that mimics real-world distributions (Pareto principles in spending, etc.).
- ML Layer: Scikit-learn & XGBoost pipelines with automated feature engineering.
- Serving Layer: FastAPI for high-performance, asynchronous inference.
- Presentation Layer: Plotly Dash for interactive, business-intelligence-grade visualizations.
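To make the data layer concrete, here is a minimal sketch of a synthetic generator with a Pareto-style heavy tail in spending, so a small share of customers drives most revenue. Column names and distribution parameters are illustrative, not the project's actual schema:

```python
# Minimal stand-in for the synthetic data layer: heavy-tailed spend.
import numpy as np
import pandas as pd

def generate_customers(n: int, seed: int = 7) -> pd.DataFrame:
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "age": rng.integers(18, 75, n),
        "tenure_days": rng.integers(1, 2000, n),
        "recency": rng.exponential(20, n).round(1),
        "frequency": rng.poisson(4, n).astype(float),
        # Pareto(alpha=1.5), shifted/scaled so spend starts around $10:
        # a few customers end up with very large values (the 80/20 shape).
        "monetary": (rng.pareto(1.5, n) + 1) * 10,
    })

df = generate_customers(1000)
# Share of total spend held by the top 20% of spenders.
top_share = df["monetary"].nlargest(200).sum() / df["monetary"].sum()
```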
```bash
# Clone the repo
git clone https://github.com/probablynotnmp/predictive-analytics-platform.git
cd predictive-analytics-platform

# Install dependencies
pip install -r requirements.txt
```

Run the training pipeline to generate data and train the models from scratch:

```bash
python src/models/model_trainer.py
```

You'll see the training logs as it engineers features and optimizes the models.
Fire up the API and Dashboard:
```bash
# Terminal 1: API
uvicorn src.api.main:app --reload --port 8000

# Terminal 2: Dashboard
python src/visualization/dashboard.py
```

Visit the Dashboard at http://localhost:8050, and you're done!
I believe good code documents itself. The API ships with interactive documentation (FastAPI's auto-generated Swagger UI).
Once running, visit: http://localhost:8000/docs
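The endpoint can also be called programmatically. Below is a stdlib-only client sketch; `predict_comprehensive` is a hypothetical helper (not part of the repo), and the payload uses only the fields shown in the sample request, since the full schema lives in `/docs`:

```python
# Hypothetical client for the running API (stdlib only, no requests needed).
import json
from urllib.request import Request, urlopen

# Fields mirror the sample request in this README; full schema is at /docs.
payload = {
    "age": 28,
    "account_type": "Premium",
    "tenure_days": 450,
    "recency": 5,
    "frequency": 4.2,
    "monetary": 1200.0,
    "email_open_rate": 85.5,
}

def predict_comprehensive(payload: dict,
                          base_url: str = "http://localhost:8000") -> dict:
    """POST the payload to the API and return the parsed JSON response."""
    req = Request(
        f"{base_url}/predict/comprehensive",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(req, timeout=10) as resp:
        return json.loads(resp.read())

# result = predict_comprehensive(payload)  # requires the API from Terminal 1
```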
Sample Prediction Request:
```
POST /predict/comprehensive
{
  "age": 28,
  "account_type": "Premium",
  "tenure_days": 450,
  "recency": 5,
  "frequency": 4.2,
  "monetary": 1200.0,
  "email_open_rate": 85.5,
  ...
}
```

Sample Output:
```json
{
  "clv": {
    "customer_lifetime_value": 2450.50,
    "clv_segment": "Medium Value",
    "confidence": "high"
  },
  "churn": {
    "churn_probability": 0.1250,
    "churn_prediction": 0,
    "risk_category": "Low Risk",
    "top_risk_factors": [
      "recency",
      "frequency",
      "monetary"
    ]
  },
  "segment": {
    "segment_id": 1,
    "segment_name": "Loyal Regulars",
    "distance_to_center": 1.2345,
    "recommended_actions": [
      "Send loyalty reward",
      "Offer annual plan discount"
    ]
  }
}
```

- Python 3.10: For modern type hinting and performance.
- FastAPI: Chosen over Flask for its native async support and automatic Pydantic validation.
- Plotly Dash: Preferred over Tableau for this project to keep the entire stack Python-native and version-controllable.
- Docker: Containerized so "works on my machine" holds on every machine.
If I were to take this to production, I would:
- Replace the CSV storage with PostgreSQL or Snowflake.
- Implement MLflow for experiment tracking.
- Add Redis caching for the prediction endpoints to reduce latency.
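The caching idea boils down to keying each prediction by a hash of its payload and reusing the stored response. A sketch with an in-process dict standing in for Redis; in production, redis-py `get`/`setex` calls (with a TTL) would replace the dict so all API instances share one cache:

```python
# Sketch: response caching keyed by a stable hash of the request payload.
# The dict is an in-process stand-in for Redis.
import hashlib
import json

cache: dict[str, dict] = {}

def cache_key(payload: dict) -> str:
    # sort_keys makes logically equal payloads hash identically,
    # regardless of field order.
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()

def predict_with_cache(payload: dict, model_fn) -> dict:
    key = cache_key(payload)
    if key not in cache:          # with Redis: r.get(key) / r.setex(key, ttl, ...)
        cache[key] = model_fn(payload)
    return cache[key]

# Demo: a fake model that records how often it is actually invoked.
calls = []
def fake_model(p):
    calls.append(p)
    return {"churn_probability": 0.12}

r1 = predict_with_cache({"age": 28, "recency": 5}, fake_model)
r2 = predict_with_cache({"recency": 5, "age": 28}, fake_model)  # cache hit
```

The second call never reaches the model: same payload, same key, cached answer.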