A comprehensive machine learning project for predicting XoX product prices using multiple regression algorithms and ensemble methods.
- Project Overview
- Business Context
- Dataset
- Machine Learning Models
- Project Structure
- Key Features
- Installation
- Usage
- Model Performance
- Technical Implementation
- Results
- Future Enhancements
- Contributing
- License
- Author
- Acknowledgments
As an agency helping customers purchase XoX products from various makers, price estimation is critical for making informed purchasing decisions. This project develops and compares multiple machine learning models to accurately predict XoX prices based on product characteristics, dimensions, and other features.
Our agency needs to estimate XoX prices before purchase to:
- Recommend products at optimal price points
- Identify pricing trends across different makers
- Make data-driven purchasing decisions
- Understand which features most influence pricing
The project uses sales data containing the following features:
- `cost` - Production cost
- `weight` - Product weight
- `height`, `width`, `depth` - Product dimensions
- `volume` - Calculated as height × width × depth
- `product_type` - Type classification (can be multi-valued)
- `product_level` - Product tier/level
- `maker` - Manufacturer name
- `ingredients` - Product composition (can be multi-valued)
- `purchase_date` - Date of purchase
- Derived: `year`, `month`, `weekday`, `day`
- `price` - XoX product price
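As a minimal sketch (column names assumed to match the schema above; the notebook's own transformation functions may differ), the derived volume and temporal features can be computed along these lines:

```python
import pandas as pd

# Load the sales data and parse the purchase date
df = pd.read_csv("data/sample_data.csv", parse_dates=["purchase_date"])

# Volume derived from the three dimension columns
df["volume"] = df["height"] * df["width"] * df["depth"]

# Temporal features derived from the purchase date
df["year"] = df["purchase_date"].dt.year
df["month"] = df["purchase_date"].dt.month
df["weekday"] = df["purchase_date"].dt.weekday
df["day"] = df["purchase_date"].dt.day
```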
This project implements and compares 8 regression models with comprehensive evaluation:
| # | Model | Type | Key Feature |
|---|---|---|---|
| 1 | Linear Regression | Baseline | Simple linear relationships |
| 2 | Ridge Regression | Regularized | L2 regularization |
| 3 | Lasso Regression | Regularized | L1 + feature selection |
| 4 | PLS Regression | Dimensionality Reduction | Partial Least Squares |
| 5 | Random Forest | Ensemble | Decision tree ensemble |
| 6 | Gradient Boosting | Ensemble | Sequential boosting |
| 7 | XGBoost | Ensemble | Optimized gradient boosting |
| 8 | Stacking Regressor | Meta-Ensemble | Combines top performers ⭐ |
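A rough sketch of how the seven individual models might be instantiated for comparison (defaults shown here; the notebook tunes hyperparameters via GridSearchCV, and the Stacking Regressor is sketched under Technical Implementation below):

```python
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.cross_decomposition import PLSRegression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from xgboost import XGBRegressor

# Individual models compared in the notebook (default settings shown)
models = {
    "Linear Regression": LinearRegression(),
    "Ridge Regression": Ridge(),
    "Lasso Regression": Lasso(),
    "PLS Regression": PLSRegression(),
    "Random Forest": RandomForestRegressor(random_state=42),
    "Gradient Boosting": GradientBoostingRegressor(random_state=42),
    "XGBoost": XGBRegressor(random_state=42),
}
```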
```
Price_predictor/
├── README.md                          # Project documentation
├── requirements.txt                   # Python dependencies
├── regression modeling.ipynb          # Main analysis notebook
├── data/
│   └── sample_data.csv                # Sample dataset
├── results/                           # Model performance visualizations
│   ├── performance.png                # Radar plots of all models
│   └── stacking regressor.png         # Stacking Regressor predictions
└── XoX_Price_Prediction_Model.pptx    # Project presentation
```
- Custom transformation functions for price, cost, and dimension conversions
- Handling of multi-valued categorical features
- Missing value imputation strategies
- Feature engineering including volume calculation
- Temporal feature extraction (year, month, weekday, day)
- Volume calculation from dimensions
- Numerical feature scaling using MinMaxScaler
- Categorical encoding with OneHotEncoder
- Custom text processing for multi-valued categories (see the sketch after this list)
- Cross-validation for robust performance estimation
- Hyperparameter tuning using GridSearchCV
- Comprehensive metrics: MAE, MSE, RMSE, R²
- Composite scoring system for model comparison
- Distribution analysis of features and target
- Correlation heatmaps
- Model performance comparison charts
- Radar plots for multi-metric evaluation
- Prediction vs actual plots
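As a sketch of the multi-valued category handling mentioned above (the class name, delimiter, and wiring are assumptions rather than the notebook's exact implementation), a custom transformer can expand a comma-separated column such as `ingredients` or `product_type` into binary indicator columns:

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import MultiLabelBinarizer

class MultiValueEncoder(BaseEstimator, TransformerMixin):
    """Expand a delimiter-separated categorical column into binary indicator columns."""

    def __init__(self, delimiter=","):
        self.delimiter = delimiter

    def _tokenize(self, X):
        # Accepts a single-column DataFrame, Series, or 1-D array of strings
        values = pd.Series(X.squeeze()).fillna("").astype(str)
        return values.str.split(self.delimiter).apply(
            lambda items: [item.strip() for item in items if item.strip()]
        )

    def fit(self, X, y=None):
        self.mlb_ = MultiLabelBinarizer()
        self.mlb_.fit(self._tokenize(X))
        return self

    def transform(self, X):
        # Values unseen during fit are dropped (with a warning) rather than raising
        return self.mlb_.transform(self._tokenize(X))
```

An encoder like this can be dropped into the `ColumnTransformer` shown in the Technical Implementation section alongside the standard `OneHotEncoder`.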
- Python 3.8 or higher
- Jupyter Notebook
- Clone the repository:
git clone https://github.com/sys0507/Price-Predictor.git
cd Price-Predictor- Install required packages:
pip install -r requirements.txt- Launch Jupyter Notebook:
jupyter notebook- Open and run:
- Open
regression modeling.ipynb - Run all cells
โถ๏ธ
- Open
- Data Loading - The notebook loads data from `data/sample_data.csv`
- Preprocessing - Automatic data cleaning and transformation
- Model Training - All models are trained with optimized hyperparameters
- Evaluation - Comprehensive performance metrics are calculated
- Visualization - Results are visualized in multiple formats
To use your own sales data:
- Format your data to match the expected schema (see Dataset section)
- Place your CSV file in the `data/` folder
- Update the file path in the notebook (Cell 1)
- Run all cells
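For example, if the notebook's first cell defines the input path (the variable name below is hypothetical), pointing it at your own file might look like:

```python
import pandas as pd

# Cell 1 (hypothetical variable name): point the notebook at your own file
DATA_PATH = "data/my_sales_data.csv"
df = pd.read_csv(DATA_PATH, parse_dates=["purchase_date"])
```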
| Metric | Value |
|---|---|
| Total Models | 8 (7 individual + 1 ensemble) |
| Best Performer | Stacking Regressor ⭐ |
| Features Analyzed | 10+ (numerical, categorical, temporal) |
| Evaluation Metrics | MAE, MSE, RMSE, R² |
The notebook includes detailed performance comparisons across all models using:
- Train/Test MAE - Mean Absolute Error
- Train/Test MSE - Mean Squared Error
- Train/Test R² - Coefficient of Determination
- Composite Score - Weighted metric combining all measures
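A simplified sketch of how the per-split metrics could be computed (the composite score's exact weighting lives in the notebook and is not reproduced here):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def evaluate(y_true, y_pred):
    """Return MAE, MSE, RMSE and R² for one set of predictions."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "MAE": mean_absolute_error(y_true, y_pred),
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "R2": r2_score(y_true, y_pred),
    }

# Reported for every model on both splits, e.g.:
# train_metrics = evaluate(y_train, model.predict(X_train))
# test_metrics  = evaluate(y_test, model.predict(X_test))
```

The composite score then combines these per-model metrics into a single weighted figure for ranking.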
Performance visualizations include:
- Bar charts comparing metrics across models
- Normalized heatmaps for multi-metric view
- Stacked and individual radar plots
- Actual vs Predicted scatter plots
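As an illustration, an actual-vs-predicted scatter plot can be produced along these lines (assumes a fitted model and a held-out split; `model`, `X_test`, and `y_test` are placeholders):

```python
import matplotlib.pyplot as plt

# Compare predictions against actual prices for a fitted model
y_pred = model.predict(X_test)

plt.figure(figsize=(6, 6))
plt.scatter(y_test, y_pred, alpha=0.4)
lims = [min(y_test.min(), y_pred.min()), max(y_test.max(), y_pred.max())]
plt.plot(lims, lims, "r--", label="Perfect prediction")
plt.xlabel("Actual price")
plt.ylabel("Predicted price")
plt.title("Actual vs Predicted")
plt.legend()
plt.show()
```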
```python
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder, StandardScaler

# numerical_features, categorical_features, temporal_features are lists of column names;
# Model() stands in for whichever regressor is being evaluated.
pipeline = Pipeline([
    ('preprocessor', ColumnTransformer([
        ('numerical', MinMaxScaler(), numerical_features),       # scale numeric columns to [0, 1]
        ('categorical', OneHotEncoder(), categorical_features),  # one-hot encode categories
        ('temporal', StandardScaler(), temporal_features)        # standardize temporal features
    ])),
    ('regressor', Model())
])
```

Each model undergoes GridSearchCV optimization with model-specific parameter grids to find optimal configurations.
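For instance, tuning the Random Forest variant of the pipeline might look like the following (the grid values are placeholders, not the notebook's actual grids; `X_train` and `y_train` are assumed to exist):

```python
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestRegressor

# Placeholder grid for the Random Forest step of the shared pipeline
param_grid = {
    "regressor__n_estimators": [200, 500],
    "regressor__max_depth": [None, 10, 20],
}

search = GridSearchCV(
    pipeline.set_params(regressor=RandomForestRegressor(random_state=42)),
    param_grid=param_grid,
    cv=5,
    scoring="neg_mean_absolute_error",
    n_jobs=-1,
)
search.fit(X_train, y_train)
print(search.best_params_, -search.best_score_)
```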
The Stacking Regressor combines top-performing base models with a meta-learner to achieve superior prediction accuracy.
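A sketch of how such a stack might be assembled (the specific base models and meta-learner below are illustrative; the notebook selects its own top performers):

```python
from sklearn.ensemble import StackingRegressor, RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import Ridge
from xgboost import XGBRegressor

# Illustrative stack: tree-based base learners with a linear meta-learner
stacking = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(random_state=42)),
        ("gb", GradientBoostingRegressor(random_state=42)),
        ("xgb", XGBRegressor(random_state=42)),
    ],
    final_estimator=Ridge(),
    cv=5,
)

# Reuse the shared preprocessing pipeline with the stack as the final step
stacked_pipeline = pipeline.set_params(regressor=stacking)
```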
The following radar plots show the normalized performance metrics (MAE, MSE, R²) for all models across both training and test sets:

Key Observations:
- XGBoost and Gradient Boosting show the most balanced performance across all metrics
- Random Forest demonstrates strong R² scores on both train and test sets
- Lasso Regression shows signs of underfitting with lower overall performance
- Models with PCA show different metric patterns compared to non-PCA versions
The Stacking Regressor combines the best-performing models to achieve superior prediction accuracy:

Performance Highlights:
- Strong correlation between predicted and actual prices on both train and test sets
- Good generalization with similar performance patterns across train/test splits
- Effective handling of the full price range, from low- to high-cost products
The notebook provides comprehensive analysis including:
- Feature importance rankings (see the sketch after this list)
- Temporal price trends
- Categorical feature impact analysis
- Model strengths and weaknesses
- Recommendations for production deployment
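For tree-based models, the rankings can be pulled from a fitted pipeline roughly as follows (assumes the shared pipeline from the Technical Implementation section with a tree-based regressor already fitted):

```python
import pandas as pd

# Feature names after preprocessing, paired with the fitted regressor's importances
feature_names = pipeline.named_steps["preprocessor"].get_feature_names_out()
importances = pipeline.named_steps["regressor"].feature_importances_

ranking = (
    pd.Series(importances, index=feature_names)
    .sort_values(ascending=False)
    .head(15)
)
print(ranking)
```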
Potential improvements include:
- Deep learning models (Neural Networks)
- Additional feature engineering
- Time series forecasting components
- Automated feature selection
- Model interpretability tools (SHAP, LIME)
- Deployment as REST API
| Category | Technologies |
|---|---|
| Language | Python 3.8+ |
| ML Libraries | scikit-learn, XGBoost |
| Data Processing | Pandas, NumPy |
| Visualization | Matplotlib, Seaborn |
| Development | Jupyter Notebook |
| Version Control | Git, GitHub |
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is open source and available for educational and commercial use under the MIT License.
Created by sys0507
Feel free to reach out for questions, suggestions, or collaborations!
- Techlent ML Camp for project guidance
- Scikit-learn and XGBoost teams for excellent ML libraries
- The open-source community for tools and inspiration
Made with ❤️ for the ML community

Note: The current `sample_data.csv` is placeholder data. For actual price predictions, please use real XoX sales data formatted according to the schema described above.

