Machine learning project for predicting XoX product prices using multiple regression algorithms and ensemble methods

sys0507/Price-Predictor

💰 XoX Price Prediction Model

A comprehensive machine learning project for predicting XoX product prices using multiple regression algorithms and ensemble methods.

📊 Project Overview

Our agency helps customers purchase XoX products from various makers, so reliable price estimation is critical to informed purchasing decisions. This project develops and compares multiple machine learning models that predict XoX prices from product characteristics, dimensions, and other features.

💼 Business Context

Our agency needs to estimate XoX prices before purchase to:

  • 🎯 Recommend products at optimal price points
  • 📈 Identify pricing trends across different makers
  • 💡 Make data-driven purchasing decisions
  • 🔍 Understand which features most influence pricing

📁 Dataset

The project uses sales data containing the following features:

📊 Numerical Features

  • cost - Production cost
  • weight - Product weight
  • height, width, depth - Product dimensions
  • volume - Calculated as height × width × depth
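
As a small illustration of the derived volume column, the sketch below multiplies the three dimension columns in pandas. The sample values are made up, and it assumes all three dimensions are already numeric and in the same unit.

```python
import pandas as pd

# Made-up dimensions; the real data may need unit conversion first.
df = pd.DataFrame({
    "height": [2.0, 1.5],
    "width": [3.0, 2.0],
    "depth": [4.0, 2.0],
})

# Element-wise product of the three dimension columns.
df["volume"] = df["height"] * df["width"] * df["depth"]
print(df)
```

A row with any missing dimension yields a missing volume, so imputation (if used) should run before this step.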

🏷️ Categorical Features

  • product_type - Type classification (can be multi-valued)
  • product_level - Product tier/level
  • maker - Manufacturer name
  • ingredients - Product composition (can be multi-valued)
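
Multi-valued columns such as product_type and ingredients cannot go straight into an encoder that expects one category per row. The sketch below shows one possible approach, a multi-hot encoding with pandas. The comma delimiter and the category names are assumptions for illustration, not taken from the dataset.

```python
import pandas as pd

# Hypothetical rows; the real delimiter and category names may differ.
df = pd.DataFrame({"product_type": ["A,B", "B", "A,C"]})

# str.get_dummies splits each cell on the separator and emits one
# indicator column per distinct value (a multi-hot encoding).
multi_hot = df["product_type"].str.get_dummies(sep=",").add_prefix("product_type_")
print(multi_hot)
```

Each row can then have several indicator columns set at once, which is the difference from ordinary one-hot encoding.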

📅 Temporal Features

  • purchase_date - Date of purchase
  • Derived: year, month, weekday, day
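
The derived temporal columns can be extracted with the pandas datetime accessor. This is a minimal sketch, assuming purchase_date is stored as an ISO-formatted string; the two dates are made up.

```python
import pandas as pd

# Made-up dates; the real column may use a different string format.
df = pd.DataFrame({"purchase_date": ["2021-03-15", "2022-12-31"]})
dates = pd.to_datetime(df["purchase_date"])

df["year"] = dates.dt.year
df["month"] = dates.dt.month
df["weekday"] = dates.dt.weekday  # Monday=0 ... Sunday=6
df["day"] = dates.dt.day
print(df)
```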

🎯 Target Variable

  • price - XoX product price

🤖 Machine Learning Models

This project implements and compares 8 regression models with comprehensive evaluation:

| # | Model | Type | Key Feature |
|---|-------|------|-------------|
| 1️⃣ | Linear Regression | Baseline | Simple linear relationships |
| 2️⃣ | Ridge Regression | Regularized | L2 regularization |
| 3️⃣ | Lasso Regression | Regularized | L1 + feature selection |
| 4️⃣ | PLS Regression | Dimensionality Reduction | Partial Least Squares |
| 5️⃣ | Random Forest | Ensemble | Decision tree ensemble |
| 6️⃣ | Gradient Boosting | Ensemble | Sequential boosting |
| 7️⃣ | XGBoost | Ensemble | Optimized gradient boosting |
| 8️⃣ | Stacking Regressor | Meta-Ensemble | Combines top performers ⭐ |

📂 Project Structure

Price_predictor/
├── 📄 README.md                           # Project documentation
├── 📋 requirements.txt                    # Python dependencies
├── 📓 regression modeling.ipynb           # Main analysis notebook
├── 📁 data/
│   └── 📊 sample_data.csv                # Sample dataset
├── 📁 results/                            # Model performance visualizations
│   ├── 📈 performance.png                # Radar plots of all models
│   └── 📉 stacking regressor.png         # Stacking Regressor predictions
└── 📽️ XoX_Price_Prediction_Model.pptx    # Project presentation

✨ Key Features

🔧 1. Data Preprocessing & Cleaning

  • Custom transformation functions for price, cost, and dimension conversions
  • Handling of multi-valued categorical features
  • Missing value imputation strategies
  • Feature engineering including volume calculation
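
The cleaning steps above might look roughly like the sketch below. The raw string format ("$1,250.00"), the helper name parse_money, and the choice of median imputation are all assumptions for illustration; the notebook's own transformation functions may differ.

```python
import numpy as np
import pandas as pd

def parse_money(value):
    """Turn a string such as '$1,250.00' into a float; pass missing values through."""
    if pd.isna(value):
        return np.nan
    return float(str(value).replace("$", "").replace(",", ""))

# Made-up raw rows with an assumed currency format and some gaps.
raw = pd.DataFrame({
    "price": ["$1,250.00", "$80", None],
    "weight": [2.0, None, 4.0],
})

clean = raw.copy()
clean["price"] = clean["price"].map(parse_money)
# Median imputation for a numeric feature.
clean["weight"] = clean["weight"].fillna(clean["weight"].median())
# Rows with a missing target cannot be used for training, so drop them.
clean = clean.dropna(subset=["price"])
print(clean)
```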

🎨 2. Feature Engineering

  • Temporal feature extraction (year, month, weekday, day)
  • Volume calculation from dimensions
  • Numerical feature scaling using MinMaxScaler
  • Categorical encoding with OneHotEncoder
  • Custom text processing for multi-valued categories

🎯 3. Model Training & Evaluation

  • Cross-validation for robust performance estimation
  • Hyperparameter tuning using GridSearchCV
  • Comprehensive metrics: MAE, MSE, RMSE, R²
  • Composite scoring system for model comparison
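
A minimal sketch of cross-validated evaluation with several metrics at once, using scikit-learn's cross_validate. The synthetic data from make_regression and the Ridge model are stand-ins, not the project's data or its chosen configuration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_validate

# Synthetic stand-in for the preprocessed feature matrix and prices.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# scikit-learn maximizes scores, so error metrics are exposed as negatives.
scoring = {
    "mae": "neg_mean_absolute_error",
    "mse": "neg_mean_squared_error",
    "r2": "r2",
}
cv = cross_validate(Ridge(alpha=1.0), X, y, cv=5, scoring=scoring)

mae = -cv["test_mae"].mean()
mse = -cv["test_mse"].mean()
rmse = np.sqrt(-cv["test_mse"]).mean()  # RMSE per fold, then averaged
r2 = cv["test_r2"].mean()
print(f"MAE={mae:.2f} MSE={mse:.2f} RMSE={rmse:.2f} R2={r2:.3f}")
```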

📊 4. Visualization

  • Distribution analysis of features and target
  • Correlation heatmaps
  • Model performance comparison charts
  • Radar plots for multi-metric evaluation
  • Prediction vs actual plots

🚀 Installation

📋 Prerequisites

  • Python 3.8 or higher
  • Jupyter Notebook

⚡ Setup

  1. Clone the repository:
     git clone https://github.com/sys0507/Price-Predictor.git
     cd Price-Predictor
  2. Install required packages:
     pip install -r requirements.txt
  3. Launch Jupyter Notebook:
     jupyter notebook
  4. Open and run:
    • Open regression modeling.ipynb
    • Run all cells ▶️

💻 Usage

🎬 Running the Analysis

  1. Data Loading 📥 - The notebook loads data from data/sample_data.csv
  2. Preprocessing 🔧 - Automatic data cleaning and transformation
  3. Model Training 🏋️ - All models are trained with optimized hyperparameters
  4. Evaluation 📊 - Comprehensive performance metrics are calculated
  5. Visualization 📈 - Results are visualized in multiple formats

🔄 Modifying for Your Data

To use your own sales data:

  1. Format your data to match the expected schema (see Dataset section)
  2. Place your CSV file in the data/ folder
  3. Update the file path in the notebook (Cell 1)
  4. Run all cells
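
Before running the notebook on your own file, a quick schema check can catch missing columns early. This sketch takes the column names from the Dataset section and treats volume and the date parts as derived rather than required. The helper missing_columns is hypothetical and not part of the notebook.

```python
import pandas as pd

# Raw columns listed in the Dataset section; dtypes are not checked here.
EXPECTED_COLUMNS = [
    "cost", "weight", "height", "width", "depth",
    "product_type", "product_level", "maker", "ingredients",
    "purchase_date", "price",
]

def missing_columns(df):
    """Return the expected columns that are absent from df, in schema order."""
    return [c for c in EXPECTED_COLUMNS if c not in df.columns]

# Stand-in for a frame loaded with pd.read_csv from the data/ folder.
df = pd.DataFrame(columns=["cost", "weight", "price", "maker"])
print(missing_columns(df))
```

An empty list means every expected column is present.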

📈 Model Performance

📊 Quick Stats

| Metric | Value |
|--------|-------|
| Total Models | 8 (7 base models + 1 stacking ensemble) |
| Best Performer | Stacking Regressor ⭐ |
| Features Analyzed | 10+ (numerical, categorical, temporal) |
| Evaluation Metrics | MAE, MSE, RMSE, R² |

📏 Evaluation Metrics

The notebook includes detailed performance comparisons across all models using:

  • Train/Test MAE - Mean Absolute Error
  • Train/Test MSE - Mean Squared Error
  • Train/Test R² - Coefficient of Determination
  • Composite Score - Weighted metric combining all measures
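
The exact weighting behind the composite score is not given here, so the sketch below shows only one plausible construction: min-max normalize each metric across models, flip the error metrics so that higher is always better, then take a weighted sum. The metric values, model names, and weights are all made up for illustration and are not project results.

```python
import pandas as pd

# Made-up test-set metrics for three hypothetical models (not project results).
metrics = pd.DataFrame(
    {"mae": [10.0, 8.0, 6.0], "mse": [200.0, 150.0, 100.0], "r2": [0.70, 0.80, 0.90]},
    index=["model_a", "model_b", "model_c"],
)

def min_max(col):
    """Rescale a column to [0, 1]; assumes the column is not constant."""
    return (col - col.min()) / (col.max() - col.min())

normalized = metrics.apply(min_max)
# Lower is better for errors, so flip them so that 1.0 is always best.
normalized[["mae", "mse"]] = 1.0 - normalized[["mae", "mse"]]

weights = {"mae": 0.3, "mse": 0.3, "r2": 0.4}  # assumed weights
composite = sum(normalized[m] * w for m, w in weights.items())
print(composite.sort_values(ascending=False))
```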

📊 Performance Visualizations

Performance visualizations include:

  • 📊 Bar charts comparing metrics across models
  • 🗺️ Normalized heatmaps for multi-metric view
  • 🎯 Stacked and individual radar plots
  • 📈 Actual vs Predicted scatter plots

⚙️ Technical Implementation

🏗️ Pipeline Architecture

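from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder, StandardScaler

# numerical_features, categorical_features, temporal_features are lists of column names
pipeline = Pipeline([
    ('preprocessor', ColumnTransformer([
        ('numerical', MinMaxScaler(), numerical_features),       # scale to [0, 1]
        ('categorical', OneHotEncoder(), categorical_features),  # one indicator per category
        ('temporal', StandardScaler(), temporal_features)        # zero mean, unit variance
    ])),
    ('regressor', LinearRegression())  # placeholder: swap in any of the eight models
])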
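
To see this architecture run end to end, the sketch below fits the same three-branch layout on a tiny made-up frame. The rows, the choice of LinearRegression, and the handle_unknown="ignore" option are assumptions added for the example.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder, StandardScaler

# Tiny made-up frame using column names from the Dataset section.
df = pd.DataFrame({
    "cost": [10.0, 20.0, 30.0, 40.0],
    "weight": [1.0, 2.0, 3.0, 4.0],
    "maker": ["m1", "m2", "m1", "m2"],
    "month": [1, 6, 9, 12],
    "price": [15.0, 32.0, 44.0, 61.0],
})
numerical_features = ["cost", "weight"]
categorical_features = ["maker"]
temporal_features = ["month"]

pipeline = Pipeline([
    ("preprocessor", ColumnTransformer([
        ("numerical", MinMaxScaler(), numerical_features),
        # handle_unknown="ignore" avoids errors on categories unseen during fit
        ("categorical", OneHotEncoder(handle_unknown="ignore"), categorical_features),
        ("temporal", StandardScaler(), temporal_features),
    ])),
    ("regressor", LinearRegression()),
])

X, y = df.drop(columns="price"), df["price"]
pipeline.fit(X, y)
predictions = pipeline.predict(X)
print(predictions)
```

Keeping preprocessing inside the pipeline means the scalers and encoder are fitted only on training folds during cross-validation, which avoids leaking test information.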

🎯 Hyperparameter Tuning

Each model undergoes GridSearchCV optimization with model-specific parameter grids to find optimal configurations.
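
A minimal sketch of a grid search over a pipeline. The Ridge model, the alpha grid, the scoring choice, and the synthetic data are illustrative assumptions; the notebook's actual parameter grids are not reproduced here.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the training data.
X, y = make_regression(n_samples=150, n_features=6, noise=5.0, random_state=42)

pipeline = Pipeline([("scaler", MinMaxScaler()), ("regressor", Ridge())])

# The "regressor__" prefix routes each parameter to that pipeline step.
param_grid = {"regressor__alpha": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="neg_mean_absolute_error")
search.fit(X, y)
print(search.best_params_, -search.best_score_)
```

After fitting, search.best_estimator_ holds the pipeline refitted on all of the data with the best parameters.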

🏗️ Stacking Strategy

The Stacking Regressor feeds the predictions of the top-performing base models into a meta-learner, which learns how to combine them into a final price estimate.
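
As a rough illustration of stacking with scikit-learn's StackingRegressor: the base models, the Ridge meta-learner, and the synthetic data below are assumptions, since the notebook's exact selection is not listed here.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed features and prices.
X, y = make_regression(n_samples=300, n_features=8, noise=15.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
        ("gb", GradientBoostingRegressor(random_state=0)),
    ],
    final_estimator=Ridge(),  # assumed meta-learner
    cv=5,  # the meta-learner is trained on out-of-fold base predictions
)
stack.fit(X_train, y_train)
test_r2 = stack.score(X_test, y_test)
print(f"Test R2: {test_r2:.3f}")
```

Training the meta-learner on out-of-fold predictions keeps it from simply trusting whichever base model overfits the training set most.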


📊 Results

🔍 Model Performance Comparison

The following radar plots show the normalized performance metrics (MAE, MSE, R²) for all models across both training and test sets:

[Figure: Model Performance Radar Plots (results/performance.png)]

🔑 Key Observations:

  • ⭐ XGBoost and Gradient Boosting show the most balanced performance across all metrics
  • 🌲 Random Forest demonstrates strong R² scores on both train and test sets
  • 📉 Lasso Regression shows signs of underfitting with lower overall performance
  • 🔄 Models with PCA show different metric patterns compared to non-PCA versions

📊 Stacking Regressor Performance

The plot below compares the Stacking Regressor's predicted prices with the actual prices:

[Figure: Stacking Regressor - Predicted vs Actual (results/stacking regressor.png)]

🎯 Performance Highlights:

  • ✅ Strong correlation between predicted and actual prices on both train and test sets
  • ✅ Good generalization with similar performance patterns across train/test splits
  • ✅ Effective handling of the full price range, from low-cost to high-cost products

💡 Model Insights

The notebook provides comprehensive analysis including:

  • 📊 Feature importance rankings
  • 📈 Temporal price trends
  • 🏷️ Categorical feature impact analysis
  • ⚖️ Model strengths and weaknesses
  • 🚀 Recommendations for production deployment

🔮 Future Enhancements

Potential improvements include:

  • 🧠 Deep learning models (Neural Networks)
  • 🔧 Additional feature engineering
  • 📅 Time series forecasting components
  • 🎯 Automated feature selection
  • 🔍 Model interpretability tools (SHAP, LIME)
  • 🌐 Deployment as REST API

🛠️ Technologies Used

| Category | Technologies |
|----------|--------------|
| Language | Python 3.8+ |
| ML Libraries | scikit-learn, XGBoost |
| Data Processing | Pandas, NumPy |
| Visualization | Matplotlib, Seaborn |
| Development | Jupyter Notebook |
| Version Control | Git, GitHub |

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📝 License

This project is open source and available for educational and commercial use under the MIT License.


👤 Author

Created by sys0507

Feel free to reach out for questions, suggestions, or collaborations!


🙏 Acknowledgments

  • 🎓 Techlent ML Camp for project guidance
  • 🔬 Scikit-learn and XGBoost teams for excellent ML libraries
  • 💻 The open-source community for tools and inspiration

⭐ If you find this project helpful, please consider giving it a star!

Made with ❤️ for the ML community


📌 Note: The current sample_data.csv is placeholder data. For actual price predictions, please use real XoX sales data formatted according to the schema described above.
