A comprehensive machine learning project for predicting XoX product prices using multiple regression algorithms and ensemble methods.
- Project Overview
- Business Context
- Dataset
- Machine Learning Models
- Project Structure
- Key Features
- Installation
- Usage
- Model Performance
- Technical Implementation
- Results
- Future Enhancements
- Contributing
- License
- Author
- Acknowledgments
As an agency helping customers purchase XoX products from various makers, price estimation is critical for making informed purchasing decisions. This project develops and compares multiple machine learning models to accurately predict XoX prices based on product characteristics, dimensions, and other features.
Our agency needs to estimate XoX prices before purchase to:
- Recommend products at optimal price points
- Identify pricing trends across different makers
- Make data-driven purchasing decisions
- Understand which features most influence pricing
The project uses sales data containing the following features:
- `cost` - Production cost
- `weight` - Product weight
- `height`, `width`, `depth` - Product dimensions
- `volume` - Calculated as height × width × depth
- `product_type` - Type classification (can be multi-valued)
- `product_level` - Product tier/level
- `maker` - Manufacturer name
- `ingredients` - Product composition (can be multi-valued)
- `purchase_date` - Date of purchase
- Derived: `year`, `month`, `weekday`, `day`
- `price` - XoX product price
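As a minimal sketch (column names assumed to match the schema above; the notebook's own transformation functions may differ), the derived volume and temporal features can be computed along these lines:

```python
import pandas as pd

# Load the sales data and parse the purchase date
df = pd.read_csv("data/sample_data.csv", parse_dates=["purchase_date"])

# Volume derived from the three dimension columns
df["volume"] = df["height"] * df["width"] * df["depth"]

# Temporal features derived from the purchase date
df["year"] = df["purchase_date"].dt.year
df["month"] = df["purchase_date"].dt.month
df["weekday"] = df["purchase_date"].dt.weekday
df["day"] = df["purchase_date"].dt.day
```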
This project implements and compares 8 regression models with comprehensive evaluation:
| # | Model | Type | Key Feature |
|---|---|---|---|
| 1 | Linear Regression | Baseline | Simple linear relationships |
| 2 | Ridge Regression | Regularized | L2 regularization |
| 3 | Lasso Regression | Regularized | L1 + feature selection |
| 4 | PLS Regression | Dimensionality Reduction | Partial Least Squares |
| 5 | Random Forest | Ensemble | Decision tree ensemble |
| 6 | Gradient Boosting | Ensemble | Sequential boosting |
| 7 | XGBoost | Ensemble | Optimized gradient boosting |
| 8 | Stacking Regressor | Meta-Ensemble | Combines top performers ⭐ |
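A rough sketch of how the seven individual models might be instantiated for comparison (defaults shown here; the notebook tunes hyperparameters via GridSearchCV, and the Stacking Regressor is sketched under Technical Implementation below):

```python
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.cross_decomposition import PLSRegression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from xgboost import XGBRegressor

# Individual models compared in the notebook (default settings shown)
models = {
    "Linear Regression": LinearRegression(),
    "Ridge Regression": Ridge(),
    "Lasso Regression": Lasso(),
    "PLS Regression": PLSRegression(),
    "Random Forest": RandomForestRegressor(random_state=42),
    "Gradient Boosting": GradientBoostingRegressor(random_state=42),
    "XGBoost": XGBRegressor(random_state=42),
}
```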
```
Price_predictor/
├── README.md                          # Project documentation
├── requirements.txt                   # Python dependencies
├── regression modeling.ipynb          # Main analysis notebook
├── data/
│   └── sample_data.csv                # Sample dataset
├── results/                           # Model performance visualizations
│   ├── performance.png                # Radar plots of all models
│   └── stacking regressor.png         # Stacking Regressor predictions
└── XoX_Price_Prediction_Model.pptx    # Project presentation
```
- Custom transformation functions for price, cost, and dimension conversions
- Handling of multi-valued categorical features
- Missing value imputation strategies
- Feature engineering including volume calculation
- Temporal feature extraction (year, month, weekday, day)
- Volume calculation from dimensions
- Numerical feature scaling using MinMaxScaler
- Categorical encoding with OneHotEncoder
- Custom text processing for multi-valued categories (see the sketch after this list)
- Cross-validation for robust performance estimation
- Hyperparameter tuning using GridSearchCV
- Comprehensive metrics: MAE, MSE, RMSE, R²
- Composite scoring system for model comparison
- Distribution analysis of features and target
- Correlation heatmaps
- Model performance comparison charts
- Radar plots for multi-metric evaluation
- Prediction vs actual plots
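As a sketch of the multi-valued category handling mentioned above (the class name, delimiter, and wiring are assumptions rather than the notebook's exact implementation), a custom transformer can expand a comma-separated column such as `ingredients` or `product_type` into binary indicator columns:

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import MultiLabelBinarizer

class MultiValueEncoder(BaseEstimator, TransformerMixin):
    """Expand a delimiter-separated categorical column into binary indicator columns."""

    def __init__(self, delimiter=","):
        self.delimiter = delimiter

    def _tokenize(self, X):
        # Accepts a single-column DataFrame, Series, or 1-D array of strings
        values = pd.Series(X.squeeze()).fillna("").astype(str)
        return values.str.split(self.delimiter).apply(
            lambda items: [item.strip() for item in items if item.strip()]
        )

    def fit(self, X, y=None):
        self.mlb_ = MultiLabelBinarizer()
        self.mlb_.fit(self._tokenize(X))
        return self

    def transform(self, X):
        # Values unseen during fit are dropped (with a warning) rather than raising
        return self.mlb_.transform(self._tokenize(X))
```

An encoder like this can be dropped into the `ColumnTransformer` shown in the Technical Implementation section alongside the standard `OneHotEncoder`.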
- Python 3.8 or higher
- Jupyter Notebook
- Clone the repository:
git clone https://github.com/sys0507/Price-Predictor.git
cd Price-Predictor- Install required packages:
pip install -r requirements.txt- Launch Jupyter Notebook:
jupyter notebook- Open and run:
- Open
regression modeling.ipynb - Run all cells
โถ๏ธ
- Open
- Data Loading - The notebook loads data from `data/sample_data.csv`
- Preprocessing - Automatic data cleaning and transformation
- Model Training - All models are trained with optimized hyperparameters
- Evaluation - Comprehensive performance metrics are calculated
- Visualization - Results are visualized in multiple formats
To use your own sales data:
- Format your data to match the expected schema (see Dataset section)
- Place your CSV file in the `data/` folder
- Update the file path in the notebook (Cell 1)
- Run all cells
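For example, if the notebook's first cell defines the input path (the variable name below is hypothetical), pointing it at your own file might look like:

```python
import pandas as pd

# Cell 1 (hypothetical variable name): point the notebook at your own file
DATA_PATH = "data/my_sales_data.csv"
df = pd.read_csv(DATA_PATH, parse_dates=["purchase_date"])
```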
| Metric | Value |
|---|---|
| Total Models | 8 (7 individual + 1 ensemble) |
| Best Performer | Stacking Regressor ⭐ |
| Features Analyzed | 10+ (numerical, categorical, temporal) |
| Evaluation Metrics | MAE, MSE, RMSE, R² |
The notebook includes detailed performance comparisons across all models using:
- Train/Test MAE - Mean Absolute Error
- Train/Test MSE - Mean Squared Error
- Train/Test R² - Coefficient of Determination
- Composite Score - Weighted metric combining all measures
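A simplified sketch of how the per-split metrics could be computed (the composite score's exact weighting lives in the notebook and is not reproduced here):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def evaluate(y_true, y_pred):
    """Return MAE, MSE, RMSE and R² for one set of predictions."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "MAE": mean_absolute_error(y_true, y_pred),
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "R2": r2_score(y_true, y_pred),
    }

# Reported for every model on both splits, e.g.:
# train_metrics = evaluate(y_train, model.predict(X_train))
# test_metrics  = evaluate(y_test, model.predict(X_test))
```

The composite score then combines these per-model metrics into a single weighted figure for ranking.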
Performance visualizations include:
- Bar charts comparing metrics across models
- Normalized heatmaps for multi-metric view
- Stacked and individual radar plots
- Actual vs Predicted scatter plots
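As an illustration, an actual-vs-predicted scatter plot can be produced along these lines (assumes a fitted model and a held-out split; `model`, `X_test`, and `y_test` are placeholders):

```python
import matplotlib.pyplot as plt

# Compare predictions against actual prices for a fitted model
y_pred = model.predict(X_test)

plt.figure(figsize=(6, 6))
plt.scatter(y_test, y_pred, alpha=0.4)
lims = [min(y_test.min(), y_pred.min()), max(y_test.max(), y_pred.max())]
plt.plot(lims, lims, "r--", label="Perfect prediction")
plt.xlabel("Actual price")
plt.ylabel("Predicted price")
plt.title("Actual vs Predicted")
plt.legend()
plt.show()
```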
```python
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder, StandardScaler

# numerical_features, categorical_features, temporal_features are lists of column names;
# Model() stands in for whichever regressor is being evaluated.
pipeline = Pipeline([
    ('preprocessor', ColumnTransformer([
        ('numerical', MinMaxScaler(), numerical_features),       # scale numeric columns to [0, 1]
        ('categorical', OneHotEncoder(), categorical_features),  # one-hot encode categories
        ('temporal', StandardScaler(), temporal_features)        # standardize temporal features
    ])),
    ('regressor', Model())
])
```

Each model undergoes GridSearchCV optimization with model-specific parameter grids to find optimal configurations.
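For instance, tuning the Random Forest variant of the pipeline might look like the following (the grid values are placeholders, not the notebook's actual grids; `X_train` and `y_train` are assumed to exist):

```python
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestRegressor

# Placeholder grid for the Random Forest step of the shared pipeline
param_grid = {
    "regressor__n_estimators": [200, 500],
    "regressor__max_depth": [None, 10, 20],
}

search = GridSearchCV(
    pipeline.set_params(regressor=RandomForestRegressor(random_state=42)),
    param_grid=param_grid,
    cv=5,
    scoring="neg_mean_absolute_error",
    n_jobs=-1,
)
search.fit(X_train, y_train)
print(search.best_params_, -search.best_score_)
```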
The Stacking Regressor combines top-performing base models with a meta-learner to achieve superior prediction accuracy.
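A sketch of how such a stack might be assembled (the specific base models and meta-learner below are illustrative; the notebook selects its own top performers):

```python
from sklearn.ensemble import StackingRegressor, RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import Ridge
from xgboost import XGBRegressor

# Illustrative stack: tree-based base learners with a linear meta-learner
stacking = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(random_state=42)),
        ("gb", GradientBoostingRegressor(random_state=42)),
        ("xgb", XGBRegressor(random_state=42)),
    ],
    final_estimator=Ridge(),
    cv=5,
)

# Reuse the shared preprocessing pipeline with the stack as the final step
stacked_pipeline = pipeline.set_params(regressor=stacking)
```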
The following radar plots show the normalized performance metrics (MAE, MSE, R²) for all models across both training and test sets:

Key Observations:
- XGBoost and Gradient Boosting show the most balanced performance across all metrics
- Random Forest demonstrates strong R² scores on both train and test sets
- Lasso Regression shows signs of underfitting with lower overall performance
- Models with PCA show different metric patterns compared to non-PCA versions
The Stacking Regressor combines the best-performing models to achieve superior prediction accuracy:

Performance Highlights:
- Strong correlation between predicted and actual prices on both train and test sets
- Good generalization with similar performance patterns across train/test splits
- Effective handling of the full price range, from low- to high-cost products
The notebook provides comprehensive analysis including:
- Feature importance rankings (see the sketch after this list)
- Temporal price trends
- Categorical feature impact analysis
- Model strengths and weaknesses
- Recommendations for production deployment
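For tree-based models, the rankings can be pulled from a fitted pipeline roughly as follows (assumes the shared pipeline from the Technical Implementation section with a tree-based regressor already fitted):

```python
import pandas as pd

# Feature names after preprocessing, paired with the fitted regressor's importances
feature_names = pipeline.named_steps["preprocessor"].get_feature_names_out()
importances = pipeline.named_steps["regressor"].feature_importances_

ranking = (
    pd.Series(importances, index=feature_names)
    .sort_values(ascending=False)
    .head(15)
)
print(ranking)
```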
Potential improvements include:
- Deep learning models (Neural Networks)
- Additional feature engineering
- Time series forecasting components
- Automated feature selection
- Model interpretability tools (SHAP, LIME)
- Deployment as REST API
| Category | Technologies |
|---|---|
| Language | Python 3.8+ |
| ML Libraries | scikit-learn, XGBoost |
| Data Processing | Pandas, NumPy |
| Visualization | Matplotlib, Seaborn |
| Development | Jupyter Notebook |
| Version Control | Git, GitHub |
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is open source and available for educational and commercial use under the MIT License.
Created by sys0507
Feel free to reach out for questions, suggestions, or collaborations!
- Techlent ML Camp for project guidance
- Scikit-learn and XGBoost teams for excellent ML libraries
- The open-source community for tools and inspiration
Made with ❤️ for the ML community

Note: The current `sample_data.csv` is placeholder data. For actual price predictions, please use real XoX sales data formatted according to the schema described above.

