Foodmart Data Mining Project

A comprehensive data mining analysis of the Foodmart dataset using K-medoids clustering and Decision Tree Regression to extract meaningful insights from retail sales data.

Project Overview

This project applies advanced data mining techniques to analyze the Foodmart dataset, which contains retail sales transaction data. The analysis combines unsupervised learning (K-medoids clustering) with supervised learning (Decision Tree Regression) to discover customer patterns and predict sales outcomes.

Objectives

Customer Segmentation: Use K-medoids clustering to identify distinct customer groups based on purchasing behavior
Sales Prediction: Train a Decision Tree Regressor to predict sales values from product, customer, and temporal features
Pattern Discovery: Extract actionable insights from retail transaction data
Performance Evaluation: Assess model accuracy and clustering quality

Technologies Used

Python 3.x
Pandas - Data manipulation and analysis
NumPy - Numerical computing
Scikit-learn - Machine learning algorithms
Matplotlib/Seaborn - Data visualization
Jupyter Notebook - Interactive development environment

Dataset

The Foodmart dataset is a well-known retail sales dataset containing:

Customer transaction records
Product information
Sales figures
Time-based data
Store and location details

Methodology

1. K-medoids Clustering

Purpose: Customer segmentation and pattern identification
Algorithm: Partitioning Around Medoids (PAM)
Features: Customer purchasing behavior, transaction frequency, sales amounts
Output: 3 customer clusters with distinct characteristics
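
To illustrate the idea behind k-medoids, here is a minimal NumPy sketch. It uses the simpler alternating scheme (assign points, then update medoids) rather than the full PAM swap search, and it is not the project's own implementation. The function name `k_medoids` and the two synthetic features are assumptions made for illustration.

```python
import numpy as np


def k_medoids(X, k, n_iter=100, seed=0):
    """Alternating k-medoids: assign points to the nearest medoid, then
    move each medoid to the cluster member with the lowest total distance."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances between all rows of X.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the old medoid if a cluster empties out
            # Total distance from each member to the rest of its cluster.
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break  # converged: no medoid moved
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels


# Hypothetical customer features: [transaction frequency, average sales amount].
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal([2, 10], 0.5, size=(30, 2)),
    rng.normal([10, 40], 0.5, size=(30, 2)),
    rng.normal([20, 15], 0.5, size=(30, 2)),
])
medoids, labels = k_medoids(X, k=3)
print("medoid rows:", medoids)
print("cluster sizes:", np.bincount(labels, minlength=3))
```

Unlike k-means centroids, each medoid is an actual row of the data, so a cluster can be described by a real representative customer.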

2. Decision Tree Regressor

Purpose: Sales prediction and feature importance analysis
Target Variable: Sales amount/revenue
Features: Product categories, customer segments, temporal factors
Output: Predicted sales values and feature importance rankings
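
A short sketch of how the regression step could look with scikit-learn's `DecisionTreeRegressor`. The feature names and the synthetic target below are assumptions for illustration only; they are not the actual Foodmart columns or the project's results.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 500
# Hypothetical encoded features; not the actual Foodmart columns.
feature_names = ["product_category", "customer_segment", "month"]
X = np.column_stack([
    rng.integers(0, 5, n),
    rng.integers(0, 3, n),
    rng.integers(1, 13, n),
])
# Synthetic target that depends mostly on product category.
y = 10.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(0, 1, n)

tree = DecisionTreeRegressor(max_depth=5, random_state=0)
tree.fit(X, y)

# Predicted sales values for the first five rows.
predictions = tree.predict(X[:5])

# Rank features by their impurity-based importance.
ranking = sorted(
    zip(feature_names, tree.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```

The importances sum to 1, so each value can be read as that feature's share of the total variance reduction achieved by the tree.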

Getting Started

Prerequisites

pip install pandas numpy scikit-learn matplotlib seaborn jupyter

Installation

Clone the repository:

git clone https://github.com/yourusername/foodmart-data-mining.git
cd foodmart-data-mining

Install required dependencies:

pip install -r requirements.txt

Download the Foodmart dataset and place it in the data/ directory

Usage

Data Preprocessing:

python scripts/data_preprocessing.py

K-medoids Clustering:

python scripts/kmedoids_clustering.py

Decision Tree Analysis:

python scripts/decision_tree_regression.py

Run Complete Analysis:

python main.py

Results

K-medoids Clustering Results

Optimal Clusters: X clusters identified using silhouette analysis
Customer Segments:
- High-value customers
- Frequent buyers
- Seasonal shoppers
- Occasional purchasers
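
One way to run a silhouette analysis is to score several candidate cluster counts and keep the best one. The sketch below uses scikit-learn's `silhouette_score`. Because scikit-learn itself does not ship a k-medoids estimator, `KMeans` is used here purely as a stand-in to produce labels, and the data is synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(1)
# Synthetic stand-in for scaled customer features, built from three blobs.
X = np.vstack([
    rng.normal([0, 0], 0.4, size=(40, 2)),
    rng.normal([5, 5], 0.4, size=(40, 2)),
    rng.normal([0, 5], 0.4, size=(40, 2)),
])

scores = {}
for k in range(2, 7):
    # KMeans is a stand-in here; swap in the project's k-medoids labels.
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The candidate with the highest mean silhouette is taken as the cluster count.
best_k = max(scores, key=scores.get)
print({k: round(float(s), 3) for k, s in scores.items()})
print("best k:", best_k)
```

Silhouette values range from -1 to 1; values near 1 mean points sit well inside their own cluster and far from the neighbouring one.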

Decision Tree Regressor Performance

Mean Squared Error: 1.86
R² Score: 0.85
Best Parameters: {'max_depth': 7, 'min_samples_leaf': 4, 'min_samples_split': 10}
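
Metrics and best parameters of this kind are typically obtained from a cross-validated grid search. The following is a sketch under that assumption, using scikit-learn's `GridSearchCV` on synthetic data. The parameter grid is hypothetical and the numbers it prints will not match the figures reported above.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
# Synthetic regression data standing in for the prepared sales features.
X = rng.uniform(0, 10, size=(600, 3))
y = 3.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(0, 0.5, 600)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hypothetical grid; it only needs to contain the values being compared.
param_grid = {
    "max_depth": [3, 5, 7],
    "min_samples_leaf": [1, 4],
    "min_samples_split": [2, 10],
}
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=5,
)
search.fit(X_train, y_train)

# Evaluate the refitted best model on the held-out split.
y_pred = search.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("best parameters:", search.best_params_)
print(f"MSE: {mse:.2f}  R2: {r2:.2f}")
```

Scoring on a held-out test split, rather than on the training data, keeps the reported MSE and R² from being inflated by overfitting.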

Key Insights

Most important features for sales prediction
Customer segment characteristics
Seasonal patterns in sales data
Product category performance

Visualizations

The project generates several visualizations:

Cluster visualization using PCA
Decision tree structure
Feature importance plots
Sales prediction accuracy charts
Customer segment analysis
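
As an illustration of the PCA cluster view listed above, the sketch below projects a feature matrix onto its first two principal components and defines a small plotting helper. The function names `project_to_2d` and `plot_clusters`, the feature matrix, and the labels are all assumptions made for this example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def project_to_2d(X):
    """Standardize the features, then keep the first two principal components."""
    scaled = StandardScaler().fit_transform(X)
    pca = PCA(n_components=2)
    return pca.fit_transform(scaled), pca.explained_variance_ratio_


def plot_clusters(coords, labels, path=None):
    """Scatter the 2-D projection, colored by cluster label."""
    import matplotlib.pyplot as plt  # imported here so the projection works without it

    fig, ax = plt.subplots()
    ax.scatter(coords[:, 0], coords[:, 1], c=labels, cmap="viridis", s=15)
    ax.set_xlabel("PC 1")
    ax.set_ylabel("PC 2")
    ax.set_title("Customer clusters (PCA projection)")
    if path is not None:
        fig.savefig(path)
    return fig


rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))      # hypothetical customer feature matrix
labels = rng.integers(0, 3, 120)   # hypothetical cluster assignments
coords, explained = project_to_2d(X)
print(coords.shape, explained.round(3))
```

Calling `plot_clusters(coords, labels, path="clusters.png")` would then write the scatter plot to a file. The explained variance ratio is worth reporting alongside the plot, since a low value means the 2-D view hides much of the structure.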

Project Structure

foodmart-data-mining/
│
├── data/
│   ├── raw/                 # Raw Foodmart dataset
│   └── processed/           # Cleaned and preprocessed data
│
├── notebooks/
│   ├── exploratory_analysis.ipynb
│   ├── clustering_analysis.ipynb
│   └── regression_analysis.ipynb
│
├── scripts/
│   ├── data_preprocessing.py
│   ├── kmedoids_clustering.py
│   ├── decision_tree_regression.py
│   └── visualization.py
│
├── results/
│   ├── models/              # Saved models
│   ├── plots/               # Generated visualizations
│   └── reports/             # Analysis reports
│
├── requirements.txt
├── main.py
└── README.md

Key Features

Robust Data Preprocessing: Handles missing values, outliers, and data normalization
Optimal Clustering: Uses silhouette analysis to determine optimal number of clusters
Model Evaluation: Comprehensive evaluation metrics for both clustering and regression
Interactive Visualizations: Clear plots and charts for result interpretation
Scalable Code: Modular design for easy extension and modification
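
The preprocessing steps named above (missing values, outliers, normalization) could be sketched with pandas as follows. The median imputation, the 1.5 × IQR clipping rule, min-max scaling, and the column names are assumptions chosen for illustration; the project's actual choices may differ.

```python
import numpy as np
import pandas as pd


def preprocess(df, numeric_cols):
    """Impute missing values, clip outliers, and scale numeric columns to [0, 1]."""
    out = df.copy()
    for col in numeric_cols:
        # Fill missing values with the column median.
        out[col] = out[col].fillna(out[col].median())
        # Clip values that fall more than 1.5 * IQR outside the quartiles.
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        out[col] = out[col].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
        # Min-max normalization; a constant column maps to 0.
        span = out[col].max() - out[col].min()
        out[col] = (out[col] - out[col].min()) / span if span > 0 else 0.0
    return out


# Hypothetical columns; the real Foodmart schema may differ.
raw = pd.DataFrame({
    "store_sales": [10.0, 12.0, np.nan, 11.0, 500.0, 9.0],
    "unit_sales": [1, 2, 3, 4, 5, np.nan],
})
clean = preprocess(raw, ["store_sales", "unit_sales"])
print(clean)
```

Scaling matters for the clustering step in particular, because distance-based methods such as k-medoids would otherwise be dominated by whichever feature has the largest numeric range.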

Requirements

pandas>=1.3.0
numpy>=1.21.0
scikit-learn>=1.0.0
matplotlib>=3.4.0
seaborn>=0.11.0
jupyter>=1.0.0

Acknowledgments

Foodmart dataset providers
Scikit-learn community
Open source data mining community

This project demonstrates the application of unsupervised and supervised learning techniques for retail data analysis and business intelligence.
