🧠 Predict2Protect – Company Failure Prediction Challenge

This repository contains my solution for the Predict2Protect Machine Learning Competition, where the goal was to predict whether a company is Alive or Failed based on its financial and operational features.

🏁 Competition Overview

The challenge involved developing a classification model that predicts a company’s status_label (Alive / Failed).
Data consisted of company-level records, each belonging to a specific Division and MajorGroup.
The evaluation metric was Macro F1-Score, which averages the per-class F1 scores with equal weight, so performance on the minority class counts as much as performance on the majority class.
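
To make the metric concrete, here is a minimal sketch of Macro F1 computed by hand with NumPy. The integer encoding (0 = Alive, 1 = Failed) and the toy labels are assumptions for illustration, not the competition data.

```python
import numpy as np

def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of the per-class F1 scores."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores = []
    for label in labels:
        tp = np.sum((y_pred == label) & (y_true == label))
        fp = np.sum((y_pred == label) & (y_true != label))
        fn = np.sum((y_pred != label) & (y_true == label))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))

# Toy labels (assumed encoding: 0 = Alive, 1 = Failed), 8 Alive and 2 Failed.
y_true = [0] * 8 + [1] * 2
all_alive = [0] * 10  # a model that never predicts Failed

accuracy = float(np.mean(np.array(y_true) == np.array(all_alive)))
score = macro_f1(y_true, all_alive)
print(f"accuracy={accuracy:.3f}  macro_f1={score:.3f}")
```

On this toy input the always-Alive predictor reaches 80% accuracy but a Macro F1 of only about 0.44, because the Failed class contributes an F1 of zero to the average.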

🚀 Approach Summary

1. Data Understanding & EDA

Checked for missing values, data imbalance, and outliers.
Verified no company overlap between train and test (leakage prevention).
Analyzed Division and MajorGroup hierarchies (noting class imbalance).
Ensured that every company’s status_label is consistent (all rows share the same label).
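
The leakage and consistency checks above can be sketched with pandas roughly as follows. The toy frames are made up, and the column names company_name and status_label are taken from this README rather than confirmed against the actual CSV files.

```python
import pandas as pd

# Toy frames standing in for train.csv and test.csv (values are illustrative).
train = pd.DataFrame({
    "company_name": ["A", "A", "B", "B", "C"],
    "status_label": ["Alive", "Alive", "Failed", "Failed", "Alive"],
})
test = pd.DataFrame({"company_name": ["D", "D", "E"]})

# 1. Companies present in both splits (expected to be empty).
overlap = set(train["company_name"]) & set(test["company_name"])

# 2. Companies whose rows carry more than one label (expected to be empty).
labels_per_company = train.groupby("company_name")["status_label"].nunique()
inconsistent = labels_per_company[labels_per_company > 1].index.tolist()

# 3. Class balance at the row level and at the company level.
row_balance = train["status_label"].value_counts(normalize=True)
company_balance = (
    train.drop_duplicates("company_name")["status_label"]
    .value_counts(normalize=True)
)

print("overlap:", overlap)
print("inconsistent companies:", inconsistent)
print(row_balance, company_balance, sep="\n")
```

Looking at the balance per company as well as per row is useful here because companies with many rows can make the row-level class ratio differ from the company-level one.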

2. Feature Engineering

Generated ratio-based and interaction features.
Aggregated statistics at the company level.
Applied label encoding for categorical columns.
Removed redundant or highly correlated features.
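
As an illustration of these four steps, the sketch below builds a few features on a toy frame. The financial column names (total_assets, total_liabilities, net_income), the derived feature names, and the 0.95 correlation cutoff are all hypothetical; the README does not state which features or cutoff were actually used.

```python
import numpy as np
import pandas as pd

# Toy data; column names other than company_name and Division are hypothetical.
df = pd.DataFrame({
    "company_name": ["A", "A", "B", "B", "C", "C"],
    "Division": ["D", "D", "G", "G", "I", "I"],
    "total_assets": [100.0, 120.0, 50.0, 0.0, 200.0, 210.0],
    "total_liabilities": [40.0, 60.0, 45.0, 30.0, 50.0, 150.0],
    "net_income": [10.0, 12.0, -5.0, -8.0, 30.0, 25.0],
})

# Ratio features; a zero denominator becomes NaN instead of inf.
assets = df["total_assets"].replace(0, np.nan)
df["debt_ratio"] = df["total_liabilities"] / assets
df["equity_ratio"] = (assets - df["total_liabilities"]) / assets
df["roa"] = df["net_income"] / assets

# Interaction feature: product of two ratios.
df["debt_x_roa"] = df["debt_ratio"] * df["roa"]

# Company-level aggregate, broadcast back to every row of the company.
df["net_income_mean"] = df.groupby("company_name")["net_income"].transform("mean")

# Label-encode a categorical column as integer codes.
df["Division_code"] = df["Division"].astype("category").cat.codes

def drop_correlated(frame, cutoff=0.95):
    """Drop the later column of every pair with |correlation| above cutoff."""
    corr = frame.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > cutoff).any()]
    return frame.drop(columns=to_drop), to_drop

ratios = df[["debt_ratio", "equity_ratio", "roa"]]
reduced, dropped = drop_correlated(ratios)
print("dropped:", dropped)
```

In this toy frame equity_ratio equals 1 - debt_ratio, so the two are perfectly correlated and the pruning step removes the later one.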

3. Model Building

Implemented LightGBM, CatBoost, XGBoost, and ExtraTrees models.
Used GroupKFold Cross-Validation (grouped by company_name) to avoid data leakage.
Optimized Macro F1-Score via threshold tuning.
Final predictions were created by weighted averaging of the model outputs.
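
The validation and blending loop can be sketched as below, under several assumptions: the data is synthetic, two scikit-learn tree ensembles stand in for the four models named above (to keep the example dependency-light), the blend weights are arbitrary, and 1 is assumed to encode Failed. It is not the notebook's actual code.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)

# Synthetic stand-in data: 60 companies x 5 rows, one label per company.
n_companies, rows_per_company = 60, 5
groups = np.repeat(np.arange(n_companies), rows_per_company)
company_label = rng.integers(0, 2, size=n_companies)  # 1 = Failed (assumed)
y = company_label[groups]
X = rng.normal(size=(len(y), 4)) + 0.8 * y[:, None]  # weak class signal

# Stand-in models and illustrative blend weights (both are assumptions).
models = {
    "extra_trees": ExtraTreesClassifier(n_estimators=50, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}
weights = {"extra_trees": 0.5, "random_forest": 0.5}

# Out-of-fold probabilities: each company sits entirely in one validation fold.
oof = {name: np.zeros(len(y)) for name in models}
for train_idx, valid_idx in GroupKFold(n_splits=5).split(X, y, groups):
    assert set(groups[train_idx]).isdisjoint(groups[valid_idx])
    for name, model in models.items():
        model.fit(X[train_idx], y[train_idx])
        oof[name][valid_idx] = model.predict_proba(X[valid_idx])[:, 1]

# Weighted average of the per-model Failed probabilities.
blend = sum(weights[name] * oof[name] for name in models)

# Pick the threshold that maximizes Macro F1 on the out-of-fold predictions.
thresholds = np.linspace(0.05, 0.95, 19)
scores = [
    f1_score(y, (blend >= t).astype(int), average="macro") for t in thresholds
]
best_threshold = float(thresholds[int(np.argmax(scores))])
print(f"best threshold={best_threshold:.2f}  macro_f1={max(scores):.3f}")
```

A score tuned and measured on the same out-of-fold predictions is somewhat optimistic, so the chosen threshold is best confirmed on data that played no part in selecting it.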

4. Post-Processing

Enforced company-level label consistency: if any row of a company was predicted as Failed, every row of that company was marked as Failed.
Calibrated thresholds to maximize F1 on validation folds.
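
A minimal pandas sketch of the consistency rule follows. The failed_proba column, the intermediate column names, and the 0.5 threshold are hypothetical placeholders; in practice the threshold would come from the validation folds.

```python
import pandas as pd

# Toy row-level predictions; failed_proba is a hypothetical column name.
pred = pd.DataFrame({
    "company_name": ["A", "A", "A", "B", "B", "C"],
    "failed_proba": [0.20, 0.70, 0.10, 0.30, 0.40, 0.55],
})
threshold = 0.5  # illustrative value, not the tuned one

# Row-level decision first.
pred["row_failed"] = (pred["failed_proba"] >= threshold).astype(int)

# "Any row failed => company failed": broadcast the per-company max to all rows.
pred["company_failed"] = (
    pred.groupby("company_name")["row_failed"].transform("max")
)
pred["status_label"] = pred["company_failed"].map({0: "Alive", 1: "Failed"})
print(pred)
```

Company A has a single row above the threshold, so all three of its rows end up labelled Failed. Because the rule takes a maximum, companies with more rows have more chances to be flagged, which is one reason to tune the threshold with this step already applied.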

📊 Results

| Metric | Score |
| --- | --- |
| Cross-Validation Macro F1 | 0.63 ± 0.01 |
| Final Leaderboard Macro F1 | 0.635972 |
| Final Rank | #14 / Top 15 Finalist |

💬 Key Learnings

Importance of group-aware validation to prevent leakage.
Effectiveness of label consistency enforcement at company level.
Balancing precision and recall through threshold optimization.
Robust pipeline design beats leaderboard hacks in the long run.

🔮 Future Work

Advanced hyperparameter optimization (Optuna/Bayesian tuning).
Seed aggregation and model ensembling for variance reduction.
Hierarchical encoding of Division → MajorGroup relationships.

🧰 Tech Stack

Python, Pandas, NumPy
LightGBM, CatBoost, XGBoost, Scikit-Learn
Matplotlib, Seaborn, tqdm

🧑‍💻 Author

Devkanth Ravi
Top-15 Finalist, Predict2Protect 2025
📫 Connect: LinkedIn | Kaggle

“A consistent, explainable model always outlasts a lucky leaderboard spike.”
