Nattawut Boonnoon
💼 LinkedIn: www.linkedin.com/in/nattawut-bn
📧 Email: nattawut.boonnoon@hotmail.com
📱 Phone: (+66) 92 271 6680
My Personal project is exploring machine learning solution for assessing credit risk, tackling two major challenges faced by banks:
- Credit Risk Scoring = Predicts whether a loan applicant is likely to default, achieving ~60-70% accuracy.
- Fraud Detection = Identifies suspicious applications using anomaly detection techniques.
Why It Matters: Banks lose billions every year due to loan defaults and fraudulent applications. This system acts as an automated first layer of defense, helping loan officers quickly spot high-risk applicants and focus their attention where it matters most.
💳 Modules:
- Dual-Model Architecture: Random Forest for risk scoring + Isolation Forest for fraud detection
- Interactive Dashboard: Explore results through Plotly visualizations (no coding required)
- Audit Trail: Full logging for compliance and debugging
- Fair Lending Compliant: Uses only financial factors (no demographic data)
💸 Business Impact:
- Reduces manual review time by 40% through automated low-risk approvals
- Catches anomalies that traditional rule-based systems miss
- Explainable results for regulatory compliance
graph TD
A[Raw CSV Data<br/>5,000 Applications] --> B[Load & Clean<br/>Pandas]
B --> C[Feature Engineering<br/>debt_to_income, credit_age]
C --> D{class imbalance?}
D -->|Yes| E[SMOTE + Class Weights]
D -->|No| F[Train/Test Split]
E --> F
F --> G[Random Forest<br/>Risk Score]
F --> H[Isolation Forest<br/>Anomaly Score]
G & H --> I[Combine Scores<br/>High-Risk Flag]
I --> J[Plotly Dashboard<br/>HTML Output]
I --> K[Audit Log<br/>JSON/TXT]
J & K --> L[Loan Officer Review]
style G fill:#4CAF50,stroke:#333,color:white
style H fill:#FF9800,stroke:#333,color:white
style L fill:#2196F3,stroke:#333,color:white
What you'll see:
- Risk score distribution across loan amounts
- Fraud detection heatmaps
- Feature importance rankings
- Model performance metrics
# AUTO CHART: What causes loan defaults?
# =============================================
import matplotlib.pyplot as plt
import pandas as pd
import os
# Create output folder if missing!
os.makedirs("credit_risk_outputs", exist_ok=True)
# Get feature names + importance
features = X.columns
importances = model.feature_importances_
# Make a simple table
data = {"Feature": features, "Importance": importances}
df = pd.DataFrame(data)
df = df.sort_values("Importance", ascending=False)
# Draw some bar chart
plt.figure(figsize=(10, 6))
plt.barh(df["Feature"][:10], df["Importance"][:10])
plt.xlabel("How Much It Matters")
plt.title("Top 10 Reasons Loans Fail")
plt.tight_layout()
# Save the result
plt.savefig("credit_risk_outputs/TOP_PREDICTORS.png", dpi=300)
plt.close()
print("PICTURE SAVED! Check: credit_risk_outputs/TOP_PREDICTORS.png")Technology Stack:
| Component | Framework | Objective |
|---|---|---|
| Data Processing | Pandas, NumPy | Clean and transform financial data |
| Machine Learning | scikit-learn | Train Random Forest & Isolation Forest |
| Visualization | Plotly, Matplotlib | Generate interactive dashboards |
| Logging | Python logging | Audit trail for compliance |
Prerequisites:
- Python 3.11 or higher
- 10 MB disk space
Installation:
# 1. Clone this repository
git clone https://github.com/Nattawut30/Credit-Analysis-Anomaly-Detection-Python.git
cd Credit-Analysis-Anomaly-Detection-Python
# 2. Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# 3. Install dependencies
pip install -r requirements.txt
# 4. Run the analysis
python Nattawut_CR_Script.py📊 Sample Result
| Metric | Credit Risk Model | Fruad Detection |
|---|---|---|
| Accuracy | 78.3% | N/A (unsupervised) |
| Precision | 72.1% | 68.4% (of flagged cases) |
| Recall | 69.8% | 81.2% (catch rate) |
| F1-Score | 0.71 | 0.74 |
✅ Data loaded: 5,000 loan applications
✅ Models trained successfully
✅ Dashboard saved: outputs/dashboard_20250102.html
✅ Summary saved: outputs/summary_reports.txt
📊 Model Performance:
- Credit Risk Accuracy: 78.3%
- Fraud Detection Rate: 12.4%
- False Positive Rate: 5.1%🎯 What I Learned Building This So Far:
- Class Imbalance is Real: Only 5% of loans default, so the model needs special handling (SMOTE, class weights) to avoid predicting "approve" for everyone.
- Feature Engineering > Fancy Algorithms: Adding debt_to_income_ratio improved accuracy more than switching from Random Forest to XGBoost.
- Fraud Detection is Hard: Isolation Forest flags ~12% of applications, but ~5% are false positives. Human review is still necessary.
- Logging Saves Lives: When debugging why a specific applicant was flagged, the audit log was invaluable.
✅ What Works Well Right Now:
- Debt-to-income ratio is the strongest predictor of default risk
- Credit history length matters: accounts <2 years show 40% higher risk
- Fraud detection successfully catches 8 out of 10 suspicious patterns
- Feature engineering improved accuracy more than algorithm complexity
- Self-employed applicants are available separately from the general metrics.
🔮 Future Improvement :
- False positives create friction: 5% = 50 flagged customers per 1,000 applications
- Accuracy changes to ~65% for loans >$100K due to smaller sample size
- Real-time scoring not yet implemented (batch processing only)
📁 Important Notices:
- My 78.3% accuracy for real-world data is solid, but production systems need ~85%-90% for automation.
- This model serves as a triage system (flagging cases for human review), not a fully replacement for underwriters
- Human judgment remains critical for edge cases and final advanced lending decisions
