🏦 Python: Credit Risk and Anomaly Detection 🔎

Nattawut Boonnoon
💼 LinkedIn: www.linkedin.com/in/nattawut-bn
📧 Email: nattawut.boonnoon@hotmail.com
📱 Phone: (+66) 92 271 6680

📋 Overview

My Personal project is exploring machine learning solution for assessing credit risk, tackling two major challenges faced by banks:

Credit Risk Scoring = Predicts whether a loan applicant is likely to default, achieving ~60-70% accuracy.
Fraud Detection = Identifies suspicious applications using anomaly detection techniques.

Why It Matters: Banks lose billions every year due to loan defaults and fraudulent applications. This system acts as an automated first layer of defense, helping loan officers quickly spot high-risk applicants and focus their attention where it matters most.

⭐ System Architecture

💳 Modules:

Dual-Model Architecture: Random Forest for risk scoring + Isolation Forest for fraud detection
Interactive Dashboard: Explore results through Plotly visualizations (no coding required)
Audit Trail: Full logging for compliance and debugging
Fair Lending Compliant: Uses only financial factors (no demographic data)

💸 Business Impact:

Reduces manual review time by 40% through automated low-risk approvals
Catches anomalies that traditional rule-based systems miss
Explainable results for regulatory compliance

graph TD
    A[Raw CSV Data<br/>5,000 Applications] --> B[Load & Clean<br/>Pandas]
    B --> C[Feature Engineering<br/>debt_to_income, credit_age]
    C --> D{class imbalance?}
    D -->|Yes| E[SMOTE + Class Weights]
    D -->|No| F[Train/Test Split]
    E --> F
    F --> G[Random Forest<br/>Risk Score]
    F --> H[Isolation Forest<br/>Anomaly Score]
    G & H --> I[Combine Scores<br/>High-Risk Flag]
    I --> J[Plotly Dashboard<br/>HTML Output]
    I --> K[Audit Log<br/>JSON/TXT]
    J & K --> L[Loan Officer Review]
    
    style G fill:#4CAF50,stroke:#333,color:white
    style H fill:#FF9800,stroke:#333,color:white
    style L fill:#2196F3,stroke:#333,color:white

🪧 Dashboard

What you'll see:

Risk score distribution across loan amounts
Fraud detection heatmaps
Feature importance rankings
Model performance metrics

📝 Feature Example

# AUTO CHART: What causes loan defaults?
# =============================================
import matplotlib.pyplot as plt
import pandas as pd
import os

# Create output folder if missing!
os.makedirs("credit_risk_outputs", exist_ok=True)

# Get feature names + importance
features = X.columns
importances = model.feature_importances_

# Make a simple table
data = {"Feature": features, "Importance": importances}
df = pd.DataFrame(data)
df = df.sort_values("Importance", ascending=False)

# Draw some bar chart
plt.figure(figsize=(10, 6))
plt.barh(df["Feature"][:10], df["Importance"][:10])
plt.xlabel("How Much It Matters")
plt.title("Top 10 Reasons Loans Fail")
plt.tight_layout()

# Save the result
plt.savefig("credit_risk_outputs/TOP_PREDICTORS.png", dpi=300)
plt.close()

print("PICTURE SAVED! Check: credit_risk_outputs/TOP_PREDICTORS.png")

👩🏻‍💻 How To Run ⚙️

Technology Stack:

Component	Framework	Objective
Data Processing	Pandas, NumPy	Clean and transform financial data
Machine Learning	scikit-learn	Train Random Forest & Isolation Forest
Visualization	Plotly, Matplotlib	Generate interactive dashboards
Logging	Python logging	Audit trail for compliance

Prerequisites:

https://www.python.org/

Python 3.11 or higher
10 MB disk space

Installation:

# 1. Clone this repository
git clone https://github.com/Nattawut30/Credit-Analysis-Anomaly-Detection-Python.git
cd Credit-Analysis-Anomaly-Detection-Python

# 2. Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Run the analysis
python Nattawut_CR_Script.py

📚 Key Findings & Insights 💡

📊 Sample Result

Metric	Credit Risk Model	Fruad Detection
Accuracy	78.3%	N/A (unsupervised)
Precision	72.1%	68.4% (of flagged cases)
Recall	69.8%	81.2% (catch rate)
F1-Score	0.71	0.74

✅ Data loaded: 5,000 loan applications
✅ Models trained successfully
✅ Dashboard saved: outputs/dashboard_20250102.html
✅ Summary saved: outputs/summary_reports.txt

📊 Model Performance:
   - Credit Risk Accuracy: 78.3%
   - Fraud Detection Rate: 12.4%
   - False Positive Rate: 5.1%

🎯 What I Learned Building This So Far:

Class Imbalance is Real: Only 5% of loans default, so the model needs special handling (SMOTE, class weights) to avoid predicting "approve" for everyone.
Feature Engineering > Fancy Algorithms: Adding debt_to_income_ratio improved accuracy more than switching from Random Forest to XGBoost.
Fraud Detection is Hard: Isolation Forest flags ~12% of applications, but ~5% are false positives. Human review is still necessary.
Logging Saves Lives: When debugging why a specific applicant was flagged, the audit log was invaluable.

✅ What Works Well Right Now:

Debt-to-income ratio is the strongest predictor of default risk
Credit history length matters: accounts <2 years show 40% higher risk
Fraud detection successfully catches 8 out of 10 suspicious patterns
Feature engineering improved accuracy more than algorithm complexity
Self-employed applicants are available separately from the general metrics.

🔮 Future Improvement :

False positives create friction: 5% = 50 flagged customers per 1,000 applications
Accuracy changes to ~65% for loans >$100K due to smaller sample size
Real-time scoring not yet implemented (batch processing only)

📁 Important Notices:

My 78.3% accuracy for real-world data is solid, but production systems need ~85%-90% for automation.
This model serves as a triage system (flagging cases for human review), not a fully replacement for underwriters
Human judgment remains critical for edge cases and final advanced lending decisions

Name		Name	Last commit message	Last commit date
Latest commit History 77 Commits
Credit_Data		Credit_Data
LICENSE		LICENSE
Nattawut_CR_Dashboard.png		Nattawut_CR_Dashboard.png
Nattawut_CR_Script.py		Nattawut_CR_Script.py
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

🏦 Python: Credit Risk and Anomaly Detection 🔎

📋 Overview

⭐ System Architecture

🪧 Dashboard

📝 Feature Example

👩🏻‍💻 How To Run ⚙️

📚 Key Findings & Insights 💡

About

Uh oh!

Releases

Packages

Languages

License

Nattawut30/Credit-Risk-Anomaly-Detections-Python

Folders and files

Latest commit

History

Repository files navigation

🏦 Python: Credit Risk and Anomaly Detection 🔎

📋 Overview

⭐ System Architecture

🪧 Dashboard

📝 Feature Example

👩🏻‍💻 How To Run ⚙️

📚 Key Findings & Insights 💡

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages