This project predicts the likelihood of heart disease from structured medical data using Machine Learning techniques. Multiple classification algorithms were implemented and evaluated to compare their performance and identify the most effective prediction model.
The project demonstrates the application of Machine Learning in healthcare analytics for assisting in early disease risk detection and supporting data-driven medical decision-making.
Objectives:
- Analyze healthcare data using Machine Learning techniques
- Predict heart disease risk based on patient medical attributes
- Compare multiple classification algorithms
- Evaluate models using accuracy, precision, recall, F1-score, confusion matrices, ROC-AUC, and cross-validation
- Identify important healthcare features influencing predictions
The dataset used in this project contains patient medical information such as:
- Age
- Sex
- Chest Pain Type
- Cholesterol Level
- Blood Pressure
- Maximum Heart Rate
- Exercise-Induced Angina
- ST Depression (Oldpeak)
- Number of Major Vessels
- Thalassemia
Target Variable:
- Presence or absence of heart disease
Technologies used:
- Python
- Jupyter Notebook
- Pandas
- NumPy
- Matplotlib
- Seaborn
- Scikit-learn
- XGBoost
The following classification algorithms were implemented and compared:
- Logistic Regression
- Support Vector Machine (SVM)
- Random Forest Classifier
- XGBoost Classifier
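
As a rough illustration of how the four classifiers could be set up side by side, the sketch below collects them in a dictionary. The `build_models` helper and all hyperparameter values are assumptions made for this example, not the settings used in the notebook. XGBoost is imported lazily so the sketch still runs when that package is not installed.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC


def build_models(random_state=42):
    """Return a name -> estimator dict (hypothetical helper, assumed settings)."""
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        # probability=True enables predict_proba, which ROC-AUC analysis can use
        "SVM": SVC(probability=True, random_state=random_state),
        "Random Forest": RandomForestClassifier(
            n_estimators=200, random_state=random_state
        ),
    }
    try:
        # Optional dependency: only added when xgboost is installed
        from xgboost import XGBClassifier

        models["XGBoost"] = XGBClassifier(random_state=random_state)
    except ImportError:
        pass
    return models


if __name__ == "__main__":
    for name, estimator in build_models().items():
        print(name, type(estimator).__name__)
```

Keeping the estimators in one dictionary makes it straightforward to loop over them with the same training and evaluation code.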
Project workflow:
- Data Collection
- Data Cleaning and Preprocessing
- Exploratory Data Analysis (EDA)
- Feature Selection
- Train-Test Split
- Feature Scaling
- Model Training
- Model Evaluation
- Cross Validation
- ROC-AUC Analysis
- Feature Importance Analysis
- Healthcare Insights and Conclusion
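
The core of the workflow above (train-test split, feature scaling, model training, evaluation, and cross validation) can be sketched as follows. This is a minimal example on synthetic data generated with `make_classification`; the sample count, feature count, split ratio, and the choice of SVM are assumptions for illustration and do not reproduce the notebook's actual data or results.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the patient table (assumed: 300 rows, 13 features)
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=42
)

# Hold out 20% of the rows, keeping the class ratio the same in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scaling lives inside the pipeline so it is fit on training data only
model = make_pipeline(StandardScaler(), SVC())
model.fit(X_train, y_train)

test_accuracy = model.score(X_test, y_test)

# 5-fold cross validation on the training portion
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")

print(f"Test accuracy: {test_accuracy:.3f}")
print(f"CV accuracy:   {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
```

Placing `StandardScaler` inside the pipeline means each cross-validation fold is scaled using only its own training rows, which avoids leaking information from the validation rows.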
The models were evaluated using:
- Accuracy Score
- Precision
- Recall
- F1-Score
- Confusion Matrix
- ROC-AUC Curve
- Cross Validation Accuracy
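
To make the metrics above concrete, here is a small worked example using scikit-learn's metric functions. The eight labels and scores are made up for illustration and are not taken from the project's data.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Toy ground truth (1 = disease present) and predicted probabilities
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_score = [0.9, 0.2, 0.4, 0.8, 0.1, 0.7, 0.6, 0.3]

# Hard predictions from a 0.5 threshold
y_pred = [1 if s >= 0.5 else 0 for s in y_score]

results = {
    "accuracy": accuracy_score(y_true, y_pred),    # 6 of 8 correct -> 0.75
    "precision": precision_score(y_true, y_pred),  # 3 TP / (3 TP + 1 FP) -> 0.75
    "recall": recall_score(y_true, y_pred),        # 3 TP / (3 TP + 1 FN) -> 0.75
    "f1": f1_score(y_true, y_pred),                # harmonic mean -> 0.75
    "roc_auc": roc_auc_score(y_true, y_score),     # 15 of 16 pairs ranked correctly
}

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]]
cm = confusion_matrix(y_true, y_pred)

print(results)
print(cm)
```

Note that ROC-AUC is computed from the continuous scores rather than the thresholded predictions, so it measures ranking quality independently of the 0.5 cutoff.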
| Model | Accuracy |
|---|---|
| Logistic Regression | 87.8% |
| Support Vector Machine (SVM) | 89.2% |
| Random Forest | 88.3% |
| XGBoost | 87.8% |
Support Vector Machine (SVM) achieved the highest accuracy (89.2%), followed by Random Forest (88.3%); Logistic Regression and XGBoost tied at 87.8%. SVM also showed balanced evaluation metrics.
Key insights:
- Chest pain type, oldpeak, and maximum heart rate were among the most influential features in predicting heart disease.
- The ensemble models, Random Forest (88.3%) and XGBoost (87.8%), performed within 1.4 percentage points of the best model.
- Machine Learning models can effectively assist in early disease risk assessment and healthcare analytics.
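
One common way to rank features like those mentioned above is the impurity-based importance exposed by a fitted Random Forest. The sketch below uses synthetic data with hypothetical column names; the target is deliberately constructed from a single column so that the ranking is easy to see, and it says nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical column names; values are random noise, not patient data
feature_names = ["chest_pain_type", "oldpeak", "max_heart_rate", "cholesterol"]
X = pd.DataFrame(rng.normal(size=(400, 4)), columns=feature_names)

# Synthetic target driven almost entirely by one column (illustration only)
y = (X["oldpeak"] + 0.1 * rng.normal(size=400) > 0).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# feature_importances_ sums to 1 across all features
importances = pd.Series(
    forest.feature_importances_, index=feature_names
).sort_values(ascending=False)

print(importances)
```

Impurity-based importances can favor high-cardinality features, so permutation importance (`sklearn.inspection.permutation_importance`) is a reasonable cross-check.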
To run the project:
- Clone the repository: `git clone https://github.com/your-username/CodeAlpha_AI_Powered_Healthcare_Analytics.git`
- Install required libraries: `pip install -r requirements.txt`
- Open the Jupyter Notebook: `jupyter notebook`
- Run all notebook cells
Future improvements:
- Deploy as a real-time web application
- Integrate Explainable AI (XAI)
- Train on larger healthcare datasets
- Add deep learning models for comparison
- Build a healthcare dashboard interface
Machine Learning Intern at Code Alpha