An analysis of cardiovascular risk prediction using machine learning techniques.
This project focuses on predicting the 10-year risk of cardiovascular disease using demographic, clinical, and laboratory data. Various machine learning algorithms are applied and evaluated for their performance in predicting cardiovascular risk.
- Age and Gender: Both are significant risk factors for coronary heart disease (CHD), with men being more likely to develop CHD than women.
- Smoking: Smoking is a risk factor for CHD, and smoking intensity plays a role in determining the risk.
- Clinical Variables: High blood pressure, stroke, and diabetes are associated with a higher risk of CHD.
- Laboratory Variables: Patients with high cholesterol levels may be at a slightly higher risk for CHD.
- Model Performance: The Random Forest Classifier and XGBoost models performed best, each reaching a test accuracy of about 0.90, with precision and recall between roughly 0.88 and 0.93.
- Accuracy Rate: The Random Forest Classifier achieved the highest test accuracy of all models, 90.36%.
- Python: The programming language used throughout the project.
- Pandas: Used for data manipulation and analysis.
- Matplotlib and Seaborn: Used to plot the data and visualize relationships between variables.
- Scikit-learn: Used to train and evaluate the machine learning models.
Model | Test Accuracy | Test Precision | Test Recall | Test ROC AUC |
---|---|---|---|---|
Logistic Regression | 0.6571 | 0.6273 | 0.6945 | 0.6587 |
Random Forest Classifier | 0.9036 | 0.8791 | 0.9255 | 0.9046 |
XGBoost | 0.9019 | 0.8951 | 0.9000 | 0.9018 |
KNN | 0.8194 | 0.7317 | 0.9818 | 0.8265 |
SVC | 0.7899 | 0.7369 | 0.8709 | 0.7934 |
NBClassifier | 0.5694 | 0.6985 | 0.1727 | 0.5523 |
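
The four metrics in the table can be computed from a model's predictions on the test set. The sketch below is a minimal, dependency-free illustration of their standard definitions, not the project's own evaluation code. The function names (`confusion_counts`, `classification_metrics`, `roc_auc`) and the toy labels and scores are hypothetical and chosen only for illustration.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 marks the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, and recall computed from hard 0/1 predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive case is scored above a random negative one.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg),
    which is fine for a small illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = develops CHD within 10 years, 0 = does not.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]                    # hard class predictions
scores = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1, 0.7, 0.3]    # predicted risk scores

print(classification_metrics(y_true, y_pred))
# {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75}
print(roc_auc(y_true, scores))
# 0.9375
```

Reading the table with these definitions in mind: KNN's recall of 0.9818 against a precision of 0.7317 means it catches almost all positive cases at the cost of more false alarms, while NBClassifier's recall of 0.1727 means it misses most of them.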
- Improved Risk Assessment: Machine learning models may provide more accurate predictions of cardiovascular risk than traditional risk assessment methods.
- Early Intervention: Early identification of individuals at high risk of cardiovascular disease allows for timely intervention and preventive measures.
- Personalized Medicine: Machine learning models can help tailor interventions and treatments based on individual risk profiles.
- Healthcare Resource Allocation: Predictive models can assist healthcare providers in allocating resources more efficiently by targeting high-risk individuals.
Special thanks to the Framingham Heart Study for providing the dataset used in this project.
This project was completed as part of the Data Science Trainee program at AlmaBetter.