From: Comparing different supervised machine learning algorithms for disease prediction
Reference | Disease predicted | Algorithms compared | Type of data | Number of subjects | Cross validation method | Prediction performance | Best one(s) |
---|---|---|---|---|---|---|---|
Aneja and Lal [38] | Asthma | ANN, NB | Disease symptom | 1024 | – | Accuracy (ANN = 85, NB = 88) | NB |
Ayer et al. [39] | Breast cancer | ANN, LR | Clinical and demographic data | 62,219 | 10-fold cross validation | AUC (ANN = 0.965, LR = 0.963) | ANN |
Ahmad et al. [18] | Breast cancer | ANN, DT, SVM | Clinical data for cancer incidence and survival | 1189 | 10-fold cross validation | Accuracy (ANN = 0.947, DT = 0.936, SVM = 0.957); Sensitivity (ANN = 0.956, DT = 0.958, SVM = 0.971); Specificity (ANN = 0.928, DT = 0.907, SVM = 0.945) | SVM |
Lundin et al. [40] | Breast cancer | ANN, LR | Clinical and demographic data | 951 | – | AUC (ANN = 0.909, LR = 0.897) | ANN |
Delen et al. [41] | Breast cancer | ANN, DT, LR | Clinical and demographic data | 202,932 | 10-fold cross validation | Accuracy (ANN = 0.909, DT = 0.935, LR = 0.894) | DT |
Yao et al. [8] | Breast cancer | DT, RF, SVM | Image data | 569 | 10-fold cross validation | Accuracy (DT = 0.932, RF = 0.963, SVM = 0.959) | RF |
Chen et al. [42] | Cerebral infarction | DT, KNN, NB | Electronic health records, medical image and gene data | 31,919 | 10-fold cross validation | AUC (DT = 0.646, KNN = 0.454, NB = 0.495) | DT |
Cai et al. [43] | Diabetes | LR, NB, SVM | Gut microbiota | 489 | 10-fold cross validation | AUC (LR = 0.98, NB = 0.94, SVM = 0.99) | SVM |
Malik et al. [44] | Diabetes | ANN, LR, SVM | Electrochemical measurements of saliva | 175 | 3-fold cross validation | Accuracy (ANN = 80.70, LR = 75.86, SVM = 84.09); F1 score (ANN = 80.20, LR = 75.71, SVM = 84.06) | SVM |
Farran [17] | Diabetes | KNN, LR, SVM | Demographic, anthropometric, vital signs, diagnostic and clinical lab measurement data | 10,632 | 5-fold cross validation | Accuracy (KNN = 79.5, LR = 80.7, SVM = 82.6) | SVM |
Mani et al. [45] | Diabetes | KNN, LR, NB, RF, SVM | Demographic and clinical test result | 2280 | 5-fold cross validation | AUC (KNN = 0.721, LR = 0.755, NB = 0.762, RF = 0.803, SVM = 0.749) | RF |
Tapak et al. [46] | Diabetes | ANN, LR, RF, SVM | Demographic, anthropometric, diagnostic and clinical lab measurement data | 6500 | 10-fold cross validation | Accuracy (ANN = 0.931, LR = 0.935, RF = 0.930, SVM = 0.986); AUC (ANN = 0.751, LR = 0.763, RF = 0.717, SVM = 0.979) | SVM |
Sisodia and Sisodia [47] | Diabetes | DT, NB, SVM | Clinical test result | 768 | 10-fold cross validation | Accuracy (DT = 0.738, NB = 0.763, SVM = 0.651) | NB |
Yang et al. [48] | Diabetes | RF, SVM | Clinical and gene expression data | 9343 | 10-fold cross validation | Accuracy (RF = 0.742, SVM = 0.723) | RF |
Juhola et al. [49] | Heart disease | KNN, RF, SVM | Signal data | – | – | Accuracy (KNN = 84.5, RF = 87.6, SVM = 87.1) | RF |
Long et al. [50] | Heart disease | ANN, NB, SVM | Clinical, demographic and image data | 537 | – | Accuracy (ANN = 77.8, NB = 83.3, SVM = 75.9) | NB |
Palaniappan and Awang [21] | Heart disease | ANN, DT, NB | Clinical and demographic data | 909 | 2-fold cross validation | Accuracy (ANN = 85.682, DT = 78.8334, NB = 87.885) | NB |
Jin et al. [51] | Heart disease | LR, RF | Electronic health records | 20,000 | 5-fold cross validation | AUC (LR = 0.663, RF = 0.627) | LR |
Puyalnithi and Viswanatham [52] | Heart disease | DT, NB, RF, SVM | Clinical and demographic data | 746 | k-fold and leave-one-out | AUC (DT = 0.940, NB = 0.942, RF = 0.917, SVM = 0.731) | NB |
Forssen et al. [53] | Heart disease | LR, RF | Metabolomic data | 3409 | 50-fold cross validation | Accuracy (LR = 0.767, RF = 0.732); AUC (LR = 0.765, RF = 0.711) | LR |
Tang et al. [54] | Heart disease | ANN, LR | Clinical, demographic, behavioural and medical data | 2092 | – | AUC (ANN = 0.762, LR = 0.758); Accuracy (ANN = 0.714, LR = 0.698) | ANN |
Toshniwal et al. [55] | Heart disease | NB, RF, SVM | Electrocardiography data | 47 | – | Accuracy (NB = 88.44, RF = 98.49, SVM = 98.41) | RF |
Alonso et al. [56] | Heart disease | LR, SVM | Clinical data | 8321 | 5-fold cross validation | AUC (LR = 0.76, SVM = 0.83) | SVM |
Mustaqeem et al. [57] | Heart disease | KNN, NB, RF, SVM | Electrocardiography data | 452 | 10-fold cross validation | Accuracy (KNN = 76.60, NB = 74.43, RF = 76.50, SVM = 74.47) | KNN |
Mansoor et al. [58] | Heart disease | LR, RF | Demographic and hospital admission | 9637 | 10-fold cross validation | Accuracy (LR = 0.88, RF = 0.89) | RF |
Kim et al. [59] | Heart disease | ANN, DT, LR, SVM | Demographic, behavioural and disease data | 748 | – | AUC (ANN = 0.663, DT = 0.631, LR = 0.658, SVM = 0.664) | SVM |
Kim et al. [59] | Heart disease | ANN, LR | Demographic, behavioural and disease data | 4146 | – | Accuracy (ANN = 87.04, LR = 86.11) | ANN |
Taslimitehrani et al. [60] | Heart disease | DT, LR, RF, SVM | Electronic health records | 119,749 | 2-fold cross validation | AUC (DT = 0.66, LR = 0.81, RF = 0.80, SVM = 0.59) | LR |
Anbarasi et al. [61] | Heart disease | DT, NB | Clinical and demographic data | 909 | k-fold cross validation | Accuracy (DT = 99.2%, NB = 96.5%) | DT |
Bhatla and Jyoti [62] | Heart disease | ANN, DT, NB | Clinical data | 3000 | 10-fold cross validation | Accuracy (ANN = 85.53%, DT = 89%, NB = 86.53%) | DT |
Thenmozhi and Deepika [63] | Heart disease | ANN, DT, NB | Clinical data and medical diagnostic data | – | 10-fold cross validation | Accuracy (ANN = 99.25, DT = 96.66, NB = 94.44) | ANN |
Tamilarasi and Porkodi [64] | Heart disease | ANN, KNN, NB | Clinical and demographic data | – | – | Accuracy (ANN = 99.25, KNN = 100, NB = 85.92) | KNN |
Marikani and Shyamala [65] | Heart disease | DT, KNN, NB, RF, SVM | Clinical and demographic data | 303 | – | Accuracy (DT = 0.954, KNN = 0.757, NB = 0.817, RF = 0.963, SVM = 1.0) | SVM |
Lu et al. [66] | Heart disease | ANN, NB, SVM | Clinical, demographic and diagnostic data | 1090 | – | Accuracy (ANN = 86.04, NB = 82.31, SVM = 86.62) | SVM |
Khateeb and Usman [67] | Heart disease | DT, KNN, NB | Clinical and demographic data | 303 | 10-fold cross validation | Accuracy (DT = 76.89, KNN = 79.20, NB = 66.66) | KNN |
Patel et al. [68] | Heart disease | DT, NB | Clinical and demographic data | – | – | Accuracy (DT = 99.2, NB = 96.5) | DT |
Venkatalakshmi and Shivsankar [69] | Heart disease | DT, NB | Clinical and demographic data | 294 | – | Accuracy (DT = 84.01, NB = 85.03) | DT |
Borah et al. [36] | Hemoglobin variants | DT, KNN, LR, RF, SVM | Clinical and demographic data | 1500 | – | DT and RF (Precision = 93.84, Recall = 92.78, F1 score = 93.33); Precision (KNN = 92.23, LR = 89.23, SVM = 66.67); Recall (KNN = 91.67, LR = 87.34, SVM = 64.78); F1 score (KNN = 91.95, LR = 88.27, SVM = 65.71) | DT, RF |
Farran [17] | Hypertension | KNN, LR, SVM | Demographic, anthropometric, vital signs, diagnostic and clinical lab measurement data | 10,632 | 5-fold cross validation | Accuracy (KNN = 82.4, LR = 82.1, SVM = 83) | SVM |
Ani et al. [70] | Kidney disease | ANN, DT, KNN, NB | Clinical and demographic data | 400 | 10-fold cross validation | Accuracy (ANN = 81, DT = 93, KNN = 90, NB = 78) | DT |
Islam et al. [71] | Liver disease | ANN, LR, RF, SVM | Clinical, demographic and ultrasonography test data | 994 | 10-fold cross validation | Accuracy (ANN = 0.691, LR = 0.707, RF = 0.658, SVM = 0.690); AUC (ANN = 0.733, LR = 0.763, RF = 0.708, SVM = 0.657) | LR |
Lynch et al. [72] | Lung cancer | DT, RF, SVM | Clinical and demographic data | – | 10-fold cross validation | Root mean square error (DT = 15.81, RF = 15.63, SVM = 15.82) | RF |
Chen et al. [73] | microRNA | RF, SVM | microRNA data | 96,325 | 5-fold cross validation | Accuracy (RF = 75.24, SVM = 70.02) | RF |
Eskidere et al. [74] | Parkinson’s disease | ANN, SVM | Voice recording and demographic data | 42 | 10-fold cross validation | Mean absolute error (SVM = 6.99, ANN = 8.20) | SVM |
Chen et al. [75] | Parkinson’s disease | KNN, SVM | Voice recording and demographic data | 31 | 10-fold cross validation | Accuracy (KNN = 95.78, SVM = 93.52); AUC (KNN = 95.60, SVM = 91.12) | KNN |
Behroozi and Sami [76] | Parkinson’s disease | KNN, NB, SVM | Voice recording and demographic data | 40 | Leave-one-out | Accuracy (KNN = 77.50, NB = 80.00, SVM = 87.50) | SVM |
Hussain et al. [77] | Prostate cancer | DT, NB, SVM | Magnetic resonance imaging data | 20 | 10-fold cross validation | AUC (DT = 0.955, NB = 0.989, SVM = 0.997) | SVM |
Zupan et al. [78] | Prostate cancer | DT, NB | Clinical data | 2051 | 10-fold cross validation | Accuracy (NB = 70.80, DT = 68.80) | NB |
Hung et al. [79] | Stroke | ANN, LR, SVM | Electronic medical claim and demographic data | 798,611 | – | Accuracy (ANN = 0.873, LR = 0.866, SVM = 0.839) | ANN |
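
One way to summarise a table like this is to count, for each algorithm, how many studies considered it and in how many of those it was reported as the best performer. The following is a minimal Python sketch of that tally, not the authors' own procedure. It uses only five rows copied from the table above, and the names `ROWS`, `split_cell` and `tally` are hypothetical helpers introduced for illustration.

```python
from collections import Counter

# Five rows copied from the table above:
# (reference, algorithms compared, best one(s)).
ROWS = [
    ("Aneja and Lal [38]", "ANN, NB", "NB"),
    ("Ayer et al. [39]", "ANN, LR", "ANN"),
    ("Ahmad et al. [18]", "ANN, DT, SVM", "SVM"),
    ("Yao et al. [8]", "DT, RF, SVM", "RF"),
    ("Borah et al. [36]", "DT, KNN, LR, RF, SVM", "DT, RF"),
]


def split_cell(cell):
    """Split a comma-separated table cell into algorithm names."""
    return [name.strip() for name in cell.split(",") if name.strip()]


def tally(rows):
    """Return {algorithm: (times best, times considered)}."""
    considered = Counter()
    best = Counter()
    for _reference, compared, winners in rows:
        considered.update(split_cell(compared))
        best.update(split_cell(winners))
    return {alg: (best[alg], considered[alg]) for alg in sorted(considered)}


if __name__ == "__main__":
    for alg, (wins, total) in tally(ROWS).items():
        print(f"{alg}: best in {wins} of {total} studies ({wins / total:.0%})")
```

Because the studies use different datasets, metrics and validation schemes, such counts only indicate how often an algorithm came out ahead where it was tried; they do not compare the reported scores across rows.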