- Research
- Open access
- Published:
Machine learning algorithms for the prediction of adverse prognosis in patients undergoing peritoneal dialysis
BMC Medical Informatics and Decision Making volume 24, Article number: 8 (2024)
Abstract
Background
An appropriate prediction model for adverse prognosis before peritoneal dialysis (PD) is lacking. Thus, we retrospectively analysed patients who underwent PD to construct a predictive model for adverse prognoses using machine learning (ML).
Methods
A retrospective analysis was conducted on 873 patients who underwent PD from August 2007 to December 2020. A total of 824 patients who met the inclusion criteria were included in the analysis. Five commonly used ML algorithms were used for the initial model training. By using the area under the curve (AUC) and accuracy (ACC), we ranked the indicators with the highest impact and displayed them using the values of Shapley additive explanation (SHAP) version 0.41.0. The top 20 indicators were selected to build a compact model that is conducive to clinical application. All model-building steps were implemented in Python 3.8.3.
Results
At the end of follow-up, 353 patients withdrew from PD (converted to haemodialysis or died), and 471 patients continued receiving PD. In the complete model, the categorical boosting classifier (CatBoost) model exhibited the strongest performance (AUC = 0.80, 95% confidence interval [CI] = 0.76–0.83; ACC: 0.78, 95% CI = 0.72–0.83) and was selected for subsequent analysis. We reconstructed a compression model by extracting 20 key features ranked by the SHAP values, and the CatBoost model still showed the strongest performance (AUC = 0.79, ACC = 0.74).
Conclusions
The CatBoost model, which was built using the intelligent analysis technology of ML, demonstrated the best predictive performance. Therefore, our developed prediction model has potential value in patient screening before PD and hierarchical management after PD.
Background
Peritoneal dialysis (PD) is one of the main renal replacement treatments for end-stage renal disease (ESRD, also known as uraemia) [1]. The average number of new ESRD diagnoses worldwide is 144 individuals per million of the general population [2], of which approximately 11% receive PD [3]. The international PD guidelines recommend family-based renal replacement therapy owing to the prevalence of coronavirus 2019, and it has become the first choice for dialysis patients because of its simplicity and low cost [4]. However, factors such as peritonitis and peritoneal fibrosis lead to the failure of PD technology. The failure of PD technology limits its application, leading to patient withdrawal owing to the cost, and lowers patient survival rates, even leading to death [5, 6]. Early prediction may identify patients who are at high risk of PD technology failure in the short term and help determine whether to choose PD for renal replacement therapy.
With the exponential growth in healthcare data, machine learning (ML) is expected to provide more accurate and personalised services when processing large-scale medical data, predicting the development and prognosis of diseases, assisting doctors in formulating treatment plans, and identifying new disease risk factors and treatment methods. ML is also expected to promote the progress and development of medical science. An ML algorithm was used to evaluate the accuracy (ACC) of predicting cardiovascular events in asymptomatic populations by comparing random survival forests (an ML technique) with standard cardiovascular risk scores [7]. The prognostic factors affecting kidney transplant surgery cover multiple fields of surgery, immunology, epidemiology, and physiology; the large amount of data that is generated can precisely leverage the computational power of ML [8]. However, studies using ML algorithms for PD-related prognosis are limited.
ML technology was used to predict the prognosis, survival, and death risk factors of patients with PD and reported that deep neural networks demonstrated the best predictive performance (area under the curve [AUC]: 0.841) [9]. In patients with PD-associated peritonitis, traditional microbiology and molecular biology methods are considerably slow and have limited clinical applications. ML has demonstrated the power of using nonlinear methods to mine complex biomedical datasets to rapidly predict the fine reactivity and specificity of the human immune system and target antibiotic medication for early patient treatment [10]. Myopenia is associated with cardiovascular risk and mortality in patients with PD, and the ML model can effectively predict PD myopenia using simple clinical indicators [11].
However, we lack an appropriate prediction model for the adverse prognosis before PD; therefore, we constructed a prediction model for the adverse prognosis of PD using ML based on the data from our medical centre.
Materials and methods
Subjects
We retrospectively analysed 873 patients who underwent PD at our institution from August 2007 to December 2020. The inclusion criteria were 1) diagnosis of chronic renal failure and regular PD treatment for over 1 month and 2) age of 16 years or older. The exclusion criteria were 1) patients with acute kidney injuries, patients who received emergency PD, and patients in renal function recovery; 2) patients with incomplete baseline data; 3) patients who received kidney transplantation during follow-up; and 4) patients who stopped communicating with our medical centre. On the basis of these criteria, 824 patients were included in our subsequent analyses. This study was approved by the Medical Ethics Committee of Daping Hospital (YYLS2022-210), and all methods were carried out in accordance with relevant guidelines and regulations or declaration of Helsinki.
Demographic and clinical information
All baseline data were collected before PD, and the variable collection period was one week before the start of PD. The baseline variable was the last value of the patient before the start of PD. The patient demographic data were as follows: age (years), sex (male/female), height (cm), weight (kg), body surface area (m2), body mass index (BMI) (kg/m2), Admitted_date (date of formal dwelling with peritoneal dialysate), marital status (unmarried, married, divorced, widowed), education level (primary school and below, junior high school, high school, college, undergraduate, master’s degree or above), ethnicity (Han, other ethnic minorities), smoking history (yes/no), history of alcohol consumption (yes/no), systolic blood pressure (SBP) (mmHg), diastolic blood pressure (DBP) (mmHg), heart rate (beats/min), urine volume (ml/24 h), primary disease, comorbidities, dialysis term (months), previous history of renal replacement therapy (including haemodialysis and kidney transplantation), and medication history. The laboratory data were as follows: haemoglobin (HGB), ferritin, serum iron, serum total iron binding capacity (TIBC), transferrin saturation, blood calcium, blood phosphorus, intact parathyroid hormone (iPTH), calcium–phosphorus product, alkaline phosphatase, serum albumin, prealbumin, blood sodium, blood potassium, blood chlorine, carbon dioxide binding capacity, creatinine, urea nitrogen, uric acid β2 microglobulin, estimated glomerular filtration rate (eGFR), total cholesterol (TC), triglyceride, low-density lipoprotein cholesterol (LDL-c), high-density lipoprotein cholesterol (HDL-c), TIBC, serum ferritin (SF), fasting blood glucose (FBG), glycosylated HGB, β-type natriuretic peptide, troponin, creatine kinase (CK), CK myoglobin (CKMB), C-reactive protein (CRP), vitamin D (Vd), erythrocyte sedimentation rate (ESR), hepatitis B surface antigen, hepatitis C antigen/antibody, syphilis antibody, and human immunodeficiency virus antibody.
The imaging techniques were as follows: echocardiography (left ventricular end-diastolic diameter, interventricular septum thickness, left ventricular posterior wall thickness, and calculation of the left ventricular mass and left ventricular mass index), and carotid artery colour Doppler ultrasound (the presence of plaque formation).
The adverse prognosis was defined as the withdrawal from PD or all-cause mortality within 24 months of PD initiation. The patients were divided into PD withdrawal and PD continuation groups according to whether an adverse prognostic event occurred. If a patient withdrew from PD during follow-up, the time and reason for withdrawal (peritonitis, insufficient dialysis, ultrafiltration failure, thoracoabdominal fistula, catheter dysfunction, patient requirements, and other causes) were recorded. If a patient died, the dialysis duration and cause of death (cardiovascular death, other causes) were recorded. The study was terminated on 31 December 2020.
Statistical analysis
The measurement data are expressed as the mean ± standard deviation, and the counting data are expressed as a percentage. The measurement data between the groups were compared using the t-test, and the two group rates were compared using the chi-square test. The data were processed using the Statistical Package for the Social Sciences version 20.0. ML methods were used to construct a predictive model for the adverse prognosis in patients with PD. During the model construction, the enrolled patients were randomly divided into two groups at a ratio of 7:3. The larger group was the training subset for ML, and the smaller group was the testing subset for model testing. A small number of missing continuous variables were supplemented using the median method, and the categorical variables were supplemented using the 0-value method.
The following five commonly used ML algorithms were used for the initial model training: categorical boosting classifier (CatBoost) version 1.0.6, logistic regression (LR) version 1.0.2, light gradient boosting (LGB) version 3.2.1, gradient boosting (GBT) version 1.0.2, and random forest (RFL) version 1.0.2. LR is a type of generalised linear regression. The advantage is that the rate function is derivable to any order and has good mathematical properties. Many existing numerical optimisation algorithms can be used to find the optimal solution. The disadvantage is that LR cannot be used to solve nonlinear problems and cannot address the problem of data imbalance. In ML, the goal is to train the model successfully with multiple learning algorithms. Boosting is a method that is often used in practice and is not built in parallel but sequentially. The weak algorithm first trains the model and then reassembles the model according to the training results to improve the learning rate of the model. GBT is the most basic boosting model algorithm, which has no role in optimising complex data types and missing data. LGB is a highly effective way to reduce errors and improve ACC and speed; however, it does not support strings and requires a special algorithm to split the classified data. LGB performs better than CatBoost on large datasets and high-dimensional data, whereas CatBoost is better than LGB at handling category features and missing values. CatBoost is a symmetric decision tree–based learner, which relies on the GBT framework and a small number of parameters, supports categorical variables, and has high ACC. The optimisation algorithm formula of CatBoost is as follows:
where D is the set of all data available to train and evaluate our ensemble. CatBoost chooses the data to use for fitting by placing an arbitrary order on the elements of D with a random permutation σ. Let σ(k) be the kth element of D under σ, and Dk = {x1, x2, …, xk−1}, as ordered by the random permutation σ. Another concept for understanding how CatBoost encodes the values of categorical variables is the indicator function \(\mathbbm{1}_{a=b}\), which is a function of one variable that has the value of one when a = b and zero otherwise. This indicator function plays an important role in the formula applied by CatBoost to map the values of a categorical feature to a numerical value. Specifically, this formula involves the indicator function \(\mathbbm{1}_{{x}_{j}^{i}={x}_{k}^{i}}\), which takes the value one when the ith component of input vector xj of CatBoost is equal to the ith component of input vector xk. p is the added prior term, and a is usually a weight coefficient greater than zero. For binary classification problems, the prior term is the prior probability of the positive example. These concepts enable us to define the formula for the encoded value \({\widehat{x}}_{k}^{i}\).
The AUC, ACC, F1 score, and precision recall curve were used as the primary evaluation indices to select the optimal model for further optimisation. We obtained the ranking of the indicators that had the most significant impact on the model and displayed them using the Shapley additive explanation (SHAP, version 0.41.0) values, from which the top 20 indicators were selected to build a compact model conducive to clinical application. All the model-building steps were implemented in Python (version 3.8.3).
Results
According to the inclusion criteria, nine patients with a dialysis duration of less than one month at the time of withdrawal were excluded. During follow-up, 31 patients underwent kidney transplantation, and nine were lost. A total of 824 patients were included in the analysis at the end of follow-up, 353 patients withdrew from PD (converted to haemodialysis or died), and 471 patients continued receiving PD.
Our cohort included 481 men and 343 women with an average age of 47.82 ± 15.45 years, and most were married (91.6%) and of the Han ethnicity (94.7%). The education level was mainly middle school (37.0%), with smokers accounting for 24.7% and alcohol consumption accounting for 15.6%. The three primary causes of chronic renal failure were chronic glomerulonephritis in 479 patients (58.1%), diabetic nephropathy in 112 patients (13.5%), and hypertensive renal injury in 21 patients (2.5%). The most common complications were hypertension (n = 454, 55.0%). Significant differences in age, education level, urinary output, history of kidney transplantation, primary renal disease, comorbidity, and history of medication were identified between the two groups. Demographic data are presented in Table 1.
The PD withdrawal group had higher levels of ferritin, blood calcium, alkaline phosphatase, eGFR, LDL-c, FBG, glycated HGB, CRP, cardiac ejection fraction, blood phosphorus, iPTH, serum albumin, creatinine, urea nitrogen, and uric acid. β2 microglobulin was lower in the PD continuation group, and we observed no statistically significant differences in the other indicators (Table 2).
Comparison of the five complete models
The performances of the different models are listed in Table 3. The CatBoost algorithm exhibited an excellent AUC of 0.80 (95% confidence interval [CI]: 0.76–0.83) and an ACC of 0.78 (95% CI: 0.72–0.83) values. The prediction performance of the traditional LR method was acceptable, with an AUC of 0.76 (95% CI: 0.73–0.80) and an ACC of 0.71 (95% CI: 0.64–0.77). The performances of the other three ensemble learning algorithms, LGB (AUC: 0.72; ACC: 0.74), GBT (AUC: 0.72; ACC: 0.76), and RFL (AUC: 0.72; ACC: 0.65), were relatively poor. The receiver operating characteristic (ROC) curve of the complete model is displayed in Fig. 1a. The CatBoost model exhibited the strongest performance and was selected for subsequent analysis.
Key features and compact model
After excluding 187 individuals because of missing data on covariates or the predictor variables of interest, 637 patients were included in the final model construction. The calculated SHAP values summarised the ranking of the features that had the strongest influence on the prediction results of the complete model. The feature names and the extent of their influence are presented in Fig. 2. Among the demographic characteristics, age, weight, BMI, and education level were selected as significant predictors of adverse PD prognosis. Iron metabolism was closely related to haematopoiesis, among which TIBC and SF were key predictors. Prealbumin and serum albumin levels, which were closely related to liver synthesis, also played important roles. There were also prominent roles for HDL-c, FBG, and TC in glycolipid metabolism. In addition, calcium and phosphorus metabolism (Vd, serum phosphorus, and iPTH), cardiovascular function (SBP, CKMB), ESR, and creatinine had some predictive effects.
We reconstructed a compression model by extracting 20 key features ranked by the SHAP values. This simplified version of the model (AUC: 0.79; ACC: 0.74) was slightly weaker in performance than the full model but was more conducive to clinical application and data collection (Fig. 1b). The complement model had a maximum Youden index of 0.48, which gives a sensitivity of 0.68 and a specificity of 0.80. The maximum Youden index of the compact model was 0.46, and the sensitivity and specificity were 0.71 and 0.75, respectively. The specific data parameters are listed in Table 4.
Model explanation
The summary plot of the SHAP values in Fig. 2 provides an overview of the impact of the features of the final model. Figure 3 illustrates two specific forecasting examples. The blue bars represent protective factors, where longer bars indicate that PD was less likely to fail. The red bars represent risk factors and indicate the opposite effect. As depicted in Fig. 3a, a 64-year-old patient with poor education, low TIBC, and prealbumin levels was suspected of having reduced hepatic compensatory function. The final model predicted that his PD would fail, and he became hospitalised with infectious peritonitis after 7.1 months of dialysis. For another 40-year-old patient, HGB, prealbumin, and albumin levels all appeared normal, indicating a strong compensatory capacity (Fig. 3b). The model predicted that the patient was suitable for PD, and the patient continued PD after the follow-up period for over five years.
Discussion
PD-associated peritonitis is one of the leading causes of PD withdrawal and death [12, 13]. ML algorithms are becoming increasingly popular in medical research and can be applied to disease screening, diagnosis, and prognosis. We used ML intelligent analysis technology to construct a predictive model for the adverse prognosis of PD and demonstrated that age, body weight, and albumin levels are important predictive factors for the adverse prognosis of PD. We developed five predictive models; in the complete model, the calculated SHAP values summarised the strongest predictive indicators and sorted and extracted the 20 key features to reconstruct the model. Collectively, our findings suggested that the CatBoost model demonstrated the strongest performance.
We ranked the factors closely related to the adverse prognosis of patients by the SHAP values, with the top 20 key factors including age, body weight, albumin, and blood lipids. The meta-analysis revealed that age is a risk factor for all-cause cardiovascular death in dialysis patients [14]. In this study, we observed that the age of patients in the PD continuation group was significantly lower than that in the adverse prognosis group (45.36 vs 51.11 years, P < 0.001). In the complete model, the calculated SHAP values confirmed that age had the strongest impact on predicting an adverse prognosis for patients with PD. In addition, body weight and BMI were critical predictive factors for adverse PD prognosis, with higher BMI leading to higher hospitalisation rates for peritonitis [15]. In the general population, obesity is associated with increased cardiovascular risk and reduced survival, but the “obesity paradox” in ESRD has always been controversial [16, 17]. Our study suggests that increased body weight and BMI correlate with a lower risk of adverse PD prognosis. The nutritional indicators include body weight, as well as albumin and blood lipids. A positive correlation between nutritional status and dialysis duration has been reported in patients with PD because a nutritious diet reduces the incidence of complications such as peritonitis [12].
Education level was also considered a vital predictor of adverse PD prognosis, and multiple studies have demonstrated that [18, 19] patients with lower education levels experience increased peritonitis and technical failure than those with higher education levels. The potential reason may be that patients with lower education levels have lower incomes and poor compliance, which affects their access to timely healthcare, medication, and treatment.
The high prevalence of cardiovascular diseases in patients with PD is related to uremic toxins, inflammation (ESR), and disorders in bone mineral metabolism (Vd, serum phosphorus, and iPTH) [20]. Similarly, we observed Vd, serum phosphorus, iPTH, ESR, creatinine, and cardiovascular disease to be associated with adverse PD prognosis in patients. Furthermore, we observed that TIBC and SF are critical predictive factors for adverse PD prognosis and that higher amounts of iron increase the risk of QT dispersion [21]. Functional iron deficiency is an independent risk factor for all-cause death in patients with PD. Consistent with our research findings, patients with PD with high iron levels have a four-fold higher risk of all-cause cardiovascular death [22]. The effect of the COVID-19 pandemic on patients with PD is still being debated, and certain authors argue that COVID-19 has no effect on the survival of patients with PD [23]. However, other researchers state that the COVID-19 pandemic led to increased death [24, 25]. During the severe period of the COVID-19 pandemic, the mortality rate of patients with PD at our centre was 2.64%, whereas no significant changes were observed in the mortality rate of patients with PD (5.13%) over the same period. We believe that the COVID-19 pandemic had no significant effect the results of the model.
ML is an interdisciplinary field of mathematics and statistics [26] that involves fitting predictive models to data for information grouping. We assumed that ML methods could predict the adverse prognosis of patients before starting PD, recommended the most favourable dialysis method, and provided timely medical intervention, which improved patient prognosis and reduced medical costs.
CatBoost is the third Gradient-Boosted Decision Tree (GBDT)–based improved algorithm after XGBoost and LightGBM [27]. Launched by Yandex Company in Russia in 2018 and is open source. It uses gradient lifting on the decision tree and can be easily integrated into deep-learning frameworks. Based on the GBDT framework, which has fewer parameters, CatBoost supports categorical variables with high ACC and can efficiently and reasonably process t-algorithms. CatBoost has been extensively studied in the prediction of skin sensitisation [28], depression occurrence [29], pregnancy diabetes management [30], and transplanted kidney function [8], and it exhibits good predictive performance. Owing to numerous factors that affect an adverse PD prognosis and considering the clinical applications, we reconstructed a compression model by extracting 20 key features ranked by the SHAP values. This simplified version of the model was slightly weaker in performance than the full model but was more conducive to clinical application and data collection. Before a patient starts PD, the CatBoost model can be used to predict whether the patient is suitable for PD treatment and whether PD-related peritonitis may occur. On the basis of the prediction, the most optimal dialysis plan can be selected for the patient allowing early intervention.
Our study had several limitations. First, this was a single-centre retrospective study, and we could not evaluate whether the external cohort population exhibited the same pattern. Second, this study used the median of missing values, which inevitably led to bias. Third, the number of cases was relatively small, and the model construction lacked cross-validation and external validation, all of which affected the ability to generalise the model. A multicenter joint study is needed to validate the model. Finally, only patients with PD were included in the study. Therefore, the model might be strongly biased if it is used for patients with chronic kidney disease for the selection of the best renal replacement therapy.
Conclusions
Collectively, the CatBoost model built using the intelligent analysis technology of ML demonstrated the best predictive performance (AUC: 0.79; ACC: 0.74). Thus, the model has potential value in patient screening before PD and hierarchical management after PD.
Availability of data and materials
The datasets used and/or analysed in the study are available from the corresponding author on reasonable request.
Abbreviations
- PD:
-
Peritoneal dialysis
- ML:
-
Machine learning
- SHAP:
-
Shapley additive explanation
- ESRD:
-
End-stage renal disease
- SBP:
-
Systolic blood pressure
- DBP:
-
Diastolic blood pressure
- HGB:
-
Haemoglobin
- iPTH:
-
Intact parathyroid hormone
- eGFR:
-
Estimated glomerular filtration rate
- LDL-c:
-
Low-density lipoprotein cholesterol
- HDL-c:
-
High-density lipoprotein cholesterol
- CK:
-
Creatine kinase
- CKMB:
-
Creatine kinase myoglobin
- CRP:
-
C-reactive protein
- TIBC:
-
Total iron binding capacity
- SF:
-
Serum ferritin
- FBG:
-
Fasting blood glucose
- ESR:
-
Erythrocyte sedimentation rate
- LR:
-
Logistic regression
- LGB:
-
Light gradient boosting
- GBT:
-
Gradient boosting
- RFL:
-
Random forest
- AUC:
-
Area under the curve
- ACC:
-
Accuracy
- ROC:
-
Receiver operating characteristic
- F1:
-
F1 score
- PRC:
-
Precision recall curve
- CI:
-
Confidence interval
References
Shrestha BM. Peritoneal dialysis or haemodialysis for kidney failure? JNMA J Nepal Med Assoc. 2018;56(210):556–7.
Thurlow JS, Joshi M, Yan G, et al. Global epidemiology of end-stage kidney disease and disparities in kidney replacement therapy. Am J Nephrol. 2021;52(2):98–107.
Mehrotra R, Devuyst O, Davies SJ, et al. The current state of peritoneal dialysis. J Am Soc Nephrol. 2016;27(11):3238–52.
Teitelbaum I. Peritoneal dialysis. N Engl J Med. 2021;385(19):1786–95.
Chaudhary K, Sangha H, Khanna R. Peritoneal dialysis first: rationale. Clin J Am Soc Nephrol. 2011;6(2):447–56.
Collaboration GCKD. Global, regional, and national burden of chronic kidney disease, 1990–2017: a systematic analysis for the Global Burden of Disease Study 2017. Lancet. 2020;395(10225):709–33.
Venkatesh R, Balasubramanian C, Kaliappan M. Development of big data predictive analytics model for disease prediction using machine learning technique. J Med Syst. 2019;43(8):272.
Quinino RM, Agena F, de Andrade LGM, et al. A machine learning prediction model for immediate graft function after deceased donor kidney transplantation. Transplantation. 2023;6(107):1380–9.
Noh J, Yoo KD, Bae W, et al. Prediction of the mortality risk in peritoneal dialysis patients using machine learning models: a nation-wide prospective cohort in Korea. Sci Rep. 2020;10(1):7470.
Zhang J, Friberg IM, Kift-Morgan A, et al. Machine-learning algorithms define pathogen-specific local immune fingerprints in peritoneal dialysis patients with bacterial infections. Kidney Int. 2017;92(1):179–91.
Wu J, Lin S, Guan J, et al. Prediction of the sarcopenia in peritoneal dialysis using simple clinical information: A machine learning-based model. Semin Dial. 2023;5(36):390–8.
Kiebalo T, Holotka J, Habura I, et al. Nutritional status in peritoneal dialysis: nutritional guidelines, adequacy and the management of malnutrition. Nutrients. 2020;12(6):1715.
Cho Y, Johnson DW. Peritoneal dialysis-related peritonitis: towards improving evidence, practices, and outcomes. Am J Kidney Dis. 2014;64(2):278–89.
Zhang J, Lu X, Li H, et al. Risk factors for mortality in patients undergoing peritoneal dialysis: a systematic review and meta-analysis. Ren Fail. 2021;43(1):743–53.
Obi Y, Streja E, Mehrotra R, et al. Impact of obesity on modality longevity, residual kidney function, peritonitis, and survival among incident peritoneal dialysis patients. Am J Kidney Dis. 2018;71(6):802–13.
Park J, Ahmadi SF, Streja E, et al. Obesity paradox in end-stage kidney disease patients. Prog Cardiovasc Dis. 2014;56(4):415–25.
Prasad N, Sinha A, Gupta A, et al. Effect of body mass index on outcomes of peritoneal dialysis patients in India. Perit Dial Int. 2014;34(4):399–408.
Kim HJ, Lee J, Park M, et al. Lower education level is a risk factor for peritonitis and technique failure but not a risk for overall mortality in peritoneal dialysis under comprehensive training system. PLoS One. 2017;12(1):e169063.
Fan X, Huang R, Wang J, et al. Risk factors for the first episode of peritonitis in Southern Chinese continuous ambulatory peritoneal dialysis patients. PLoS One. 2014;9(9):e107485.
Albakr RB, Bargman JM. A comparison of hemodialysis and peritoneal dialysis in patients with cardiovascular disease. Cardiol Clin. 2021;39(3):447–53.
Bavbek N, Yilmaz H, Erdemli HK, et al. Correlation between iron stores and QTc dispersion in chronic ambulatory peritoneal dialysis patients. Ren Fail. 2014;36(2):187–90.
Luo D, Zhong Z, Qiu Y, et al. Abnormal iron status is associated with an increased risk of mortality in patients on peritoneal dialysis. Nutr Metab Cardiovasc Dis. 2021;31(4):1148–55.
Ozturk S, Gursu M, Arici M, et al. Evaluation of outcomes of peritoneal dialysis patients in the post-COVID-19 period: a national multicenter case-control study from Turkey. Nephron. 2023;147(5):272–80.
Courville KJ, Bustamante NE, Nunez-Samudio V, et al. Vaccination and mortality from COVID-19: a comparative study between hemodialysis, peritoneal dialysis, and non-dialysis adult population in Panama. Int Urol Nephrol. 2023;55(8):2091–8.
Ghonimi T, Alkad MM, Abuhelaiqa EA, et al. Mortality and associated risk factors of COVID-19 infection in dialysis patients in Qatar: A nationwide cohort study[J]. PLoS One. 2021;16(7):e254246.
Greener JG, Kandathil SM, Moffat L, et al. A guide to machine learning for biologists. Nat Rev Mol Cell Biol. 2022;23(1):40–55.
Prokhorenkova L, Gusev G, Vorobev A, et al. CatBoost: unbiased boosting with categorical features. Adv Inf Process Syst. 2018;31.
Ambe K, Suzuki M, Ashikaga T, et al. Development of quantitative model of a local lymph node assay for evaluating skin sensitization potency applying machine learning CatBoost. Regul Toxicol Pharmacol. 2021;125:105019.
Zhang C, Chen X, Wang S, et al. Using CatBoost algorithm to identify middle-aged and elderly depression, national health and nutrition examination survey 2011–2018. Psychiatry Res. 2021;306:114261.
Kumar M, Ang LT, Ho C, et al. Machine learning-derived prenatal predictive risk model to guide intervention and prevent the progression of gestational diabetes mellitus to type 2 diabetes: prediction model development study. JMIR Diabetes. 2022;7(3):e32366.
Acknowledgements
The authors would like to thank Professor Xiao Jinjin of the Xinqiao Hospital for providing statistical support.
Funding
This work was supported by grants from the National Natural Science Foundation of China (82270768), the Chongqing Technology Innovation project (2022YSZX-JCX0007CSTB, cstc2019jscx-msxmX0258), and the National Science and Technology Support Plan (2022-173ZD-112, SKLKF202201).
Author information
Authors and Affiliations
Contributions
JY, JF, and LF are the joint co-first authors. KC, LX and JY designed the study. JY, JF and LF performed the experiments. JY, SH, and KY collected the data. LF, LF, and LX analyzed the data. JF, LX, and KC wrote the manuscript. All of the authors have read and approved the final version of the manuscript.
Corresponding authors
Ethics declarations
Ethics approval and consent to participate
This study was approved by the Medical Ethics Committee of Daping Hospital (YYLS2022-210), and all methods were carried out in accordance with relevant guidelines and regulations or declaration of Helsinki. According to the GUIDELINE 10 of 2016 International Ethical Guidelines for Health-related Research Involving Humans (https://0-doi-org.brum.beds.ac.uk/https://0-doi-org.brum.beds.ac.uk/10.56759/rgxl7405), the need for informed consent was waived by the Medical Ethics Committee of Daping Hospital.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1: Supplemental Figure 1.
Prediction accuracy of the models with “dead patients” and “PD withdrawal” within 24 months. Patient status after 24 months in the training subset. B. Patient status after 24 months in the test subset. C. AUC of the training model when dead patients were excluded. D. Confusion matrix of the training model when dead patients were excluded. E. AUC with patient death as the predictive outcome. F. Confusion matrix with patient death as the predictive outcome. Supplemental Figure 2. Stack of distributions with missing variables. m: number of missing variables.
Additional file 2: Supplemental Table 1.
The SHAP value of the top 20 variables.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Yang, J., Wan, J., Feng, L. et al. Machine learning algorithms for the prediction of adverse prognosis in patients undergoing peritoneal dialysis. BMC Med Inform Decis Mak 24, 8 (2024). https://0-doi-org.brum.beds.ac.uk/10.1186/s12911-023-02412-z
Received:
Accepted:
Published:
DOI: https://0-doi-org.brum.beds.ac.uk/10.1186/s12911-023-02412-z