Abstract
Acute kidney injury (AKI) is one of the most important lethal factors for patients admitted to intensive care units (ICUs), and timely high-risk prognostic assessment and intervention are essential to improving patient prognosis. In this study, a stacking model with a two-tier feature selection approach was developed on the MIMIC-III dataset to predict the risk of in-hospital mortality in ICU patients admitted for AKI. External validation was performed using the separate MIMIC-IV and eICU-CRD databases. Features were selected using the Boruta and XGBoost feature selection methods, and model performance was assessed by the area under the curve (AUC). The stacking model with two-tier feature selection outperformed models built with single-tier feature selection (XGBoost: 0.85; Boruta: 0.83; two-tier: 0.91). Its predictive effectiveness was further confirmed on different datasets (Validation 1: 0.83; Validation 2: 0.85) and by comparison with simpler models and traditional clinical scores (SOFA: 0.65; APACHE IV: 0.61). In addition, this study combined interpretable techniques with causal inference to analyze the causal relationships between features and predicted outcomes.
Introduction
Acute kidney injury (AKI), a significant contributor to inpatient mortality worldwide, affects approximately one-fifth of hospitalized individuals1,2,3. The International Society of Nephrology's 0by25 initiative aims to eradicate preventable AKI-related deaths by the year 20254. Despite considerable recent efforts, identifying effective treatments that substantially enhance renal recovery remains challenging. Early prediction or detection of AKI carries significant clinical implications but poses a substantial hurdle. To address the limitations of early AKI prediction, researchers have increasingly turned to machine learning methods. However, the success of these models hinges on the selection of relevant features. To this end, diverse feature selection techniques are employed to improve model generalization, stability, and interpretability5,6,7.
In this context, artificial intelligence (AI) has demonstrated promise for time-sensitive applications in AKI. These applications encompass early identification, warning, and the provision of AKI treatment recommendations8,9. Machine learning-based models can detect AKI at an early stage, providing clinicians with a chance to intervene earlier and potentially improve patient outcomes10,11,12. While previous research on AKI prediction has predominantly focused on specific settings such as hospital-acquired AKI13, postoperative AKI14, cancer-related AKI8, and critically ill patients in intensive care units (ICUs)15,16, as well as patients admitted to emergency departments17, there remains a gap in machine learning models for predicting AKI in general and critically ill patients. The high heterogeneity of patient history data in general hospitals presents a significant challenge for the independent validation of predictive models18. This is particularly problematic for current machine learning-based mortality risk prediction models, as they often rely on a multitude of patient test results, such as routine blood tests, as input features. The multiplicity of features and the lengthy data collection process pose a significant challenge to model application. Consequently, reducing the number of input features while preserving model accuracy has become a pressing concern. Zhu et al.19 successfully developed a machine learning model for predicting the risk of death in sepsis patients, achieving a 71% reduction in the number of features. Their findings indicate that even with small samples and low-dimensional data, accurate identification of patients at risk is feasible, enabling early treatment. Additionally, Shen et al.20 and Wu et al.7 validated the impact of feature reduction on the stability and accuracy of model prediction performance. 
Although many models have been proposed to identify patients at risk for AKI, few models can predict the risk of clinically important outcomes (hospital death or dialysis) once a patient develops AKI. The application of models with clinically important predictions may help guide the early treatment of patients with AKI.
In this study, we introduce a machine learning model that employs a two-stage feature selection process to predict in-hospital mortality risk among ICU patients with AKI. Our aim is to identify crucial features for mortality prediction and, as a result, reduce feature dimensionality to enhance model interpretability without sacrificing accuracy.
Results
Study population characteristics
For this study, data from 16,090 initial ICU admissions in the MIMIC-III database were collected, with 11,182 patients meeting the inclusion criteria. These data comprised the training set, with 30% allocated for internal validation. Patients were categorized as either dead or surviving. Furthermore, data from MIMIC-IV (validation 1) and eICU-CRD (validation 2) were retrieved for external validation using the same criteria. The mortality and survival data for the training set, internal validation set, and external validation sets were statistically analyzed and summarized in Table 1. The training set consisted of 7828 cases, of which 2273 resulted in mortality and 5555 in survival. The internal validation set consisted of 3354 cases with 840 deaths and 2514 survivors (see Supplementary Table 1). External validation set 1 (MIMIC-IV) included 7822 cases with 6705 deaths and 1117 survivors. External validation set 2 (eICU-CRD) consisted of 5928 cases, with 5403 survivors and 525 deaths. Across all three datasets, the proportion of men diagnosed with AKI exceeded that of women, and correspondingly, the mortality rate was also higher among male patients. Additionally, AKI patients aged over 60 exhibited a notably elevated mortality rate compared to patients in other age brackets.
Feature selection and Model performance
Feature selection involved initial screening with the Boruta algorithm for the first tier (see Supplementary Fig. 1A,B). Subsequently, refinement was carried out using XGBoost for the second tier (see Supplementary Fig. 2A). Ultimately, a total of 24 relevant features were identified. Additionally, the features identified using only XGBoost feature selection are depicted in Supplementary Fig. 2B.
The models were constructed using the features screened by the Boruta and XGBoost algorithms, respectively, and their predictive performance was evaluated and compared. The evaluation metrics are shown in Fig. 1A–C,A′–C′. As can be seen from the figure, the best-performing algorithm was stacking, with an AUC (95% CI) of 0.85 (0.846–0.854) for XGBoost-Stacking (the stacking model built with XGBoost-filtered features) and an AUC (95% CI) of 0.83 (0.828–0.831) for Boruta-Stacking (the stacking model built with Boruta-filtered features). Subsequently, models were constructed using two-tier feature selection and trained with different types of algorithms. The results showed that the stacking algorithm gave the best prediction, with an AUC (95% CI) of 0.91 (0.906–0.915) (Fig. 1E–G). The precision, accuracy, and F1-score of the trained model were 0.90, 0.89, and 0.90, respectively (Fig. 2A–C). Furthermore, the performance of models developed using single-tier and two-tier feature selection methods was compared (Table 2; see also Supplementary Table 2 for detailed results). This analysis revealed that the two-tier feature selection approach consistently yielded superior prediction performance compared to the single-tier method.
Model performance metrics with single- and two-tier feature selection, including feature selection using only XGBoost and only Boruta, along with receiver operating characteristic curves (ROC) and precision-recall curves (PRC) predicted by models with two-tier feature selection. These evaluations are conducted on both the training set (MIMIC-III) and the validation sets (MIMIC-IV, eICU-CRD), with the receiver operating characteristic curves presented alongside a shaded 95% confidence interval (ROC (95% CI)). (A–C) denote the ROC, PRC, and ROC (95% CI) predicted by the model using only XGBoost for feature selection, respectively. (A′–C′) denote the ROC, PRC, and ROC (95% CI) predicted by the model using only Boruta for feature selection. (E–G) denote the ROC, PRC, and ROC (95% CI) predicted by the model using two-tier feature selection on the training set. (E′–G′) denote the ROC, PRC, and ROC (95% CI) predicted by the model using two-tier feature selection on Validation 1 (MIMIC-IV). (E″–G″) denote the ROC, PRC, and ROC (95% CI) predicted by the model using two-tier feature selection on Validation 2 (eICU-CRD). CI confidence interval.
To further validate the predictive effectiveness of the developed two-tier feature selection model, we used both internal and external validation sets to evaluate the model's performance. Figure 1E′–G′ demonstrates the validation results on validation set 1, with an AUC (95% CI) of 0.83 (0.830–0.833). Figure 2A′–C′ shows a precision of 0.86, accuracy of 0.81, and F1-score of 0.80 for validation set 1. In Fig. 1E″–G″, we show the validation results for validation set 2, with an AUC (95% CI) of 0.85 (0.845–0.853). Figure 2A″–C″ shows that validation set 2 has a precision of 0.84, an accuracy of 0.80, and an F1-score of 0.80. Comparing the training results with the validation results reveals (Tables 2 and 3) that the AUC values of both the internal and external validation sets exceed 0.80, indicating that the model predicts well on different datasets. In addition, we compared the constructed model with the traditional clinical scoring systems SOFA and APACHE IV (Fig. 1E,E′,E″). The results showed that on the training set, the AUC was 0.65 for SOFA and 0.61 for APACHE IV; on validation set 1, the AUC was 0.71 for SOFA and 0.64 for APACHE IV; and on validation set 2, the AUC was 0.62 for SOFA and 0.64 for APACHE IV. These results indicate that, compared to the traditional clinical scoring systems, the constructed model has better predictive performance. The experimental results of internal validation are presented in Online Supplementary Fig. 3A–C, Fig. 4A–C, and Table 3.
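The bootstrapped 95% CIs reported for the AUC above can be computed with a percentile bootstrap. The sketch below is illustrative rather than the authors' exact code; it assumes only scikit-learn's `roc_auc_score`:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate and percentile-bootstrap (1 - alpha) CI for the AUROC."""
    rng = np.random.default_rng(seed)
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    aucs = []
    while len(aucs) < n_boot:
        idx = rng.integers(0, len(y_true), len(y_true))
        if y_true[idx].min() == y_true[idx].max():
            continue  # a resample must contain both classes for the AUC to exist
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    lo, hi = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return roc_auc_score(y_true, y_score), lo, hi
```

The same resampling scheme applies to precision, recall, or the Brier score by swapping the metric function.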
To achieve a more comprehensive evaluation of the model’s performance, calibration curves and Brier scores were employed. The results are presented in Fig. 3 and Table 4. The Brier scores (with 95% CI) indicated good calibration, with values of 0.103 (0.093–0.113) for the training set, 0.106 (0.096–0.118) for validation set 1, and 0.110 (0.100–0.122) for validation set 2 (Table 4).
Model interpretability
The ensemble stacking model with two-tier feature selection utilizes two perspectives for model interpretation: individual and global. From an individual perspective, the interpretation module analyzes the feature weights of the base models using the PI technique, as depicted in Fig. 4A–G. The top three important features in the base models are age, BUN, and temperature. In the analysis of the meta-model's feature weights, the predicted outputs of the RF base model significantly influenced the final predictions. Table 5 displays the feature weights of the stacking model, which align with the findings of the base model analysis. From a global perspective, a causal diagram based on significant features is presented in Fig. 5. CL and HB act as confounders of BUN, and age and BUN act as confounders of death; this relationship is represented as CL, HB → BUN → Death. However, determining the specific impact of each detection value is not feasible from the diagram alone. Therefore, the influence of specific feature values is analyzed using LIME (Fig. 6). The LIME analysis reveals that the model's predictions vary under different combinations of feature values. For instance, the model is more likely to predict a lower risk of death when BUN ranges from 14.0 to 20.5 and age is ≤ 57 years, while predicting a higher risk of death when INR exceeds 1.5. This personalized analysis helps physicians understand the relationships between features and predictions during model execution, thereby enhancing their comprehension of the model's decision-making process.
Feature importance analysis of stacking base models. (A) LR: Permutation Importance; (B) LGBM: Permutation Importance; (C) NB: Permutation Importance; (D) RF: Permutation Importance; (E) XGBoost: Permutation Importance; (F) SVM: Permutation Importance. The number next to each feature indicates the importance of that feature. A positive value indicates that the feature contributes positively to the model, while a negative value indicates that the feature contributes negatively to the model. The larger the number (positive or negative), the greater the influence of the feature on the model. Green indicates that the feature contributes positively to the model; red indicates that the feature contributes negatively. The shades of green and red represent the weights.
Interpretation of the predictions using the LIME algorithm with different random state values. (A) Predicted values for the model's output categories, feature coefficients (orange and blue indicate positive and negative relationships, respectively), and the feature values in this sample; (B) local interpretations of features (red and green indicate positive and negative relationships, respectively).
In summary, the stacking ensemble model with two-tier feature selection integrates individual and global perspectives for model interpretation. The analysis of feature weights using the PI technique for both base and stacked models, along with causal diagrams and LIME analyses based on causal inference, enhances understanding of the model's predictive process and provides reliable references for medical decision-making.
Discussion
In building predictive models for AKI, logistic regression with backward or forward selection is a common approach for selecting a subset of features for model construction21. More recently, methods such as Lasso, Boruta22, and XGBoost23 have been employed for feature selection in AKI prediction.
However, Lasso methods are typically limited in their adaptability, often relying on linear models or assumptions. In nonlinear scenarios, Lasso may fail to accurately capture complex feature relationships, resulting in the selection of insufficient features for effective data interpretation. Logistic regression, which relies on a linear combination of individual features for classification, may not adequately capture feature interactions, potentially impacting prediction accuracy. Boruta6, a tree-based feature selection method, excels at uncovering complex feature relationships and handling highly correlated features; nonetheless, it focuses solely on the relationship between features and the target, disregarding the relationship between features and the model. XGBoost24, a gradient-boosting tree model, excels at capturing complex relationships among features, particularly in nonlinear scenarios. Its feature selection process focuses on the correlation between features and the model.
Several studies have highlighted the importance of feature selection in improving model performance for AKI prediction. Zhou et al.25 demonstrated significant improvements in model predictions by incorporating deep features alongside those extracted using convolutional neural networks (CNNs). Similarly, Zhu et al.19 observed a substantial enhancement in prediction accuracy following a 71% reduction in feature set size. Based on these findings, we propose that the model prediction performance can be improved by selecting intersecting features and reducing redundant features. Therefore, in our experiments, we first conducted feature selection using the Boruta algorithm to filter out features that correlate with the target value. Subsequently, in the second tier of feature selection, we employed XGBoost to filter out features that correlate with the model. Experimental results demonstrate the superiority of the two-tier feature selection approach over the single-tier approach. The stacking ensemble model exhibited superior predictive performance compared to the baseline model. Notably, the stacking ensemble model with two-tier feature selection achieved the highest predictive performance. Yue et al.26 applied the Boruta algorithm to screen 34 variables and built a random forest model for predicting mortality risk in acute kidney injury patients, achieving an AUROC of 0.82. Yang et al.27 employed the Boruta algorithm to select 36 variables and utilized XGBoost for modeling mortality risk prediction in sepsis-associated acute kidney injury patients, achieving an AUROC of 0.85. In comparison to previous studies, our proposed model achieved an AUROC of 0.91, indicating improved model performance. These results affirm the efficacy of our proposed approach. Furthermore, by employing model interpretable techniques and causal inference, we conducted a causal analysis of factors influencing model predictions. 
Our findings revealed significant associations between various laboratory tests and the prediction of mortality risk in AKI patients, consistent with previous studies by Son28, Zhang29, and others, further validating the reliability of our model. Taken together with the experimental results, the two-tier feature selection proposed in this study can better predict the risk of death of AKI patients in the ICU, and it can better capture the complexity and diversity of AKI risk by reducing the confounding variables in the model inputs. By combining the predictive power of multiple models, it can provide a more reliable auxiliary diagnosis for clinical decision-making.
However, it is worth noting that this study was retrospective rather than prospective. We chose to use the MIMIC-III dataset, which does not adequately represent the entire population or the diversity of different clinical practices. This somewhat limits our ability to analyze the problem in depth and make accurate predictions, as well as our ability to generalize the model to real-world applications. In addition, because urine output measurements were largely missing from the dataset, we chose to temporarily omit this indicator from our study after consulting the relevant literature18,30,31,32,33. However, recent studies34 suggest that urine output plays a crucial role in the disease progression of AKI. Therefore, we will fully consider urine output as an important indicator in future studies to more accurately predict the risk of death in patients.
Methods
The conceptual framework for developing our two-tier feature selection prediction model is presented in Fig. 7.
A conceptual model for predicting outcomes in AKI patients within the ICU using a limited set of features. The initial conceptual model is designed for ongoing prediction of AKI-related hospitalization outcomes. Firstly, we gather data on the patient's laboratory tests, surgeries, and medication usage. Secondly, relevant features are identified for prediction through feature selection. Thirdly, we introduce a stacking ensemble model, employing fivefold cross-validation to assess patient outcomes. Lastly, the model undergoes analysis using various interpretable methods.
Study population
Data for this study were retrieved from three distinct critical care databases: MIMIC-III35, MIMIC-IV36, and eICU-CRD37. The prediction models were developed using the publicly accessible MIMIC-III database. The data were divided into two sets: 30% of the data were reserved for internal validation, and the remaining 70% were used for model construction. The predictive performance of these models was validated on two entirely independent datasets, MIMIC-IV and eICU-CRD. MIMIC-III includes critical care data from 46,520 ICU patients admitted to Beth Israel Deaconess Medical Center in Boston between June 1, 2001, and October 31, 2012. This dataset comprises 26 tables covering demographics, admission records, discharge summaries, ICD-9 diagnostic records, vital signs, laboratory measurements, and medication usage. In contrast, MIMIC-IV includes data from over 190,000 patients and more than 450,000 hospitalizations at Beth Israel Deaconess Medical Center (BIDMC) between 2008 and 2019. It offers a broader array of information, covering demographics, laboratory tests, medication usage, vital signs, surgical procedures, and disease diagnoses. Although MIMIC-III and MIMIC-IV share medical information and data types, their data collection, processing, and dissemination methodologies differ. The MIMIC-IV dataset is broader in scope, spanning more patients and covering a longer timeframe. The eICU Collaborative Research Database (eICU-CRD) is a large public database created by the Laboratory for Computational Physiology (LCP) at MIT. It is a completely independent dataset that brings together data from many hospitals within the United States, expanding the scope of the study by providing multicenter data.
The database covers routine data on more than 200,000 patients admitted to intensive care units in 2014 and 2015 and includes a wealth of high-quality clinical information such as physiological parameters, laboratory results, medication records, and diagnostic information. The data are presented in both structured and unstructured forms and are automatically collected from monitoring equipment, electronic medical records, and other healthcare information systems.
For each patient sample, the following information was collected: (1) Demographic characteristics: gender, age in years, and survival status; (2) Vital signs: heart rate (HR, beats/min), respiratory rate (Resp, breaths/min), body temperature (Temp, degrees Celsius), and pain score; (3) Laboratory parameters: blood urea nitrogen (BUN, mg/dL), creatinine (mg/dL), glucose (GLU, mg/dL), bicarbonate (HCO3, mmol/L), international normalized ratio (INR), potassium (K, mmol/L), sodium (Na, mmol/L), partial pressure of carbon dioxide (PCO2, mmHg), prothrombin time (PT, s), white blood cell count (WBC, 10³/μL), chloride (CL, mmol/L), Glasgow Coma Scale (GCS), hematocrit (HCT, %), hemoglobin (HB, g/dL), pH, platelet count (PLT, 10³/μL), partial pressure of oxygen (PO2, mmHg), peripheral oxygen saturation (SpO2, %), and fraction of inspired oxygen (FiO2, %). Blood samples were taken before and after dialysis, following an 8-h fast, for routine biochemical testing.
Determination of outcome variables: mortality and AKI
Mortality, defined as the death rate among patients with AKI during their ICU hospitalization, was determined through specific criteria. Firstly, AKI diagnosis followed the Kidney Disease: Improving Global Outcomes (KDIGO)1 guidelines, considering serum creatinine concentration (SCr) and urine output (UO) levels. According to literature studies30,31,32,33,38, SCr was used as the main target of study in this experiment. AKI was defined as: a 1.5-fold increase in serum creatinine concentration within the prior 7 days; a rise of ≥ 0.3 mg/dL within 48 h; or a sustained urine output of < 0.5 mL/kg/h for ≥ 6 h. In cases where baseline serum creatinine was unavailable pre-admission, the first serum creatinine at admission served as the baseline. Patients with AKI in the ICU were identified via departmental codes. Subsequently, ICU duration was computed based on admission and discharge times, and data from 24 h preceding admission were extracted26,39,40. Data from the initial ICU admission were used for patients with multiple admissions; the average value was calculated for repeated examinations within 24 hours41.
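As an illustration, the creatinine-based KDIGO criteria above can be expressed as a simple screening function. This is a minimal sketch over an assumed table layout (the column names `time_h` and `scr_mgdl` are hypothetical), not the authors' extraction code:

```python
import pandas as pd

def flag_aki_kdigo(scr: pd.DataFrame) -> bool:
    """Return True if either creatinine-based KDIGO criterion is met.

    scr: DataFrame with columns ['time_h', 'scr_mgdl'] (hours from first
    measurement, serum creatinine in mg/dL).
    """
    scr = scr.sort_values("time_h").reset_index(drop=True)
    for i in range(len(scr)):
        t_i, v_i = scr.loc[i, "time_h"], scr.loc[i, "scr_mgdl"]
        earlier = scr[scr.time_h < t_i]
        # Criterion 1: >= 1.5x the lowest value within the prior 7 days (168 h)
        win7 = earlier[earlier.time_h >= t_i - 168]
        if len(win7) and v_i >= 1.5 * win7.scr_mgdl.min():
            return True
        # Criterion 2: absolute rise of >= 0.3 mg/dL within 48 h
        win48 = earlier[earlier.time_h >= t_i - 48]
        if len(win48) and v_i - win48.scr_mgdl.min() >= 0.3:
            return True
    return False
```

The urine-output criterion (< 0.5 mL/kg/h for ≥ 6 h) would require a separate time series and is omitted here, mirroring the study's focus on SCr.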
Inclusion and exclusion criteria
To ensure data safety and emphasize the effectiveness of the model in early prediction, we focused on developing a predictive model using medical data from 24 h prior to a patient's admission to screen patients diagnosed with AKI. The final dataset for the experiment was selected from these data (Fig. 8). During the data selection process, we excluded patients who met any of the following criteria: (1) age < 18 years; (2) ICU stay of < 24 h; (3) chronic renal replacement therapy received prior to admission; and (4) records with > 20% missing values or a lack of outcome information. These exclusion criteria were designed to ensure the quality and accuracy of the experimental data for better exploring the relationship between early patient status and AKI.
Data processing
In this study, records with missing values exceeding 20% were excluded, and outliers were identified using box-and-whisker plots and subsequently removed. To handle the remaining missing values, multiple imputation was performed using the RF algorithm, known for its effectiveness in imputing missing data42. RF offers several advantages, including the ability to handle mixed types of missing data, adaptability to interactions and nonlinearities, and scalability to large datasets43, while preserving the distribution of the data post-imputation. Additionally, the data underwent Min–Max normalization, transforming each feature into a common interval to ensure uniform scaling. This normalization maintains a relative weight balance between features; by addressing model bias toward features with larger scales, it improved the performance and interpretability of the machine learning model and ensured consistent contribution weights of individual features.
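The imputation-plus-normalization step described above can be sketched with scikit-learn, using `IterativeImputer` with a random-forest estimator as a missForest-style stand-in for RF imputation (the authors' exact implementation may differ):

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import MinMaxScaler
from sklearn.pipeline import Pipeline

# Impute each feature from the others with a random forest, then rescale to [0, 1]
preprocess = Pipeline([
    ("impute", IterativeImputer(
        estimator=RandomForestRegressor(n_estimators=50, random_state=0),
        max_iter=5, random_state=0)),
    ("scale", MinMaxScaler()),
])
```

Wrapping both steps in a `Pipeline` ensures the imputation and scaling parameters are fit on the training split only and reused unchanged on the validation sets.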
Statistical analyses
Descriptive statistics were utilized to assess the distribution and inherent patterns of numerical characteristics within the dataset. Measures such as mean, median, mode, range, variance, and standard deviation were examined as appropriate. Pearson's correlation coefficient was employed to analyze the degree of linear correlation between variables. Descriptive statistics for continuous variables were reported as either mean ± standard deviation or median (interquartile range), while frequencies were used for categorical variables. The normality of each variable was evaluated using the Kolmogorov–Smirnov test. Student's t-test was used to compare continuous variables, and Fisher's exact test was used to assess associations between categorical variables. Statistical analysis was performed using R version 4.3.1 for Windows.
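The univariate tests named above are all available in SciPy; the small example below uses invented synthetic group values purely for illustration (the study itself used R):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
survivors = rng.normal(25, 6, 200)   # e.g. BUN in survivors (synthetic)
deceased = rng.normal(35, 8, 120)    # e.g. BUN in non-survivors (synthetic)

# Normality check: one-sample Kolmogorov-Smirnov test against a fitted normal
ks = stats.kstest(survivors, "norm", args=(survivors.mean(), survivors.std(ddof=1)))

# Group comparison of a continuous variable: Student's t-test
t = stats.ttest_ind(survivors, deceased)

# Association between two categorical variables (e.g. sex x outcome):
# Fisher's exact test on a 2x2 contingency table (counts are invented)
odds, p_fisher = stats.fisher_exact([[300, 150], [200, 180]])
```

When the KS test rejects normality, a Mann-Whitney U test (`stats.mannwhitneyu`) would be the usual nonparametric alternative to the t-test.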
Feature selection
This study employs a two-tier feature selection approach to improve both the performance and interpretability of the prediction model. The Boruta algorithm was utilized in the initial tier, and the XGBoost algorithm was employed in the subsequent tier. Boruta6,44 is an RF-based feature selection method that evaluates feature importance by comparing original features against randomized shadow copies. In the first tier, Boruta is applied to filter from the initial set those features with significant predictive power for the target variable. XGBoost23, an efficient gradient boosting tree algorithm known for its excellent predictive performance and automatic feature screening, is applied in the second tier. Feature selection within the XGBoost model further refines the initially selected features, resulting in a final subset with enhanced predictive power and stability.
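A compact sketch of the two-tier idea follows, with a simplified shadow-feature screen standing in for the full Boruta algorithm and scikit-learn's `GradientBoostingClassifier` standing in for XGBoost (both substitutions keep the example dependency-free; they are not the authors' implementation):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

def boruta_style_screen(X, y, n_rounds=10, seed=0):
    """Tier 1: keep features whose RF importance beats the best shadow
    (column-permuted) feature in a majority of rounds -- a simplified Boruta."""
    rng = np.random.default_rng(seed)
    hits = np.zeros(X.shape[1], dtype=int)
    for _ in range(n_rounds):
        shadow = rng.permuted(X, axis=0)  # shuffle each column: breaks feature-target link
        rf = RandomForestClassifier(n_estimators=100, random_state=0)
        rf.fit(np.hstack([X, shadow]), y)
        imp = rf.feature_importances_
        real, sh = imp[: X.shape[1]], imp[X.shape[1]:]
        hits += (real > sh.max()).astype(int)
    return np.where(hits > n_rounds // 2)[0]

def gbdt_refine(X, y, keep, top_k=5):
    """Tier 2: rank the surviving features with a boosted-tree model and keep the top k."""
    gb = GradientBoostingClassifier(random_state=0).fit(X[:, keep], y)
    order = np.argsort(gb.feature_importances_)[::-1][:top_k]
    return keep[order]
```

In the study, the second tier reduced the Boruta-screened set to the final 24 features used for modelling.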
Model construction
In this study, we strategically employed the Stacking Ensemble Method (SEM) to build our model, with the goal of further enhancing overall performance by integrating the outputs of multiple base learners (single classifiers) as inputs to a meta-learner. Extensive prior research has demonstrated the substantial superiority of the SEM over independent classifiers45. To further optimize performance, we employed the voting ensemble method in a preliminary stage to select the base models for the SEM, based on data characteristics and the principle of model diversity. Ultimately, Logistic Regression (LR), Support Vector Machine (SVM), Naive Bayes (NB), Light Gradient Boosting Machine (LGBM), eXtreme Gradient Boosting (XGBoost), and Random Forest (RF) were identified as the base models for the stacking ensemble, with LR specifically chosen as the meta-model. This decision was made considering that the variables outputted by the base models represent linear data and align with the pursuit of model interpretability. The objective of this selection is to strike a balance between the diversity of the base models and the performance of the overall model, thereby providing a more comprehensive and reliable analytical foundation for this study.
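The stacking architecture can be assembled with scikit-learn's `StackingClassifier`. The sketch below uses a subset of the paper's base learners (LGBM and XGBoost are omitted so the example needs only scikit-learn) with an LR meta-model and fivefold cross-validated stacking, as described above:

```python
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

base_learners = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("nb", GaussianNB()),
    ("svm", SVC(probability=True, random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
]

stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),  # LR meta-model, as in the paper
    cv=5,                                  # fivefold cross-validated stacking
    stack_method="predict_proba",          # meta-model sees base-model probabilities
)
```

With `cv=5`, each base model's out-of-fold predicted probabilities form the meta-model's training matrix, which avoids leaking the base models' training fit into the meta-learner.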
Evaluation metrics
To assess the performance of the models comprehensively, we utilize a diverse set of performance metrics, encompassing the area under the receiver operating characteristic curve (AUROC) with 95% confidence interval (CI), the area under the precision-recall curve (AUC-PRC), precision, accuracy, recall, F1-score, calibration curves, and Brier scores. This comprehensive framework is designed to provide a holistic understanding of the model's performance across various dimensions. Specifically, the evaluation is conducted using the following formulas:
$$\mathrm{Precision}=\frac{TP}{TP+FP},\quad \mathrm{Recall}=\frac{TP}{TP+FN},\quad \mathrm{Accuracy}=\frac{TP+TN}{TP+TN+FP+FN},\quad \mathrm{F1}=\frac{2\times \mathrm{Precision}\times \mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}$$

where True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) denote the confusion-matrix counts, and

$$\mathrm{Brier\ score}=\frac{1}{N}\sum_{i=1}^{N}{\left({f}_{i}-{o}_{i}\right)}^{2}$$

where N is the total number of samples, \({f}_{i}\) is the predicted probability of the ith sample, and \({o}_{i}\) is the actual outcome of the ith sample (usually 0 or 1).
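These metric definitions translate directly into code; a minimal implementation from confusion-matrix counts (for illustration, not the evaluation code used in the study):

```python
import numpy as np

def classification_metrics(y_true, y_pred, y_prob):
    """Precision, recall, accuracy, F1 from TP/TN/FP/FN, plus the Brier score."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # Brier score: mean squared gap between predicted probability and outcome
    brier = np.mean((np.asarray(y_prob) - y_true) ** 2)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1, "brier": brier}
```

A lower Brier score indicates better-calibrated probabilities; 0 would mean perfectly confident, perfectly correct predictions.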
Model interpretability
The SEM combines multiple base models to generate predictions. Thus, when interpreting the model, the feature weights of each base model are first assessed using the Permutation Importance (PI) technique. The base models' feature weights are then combined arithmetically to obtain the feature weights of the stacked model. Features with higher weights are selected based on importance ranking, and a causal diagram is constructed using a causal inference framework46. In this framework, confounders are defined as variables directly influencing both the predicted outcome and the predictor; these confounders are pivotal factors contributing to AKI mortality rates47. Finally, Local Interpretable Model-Agnostic Explanations (LIME) is employed to analyze how specific values of different characteristics impact the model's predicted outcomes across various categories. This elucidation of clinical parameters leading to high patient mortality facilitates targeted interventions for potentially critical illnesses during clinical practice.
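Permutation importance for a fitted base model can be computed with scikit-learn's `inspection` module. The sketch below runs on synthetic data; the paper's PI implementation may differ in scoring metric and repeat count:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=6, n_informative=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Permutation importance: the drop in held-out AUROC when each feature is shuffled
pi = permutation_importance(rf, X_te, y_te, scoring="roc_auc",
                            n_repeats=10, random_state=0)
ranking = np.argsort(pi.importances_mean)[::-1]  # features, most to least important
```

Because PI is model-agnostic, the same call works for every base learner in the stack, after which the per-model weights can be aggregated as described above.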
Data availability
The datasets generated during the current study are available in the GitHub (https://github.com/mengqings/Data_aki_all/tree/master and https://github.com/mengqings/eICU_Data_extract/tree/master) repository.
References
Ostermann, M. et al. Controversies in acute kidney injury: Conclusions from a Kidney Disease: Improving Global Outcomes (KDIGO) Conference. Kidney Int. 98, 294–309 (2020).
Khwaja, A. KDIGO clinical practice guidelines for acute kidney injury. Nephron Clin. Pract. 120, c179–c184 (2012).
Susantitaphong, P. et al. World incidence of AKI: A meta-analysis. Clin. J. Am. Soc. Nephrol. 8, 1482–1493 (2013).
Mehta, R. L. et al. International Society of Nephrology’s 0by25 initiative for acute kidney injury (zero preventable deaths by 2025): A human rights case for nephrology. Lancet 385, 2616–2643 (2015).
Bhowal, P., Sen, S. & Sarkar, R. A two-tier feature selection method using Coalition game and Nystrom sampling for screening COVID-19 from chest X-Ray images. J. Ambient Intell. Hum. Comput. 14, 3659–3674 (2023).
Kursa, M. B., Jankowski, A. & Rudnicki, W. R. Boruta—A system for feature selection. Fundamenta Informaticae 101, 271–285 (2010).
Wu, L. et al. Feature ranking in predictive models for hospital-acquired acute kidney injury. Sci. Rep. 8, 17298 (2018).
Loftus, T. J. et al. Artificial intelligence-enabled decision support in nephrology. Nat. Rev. Nephrol. 18, 452–465 (2022).
Sabut, S., Patra, P. & Ray, A. Deep learning approach for classifying ischemic stroke using DWI sequences of brain MRIs. IJISTA 20, 524 (2022).
Tomašev, N. et al. Use of deep learning to develop continuous-risk models for adverse event prediction from electronic health records. Nat. Protoc. 16, 2765–2787 (2021).
Liu, K. et al. Development and validation of a personalized model with transfer learning for acute kidney injury risk estimation using electronic health records. JAMA Netw. Open 5, e2219776 (2022).
Churpek, M. M. et al. Internal and external validation of a machine learning risk score for acute kidney injury. JAMA Netw. Open 3, e2012892 (2020).
Cronin, R. M. et al. National Veterans Health Administration inpatient risk stratification models for hospital-acquired acute kidney injury. J. Am. Med. Inform. Assoc. 22, 1054–1071 (2015).
Bihorac, A. et al. MySurgeryRisk: Development and validation of a machine-learning risk algorithm for major complications and death after surgery. Ann. Surg. 269, 652–662 (2019).
Liu, J. et al. Mortality prediction based on imbalanced high-dimensional ICU big data. Comput. Ind. 98, 218–225 (2018).
Lauritsen, S. M. et al. Explainable artificial intelligence model to predict acute critical illness from electronic health records. Nat. Commun. 11, 1–11 (2020).
Martinez, D. A. et al. Early prediction of acute kidney injury in the emergency department with machine-learning methods applied to electronic health record data. Ann. Emerg. Med. 76, 501–514 (2020).
Wu, C. et al. Predicting in-hospital outcomes of patients with acute kidney injury. Nat. Commun. 14, 1–9 (2023).
Yaqiang, Z. Research and Implementation of Death Risk Prediction Model for Septic Patients Based on Machine Learning (Beijing University of Posts and Telecommunications, 2022).
Shen, J. et al. Features selection in a predictive model for cardiac surgery-associated acute kidney injury. Preprint at Research Square https://www.researchsquare.com/article/rs-3103913/v1 (2023). https://doi.org/10.21203/rs.3.rs-3103913/v1.
Bell, S. et al. Risk of postoperative acute kidney injury in patients undergoing orthopaedic surgery—Development and validation of a risk score and effect of acute kidney injury on survival: Observational cohort study. BMJ (Clin. Res. Ed.) 351, h5639 (2015).
Maurya, N. S., Kushwah, S., Kushwaha, S., Chawade, A. & Mani, A. Prognostic model development for classification of colorectal adenocarcinoma by using machine learning model based on feature selection technique boruta. Sci. Rep. 13, 6413 (2023).
Zhang, B., Zhang, Y. & Jiang, X. Feature selection for global tropospheric ozone prediction based on the BO-XGBoost-RFE algorithm. Sci. Rep. 12, 9244 (2022).
Manju, N., Harish, B. S. & Prajwal, V. Ensemble feature selection and classification of internet traffic using XGBoost classifier. IJCNIS 11, 37–44 (2019).
Zhou, L., Nandal, A., Ganchev, T. & Dhaka, A. Breast cancer detection by fusion of deep features with CNN extracted features. IJISTA 20, 510 (2022).
Yue, S. et al. Machine learning for the prediction of acute kidney injury in patients with sepsis. J. Transl. Med. 20, 215 (2022).
Yang, J., Peng, H., Luo, Y., Zhu, T. & Xie, L. Explainable ensemble machine learning model for prediction of 28-day mortality risk in patients with sepsis-associated acute kidney injury. Front. Med. 10 (2023).
Song, X., Liu, X., Liu, F. & Wang, C. Comparison of machine learning and logistic regression models in predicting acute kidney injury: A systematic review and meta-analysis. Int. J. Med. Inform. 151, 104484 (2021).
Zhang, X. et al. Machine learning for the prediction of acute kidney injury in critical care patients with acute cerebrovascular disease. Ren. Fail. 44, 43–53 (2022).
Mistry, N. S. & Koyner, J. L. Artificial intelligence in acute kidney injury: From static to dynamic models. Adv. Chronic Kidney Dis. 28, 74–82 (2021).
Dong, J. et al. Machine learning model for early prediction of acute kidney injury (AKI) in pediatric critical care. Crit. Care. 25, 288 (2021).
Yang, L. et al. Acute kidney injury in China: A cross-sectional survey. Lancet 386, 1465–1471 (2015).
Song, X. et al. Cross-site transportability of an explainable artificial intelligence model for acute kidney injury prediction. Nat. Commun. 11, 5668 (2020).
Zhang, Z., Ho, K. M. & Hong, Y. Machine learning for the prediction of volume responsiveness in patients with oliguric acute kidney injury in critical care. Crit. Care 23, 112 (2019).
Johnson, A. E. W. et al. MIMIC-III, a freely accessible critical care database. Sci. Data. 3, 1–9 (2016).
Johnson, A. E. W. et al. MIMIC-IV, a freely accessible electronic health record dataset. Sci. Data. 10, 1–9 (2023).
Pollard, T. J. et al. The eICU collaborative research database, a freely available multi-center database for critical care research. Sci. Data 5, 180178 (2018).
Zhang, Z. Machine Learning method for the management of acute kidney injury: More than just treating biomarkers individually. Biomark. Med. 13, 1251–1253 (2019).
Yao, X. et al. Development of a nomogram model for predicting the risk of in-hospital death in patients with acute kidney injury. RMHP 14, 4457–4468 (2021).
Lee, C.-W. et al. A combination of SOFA score and biomarkers gives a better prediction of septic AKI and in-hospital mortality in critically ill surgical patients: a pilot study. World J. Emerg. Surg. 13, 41 (2018).
Li, F. et al. Prediction model of in-hospital mortality in intensive care unit patients with heart failure: Machine learning-based, retrospective analysis of the MIMIC-III database. BMJ Open 11, e044779 (2021).
Yang, D. et al. Development of a predictive nomogram for acute respiratory distress syndrome in patients with acute pancreatitis complicated with acute kidney injury. Ren. Fail. 45, 2251591 (2023).
Tang, F. & Ishwaran, H. Random forest missing data algorithms. Stat. Anal. Data Min. ASA Data Sci. J. 10, 363–377 (2017).
Kursa, M. B. & Rudnicki, W. R. Feature selection with the Boruta package. J. Stat. Softw. 36, 1–13 (2010).
Wang, Y. et al. Stacking-based ensemble learning of decision trees for interpretable prostate cancer detection. Appl. Soft Comput. 77, 188–204 (2019).
Zhang, Z. et al. Causal inference with marginal structural modeling for longitudinal data in laparoscopic surgery: A technical note. Laparosc. Endosc. Robot. Surg. 5, 146–152 (2022).
Zhang, Z. Distinguishing between mediators and confounders is important for the causal inference in observational studies. AME Med. J. 4, 35 (2019).
Acknowledgements
The authors greatly appreciate the editors and peer reviewers for their critical reading and insightful comments, which helped improve the manuscript substantially. The authors thank the researchers responsible for establishing and maintaining the MIMIC-III, MIMIC-IV, and eICU databases, and gratefully acknowledge the project funds that financed this study.
Funding
Anhui Natural Science Foundation: Fault Diagnosis Research in Uncertain Environment under the Background of Industry 4.0, [Grant Number: KJ2021A0866]. National Natural Science Foundation of China [Grant Number: 82072228]. Natural General Research Project Fund of Shanghai University of Medicine & Health Sciences. National Natural Science Foundation of China [Grant Number: 62376152]. Three-Year Action Plan for Strengthening the Construction of Public Health System in Shanghai (2023–2025) of Construction project [Grant Number: GWVI-6]. Three-Year Action Plan for Strengthening the Construction of Public Health System in Shanghai (2023–2025) of Key discipline construction project (Grant No. GWVI-11.1-49).
Author information
Authors and Affiliations
Contributions
M.L.: Writing - original draft, software, methodology, data collection and analysis, investigation, formal analysis, conceptualization. Z.F.: Review and editing. Y.G.: Visualization. V.M.: English language editing. R.W.: Review. W.L.: Supervision. N.X.: Validation. K.L.: Validation. Z.L.: Funding acquisition and study conception. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Liu, M., Fan, Z., Gao, Y. et al. A two-tier feature selection method for predicting mortality risk in ICU patients with acute kidney injury. Sci Rep 14, 16794 (2024). https://doi.org/10.1038/s41598-024-63793-3
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41598-024-63793-3