Mitigating bias in AI mortality predictions for minority populations: a transfer learning approach

Abstract

Background

The COVID-19 pandemic has highlighted the crucial role of artificial intelligence (AI) in predicting mortality and guiding healthcare decisions. However, AI models may perpetuate or exacerbate existing health disparities due to demographic biases, particularly affecting racial and ethnic minorities. The objective of this study is to investigate the demographic biases in AI models predicting COVID-19 mortality and to assess the effectiveness of transfer learning in improving model fairness across diverse demographic groups.

Methods

This retrospective cohort study used a population-based dataset of COVID-19 cases from the Centers for Disease Control and Prevention (CDC), spanning the years 2020–2024. The study analyzed AI model performance across different racial and ethnic groups and employed transfer learning techniques to improve model fairness by adapting pre-trained models to the specific demographic and clinical characteristics of the population.

Results

Decision Tree (DT) and Random Forest (RF) models consistently showed improvements in accuracy, precision, and ROC-AUC scores for the Non-Hispanic Black, Hispanic/Latino, and Asian populations. The most significant precision improvement was observed in the DT model for Hispanic/Latino individuals, increasing from 0.3805 to 0.5265. Precision for Asians or Pacific Islanders in the DT model increased from 0.4727 to 0.6071, and for Non-Hispanic Blacks it rose from 0.5492 to 0.6657. Gradient Boosting Machines (GBM) produced mixed results, with accuracy and precision improvements for the Non-Hispanic Black and Asian groups but declines for the Hispanic/Latino and American Indian groups; the most significant decline was in AUCPR for the American Indian group, which dropped from 0.4612 to 0.2406. Logistic Regression (LR) demonstrated minimal changes across all metrics and groups. For the Non-Hispanic American Indian group, most models showed limited benefits, with several performance metrics either remaining stable or declining.

Conclusions

This study demonstrates the potential of AI in predicting COVID-19 mortality while also underscoring the critical need to address demographic biases. The application of transfer learning significantly improved the predictive performance of models across various racial and ethnic groups, suggesting these techniques are effective in mitigating biases and promoting fairness in AI models.


Background

Machine learning has been widely adopted in biomedical research, with numerous studies demonstrating its utility in predicting disease outcomes and informing clinical decision-making [1,2,3,4,5]. The COVID-19 pandemic has underscored the critical role of artificial intelligence (AI) in predicting mortality and informing healthcare decisions [6, 7]. AI models have been instrumental in forecasting outcomes and allocating resources, yet their effectiveness is contingent upon the accuracy and fairness of their predictions [8]. However, evidence suggests that these models may perpetuate or even exacerbate existing health disparities due to demographic biases, particularly affecting racial and ethnic minorities [9, 10]. For instance, studies have shown discrepancies in COVID-19 mortality predictions across different racial groups, raising concerns about the equity of AI-driven healthcare interventions [11].

The COVID-19 pandemic has brought to light health disparities among various demographic groups, with the non-Hispanic Black population facing significantly higher mortality rates than the non-Hispanic White population [12, 13]. These disparities highlight the necessity of evaluating and ensuring the fairness of AI model predictions across diverse demographic groups. Furthermore, the integration of demographic considerations into AI models remains a challenge, often due to the lack of representative data and the complexity of bias mitigation techniques [14, 15].

Bias mitigation in machine learning has been extensively studied, with methods falling into three broad categories: preprocessing, in-processing, and post-processing approaches [16,17,18]. Preprocessing methods, such as rebalancing datasets and data augmentation, adjust the dataset to reduce bias before training [19, 20]. In-processing methods, such as adversarial debiasing and fairness-aware loss functions, adjust the learning algorithm to reduce bias during training. Post-processing methods, such as fairness constraints and threshold adjustments, correct biased predictions after the model is trained. Each of these approaches has strengths and limitations, depending on the context and the nature of the bias being addressed.

We selected transfer learning as our primary bias mitigation technique due to its ability to adapt pre-trained models for new tasks with limited data while retaining prior knowledge. This is particularly valuable for addressing demographic bias, where underrepresented groups often have smaller datasets available for model training [21,22,23,24,25,26,27]. While transfer learning has been widely applied in AI research, our study’s novelty lies in its specific application to mitigate demographic biases in COVID-19 mortality predictions across racial and ethnic groups. This approach addresses a critical public health concern by targeting racial and ethnic minorities disproportionately affected by the pandemic. By adapting models trained on data from the non-Hispanic White population to better predict outcomes for the non-Hispanic Black population and other underrepresented groups, researchers can work towards more equitable AI tools in healthcare [28].

This study aims to identify demographic bias in AI models predicting pandemic mortality and to mitigate it, focusing in particular on differences in prediction accuracy between non-Hispanic White and minority populations. Through the use of transfer learning, we seek to improve the fairness and accuracy of these predictions, contributing to the development of more equitable healthcare interventions during pandemics and beyond [15].

Methods

Data Collection and Preprocessing

We obtained access to the “COVID-19 Case Surveillance Restricted Access Detailed Data” from the Centers for Disease Control and Prevention (CDC) prior to initiating the study. The data selected for analysis spans from January 1, 2020, to January 4, 2024 [29]. This study was determined to qualify for Not Human Subjects Research (NHSR) and to be exempt by the UTHSC Institutional Review Board.

From the initial set of 33 variables in the dataset, we identified 23 variables to serve as filtering conditions and features for model training, as detailed in Supplementary Table S1. The variable "Did the patient die as a result of this illness?" was used as the target outcome, and we filtered the data to retain only those cases with a valid value of "death" for this variable. The variable "What is the current status of this person?" was used to retain only cases with a "Laboratory-confirmed case" status. Additionally, the "Race" and "Ethnicity" variables were combined and used to filter the targeted population. Cases with invalid values, such as "Unknown," "Missing," or "NA," were excluded from the analysis.
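
To make the preprocessing concrete, below is a minimal sketch of the filtering steps in Python with pandas. The column names and value labels ("current_status", "race", "ethnicity", "death_yn") are illustrative stand-ins; they are not the actual CDC variable names.

```python
import pandas as pd

INVALID = ["Unknown", "Missing", "NA"]

# `raw` is assumed to hold the CDC extract; column names are illustrative.
cases = raw[raw["current_status"] == "Laboratory-confirmed case"].copy()
cases = cases[~cases.isin(INVALID).any(axis=1)]  # drop rows with invalid values

# Combine race and ethnicity into a single group label for filtering.
cases["race_ethnicity"] = cases["race"] + ", " + cases["ethnicity"]

# Binary target from the mortality variable; the actual value labels in the
# CDC data may differ from the "Yes" assumed here.
cases["death"] = (cases["death_yn"] == "Yes").astype(int)
```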

The dataset was divided into two parts: 80% for training the model on the Non-Hispanic White demographic and the remaining 20% for evaluating the model's efficacy. The same partitioning strategy was applied to each minority population when adapting the model to that group, ensuring a representative distribution of demographic characteristics in each partition.
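
A minimal sketch of this per-group partitioning follows, assuming the filtered cases are in the DataFrame `cases` from the previous sketch. Stratifying the split on the outcome is an assumption made here to keep both partitions representative; the study's exact splitting procedure may differ.

```python
from sklearn.model_selection import train_test_split

def split_group(cases, group_label, test_size=0.20, seed=42):
    """Split one demographic group's cases into 80% train / 20% test."""
    group = cases[cases["race_ethnicity"] == group_label]
    X = group.drop(columns=["death", "race_ethnicity"])
    y = group["death"]
    return train_test_split(X, y, test_size=test_size,
                            stratify=y, random_state=seed)

# Base-model data: the Non-Hispanic White partition (label is illustrative).
# X_train, X_test, y_train, y_test = split_group(cases, "White, Non-Hispanic")
```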

Model Development and Evaluation

We trained baseline machine learning models using the selected features from the Non-Hispanic White group's data. Several machine learning algorithms, including Random Forest (RF), Decision Tree (DT), Logistic Regression (LR), and Gradient Boosting Machines (GBM), were applied to train the models.
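
As a sketch, the four learners can be instantiated in scikit-learn as below. Defaults are placeholders for the tuned configurations reported later in Table 1.

```python
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# The four baseline learners named above, with illustrative default settings.
base_models = {
    "RF": RandomForestClassifier(random_state=42),
    "DT": DecisionTreeClassifier(random_state=42),
    "LR": LogisticRegression(max_iter=1000),
    "GBM": GradientBoostingClassifier(random_state=42),
}

# Fit each on the Non-Hispanic White training partition:
# fitted = {name: m.fit(X_train, y_train) for name, m in base_models.items()}
```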

The predictive performance of these baseline models was tested on the remaining 20% of the data using several metrics: accuracy, precision, recall, F1 score, ROC-AUC (Receiver Operating Characteristic - Area Under the Curve) score, and AUCPR (Area Under the Precision-Recall Curve).
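
These six metrics map directly onto scikit-learn functions; the sketch below assumes a fitted binary classifier exposing `predict_proba`, and uses `average_precision_score` as the usual stand-in for AUCPR.

```python
from sklearn.metrics import (accuracy_score, average_precision_score, f1_score,
                             precision_score, recall_score, roc_auc_score)

def evaluate(model, X_test, y_test):
    """Compute the six reported metrics on the held-out 20% partition."""
    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)[:, 1]  # predicted probability of death
    return {
        "accuracy":  accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall":    recall_score(y_test, y_pred),
        "f1":        f1_score(y_test, y_pred),
        "roc_auc":   roc_auc_score(y_test, y_prob),  # threshold-free ranking quality
        "aucpr":     average_precision_score(y_test, y_prob),
    }
```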

We then trained and tested base models for minority groups separately on the datasets from the Non-Hispanic Black, Hispanic, Asian or Pacific Islander, and American Indian populations, and assessed potential biases during this process. This approach allowed for a thorough evaluation of the models' performance across different demographic groups, identifying any discrepancies and biases present in the predictive outcomes. We also used Equalized Odds as an additional metric to evaluate fairness disparities and to indicate whether the model's predictions are biased across sensitive demographic attributes [30, 31].
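
Equalized Odds requires that true-positive and false-positive rates be (approximately) equal across groups. A minimal sketch of that disparity computation follows; the function names are our own illustrative choices, not from the paper.

```python
import numpy as np

def tpr_fpr(y_true, y_pred):
    """Group-level true-positive and false-positive rates."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tpr = (y_pred[y_true == 1] == 1).mean()  # sensitivity within the group
    fpr = (y_pred[y_true == 0] == 1).mean()  # false-alarm rate within the group
    return tpr, fpr

def equalized_odds_gaps(y_true, y_pred, groups):
    """Max TPR gap and max FPR gap across groups; 0 means perfect parity."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    rates = [tpr_fpr(y_true[groups == g], y_pred[groups == g])
             for g in np.unique(groups)]
    tprs, fprs = zip(*rates)
    return max(tprs) - min(tprs), max(fprs) - min(fprs)
```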

Bias Mitigation

To enhance model performance for minority groups, we applied transfer learning techniques to adapt the baseline models. The base models developed for the Non-Hispanic White group were fine-tuned on data from underrepresented populations, including Non-Hispanic Blacks, Hispanics, American Indians, and Asians or Pacific Islanders. Fine-tuning involves adjusting model parameters, and possibly retraining certain components of the model, with the new data to improve performance for the target minority groups. This process aimed to mitigate biases and enhance predictive accuracy for these groups.
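
Classical scikit-learn estimators do not expose neural-style layer freezing, so one practical way to realize this kind of fine-tuning for tree ensembles is `warm_start`: keep the stages learned from the Non-Hispanic White data and grow additional stages on the target group's data. The sketch below is one such illustrative adaptation, not necessarily the authors' exact procedure.

```python
from sklearn.ensemble import GradientBoostingClassifier

def fine_tune_gbm(base_gbm, X_minority, y_minority, extra_stages=50):
    """Grow additional boosting stages on a minority group's training data.

    `base_gbm` must already be fitted on the Non-Hispanic White partition;
    with warm_start the existing stages are kept and `extra_stages` new
    stages are fitted against the new data.
    """
    base_gbm.set_params(warm_start=True,
                        n_estimators=base_gbm.n_estimators + extra_stages)
    base_gbm.fit(X_minority, y_minority)
    return base_gbm
```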

In addition, we reviewed alternative bias mitigation strategies, such as reweighting, adversarial debiasing, and preprocessing methods. These techniques are well-established in bias mitigation literature, but we prioritized transfer learning due to its ability to leverage pre-trained models and adapt to smaller datasets with minimal loss in generalizability.

We performed hyperparameter tuning using a 10-fold cross-validation approach. For each machine learning model, we focused on a few critical hyperparameters known to significantly impact performance:

Random Forest (RF): We tuned the number of trees, maximum depth of each tree, and the minimum samples required to split an internal node. These parameters control the model’s complexity and ability to capture intricate patterns in the data.

Gradient Boosting Machine (GBM): For GBM, we tuned the learning rate, number of boosting rounds, and maximum depth of each tree. These parameters are critical for controlling the learning process and balancing the model’s accuracy with generalizability.

Logistic Regression (LR): For Logistic Regression, we primarily tuned the regularization parameter (C), which controls the strength of the regularization applied to the model.

Decision Tree (DT): For Decision Trees, we focused on tuning the maximum depth and the minimum samples required to split a node, as these control the tree's complexity and prevent overfitting.

Bias mitigation was achieved by selectively adjusting model parameters to improve accuracy for minority groups. For example, deeper trees in Random Forest were beneficial for groups where complex interactions in the data might influence outcomes, while a lower learning rate in Gradient Boosting prevented overfitting on smaller datasets for certain minority groups.

The 10-fold cross-validation process evaluated different combinations of these hyperparameters, selecting those that maximized accuracy, precision, recall, F1 score, ROC-AUC, and AUCPR across the folds. Based on the cross-validation results, the optimal hyperparameters were chosen for each model. For instance, the Random Forest model achieved optimal performance with 100 trees, a maximum depth of 10, and a minimum sample split of 5. Similarly, the Gradient Boosting Machine performed best with a learning rate of 0.01, 200 boosting rounds, and a maximum depth of 6 (Table 1).

Table 1 Summary of final hyperparameters
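
As a sketch of this search, the Random Forest grid below is illustrative but includes the reported optimum (100 trees, maximum depth 10, minimum sample split 5). `scoring` is shown as ROC-AUC for concreteness, whereas the study weighed several metrics across the folds.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Illustrative grid over the critical RF hyperparameters named above.
rf_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [5, 10, 20],
    "min_samples_split": [2, 5, 10],
}

search = GridSearchCV(RandomForestClassifier(random_state=42),
                      rf_grid, cv=10, scoring="roc_auc", n_jobs=-1)
# search.fit(X_train, y_train)
# search.best_params_  # e.g. {"max_depth": 10, "min_samples_split": 5, "n_estimators": 100}
```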

Incorporating fairness metrics such as Equalized Odds provided insights into True Positive Rate (TPR) and False Positive Rate (FPR) disparities across demographic groups. These metrics allowed us to quantify and monitor bias during the model fine-tuning process.

Comparison

The comparison of the performance metrics across baseline and fine-tuned models was conducted to evaluate the impact of transfer learning in mitigating demographic biases. This comparison involved examining the changes in key performance metrics, including accuracy, precision, recall, F1 score, ROC-AUC, and AUCPR, as well as fairness metrics such as Equalized Odds for each demographic group to identify potential improvements after fine-tuning (Fig. 1).

Fig. 1 Structured workflow for bias identification and mitigation in demographic models

Results

After filtering out all missing and invalid values during the feature selection process, 365,608 COVID-19 cases remained: 252,125 Non-Hispanic Whites, 43,304 Non-Hispanic Blacks, 19,385 Hispanics, 5,667 Asians, and 651 American Indians. The performance of the base models trained on Non-Hispanic White cases is listed in Table 2.

Table 2 Performance metrics of different base models in the Non-Hispanic White population

Among all model types chosen for training base models, Gradient Boosting Machines demonstrated the best performance. We then trained predictive models for the other race/ethnicity groups, including Non-Hispanic Blacks, Hispanics, American Indians, and Asians, using the different model types to generate their own base models. The performance of these base models is shown in Table 3.

Table 3 Performance metrics of different base models across demographic groups

Subsequently, we fine-tuned the base models for the minority groups by applying transfer learning to transfer knowledge from the Non-Hispanic White base models. The performance metrics of the fine-tuned models are detailed in Table 4.

Table 4 Performance metrics of various fine-tuned models across demographic groups

The evaluation of Equalized Odds revealed minimal differences in True Positive Rates (TPR) and False Positive Rates (FPR) across racial and ethnic groups.

For the base models, TPR ranged between 0.52 and 0.56 across all racial/ethnic groups. FPR ranged between 0.006 and 0.015, indicating that the models exhibited low false-positive rates across all groups (Fig. 2).

For the fine-tuned models, fine-tuning resulted in slight improvements for some groups; for the Non-Hispanic Black group with the Decision Tree model, for example, the FPR decreased from 0.014 to 0.009 while the TPR held steady at 0.56. Similar patterns of modest improvement were observed for other groups, with no significant disparities between the base and fine-tuned models (Fig. 3).

Fig. 2 True positive rates and false positive rates across racial and ethnic groups in base models

Fig. 3 True positive rates and false positive rates across racial and ethnic groups in fine-tuned models

The impact of fine-tuning on the performance metrics varied across models and demographic groups. Decision Tree (DT) and Random Forest (RF) models demonstrated consistent improvements in accuracy, precision, and ROC-AUC scores across most populations, except for the Non-Hispanic American Indian group. Gradient Boosting Machines (GBM) showed mixed results, with improvements in accuracy and precision observed for Non-Hispanic Black individuals and precision gains for Asian or Pacific Islanders. Logistic Regression (LR) showed minimal changes across all metrics and groups, suggesting that transfer learning had the least influence on this type of model.

Accuracy and precision generally followed similar trends across groups, with precision often exhibiting more pronounced changes. For Non-Hispanic Blacks, DT precision increased significantly from 0.5492 to 0.6657, while for Asians or Pacific Islanders, it rose from 0.4727 to 0.6071. The most substantial improvement in precision was observed in the DT model for Hispanic/Latino individuals, increasing from 0.3805 to 0.5265. These findings suggest that precision was more sensitive to fine-tuning than accuracy.

Recall showed declines in most cases, remained stable in some, and improved in one instance: RF recall increased for Asians or Pacific Islanders from 0.4587 to 0.4679. These declines highlight a trade-off where fine-tuning prioritized reducing false positives over false negatives.

AUCPR improvements were primarily observed in DT and RF models for Non-Hispanic Black and Asian groups. The most significant gain was seen in the DT model for Non-Hispanic Blacks, where AUCPR increased from 0.4477 to 0.5573, demonstrating enhanced handling of imbalanced datasets.

F1 scores reflected the interplay between precision and recall. Notable increases were observed in DT models for Non-Hispanic Black and Hispanic/Latino individuals and in DT and RF models for Asians or Pacific Islanders. However, F1 scores decreased in other groups, indicating that reductions in recall outweighed precision gains in those cases.

DT models exhibited consistent improvements in accuracy, precision, F1 score, and ROC-AUC score across three minority populations: Non-Hispanic Black, Hispanic/Latino, and Asian or Pacific Islander individuals, with the most significant improvement observed in precision for Hispanic/Latino individuals (from 0.3805 to 0.5265). However, these metrics showed little or no improvement for Non-Hispanic American Indians. RF models similarly demonstrated improvements in accuracy, precision, ROC-AUC scores, and AUCPR for the same three minority populations; for Non-Hispanic American Indians, RF performance either declined or remained stable, indicating limited benefits of fine-tuning for this group. GBM results were mixed, with some metrics improving while others declined; notably, AUCPR for GBM in Non-Hispanic American Indians decreased dramatically from 0.4612 to 0.2406. LR performance remained nearly unchanged across all metrics and demographic groups, indicating that transfer learning had little to no impact on this model type.

In summary, for the Non-Hispanic Black, Hispanic/Latino, and Asian or Pacific Islander populations, both the DT and RF models showed substantial improvements in accuracy, precision, ROC-AUC score, and AUCPR, whereas the GBM model yielded mixed results. In contrast, the Non-Hispanic American Indian population experienced only minimal benefits from fine-tuning, with most models either maintaining stable or declining performance metrics.

Discussion

The results of our study underscore the importance and potential of using AI to predict COVID-19 mortality while highlighting the critical need to address demographic biases inherent in these models. Our findings reveal that while AI models can achieve high accuracy, they often exhibit disparities in performance across different racial and ethnic groups, which can exacerbate existing health inequities.

Our baseline models, trained predominantly on data from the Non-Hispanic White population, demonstrated considerable performance differences when applied to other racial and ethnic groups. The discrepancies in predictive accuracy, precision, recall, F1 score, ROC-AUC score, and AUCPR between the Non-Hispanic White and other racial and ethnic populations point to significant demographic biases. These biases likely stem from historical inequities in healthcare access, socioeconomic factors, and the underrepresentation of minority populations in the training datasets [32].

Unequal access to hospitals, often due to geographic or financial barriers, plays a significant role [33]. For instance, minority populations may live in areas with fewer healthcare facilities, leading to delayed or inadequate treatment. Financial constraints can also limit access to quality health care, as individuals may not be able to afford necessary treatments or insurance coverage [34]. Socioeconomic factors include disparities in income, education, and employment, which influence underlying health conditions and health outcomes [35]. Lower income levels can lead to poorer nutrition, increased stress, and limited access to healthcare services, all of which contribute to worse health outcomes. Educational disparities may result in limited health literacy, making it harder for individuals to understand health information and navigate the healthcare system effectively. Employment-related factors, such as jobs with higher physical demands and lower job security, can also negatively impact health outcomes.

The underrepresentation of minority populations in training datasets often arises from the systemic exclusion of these groups from clinical trials and research studies. This exclusion can be due to a lack of outreach, mistrust of the healthcare system, or challenges in recruiting diverse participants [36]. As a result, AI models trained on these datasets may not accurately reflect the health needs or risks of minority populations, leading to biased predictions and perpetuating health disparities.

Our results also revealed that precision, recall, and F1 score were particularly sensitive to the population-specific differences in the performance of the models. For instance, precision improved significantly for Hispanic/Latino and Asian or Pacific Islander populations in the DT model, indicating better identification of high-risk individuals. However, recall generally declined across most groups, suggesting a trade-off in which false positives were reduced at the expense of false negatives. These changes underscore the necessity of evaluating multiple metrics to fully understand the implications of AI predictions on different populations.

The application of transfer learning techniques in our study provided a promising approach to mitigating these biases. By adapting the baseline models to better predict outcomes for underrepresented groups, we observed substantial improvements in performance for minority populations, particularly in precision and accuracy, suggesting that models like Decision Tree and Random Forest were more adept at adjusting to the demographic-specific needs of these groups. Fine-tuning through transfer learning also produced marginal but consistent improvements in TPR and reductions in FPR for minority groups, particularly for the Random Forest and Decision Tree models, reinforcing the value of transfer learning in adapting models to better serve underrepresented populations while maintaining fairness. However, these benefits were not universal: the Non-Hispanic American Indian group showed limited or no improvements across key metrics, indicating that the impact of transfer learning may vary with specific demographic characteristics and with the quality and quantity of available data. This underscores the need to continuously refine and validate AI models with diverse, representative datasets. The limitations observed in some groups suggest that transfer learning alone may not be sufficient to fully address all biases; nonetheless, it should remain part of a broader effort to create more inclusive and fair AI systems.

The improved performance of our AI models for minority populations through transfer learning holds significant implications for healthcare delivery. Enhanced predictive performance can lead to better resource allocation, more tailored healthcare interventions, and ultimately improved health outcomes for these groups [37]. For instance, accurate predictions can help identify high-risk individuals who may benefit from early interventions, thus preventing severe outcomes and reducing mortality rates. This approach not only improves individual health outcomes but also alleviates the burden on healthcare systems, particularly during public health crises like the COVID-19 pandemic. During crises, where timely and accurate predictions are essential, equitable AI models can ensure that vulnerable populations receive the attention and resources they need.

Moreover, the integration of transfer learning and other fairness-enhancing techniques should be prioritized in the development of AI models, particularly in the context of public health emergencies where the impact on marginalized populations can be profound [38]. Transfer learning enables models to leverage pre-existing knowledge and adapt to new, often limited datasets, which is particularly valuable when dealing with diverse and varied health data. This approach helps improve model performance across different demographic groups by adjusting to the unique characteristics and health needs of each group [39]. In addition to transfer learning, implementing fairness-enhancing techniques, such as bias mitigation algorithms and inclusive dataset practices, can further address and reduce the systemic biases that may persist in AI systems [38]. By addressing and reducing all these biases, AI can play a crucial role in reducing health disparities, ensuring equitable care for all populations and improving health outcomes during emergencies.

The potential for integrating this model into clinical workflows is particularly relevant in the context of public health emergencies, where rapid and accurate mortality predictions can significantly impact resource allocation and patient triage. By providing early identification of high-risk individuals, this model could support healthcare professionals in prioritizing care and allocating critical resources, such as ICU beds and ventilators, more effectively. Additionally, implementing this model in clinical decision support systems could enable ongoing, data-driven adjustments as demographic-specific risk patterns emerge, allowing for adaptive responses to evolving health crises. With appropriate safeguards and continuous validation, the model could be deployed as an aid in public health planning, enhancing preparedness and response strategies in future pandemics and other high-impact health events.

Limitations and future directions

Despite the promising results, our study has limitations that warrant consideration. The reliance on historical data may inherently reflect past biases, and while transfer learning helps mitigate these, it may not entirely eliminate them. Additionally, the model’s performance is dependent on the quality and representativeness of the training data, which underscores the need for comprehensive and inclusive data collection practices. While machine learning models can be fine-tuned to improve performance across various racial groups, the degree of improvement varies significantly by both model type and racial/ethnic group. The study is also limited by its exclusive use of CDC COVID-19 data, and we recognize that cross-dataset validation would be valuable for assessing the model’s generalizability to other contexts.

We also recognize the limitations of transfer learning in fully addressing demographic biases, given the complex social determinants that impact health outcomes. While our approach improves fairness in predictive accuracy, further research is needed to explore additional methods that can complement transfer learning, such as incorporating contextual social and environmental data. Addressing these ethical concerns is essential to ensuring that AI-driven healthcare interventions contribute to equitable outcomes across all population groups.

Future research should focus on developing more sophisticated bias mitigation techniques and exploring other AI methodologies that can further enhance fairness and accuracy. Moreover, continuous monitoring and updating of AI models are essential to adapt to evolving health dynamics and demographic changes.

Conclusion

In conclusion, our study demonstrates the potential of AI in predicting COVID-19 mortality while also underscoring the critical need to address demographic biases to ensure equitable healthcare outcomes. Through the application of transfer learning, we significantly improved the predictive performance of models across various racial and ethnic groups. Future research should explore additional bias mitigation strategies and consider the dynamic nature of health data to further enhance model performance.

Data availability

All data analyzed during this study are available from the Centers for Disease Control and Prevention (CDC) repository. We obtained access to the ‘COVID-19 Case Surveillance Restricted Access Detailed Data’ from the CDC, with data spanning from January 1, 2020, to January 4, 2024. These datasets can be accessed through the CDC, subject to applicable restrictions and access policies.

Abbreviations

AI: Artificial Intelligence
AUCPR: Area Under the Precision-Recall Curve
CDC: Centers for Disease Control and Prevention
RF: Random Forest
DT: Decision Tree
GBM: Gradient Boosting Machines
LR: Logistic Regression
ROC-AUC: Receiver Operating Characteristic - Area Under the Curve
NIMHD: National Institute on Minority Health and Health Disparities

References

  1. Nguyen HS, Ho DKN, Nguyen NN, Tran HM, Tam KW, Le NQK. Predicting EGFR Mutation Status in Non-small Cell Lung Cancer using Artificial Intelligence: a systematic review and Meta-analysis. Acad Radiol. 2024;31(2):660–83.


  2. Kha QH, Le VH, Hung TNK, Nguyen NTK, Le NQK. Development and validation of an explainable machine learning-based prediction model for drug-food interactions from chemical structures. Sensors (Basel). 2023;23(8).

  3. Chalkidis G, McPherson JP, Beck A, Newman MG, Guo JW, Sloss EA, Staes CJ. External validation of a machine learning model to Predict 6-Month Mortality for patients with Advanced Solid tumors. JAMA Netw Open. 2023;6(8):e2327193.


  4. Zhu J, Yang F, Wang Y, Wang Z, Xiao Y, Wang L, Sun L. Accuracy of machine learning in discriminating Kawasaki Disease and other Febrile illnesses: systematic review and Meta-analysis. J Med Internet Res. 2024;26:e57641.


  5. Zhu L, Yao Y. Prediction of the risk of mortality in older patients with coronavirus disease 2019 using blood markers and machine learning. Front Immunol. 2024;15:1445618.


  6. Wang L, Zhang Y, Wang D, Tong X, Liu T, Zhang S et al. Artificial Intelligence for COVID-19: a systematic review. Front Med. 2021;8.

  7. Aksenen CF, Ferreira DMA, Jeronimo PMC, Costa TO, de Souza TC, Lino BMNS, et al. Enhancing SARS-CoV-2 lineage surveillance through the integration of a simple and direct qPCR-based protocol adaptation with established machine learning algorithms. Anal Chem. 2024;96(46):18537–44.

  8. Wynants L, Van Calster B, Collins GS, Riley RD, Heinze G, Schuit E, et al. Prediction models for diagnosis and prognosis of covid-19: systematic review and critical appraisal. BMJ. 2020;369:m1328.


  9. Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019;366(6464):447–53.


  10. Vyas DA, Eisenstein LG, Jones DS. Hidden in Plain Sight - reconsidering the use of race correction in clinical algorithms. N Engl J Med. 2020;383(9):874–82.


  11. Gianfrancesco MA, Tamang S, Yazdany J, Schmajuk G. Potential biases in Machine Learning Algorithms Using Electronic Health Record Data. JAMA Intern Med. 2018;178(11):1544–7.


  12. Millett GA, Jones AT, Benkeser D, Baral S, Mercer L, Beyrer C, et al. Assessing differential impacts of COVID-19 on black communities. Ann Epidemiol. 2020;47:37–44.


  13. Mackey K, Ayers CK, Kondo KK, Saha S, Advani SM, Young S, et al. Racial and ethnic disparities in COVID-19-Related infections, hospitalizations, and deaths: a systematic review. Ann Intern Med. 2021;174(3):362–73.


  14. Mittermaier M, Raza MM, Kvedar JC. Bias in AI-based models for medical applications: challenges and mitigation strategies. NPJ Digit Med. 2023;6(1):113.


  15. Chen IY, Pierson E, Rose S, Joshi S, Ferryman K, Ghassemi M. Ethical machine learning in Healthcare. Annu Rev Biomed Data Sci. 2021;4:123–44.


  16. Pagano TP, Loureiro RB, Lisboa FVN, Peixoto RM, Guimarães GAS, Cruz GOR et al. Bias and Unfairness in Machine Learning models: a systematic review on datasets, Tools, Fairness Metrics, and identification and mitigation methods. Big Data Cogn Comput [Internet]. 2023; 7(1).

  17. Malhotra A, Thulal AN, editors. A comparative analysis of bias mitigation methods in machine learning. In: 2024 11th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO); 14–15 March 2024.

  18. Siddique S, Haque MA, George R, Gupta KD, Gupta D, Faruk MJ. Survey on machine learning biases and mitigation techniques. Digital [Internet]. 2024;4(1):1–68.


  19. Zhang Z, Wang S, Meng G, editors. A review on pre-processing methods for fairness in machine learning. In: Advances in natural computation, fuzzy systems and knowledge discovery. Cham: Springer International Publishing; 2023.

  20. Shahrezaei MH, Loughran R, Daid KM, editors. Pre-processing techniques to mitigate against algorithmic bias. In: 2023 31st Irish Conference on Artificial Intelligence and Cognitive Science (AICS); 2023. pp. 7–8.

  21. Hosna A, Merry E, Gyalmo J, Alom Z, Aung Z, Azim MA. Transfer learning: a friendly introduction. J Big Data. 2022;9(1):102.


  22. Zou Q, Xie S, Lin Z, Wu M, Ju Y. Finding the best classification threshold in Imbalanced classification. Big Data Res. 2016;5:2–8.


  23. Toseef M, Olayemi Petinrin O, Wang F, Rahaman S, Liu Z, Li X, Wong KC. Deep transfer learning for clinical decision-making based on high-throughput data: comprehensive survey with benchmark results. Brief Bioinform. 2023;24(4).

  24. Dalkıran A, Atakan A, Rifaioğlu AS, Martin MJ, Atalay R, Acar AC, et al. Transfer learning for drug-target interaction prediction. Bioinformatics. 2023;39(Suppl 1):i103–10.


  25. Clark M, Meyer C, Ramos-Cejudo J, Elbers DC, Pierce-Murray K, Fricks R, et al. Transfer learning for mortality prediction in non-small cell lung cancer with low-resolution histopathology slide snapshots. Stud Health Technol Inform. 2024;310:735–9.


  26. Mendel K, Li H, Sheth D, Giger M. Transfer learning from convolutional neural networks for computer-aided diagnosis: a comparison of digital breast tomosynthesis and full-field Digital Mammography. Acad Radiol. 2019;26(6):735–43.


  27. Alatrany AS, Khan W, Hussain AJ, Mustafina J, Al-Jumeily D. Transfer learning for classification of Alzheimer’s Disease based on genome wide data. IEEE/ACM Trans Comput Biol Bioinform. 2023;20(5):2700–11.


  28. Rajkomar A, Oren E, Chen K, Dai AM, Hajaj N, Hardt M, et al. Scalable and accurate deep learning with electronic health records. NPJ Digit Med. 2018;1:18.


  29. Centers for Disease Control and Prevention. COVID-19 Case Surveillance Restricted Access Detailed Data: summary and limitations (dataset access date: June 21, 2024).

  30. Yuan C, Linn KA, Hubbard RA. Algorithmic Fairness of Machine Learning models for Alzheimer Disease Progression. JAMA Netw Open. 2023;6(11):e2342203.


  31. Ganta T, Kia A, Parchure P, Wang MH, Besculides M, Mazumdar M, Smith CB. Fairness in Predicting Cancer Mortality Across racial subgroups. JAMA Netw Open. 2024;7(7):e2421290.


  32. Chen IY, Joshi S, Ghassemi M. Treating health disparities with artificial intelligence. Nat Med. 2020;26(1):16–7.


  33. Peek ME, Simons RA, Parker WF, Ansell DA, Rogers SO, Edmonds BT. COVID-19 among African americans: an Action Plan for Mitigating disparities. Am J Public Health. 2021;111(2):286–92.


  34. Caraballo C, Massey D, Mahajan S, Lu Y, Annapureddy AR, Roy B et al. Racial and Ethnic Disparities in Access to Health Care Among Adults in the United States: A 20-Year National Health Interview Survey Analysis, 1999–2018. medRxiv. 2020.

  35. Williams DR, Cooper LA. Reducing racial inequities in Health: using what we already know to take action. Int J Environ Res Public Health. 2019;16(4).

  36. George S, Duran N, Norris K. A systematic review of barriers and facilitators to minority research participation among African americans, latinos, Asian americans, and Pacific islanders. Am J Public Health. 2014;104(2):e16–31.


  37. Obermeyer Z, Emanuel EJ. Predicting the Future - Big Data, Machine Learning, and Clinical Medicine. N Engl J Med. 2016;375(13):1216–9.


  38. Esteva A, Robicquet A, Ramsundar B, Kuleshov V, DePristo M, Chou K, et al. A guide to deep learning in healthcare. Nat Med. 2019;25(1):24–9.


  39. Wang F, Casalino LP, Khullar D. Deep learning in Medicine-Promise, Progress, and challenges. JAMA Intern Med. 2019;179(3):293–4.



Acknowledgements

The authors would like to acknowledge the contributions of all individuals and institutions involved in the data collection and analysis process.

Funding

This research was supported by the National Institute on Minority Health and Health Disparities (NIMHD) of the National Institutes of Health (NIH) under Award Number R01MD018766.

Author information


Contributions

T.G. and M.L. conceptualized and designed the study. T.G., J.Y., and G.J. collected the data. T.G. and M.L. performed the data analysis and interpretation. W.P., X.M., and Y.W. validated the results and contributed to the methodology. T.G. drafted the original manuscript, while M.L. and Y.W. provided critical revisions. All authors reviewed and approved the final version of the manuscript.

Corresponding author

Correspondence to Minghui Li.

Ethics declarations

Ethics approval and consent to participate

This study utilized data from the “COVID-19 Case Surveillance Restricted Access Detailed Data” provided by the Centers for Disease Control and Prevention (CDC). The research was determined to qualify as Not Human Subjects Research (NHSR) and was exempt from ethical review by the UTHSC Institutional Review Board. Informed consent was not required, and a waiver was granted by the UTHSC IRB. The study adhered to the ethical principles outlined in the Declaration of Helsinki.

Consent for publication

Not applicable.

Disclaimer

The findings and conclusions in this report are those of the author(s) and do not necessarily represent the official position of any organization. The CDC does not take responsibility for the scientific validity or accuracy of methodology, results, statistical analyses, or conclusions presented.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article


Cite this article

Gu, T., Pan, W., Yu, J. et al. Mitigating bias in AI mortality predictions for minority populations: a transfer learning approach. BMC Med Inform Decis Mak 25, 30 (2025). https://doi.org/10.1186/s12911-025-02862-7

