Mitigating bias in AI mortality predictions for minority populations: a transfer learning approach
BMC Medical Informatics and Decision Making volume 25, Article number: 30 (2025)
Abstract
Background
The COVID-19 pandemic has highlighted the crucial role of artificial intelligence (AI) in predicting mortality and guiding healthcare decisions. However, AI models may perpetuate or exacerbate existing health disparities due to demographic biases, particularly affecting racial and ethnic minorities. The objective of this study is to investigate the demographic biases in AI models predicting COVID-19 mortality and to assess the effectiveness of transfer learning in improving model fairness across diverse demographic groups.
Methods
This retrospective cohort study used a population-based dataset of COVID-19 cases from the Centers for Disease Control and Prevention (CDC), spanning the years 2020–2024. The study analyzed AI model performance across different racial and ethnic groups and employed transfer learning techniques to improve model fairness by adapting pre-trained models to the specific demographic and clinical characteristics of the population.
Results
Decision Tree (DT) and Random Forest (RF) models consistently showed improvements in accuracy, precision, and ROC-AUC scores for Non-Hispanic Black, Hispanic/Latino, and Asian populations. The largest precision improvement was observed in the DT model for Hispanic/Latino individuals, which increased from 0.3805 to 0.5265. Precision for Asians or Pacific Islanders in the DT model increased from 0.4727 to 0.6071, and for Non-Hispanic Blacks it rose from 0.5492 to 0.6657. Gradient Boosting Machines (GBM) produced mixed results, showing accuracy and precision improvements for the Non-Hispanic Black and Asian groups but declines for the Hispanic/Latino and American Indian groups; the sharpest decline was in AUCPR for the American Indian group, which dropped from 0.4612 to 0.2406. Logistic Regression (LR) demonstrated minimal changes across all metrics and groups. For the Non-Hispanic American Indian group, most models showed limited benefits, with several performance metrics either remaining stable or declining.
Conclusions
This study demonstrates the potential of AI in predicting COVID-19 mortality while also underscoring the critical need to address demographic biases. The application of transfer learning significantly improved the predictive performance of models across various racial and ethnic groups, suggesting these techniques are effective in mitigating biases and promoting fairness in AI models.
Background
Machine learning has been widely adopted in biomedical research, with numerous studies demonstrating its utility in predicting disease outcomes and informing clinical decision-making [1,2,3,4,5]. The COVID-19 pandemic has underscored the critical role of artificial intelligence (AI) in predicting mortality and informing healthcare decisions [6, 7]. AI models have been instrumental in forecasting outcomes and allocating resources, yet their effectiveness is contingent upon the accuracy and fairness of their predictions [8]. However, evidence suggests that these models may perpetuate or even exacerbate existing health disparities due to demographic biases, particularly affecting racial and ethnic minorities [9, 10]. For instance, studies have shown discrepancies in COVID-19 mortality predictions across different racial groups, raising concerns about the equity of AI-driven healthcare interventions [11].
The COVID-19 pandemic has brought to light health disparities among various demographic groups, with the non-Hispanic Black population facing significantly higher mortality rates than the non-Hispanic White population [12, 13]. These disparities highlight the necessity of evaluating and ensuring the fairness of AI models in their predictions across diverse demographic groups. Furthermore, integrating demographic considerations into AI models remains a challenge, often due to the lack of representative data and the complexity of bias mitigation techniques [14, 15].

Bias mitigation in machine learning has been extensively studied, with methods falling into three broad categories: preprocessing, in-processing, and post-processing approaches [16,17,18]. Preprocessing methods, such as rebalancing datasets and data augmentation, adjust the dataset to reduce bias before training [19, 20]. In-processing methods, such as adversarial debiasing and fairness-aware loss functions, adjust the learning algorithm to reduce bias during training. Post-processing methods, such as fairness constraints and threshold adjustments, correct biased predictions after the model is trained. Each approach has strengths and limitations, depending on the context and the nature of the bias being addressed.
We selected transfer learning as our primary bias mitigation technique due to its ability to adapt pre-trained models for new tasks with limited data while retaining prior knowledge. This is particularly valuable for addressing demographic bias, where underrepresented groups often have smaller datasets available for model training [21,22,23,24,25,26,27]. While transfer learning has been widely applied in AI research, our study’s novelty lies in its specific application to mitigate demographic biases in COVID-19 mortality predictions across racial and ethnic groups. This approach addresses a critical public health concern by targeting racial and ethnic minorities disproportionately affected by the pandemic. By adapting models trained on data from the non-Hispanic White population to better predict outcomes for the non-Hispanic Black population and other underrepresented groups, researchers can work towards more equitable AI tools in healthcare [28].
This study aims to identify demographic bias in AI models predicting pandemic mortality and to mitigate it, focusing in particular on differences in prediction accuracy between the non-Hispanic White population and minority populations. Through transfer learning, we seek to improve the fairness and accuracy of these predictions, contributing to the development of more equitable healthcare interventions during pandemics and beyond [15].
Methods
Data Collection and Preprocessing
We obtained access to the “COVID-19 Case Surveillance Restricted Access Detailed Data” from the Centers for Disease Control and Prevention (CDC) prior to initiating the study. The data selected for analysis spans from January 1, 2020, to January 4, 2024 [29]. This study was determined to qualify for Not Human Subjects Research (NHSR) and to be exempt by the UTHSC Institutional Review Board.
From the initial set of 33 variables in the dataset, we identified 23 variables to serve as filtering conditions and features for model training, as detailed in Supplementary Table S1. The variable "Did the patient die as a result of this illness?" was used as the target outcome, and we filtered the data to retain only those cases where this variable's value was recorded as "death." The variable "What is the current status of this person?" was used to retain only cases with a "Laboratory-confirmed case" status. Additionally, the "Race" and "Ethnicity" variables were combined and used to filter the targeted population. Cases with invalid values, such as "Unknown," "Missing," or "NA," were excluded from the analysis.
The dataset was divided into two parts: 80% for training the model on the Non-Hispanic White group and the remaining 20% for evaluating the model's efficacy. The same partitioning strategy was applied when adapting the model to each minority population, ensuring a representative distribution of demographic characteristics in both splits.
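To make the filtering and partitioning steps concrete, the following is a minimal sketch in Python with pandas and scikit-learn. The file name and column names (`current_status`, `death_yn`, `race`, `ethnicity`) are hypothetical stand-ins for the restricted-access CDC variables listed in Supplementary Table S1; the 80/20 stratified split mirrors the strategy described above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical file and column names standing in for the CDC
# restricted-access variables (see Supplementary Table S1).
df = pd.read_csv("covid_case_surveillance.csv")

# Retain laboratory-confirmed cases and drop invalid outcome values.
df = df[df["current_status"] == "Laboratory-confirmed case"]
df = df[~df["death_yn"].isin(["Unknown", "Missing", "NA"])]

# Combine race and ethnicity into a single demographic label.
df["race_ethnicity"] = df["race"] + ", " + df["ethnicity"]

def split_group(data, group, target="death_yn"):
    """80/20 split for one demographic group, stratified on the outcome."""
    sub = data[data["race_ethnicity"] == group]
    X = sub.drop(columns=[target, "race_ethnicity"])
    y = (sub[target] == "Yes").astype(int)
    return train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

# Training/evaluation split for the Non-Hispanic White base model.
X_tr, X_te, y_tr, y_te = split_group(df, "White, Non-Hispanic")
```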
Model Development and Evaluation
We trained baseline machine learning models on the Non-Hispanic White group using the selected features. Several machine learning algorithms were applied: Random Forest (RF), Decision Tree (DT), Logistic Regression (LR), and Gradient Boosting Machines (GBM).
The predictive performance of these baseline models was tested on the remaining 20% of the dataset using several metrics: accuracy, precision, recall, F1 score, ROC-AUC (Receiver Operating Characteristic - Area Under the Curve) score, and AUCPR (Area Under the Precision-Recall Curve).
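A minimal scikit-learn sketch of this training and evaluation step, continuing the hypothetical split above and assuming the features have already been numerically encoded:

```python
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, average_precision_score, f1_score,
                             precision_score, recall_score, roc_auc_score)
from sklearn.tree import DecisionTreeClassifier

models = {
    "RF": RandomForestClassifier(random_state=42),
    "DT": DecisionTreeClassifier(random_state=42),
    "LR": LogisticRegression(max_iter=1000),
    "GBM": GradientBoostingClassifier(random_state=42),
}

for name, model in models.items():
    model.fit(X_tr, y_tr)                      # train on the 80% split
    y_pred = model.predict(X_te)               # hard labels for threshold metrics
    y_prob = model.predict_proba(X_te)[:, 1]   # scores for ROC-AUC and AUCPR
    print(name,
          accuracy_score(y_te, y_pred),
          precision_score(y_te, y_pred),
          recall_score(y_te, y_pred),
          f1_score(y_te, y_pred),
          roc_auc_score(y_te, y_prob),
          average_precision_score(y_te, y_prob))  # AUCPR
```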
We then trained and tested base models for the minority groups separately on the datasets from the Non-Hispanic Black, Hispanic, Asian or Pacific Islander, and American Indian populations, assessing any potential biases during this process. This approach allowed a thorough evaluation of the models' performance across demographic groups, identifying any discrepancies and biases present in the predictive outcomes. We also used Equalized Odds as an additional metric to evaluate fairness disparities and to indicate whether the model's predictions are biased across sensitive demographic attributes [30, 31].
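Equalized Odds requires that the true positive rate (TPR) and false positive rate (FPR) be comparable across groups. A minimal sketch of how the per-group rates and the resulting disparity can be computed (our own illustration; the paper does not specify an implementation, and the example rates are placeholders consistent with the ranges reported in the Results):

```python
import numpy as np

def tpr_fpr(y_true, y_pred):
    """True and false positive rates from binary labels and predictions."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tpr = y_pred[y_true == 1].mean()   # P(pred = 1 | y = 1)
    fpr = y_pred[y_true == 0].mean()   # P(pred = 1 | y = 0)
    return tpr, fpr

def equalized_odds_gap(rates_by_group):
    """Largest between-group spread in TPR and FPR; zero means parity."""
    tprs = [t for t, _ in rates_by_group.values()]
    fprs = [f for _, f in rates_by_group.values()]
    return max(tprs) - min(tprs), max(fprs) - min(fprs)

# Example: rates computed per demographic group on held-out data.
rates = {"White, NH": (0.54, 0.010), "Black, NH": (0.56, 0.014)}
print(equalized_odds_gap(rates))   # (TPR gap, FPR gap)
```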
Bias Mitigation
To enhance model performance for minority groups, we applied transfer learning to adapt the baseline models. The base models developed for the Non-Hispanic White group were fine-tuned on data from underrepresented populations, including Non-Hispanic Blacks, Hispanics, American Indians, and Asians or Pacific Islanders. Fine-tuning involves adjusting the model parameters, and possibly retraining certain components of the model, with the new data to improve performance for the target minority groups. This process aimed to mitigate biases and enhance predictive accuracy for these groups.
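The paper does not prescribe a specific fine-tuning mechanism for tree ensembles, so the following is one plausible realization using scikit-learn's `warm_start`: the boosting stages learned on the Non-Hispanic White data are kept, and additional stages are fit on the minority-group training split (the minority-group variable names are hypothetical).

```python
from sklearn.ensemble import GradientBoostingClassifier

# Base model trained on the Non-Hispanic White training split.
model = GradientBoostingClassifier(n_estimators=200, learning_rate=0.01,
                                   max_depth=6, warm_start=True,
                                   random_state=42)
model.fit(X_tr, y_tr)

# "Fine-tune": retain the existing stages and add new boosting rounds
# fit on the minority-group data (hypothetical split via split_group).
X_min_tr, X_min_te, y_min_tr, y_min_te = split_group(df, "Black, Non-Hispanic")
model.n_estimators += 50
model.fit(X_min_tr, y_min_tr)
```

Because the new stages correct the existing ensemble's residuals on the minority-group data, the model retains knowledge from the majority-group training while adapting to the target population; an analogous `warm_start` trick adds trees to a Random Forest.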
In addition, we reviewed alternative bias mitigation strategies, such as reweighting, adversarial debiasing, and preprocessing methods. These techniques are well-established in bias mitigation literature, but we prioritized transfer learning due to its ability to leverage pre-trained models and adapt to smaller datasets with minimal loss in generalizability.
We performed hyperparameter tuning using a 10-fold cross-validation approach. For each machine learning model, we focused on a few critical hyperparameters known to significantly impact performance:
Random Forest (RF): We tuned the number of trees, maximum depth of each tree, and the minimum samples required to split an internal node. These parameters control the model’s complexity and ability to capture intricate patterns in the data.
Gradient Boosting Machine (GBM): For GBM, we tuned the learning rate, number of boosting rounds, and maximum depth of each tree. These parameters are critical for controlling the learning process and balancing the model’s accuracy with generalizability.
Logistic Regression (LR): For Logistic Regression, we primarily tuned the regularization parameter (C), which controls the strength of the regularization applied to the model.
Decision Tree (DT): For Decision Trees, we focused on tuning the maximum depth and the minimum samples required to split a node, as these control the tree's complexity and prevent overfitting.
Bias mitigation was achieved by selectively adjusting model parameters to improve accuracy for minority groups. For example, deeper trees in Random Forest were beneficial for groups where complex interactions in the data might influence outcomes, while a lower learning rate in Gradient Boosting prevented overfitting on smaller datasets for certain minority groups.
The 10-fold cross-validation process evaluated different combinations of these hyperparameters, selecting those that maximized accuracy, precision, recall, F1 score, ROC-AUC, and AUCPR across the folds. Based on the cross-validation results, the optimal hyperparameters were chosen for each model. For instance, the Random Forest model achieved optimal performance with 100 trees, a maximum depth of 10, and a minimum sample split of 5, while the Gradient Boosting Machine performed best with a learning rate of 0.01, 200 boosting rounds, and a maximum depth of 6 (Table 1).
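A sketch of this tuning step for the Random Forest, using `GridSearchCV` with 10-fold cross-validation; the candidate values are assumptions chosen so that the reported optimum (100 trees, depth 10, minimum split 5) is reachable, and ROC-AUC stands in here for the full set of metrics tracked across folds.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [50, 100, 200],    # number of trees
    "max_depth": [5, 10, 20],          # maximum depth of each tree
    "min_samples_split": [2, 5, 10],   # samples required to split a node
}
search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid, cv=10, scoring="roc_auc", n_jobs=-1)
search.fit(X_tr, y_tr)
print(search.best_params_)
# e.g. {'max_depth': 10, 'min_samples_split': 5, 'n_estimators': 100}
```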
Incorporating fairness metrics such as Equalized Odds provided insights into True Positive Rate (TPR) and False Positive Rate (FPR) disparities across demographic groups. These metrics allowed us to quantify and monitor bias during the model fine-tuning process.
Comparison
We compared performance metrics between the baseline and fine-tuned models to evaluate the impact of transfer learning in mitigating demographic biases. This comparison examined changes in key performance metrics, including accuracy, precision, recall, F1 score, ROC-AUC, and AUCPR, as well as fairness metrics such as Equalized Odds, for each demographic group to identify improvements after fine-tuning (Fig. 1).
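In code, this comparison reduces to per-group metric deltas between the two model sets; a small illustration using the DT precision values reported in the Results:

```python
import pandas as pd

# Per-group precision before and after fine-tuning (DT model,
# values taken from the Results section).
base = pd.Series({"Black, Non-Hispanic": 0.5492, "Hispanic/Latino": 0.3805})
tuned = pd.Series({"Black, Non-Hispanic": 0.6657, "Hispanic/Latino": 0.5265})

delta = tuned - base   # positive values indicate improvement after fine-tuning
print(delta)
```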
Results
After filtering out all missing and invalid values during feature selection, 365,608 COVID-19 cases remained: 252,125 Non-Hispanic Whites, 43,304 Non-Hispanic Blacks, 19,385 Hispanics, 5,667 Asians, and 651 American Indians. The performance of the base models trained on Non-Hispanic White cases is listed in Table 2.
Among all model types chosen for training base models, Gradient Boosting Machines demonstrated the best performance. We then trained predictive models for the other race/ethnicity groups, including Non-Hispanic Blacks, Hispanics, American Indians, and Asians, using the different model types to generate their own base models. The performance of these base models is shown in Table 3.
Subsequently, we fine-tuned the base models for the minority groups by applying transfer learning to transfer knowledge from the Non-Hispanic White base models. The performance metrics of the fine-tuned models are detailed in Table 4.
The evaluation of Equalized Odds revealed minimal differences in True Positive Rates (TPR) and False Positive Rates (FPR) across racial and ethnic groups.
For the base models, TPR ranged between 0.52 and 0.56 across all racial/ethnic groups. FPR ranged between 0.006 and 0.015, indicating that the models exhibited low false-positive rates across all groups (Fig. 2).
For the fine-tuned models, fine-tuning resulted in slight fairness improvements for some groups; for example, with the Decision Tree model for the Non-Hispanic Black group, TPR held steady at 0.56 while FPR decreased from 0.014 to 0.009. Similar patterns of modest improvement were observed for other groups, with no significant disparities between the base and fine-tuned models (Fig. 3).
The impact of fine-tuning on the performance metrics varied across models and demographic groups. Decision Tree (DT) and Random Forest (RF) models demonstrated consistent improvements in accuracy, precision, and ROC-AUC scores across most populations, except for the Non-Hispanic American Indian group. Gradient Boosting Machines (GBM) showed mixed results, with improvements in accuracy and precision observed for Non-Hispanic Black individuals and precision gains for Asian or Pacific Islanders. Logistic Regression (LR) showed minimal changes across all metrics and groups, suggesting that transfer learning had the least influence on this type of model.
Accuracy and precision generally followed similar trends across groups, with precision often exhibiting more pronounced changes. For Non-Hispanic Blacks, DT precision increased significantly from 0.5492 to 0.6657, while for Asians or Pacific Islanders, it rose from 0.4727 to 0.6071. The most substantial improvement in precision was observed in the DT model for Hispanic/Latino individuals, increasing from 0.3805 to 0.5265. These findings suggest that precision was more sensitive to fine-tuning than accuracy.
Recall showed declines in most cases, remained stable in some, and improved in one instance: RF recall increased for Asians or Pacific Islanders from 0.4587 to 0.4679. These declines highlight a trade-off where fine-tuning prioritized reducing false positives over false negatives.
AUCPR improvements were primarily observed in DT and RF models for Non-Hispanic Black and Asian groups. The most significant gain was seen in the DT model for Non-Hispanic Blacks, where AUCPR increased from 0.4477 to 0.5573, demonstrating enhanced handling of imbalanced datasets.
F1 scores reflected the interplay between precision and recall. Notable increases were observed in DT models for Non-Hispanic Black and Hispanic/Latino individuals and in DT and RF models for Asians or Pacific Islanders. However, F1 scores decreased in other groups, indicating that reductions in recall outweighed precision gains in those cases.
DT models exhibited consistent improvements in accuracy, precision, F1 score, and ROC-AUC score across three minority populations: Non-Hispanic Black, Hispanic/Latino, and Asian or Pacific Islander individuals, with the most significant improvement observed in precision for Hispanic/Latino individuals (from 0.3805 to 0.5265). However, these metrics showed little or no improvement for Non-Hispanic American Indians. RF models similarly demonstrated improvements in accuracy, precision, ROC-AUC score, and AUCPR for the same three populations; for Non-Hispanic American Indians, RF performance either declined or remained stable, indicating limited benefits of fine-tuning for this group. GBM results were mixed, with some metrics improving while others declined; notably, AUCPR for GBM in Non-Hispanic American Indians decreased dramatically from 0.4612 to 0.2406. LR performance remained nearly unchanged across all metrics and demographic groups, indicating that transfer learning had little to no impact on this model type. In summary, for the Non-Hispanic Black, Hispanic/Latino, and Asian or Pacific Islander populations, both the DT and RF models showed substantial improvements, whereas GBM yielded mixed results; in contrast, the Non-Hispanic American Indian population experienced only minimal benefits from fine-tuning, with most models either maintaining stable or declining performance metrics.
Discussion
The results of our study underscore the importance and potential of using AI to predict COVID-19 mortality while highlighting the critical need to address demographic biases inherent in these models. Our findings reveal that while AI models can achieve high accuracy, they often exhibit disparities in performance across different racial and ethnic groups, which can exacerbate existing health inequities.
Our baseline models, trained predominantly on data from the Non-Hispanic White population, demonstrated considerable performance differences when applied to other racial and ethnic groups. The discrepancies in accuracy, precision, recall, F1 score, ROC-AUC, and AUCPR between the Non-Hispanic White population and other racial and ethnic populations point to significant demographic biases. These biases likely stem from historical inequities in healthcare access, socioeconomic factors, and the underrepresentation of minority populations in training datasets [32].

Unequal access to hospitals, often due to geographic or financial barriers, plays a significant role [33]. For instance, minority populations may live in areas with fewer healthcare facilities, leading to delayed or inadequate treatment. Financial constraints can also limit access to quality health care, as individuals may not be able to afford necessary treatments or insurance coverage [34]. Socioeconomic factors include disparities in income, education, and employment, which influence underlying health conditions and health outcomes [35]. Lower income levels can lead to poorer nutrition, increased stress, and limited access to healthcare services, all of which contribute to worse health outcomes. Educational disparities may result in lower health literacy, making it harder for individuals to understand health information and navigate the healthcare system effectively. Employment-related factors, such as jobs with higher physical demands and lower job security, can also negatively affect health outcomes.

The underrepresentation of minority populations in training datasets often arises from the systemic exclusion of these groups from clinical trials and research studies, whether through a lack of outreach, mistrust of the healthcare system, or challenges in recruiting diverse participants [36]. As a result, AI models trained on these datasets may not accurately reflect the health needs or risks of minority populations, leading to biased predictions and perpetuating health disparities.
Our results also revealed that precision, recall, and F1 score were particularly sensitive to population-specific differences in model performance. For instance, precision improved significantly for the Hispanic/Latino and Asian or Pacific Islander populations in the DT model, indicating better identification of high-risk individuals. However, recall generally declined across most groups, suggesting a trade-off in which false positives were reduced at the expense of false negatives. These changes underscore the necessity of evaluating multiple metrics to fully understand the implications of AI predictions for different populations.
The application of transfer learning in our study provided a promising approach to mitigating these biases. By adapting the baseline models to better predict outcomes for underrepresented groups, we observed substantial performance improvements for minority populations, particularly in precision and accuracy, suggesting that models such as Decision Tree and Random Forest were more adept at adjusting to the demographic-specific characteristics of these groups. Fine-tuning also produced marginal but consistent improvements in TPR and reductions in FPR for minority groups, reinforcing the value of transfer learning in adapting models to better serve underrepresented populations while maintaining fairness. However, these benefits were not universal. The Non-Hispanic American Indian group, for instance, showed limited or no improvement across key metrics, indicating that the impact of transfer learning may vary with specific demographic characteristics and with the quality and quantity of available data. This underscores the necessity of continuously refining and validating AI models with diverse and representative datasets to achieve equitable healthcare outcomes. The limitations observed in some groups suggest that transfer learning alone may not fully address all biases, but it should nonetheless be part of a broader effort to create more inclusive and fair AI systems.
The improved performance of our AI models for minority populations through transfer learning holds significant implications for healthcare delivery. Enhanced predictive performance can lead to better resource allocation, more tailored healthcare interventions, and ultimately improved health outcomes for these groups [37]. For instance, accurate predictions can help identify high-risk individuals who may benefit from early interventions, preventing severe outcomes and reducing mortality rates. This not only improves individual health outcomes but also alleviates the burden on healthcare systems, particularly during public health crises like the COVID-19 pandemic, when timely and accurate predictions are essential and equitable AI models can ensure that vulnerable populations receive the attention and resources they need.
Moreover, the integration of transfer learning and other fairness-enhancing techniques should be prioritized in the development of AI models, particularly in the context of public health emergencies, where the impact on marginalized populations can be profound [38]. Transfer learning enables models to leverage pre-existing knowledge and adapt to new, often limited datasets, which is particularly valuable when dealing with diverse and varied health data. This approach helps improve model performance across demographic groups by adjusting to the unique characteristics and health needs of each group [39]. In addition to transfer learning, fairness-enhancing techniques such as bias mitigation algorithms and inclusive dataset practices can further reduce the systemic biases that may persist in AI systems [38]. By addressing these biases, AI can play a crucial role in reducing health disparities, ensuring equitable care for all populations, and improving health outcomes during emergencies.
The potential for integrating this model into clinical workflows is particularly relevant in the context of public health emergencies, where rapid and accurate mortality predictions can significantly impact resource allocation and patient triage. By providing early identification of high-risk individuals, this model could support healthcare professionals in prioritizing care and allocating critical resources, such as ICU beds and ventilators, more effectively. Additionally, implementing this model in clinical decision support systems could enable ongoing, data-driven adjustments as demographic-specific risk patterns emerge, allowing for adaptive responses to evolving health crises. With appropriate safeguards and continuous validation, the model could be deployed as an aid in public health planning, enhancing preparedness and response strategies in future pandemics and other high-impact health events.
Limitations and future directions
Despite the promising results, our study has limitations that warrant consideration. The reliance on historical data may inherently reflect past biases, and while transfer learning helps mitigate these, it may not entirely eliminate them. Additionally, the model’s performance is dependent on the quality and representativeness of the training data, which underscores the need for comprehensive and inclusive data collection practices. While machine learning models can be fine-tuned to improve performance across various racial groups, the degree of improvement varies significantly by both model type and racial/ethnic group. The study is also limited by its exclusive use of CDC COVID-19 data, and we recognize that cross-dataset validation would be valuable for assessing the model’s generalizability to other contexts.
We also recognize the limitations of transfer learning in fully addressing demographic biases, given the complex social determinants that impact health outcomes. While our approach improves fairness in predictive accuracy, further research is needed to explore additional methods that can complement transfer learning, such as incorporating contextual social and environmental data. Addressing these ethical concerns is essential to ensuring that AI-driven healthcare interventions contribute to equitable outcomes across all population groups.
Future research should focus on developing more sophisticated bias mitigation techniques and exploring other AI methodologies that can further enhance fairness and accuracy. Moreover, continuous monitoring and updating of AI models are essential to adapt to evolving health dynamics and demographic changes.
Conclusion
In conclusion, our study demonstrates the potential of AI in predicting COVID-19 mortality while also underscoring the critical need to address demographic biases to ensure equitable healthcare outcomes. Through the application of transfer learning, we significantly improved the predictive performance of models across various racial and ethnic groups. Future research should explore additional bias mitigation strategies and consider the dynamic nature of health data to further enhance model performance.
Data availability
All data analyzed during this study are available from the Centers for Disease Control and Prevention (CDC) repository. We obtained access to the ‘COVID-19 Case Surveillance Restricted Access Detailed Data’ from the CDC, with data spanning from January 1, 2020, to January 4, 2024. These datasets can be accessed through the CDC, subject to applicable restrictions and access policies.
Abbreviations
- AI: Artificial Intelligence
- AUCPR: Area Under the Precision-Recall Curve
- CDC: Centers for Disease Control and Prevention
- RF: Random Forest
- DT: Decision Tree
- GBM: Gradient Boosting Machines
- LR: Logistic Regression
- ROC-AUC: Receiver Operating Characteristic - Area Under the Curve
- NIMHD: National Institute on Minority Health and Health Disparities
References
Nguyen HS, Ho DKN, Nguyen NN, Tran HM, Tam KW, Le NQK. Predicting EGFR Mutation Status in Non-small Cell Lung Cancer using Artificial Intelligence: a systematic review and Meta-analysis. Acad Radiol. 2024;31(2):660–83.
Kha QH, Le VH, Hung TNK, Nguyen NTK, Le NQK. Development and validation of an explainable machine learning-based prediction model for drug-food interactions from Chemical structures. Sens (Basel). 2023;23(8).
Chalkidis G, McPherson JP, Beck A, Newman MG, Guo JW, Sloss EA, Staes CJ. External validation of a machine learning model to Predict 6-Month Mortality for patients with Advanced Solid tumors. JAMA Netw Open. 2023;6(8):e2327193.
Zhu J, Yang F, Wang Y, Wang Z, Xiao Y, Wang L, Sun L. Accuracy of machine learning in discriminating Kawasaki Disease and other Febrile illnesses: systematic review and Meta-analysis. J Med Internet Res. 2024;26:e57641.
Zhu L, Yao Y. Prediction of the risk of mortality in older patients with coronavirus disease 2019 using blood markers and machine learning. Front Immunol. 2024;15:1445618.
Wang L, Zhang Y, Wang D, Tong X, Liu T, Zhang S et al. Artificial Intelligence for COVID-19: a systematic review. Front Med. 2021;8.
Aksenen CF, Ferreira DMA, Jeronimo PMC, Costa TO, de Souza TC, Lino BMNS, et al. Enhancing SARS-CoV-2 lineage surveillance through the integration of a simple and direct qPCR-based protocol adaptation with established machine learning algorithms. Anal Chem. 2024;96(46):18537–44.
Wynants L, Van Calster B, Collins GS, Riley RD, Heinze G, Schuit E, et al. Prediction models for diagnosis and prognosis of covid-19: systematic review and critical appraisal. BMJ. 2020;369:m1328.
Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019;366(6464):447–53.
Vyas DA, Eisenstein LG, Jones DS. Hidden in Plain Sight - reconsidering the use of race correction in clinical algorithms. N Engl J Med. 2020;383(9):874–82.
Gianfrancesco MA, Tamang S, Yazdany J, Schmajuk G. Potential biases in Machine Learning Algorithms Using Electronic Health Record Data. JAMA Intern Med. 2018;178(11):1544–7.
Millett GA, Jones AT, Benkeser D, Baral S, Mercer L, Beyrer C, et al. Assessing differential impacts of COVID-19 on black communities. Ann Epidemiol. 2020;47:37–44.
Mackey K, Ayers CK, Kondo KK, Saha S, Advani SM, Young S, et al. Racial and ethnic disparities in COVID-19-Related infections, hospitalizations, and deaths: a systematic review. Ann Intern Med. 2021;174(3):362–73.
Mittermaier M, Raza MM, Kvedar JC. Bias in AI-based models for medical applications: challenges and mitigation strategies. NPJ Digit Med. 2023;6(1):113.
Chen IY, Pierson E, Rose S, Joshi S, Ferryman K, Ghassemi M. Ethical machine learning in Healthcare. Annu Rev Biomed Data Sci. 2021;4:123–44.
Pagano TP, Loureiro RB, Lisboa FVN, Peixoto RM, Guimarães GAS, Cruz GOR et al. Bias and Unfairness in Machine Learning models: a systematic review on datasets, Tools, Fairness Metrics, and identification and mitigation methods. Big Data Cogn Comput [Internet]. 2023; 7(1).
Malhotra A, Thulal AN, editors. A Comparative Analysis of Bias Mitigation Methods in Machine Learning. 2024 11th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO); 2024 14–15 March 2024.
Siddique S, Haque MA, George R, Gupta KD, Gupta D, Faruk MJ. Survey on machine learning biases and mitigation techniques. Digit. 2024;4(1):1–68.
Zhang Z, Wang S, Meng G, editors. A review on pre-processing methods for fairness in machine learning. In: Advances in natural computation, fuzzy systems and knowledge discovery. Cham: Springer International Publishing; 2023.
Shahrezaei MH, Loughran R, Daid KM, editors. Pre-processing techniques to mitigate against algorithmic bias. In: 2023 31st Irish conference on artificial intelligence and science C. AICS; 2023. pp. 7–8.
Hosna A, Merry E, Gyalmo J, Alom Z, Aung Z, Azim MA. Transfer learning: a friendly introduction. J Big Data. 2022;9(1):102.
Zou Q, Xie S, Lin Z, Wu M, Ju Y. Finding the best classification threshold in Imbalanced classification. Big Data Res. 2016;5:2–8.
Toseef M, Olayemi Petinrin O, Wang F, Rahaman S, Liu Z, Li X, Wong KC. Deep transfer learning for clinical decision-making based on high-throughput data: comprehensive survey with benchmark results. Brief Bioinform. 2023;24(4).
Dalkıran A, Atakan A, Rifaioğlu AS, Martin MJ, Atalay R, Acar AC, et al. Transfer learning for drug-target interaction prediction. Bioinformatics. 2023;39(Suppl 1):i103–10.
Clark M, Meyer C, Ramos-Cejudo J, Elbers DC, Pierce-Murray K, Fricks R, et al. Transfer learning for Mortality Prediction in Non-small Cell Lung Cancer with Low-Resolution Histopathology Slide snapshots. Stud Health Technol Inf. 2024;310:735–9.
Mendel K, Li H, Sheth D, Giger M. Transfer learning from convolutional neural networks for computer-aided diagnosis: a comparison of digital breast tomosynthesis and full-field Digital Mammography. Acad Radiol. 2019;26(6):735–43.
Alatrany AS, Khan W, Hussain AJ, Mustafina J, Al-Jumeily D. Transfer learning for classification of Alzheimer’s Disease based on genome wide data. IEEE/ACM Trans Comput Biol Bioinform. 2023;20(5):2700–11.
Rajkomar A, Oren E, Chen K, Dai AM, Hajaj N, Hardt M, et al. Scalable and accurate deep learning with electronic health records. NPJ Digit Med. 2018;1:18.
Centers for Disease Control and Prevention. COVID-19 Case Surveillance Restricted Access Detailed Data: summary and limitations (dataset access date: June 21, 2024).
Yuan C, Linn KA, Hubbard RA. Algorithmic Fairness of Machine Learning models for Alzheimer Disease Progression. JAMA Netw Open. 2023;6(11):e2342203.
Ganta T, Kia A, Parchure P, Wang MH, Besculides M, Mazumdar M, Smith CB. Fairness in Predicting Cancer Mortality Across racial subgroups. JAMA Netw Open. 2024;7(7):e2421290.
Chen IY, Joshi S, Ghassemi M. Treating health disparities with artificial intelligence. Nat Med. 2020;26(1):16–7.
Peek ME, Simons RA, Parker WF, Ansell DA, Rogers SO, Edmonds BT. COVID-19 among African americans: an Action Plan for Mitigating disparities. Am J Public Health. 2021;111(2):286–92.
Caraballo C, Massey D, Mahajan S, Lu Y, Annapureddy AR, Roy B et al. Racial and Ethnic Disparities in Access to Health Care Among Adults in the United States: A 20-Year National Health Interview Survey Analysis, 1999–2018. medRxiv. 2020.
Williams DR, Cooper LA. Reducing racial inequities in Health: using what we already know to take action. Int J Environ Res Public Health. 2019;16(4).
George S, Duran N, Norris K. A systematic review of barriers and facilitators to minority research participation among African americans, latinos, Asian americans, and Pacific islanders. Am J Public Health. 2014;104(2):e16–31.
Obermeyer Z, Emanuel EJ. Predicting the Future - Big Data, Machine Learning, and Clinical Medicine. N Engl J Med. 2016;375(13):1216–9.
Esteva A, Robicquet A, Ramsundar B, Kuleshov V, DePristo M, Chou K, et al. A guide to deep learning in healthcare. Nat Med. 2019;25(1):24–9.
Wang F, Casalino LP, Khullar D. Deep learning in Medicine-Promise, Progress, and challenges. JAMA Intern Med. 2019;179(3):293–4.
Acknowledgements
The authors would like to acknowledge the contributions of all individuals and institutions involved in the data collection and analysis process.
Funding
This research was supported by the National Institute on Minority Health and Health Disparities (NIMHD) of the National Institutes of Health (NIH) under Award Number R01MD018766.
Author information
Contributions
T.G. and M.L. conceptualized and designed the study. T.G., J.Y., and G.J. collected the data. T.G. and M.L. performed the data analysis and interpretation. W.P., X.M., and Y.W. validated the results and contributed to the methodology. T.G. drafted the original manuscript, while M.L. and Y.W. provided critical revisions. All authors reviewed and approved the final version of the manuscript.
Ethics declarations
Ethics approval and consent to participate
This study utilized data from the “COVID-19 Case Surveillance Restricted Access Detailed Data” provided by the Centers for Disease Control and Prevention (CDC). The research was determined to qualify as Not Human Subjects Research (NHSR) and was exempt from ethical review by the UTHSC Institutional Review Board. Informed consent was not required, and a waiver was granted by the UTHSC IRB. The study adhered to the ethical principles outlined in the Declaration of Helsinki.
Consent for publication
Not applicable.
Disclaimer
The findings and conclusions in this report are those of the author(s) and do not necessarily represent the official position of any organization. The CDC does not take responsibility for the scientific validity or accuracy of methodology, results, statistical analyses, or conclusions presented.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic supplementary material
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
Cite this article
Gu, T., Pan, W., Yu, J. et al. Mitigating bias in AI mortality predictions for minority populations: a transfer learning approach. BMC Med Inform Decis Mak 25, 30 (2025). https://doi.org/10.1186/s12911-025-02862-7