Abstract
Foundation models are pretrained on massive datasets. However, collecting medical datasets is expensive and time-consuming, and raises privacy concerns. Here we show that synthetic data generated via conditioning with disease labels can be leveraged to build high-performing medical foundation models. We pretrained a retinal foundation model, first with approximately one million synthetic retinal images whose physiological structures and feature distributions are consistent with those of real images, and then with only 16.7% of the 904,170 real-world colour fundus photography images required by a recently reported retinal foundation model (RETFound). The data-efficient model performed as well as or better than RETFound across nine public datasets and four diagnostic tasks, and for diabetic-retinopathy grading it used only 40% of the expert-annotated training data used by RETFound. We also support the generalizability of the data-efficient strategy by building a classifier for the detection of tuberculosis on chest X-ray images. The text-conditioned generation of synthetic data may enhance the performance and generalization of medical foundation models.
Data availability
The main data supporting the results in this study are available within the paper and its Supplementary Information. Data for pretraining can be accessed through the following weblinks: AIROGS (https://airogs.grand-challenge.org/data-and-challenge), Kaggle EyePACS (https://www.kaggle.com/c/diabetic-retinopathy-detection), DDR (https://github.com/nkicsl/DDR-dataset), ODIR-2019 (https://odir2019.grand-challenge.org). Data for fine-tuning can be accessed through the following weblinks: IDRID (https://ieee-dataport.org/open-access/indian-diabetic-retinopathy-image-dataset-idrid), MESSIDOR-2 (https://www.adcis.net/en/third-party/messidor2), APTOS-2019 (https://www.kaggle.com/competitions/aptos2019-blindness-detection/data), PAPILA (https://figshare.com/articles/dataset/PAPILA/14798004/1), Glaucoma Fundus (https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/1YRRAC), ORIGA (https://www.kaggle.com/datasets/arnavjain1/glaucoma-datasets), AREDS (https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000001.v3.p1), JSIEC (https://zenodo.org/record/3477553), Retina (https://www.kaggle.com/datasets/jr2ngb/cataractdataset), REFUGE (https://ieee-dataport.org/documents/refuge-retinal-fundus-glaucoma-challenge), RIM-ONE-DL (https://github.com/miag-ull/rim-one-dl?tab=readme-ov-file), CheXpert (https://stanfordmlgroup.github.io/competitions/chexpert/), Shenzhen Hospital CXR Set (https://data.lhncbc.nlm.nih.gov/public/Tuberculosis-Chest-X-ray-Datasets/Shenzhen-Hospital-CXR-Set/index.html), TB Chest X-ray database (https://www.kaggle.com/datasets/tawsifurrahman/tuberculosis-tb-chest-xray-dataset).
Code availability
The code of RETFound-DE is available at https://github.com/Jonlysun/DERETFound (ref. 64), and an online interactive platform is available at http://fdudml.cn:12001. We used Stable Diffusion, as implemented in diffusers (https://github.com/huggingface/diffusers), for the backbone and for image generation. The heat maps were generated with Grad-CAM (https://github.com/jacobgil/pytorch-grad-cam) and the t-SNE visualizations with tsne-pytorch (https://github.com/mxl1990/tsne-pytorch).
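For orientation, the following is a minimal sketch of text-conditioned image generation with diffusers, not the released pipeline: the model identifier and prompt wording are placeholder assumptions, and the fine-tuned fundus generator itself is distributed via the RETFound-DE repository.

```python
# A minimal sketch, assuming a public Stable Diffusion checkpoint as a
# stand-in for the fine-tuned fundus generator; the model identifier and
# prompt are illustrative, not the paper's released weights.
import torch
from diffusers import StableDiffusionPipeline

# Load the pipeline (requires a CUDA GPU as written).
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

# Condition the generation on a disease label embedded in the text prompt.
prompt = "colour fundus photograph, moderate diabetic retinopathy"
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("synthetic_fundus.png")
```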
References
Zhou, Y. et al. A foundation model for generalizable disease detection from retinal images. Nature 622, 156–163 (2023).
Theodoris, C. V. et al. Transfer learning enables predictions in network biology. Nature 618, 616–624 (2023).
Huang, Z. et al. A visual-language foundation model for pathology image analysis using medical Twitter. Nat. Med. 29, 2307–2316 (2023).
Zhang, X. et al. Knowledge-enhanced visual-language pre-training on chest radiology images. Nat. Commun. 14, 4542 (2023).
Moor, M. et al. Foundation models for generalist medical artificial intelligence. Nature 616, 259–265 (2023).
Krishnan, R., Rajpurkar, P. & Topol, E. J. Self-supervised learning in medicine and healthcare. Nat. Biomed. Eng. 6, 1346–1352 (2022).
Mitchell, M., Jain, R. & Langer, R. Engineering and physical sciences in oncology: challenges and opportunities. Nat. Rev. Cancer 17, 659–675 (2017).
Villoslada, P., Baeza-Yates, R. & Masdeu, J. C. Reclassifying neurodegenerative diseases. Nat. Biomed. Eng. 4, 759–760 (2020).
Rajpurkar, P. et al. AI in health and medicine. Nat. Med. 28, 31–38 (2022).
Ribaric, S., Ariyaeeinia, A. & Pavesic, N. De-identification for privacy protection in multimedia content: a survey. Signal Process. Image Commun. 47, 131–151 (2016).
Chang, Q. et al. Mining multi-center heterogeneous medical data with distributed synthetic learning. Nat. Commun. 14, 5510 (2023).
Bond-Taylor, S., Leach, A., Long, Y. & Willcocks, C. G. Deep generative modelling: a comparative review of VAEs, GANs, normalizing flows, energy-based and autoregressive models. IEEE Trans. Pattern Anal. Mach. Intell. 44, 7327–7347 (2021).
Kingma, D. P. & Welling, M. Auto-encoding variational Bayes. Preprint at https://arxiv.org/abs/1312.6114 (2013).
Goodfellow, I. et al. Generative adversarial networks. Commun. ACM 63, 139–144 (2020).
Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 33, 6840–6851 (2020).
Rombach, R., Blattmann, A., Lorenz, D., Esser, P. & Ommer, B. High-resolution image synthesis with latent diffusion models. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 10684–10695 (IEEE, 2022).
Kazerouni, A. et al. Diffusion models in medical imaging: a comprehensive survey. Med. Image Anal. 88, 102846 (2023).
Repecka, D. et al. Expanding functional protein sequence spaces using generative adversarial networks. Nat. Mach. Intell. 3, 324–333 (2021).
Shin, J. E. et al. Protein design and variant prediction using autoregressive generative models. Nat. Commun. 12, 2403 (2021).
Watson, J. L. et al. De novo design of protein structure and function with RFdiffusion. Nature 620, 1089–1100 (2023).
Schmitt, L. T. et al. Prediction of designer-recombinases for DNA editing with generative deep learning. Nat. Commun. 13, 7966 (2022).
Godinez, W. J. et al. Design of potent antimalarials with generative chemistry. Nat. Mach. Intell. 4, 180–186 (2022).
Huang, X. et al. The landscape of mRNA nanomedicine. Nat. Med. 28, 2273–2287 (2022).
Chen, Z. et al. A deep generative model for molecule optimization via one fragment modification. Nat. Mach. Intell. 3, 1040–1049 (2021).
Zhong, W., Yang, Z. & Chen, C. Y. C. Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nat. Commun. 14, 3009 (2023).
Das, P. et al. Accelerated antimicrobial discovery via deep generative models and molecular dynamics simulations. Nat. Biomed. Eng. 5, 613–623 (2021).
Kanakasabapathy, M. K. et al. Adaptive adversarial neural networks for the analysis of lossy and domain-shifted datasets of medical images. Nat. Biomed. Eng. 5, 571–585 (2021).
Ozyoruk, K. B. et al. A deep-learning model for transforming the style of tissue images from cryosectioned to formalin-fixed and paraffin-embedded. Nat. Biomed. Eng. 6, 1407–1419 (2022).
DeGrave, A. J. et al. Auditing the inference processes of medical-image classifiers by leveraging generative AI and the expertise of physicians. Nat. Biomed. Eng. https://doi.org/10.1038/s41551-023-01160-9 (2023).
Cao, R. et al. Label-free intraoperative histology of bone tissue via deep-learning-assisted ultraviolet photoacoustic microscopy. Nat. Biomed. Eng. 7, 124–134 (2023).
Nichol, A. et al. GLIDE: towards photorealistic image generation and editing with text-guided diffusion models. Preprint at https://arxiv.org/abs/2112.10741 (2021).
Ramesh, A. et al. Zero-shot text-to-image generation. In Proc. International Conference on Machine Learning 8821–8831 (PMLR, 2021).
Radford, A. et al. Learning transferable visual models from natural language supervision. In Proc. International Conference on Machine Learning 8748–8763 (PMLR, 2021).
Kather, J. N. et al. Medical domain knowledge in domain-agnostic generative AI. npj Digit. Med. 5, 90 (2022).
Burlina, P. M. et al. Assessment of deep generative models for high-resolution synthetic retinal image generation of age-related macular degeneration. JAMA Ophthalmol. 137, 258–264 (2019).
Yoon, J. et al. EHR-Safe: generating high-fidelity and privacy-preserving synthetic electronic health records. npj Digit. Med. 6, 141 (2023).
Trabucco, B., Doherty, K., Gurinas, M. & Salakhutdinov, R. Effective data augmentation with diffusion models. In Proc. International Conference on Learning Representations (ICLR, 2024).
Zhang, A. et al. Shifting machine learning for healthcare from development to deployment and from models to data. Nat. Biomed. Eng. 6, 1330–1345 (2022).
Chen, R. J. et al. Synthetic data in machine learning for medicine and healthcare. Nat. Biomed. Eng. 5, 493–497 (2021).
DuMont Schütte, A. et al. Overcoming barriers to data sharing with medical image generation: a comprehensive evaluation. npj Digit. Med. 4, 141 (2021).
World Report on Vision (World Health Organization, 2019).
Cen, L. P. et al. Automatic detection of 39 fundus diseases and conditions in retinal photographs using deep neural networks. Nat. Commun. 12, 4828 (2021).
Alimanov, A. & Islam, M. B. Denoising diffusion probabilistic model for retinal image generation and segmentation. In Proc. IEEE International Conference on Computational Photography 1–12 (IEEE, 2023).
Dosovitskiy, A. et al. An image is worth 16x16 words: transformers for image recognition at scale. In Proc. International Conference on Learning Representations (ICLR, 2021).
He, K. et al. Masked autoencoders are scalable vision learners. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 16000–16009 (IEEE, 2022).
Karthik, M. & Sohier, D. APTOS 2019 Blindness Detection (Kaggle, 2019).
Porwal, P. et al. IDRiD: diabetic retinopathy – segmentation and grading challenge. Med. Image Anal. 59, 101561 (2020).
Decencière, E. et al. Feedback on a publicly distributed image database: the Messidor database. Image Anal. Stereol. 33, 231–234 (2014).
Kovalyk, O. et al. PAPILA: dataset with fundus images and clinical data of both eyes of the same patient for glaucoma assessment. Sci. Data 9, 291 (2022).
Zhang, Z. et al. ORIGA-light: an online retinal fundus image database for glaucoma analysis and research. In Proc. Annual International Conference of the IEEE Engineering in Medicine and Biology 3065–3068 (IEEE, 2010).
Irvin, J. et al. CheXpert: a large chest radiograph dataset with uncertainty labels and expert comparison. In Proc. AAAI Conference on Artificial Intelligence Vol. 33 590–597 (AAAI, 2019).
Jaeger, S. et al. Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant. Imaging Med. Surg. 4, 475–477 (2014).
Rahman, T. et al. Reliable tuberculosis detection using chest X-ray with deep learning, segmentation and visualization. IEEE Access 8, 191586–191601 (2020).
Peng, W., Adeli, E., Zhao, Q. & Pohl, K. M. in Medical Image Computing and Computer Assisted Intervention 14–24 (MICCAI, 2023).
Eschweiler, D. et al. Denoising diffusion probabilistic models for generation of realistic fully-annotated microscopy image datasets. PLoS Comput. Biol. 20, e1011890 (2024).
Ktena, I. et al. Generative models improve fairness of medical classifiers under distribution shifts. Nat. Med. 30, 1166–1173 (2024).
Bachmann, R., Mizrahi, D., Atanov, A. & Zamir, A. MultiMAE: multi-modal multi-task masked autoencoders. In Proc. European Conference on Computer Vision 348–367 (Springer, 2022).
Shumailov, I. et al. AI models collapse when trained on recursively generated data. Nature 631, 755–759 (2024).
Yang, Y. et al. The limits of fair medical imaging AI in real-world generalization. Nat. Med. 30, 2838–2848 (2024).
de Vente, C. et al. AIROGS: artificial intelligence for robust glaucoma screening challenge. IEEE Trans. Med. Imaging 43, 542–557 (2024).
van den Oord, A., Vinyals, O. & Kavukcuoglu, K. Neural discrete representation learning. In Proc. 31st Conference on Neural Information Processing Systems 6309–6318 (NIPS, 2017).
Song, Y. et al. Score-based generative modeling through stochastic differential equations. In Proc. International Conference on Learning Representations (ICLR, 2021).
Selvaraju, R. R. et al. Grad-CAM: visual explanations from deep networks via gradient-based localization. In Proc. IEEE International Conference on Computer Vision 618–626 (IEEE, 2017).
Sun, Y. Controllable generative model enables high data efficiency for building medical foundation model. GitHub https://github.com/Jonlysun/DERETFound (2024).
Acknowledgements
We acknowledge support for this work provided by the National Natural Science Foundation of China (grant numbers U2001209, 62372117 and 62472102) and the Natural Science Foundation of Shanghai (grant number 21ZR1406600). The computations in this research were performed using the CFFF platform of Fudan University. The AREDS dataset used for the analyses described in this paper was obtained from the Age-Related Eye Disease Study (AREDS) Database found at https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000001.v3.p1 through dbGaP accession number phs000001.v3.p1. Funding support for AREDS was provided by the National Eye Institute (N01-EY-0-2127). We thank the AREDS participants and the AREDS Research Group for their valuable contribution to this research.
Author information
Authors and Affiliations
Contributions
B.Y. and W.T. supervised the research. B.Y. conceived the technique. Y.S. implemented the algorithm. Y.S. and W.T. designed the validation experiments. Y.S. trained the network and performed the validation experiments. Y.S. and W.T. analysed the validation results. Z.G. verified the code. R.H. provided technical support on the implementation of the web page. Z.G., M.P. and S.C. collected the public datasets. Y.S., W.T. and B.Y. wrote the paper.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Biomedical Engineering thanks Pearse Keane and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Performance (AUPR) in downstream tasks.
a, Internal evaluation. We fine-tuned the pretrained models on nine public datasets across four downstream tasks: diabetic retinopathy grading, glaucoma diagnosis, age-related macular degeneration (AMD) grading and multi-disease classification. Compared with RETFound, RETFound-DE achieves superior performance on six datasets (P < 0.05) and comparable performance on the other three (P > 0.05). b, External evaluation. Models are fine-tuned on one diabetic retinopathy grading dataset and evaluated on the others. RETFound-DE outperforms RETFound when fine-tuned on APTOS-2019 and evaluated on IDRID, or when fine-tuned on IDRID and evaluated on MESSIDOR-2. Bars show the mean AUPR and error bars show 95% confidence intervals. P values were calculated with two-sided t-tests and are listed in the figure.
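For readers reproducing these statistics, the sketch below illustrates, with hypothetical numbers, how AUPR, a 95% confidence interval and a two-sided t-test can be computed; the paper's exact evaluation protocol may differ.

```python
# A minimal sketch with hypothetical values; not the paper's evaluation code.
import numpy as np
from sklearn.metrics import average_precision_score
from scipy.stats import ttest_ind

# AUPR for one run: average precision of predicted scores against labels.
y_true = np.array([0, 1, 1, 0, 1, 0, 1, 0])
y_score = np.array([0.2, 0.9, 0.7, 0.3, 0.8, 0.4, 0.6, 0.1])
print(f"AUPR (single run): {average_precision_score(y_true, y_score):.3f}")

# Hypothetical per-run AUPRs for two models (e.g. from repeated fine-tuning).
aupr_a = np.array([0.871, 0.866, 0.874, 0.869, 0.872])
aupr_b = np.array([0.858, 0.861, 0.855, 0.860, 0.857])

# 95% confidence interval for the mean AUPR (normal approximation).
half_width = 1.96 * aupr_a.std(ddof=1) / np.sqrt(len(aupr_a))
print(f"model A: {aupr_a.mean():.3f} ± {half_width:.3f}")

# Two-sided t-test between the two models, as in the caption.
t_stat, p_value = ttest_ind(aupr_a, aupr_b)
print(f"two-sided t-test P = {p_value:.2e}")
```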
Extended Data Fig. 2 Confusion matrices comparison in downstream tasks.
The confusion matrices compare the classes predicted by the models with the actual labels, with each element of a matrix representing the prediction distribution for a specific class. RETFound-DE's diagonal prediction scores are generally higher than those of SSL-ImageNet, and even higher than those of RETFound, indicating that it classifies each class more accurately in every task and has a lower misclassification rate.
Extended Data Fig. 3 Label and training time efficiency (AUPR).
a, Label efficiency. Label efficiency refers to the amount of training data and labels a deep-learning network needs to achieve a target performance. RETFound-DE and RETFound show superior label efficiency to SSL-ImageNet, and RETFound-DE even outperforms RETFound on the Retina dataset. b, Training-time efficiency. Training-time efficiency refers to the fine-tuning time a foundation model needs to achieve a target performance when adapted to downstream datasets. RETFound-DE achieves the same performance as RETFound in less training time on several datasets, such as IDRID and Retina. The dashed grey lines highlight the significant differences between RETFound and RETFound-DE.
Extended Data Fig. 4 Performance and pretraining time comparisons of SSL-ImageNet-Retinal and RETFound-DE (AUPR).
a, The effect of synthetic data when different numbers of real images are used for pretraining. The performance of SSL-ImageNet-Retinal improves progressively on the four downstream tasks as the quantity of real retinal images used for pretraining increases. By also pretraining on generated images, RETFound-DE shows a significant performance improvement over SSL-ImageNet-Retinal; on the IDRID, MESSIDOR-2 and Retina datasets, RETFound-DE outperforms RETFound when pretrained on only 40k real retinal images. b, The performance of RETFound-DE and SSL-ImageNet-Retinal (150k) over a matched computational time of 5 to 6 8-A100 days, where one 8-A100 day denotes the pretraining time of 8 NVIDIA A100 GPUs used for one day. For both models, the pretraining dataset in this period is the 150k real-retinal-image dataset. RETFound-DE consistently outperforms SSL-ImageNet-Retinal (150k) on all four downstream datasets within the same pretraining time.
Extended Data Fig. 5 Internal and external evaluation on Chest X-ray images.
a, Internal evaluation. CXRFM and CXRFM-DE perform comparably on the ShenzhenCXR dataset and significantly outperform SSL-ImageNet. On the TBChest dataset, all three models achieve similar performance, with AUROCs exceeding 0.99. b, External evaluation. When fine-tuned on TBChest and evaluated on ShenzhenCXR, CXRFM and CXRFM-DE perform similarly, both substantially surpassing SSL-ImageNet. When fine-tuned on ShenzhenCXR and evaluated on TBChest, CXRFM outperforms CXRFM-DE. c, CXRFM-DE (denoted 'With synthetic data') significantly outperforms SSL-ImageNet-Chest (20k) (denoted 'Without synthetic data') under both conditions, demonstrating the gain in generalization brought by synthetic data.
Extended Data Fig. 6 Feature distribution of real and synthetic datasets in terms of age and gender.
We use histograms and cumulative distribution functions to illustrate the feature distributions of the real and synthetic data. a and b show the feature distributions of real and synthetic data for the 0 < age < 60 and age > 60 groups; c and d show them for females and males. The results show the consistency of the feature distributions between the real and synthetic datasets in terms of age and gender.
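The sketch below illustrates, with placeholder feature arrays, how such histogram and empirical-CDF comparisons can be produced; the actual features and group splits used in the paper are not reproduced here.

```python
# A minimal sketch with synthetic placeholder features, not the paper's data.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
real_feats = rng.normal(0.0, 1.0, 2000)     # placeholder real-image features
synth_feats = rng.normal(0.05, 1.0, 2000)   # placeholder synthetic features

fig, (ax_hist, ax_cdf) = plt.subplots(1, 2, figsize=(8, 3))

# Overlaid histograms on a common binning.
bins = np.linspace(min(real_feats.min(), synth_feats.min()),
                   max(real_feats.max(), synth_feats.max()), 40)
ax_hist.hist(real_feats, bins=bins, alpha=0.5, density=True, label="real")
ax_hist.hist(synth_feats, bins=bins, alpha=0.5, density=True, label="synthetic")
ax_hist.set(title="Histogram", xlabel="feature value")
ax_hist.legend()

# Empirical cumulative distribution functions.
for feats, label in [(real_feats, "real"), (synth_feats, "synthetic")]:
    xs = np.sort(feats)
    ax_cdf.plot(xs, np.arange(1, len(xs) + 1) / len(xs), label=label)
ax_cdf.set(title="Empirical CDF", xlabel="feature value", ylabel="F(x)")
ax_cdf.legend()

fig.tight_layout()
fig.savefig("feature_distributions.png")
```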
Supplementary information
Supplementary Information
Supplementary notes, figures and captions for the supplementary tables.
Supplementary Tables
Supplementary Tables 1–9.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Sun, Y., Tan, W., Gu, Z. et al. A data-efficient strategy for building high-performing medical foundation models. Nat. Biomed. Eng. 9, 539–551 (2025). https://doi.org/10.1038/s41551-025-01365-0