Abstract
In many clinical and research settings, the scarcity of high-quality medical imaging datasets has hampered the potential of artificial intelligence (AI) clinical applications. This issue is particularly pronounced for less common conditions, underrepresented populations and emerging imaging modalities, where diverse and comprehensive datasets are often inadequate. To address this challenge, we introduce MINIM, a unified medical image–text generative model capable of synthesizing images of various organs across multiple imaging modalities from textual instructions. Clinician evaluations and rigorous objective measurements validate the high quality of MINIM’s synthetic images. MINIM exhibits enhanced generative capability when presented with previously unseen data domains, demonstrating its potential as a generalist medical AI (GMAI). Our findings show that MINIM’s synthetic images effectively augment existing datasets, boosting performance across multiple medical applications such as diagnostics, report generation and self-supervised learning. On average, MINIM improves performance by 12% for ophthalmic, 15% for chest, 13% for brain and 17% for breast-related tasks. We further demonstrate MINIM’s potential clinical utility in the accurate prediction of HER2-positive breast cancer from MR images. In a large retrospective simulation analysis, MINIM accurately identified targeted-therapy-sensitive EGFR mutations from lung cancer computed tomography images, which could potentially lead to improved 5-year survival rates. Although these results are promising, further validation and refinement in more diverse and prospective settings would greatly enhance the model’s generalizability and robustness.
Data availability
Restrictions apply to the availability of the developmental and validation datasets, which were used with permission of the participants for the present study. De-identified data may be available for research purposes from the corresponding authors upon reasonable request.
Code availability
The code to reproduce the results can be accessed at https://github.com/WithStomach/MINIM.
Acknowledgements
This research was supported by the Wenzhou Eye Hospital, Wenzhou Medical University Eye Health and Disease Advanced Institute, Macau Science and Technology Development Fund, Macao (0007/2020/AFJ, 0070/2020/A2 and 0003/2021/AKP, K.Z.); Guangzhou National Laboratory (YW-SLJC0201, K.Z.); Discipline Development of Peking University (7101302940 and 7101303005, J.W.); the Young Elite Scientist Sponsorship Program by the Beijing Association for Science and Technology (BYESS2023026, J.W.); the Macao Young Scholars Program (AM2023024 and AM2023018, Y.G.) and National Natural Science Foundation of China (W2431057, 32100631 and 32470964). We thank the physicians and patients for providing clinical data.
Author information
Contributions
J.W., K.W., Y.Y., Y.L., W.X., Z.S., F.L., Z.Z., Y.G., L.Y., H.-Y.Z., H.M., W.Z., L.H., L.Z., R.G., I.C., B.D., L.C., X.C., J.L., M.-H.Z., D.B.-H., O.M., M.L., Y.K., J.L., S.Z., T.G., J.Z., K.X., E.O., H.L., Y.Y., K.Z. and J.Q. collected and analyzed the data. K.Z. and J.W. conceived and designed the project. K.Z., J.W. and J.Q. designed experiments and supervised the project. J.W. and K.Z. wrote the manuscript. All authors discussed the results and reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Medicine thanks Su-In Lee, Jie Tian and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Joao Monteiro, in collaboration with the Nature Medicine team.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Text-conditioned synthesis of medical images.
Samples were created by prompting a fine-tuned model using annotations made by ophthalmologists and radiologists.
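As context for how such prompting works in practice, below is a minimal sketch of text-conditioned generation with a Stable Diffusion-style pipeline from the Hugging Face diffusers library. The checkpoint path and prompt are hypothetical placeholders; they stand in for MINIM's released code (linked under Code availability) rather than reproducing it.

```python
# A minimal sketch of text-conditioned medical image synthesis, assuming a
# Stable Diffusion-style checkpoint fine-tuned on paired image-text data.
# The checkpoint path and prompt below are hypothetical.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "path/to/fine-tuned-checkpoint",  # hypothetical fine-tuned medical checkpoint
    torch_dtype=torch.float16,
).to("cuda")

# Prompt built from clinician annotations: modality, organ, finding.
prompt = "OCT, retina, choroidal neovascularization with subretinal fluid"
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("synthetic_oct.png")
```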
Extended Data Fig. 2 AI-based EGFR mutation prediction and its effect on progression-free survival (PFS) rates of advanced lung cancer patients.
a, PFS curves of true EGFR-sensitive mutations (underwent TKI therapy) versus insensitive mutations (underwent chemotherapy) in cohort A. b, PFS curves of AI-identified EGFR-sensitive mutations (underwent TKI therapy) versus insensitive mutations (underwent chemotherapy) in cohort A. c, PFS curves of true EGFR-sensitive mutations (underwent TKI therapy) versus insensitive mutations (underwent chemotherapy) in cohort B. d, PFS curves of AI-identified EGFR-sensitive mutations (underwent TKI therapy) versus insensitive mutations (underwent chemotherapy) in cohort B.
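For readers who wish to reproduce this style of analysis, here is a minimal Kaplan-Meier sketch using the lifelines package. The toy dataframe is a hypothetical stand-in, since the cohort data are not public.

```python
# A minimal sketch of a Kaplan-Meier PFS comparison between treatment arms.
# Column names and values are hypothetical stand-ins for the cohort data.
import pandas as pd
import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

df = pd.DataFrame({                    # hypothetical toy cohort
    "pfs_months": [11.2, 9.5, 4.1, 5.8, 13.0, 3.2],
    "progressed": [1, 1, 1, 0, 1, 1],  # 1 = progression observed, 0 = censored
    "arm": ["TKI", "TKI", "chemo", "chemo", "TKI", "chemo"],
})

# Plot one survival curve per arm.
ax = plt.subplot(111)
for arm in ["TKI", "chemo"]:
    sub = df[df["arm"] == arm]
    KaplanMeierFitter().fit(sub["pfs_months"], sub["progressed"], label=arm) \
                       .plot_survival_function(ax=ax)

# Log-rank test between the two arms.
tki, chemo = df[df["arm"] == "TKI"], df[df["arm"] == "chemo"]
p = logrank_test(tki["pfs_months"], chemo["pfs_months"],
                 event_observed_A=tki["progressed"],
                 event_observed_B=chemo["progressed"]).p_value
```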
Extended Data Fig. 3 AI-based EGFR mutation prediction and its effect on objective response rate (ORR) of advanced lung cancer patients.
a, ORR of true EGFR-sensitive mutations (underwent TKI therapy) versus insensitive mutations (underwent chemotherapy) in cohort A. b, ORR of AI-identified EGFR-sensitive mutations (underwent TKI therapy) versus insensitive mutations (underwent chemotherapy) in cohort A. c, ORR of true EGFR-sensitive mutations (underwent TKI therapy) versus insensitive mutations (underwent chemotherapy) in cohort B. d, ORR of AI-identified EGFR-sensitive mutations (underwent TKI therapy) versus insensitive mutations (underwent chemotherapy) in cohort B.
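An ORR comparison between two arms reduces to a two-by-two contingency test. A minimal SciPy sketch follows; the responder counts are hypothetical placeholders, not the cohort values.

```python
# A minimal sketch of comparing objective response rates between arms.
# The counts below are hypothetical placeholders.
from scipy.stats import fisher_exact

# Rows: TKI arm, chemotherapy arm; columns: responders, non-responders.
table = [[45, 15],
         [20, 40]]
odds_ratio, p_value = fisher_exact(table)

orr_tki = table[0][0] / sum(table[0])    # objective response rate, TKI arm
orr_chemo = table[1][0] / sum(table[1])  # objective response rate, chemo arm
```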
Extended Data Fig. 4 Synthetic images assist in ophthalmological diagnostic.
For diagnostic labels with lower classification performance, MINIM can selectively generate synthetic OCT images and integrate them into the training data, thereby further refining the diagnostic model’s accuracy.
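As a rough illustration of this selective-augmentation loop, here is a minimal PyTorch sketch. The text-to-image pipeline `pipe`, the existing labeled dataset `real_train_set` and the label-to-prompt map are all hypothetical assumptions, not MINIM's actual implementation.

```python
# A minimal sketch of selective augmentation for underperforming labels.
# `pipe` (a text-to-image pipeline) and `real_train_set` (an existing labeled
# dataset) are assumed to exist; the weak labels and prompts are hypothetical.
from torch.utils.data import Dataset, ConcatDataset
from torchvision import transforms

class SyntheticOCT(Dataset):
    """OCT images generated up front for one underperforming diagnostic label."""
    def __init__(self, pipe, label, prompt, n):
        prep = transforms.Compose([transforms.Resize((224, 224)),
                                   transforms.ToTensor()])
        self.items = [(prep(pipe(prompt).images[0]), label) for _ in range(n)]

    def __len__(self):
        return len(self.items)

    def __getitem__(self, i):
        return self.items[i]

weak = {2: "OCT, retina, choroidal neovascularization",  # hypothetical weak labels
        3: "OCT, retina, diabetic macular edema"}
extra = [SyntheticOCT(pipe, lbl, prm, n=500) for lbl, prm in weak.items()]
train_set = ConcatDataset([real_train_set] + extra)  # retrain the classifier on this
```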
Extended Data Fig. 5 A typical cases of common diseases and report generation.
a, Examples of atypical cases of common diseases, including retinal vein occlusions (RVOs), diabetic retinopathy (DR) and choroidal neovascularization (CNV). b, Compared with reports generated by selected report generation models and by ophthalmologists, models trained with MINIM’s synthetic data identify more comprehensive and accurate information. c, Incorporating OCT image data synthesized by MINIM into the self-supervised learning of OCT image models can enhance the representational capabilities of pretrained models.
Extended Data Fig. 6 Breast MRI dataset and analysis flowchart.
Breast MR images were split into Tumor and Benign groups, each with three modalities (T1, T1c and T2). These images were sent to MINIM for generative model training and downstream HER2 status detection.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, J., Wang, K., Yu, Y. et al. Self-improving generative foundation model for synthetic medical image generation and clinical applications. Nat Med 31, 609–617 (2025). https://doi.org/10.1038/s41591-024-03359-y
This article is cited by
- Artificial intelligence in cancer: applications, challenges, and future perspectives. Molecular Cancer (2025)
- Foundation models for radiology—the position of the AI for Health Imaging (AI4HI) network. Insights into Imaging (2025)
- MedIENet: medical image enhancement network based on conditional latent diffusion model. BMC Medical Imaging (2025)
- Emergent photonics for cardiovascular health. Nature Photonics (2025)
- Uncovering ethical biases in publicly available fetal ultrasound datasets. npj Digital Medicine (2025)