
Self-improving generative foundation model for synthetic medical image generation and clinical applications

Abstract

In many clinical and research settings, the scarcity of high-quality medical imaging datasets has hampered the development of artificial intelligence (AI) clinical applications. This issue is particularly pronounced for less common conditions, underrepresented populations and emerging imaging modalities, where diverse and comprehensive datasets are often inadequate. To address this challenge, we introduce a unified medical image–text generative model called MINIM that can synthesize medical images of multiple organs across diverse imaging modalities from textual instructions. Clinician evaluations and rigorous objective measurements validate the high quality of MINIM’s synthetic images. MINIM exhibits enhanced generative capability when presented with previously unseen data domains, demonstrating its potential as a generalist medical AI (GMAI). Our findings show that MINIM’s synthetic images effectively augment existing datasets, boosting performance across multiple medical applications such as diagnostics, report generation and self-supervised learning. On average, MINIM improves performance by 12% for ophthalmic, 15% for chest, 13% for brain and 17% for breast-related tasks. Furthermore, we demonstrate MINIM’s potential clinical utility in the accurate prediction of HER2-positive breast cancer from MRI scans. In a large retrospective simulation analysis, MINIM accurately identified targeted-therapy-sensitive EGFR mutations from lung cancer computed tomography images, which could lead to improved 5-year survival rates. Although these results are promising, further validation and refinement in more diverse and prospective settings would greatly enhance the model’s generalizability and robustness.
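
At its core, this text-to-image setup follows the now-standard text-conditioned diffusion recipe. Below is a minimal sketch only, assuming a latent diffusion backbone served through the Hugging Face diffusers API; the public checkpoint name and the prompt are illustrative stand-ins, not the MINIM weights or prompt format.

```python
# Minimal text-to-image sketch with the diffusers library.
# NOTE: the checkpoint below is a generic public stand-in, NOT the MINIM
# model; the prompt is a hypothetical textual instruction in the style the
# paper describes (organ + modality + finding).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # illustrative checkpoint name
    torch_dtype=torch.float16,
).to("cuda")

prompt = "Chest CT, axial slice, lung window, peripheral ground-glass opacity"
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("synthetic_chest_ct.png")
```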


Fig. 1: Schematic illustration of our generative system for medical image synthesis.
Fig. 2: Evaluations of generated synthetic images.
Fig. 3: Self-improvement evaluations.
Fig. 4: Synthetic images serve as data augmentation.
Fig. 5: AI-based mutation prediction performance and its effect on 5-year survival rates of patients with advanced lung cancer.
Fig. 6: A two-stage RL strategy using human ratings (a minimal sketch of this loop follows the figure list).
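
The rating-driven loop of Fig. 6 can be sketched in two stages: first fit a reward model to human ratings of synthetic images, then use it to score new generations and feed the best back into fine-tuning. The sketch below is an illustrative assumption of that pattern (toy tensors, made-up dimensions), not the authors' implementation.

```python
# Stage 1: fit a reward model on (image-feature, human-rating) pairs.
# All shapes and the toy data are illustrative assumptions.
import torch
import torch.nn as nn

reward_model = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 1))
opt = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

features = torch.randn(256, 512)  # placeholder image embeddings
ratings = torch.rand(256, 1)      # placeholder clinician ratings in [0, 1]
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(reward_model(features), ratings)
    loss.backward()
    opt.step()

# Stage 2: score freshly generated candidates and keep the top-rated ones
# for the next round of generator fine-tuning (the "self-improving" loop).
candidates = torch.randn(64, 512)  # embeddings of new synthetic images
with torch.no_grad():
    scores = reward_model(candidates).squeeze(1)
keep = candidates[scores.topk(16).indices]  # best 16 feed back into training
```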


Data availability

Restrictions apply to the availability of the developmental and validation datasets, which were used with permission of the participants for the present study. De-identified data may be available for research purposes from the corresponding authors upon reasonable request.

Code availability

The code to reproduce the results can be accessed at https://github.com/WithStomach/MINIM.


Acknowledgements

This research was supported by the Wenzhou Eye Hospital, Wenzhou Medical University Eye Health and Disease Advanced Institute, Macau Science and Technology Development Fund, Macao (0007/2020/AFJ, 0070/2020/A2 and 0003/2021/AKP, K.Z.); Guangzhou National Laboratory (YW-SLJC0201, K.Z.); Discipline Development of Peking University (7101302940 and 7101303005, J.W.); the Young Elite Scientist Sponsorship Program by the Beijing Association for Science and Technology (BYESS2023026, J.W.); the Macao Young Scholars Program (AM2023024 and AM2023018, Y.G.) and National Natural Science Foundation of China (W2431057, 32100631 and 32470964). We thank the physicians and patients for providing clinical data.

Author information


Contributions

J.W., K.W., Y.Y., Y.L., W.X., Z.S., F.L., Z.Z., Y.G., L.Y., H.-Y.Z., H.M., W.Z., L.H., L.Z., R.G., I.C., B.D., L.C., X.C., J.L., M.-H.Z., D.B.-H., O.M., M.L., Y.K., J.L., S.Z., T.G., J.Z., K.X., E.O., H.L., Y.Y., K.Z. and J.Q. collected and analyzed the data. K.Z. and J.W. conceived and designed the project. K.Z., J.W. and J.Q. designed experiments and supervised the project. J.W. and K.Z. wrote the manuscript. All authors discussed the results and reviewed the manuscript.

Corresponding authors

Correspondence to Jinzhuo Wang, Kang Zhang or Jia Qu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Medicine thanks Su-In Lee, Jie Tian and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Joao Monteiro, in collaboration with the Nature Medicine team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Text-conditioned synthesis of medical images.

Samples were created by prompting a fine-tuned model using annotations made by ophthalmologists and radiologists.

Extended Data Fig. 2 AI-based EGFR mutation prediction and its effect on progression-free survival (PFS) rates of patients with advanced lung cancer.

a, PFS curves of true EGFR-sensitive mutations (underwent TKI therapy) versus insensitive mutations (underwent chemotherapy) in cohort A. b, PFS curves of AI-identified EGFR-sensitive mutations (underwent TKI therapy) versus insensitive mutations (underwent chemotherapy) in cohort A. c, PFS curves of true EGFR-sensitive mutations (underwent TKI therapy) versus insensitive mutations (underwent chemotherapy) in cohort B. d, PFS curves of AI-identified EGFR-sensitive mutations (underwent TKI therapy) versus insensitive mutations (underwent chemotherapy) in cohort B.
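
For readers who want to produce curves of this shape on their own cohorts, a minimal Kaplan-Meier sketch with the lifelines package follows. The CSV path and column names (pfs_months, progressed, group) are hypothetical.

```python
# Sketch of PFS curves plus a log-rank test, assuming a dataframe with a
# follow-up duration, an event indicator and a treatment-arm column.
import pandas as pd
import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

df = pd.read_csv("cohort_a.csv")  # hypothetical file and columns
tki = df[df["group"] == "TKI"]
chemo = df[df["group"] == "chemo"]

ax = plt.subplot(111)
for arm, label in [(tki, "EGFR-sensitive (TKI)"),
                   (chemo, "EGFR-insensitive (chemo)")]:
    kmf = KaplanMeierFitter()
    kmf.fit(arm["pfs_months"], event_observed=arm["progressed"], label=label)
    kmf.plot_survival_function(ax=ax)

res = logrank_test(tki["pfs_months"], chemo["pfs_months"],
                   event_observed_A=tki["progressed"],
                   event_observed_B=chemo["progressed"])
print(f"log-rank p = {res.p_value:.4f}")
plt.ylabel("Progression-free survival")
plt.show()
```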

Extended Data Fig. 3 AI-based EGFR mutation prediction and its effect on objective response rate (ORR) of patients with advanced lung cancer.

a, ORR of true EGFR-sensitive mutations (underwent TKI therapy) versus insensitive mutations (underwent chemotherapy) in cohort A. b, ORR of AI-identified EGFR-sensitive mutations (underwent TKI therapy) versus insensitive mutations (underwent chemotherapy) in cohort A. c, ORR of true EGFR-sensitive mutations (underwent TKI therapy) versus insensitive mutations (underwent chemotherapy) in cohort B. d, ORR of AI-identified EGFR-sensitive mutations (underwent TKI therapy) versus insensitive mutations (underwent chemotherapy) in cohort B.

Extended Data Fig. 4 Synthetic images assist in ophthalmological diagnosis.

For diagnostic labels with lower classification performance, MINIM can selectively generate synthetic OCT images and integrate them into the training data, thereby further refining the diagnostic model’s accuracy.
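
One way to operationalize this selective generation is to measure per-class recall on a validation set and turn each under-performing label into a text prompt for the generator. A minimal sketch follows; the label names and the 0.85 threshold are illustrative assumptions, and the generation call itself is omitted.

```python
# Identify weak diagnostic labels from per-class recall, then construct
# text prompts for targeted synthesis. Random arrays stand in for real
# validation labels and model predictions.
import numpy as np
from sklearn.metrics import recall_score

labels = ["CNV", "DME", "drusen", "normal"]       # illustrative OCT classes
y_true = np.random.randint(0, 4, size=500)        # placeholder ground truth
y_pred = np.random.randint(0, 4, size=500)        # placeholder predictions

per_class_recall = recall_score(y_true, y_pred, average=None)
weak = [labels[i] for i, r in enumerate(per_class_recall) if r < 0.85]

# Each weak label becomes a prompt; synthetic images would be appended to
# the real training set and the classifier retrained.
for label in weak:
    print("would synthesize extra images for:", f"OCT retinal scan showing {label}")
```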

Extended Data Fig. 5 Atypical cases of common diseases and report generation.

a, Examples of atypical cases of common diseases, including retinal vein occlusions (RVOs), diabetic retinopathy (DR) and choroidal neovascularization (CNV). b, Compared with reports generated by selected report-generation models and by ophthalmologists, the models identify more comprehensive and accurate information. c, Incorporating OCT image data synthesized by MINIM into the self-supervised learning of OCT image models enhances the representational capabilities of pretrained models.

Extended Data Fig. 6 Breast MRI dataset and analysis flowchart.

Breast MR images were split into Tumor and Benign classes, each comprising three modalities (T1, T1c and T2). These images were used to train the MINIM generative model and for downstream HER2 status detection.
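
A common way to present three co-registered modalities to a single downstream classifier is to stack them as input channels. The sketch below assumes a stock ResNet-18, an illustrative choice rather than the paper's architecture.

```python
# Stack T1/T1c/T2 as the three input channels of a standard CNN classifier.
import torch
import torchvision.models as models

model = models.resnet18(weights=None, num_classes=2)  # tumor vs benign
# ResNet expects 3 input channels, which matches the three MR modalities.

t1 = torch.randn(4, 1, 224, 224)     # placeholder mini-batch per modality
t1c = torch.randn(4, 1, 224, 224)
t2 = torch.randn(4, 1, 224, 224)
x = torch.cat([t1, t1c, t2], dim=1)  # (batch, 3, H, W)

logits = model(x)                    # (batch, 2)
```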

Extended Data Table 1 Data characteristics of the cohorts used in this study
Extended Data Table 2 Ablation study on quantitative assessment of the synthetic images using metrics of fidelity and diversity (a metric sketch follows this list)
Extended Data Table 3 Ablation study on quantitative assessment of the synthetic images in a multi-classification task
Extended Data Table 4 Multi-class diagnosis performance using real OCT images and mixed data of real and synthetic OCT images
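
The fidelity and diversity assessment in Extended Data Table 2 can be sketched with torchmetrics, assuming the standard metric choices, FID for fidelity and MS-SSIM for diversity (the table caption does not name the exact metrics, so this pairing is an assumption). Random tensors stand in for real and synthetic image batches.

```python
# FID (lower = synthetic distribution closer to real) and MS-SSIM between
# synthetic pairs (lower mean similarity = higher diversity).
# Requires: pip install torchmetrics[image]
import torch
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.image import MultiScaleStructuralSimilarityIndexMeasure

real = torch.randint(0, 256, (32, 3, 299, 299), dtype=torch.uint8)
fake = torch.randint(0, 256, (32, 3, 299, 299), dtype=torch.uint8)

fid = FrechetInceptionDistance(feature=2048)
fid.update(real, real=True)
fid.update(fake, real=False)
print("FID:", fid.compute().item())

ms_ssim = MultiScaleStructuralSimilarityIndexMeasure(data_range=1.0)
a, b = fake[:16].float() / 255.0, fake[16:].float() / 255.0
print("MS-SSIM:", ms_ssim(a, b).item())
```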


Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, J., Wang, K., Yu, Y. et al. Self-improving generative foundation model for synthetic medical image generation and clinical applications. Nat Med 31, 609–617 (2025). https://doi.org/10.1038/s41591-024-03359-y
