MCL-VD: Multi-modal contrastive learning with LoRA-enhanced GraphCodeBERT for effective vulnerability detection

Cao, Yi; Ju, Xiaolin; Chen, Xiang; Gong, Lina

doi:10.1007/s10515-025-00543-3

Published: 28 July 2025

Volume 32, article number 67 (2025)

Automated Software Engineering

Abstract

Vulnerability detection in software systems is a critical challenge due to the increasing complexity of code and the rising frequency of security vulnerabilities. Traditional approaches typically rely on single-modality inputs and struggle to distinguish between similar code snippets. However, multi-modal methods find it challenging to balance performance and efficiency. To address these challenges, we propose MCL-VD, a framework that leverages multi-modal inputs including source code, code comments, and AST to capture complementary structural and contextual information. We employ LoRA, which reduces the computational burden by optimizing the number of trainable parameters without sacrificing performance. Additionally, we apply multi-modal contrastive learning to align and differentiate the representations across the three modalities, thereby enhancing the model’s discriminative power and robustness. We designed and conducted experiments on three public benchmark datasets, i.e., Devign, Reveal, and Big-Vul. The experimental results show that MCL-VD significantly outperforms the best-performing baselines, achieving F1-score improvements ranging from 4.86% to 17.26%. These results highlight the effectiveness of combining multi-modal contrastive learning with LoRA optimization, providing a powerful and efficient solution for vulnerability detection.
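
The two mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the rank of 8, the temperature of 0.1, the toy embeddings, and the helper names lora_trainable_params and info_nce are all assumptions made for illustration, and the loss shown is a generic InfoNCE objective rather than the exact loss used in MCL-VD. The hidden size of 768 matches the base-size encoder on which GraphCodeBERT is built.

```python
import math


def lora_trainable_params(d_out, d_in, rank):
    """Count trainable parameters when a frozen d_out x d_in weight W
    receives a low-rank update B @ A (B: d_out x rank, A: rank x d_in)."""
    return rank * (d_out + d_in)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch: anchors[i] should be close to
    positives[i] and far from every positives[j] with j != i."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


# Parameter savings for one 768 x 768 projection (rank 8 is an assumption).
full_params = 768 * 768
lora_params = lora_trainable_params(768, 768, rank=8)
print(full_params, lora_params)

# Hypothetical 2-D embeddings of the same two functions in two modalities.
code_emb = [[1.0, 0.0], [0.0, 1.0]]
ast_aligned = [[0.9, 0.1], [0.1, 0.9]]   # each AST matches its own code
ast_swapped = [[0.1, 0.9], [0.9, 0.1]]   # each AST matches the other code

print(info_nce(code_emb, ast_aligned) < info_nce(code_emb, ast_swapped))
```

Under these assumptions the low-rank update trains 12,288 parameters in place of the 589,824 in the full projection, and the contrastive loss is lower when each function's AST embedding lies nearest to its own source-code embedding. With three modalities, one plausible arrangement is to sum such a loss over each modality pair, but the preview does not state how MCL-VD combines them.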

Data Availability

No datasets were generated or analysed during the current study.

Acknowledgements

This work is partly supported by the National Natural Science Foundation of China (Grant No. 61872263) and the Postgraduate Research & Practice Innovation Program of Jiangsu Province (Grant No. SJCX25_2000).

Author information

Authors and Affiliations

School of Artificial Intelligence and Computer Science, Nantong University, Nantong, Jiangsu, China
Yi Cao, Xiaolin Ju & Xiang Chen
School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu, China
Lina Gong

Contributions

All authors contributed significantly to the research and preparation of this paper. Yi Cao conceptualized the study, designed the MCL-VD framework, and led the development of the LoRA-enhanced GraphCodeBERT model. Xiaolin Ju implemented the multi-modal contrastive learning approach and conducted the experiments, including dataset preparation and performance evaluation. Xiang Chen focused on the theoretical aspects of contrastive learning and contributed to the analysis and interpretation of the results. Lina Gong provided expertise in vulnerability detection, reviewed existing methods, and assisted in benchmarking the proposed framework against state-of-the-art baselines. All authors contributed to the writing and editing of the manuscript, with Yi Cao and Xiaolin Ju coordinating the overall structure and ensuring technical accuracy. All authors reviewed and approved the final manuscript.

Corresponding authors

Correspondence to Xiaolin Ju or Xiang Chen.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Cao, Y., Ju, X., Chen, X. et al. MCL-VD: Multi-modal contrastive learning with LoRA-enhanced GraphCodeBERT for effective vulnerability detection. Autom Softw Eng 32, 67 (2025). https://doi.org/10.1007/s10515-025-00543-3

Received: 23 December 2024
Accepted: 20 July 2025
Published: 28 July 2025
Version of record: 28 July 2025
DOI: https://doi.org/10.1007/s10515-025-00543-3
