
MCL-VD: Multi-modal contrastive learning with LoRA-enhanced GraphCodeBERT for effective vulnerability detection

Automated Software Engineering

Abstract

Vulnerability detection in software systems is a critical challenge due to the increasing complexity of code and the rising frequency of security vulnerabilities. Traditional approaches typically rely on single-modality inputs and struggle to distinguish between similar code snippets, while existing multi-modal methods find it difficult to balance performance and efficiency. To address these challenges, we propose MCL-VD, a framework that leverages multi-modal inputs, namely source code, code comments, and abstract syntax trees (ASTs), to capture complementary structural and contextual information. We employ LoRA (Low-Rank Adaptation), which reduces the computational burden by cutting the number of trainable parameters without sacrificing performance. Additionally, we apply multi-modal contrastive learning to align and differentiate the representations across the three modalities, thereby enhancing the model's discriminative power and robustness. We conducted experiments on three public benchmark datasets: Devign, Reveal, and Big-Vul. The experimental results show that MCL-VD significantly outperforms the best-performing baselines, achieving F1-score improvements ranging from 4.86% to 17.26%. These results highlight the effectiveness of combining multi-modal contrastive learning with LoRA optimization, providing a powerful and efficient solution for vulnerability detection.
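The abstract names two mechanisms: LoRA, which freezes a pretrained weight and trains only a small low-rank update, and contrastive learning across the code, comment, and AST modalities. The sketch below illustrates both in plain Python under stated assumptions; it is not MCL-VD's implementation. It assumes the standard LoRA formulation (frozen W0 plus (alpha / r) · B·A, with B initialized to zero) and a SimCLR/CLIP-style symmetric InfoNCE loss averaged over the three modality pairs. The rank r = 8, the temperature tau = 0.1, the 768 × 768 layer size (GraphCodeBERT's hidden size), and all function and class names are illustrative assumptions; the abstract does not state the paper's actual loss, hyperparameters, or which layers receive adapters.

```python
import math
import random


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """Hypothetical LoRA layer: frozen W0 plus a trainable update (alpha / r) * B @ A."""

    def __init__(self, w0, r, alpha, seed=0):
        rng = random.Random(seed)
        self.w0 = w0  # d_out x d_in, frozen during fine-tuning
        d_out, d_in = len(w0), len(w0[0])
        self.scale = alpha / r
        # A: r x d_in with small random values; B: d_out x r of zeros,
        # so the adapted layer initially behaves exactly like W0.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.b = [[0.0] * r for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.w0, x)
        delta = matvec(self.b, matvec(self.a, x))
        return [h + self.scale * d for h, d in zip(base, delta)]

    def trainable_params(self):
        # Only A and B are trained; W0 stays fixed.
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])


def lora_param_counts(d_in, d_out, r):
    """Trainable parameters for one d_out x d_in weight: full fine-tuning vs. LoRA."""
    return d_in * d_out, r * (d_in + d_out)


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(anchors, positives, tau=0.1):
    """Mean InfoNCE: anchors[i] should match positives[i] against every positives[j]."""
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / tau for p in positives]
        m = max(logits)  # subtract the max before exp for numerical stability
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(anchors)


def multimodal_contrastive_loss(code, comment, ast, tau=0.1):
    """Assumed objective: symmetric InfoNCE averaged over the three modality pairs."""
    pairs = [(code, comment), (code, ast), (comment, ast)]
    losses = [0.5 * (info_nce(x, y, tau) + info_nce(y, x, tau)) for x, y in pairs]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # One 768 x 768 projection: full fine-tuning vs. LoRA with r = 8.
    print(lora_param_counts(768, 768, r=8))  # (589824, 12288)
```

Under these assumptions, LoRA trains about 2% of the parameters of one such projection, which is the kind of saving the abstract refers to. In the contrastive part, embeddings of the same sample from different modalities act as positives and the other samples in the batch act as negatives, so the loss is low when the three views of each sample agree and high when they are mismatched.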


Figures and algorithms: Fig. 1, Fig. 2, Algorithm 1, Fig. 3, Fig. 4, Fig. 5, Fig. 6, Fig. 7

Data Availability

No datasets were generated or analysed during the current study.

Notes

  1. https://nvd.nist.gov

  2. https://unit42.paloaltonetworks.com/

  3. https://github.com/olivergo7/MCL-VD

  4. https://cwe.mitre.org/top25/archive/2024/2024_cwe_top25.html

Acknowledgements

This work is partly supported by the National Natural Science Foundation of China (Grant No. 61872263) and the Postgraduate Research & Practice Innovation Program of Jiangsu Province (Grant No. SJCX25_2000).

Author information

Contributions

All authors contributed significantly to the research and preparation of this paper. Yi Cao conceptualized the study, designed the MCL-VD framework, and led the development of the LoRA-enhanced GraphCodeBERT model. Xiaolin Ju implemented the multi-modal contrastive learning approach and conducted the experiments, including dataset preparation and performance evaluation. Xiang Chen focused on the theoretical aspects of contrastive learning and contributed to the analysis and interpretation of the results. Lina Gong provided expertise in vulnerability detection, reviewed existing methods, and assisted in benchmarking the proposed framework against state-of-the-art baselines. All authors contributed to the writing and editing of the manuscript, with Yi Cao and Xiaolin Ju coordinating the overall structure and ensuring technical accuracy. All authors reviewed and approved the final manuscript.

Corresponding authors

Correspondence to Xiaolin Ju or Xiang Chen.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Cao, Y., Ju, X., Chen, X. et al. MCL-VD: Multi-modal contrastive learning with LoRA-enhanced GraphCodeBERT for effective vulnerability detection. Autom Softw Eng 32, 67 (2025). https://doi.org/10.1007/s10515-025-00543-3

  • DOI: https://doi.org/10.1007/s10515-025-00543-3

Keywords