Deep learning-based software engineering: progress, challenges, and opportunities

Chen, Xiangping; Hu, Xing; Huang, Yuan; Jiang, He; Ji, Weixing; Jiang, Yanjie; Jiang, Yanyan; Liu, Bo; Liu, Hui; Li, Xiaochen; Lian, Xiaoli; Meng, Guozhu; Peng, Xin; Sun, Hailong; Shi, Lin; Wang, Bo; Wang, Chong; Wang, Jiayi; Wang, Tiantian; Xuan, Jifeng; Xia, Xin; Yang, Yibiao; Yang, Yixin; Zhang, Li; Zhou, Yuming; Zhang, Lu

doi:10.1007/s11432-023-4127-5

Review
Open access
Published: 24 December 2024

Volume 68, article number 111102, (2025)

Science China Information Sciences

Abstract

Researchers have recently achieved significant advances in deep learning techniques, which in turn have substantially advanced other research disciplines, such as natural language processing, image processing, speech recognition, and software engineering. Various deep learning techniques have been successfully employed to facilitate software engineering tasks, including code generation, software refactoring, and fault localization. Many studies presented in top conferences and journals have demonstrated the application of deep learning techniques to various software engineering tasks. However, although several surveys have provided overall pictures of the application of deep learning techniques in software engineering, they focus more on the learning techniques themselves, that is, which deep learning techniques are employed and how deep models are trained or fine-tuned for software engineering tasks. We still lack surveys that explain the advances driven by deep learning techniques in the individual subareas of software engineering, as well as the challenges and opportunities in each subarea. To this end, we present the first task-oriented survey on deep learning-based software engineering. It covers twelve major software engineering subareas significantly impacted by deep learning techniques. These subareas span the whole lifecycle of software development and maintenance, including requirements engineering, software development, testing, maintenance, and developer collaboration. Because we believe that deep learning may provide an opportunity to revolutionize the whole discipline of software engineering, a single survey covering as many software engineering subareas as possible can help future research push forward the frontier of deep learning-based software engineering more systematically.
For each of the selected subareas, we highlight the major advances achieved by applying deep learning techniques and point to the datasets available in that subarea. We also discuss the challenges and opportunities concerning each of the surveyed software engineering subareas.


References

Hinton G E, Salakhutdinov R R. Reducing the dimensionality of data with neural networks. Science, 2006, 313: 504–507
Article MathSciNet MATH Google Scholar
Liu L, Ouyang W, Wang X, et al. Deep learning for generic object detection: a survey. Int J Comput Vis, 2020, 128: 261–318
Article MATH Google Scholar
Hinton G E, Osindero S, Teh Y W. A fast learning algorithm for deep belief nets. Neural Comput, 2006, 18: 1527–1554
Article MathSciNet MATH Google Scholar
Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. Commun ACM, 2017, 60: 84–90
Article MATH Google Scholar
Lecun Y, Bottou L, Bengio Y, et al. Gradient-based learning applied to document recognition. Proc IEEE, 1998, 86: 2278–2324
Article MATH Google Scholar
Elman J L. Finding structure in time. Cogn Sci, 1990, 14: 179–211
Article MATH Google Scholar
Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput, 1997, 9: 1735–1780
Article MATH Google Scholar
Schuster M, Paliwal K K. Bidirectional recurrent neural networks. IEEE Trans Signal Process, 1997, 45: 2673–2681
Article MATH Google Scholar
Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. In: Proceedings of Advances in Neural Information Processing Systems, 2017. 30
MATH Google Scholar
Yang Y M, Xia X, Lo D, et al. A survey on deep learning for software engineering. ACM Comput Surv, 2022, 54: 1–73
Article MATH Google Scholar
Nguyen G, Dlugolinsky S, Bobák M, et al. Machine learning and deep learning frameworks and libraries for large-scale data mining: a survey. Artif Intell Rev, 2019, 52: 77–124
Article MATH Google Scholar
Wang J, Ma Y, Zhang L, et al. Deep learning for smart manufacturing: Methods and applications. J Manuf Syst, 2018, 48: 144–156
Article MATH Google Scholar
Shen D, Wu G, Suk H I. Deep learning in medical image analysis. Annu Rev Biomed Eng, 2017, 19: 221–248
Article MATH Google Scholar
Berman D S, Buczak A L, Chavis J S, et al. A survey of deep learning methods for cyber security. Information, 2019, 10: 122
Article MATH Google Scholar
Le T H, Chen H, Babar M A. Deep learning for source code modeling and generation: models, applications, and challenges. ACM Comput Surv, 2021, 53: 1–38
Article MATH Google Scholar
Svyatkovskiy A, Zhao Y, Fu S, et al. Pythia: AI-assisted code completion system. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019. 2727–2735
Chapter MATH Google Scholar
Iyer S, Konstas I, Cheung A, et al. Summarizing source code using a neural attention model. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 2016
Google Scholar
Aniche M, Maziero E, Durelli R, et al. The effectiveness of supervised machine learning algorithms in predicting software refactoring. IEEE Trans Software Eng, 2020, 48: 1432–1450
Article Google Scholar
Gu X, Zhang H, Kim S. Deep code search. In: Proceedings of the 40th International Conference on Software Engineering, 2018. 933–944
Chapter MATH Google Scholar
Wardat M, Le W, Rajan H. Deeplocalize: fault localization for deep neural networks. In: Proceedings of the IEEE/ACM 43rd International Conference on Software Engineering, 2021. 251–262
MATH Google Scholar
Li Y, Wang S, Nguyen T N. DLFix: context-based code transformation learning for automated program repair. In: Proceedings of the 42nd International Conference on Software Engineering, Seoul, 2020. 602–614
MATH Google Scholar
Zou D, Wang S, Xu S, et al. μVulDeePecker: a deep learning-based system for multiclass vulnerability detection. IEEE Trans Dependable Secure Comput, 2019, 18: 2224–2236
MATH Google Scholar
Humbatova N, Jahangirova G, Tonella P. DeepCrime: mutation testing of deep learning systems based on real faults. In: Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, 2021. 67–78
Chapter Google Scholar
Watson C, Cooper N, Palacio D N, et al. A systematic literature review on the use of deep learning in software engineering research. ACM Trans Softw Eng Methodol, 2022, 31: 1–58
Article MATH Google Scholar
Niu C, Li C, Luo B, et al. Deep learning meets software engineering: a survey on pre-trained models of source code. 2022. ArXiv:2205.11739
MATH Google Scholar
Zhang Q, Fang C, Xie Y, et al. A survey on large language models for software engineering. 2023. ArXiv:2312.15223
MATH Google Scholar
Jin Z. Environment Modeling-Based Requirements Engineering for Software Intensive Systems. San Francisco: Morgan Kaufmann Publishers Inc., 2017
MATH Google Scholar
Huang Q, Xia X, Lo D, et al. Automating intention mining. IEEE Trans Software Eng, 2020, 46: 1098–1119
Article Google Scholar
Pudlitz F, Brokhausen F, Vogelsang A. Extraction of system states from natural language requirements. In: Proceedings of the IEEE 27th International Requirements Engineering Conference (RE), 2019. 211–222
Google Scholar
Li M, Shi L, Yang Y, et al. A deep multitask learning approach for requirements discovery and annotation from open forum. In: Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering, 2021. 336–348
MATH Google Scholar
Guo H, Singh M P. Caspar: extracting and synthesizing user stories of problems from app reviews. In: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, 2020. 628–640
Chapter MATH Google Scholar
Mekala R R, Irfan A, Groen E C, et al. Classifying user requirements from online feedback in small dataset environments using deep learning. In: Proceedings of the IEEE 29th International Requirements Engineering Conference (RE), 2021. 139–149
MATH Google Scholar
Tizard J, Devine P, Wang H, et al. A software requirements ecosystem: linking forum, issue tracker, and faqs for requirements management. IEEE Trans Software Eng, 2023, 49: 2381–2393
Article MATH Google Scholar
Shi L, Xing M, Li M, et al. Detection of hidden feature requests from massive chat messages via deep Siamese network. In: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, 2020. 641–653
Chapter MATH Google Scholar
Pan S, Bao L, Ren X, et al. Automating developer chat mining. In: Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2021. 854–866
MATH Google Scholar
Türetken O, Su O, Demirörs O. Automating software requirements generation from business process models. In: Proceedings of the 1st Conference on the Principles of Software Engineering (PRISE’04), 2004
Google Scholar
Cox K, Phalp K T, Bleistein S J, et al. Deriving requirements from process models via the problem frames approach. Inf Software Tech, 2005, 47: 319–337
Article MATH Google Scholar
Maiden N A M, Manning S, Jones S, et al. Generating requirements from systems models using patterns: a case study. Requir Eng, 2005, 10: 276–288
Article MATH Google Scholar
Yu E S K, Bois P D, Dubois E, et al. From organization models to system requirements: a ‘cooperating agents’ approach. In: Proceedings of the 3rd International Conference on Cooperative Information Systems (CoopIS-95), 1995. 194–204
MATH Google Scholar
Letier E, van Lamsweerde A. Deriving operational software specifications from system goals. In: Proceedings of the 10th ACM SIGSOFT Symposium on Foundations of Software Engineering, 2002. 119–128
Chapter MATH Google Scholar
Landtsheer R D, Letier E, van Lamsweerde A. Deriving tabular event-based specifications from goal-oriented requirements models. Requir Eng, 2004, 9: 104–120
Article MATH Google Scholar
van Lamsweerde A. Goal-oriented requirements enginering: a roundtrip from research to practice [enginering read engineering]. In: Proceedings of the 12th IEEE International Requirements Engineering Conference, 2004. 4–7
MATH Google Scholar
van Lamsweerde A, Willemet L. Inferring declarative requirements specifications from operational scenarios. IEEE Trans Software Eng, 1998, 24: 1089–1114
Article MATH Google Scholar
Meziane F, Athanasakis N, Ananiadou S. Generating natural language specifications from UML class diagrams. Requir Eng, 2008, 13: 1–18
Article Google Scholar
Berenbach B. The automated extraction of requirements from UML models. In: Proceedings of the 11th IEEE International Conference on Requirements Engineering (RE 2003), 2003. 287
Chapter Google Scholar
Souag A, Mazo R, Salinesi C, et al. Using the AMAN-DA method to generate security requirements: a case study in the maritime domain. Requir Eng, 2018, 23: 557–580
Article Google Scholar
Zhao Z, Zhang L, Lian X, et al. ReqGen: keywords-driven software requirements generation. Mathematics, 2023, 11: 332
Article Google Scholar
Koscinski V, Hashemi S, Mirakhorli M. On-demand security requirements synthesis with relational generative adversarial networks. In: Proceedings of the IEEE/ACM 45th International Conference on Software Engineering (ICSE), 2023. 1613–1625
Google Scholar
Li M, Yang Y, Shi L, et al. Automated extraction of requirement entities by leveraging LSTM-CRF and transfer learning. In: Proceedings of the IEEE International Conference on Software Maintenance and Evolution (ICSME), 2020. 208–219
MATH Google Scholar
Casillo F, Deufemia V, Gravino C. Detecting privacy requirements from user stories with NLP transfer learning models. Inf Software Tech, 2022, 146: 106853
Article Google Scholar
Ezzini S, Abualhaija S, Arora C, et al. Automated handling of anaphoric ambiguity in requirements: a multi-solution study. In: Proceedings of the IEEE/ACM 44th International Conference on Software Engineering (ICSE), 2022. 187–199
Google Scholar
Wang Y, Shi L, Li M, et al. Detecting coreferent entities in natural language requirements. Requir Eng, 2022, 27: 351–373
Article MATH Google Scholar
Wang Y, Shi L, Li M, et al. A deep context-wise method for coreference detection in natural language requirements. In: Proceedings of the IEEE 28th International Requirements Engineering Conference (RE), 2020. 180–191
MATH Google Scholar
Ezzini S, Abualhaija S, Arora C, et al. AI-based question answering assistance for analyzing natural-language requirements. In: Proceedings of IEEE/ACM 45th International Conference on Software Engineering (ICSE), 2023
Google Scholar
Baker C, Deng L, Chakraborty S, et al. Automatic multi-class non-functional software requirements classification using neural networks. In: Proceedings of the IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), 2019. 610–615
MATH Google Scholar
Hey T, Keim J, Koziolek A, et al. NoRBERT: transfer learning for requirements classification. In: Proceedings of the IEEE 28th International Requirements Engineering Conference (RE), 2020. 169–179
MATH Google Scholar
Luo X, Xue Y, Xing Z, et al. PRCBERT: prompt learning for requirement classification using BERT-based pretrained language models. In: Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, 2023
MATH Google Scholar
Winkler J P, Grönberg J, Vogelsang A. Predicting how to test requirements: an automated approach. In: Proceedings of the IEEE 27th International Requirements Engineering Conference (RE), 2019. 120–130
MATH Google Scholar
AlDhafer O, Ahmad I, Mahmood S. An end-to-end deep learning system for requirements classification using recurrent neural networks. Inf Software Tech, 2022, 147: 106877
Article MATH Google Scholar
Guo J, Cheng J, Cleland-Huang J. Semantically enhanced software traceability using deep learning techniques. In: Proceedings of the IEEE/ACM 39th International Conference on Software Engineering (ICSE), 2017. 3–14
MATH Google Scholar
Jahan M S, Khan H U, Akbar S, et al. Bidirectional language modeling: a systematic literature review. Sci Program, 2021. doi: https://doi.org/10.1155/2021/6641832
Lee J, Yoon W, Kim S, et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics, 2019, 36: 1234–1240
Article MATH Google Scholar
Feng Z, Guo D, Tang D, et al. CodeBERT: a pre-trained model for programming and natural languages. In: Proceedings of Findings of the Association for Computational Linguistics, 2020. 1536–1547
MATH Google Scholar
Lin J, Liu Y, Zeng Q, et al. Traceability transformed: generating more accurate links with pre-trained BERT models. In: Proceedings of the IEEE/ACM 43rd International Conference on Software Engineering (ICSE), 2021. 324–335
MATH Google Scholar
Tian J, Zhang L, Lian X. A cross-level requirement trace link update model based on bidirectional encoder representations from transformers. Mathematics, 2023, 11: 623
Article MATH Google Scholar
Lin J, Liu Y, Cleland-Huang J. Information retrieval versus deep learning approaches for generating traceability links in bilingual projects. Empir Software Eng, 2022, 27: 5
Article MATH Google Scholar
ISO/IEC/IEEE International Standard. Systems and software engineering — life cycle processes — requirements engineering. ISO/IEC/IEEE 29148:2018(E), 2018. 1–104. https://www.iso.org/standard/72089.html.
Google Scholar
Mavin A, Wilkinson P, Harwood A, et al. Easy approach to requirements syntax (EARS). In: Proceedings of the 17th IEEE International Requirements Engineering Conference, 2009. 317–322
MATH Google Scholar
Franch X, Glinz M, Mendez D, et al. A study about the knowledge and use of requirements engineering standards in industry. IEEE Trans Software Eng, 2022, 48: 3310–3325
Article MATH Google Scholar
Liang J T, Yang C, Myers B A. A large-scale survey on the usability of AI programming assistants: successes and challenges. In: Proceedings of the 46th IEEE/ACM International Conference on Software Engineering, 2023
MATH Google Scholar
Kelly S, Tolvanen J P. Domain-Specific Modeling: Enabling Full Code Generation. Hoboken: John Wiley & Sons, 2008
Book MATH Google Scholar
Allamanis M, Barr E T, Devanbu P, et al. A survey of machine learning for big code and naturalness. ACM Comput Surv, 2018, 51: 1–37
Article MATH Google Scholar
Murphy G C, Kersten M, Findlater L. How are Java software developers using the Eclipse IDE? IEEE Softw, 2006, 23: 76–83
Article Google Scholar
Bruch M, Monperrus M, Mezini M. Learning from examples to improve code completion systems. In: Proceedings of the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, 2009. 213–222
Chapter Google Scholar
Gvero T, Kuncak V, Kuraj I, et al. Complete completion using types and weights. In: Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2013. 27–38
Chapter MATH Google Scholar
Zheng Q, Xia X, Zou X, et al. CodeGeeX: a pre-trained model for code generation with multilingual evaluations on HumanEval-X. 2023. ArXiv:2303.17568
MATH Google Scholar
Rabinovich M, Stern M, Klein D. Abstract syntax networks for code generation and semantic parsing. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017. 1139–1149
MATH Google Scholar
Iyer S, Cheung A, Zettlemoyer L. Learning programmatic idioms for scalable semantic parsing. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019. 5425–5434
Google Scholar
Yin P, Neubig G. A syntactic neural model for general-purpose code generation. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017. 440–450
MATH Google Scholar
Yin P, Neubig G. TRANX: a transition-based neural abstract syntax parser for semantic parsing and code generation. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2018. 7–12
MATH Google Scholar
Jiang H, Zhou C, Meng F, et al. Exploring dynamic selection of branch expansion orders for code generation. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021. 5076–5085
MATH Google Scholar
Dong L, Lapata M. Language to logical form with neural attention. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 2016
MATH Google Scholar
Yu T, Zhang R, Yang K, et al. Spider: a large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, Brussels, 2018. 3911–3921
MATH Google Scholar
Sethi A, Sankaran A, Panwar N, et al. DLPaper2Code: auto-generation of code from deep learning research papers. In: Proceedings of the AAAI Conference on Artificial Intelligence, 2018
MATH Google Scholar
Yang G, Zhou Y, Chen X, et al. ExploitGen: template-augmented exploit code generation based on CodeBERT. J Syst Software, 2023, 197: 111577
Article Google Scholar
Ling W, Blunsom P, Grefenstette E, et al. Latent predictor networks for code generation. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 2016
MATH Google Scholar
Lyu C, Wang R, Zhang H, et al. Embedding API dependency graph for neural code generation. Empir Software Eng, 2021, 26: 61
Article MATH Google Scholar
Clement C B, Drain D, Timcheck J, et al. PyMT5: multi-mode translation of natural language and Python code with transformers. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2020. 9052–9065
MATH Google Scholar
Le H, Wang Y, Gotmare A D, et al. CodeRL: mastering code generation through pretrained models and deep reinforcement learning. In: Proceedings of Advances in Neural Information Processing Systems, 2022. 35: 21314–21328
Google Scholar
Wang Y, Wang W, Joty S R, et al. CodeT5: identifier-aware unified pre-trained encoder-decoder models for code understanding and generation. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2021. 8696–8708
MATH Google Scholar
Sun Y, Tang D, Duan N, et al. Semantic parsing with syntax- and table-aware SQL generation. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018. 361–372
MATH Google Scholar
Wang X, Wang Y, Wan Y, et al. Compilable neural code generation with compiler feedback. In: Proceedings of Findings of the Association for Computational Linguistics, 2022. 9–19
MATH Google Scholar
Poesia G, Polozov A, Le V, et al. Synchromesh: reliable code generation from pre-trained language models. In: Proceedings of the 10th International Conference on Learning Representations, 2022
MATH Google Scholar
Wei B, Li G, Xia X, et al. Code generation as a dual task of code summarization. In: Proceedings of Advances in Neural Information Processing Systems, 2019. 32
MATH Google Scholar
Ahmad W U, Chakraborty S, Ray B, et al. Unified pre-training for program understanding and generation. In: Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021. 2655–2668
MATH Google Scholar
Ye W, Xie R, Zhang J, et al. Leveraging code generation to improve code retrieval and summarization via dual learning. In: Proceedings of the Web Conference 2020, 2020. 2309–2319
Chapter MATH Google Scholar
Hashimoto T B, Guu K, Oren Y, et al. A retrieve-and-edit framework for predicting structured outputs. In: Proceedings of Advances in Neural Information Processing Systems, 2018. 31
MATH Google Scholar
Kulal S, Pasupat P, Chandra K, et al. SPoC: search-based pseudocode to code. In: Proceedings of Advances in Neural Information Processing Systems, 2019. 32
MATH Google Scholar
Parvez M R, Ahmad W U, Chakraborty S, et al. Retrieval augmented code generation and summarization. In: Proceedings of Findings of the Association for Computational Linguistics, 2021. 2719–2734
MATH Google Scholar
Iyer S, Konstas I, Cheung A, et al. Mapping language to code in programmatic context. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2018. 1643–1652
MATH Google Scholar
Guo D, Tang D, Duan N, et al. Coupling retrieval and meta-learning for context-dependent semantic parsing. In: Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019. 855–866
MATH Google Scholar
Li J, Li Y, Li G, et al. SkCoder: a sketch-based approach for automatic code generation. 2023. ArXiv:2302.06144
MATH Google Scholar
Dong L, Lapata M. Coarse-to-fine decoding for neural semantic parsing. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018. 731–742
MATH Google Scholar
Shen S, Zhu X, Dong Y, et al. Incorporating domain knowledge through task augmentation for front-end JavaScript code generation. In: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2022. 1533–1543
Chapter MATH Google Scholar
Sun Z, Zhu Q, Mou L, et al. A grammar-based structural CNN decoder for code generation. In: Proceedings of the AAAI Conference on Artificial Intelligence, 2019. 7055–7062
MATH Google Scholar
Sun Z, Zhu Q, Xiong Y, et al. TreeGen: a tree-based transformer architecture for code generation. In: Proceedings of the AAAI Conference on Artificial Intelligence, 2020. 8984–8991
MATH Google Scholar
Xie B, Su J, Ge Y, et al. Improving tree-structured decoder training for code generation via mutual learning. In: Proceedings of the AAAI Conference on Artificial Intelligence, 2021. 14121–14128
MATH Google Scholar
Chung J, Gulcehre C, Cho K, et al. Empirical evaluation of gated recurrent neural networks on sequence modeling. 2014. ArXiv:1412.3555
MATH Google Scholar
Liu F, Li G, Zhao Y, et al. Multi-task learning based pre-trained language model for code completion. In: Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering, 2021. 473–485
MATH Google Scholar
Izadi M, Gismondi R, Gousios G. CodeFill: multi-token code completion by jointly learning from structure and naming sequences. In: Proceedings of the 44th International Conference on Software Engineering, 2022. 401–412
Chapter MATH Google Scholar
Tang Z, Ge J, Liu S, et al. Domain adaptive code completion via language models and decoupled domain databases. 2023. ArXiv:2308.09313
Book MATH Google Scholar
Sun Z, Du X, Song F, et al. CodeMark: imperceptible watermarking for code datasets against neural code completion models. 2023. ArXiv:2308.14401
Google Scholar
Wang C, Hu J, Gao C, et al. Practitioners’ expectations on code completion. 2023. ArXiv:2301.03846
MATH Google Scholar
Nie P, Banerjee R, Li J J, et al. Learning deep semantics for test completion. 2023. ArXiv:2302.10166
Book MATH Google Scholar
Dahal S, Maharana A, Bansal M. Analysis of tree-structured architectures for code generation. In: Proceedings of Findings of the Association for Computational Linguistics, 2021. 4382–4391
MATH Google Scholar
Norouzi S, Tang K, Cao Y. Code generation from natural language with less prior knowledge and more monolingual data. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021. 776–785
MATH Google Scholar
Mastropaolo A, Pascarella L, Guglielmi E, et al. On the robustness of code generation techniques: an empirical study on GitHub copilot. 2023. ArXiv:2302.00438
MATH Google Scholar
Xu F F, Vasilescu B, Neubig G. In-IDE code generation from natural language: promise and challenges. ACM Trans Softw Eng Methodol, 2022, 31: 1–47
Google Scholar
Liang Q, Sun Z, Zhu Q, et al. Lyra: a benchmark for turducken-style code generation. In: Proceedings of the 31st International Joint Conference on Artificial Intelligence, 2022. 4238–4244
MATH Google Scholar
Hendrycks D, Basart S, Kadavath S, et al. Measuring coding challenge competence with APPS. In: Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks, 2021
MATH Google Scholar
Lu S, Guo D, Ren S, et al. CodeXGLUE: a machine learning benchmark dataset for code understanding and generation. In: Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks, 2021
MATH Google Scholar
Shen X, Chen Z, Backes M, et al. In ChatGPT we trust? Measuring and characterizing the reliability of ChatGPT. 2023. ArXiv:2304.08979
MATH Google Scholar
Lukins S K, Kraft N A, Etzkorn L H. Source code retrieval for bug localization using latent Dirichlet allocation. In: Proceedings of the 15th Working Conference on Reverse Engineering, Antwerp, 2008. 155–164
MATH Google Scholar
Chatterjee S, Juvekar S, Sen K. SNIFF: a search engine for Java using free-form queries. In: Fundamental Approaches to Software Engineering. Berlin: Springer, 2009. 385–400
Chapter MATH Google Scholar
Hill E, Roldan-Vega M, Fails J A, et al. NL-based query refinement and contextualized code search results: a user study. In: Proceedings of IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering, 2014. 34–43
Google Scholar
McMillan C, Grechanik M, Poshyvanyk D, et al. Portfolio: finding relevant functions and their usage. In: Proceedings of the 33rd International Conference on Software Engineering, 2011. 111–120
Chapter MATH Google Scholar
Li X, Wang Z, Wang Q, et al. Relationship-aware code search for JavaScript frameworks. In: Proceedings of the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, 2016. 690–701
MATH Google Scholar
Sachdev S, Li H, Luan S, et al. Retrieval on source code: a neural code search. In: Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages, 2018. 31–41
Chapter MATH Google Scholar
Zou Y, Ling C, Lin Z, et al. Graph embedding based code search in software project. In: Proceedings of the 10th Asia-Pacific Symposium on Internetware, 2018. 1–10
MATH Google Scholar
Gu W, Li Z, Gao C, et al. Cradle: deep code retrieval based on semantic dependency learning. Neural Networks, 2021, 141: 385–394
Article MATH Google Scholar
Wan Y, Shu J, Sui Y, et al. Multi-modal attention network learning for semantic source code retrieval. In: Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering, 2019. 13–25
MATH Google Scholar
Ling X, Wu L, Wang S, et al. Deep graph matching and searching for semantic code retrieval. ACM Trans Knowledge Discov Data, 2021, 15: 1–21
Article MATH Google Scholar
Liu S, Xie X, Ma L, et al. GraphSearchNET: enhancing GNNs via capturing global dependency for semantic code search. 2021. ArXiv:2111.02671
MATH Google Scholar
Li X, Gong Y, Shen Y, et al. CodeRetriever: unimodal and bimodal contrastive learning. 2022. ArXiv:2201.10866
MATH Google Scholar
Jiang H, Nie L, Sun Z, et al. ROSF: leveraging Information Retrieval and Supervised Learning for Recommending Code Snippets. IEEE Trans Serv Comput, 2019, 12: 34–46
Article MATH Google Scholar
Guo D, Ren S, Lu S, et al. GraphCodeBERT: pre-training code representations with data flow. In: Proceedings of the 9th International Conference on Learning Representations, 2021
MATH Google Scholar
Guo D, Lu S, Duan N, et al. UniXcoder: unified cross-modal pre-training for code representation. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, 2022. 7212–7225
MATH Google Scholar
Shi Z, Xiong Y, Zhang X, et al. Cross-modal contrastive learning for code search. In: Proceedings of IEEE International Conference on Software Maintenance and Evolution (ICSME), 2022. 94–105
MATH Google Scholar
Bui N D Q, Yu Y, Jiang L. Self-supervised contrastive learning for code retrieval and summarization via semantic-preserving transformations. In: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021. 511–521
Chapter MATH Google Scholar
Shi E, Wang Y, Gu W, et al. CoCoSoDa: effective contrastive learning for code search. In: Proceedings of IEEE/ACM 45th International Conference on Software Engineering (ICSE), 2023. 2198–2210
MATH Google Scholar
Bajracharya S K, Ngo T C, Linstead E, et al. Sourcerer: a search engine for open source code supporting structure-based search. In: Proceedings of Companion to the 21st Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, 2006. 681–682
Google Scholar
Lu M, Sun X, Wang S, et al. Query expansion via WordNet for effective code search. In: Proceedings of the 22nd IEEE International Conference on Software Analysis, Evolution, and Reengineering, 2015. 545–549
MATH Google Scholar
Lv F, Zhang H, Lou J, et al. CodeHow: effective code search based on API understanding and extended Boolean model (E). In: Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering, 2015. 260–270
Rahman M M. Supporting code search with context-aware, analytics-driven, effective query reformulation. In: Proceedings of the 41st International Conference on Software Engineering: Companion Proceedings, 2019. 226–229
Hill E, Pollock L L, Vijay-Shanker K. Improving source code search with natural language phrasal representations of method signatures. In: Proceedings of the 26th IEEE/ACM International Conference on Automated Software Engineering, 2011. 524–527
Liu J, Kim S, Murali V, et al. Neural query expansion for code search. In: Proceedings of the 3rd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages, 2019. 29–37
Cao K, Chen C, Baltes S, et al. Automated query reformulation for efficient search based on query logs from Stack Overflow. In: Proceedings of the IEEE/ACM 43rd International Conference on Software Engineering (ICSE), 2021. 1273–1285
Li D, Shen Y, Jin R, et al. Generation-augmented query expansion for code retrieval. 2022. ArXiv:2212.10692
Luan S, Yang D, Barnaby C, et al. Aroma: code recommendation via structural code search. Proc ACM Program Lang, 2019, 3: 1–28
Mathew G, Stolee K T. Cross-language code search using static and dynamic analyses. In: Proceedings of the 29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Athens, 2021. 205–217
Perez D, Chiba S. Cross-language clone detection by learning over abstract syntax trees. In: Proceedings of the IEEE/ACM 16th International Conference on Mining Software Repositories (MSR), 2019. 518–528
Nguyen T D, Nguyen A T, Phan H D, et al. Exploring API embedding for API usages and applications. In: Proceedings of the 39th International Conference on Software Engineering, 2017. 438–449
Chen B, Abedjan Z. Interactive cross-language code retrieval with auto-encoders. In: Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2021. 167–178
Huang J, Tang D, Shou L, et al. CoSQA: 20,000+ web queries for code search and question answering. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021. 5690–5700
Khan M A M, Bari M S, Do X L, et al. xCodeEval: a large scale multilingual multitask benchmark for code understanding, generation, translation and retrieval. 2023. ArXiv:2303.03004
Wang C, Peng X, Xing Z C, et al. XCoS: explainable code search based on query scoping and knowledge graph. ACM Trans Softw Eng Methodol, 2023, 32: 1–28
Sun Z, Li L, Liu Y, et al. On the importance of building high-quality training datasets for neural code search. In: Proceedings of IEEE/ACM 44th International Conference on Software Engineering, 2022. 1609–1620
Gotmare A D, Li J, Joty S R, et al. Cascaded fast and slow models for efficient semantic code search. 2021. ArXiv:2110.07811
Gu W, Wang Y, Du L, et al. Accelerating code search with deep hashing and code classification. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, 2022. 2534–2544
Rush A M, Chopra S, Weston J. A neural attention model for abstractive sentence summarization. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2015. 379–389
Alon U, Brody S, Levy O, et al. code2seq: generating sequences from structured representations of code. In: Proceedings of the 7th International Conference on Learning Representations, 2019
Xu K, Wu L, Wang Z, et al. Graph2Seq: graph to sequence learning with attention-based neural networks. 2018. ArXiv:1804.00823
Sridhara G, Hill E, Muppaneni D, et al. Towards automatically generating summary comments for Java methods. In: Proceedings of the 25th IEEE/ACM International Conference on Automated Software Engineering, 2010. 43–52
Abid N J, Dragan N, Collard M L, et al. Using stereotypes in the automatic generation of natural language summaries for C++ methods. In: Proceedings of IEEE International Conference on Software Maintenance and Evolution, 2015. 561–565
Haiduc S, Aponte J, Moreno L, et al. On the use of automated text summarization techniques for summarizing source code. In: Proceedings of the 17th Working Conference on Reverse Engineering, 2010. 35–44
Haiduc S, Aponte J, Marcus A. Supporting program comprehension with source code summarization. In: Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering, 2010. 223–226
Sutskever I, Vinyals O, Le Q V. Sequence to sequence learning with neural networks. In: Proceedings of Advances in Neural Information Processing Systems, 2014. 3104–3112
Allamanis M, Peng H, Sutton C. A convolutional attention network for extreme summarization of source code. In: Proceedings of the 33rd International Conference on Machine Learning, 2016. 2091–2100
Ahmad W U, Chakraborty S, Ray B, et al. A transformer-based approach for source code summarization. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020. 4998–5007
Wang R, Zhang H, Lu G, et al. Fret: functional reinforced transformer with BERT for code summarization. IEEE Access, 2020, 8: 135591
Zhang J, Wang X, Zhang H, et al. Retrieval-based neural source code summarization. In: Proceedings of the 42nd International Conference on Software Engineering, Seoul, 2020. 1385–1397
LeClair A, Bansal A, McMillan C. Ensemble models for neural source code summarization of subroutines. In: Proceedings of IEEE International Conference on Software Maintenance and Evolution, 2021. 286–297
Gong Z, Gao C, Wang Y, et al. Source code summarization with structural relative position guided transformer. In: Proceedings of IEEE International Conference on Software Analysis, Evolution and Reengineering, 2022. 13–24
Chen Q, Zhou M. A neural framework for retrieval and summarization of source code. In: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, 2018. 826–831
Jiang S, Armaly A, McMillan C. Automatically generating commit messages from diffs using neural machine translation. In: Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, 2017. 135–146
Jiang S, McMillan C. Towards automatic generation of short summaries of commits. In: Proceedings of the 25th International Conference on Program Comprehension, 2017. 320–323
Jiang S. Boosting neural commit message generation with code semantic analysis. In: Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering, 2019. 1280–1282
Liu Z, Xia X, Treude C, et al. Automatic generation of pull request descriptions. In: Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering, 2019. 176–188
Bansal A, Haque S, McMillan C. Project-level encoding for neural source code summarization of subroutines. In: Proceedings of the 29th IEEE/ACM International Conference on Program Comprehension, 2021. 253–264
Xie R, Ye W, Sun J, et al. Exploiting method names to improve code summarization: a deliberation multi-task learning approach. In: Proceedings of the 29th IEEE/ACM International Conference on Program Comprehension, 2021. 138–148
Hu X, Li G, Xia X, et al. Deep code comment generation. In: Proceedings of the 26th Conference on Program Comprehension, 2018. 200–210
Hu X, Li G, Xia X, et al. Deep code comment generation with hybrid lexical and syntactical information. Empir Software Eng, 2020, 25: 2179–2217
Huang Y, Huang S, Chen H, et al. Towards automatically generating block comments for code snippets. Inf Software Tech, 2020, 127: 106373
Tang Z, Shen X, Li C, et al. AST-Trans: code summarization with efficient tree-structured attention. In: Proceedings of IEEE/ACM 44th International Conference on Software Engineering, 2022. 150–162
Liu S, Gao C, Chen S, et al. ATOM: commit message generation based on abstract syntax tree and hybrid ranking. IEEE Trans Software Eng, 2022, 48: 1800–1817
Wan Y, Zhao Z, Yang M, et al. Improving automatic source code summarization via deep reinforcement learning. In: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, 2018. 397–407
LeClair A, Jiang S, McMillan C. A neural model for generating natural language summaries of program subroutines. In: Proceedings of the 41st International Conference on Software Engineering, 2019. 795–806
Xu S, Yao Y, Xu F, et al. Commit message generation for source code changes. In: Proceedings of the 28th International Joint Conference on Artificial Intelligence, 2019. 3975–3981
Zhou Y, Shen J, Zhang X, et al. Automatic source code summarization with graph attention networks. J Syst Software, 2022, 188: 111257
Liang Y, Zhu K. Automatic generation of text descriptive comments for code blocks. In: Proceedings of the AAAI Conference on Artificial Intelligence, 2018
Wang W, Zhang Y, Zeng Z, et al. TranS³: a transformer-based framework for unifying code summarization and code search. 2020. ArXiv:2003.03238
Lin C, Ouyang Z, Zhuang J, et al. Improving code summarization with block-wise abstract syntax tree splitting. In: Proceedings of the 29th IEEE/ACM International Conference on Program Comprehension, 2021. 184–195
Shi E, Wang Y, Du L, et al. CAST: enhancing code summarization with hierarchical splitting and reconstruction of abstract syntax trees. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2021. 4053–4062
Fernandes P, Allamanis M, Brockschmidt M. Structured neural summarization. In: Proceedings of the 7th International Conference on Learning Representations, 2019
LeClair A, Haque S, Wu L, et al. Improved code summarization via a graph neural network. In: Proceedings of the 28th International Conference on Program Comprehension, Seoul, 2020. 184–195
Liu S, Chen Y, Xie X, et al. Retrieval-augmented generation for code summarization via hybrid GNN. In: Proceedings of the 9th International Conference on Learning Representations, 2021
Liu X, Wang D, Wang A Y, et al. HAConvGNN: hierarchical attention based convolutional graph neural network for code documentation generation in Jupyter notebooks. In: Proceedings of Findings of the Association for Computational Linguistics, 2021. 4473–4485
Cheng W, Hu P, Wei S, et al. Keyword-guided abstractive code summarization via incorporating structural and contextual information. Inf Software Tech, 2022, 150: 106987
Guo J, Liu J, Wan Y, et al. Modeling hierarchical syntax structure with triplet position for source code summarization. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, 2022. 486–500
Ma Z, Gao Y, Lyu L, et al. MMF3: neural code summarization based on multi-modal fine-grained feature fusion. In: Proceedings of ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, Helsinki, 2022. 171–182
Wang Y, Dong Y, Lu X, et al. GypSum: learning hybrid representations for code summarization. In: Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension, 2022. 12–23
Hu X, Li G, Xia X, et al. Summarizing source code with transferred API knowledge. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence, 2018. 2269–2275
Shahbazi R, Sharma R, Fard F H. API2Com: on the improvement of automatically generated code comments using API documentations. In: Proceedings of the 29th IEEE/ACM International Conference on Program Comprehension, 2021. 411–421
Gao X, Jiang X, Wu Q, et al. GT-SimNet: improving code automatic summarization via multi-modal similarity networks. J Syst Software, 2022, 194: 111495
Zhou Y, Yan X, Yang W, et al. Augmenting Java method comments generation with context information based on neural networks. J Syst Software, 2019, 156: 328–340
Wang W, Zhang Y, Sui Y, et al. Reinforcement-learning-guided source code summarization using hierarchical attention. IEEE Trans Software Eng, 2022, 48: 102–119
Wang Y, Du L, Shi E, et al. CoCoGUM: Contextual Code Summarization With Multi-Relational GNN on UMLs. Microsoft, Technical Report, MSR-TR-2020-16, 2020
Son J, Hahn J, Seo H, et al. Boosting code summarization by embedding code structures. In: Proceedings of the 29th International Conference on Computational Linguistics, 2022. 5966–5977
Zhang C, Zhou Q, Qiao M, et al. Re_Trans: combined retrieval and transformer model for source code summarization. Entropy, 2022, 24: 1372
Huang Y, Huang J, Chen X, et al. BCGen: a comment generation method for bytecode. Autom Softw Eng, 2023, 30: 5
Barone A V M, Sennrich R. A parallel corpus of Python functions and documentation strings for automated code documentation and code generation. In: Proceedings of the 8th International Joint Conference on Natural Language Processing, 2017. 314–319
Guo H Y, Chen X P, Huang Y, et al. Snippet comment generation based on code context expansion. ACM Trans Softw Eng Methodol, 2024, 33: 1–30
Fowler M, Beck K, Brant J, et al. Refactoring: Improving the Design of Existing Code. Reading: Addison-Wesley Professional, 1999
Tsantalis N, Chatzigeorgiou A. Identification of move method refactoring opportunities. IEEE Trans Software Eng, 2009, 35: 347–367
Terra R, Valente M T, Miranda S, et al. JMove: a novel heuristic and tool to detect move method refactoring opportunities. J Syst Software, 2018, 138: 19–36
Liu H, Xu Z, Zou Y. Deep learning based feature envy detection. In: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, 2018. 385–396
Kurbatova Z, Veselov I, Golubev Y, et al. Recommendation of move method refactoring using path-based representation of code. In: Proceedings of the 4th International Workshop on Refactoring, 2020. 315–322
Sharma T, Efstathiou V, Louridas P, et al. Code smell detection by deep direct-learning and transfer-learning. J Syst Software, 2021, 176: 110936
Liu H, Jin J H, Xu Z F, et al. Deep learning based code smell detection. IEEE Trans Software Eng, 2021, 47: 1811–1837
LeCun Y, Bengio Y, Hinton G. Deep learning. Nature, 2015, 521: 436–444
Wang X, Zhao Y, Pourpanah F. Recent advances in deep learning. Int J Mach Learn Cyber, 2020, 11: 747–750
Barbez A, Khomh F, Guéhéneuc Y G. Deep learning anti-patterns from code metrics history. In: Proceedings of the IEEE International Conference on Software Maintenance and Evolution, 2019. 114–124
Yu D, Xu Y, Weng L, et al. Detecting and refactoring feature envy based on graph neural network. In: Proceedings of the 33rd International Symposium on Software Reliability Engineering, 2022. 458–469
Alon U, Zilberstein M, Levy O, et al. Code2vec: learning distributed representations of code. In: Proceedings of the ACM on Programming Languages, 2019. 1–29
Cui D, Wang S, Luo Y, et al. RMove: recommending move method refactoring opportunities using structural and semantic representations of code. In: Proceedings of the IEEE International Conference on Software Maintenance and Evolution, 2022. 281–292
Yedida R, Menzies T. On the value of oversampling for deep learning in software defect prediction. IEEE Trans Software Eng, 2022, 48: 3103–3116
Yedida R, Menzies T. How to improve deep learning for software analytics: (a case study with code smell detection). In: Proceedings of the 19th International Conference on Mining Software Repositories, 2022. 156–166
Liu H, Liu Q, Liu Y, et al. Identifying renaming opportunities by expanding conducted rename refactorings. IEEE Trans Software Eng, 2015, 41: 887–900
Liang J, Zou W, Zhang J, et al. A deep method renaming prediction and refinement approach for Java projects. In: Proceedings of the 21st International Conference on Software Quality, Reliability and Security, 2021. 404–413
Devlin J, Chang M W, Lee K, et al. BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019. 4171–4186
Rosenthal S, Farra N, Nakov P. SemEval-2017 task 4: sentiment analysis in Twitter. In: Proceedings of the 11th International Workshop on Semantic Evaluation, 2017. 502–518
Liu K, Kim D, Bissyandé T F, et al. Learning to spot and refactor inconsistent method names. In: Proceedings of the 41st International Conference on Software Engineering, 2019. 1–12
Le Q, Mikolov T. Distributed representations of sentences and documents. In: Proceedings of the 31st International Conference on Machine Learning, 2014. 1188–1196
Tufano M, Pantiuchina J, Watson C, et al. On learning meaningful code changes via neural machine translation. In: Proceedings of the 41st International Conference on Software Engineering, 2019. 25–36
Nyamawe A S, Liu H, Niu N, et al. Feature requests-based recommendation of software refactorings. Empir Software Eng, 2020, 25: 4315–4347
AlOmar E A, Ivanov A, Kurbatova Z, et al. Just-in-time code duplicates extraction. Inf Software Tech, 2023, 158: 107169
Chi X Y, Liu H, Li G J, et al. An automated approach to extracting local variables. In: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, San Francisco, 2023
Desai U, Bandyopadhyay S, Tamilselvam S. Graph neural network to dilute outliers for refactoring monolith application. In: Proceedings of the AAAI Conference on Artificial Intelligence, 2021. 72–80
Madeyski L, Lewowski T. MLCQ: industry-relevant code smell data set. In: Proceedings of the 24th Evaluation and Assessment in Software Engineering, 2020. 342–347
Liu B, Liu H, Li G J, et al. Deep learning based feature envy detection boosted by real-world examples. In: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, San Francisco, 2023
Tsantalis N, Ketkar A, Dig D. RefactoringMiner 2.0. IEEE Trans Software Eng, 2022, 48: 930–950
Silva D, da Silva J P, Santos G, et al. RefDiff 2.0: a multi-language refactoring detection tool. IEEE Trans Software Eng, 2021, 47: 2786–2802
Kim M, Gee M, Loh A, et al. Ref-Finder: a refactoring reconstruction tool based on logic query templates. In: Proceedings of the 18th ACM SIGSOFT International Symposium on Foundations of Software Engineering, Santa Fe, 2010. 371–372
Yin X, Shi C, Zhao S. Local and global feature based explainable feature envy detection. In: Proceedings of the IEEE 45th Annual Computers, Software, and Applications Conference, 2021. 942–951
Liu B, Liu H, Li G J, et al. Automated software entity matching between successive versions. In: Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering, 2023
Svajlenko J, Islam J F, Keivanloo I, et al. Towards a big data curated benchmark of inter-project code clones. In: Proceedings of the IEEE International Conference on Software Maintenance and Evolution, 2014. 476–480
Chochlov M, Ahmed G A, Patten J V, et al. Using a nearest-neighbour, BERT-based approach for scalable clone detection. In: Proceedings of the IEEE International Conference on Software Maintenance and Evolution (ICSME), 2022. 582–591
Sajnani H, Saini V, Svajlenko J, et al. SourcererCC: scaling code clone detection to big-code. In: Proceedings of IEEE/ACM 38th International Conference on Software Engineering (ICSE), 2016. 1157–1168
Arshad S, Abid S, Shamail S. CodeBERT for code clone detection: a replication study. In: Proceedings of the IEEE 16th International Workshop on Software Clones (IWSC), 2022. 39–45
Mehrotra N, Agarwal N, Gupta P, et al. Modeling functional similarity in source code with graph-based siamese networks. IEEE Trans Software Eng, 2022, 48: 3771–3789
Xue Z, Jiang Z, Huang C, et al. SEED: semantic graph based deep detection for Type-4 clone. In: Proceedings of Reuse and Software Quality, 2022. 120–137
Karthik S, Rajdeepa B. A collaborative method for code clone detection using a deep learning model. Adv Eng Software, 2022, 174: 103327
Li B, Ye C, Guan S, et al. Semantic code clone detection via event embedding tree and GAT network. In: Proceedings of the IEEE 20th International Conference on Software Quality, Reliability and Security (QRS), 2020. 382–393
Zhang A, Liu K, Fang L, et al. Learn to align: a code alignment network for code clone detection. In: Proceedings of the 28th Asia-Pacific Software Engineering Conference (APSEC), 2021. 1–11
Jo Y B, Lee J, Yoo C J. Two-pass technique for clone detection and type classification using tree-based convolution neural network. Appl Sci, 2021, 11: 6613
Kim D K. A deep neural network-based approach to finding similar code segments. IEICE Trans Inf Syst, 2020, E103.D: 874–878
Wu Y, Zou D, Dou S, et al. SCDetector: software functional clone detection based on semantic tokens analysis. In: Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2020. 821–833
Feng C, Wang T, Yu Y, et al. Sia-RAE: a siamese network based on recursive AutoEncoder for effective clone detection. In: Proceedings of the 27th Asia-Pacific Software Engineering Conference (APSEC), 2020. 238–246
Yuan Y, Kong W, Hou G, et al. From local to global semantic clone detection. In: Proceedings of the 6th International Conference on Dependable Systems and Their Applications (DSA), 2020. 13–24
Hua W, Sui Y, Wan Y, et al. FCCA: hybrid code representation for functional clone detection using attention networks. IEEE Trans Rel, 2021, 70: 304–318
Wang W, Li G, Ma B, et al. Detecting code clones with graph neural network and flow-augmented abstract syntax tree. In: Proceedings of the IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER), 2020. 261–271
Fang C, Liu Z, Shi Y, et al. Functional code clone detection with syntax and semantics fusion learning. In: Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis, 2020. 516–527
Guo C, Yang H, Huang D, et al. Review sharing via deep semi-supervised code clone detection. IEEE Access, 2020, 8: 24948–24965
Meng Y, Liu L. A deep learning approach for a source code detection model using self-attention. Complexity, 2020, 2020: 1–15
Zeng J, Ben K, Li X, et al. Fast code clone detection based on weighted recursive autoencoders. IEEE Access, 2019, 7: 125062
Zhang Y Y, Li M. Find me if you can: deep software clone detection by exploiting the contest between the plagiarist and the detector. In: Proceedings of the AAAI Conference on Artificial Intelligence, 2019. 33: 5813–5820
Büch L, Andrzejak A. Learning-based recursive aggregation of abstract syntax trees for code clone detection. In: Proceedings of the IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER), 2019. 95–104
Yu H, Lam W, Chen L, et al. Neural detection of semantic code clones via tree-based convolution. In: Proceedings of the IEEE/ACM 27th International Conference on Program Comprehension (ICPC), 2019. 70–80
Wang C, Gao J, Jiang Y, et al. Go-clone: graph-embedding based clone detector for Golang. In: Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis, 2019. 374–377
Shi H, Wang R, Fu Y, et al. Vulnerable code clone detection for operating system through correlation-induced learning. IEEE Trans Ind Inf, 2019, 15: 6551–6559
Saini V, Farmahinifarahani F, Lu Y, et al. Oreo: detection of clones in the twilight zone. In: Proceedings of the 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2018. 354–365
Zhao G, Huang J. DeepSim: deep learning code functional similarity. In: Proceedings of the 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2018. 141–151
Sheneamer A. CCDLC detection framework-combining clustering with deep learning classification for semantic clones. In: Proceedings of the 17th IEEE International Conference on Machine Learning and Applications (ICMLA), 2018. 701–706
Wei H H, Li M. Positive and unlabeled learning for detecting software functional clones with adversarial training. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence, 2018. 2840–2846
Wei H H, Li M. Supervised deep features for software functional clone detection by exploiting lexical and syntactical information in source code. In: Proceedings of the 26th International Joint Conference on Artificial Intelligence, 2017. 3034–3040
White M, Tufano M, Vendome C, et al. Deep learning code fragments for code clone detection. In: Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, 2016. 87–98
Sheneamer A, Kalita J. Semantic clone detection using machine learning. In: Proceedings of the 15th IEEE International Conference on Machine Learning and Applications (ICMLA), 2016. 1024–1028
Zhang J, Wang X, Zhang H, et al. A novel neural source code representation based on abstract syntax tree. In: Proceedings of the IEEE/ACM 41st International Conference on Software Engineering (ICSE), 2019. 783–794
Wu M, Wang P, Yin K, et al. LVMapper: a large-variance clone detector using sequencing alignment approach. IEEE Access, 2020, 8: 27986–27997
Li L, Feng H, Zhuang W, et al. CCLearner: a deep learning-based clone detection approach. In: Proceedings of the IEEE International Conference on Software Maintenance and Evolution (ICSME), 2017. 249–260
Jiang L, Misherghi G, Su Z, et al. DECKARD: scalable and accurate tree-based detection of code clones. In: Proceedings of the 29th International Conference on Software Engineering, 2007. 96–105
Svajlenko J, Roy C K. Fast and flexible large-scale clone detection with CloneWorks. In: Proceedings of the IEEE/ACM 39th International Conference on Software Engineering Companion (ICSE-C), 2017. 27–30
Roy C K, Cordy J R. NICAD: accurate detection of near-miss intentional clones using flexible pretty-printing and code normalization. In: Proceedings of the 16th IEEE International Conference on Program Comprehension, 2008. 172–181
Kim S, Woo S, Lee H, et al. VUDDY: a scalable approach for vulnerable code clone discovery. In: Proceedings of the IEEE Symposium on Security and Privacy (SP), 2017. 595–614
Wang D, Jia Z, Li S, et al. Bridging pre-trained models and downstream tasks for source code understanding. In: Proceedings of the IEEE/ACM 44th International Conference on Software Engineering (ICSE), 2022. 287–298
Siow J K, Liu S, Xie X, et al. Learning program semantics with code representations: an empirical study. In: Proceedings of the IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 2022. 554–565
Karakatič S, Miloševič A, Heričko T. Software system comparison with semantic source code embeddings. Empir Software Eng, 2022, 27: 70
Bui N D Q, Yu Y, Jiang L. InferCode: self-supervised learning of code representations by predicting subtrees. In: Proceedings of the IEEE/ACM 43rd International Conference on Software Engineering (ICSE), 2021. 1186–1197
Wu Q, Jiang X, Zheng Z, et al. Code representation based on hybrid graph modelling. In: Proceedings of Neural Information Processing. Cham: Springer International Publishing, 2021. 298–306
Chen L, Ye W, Zhang S. Capturing source code semantics via tree-based convolution over API-enhanced AST. In: Proceedings of the 16th ACM International Conference on Computing Frontiers, 2019. 174–182
Gao Y, Wang Z, Liu S, et al. TECCD: a tree embedding approach for code clone detection. In: Proceedings of the IEEE International Conference on Software Maintenance and Evolution (ICSME), 2019. 145–156
Tufano M, Watson C, Bavota G, et al. Deep learning similarities from different representations of source code. In: Proceedings of the IEEE/ACM 15th International Conference on Mining Software Repositories (MSR), 2018. 542–553
Mou L, Li G, Zhang L, et al. Convolutional neural networks over tree structures for programming language processing. In: Proceedings of the 30th AAAI Conference on Artificial Intelligence, 2016. 1287–1293
Wang P, Svajlenko J, Wu Y, et al. CCAligner: a token based large-gap clone detector. In: Proceedings of the IEEE/ACM 40th International Conference on Software Engineering (ICSE), 2018. 1066–1077
Terra R, Miranda L F, Valente M T, et al. Qualitas.class corpus: a compiled version of the qualitas corpus. SIGSOFT Softw Eng Notes, 2013, 38: 1–4
Yahya M A, Kim D K. CLCD-I: cross-language clone detection by using deep learning with InferCode. Computers, 2023, 12: 12
Wang K, Yan M, Zhang H, et al. Unified abstract syntax tree representation learning for cross-language program classification. In: Proceedings of the IEEE/ACM 30th International Conference on Program Comprehension (ICPC), 2022. 390–400
MATH Google Scholar
Bui N D Q, Yu Y, Jiang L. Bilateral dependency neural networks for cross-language algorithm classification. In: Proceedings of the IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER), 2019. 422–433
MATH Google Scholar
Nafi K W, Kar T S, Roy B, et al. CLCDSA: cross language code clone detection using syntactical features and API documentation. In: Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2019. 1026–1037
MATH Google Scholar
Bromley J, Guyon I, LeCun Y, et al. Signature verification using a “Siamese” time delay neural network. In: Proceedings of the 6th International Conference on Neural Information Processing Systems, San Francisco, 1993. 737–744
Google Scholar
Vislavski T, Rakić G, Cardozo N, et al. LICCA: a tool for cross-language clone detection. In: Proceedings of the IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER), 2018. 512–516
Google Scholar
Cheng X, Peng Z, Jiang L, et al. Mining revision histories to detect cross-language clones without intermediates. In: Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering (ASE), 2016. 696–701
Chapter MATH Google Scholar
Marastoni N, Giacobazzi R, Preda M D. A deep learning approach to program similarity. In: Proceedings of the 1st International Workshop on Machine Learning and Software Engineering in Symbiosis, 2018. 26–35
MATH Google Scholar
Xue H, Venkataramani G, Lan T. Clone-Slicer: detecting domain specific binary code clones through program slicing. In: Proceedings of the Workshop on Forming an Ecosystem Around Software Transformation, 2018. 27–33
Chapter MATH Google Scholar
Xu X, Liu C, Feng Q, et al. Neural network-based graph embedding for cross-platform binary code similarity detection. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, 2017. 363–376
MATH Google Scholar
Xue H, Venkataramani G, Lan T. Clone-hunter: accelerated bound checks elimination via binary code clone detection. In: Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages, 2018. 11–19
Chapter MATH Google Scholar
Feng Q, Zhou R, Xu C, et al. Scalable graph-based bug search for firmware images. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, 2016. 480–491
MATH Google Scholar
Mostaeen G, Svajlenko J, Roy B, et al. On the use of machine learning techniques towards the design of cloud based automatic code clone validation tools. In: Proceedings of the IEEE 18th International Working Conference on Source Code Analysis and Manipulation (SCAM), 2018. 155–164
Google Scholar
Saini V, Farmahinifarahani F, Lu Y, et al. Towards automating precision studies of clone detectors. In: Proceedings of the IEEE/ACM 41st International Conference on Software Engineering (ICSE), 2019. 49–59
Google Scholar
Liu C, Lin Z, Lou J G, et al. Can neural clone detection generalize to unseen functionalities? In: Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2021. 617–629
MATH Google Scholar
Yu H, Hu X, Li G, et al. Assessing and improving an evaluation dataset for detecting semantic code clones via deep learning. ACM Trans Softw Eng Methodol, 2022, 31: 1–25
Article MATH Google Scholar
Krinke J, Ragkhitwetsagul C. Bigclonebench considered harmful for machine learning. In: Proceedings of the IEEE 16th International Workshop on Software Clones (IWSC), 2022. 1–7
Google Scholar
Al-Omari F, Roy C K, Chen T. SemanticCloneBench: a semantic code clone benchmark using crowd-source knowledge. In: Proceedings of the IEEE 14th International Workshop on Software Clones (IWSC), 2020. 57–63
MATH Google Scholar
Kamp M, Kreutzer P, Philippsen M. SeSaMe: a data set of semantically similar Java methods. In: Proceedings of the IEEE/ACM 16th International Conference on Mining Software Repositories (MSR), 2019. 529–533
MATH Google Scholar
Yang X, Lo D, Xia X, et al. Deep learning for just-in-time defect prediction. In: Proceedings of the IEEE International Conference on Software Quality, Reliability and Security, 2015. 17–26
MATH Google Scholar
Phan A V, Nguyen M L, Bui L T. Convolutional neural networks over control flow graphs for software defect prediction. In: Proceedings of the IEEE 29th International Conference on Tools with Artificial Intelligence (ICTAI), 2017. 45–52
MATH Google Scholar
Li J, He P, Zhu J, et al. Software defect prediction via convolutional neural network. In: Proceedings of the IEEE International Conference on Software Quality, Reliability and Security (QRS), 2017. 318–328
MATH Google Scholar
Huo X, Yang Y, Li M, et al. Learning semantic features for software defect prediction by code comments embedding. In: Proceedings of the IEEE International Conference on Data Mining (ICDM), 2018. 1049–1054
MATH Google Scholar
Liu Y, Li Y, Guo J, et al. Connecting software metrics across versions to predict defects. In: Proceedings of the IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER), 2018. 232–243
MATH Google Scholar
Tong H, Liu B, Wang S. Software defect prediction using stacked denoising autoencoders and two-stage ensemble learning. Inf Software Tech, 2018, 96: 94–111
Article MATH Google Scholar
Qiu S, Lu L, Cai Z, et al. Cross-project defect prediction via transferable deep learning-generated and handcrafted features. In: Proceedings of International Conference on Software Engineering and Knowledge Engineering, 2019
MATH Google Scholar
Hoang T, Dam H K, Kamei Y, et al. DeepJIT: an end-to-end deep learning framework for just-in-time defect prediction. In: Proceedings of the IEEE/ACM 16th International Conference on Mining Software Repositories (MSR), 2019. 34–45
MATH Google Scholar
Zhou T, Sun X, Xia X, et al. Improving defect prediction with deep forest. Inf Software Tech, 2019, 114: 204–216
Article MATH Google Scholar
Xu Z, Li S, Xu J, et al. LDFR: learning deep feature representation for software defect prediction. J Syst Software, 2019, 158: 110402
Article MATH Google Scholar
Turabieh H, Mafarja M, Li X. Iterated feature selection algorithms with layered recurrent neural network for software fault prediction. Expert Syst Appl, 2019, 122: 27–42
Article MATH Google Scholar
Dam H K, Pham T, Ng S W, et al. Lessons learned from using a deep tree-based model for software defect prediction in practice. In: Proceedings of the 16th International Conference on Mining Software Repositories, 2019. 46–57
MATH Google Scholar
Li H, Li X, Chen X, et al. Cross-project defect prediction via AST Token2Vec and BLSTM-based neural network. In: Proceedings of the International Joint Conference on Neural Networks (IJCNN), 2019. 1–8
MATH Google Scholar
Chen J, Hu K, Yu Y, et al. Software visualization and deep transfer learning for effective software defect prediction. In: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, 2020. 578–589
Chapter MATH Google Scholar
Zhu K, Zhang N, Ying S, et al. Within-project and cross-project just-in-time defect prediction based on denoising autoencoder and convolutional neural network. IET Softw, 2020, 14: 185–195
Article MATH Google Scholar
Wang S, Liu T, Nam J, et al. Deep semantic feature learning for software defect prediction. IEEE Trans Software Eng, 2020, 46: 1267–1293
Article MATH Google Scholar
Deng J, Lu L, Qiu S. Software defect prediction via LSTM. IET softw, 2020, 14: 443–450
Article MATH Google Scholar
Shi K, Lu Y, Chang J, et al. PathPair2Vec: an AST path pair-based code representation method for defect prediction. J Comput Languages, 2020, 59: 100979
Article Google Scholar
Majd A, Vahidi-Asl M, Khalilian A, et al. SLDeep: statement-level software defect prediction using deep-learning model on static code features. Expert Syst Appl, 2020, 147: 113156
Article MATH Google Scholar
Wen M, Wu R, Cheung S C. How well do change sequences predict defects? Sequence learning from software changes. IEEE Trans Software Eng, 2018, 46: 1155–1175
Article MATH Google Scholar
Shi K, Lu Y, Liu G, et al. MPT-embedding: an unsupervised representation learning of code for software defect prediction. J Software Evolu Process, 2021, 33: e2330
Article MATH Google Scholar
Xu Z, Zhao K, Zhang T, et al. Effort-aware just-in-time bug prediction for mobile apps via cross-triplet deep feature embedding. IEEE Trans Rel, 2022, 71: 204–220
Article MATH Google Scholar
Xu J, Wang F, Ai J. Defect prediction with semantics and context features of codes based on graph representation learning. IEEE Trans Rel, 2020, 70: 613–625
Article MATH Google Scholar
Zeng C, Zhou C Y, Lv S K, et al. GCN2defect: graph convolutional networks for SMOTETomek-based software defect prediction. In: Proceedings of the IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE), 2021. 69–79
MATH Google Scholar
Xu J, Ai J, Liu J, et al. ACGDP: an augmented code graph-based system for software defect prediction. IEEE Trans Rel, 2022, 71: 850–864
Article MATH Google Scholar
Wang H, Zhuang W, Zhang X. Software defect prediction based on gated hierarchical LSTMs. IEEE Trans Rel, 2021, 70: 711–727
Article MATH Google Scholar
Zou Q, Lu L, Yang Z, et al. Joint feature representation learning and progressive distribution matching for cross-project defect prediction. Inf Software Tech, 2021, 137: 106588
Article MATH Google Scholar
Zhang N, Ying S, Zhu K, et al. Software defect prediction based on stacked sparse denoising autoencoders and enhanced extreme learning machine. IET Software, 2022, 16: 29–47
Article MATH Google Scholar
Uddin M N, Li B, Ali Z, et al. Software defect prediction employing BiLSTM and BERT-based semantic feature. Soft Comput, 2022, 26: 7877–7891
Article MATH Google Scholar
Ardimento P, Aversano L, Bernardi M L, et al. Just-in-time software defect prediction using deep temporal convolutional networks. Neural Comput Applic, 2022, 34: 3981–4001
Article Google Scholar
Pornprasit C, Tantithamthavorn C K. DeepLineDP: towards a deep learning approach for line-level defect prediction. IEEE Trans Software Eng, 2023, 49: 84–98
Article Google Scholar
Qiu S, Huang H, Jiang W, et al. Defect prediction via tree-based encoding with hybrid granularity for software sustainability. IEEE Trans Sustain Comput, 2024, 9: 249–260
Article MATH Google Scholar
Johnson S C. Lint, a C program checker. 1977. oai:CiteSeerX.psu:10.1.1.56.1841
MATH Google Scholar
Hovemeyer D, Pugh W. Finding bugs is easy. ACM SIGPLAN Not, 2004, 39: 92–106
Article Google Scholar
Facebook. Infer: a tool to detect bugs in Java and C/C++/objective-C code before it ships, 2015. https://fbinfer.com/
Google Scholar
Orso A, Rothermel G. Software testing: a research travelogue (2000–2014). In: Proceedings of Future of Software Engineering Proceedings, 2014
MATH Google Scholar
Cadar C, Dunbar D, Engler D R, et al. KLEE: unassisted and automatic generation of high-coverage tests for complex systems programs. In: Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, 2008
MATH Google Scholar
Nelson L, Sigurbjarnarson H, Zhang K, et al. Hyperkernel: push-button verification of an OS kernel. In: Proceedings of the 26th ACM Symposium on Operating Systems Principles (SOSP), 2017
Google Scholar
Leroy X. Formal verification of a realistic compiler. Commun ACM, 2009, 52: 107–115
Article MATH Google Scholar
Klein G, Andronick J, Elphinstone K, et al. seL4: formal verification of an OS kernel. Commun ACM, 2010, 53: 107–115
Article MATH Google Scholar
D’Silva V, Kroening D, Weissenbacher G. A survey of automated techniques for formal software verification. IEEE Trans Comput-Aided Des Integr Circ Syst, 2008, 27: 1165–1178
Article MATH Google Scholar
Knuth D E. The Art of Computer Programming, Volume 1: Fundamental Algorithms. 3rd ed. Redding: Addison-Wesley Professional, 1997
MATH Google Scholar
Hou X, Zhao Y, Liu Y, et al. Large language models for software engineering: a systematic literature review. 2023. ArXiv:2308.10620
MATH Google Scholar
Fan A, Gokkaya B, Harman M, et al. Large language models for software engineering: survey and open problems. 2023. ArXiv:2310.03533
MATH Google Scholar
Silver D, Huang A, Maddison C J, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 2016, 529: 484–489
Article MATH Google Scholar
Qiao S, Ou Y, Zhang N, et al. Reasoning with language model prompting: a survey. 2022. ArXiv:2212.09597
MATH Google Scholar
Huang J, Chang K C C. Towards reasoning in large language models: a survey. 2022. ArXiv:2212.10403
MATH Google Scholar
Abelson H, Sussman G J. Structure and Interpretation of Computer Programs. 2nd ed. Cambridge: The MIT Press, 1996
MATH Google Scholar
Hindle A, Barr E T, Gabel M, et al. On the naturalness of software. In: Proceedings of the 34th International Conference on Software Engineering (ICSE), 2016
MATH Google Scholar
van Rossum G, Warsaw B, Coghlan N. PEP 8–style guide for python code. 2001. https://peps.python.org/pep-0008/
MATH Google Scholar
Reddy A. Java coding style guide, 2000
MATH Google Scholar
Engler D, Chen D Y, Hallem S, et al. Bugs as deviant behavior: a general approach to inferring errors in systems code. SIGOPS Oper Syst Rev, 2001, 35: 57–72
Article MATH Google Scholar
Li Z, Lu S, Myagmar S, et al. CP-Miner: finding copy-paste and related bugs in large-scale software code. IEEE Trans Software Eng, 2006, 32: 176–192
Article MATH Google Scholar
Allamanis M, Jackson-Flux H, Brockschmidt M. Self-supervised bug detection and repair. In: Proceedings of Advances in Neural Information Processing Systems, 2021. 27865–27876
Google Scholar
Sharma T, Kechagia M, Georgiou S, et al. A survey on machine learning techniques for source code analysis. 2021. ArXiv:2110.09610v2
MATH Google Scholar
Jiang Y, Liu H, Zhang Y, et al. Do bugs lead to unnaturalness of source code? In: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2022
MATH Google Scholar
Rice H G. Classes of recursively enumerable sets and their decision problems. Trans Amer Math Soc, 1953, 74: 358–366
Article MathSciNet MATH Google Scholar
Livshits B, Sridharan M, Smaragdakis Y, et al. In defense of soundiness: a manifesto. Commun ACM, 2015, 58: 44–46
Article MATH Google Scholar
Heo K, Oh H, Yang H. Resource-aware program analysis via online abstraction coarsening. In: Proceedings of IEEE/ACM 41st International Conference on Software Engineering (ICSE), 2019
MATH Google Scholar
Ko Y, Oh H. Learning to boost disjunctive static bug-finders. In: Proceedings of IEEE/ACM 45th International Conference on Software Engineering (ICSE), 2023
MATH Google Scholar
Li H, Hao Y, Zhai Y, et al. The hitchhiker’s guide to program analysis: a journey with large language models. 2023. ArXiv:2308.00245
MATH Google Scholar
Chae K, Oh H, Heo K, et al. Automatically generating features for learning program analysis heuristics for C-like languages. In: Proceedings of the ACM on Programming Languages, 2017
MATH Google Scholar
Heo K, Oh H, Yi K. Machine-learning-guided selectively unsound static analysis. In: Proceedings of IEEE/ACM 39th International Conference on Software Engineering (ICSE), 2017
MATH Google Scholar
Jeon M, Lee M, Oh H. Learning graph-based heuristics for pointer analysis without handcrafting application-specific features. In: Proceedings of the ACM on Programming Languages, 2020
MATH Google Scholar
Jeong S, Jeon M, Cha S, et al. Data-driven context-sensitivity for points-to analysis. In: Proceedings of the ACM on Programming Languages, 2017
MATH Google Scholar
He J, Singh G, Püschel M, et al. Learning fast and precise numerical analysis. In: Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation, 2020
MATH Google Scholar
Zaremba W, Sutskever I. Learning to execute. 2014. ArXiv:1410.4615
MATH Google Scholar
Malik R S, Patra J, Pradel M. NL2Type: inferring JavaScript function types from natural language information. In: Proceedings of IEEE/ACM 41st International Conference on Software Engineering (ICSE), 2019
MATH Google Scholar
Jesse K, Devanbu P T, Ahmed T. Learning type annotation: is big data enough? In: Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2021
MATH Google Scholar
Yu D, Yang B, Liu D, et al. A survey on neural-symbolic learning systems. Neural Netws, 2023, 166: 105–126
Article MATH Google Scholar
Wang W, Yang Y, Wu F. Towards data-and knowledge-driven AI: a survey on neuro-symbolic computing. IEEE Trans Pattern Anal Mach Intell, 2024. doi: https://doi.org/10.1109/TPAMI.2024.3483273
She D, Pei K, Epstein D, et al. NEUZZ: efficient fuzzing with neural program smoothing. In: Proceedings of the IEEE Symposium on Security and Privacy (SP), 2019. 803–817
Google Scholar
She D, Krishna R, Yan L, et al. MTFuzz: fuzzing with a multi-task neural network. In: Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2020. 737–749
Chapter MATH Google Scholar
Wu M, Jiang L, Xiang J, et al. Evaluating and improving neural program-smoothing-based fuzzing. In: Proceedings of the 44th International Conference on Software Engineering, 2022. 847–858
Chapter MATH Google Scholar
Nicolae M I, Eisele M, Zeller A. Revisiting neural program smoothing for fuzzing. In: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2023
MATH Google Scholar
Zeller A. Mining specifications: a roadmap. In: The Future of Software Engineering. Berlin: Springer, 2011
MATH Google Scholar
Serebryany K, Bruening D, Potapenko A, et al. AddressSanitizer: a fast address sanity checker. In: Proceedings of USENIX Annual Technical Conference, 2012
MATH Google Scholar
Serebryany K, Iskhodzhanov T. ThreadSanitizer: data race detection in practice. In: Proceedings of the Workshop on Binary Instrumentation and Applications, 2009. 62–71
Chapter Google Scholar
Jackson D. Software Abstractions: Logic, Language, and Analysis. Cambridge: The MIT Press, 2012
MATH Google Scholar
Lemieux C, Inala J P, Lahiri S K, et al. CODAMOSA: escaping coverage plateaus in test generation with pre-trained large language models. In: Proceedings of IEEE/ACM 45th International Conference on Software Engineering (ICSE), 2023
Google Scholar
Khanfir A, Degiovanni R, Papadakis M, et al. Efficient mutation testing via pre-trained language models. 2023. ArXiv:2301.03543v1
MATH Google Scholar
Chen Z, Liu J, Gu W, et al. Experience report: deep learning-based system log analysis for anomaly detection. 2021. ArXiv:2107.05908
MATH Google Scholar
Wang J, Huang Y, Chen C, et al. Software testing with large language model: survey, landscape, and vision. 2023. ArXiv:2307.07221
MATH Google Scholar
Durelli V H S, Durelli R S, Borges S S, et al. Machine learning applied to software testing: a systematic mapping study. IEEE Trans Rel, 2019, 68: 1189–1212
Article MATH Google Scholar
Tufano M, Drain D, Svyatkovskiy A, et al. Unit test case generation with transformers and focal context. 2020. ArXiv:2009.05617v2
MATH Google Scholar
Watson C, Tufano M, Moran K, et al. On learning meaningful assert statements for unit test cases. In: Proceedings of IEEE/ACM 42nd International Conference on Software Engineering (ICSE), 2020
MATH Google Scholar
Tufano M, Drain D, Svyatkovskiy A, et al. Generating accurate assert statements for unit test cases using pretrained transformers. 2022. ArXiv:2009.05634
Book MATH Google Scholar
Blasi A, Gorla A, Ernst M D, et al. Call Me Maybe: using NLP to automatically generate unit test cases respecting temporal constraints. In: Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, 2022
MATH Google Scholar
Dinella E, Ryan G, Mytkowicz T, et al. TOGA: a neural method for test Oracle generation. 2022. ArXiv:2109.09262
Google Scholar
Xie Z, Chen Y, Zhi C, et al. ChatUniTest: a ChatGPT-based automated unit test generation tool. 2023. ArXiv:2305.04764
MATH Google Scholar
Alagarsamy S, Tantithamthavorn C, Aleti A. A3Test: assertion-augmented automated test case generation. 2023. ArXiv:2302.10352
Google Scholar
Feldmeier P, Fraser G. Neuroevolution-based generation of tests and oracles for games. In: Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, 2022
MATH Google Scholar
Schäfer M, Nadi S, Eghbali A, et al. Adaptive test generation using a large language model. 2023. ArXiv:2302.06527
MATH Google Scholar
Siddiq M L, Santos J, Tanvir R H, et al. Exploring the effectiveness of large language models in generating unit tests. 2023. ArXiv:2305.00418v1
MATH Google Scholar
Hossain S B, Filieri A, Dwyer M B, et al. Neural-based test oracle generation: a large-scale evaluation and lessons learned. In: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2023
MATH Google Scholar
Liu Z, Liu K, Xia X, et al. Towards more realistic evaluation for neural test oracle generation. 2023. ArXiv:2305.17047
Book MATH Google Scholar
Yuan Z, Lou Y, Liu M, et al. No more manual tests? Evaluating and improving ChatGPT for unit test generation. 2023. ArXiv:2305.04207
MATH Google Scholar
Wong W E, Horgan J R, London S, et al. A study of effective regression testing in practice. In: Proceedings of the 8th International Symposium on Software Reliability Engineering, 1997
MATH Google Scholar
Yoo S, Harman M. Regression testing minimization, selection and prioritization: a survey. Softw Test Verif Reliab, 2012, 22: 67–120
Article MATH Google Scholar
Manes V J M, Han H S, Han C, et al. The art, science, and engineering of fuzzing: a survey. IEEE Trans Software Eng, 2021, 47: 2312–2331
Article MATH Google Scholar
Zhu X, Wen S, Camtepe S, et al. Fuzzing: a survey for roadmap. ACM Comput Surv, 2022, 54: 1–36
Article MATH Google Scholar
Li J, Zhao B, Zhang C. Fuzzing: a survey. Cybersecurity, 2018, 1: 6
Article MATH Google Scholar
Lee M, Cha S, Oh H. Learning seed-adaptive mutation strategies for greybox fuzzing. In: Proceedings of IEEE/ACM 45th International Conference on Software Engineering (ICSE), 2023
MATH Google Scholar
Wang J, Song C, Yin H. Reinforcement learning-based hierarchical seed scheduling for greybox fuzzing. In: Proceedings of Network and Distributed Systems Security (NDSS) Symposium, 2021
MATH Google Scholar
Wang Y, Wu Z, Wei Q, et al. NeuFuzz: efficient fuzzing with deep neural network. IEEE Access, 2019, 7: 36340–36352
Article Google Scholar
Deng Y, Xia C S, Peng H, et al. Large language models are zero-shot fuzzers: fuzzing deep-learning libraries via large language models. 2023. ArXiv:2212.14834
MATH Google Scholar
Deng Y, Xia C S, Yang C, et al. Large language models are edge-case fuzzers: testing deep learning libraries via FuzzGPT. 2023. ArXiv:2304.02014
MATH Google Scholar
Yang C, Deng Y, Lu R, et al. White-box compiler fuzzing empowered by large language models. 2023. ArXiv:2310.15991
MATH Google Scholar
Xia C S, Paltenghi M, Tian J L, et al. Universal fuzzing via large language models. 2023. ArXiv:2308.04748v1
MATH Google Scholar
Ye G, Tang Z, Tan S H, et al. Automated conformance testing for JavaScript engines via deep compiler fuzzing. In: Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, 2021. 435–450
Chapter MATH Google Scholar
Cummins C, Petoumenos P, Murray A, et al. Compiler fuzzing through deep learning. In: Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis, 2018
MATH Google Scholar
Lin M, Zeng Y, Li Y. RegFuzz: a linear regression-based approach for seed scheduling in directed fuzzing. In: Proceedings of the 4th Information Communication Technologies Conference (ICTC), 2023
MATH Google Scholar
Meng R, Mirchev M, Böhme M, et al. Large language model guided protocol fuzzing. In: Proceedings of Network and Distributed System Security (NDSS) Symposium, 2024
MATH Google Scholar
Su J, Dai H N, Zhao L, et al. Effectively generating vulnerable transaction sequences in smart contracts with reinforcement learning-guided fuzzing. In: Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, 2022
MATH Google Scholar
Luo W, Chai D, Ruan X, et al. Graph-based fuzz testing for deep learning inference engines. In: Proceedings of IEEE/ACM 43rd International Conference on Software Engineering (ICSE), 2021
MATH Google Scholar
Chen Y, Poskitt C M, Sun J, et al. Learning-guided network fuzzing for testing cyber-physical system defences. In: Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2019
MATH Google Scholar
Jiang L, Yuan H, Wu M, et al. Evaluating and improving hybrid fuzzing. In: Proceedings of IEEE/ACM 45th International Conference on Software Engineering (ICSE), 2023
MATH Google Scholar
He J, Balunović M, Ambroladze N, et al. Learning to fuzz from symbolic execution with application to smart contracts. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, 2019. 531–548
MATH Google Scholar
Jia H, Wen M, Xie Z, et al. Detecting JVM JIT compiler bugs via exploring two-dimensional input spaces. In: Proceedings of IEEE/ACM 45th International Conference on Software Engineering (ICSE), 2023
MATH Google Scholar
Zheng Y, Liu Y, Xie X, et al. Automatic web testing using curiosity-driven reinforcement learning. In: Proceedings of the 43rd International Conference on Software Engineering, 2021. 423–435
MATH Google Scholar
Zhang S, Liu S, Sun J, et al. FIGCPS: effective failure-inducing input generation for cyber-physical systems with deep reinforcement learning. In: Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2021
MATH Google Scholar
Liu Z, Chen C, Wang J, et al. Fill in the blank: context-aware automated text input generation for mobile GUI testing. In: Proceedings of IEEE/ACM 45th International Conference on Software Engineering (ICSE), 2023
Google Scholar
YazdaniBanafsheDaragh F, Malek S. Deep GUI: black-box GUI input generation with deep learning. In: Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2021
MATH Google Scholar
Feng S, Xie M, Chen C. Efficiency matters: speeding up automated testing with GUI rendering inference. In: Proceedings of IEEE/ACM 45th International Conference on Software Engineering (ICSE), 2023
MATH Google Scholar
Ran D, Wang H, Wang W, et al. Badge: prioritizing UI events with hierarchical multi-armed bandits for automated UI testing. In: Proceedings of IEEE/ACM 45th International Conference on Software Engineering (ICSE), 2023
MATH Google Scholar
Pan M, Huang A, Wang G, et al. Reinforcement learning based curiosity-driven testing of Android applications. In: Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis, 2020. 153–164
Chapter MATH Google Scholar
Zhao Y, Talebipour S, Baral K, et al. Avgust: automating usage-based test generation from videos of app executions. In: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2022
MATH Google Scholar
Wang X, Zhao L. APICAD: augmenting API misuse detection through specifications from code and documents. In: Proceedings of IEEE/ACM 45th International Conference on Software Engineering (ICSE), 2023
MATH Google Scholar
Kim M, Corradini D, Sinha S, et al. Enhancing REST API testing with NLP techniques. In: Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, 2023
Google Scholar
Kim M, Sinha S, Orso A. Adaptive REST API testing with reinforcement learning. 2023. ArXiv:2309.04583
Book Google Scholar
Alyahya T N, Menai M E B, Mathkour H. On the structure of the boolean satisfiability problem: a survey. ACM Comput Surv, 2023, 55: 1–34
Article MATH Google Scholar
Guo W, Zhen H L, Li X, et al. Machine learning methods in solving the Boolean satisfiability problem. Mach Intell Res, 2023, 20: 640–655
Article MATH Google Scholar
Avgerinos T, Rebert A, Cha S K, et al. Enhancing symbolic execution with veritesting. In: Proceedings of the 36th International Conference on Software Engineering, 2014. 1083–1094
Chapter MATH Google Scholar
Baldoni R, Coppa E, D’elia D C, et al. A survey of symbolic execution techniques. ACM Comput Surv, 2019, 51: 1–39
Article MATH Google Scholar
He J, Sivanrupan G, Tsankov P, et al. Learning to explore paths for symbolic execution. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, 2021
MATH Google Scholar
Cha S, Oh H. Concolic testing with adaptively changing search heuristics. In: Proceedings of the 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2019
MATH Google Scholar
Cha S, Hong S, Lee J, et al. Automatically generating search heuristics for concolic testing. In: Proceedings of IEEE/ACM 40th International Conference on Software Engineering (ICSE), 2018
MATH Google Scholar
Zhang T, Zhang Y, Chen Z, et al. Efficient multiplex symbolic execution with adaptive search strategy. In: Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2020
MATH Google Scholar
Cha S, Oh H. Making symbolic execution promising by learning aggressive state-pruning strategy. In: Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2020
MATH Google Scholar
Chen Z, Chen Z, Shuai Z, et al. Synthesize solving strategy for symbolic execution. In: Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, 2021. 348–360
Chapter MATH Google Scholar
Luo S, Xu H, Bi Y, et al. Boosting symbolic execution via constraint solving time prediction (experience paper). In: Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, 2021. 336–347
Chapter MATH Google Scholar
Cha S, Lee M, Lee S, et al. SYMTUNER: maximizing the power of symbolic execution by adaptively tuning external parameters. In: Proceedings of IEEE/ACM 44th International Conference on Software Engineering (ICSE), 2022
MATH Google Scholar
Chen J, Hu W, Zhang L, et al. Learning to accelerate symbolic execution via code transformation. In: Proceedings of the 32nd European Conference on Object-Oriented Programming, 2018
MATH Google Scholar
Development team T C. The Coq proof assistant. 1984. https://coq.inria.fr/coq-84
MATH Google Scholar
Development team T I. Isabelle. 1986. https://isabelle.in.tum.de/index.html
MATH Google Scholar
Paulson L C. Natural deduction as higher-order resolution. 1986. ArXiv:cs/9301104
Book MATH Google Scholar
Lample G, Lachaux M A, Lavril T, et al. HyperTree proof search for neural theorem proving. 2022. ArXiv:2205.11491
Google Scholar
Wu Y, Jiang A Q, Li W, et al. Autoformalization with large language models. In: Proceedings of Advances in Neural Information Processing Systems, 2022
MATH Google Scholar
First E, Brun Y. Diversity-driven automated formal verification. In: Proceedings of the 44th International Conference on Software Engineering, 2022. 749–761
Chapter MATH Google Scholar
Yang K, Swope A M, Gu A, et al. LeanDojo: theorem proving with retrieval-augmented language models. 2023. ArXiv:2306.15626
Chakraborty S, Lahiri S K, Fakhoury S, et al. Ranking LLM-generated loop invariants for program verification. 2023. ArXiv:2310.09342
Zimmeck S, Wang Z, Zou L, et al. Automated analysis of privacy requirements for mobile apps. In: Proceedings of the AAAI Fall Symposium Series, 2016
Mahanipour A, Nezamabadi-pour H. GSP: an automatic programming technique with gravitational search algorithm. Appl Intell, 2019, 49: 1502–1516
Mikolov T, Sutskever I, Chen K, et al. Distributed representations of words and phrases and their compositionality. In: Proceedings of Advances in Neural Information Processing Systems, 2013. 26
Liu S, Zhao B, Guo R, et al. Have you been properly notified? Automatic compliance analysis of privacy policy text with GDPR article 13. In: Proceedings of the Web Conference 2021, 2021. 2154–2164
Rubio-González C, Liblit B. Expect the unexpected: error code mismatches between documentation and the real world. In: Proceedings of the 9th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering, 2010. 73–80
Tan L, Yuan D, Krishna G, et al. /*iComment: bugs or bad comments?*/. In: Proceedings of the 21st ACM SIGOPS Symposium on Operating Systems Principles, 2007. 145–158
Tan S H, Marinov D, Tan L, et al. @tComment: testing Javadoc comments to detect comment-code inconsistencies. In: Proceedings of the IEEE 5th International Conference on Software Testing, Verification and Validation, 2012. 260–269
Wen F, Nagy C, Bavota G, et al. A large-scale empirical study on code-comment inconsistencies. In: Proceedings of the IEEE/ACM 27th International Conference on Program Comprehension (ICPC), 2019. 53–64
Pandita R, Taneja K, Williams L, et al. ICON: inferring temporal constraints from natural language API descriptions. In: Proceedings of the IEEE International Conference on Software Maintenance and Evolution (ICSME), 2016. 378–388
Ren X, Ye X, Xing Z, et al. API-misuse detection driven by fine-grained API-constraint knowledge graph. In: Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering, 2020. 461–472
Lv T, Li R, Yang Y, et al. RTFM! automatic assumption discovery and verification derivation from library document for API misuse detection. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, 2020. 1837–1852
Yun I, Min C, Si X, et al. APISan: sanitizing API usages through semantic cross-checking. In: Proceedings of Usenix Security Symposium, 2016. 363–378
Kang Y, Ray B, Jana S. APEx: automated inference of error specifications for C APIs. In: Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, 2016. 472–482
Li C, Zhou M, Gu Z, et al. Ares: inferring error specifications through static analysis. In: Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2019. 1174–1177
Takanen A, Demott J D, Miller C, et al. Fuzzing for Software Security Testing and Quality Assurance. Norwood: Artech House, Inc. 2018
You W, Zong P, Chen K, et al. SemFuzz: semantics-based automatic generation of proof-of-concept exploits. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, 2017. 2139–2154
Godefroid P, Peleg H, Singh R. Learn&Fuzz: machine learning for input fuzzing. In: Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE), 2017. 50–59
Liu X, Li X, Prajapati R, et al. DeepFuzz: automatic generation of syntax valid C programs for fuzz testing. In: Proceedings of the AAAI Conference on Artificial Intelligence, 2019. 1044–1051
Lee S, Han H, Cha S K, et al. Montage: a neural network language model-guided JavaScript engine fuzzer. In: Proceedings of the 29th USENIX Conference on Security Symposium, 2020. 2613–2630
Chen P, Chen H. Angora: efficient fuzzing by principled search. In: Proceedings of the IEEE Symposium on Security and Privacy (SP), 2018. 711–725
Funahashi K I. On the approximate realization of continuous mappings by neural networks. Neural Netw, 1989, 2: 183–192
Nagy S, Hicks M. Full-speed fuzzing: reducing fuzzing overhead through coverage-guided tracing. In: Proceedings of the IEEE Symposium on Security and Privacy (SP), 2019. 787–802
Zhou C, Wang M, Liang J, et al. Zeror: speed up fuzzing with coverage-sensitive tracing and scheduling. In: Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering, 2020. 858–870
Zong P, Lv T, Wang D, et al. FuzzGuard: filtering out unreachable inputs in directed grey-box fuzzing through deep learning. In: Proceedings of the 29th USENIX Conference on Security Symposium, 2020. 2255–2269
Jung R, Jourdan J H, Krebbers R, et al. Safe systems programming in Rust. Commun ACM, 2021, 64: 144–152
Wong W E, Gao R, Li Y, et al. A survey on software fault localization. IEEE Trans Software Eng, 2016, 42: 707–740
Zakari A, Lee S P, Abreu R, et al. Multiple fault localization of software programs: a systematic literature review. Inf Software Tech, 2020, 124: 106312
Xie X, Liu Z, Song S, et al. Revisit of automatic debugging via human focus-tracking analysis. In: Proceedings of the 38th International Conference on Software Engineering, 2016. 808–819
Agrawal H, Horgan J, London S, et al. Fault localization using execution slices and dataflow tests. In: Proceedings of the 6th International Symposium on Software Reliability Engineering, 1995. 143–151
Wong C P, Xiong Y, Zhang H, et al. Boosting bug-report-oriented fault localization with segmentation and stack-trace analysis. In: Proceedings of the IEEE International Conference on Software Maintenance and Evolution, 2014. 181–190
Zhang X, Gupta N, Gupta R. Locating faults through automated predicate switching. In: Proceedings of the 28th International Conference on Software Engineering, New York, 2006. 272–281
Jones J A, Harrold M J, Stasko J. Visualization of test information to assist fault localization. In: Proceedings of the 24th International Conference on Software Engineering, 2002. 467–477
Liblit B, Naik M, Zheng A X, et al. Scalable statistical bug isolation. ACM SIGPLAN Not, 2005, 40: 15–26
Abreu R, Zoeteweij P, Golsteijn R, et al. A practical evaluation of spectrum-based fault localization. J Syst Software, 2009, 82: 1780–1792
Xie X Y, Chen T Y, Kuo F-C, et al. A theoretical analysis of the risk evaluation formulas for spectrum-based fault localization. ACM Trans Softw Eng Methodol, 2013, 22: 1–40
Zou D, Liang J, Xiong Y, et al. An empirical study of fault localization families and their combinations. IEEE Trans Software Eng, 2019, 47: 332–347
Widyasari R, Prana G A A, Haryono S A, et al. XAI4FL: enhancing spectrum-based fault localization with explainable artificial intelligence. In: Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension, 2022. 499–510
Moon S, Kim Y, Kim M, et al. Ask the mutants: mutating faulty programs for fault localization. In: Proceedings of the IEEE 7th International Conference on Software Testing, Verification and Validation, 2014. 153–162
Papadakis M, Le Traon Y. Metallaxis-FL: mutation-based fault localization. Software Testing Verif Rel, 2015, 25: 605–628
Wong W E, Qi Y. BP neural network-based effective fault localization. Int J Softw Eng Knowl Eng, 2009, 19: 573–597
Wong W E, Debroy V, Golden R, et al. Effective software fault localization using an RBF neural network. IEEE Trans Rel, 2012, 61: 149–169
Zheng W, Hu D, Wang J. Fault localization analysis based on deep neural network. Math Problems Eng, 2016, 2016: 1–11
Zhang Z, Lei Y, Tan Q, et al. Deep learning-based fault localization with contextual information. IEICE Trans Inf Syst, 2017, E100.D: 3027–3031
Li X, Li W, Zhang Y, et al. DeepFL: integrating multiple fault diagnosis dimensions for deep fault localization. In: Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis, 2019. 169–180
Zhang Z, Lei Y, Mao X G, et al. CNN-FL: an effective approach for localizing faults using convolutional neural networks. In: Proceedings of the IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER), 2019
Li Y, Wang S, Nguyen T. Fault localization with code coverage representation learning. In: Proceedings of the IEEE/ACM 43rd International Conference on Software Engineering (ICSE), 2021. 661–673
Lou Y, Zhu Q, Dong J, et al. Boosting coverage-based fault localization via graph-based representation learning. In: Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2021. 664–676
Qian J, Ju X, Chen X, et al. AGFL: a graph convolutional neural network-based method for fault localization. In: Proceedings of the IEEE 21st International Conference on Software Quality, Reliability and Security (QRS), 2021. 672–680
Qian J, Ju X, Chen X. GNet4FL: effective fault localization via graph convolutional neural network. Autom Softw Eng, 2023, 30: 16
Zhang Z, Lei Y, Mao X, et al. Context-aware neural fault localization. IEEE Trans Software Eng, 2023, 49: 3939–3954
Li Y, Wang S, Nguyen T N. Fault localization to detect co-change fixing locations. In: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, New York, 2022. 659–671
Dutta A, Manral R, Mitra P, et al. Hierarchically localizing software faults using DNN. IEEE Trans Rel, 2020, 69: 1267–1292
Yu J, Lei Y, Xie H, et al. Context-based cluster fault localization. In: Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension, New York, 2022. 482–493
Li Z, Tang E, Chen X, et al. Graph neural network based two-phase fault localization approach. In: Proceedings of the 13th Asia-Pacific Symposium on Internetware, 2022. 85–95
Yousofvand L, Soleimani S, Rafe V. Automatic bug localization using a combination of deep learning and model transformation through node classification. Software Qual J, 2023, 31: 1045–1063
Wu S, Li Z, Liu Y, et al. GMBFL: optimizing mutation-based fault localization via graph representation. In: Proceedings of the IEEE International Conference on Software Maintenance and Evolution (ICSME), 2023. 245–257
Cao J, Yang S, Jiang W, et al. BugPecker: locating faulty methods with deep learning on revision graphs. In: Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2020. 1214–1218
Ciborowska A, Damevski K. Fast changeset-based bug localization with BERT. In: Proceedings of the 44th International Conference on Software Engineering, New York, 2022. 946–957
Zhang Z, Lei Y, Mao X, et al. A study of effectiveness of deep learning in locating real faults. Inf Software Tech, 2021, 131: 106486
Zhong H, Mei H. Learning a graph-based classifier for fault localization. Sci China Inf Sci, 2020, 63: 162101
Zhang Z, Lei Y, Mao X, et al. Improving deep-learning-based fault localization with resampling. J Software Evolu Process, 2021, 33: e2312
Xie H, Lei Y, Yan M, et al. A universal data augmentation approach for fault localization. In: Proceedings of the 44th International Conference on Software Engineering, New York, 2022. 48–60
Hu J, Xie H, Lei Y, et al. A light-weight data augmentation method for fault localization. Inf Software Tech, 2023, 157: 107148
Lei Y, Liu C, Xie H, et al. BCL-FL: a data augmentation approach with between-class learning for fault localization. In: Proceedings of the IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 2022. 289–300
Lei Y, Wen T, Xie H, et al. Mitigating the effect of class imbalance in fault localization using context-aware generative adversarial network. In: Proceedings of the 31st IEEE/ACM International Conference on Program Comprehension, 2023
Zhang Z, Lei Y, Su T, et al. Influential global and local contexts guided trace representation for fault localization. ACM Trans Softw Eng Methodol, 2023, 32: 1–27
Tian Z, Chen J, Zhu Q, et al. Learning to construct better mutation faults. In: Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, 2022. 1–13
Zhang Z, Lei Y, Mao X, et al. Improving fault localization using model-domain synthesized failing test generation. In: Proceedings of the IEEE International Conference on Software Maintenance and Evolution (ICSME), 2022. 199–210
Just R, Jalali D, Ernst M D. Defects4J: a database of existing faults to enable controlled testing studies for Java programs. In: Proceedings of the International Symposium on Software Testing and Analysis, 2014. 437–440
Madeiral F, Urli S, Maia M, et al. BEARS: an extensible Java bug benchmark for automatic program repair studies. In: Proceedings of the IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER), 2019. 468–478
Do H, Elbaum S, Rothermel G. Supporting controlled experimentation with testing techniques: an infrastructure and its potential impact. Empir Software Eng, 2005, 10: 405–435
Le Goues C, Holtschulte N, Smith E K, et al. The ManyBugs and IntroClass benchmarks for automated repair of C programs. IEEE Trans Software Eng, 2015, 41: 1236–1256
Weiß C, Premraj R, Zimmermann T, et al. How long will it take to fix this bug? In: Proceedings of the 4th International Workshop on Mining Software Repositories, 2007
Gazzola L, Micucci D, Mariani L. Automatic software repair: a survey. IEEE Trans Software Eng, 2019, 45: 34–67
Xuan J, Ren Z, Wang Z, et al. Progress on approaches to automatic program repair (in Chinese). J Software, 2016, 27: 771–784
Monperrus M. The Living Review on Automated Program Repair. Research Report hal-01956501, HAL Archives Ouvertes, 2018. Version: 5
Tufano M, Watson C, Bavota G, et al. An empirical study on learning bug-fixing patches in the wild via neural machine translation. ACM Trans Softw Eng Methodol, 2019, 28: 1–29
Kern C, Esparza J. Automatic error correction of Java programs. In: Proceedings of the 15th International Workshop on Formal Methods for Industrial Critical Systems, 2010. 67–81
Tian Y, Ray B. Automatically diagnosing and repairing error handling bugs in C. In: Proceedings of the 11th Joint Meeting on Foundations of Software Engineering, 2017. 752–762
Carvalho A, Luz W P, Marcilio D, et al. C-3PR: a bot for fixing static analysis violations via pull requests. In: Proceedings of the 27th IEEE International Conference on Software Analysis, Evolution and Reengineering, 2020. 161–171
Aho A V, Peterson T G. A minimum distance error-correcting parser for context-free languages. SIAM J Comput, 1972, 1: 305–312
Graham S L, Rhodes S P. Practical syntactic error recovery. In: Proceedings of Conference Record of the ACM Symposium on Principles of Programming Languages, Boston, 1973. 52–58
Anderson S O, Backhouse R C. Locally least-cost error recovery in Earley’s algorithm. ACM Trans Program Lang Syst, 1981, 3: 318–347
Burke M G, Fisher G A. A practical method for LR and LL syntactic error diagnosis and recovery. ACM Trans Program Lang Syst, 1987, 9: 164–197
Gupta R, Pal S, Kanade A, et al. DeepFix: fixing common C language errors by deep learning. In: Proceedings of the 31st AAAI Conference on Artificial Intelligence, San Francisco, 2017. 1345–1351
Bhatia S, Kohli P, Singh R. Neuro-symbolic program corrector for introductory programming assignments. In: Proceedings of the 40th International Conference on Software Engineering, Gothenburg, 2018. 60–70
Ahmed U Z, Kumar P, Karkare A, et al. Compilation error repair: for the student programs, from the student programs. In: Proceedings of the 40th International Conference on Software Engineering: Software Engineering Education and Training, 2018. 78–87
Santos E A, Campbell J C, Patel D, et al. Syntax and sensibility: using language models to detect and correct syntax errors. In: Proceedings of the 25th International Conference on Software Analysis, Evolution and Reengineering, 2018. 311–322
Brown N C C, Kölling M, McCall D, et al. Blackbox: a large scale repository of novice programmers’ activity. In: Proceedings of the 45th ACM Technical Symposium on Computer Science Education, Atlanta, 2014. 223–228
Mesbah A, Rice A, Johnston E, et al. DeepDelta: learning to repair compilation errors. In: Proceedings of the ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Tallinn, 2019. 925–936
Gupta R, Kanade A, Shevade S K. Deep reinforcement learning for syntactic error repair in student programs. In: Proceedings of the 33rd AAAI Conference on Artificial Intelligence, the 31st Innovative Applications of Artificial Intelligence Conference, and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, 2019. 930–937
Wu L, Li F, Wu Y, et al. GGF: a graph-based method for programming language syntax error correction. In: Proceedings of the 28th International Conference on Program Comprehension, Seoul, 2020. 139–148
Yasunaga M, Liang P. Graph-based, self-supervised program repair from diagnostic feedback. In: Proceedings of the 37th International Conference on Machine Learning, 2020. 10799–10808
Hajipour H, Bhattacharyya A, Staicu C, et al. SampleFix: learning to generate functionally diverse fixes. In: Proceedings of Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2021. 119–133
Yasunaga M, Liang P. Break-it-fix-it: unsupervised learning for program repair. In: Proceedings of the 38th International Conference on Machine Learning, 2021. 11941–11952
Ahmed T, Devanbu P, Hellendoorn V J. Learning lenient parsing & typing via indirect supervision. Empir Software Eng, 2021, 26: 29
Sakkas G, Endres M, Guo P J, et al. Seq2Parse: neurosymbolic parse error repair. Proc ACM Program Lang, 2022, 6: 1180–1206
Li X, Liu S, Feng R, et al. TransRepair: context-aware program repair for compilation errors. In: Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, Rochester, 2022. 1–13
Ahmed T, Ledesma N R, Devanbu P. SynShine: improved fixing of syntax errors. IEEE Trans Software Eng, 2023, 49: 2169–2181
Liu Z, Lin W, Shi Y, et al. A robustly optimized BERT pre-training approach with post-training. In: Proceedings of the 20th China National Conference on Chinese Computational Linguistics, Hohhot, 2021. 471–484
Gu Y F, Ma P, Jia X Y, et al. Progress on software crash research (in Chinese). Sci Sin Inform, 2019, 49: 1383–1398
Le Goues C, Nguyen T V, Forrest S, et al. GenProg: a generic method for automatic software repair. IEEE Trans Software Eng, 2012, 38: 54–72
Wong C, Santiesteban P, Kästner C, et al. VarFix: balancing edit expressiveness and search effectiveness in automated program repair. In: Proceedings of the 29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Athens, 2021. 354–366
Nguyen H D T, Qi D, Roychoudhury A, et al. SemFix: program repair via semantic analysis. In: Proceedings of the 35th International Conference on Software Engineering, San Francisco, 2013. 772–781
Mechtaev S, Yi J, Roychoudhury A. Angelix: scalable multiline program patch synthesis via symbolic analysis. In: Proceedings of the 38th International Conference on Software Engineering, Austin, 2016. 691–701
Xuan J, Martinez M, DeMarco F, et al. Nopol: automatic repair of conditional statement bugs in Java programs. IEEE Trans Software Eng, 2017, 43: 34–55
Tan S H, Roychoudhury A. relifix: automated repair of software regressions. In: Proceedings of the 37th IEEE/ACM International Conference on Software Engineering, Florence, 2015. 471–482
Saha S, Saha R K, Prasad M R. Harnessing evolution for multi-hunk program repair. In: Proceedings of the 41st International Conference on Software Engineering, Montreal, 2019. 13–24
Liu K, Koyuncu A, Kim D, et al. TBar: revisiting template-based automated program repair. In: Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis, Beijing, 2019. 31–42
White M, Tufano M, Martinez M, et al. Sorting and transforming program repair ingredients via deep learning code similarities. In: Proceedings of the 26th IEEE International Conference on Software Analysis, Evolution and Reengineering, Hangzhou, 2019. 479–490
Chen Z, Kommrusch S J, Tufano M, et al. SequenceR: sequence-to-sequence learning for end-to-end program repair. IEEE Trans Software Eng, 2021, 47: 1943–1959
Jiang N, Lutellier T, Tan L. CURE: code-aware neural machine translation for automatic program repair. In: Proceedings of the 43rd IEEE/ACM International Conference on Software Engineering, Madrid, 2021. 1161–1173
Long F, Rinard M C. Automatic patch generation by learning correct code. In: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2016. 298–312
Le Goues C, Dewey-Vogt M, Forrest S, et al. A systematic study of automated program repair: fixing 55 out of 105 bugs for $8 each. In: Proceedings of the 34th International Conference on Software Engineering, 2012. 3–13
Tufano M, Watson C, Bavota G, et al. An empirical investigation into learning bug-fixing patches in the wild via neural machine translation. In: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, Montpellier, 2018. 832–837
Sun Z, Xin C, Sun Y. An automatic semantic code repair service based on deep learning for programs with single error. In: Proceedings of the IEEE World Congress on Services, Milan, 2019. 360–361
Ding Y, Ray B, Devanbu P T, et al. Patching as translation: the data and the metaphor. In: Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering, Melbourne, 2020. 275–286
Yang G, Min K, Lee B. Applying deep learning algorithm to automatic bug localization and repair. In: Proceedings of the 35th ACM/SIGAPP Symposium on Applied Computing, 2020. 1634–1641
Yu L, Zhang W, Wang J, et al. SeqGAN: sequence generative adversarial nets with policy gradient. In: Proceedings of the 31st AAAI Conference on Artificial Intelligence, San Francisco, 2017. 2852–2858
Lutellier T, Pham H V, Pang L, et al. CoCoNuT: combining context-aware neural translation models using ensemble for program repair. In: Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis, 2020. 101–114
Martinez M, Durieux T, Sommerard R, et al. Automatic repair of real bugs in Java: a large-scale experiment on the Defects4J dataset. Empir Software Eng, 2017, 22: 1936–1964
Saha R K, Lyu Y, Lam W, et al. Bugs.jar: a large-scale, diverse dataset of real-world Java bugs. In: Proceedings of the 15th International Conference on Mining Software Repositories, Gothenburg, 2018. 10–13
Tian H, Liu K, Kaboré A K, et al. Evaluating representation learning of code changes for predicting patch correctness in program repair. In: Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering, Melbourne, 2020. 981–992
Dinella E, Dai H, Li Z, et al. Hoppity: learning graph transformations to detect and fix bugs in programs. In: Proceedings of the 8th International Conference on Learning Representations, Addis Ababa, 2020
Tang Y, Zhou L, Blanco A, et al. Grammar-based patches generation for automated program repair. In: Proceedings of Findings of the Association for Computational Linguistics, 2021. 1300–1305
Huang S, Zhou X, Chin S. Application of Seq2Seq models on code correction. Front Artif Intell, 2021, 4: 590215
Rahman M M, Watanobe Y, Nakamura K. A bidirectional LSTM language model for code evaluation and repair. Symmetry, 2021, 13: 247
Berabi B, He J, Raychev V, et al. TFix: learning to fix coding errors with a text-to-text transformer. In: Proceedings of the 38th International Conference on Machine Learning, 2021. 780–791
Tang B, Li B, Bo L, et al. GrasP: graph-to-sequence learning for automated program repair. In: Proceedings of the 21st IEEE International Conference on Software Quality, Reliability and Security, Hainan, 2021. 819–828
Szalontai B, Vadász A, Borsi Z R, et al. Detecting and fixing nonidiomatic snippets in Python source code with deep learning. In: Proceedings of Intelligent Systems and Applications, Amsterdam, 2021. 129–147
Li Y, Wang S, Nguyen T N. DEAR: a novel deep learning-based approach for automated program repair. In: Proceedings of IEEE/ACM 44th International Conference on Software Engineering, Pittsburgh, 2022. 511–523
Xu X, Wang X, Xue J. M3V: multi-modal multi-view context embedding for repair operator prediction. In: Proceedings of IEEE/ACM International Symposium on Code Generation and Optimization, Seoul, 2022. 266–277
Meng X, Wang X, Zhang H, et al. Improving fault localization and program repair with deep semantic features and transferred knowledge. In: Proceedings of IEEE/ACM 44th International Conference on Software Engineering, Pittsburgh, 2022. 1169–1180
Kim M, Kim Y, Heo J, et al. Impact of defect instances for successful deep learning-based automatic program repair. In: Proceedings of IEEE International Conference on Software Maintenance and Evolution, Limassol, 2022. 419–423
Wardat M, Cruz B D, Le W, et al. DeepDiagnosis: automatically diagnosing faults and recommending actionable fixes in deep learning programs. In: Proceedings of IEEE/ACM 44th International Conference on Software Engineering, Pittsburgh, 2022. 561–572
Yao J, Rao B, Xing W, et al. Bug-Transformer: automated program repair using attention-based deep neural network. J Circuit Syst Comp, 2022, 31: 2250210
Yan D, Liu K, Niu Y, et al. Crex: predicting patch correctness in automated repair of C programs through transfer learning of execution semantics. Inf Software Tech, 2022, 152: 107043
Pei K, Xuan Z, Yang J, et al. Learning approximate execution semantics from traces for binary function similarity. IEEE Trans Software Eng, 2023, 49: 2776–2790
Chakraborty S, Ding Y, Allamanis M, et al. CODIT: code editing with tree-based neural models. IEEE Trans Software Eng, 2022, 48: 1385–1399
Ye H, Martinez M, Monperrus M. Neural program repair with execution-based backpropagation. In: Proceedings of the 44th International Conference on Software Engineering, Pittsburgh, 2022. 1506–1518
Ye H, Gu J, Martinez M, et al. Automated classification of overfitting patches with statically extracted code features. IEEE Trans Software Eng, 2022, 48: 2920–2938
Ye H, Martinez M, Luo X, et al. SelfAPR: self-supervised program repair with test execution diagnostics. In: Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, Rochester, 2022. 1–13
Xia C S, Zhang L. Less training, more repairing please: revisiting automated program repair via zero-shot learning. In: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Singapore, 2022. 959–971
Kim M, Kim Y, Jeong H, et al. An empirical study of deep transfer learning-based program repair for Kotlin projects. In: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Singapore, 2022. 1441–1452
Tian H, Li Y, Pian W, et al. Predicting patch correctness based on the similarity of failing test cases. ACM Trans Softw Eng Methodol, 2022, 31: 1–30
Yuan W, Zhang Q, He T, et al. CIRCLE: continual repair across programming languages. In: Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis, 2022. 678–690
Chen L, Pei Y, Pan M, et al. Program repair with repeated learning. IEEE Trans Software Eng, 2023, 49: 831–848
Stocco A, Yandrapally R, Mesbah A. Visual web test repair. In: Proceedings of the ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Lake Buena Vista, 2018. 503–514
Pan M, Xu T, Pei Y, et al. GUI-guided test script repair for mobile apps. IEEE Trans Software Eng, 2022, 48: 910–929
Ren Z, Sun S, Xuan J, et al. Automated patching for unreproducible builds. In: Proceedings of IEEE/ACM 44th International Conference on Software Engineering, Pittsburgh, 2022. 200–211
Hassan F, Wang X. HireBuild: an automatic approach to history-driven repair of build scripts. In: Proceedings of the 40th International Conference on Software Engineering, Gothenburg, 2018. 1078–1089
Lou Y, Chen J, Zhang L, et al. History-driven build failure fixing: how far are we? In: Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis, 2019. 43–54
Loriot B, Madeiral F, Monperrus M. Styler: learning formatting conventions to repair Checkstyle violations. Empir Software Eng, 2022, 27: 149
Ma S, Thung F, Lo D, et al. VuRLE: automatic vulnerability detection and repair by learning from examples. In: Proceedings of the 22nd European Symposium on Research in Computer Security, Oslo, 2017. 229–246
Harer J, Ozdemir O, Lazovich T, et al. Learning to repair software vulnerabilities with generative adversarial networks. In: Proceedings of Advances in Neural Information Processing Systems, 2018. 7944–7954
Zhou Z, Bo L, Wu X, et al. SPVF: security property assisted vulnerability fixing via attention-based models. Empir Software Eng, 2022, 27: 171
Huang K, Yang S, Sun H, et al. Repairing security vulnerabilities using pre-trained programming language models. In: Proceedings of the 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2022. 111–116
Chen Z, Kommrusch S, Monperrus M. Neural transfer learning for repairing security vulnerabilities in C code. IEEE Trans Software Eng, 2023, 49: 147–165
Chi J, Qu Y, Liu T, et al. SeqTrans: automatic vulnerability fix via sequence to sequence learning. IEEE Trans Software Eng, 2023, 49: 564–585
Das R, Ahmed U Z, Karkare A, et al. Prutor: a system for tutoring CS1 and collecting student programs for analysis. 2016. ArXiv:1608.03828
Brown N C C, Altadmri A, Sentance S, et al. Blackbox, five years on: an evaluation of a large-scale programming data collection project. In: Proceedings of the ACM Conference on International Computing Education Research, New York, 2018. 196–204
Motwani M, Sankaranarayanan S, Just R, et al. Do automated program repair techniques repair hard and important bugs? In: Proceedings of the 40th International Conference on Software Engineering, Gothenburg, 2018. 25
Jiang Y, Liu H, Niu N, et al. Extracting concise bug-fixing patches from human-written patches in version control systems. In: Proceedings of the 43rd International Conference on Software Engineering (ICSE’21), 2021
Jiang Y, Liu H, Luo X, et al. BugBuilder: an automated approach to building bug repository. IEEE Trans Software Eng, 2023, 49: 1443–1463
Bui Q C, Scandariato R, Ferreyra N E D. Vul4J: a dataset of reproducible Java vulnerabilities geared towards the study of program repair techniques. In: Proceedings of the IEEE/ACM 19th International Conference on Mining Software Repositories (MSR), 2022. 464–468
Nikitopoulos G, Dritsa K, Louridas P, et al. CrossVul: a cross-language vulnerability dataset with commit data. In: Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2021. 1565–1569
Zou W, Lo D, Chen Z, et al. How practitioners perceive automated bug report management techniques. IEEE Trans Software Eng, 2018, 46: 836–862
Bettenburg N, Just S, Schröter A, et al. What makes a good bug report? In: Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering, 2008. 308–318
Lee D G, Seo Y S. Systematic review of bug report processing techniques to improve software management performance. J Inf Process Syst, 2019, 15: 967–985
Anvik J. Automating bug report assignment. In: Proceedings of the 28th International Conference on Software Engineering, 2006. 937–940
Jiang H, Li X, Ren Z, et al. Toward better summarizing bug reports with crowdsourcing elicited attributes. IEEE Trans Rel, 2018, 68: 2–22
Tan Y, Xu S, Wang Z, et al. Bug severity prediction using question-and-answer pairs from Stack Overflow. J Syst Software, 2020, 165: 110567
Zhang T, Han D, Vinayakarao V, et al. Duplicate bug report detection: how far are we? ACM Trans Softw Eng Methodol, 2023, 32: 1–32
Li X, Jiang H, Liu D, et al. Unsupervised deep bug report summarization. In: Proceedings of the 26th Conference on Program Comprehension, 2018. 144–155
Fang F, Wu J, Li Y, et al. On the classification of bug reports to improve bug localization. Soft Comput, 2021, 25: 7307–7323
Zhou C, Li B, Sun X, et al. Leveraging multi-level embeddings for knowledge-aware bug report reformulation. J Syst Software, 2023, 198: 111617
He J, Xu L, Yan M, et al. Duplicate bug report detection using dual-channel convolutional neural networks. In: Proceedings of the 28th International Conference on Program Comprehension, 2020. 117–127
Xiao G, Du X, Sui Y, et al. HINDBR: heterogeneous information network based duplicate bug report prediction. In: Proceedings of the IEEE 31st International Symposium on Software Reliability Engineering (ISSRE), 2020. 195–206
Xie Q, Wen Z, Zhu J, et al. Detecting duplicate bug reports with convolutional neural networks. In: Proceedings of the 25th Asia-Pacific Software Engineering Conference (APSEC), 2018. 416–425
Deshmukh J, Annervaz K, Podder S, et al. Towards accurate duplicate bug retrieval using deep learning techniques. In: Proceedings of the IEEE International Conference on Software Maintenance and Evolution (ICSME), 2017. 115–124
Budhiraja A, Dutta K, Reddy R, et al. DWEN: deep word embedding network for duplicate bug report detection in software repositories. In: Proceedings of the 40th International Conference on Software Engineering: Companion Proceedings, 2018. 193–194
Isotani H, Washizaki H, Fukazawa Y, et al. Duplicate bug report detection by using sentence embedding and fine-tuning. In: Proceedings of the IEEE International Conference on Software Maintenance and Evolution (ICSME), 2021. 535–544
Jiang Y, Su X, Treude C, et al. Does deep learning improve the performance of duplicate bug report detection? An empirical study. J Syst Software, 2023, 198: 111607
Koc U, Wei S, Foster J S, et al. An empirical assessment of machine learning approaches for triaging reports of a Java static analysis tool. In: Proceedings of the 12th IEEE Conference on Software Testing, Validation and Verification (ICST), 2019. 288–299
Florea A C, Anvik J, Andonie R. Parallel implementation of a bug report assignment recommender using deep learning. In: Proceedings of the 26th International Conference on Artificial Neural Networks and Machine Learning, 2017. 64–71
Lee S R, Heo M J, Lee C G, et al. Applying deep learning based automatic bug triager to industrial projects. In: Proceedings of the 11th Joint Meeting on Foundations of Software Engineering, 2017
Mani S, Sankaran A, Aralikatte R. DeepTriage: exploring the effectiveness of deep learning for bug triaging. In: Proceedings of the ACM India Joint International Conference on Data Science and Management of Data, 2019. 171–179
Liu Y, Qi X, Zhang J, et al. Automatic bug triaging via deep reinforcement learning. Appl Sci, 2022, 12: 3565
Han Z, Li X, Xing Z, et al. Learning to predict severity of software vulnerability using only vulnerability description. In: Proceedings of the IEEE International Conference on Software Maintenance and Evolution (ICSME), 2017. 125–136
Gomes L A F, Torres R S, Côrtes M L. Bug report severity level prediction in open source software: a survey and research opportunities. Inf Software Tech, 2019, 115: 58–78
Noyori Y, Washizaki H, Fukazawa Y, et al. Deep learning and gradient-based extraction of bug report features related to bug fixing time. Front Comput Sci, 2023, 5: 1032440
Liu H, Yu Y, Li S, et al. How to cherry pick the bug report for better summarization? Empir Software Eng, 2021, 26: 119
Liu H, Yu Y, Li S, et al. BugSum: deep context understanding for bug report summarization. In: Proceedings of the 28th International Conference on Program Comprehension, 2020. 94–105
Chen S, Xie X, Yin B, et al. Stay professional and efficient: automatically generate titles for your bug reports. In: Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering, 2020. 385–397
Lin H, Chen X, Chen X, et al. TitleGen-FL: quality prediction-based filter for automated issue title generation. J Syst Software, 2023, 195: 111513
Xiao Y, Keung J, Bennin K E, et al. Improving bug localization with word embedding and enhanced convolutional neural networks. Inf Software Tech, 2019, 105: 17–29
Xiao Y, Keung J, Mi Q, et al. Improving bug localization with an enhanced convolutional neural network. In: Proceedings of the 24th Asia-Pacific Software Engineering Conference (APSEC), 2017. 338–347
Wang B, Xu L, Yan M, et al. Multi-dimension convolutional neural network for bug localization. IEEE Trans Serv Comput, 2020, 15: 1649–1663
Lam A N, Nguyen A T, Nguyen H A, et al. Bug localization with combination of deep learning and information retrieval. In: Proceedings of the IEEE/ACM 25th International Conference on Program Comprehension (ICPC), 2017. 218–229
Cheng S, Yan X, Khan A A. A similarity integration method based information retrieval and word embedding in bug localization. In: Proceedings of the IEEE 20th International Conference on Software Quality, Reliability and Security (QRS), 2020. 180–187
Lam A N, Nguyen A T, Nguyen H A, et al. Combining deep learning with information retrieval to localize buggy files for bug reports (N). In: Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2015. 476–481
Loyola P, Gajananan K, Satoh F. Bug localization by learning to rank and represent bug inducing changes. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 2018. 657–665
Zhu Z, Li Y, Tong H H, et al. CooBa: cross-project bug localization via adversarial transfer learning. In: Proceedings of the 29th International Joint Conference on Artificial Intelligence, 2020
Han J, Huang C, Sun S, et al. bjXnet: an improved bug localization model based on code property graph and attention mechanism. Autom Softw Eng, 2023, 30: 12
Liang H, Hang D, Li X. Modeling function-level interactions for file-level bug localization. Empir Software Eng, 2022, 27: 186
Choetkiertikul M, Dam H K, Tran T, et al. Automatically recommending components for issue reports using deep learning. Empir Software Eng, 2021, 26: 1–39
Huo X, Thung F, Li M, et al. Deep transfer bug localization. IEEE Trans Software Eng, 2019, 47: 1368–1380
Haering M, Stanik C, Maalej W. Automatically matching bug reports with related app reviews. In: Proceedings of the IEEE/ACM 43rd International Conference on Software Engineering (ICSE), 2021. 970–981
Ruan H, Chen B, Peng X, et al. DeepLink: recovering issue-commit links based on deep learning. J Syst Software, 2019, 158: 110406
Xie R, Chen L, Ye W, et al. DeepLink: a code knowledge graph based deep learning approach for issue-commit link recovery. In: Proceedings of the IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER), 2019. 434–444
Xi S, Yao Y, Xiao X, et al. An effective approach for routing the bug reports to the right fixers. In: Proceedings of the 10th Asia-Pacific Symposium on Internetware, 2018. 1–10
Fu W, Menzies T. Easy over hard: a case study on deep learning. In: Proceedings of the 11th Joint Meeting on Foundations of Software Engineering, New York, 2017. 49–60
Biswas E, Vijay-Shanker K, Pollock L. Exploring word embedding techniques to improve sentiment analysis of software engineering texts. In: Proceedings of IEEE/ACM 16th International Conference on Mining Software Repositories (MSR), 2019. 68–78
Nizamani Z A, Liu H, Chen D M, et al. Automatic approval prediction for software enhancement requests. Autom Softw Eng, 2018, 25: 347–381
Li X, Jiang H, Kamei Y, et al. Bridging semantic gaps between natural languages and APIs with word embedding. IEEE Trans Software Eng, 2018, 46: 1081–1097
Rhu M, Gimelshein N, Clemons J, et al. VDNN: virtualized deep neural networks for scalable, memory-efficient neural network design. In: Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016
Wang L, Ye J, Zhao Y, et al. Superneurons: dynamic GPU memory management for training deep neural networks. In: Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, New York, 2018. 41–53
Moran K, Bernal-Cardenas C, Curcio M, et al. Machine learning-based prototyping of graphical user interfaces for mobile apps. IEEE Trans Software Eng, 2018, 46: 196–221
Brooks F P. The Mythical Man-Month: Essays on Software Engineering. Reading: Addison-Wesley, 1975
Mockus A, Herbsleb J D. Expertise browser: a quantitative approach to identifying expertise. In: Proceedings of the 24th International Conference on Software Engineering, New York, 2002. 503–512
Anvik J, Hiew L, Murphy G C. Who should fix this bug? In: Proceedings of the 28th International Conference on Software Engineering, New York, 2006. 361–370
Ma D, Schuler D, Zimmermann T, et al. Expert recommendation with usage expertise. In: Proceedings of the IEEE International Conference on Software Maintenance, 2009. 535–538
Zhou M, Mockus A. Developer fluency: achieving true mastery in software projects. In: Proceedings of the 18th ACM SIGSOFT International Symposium on Foundations of Software Engineering, New York, 2010. 137–146
Fritz T, Murphy G C, Murphy-Hill E, et al. Degree-of-knowledge: modeling a developer’s knowledge of code. ACM Trans Softw Eng Methodol, 2014, 23: 1–42
Joblin M, Mauerer W, Apel S, et al. From developer networks to verified communities: a fine-grained approach. In: Proceedings of the 37th International Conference on Software Engineering, 2015. 563–573
Meng X, Miller B P, Williams W R, et al. Mining software repositories for accurate authorship. In: Proceedings of the 29th IEEE International Conference on Software Maintenance (ICSM), 2013. 250–259
Baltes S, Diehl S. Towards a theory of software development expertise. In: Proceedings of the 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2018
Ren J, Yin H, Hu Q, et al. Towards quantifying the development value of code contributions. In: Proceedings of the 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2018. 775–779
Venkataramani R, Gupta A, Asadullah A, et al. Discovery of technical expertise from open source code repositories. In: Proceedings of the 22nd International Conference on World Wide Web, 2013. 97–98
Saxena R, Pedanekar N. I know what you coded last summer: mining candidate expertise from GitHub repositories. In: Proceedings of Companion of the ACM Conference on Computer Supported Cooperative Work and Social Computing, 2017. 299–302
Liu S, Wang S, Zhu F, et al. HYDRA: large-scale social identity linkage via heterogeneous behavior modeling. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, 2014. 51–62
Kouters E, Vasilescu B, Serebrenik A, et al. Who’s who in Gnome: using LSA to merge software repository identities. In: Proceedings of the 28th IEEE International Conference on Software Maintenance (ICSM), 2012. 592–595
Mo W, Shen B, Chen Y, et al. TbIL: a tagging-based approach to identity linkage across software communities. In: Proceedings of Software Engineering Conference (APSEC), 2015. 56–63
Lee R K, Lo D. GitHub and stack overflow: analyzing developer interests across multiple social collaborative platforms. In: Proceedings of the 9th International Conference on Social Informatics, 2017. 245–256
Huang W, Mo W, Shen B, et al. CPDScorer: modeling and evaluating developer programming ability across software communities. In: Proceedings of SEKE, 2016. 87–92
Yan J, Sun H, Wang X, et al. Profiling developer expertise across software communities with heterogeneous information network analysis. In: Proceedings of the 10th Asia-Pacific Symposium on Internetware, Beijing, 2018. 1–9
Montandon J E, Valente M T, Silva L L. Mining the technical roles of GitHub users. Inf Software Tech, 2021, 131: 106485
Song X, Yan J, Huang Y, et al. A collaboration-aware approach to profiling developer expertise with cross-community data. In: Proceedings of IEEE 22nd International Conference on Software Quality, Reliability and Security (QRS), 2022. 344–355
Dey T, Karnauch A, Mockus A. Representation of developer expertise in open source software. In: Proceedings of IEEE/ACM 43rd International Conference on Software Engineering (ICSE), 2020. 995–1007
Ma Y, Bogart C, Amreen S, et al. World of Code: an infrastructure for mining the universe of open source VCS data. In: Proceedings of IEEE/ACM 16th International Conference on Mining Software Repositories (MSR), 2019. 143–154
Dakhel A M, Desmarais M C, Khomh F. Dev2vec: representing domain expertise of developers in an embedding space. Inf Software Tech, 2022, 159: 107218
Javeed F, Siddique A, Munir A, et al. Discovering software developer’s coding expertise through deep learning. IET Softw, 2020, 14: 213–220
Wang Z, Sun H, Fu Y, et al. Recommending crowdsourced software developers in consideration of skill improvement. In: Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE), 2017. 717–722
Zhang Z, Sun H, Zhang H. Developer recommendation for Topcoder through a meta-learning based policy model. Empir Software Eng, 2019, 25: 859–889
Yu X, He Y, Fu Y, et al. Cross-domain developer recommendation algorithm based on feature matching. In: Proceedings of CCF Conference on Computer Supported Cooperative Work and Social Computing, 2019. 443–457
Wang J J, Yang Y, Wang S, et al. Context-aware personalized crowdtesting task recommendation. IEEE Trans Software Eng, 2021, 48: 3131–3144
Wang J, Yang Y, Wang S, et al. Context- and fairness-aware in-process crowdworker recommendation. ACM Trans Softw Eng Methodol, 2022, 31: 1–31
Ying H, Chen L, Liang T, et al. EARec: leveraging expertise and authority for pull-request reviewer recommendation in GitHub. In: Proceedings of the 3rd International Workshop on CrowdSourcing in Software Engineering, 2016. 29–35
Jiang J, Yang Y, He J, et al. Who should comment on this pull request? Analyzing attributes for more accurate commenter recommendation in pull-based development. Inf Software Tech, 2017, 84: 48–62
Zhang J, Maddila C S, Bairi R, et al. Using large-scale heterogeneous graph representation learning for code review recommendations at Microsoft. In: Proceedings of IEEE/ACM 45th International Conference on Software Engineering, 2022. 162–172
Rebai S, Amich A, Molaei S, et al. Multi-objective code reviewer recommendations: balancing expertise, availability and collaborations. Autom Softw Eng, 2020, 27: 301–328
Zanjani M B, Kagdi H, Bird C. Automatically recommending peer reviewers in modern code review. IEEE Trans Software Eng, 2016, 42: 530–543
Hannebauer C, Patalas M, Stünkel S, et al. Automatically recommending code reviewers based on their expertise: an empirical comparison. In: Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering (ASE), 2016. 99–110
Rong G, Zhang Y, Yang L, et al. Modeling review history for reviewer recommendation: a hypergraph approach. In: Proceedings of IEEE/ACM 44th International Conference on Software Engineering (ICSE), 2022. 1381–1392
Kovalenko V, Tintarev N, Pasynkov E, et al. Does reviewer recommendation help developers? IEEE Trans Software Eng, 2020, 46: 710–731
Ahasanuzzaman M, Oliva G A, Hassan A E. Using knowledge units of programming languages to recommend reviewers for pull requests: an empirical study. Empir Software Eng, 2024, 29: 33
Gonçalves P W, Calikli G, Serebrenik A, et al. Competencies for code review. In: Proceedings of the ACM on Human-Computer Interaction, 2023. 1–33
Huang Y, Sun H. Best answerers prediction with topic based GAT in Q&A sites. In: Proceedings of the 12th Asia-Pacific Symposium on Internetware, 2020. 156–164
Jin Y, Bai Y, Zhu Y, et al. Code recommendation for open source software developers. In: Proceedings of the ACM Web Conference, 2023
Xiao W, He H, Xu W, et al. Recommending good first issues in GitHub OSS projects. In: Proceedings of IEEE/ACM 44th International Conference on Software Engineering (ICSE), 2022. 1830–1842
Santos F. Supporting the task-driven skill identification in open source project issue tracking systems. ACM SIGSOFT Softw Eng Notes, 2023, 48: 54–58
Costa C, Figueiredo J, Pimentel J F, et al. Recommending participants for collaborative merge sessions. IEEE Trans Software Eng, 2021, 47: 1198–1210
Constantino K, Figueiredo E. CoopFinder: finding collaborators based on co-changed files. In: Proceedings of the IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), 2022. 1–3
Constantino K, Belém F, Figueiredo E. Dual analysis for helping developers to find collaborators based on co-changed files: an empirical study. Softw Pract Exp, 2023, 53: 1438–1464
Surian D, Liu N, Lo D, et al. Recommending people in developers’ collaboration network. In: Proceedings of the 18th Working Conference on Reverse Engineering, 2011. 379–388
Canfora G, Penta M D, Oliveto R, et al. Who is going to mentor newcomers in open source projects? In: Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering, 2012
Ye L, Sun H, Wang X, et al. Personalized teammate recommendation for crowdsourced software developers. In: Proceedings of the 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE), 2018. 808–813
Fry T, Dey T, Karnauch A, et al. A dataset and an approach for identity resolution of 38 million author IDs extracted from 2B Git commits. In: Proceedings of IEEE/ACM 17th International Conference on Mining Software Repositories (MSR), 2020

Acknowledgements

We thank the following persons for their prior contributions to the manuscript preparation (in alphabetical order): Yuze GUO (Beihang University), Ruiqi HONG (Beihang University), Mingwei LIU (Fudan University), Xiaofan LIU (Wuhan University), Di WU (Beihang University), Hongjun YANG (Beihang University), Yanming YANG (Zhejiang University), Binquan ZHANG (Beihang University), and Zhuang ZHAO (Wuhan University).

Author information

All authors contributed equally to this work.

Authors and Affiliations

Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education, School of Computer Science, Peking University, Beijing, 100871, China
Yanjie Jiang & Lu Zhang
School of Journalism and Communication, Sun Yat-sen University, Guangzhou, 510275, China
Xiangping Chen
School of Software Technology, Zhejiang University, Hangzhou, 310058, China
Xing Hu
School of Software Engineering, Sun Yat-sen University, Guangzhou, 510275, China
Yuan Huang
School of Software, Dalian University of Technology, Dalian, 116024, China
He Jiang & Xiaochen Li
School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China
Weixing Ji, Bo Liu & Hui Liu
State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, China
Yanyan Jiang, Jiayi Wang, Yibiao Yang & Yuming Zhou
School of Computer Science and Engineering, Beihang University, Beijing, 100191, China
Xiaoli Lian & Li Zhang
Institute of Information Engineering, Chinese Academy of Sciences, Beijing, 100864, China
Guozhu Meng
School of Computer Science, Fudan University, Shanghai, 200433, China
Xin Peng & Chong Wang
State Key Laboratory of Complex & Critical Software Environment (CCSE), School of Software, Beihang University, Beijing, 100191, China
Hailong Sun, Lin Shi & Yixin Yang
School of Computer and Information Technology, Beijing Jiaotong University, Beijing, 100044, China
Bo Wang
School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
Tiantian Wang
School of Computer Science, Wuhan University, Wuhan, 430072, China
Jifeng Xuan
Huawei Technologies, Hangzhou, 310056, China
Xin Xia

Corresponding authors

Correspondence to Xiangping Chen, Xing Hu, He Jiang, Yanjie Jiang, Yanyan Jiang, Xiaoli Lian, Guozhu Meng, Xin Peng, Hailong Sun, Lin Shi, Bo Wang, Tiantian Wang, Jifeng Xuan, Yibiao Yang, Yuming Zhou or Lu Zhang.

Rights and permissions

Open access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Chen, X., Hu, X., Huang, Y. et al. Deep learning-based software engineering: progress, challenges, and opportunities. Sci. China Inf. Sci. 68, 111102 (2025). https://doi.org/10.1007/s11432-023-4127-5

Received: 26 September 2023
Revised: 31 December 2023
Accepted: 01 April 2024
Published: 24 December 2024
DOI: https://doi.org/10.1007/s11432-023-4127-5
