Abstract
Researchers have recently achieved significant advances in deep learning techniques, which in turn have substantially advanced other research disciplines, such as natural language processing, image processing, speech recognition, and software engineering. Various deep learning techniques have been successfully employed to facilitate software engineering tasks, including code generation, software refactoring, and fault localization. Many studies presented in top conferences and journals demonstrate the application of deep learning techniques to various software engineering tasks. However, although several surveys have provided overall pictures of the application of deep learning techniques in software engineering, they focus more on the learning techniques themselves, that is, what kinds of deep learning techniques are employed and how deep models are trained or fine-tuned for software engineering tasks. We still lack surveys explaining the advances of subareas in software engineering driven by deep learning techniques, as well as the challenges and opportunities in each subarea. To this end, we present the first task-oriented survey on deep learning-based software engineering. It covers twelve major software engineering subareas significantly impacted by deep learning techniques. These subareas span the whole lifecycle of software development and maintenance, including requirements engineering, software development, testing, maintenance, and developer collaboration. As we believe that deep learning may provide an opportunity to revolutionize the whole discipline of software engineering, a single survey covering as many subareas as possible can help future research push forward the frontier of deep learning-based software engineering more systematically.
For each of the selected subareas, we highlight the major advances achieved by applying deep learning techniques and point to the datasets available in that subarea. We also discuss the challenges and opportunities concerning each of the surveyed software engineering subareas.
Jiang S, McMillan C. Towards automatic generation of short summaries of commits. In: Proceedings of the 25th International Conference on Program Comprehension, 2017. 320–323
Jiang S. Boosting neural commit message generation with code semantic analysis. In: Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering, 2019. 1280–1282
Liu Z, Xia X, Treude C, et al. Automatic generation of pull request descriptions. In: Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering, 2019. 176–188
Bansal A, Haque S, McMillan C. Project-level encoding for neural source code summarization of subroutines. In: Proceedings of the 29th IEEE/ACM International Conference on Program Comprehension, 2021. 253–264
Xie R, Ye W, Sun J, et al. Exploiting method names to improve code summarization: a deliberation multi-task learning approach. In: Proceedings of the 29th IEEE/ACM International Conference on Program Comprehension, 2021. 138–148
Hu X, Li G, Xia X, et al. Deep code comment generation. In: Proceedings of the 26th Conference on Program Comprehension, 2018. 200–210
Hu X, Li G, Xia X, et al. Deep code comment generation with hybrid lexical and syntactical information. Empir Software Eng, 2020, 25: 2179–2217
Huang Y, Huang S, Chen H, et al. Towards automatically generating block comments for code snippets. Inf Software Tech, 2020, 127: 106373
Tang Z, Shen X, Li C, et al. AST-Trans: code summarization with efficient tree-structured attention. In: Proceedings of IEEE/ACM 44th International Conference on Software Engineering, 2022. 150–162
Liu S, Gao C, Chen S, et al. ATOM: commit message generation based on abstract syntax tree and hybrid ranking. IEEE Trans Software Eng, 2022, 48: 1800–1817
Wan Y, Zhao Z, Yang M, et al. Improving automatic source code summarization via deep reinforcement learning. In: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, 2018. 397–407
LeClair A, Jiang S, McMillan C. A neural model for generating natural language summaries of program subroutines. In: Proceedings of the 41st International Conference on Software Engineering, 2019. 795–806
Xu S, Yao Y, Xu F, et al. Commit message generation for source code changes. In: Proceedings of the 28th International Joint Conference on Artificial Intelligence, 2019. 3975–3981
Zhou Y, Shen J, Zhang X, et al. Automatic source code summarization with graph attention networks. J Syst Software, 2022, 188: 111257
Liang Y, Zhu K. Automatic generation of text descriptive comments for code blocks. In: Proceedings of the AAAI Conference on Artificial Intelligence, 2018
Wang W, Zhang Y, Zeng Z, et al. TranS3: a transformer-based framework for unifying code summarization and code search. 2020. ArXiv:2003.03238
Lin C, Ouyang Z, Zhuang J, et al. Improving code summarization with block-wise abstract syntax tree splitting. In: Proceedings of the 29th IEEE/ACM International Conference on Program Comprehension, 2021. 184–195
Shi E, Wang Y, Du L, et al. CAST: enhancing code summarization with hierarchical splitting and reconstruction of abstract syntax trees. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2021. 4053–4062
Fernandes P, Allamanis M, Brockschmidt M. Structured neural summarization. In: Proceedings of the 7th International Conference on Learning Representations, 2019
LeClair A, Haque S, Wu L, et al. Improved code summarization via a graph neural network. In: Proceedings of the 28th International Conference on Program Comprehension, Seoul, 2020. 184–195
Liu S, Chen Y, Xie X, et al. Retrieval-augmented generation for code summarization via hybrid GNN. In: Proceedings of the 9th International Conference on Learning Representations, 2021
Liu X, Wang D, Wang A Y, et al. HAConvGNN: hierarchical attention based convolutional graph neural network for code documentation generation in Jupyter notebooks. In: Proceedings of Findings of the Association for Computational Linguistics, 2021. 4473–4485
Cheng W, Hu P, Wei S, et al. Keyword-guided abstractive code summarization via incorporating structural and contextual information. Inf Software Tech, 2022, 150: 106987
Guo J, Liu J, Wan Y, et al. Modeling hierarchical syntax structure with triplet position for source code summarization. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, 2022. 486–500
Ma Z, Gao Y, Lyu L, et al. MMF3: neural code summarization based on multi-modal fine-grained feature fusion. In: Proceedings of ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, Helsinki, 2022. 171–182
Wang Y, Dong Y, Lu X, et al. GypSum: learning hybrid representations for code summarization. In: Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension, 2022. 12–23
Hu X, Li G, Xia X, et al. Summarizing source code with transferred API knowledge. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence, 2018. 2269–2275
Shahbazi R, Sharma R, Fard F H. API2Com: on the improvement of automatically generated code comments using API documentations. In: Proceedings of the 29th IEEE/ACM International Conference on Program Comprehension, 2021. 411–421
Gao X, Jiang X, Wu Q, et al. GT-SimNet: improving code automatic summarization via multi-modal similarity networks. J Syst Software, 2022, 194: 111495
Zhou Y, Yan X, Yang W, et al. Augmenting Java method comments generation with context information based on neural networks. J Syst Software, 2019, 156: 328–340
Wang W, Zhang Y, Sui Y, et al. Reinforcement-learning-guided source code summarization using hierarchical attention. IEEE Trans Software Eng, 2022, 48: 102–119
Wang Y, Du L, Shi E, et al. CoCoGUM: contextual code summarization with multi-relational GNN on UMLs. Microsoft, Technical Report, MSR-TR-2020-16, 2020
Son J, Hahn J, Seo H, et al. Boosting code summarization by embedding code structures. In: Proceedings of the 29th International Conference on Computational Linguistics, 2022. 5966–5977
Zhang C, Zhou Q, Qiao M, et al. Re_Trans: combined retrieval and transformer model for source code summarization. Entropy, 2022, 24: 1372
Huang Y, Huang J, Chen X, et al. BCGen: a comment generation method for bytecode. Autom Softw Eng, 2023, 30: 5
Barone A V M, Sennrich R. A parallel corpus of Python functions and documentation strings for automated code documentation and code generation. In: Proceedings of the 8th International Joint Conference on Natural Language Processing, 2017. 314–319
Guo H Y, Chen X P, Huang Y, et al. Snippet comment generation based on code context expansion. ACM Trans Softw Eng Methodol, 2024, 33: 1–30
Fowler M, Beck K, Brant J, et al. Refactoring: Improving the Design of Existing Code. Reading: Addison-Wesley Professional, 1999
Tsantalis N, Chatzigeorgiou A. Identification of move method refactoring opportunities. IEEE Trans Software Eng, 2009, 35: 347–367
Terra R, Valente M T, Miranda S, et al. JMove: a novel heuristic and tool to detect move method refactoring opportunities. J Syst Software, 2018, 138: 19–36
Liu H, Xu Z, Zou Y. Deep learning based feature envy detection. In: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, 2018. 385–396
Kurbatova Z, Veselov I, Golubev Y, et al. Recommendation of move method refactoring using path-based representation of code. In: Proceedings of the 4th International Workshop on Refactoring, 2020. 315–322
Sharma T, Efstathiou V, Louridas P, et al. Code smell detection by deep direct-learning and transfer-learning. J Syst Software, 2021, 176: 110936
Liu H, Jin J H, Xu Z F, et al. Deep learning based code smell detection. IEEE Trans Software Eng, 2021, 47: 1811–1837
LeCun Y, Bengio Y, Hinton G. Deep learning. Nature, 2015, 521: 436–444
Wang X, Zhao Y, Pourpanah F. Recent advances in deep learning. Int J Mach Learn Cyber, 2020, 11: 747–750
Barbez A, Khomh F, Guéhéneuc Y G. Deep learning anti-patterns from code metrics history. In: Proceedings of the IEEE International Conference on Software Maintenance and Evolution, 2019. 114–124
Yu D, Xu Y, Weng L, et al. Detecting and refactoring feature envy based on graph neural network. In: Proceedings of the 33rd International Symposium on Software Reliability Engineering, 2022. 458–469
Alon U, Zilberstein M, Levy O, et al. code2vec: learning distributed representations of code. In: Proceedings of the ACM on Programming Languages, 2019. 1–29
Cui D, Wang S, Luo Y, et al. RMove: recommending move method refactoring opportunities using structural and semantic representations of code. In: Proceedings of the IEEE International Conference on Software Maintenance and Evolution, 2022. 281–292
Yedida R, Menzies T. On the value of oversampling for deep learning in software defect prediction. IEEE Trans Software Eng, 2022, 48: 3103–3116
Yedida R, Menzies T. How to improve deep learning for software analytics: (a case study with code smell detection). In: Proceedings of the 19th International Conference on Mining Software Repositories, 2022. 156–166
Liu H, Liu Q, Liu Y, et al. Identifying renaming opportunities by expanding conducted rename refactorings. IEEE Trans Software Eng, 2015, 41: 887–900
Liang J, Zou W, Zhang J, et al. A deep method renaming prediction and refinement approach for Java projects. In: Proceedings of the 21st International Conference on Software Quality, Reliability and Security, 2021. 404–413
Devlin J, Chang M W, Lee K, et al. BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019. 4171–4186
Rosenthal S, Farra N, Nakov P. SemEval-2017 task 4: sentiment analysis in Twitter. In: Proceedings of the 11th International Workshop on Semantic Evaluation, 2017. 502–518
Liu K, Kim D, Bissyandé T F, et al. Learning to spot and refactor inconsistent method names. In: Proceedings of the 41st International Conference on Software Engineering, 2019. 1–12
Le Q, Mikolov T. Distributed representations of sentences and documents. In: Proceedings of the 31st International Conference on Machine Learning, 2014. 1188–1196
Tufano M, Pantiuchina J, Watson C, et al. On learning meaningful code changes via neural machine translation. In: Proceedings of the 41st International Conference on Software Engineering, 2019. 25–36
Nyamawe A S, Liu H, Niu N, et al. Feature requests-based recommendation of software refactorings. Empir Software Eng, 2020, 25: 4315–4347
AlOmar E A, Ivanov A, Kurbatova Z, et al. Just-in-time code duplicates extraction. Inf Software Tech, 2023, 158: 107169
Chi X Y, Liu H, Li G J, et al. An automated approach to extracting local variables. In: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, San Francisco, 2023
Desai U, Bandyopadhyay S, Tamilselvam S. Graph neural network to dilute outliers for refactoring monolith application. In: Proceedings of the AAAI Conference on Artificial Intelligence, 2021. 72–80
Madeyski L, Lewowski T. MLCQ: industry-relevant code smell data set. In: Proceedings of the 24th Evaluation and Assessment in Software Engineering, 2020. 342–347
Liu B, Liu H, Li G J, et al. Deep learning based feature envy detection boosted by real-world examples. In: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, San Francisco, 2023
Tsantalis N, Ketkar A, Dig D. RefactoringMiner 2.0. IEEE Trans Software Eng, 2022, 48: 930–950
Silva D, da Silva J P, Santos G, et al. RefDiff 2.0: a multi-language refactoring detection tool. IEEE Trans Software Eng, 2021, 47: 2786–2802
Kim M, Gee M, Loh A, et al. Ref-Finder: a refactoring reconstruction tool based on logic query templates. In: Proceedings of the 18th ACM SIGSOFT International Symposium on Foundations of Software Engineering, Santa Fe, 2010. 371–372
Yin X, Shi C, Zhao S. Local and global feature based explainable feature envy detection. In: Proceedings of the IEEE 45th Annual Computers, Software, and Applications Conference, 2021. 942–951
Liu B, Liu H, Li G J, et al. Automated software entity matching between successive versions. In: Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering, 2023
Svajlenko J, Islam J F, Keivanloo I, et al. Towards a big data curated benchmark of inter-project code clones. In: Proceedings of the IEEE International Conference on Software Maintenance and Evolution, 2014. 476–480
Chochlov M, Ahmed G A, Patten J V, et al. Using a nearest-neighbour, BERT-based approach for scalable clone detection. In: Proceedings of the IEEE International Conference on Software Maintenance and Evolution (ICSME), 2022. 582–591
Sajnani H, Saini V, Svajlenko J, et al. SourcererCC: scaling code clone detection to big-code. In: Proceedings of IEEE/ACM 38th International Conference on Software Engineering (ICSE), 2016. 1157–1168
Arshad S, Abid S, Shamail S. CodeBERT for code clone detection: a replication study. In: Proceedings of the IEEE 16th International Workshop on Software Clones (IWSC), 2022. 39–45
Mehrotra N, Agarwal N, Gupta P, et al. Modeling functional similarity in source code with graph-based siamese networks. IEEE Trans Software Eng, 2022, 48: 3771–3789
Xue Z, Jiang Z, Huang C, et al. SEED: semantic graph based deep detection for Type-4 clone. In: Proceedings of Reuse and Software Quality, 2022. 120–137
Karthik S, Rajdeepa B. A collaborative method for code clone detection using a deep learning model. Adv Eng Software, 2022, 174: 103327
Li B, Ye C, Guan S, et al. Semantic code clone detection via event embedding tree and GAT network. In: Proceedings of the IEEE 20th International Conference on Software Quality, Reliability and Security (QRS), 2020. 382–393
Zhang A, Liu K, Fang L, et al. Learn to align: a code alignment network for code clone detection. In: Proceedings of the 28th Asia-Pacific Software Engineering Conference (APSEC), 2021. 1–11
Jo Y B, Lee J, Yoo C J. Two-pass technique for clone detection and type classification using tree-based convolution neural network. Appl Sci, 2021, 11: 6613
Kim D K. A deep neural network-based approach to finding similar code segments. IEICE Trans Inf Syst, 2020, E103.D: 874–878
Wu Y, Zou D, Dou S, et al. SCDetector: software functional clone detection based on semantic tokens analysis. In: Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2020. 821–833
Feng C, Wang T, Yu Y, et al. Sia-RAE: a siamese network based on recursive AutoEncoder for effective clone detection. In: Proceedings of the 27th Asia-Pacific Software Engineering Conference (APSEC), 2020. 238–246
Yuan Y, Kong W, Hou G, et al. From local to global semantic clone detection. In: Proceedings of the 6th International Conference on Dependable Systems and Their Applications (DSA), 2020. 13–24
Hua W, Sui Y, Wan Y, et al. FCCA: hybrid code representation for functional clone detection using attention networks. IEEE Trans Rel, 2021, 70: 304–318
Wang W, Li G, Ma B, et al. Detecting code clones with graph neural network and flow-augmented abstract syntax tree. In: Proceedings of the IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER), 2020. 261–271
Fang C, Liu Z, Shi Y, et al. Functional code clone detection with syntax and semantics fusion learning. In: Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis, 2020. 516–527
Guo C, Yang H, Huang D, et al. Review sharing via deep semi-supervised code clone detection. IEEE Access, 2020, 8: 24948–24965
Meng Y, Liu L. A deep learning approach for a source code detection model using self-attention. Complexity, 2020, 2020: 1–15
Zeng J, Ben K, Li X, et al. Fast code clone detection based on weighted recursive autoencoders. IEEE Access, 2019, 7: 125062
Zhang Y Y, Li M. Find me if you can: deep software clone detection by exploiting the contest between the plagiarist and the detector. In: Proceedings of the AAAI Conference on Artificial Intelligence, 2019. 33: 5813–5820
Büch L, Andrzejak A. Learning-based recursive aggregation of abstract syntax trees for code clone detection. In: Proceedings of the IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER), 2019. 95–104
Yu H, Lam W, Chen L, et al. Neural detection of semantic code clones via tree-based convolution. In: Proceedings of the IEEE/ACM 27th International Conference on Program Comprehension (ICPC), 2019. 70–80
Wang C, Gao J, Jiang Y, et al. Go-clone: graph-embedding based clone detector for Golang. In: Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis, 2019. 374–377
Shi H, Wang R, Fu Y, et al. Vulnerable code clone detection for operating system through correlation-induced learning. IEEE Trans Ind Inf, 2019, 15: 6551–6559
Saini V, Farmahinifarahani F, Lu Y, et al. Oreo: detection of clones in the twilight zone. In: Proceedings of the 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2018. 354–365
Zhao G, Huang J. DeepSim: deep learning code functional similarity. In: Proceedings of the 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2018. 141–151
Sheneamer A. CCDLC detection framework-combining clustering with deep learning classification for semantic clones. In: Proceedings of the 17th IEEE International Conference on Machine Learning and Applications (ICMLA), 2018. 701–706
Wei H H, Li M. Positive and unlabeled learning for detecting software functional clones with adversarial training. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence, 2018. 2840–2846
Wei H H, Li M. Supervised deep features for software functional clone detection by exploiting lexical and syntactical information in source code. In: Proceedings of the 26th International Joint Conference on Artificial Intelligence, 2017. 3034–3040
White M, Tufano M, Vendome C, et al. Deep learning code fragments for code clone detection. In: Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, 2016. 87–98
Sheneamer A, Kalita J. Semantic clone detection using machine learning. In: Proceedings of the 15th IEEE International Conference on Machine Learning and Applications (ICMLA), 2016. 1024–1028
Zhang J, Wang X, Zhang H, et al. A novel neural source code representation based on abstract syntax tree. In: Proceedings of the IEEE/ACM 41st International Conference on Software Engineering (ICSE), 2019. 783–794
Wu M, Wang P, Yin K, et al. LVMapper: a large-variance clone detector using sequencing alignment approach. IEEE Access, 2020, 8: 27986–27997
Li L, Feng H, Zhuang W, et al. CCLearner: a deep learning-based clone detection approach. In: Proceedings of the IEEE International Conference on Software Maintenance and Evolution (ICSME), 2017. 249–260
Jiang L, Misherghi G, Su Z, et al. DECKARD: scalable and accurate tree-based detection of code clones. In: Proceedings of the 29th International Conference on Software Engineering, 2007. 96–105
Svajlenko J, Roy C K. Fast and flexible large-scale clone detection with CloneWorks. In: Proceedings of the IEEE/ACM 39th International Conference on Software Engineering Companion (ICSE-C), 2017. 27–30
Roy C K, Cordy J R. NICAD: accurate detection of near-miss intentional clones using flexible pretty-printing and code normalization. In: Proceedings of the 16th IEEE International Conference on Program Comprehension, 2008. 172–181
Kim S, Woo S, Lee H, et al. VUDDY: a scalable approach for vulnerable code clone discovery. In: Proceedings of the IEEE Symposium on Security and Privacy (SP), 2017. 595–614
Wang D, Jia Z, Li S, et al. Bridging pre-trained models and downstream tasks for source code understanding. In: Proceedings of the IEEE/ACM 44th International Conference on Software Engineering (ICSE), 2022. 287–298
Siow J K, Liu S, Xie X, et al. Learning program semantics with code representations: an empirical study. In: Proceedings of the IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 2022. 554–565
Karakatič S, Miloševič A, Heričko T. Software system comparison with semantic source code embeddings. Empir Software Eng, 2022, 27: 70
Bui N D Q, Yu Y, Jiang L. InferCode: self-supervised learning of code representations by predicting subtrees. In: Proceedings of the IEEE/ACM 43rd International Conference on Software Engineering (ICSE), 2021. 1186–1197
Wu Q, Jiang X, Zheng Z, et al. Code representation based on hybrid graph modelling. In: Proceedings of Neural Information Processing. Cham: Springer International Publishing, 2021. 298–306
Chen L, Ye W, Zhang S. Capturing source code semantics via tree-based convolution over API-enhanced AST. In: Proceedings of the 16th ACM International Conference on Computing Frontiers, 2019. 174–182
Gao Y, Wang Z, Liu S, et al. TECCD: a tree embedding approach for code clone detection. In: Proceedings of the IEEE International Conference on Software Maintenance and Evolution (ICSME), 2019. 145–156
Tufano M, Watson C, Bavota G, et al. Deep learning similarities from different representations of source code. In: Proceedings of the IEEE/ACM 15th International Conference on Mining Software Repositories (MSR), 2018. 542–553
Mou L, Li G, Zhang L, et al. Convolutional neural networks over tree structures for programming language processing. In: Proceedings of the 30th AAAI Conference on Artificial Intelligence, 2016. 1287–1293
Wang P, Svajlenko J, Wu Y, et al. CCAligner: a token based large-gap clone detector. In: Proceedings of the IEEE/ACM 40th International Conference on Software Engineering (ICSE), 2018. 1066–1077
Terra R, Miranda L F, Valente M T, et al. Qualitas.class corpus: a compiled version of the qualitas corpus. SIGSOFT Softw Eng Notes, 2013, 38: 1–4
Yahya M A, Kim D K. CLCD-I: cross-language clone detection by using deep learning with InferCode. Computers, 2023, 12: 12
Wang K, Yan M, Zhang H, et al. Unified abstract syntax tree representation learning for cross-language program classification. In: Proceedings of the IEEE/ACM 30th International Conference on Program Comprehension (ICPC), 2022. 390–400
Bui N D Q, Yu Y, Jiang L. Bilateral dependency neural networks for cross-language algorithm classification. In: Proceedings of the IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER), 2019. 422–433
Nafi K W, Kar T S, Roy B, et al. CLCDSA: cross language code clone detection using syntactical features and API documentation. In: Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2019. 1026–1037
Bromley J, Guyon I, LeCun Y, et al. Signature verification using a “Siamese” time delay neural network. In: Proceedings of the 6th International Conference on Neural Information Processing Systems, San Francisco, 1993. 737–744
Vislavski T, Rakić G, Cardozo N, et al. LICCA: a tool for cross-language clone detection. In: Proceedings of the IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER), 2018. 512–516
Cheng X, Peng Z, Jiang L, et al. Mining revision histories to detect cross-language clones without intermediates. In: Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering (ASE), 2016. 696–701
Marastoni N, Giacobazzi R, Preda M D. A deep learning approach to program similarity. In: Proceedings of the 1st International Workshop on Machine Learning and Software Engineering in Symbiosis, 2018. 26–35
Xue H, Venkataramani G, Lan T. Clone-Slicer: detecting domain specific binary code clones through program slicing. In: Proceedings of the Workshop on Forming an Ecosystem Around Software Transformation, 2018. 27–33
Xu X, Liu C, Feng Q, et al. Neural network-based graph embedding for cross-platform binary code similarity detection. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, 2017. 363–376
Xue H, Venkataramani G, Lan T. Clone-Hunter: accelerated bound checks elimination via binary code clone detection. In: Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages, 2018. 11–19
Feng Q, Zhou R, Xu C, et al. Scalable graph-based bug search for firmware images. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, 2016. 480–491
Mostaeen G, Svajlenko J, Roy B, et al. On the use of machine learning techniques towards the design of cloud based automatic code clone validation tools. In: Proceedings of the IEEE 18th International Working Conference on Source Code Analysis and Manipulation (SCAM), 2018. 155–164
Saini V, Farmahinifarahani F, Lu Y, et al. Towards automating precision studies of clone detectors. In: Proceedings of the IEEE/ACM 41st International Conference on Software Engineering (ICSE), 2019. 49–59
Liu C, Lin Z, Lou J G, et al. Can neural clone detection generalize to unseen functionalities? In: Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2021. 617–629
Yu H, Hu X, Li G, et al. Assessing and improving an evaluation dataset for detecting semantic code clones via deep learning. ACM Trans Softw Eng Methodol, 2022, 31: 1–25
Krinke J, Ragkhitwetsagul C. BigCloneBench considered harmful for machine learning. In: Proceedings of the IEEE 16th International Workshop on Software Clones (IWSC), 2022. 1–7
Al-Omari F, Roy C K, Chen T. SemanticCloneBench: a semantic code clone benchmark using crowd-source knowledge. In: Proceedings of the IEEE 14th International Workshop on Software Clones (IWSC), 2020. 57–63
Kamp M, Kreutzer P, Philippsen M. SeSaMe: a data set of semantically similar Java methods. In: Proceedings of the IEEE/ACM 16th International Conference on Mining Software Repositories (MSR), 2019. 529–533
Yang X, Lo D, Xia X, et al. Deep learning for just-in-time defect prediction. In: Proceedings of the IEEE International Conference on Software Quality, Reliability and Security, 2015. 17–26
Phan A V, Nguyen M L, Bui L T. Convolutional neural networks over control flow graphs for software defect prediction. In: Proceedings of the IEEE 29th International Conference on Tools with Artificial Intelligence (ICTAI), 2017. 45–52
Li J, He P, Zhu J, et al. Software defect prediction via convolutional neural network. In: Proceedings of the IEEE International Conference on Software Quality, Reliability and Security (QRS), 2017. 318–328
Huo X, Yang Y, Li M, et al. Learning semantic features for software defect prediction by code comments embedding. In: Proceedings of the IEEE International Conference on Data Mining (ICDM), 2018. 1049–1054
Liu Y, Li Y, Guo J, et al. Connecting software metrics across versions to predict defects. In: Proceedings of the IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER), 2018. 232–243
Tong H, Liu B, Wang S. Software defect prediction using stacked denoising autoencoders and two-stage ensemble learning. Inf Software Tech, 2018, 96: 94–111
Qiu S, Lu L, Cai Z, et al. Cross-project defect prediction via transferable deep learning-generated and handcrafted features. In: Proceedings of International Conference on Software Engineering and Knowledge Engineering, 2019
Hoang T, Dam H K, Kamei Y, et al. DeepJIT: an end-to-end deep learning framework for just-in-time defect prediction. In: Proceedings of the IEEE/ACM 16th International Conference on Mining Software Repositories (MSR), 2019. 34–45
Zhou T, Sun X, Xia X, et al. Improving defect prediction with deep forest. Inf Software Tech, 2019, 114: 204–216
Xu Z, Li S, Xu J, et al. LDFR: learning deep feature representation for software defect prediction. J Syst Software, 2019, 158: 110402
Turabieh H, Mafarja M, Li X. Iterated feature selection algorithms with layered recurrent neural network for software fault prediction. Expert Syst Appl, 2019, 122: 27–42
Dam H K, Pham T, Ng S W, et al. Lessons learned from using a deep tree-based model for software defect prediction in practice. In: Proceedings of the 16th International Conference on Mining Software Repositories, 2019. 46–57
Li H, Li X, Chen X, et al. Cross-project defect prediction via AST Token2Vec and BLSTM-based neural network. In: Proceedings of the International Joint Conference on Neural Networks (IJCNN), 2019. 1–8
Chen J, Hu K, Yu Y, et al. Software visualization and deep transfer learning for effective software defect prediction. In: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, 2020. 578–589
Zhu K, Zhang N, Ying S, et al. Within-project and cross-project just-in-time defect prediction based on denoising autoencoder and convolutional neural network. IET Softw, 2020, 14: 185–195
Wang S, Liu T, Nam J, et al. Deep semantic feature learning for software defect prediction. IEEE Trans Software Eng, 2020, 46: 1267–1293
Deng J, Lu L, Qiu S. Software defect prediction via LSTM. IET Softw, 2020, 14: 443–450
Shi K, Lu Y, Chang J, et al. PathPair2Vec: an AST path pair-based code representation method for defect prediction. J Comput Languages, 2020, 59: 100979
Majd A, Vahidi-Asl M, Khalilian A, et al. SLDeep: statement-level software defect prediction using deep-learning model on static code features. Expert Syst Appl, 2020, 147: 113156
Wen M, Wu R, Cheung S C. How well do change sequences predict defects? Sequence learning from software changes. IEEE Trans Software Eng, 2020, 46: 1155–1175
Shi K, Lu Y, Liu G, et al. MPT-embedding: an unsupervised representation learning of code for software defect prediction. J Software Evolu Process, 2021, 33: e2330
Xu Z, Zhao K, Zhang T, et al. Effort-aware just-in-time bug prediction for mobile apps via cross-triplet deep feature embedding. IEEE Trans Rel, 2022, 71: 204–220
Xu J, Wang F, Ai J. Defect prediction with semantics and context features of codes based on graph representation learning. IEEE Trans Rel, 2021, 70: 613–625
Zeng C, Zhou C Y, Lv S K, et al. GCN2defect: graph convolutional networks for SMOTETomek-based software defect prediction. In: Proceedings of the IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE), 2021. 69–79
Xu J, Ai J, Liu J, et al. ACGDP: an augmented code graph-based system for software defect prediction. IEEE Trans Rel, 2022, 71: 850–864
Wang H, Zhuang W, Zhang X. Software defect prediction based on gated hierarchical LSTMs. IEEE Trans Rel, 2021, 70: 711–727
Zou Q, Lu L, Yang Z, et al. Joint feature representation learning and progressive distribution matching for cross-project defect prediction. Inf Software Tech, 2021, 137: 106588
Zhang N, Ying S, Zhu K, et al. Software defect prediction based on stacked sparse denoising autoencoders and enhanced extreme learning machine. IET Softw, 2022, 16: 29–47
Uddin M N, Li B, Ali Z, et al. Software defect prediction employing BiLSTM and BERT-based semantic feature. Soft Comput, 2022, 26: 7877–7891
Ardimento P, Aversano L, Bernardi M L, et al. Just-in-time software defect prediction using deep temporal convolutional networks. Neural Comput Applic, 2022, 34: 3981–4001
Pornprasit C, Tantithamthavorn C K. DeepLineDP: towards a deep learning approach for line-level defect prediction. IEEE Trans Software Eng, 2023, 49: 84–98
Qiu S, Huang H, Jiang W, et al. Defect prediction via tree-based encoding with hybrid granularity for software sustainability. IEEE Trans Sustain Comput, 2024, 9: 249–260
Johnson S C. Lint, a C program checker. 1977. oai:CiteSeerX.psu:10.1.1.56.1841
Hovemeyer D, Pugh W. Finding bugs is easy. ACM SIGPLAN Not, 2004, 39: 92–106
Facebook. Infer: a tool to detect bugs in Java and C/C++/Objective-C code before it ships, 2015. https://fbinfer.com/
Orso A, Rothermel G. Software testing: a research travelogue (2000–2014). In: Proceedings of the Future of Software Engineering, 2014
Cadar C, Dunbar D, Engler D R. KLEE: unassisted and automatic generation of high-coverage tests for complex systems programs. In: Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, 2008
Nelson L, Sigurbjarnarson H, Zhang K, et al. Hyperkernel: push-button verification of an OS kernel. In: Proceedings of the 26th ACM Symposium on Operating Systems Principles (SOSP), 2017
Leroy X. Formal verification of a realistic compiler. Commun ACM, 2009, 52: 107–115
Klein G, Andronick J, Elphinstone K, et al. seL4: formal verification of an OS kernel. Commun ACM, 2010, 53: 107–115
D’Silva V, Kroening D, Weissenbacher G. A survey of automated techniques for formal software verification. IEEE Trans Comput-Aided Des Integr Circ Syst, 2008, 27: 1165–1178
Knuth D E. The Art of Computer Programming, Volume 1: Fundamental Algorithms. 3rd ed. Reading: Addison-Wesley Professional, 1997
Hou X, Zhao Y, Liu Y, et al. Large language models for software engineering: a systematic literature review. 2023. ArXiv:2308.10620
Fan A, Gokkaya B, Harman M, et al. Large language models for software engineering: survey and open problems. 2023. ArXiv:2310.03533
Silver D, Huang A, Maddison C J, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 2016, 529: 484–489
Qiao S, Ou Y, Zhang N, et al. Reasoning with language model prompting: a survey. 2022. ArXiv:2212.09597
Huang J, Chang K C C. Towards reasoning in large language models: a survey. 2022. ArXiv:2212.10403
Abelson H, Sussman G J. Structure and Interpretation of Computer Programs. 2nd ed. Cambridge: The MIT Press, 1996
Hindle A, Barr E T, Gabel M, et al. On the naturalness of software. In: Proceedings of the 34th International Conference on Software Engineering (ICSE), 2012
van Rossum G, Warsaw B, Coghlan N. PEP 8 – style guide for Python code. 2001. https://peps.python.org/pep-0008/
Reddy A. Java coding style guide, 2000
Engler D, Chen D Y, Hallem S, et al. Bugs as deviant behavior: a general approach to inferring errors in systems code. SIGOPS Oper Syst Rev, 2001, 35: 57–72
Li Z, Lu S, Myagmar S, et al. CP-Miner: finding copy-paste and related bugs in large-scale software code. IEEE Trans Software Eng, 2006, 32: 176–192
Allamanis M, Jackson-Flux H, Brockschmidt M. Self-supervised bug detection and repair. In: Proceedings of Advances in Neural Information Processing Systems, 2021. 27865–27876
Sharma T, Kechagia M, Georgiou S, et al. A survey on machine learning techniques for source code analysis. 2021. ArXiv:2110.09610v2
Jiang Y, Liu H, Zhang Y, et al. Do bugs lead to unnaturalness of source code? In: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2022
Rice H G. Classes of recursively enumerable sets and their decision problems. Trans Amer Math Soc, 1953, 74: 358–366
Livshits B, Sridharan M, Smaragdakis Y, et al. In defense of soundiness: a manifesto. Commun ACM, 2015, 58: 44–46
Heo K, Oh H, Yang H. Resource-aware program analysis via online abstraction coarsening. In: Proceedings of IEEE/ACM 41st International Conference on Software Engineering (ICSE), 2019
Ko Y, Oh H. Learning to boost disjunctive static bug-finders. In: Proceedings of IEEE/ACM 45th International Conference on Software Engineering (ICSE), 2023
Li H, Hao Y, Zhai Y, et al. The hitchhiker’s guide to program analysis: a journey with large language models. 2023. ArXiv:2308.00245
Chae K, Oh H, Heo K, et al. Automatically generating features for learning program analysis heuristics for C-like languages. In: Proceedings of the ACM on Programming Languages, 2017
Heo K, Oh H, Yi K. Machine-learning-guided selectively unsound static analysis. In: Proceedings of IEEE/ACM 39th International Conference on Software Engineering (ICSE), 2017
Jeon M, Lee M, Oh H. Learning graph-based heuristics for pointer analysis without handcrafting application-specific features. In: Proceedings of the ACM on Programming Languages, 2020
Jeong S, Jeon M, Cha S, et al. Data-driven context-sensitivity for points-to analysis. In: Proceedings of the ACM on Programming Languages, 2017
He J, Singh G, Püschel M, et al. Learning fast and precise numerical analysis. In: Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation, 2020
Zaremba W, Sutskever I. Learning to execute. 2014. ArXiv:1410.4615
Malik R S, Patra J, Pradel M. NL2Type: inferring JavaScript function types from natural language information. In: Proceedings of IEEE/ACM 41st International Conference on Software Engineering (ICSE), 2019
Jesse K, Devanbu P T, Ahmed T. Learning type annotation: is big data enough? In: Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2021
Yu D, Yang B, Liu D, et al. A survey on neural-symbolic learning systems. Neural Netws, 2023, 166: 105–126
Wang W, Yang Y, Wu F. Towards data-and knowledge-driven AI: a survey on neuro-symbolic computing. IEEE Trans Pattern Anal Mach Intell, 2024. doi: 10.1109/TPAMI.2024.3483273
She D, Pei K, Epstein D, et al. NEUZZ: efficient fuzzing with neural program smoothing. In: Proceedings of the IEEE Symposium on Security and Privacy (SP), 2019. 803–817
She D, Krishna R, Yan L, et al. MTFuzz: fuzzing with a multi-task neural network. In: Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2020. 737–749
Wu M, Jiang L, Xiang J, et al. Evaluating and improving neural program-smoothing-based fuzzing. In: Proceedings of the 44th International Conference on Software Engineering, 2022. 847–858
Nicolae M I, Eisele M, Zeller A. Revisiting neural program smoothing for fuzzing. In: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2023
Zeller A. Mining specifications: a roadmap. In: The Future of Software Engineering. Berlin: Springer, 2011
Serebryany K, Bruening D, Potapenko A, et al. AddressSanitizer: a fast address sanity checker. In: Proceedings of USENIX Annual Technical Conference, 2012
Serebryany K, Iskhodzhanov T. ThreadSanitizer: data race detection in practice. In: Proceedings of the Workshop on Binary Instrumentation and Applications, 2009. 62–71
Jackson D. Software Abstractions: Logic, Language, and Analysis. Cambridge: The MIT Press, 2012
Lemieux C, Inala J P, Lahiri S K, et al. CODAMOSA: escaping coverage plateaus in test generation with pre-trained large language models. In: Proceedings of IEEE/ACM 45th International Conference on Software Engineering (ICSE), 2023
Khanfir A, Degiovanni R, Papadakis M, et al. Efficient mutation testing via pre-trained language models. 2023. ArXiv:2301.03543v1
Chen Z, Liu J, Gu W, et al. Experience report: deep learning-based system log analysis for anomaly detection. 2021. ArXiv:2107.05908
Wang J, Huang Y, Chen C, et al. Software testing with large language model: survey, landscape, and vision. 2023. ArXiv:2307.07221
Durelli V H S, Durelli R S, Borges S S, et al. Machine learning applied to software testing: a systematic mapping study. IEEE Trans Rel, 2019, 68: 1189–1212
Tufano M, Drain D, Svyatkovskiy A, et al. Unit test case generation with transformers and focal context. 2020. ArXiv:2009.05617v2
Watson C, Tufano M, Moran K, et al. On learning meaningful assert statements for unit test cases. In: Proceedings of IEEE/ACM 42nd International Conference on Software Engineering (ICSE), 2020
Tufano M, Drain D, Svyatkovskiy A, et al. Generating accurate assert statements for unit test cases using pretrained transformers. 2022. ArXiv:2009.05634
Blasi A, Gorla A, Ernst M D, et al. Call Me Maybe: using NLP to automatically generate unit test cases respecting temporal constraints. In: Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, 2022
Dinella E, Ryan G, Mytkowicz T, et al. TOGA: a neural method for test oracle generation. 2022. ArXiv:2109.09262
Xie Z, Chen Y, Zhi C, et al. ChatUniTest: a ChatGPT-based automated unit test generation tool. 2023. ArXiv:2305.04764
Alagarsamy S, Tantithamthavorn C, Aleti A. A3Test: assertion-augmented automated test case generation. 2023. ArXiv:2302.10352
Feldmeier P, Fraser G. Neuroevolution-based generation of tests and oracles for games. In: Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, 2022
Schäfer M, Nadi S, Eghbali A, et al. Adaptive test generation using a large language model. 2023. ArXiv:2302.06527
Siddiq M L, Santos J, Tanvir R H, et al. Exploring the effectiveness of large language models in generating unit tests. 2023. ArXiv:2305.00418v1
Hossain S B, Filieri A, Dwyer M B, et al. Neural-based test oracle generation: a large-scale evaluation and lessons learned. In: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2023
Liu Z, Liu K, Xia X, et al. Towards more realistic evaluation for neural test oracle generation. 2023. ArXiv:2305.17047
Yuan Z, Lou Y, Liu M, et al. No more manual tests? Evaluating and improving ChatGPT for unit test generation. 2023. ArXiv:2305.04207
Wong W E, Horgan J R, London S, et al. A study of effective regression testing in practice. In: Proceedings of the 8th International Symposium on Software Reliability Engineering, 1997
Yoo S, Harman M. Regression testing minimization, selection and prioritization: a survey. Softw Test Verif Reliab, 2012, 22: 67–120
Manes V J M, Han H S, Han C, et al. The art, science, and engineering of fuzzing: a survey. IEEE Trans Software Eng, 2021, 47: 2312–2331
Zhu X, Wen S, Camtepe S, et al. Fuzzing: a survey for roadmap. ACM Comput Surv, 2022, 54: 1–36
Li J, Zhao B, Zhang C. Fuzzing: a survey. Cybersecurity, 2018, 1: 6
Lee M, Cha S, Oh H. Learning seed-adaptive mutation strategies for greybox fuzzing. In: Proceedings of IEEE/ACM 45th International Conference on Software Engineering (ICSE), 2023
Wang J, Song C, Yin H. Reinforcement learning-based hierarchical seed scheduling for greybox fuzzing. In: Proceedings of Network and Distributed System Security (NDSS) Symposium, 2021
Wang Y, Wu Z, Wei Q, et al. NeuFuzz: efficient fuzzing with deep neural network. IEEE Access, 2019, 7: 36340–36352
Deng Y, Xia C S, Peng H, et al. Large language models are zero-shot fuzzers: fuzzing deep-learning libraries via large language models. 2023. ArXiv:2212.14834
Deng Y, Xia C S, Yang C, et al. Large language models are edge-case fuzzers: testing deep learning libraries via FuzzGPT. 2023. ArXiv:2304.02014
Yang C, Deng Y, Lu R, et al. White-box compiler fuzzing empowered by large language models. 2023. ArXiv:2310.15991
Xia C S, Paltenghi M, Tian J L, et al. Universal fuzzing via large language models. 2023. ArXiv:2308.04748v1
Ye G, Tang Z, Tan S H, et al. Automated conformance testing for JavaScript engines via deep compiler fuzzing. In: Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, 2021. 435–450
Cummins C, Petoumenos P, Murray A, et al. Compiler fuzzing through deep learning. In: Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis, 2018
Lin M, Zeng Y, Li Y. RegFuzz: a linear regression-based approach for seed scheduling in directed fuzzing. In: Proceedings of the 4th Information Communication Technologies Conference (ICTC), 2023
Meng R, Mirchev M, Böhme M, et al. Large language model guided protocol fuzzing. In: Proceedings of Network and Distributed System Security (NDSS) Symposium, 2024
Su J, Dai H N, Zhao L, et al. Effectively generating vulnerable transaction sequences in smart contracts with reinforcement learning-guided fuzzing. In: Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, 2022
Luo W, Chai D, Ruan X, et al. Graph-based fuzz testing for deep learning inference engines. In: Proceedings of IEEE/ACM 43rd International Conference on Software Engineering (ICSE), 2021
Chen Y, Poskitt C M, Sun J, et al. Learning-guided network fuzzing for testing cyber-physical system defences. In: Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2019
Jiang L, Yuan H, Wu M, et al. Evaluating and improving hybrid fuzzing. In: Proceedings of IEEE/ACM 45th International Conference on Software Engineering (ICSE), 2023
He J, Balunović M, Ambroladze N, et al. Learning to fuzz from symbolic execution with application to smart contracts. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, 2019. 531–548
Jia H, Wen M, Xie Z, et al. Detecting JVM JIT compiler bugs via exploring two-dimensional input spaces. In: Proceedings of IEEE/ACM 45th International Conference on Software Engineering (ICSE), 2023
Zheng Y, Liu Y, Xie X, et al. Automatic web testing using curiosity-driven reinforcement learning. In: Proceedings of the 43rd International Conference on Software Engineering, 2021. 423–435
Zhang S, Liu S, Sun J, et al. FIGCPS: effective failure-inducing input generation for cyber-physical systems with deep reinforcement learning. In: Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2021
Liu Z, Chen C, Wang J, et al. Fill in the blank: context-aware automated text input generation for mobile GUI testing. In: Proceedings of IEEE/ACM 45th International Conference on Software Engineering (ICSE), 2023
YazdaniBanafsheDaragh F, Malek S. Deep GUI: black-box GUI input generation with deep learning. In: Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2021
Feng S, Xie M, Chen C. Efficiency matters: speeding up automated testing with GUI rendering inference. In: Proceedings of IEEE/ACM 45th International Conference on Software Engineering (ICSE), 2023
Ran D, Wang H, Wang W, et al. Badge: prioritizing UI events with hierarchical multi-armed bandits for automated UI testing. In: Proceedings of IEEE/ACM 45th International Conference on Software Engineering (ICSE), 2023
Pan M, Huang A, Wang G, et al. Reinforcement learning based curiosity-driven testing of Android applications. In: Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis, 2020. 153–164
Zhao Y, Talebipour S, Baral K, et al. Avgust: automating usage-based test generation from videos of app executions. In: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2022
Wang X, Zhao L. APICAD: augmenting API misuse detection through specifications from code and documents. In: Proceedings of IEEE/ACM 45th International Conference on Software Engineering (ICSE), 2023
Kim M, Corradini D, Sinha S, et al. Enhancing REST API testing with NLP techniques. In: Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, 2023
Kim M, Sinha S, Orso A. Adaptive REST API testing with reinforcement learning. 2023. ArXiv:2309.04583
Alyahya T N, Menai M E B, Mathkour H. On the structure of the boolean satisfiability problem: a survey. ACM Comput Surv, 2023, 55: 1–34
Guo W, Zhen H L, Li X, et al. Machine learning methods in solving the Boolean satisfiability problem. Mach Intell Res, 2023, 20: 640–655
Avgerinos T, Rebert A, Cha S K, et al. Enhancing symbolic execution with veritesting. In: Proceedings of the 36th International Conference on Software Engineering, 2014. 1083–1094
Baldoni R, Coppa E, D’Elia D C, et al. A survey of symbolic execution techniques. ACM Comput Surv, 2019, 51: 1–39
He J, Sivanrupan G, Tsankov P, et al. Learning to explore paths for symbolic execution. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, 2021
Cha S, Oh H. Concolic testing with adaptively changing search heuristics. In: Proceedings of the 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2019
Cha S, Hong S, Lee J, et al. Automatically generating search heuristics for concolic testing. In: Proceedings of IEEE/ACM 40th International Conference on Software Engineering (ICSE), 2018
Zhang T, Zhang Y, Chen Z, et al. Efficient multiplex symbolic execution with adaptive search strategy. In: Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2020
Cha S, Oh H. Making symbolic execution promising by learning aggressive state-pruning strategy. In: Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2020
Chen Z, Chen Z, Shuai Z, et al. Synthesize solving strategy for symbolic execution. In: Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, 2021. 348–360
Luo S, Xu H, Bi Y, et al. Boosting symbolic execution via constraint solving time prediction (experience paper). In: Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, 2021. 336–347
Cha S, Lee M, Lee S, et al. SYMTUNER: maximizing the power of symbolic execution by adaptively tuning external parameters. In: Proceedings of IEEE/ACM 44th International Conference on Software Engineering (ICSE), 2022
Chen J, Hu W, Zhang L, et al. Learning to accelerate symbolic execution via code transformation. In: Proceedings of the 32nd European Conference on Object-Oriented Programming, 2018
The Coq Development Team. The Coq proof assistant. 1984. https://coq.inria.fr/coq-84
The Isabelle Development Team. Isabelle. 1986. https://isabelle.in.tum.de/index.html
Paulson L C. Natural deduction as higher-order resolution. 1986. ArXiv:cs/9301104
Lample G, Lachaux M A, Lavril T, et al. HyperTree proof search for neural theorem proving. 2022. ArXiv:2205.11491
Wu Y, Jiang A Q, Li W, et al. Autoformalization with large language models. In: Proceedings of Advances in Neural Information Processing Systems, 2022
First E, Brun Y. Diversity-driven automated formal verification. In: Proceedings of the 44th International Conference on Software Engineering, 2022. 749–761
Yang K, Swope A M, Gu A, et al. LeanDojo: theorem proving with retrieval-augmented language models. 2023. ArXiv:2306.15626
Chakraborty S, Lahiri S K, Fakhoury S, et al. Ranking LLM-generated loop invariants for program verification. 2023. ArXiv:2310.09342
Zimmeck S, Wang Z, Zou L, et al. Automated analysis of privacy requirements for mobile apps. In: Proceedings of the AAAI Fall Symposium Series, 2016
Mahanipour A, Nezamabadi-pour H. GSP: an automatic programming technique with gravitational search algorithm. Appl Intell, 2019, 49: 1502–1516
Mikolov T, Sutskever I, Chen K, et al. Distributed representations of words and phrases and their compositionality. In: Proceedings of Advances in Neural Information Processing Systems, 2013. 26
Liu S, Zhao B, Guo R, et al. Have you been properly notified? Automatic compliance analysis of privacy policy text with GDPR article 13. In: Proceedings of the Web Conference 2021, 2021. 2154–2164
Rubio-González C, Liblit B. Expect the unexpected: error code mismatches between documentation and the real world. In: Proceedings of the 9th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering, 2010. 73–80
Tan L, Yuan D, Krishna G, et al. /*iComment: bugs or bad comments?*/. In: Proceedings of the 21st ACM SIGOPS Symposium on Operating Systems Principles, 2007. 145–158
Tan S H, Marinov D, Tan L, et al. @tComment: testing Javadoc comments to detect comment-code inconsistencies. In: Proceedings of the IEEE 5th International Conference on Software Testing, Verification and Validation, 2012. 260–269
Wen F, Nagy C, Bavota G, et al. A large-scale empirical study on code-comment inconsistencies. In: Proceedings of the IEEE/ACM 27th International Conference on Program Comprehension (ICPC), 2019. 53–64
Pandita R, Taneja K, Williams L, et al. ICON: inferring temporal constraints from natural language API descriptions. In: Proceedings of the IEEE International Conference on Software Maintenance and Evolution (ICSME), 2016. 378–388
Ren X, Ye X, Xing Z, et al. API-misuse detection driven by fine-grained API-constraint knowledge graph. In: Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering, 2020. 461–472
Lv T, Li R, Yang Y, et al. RTFM! automatic assumption discovery and verification derivation from library document for API misuse detection. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, 2020. 1837–1852
Yun I, Min C, Si X, et al. APISan: sanitizing API usages through semantic cross-checking. In: Proceedings of USENIX Security Symposium, 2016. 363–378
Kang Y, Ray B, Jana S. APEx: automated inference of error specifications for C APIs. In: Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, 2016. 472–482
Li C, Zhou M, Gu Z, et al. Ares: inferring error specifications through static analysis. In: Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2019. 1174–1177
Takanen A, Demott J D, Miller C, et al. Fuzzing for Software Security Testing and Quality Assurance. Norwood: Artech House, Inc., 2018
You W, Zong P, Chen K, et al. SemFuzz: semantics-based automatic generation of proof-of-concept exploits. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, 2017. 2139–2154
Godefroid P, Peleg H, Singh R. Learn&Fuzz: machine learning for input fuzzing. In: Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE), 2017. 50–59
Liu X, Li X, Prajapati R, et al. DeepFuzz: automatic generation of syntax valid C programs for fuzz testing. In: Proceedings of the AAAI Conference on Artificial Intelligence, 2019. 1044–1051
Lee S, Han H, Cha S K, et al. Montage: a neural network language model-guided JavaScript engine fuzzer. In: Proceedings of the 29th USENIX Conference on Security Symposium, 2020. 2613–2630
Chen P, Chen H. Angora: efficient fuzzing by principled search. In: Proceedings of the IEEE Symposium on Security and Privacy (SP), 2018. 711–725
Funahashi K I. On the approximate realization of continuous mappings by neural networks. Neural Netws, 1989, 2: 183–192
Nagy S, Hicks M. Full-speed fuzzing: reducing fuzzing overhead through coverage-guided tracing. In: Proceedings of the IEEE Symposium on Security and Privacy (SP), 2019. 787–802
Zhou C, Wang M, Liang J, et al. Zeror: speed up fuzzing with coverage-sensitive tracing and scheduling. In: Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering, 2020. 858–870
Zong P, Lv T, Wang D, et al. FuzzGuard: filtering out unreachable inputs in directed grey-box fuzzing through deep learning. In: Proceedings of the 29th USENIX Conference on Security Symposium, 2020. 2255–2269
Jung R, Jourdan J H, Krebbers R, et al. Safe systems programming in Rust. Commun ACM, 2021, 64: 144–152
Wong W E, Gao R, Li Y, et al. A survey on software fault localization. IEEE Trans Software Eng, 2016, 42: 707–740
Zakari A, Lee S P, Abreu R, et al. Multiple fault localization of software programs: a systematic literature review. Inf Software Tech, 2020, 124: 106312
Xie X, Liu Z, Song S, et al. Revisit of automatic debugging via human focus-tracking analysis. In: Proceedings of the 38th International Conference on Software Engineering, 2016. 808–819
Agrawal H, Horgan J, London S, et al. Fault localization using execution slices and dataflow tests. In: Proceedings of the 6th International Symposium on Software Reliability Engineering, 1995. 143–151
Wong C P, Xiong Y, Zhang H, et al. Boosting bug-report-oriented fault localization with segmentation and stack-trace analysis. In: Proceedings of the IEEE International Conference on Software Maintenance and Evolution, 2014. 181–190
Zhang X, Gupta N, Gupta R. Locating faults through automated predicate switching. In: Proceedings of the 28th International Conference on Software Engineering, New York, 2006. 272–281
Jones J A, Harrold M J, Stasko J. Visualization of test information to assist fault localization. In: Proceedings of the 24th International Conference on Software Engineering, 2002. 467–477
Liblit B, Naik M, Zheng A X, et al. Scalable statistical bug isolation. ACM SIGPLAN Not, 2005, 40: 15–26
Abreu R, Zoeteweij P, Golsteijn R, et al. A practical evaluation of spectrum-based fault localization. J Syst Software, 2009, 82: 1780–1792
Xie X Y, Chen T Y, Kuo F-C, et al. A theoretical analysis of the risk evaluation formulas for spectrum-based fault localization. ACM Trans Softw Eng Methodol, 2013, 22: 1–40
Zou D, Liang J, Xiong Y, et al. An empirical study of fault localization families and their combinations. IEEE Trans Software Eng, 2021, 47: 332–347
Widyasari R, Prana G A A, Haryono S A, et al. XAI4FL: enhancing spectrum-based fault localization with explainable artificial intelligence. In: Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension, 2022. 499–510
Moon S, Kim Y, Kim M, et al. Ask the mutants: mutating faulty programs for fault localization. In: Proceedings of the IEEE 7th International Conference on Software Testing, Verification and Validation, 2014. 153–162
Papadakis M, Traon Y L. Metallaxis-FL: mutation-based fault localization. Softw Test Verif Reliab, 2015, 25: 605–628
Wong W E, Qi Y. BP neural network-based effective fault localization. Int J Soft Eng Knowl Eng, 2009, 19: 573–597
Wong W E, Debroy V, Golden R, et al. Effective software fault localization using an RBF neural network. IEEE Trans Rel, 2012, 61: 149–169
Zheng W, Hu D, Wang J. Fault localization analysis based on deep neural network. Math Problems Eng, 2016, 2016: 1–11
Zhang Z, Lei Y, Tan Q, et al. Deep learning-based fault localization with contextual information. IEICE Trans Inf Syst, 2017, E100.D: 3027–3031
Li X, Li W, Zhang Y, et al. DeepFL: integrating multiple fault diagnosis dimensions for deep fault localization. In: Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis, 2019. 169–180
Zhang Z, Lei Y, Mao X G, et al. CNN-FL: an effective approach for localizing faults using convolutional neural networks. In: Proceedings of the IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER), 2019
Li Y, Wang S, Nguyen T. Fault localization with code coverage representation learning. In: Proceedings of the IEEE/ACM 43rd International Conference on Software Engineering (ICSE), 2021. 661–673
Lou Y, Zhu Q, Dong J, et al. Boosting coverage-based fault localization via graph-based representation learning. In: Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2021. 664–676
Qian J, Ju X, Chen X, et al. AGFL: a graph convolutional neural network-based method for fault localization. In: Proceedings of the IEEE 21st International Conference on Software Quality, Reliability and Security (QRS), 2021. 672–680
Qian J, Ju X, Chen X. GNet4FL: effective fault localization via graph convolutional neural network. Autom Softw Eng, 2023, 30: 16
Zhang Z, Lei Y, Mao X, et al. Context-aware neural fault localization. IEEE Trans Software Eng, 2023, 49: 3939–3954
Li Y, Wang S, Nguyen T N. Fault localization to detect co-change fixing locations. In: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, New York, 2022. 659–671
Dutta A, Manral R, Mitra P, et al. Hierarchically localizing software faults using DNN. IEEE Trans Rel, 2020, 69: 1267–1292
Yu J, Lei Y, Xie H, et al. Context-based cluster fault localization. In: Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension, New York, 2022. 482–493
Li Z, Tang E, Chen X, et al. Graph neural network based two-phase fault localization approach. In: Proceedings of the 13th Asia-Pacific Symposium on Internetware, 2022. 85–95
Yousofvand L, Soleimani S, Rafe V. Automatic bug localization using a combination of deep learning and model transformation through node classification. Software Qual J, 2023, 31: 1045–1063
Wu S, Li Z, Liu Y, et al. GMBFL: optimizing mutation-based fault localization via graph representation. In: Proceedings of the IEEE International Conference on Software Maintenance and Evolution (ICSME), 2023. 245–257
Cao J, Yang S, Jiang W, et al. BugPecker: locating faulty methods with deep learning on revision graphs. In: Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2020. 1214–1218
Ciborowska A, Damevski K. Fast changeset-based bug localization with BERT. In: Proceedings of the 44th International Conference on Software Engineering, New York, 2022. 946–957
Zhang Z, Lei Y, Mao X, et al. A study of effectiveness of deep learning in locating real faults. Inf Software Tech, 2021, 131: 106486
Zhong H, Mei H. Learning a graph-based classifier for fault localization. Sci China Inf Sci, 2020, 63: 162101
Zhang Z, Lei Y, Mao X, et al. Improving deep-learning-based fault localization with resampling. J Software Evolu Process, 2021, 33: e2312
Xie H, Lei Y, Yan M, et al. A universal data augmentation approach for fault localization. In: Proceedings of the 44th International Conference on Software Engineering, New York, 2022. 48–60
Hu J, Xie H, Lei Y, et al. A light-weight data augmentation method for fault localization. Inf Software Tech, 2023, 157: 107148
Lei Y, Liu C, Xie H, et al. BCL-FL: a data augmentation approach with between-class learning for fault localization. In: Proceedings of the IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 2022. 289–300
Lei Y, Wen T, Xie H, et al. Mitigating the effect of class imbalance in fault localization using context-aware generative adversarial network. In: Proceedings of the 31st IEEE/ACM International Conference on Program Comprehension, 2023
Zhang Z, Lei Y, Su T, et al. Influential global and local contexts guided trace representation for fault localization. ACM Trans Softw Eng Methodol, 2023, 32: 1–27
Tian Z, Chen J, Zhu Q, et al. Learning to construct better mutation faults. In: Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, 2022. 1–13
Zhang Z, Lei Y, Mao X, et al. Improving fault localization using model-domain synthesized failing test generation. In: Proceedings of the IEEE International Conference on Software Maintenance and Evolution (ICSME), 2022. 199–210
Just R, Jalali D, Ernst M D. Defects4J: a database of existing faults to enable controlled testing studies for Java programs. In: Proceedings of the International Symposium on Software Testing and Analysis, 2014. 437–440
Madeiral F, Urli S, Maia M, et al. BEARS: an extensible Java bug benchmark for automatic program repair studies. In: Proceedings of the IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER), 2019. 468–478
Do H, Elbaum S, Rothermel G. Supporting controlled experimentation with testing techniques: an infrastructure and its potential impact. Empir Software Eng, 2005, 10: 405–435
Goues C L, Holtschulte N, Smith E K, et al. The ManyBugs and IntroClass benchmarks for automated repair of C programs. IEEE Trans Software Eng, 2015, 41: 1236–1256
Weiß C, Premraj R, Zimmermann T, et al. How long will it take to fix this bug? In: Proceedings of the 4th International Workshop on Mining Software Repositories, 2007
Gazzola L, Micucci D, Mariani L. Automatic software repair: a survey. IEEE Trans Software Eng, 2019, 45: 34–67
Xuan J, Ren Z, Wang Z, et al. Progress on approaches to automatic program repair (in Chinese). J Software, 2016, 27: 771–784
Monperrus M. The Living Review on Automated Program Repair. Research Report hal-01956501, HAL Archives Ouvertes, 2018. Version: 5
Tufano M, Watson C, Bavota G, et al. An empirical study on learning bug-fixing patches in the wild via neural machine translation. ACM Trans Softw Eng Methodol, 2019, 28: 1–29
Kern C, Esparza J. Automatic error correction of Java programs. In: Proceedings of the 15th International Workshop on Formal Methods for Industrial Critical Systems, 2010. 67–81
Tian Y, Ray B. Automatically diagnosing and repairing error handling bugs in C. In: Proceedings of the 11th Joint Meeting on Foundations of Software Engineering, 2017. 752–762
Carvalho A, Luz W P, Marcilio D, et al. C-3PR: a bot for fixing static analysis violations via pull requests. In: Proceedings of the 27th IEEE International Conference on Software Analysis, Evolution and Reengineering, 2020. 161–171
Aho A V, Peterson T G. A minimum distance error-correcting parser for context-free languages. SIAM J Comput, 1972, 1: 305–312
Graham S L, Rhodes S P. Practical syntactic error recovery. In: Proceedings of Conference Record of the ACM Symposium on Principles of Programming Languages, Boston, 1973. 52–58
Anderson S O, Backhouse R C. Locally least-cost error recovery in Earley’s algorithm. ACM Trans Program Lang Syst, 1981, 3: 318–347
Burke M G, Fisher G A. A practical method for LR and LL syntactic error diagnosis and recovery. ACM Trans Program Lang Syst, 1987, 9: 164–197
Gupta R, Pal S, Kanade A, et al. DeepFix: fixing common C language errors by deep learning. In: Proceedings of the 31st AAAI Conference on Artificial Intelligence, San Francisco, 2017. 1345–1351
Bhatia S, Kohli P, Singh R. Neuro-symbolic program corrector for introductory programming assignments. In: Proceedings of the 40th International Conference on Software Engineering, Gothenburg, 2018. 60–70
Ahmed U Z, Kumar P, Karkare A, et al. Compilation error repair: for the student programs, from the student programs. In: Proceedings of the 40th International Conference on Software Engineering: Software Engineering Education and Training, 2018. 78–87
Santos E A, Campbell J C, Patel D, et al. Syntax and sensibility: using language models to detect and correct syntax errors. In: Proceedings of the 25th International Conference on Software Analysis, Evolution and Reengineering, 2018. 311–322
Brown N C C, Kölling M, McCall D, et al. Blackbox: a large scale repository of novice programmers’ activity. In: Proceedings of the 45th ACM Technical Symposium on Computer Science Education, Atlanta, 2014. 223–228
Mesbah A, Rice A, Johnston E, et al. DeepDelta: learning to repair compilation errors. In: Proceedings of the ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Tallinn, 2019. 925–936
Gupta R, Kanade A, Shevade S K. Deep reinforcement learning for syntactic error repair in student programs. In: Proceedings of the 33rd AAAI Conference on Artificial Intelligence, the 31st Innovative Applications of Artificial Intelligence Conference, and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, 2019. 930–937
Wu L, Li F, Wu Y, et al. GGF: a graph-based method for programming language syntax error correction. In: Proceedings of the 28th International Conference on Program Comprehension, Seoul, 2020. 139–148
Yasunaga M, Liang P. Graph-based, self-supervised program repair from diagnostic feedback. In: Proceedings of the 37th International Conference on Machine Learning, 2020. 10799–10808
Hajipour H, Bhattacharyya A, Staicu C, et al. SampleFix: learning to generate functionally diverse fixes. In: Proceedings of Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2021. 119–133
Yasunaga M, Liang P. Break-it-fix-it: unsupervised learning for program repair. In: Proceedings of the 38th International Conference on Machine Learning, 2021. 11941–11952
Ahmed T, Devanbu P, Hellendoorn V J. Learning lenient parsing & typing via indirect supervision. Empir Software Eng, 2021, 26: 29
Sakkas G, Endres M, Guo P J, et al. Seq2Parse: neurosymbolic parse error repair. Proc ACM Program Lang, 2022, 6: 1180–1206
Li X, Liu S, Feng R, et al. TransRepair: context-aware program repair for compilation errors. In: Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, Rochester, 2022. 1–13
Ahmed T, Ledesma N R, Devanbu P. SynShine: improved fixing of syntax errors. IEEE Trans Software Eng, 2023, 49: 2169–2181
Liu Z, Lin W, Shi Y, et al. A robustly optimized BERT pre-training approach with post-training. In: Proceedings of the 20th China National Conference on Chinese Computational Linguistics, Hohhot, 2021. 471–484
Gu Y F, Ma P, Jia X Y, et al. Progress on software crash research (in Chinese). Sci Sin Inform, 2019, 49: 1383–1398
Goues C L, Nguyen T V, Forrest S, et al. GenProg: a generic method for automatic software repair. IEEE Trans Software Eng, 2012, 38: 54–72
Wong C, Santiesteban P, Kästner C, et al. VarFix: balancing edit expressiveness and search effectiveness in automated program repair. In: Proceedings of the 29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Athens, 2021. 354–366
Nguyen H D T, Qi D, Roychoudhury A, et al. SemFix: program repair via semantic analysis. In: Proceedings of the 35th International Conference on Software Engineering, San Francisco, 2013. 772–781
Mechtaev S, Yi J, Roychoudhury A. Angelix: scalable multiline program patch synthesis via symbolic analysis. In: Proceedings of the 38th International Conference on Software Engineering, Austin, 2016. 691–701
Xuan J, Martinez M, DeMarco F, et al. Nopol: automatic repair of conditional statement bugs in Java programs. IEEE Trans Software Eng, 2017, 43: 34–55
Tan S H, Roychoudhury A. relifix: automated repair of software regressions. In: Proceedings of the 37th IEEE/ACM International Conference on Software Engineering, Florence, 2015. 471–482
Saha S, Saha R K, Prasad M R. Harnessing evolution for multi-hunk program repair. In: Proceedings of the 41st International Conference on Software Engineering, Montreal, 2019. 13–24
Liu K, Koyuncu A, Kim D, et al. TBar: revisiting template-based automated program repair. In: Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis, Beijing, 2019. 31–42
White M, Tufano M, Martinez M, et al. Sorting and transforming program repair ingredients via deep learning code similarities. In: Proceedings of the 26th IEEE International Conference on Software Analysis, Evolution and Reengineering, Hangzhou, 2019. 479–490
Chen Z, Kommrusch S J, Tufano M, et al. SequenceR: sequence-to-sequence learning for end-to-end program repair. IEEE Trans Software Eng, 2021, 47: 1943–1959
Jiang N, Lutellier T, Tan L. CURE: code-aware neural machine translation for automatic program repair. In: Proceedings of the 43rd IEEE/ACM International Conference on Software Engineering, Madrid, 2021. 1161–1173
Long F, Rinard M C. Automatic patch generation by learning correct code. In: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2016. 298–312
Goues C L, Dewey-Vogt M, Forrest S, et al. A systematic study of automated program repair: fixing 55 out of 105 bugs for $8 each. In: Proceedings of the 34th International Conference on Software Engineering, 2012. 3–13
Tufano M, Watson C, Bavota G, et al. An empirical investigation into learning bug-fixing patches in the wild via neural machine translation. In: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, Montpellier, 2018. 832–837
Sun Z, Xin C, Sun Y. An automatic semantic code repair service based on deep learning for programs with single error. In: Proceedings of the IEEE World Congress on Services, Milan, 2019. 360–361
Ding Y, Ray B, Devanbu P T, et al. Patching as translation: the data and the metaphor. In: Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering, Melbourne, 2020. 275–286
Yang G, Min K, Lee B. Applying deep learning algorithm to automatic bug localization and repair. In: Proceedings of the 35th ACM/SIGAPP Symposium on Applied Computing, 2020. 1634–1641
Yu L, Zhang W, Wang J, et al. SeqGAN: sequence generative adversarial nets with policy gradient. In: Proceedings of the 31st AAAI Conference on Artificial Intelligence, San Francisco, 2017. 2852–2858
Lutellier T, Pham H V, Pang L, et al. CoCoNuT: combining context-aware neural translation models using ensemble for program repair. In: Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis, 2020. 101–114
Martinez M, Durieux T, Sommerard R, et al. Automatic repair of real bugs in Java: a large-scale experiment on the Defects4J dataset. Empir Software Eng, 2017, 22: 1936–1964
Saha R K, Lyu Y, Lam W, et al. Bugs.jar: a large-scale, diverse dataset of real-world Java bugs. In: Proceedings of the 15th International Conference on Mining Software Repositories, Gothenburg, 2018. 10–13
Tian H, Liu K, Kaboré A K, et al. Evaluating representation learning of code changes for predicting patch correctness in program repair. In: Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering, Melbourne, 2020. 981–992
Dinella E, Dai H, Li Z, et al. Hoppity: learning graph transformations to detect and fix bugs in programs. In: Proceedings of the 8th International Conference on Learning Representations, Addis Ababa, 2020
Tang Y, Zhou L, Blanco A, et al. Grammar-based patches generation for automated program repair. In: Proceedings of Findings of the Association for Computational Linguistics, 2021. 1300–1305
Huang S, Zhou X, Chin S. Application of Seq2Seq models on code correction. Front Artif Intell, 2021, 4: 590215
Rahman M M, Watanobe Y, Nakamura K. A bidirectional LSTM language model for code evaluation and repair. Symmetry, 2021, 13: 247
Berabi B, He J, Raychev V, et al. TFix: learning to fix coding errors with a text-to-text transformer. In: Proceedings of the 38th International Conference on Machine Learning, 2021. 780–791
Tang B, Li B, Bo L, et al. GrasP: graph-to-sequence learning for automated program repair. In: Proceedings of the 21st IEEE International Conference on Software Quality, Reliability and Security, Hainan, 2021. 819–828
Szalontai B, Vadász A, Borsi Z R, et al. Detecting and fixing nonidiomatic snippets in Python source code with deep learning. In: Proceedings of Intelligent Systems and Applications, Amsterdam, 2021. 129–147
Li Y, Wang S, Nguyen T N. DEAR: a novel deep learning-based approach for automated program repair. In: Proceedings of IEEE/ACM 44th International Conference on Software Engineering, Pittsburgh, 2022. 511–523
Xu X, Wang X, Xue J. M3V: multi-modal multi-view context embedding for repair operator prediction. In: Proceedings of IEEE/ACM International Symposium on Code Generation and Optimization, Seoul, 2022. 266–277
Meng X, Wang X, Zhang H, et al. Improving fault localization and program repair with deep semantic features and transferred knowledge. In: Proceedings of IEEE/ACM 44th International Conference on Software Engineering, Pittsburgh, 2022. 1169–1180
Kim M, Kim Y, Heo J, et al. Impact of defect instances for successful deep learning-based automatic program repair. In: Proceedings of IEEE International Conference on Software Maintenance and Evolution, Limassol, 2022. 419–423
Wardat M, Cruz B D, Le W, et al. DeepDiagnosis: automatically diagnosing faults and recommending actionable fixes in deep learning programs. In: Proceedings of IEEE/ACM 44th International Conference on Software Engineering, Pittsburgh, 2022. 561–572
Yao J, Rao B, Xing W, et al. Bug-Transformer: automated program repair using attention-based deep neural network. J Circuit Syst Comp, 2022, 31: 2250210
Yan D, Liu K, Niu Y, et al. Crex: predicting patch correctness in automated repair of C programs through transfer learning of execution semantics. Inf Software Tech, 2022, 152: 107043
Pei K, Xuan Z, Yang J, et al. Learning approximate execution semantics from traces for binary function similarity. IEEE Trans Software Eng, 2023, 49: 2776–2790
Chakraborty S, Ding Y, Allamanis M, et al. CODIT: code editing with tree-based neural models. IEEE Trans Software Eng, 2022, 48: 1385–1399
Ye H, Martinez M, Monperrus M. Neural program repair with execution-based backpropagation. In: Proceedings of the 44th International Conference on Software Engineering, Pittsburgh, 2022. 1506–1518
Ye H, Gu J, Martinez M, et al. Automated classification of overfitting patches with statically extracted code features. IEEE Trans Software Eng, 2022, 48: 2920–2938
Ye H, Martinez M, Luo X, et al. SelfAPR: self-supervised program repair with test execution diagnostics. In: Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, Rochester, 2022. 1–13
Xia C S, Zhang L. Less training, more repairing please: revisiting automated program repair via zero-shot learning. In: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Singapore, 2022. 959–971
Kim M, Kim Y, Jeong H, et al. An empirical study of deep transfer learning-based program repair for Kotlin projects. In: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Singapore, 2022. 1441–1452
Tian H, Li Y, Pian W, et al. Predicting patch correctness based on the similarity of failing test cases. ACM Trans Softw Eng Methodol, 2022, 31: 1–30
Yuan W, Zhang Q, He T, et al. CIRCLE: continual repair across programming languages. In: Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis, 2022. 678–690
Chen L, Pei Y, Pan M, et al. Program repair with repeated learning. IEEE Trans Software Eng, 2023, 49: 831–848
Stocco A, Yandrapally R, Mesbah A. Visual web test repair. In: Proceedings of the ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Lake Buena Vista, 2018. 503–514
Pan M, Xu T, Pei Y, et al. GUI-guided test script repair for mobile apps. IEEE Trans Software Eng, 2022, 48: 910–929
Ren Z, Sun S, Xuan J, et al. Automated patching for unreproducible builds. In: Proceedings of IEEE/ACM 44th International Conference on Software Engineering, Pittsburgh, 2022. 200–211
Hassan F, Wang X. HireBuild: an automatic approach to history-driven repair of build scripts. In: Proceedings of the 40th International Conference on Software Engineering, Gothenburg, 2018. 1078–1089
Lou Y, Chen J, Zhang L, et al. History-driven build failure fixing: how far are we? In: Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis, 2019. 43–54
Loriot B, Madeiral F, Monperrus M. Styler: learning formatting conventions to repair Checkstyle violations. Empir Software Eng, 2022, 27: 149
Ma S, Thung F, Lo D, et al. VuRLE: automatic vulnerability detection and repair by learning from examples. In: Proceedings of the 22nd European Symposium on Research in Computer Security, Oslo, 2017. 229–246
Harer J, Ozdemir O, Lazovich T, et al. Learning to repair software vulnerabilities with generative adversarial networks. In: Proceedings of Advances in Neural Information Processing Systems, 2018. 7944–7954
Zhou Z, Bo L, Wu X, et al. SPVF: security property assisted vulnerability fixing via attention-based models. Empir Software Eng, 2022, 27: 171
Huang K, Yang S, Sun H, et al. Repairing security vulnerabilities using pre-trained programming language models. In: Proceedings of the 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2022. 111–116
Chen Z, Kommrusch S, Monperrus M. Neural transfer learning for repairing security vulnerabilities in C code. IEEE Trans Software Eng, 2023, 49: 147–165
Chi J, Qu Y, Liu T, et al. SeqTrans: automatic vulnerability fix via sequence to sequence learning. IEEE Trans Software Eng, 2023, 49: 564–585
Das R, Ahmed U Z, Karkare A, et al. Prutor: a system for tutoring CS1 and collecting student programs for analysis. 2016. ArXiv:1608.03828
Brown N C C, Altadmri A, Sentance S, et al. Blackbox, five years on: an evaluation of a large-scale programming data collection project. In: Proceedings of the ACM Conference on International Computing Education Research, New York, 2018. 196–204
Motwani M, Sankaranarayanan S, Just R, et al. Do automated program repair techniques repair hard and important bugs? In: Proceedings of the 40th International Conference on Software Engineering, Gothenburg, 2018. 25
Jiang Y, Liu H, Niu N, et al. Extracting concise bug-fixing patches from human-written patches in version control systems. In: Proceedings of the 43rd International Conference on Software Engineering (ICSE’21), 2021
Jiang Y, Liu H, Luo X, et al. BugBuilder: an automated approach to building bug repository. IEEE Trans Software Eng, 2023, 49: 1443–1463
Bui Q C, Scandariato R, Ferreyra N E D. Vul4J: a dataset of reproducible Java vulnerabilities geared towards the study of program repair techniques. In: Proceedings of the IEEE/ACM 19th International Conference on Mining Software Repositories (MSR), 2022. 464–468
Nikitopoulos G, Dritsa K, Louridas P, et al. CrossVul: a cross-language vulnerability dataset with commit data. In: Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2021. 1565–1569
Zou W, Lo D, Chen Z, et al. How practitioners perceive automated bug report management techniques. IEEE Trans Software Eng, 2018, 46: 836–862
Bettenburg N, Just S, Schröter A, et al. What makes a good bug report? In: Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering, 2008. 308–318
Lee D G, Seo Y S. Systematic review of bug report processing techniques to improve software management performance. J Inf Process Syst, 2019, 15: 967–985
Anvik J. Automating bug report assignment. In: Proceedings of the 28th International Conference on Software Engineering, 2006. 937–940
Jiang H, Li X, Ren Z, et al. Toward better summarizing bug reports with crowdsourcing elicited attributes. IEEE Trans Rel, 2018, 68: 2–22
Tan Y, Xu S, Wang Z, et al. Bug severity prediction using question-and-answer pairs from Stack Overflow. J Syst Software, 2020, 165: 110567
Zhang T, Han D, Vinayakarao V, et al. Duplicate bug report detection: how far are we? ACM Trans Softw Eng Methodol, 2023, 32: 1–32
Li X, Jiang H, Liu D, et al. Unsupervised deep bug report summarization. In: Proceedings of the 26th Conference on Program Comprehension, 2018. 144–155
Fang F, Wu J, Li Y, et al. On the classification of bug reports to improve bug localization. Soft Comput, 2021, 25: 7307–7323
Zhou C, Li B, Sun X, et al. Leveraging multi-level embeddings for knowledge-aware bug report reformulation. J Syst Software, 2023, 198: 111617
He J, Xu L, Yan M, et al. Duplicate bug report detection using dual-channel convolutional neural networks. In: Proceedings of the 28th International Conference on Program Comprehension, 2020. 117–127
Xiao G, Du X, Sui Y, et al. HINDBR: heterogeneous information network based duplicate bug report prediction. In: Proceedings of the IEEE 31st International Symposium on Software Reliability Engineering (ISSRE), 2020. 195–206
Xie Q, Wen Z, Zhu J, et al. Detecting duplicate bug reports with convolutional neural networks. In: Proceedings of the 25th Asia-Pacific Software Engineering Conference (APSEC), 2018. 416–425
Deshmukh J, Annervaz K, Podder S, et al. Towards accurate duplicate bug retrieval using deep learning techniques. In: Proceedings of the IEEE International Conference on Software Maintenance and Evolution (ICSME), 2017. 115–124
Budhiraja A, Dutta K, Reddy R, et al. DWEN: deep word embedding network for duplicate bug report detection in software repositories. In: Proceedings of the 40th International Conference on Software Engineering: Companion Proceedings, 2018. 193–194
Isotani H, Washizaki H, Fukazawa Y, et al. Duplicate bug report detection by using sentence embedding and fine-tuning. In: Proceedings of the IEEE International Conference on Software Maintenance and Evolution (ICSME), 2021. 535–544
Jiang Y, Su X, Treude C, et al. Does deep learning improve the performance of duplicate bug report detection? An empirical study. J Syst Software, 2023, 198: 111607
Koc U, Wei S, Foster J S, et al. An empirical assessment of machine learning approaches for triaging reports of a Java static analysis tool. In: Proceedings of the 12th IEEE Conference on Software Testing, Validation and Verification (ICST), 2019. 288–299
Florea A C, Anvik J, Andonie R. Parallel implementation of a bug report assignment recommender using deep learning. In: Proceedings of the 26th International Conference on Artificial Neural Networks and Machine Learning, 2017. 64–71
Lee S R, Heo M J, Lee C G, et al. Applying deep learning based automatic bug triager to industrial projects. In: Proceedings of the 11th Joint Meeting on Foundations of Software Engineering, 2017
Mani S, Sankaran A, Aralikatte R. DeepTriage: exploring the effectiveness of deep learning for bug triaging. In: Proceedings of the ACM India Joint International Conference on Data Science and Management of Data, 2019. 171–179
Liu Y, Qi X, Zhang J, et al. Automatic bug triaging via deep reinforcement learning. Appl Sci, 2022, 12: 3565
Han Z, Li X, Xing Z, et al. Learning to predict severity of software vulnerability using only vulnerability description. In: Proceedings of the IEEE International Conference on Software Maintenance and Evolution (ICSME), 2017. 125–136
Gomes L A F, Torres R S, Côrtes M L. Bug report severity level prediction in open source software: a survey and research opportunities. Inf Software Tech, 2019, 115: 58–78
Noyori Y, Washizaki H, Fukazawa Y, et al. Deep learning and gradient-based extraction of bug report features related to bug fixing time. Front Comput Sci, 2023, 5: 1032440
Liu H, Yu Y, Li S, et al. How to cherry pick the bug report for better summarization? Empir Software Eng, 2021, 26: 119
Liu H, Yu Y, Li S, et al. BugSum: deep context understanding for bug report summarization. In: Proceedings of the 28th International Conference on Program Comprehension, 2020. 94–105
Chen S, Xie X, Yin B, et al. Stay professional and efficient: automatically generate titles for your bug reports. In: Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering, 2020. 385–397
Lin H, Chen X, Chen X, et al. TitleGen-FL: quality prediction-based filter for automated issue title generation. J Syst Software, 2023, 195: 111513
Xiao Y, Keung J, Bennin K E, et al. Improving bug localization with word embedding and enhanced convolutional neural networks. Inf Software Tech, 2019, 105: 17–29
Xiao Y, Keung J, Mi Q, et al. Improving bug localization with an enhanced convolutional neural network. In: Proceedings of the 24th Asia-Pacific Software Engineering Conference (APSEC), 2017. 338–347
Wang B, Xu L, Yan M, et al. Multi-dimension convolutional neural network for bug localization. IEEE Trans Serv Comput, 2020, 15: 1649–1663
Lam A N, Nguyen A T, Nguyen H A, et al. Bug localization with combination of deep learning and information retrieval. In: Proceedings of the IEEE/ACM 25th International Conference on Program Comprehension (ICPC), 2017. 218–229
Cheng S, Yan X, Khan A A. A similarity integration method based information retrieval and word embedding in bug localization. In: Proceedings of the IEEE 20th International Conference on Software Quality, Reliability and Security (QRS), 2020. 180–187
Lam A N, Nguyen A T, Nguyen H A, et al. Combining deep learning with information retrieval to localize buggy files for bug reports (N). In: Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2015. 476–481
Loyola P, Gajananan K, Satoh F. Bug localization by learning to rank and represent bug inducing changes. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 2018. 657–665
Zhu Z, Li Y, Tong H H, et al. CooBa: cross-project bug localization via adversarial transfer learning. In: Proceedings of the 29th International Joint Conference on Artificial Intelligence, 2020
Han J, Huang C, Sun S, et al. bjXnet: an improved bug localization model based on code property graph and attention mechanism. Autom Softw Eng, 2023, 30: 12
Liang H, Hang D, Li X. Modeling function-level interactions for file-level bug localization. Empir Software Eng, 2022, 27: 186
Choetkiertikul M, Dam H K, Tran T, et al. Automatically recommending components for issue reports using deep learning. Empir Software Eng, 2021, 26: 1–39
Huo X, Thung F, Li M, et al. Deep transfer bug localization. IEEE Trans Software Eng, 2019, 47: 1368–1380
Haering M, Stanik C, Maalej W. Automatically matching bug reports with related app reviews. In: Proceedings of the IEEE/ACM 43rd International Conference on Software Engineering (ICSE), 2021. 970–981
Ruan H, Chen B, Peng X, et al. DeepLink: recovering issue-commit links based on deep learning. J Syst Software, 2019, 158: 110406
Xie R, Chen L, Ye W, et al. DeepLink: a code knowledge graph based deep learning approach for issue-commit link recovery. In: Proceedings of the IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER), 2019. 434–444
Xi S, Yao Y, Xiao X, et al. An effective approach for routing the bug reports to the right fixers. In: Proceedings of the 10th Asia-Pacific Symposium on Internetware, 2018. 1–10
Fu W, Menzies T. Easy over hard: a case study on deep learning. In: Proceedings of the 11th Joint Meeting on Foundations of Software Engineering, New York, 2017. 49–60
Biswas E, Vijay-Shanker K, Pollock L. Exploring word embedding techniques to improve sentiment analysis of software engineering texts. In: Proceedings of IEEE/ACM 16th International Conference on Mining Software Repositories (MSR), 2019. 68–78
Nizamani Z A, Liu H, Chen D M, et al. Automatic approval prediction for software enhancement requests. Autom Softw Eng, 2018, 25: 347–381
Li X, Jiang H, Kamei Y, et al. Bridging semantic gaps between natural languages and APIs with word embedding. IEEE Trans Software Eng, 2018, 46: 1081–1097
Rhu M, Gimelshein N, Clemons J, et al. vDNN: virtualized deep neural networks for scalable, memory-efficient neural network design. In: Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016
Wang L, Ye J, Zhao Y, et al. Superneurons: dynamic GPU memory management for training deep neural networks. In: Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, New York, 2018. 41–53
Moran K, Bernal-Cardenas C, Curcio M, et al. Machine learning-based prototyping of graphical user interfaces for mobile apps. IEEE Trans Software Eng, 2018, 46: 196–221
Brooks F P. The Mythical Man-Month: Essays on Software Engineering. Reading: Addison-Wesley, 1975
Mockus A, Herbsleb J D. Expertise browser: a quantitative approach to identifying expertise. In: Proceedings of the 24th International Conference on Software Engineering, New York, 2002. 503–512
Anvik J, Hiew L, Murphy G C. Who should fix this bug? In: Proceedings of the 28th International Conference on Software Engineering, New York, 2006. 361–370
Ma D, Schuler D, Zimmermann T, et al. Expert recommendation with usage expertise. In: Proceedings of the IEEE International Conference on Software Maintenance, 2009. 535–538
Zhou M, Mockus A. Developer fluency: achieving true mastery in software projects. In: Proceedings of the 18th ACM SIGSOFT International Symposium on Foundations of Software Engineering, New York, 2010. 137–146
Fritz T, Murphy G C, Murphy-Hill E, et al. Degree-of-knowledge: modeling a developer’s knowledge of code. ACM Trans Softw Eng Methodol, 2014, 23: 1–42
Joblin M, Mauerer W, Apel S, et al. From developer networks to verified communities: a fine-grained approach. In: Proceedings of the 37th International Conference on Software Engineering, 2015. 563–573
Meng X, Miller B P, Williams W R, et al. Mining software repositories for accurate authorship. In: Proceedings of the 29th IEEE International Conference on Software Maintenance (ICSM), 2013. 250–259
Baltes S, Diehl S. Towards a theory of software development expertise. In: Proceedings of the 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2018
Ren J, Yin H, Hu Q, et al. Towards quantifying the development value of code contributions. In: Proceedings of the 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2018. 775–779
Venkataramani R, Gupta A, Asadullah A, et al. Discovery of technical expertise from open source code repositories. In: Proceedings of the 22nd International Conference on World Wide Web, 2013. 97–98
Saxena R, Pedanekar N. I know what you coded last summer: mining candidate expertise from GitHub repositories. In: Proceedings of Companion of the ACM Conference on Computer Supported Cooperative Work and Social Computing, 2017. 299–302
Liu S, Wang S, Zhu F, et al. HYDRA: large-scale social identity linkage via heterogeneous behavior modeling. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, 2014. 51–62
Kouters E, Vasilescu B, Serebrenik A, et al. Who’s who in Gnome: using LSA to merge software repository identities. In: Proceedings of the 28th IEEE International Conference on Software Maintenance (ICSM), 2012. 592–595
Mo W, Shen B, Chen Y, et al. TbIL: a tagging-based approach to identity linkage across software communities. In: Proceedings of the Asia-Pacific Software Engineering Conference (APSEC), 2015. 56–63
Lee R K, Lo D. GitHub and Stack Overflow: analyzing developer interests across multiple social collaborative platforms. In: Proceedings of the 9th International Conference on Social Informatics, 2017. 245–256
Huang W, Mo W, Shen B, et al. CPDScorer: modeling and evaluating developer programming ability across software communities. In: Proceedings of SEKE, 2016. 87–92
Yan J, Sun H, Wang X, et al. Profiling developer expertise across software communities with heterogeneous information network analysis. In: Proceedings of the 10th Asia-Pacific Symposium on Internetware, Beijing, 2018. 1–9
Montandon J E, Valente M T, Silva L L. Mining the technical roles of GitHub users. Inf Software Tech, 2021, 131: 106485
Song X, Yan J, Huang Y, et al. A collaboration-aware approach to profiling developer expertise with cross-community data. In: Proceedings of IEEE 22nd International Conference on Software Quality, Reliability and Security (QRS), 2022. 344–355
Dey T, Karnauch A, Mockus A. Representation of developer expertise in open source software. In: Proceedings of IEEE/ACM 43rd International Conference on Software Engineering (ICSE), 2021. 995–1007
Ma Y, Bogart C, Amreen S, et al. World of Code: an infrastructure for mining the universe of open source VCS data. In: Proceedings of IEEE/ACM 16th International Conference on Mining Software Repositories (MSR), 2019. 143–154
Dakhel A M, Desmarais M C, Khomh F. Dev2vec: representing domain expertise of developers in an embedding space. Inf Software Tech, 2023, 159: 107218
Javeed F, Siddique A, Munir A, et al. Discovering software developer’s coding expertise through deep learning. IET Softw, 2020, 14: 213–220
Wang Z, Sun H, Fu Y, et al. Recommending crowdsourced software developers in consideration of skill improvement. In: Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE), 2017. 717–722
Zhang Z, Sun H, Zhang H. Developer recommendation for Topcoder through a meta-learning based policy model. Empir Software Eng, 2019, 25: 859–889
Yu X, He Y, Fu Y, et al. Cross-domain developer recommendation algorithm based on feature matching. In: Proceedings of CCF Conference on Computer Supported Cooperative Work and Social Computing, 2019. 443–457
Wang J J, Yang Y, Wang S, et al. Context-aware personalized crowdtesting task recommendation. IEEE Trans Software Eng, 2021, 48: 3131–3144
Wang J, Yang Y, Wang S, et al. Context- and fairness-aware in-process crowdworker recommendation. ACM Trans Softw Eng Methodol, 2022, 31: 1–31
Ying H, Chen L, Liang T, et al. EARec: leveraging expertise and authority for pull-request reviewer recommendation in GitHub. In: Proceedings of the 3rd International Workshop on CrowdSourcing in Software Engineering, 2016. 29–35
Jiang J, Yang Y, He J, et al. Who should comment on this pull request? Analyzing attributes for more accurate commenter recommendation in pull-based development. Inf Software Tech, 2017, 84: 48–62
Zhang J, Maddila C S, Bairi R, et al. Using large-scale heterogeneous graph representation learning for code review recommendations at Microsoft. In: Proceedings of IEEE/ACM 45th International Conference on Software Engineering, 2023. 162–172
Rebai S, Amich A, Molaei S, et al. Multi-objective code reviewer recommendations: balancing expertise, availability and collaborations. Autom Softw Eng, 2020, 27: 301–328
Zanjani M B, Kagdi H, Bird C. Automatically recommending peer reviewers in modern code review. IEEE Trans Software Eng, 2016, 42: 530–543
Hannebauer C, Patalas M, Stünkel S, et al. Automatically recommending code reviewers based on their expertise: an empirical comparison. In: Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering (ASE), 2016. 99–110
Rong G, Zhang Y, Yang L, et al. Modeling review history for reviewer recommendation: a hypergraph approach. In: Proceedings of IEEE/ACM 44th International Conference on Software Engineering (ICSE), 2022. 1381–1392
Kovalenko V, Tintarev N, Pasynkov E, et al. Does reviewer recommendation help developers? IEEE Trans Software Eng, 2020, 46: 710–731
Ahasanuzzaman M, Oliva G A, Hassan A E. Using knowledge units of programming languages to recommend reviewers for pull requests: an empirical study. Empir Software Eng, 2024, 29: 33
Gonçalves P W, Calikli G, Serebrenik A, et al. Competencies for code review. In: Proceedings of the ACM on Human-Computer Interaction, 2023. 1–33
Huang Y, Sun H. Best answerers prediction with topic based GAT in Q&A sites. In: Proceedings of the 12th Asia-Pacific Symposium on Internetware, 2020. 156–164
Jin Y, Bai Y, Zhu Y, et al. Code recommendation for open source software developers. In: Proceedings of the ACM Web Conference, 2023
Xiao W, He H, Xu W, et al. Recommending good first issues in GitHub OSS projects. In: Proceedings of IEEE/ACM 44th International Conference on Software Engineering (ICSE), 2022. 1830–1842
Santos F. Supporting the task-driven skill identification in open source project issue tracking systems. ACM SIGSOFT Softw Eng Notes, 2023, 48: 54–58
Costa C, Figueiredo J, Pimentel J F, et al. Recommending participants for collaborative merge sessions. IEEE Trans Software Eng, 2021, 47: 1198–1210
Constantino K, Figueiredo E. CoopFinder: finding collaborators based on co-changed files. In: Proceedings of the IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), 2022. 1–3
Constantino K, Belém F, Figueiredo E. Dual analysis for helping developers to find collaborators based on co-changed files: an empirical study. Softw Pract Exp, 2023, 53: 1438–1464
Surian D, Liu N, Lo D, et al. Recommending people in developers’ collaboration network. In: Proceedings of the 18th Working Conference on Reverse Engineering, 2011. 379–388
Canfora G, Penta M D, Oliveto R, et al. Who is going to mentor newcomers in open source projects? In: Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering, 2012
Ye L, Sun H, Wang X, et al. Personalized teammate recommendation for crowdsourced software developers. In: Proceedings of the 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE), 2018. 808–813
Fry T, Dey T, Karnauch A, et al. A dataset and an approach for identity resolution of 38 million author IDs extracted from 2B Git commits. In: Proceedings of IEEE/ACM 17th International Conference on Mining Software Repositories (MSR), 2020
Acknowledgements
We thank the following persons for their prior contributions to the manuscript preparation (in alphabetical order): Yuze GUO (Beihang University), Ruiqi HONG (Beihang University), Mingwei LIU (Fudan University), Xiaofan LIU (Wuhan University), Di WU (Beihang University), Hongjun YANG (Beihang University), Yanming YANG (Zhejiang University), Binquan ZHANG (Beihang University), and Zhuang ZHAO (Wuhan University).
Rights and permissions
Open access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Chen, X., Hu, X., Huang, Y. et al. Deep learning-based software engineering: progress, challenges, and opportunities. Sci. China Inf. Sci. 68, 111102 (2025). https://doi.org/10.1007/s11432-023-4127-5