Automatic bug assignment has been well studied over the past decade. Because textual bug reports usually describe the buggy phenomena and their potential causes, engineers rely heavily on these reports to fix bugs, and researchers likewise rely heavily on the textual content of bug reports to locate the buggy files. However, noise in the text can unexpectedly harm automatic bug assignment, mainly because classical Natural Language Processing (NLP) techniques are insufficient to handle it.
To acquire a deep understanding of the effects of textual features and nominal features, a research team led by Zexuan Li published new research on 15 August 2024 in Frontiers of Computer Science, co-published by Higher Education Press and Springer Nature.
The team reproduces an NLP technique, TextCNN, to learn whether an improved NLP technique can lead to better performance for textual features. The results reveal that textual features do not surpass other features even with this relatively advanced technique. The team further explores the influential features for bug assignment approaches and explains them from a statistical perspective. They find that the selected influential features are all nominal features that indicate developers' preferences. Experimental results show that nominal features can achieve competitive results without using text at all.
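For readers unfamiliar with TextCNN, the sketch below shows a minimal convolutional text classifier of that family applied to bug-report tokens. It is an illustrative reconstruction, not the team's actual implementation; all hyperparameters, the vocabulary size, and the assumption that labels are developer ids are hypothetical.

```python
# Minimal TextCNN-style classifier sketch (illustrative, not the study's code).
# Assumes bug-report text is already tokenized into padded integer id sequences
# and labels are developer ids; all sizes below are arbitrary placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, num_classes=50,
                 kernel_sizes=(3, 4, 5), num_filters=100):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # One 1-D convolution per kernel size, each acting as an n-gram detector.
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes
        )
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, token_ids):               # token_ids: (batch, seq_len)
        x = self.embedding(token_ids)           # (batch, seq_len, embed_dim)
        x = x.transpose(1, 2)                   # (batch, embed_dim, seq_len)
        # Convolve, apply ReLU, then max-pool over time for each kernel size.
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        features = torch.cat(pooled, dim=1)     # concatenated n-gram features
        return self.fc(features)                # developer logits

# Usage (toy): TextCNN(vocab_size=20000)(torch.zeros(8, 64, dtype=torch.long))
```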
In the research, the team sets out to answer three questions. First, how effective are textual features with deep-learning-based NLP techniques? They reproduce TextCNN and compare the effectiveness of textual features against the group of nominal features. Second, what are the influential features for bug assignment approaches, and why are they influential? They employ the wrapper method with the widely used bidirectional strategy: by repeatedly training a classifier on different groups of features, the method judges the importance of each feature according to the evaluation metric. They speculate that nominal features help reduce the classifier's search scope and verify this speculation statistically. Third, to what extent can the selected influential features improve bug assignment? They train models with fixed classifiers on varying groups of features, running two popular classifiers (Decision Tree and SVM) on five groups of features.
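The snippet below sketches the wrapper idea in that second question: a classifier is retrained on different feature subsets and the subsets are compared by a metric. For brevity it uses an exhaustive search over hypothetical feature groups rather than the bidirectional strategy the team employs, and the data, group names, and column splits are all invented for illustration.

```python
# Wrapper-style evaluation of feature groups (illustrative sketch only).
from itertools import combinations
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))                 # toy feature matrix
y = rng.integers(0, 5, size=500)              # toy "assigned developer" labels
feature_groups = {                            # hypothetical nominal/text groups
    "reporter":   [0, 1],
    "component":  [2, 3],
    "severity":   [4],
    "text_topic": [5],
}

def score(clf, cols):
    """Wrapper step: train and evaluate the classifier on one column subset."""
    return cross_val_score(clf, X[:, cols], y, cv=5, scoring="accuracy").mean()

for clf in (DecisionTreeClassifier(random_state=0), SVC(kernel="linear")):
    results = []
    for r in range(1, len(feature_groups) + 1):
        for combo in combinations(feature_groups, r):
            cols = [c for g in combo for c in feature_groups[g]]
            results.append((score(clf, cols), combo))
    best_acc, best_groups = max(results)
    print(type(clf).__name__, best_groups, round(best_acc, 3))
```

Running both a Decision Tree and an SVM over the same feature subsets mirrors the study's design of fixing the classifier while varying the feature groups, so differences in the metric can be attributed to the features rather than the model.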
The experiments use five projects of different sizes and types as datasets. The results demonstrate that the improved NLP technique brings limited improvement, while the selected key features achieve 11%-25% accuracy under the two popular classifiers.
Future work can focus on introducing source files to build a knowledge graph linking these influential features and descriptive words, enabling better embeddings of nominal features.
DOI: 10.1007/s11704-024-3299-6
Journal
Frontiers of Computer Science
Method of Research
Experimental study
Subject of Research
Not applicable
Article Title
Automatic bug assignments without texts: a study
Article Publication Date
15-Aug-2024