Enhancing the Unified Features to Locate Buggy Files by Exploiting the Sequential Nature of Source Code

Xuan Huo and Ming Li

Bug reports provide an effective way for end-users to disclose potential bugs hidden in a software system, while automatically locating the potentially buggy source files according to a bug report remains a great challenge in software maintenance. Many previous approaches represent bug reports and source code using lexical and structural information and correlate their relevance by measuring their similarity, and recently a CNN-based model was proposed to learn the unified features for bug…

Deep Learning With Customized Abstract Syntax Tree for Bug Localization

A bug localization system, CAST, is presented; it exploits deep learning and customized abstract syntax trees of programs to locate potentially buggy source files automatically, and it significantly outperforms state-of-the-art methods in locating the buggy sources.

Exploiting Code Knowledge Graph for Bug Localization via Bi-directional Attention

This work proposes KGBugLocator, which uses knowledge graph embeddings to extract interrelations of code and a keyword-supervised bi-directional attention mechanism to regularize the model with interactive information between source files and bug reports, reaching a new state of the art for bug localization.

Control Flow Graph Embedding Based on Multi-Instance Decomposition for Bug Localization

A novel model named CG-CNN is proposed: a multi-instance learning framework that enhances the unified features for bug localization by exploiting the structural and sequential nature of the control flow graph.

Convolutional Neural Networks-Based Locating Relevant Buggy Code Files for Bug Reports Affected by Data Imbalance

A novel method that improves bug localization performance by using surface lexical correlation matching and semantic correlation matching to bridge the lexical gap between bug reports and source code files, and that obtains relatively high bug localization performance compared to other classic methods.

Bug Localization by Learning to Rank and Represent Bug Inducing Changes

This work proposes a model that, instead of working at file level, learns feature representations from source changes extracted from the project history, from both syntactic and code-change dependency perspectives, to support bug localization.

DependLoc: A Dependency-based Framework For Bug Localization

DependLoc is a novel framework for bug localization which leverages the dependency relationship among source code files and adopts a customized Ant Colony algorithm to quantify the intrinsic dependency relationship (called reference heat) and designs a segment-based encoder to learn this feature.

Enhancing supervised bug localization with metadata and stack-trace

A supervised topic modeling approach for automatically locating the relevant source files of a bug report is presented; the proposed method is shown to achieve up to 67.1% improvement in prediction accuracy over its best competitors and to scale linearly with the size of the data.

DRAST - A Deep Learning and AST Based Approach for Bug Localization

A novel bug localization approach that works on C and Java projects is presented, together with a new bug localization dataset of C projects and a novel source code representation that leverages the syntactic structure of source code and bug report information and can support multi-language projects.

Online Adaptable Bug Localization for Rapidly Evolving Software

This paper proposes a streaming bug localization technique, based on an ensemble of online topic models, that is able to adapt to both specific (with explicit code mentions) and more abstract bug reports, and naturally integrates defect prediction and co-change information into its prediction.

Bug Localization via Supervised Topic Modeling

This paper proposes a supervised topic modeling method (STMLOCATOR) for automatically locating the relevant source files for a given bug report; it also considers the special case of bug reports that contain stack traces and proposes a variant of STMLOCATOR tailored to such reports.



Combining Deep Learning with Information Retrieval to Localize Buggy Files for Bug Reports (N)

The new model, HyLoc, with a combination of the features built from DNN, rVSM, and project's bug-fixing history, achieves higher accuracy than the state-of-the-art IR and machine learning techniques.

Version history, similar report, and structure: putting them together for improved bug localization

A new method for locating relevant buggy files that puts together version history, similar reports, and structure is proposed, and a large-scale experiment is performed on four open source projects to localize more than 3,000 bugs.

Learning to rank relevant files for bug reports using domain knowledge

An adaptive ranking approach that leverages domain knowledge through functional decompositions of source code files into methods, API descriptions of library components used in the code, the bug-fixing history, and the code change history is introduced.

Improving bug localization using structured information retrieval

This work provides a thorough grounding of IR-based bug localization research in fundamental IR theoretical and empirical knowledge and practice and presents BLUiR, which embodies this insight, requires only the source code and bug reports, and takes advantage of bug similarity data if available.

Source Code Retrieval for Bug Localization Using Latent Dirichlet Allocation

This paper presents an LDA-based static technique for automating bug localization and demonstrates that the technique performs at least as well as the LSI-based techniques for all bugs and performs better, often significantly so, than the LSI-based techniques for most bugs.

Where should the bugs be fixed? More accurate information retrieval-based bug localization based on bug reports

The results show that BugLocator can effectively locate the files where the bugs should be fixed, and outperforms existing state-of-the-art bug localization methods.

Toward Deep Learning Software Repositories

This work motivates deep learning for software language modeling, highlighting fundamental differences between state-of-the-practice software language models and connectionist models, and proposes avenues for future work, where deep learning can be brought to bear to support model-based testing, improve software lexicons, and conceptualize software artifacts.

Populating a Release History Database from version control and bug tracking systems

An approach is introduced for populating a release history database that combines version data with bug tracking data and adds missing data not covered by version control systems such as merge points to obtain meaningful views showing the evolution of a software project.

Practitioners' expectations on automated fault localization

An empirical study is performed by surveying practitioners from more than 30 countries across 5 continents about their expectations of research in fault localization and investigates a number of factors that impact practitioners' willingness to adopt a fault localization technique.

Convolutional Neural Networks over Tree Structures for Programming Language Processing

A novel tree-based convolutional neural network (TBCNN) is proposed for programming language processing, in which a convolution kernel is designed over programs' abstract syntax trees to capture structural information.