Querying source code with natural language

  title={Querying source code with natural language},
  author={Markus Kimmig and Martin Monperrus and Mira Mezini},
  journal={2011 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011)},
One common task in developing or maintaining software is searching the source code for information such as specific method calls or write accesses to certain fields. This kind of information is required to correctly implement new features and to fix bugs. This paper presents an approach for querying source code with natural language.

Figure (stages shown): Development Task; Query in Natural Language; Translation to a Code Query Engine; Displaying the Search Results
An approach for querying source code with a natural language interface that enables the developer to execute a huge range of precise searches while being as easy and intuitive to use as writing natural language.
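
To make the idea of translating a natural language question into a code query concrete, here is a minimal sketch. It is not the paper's method, which this page does not describe. It assumes a handful of fixed question phrasings and a small in-memory fact base. The `Fact` type, the `FACTS` entries, the regular expressions, and the function names `translate` and `run_query` are all hypothetical and chosen for illustration only.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One hypothetical code fact; a real tool would extract these from source."""
    kind: str    # "calls" or "writes"
    source: str  # method in which the call or field write occurs
    target: str  # called method or written field


# Made-up fact base, standing in for the output of a code analysis step.
FACTS = [
    Fact("calls", "Cart.checkout", "Order.save"),
    Fact("calls", "Admin.purge", "Order.delete"),
    Fact("writes", "Cart.add", "Cart.total"),
    Fact("writes", "Order.save", "Order.status"),
]

# Illustrative phrasings only: each pattern maps a question form to a fact kind.
PATTERNS = [
    (re.compile(r"which methods call (\S+)", re.I), "calls"),
    (re.compile(r"where is (\S+) called", re.I), "calls"),
    (re.compile(r"which methods write (?:to )?(?:the field )?(\S+)", re.I), "writes"),
    (re.compile(r"where is (\S+) written", re.I), "writes"),
]


def translate(question):
    """Turn a supported question into a structured (kind, target) query."""
    text = question.strip().rstrip("?").strip()
    for pattern, kind in PATTERNS:
        match = pattern.fullmatch(text)
        if match:
            return kind, match.group(1)
    raise ValueError(f"unsupported question: {question!r}")


def run_query(question, facts=FACTS):
    """Answer a question by filtering the fact base with the translated query."""
    kind, target = translate(question)
    return sorted(f.source for f in facts if f.kind == kind and f.target == target)


if __name__ == "__main__":
    print(run_query("Which methods call Order.save?"))
    print(run_query("Where is Cart.total written?"))
```

With this fact base, the first question resolves to the structured query `("calls", "Order.save")` and the second to `("writes", "Cart.total")`. Questions outside the four phrasings raise a `ValueError`, which is the main limitation of a fixed-pattern translation like this one.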
Leveraging a corpus of natural language descriptions for program similarity
The approach can determine semantic relatedness and similarity of code across different libraries and even across different programming languages, a task considered extremely difficult using traditional approaches.
Improving feature location by transforming the query from natural language into requirements
A feature location approach that transforms a natural language query to a query that is made up of the requirements that are located as relevant, and limits the scope of the code search space by selecting only the code of those products that hold relevant requirements.
Declarative visitors to ease fine-grained source code mining with full history on billions of AST nodes
This paper presents domain-specific language features inspired by object-oriented visitors, with a default depth-first traversal strategy and two expressions for defining custom traversals in Java code. It also provides an implementation in the Boa infrastructure for software repository mining and describes a code generation strategy into Java code.
Effective Reformulation of Query for Code Search Using Crowdsourced Knowledge and Extra-Large Data Analytics
M. M. Rahman, C. Roy. 2018 IEEE International Conference on Software Maintenance and Evolution (ICSME), 2018.
A novel technique is proposed that automatically identifies relevant and specific API classes from the Stack Overflow Q&A site for a programming task written as a natural language query, and then reformulates the query for improved code search.
What Is the Cube Root of 27? Question Answering Over CodeOntology
Experimental results show that the unsupervised approach to processing natural language questions that can be answered by neither factual question answering nor advanced data querying is comparable with other state-of-the-art proprietary systems, such as the closed-source WolframAlpha computational knowledge engine.
Evaluating a query framework for software evolution data
The results of the evaluation show that the query interface can outperform classical software engineering tools in terms of correctness, while yielding significant time savings to its users and greatly advancing the state of the art in terms of usability and learnability.
Automatic Reformulation of Query for Code Search using Crowdsourced Knowledge
A novel query reformulation technique, RACK, that suggests a list of relevant API classes for a natural language query intended for code search by exploiting keyword-API associations from the questions and answers of Stack Overflow.
Automatic query reformulation for code search using crowdsourced knowledge
Comparisons with three state-of-the-art techniques demonstrate that RACK outperforms them in the query reformulation by a statistically significant margin, and investigation using three web/code search engines shows that the technique can significantly improve their results in the context of code search.
Integrating source code search into git client for effective retrieving of change history
This paper presents MJgit, a prototype tool for integrating a source code search technique into Git commands that manipulate historical data in a repository and conducts a performance experiment using actual software repositories.


Automatically capturing source code context of NL-queries for software maintenance and reuse
A novel approach is presented that automatically extracts natural language phrases from source code identifiers and categorizes the phrases and search results in a hierarchy. It significantly outperforms the most closely related technique in terms of effort and effectiveness.
codeQuest: Scalable Source Code Queries with Datalog
This paper describes a source code querying tool, named codeQuest, which combines two previous proposals, namely the use of logic programming and database systems, and uses safe Datalog, which was originally introduced in the theory of databases.
Navigating and querying code without getting lost
A source browsing tool that improves the developer's ability to work with crosscutting concerns by providing better support for exploring code and avoiding disorienting view switches is presented.
Questions programmers ask during software evolution tasks
The information a programmer needs to know about a code base while performing a change task, and how they go about discovering that information, are cataloged and categorized.
Debugging Reinvented: Asking and Answering Why and Why Not Questions about Program Behavior
The Whyline is a new kind of debugging tool that enables developers to select a question about program output from a set of why did and why didn’t questions derived from the program’s code and execution.
Pattern Recognition and Machine Learning (Information Science and Statistics)
Supporting developers with natural language queries
A framework is presented for querying information about a software system using guided-input natural language resembling plain English. It models data extracted by classical software analysis tools with an OWL ontology and uses knowledge processing technologies from the Semantic Web to query it.
An approach to detecting duplicate bug reports using natural language and execution information
The experimental results show that the approach can detect 67%-93% of duplicate bug reports in the Firefox bug repository, compared to 43%-72% using natural language information alone.