Querying source code with natural language

Markus Kimmig, Martin Monperrus, Mira Mezini. 2011 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011).
A common task in developing or maintaining software is searching the source code for information such as specific method calls or write accesses to certain fields. This kind of information is required to correctly implement new features and to fix bugs. This paper presents an approach for querying source code with natural language.
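To make the kind of information mentioned above concrete, here is a minimal sketch of two hand-written code queries. It assumes Python source code and Python's standard `ast` module as a stand-in for a code query engine; `find_method_calls`, `find_field_writes` and the `SAMPLE` class are hypothetical and do not reproduce the paper's tool.

```python
import ast

def find_method_calls(source: str, method_name: str) -> list:
    """Line numbers of calls to `method_name`, as obj.method_name(...) or method_name(...)."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            # obj.method(...) is an Attribute node; a bare call f(...) is a Name node.
            name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
            if name == method_name:
                hits.append(node.lineno)
    return sorted(hits)

def find_field_writes(source: str, field_name: str) -> list:
    """Line numbers where attribute `field_name` is written (x.f = ..., x.f += ...)."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # Attribute nodes on the left-hand side of an assignment carry a Store context.
        if isinstance(node, ast.Attribute) and node.attr == field_name and isinstance(node.ctx, ast.Store):
            hits.append(node.lineno)
    return sorted(hits)

# Hypothetical code base to query.
SAMPLE = """
class Account:
    def deposit(self, amount):
        self.balance += amount
        self.log("deposit")

    def reset(self):
        self.balance = 0
"""

print(find_field_writes(SAMPLE, "balance"))  # [4, 8]
print(find_method_calls(SAMPLE, "log"))      # [5]
```

Writing such queries by hand requires knowing the AST API; a natural language interface aims to hide exactly this step.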

Figure (from the paper): development task; query in natural language; translation to a code query engine; displaying the search results.
An approach for querying source code through a natural language interface that lets developers run a wide range of precise searches while remaining as easy and intuitive to use as writing natural language.
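The step of translating a natural language question into something a code query engine can execute can be pictured with a deliberately tiny sketch. It assumes a hand-written grammar of two regular expressions; `CodeQuery` and `translate` are hypothetical names, and the paper's actual translation technique is not reproduced here.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class CodeQuery:
    kind: str    # "method_call" or "field_write"
    target: str  # identifier the query engine should look for

# Hypothetical phrase patterns mapping English fragments to query kinds.
_PATTERNS = [
    ("method_call", re.compile(r"\bcalls?\s+(?:to\s+)?(?:the\s+)?(?:method\s+)?(\w+)", re.IGNORECASE)),
    ("field_write", re.compile(r"\b(?:writes?|assigns?)\s+(?:to\s+)?(?:the\s+)?(?:field\s+)?(\w+)", re.IGNORECASE)),
]

def translate(question: str) -> Optional[CodeQuery]:
    """Translate a restricted English question into a structured code query, or None."""
    for kind, pattern in _PATTERNS:
        match = pattern.search(question)
        if match:
            return CodeQuery(kind, match.group(1))
    return None

print(translate("Which methods write to the field balance?"))
```

A real interface must cope with far more phrasings than two patterns can cover; the point here is only the separation between the natural language front end and the structured query it produces.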
Leveraging a corpus of natural language descriptions for program similarity
The approach can determine semantic relatedness and similarity of code across different libraries and even across different programming languages, a task considered extremely difficult using traditional approaches.
Code Similarity via Natural Language Descriptions
The main idea is that the relationship between code and its textual descriptions, as established on question-answering sites, can be used to determine the semantic relatedness and similarity of code fragments across different programming languages, a task considered extremely difficult using traditional approaches.
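Both summaries rest on comparing code through the text attached to it rather than through the code itself. A minimal sketch, assuming bag-of-words cosine similarity over descriptions (the papers' actual relatedness measure is not reproduced) and invented fragment IDs and descriptions:

```python
import math
import re
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Lower-cased word counts of a textual description."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical code fragments in different languages, each paired with a
# description such as a Q&A site might attach to it.
DESCRIPTIONS = {
    "java_read_file": "read all lines of a text file into a list",
    "python_read_file": "read lines from a text file into a list of strings",
    "js_sort_array": "sort an array of numbers in ascending order",
}

def most_similar(fragment_id: str) -> str:
    """Other fragment whose description is closest to this fragment's description."""
    query = bag_of_words(DESCRIPTIONS[fragment_id])
    others = [k for k in DESCRIPTIONS if k != fragment_id]
    return max(others, key=lambda k: cosine(query, bag_of_words(DESCRIPTIONS[k])))

print(most_similar("java_read_file"))  # python_read_file
```

Because only the descriptions are compared, the Java and Python fragments are matched even though their code shares no syntax.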
Improving feature location by transforming the query from natural language into requirements
A feature location approach that transforms a natural language query into a query made up of the requirements identified as relevant, and limits the code search space by selecting only the code of those products that contain relevant requirements.
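The two steps in this summary (rewriting the query as the matching requirements, then restricting the search to products holding them) can be sketched as follows. Term overlap with a `min_overlap` threshold is an assumption standing in for the paper's matching technique, and the requirements and products are hypothetical.

```python
import re

STOPWORDS = {"a", "an", "the", "as", "is", "of", "in", "with", "and", "when", "where", "to"}

def terms(text: str) -> set:
    """Content words of a text, lower-cased, without stop words."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS

# Hypothetical product line: requirement texts and the products that hold them.
REQUIREMENTS = {
    "R1": "export the monthly report as a pdf file",
    "R2": "log in with a username and password",
    "R3": "send an email notification when a report is ready",
}
PRODUCTS = {"Basic": {"R2"}, "Pro": {"R1", "R2"}, "Enterprise": {"R1", "R2", "R3"}}

def relevant_requirements(query: str, min_overlap: int = 2) -> set:
    """Step 1: rewrite the query as the requirements sharing enough terms with it."""
    q = terms(query)
    return {rid for rid, text in REQUIREMENTS.items() if len(q & terms(text)) >= min_overlap}

def search_scope(query: str) -> list:
    """Step 2: keep only products holding at least one relevant requirement."""
    reqs = relevant_requirements(query)
    return sorted(name for name, held in PRODUCTS.items() if held & reqs)

print(search_scope("where is the pdf export of the report implemented"))  # ['Enterprise', 'Pro']
```

Only the code of the returned products would then be searched, which is how the approach narrows the search space.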
Declarative visitors to ease fine-grained source code mining with full history on billions of AST nodes
This paper presents domain-specific language features inspired by object-oriented visitors, providing a default depth-first traversal strategy along with two expressions for defining custom traversals. It describes an implementation of these features in the Boa infrastructure for software repository mining and a code generation strategy that targets Java.
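A minimal sketch of the declarative visitor idea in Python, assuming a hypothetical `Node` type and a `STOP` sentinel: the default traversal is depth-first, per-kind `before`/`after` handlers are declared in dictionaries, and returning `STOP` from a `before` handler prunes that subtree. Boa's own visitor syntax and its two traversal expressions differ and are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

@dataclass
class Node:
    kind: str                                   # node type, e.g. "Class" or "Method"
    name: str = ""
    children: List["Node"] = field(default_factory=list)

STOP = object()  # sentinel a `before` handler returns to skip the node's children

Handlers = Dict[str, Callable[[Node], object]]

def visit(node: Node, before: Optional[Handlers] = None, after: Optional[Handlers] = None) -> None:
    """Default depth-first traversal; handlers are looked up by node kind."""
    before = before or {}
    after = after or {}
    handler = before.get(node.kind)
    # Descend unless the `before` handler asked to stop at this node.
    if handler is None or handler(node) is not STOP:
        for child in node.children:
            visit(child, before, after)
    # In this sketch `after` handlers run even when the subtree was skipped.
    if node.kind in after:
        after[node.kind](node)

# Hypothetical AST: collect method names, but skip the bodies of test classes.
tree = Node("Project", children=[
    Node("Class", "Account", [Node("Method", "deposit"), Node("Method", "withdraw")]),
    Node("Class", "AccountTest", [Node("Method", "testDeposit")]),
])
methods = []
visit(tree, before={
    "Class": lambda n: STOP if n.name.endswith("Test") else None,
    "Method": lambda n: methods.append(n.name),
})
print(methods)  # ['deposit', 'withdraw']
```

Declaring handlers per node kind keeps the mining query short, while the traversal logic lives in one reusable function.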
Verifiable Source Code Documentation in Controlled Natural Language
Effective Reformulation of Query for Code Search Using Crowdsourced Knowledge and Extra-Large Data Analytics
M. M. Rahman, C. Roy. 2018 IEEE International Conference on Software Maintenance and Evolution (ICSME), 2018.
A novel technique is proposed that automatically identifies relevant and specific API classes from the Stack Overflow Q&A site for a programming task written as a natural language query, and then reformulates the query to improve code search.
What Is the Cube Root of 27? Question Answering Over CodeOntology
Experimental results show that the unsupervised approach for processing natural language questions that can be answered neither by factual question answering nor by advanced data querying performs comparably to state-of-the-art proprietary systems such as the closed-source WolframAlpha computational knowledge engine.
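The title's example suggests answering a question by locating a suitable library method and invoking it. A deliberately naive sketch of that idea, assuming a hypothetical `METHODS` catalogue matched by substring and applied to the last number in the question; the paper's unsupervised question processing is not reproduced.

```python
import math
import re
from typing import Optional

# Hypothetical catalogue linking natural language descriptions to library methods.
METHODS = {
    "cube root": lambda x: math.copysign(abs(x) ** (1 / 3), x),
    "square root": math.sqrt,
    "absolute value": abs,
}

def answer(question: str) -> Optional[float]:
    """Pick the method whose description occurs in the question and apply it to the last number."""
    q = question.lower()
    numbers = re.findall(r"-?\d+(?:\.\d+)?", q)
    for description, method in METHODS.items():
        if description in q and numbers:
            return method(float(numbers[-1]))
    return None

print(round(answer("What is the cube root of 27?"), 6))  # 3.0
```

Floating-point roots may differ from the exact value in the last digits, hence the rounding in the usage line.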
Evaluating a query framework for software evolution data
The results of the evaluation show that the query interface can outperform classical software engineering tools in terms of correctness while yielding significant time savings for its users, and that it greatly advances the state of the art in terms of usability and learnability.
Automatic Reformulation of Query for Code Search using Crowdsourced Knowledge
RACK, a novel query reformulation technique, suggests a list of relevant API classes for a natural language query intended for code search by exploiting keyword-API associations from Stack Overflow questions and answers.
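The keyword-API association idea can be sketched as follows, assuming raw co-occurrence counts summed over the query keywords. RACK's actual ranking heuristics are not reproduced, and the `ASSOCIATIONS` table is invented for illustration.

```python
from collections import Counter

# Hypothetical keyword -> API class co-occurrence counts, as might be mined from
# Stack Overflow questions (keywords) and their answers (API classes).
ASSOCIATIONS = {
    "parse": Counter({"DocumentBuilder": 12, "SimpleDateFormat": 9, "Integer": 7}),
    "xml": Counter({"DocumentBuilder": 15, "XPath": 10}),
    "date": Counter({"SimpleDateFormat": 14, "Calendar": 8}),
}

def suggest_api_classes(query: str, k: int = 2) -> list:
    """Rank API classes by their summed association with the query keywords."""
    scores = Counter()
    for keyword in query.lower().split():
        scores.update(ASSOCIATIONS.get(keyword, Counter()))
    return [api for api, _ in scores.most_common(k)]

def reformulate(query: str, k: int = 2) -> str:
    """Append the top-k suggested API classes to the original query."""
    return " ".join([query] + suggest_api_classes(query, k))

print(reformulate("parse xml file"))  # parse xml file DocumentBuilder XPath
```

Summing over keywords lets a class associated with several query terms (here `DocumentBuilder`) outrank one that is strongly tied to a single term.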


Automatically capturing source code context of NL-queries for software maintenance and reuse
A novel approach is presented that automatically extracts natural language phrases from source code identifiers and organizes both the phrases and the search results into a hierarchy; it significantly outperforms the most closely related technique in terms of effort and effectiveness.
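Extracting phrases from identifiers starts with splitting them into words. A minimal sketch, assuming camelCase, PascalCase and snake_case naming conventions; `split_identifier` and `phrases_for` are hypothetical, and the paper's phrase categorization and hierarchy are not reproduced.

```python
import re

def split_identifier(identifier: str) -> list:
    """Split camelCase, PascalCase, snake_case and ACRONYM identifiers into lower-case words."""
    words = []
    for part in re.split(r"[_\W]+", identifier):
        # Acronym before a capitalized word (XMLFile -> XML, File), capitalized or
        # lower-case words, remaining upper-case runs, and digit runs.
        words.extend(re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+", part))
    return [w.lower() for w in words]

def phrases_for(query_word: str, identifiers: list) -> dict:
    """Map each identifier containing `query_word` to its natural language phrase."""
    phrases = {}
    for identifier in identifiers:
        words = split_identifier(identifier)
        if query_word in words:
            phrases[identifier] = " ".join(words)
    return phrases

print(phrases_for("cart", ["addItemToCart", "removeCartItem", "parseXMLFile"]))
# {'addItemToCart': 'add item to cart', 'removeCartItem': 'remove cart item'}
```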
Supporting developers with natural language queries
A framework is presented for querying information about a software system using guided-input natural language resembling plain English; it models data extracted by classical software analysis tools with an OWL ontology and uses knowledge processing technologies from the Semantic Web to query it.
codeQuest: Scalable Source Code Queries with Datalog
This paper describes a source code querying tool named codeQuest, which combines two previous proposals, namely the use of logic programming and of database systems, and uses safe Datalog, a language originally introduced in the theory of databases.
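Datalog's appeal for code queries is recursion, for example computing which methods transitively call another. A minimal sketch of naive bottom-up evaluation of one recursive rule in plain Python; codeQuest itself evaluates queries with a database system, which this sketch does not reproduce, and the call-graph facts are hypothetical.

```python
def transitive_calls(calls: set) -> set:
    """Naive bottom-up evaluation of the recursive Datalog program
         reach(X, Y) :- calls(X, Y).
         reach(X, Z) :- calls(X, Y), reach(Y, Z).
    New facts are derived until none appear (the least fixpoint)."""
    reach = set(calls)
    while True:
        derived = {(x, z) for (x, y) in calls for (y2, z) in reach if y == y2}
        if derived <= reach:
            return reach
        reach |= derived

# Hypothetical call-graph facts extracted from a code base.
CALLS = {("main", "parse"), ("parse", "lex"), ("lex", "read")}
REACH = transitive_calls(CALLS)
# Query: which methods may (transitively) call read?
print(sorted(x for (x, y) in REACH if y == "read"))  # ['lex', 'main', 'parse']
```

Because a finite set of facts can only grow to a finite closure, the loop terminates even on cyclic call graphs, one reason safe Datalog suits this kind of query.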
Navigating and querying code without getting lost
A source browsing tool is presented that improves the developer's ability to work with crosscutting concerns by providing better support for exploring code and by avoiding disorienting view switches.
Answering conceptual queries with Ferret
B. de Alwis, G. Murphy. 2008 ACM/IEEE 30th International Conference on Software Engineering, 2008.
A model that supports the integration of different sources of information about a program is presented; it enables the results of concrete queries in separate tools to be brought together to directly answer many of a programmer's conceptual queries.
An approach to detecting duplicate bug reports using natural language and execution information
The experimental results show that the approach can detect 67%-93% of duplicate bug reports in the Firefox bug repository, compared to 43%-72% using natural language information alone.
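The benefit of adding execution information can be sketched by combining two similarity scores. This assumes Jaccard similarity over summary words and over the set of methods in an execution trace, mixed by a hypothetical `text_weight`; the paper's actual similarity measures are not reproduced, and the reports below are invented.

```python
import re

def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def words(text: str) -> set:
    return set(re.findall(r"[a-z]+", text.lower()))

def combined_similarity(r1: dict, r2: dict, text_weight: float = 0.5) -> float:
    """Weighted mix of summary-text similarity and execution-trace similarity."""
    text_sim = jaccard(words(r1["summary"]), words(r2["summary"]))
    exec_sim = jaccard(set(r1["trace"]), set(r2["trace"]))
    return text_weight * text_sim + (1 - text_weight) * exec_sim

def rank_candidates(new_report: dict, existing: dict, text_weight: float = 0.5) -> list:
    """IDs of existing reports, most likely duplicate first."""
    return sorted(existing, key=lambda rid: combined_similarity(new_report, existing[rid], text_weight), reverse=True)

# Hypothetical reports: a summary plus the methods executed when reproducing the bug.
NEW = {"summary": "browser crashes when opening bookmarks",
       "trace": ["main", "openBookmarks", "loadDB", "crash"]}
EXISTING = {
    101: {"summary": "crash on bookmark menu", "trace": ["main", "openBookmarks", "loadDB", "crash"]},
    102: {"summary": "browser crashes when opening downloads", "trace": ["main", "openDownloads", "render"]},
}

print(rank_candidates(NEW, EXISTING))                   # [101, 102]
print(rank_candidates(NEW, EXISTING, text_weight=1.0))  # [102, 101]
```

With `text_weight=1.0` (natural language only), the report with similar wording but a different trace ranks first; mixing in execution information promotes the report whose trace matches.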
Questions programmers ask during software evolution tasks
The information a programmer needs to know about a code base while performing a change task, and how programmers go about discovering that information, are cataloged and categorized.
Debugging Reinvented: Asking and Answering Why and Why Not Questions about Program Behavior
The Whyline is a new kind of debugging tool that lets developers select a question about program output from a set of "why did" and "why didn’t" questions derived from the program’s code and execution.
Pattern Recognition and Machine Learning (Information Science and Statistics)