Using latent semantic analysis to identify similarities in source code to support program understanding

@article{Maletic2000UsingLS,
  title={Using latent semantic analysis to identify similarities in source code to support program understanding},
  author={Jonathan I. Maletic and Andrian Marcus},
  journal={Proceedings 12th IEEE International Conference on Tools with Artificial Intelligence. ICTAI 2000},
  year={2000},
  pages={46-53}
}
  • J. Maletic, A. Marcus
  • Published 2000
  • Computer Science
  • Proceedings 12th IEEE International Conference on Tools with Artificial Intelligence. ICTAI 2000
The paper describes the results of applying Latent Semantic Analysis (LSA), an advanced information retrieval method, to program source code and associated documentation. Latent semantic analysis is a corpus-based statistical method for inducing and representing aspects of the meanings of words and passages (of natural language) reflected in their usage. This methodology is assessed for application to the domain of software components (i.e., source code and its accompanying documentation…
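The general idea behind LSA can be illustrated with a minimal sketch. This is not the authors' implementation: the mini-corpus `DOCS`, the helper names, and the choice of `k=2` latent dimensions are all assumptions made for illustration. Each string stands in for the identifiers and comments of one source file. The sketch builds raw term-count vectors, then derives rank-`k` document coordinates from the top eigenpairs of the document Gram matrix, which is equivalent to a truncated SVD of the term-document matrix.

```python
import math
import re
from collections import Counter

# Hypothetical mini-corpus (not from the paper): one string per "source file".
DOCS = [
    "open file read buffer",
    "read buffer close stream",
    "close stream flush",
    "draw window paint color",
    "paint color pixel",
]


def tokenize(text):
    """Split text into lowercase alphabetic tokens."""
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0


def raw_vectors(docs):
    """One term-count vector per document over a shared vocabulary."""
    vocab = sorted({t for d in docs for t in tokenize(d)})
    counts = [Counter(tokenize(d)) for d in docs]
    return [[c[t] for t in vocab] for c in counts]


def top_eigenpairs(sym, k, iters=500):
    """Top-k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(sym)
    work = [list(map(float, row)) for row in sym]
    pairs = []
    for _pair in range(k):
        v = [1.0 / math.sqrt(n)] * n  # deterministic start vector
        for _step in range(iters):
            w = [dot(row, v) for row in work]
            norm = math.sqrt(dot(w, w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = dot(v, [dot(row, v) for row in work])
        pairs.append((lam, v))
        # Deflate: remove the component just found before the next pass.
        for i in range(n):
            for j in range(n):
                work[i][j] -= lam * v[i] * v[j]
    return pairs


def lsa_document_vectors(docs, k):
    """Rank-k document coordinates (rows of V_k * S_k from A = U S V^T)."""
    raw = raw_vectors(docs)
    gram = [[dot(a, b) for b in raw] for a in raw]  # A^T A, documents x documents
    pairs = top_eigenpairs(gram, k)
    return [
        [math.sqrt(max(lam, 0.0)) * vec[j] for lam, vec in pairs]
        for j in range(len(docs))
    ]


raw = raw_vectors(DOCS)
latent = lsa_document_vectors(DOCS, k=2)
```

In this toy corpus the first and third documents share no terms, so their raw cosine similarity is zero, yet both are linked through the second document and end up close together in the latent space. In practice a library SVD routine such as `numpy.linalg.svd` would replace the hand-written power iteration.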
Semantic Clustering: Identifying Topics in Source Code
  • To appear in Journal on Information Systems and Technologies
TLDR
Semantic Clustering is introduced, a technique based on Latent Semantic Indexing and clustering to group source artifacts that use similar vocabulary; the groups are interpreted as linguistic topics that reveal the intention of the code.
Semantic clustering: Identifying topics in source code
TLDR
Semantic Clustering is introduced, a technique based on Latent Semantic Indexing and clustering to group source artifacts that use similar vocabulary; the groups are interpreted as linguistic topics that reveal the intention of the code.
Discrete Characterization of Domain Using Semantic Clustering
TLDR
Proposes mapping the domain to the code with information retrieval techniques that use linguistic information, such as identifier names and comments in source code, to understand the software as a whole.
Semantic driven program analysis
  • A. Marcus
  • Computer Science
  • 20th IEEE International Conference on Software Maintenance, 2004. Proceedings.
  • 2004
TLDR
The paper advocates the use of latent semantic indexing as the underlying support for the semantic-driven analysis of existing software systems, to support program understanding and various software maintenance tasks, such as recovery of traceability links between documentation and source code.
Semantic Clustering: Making Use of Linguistic Information to Reveal Concepts in So
TLDR
Semantic Clustering is introduced, an algorithm to group source artifacts based on how they use similar terms; it works at the textual level of the source code, which makes it language independent.
Supporting program comprehension using semantic and structural information
  • J. Maletic, A. Marcus
  • Computer Science
  • Proceedings of the 23rd International Conference on Software Engineering. ICSE 2001
  • 2001
Focuses on investigating the combined use of semantic and structural information of programs to support the comprehension tasks involved in the maintenance and reengineering of software systems.
Recovering documentation-to-source-code traceability links using latent semantic indexing
  • A. Marcus, J. Maletic
  • Computer Science
  • 25th International Conference on Software Engineering, 2003. Proceedings.
  • 2003
TLDR
The method presented gives good results by comparison and is additionally a low-cost, highly flexible method to apply with regard to preprocessing and/or parsing of the source code and documentation.
Enriching reverse engineering with semantic clustering
TLDR
This paper analyzes how the semantics of the source code are spread over the source artifacts using latent semantic indexing, an information retrieval technique; it clusters artifacts that use similar terms and reveals the most relevant terms for the computed clusters.
Identifying domain expertise of developers from source code
TLDR
The analysis first derives documents from source code by discarding all the programming language constructs, and KMeans clustering is further used to cluster documents and extract closely related concepts.
A Topic Modeling Based Solution for Confirming Software Documentation Quality
TLDR
Latent Dirichlet Allocation and Hellinger distance are used to compute the similarities between fragments of source code and the topics of the external documentation, and this approach yields state-of-the-art performance in evaluating and confirming the quality of existing external documentation.

References

SHOWING 1-10 OF 29 REFERENCES
Automatic software clustering via Latent Semantic Analysis
TLDR
Applying Latent Semantic Analysis to the domain of source code and internal documentation for the support of software reuse is a new application of this method and a departure from the normal application domain of natural language.
An approach to program understanding by natural language understanding
TLDR
A knowledge-based, natural language processing approach to the automated understanding of object-oriented code as an aid to the reuse of object-oriented code is described, and a system that implements the approach is examined.
Automatically Identifying Reusable OO Legacy Code
TLDR
In developing the Patricia system, the developers had to overcome the problems of syntactically parsing natural language comments and syntactically analyzing identifiers, all prior to a semantic understanding of the comments and identifiers.
Latent Semantic Indexing (LSI) and TREC-2
TLDR
LSI is an extension of the vector retrieval method in which the dependencies between terms are explicitly taken into account in the representation and exploited in retrieval by simultaneously modeling all the interrelationships among terms and documents.
Indexing by Latent Semantic Analysis
A new method for automatic indexing and retrieval is described. The approach is to take advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”)…
A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge.
How do people know as much as they do with as little information as they get? The problem takes many forms; learning vocabulary from text is an especially dramatic and convenient case for research. A…
Program understanding and the concept assignment problem
TLDR
A central hypothesis of this paper is that a parsing-oriented recognition model based on formal, predominately structural patterns of programming language features is necessary but insufficient for the general concept assignment problem.
How Well Can Passage Meaning be Derived without Using Word Order? A Comparison of Latent Semantic Analysis and Humans
TLDR
An exploratory approach was provided by asking humans to judge the quality and quantity of knowledge conveyed by short student essays on scientific topics and comparing the interrater reliability and predictive accuracy of their estimates with the performance of a corpus-based statistical model that takes no account of word order within an essay.
A survey of information retrieval and filtering methods
TLDR
This work surveys the major techniques for information retrieval and discusses attempts to include semantic information (natural language processing, latent semantic indexing, and neural networks).
Using Linear Algebra for Intelligent Information Retrieval
TLDR
A lexical match between words in users’ requests and those in or assigned to documents in a database helps retrieve textual materials from scientific databases.