497 Citations
Discrete Characterization of Domain Using Semantic Clustering
- Computer Science
- 2010
The paper proposes mapping the domain to the code using information retrieval techniques that exploit linguistic information, such as identifier names and comments in source code, in order to understand the software as a whole.
Extracting High-Level Concepts from Open-Source Systems
- Computer Science
- 2015
This paper extracts topic models from the textual content of source code in a case study of four Java-based open-source systems (ArgoUML, Checkstyle, JHotDraw and jEdit) and investigates how effective LDA is for comprehending large open-source software systems.
On the Effect of Semantically Enriched Context Models on Software Modularization
- Computer Science; Art Sci. Eng. Program.
- 2018
The proposed approach, which introduces a context model for source code identifiers, paves the way for tools that support developers in program comprehension tasks such as application and domain concept location, software modularization, and topic analysis.
Identifying domain expertise of developers from source code
- Computer Science; KDD
- 2008
The analysis first derives documents from source code by discarding all programming language constructs; K-means clustering is then used to cluster the documents and extract closely related concepts.
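The document-derivation step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes Java input, uses a hypothetical, deliberately abbreviated keyword list (`JAVA_KEYWORDS`), and assumes identifiers are split on camelCase and snake_case boundaries. The resulting term lists could then be vectorized and clustered with K-means.

```python
import re

# Hypothetical, abbreviated subset of Java keywords; a real tool would use
# the complete keyword list of the analyzed language.
JAVA_KEYWORDS = {
    "public", "private", "protected", "static", "final", "class", "void",
    "int", "boolean", "return", "new", "if", "else", "for", "while",
    "import", "package", "this", "null", "true", "false",
}


def split_identifier(name: str) -> list[str]:
    """Split a camelCase or snake_case identifier into lowercase words."""
    # Acronym runs (e.g. "HTTP" in "HTTPServer"), capitalized or lowercase
    # words, and digit runs each become one part; underscores are skipped.
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [part.lower() for part in parts]


def source_to_document(source: str) -> list[str]:
    """Turn source text into a bag of terms, discarding language keywords.

    Every word-like token is considered, so words from comments and string
    literals are kept too; punctuation and operators never match the token
    pattern and are therefore dropped.
    """
    terms: list[str] = []
    for token in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source):
        if token in JAVA_KEYWORDS:
            continue
        terms.extend(split_identifier(token))
    return terms
```

For example, `public int getUserName() { return user_name; }` becomes the terms `get`, `user`, `name`, `user`, `name`.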
Topic modeling of public repositories at scale using names in source code
- Computer Science; ArXiv
- 2017
The goal of this paper is to apply topic modeling to the names used in over 13.6 million repositories, to interpret the inferred topics through data analysis, and to provide open access to the source code, tools and datasets.
Investigating the use of lexical information for software system clustering
- Computer Science; 2011 15th European Conference on Software Maintenance and Reengineering
- 2011
This paper explores the contribution of the combined use of six different dictionaries corresponding to the six parts of the source code where programmers introduce lexical information, namely: class, attribute, method and parameter names, comments, and source code statements.
Estimating Semantic Relatedness in Source Code
- Computer Science; ACM Trans. Softw. Eng. Methodol.
- 2015
This paper proposes Normalized Software Distance (NSD), an information-theoretic method that captures semantic relatedness in source code by exploiting the distributional cues of code terms across the system.
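For intuition, the sketch below assumes that NSD follows the form of the Normalized Google Distance, with `f(x)` taken as the number of files containing term `x`, `f(x, y)` as the number containing both terms, and `N` as the total number of files in the system. The paper's exact definition may differ; the function name and the treatment of terms that never co-occur are assumptions made for this illustration. Smaller values indicate more closely related terms.

```python
import math


def normalized_software_distance(files: list[set[str]], x: str, y: str) -> float:
    """NGD-style distance between terms x and y over a system's files.

    Assumed form (after the Normalized Google Distance):
        (max(log fx, log fy) - log fxy) / (log N - min(log fx, log fy))
    where fx/fy count files containing x/y, fxy counts files containing
    both, and N is the number of files.
    """
    n = len(files)
    fx = sum(1 for terms in files if x in terms)
    fy = sum(1 for terms in files if y in terms)
    fxy = sum(1 for terms in files if x in terms and y in terms)
    if fxy == 0:
        # Terms never co-occur (or one is absent): treat as maximally distant.
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    denominator = math.log(n) - min(log_fx, log_fy)
    if denominator == 0:
        # Both terms occur in every file, so they are indistinguishable here.
        return 0.0
    return (max(log_fx, log_fy) - log_fxy) / denominator
```

Because only file-level occurrence and co-occurrence counts are needed, this form can be computed from an inverted index of the system's terms.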
Identifying Semantic Outliers of Source Code Artifacts and Their Application to Software Architecture Recovery
- Computer Science; IEEE Access
- 2020
A novel measure, Conceptual Conformity (CC), is proposed. It computes the similarity between two latent topic distributions, one obtained from the source code and one from its package, and is used to identify source code that is not relevant to the package's semantic context, which is defined as a semantic outlier.
Supporting program comprehension with program summarization
- Computer Science; 2014 IEEE/ACIS 13th International Conference on Computer and Information Science (ICIS)
- 2014
This paper proposes using latent semantic indexing and clustering to group source artifacts with similar vocabulary, so that the composition of each package in the program can be analyzed, and employs Minipar, a natural language parser, to help generate the summaries.
Using Developers Contributions on Software Vocabularies to Identify Experts
- Computer Science; 2015 12th International Conference on Information Technology - New Generations
- 2015
Results confirm that vocabulary similarity can be used to identify code experts and, for orphaned entities, to recommend the current team member whose vocabulary is closest to the entity's.
References (first 10 of 50 shown)
Semantic Clustering: Making Use of Linguistic Information to Reveal Concepts in So
- Computer Science
- 2006
Semantic Clustering is introduced: an algorithm that groups source artifacts based on how they use similar terms. Because it works at the textual level of source code, it is language independent.
Enriching reverse engineering with semantic clustering
- Computer Science; 12th Working Conference on Reverse Engineering (WCRE'05)
- 2005
This paper analyzes how the semantics of source code are spread over source artifacts, using latent semantic indexing, an information retrieval technique, to cluster artifacts that use similar terms, and reveals the most relevant terms for the computed clusters.
Using latent semantic analysis to identify similarities in source code to support program understanding
- Computer Science; Proceedings 12th IEEE International Conference on Tools with Artificial Intelligence. ICTAI 2000
- 2000
The paper describes the results of applying Latent Semantic Analysis (LSA), an advanced information retrieval method, to program source code and associated documentation to assist in the understanding of a nontrivial software system, namely a version of Mosaic.
Recovering Traceability Links between Code and Documentation
- Computer Science; IEEE Trans. Software Eng.
- 2002
A probabilistic and a vector space information retrieval model are applied in two case studies, tracing C++ source code onto manual pages and Java code onto functional requirements, to recover traceability links between source code and free-text documents.
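Only the vector space side of this approach is sketched below. It assumes tf-idf term weighting and cosine similarity (common choices, not necessarily the ones used in the paper) and ranks source files against a free-text document that has already been tokenized; the function names, file names, and terms are hypothetical.

```python
import math
from collections import Counter


def build_idf(code_docs: dict[str, list[str]]) -> dict[str, float]:
    """Inverse document frequency of each term across the source files."""
    n = len(code_docs)
    df = Counter(term for terms in code_docs.values() for term in set(terms))
    return {term: math.log(n / count) for term, count in df.items()}


def weigh(terms: list[str], idf: dict[str, float]) -> dict[str, float]:
    """tf-idf vector; terms unseen in the code corpus are ignored."""
    tf = Counter(terms)
    return {term: count * idf[term] for term, count in tf.items() if term in idf}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_sources(code_docs: dict[str, list[str]], query_terms: list[str]) -> list[tuple[str, float]]:
    """Rank source files by cosine similarity to a free-text document."""
    idf = build_idf(code_docs)
    query = weigh(query_terms, idf)
    scores = {name: cosine(query, weigh(terms, idf)) for name, terms in code_docs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

With a requirement tokenized as `["user", "login", "password"]`, a file sharing the rarer terms `login` and `password` ranks above one that shares only the more common term `user`, which is the effect idf weighting is meant to produce.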
Identification of high-level concept clones in source code
- Computer Science; Proceedings 16th Annual International Conference on Automated Software Engineering (ASE 2001)
- 2001
The approach is intended to augment existing clone detection methods that are based on structural analysis and to improve the quality of clone detection.
Extracting concepts from file names; a new file clustering criterion
- Computer Science; Proceedings of the 20th International Conference on Software Engineering
- 1998
This work discusses techniques for extracting concepts (abbreviations) from a more informal source of information, file names, and shows experimentally that the proposed techniques find about 90% of the abbreviations automatically.
MUDABlue: an automatic categorization system for open source repositories
- Computer Science; 11th Asia-Pacific Software Engineering Conference
- 2004
Recovering documentation-to-source-code traceability links using latent semantic indexing
- Computer Science; 25th International Conference on Software Engineering, 2003. Proceedings.
- 2003
The presented method gives good results by comparison and is also low-cost and highly flexible to apply with regard to preprocessing and/or parsing of the source code and documentation.
An information retrieval approach to concept location in source code
- Computer Science; 11th Working Conference on Reverse Engineering
- 2004
This work addresses the problem of concept location using an advanced information retrieval method, Latent Semantic Indexing (LSI), used to map concepts expressed in natural language by the programmer to the relevant parts of the source code.
The conceptual cohesion of classes
- Computer Science; 21st IEEE International Conference on Software Maintenance (ICSM'05)
- 2005
A new set of measures for the cohesion of individual classes within an OO software system is proposed, based on the analysis of the semantic information embedded in the source code, such as comments and identifiers.
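The paper names its measure C3 (Conceptual Cohesion of Classes); as it is usually described, C3 is the average conceptual similarity over all pairs of methods in a class, clamped at zero. The sketch below assumes that definition and, to stay self-contained, substitutes raw term-frequency vectors for the LSI vectors used in the paper. With such non-negative vectors the clamp never triggers, whereas LSI similarities can be negative. The function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def conceptual_cohesion(methods: list[list[str]]) -> float:
    """C3-style cohesion: mean pairwise method similarity, clamped at 0.

    Each method is given as the list of terms from its identifiers and
    comments. Raw term counts stand in for the paper's LSI vectors.
    """
    if len(methods) < 2:
        raise ValueError("need at least two methods to form a pair")
    vectors = [Counter(terms) for terms in methods]
    similarities = [cosine(a, b) for a, b in combinations(vectors, 2)]
    average = sum(similarities) / len(similarities)
    return max(average, 0.0)
```

A class whose methods talk about the same concepts (for example `account`, `balance`) scores higher than one whose methods share no vocabulary, which scores 0.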