Traceability Support for Multi-Lingual Software Projects

Yalin Liu, Jinfeng Lin, Jane Cleland-Huang
Proceedings of the 17th International Conference on Mining Software Repositories
Software traceability establishes associations between diverse software artifacts such as requirements, design, code, and test cases. Due to the non-trivial costs of manually creating and maintaining links, many researchers have proposed automated approaches based on information retrieval techniques. However, many globally distributed software projects produce software artifacts written in two or more languages. The use of intermingled languages reduces the efficacy of automated tracing… 

Topic modeling in software engineering research
LDA and LDA-based techniques are the most frequent topic modeling techniques, and developer communication and bug reports have been modelled most, while data pre-processing and modeling parameters vary quite a bit and are often vaguely reported.
Escaping the Time Pit: Pitfalls and Guidelines for Using Time-Based Git Data
This paper presents the first survey of papers published in the Mining Software Repositories (MSR) conference series that utilize time-based data, and provides guidelines and best practices for researchers using time-based data from Git repositories.
Traceability Transformed: Generating More Accurate Links with Pre-Trained BERT Models
A novel framework called Trace BERT (T-BERT) is proposed to generate trace links between source code and natural language artifacts; it outperforms classical IR trace models and is applied to recover links between issues and commits in open source projects.


An Improved Approach to Traceability Recovery Based on Word Embeddings
This paper proposes a novel approach WELR, based on word embeddings and learning to rank to recover traceability links, which outperforms the state-of-the-art method that works under the same conditions.
Traceability in the Wild: Automatically Augmenting Incomplete Trace Links
This paper addresses the fundamental problem of missing links between commits and issues by leveraging a combination of process and text-related features characterizing issues and code changes to train a classifier to identify missing issue tags in commit messages, thereby generating the missing links.
Recovering Traceability Links between Code and Documentation
A probabilistic model and a vector space information retrieval model are applied in two case studies, tracing C++ source code onto manual pages and Java code onto functional requirements, to recover traceability links between source code and free-text documents.
Software traceability with topic modeling
An automated technique is proposed that combines traceability with topic modeling, a machine learning technique; it automatically records traceability links during the software development process and learns a probabilistic topic model over artifacts.
Semantically Enhanced Software Traceability Using Deep Learning Techniques
A tracing network architecture is proposed that utilizes Word Embedding and Recurrent Neural Network models to generate trace links; it significantly outperformed state-of-the-art tracing methods, including the Vector Space Model and Latent Semantic Indexing.
Trustrace: Mining Software Repositories to Improve the Accuracy of Requirement Traceability Links
Trustrace, a trust-based traceability recovery approach, is proposed, and it is shown that mining software repositories and combining the mined results with IR techniques can improve the accuracy (precision and recall) of IR techniques.
Recovering traceability links in software artifact management systems using information retrieval methods
An artifact management system is improved with a traceability recovery tool based on Latent Semantic Indexing (LSI), an information retrieval technique, and it is shown that such tools can help to identify quality problems in the textual description of traced artifacts.
Automated Techniques for Capturing Custom Traceability Links Across Heterogeneous Artifacts
Focusing on the mobile phone case study, this chapter illustrates how users can integrate their custom filters, heuristics, and relationship types, as well as their existing development tools, into the traceability system.
Linguistic Challenges in Global Software Development: Lessons Learned in an International SW Development Division
Benedikt Lutz · Linguistics · 2009 Fourth IEEE International Conference on Global Software Engineering · 2009
The concept of ELF (English as a lingua franca) is presented, which is steadily gaining importance in applied linguistics research, and the practical challenges of using English as a non-native language in international collaboration are outlined.
Monolingual and Cross-Lingual Information Retrieval Models Based on (Bilingual) Word Embeddings
A novel word representation learning model called Bilingual Word Embeddings Skip-Gram (BWESG) is presented which is the first model able to learn bilingual word embeddings solely on the basis of document-aligned comparable data.