Automated Localization for Unreproducible Builds

@article{Ren2018AutomatedLF,
  title={Automated Localization for Unreproducible Builds},
  author={Zhilei Ren and He Jiang and J. Xuan and Zijiang James Yang},
  journal={2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE)},
  year={2018},
  pages={71--81}
}
  • Zhilei Ren, He Jiang, J. Xuan, Z. Yang
  • Published 19 March 2018
  • Computer Science
  • 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE)
Reproducibility is the ability to recreate identical binaries under pre-defined build environments. Owing to the need for quality assurance and the benefit of better detecting attacks against build environments, the practice of reproducible builds has gained popularity in many open-source software repositories such as Debian and Bitcoin. However, identifying unreproducible issues remains a labour-intensive and time-consuming challenge, because of the lack of information to guide the… 
ConstBin: A Tool for Automatic Fixing of Unreproducible Builds
TLDR
ConstBin is an automated tool that captures unreproducible commands in a build process and automatically replaces them with their fixing operations based on an extensible rule set; it is the first tool that fixes inconsistencies during build processes.
Root Cause Localization for Unreproducible Builds via Causality Analysis Over System Call Tracing
TLDR
RepTrace is proposed, a framework that leverages the uniform interfaces of system call tracing to monitor executed build commands in diverse build environments and identifies the root causes of unreproducible builds by analyzing the system call traces of the executed build commands.
Identifying Bugs in Make and JVM-Oriented Builds
TLDR
This work presents buildfs, a generally-applicable model that takes into account the specification (as declared in build scripts) and the actual behavior (low-level file system operation) of build operations, and is the first to handle Java-oriented build systems.
Escaping dependency hell: finding build dependency errors with the unified dependency graph
TLDR
A new dependency graph is designed, the unified dependency graph (UDG), which leverages both static and dynamic information to uniformly encode the declared and actual dependencies between build targets and files and facilitates the efficient and precise detection of dependency errors via simple graph traversals.
DockerizeMe: Automatic Inference of Environment Dependencies for Python Code Snippets
  • Eric Horton, Chris Parnin
  • Computer Science
    2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)
  • 2019
TLDR
DockerizeMe is presented, a technique for inferring the dependencies needed to execute a Python code snippet without import error that resolves import errors in 892 out of nearly 3,000 gists from the Gistable dataset.
Reproducible Containers
TLDR
DetTrace is used to achieve, in an automatic fashion, reproducibility for 12,130 Debian package builds, containing over 800 million lines of code, as well as bioinformatics and machine learning workflows, and it is shown that, while software in each of these domains is initially irreproducible, DetTrace brings reproducibility without requiring any hardware, OS or application changes.
A model for detecting faults in build specifications
TLDR
This work presents BuildFS, a generally-applicable model that takes into account the specification of build executions and the actual behavior (low-level file system operation) of build operations, and is the first to handle JVM-oriented build systems.
The nature of build changes
TLDR
Detailed change information enables improvements of refactoring approaches for build configurations and improvements of prediction models to identify error-prone build files, and shows that build changes frequently occur around release days.
Revisiting the building of past snapshots - a replication and reproduction study
TLDR
It is validated that the most influential cause of build failures is missing external artifacts and that compilation errors are less influential; some facts that could point to an effect of the build tool on past compilability are also observed.
Compiler testing: a systematic literature analysis
TLDR
A literature analysis framework is proposed to gain insights into the compiler testing area and finds that the USA is the leading country that contains the most influential researchers and institutions.

References

SHOWING 1-10 OF 48 REFERENCES
Incorporating version histories in Information Retrieval based bug localization
  • Bunyamin Sisman, A. Kak
  • Computer Science
    2012 9th IEEE Working Conference on Mining Software Repositories (MSR)
  • 2012
TLDR
It is shown how version histories of a software project can be used to estimate a prior probability distribution for defect proneness associated with the files in a given version of the project, and these priors are used in an IR (Information Retrieval) framework to determine the posterior probability of a file being the cause of a bug.
Improving bug localization using structured information retrieval
TLDR
This work provides a thorough grounding of IR-based bug localization research in fundamental IR theoretical and empirical knowledge and practice and presents BLUiR, which embodies this insight, requires only the source code and bug reports, and takes advantage of bug similarity data if available.
Comparing Incremental Latent Semantic Analysis Algorithms for Efficient Retrieval from Software Libraries for Bug Localization
TLDR
This paper presents an incremental framework to update the model parameters of the Latent Semantic Analysis (LSA) model as the data evolves and compares two state-of-the-art incremental SVD update techniques for LSA with respect to the retrieval accuracy and the time performance.
Learning to rank relevant files for bug reports using domain knowledge
TLDR
An adaptive ranking approach that leverages domain knowledge through functional decompositions of source code files into methods, API descriptions of library components used in the code, the bug-fixing history, and the code change history is introduced.
Evaluating the usefulness of IR-based fault localization techniques
TLDR
The investigation shows that bug reports do not always contain rich information, and that low-quality bug reports can considerably affect the effectiveness of IR-based techniques, and also shows, through a user study, that high-quality bug reports benefit developers just as much as they benefit IR-based techniques.
Version history, similar report, and structure: putting them together for improved bug localization
TLDR
A new method for locating relevant buggy files that puts together version history, similar reports, and structure is proposed, and a large-scale experiment is performed on four open source projects to localize more than 3,000 bugs.
Potential biases in bug localization: do they matter?
TLDR
This paper analyses issue reports from three different projects: HTTPClient, Jackrabbit, and Lucene-Java to examine the impact of above three biases on bug localization, and shows that one of these biases significantly and substantially impacts bug localization results.
Boosting Bug-Report-Oriented Fault Localization with Segmentation and Stack-Trace Analysis
TLDR
This paper proposes to use segmentation and stack-trace analysis to improve the performance of bug localization by dividing each source code file into a series of segments and using the segment most similar to the bug report to represent the file.
Compositional Vector Space Models for Improved Bug Localization
TLDR
A genetic algorithm (GA) based approach to explore the space of possible compositions and output a heuristically near-optimal composite model that improves hit at 5, mean average precision (MAP), and mean reciprocal rank (MRR) scores of VSMnatural by 18.4%, 20.6%, and 10.5% respectively.
Using Bug Report Similarity to Enhance Bug Localisation
TLDR
The technique increases the number of bugs where the first relevant method presented to developers is the first result from 6 to 27, and those in the top-10 from 50 to 57, showing that it can be successfully used to enhance existing bug localisation techniques.