Learn More
We have mapped defects from the bug database of Eclipse (one of the largest open-source projects) to source code locations. The resulting data set lists the number of pre- and post-release defects for every package and file in the Eclipse releases 2.0, 2.1, and 3.0. We additionally annotated the data with common complexity metrics. All data is publicly(More)
We apply data mining to version histories in order toguide programmers along related changes: "Programmerswho changed these functions also changed. . . ". Given aset of existing changes, such rules (a) suggest and predictlikely further changes, (b) show up item coupling that is indetectableby program analysis, and (c) prevent errors dueto incomplete(More)
—Given some test case, a program fails. Which circumstances of the test case are responsible for the particular failure? The Delta Debugging algorithm generalizes and simplifies some failing test case to a minimal test case that still produces the failure; it also isolates the difference between a passing and a failing test case. In a case study, the(More)
As a software system evolves, programmers make changes that sometimes cause problems. We analyze CVS archives for <i>fix-inducing changes</i>---changes that lead to problems, indicated by fixes. We show how to automatically locate fix-inducing changes by linking a version archive (such as CVS) to a bug database (such as BUGZILLA). In a first investigation(More)
Which is the defect that causes a software failure? By comparing the program states of a failing and a passing run, we can identify the <i>state differences</i> that cause the failure. However, these state differences can occur all over the program run. Therefore, we focus <i>in space</i> on those variables and values that are relevant for the failure, and(More)
We analyze the version history of 7 software systems to predict the most fault prone entities and files. The basic assumption is that faults do not occur in isolation, but rather in bursts of several related faults. Therefore, we cache locations that are likely to have faults: starting from the location of a known (fixed) fault, we cache the location(More)
How do we know a program does what it claims to do? After clustering Android apps by their description topics, we identify outliers in each cluster with respect to their API usage. A "weather" app that sends messages thus becomes an anomaly; likewise, a "messaging" app would typically not be expected to access the current location. Applied on a set of(More)