Coverage is not strongly correlated with test suite effectiveness

@inproceedings{Inozemtseva2014CoverageIN,
  title={Coverage is not strongly correlated with test suite effectiveness},
  author={Laura Inozemtseva and Reid Holmes},
  booktitle={Proceedings of the 36th International Conference on Software Engineering},
  year={2014}
}
The coverage of a test suite is often used as a proxy for its ability to detect faults. However, previous studies that investigated the correlation between code coverage and test suite effectiveness have failed to reach a consensus about the nature and strength of the relationship between these test suite characteristics. Moreover, many of the studies were done with small or synthetic programs, making it unclear whether their results generalize to larger programs, and some of the studies did… 

Test suites effectiveness evolution in open source systems: empirical study
TLDR
The authors empirically study three open-source Java systems across 11 versions to examine the correlation between test suite effectiveness, test suite size, and coverage during their evolution.
Assertions are strongly correlated with test suite effectiveness
TLDR
The number of assertions in a test suite strongly correlates with its effectiveness, and this factor directly influences the relationship between test suite size and effectiveness.
Do Pseudo Test Suites Lead to Inflated Correlation in Measuring Test Effectiveness?
TLDR
This paper investigates the correlation between statement/assertion coverage and mutation score using both pseudo and original test suites and reveals that, contrary to what was previously reported, statement coverage has a stronger correlation with mutation score than assertion coverage.
Code coverage for suite evaluation by developers
TLDR
Suites from a large set of real-world open-source projects show that evaluation results differ from those for suite comparison: statement (not block, branch, or path) coverage predicts mutation kills best.
Predicting Test Suite Effectiveness Using Static Analysis
TLDR
This work investigates whether metrics obtained from static analysis can predict test suite effectiveness, as measured with mutation testing. It shows that, when size is ignored, statically estimated code coverage correlates with effectiveness; when suites of equal size are compared, however, the correlation drops significantly.
Test suite evaluation using code coverage based metrics
TLDR
A method is proposed for gaining a deeper understanding of a test suite and its relation to the program code it is intended to test, based on decomposition into coherent logical groups that are easier to analyze and understand.
Assessing the Test Suite of a Large System Based on Code Coverage, Efficiency and Uniqueness
TLDR
A recent approach for test suite assessment and improvement, which uses code coverage information at a more detailed level and adds further evaluation aspects derived from the coverage, is applied to analyze the test suite of a large-scale industrial open-source system containing 27 000 test cases.
Code coverage and test suite effectiveness: Empirical study with real bugs in large systems
TLDR
This paper analyzes two large software systems to measure the relationship between code coverage and its effectiveness in killing real bugs from those systems, and finds that there is indeed a statistically significant correlation between code coverage and bug kill effectiveness.
The Impact of Fault Type on the Relationship between Code Coverage and Fault Detection
Structural coverage criteria are commonly used to determine the adequacy of a test suite. However, studies investigating structural coverage and fault-finding capabilities have mixed results. Some

References

Code coverage for suite evaluation by developers
TLDR
Suites from a large set of real-world open-source projects show that evaluation results differ from those for suite comparison: statement (not block, branch, or path) coverage predicts mutation kills best.
The influence of size and coverage on test suite effectiveness
TLDR
This work studies the relationship between three properties of test suites: size, structural coverage, and fault-finding effectiveness. The results indicate that coverage is sometimes correlated with effectiveness when size is controlled for, and that using both size and coverage yields a more accurate prediction of effectiveness than size alone.
Effect of test set size and block coverage on the fault detection effectiveness
TLDR
It is found that there is little or no reduction in the FDE of a test set when its size is reduced while the all-uses coverage is kept constant, suggesting, indirectly, that coverage is more correlated than the size with the FDE.
Guidelines for Coverage-Based Comparisons of Non-Adequate Test Suites
TLDR
This article presents the first extensive study that evaluates coverage criteria for the common case of non-adequate test suites, including basic criteria such as statement and branch coverage, as well as stronger criteria used in recent studies, including criteria based on program paths, equivalence classes of covered statements, and predicate states.
Using simulation for assessing the real impact of test-coverage on defect-coverage
TLDR
A procedure is proposed to investigate whether any test-coverage criterion has a genuine additional impact on defect-coverage when compared to the impact of just running additional test cases, and the results do not support the assumption of a causal dependency between test-coverage and defect-coverage.
Comparing non-adequate test suites using coverage criteria
TLDR
A large set of plausible criteria, including statement and branch coverage, as well as stronger criteria used in recent studies are evaluated: branch coverage and an intra-procedural acyclic path coverage perform best.
Experiments on the effectiveness of dataflow- and control-flow-based test adequacy criteria
TLDR
An experimental study investigating the effectiveness of two code-based test adequacy criteria for identifying sets of test cases that detect faults found that tests based respectively on control-flow and dataflow criteria are frequently complementary in their effectiveness.
Further empirical studies of test effectiveness
TLDR
An empirical evaluation of the fault-detecting ability of two white-box software testing techniques: decision coverage (branch testing) and the all-uses data flow testing criterion supports the belief that these testing techniques can be more effective than random testing.
The effect of code coverage on fault detection under different testing profiles
TLDR
This study hypothesizes that the estimation of code coverage on testing effectiveness varies under different testing profiles, and employs coverage testing and mutation testing in this experiment to investigate the relationship between code coverage and fault detection capability under different testing profiles.
An Experimental Comparison of the Effectiveness of Branch Testing and Data Flow Testing
TLDR
An experiment comparing the effectiveness of the all-uses and all-edges test data adequacy criteria is discussed; error-exposing ability was shown to be strongly positively correlated with the percentage of covered definition-use associations in only four of the nine subjects.