Machine Learning Testing: Survey, Landscapes and Horizons

@article{Zhang2022MachineLT,
  title={Machine Learning Testing: Survey, Landscapes and Horizons},
  author={Jie M. Zhang and Mark Harman and Lei Ma and Yang Liu},
  journal={IEEE Transactions on Software Engineering},
  year={2022},
  volume={48},
  pages={1-36}
}
  • J. M. Zhang, M. Harman, L. Ma, Y. Liu
  • Published 19 June 2019
  • Computer Science, Mathematics
  • IEEE Transactions on Software Engineering
This paper provides a comprehensive survey of research on techniques for testing machine learning systems (Machine Learning Testing, or ML testing). It covers 144 papers on testing properties (e.g., correctness, robustness, and fairness), testing components (e.g., the data, learning program, and framework), testing workflow (e.g., test generation and test evaluation), and application scenarios (e.g., autonomous driving and machine translation). The paper also analyses trends concerning datasets…
A Review on Oracle Issues in Machine Learning
TLDR
A survey of the oracle issues found in machine learning is presented, together with state-of-the-art solutions for dealing with these issues and lines of research on differential testing, metamorphic testing, and test coverage.
Combinatorial Testing Metrics for Machine Learning
TLDR
A set difference metric for comparing machine learning (ML) datasets is defined, with the difference between datasets expressed as a function of combinatorial coverage; its utility is illustrated for transfer learning without retraining.
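To make the idea concrete, here is a minimal sketch of a pairwise (t=2) combinatorial set-difference between two tabular datasets with categorical features. The enumeration and the exact difference definition below are illustrative assumptions, not the paper's metric.

# Illustrative sketch of a combinatorial (t=2) set-difference between two
# tabular datasets with categorical features. The paper's metric may differ;
# this only shows the general idea.
from itertools import combinations

def pairwise_combinations(rows):
    """Collect all (feature-index pair, value pair) combinations seen in rows."""
    combos = set()
    for row in rows:
        for (i, j) in combinations(range(len(row)), 2):
            combos.add(((i, j), (row[i], row[j])))
    return combos

def set_difference_coverage(source_rows, target_rows):
    """Fraction of pairwise combinations in target_rows not covered by source_rows."""
    src = pairwise_combinations(source_rows)
    tgt = pairwise_combinations(target_rows)
    if not tgt:
        return 0.0
    return len(tgt - src) / len(tgt)

# Example: two tiny datasets over three categorical features.
train = [("a", "x", 0), ("b", "x", 1), ("a", "y", 1)]
field = [("a", "x", 0), ("b", "y", 0)]
print(set_difference_coverage(train, field))  # -> 0.5: half of the field combinations are unseen in training

A high value would suggest the target data exercises feature combinations the source data never covered, which is the kind of signal the paper uses to reason about transfer without retraining.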
On Using Decision Tree Coverage Criteria for Testing Machine Learning Models
Over the past decade, there has been a growing interest in applying machine learning (ML) to address a myriad of tasks. Owing to this interest, the adoption of ML-based systems has gone mainstream.
Oracle Issues in Machine Learning and Where to Find Them
TLDR
The need for software engineering strategies that especially target and assess the oracle is illustrated, beyond existing ML testing efforts, by employing two heuristics based on information entropy and semantic analysis on well-known computer vision models and benchmark data from ImageNet.
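A minimal sketch of an information-entropy heuristic over softmax outputs, in the spirit of the paper: predictions with high output entropy are flagged for extra oracle scrutiny. The threshold and the flagging policy are assumptions for illustration, not the paper's exact procedure.

# Entropy heuristic over classifier softmax outputs: high-entropy predictions
# are flagged as cases where the oracle (labels or human judgment) deserves
# extra scrutiny. Threshold and usage are illustrative assumptions.
import numpy as np

def prediction_entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    probs = np.clip(np.asarray(probs, dtype=float), 1e-12, 1.0)
    return float(-(probs * np.log(probs)).sum())

def flag_suspicious(softmax_outputs, threshold=1.0):
    """Return indices of predictions whose entropy exceeds the threshold."""
    return [i for i, p in enumerate(softmax_outputs)
            if prediction_entropy(p) > threshold]

outputs = [
    [0.97, 0.01, 0.01, 0.01],   # confident prediction
    [0.30, 0.25, 0.25, 0.20],   # near-uniform: worth double-checking
]
print(flag_suspicious(outputs))  # -> [1]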
Testing machine learning based systems: a systematic mapping
TLDR
A systematic mapping study about testing techniques for MLSs driven by 33 research questions and investigated multiple aspects of the testing approaches, such as the used/proposed adequacy criteria, the algorithms for test input generation, and the test oracles.
Automatic Unit Test Generation for Machine Learning Libraries: How Far Are We?
TLDR
An empirical study of five widely used machine learning libraries with two popular unit test case generation tools, EvoSuite and Randoop, finds that most of the machine learning libraries do not maintain a high-quality unit test suite with respect to commonly applied quality metrics such as code coverage and mutation score.
Boundary sampling to boost mutation testing for deep learning models
Context: The prevalent application of Deep Learning (DL) models has raised concerns about their reliability. Due to the data-driven programming paradigm, the quality of test datasets is…
Property-Based Testing for Parameter Learning of Probabilistic Graphical Models
TLDR
This research work describes a concrete use of property-based testing for quality assurance in the parameter learning algorithm of a probabilistic graphical model; the necessity and effectiveness of this method, compared with unit tests, is analysed with concrete code examples for enhanced retraceability and interpretability.
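A minimal property-based test sketch in the same spirit, using the hypothesis library (assumed installed). The estimator and the properties below are illustrative stand-ins for the paper's parameter-learning algorithm, not its actual code.

# Property-based test sketch: a parameter estimator should always return a
# valid probability distribution, regardless of the (random) input sample.
from hypothesis import given, strategies as st

def fit_categorical(samples, num_categories, alpha=1.0):
    """MAP estimate of a categorical distribution with Laplace smoothing (illustrative)."""
    counts = [alpha] * num_categories
    for s in samples:
        counts[s] += 1
    total = sum(counts)
    return [c / total for c in counts]

@given(st.lists(st.integers(min_value=0, max_value=4), max_size=200))
def test_estimate_is_a_valid_distribution(samples):
    params = fit_categorical(samples, num_categories=5)
    assert all(p >= 0.0 for p in params)      # non-negative probabilities
    assert abs(sum(params) - 1.0) < 1e-9      # probabilities sum to one

if __name__ == "__main__":
    test_estimate_is_a_valid_distribution()

Unlike a unit test pinned to one hand-picked sample, the property is exercised over many generated inputs, which is the gain the paper argues for.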
Machine Learning Application Development: Practitioners' Insights
TLDR
The reported challenges and best practices of ML application development are synthesized into 17 findings to inform the research community about topics that need to be investigated to improve the engineering process and the quality of ML-based applications.
DeepMetis: Augmenting a Deep Learning Test Set to Increase its Mutation Score
TLDR
This paper describes an approach that automatically generates new test inputs to augment an existing test set so that its capability to detect DL mutations increases, implemented as a search-based input generation strategy.
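A toy random-search sketch of the underlying idea (perturb a seed input until a reference model and one of its mutants disagree, i.e., the mutant is killed). The stand-in models and the simple random walk are assumptions; DeepMetis's actual fitness functions and operators are far richer.

# Toy sketch of mutation-killing input generation: perturb a seed input until
# the original model and a mutant disagree. Models here are trivial stand-ins.
import numpy as np

rng = np.random.default_rng(0)

def original_model(x):
    return int(x.sum() > 0.0)      # stand-in "trained" model

def mutated_model(x):
    return int(x.sum() > 0.5)      # stand-in mutant with a shifted decision boundary

def search_killing_input(seed, steps=1000, step_size=0.05):
    """Randomly perturb seed until the two models disagree (the mutant is 'killed')."""
    x = np.array(seed, dtype=float)
    for _ in range(steps):
        if original_model(x) != mutated_model(x):
            return x
        x = x + rng.normal(scale=step_size, size=x.shape)
    return None   # budget exhausted without killing the mutant

print(search_killing_input(np.zeros(4)))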

References

Showing 1-10 of 307 references
Dataset Coverage for Testing Machine Learning Computer Programs
TLDR
A systematic method to derive a set of metamorphic properties for machine learning classifiers, specifically support vector machines, is proposed; it includes a new notion of test coverage for machine learning programs, which provides a clear guideline for conducting a series of metamorphic tests.
On Testing Machine Learning Programs
TLDR
This comprehensive review of software testing practices for machine learning models will help ML engineers identify the right approach to improve the reliability of their ML-based systems.
A Survey of Software Quality for Machine Learning Applications
TLDR
A survey of software quality for ML applications that treats the quality of ML applications as an emerging discussion and indicates key areas, such as deep learning, fault localization, and prediction, to be researched together with software engineering and testing.
Multiple-Implementation Testing of Supervised Learning Software
TLDR
This paper presents a novel black-box approach of multiple-implementation testing for supervised learning software: a pseudo-oracle for a test input is derived by running the input on n implementations of the supervised learning algorithm and then using the common output produced by a majority of these n implementations.
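A minimal sketch of such a majority-vote pseudo-oracle. The predictors below are hypothetical stand-ins for independently developed implementations of the same algorithm.

# Majority-vote pseudo-oracle over several independent implementations.
from collections import Counter

def pseudo_oracle(implementations, test_input):
    """Return the output agreed on by a majority of implementations, or None if there is no majority."""
    outputs = [impl(test_input) for impl in implementations]
    label, votes = Counter(outputs).most_common(1)[0]
    return label if votes > len(outputs) // 2 else None

# Hypothetical predictors standing in for n independently implemented classifiers.
impls = [lambda x: x >= 0, lambda x: x > 0, lambda x: x >= 0]
print(pseudo_oracle(impls, 5))    # True (all agree)
print(pseudo_oracle(impls, 0))    # True (2 of 3 agree; the outlier is suspicious)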
Test Selection for Deep Learning Systems
TLDR
An in-depth empirical comparison of a set of test selection metrics based on the notion of model uncertainty, which shows that uncertainty-based metrics have a strong ability to identify misclassified inputs, being three times stronger than surprise adequacy and outperforming coverage-related metrics.
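A minimal sketch of uncertainty-based test selection: rank unlabeled inputs by the predictive entropy of the model's softmax output and keep the top-k for labeling or inspection. The scoring function and budget here are assumptions chosen for illustration.

# Uncertainty-based test selection: pick the inputs the model is least sure about.
import numpy as np

def entropy(probs):
    probs = np.clip(probs, 1e-12, 1.0)
    return -(probs * np.log(probs)).sum(axis=-1)

def select_most_uncertain(softmax_outputs, budget):
    """Indices of the `budget` inputs with the highest predictive entropy."""
    scores = entropy(np.asarray(softmax_outputs, dtype=float))
    return [int(i) for i in np.argsort(scores)[::-1][:budget]]

outputs = np.array([
    [0.98, 0.01, 0.01],
    [0.40, 0.35, 0.25],
    [0.70, 0.20, 0.10],
])
print(select_most_uncertain(outputs, budget=1))  # -> [1], the most uncertain prediction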
Testing and validating machine learning classifiers by metamorphic testing
TLDR
This paper presents a technique for testing the implementations of machine learning classification algorithms which support such applications, based on metamorphic testing, which has been shown to be effective in alleviating the oracle problem.
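A minimal sketch of one classic metamorphic relation for classifier implementations: permuting the order of the training instances should not change the predictions. The kNN subject and scikit-learn (assumed installed) are only an illustrative harness, not the paper's experimental setup.

# Metamorphic relation: predictions are invariant under training-set permutation.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def train_and_predict(X_train, y_train, X_test):
    model = KNeighborsClassifier(n_neighbors=3)
    model.fit(X_train, y_train)
    return model.predict(X_test)

rng = np.random.default_rng(42)
X_train = rng.normal(size=(60, 4))
y_train = (X_train[:, 0] > 0).astype(int)
X_test = rng.normal(size=(10, 4))

perm = rng.permutation(len(X_train))
baseline = train_and_predict(X_train, y_train, X_test)
follow_up = train_and_predict(X_train[perm], y_train[perm], X_test)

assert np.array_equal(baseline, follow_up), "metamorphic relation violated"
print("permutation relation holds on this run")

The point of the relation is that no ground-truth labels for X_test are needed: only the consistency between the source and follow-up runs is checked.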
Testing Machine Learning Algorithms for Balanced Data Usage
  • Arnab Sharma, H. Wehrheim
  • Computer Science
    2019 12th IEEE Conference on Software Testing, Validation and Verification (ICST)
  • 2019
TLDR
This paper develops a (metamorphic) testing approach called TiLe for checking balanced data usage, applies it to 14 ML classifiers taken from the scikit-learn library using 4 artificial and 9 real-world data sets for training, and finds 12 of the classifiers to be unbalanced.
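A minimal sketch of one plausible balance-oriented metamorphic relation: if the two class labels are swapped in the training data, the predictions should swap correspondingly. This is illustrative of the general idea and not necessarily one of the exact relations TiLe checks.

# Balance-oriented metamorphic relation: swapping class labels in training
# data should complement the predictions (illustrative, scikit-learn assumed).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_and_predict(X_train, y_train, X_test):
    model = DecisionTreeClassifier(random_state=0)
    model.fit(X_train, y_train)
    return model.predict(X_test)

rng = np.random.default_rng(7)
X_train = rng.normal(size=(80, 3))
y_train = (X_train[:, 1] > 0).astype(int)
X_test = rng.normal(size=(15, 3))

original = train_and_predict(X_train, y_train, X_test)
swapped = train_and_predict(X_train, 1 - y_train, X_test)   # labels 0 and 1 exchanged

# Balanced data usage would imply the follow-up predictions are the complement of the originals.
print("relation holds:", np.array_equal(swapped, 1 - original))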
DeepMutation: Mutation Testing of Deep Learning Systems
  • L. Ma, Fuyuan Zhang, +8 authors Yadong Wang
  • Computer Science
    2018 IEEE 29th International Symposium on Software Reliability Engineering (ISSRE)
  • 2018
TLDR
This paper proposes a mutation testing framework specialized for DL systems to measure the quality of test data, and designs a set of model-level mutation operators that directly inject faults into DL models without a training process.
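A minimal sketch of one model-level mutation operator in this spirit: Gaussian fuzzing of a random fraction of a trained model's weights, with no retraining. PyTorch is used as an illustrative framework; the operator choice, rates, and the "killed" check are assumptions, and DeepMutation defines a larger operator set and its own mutation-score computation.

# Model-level mutation operator sketch: Gaussian fuzzing of ~mutation_rate of the weights.
import copy
import torch
import torch.nn as nn

def gaussian_fuzz(model, mutation_rate=0.01, scale=0.1, seed=0):
    """Return a mutant copy of model with roughly mutation_rate of its weights perturbed."""
    torch.manual_seed(seed)
    mutant = copy.deepcopy(model)
    with torch.no_grad():
        for param in mutant.parameters():
            mask = (torch.rand_like(param) < mutation_rate).float()
            param.add_(mask * torch.randn_like(param) * scale)
    return mutant

# A mutant is "killed" by a test input if its prediction differs from the original's.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 3))
mutant = gaussian_fuzz(model, mutation_rate=0.05)
x = torch.randn(4, 8)
killed = (model(x).argmax(dim=1) != mutant(x).argmax(dim=1)).any().item()
print("mutant killed by this batch:", killed)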
Generalized Oracle for Testing Machine Learning Computer Programs
TLDR
This paper studies how a notion of oracles can be elaborated so that machine learning programs can be tested, and shows a systematic way of deriving testing properties from mathematical formulations of given machine learning problems.
An Approach to Software Testing of Machine Learning Applications
TLDR
A software testing approach aimed at addressing the challenge of testing implementations of two different ML ranking algorithms: Support Vector Machines and MartiRank is described.