Distribution Awareness for AI System Testing

@article{Berend2021DistributionAF,
  title={Distribution Awareness for AI System Testing},
  author={David Berend},
  journal={2021 IEEE/ACM 43rd International Conference on Software Engineering: Companion Proceedings (ICSE-Companion)},
  year={2021},
  pages={96-98}
}
As Deep Learning (DL) is continuously adopted in many safety-critical applications, its quality and reliability have started to raise concerns. Similar to the traditional software development process, testing the DL software to uncover its defects at an early stage is an effective way to reduce risks after deployment. Although recent progress has been made in designing novel testing techniques for DL software, the distribution of generated test data is not taken into consideration. It is therefore…

Citations

Hierarchical Distribution-Aware Testing of Deep Learning

A new robustness testing approach for detecting AEs that considers both the input distribution and the perceptual quality of inputs, and is superior to state-of-the-art approaches that either disregard the input distribution or consider only a single (non-hierarchical) distribution.

Reliability Assessment and Safety Arguments for Machine Learning Components in System Assurance

An overall assurance framework for learning-enabled systems (LESs) is presented with an emphasis on quantitative aspects, e.g., breaking down system-level safety targets into component-level requirements and supporting claims stated in reliability metrics, together with a novel model-agnostic Reliability Assessment Model for ML classifiers that utilises the operational profile and robustness verification evidence.

Reliability Assessment and Safety Arguments for Machine Learning Components in Assuring Learning-Enabled Autonomous Systems

An overall assurance framework for LESs is presented with an emphasis on quantitative aspects, e.g., breaking down system-level safety targets into component-level requirements and supporting claims stated in reliability metrics, together with a novel model-agnostic Reliability Assessment Model for ML classifiers that utilises the operational profile and robustness verification evidence.

References

Showing 1–10 of 35 references

DeepXplore: Automated Whitebox Testing of Deep Learning Systems

DeepXplore efficiently finds thousands of incorrect corner case behaviors in state-of-the-art DL models with thousands of neurons trained on five popular datasets including ImageNet and Udacity self-driving challenge data.
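
DeepXplore's test generation is guided by neuron coverage, the fraction of neurons whose activation exceeds a threshold on at least one test input. The sketch below shows one simple way to compute such a coverage metric over recorded activations; the per-neuron min-max scaling and the 0.5 threshold are assumptions for illustration, not DeepXplore's exact implementation.

import numpy as np

def neuron_coverage(layer_activations, threshold=0.5):
    # layer_activations: list of arrays, one per layer, each shaped (batch, neurons).
    # A neuron counts as covered if its scaled activation exceeds `threshold`
    # on at least one input in the batch (threshold value is an assumption).
    covered, total = 0, 0
    for acts in layer_activations:
        lo, hi = acts.min(axis=0), acts.max(axis=0)
        scaled = (acts - lo) / np.maximum(hi - lo, 1e-12)  # per-neuron min-max scaling
        covered += int(np.sum((scaled > threshold).any(axis=0)))
        total += acts.shape[1]
    return covered / total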

Deep Anomaly Detection with Outlier Exposure

In extensive experiments on natural language processing and small- and large-scale vision tasks, it is found that Outlier Exposure significantly improves detection performance, and that cutting-edge generative models trained on CIFAR-10 may assign higher likelihoods to SVHN images than to CIFAR-10 images; OE is used to mitigate this issue.
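
The Outlier Exposure (OE) objective adds a term that pushes the model's predictions on auxiliary outlier data toward the uniform distribution. Below is a minimal PyTorch-style sketch of that combined loss for a classifier; the function name and the weight lambda_oe are illustrative assumptions, not the authors' code.

import torch.nn.functional as F

def outlier_exposure_loss(logits_in, labels_in, logits_out, lambda_oe=0.5):
    # Standard cross-entropy on in-distribution samples.
    ce_in = F.cross_entropy(logits_in, labels_in)
    # Cross-entropy between the uniform distribution and the model's
    # predictive distribution on auxiliary outliers (up to a constant).
    ce_uniform = -F.log_softmax(logits_out, dim=1).mean(dim=1).mean()
    return ce_in + lambda_oe * ce_uniform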

Self-Driving Uber Car Kills Pedestrian in Arizona, Where Robots Roam

SAN FRANCISCO — Arizona officials saw opportunity when Uber and other companies began testing driverless cars a few years ago. Promising to keep oversight light, they invited the companies to test…

Detecting Out-of-Distribution Inputs to Deep Generative Models Using Typicality

This work proposes a statistically principled, easy-to-implement test using the empirical distribution of model likelihoods to determine whether or not inputs reside in the typical set, only requiring that the likelihood can be computed or closely approximated.
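
The test operates on batches: a batch is flagged as out-of-distribution when its average negative log-likelihood falls outside an epsilon-neighbourhood of the model's typical set, with epsilon calibrated on held-out in-distribution data. A minimal NumPy sketch under assumed names follows; the bootstrap calibration scheme is an assumption.

import numpy as np

def fit_typicality_test(train_nll, val_nll, batch_size, alpha=0.99, n_boot=1000):
    # Centre of the typical set: mean NLL of the training data under the model.
    centre = train_nll.mean()
    # Calibrate epsilon so that roughly `alpha` of bootstrapped in-distribution batches pass.
    rng = np.random.default_rng(0)
    batch_means = np.array([rng.choice(val_nll, size=batch_size, replace=False).mean()
                            for _ in range(n_boot)])
    epsilon = np.quantile(np.abs(batch_means - centre), alpha)
    return centre, epsilon

def is_ood_batch(batch_nll, centre, epsilon):
    # Flag the batch if its mean NLL lies outside the typical-set neighbourhood.
    return abs(batch_nll.mean() - centre) > epsilon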

Input complexity and out-of-distribution detection with likelihood-based generative models

This paper uses an estimate of input complexity to derive an efficient and parameter-free OOD score, which can be seen as a likelihood-ratio, akin to Bayesian model comparison, and finds this score to perform comparably to, or even better than, existing OOD detection approaches under a wide range of data sets, models, model sizes, and complexity estimates.
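
Concretely, the score subtracts an input-complexity estimate from the model's negative log-likelihood, both measured in bits, so it behaves like a log likelihood-ratio between a generic compressor and the trained generative model. The sketch below uses zlib as a stand-in compressor; the choice of compressor and the helper names are assumptions.

import zlib

def complexity_bits(x_uint8):
    # Rough complexity estimate L(x): bit length of the zlib-compressed raw bytes
    # of a uint8 NumPy image array (zlib is an assumed stand-in compressor).
    return 8 * len(zlib.compress(x_uint8.tobytes(), 9))

def input_complexity_ood_score(nll_bits, x_uint8):
    # S(x) = -log2 p_model(x) - L(x): model NLL in bits minus complexity in bits.
    # Larger values suggest the input is explained better by the generic compressor
    # than by the trained model, i.e. it looks out-of-distribution.
    return nll_bits - complexity_bits(x_uint8)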

DeepStellar: model-based quantitative analysis of stateful deep learning systems

This paper models an RNN as an abstract state transition system to characterize its internal behaviors, and designs two trace similarity metrics and five coverage criteria that enable quantitative analysis of RNNs; these are evaluated on four RNN-based systems covering image classification and automated speech recognition.
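
One way to realise such an abstraction is to discretise the RNN's hidden-state trajectory into a grid of abstract states and record the states and transitions each input exercises; coverage criteria are then ratios over those sets. The sketch below only illustrates this idea under an assumed gridding scheme, not DeepStellar's actual abstraction.

import numpy as np

def abstract_trace(hidden_states, n_bins=5, low=-1.0, high=1.0):
    # Map each hidden vector to an abstract state by binning every dimension
    # into n_bins intervals (grid bounds and bin count are assumptions).
    edges = np.linspace(low, high, n_bins + 1)[1:-1]
    return [tuple(np.digitize(h, edges)) for h in hidden_states]

def update_coverage(traces, states=None, transitions=None):
    # Accumulate the abstract states and state transitions exercised by a set of
    # traces; state/transition coverage is the size of these sets relative to the
    # abstract model built from the training data.
    states = set() if states is None else states
    transitions = set() if transitions is None else transitions
    for trace in traces:
        states.update(trace)
        transitions.update(zip(trace, trace[1:]))
    return states, transitions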

DeepHunter: a coverage-guided fuzz testing framework for deep neural networks

DeepHunter, a coverage-guided fuzz testing framework for detecting potential defects in general-purpose DNNs, is proposed, together with a metamorphic mutation strategy that generates new semantically preserved tests and multiple extensible coverage criteria that serve as feedback to guide test generation.
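
The core loop of such a fuzzer can be summarised as: pick a seed, apply a semantics-preserving (metamorphic) mutation, record failures, and keep mutants that increase coverage as new seeds. The sketch below shows this generic loop; all callbacks and the seed-selection strategy are placeholders, not DeepHunter's implementation.

import random

def coverage_guided_fuzz(seeds, mutate, gains_coverage, is_failure, iterations=1000):
    # mutate: semantics-preserving mutation of a test input.
    # gains_coverage: returns True if the mutant exercises new coverage.
    # is_failure: returns True if the DNN misbehaves on the mutant.
    queue, failures = list(seeds), []
    for _ in range(iterations):
        mutant = mutate(random.choice(queue))
        if is_failure(mutant):
            failures.append(mutant)
        if gains_coverage(mutant):
            queue.append(mutant)  # coverage-increasing tests become new seeds
    return queue, failures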

Likelihood Ratios for Out-of-Distribution Detection

This work investigates deep generative model based approaches for OOD detection and observes that the likelihood score is heavily affected by population-level background statistics, and proposes a likelihood ratio method for deep generative models which effectively corrects for these confounding background statistics.
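
In essence, a second "background" generative model is trained on perturbed copies of the inputs so that it captures only population-level background statistics, and each input is scored by the difference of log-likelihoods under the two models. The sketch below illustrates that idea; the perturbation scheme and rate are assumptions.

import numpy as np

def perturb(x_uint8, rate=0.1, rng=None):
    # Replace a fraction `rate` of entries with uniform noise; the perturbed data
    # are used to train the background model (perturbation scheme is an assumption).
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(x_uint8.shape) < rate
    noise = rng.integers(0, 256, size=x_uint8.shape, dtype=x_uint8.dtype)
    return np.where(mask, noise, x_uint8)

def likelihood_ratio_score(log_p_model, log_p_background):
    # LLR(x) = log p_model(x) - log p_background(x); the background likelihood
    # cancels the confounding background statistics, so larger scores indicate
    # in-distribution inputs.
    return log_p_model - log_p_background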

Hyperparameter-Free Out-of-Distribution Detection Using Softmax of Scaled Cosine Similarity

This paper proposes a simple, hyperparameter-free method based on softmax of scaled cosine similarity, which resembles the approach employed by modern metric learning methods, but it differs in details; the differences are essential to achieve high detection performance.
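
The idea is to replace the usual inner-product logits with scaled cosine similarities between the feature vector and the class weight vectors, then apply softmax, so no detection hyperparameters need tuning. A minimal PyTorch sketch follows; treating the scale as a single learnable parameter and its initial value are assumptions for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineSoftmaxHead(nn.Module):
    # Final layer producing logits s * cos(theta_k) between the L2-normalised
    # feature vector and each L2-normalised class weight vector; softmax over
    # these logits is used for classification and for scoring OOD inputs.
    def __init__(self, feat_dim, n_classes, init_scale=16.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(n_classes, feat_dim))
        self.scale = nn.Parameter(torch.tensor(init_scale))  # learnable scale (assumption)

    def forward(self, features):
        f = F.normalize(features, dim=1)
        w = F.normalize(self.weight, dim=1)
        return self.scale * (f @ w.t())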

Analysis of Confident-Classifiers for Out-of-distribution Detection

This paper suggests training a classifier with an explicit "reject" class for OOD samples, by minimizing the standard cross-entropy loss on in-distribution samples while minimizing the KL divergence between the predictive distribution on OOD samples from the low-density regions of the in-distribution data and the uniform distribution.
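
The confident-classifier objective analysed in this line of work combines standard cross-entropy on in-distribution samples with a KL term that pulls the predictive distribution on OOD samples toward uniform. A minimal PyTorch-style sketch of that loss follows; the function name and the trade-off weight lam are assumptions.

import torch
import torch.nn.functional as F

def confident_classifier_loss(logits_in, labels_in, logits_ood, lam=1.0):
    # Cross-entropy on in-distribution samples.
    ce = F.cross_entropy(logits_in, labels_in)
    # KL(p_ood || uniform), averaged over the OOD batch.
    log_p = F.log_softmax(logits_ood, dim=1)
    n_classes = logits_ood.shape[1]
    log_u = torch.full_like(log_p, 1.0 / n_classes).log()
    kl = (log_p.exp() * (log_p - log_u)).sum(dim=1).mean()
    return ce + lam * kl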