Reducing DNN labelling cost using surprise adequacy: an industrial case study for autonomous driving

@inproceedings{Kim2020ReducingDL,
  title={Reducing DNN labelling cost using surprise adequacy: an industrial case study for autonomous driving},
  author={Jinhan Kim and Jeongil Ju and Robert Feldt and Shin Yoo},
  booktitle={Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering},
  year={2020}
}
  • Jinhan Kim, Jeongil Ju, S. Yoo
  • Published 29 May 2020
  • Computer Science
  • Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering
Deep Neural Networks (DNNs) are rapidly being adopted by the automotive industry, due to their impressive performance in tasks that are essential for autonomous driving. Object segmentation is one such task: its aim is to precisely locate boundaries of objects and classify the identified objects, helping autonomous cars to recognise the road environment and the traffic situation. Not only is this task safety critical, but developing a DNN based object segmentation module presents a set of…

Multimodal Surprise Adequacy Analysis of Inputs for Natural Language Processing DNN Models

  • Seah Kim, Shin Yoo
  • Computer Science
    2021 IEEE/ACM International Conference on Automation of Software Test (AST)
  • 2021
An empirical evaluation of extended SA metrics with three NLP tasks and nine DNN models shows that, while unimodal SAs perform sufficiently well for text classification, multimodal SA can outperform unimodal metrics.

Corner Case Data Description and Detection

A simple and novel approach to corner case data detection is proposed, via a specific metric developed on surprise adequacy (SA), which has advantages in capturing data behaviours.

A Review and Refinement of Surprise Adequacy

This work developed and released a performance-optimized, but functionally equivalent, implementation of Surprise Adequacy, and proposes refined variants of the SA computation algorithm, aiming to further increase the evaluation speed.

Revisiting Neuron Coverage for DNN Testing: A Layer-Wise and Distribution-Aware Criterion

Eight design requirements for DNN coverage criteria are summarized, taking into account distribution properties and practical concerns, and a new criterion, NeuraL Coverage (NLC), is proposed that accurately describes how DNNs comprehend inputs via approximated distributions.

A Forgotten Danger in DNN Supervision Testing: Generating and Detecting True Ambiguity

A novel way to generate ambiguous inputs for testing DNN supervisors is proposed and used to empirically compare several existing supervisor techniques, considering their capabilities to detect 4 distinct types of high-uncertainty inputs, including truly ambiguous ones.

Simple techniques work surprisingly well for neural network test prioritization and active learning (replicability study)

DeepGini, a very fast and simple test input prioritization (TIP) technique for Deep Neural Networks, outperforms more elaborate techniques such as neuron and surprise coverage; however, other comparable or even simpler baselines from the field of uncertainty quantification are found to perform equally well as DeepGini.
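DeepGini's prioritization score is simply the Gini impurity of a model's softmax output: inputs whose predicted class distribution is closest to uniform are ranked first for testing or labelling. A minimal sketch (function and variable names are illustrative, not DeepGini's actual API):

```python
import numpy as np

def deepgini_score(softmax_probs):
    """Gini impurity of the softmax output: 1 - sum(p_i^2).
    0 for a fully confident prediction; higher values mean
    more uncertainty, so those inputs are prioritized."""
    probs = np.asarray(softmax_probs, dtype=float)
    return 1.0 - np.sum(probs ** 2, axis=-1)

# Rank a batch of test inputs by descending impurity.
batch = np.array([
    [0.98, 0.01, 0.01],   # confident -> low score
    [0.34, 0.33, 0.33],   # uncertain -> high score
])
priority_order = np.argsort(-deepgini_score(batch))  # uncertain input first
```

The appeal, as the replicability study above notes, is that this needs only a single forward pass per input, whereas coverage-based prioritization requires instrumenting internal neuron activations.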

Uncertainty Quantification for Deep Neural Networks: An Empirical Comparison and Usage Guidelines

The need for an empirical assessment method that can deal with the experimental setting in which supervisors are used is motivated, where accuracy of the DNN matters only as long as the supervisor lets the DLS continue to operate.

When and Why Test Generators for Deep Learning Produce Invalid Inputs: an Empirical Study

This paper investigates to what extent TIGs can generate valid inputs, according to both automated and human validators, and shows that 84% of artificially generated inputs are valid, but their expected label is not always preserved.

Anomaly Detection in Driving by Cluster Analysis Twice

This paper validates the performance of a method, Anomaly Detection in Driving by Cluster Analysis Twice (ADDCAT), which clusters the processed sensor data across different physical properties, on an open dataset.

Input Distribution Coverage: Measuring Feature Interaction Adequacy in Neural Network Testing

The findings demonstrate that IDC overcomes several limitations of white-box DNN coverage approaches by discounting coverage from unrealistic inputs and enabling the calculation of test adequacy metrics that capture the feature diversity present in the input space of DNNs.

References


DeepTest: Automated Testing of Deep-Neural-Network-Driven Autonomous Cars

DeepTest is a systematic testing tool for automatically detecting erroneous behaviors of DNN-driven vehicles that can potentially lead to fatal crashes; it systematically explores different parts of the DNN logic by generating test inputs that maximize the number of activated neurons.

Deep Multi-Modal Object Detection and Semantic Segmentation for Autonomous Driving: Datasets, Methods, and Challenges

This review paper attempts to systematically summarize methodologies and discuss challenges for deep multi-modal object detection and semantic segmentation in autonomous driving with an overview of on-board sensors on test vehicles, open datasets, and background information.

A Comparative Study of Real-Time Semantic Segmentation for Autonomous Driving

A real-time segmentation benchmarking framework is presented to study various segmentation algorithms for autonomous driving, along with a generic meta-architecture with a decoupled design in which different types of encoders and decoders can be plugged in independently.

Guiding Deep Learning System Testing Using Surprise Adequacy

  • Jinhan Kim, R. Feldt, S. Yoo
  • Computer Science
    2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)
  • 2019
A novel test adequacy criterion is proposed, called Surprise Adequacy for Deep Learning Systems (SADL), which is based on the behaviour of DL systems with respect to their training data, and shows that systematic sampling of inputs based on their surprise can improve classification accuracy of DL systems against adversarial examples by up to 77.5% via retraining.
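The distance-based variant of Surprise Adequacy (DSA) described by SADL compares an input's activation trace against the training data: the distance to the nearest same-class trace, normalized by how far that neighbour sits from the other classes. A minimal sketch, assuming activation traces are already extracted as flat vectors (names are illustrative, not taken from the authors' release):

```python
import numpy as np

def distance_based_sa(at_x, pred_class, train_ats, train_labels):
    """Distance-based Surprise Adequacy (DSA), sketched.

    at_x         -- activation trace of the new input (1-D vector)
    pred_class   -- class the DNN predicts for the input
    train_ats    -- activation traces of the training set (N x D)
    train_labels -- ground-truth class of each training trace (N,)
    """
    same = train_ats[train_labels == pred_class]
    other = train_ats[train_labels != pred_class]

    # Nearest training trace with the same predicted class.
    d_same = np.linalg.norm(same - at_x, axis=1)
    nearest_same = same[np.argmin(d_same)]
    dist_a = d_same.min()

    # Distance from that neighbour to the closest other-class trace.
    dist_b = np.linalg.norm(other - nearest_same, axis=1).min()

    # High DSA: the input sits far from its class relative to the
    # class boundary, i.e. it is "surprising" given the training data.
    return dist_a / dist_b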

CGNet: A Light-Weight Context Guided Network for Semantic Segmentation

This work proposes a novel Context Guided Network (CGNet), which is a light-weight and efficient network for semantic segmentation, and develops CGNet which captures contextual information in all stages of the network.

The Power of Ensembles for Active Learning in Image Classification

It is found that ensembles perform better and lead to more calibrated predictive uncertainties, which are the basis for many active learning algorithms, and Monte-Carlo Dropout uncertainties perform worse.

DeepXplore: Automated Whitebox Testing of Deep Learning Systems

DeepXplore efficiently finds thousands of incorrect corner case behaviors in state-of-the-art DL models with thousands of neurons trained on five popular datasets including ImageNet and Udacity self-driving challenge data.

A Survey of Autonomous Driving: Common Practices and Emerging Technologies

The technical aspects of automated driving are surveyed, with an overview of available datasets and tools for ADS development, and many state-of-the-art algorithms are implemented and compared on the authors' own platform in a real-world driving setting.

SBST in the Age of Machine Learning Systems - Challenges Ahead

  • S. Yoo
  • Computer Science
    2019 IEEE/ACM 12th International Workshop on Search-Based Software Testing (SBST)
  • 2019
The fundamentals of software testing as well as the state of the art in Search Based Software Testing (SBST) are examined, and the challenges ahead are outlined while highlighting areas where SBST can shine.

ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation

A fast and efficient convolutional neural network, ESPNet, for semantic segmentation of high resolution images under resource constraints, which outperforms all the current efficient CNNs such as MobileNet, ShuffleNet, and ENet on both standard metrics and newly introduced performance metrics that measure efficiency on edge devices.