Corpus ID: 222178157

Characterising Bias in Compressed Models

@article{Hooker2020CharacterisingBI,
  title={Characterising Bias in Compressed Models},
  author={Sara Hooker and Nyalleng Moorosi and Gregory Clark and Samy Bengio and Emily L. Denton},
  journal={ArXiv},
  year={2020},
  volume={abs/2010.03058}
}
The popularity and widespread use of pruning and quantization are driven by the severe resource constraints of deploying deep neural networks to environments with strict latency, memory, and energy requirements. These techniques achieve high levels of compression with negligible impact on top-line metrics (top-1 and top-5 accuracy). However, overall accuracy hides disproportionately high errors on a small subset of examples; we call this subset Compression Identified Exemplars (CIE). We further…
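The CIE construction lends itself to a simple procedure: compare what a population of baseline models and a population of compressed models predict for each example, and flag the examples where the two populations' modal labels diverge. The sketch below is a minimal NumPy illustration of that idea; the array shapes and function names are assumptions for illustration, not the paper's reference implementation.

```python
# Minimal sketch: flag Compression Identified Exemplars (CIEs) as the examples
# whose modal predicted label flips between a population of baseline models
# and a population of compressed models. All names and shapes are illustrative.
import numpy as np

def modal_labels(preds: np.ndarray) -> np.ndarray:
    """preds: (num_models, num_examples) integer labels.
    Returns the most frequent label per example across the population."""
    modal = np.empty(preds.shape[1], dtype=preds.dtype)
    for i in range(preds.shape[1]):
        labels, counts = np.unique(preds[:, i], return_counts=True)
        modal[i] = labels[np.argmax(counts)]
    return modal

def find_cie(baseline_preds: np.ndarray, compressed_preds: np.ndarray) -> np.ndarray:
    """Indices of examples where the modal label changes after compression."""
    return np.flatnonzero(modal_labels(baseline_preds) != modal_labels(compressed_preds))

# Toy usage: 5 baseline and 5 compressed models over 4 examples.
rng = np.random.default_rng(0)
base = rng.integers(0, 3, size=(5, 4))
comp = base.copy()
comp[:, 2] = (comp[:, 2] + 1) % 3  # force disagreement on example 2
print(find_cie(base, comp))        # -> [2]
```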

Citations

Reliable Model Compression via Label-Preservation-Aware Loss Functions
TLDR
This work presents a framework that uses a teacher-student learning paradigm to better preserve labels and demonstrates the effectiveness of the approach both quantitatively and qualitatively on multiple compression schemes and accuracy recovery algorithms using a set of 8 different real-world network architectures.
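For context on the teacher-student paradigm mentioned here, the sketch below shows a generic distillation objective of the kind such a label-preservation framework might build on; the temperature and mixing weight are illustrative assumptions, not the paper's actual loss.

```python
# Hedged sketch of a generic teacher-student (distillation) objective: the
# student matches hard labels and the teacher's softened output distribution.
# T (temperature) and alpha (mixing weight) are illustrative assumptions.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      T: float = 2.0, alpha: float = 0.5):
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)  # rescale soft-target gradients
    return alpha * hard + (1.0 - alpha) * soft
```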
Estimating Example Difficulty using Variance of Gradients
TLDR
This work proposes Variance of Gradients (VOG) as a proxy metric for detecting outliers in the data distribution and provides quantitative and qualitative support that VOG is a meaningful way to rank data by difficulty and to surface a tractable subset of the most challenging examples for human-in-the-loop auditing.
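As a rough illustration of the VOG idea, the sketch below scores each example by the variance, across training checkpoints, of the input gradients of its target-class logit (assumed precomputed elsewhere), averaged over pixels. The reduction used here is an illustrative simplification; the paper's exact normalization may differ.

```python
# Hedged sketch of a VOG-style difficulty score. grad_snapshots holds the
# per-checkpoint input gradients of each example's target-class logit;
# the (num_checkpoints, num_examples, H, W) shape is an assumption.
import numpy as np

def vog_scores(grad_snapshots: np.ndarray) -> np.ndarray:
    """Returns one difficulty score per example: higher = more variable."""
    per_pixel_var = grad_snapshots.var(axis=0)  # variance over checkpoints -> (N, H, W)
    return per_pixel_var.mean(axis=(1, 2))      # average over pixels -> (N,)
```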
Generative Zero-shot Network Quantization
TLDR
This work shows that, for high-level image recognition tasks, it can further reconstruct “realistic” images of each category by leveraging intrinsic Batch Normalization statistics without any training data.
Algorithmic Factors Influencing Bias in Machine Learning
TLDR
This paper demonstrates how ML algorithms can misrepresent the training data through underestimation, and shows how irreducible error, regularization and feature and class imbalance can contribute to this underestimation.
In Defense of the Paper
TLDR
It is argued that the root cause of hindrances in the accessibility of machine learning research lies not in the paper workflow but in the misaligned incentives behind the publishing and research processes, and that the paper remains the optimal workflow.
Arabic Compact Language Modelling for Resource Limited Devices
Natural language modelling has gained a lot of interest recently. The current state-of-the-art results are achieved by first training a very large language model and then fine-tuning it on multiple…
A Tale Of Two Long Tails
TLDR
The results show that well-designed interventions over the course of training can be an effective way to characterize and distinguish between different sources of uncertainty, suggesting that the rate of learning in the presence of additional information differs between atypical and noisy examples.
Avoiding Inference Heuristics in Few-shot Prompt-based Finetuning
TLDR
This work demonstrates that, despite its advantages in low-data regimes, finetuned prompt-based models for sentence pair classification tasks still suffer from a common pitfall of adopting inference heuristics based on lexical overlap, and shows that adding a regularization that preserves pretraining weights is effective in mitigating this destructive tendency of few-shot finetuning.
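A minimal sketch of the kind of regularizer described here, under the assumption that it takes the form of an L2 penalty anchoring finetuned parameters to their pretrained values; the penalty form and weighting are assumptions, not necessarily the paper's exact method.

```python
# Hedged sketch: an L2 penalty pulling finetuned parameters back toward their
# pretrained values, added to the task loss during few-shot finetuning.
# The name, weighting, and exact form are illustrative assumptions.
import torch

def anchor_to_pretrained(model: torch.nn.Module,
                         pretrained: dict, weight: float = 0.01) -> torch.Tensor:
    """Returns `weight * ||theta - theta_pretrained||^2` to add to the loss."""
    penalty = torch.zeros((), device=next(model.parameters()).device)
    for name, param in model.named_parameters():
        if name in pretrained:
            penalty = penalty + ((param - pretrained[name].detach()) ** 2).sum()
    return weight * penalty
```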
Beyond Preserved Accuracy: Evaluating Loyalty and Robustness of BERT Compression
TLDR
Two new metrics, label loyalty and probability loyalty, are proposed to measure how closely a compressed model mimics the original model, and the effect of compression on robustness under adversarial attacks is explored.
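A minimal sketch of the two loyalty metrics, under the assumption that label loyalty is the agreement rate between the two models' predicted labels and probability loyalty is one minus a Jensen-Shannon distance between their output distributions:

```python
# Hedged sketch of loyalty metrics between an original and a compressed model.
# The probability-loyalty formula here (1 - JS distance) is an assumed
# instantiation, not necessarily the paper's exact definition.
import numpy as np
from scipy.spatial.distance import jensenshannon

def label_loyalty(p_orig: np.ndarray, p_comp: np.ndarray) -> float:
    """p_*: (num_examples, num_classes) predicted probabilities."""
    return float(np.mean(p_orig.argmax(1) == p_comp.argmax(1)))

def probability_loyalty(p_orig: np.ndarray, p_comp: np.ndarray) -> float:
    # jensenshannon returns the JS *distance* (sqrt of JS divergence) per pair.
    dist = np.array([jensenshannon(a, b) for a, b in zip(p_orig, p_comp)])
    return float(np.mean(1.0 - dist))
```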
Can Subnetwork Structure be the Key to Out-of-Distribution Generalization?
TLDR
A functional modular probing method is used to analyze deep model structures under an out-of-distribution (OOD) setting, demonstrating that even in biased models (which focus on spurious correlations) there still exist unbiased functional subnetworks.

References

SHOWING 1-10 OF 45 REFERENCES
To prune, or not to prune: exploring the efficacy of pruning for model compression
TLDR
Across a broad range of neural network architectures, large-sparse models are found to consistently outperform small-dense models and achieve up to 10× reduction in number of non-zero parameters with minimal loss in accuracy.
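The mechanism behind the large-sparse models here is magnitude pruning; the sketch below shows its one-shot core (the paper's gradual sparsity schedule is omitted), with illustrative names.

```python
# Minimal sketch of one-shot magnitude pruning: zero out the smallest-magnitude
# weights until a target sparsity is reached. Illustrative only; the referenced
# paper prunes gradually over training rather than in one shot.
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Return weights with the smallest-|w| fraction `sparsity` set to zero."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

w = np.random.randn(4, 4)
print((magnitude_prune(w, 0.75) == 0).mean())     # ~0.75 of weights zeroed
```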
The State of Sparsity in Deep Neural Networks
TLDR
It is shown that unstructured sparse architectures learned through pruning cannot be trained from scratch to the same test set performance as a model trained with joint sparsification and optimization, and the need for large-scale benchmarks in the field of model compression is highlighted.
Towards Compact and Robust Deep Neural Networks
TLDR
This work proposes a new pruning method that creates compact networks while preserving both the benign accuracy and the robustness of a network, and ensures that the training objectives of the pre-training and fine-tuning steps match the training objective of the desired robust model.
ConvNets and ImageNet Beyond Accuracy: Understanding Mistakes and Uncovering Biases
TLDR
It is experimentally demonstrated that the accuracy and robustness of ConvNets measured on ImageNet are vastly underestimated and that explanations can mitigate the impact of misclassified adversarial examples from the perspective of the end-user.
Estimating Example Difficulty using Variance of Gradients
TLDR
This work proposes Variance of Gradients (VOG) as a proxy metric for detecting outliers in the data distribution and provides quantitative and qualitative support that VOG is a meaningful way to rank data by difficulty and to surface a tractable subset of the most challenging examples for human-in-the-loop auditing.
Improving the speed of neural networks on CPUs
TLDR
This paper uses speech recognition as an example task, and shows that a real-time hybrid hidden Markov model / neural network (HMM/NN) large vocabulary system can be built with a 10× speedup over an unoptimized baseline and a 4× speedup over an aggressively optimized floating-point baseline at no cost in accuracy.
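One representative mechanism behind such CPU speedups is low-precision arithmetic. The sketch below shows simple symmetric 8-bit weight quantization; the scale and rounding scheme are illustrative assumptions, not the paper's exact recipe.

```python
# Hedged sketch of symmetric 8-bit weight quantization: map floats into int8
# via a per-tensor scale, then reconstruct approximately on dequantization.
# Assumes the weight tensor is not all zeros.
import numpy as np

def quantize_int8(weights: np.ndarray):
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(3, 3).astype(np.float32)
q, s = quantize_int8(w)
print(np.max(np.abs(dequantize(q, s) - w)))  # small reconstruction error
```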
What is the State of Neural Network Pruning?
TLDR
Issues with current practices in pruning are identified, concrete remedies are suggested, and ShrinkBench, an open-source framework that facilitates standardized evaluation of pruning methods, is introduced for comparing pruning techniques.
Sparse DNNs with Improved Adversarial Robustness
TLDR
It is demonstrated that an appropriately higher model sparsity implies better robustness of nonlinear DNNs, whereas over-sparsified models are less able to resist adversarial examples.
Exploring Sparsity in Recurrent Neural Networks
TLDR
This work proposes a technique to reduce the parameters of a network by pruning weights during the initial training of the network, which reduces the size of the model and can also help achieve significant inference time speed-up using sparse matrix multiply.
Deep Residual Learning for Image Recognition
TLDR
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
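The core of the residual learning framework is a block that computes F(x) + x, so the stacked layers only need to learn a residual correction. A minimal sketch with illustrative channel counts and layer choices, not the paper's exact architecture:

```python
# Minimal sketch of a residual block: two conv/BN layers plus an identity
# shortcut, so the block outputs F(x) + x. Channel counts are illustrative.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # identity shortcut: learn the residual

x = torch.randn(1, 16, 8, 8)
print(ResidualBlock(16)(x).shape)  # torch.Size([1, 16, 8, 8])
```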