# Characterising Bias in Compressed Models

```bibtex
@article{Hooker2020CharacterisingBI,
  title   = {Characterising Bias in Compressed Models},
  author  = {Sara Hooker and Nyalleng Moorosi and Gregory Clark and Samy Bengio and Emily L. Denton},
  journal = {ArXiv},
  year    = {2020},
  volume  = {abs/2010.03058}
}
```

The popularity and widespread use of pruning and quantization are driven by the severe resource constraints of deploying deep neural networks to environments with strict latency, memory and energy requirements. These techniques achieve high levels of compression with negligible impact on top-line metrics (top-1 and top-5 accuracy). However, overall accuracy hides disproportionately high errors on a small subset of examples; we call this subset Compression Identified Exemplars (CIE). We further…
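The abstract defines CIEs as the examples on which compression disproportionately degrades predictions. A minimal sketch of how such examples could be surfaced, assuming we already have per-example predicted labels from the uncompressed and compressed models (the function and variable names here are illustrative, not from the paper, and the paper's full procedure aggregates over populations of models):

```python
import numpy as np

def find_cies(preds_full, preds_compressed):
    """Return indices of examples where the uncompressed and compressed
    models' predicted labels disagree -- a simplified stand-in for
    Compression Identified Exemplars."""
    preds_full = np.asarray(preds_full)
    preds_compressed = np.asarray(preds_compressed)
    return np.flatnonzero(preds_full != preds_compressed)

# Examples 1 and 3 change label under compression and are flagged.
print(find_cies([0, 1, 2, 2], [0, 2, 2, 1]))  # -> [1 3]
```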

#### 28 Citations

Reliable Model Compression via Label-Preservation-Aware Loss Functions

- Computer Science
- ArXiv
- 2020

This work presents a framework that uses a teacher-student learning paradigm to better preserve labels and demonstrates the effectiveness of the approach both quantitatively and qualitatively on multiple compression schemes and accuracy recovery algorithms using a set of 8 different real-world network architectures.

Estimating Example Difficulty using Variance of Gradients

- Computer Science
- ArXiv
- 2020

This work proposes Variance of Gradients (VOG) as a proxy metric for detecting outliers in the data distribution and provides quantitative and qualitative support that VOG is a meaningful way to rank data by difficulty and to surface a tractable subset of the most challenging examples for human-in-the-loop auditing.
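The core VOG idea above can be sketched in a few lines, assuming per-example gradient snapshots collected at several training checkpoints. The exact normalisation used in the paper may differ; this is an illustrative simplification:

```python
import numpy as np

def vog_score(grad_snapshots):
    """grad_snapshots: array of shape (checkpoints, ...) holding one
    example's input gradients at each checkpoint. Returns a scalar
    difficulty proxy: the mean, over gradient entries, of the variance
    across checkpoints. Higher scores suggest harder examples."""
    g = np.asarray(grad_snapshots, dtype=float)
    return g.var(axis=0).mean()
```

An example whose gradients are identical at every checkpoint scores 0, while gradients that keep changing late in training yield a high score.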

Generative Zero-shot Network Quantization

- Computer Science
- 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
- 2021

This work shows that, for high-level image recognition tasks, it can further reconstruct “realistic” images of each category by leveraging intrinsic Batch Normalization statistics without any training data.

Algorithmic Factors Influencing Bias in Machine Learning

- Computer Science, Mathematics
- ArXiv
- 2021

This paper demonstrates how ML algorithms can misrepresent the training data through underestimation, and shows how irreducible error, regularization and feature and class imbalance can contribute to this underestimation.

In Defense of the Paper

- Computer Science
- ArXiv
- 2021

It is argued that the root cause of hindrances in the accessibility of machine learning research lies not in the paper workflow but within the misaligned incentives behind the publishing and research processes and that the paper is the optimal workflow.

Arabic Compact Language Modelling for Resource Limited Devices

- WANLP
- 2021

Natural language modelling has gained a lot of interest recently. The current state-of-the-art results are achieved by first training a very large language model and then fine-tuning it on multiple…

A Tale Of Two Long Tails

- Computer Science
- ArXiv
- 2021

The results show that well-designed interventions over the course of training can be an effective way to characterize and distinguish between different sources of uncertainty, suggesting that the rate of learning in the presence of additional information differs between atypical and noisy examples.

Avoiding Inference Heuristics in Few-shot Prompt-based Finetuning

- Computer Science
- ArXiv
- 2021

This work demonstrates that, despite its advantages on low data regimes, finetuned prompt-based models for sentence pair classification tasks still suffer from a common pitfall of adopting inference heuristics based on lexical overlap, and shows that adding a regularization that preserves pretraining weights is effective in mitigating this destructive tendency of few-shot finetuning.

Beyond Preserved Accuracy: Evaluating Loyalty and Robustness of BERT Compression

- Computer Science
- ArXiv
- 2021

Two new metrics, label loyalty and probability loyalty, are proposed that measure how closely a compressed model mimics the original model, and the effect of compression on robustness under adversarial attacks is explored.
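The two loyalty metrics above admit simple sketches. Label loyalty is the fraction of inputs on which the compressed model's predicted label matches the original's; probability loyalty compares the output distributions. The distance used here (1 minus total variation distance) is one plausible instantiation, an assumption rather than necessarily the paper's exact definition:

```python
import numpy as np

def label_loyalty(p_orig, p_comp):
    """Fraction of examples where argmax labels agree.
    Inputs: (n_examples, n_classes) probability arrays."""
    return float(np.mean(p_orig.argmax(axis=1) == p_comp.argmax(axis=1)))

def probability_loyalty(p_orig, p_comp):
    """Mean of 1 - total-variation distance between the two models'
    per-example output distributions (1.0 = identical outputs)."""
    tv = 0.5 * np.abs(p_orig - p_comp).sum(axis=1)
    return float(np.mean(1.0 - tv))
```

A compressed model can keep label loyalty high while probability loyalty drops, which is exactly the kind of behaviour top-line accuracy hides.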

Can Subnetwork Structure be the Key to Out-of-Distribution Generalization?

- Computer Science, Mathematics
- ICML
- 2021

A functional modular probing method is used to analyze deep model structures under OOD setting and demonstrates that even in biased models (which focus on spurious correlation) there still exist unbiased functional subnetworks.

#### References

Showing 1–10 of 45 references

To prune, or not to prune: exploring the efficacy of pruning for model compression

- Computer Science, Mathematics
- ICLR
- 2018

Across a broad range of neural network architectures, large-sparse models are found to consistently outperform small-dense models and achieve up to 10x reduction in number of non-zero parameters with minimal loss in accuracy.
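The pruning studied in this line of work is typically magnitude pruning: zero out the smallest-magnitude weights until a target sparsity is reached. A minimal sketch (a simplified one-shot version; the paper uses gradual pruning during training):

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with roughly the `sparsity` fraction
    of smallest-magnitude entries set to zero. Ties at the threshold
    may zero slightly more than the exact fraction."""
    w = np.array(weights, dtype=float)
    k = int(sparsity * w.size)
    if k > 0:
        threshold = np.sort(np.abs(w), axis=None)[k - 1]
        w[np.abs(w) <= threshold] = 0.0
    return w

# The two smallest-magnitude weights are removed at 50% sparsity.
print(magnitude_prune([1, -0.1, 0.5, -2], 0.5))  # -> [ 1.  0.  0. -2.]
```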

The State of Sparsity in Deep Neural Networks

- Computer Science, Mathematics
- ArXiv
- 2019

It is shown that unstructured sparse architectures learned through pruning cannot be trained from scratch to the same test set performance as a model trained with joint sparsification and optimization, and the need for large-scale benchmarks in the field of model compression is highlighted.

Towards Compact and Robust Deep Neural Networks

- Computer Science, Mathematics
- ArXiv
- 2019

This work proposes a new pruning method that can create compact networks while preserving both benign accuracy and robustness of a network and ensures that the training objectives of the pre-training and fine-tuning steps match the training objective of the desired robust model.

ConvNets and ImageNet Beyond Accuracy: Understanding Mistakes and Uncovering Biases

- Computer Science
- ECCV
- 2018

It is experimentally demonstrated that the accuracy and robustness of ConvNets measured on Imagenet are vastly underestimated and that explanations can mitigate the impact of misclassified adversarial examples from the perspective of the end-user.

Estimating Example Difficulty using Variance of Gradients

- Computer Science
- ArXiv
- 2020

This work proposes Variance of Gradients (VOG) as a proxy metric for detecting outliers in the data distribution and provides quantitative and qualitative support that VOG is a meaningful way to rank data by difficulty and to surface a tractable subset of the most challenging examples for human-in-the-loop auditing.

Improving the speed of neural networks on CPUs

- Computer Science
- 2011

This paper uses speech recognition as an example task, and shows that a real-time hybrid hidden Markov model / neural network (HMM/NN) large vocabulary system can be built with a 10× speedup over an unoptimized baseline and a 4× speedup over an aggressively optimized floating-point baseline at no cost in accuracy.

What is the State of Neural Network Pruning?

- Computer Science, Mathematics
- MLSys
- 2020

Issues with current practices in pruning are identified, concrete remedies are suggested, and ShrinkBench, an open-source framework to facilitate standardized evaluation of pruning methods, is introduced for comparing various pruning techniques.

Sparse DNNs with Improved Adversarial Robustness

- Computer Science, Mathematics
- NeurIPS
- 2018

It is demonstrated that an appropriately higher model sparsity implies better robustness of nonlinear DNNs, whereas over-sparsified models are less able to resist adversarial examples.

Exploring Sparsity in Recurrent Neural Networks

- Computer Science, Mathematics
- ICLR
- 2017

This work proposes a technique to reduce the parameters of a network by pruning weights during the initial training of the network, which reduces the size of the model and can also help achieve significant inference time speed-up using sparse matrix multiply.

Deep Residual Learning for Image Recognition

- Computer Science
- 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2016

This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.