# What Do Compressed Deep Neural Networks Forget?

    @article{Hooker2020WhatDC,
      title   = {What Do Compressed Deep Neural Networks Forget?},
      author  = {Sara Hooker and Aaron C. Courville and Gregory Clark and Yann Dauphin and Andrea Frome},
      journal = {arXiv: Learning},
      year    = {2020}
    }

Deep neural network pruning and quantization techniques have demonstrated that it is possible to achieve high levels of compression with surprisingly little degradation in test-set accuracy. However, this measure of performance conceals significant differences in how individual classes and images are impacted by model compression techniques. We find that models with radically different numbers of weights have comparable top-line performance metrics but diverge considerably in behavior on a narrow…
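The abstract's central point, that identical top-line accuracy can hide diverging class-wise behavior, can be illustrated with a minimal sketch. The predictions below are invented toy data (not from the paper): a long-tailed label set where a hypothetical dense and compressed model achieve the same overall accuracy while disagreeing sharply on the minority class.

```python
import numpy as np

def per_class_accuracy(labels, preds, num_classes):
    """Accuracy computed separately for each class label."""
    return np.array([
        (preds[labels == c] == c).mean()
        for c in range(num_classes)
    ])

# Toy long-tailed test set: 10 examples of class 0, 2 of class 1.
labels = np.array([0] * 10 + [1] * 2)

# Hypothetical dense model: one class-0 error, both class-1 correct.
dense_preds = np.array([0] * 9 + [1] + [1, 1])
# Hypothetical compressed model: class 0 perfect, one class-1 error.
compressed_preds = np.array([0] * 10 + [1, 0])

# Top-line accuracy is identical (11/12 for both)...
print((dense_preds == labels).mean(), (compressed_preds == labels).mean())
# ...but per-class accuracy diverges on the rare class.
print(per_class_accuracy(labels, dense_preds, 2))       # class 1: 1.0
print(per_class_accuracy(labels, compressed_preds, 2))  # class 1: 0.5
```

Comparing per-class (rather than aggregate) accuracy between the dense and compressed model is the kind of disaggregated evaluation the paper argues for.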

## 34 Citations

A Winning Hand: Compressing Deep Networks Can Improve Out-Of-Distribution Robustness

- Computer Science · ArXiv
- 2021

This work is able to create extremely compact CARDs that, compared to their larger counterparts, have similar test accuracy and matching (or better) robustness—simply by pruning and (optionally) quantizing.

Is the Lottery Fair? Evaluating Winning Tickets Across Demographics

- Computer Science · FINDINGS
- 2021

There is a small increase in group disparity, most pronounced at high pruning rates and correlated with instability. The fairness of models trained with distributionally robust optimization objectives is sometimes less sensitive to pruning, but results are not consistent.

Self-Damaging Contrastive Learning

- Computer Science · ICML
- 2021

This paper proposes to explicitly tackle this challenge via a principled framework called Self-Damaging Contrastive Learning (SDCLR), which automatically balances representation learning without knowing the classes by creating a dynamic self-competitor model, a pruned version of the target model, to contrast with the target.

Simon Says: Evaluating and Mitigating Bias in Pruned Neural Networks with Knowledge Distillation

- Computer Science · ArXiv
- 2021

It is demonstrated that knowledge distillation can mitigate induced bias in pruned neural networks, even with unbalanced datasets, and it is revealed that model similarity correlates strongly with pruning-induced bias, which provides a powerful method to explain why bias occurs in pruned neural networks.

Measure Twice, Cut Once: Quantifying Bias and Fairness in Deep Neural Networks

- Computer Science · ArXiv
- 2021

This work proposes two simple yet effective metrics, Combined Error Variance and Symmetric Distance Error, to quantitatively evaluate the class-wise bias of two models in comparison to one another and shows that they can be used to measure fairness as well as bias.

Characterising Bias in Compressed Models

- Computer Science · ArXiv
- 2020

This work establishes that, for CIE examples, compression amplifies existing algorithmic bias, and proposes using CIEs as a human-in-the-loop auditing tool to surface a tractable subset of the dataset for further inspection or annotation by a domain expert.

A Tale Of Two Long Tails

- Computer Science · ArXiv
- 2021

The results show that well-designed interventions over the course of training can be an effective way to characterize and distinguish between different sources of uncertainty, suggesting that the rate of learning in the presence of additional information differs between atypical and noisy examples.

Algorithmic Factors Influencing Bias in Machine Learning

- Computer Science, Mathematics · ArXiv
- 2021

This paper demonstrates how ML algorithms can misrepresent the training data through underestimation, and shows how irreducible error, regularization and feature and class imbalance can contribute to this underestimation.

An Underexplored Dilemma between Confidence and Calibration in Quantized Neural Networks

- Computer Science · ArXiv
- 2021

It is shown that CNNs are surprisingly robust to compression techniques, such as quantization, which aim to reduce computational and memory costs; this robustness can be partially explained by the calibration behavior of modern CNNs and may be improved with overconfidence.

Can Subnetwork Structure be the Key to Out-of-Distribution Generalization?

- Computer Science, Mathematics · ICML
- 2021

A functional modular probing method is used to analyze deep model structures under OOD setting and demonstrates that even in biased models (which focus on spurious correlation) there still exist unbiased functional subnetworks.

## References

Showing 1–10 of 95 references

Natural Adversarial Examples

- Computer Science, Mathematics · 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2021

This work introduces two challenging datasets that reliably cause machine learning model performance to substantially degrade, and curates an adversarial out-of-distribution detection dataset called ImageNet-O, the first out-of-distribution detection dataset created for ImageNet models.

Benchmarking Neural Network Robustness to Common Corruptions and Perturbations

- Computer Science, Mathematics · ICLR
- 2019

This paper standardizes and expands the corruption robustness topic, while showing which classifiers are preferable in safety-critical applications, and proposes a new dataset called ImageNet-P which enables researchers to benchmark a classifier's robustness to common perturbations.

Mixed Precision Training

- Computer Science, Mathematics · ICLR
- 2018

This work introduces a technique to train deep neural networks using half-precision floating point numbers, and demonstrates that this approach works for a wide variety of models including convolutional neural networks, recurrent neural networks, and generative adversarial networks.

On the Efficient Representation and Execution of Deep Acoustic Models

- Computer Science · INTERSPEECH
- 2016

A "quantization aware" training process is proposed that applies the quantization scheme during network training, which is found to recover most of the accuracy loss introduced by quantization.

Keep the Gradients Flowing: Using Gradient Flow to Study Sparse Network Optimization

- Computer Science · ArXiv
- 2021

This work suggests that initialization is only one piece of the puzzle and taking a wider view of tailoring optimization to sparse networks yields promising results, and shows that gradient flow in sparse networks can be improved by reconsidering aspects of the architecture design and the training regime.

Does Learning Require Memorization? A Short Tale about a Long Tail

- Computer Science, Mathematics · STOC
- 2020

The model makes it possible to quantify the effect of not fitting the training data on the generalization performance of the learned classifier, demonstrates that memorization is necessary whenever frequencies are long-tailed, and establishes a formal link between these empirical phenomena.

MLIR: A Compiler Infrastructure for the End of Moore's Law

- Computer Science · ArXiv
- 2020

MLIR is evaluated as a generalized infrastructure that reduces the cost of building compilers, with diverse use cases described to show research and educational opportunities for future programming languages, compilers, execution environments, and computer architecture.

Rigging the Lottery: Making All Tickets Winners

- Computer Science, Mathematics · ICML
- 2020

This paper introduces a method to train sparse neural networks with a fixed parameter count and a fixed computational cost throughout training, without sacrificing accuracy relative to existing dense-to-sparse training methods.

Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers

- Computer Science · ICML
- 2020

It is shown that large models are more robust to compression techniques such as quantization and pruning than small models, and one can get the best of both worlds: heavily compressed, large models achieve higher accuracy than lightly compressed, small models.

What Neural Networks Memorize and Why: Discovering the Long Tail via Influence Estimation

- Computer Science · NeurIPS
- 2020

The experiments demonstrate the significant benefits of memorization for generalization on several standard benchmarks and provide quantitative and visually compelling evidence for the theory put forth in Feldman (2019), which proposes a theoretical explanation for this phenomenon.