Corpus ID: 226812844

What Do Compressed Deep Neural Networks Forget?

@article{Hooker2020WhatDC,
  title={What Do Compressed Deep Neural Networks Forget?},
  author={Sara Hooker and Aaron C. Courville and Gregory Clark and Yann Dauphin and Andrea Frome},
  journal={arXiv: Learning},
  year={2020}
}
Deep neural network pruning and quantization techniques have demonstrated it is possible to achieve high levels of compression with surprisingly little degradation to test set accuracy. However, this measure of performance conceals significant differences in how different classes and images are impacted by model compression techniques. We find that models with radically different numbers of weights have comparable top-line performance metrics but diverge considerably in behavior on a narrow… 
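As a concrete illustration of the class-level divergence the abstract describes, the following is a minimal sketch that compares per-class accuracy of an uncompressed and a compressed classifier on the same test set. It is not the paper's exact protocol; the names dense_model, pruned_model, and test_loader are placeholders assumed for the example.

# Minimal sketch (not the paper's exact protocol): compare per-class accuracy
# of an uncompressed and a compressed classifier to surface classes whose
# behavior diverges even when top-line accuracy matches.
import torch
from collections import defaultdict

@torch.no_grad()
def per_class_accuracy(model, loader, device="cpu"):
    correct, total = defaultdict(int), defaultdict(int)
    model.eval().to(device)
    for images, labels in loader:
        preds = model(images.to(device)).argmax(dim=1).cpu()
        for pred, label in zip(preds, labels):
            total[int(label)] += 1
            correct[int(label)] += int(pred == label)
    return {c: correct[c] / total[c] for c in total}

def class_level_divergence(dense_model, pruned_model, loader, device="cpu"):
    dense_acc = per_class_accuracy(dense_model, loader, device)
    pruned_acc = per_class_accuracy(pruned_model, loader, device)
    # Accuracy change per class under compression; sorted so the classes
    # hurt most by compression come first.
    deltas = {c: pruned_acc[c] - dense_acc[c] for c in dense_acc}
    return sorted(deltas.items(), key=lambda kv: kv[1])

# Example usage (placeholder models and loader):
# worst_classes = class_level_divergence(dense_model, pruned_model, test_loader)[:10]

Citations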
A Winning Hand: Compressing Deep Networks Can Improve Out-Of-Distribution Robustness
TLDR: This work is able to create extremely compact CARDs that, compared to their larger counterparts, have similar test accuracy and matching (or better) robustness, simply by pruning and (optionally) quantizing.
Is the Lottery Fair? Evaluating Winning Tickets Across Demographics
TLDR: There is a small increase in group disparity, which is most pronounced at high pruning rates and correlates with instability; the fairness of models trained with distributionally robust optimization objectives is sometimes less sensitive to pruning, but results are not consistent.
Self-Damaging Contrastive Learning
TLDR: This paper proposes a principled framework, Self-Damaging Contrastive Learning (SDCLR), that automatically balances representation learning without knowing the classes by creating a dynamic self-competitor, a pruned version of the target model, to contrast with the target model.
Simon Says: Evaluating and Mitigating Bias in Pruned Neural Networks with Knowledge Distillation
TLDR: It is demonstrated that knowledge distillation can mitigate induced bias in pruned neural networks, even with unbalanced datasets, and it is revealed that model similarity correlates strongly with pruning-induced bias, which provides a powerful way to explain why bias occurs in pruned neural networks.
Measure Twice, Cut Once: Quantifying Bias and Fairness in Deep Neural Networks
TLDR: This work proposes two simple yet effective metrics, Combined Error Variance and Symmetric Distance Error, to quantitatively evaluate the class-wise bias of two models in comparison to one another, and shows that they can be used to measure fairness as well as bias.
Characterising Bias in Compressed Models
TLDR: This work proposes using Compression Identified Exemplars (CIEs) as a human-in-the-loop auditing tool to surface a tractable subset of the dataset for further inspection or annotation by a domain expert, and establishes that compression amplifies existing algorithmic bias on CIE examples (a rough sketch of this CIE check appears after this list).
A Tale Of Two Long Tails
TLDR: The results show that well-designed interventions over the course of training can be an effective way to characterize and distinguish between different sources of uncertainty, suggesting that the rate of learning in the presence of additional information differs between atypical and noisy examples.
Algorithmic Factors Influencing Bias in Machine Learning
TLDR: This paper demonstrates how ML algorithms can misrepresent the training data through underestimation, and shows how irreducible error, regularization, and feature and class imbalance can contribute to this underestimation.
An Underexplored Dilemma between Confidence and Calibration in Quantized Neural Networks
TLDR: It is shown that CNNs are surprisingly robust to compression techniques such as quantization, which aim to reduce computational and memory costs; this robustness can be partially explained by the calibration behavior of modern CNNs and may be improved with overconfidence.
Can Subnetwork Structure be the Key to Out-of-Distribution Generalization?
TLDR: A functional modular probing method is used to analyze deep model structures under an OOD setting, demonstrating that even biased models (which focus on spurious correlations) still contain unbiased functional subnetworks.
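Relatedly, the CIE-based auditing described in "Characterising Bias in Compressed Models" above can be sketched roughly as follows: flag examples whose modal prediction across a population of compressed models disagrees with the modal prediction across a population of uncompressed models. This is a loose reconstruction from the summaries above, not the cited paper's exact criterion; all model and loader names are placeholders, and the loader is assumed to iterate in a fixed order (no shuffling).

# Rough sketch of flagging Compression Identified Exemplars (CIEs):
# examples whose modal prediction across a population of compressed models
# disagrees with the modal prediction across a population of uncompressed
# models. The exact criterion in the cited work may differ.
import torch
from statistics import mode

@torch.no_grad()
def modal_predictions(models, loader, device="cpu"):
    # Most frequent predicted label per example across `models`.
    # `loader` must yield (input, label) batches in a fixed order.
    all_preds = []
    for model in models:
        model.eval().to(device)
        preds = [model(x.to(device)).argmax(dim=1).cpu() for x, _ in loader]
        all_preds.append(torch.cat(preds))
    stacked = torch.stack(all_preds)  # shape: (n_models, n_examples)
    return [mode(stacked[:, i].tolist()) for i in range(stacked.shape[1])]

def find_cies(dense_models, compressed_models, loader, device="cpu"):
    dense_modal = modal_predictions(dense_models, loader, device)
    comp_modal = modal_predictions(compressed_models, loader, device)
    # Indices where the two populations disagree are candidate CIEs.
    return [i for i, (d, c) in enumerate(zip(dense_modal, comp_modal)) if d != c]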

References

Showing 1-10 of 95 references
Natural Adversarial Examples
TLDR: This work introduces two challenging datasets that reliably cause machine learning model performance to substantially degrade, and curates an adversarial out-of-distribution detection dataset called ImageNet-O, the first out-of-distribution detection dataset created for ImageNet models.
Benchmarking Neural Network Robustness to Common Corruptions and Perturbations
TLDR: This paper standardizes and expands the corruption robustness topic, while showing which classifiers are preferable in safety-critical applications, and proposes a new dataset called ImageNet-P which enables researchers to benchmark a classifier's robustness to common perturbations.
Mixed Precision Training
TLDR: This work introduces a technique to train deep neural networks using half-precision floating point numbers, and demonstrates that this approach works for a wide variety of models including convolutional neural networks, recurrent neural networks, and generative adversarial networks.
On the Efficient Representation and Execution of Deep Acoustic Models
TLDR: A "quantization aware" training process is proposed that applies the proposed scheme during network training and is found to recover most of the accuracy loss introduced by quantization.
Keep the Gradients Flowing: Using Gradient Flow to Study Sparse Network Optimization
TLDR: This work shows that gradient flow in sparse networks can be improved by reconsidering aspects of the architecture design and the training regime, suggesting that initialization is only one piece of the puzzle and that taking a wider view of tailoring optimization to sparse networks yields promising results.
Does learning require memorization? a short tale about a long tail
TLDR: The model makes it possible to quantify the effect of not fitting the training data on the generalization performance of the learned classifier, demonstrates that memorization is necessary whenever frequencies are long-tailed, and establishes a formal link between these empirical phenomena.
MLIR: A Compiler Infrastructure for the End of Moore's Law
TLDR: MLIR is evaluated as a generalized infrastructure that reduces the cost of building compilers, with diverse use cases described to show research and educational opportunities for future programming languages, compilers, execution environments, and computer architecture.
Rigging the Lottery: Making All Tickets Winners
TLDR: This paper introduces a method to train sparse neural networks with a fixed parameter count and a fixed computational cost throughout training, without sacrificing accuracy relative to existing dense-to-sparse training methods.
Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers
TLDR: It is shown that large models are more robust to compression techniques such as quantization and pruning than small models, so one can get the best of both worlds: heavily compressed large models achieve higher accuracy than lightly compressed small models.
What Neural Networks Memorize and Why: Discovering the Long Tail via Influence Estimation
TLDR: The experiments demonstrate the significant benefits of memorization for generalization on several standard benchmarks and provide quantitative and visually compelling evidence for the theoretical explanation of this phenomenon put forth in Feldman (2019).