• Corpus ID: 225103395

Do Wide and Deep Networks Learn the Same Things? Uncovering How Neural Network Representations Vary with Width and Depth

@article{Nguyen2021DoWA,
  title={Do Wide and Deep Networks Learn the Same Things? Uncovering How Neural Network Representations Vary with Width and Depth},
  author={Thao Nguyen and Maithra Raghu and Simon Kornblith},
  journal={ArXiv},
  year={2021},
  volume={abs/2010.15327}
}
A key factor in the success of deep neural networks is the ability to scale models to improve performance by varying the architecture depth and width. This simple property of neural network design has resulted in highly effective architectures for a variety of tasks. Nevertheless, there is limited understanding of the effects of depth and width on the learned representations. In this paper, we study this fundamental question. We begin by investigating how varying depth and width affects model… 
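The representation comparisons in this line of work are built on centered kernel alignment (CKA); below is a minimal NumPy sketch of linear CKA between two activation matrices, for orientation only. The toy data and variable names are illustrative, and the paper itself uses a minibatch variant rather than this exact code.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear centered kernel alignment between two activation matrices.

    X: (n_examples, d1) activations from one layer/model.
    Y: (n_examples, d2) activations from another layer/model.
    Returns a similarity in [0, 1]; 1 means the representations agree up to
    an orthogonal transformation and isotropic scaling.
    """
    # Center each feature (column) so the Gram matrices are centered.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)

    # Linear CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    numerator = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    denominator = (np.linalg.norm(X.T @ X, ord="fro")
                   * np.linalg.norm(Y.T @ Y, ord="fro"))
    return numerator / denominator

# Toy usage: CKA is invariant to orthogonal transformations of the features.
rng = np.random.default_rng(0)
acts_a = rng.normal(size=(256, 512))               # activations for 256 examples
q, _ = np.linalg.qr(rng.normal(size=(512, 512)))   # random orthogonal matrix
acts_b = acts_a @ q                                # same representation, rotated
print(linear_cka(acts_a, acts_b))                  # ~1.0
print(linear_cka(acts_a, rng.normal(size=(256, 64))))  # unrelated: much lower
```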
Zen-NAS: A Zero-Shot NAS for High-Performance Deep Image Recognition
TLDR
Zen-NAS is able to design high-performance architectures in less than half a GPU day (12 GPU hours) and achieves up to 83.0% top-1 accuracy on ImageNet.
Universal Representation Learning from Multiple Domains for Few-shot Classification
TLDR
URL is presented, which learns a single set of universal visual representations by distilling knowledge of multiple domain-specific networks after co-aligning their features with the help of adapters and centered kernel alignment, and shows that the universal representations can be further refined for previously unseen domains by an efficient adaptation step.
On Efficient Transformer and Image Pre-training for Low-level Vision
  • Wenbo Li, Xin Lu, Jiangbo Lu, Xiangyu Zhang, Jiaya Jia
  • Computer Science
    ArXiv
  • 2021
TLDR
This paper proposes a generic, cost-effective Transformer-based framework for image processing, and designs a whole set of principled evaluation tools to seriously and comprehensively diagnose image pre-training in different tasks, and uncovers its effects on internal network representations.
Why Do Better Loss Functions Lead to Less Transferable Features?
TLDR
It is shown that many objectives lead to statistically significant improvements in ImageNet accuracy over vanilla softmax cross-entropy, but the resulting fixed feature extractors transfer substantially worse to downstream tasks, and the choice of loss has little effect when networks are fully fine-tuned on the new tasks.
Graph Modularity: Towards Understanding the Cross-Layer Transition of Feature Representations in Deep Neural Networks
TLDR
It is demonstrated that modularity can be used to identify and locate redundant layers in DNNs, which provides theoretical guidance for layer pruning and is proposed as a layer-wise pruning method based on modularity.
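To make the modularity idea concrete, here is a sketch of one plausible recipe (the k-NN graph construction and the class-label partition are my assumptions, not necessarily the paper's exact procedure): build a nearest-neighbour graph over a layer's features and measure the modularity of the class partition; consecutive layers across which this score stops changing are natural pruning candidates.

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import modularity
from sklearn.neighbors import NearestNeighbors

def layer_modularity(features, labels, k=10):
    """Modularity of the class partition on a k-NN graph of layer features.

    features: (n_examples, dim) activations of one layer.
    labels:   (n_examples,) class labels used as the node partition.
    A value that plateaus across consecutive layers hints that those layers
    no longer reorganise the representation (candidates for pruning).
    """
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(features)
    _, idx = nbrs.kneighbors(features)

    graph = nx.Graph()
    graph.add_nodes_from(range(len(features)))
    for i, neighbours in enumerate(idx):
        for j in neighbours[1:]:          # first neighbour is the point itself
            graph.add_edge(i, int(j))

    partition = [set(np.flatnonzero(labels == c)) for c in np.unique(labels)]
    return modularity(graph, partition)

# Hypothetical usage: `layer_feats` is a list of per-layer activation matrices
# for a labelled batch `y`.
# scores = [layer_modularity(f, y) for f in layer_feats]
```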
Subspace Clustering Based Analysis of Neural Networks
TLDR
This work motivates sparse subspace clustering (SSC) with the aim of learning affinity graphs from the latent structure of a given neural network layer trained over a set of inputs, and uses tools from community detection to quantify the structures present in the input.
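For reference, the standard SSC recipe this entry alludes to can be sketched as follows (an illustrative implementation with scikit-learn's Lasso and spectral clustering, not the paper's code): each point is expressed as a sparse combination of the other points, and the resulting coefficients define an affinity graph.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.cluster import SpectralClustering

def ssc_affinity(points, alpha=0.05):
    """Sparse self-expression affinity for sparse subspace clustering (SSC).

    Each row of `points` is regressed (with an L1 penalty) on all the other
    rows; |C| + |C|^T then serves as a symmetric affinity matrix.
    """
    n = points.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        others = np.delete(points, i, axis=0)                 # (n-1, dim)
        model = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
        model.fit(others.T, points[i])                        # sparse coefficients
        C[i, np.arange(n) != i] = model.coef_
    return np.abs(C) + np.abs(C).T

# Illustrative usage on one layer's activations (random stand-in data here):
rng = np.random.default_rng(0)
acts = rng.normal(size=(60, 20))
affinity = ssc_affinity(acts)
clusters = SpectralClustering(n_clusters=3, affinity="precomputed").fit_predict(affinity)
print(clusters[:10])
```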
Representation Topology Divergence: A Method for Comparing Neural Network Representations
TLDR
The Representation Topology Divergence (RTD) is introduced, measuring the dissimilarity in multi-scale topology between two point clouds of equal size with a one-to-one correspondence between points.
Analyze and Design Network Architectures by Recursion Formulas
TLDR
This work attempts to find an effective way to design new network architectures from the perspective of mathematical formulas, and it is discovered that the main difference between network architectures can be reflected in their recursion formulas.
Can contrastive learning avoid shortcut solutions?
The generalization of representations learned via contrastive learning depends crucially on what features of the data are extracted. However, we observe that the contrastive loss does not always… 
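For context, "the contrastive loss" here typically refers to an InfoNCE-style objective; a minimal NumPy sketch is shown below (the temperature and toy embeddings are illustrative choices, not the paper's settings).

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """A common contrastive (InfoNCE-style) loss on L2-normalised embeddings.

    anchors, positives: (batch, dim) embeddings of two views of the same
    examples; every other item in the batch serves as a negative.
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)

    logits = a @ p.T / temperature                # (batch, batch) similarities
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))           # positives sit on the diagonal

# Toy usage with random embeddings (illustrative only):
rng = np.random.default_rng(0)
z1, z2 = rng.normal(size=(2, 32, 64))
print(info_nce(z1, z2))
```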
Comparative Analysis of Activation Functions Used in the Hidden Layers of Deep Neural Networks
  • Martin Kaloev, Georgi Krastev
  • 2021 3rd International Congress on Human-Computer Interaction, Optimization and Robotic Applications (HORA)
  • 2021
The development in the field of neural networks opens up opportunities for the use of many activation functions, each of which has its own specific features. This raises questions about how… 

References

Showing 1-10 of 47 references
Wide Residual Networks
TLDR
This paper conducts a detailed experimental study of the architecture of ResNet blocks and proposes a novel architecture in which the depth of residual networks is decreased and the width increased; the resulting network structures, called wide residual networks (WRNs), are far superior to their commonly used thin and very deep counterparts.
The Expressive Power of Neural Networks: A View from the Width
TLDR
It is shown that there exist classes of wide networks which cannot be realized by any narrow network whose depth is no more than a polynomial bound, and that narrow networks whose size exceeds the polynomial bound by a constant factor can approximate wide and shallow networks with high accuracy.
Insights on representational similarity in neural networks with canonical correlation
Comparing different neural network representations and determining how representations evolve over time remain challenging open questions in our understanding of the function of neural networks.
FitNets: Hints for Thin Deep Nets
TLDR
This paper extends the idea of a student network that could imitate the soft output of a larger teacher network or ensemble of networks, using not only the outputs but also the intermediate representations learned by the teacher as hints to improve the training process and final performance of the student.
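A minimal sketch of the hint idea follows (the single linear regressor and the shapes are my simplifications of the FitNets setup): the student's intermediate features, passed through a learned regressor, are matched to the teacher's intermediate features with a squared error that is added to the usual distillation loss.

```python
import numpy as np

def hint_loss(student_hidden, teacher_hidden, projection):
    """FitNets-style hint loss (sketch): the student's intermediate features,
    mapped through a learned regressor `projection`, should match the
    teacher's intermediate features in the L2 sense.

    student_hidden: (batch, d_student)   teacher_hidden: (batch, d_teacher)
    projection:     (d_student, d_teacher), learned jointly with the student.
    """
    predicted = student_hidden @ projection
    return 0.5 * np.mean(np.sum((predicted - teacher_hidden) ** 2, axis=1))

# Illustrative shapes: a thin student layer (64 units) guided by a wide
# teacher layer (256 units) on a batch of 32 examples.
rng = np.random.default_rng(0)
loss = hint_loss(rng.normal(size=(32, 64)),
                 rng.normal(size=(32, 256)),
                 rng.normal(size=(64, 256)) * 0.1)
print(loss)
```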
On the Expressive Power of Deep Neural Networks
We propose a new approach to the problem of neural network expressivity, which seeks to characterize how structural properties of a neural network family affect the functions it is able to compute.
Comparison Against Task Driven Artificial Neural Networks Reveals Functional Properties in Mouse Visual Cortex
TLDR
Comparing the representations measured in the Allen Brain Observatory in response to natural image presentations shows that the visual cortical areas carry relatively high-order representations (in that they map to deeper layers of convolutional neural networks), and finds evidence of a broad, more parallel organization rather than a sequential hierarchy.
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
TLDR
A new scaling method is proposed that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient, and its effectiveness is demonstrated by scaling up MobileNets and ResNet.
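The compound coefficient can be written out explicitly: depth, width, and resolution are scaled as α^φ, β^φ, γ^φ with α·β²·γ² ≈ 2, so each increment of φ roughly doubles FLOPs. A small sketch using the constants reported for EfficientNet-B0 (quoted from memory):

```python
# Compound scaling (sketch): depth d = alpha**phi, width w = beta**phi,
# resolution r = gamma**phi, with alpha * beta**2 * gamma**2 ~= 2 so that
# raising phi by 1 roughly doubles FLOPs. The constants below are the ones
# reported for EfficientNet-B0 (quoted from memory).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi):
    """Return (depth, width, resolution) multipliers for scaling exponent phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi

for phi in range(5):
    d, w, r = compound_scale(phi)
    flops_factor = d * w ** 2 * r ** 2              # ~2**phi
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, "
          f"resolution x{r:.2f}, FLOPs x{flops_factor:.1f}")
```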
The effect of task and training on intermediate representations in convolutional neural networks revealed with modified RV similarity analysis
TLDR
This work experiments with the modified RV coefficient (RV2), which has properties very similar to CKA while being less sensitive to dataset size, and proposes that the superior performance achieved by freeze training can be attributed to representational differences in the penultimate layer.
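A minimal sketch of RV2, assuming the Smilde et al. formulation in which the diagonals of the example-by-example Gram matrices are zeroed before computing the RV coefficient (my recollection, not code from the paper):

```python
import numpy as np

def rv2(X, Y):
    """Modified RV coefficient (RV2) between two activation matrices.

    Like linear CKA, it compares the example-by-example Gram matrices of two
    representations, but with the Gram diagonals zeroed out (my recollection
    of the modified RV formulation), which reduces sensitivity to size.
    """
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)

    Sx = X @ X.T
    Sy = Y @ Y.T
    np.fill_diagonal(Sx, 0.0)
    np.fill_diagonal(Sy, 0.0)

    return np.sum(Sx * Sy) / np.sqrt(np.sum(Sx * Sx) * np.sum(Sy * Sy))

# Toy usage: related representations score higher than unrelated ones.
rng = np.random.default_rng(0)
acts = rng.normal(size=(128, 64))
print(rv2(acts, acts @ rng.normal(size=(64, 32))))   # linearly related
print(rv2(acts, rng.normal(size=(128, 32))))         # unrelated: lower
```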
Deep Residual Learning for Image Recognition
TLDR
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
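The core idea can be shown in a few lines (a fully-connected toy sketch; the actual blocks use convolutions and batch normalisation): the block learns a residual that is added back to its input through an identity shortcut.

```python
import numpy as np

def residual_block(x, w1, w2):
    """Minimal residual block (sketch): the block learns a residual F(x) that
    is added back to its input, y = relu(F(x) + x), which is what makes very
    deep networks easier to optimise.

    x: (batch, d) activations; w1, w2: (d, d) weight matrices (illustrative;
    the original blocks use convolutions and batch normalisation).
    """
    residual = np.maximum(x @ w1, 0.0) @ w2   # F(x): two layers with a ReLU
    return np.maximum(residual + x, 0.0)      # identity shortcut, then ReLU

# Toy usage:
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))
y = residual_block(x, rng.normal(size=(16, 16)) * 0.1, rng.normal(size=(16, 16)) * 0.1)
print(y.shape)
```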
Gaussian Process Behaviour in Wide Deep Neural Networks
TLDR
It is shown that, under broad conditions, as the architecture is made increasingly wide, the implied random function converges in distribution to a Gaussian process, formalising and extending existing results by Neal (1996) to deep networks.
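The claim can be illustrated with a small Monte Carlo experiment (a sketch under 1/√fan-in weight scaling, my choice of parameterisation rather than the paper's): over random initialisations, the joint distribution of a one-hidden-layer ReLU network's outputs at two fixed inputs settles down as the width grows, consistent with convergence to a Gaussian process.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=8)          # two fixed inputs in R^8
x2 = rng.normal(size=8)

def random_relu_net_outputs(width, n_draws=2000, sigma_w=1.0):
    """Outputs of freshly initialised one-hidden-layer ReLU networks at x1, x2.

    Weights are scaled by 1/sqrt(fan_in) so the wide limit is well defined;
    as `width` grows, the joint distribution of (f(x1), f(x2)) over random
    initialisations approaches a bivariate Gaussian (the GP behaviour).
    """
    d = x1.shape[0]
    W1 = rng.normal(size=(n_draws, width, d)) * sigma_w / np.sqrt(d)
    w2 = rng.normal(size=(n_draws, width)) * sigma_w / np.sqrt(width)
    h1 = np.maximum(W1 @ x1, 0.0)        # hidden activations, (n_draws, width)
    h2 = np.maximum(W1 @ x2, 0.0)
    return np.sum(w2 * h1, axis=1), np.sum(w2 * h2, axis=1)

for width in (4, 64, 512):
    f1, f2 = random_relu_net_outputs(width)
    print(width, np.cov(f1, f2))         # the 2x2 covariance settles as width grows
```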