• Corpus ID: 243985799

Discovering and Explaining the Representation Bottleneck of DNNs

Huiqi Deng, Qihan Ren, Xu Chen, Hao Zhang, Jie Ren, Quanshi Zhang
This paper explores the bottleneck of feature representations of deep neural networks (DNNs), from the perspective of the complexity of interactions between input variables encoded in DNNs. To this end, we focus on the multi-order interaction between input variables, where the order represents the complexity of interactions. We discover that a DNN is more likely to encode both too simple and too complex interactions, but usually fails to learn interactions of intermediate complexity. Such a… 

Discovering the Representation Bottleneck of Graph Neural Networks from Multi-order Interactions

A novel graph rewiring approach based on pairwise interaction strengths is proposed to dynamically adjust the reception of each node, and it is shown to outperform state-of-the-art GNN baselines.

Efficient Multi-order Gated Aggregation Network

This work empirically shows that interaction complexity is an overlooked but essential indicator for visual recognition, and presents a new family of ConvNets, named MogaNet, to pursue informative context mining in pure ConvNet-based models with favorable complexity-performance trade-offs.

Defects of Convolutional Decoder Networks in Frequency Representation

The discrete Fourier transform is applied to each channel of the feature map in an intermediate layer of the decoder network, and a rule for the forward propagation of these intermediate-layer spectrum maps is introduced, which is equivalent to the forward propagation of feature maps through a convolutional layer.

Architecture-Agnostic Masked Image Modeling - From ViT back to CNN

It is observed that MIM essentially teaches the model to learn better middle-level interactions among patches and extract more generalized features, and an Architecture-Agnostic Masked Image Modeling framework is proposed, which is compatible with not only Transformers but also CNNs in a unified way.

Game-Theoretic Understanding of Misclassification

This study demonstrates that the recent game-theoretic analysis of deep learning models can be broadened to analyze various malfunctions of deep learning models, including Vision Transformers, by using the distribution, order, and sign of interactions.

Batch Normalization Is Blind to the First and Second Derivatives of the Loss

In this paper, we prove the effects of the BN operation on the back-propagation of the first and second derivatives of the loss. When we do the Taylor series expansion of the loss function, we prove

Towards Axiomatic, Hierarchical, and Symbolic Explanation for Deep Models

It is shown that the inference logic of a deep model can be faithfully represented as a symbolic causal graph, and the faithfulness of the causal graph is theoretically guaranteed, because it can well mimic the model’s output on an exponential number of different masked samples.

Explanation-based Counterfactual Retraining (XCR): A Calibration Method for Black-box Models

This work proposes eXplanation-based Counterfactual Retraining (XCR), which applies the explanations generated by the XAI model as counterfactual input to retrain the black-box model, addressing OOD and social misalignment problems. XCR beats current OOD calibration methods on the OOD calibration metric if calibration on the validation set is applied.



Learning Representations for Neural Network-Based Classification Using the Information Bottleneck Principle

This theory paper investigates training deep neural networks (DNNs) for classification via minimizing the information bottleneck (IB) functional and concludes that recent successes reported about training DNNs using the IB framework must be attributed to such solutions.

Opening the Black Box of Deep Neural Networks via Information

This work demonstrates the effectiveness of the Information-Plane visualization of DNNs and shows that the training time is dramatically reduced when adding more hidden layers, and the main advantage of the hidden layers is computational.

Building Interpretable Interaction Trees for Deep NLP Models

This paper proposes a method to disentangle and quantify interactions among words that are encoded inside a DNN for natural language processing. We construct a tree to encode salient interactions

A Closer Look at Memorization in Deep Networks

The analysis suggests that dataset-independent notions of effective capacity are unlikely to explain the generalization performance of deep networks trained with gradient-based methods, because the training data itself plays an important role in determining the degree of memorization.

Information Dropout: Learning Optimal Representations Through Noisy Computation

It is proved that Information Dropout achieves a comparable or better generalization performance than binary dropout, especially on smaller models, since it can automatically adapt the noise to the structure of the network, as well as to the test sample.

Learning deep representations by mutual information estimation and maximization

It is shown that structure matters: incorporating knowledge about locality in the input into the objective can significantly improve a representation’s suitability for downstream tasks and is an important step towards flexible formulations of representation learning objectives for specific end-goals.

Understanding training and generalization in deep learning by Fourier analysis

This work studies DNN training by Fourier analysis to explain why DNNs often achieve remarkably low generalization error, and suggests that small initialization leads to good generalization ability of DNNs while preserving their ability to fit any function.

Explaining Explanations: Axiomatic Feature Interactions for Deep Networks

This work presents Integrated Hessians, an extension of Integrated Gradients that explains pairwise feature interactions in neural networks and finds that the method is faster than existing methods when the number of features is large, and outperforms previous methods on existing quantitative benchmarks.

On the Number of Linear Regions of Deep Neural Networks

We study the complexity of functions computable by deep feedforward neural networks with piecewise linear activations in terms of the symmetries and the number of linear regions that they have. Deep

Neural Interaction Transparency (NIT): Disentangling Learned Interactions for Improved Interpretability

It is shown that disentangling interactions reduces a feedforward neural network to a generalized additive model with interactions, which can lead to transparent models that perform comparably to the state-of-the-art models.