Corpus ID: 246294533

Implicit Regularization in Hierarchical Tensor Factorization and Deep Convolutional Neural Networks

@inproceedings{razin2022implicit,
  title={Implicit Regularization in Hierarchical Tensor Factorization and Deep Convolutional Neural Networks},
  author={Noam Razin and Asaf Maman and Nadav Cohen},
  booktitle={International Conference on Machine Learning},
  year={2022}
}
In the pursuit of explaining implicit regularization in deep learning, prominent focus has been given to matrix and tensor factorizations, which correspond to simplified neural networks. These models were shown to exhibit an implicit tendency towards low matrix and tensor rank, respectively. Drawing closer to practical deep learning, the current paper theoretically analyzes the implicit regularization in hierarchical tensor factorization, a model equivalent to certain deep convolutional neural…

On the Ability of Graph Neural Networks to Model Interactions Between Vertices

This paper quantifies the ability of certain GNNs to model interaction between a given subset of vertices and its complement, i.e. between the sides of a given partition of input vertices, and designs an edge sparsification algorithm named Walk Index Sparsification (WIS), which preserves the ability to model interactions when input edges are removed.

Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Network

This paper analyzes the extrapolation properties of Gradient Descent when applied to overparameterized linear RNNs, and provides theoretical evidence for learning low dimensional state spaces, which can also model long-term memory.

Behind the Scenes of Gradient Descent: A Trajectory Analysis via Basis Function Decomposition

This work improves the convergence of GD on symmetric matrix factorization and provides a completely new convergence result for the orthogonal symmetric tensor decomposition and illustrates the promise of the proposed framework on realistic deep neural networks (DNNs) across different architectures, gradient-based solvers, and datasets.

On the Implicit Bias in Deep-Learning Algorithms

The notion of implicit bias is explained, and the main results and their implications are reviewed and discussed.

Implicit Regularization with Polynomial Growth in Deep Tensor Factorization

It is shown that implicit regularization in deep tensor factorization grows polynomially with the depth of the network, providing a remarkably faithful description of the observed experimental behaviour.

Permutation Search of Tensor Network Structures via Local Sampling

Theoretically, the counting and metric properties of search spaces of TN-PS are proved and a novel meta-heuristic algorithm is proposed, in which the searching is done by randomly sampling in a neighborhood established in the authors' theory, and then recurrently updating the neighborhood until convergence.

More is Less: Inducing Sparsity via Overparameterization

It is shown that, if an exact solution exists, vanilla gradient descent on the overparameterized loss functional converges to a good approximation of the solution of minimal ℓ1-norm, which is well known to promote sparse solutions.
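The sparsity-inducing effect of overparameterization can be sketched in a few lines. The parameterization below, w = u⊙u − v⊙v, is one common choice from this literature and not necessarily the paper's exact setup; all problem sizes and variable names are illustrative. Plain gradient descent on an underdetermined linear system, started from a small initialization, drifts toward a sparse, near-minimal-ℓ1 interpolant.

```python
import numpy as np

# Hedged sketch of sparsity via overparameterization (a common Hadamard
# parameterization, not necessarily the paper's exact one): solve the
# underdetermined system A w = b by gradient descent on w = u*u - v*v.
rng = np.random.default_rng(0)
n, d = 20, 50
w_true = np.zeros(d)
w_true[:3] = [1.0, -2.0, 0.5]        # sparse ground truth
A = rng.standard_normal((n, d))
b = A @ w_true

u = np.full(d, 1e-3)                  # the small initialization is what
v = np.full(d, 1e-3)                  # drives the implicit sparsity bias
lr = 5e-3
for _ in range(50000):
    w = u * u - v * v
    g = A.T @ (A @ w - b) / n         # gradient of the squared residual w.r.t. w
    u -= lr * 2.0 * g * u             # chain rule through w = u^2 - v^2
    v += lr * 2.0 * g * v

w = u * u - v * v                     # interpolates b with far smaller l1 norm
                                      # than the minimum-l2 (pseudoinverse) solution
```

The effect disappears with a large initialization: starting u, v at O(1) scale makes gradient descent behave like ordinary least squares, landing near the dense minimum-ℓ2 interpolant instead.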

Learning long-range spatial dependencies with horizontal gated-recurrent units

This work introduces the horizontal gated-recurrent unit (hGRU) to learn intrinsic horizontal connections -- both within and across feature columns, and demonstrates that a single hGRU layer matches or outperforms all tested feedforward hierarchical baselines including state-of-the-art architectures which have orders of magnitude more free parameters.

Disentangling neural mechanisms for perceptual grouping

This work systematically evaluates neural network architectures featuring combinations of bottom-up, horizontal and top-down connectivity on two synthetic visual tasks, which stress low-level "gestalt" vs. high-level object cues for perceptual grouping, and demonstrates how a model featuring all of these interactions can more flexibly learn to form perceptual groups.

Automatic differentiation in PyTorch

An automatic differentiation module of PyTorch is described: a library designed to enable rapid research on machine learning models, focusing on differentiation of purely imperative programs, with an emphasis on extensibility and low overhead.
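The define-by-run style this entry describes can be illustrated with a toy scalar reverse-mode sketch. This is in no way PyTorch's actual implementation, just a minimal picture of the idea: the graph is recorded as the imperative program executes, then walked backwards to accumulate gradients.

```python
# Toy sketch of define-by-run reverse-mode autodiff (not PyTorch's real
# implementation): each operation records its parents and local derivatives
# as the program runs; backward() propagates gradients back along them.
class Var:
    def __init__(self, value, parents=None):
        self.value = value
        self.grad = 0.0
        self.parents = parents or []   # list of (parent Var, local derivative)

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def backward(self, seed=1.0):
        # Naive recursion enumerates all paths to each leaf; correct by
        # linearity, though a real system would topologically sort instead.
        self.grad += seed
        for parent, local in self.parents:
            parent.backward(seed * local)

x = Var(3.0)
y = Var(4.0)
z = x * y + x        # graph is built imperatively, on the fly
z.backward()
print(z.value, x.grad, y.grad)   # 15.0, dz/dx = y + 1 = 5.0, dz/dy = x = 3.0
```

Note the contrast with define-then-run frameworks: here control flow (loops, branches) in ordinary Python automatically shapes the recorded graph.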

Deep Residual Learning for Image Recognition

This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
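The core residual idea can be sketched in a few lines (a simplified block, not the exact ResNet architecture, which also includes convolutions, batch normalization, and multiple layers per block): the layer learns a residual F(x) and outputs x + F(x), so zero weights give an identity map, which is part of why very deep stacks remain easy to optimize.

```python
import numpy as np

# Simplified sketch of a residual block (not the exact ResNet block):
# output = input + learned residual F(input).
def residual_block(x, W1, W2):
    h = np.maximum(0.0, W1 @ x)   # ReLU(W1 x)
    return x + W2 @ h             # skip connection adds the input back

d = 4
x = np.arange(d, dtype=float)
W_zero = np.zeros((d, d))
# With zero weights the block is exactly the identity, so stacking many
# such blocks cannot make the function harder to represent than a
# shallower network.
print(residual_block(x, W_zero, W_zero))   # equals x
```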

Learning Multiple Layers of Features from Tiny Images

It is shown how to train a multi-layer generative model that learns to extract meaningful features which resemble those found in the human visual cortex, using a novel parallelization algorithm to distribute the work among multiple machines connected on a network.

Implicit Regularization in Tensor Factorization

Motivated by tensor rank capturing the implicit regularization of a non-linear neural network, this work empirically explores it as a measure of complexity and finds that it captures the essence of datasets on which neural networks generalize, suggesting that tensor rank may pave the way to explaining both implicit regularization in deep learning and the properties of real-world data that translate this implicit regularization to generalization.

Implicit Regularization in Deep Matrix Factorization

This work studies the implicit regularization of gradient descent over deep linear neural networks for matrix completion and sensing, a model referred to as deep matrix factorization, and finds that adding depth to a matrix factorization enhances an implicit tendency towards low-rank solutions.
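The low-rank tendency can be demonstrated on a toy matrix-completion problem (toy sizes and hyperparameters chosen for illustration, not the paper's experiments): a depth-3 factorization trained by plain gradient descent from a small initialization fits the observed entries while keeping the end-to-end matrix close to rank one.

```python
import numpy as np

# Hedged sketch of implicit low-rank regularization in deep matrix
# factorization: complete a rank-1 matrix from a subset of its entries
# with W = W3 @ W2 @ W1 trained by gradient descent from small init.
rng = np.random.default_rng(0)
M = np.outer([1.0, 0.5, -0.5, 0.25], [1.0, -1.0, 0.5, 2.0])  # rank-1 target
mask = rng.random(M.shape) < 0.75          # observe ~75% of the entries

Ws = [0.1 * rng.standard_normal((4, 4)) for _ in range(3)]
lr = 0.2
for _ in range(30000):
    W = Ws[2] @ Ws[1] @ Ws[0]
    E = mask * (W - M) / mask.sum()        # gradient of masked MSE w.r.t. W
    g0 = Ws[1].T @ Ws[2].T @ E             # chain rule to each factor
    g1 = Ws[2].T @ E @ Ws[0].T
    g2 = E @ Ws[0].T @ Ws[1].T
    Ws[0] -= lr * g0
    Ws[1] -= lr * g1
    Ws[2] -= lr * g2

W = Ws[2] @ Ws[1] @ Ws[0]
s = np.linalg.svd(W, compute_uv=False)
print(np.round(s, 3))   # spectrum typically dominated by one singular value
```

A single 4x4 matrix trained the same way can also fit the observed entries, but without depth the preference for a low-rank completion is markedly weaker, which is the paper's point.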

Inductive Bias of Deep Convolutional Networks through Pooling Geometry

It is shown that deep convolutional networks support exponentially high separation ranks for favored partitions of the input, whereas shallow ones support only linear separation ranks; this gives insight into the benefit of functions brought forth by depth: the ability to efficiently model strong correlation under favored partitions of the input.

What Happens after SGD Reaches Zero Loss? -A Mathematical Framework

A general framework for analysis of the implicit bias of Stochastic Gradient Descent is given using a stochastic differential equation (SDE) describing the limiting dynamics of the parameters, which is determined jointly by the loss function and the noise covariance.
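The limiting dynamics referred to here are commonly modeled by an SDE of the following general form (this is the standard modeling choice in this line of work, not necessarily the paper's exact equation):

$$ d\theta_t = -\nabla L(\theta_t)\,dt + \sqrt{\eta}\,\Sigma(\theta_t)^{1/2}\,dW_t, $$

where $\eta$ is the learning rate, $L$ the loss, $\Sigma$ the gradient-noise covariance, and $W_t$ a Wiener process; the implicit bias then emerges from how the noise term drives a slow drift along the manifold of zero-loss solutions.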

Gradient Descent on Two-layer Nets: Margin Maximization and Simplicity Bias

This paper establishes global optimality of margin for two-layer Leaky ReLU nets trained with gradient flow on linearly separable and symmetric data, regardless of the width, and gives some theoretical justification for recent empirical findings.