Generalization and Overfitting in Matrix Product State Machine Learning Architectures

@article{Strashko2022GeneralizationAO,
  title={Generalization and Overfitting in Matrix Product State Machine Learning Architectures},
  author={Artem Strashko and Edwin Miles Stoudenmire},
  journal={ArXiv},
  year={2022},
  volume={abs/2208.04372}
}
While overfitting and, more generally, double descent are ubiquitous in machine learning, increasing the number of parameters of the most widely used tensor network, the matrix product state (MPS), has generally led to monotonic improvement of test performance in previous studies. To better understand the generalization properties of architectures parameterized by MPS, we construct artificial data which can be exactly modeled by an MPS and train the models with different numbers of parameters. We…
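As a concrete illustration of the setup the abstract describes, here is a minimal numpy sketch (not the authors' code) of an MPS-parameterized model: each input feature is embedded with the local feature map of Stoudenmire and Schwab, and the bond dimension is the knob that sets the number of parameters. All names here are illustrative.

```python
import numpy as np

def feature_map(x):
    # Local feature map (Stoudenmire & Schwab): each feature
    # x in [0, 1] becomes a 2-component vector.
    return np.array([np.cos(np.pi * x / 2), np.sin(np.pi * x / 2)])

def random_mps(n_sites, phys_dim=2, bond_dim=4, seed=0):
    # Random MPS cores; bond_dim controls the parameter count.
    rng = np.random.default_rng(seed)
    cores = []
    for i in range(n_sites):
        dl = 1 if i == 0 else bond_dim
        dr = 1 if i == n_sites - 1 else bond_dim
        cores.append(rng.standard_normal((dl, phys_dim, dr)) / np.sqrt(bond_dim))
    return cores

def mps_output(cores, x):
    # Contract the MPS with the product-state embedding of the input.
    v = np.ones((1,))
    for core, xi in zip(cores, x):
        mat = np.einsum('lpr,p->lr', core, feature_map(xi))
        v = v @ mat
    return v.item()

x = np.random.rand(10)            # a toy 10-feature input
model = random_mps(10, bond_dim=8)
print(mps_output(model, x))       # scalar model output f(x)
```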

Symmetric Tensor Networks for Generative Modeling and Constrained Combinatorial Optimization

This work encodes arbitrary integer-valued equality constraints of the form Ax = b directly into U(1)-symmetric tensor networks (TNs) and leverages their applicability as quantum-inspired generative models to assist in the search for solutions to combinatorial optimization problems.
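A toy illustration of the constraint class involved, using a hypothetical small instance: brute-force enumeration of the bitstrings satisfying Ax = b, i.e. the support that the U(1)-symmetric tensor network encodes. The cited work builds this support into the sparsity structure of the tensor cores instead of enumerating it.

```python
import itertools
import numpy as np

# Hypothetical small integer equality constraint A x = b over
# binary variables.
A = np.array([[1, 1, 1, 1],
              [1, 0, 1, 0]])
b = np.array([2, 1])

feasible = [x for x in itertools.product([0, 1], repeat=4)
            if np.array_equal(A @ np.array(x), b)]
print(feasible)   # only these bitstrings get nonzero amplitude
```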

References

The Presence and Absence of Barren Plateaus in Tensor-network Based Machine Learning

This work rigorously proves that barren plateaus prevail in the training of tensor-network machine learning algorithms with global loss functions, revealing a crucial aspect of tensor-network based machine learning.
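A minimal numeric probe of the barren-plateau phenomenon, in a toy product-state setting rather than the paper's tensor-network one: for a global fidelity loss, the variance of a single parameter's gradient shrinks exponentially with system size.

```python
import numpy as np

def grad_variance(n_qubits, n_samples=2000, seed=0):
    # Each qubit gets an RY(theta) rotation applied to |0>, and the
    # loss is the global fidelity with |0...0>, which equals
    # prod_i cos^2(theta_i / 2).  The derivative with respect to
    # theta_0 is -sin(theta_0)/2 times the remaining factors.
    rng = np.random.default_rng(seed)
    thetas = rng.uniform(0, 2 * np.pi, size=(n_samples, n_qubits))
    rest = np.prod(np.cos(thetas[:, 1:] / 2) ** 2, axis=1)
    grad0 = -np.sin(thetas[:, 0]) / 2 * rest
    return grad0.var()

for n in [2, 4, 8, 12, 16]:
    print(n, grad_variance(n))   # variance shrinks exponentially in n
```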

From Probabilistic Graphical Models to Generalized Tensor Networks for Supervised Learning

This work explores the connection between tensor networks and probabilistic graphical models, and shows that it motivates the definition of generalized tensor networks, where information from a tensor can be copied and reused in other parts of the network.
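A small numpy sketch of the "copy" operation that generalized tensor networks allow: a three-leg delta tensor duplicates a vector into two branches, something standard tensor-network wiring cannot express.

```python
import numpy as np

# A "copy" (delta) tensor: nonzero only when all three legs agree.
d = 3
delta = np.zeros((d, d, d))
for i in range(d):
    delta[i, i, i] = 1.0

v = np.random.rand(d)
M1, M2 = np.random.rand(d, d), np.random.rand(d, d)

# Copy v into two branches, contracting each with its own matrix:
out = np.einsum('i,ijk,ja,kb->ab', v, delta, M1, M2)

# Equivalent to reusing v directly in both branches:
ref = np.einsum('i,ia,ib->ab', v, M1, M2)
assert np.allclose(out, ref)
print(out.shape)
```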

Entanglement and Tensor Networks for Supervised Image Classification

The use of tensor networks for supervised image classification on the MNIST data set of handwritten digits, as pioneered by Stoudenmire and Schwab, is revisited, and entanglement properties are investigated.
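For reference, entanglement across a bipartition of a pure state can be computed from the singular values of the reshaped amplitude tensor; a minimal sketch (illustrative, not the paper's pipeline):

```python
import numpy as np

def bipartite_entropy(psi, n_left, d=2):
    # Von Neumann entanglement entropy across a left/right cut of a
    # pure state on n qudits, via SVD of the reshaped amplitudes.
    n = int(round(np.log(psi.size) / np.log(d)))
    mat = psi.reshape(d ** n_left, d ** (n - n_left))
    s = np.linalg.svd(mat, compute_uv=False)
    p = s ** 2 / np.sum(s ** 2)
    p = p[p > 1e-12]
    return -np.sum(p * np.log(p))

# Toy check: a 4-qubit GHZ state has entropy log(2) across any cut.
psi = np.zeros(16)
psi[0] = psi[15] = 1 / np.sqrt(2)
print(bipartite_entropy(psi, n_left=2))   # ~0.693
```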

Exponential Machines

This paper introduces Exponential Machines (ExM), a predictor that models all interactions of every order in a factorized format called Tensor Train (TT), and shows that the model achieves state-of-the-art performance on synthetic data with high-order interactions and performs on par on the MovieLens 100K recommender-system dataset.
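A minimal sketch of the idea: giving each site the 2-vector [1, x_i] makes the TT contraction expand into all 2^n interaction monomials, with coefficients parameterized by the TT cores. The names and core initialization here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def tt_cores(n, bond_dim=4):
    # Random TT cores with physical dimension 2 (for [1, x_i]).
    shapes = [(1 if i == 0 else bond_dim, 2,
               1 if i == n - 1 else bond_dim) for i in range(n)]
    return [rng.standard_normal(s) * 0.5 for s in shapes]

def exm_predict(cores, x):
    # Site i contributes core[:, 0, :] + x_i * core[:, 1, :];
    # the chained matrix product sums every monomial x_{i1}...x_{ik}.
    v = np.ones((1,))
    for core, xi in zip(cores, x):
        v = v @ (core[:, 0, :] + xi * core[:, 1, :])
    return v.item()

x = rng.random(8)
print(exm_predict(tt_cores(8), x))
```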

Tensor networks and efficient descriptions of classical data

It is found that for text, the mutual information scales as a power law in the subsystem size L with an exponent close to a volume law, indicating that text cannot be efficiently described by 1D tensor networks.
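A simplified way to probe this scaling empirically: two-point mutual information between characters at distance L (a cruder statistic than the block mutual information the paper uses). Here `corpus.txt` is a placeholder for any plain-text corpus.

```python
import numpy as np
from collections import Counter

def pair_mutual_information(text, L):
    # Empirical mutual information between characters L apart.
    pairs = [(text[i], text[i + L]) for i in range(len(text) - L)]
    n = len(pairs)
    pab = Counter(pairs)
    pa = Counter(a for a, _ in pairs)
    pb = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), c in pab.items():
        p = c / n
        mi += p * np.log(p * n * n / (pa[a] * pb[b]))
    return mi

text = open('corpus.txt').read()      # placeholder corpus file
for L in [1, 2, 4, 8, 16, 32]:
    print(L, pair_mutual_information(text, L))
```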

Deep double descent: where bigger models and more data hurt

A new complexity measure, the effective model complexity, is defined, and a generalized double descent is conjectured with respect to this measure; the notion identifies certain regimes where increasing the number of training samples actually hurts test performance.
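Double descent is easy to reproduce in a toy random-features regression, where test error typically peaks near the interpolation threshold (number of features ≈ number of training samples) and then falls again. A minimal sketch, not the paper's experiments:

```python
import numpy as np

rng = np.random.default_rng(0)

def test_error(n_features, n_train=100, n_test=1000, d=20, noise=0.1):
    # Random-features least squares; lstsq returns the minimum-norm
    # solution in the overparameterized regime.
    W = rng.standard_normal((d, n_features))
    beta = rng.standard_normal(d)
    def make(n):
        X = rng.standard_normal((n, d))
        y = X @ beta + noise * rng.standard_normal(n)
        return np.tanh(X @ W), y
    F_tr, y_tr = make(n_train)
    F_te, y_te = make(n_test)
    w = np.linalg.lstsq(F_tr, y_tr, rcond=None)[0]
    return np.mean((F_te @ w - y_te) ** 2)

for p in [10, 50, 90, 100, 110, 200, 800]:
    print(p, test_error(p))   # error typically peaks near p = 100
```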

Generative modeling via tensor train sketching

A sketching algorithm for constructing a tensor train representation of a probability density from its samples is introduced, and it is proved that the tensor cores can be recovered with a sample complexity that is constant with respect to the dimension.
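For contrast, here is the dense baseline such sketching avoids: form the empirical probability tensor from samples and compress it with sequential SVDs (TT-SVD). This only works for a handful of variables; the point of the cited algorithm is to recover the cores from sample sketches without ever forming the dense tensor.

```python
import numpy as np

def empirical_tensor(samples, dims):
    # Empirical joint probability tensor from categorical samples.
    P = np.zeros(dims)
    for s in samples:
        P[tuple(s)] += 1
    return P / len(samples)

def dense_to_tt(P, max_rank=8):
    # TT-SVD: sweep left to right, splitting off one core per SVD.
    dims = P.shape
    cores, r = [], 1
    mat = P.reshape(r * dims[0], -1)
    for k, d in enumerate(dims[:-1]):
        U, S, Vt = np.linalg.svd(mat, full_matrices=False)
        rk = min(max_rank, len(S))
        cores.append(U[:, :rk].reshape(r, d, rk))
        r = rk
        mat = (S[:rk, None] * Vt[:rk]).reshape(r * dims[k + 1], -1)
    cores.append(mat.reshape(r, dims[-1], 1))
    return cores

rng = np.random.default_rng(0)
samples = rng.integers(0, 3, size=(5000, 4))   # 5000 samples, 4 variables
P = empirical_tensor(samples, (3, 3, 3, 3))
print([c.shape for c in dense_to_tt(P)])
```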

GEO: Enhancing Combinatorial Optimization with Classical and Quantum Generative Models

It is shown that TN-GEO can propose unseen candidates with lower cost-function values than the candidates seen by classical solvers, the first demonstration of the generalization capabilities of quantum-inspired generative models providing real value in the context of an industrial application.
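A schematic of the generative-enhanced optimization loop, with an independent-Bernoulli model standing in for the tensor-network generator and a hypothetical cost function: train on the best candidates seen so far, sample unseen ones, and keep the improvements.

```python
import numpy as np

rng = np.random.default_rng(0)

def cost(x):
    # Hypothetical cost over bitstrings (stand-in for the real
    # objective used in the cited work).
    w = np.arange(1, x.shape[-1] + 1)
    return np.abs(x @ w - 12)

n_bits, pool = 16, 200
X = rng.integers(0, 2, size=(pool, n_bits))     # "seen" candidates

for step in range(5):
    # Fit a generator biased toward low-cost candidates ...
    c = cost(X)
    weights = np.exp(-c / (c.std() + 1e-9))
    p = np.average(X, axis=0, weights=weights)  # per-bit Bernoulli
    # ... sample unseen candidates from it, merge, keep the best.
    new = (rng.random((pool, n_bits)) < p).astype(int)
    X = np.unique(np.vstack([X, new]), axis=0)
    X = X[np.argsort(cost(X))[:pool]]
    print(step, cost(X).min())
```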

Learning Feynman Diagrams with Tensor Trains

We use tensor network techniques to obtain high-order perturbative diagrammatic expansions for the quantum many-body problem at very high precision. The approach is based on a tensor train…
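The core trick such approaches exploit: once an integrand is in tensor-train form, a high-dimensional integral collapses to a chain of small matrix products. A toy sketch with a rank-1 integrand whose value is known exactly:

```python
import numpy as np

def tt_integral(cores, weights):
    # Integrate each TT core over its own variable, then chain the
    # resulting small matrices with matrix products.
    v = np.ones((1,))
    for core in cores:            # core shape: (r_left, n_grid, r_right)
        v = v @ np.einsum('lgr,g->lr', core, weights)
    return v.item()

# Toy check: f(x) = x_1 * ... * x_6 on [0,1]^6 is a rank-1 TT,
# and its integral is (1/2)**6 = 0.015625.
n, g = 6, 51
grid = np.linspace(0.0, 1.0, g)
weights = np.full(g, 1.0 / (g - 1))
weights[[0, -1]] *= 0.5           # trapezoid-rule weights
cores = [grid.reshape(1, g, 1) for _ in range(n)]
print(tt_integral(cores, weights))
```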

The ITensor Software Library for Tensor Network Calculations

The philosophy behind ITensor, a system for programming tensor network calculations with an interface modeled on tensor diagrams, is discussed, along with examples of each part of the interface, including Index objects, the ITensor product operator, tensor factorizations, and tensor storage types.
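ITensor itself is written in Julia and C++; as a loose analogy only (not ITensor's API), numpy's einsum can mimic the diagrammatic style of contracting named indices and factorizing tensors:

```python
import numpy as np

# Index dimensions, loosely analogous to Index objects:
i, j, k = 4, 5, 6
A = np.random.rand(i, j)   # A carries indices (i, j)
B = np.random.rand(j, k)   # B carries indices (j, k)

# The ITensor product operator contracts all shared indices; with
# einsum the shared label 'j' plays that role:
C = np.einsum('ij,jk->ik', A, B)

# A tensor factorization (SVD) splits C into pieces joined by a new
# internal index, the basic move behind MPS algorithms:
U, S, Vt = np.linalg.svd(C, full_matrices=False)
assert np.allclose(U * S @ Vt, C)
```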