Explainable natural language processing with matrix product states

J. Tangpanitanon, Chanatip Mangkang, Pradeep Bhadola, Yuichiro Minato, Dimitris G Angelakis, Thiparat Chotibut
New Journal of Physics

Despite the empirical successes of recurrent neural networks (RNNs) in natural language processing (NLP), theoretical understanding of RNNs is still limited because of their intrinsically complex non-linear computations. We systematically analyze RNNs’ behaviors in a ubiquitous NLP task, the sentiment analysis of movie reviews, via the mapping between a class of RNNs called recurrent arithmetic circuits (RACs) and a matrix product state. Using the von Neumann entanglement entropy (EE) as a proxy for…
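As a rough illustration of the quantity named in the abstract, the following sketch computes the von Neumann entanglement entropy of a small matrix product state across a chosen cut. It is not the authors' procedure: it assumes open boundary conditions, tensors of shape (left bond, physical, right bond), and a chain short enough to contract into a full state vector by brute force. The helper names `mps_to_state`, `entanglement_entropy`, and `ghz_mps` are hypothetical and introduced only for this example.

```python
import numpy as np


def mps_to_state(tensors):
    """Contract MPS tensors of shape (D_left, d, D_right) into a full state vector.

    Assumes open boundaries: the first D_left and the last D_right equal 1.
    Brute-force contraction, so only practical for short chains.
    """
    psi = tensors[0]
    for A in tensors[1:]:
        # Sum over the shared bond index, then merge the physical indices.
        psi = np.tensordot(psi, A, axes=([2], [0]))
        psi = psi.reshape(1, -1, A.shape[2])
    return psi.reshape(-1)


def entanglement_entropy(tensors, cut):
    """Von Neumann entropy between sites [0, cut) and [cut, N), in nats."""
    psi = mps_to_state(tensors)
    psi = psi / np.linalg.norm(psi)
    dim_left = int(np.prod([A.shape[1] for A in tensors[:cut]]))
    # Singular values of the bipartitioned state are the Schmidt coefficients.
    s = np.linalg.svd(psi.reshape(dim_left, -1), compute_uv=False)
    p = s ** 2
    p = p[p > 1e-12]  # drop numerically zero Schmidt weights
    return float(-np.sum(p * np.log(p)))


def ghz_mps(n):
    """Bond-dimension-2 MPS for the unnormalized state |0...0> + |1...1>, n >= 2."""
    copy = np.zeros((2, 2, 2))
    copy[0, 0, 0] = copy[1, 1, 1] = 1.0
    first = copy.sum(axis=0, keepdims=True)  # shape (1, 2, 2)
    last = copy.sum(axis=2, keepdims=True)   # shape (2, 2, 1)
    return [first] + [copy] * (n - 2) + [last]


if __name__ == "__main__":
    product = [np.array([1.0, 0.0]).reshape(1, 2, 1)] * 4
    print(entanglement_entropy(product, cut=2))     # product state: no entanglement
    print(entanglement_entropy(ghz_mps(4), cut=2))  # GHZ state: log 2, about 0.693
```

The entropy is bounded by the logarithm of the bond dimension at the cut, which is why the bond-dimension-2 example cannot exceed log 2. For longer chains one would usually bring the MPS into canonical form and read the Schmidt values off a single bond instead of building the full state vector.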
1 Citation

Group-invariant tensor train networks for supervised learning

A new numerical algorithm is introduced for constructing a basis of tensors that are invariant under the action of normal matrix representations of an arbitrary discrete group; the algorithm can be up to several orders of magnitude faster than previous approaches.



Generating Text with Recurrent Neural Networks

The power of RNNs trained with the new Hessian-Free optimizer is demonstrated by applying them to character-level language modeling tasks, and a new RNN variant is introduced that uses multiplicative connections, which allow the current input character to determine the transition matrix from one hidden state vector to the next.

XLNet: Generalized Autoregressive Pretraining for Language Understanding

XLNet is proposed, a generalized autoregressive pretraining method that enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and overcomes the limitations of BERT thanks to its autoregressive formulation.

Benefits of Depth for Long-Term Memory of Recurrent Networks

It is established that depth brings forth an overwhelming advantage in the ability of recurrent networks to model long-term dependencies, and it is proved that deep recurrent networks support Start-End separation ranks which are exponentially higher than those supported by their shallow counterparts.

Tensor Networks for Probabilistic Sequence Modeling

A novel generative algorithm is introduced giving trained u-MPS the ability to efficiently sample from a wide variety of conditional distributions, each one defined by a regular expression, which permits the generation of richly structured text in a manner that has no direct analogue in current generative models.

Critical Behavior in Physics and Probabilistic Formal Languages

It is proved that Markov/hidden Markov processes generically exhibit exponential decay in their mutual information, which explains why natural languages are poorly approximated by Markov processes, and a broad class of models that naturally reproduce this critical behavior is presented.

Revisiting LSTM Networks for Semi-Supervised Text Classification via Mixed Objective Function

This paper develops a training strategy that allows even a simple BiLSTM model, when trained with cross-entropy loss, to achieve competitive results compared with more complex approaches, and shows the generality of the mixed objective function by improving performance on a relation extraction task.

Deep Contextualized Word Representations

A new type of deep contextualized word representation is introduced that models both complex characteristics of word use and how these uses vary across linguistic contexts, allowing downstream models to mix different types of semi-supervision signals.

On Multiplicative Integration with Recurrent Neural Networks

This work introduces a general and simple structural design called Multiplicative Integration, which changes the way in which information from different sources flows and is integrated in the computational building block of an RNN, while introducing almost no extra parameters.

Expressive power of tensor-network factorizations for probabilistic modeling, with applications from hidden Markov models to quantum machine learning

This work provides a rigorous analysis of the expressive power of various tensor-network factorizations of discrete multivariate probability distributions, and introduces locally purified states (LPS), a new factorization inspired by techniques for the simulation of quantum systems with provably better expressive power than all other representations considered.

From Probabilistic Graphical Models to Generalized Tensor Networks for Supervised Learning

This work explores the connection between tensor networks and probabilistic graphical models, and shows that it motivates the definition of generalized tensor networks, in which information from a tensor can be copied and reused in other parts of the network.