Explainable natural language processing with matrix product states
@article{Tangpanitanon2021ExplainableNL,
  title   = {Explainable natural language processing with matrix product states},
  author  = {J. Tangpanitanon and Chanatip Mangkang and Pradeep Bhadola and Yuichiro Minato and Dimitris G. Angelakis and Thiparat Chotibut},
  journal = {New Journal of Physics},
  year    = {2021},
  volume  = {24}
}
Despite empirical successes of recurrent neural networks (RNNs) in natural language processing (NLP), theoretical understanding of RNNs is still limited due to their intrinsically complex, non-linear computations. We systematically analyze RNNs' behavior in a ubiquitous NLP task, the sentiment analysis of movie reviews, via the mapping between a class of RNNs called recurrent arithmetic circuits (RACs) and a matrix product state (MPS). Using the von Neumann entanglement entropy (EE) as a proxy for…
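For context, here is a minimal NumPy sketch (not the paper's code) of the quantity the abstract refers to: the von Neumann entanglement entropy of a left/right bipartition of a state vector, computed from its Schmidt (singular) values. The function name and shapes are illustrative assumptions.

```python
# Minimal sketch (not the paper's code): von Neumann entanglement entropy
# of a bipartition of a normalized state vector, via SVD of its reshaping.
import numpy as np

def entanglement_entropy(psi, dim_left, dim_right):
    """EE of the left/right bipartition of a state vector psi."""
    assert psi.size == dim_left * dim_right
    # Reshape the state into a matrix whose rows index the left subsystem.
    theta = psi.reshape(dim_left, dim_right)
    # Schmidt coefficients are the singular values of this matrix.
    s = np.linalg.svd(theta, compute_uv=False)
    p = s**2
    p = p[p > 1e-12]          # drop numerically zero Schmidt weights
    return float(-np.sum(p * np.log(p)))

# Example: a maximally entangled two-qubit (Bell-like) state has EE = log(2).
bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
print(entanglement_entropy(bell, 2, 2))   # ~0.693
```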
One Citation
Group-invariant tensor train networks for supervised learning
- Computer Science · ArXiv
- 2022
A new numerical algorithm is introduced to construct a basis of tensors that are invariant under the action of normal matrix representations of an arbitrary discrete group, which can be up to several orders of magnitude faster than previous approaches.
References
Showing 1-10 of 50 references
Generating Text with Recurrent Neural Networks
- Computer Science · ICML
- 2011
The power of RNNs trained with the new Hessian-Free optimizer is demonstrated by applying them to character-level language modeling tasks, and a new RNN variant is introduced that uses multiplicative connections, which allow the current input character to determine the transition matrix from one hidden state vector to the next.
XLNet: Generalized Autoregressive Pretraining for Language Understanding
- Computer Science · NeurIPS
- 2019
XLNet is proposed, a generalized autoregressive pretraining method that enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and overcomes the limitations of BERT thanks to its autoregressive formulation.
Benefits of Depth for Long-Term Memory of Recurrent Networks
- Computer Science · ICLR
- 2018
It is established that depth brings forth an overwhelming advantage in the ability of recurrent networks to model long-term dependencies, and it is proved that deep recurrent networks support Start-End separation ranks that are exponentially higher than those supported by their shallow counterparts.
Tensor Networks for Probabilistic Sequence Modeling
- Computer Science · AISTATS
- 2021
A novel generative algorithm is introduced giving trained u-MPS the ability to efficiently sample from a wide variety of conditional distributions, each one defined by a regular expression, which permits the generation of richly structured text in a manner that has no direct analogue in current generative models.
Critical Behavior in Physics and Probabilistic Formal Languages
- Computer Science · Entropy
- 2017
It is proved that Markov/hidden Markov processes generically exhibit exponential decay in their mutual information, which explains why natural languages are poorly approximated by Markov processes, and a broad class of models that naturally reproduces this critical behavior is presented.
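To make the decay claim concrete, here is a small illustrative sketch (an assumption of this summary, not code from the paper) that computes the mutual information I(X_0; X_t) for a two-state Markov chain started from its stationary distribution; the value falls off roughly exponentially in t.

```python
# Illustrative sketch: mutual information I(X_0; X_t) for a two-state
# Markov chain decays exponentially in t.
import numpy as np

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])                       # transition matrix

# Stationary distribution pi solves pi P = pi (left eigenvector for eigenvalue 1).
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmin(np.abs(evals - 1))])
pi = pi / pi.sum()

def mutual_information(t):
    Pt = np.linalg.matrix_power(P, t)            # t-step transition probabilities
    joint = pi[:, None] * Pt                     # P(X_0 = i, X_t = j)
    indep = np.outer(pi, pi)                     # product of stationary marginals
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / indep[mask])))

for t in [1, 2, 5, 10, 20]:
    print(t, mutual_information(t))              # roughly exponential decay in t
```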
Revisiting LSTM Networks for Semi-Supervised Text Classification via Mixed Objective Function
- Computer Science · AAAI
- 2019
This paper develops a training strategy that allows even a simple BiLSTM model, when trained with cross-entropy loss, to achieve competitive results compared with more complex approaches, and shows the generality of the mixed objective function by improving the performance on relation extraction task.
Deep Contextualized Word Representations
- Computer Science · NAACL
- 2018
A new type of deep contextualized word representation is introduced that models both complex characteristics of word use and how these uses vary across linguistic contexts, allowing downstream models to mix different types of semi-supervision signals.
On Multiplicative Integration with Recurrent Neural Networks
- Computer Science · NIPS
- 2016
This work introduces a general and simple structural design called Multiplicative Integration, which changes the way in which information from different sources flows and is integrated in the computational building block of an RNN, while introducing almost no extra parameters.
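A minimal NumPy sketch of the kind of building block this entry describes, assuming the general Multiplicative Integration form reported by Wu et al. (2016); the parameter names and toy sizes are illustrative.

```python
# Hedged sketch of a Multiplicative Integration RNN step: the gating
# vectors alpha, beta1, beta2 and the bias b are the only extra parameters
# relative to a vanilla RNN cell.
import numpy as np

def mi_cell(x, h, W, U, alpha, beta1, beta2, b):
    """One vanilla-RNN step with Multiplicative Integration.

    Standard RNN:  tanh(W @ x + U @ h + b)
    MI-RNN:        tanh(alpha * (W @ x) * (U @ h)
                        + beta1 * (U @ h) + beta2 * (W @ x) + b)
    """
    wx, uh = W @ x, U @ h
    return np.tanh(alpha * wx * uh + beta1 * uh + beta2 * wx + b)

# Toy usage with random parameters (hidden size 4, input size 3).
rng = np.random.default_rng(0)
W, U = rng.normal(size=(4, 3)), rng.normal(size=(4, 4))
alpha = beta1 = beta2 = np.ones(4)
b, h = np.zeros(4), np.zeros(4)
x = rng.normal(size=3)
h = mi_cell(x, h, W, U, alpha, beta1, beta2, b)
```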
Expressive power of tensor-network factorizations for probabilistic modeling, with applications from hidden Markov models to quantum machine learning
- Computer Science · NeurIPS
- 2019
This work provides a rigorous analysis of the expressive power of various tensor-network factorizations of discrete multivariate probability distributions, and introduces locally purified states (LPS), a new factorization inspired by techniques for the simulation of quantum systems with provably better expressive power than all other representations considered.
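As a rough illustration of one factorization family this entry analyzes, here is a hedged NumPy sketch of a Born-machine-style model in which the (unnormalized) probability of a discrete sequence is the squared amplitude of a matrix product state contraction; the tensor shapes and brute-force normalization are assumptions for the toy example.

```python
# Minimal sketch (not the paper's code) of a Born-machine factorization:
# the probability of a sequence is the squared amplitude obtained by
# contracting a matrix product state (MPS).
import numpy as np

def mps_probability(tensors, sequence):
    """tensors[i] has shape (bond_left, vocab, bond_right); boundary bonds are 1."""
    v = np.ones((1,))                              # left boundary vector
    for A, s in zip(tensors, sequence):
        v = v @ A[:, s, :]                         # select symbol s, contract bond
    amp = v.item()                                 # right boundary bond is 1
    return amp**2                                  # unnormalized Born probability

# Toy 3-site MPS over a binary alphabet with bond dimension 2.
rng = np.random.default_rng(1)
tensors = [rng.normal(size=(1, 2, 2)),
           rng.normal(size=(2, 2, 2)),
           rng.normal(size=(2, 2, 1))]
# Normalize by summing squared amplitudes over all 2**3 sequences.
Z = sum(mps_probability(tensors, (a, b, c))
        for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(mps_probability(tensors, (0, 1, 1)) / Z)
```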
From Probabilistic Graphical Models to Generalized Tensor Networks for Supervised Learning
- Computer Science · IEEE Access
- 2020
This work explores the connection between tensor networks and probabilistic graphical models, and shows that it motivates the definition of generalized tensor networks, where information from a tensor can be copied and reused in other parts of the network.