# A Statistical Investigation of Long Memory in Language and Music

@inproceedings{GreavesTunnell2019ASI, title={A Statistical Investigation of Long Memory in Language and Music}, author={Alexander Greaves-Tunnell and Za{\"i}d Harchaoui}, booktitle={ICML}, year={2019} }

Representation and learning of long-range dependencies is a central challenge confronted in modern applications of machine learning to sequence data. Yet despite the prominence of this issue, the basic problem of measuring long-range dependence, either in a given data source or as represented in a trained deep model, remains largely limited to heuristic tools. We contribute a statistical framework for investigating long-range dependence in current applications of deep sequence modeling, drawing…

## 11 Citations

### Do RNN and LSTM have Long Memory?

- Computer Science, ICML
- 2020

This paper proves that, from a statistical perspective, RNNs and LSTMs do not have long memory. It also introduces a new definition of long memory networks, which requires the model weights to decay at a polynomial rate.
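To make the contrast between polynomial and exponential decay concrete, the following is a minimal sketch, not taken from the cited paper. It assumes the common statistical convention that a sequence of weights exhibits long memory when its absolute values are not summable. The function name `partial_sums` and the values `d = 0.3` and `rho = 0.9` are hypothetical choices for illustration only.

```python
def partial_sums(weight_fn, horizons):
    """Return sum_{k=1}^{n} |weight_fn(k)| for each n in sorted(horizons)."""
    sums = []
    total = 0.0
    k = 0
    for n in sorted(horizons):
        while k < n:
            k += 1
            total += abs(weight_fn(k))
        sums.append(total)
    return sums


d = 0.3    # hypothetical long-memory parameter, 0 < d < 1/2
rho = 0.9  # hypothetical geometric decay factor, |rho| < 1


def polynomial_weight(k):
    # Polynomial decay k^(2d - 1): the sum over k diverges.
    return k ** (2 * d - 1)


def exponential_weight(k):
    # Exponential decay rho^k: the sum over k converges to rho / (1 - rho).
    return rho ** k


horizons = [10**2, 10**3, 10**4, 10**5]
poly_sums = partial_sums(polynomial_weight, horizons)
expo_sums = partial_sums(exponential_weight, horizons)

for n, p, e in zip(horizons, poly_sums, expo_sums):
    print(f"n={n:>6}  polynomial sum={p:10.2f}  exponential sum={e:.6f}")
```

Under these assumptions, the exponential partial sums settle near `rho / (1 - rho) = 9`, while the polynomial partial sums keep growing with the horizon, which is the qualitative signature that separates short memory from long memory.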

### On the Memory Mechanism of Tensor-Power Recurrent Models

- Computer Science, AISTATS
- 2021

This work proves that a large degree p is an essential condition for achieving the long memory effect, yet a large degree also leads to unstable dynamical behavior. It then extends the degree p from a discrete to a differentiable domain, so that it is efficiently learnable from a variety of datasets.

### Probabilistic Transformer For Time Series Analysis

- Computer Science, NeurIPS
- 2021

Deep probabilistic methods are proposed that combine state-space models (SSMs) with transformer architectures, using an attention mechanism to model non-Markovian dynamics in the latent space and avoiding recurrent neural networks entirely.

### Stanza: A Nonlinear State Space Model for Probabilistic Inference in Non-Stationary Time Series

- Computer Science, ArXiv
- 2020

Stanza balances forecasting accuracy with probabilistic, interpretable inference for highly structured time series, achieving accuracy competitive with deep LSTMs on real-world datasets, especially for multi-step-ahead forecasting.

### Learning Long-Term Dependencies in Irregularly-Sampled Time Series

- Computer Science, NeurIPS
- 2020

This work designs a new algorithm based on the long short-term memory (LSTM) that separates its memory from its time-continuous state within the RNN, allowing it to respond to inputs arriving at arbitrary time-lags while ensuring a constant error propagation through the memory path.

### Understanding the Property of Long Term Memory for the LSTM with Attention Mechanism

- Computer Science, CIKM
- 2021

A theoretical analysis of an LSTM integrated with an attention mechanism shows that it can generate an adaptive decay rate that dynamically controls memory decay according to the obtained attention score, and that the attention mechanism yields significantly slower decay than the exponential decay rate of a standard LSTM.

### ARISE: ApeRIodic SEmi-parametric Process for Efficient Markets without Periodogram and Gaussianity Assumptions

- Computer Science, ArXiv
- 2021

The ApeRIodic SEmi-parametric (ARISE) process is formulated as an infinite-sum function of some known processes and employs aperiodic spectrum estimation to determine the key hyper-parameters, giving it the power and potential to model price data with long-term memory, non-stationarity, and an aperiodic spectrum.

### LOss-Based SensiTivity rEgulaRization: towards deep sparse neural networks

- Computer Science, Neural Networks
- 2022

### VRT: A Video Restoration Transformer

- Computer Science, ArXiv
- 2022

Experimental results on video super-resolution, video deblurring, video denoising, video frame interpolation, and space-time video super-resolution demonstrate that VRT outperforms the state-of-the-art methods by large margins.

### Comparison of sequence classification techniques with BERT for named entity recognition

- Computer Science
- 2019

This thesis takes its starting point from the recent advances in Natural Language Processing being developed upon the Transformer model. One of the significant developments recently was the release…

## References

Showing 1-10 of 49 references.

### Long Short-Term Memory

- Computer Science, Neural Computation
- 1997

A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.

### Learning Longer Memory in Recurrent Neural Networks

- Computer Science, ICLR
- 2015

This paper shows that learning longer term patterns in real data, such as in natural language, is perfectly possible using gradient descent, by using a slight structural modification of the simple recurrent neural network architecture.

### Long Short-Term Memory as a Dynamically Computed Element-wise Weighted Sum

- Computer Science, ACL
- 2018

This work decouples the LSTM’s gates from the embedded simple RNN, producing a new class of RNNs where the recurrence computes an element-wise weighted sum of context-independent functions of the input.

### Learning long-term dependencies with gradient descent is difficult

- Computer Science, IEEE Trans. Neural Networks
- 1994

This work shows why gradient-based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases, and exposes a trade-off between efficient learning by gradient descent and latching on information for long periods.
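The difficulty described above can be illustrated with a toy example. The following is a minimal sketch, not the cited paper's own analysis: it assumes a scalar recurrence `h_t = tanh(w * h_{t-1} + x_t)` and computes the derivative of the final state with respect to the initial state by the chain rule. The function name `gradient_through_time` and the weight `w = 0.5` are hypothetical choices for illustration.

```python
import math


def gradient_through_time(w, inputs, h0=0.0):
    """Return d h_T / d h_0 for the scalar RNN h_t = tanh(w * h_{t-1} + x_t)."""
    h = h0
    grad = 1.0
    for x in inputs:
        h = math.tanh(w * h + x)
        # Chain rule: d h_t / d h_{t-1} = w * (1 - tanh(.)^2) = w * (1 - h_t^2)
        grad *= w * (1.0 - h * h)
    return grad


w = 0.5  # hypothetical recurrent weight with |w| < 1
for steps in (5, 20, 50):
    g = gradient_through_time(w, [0.0] * steps)
    print(f"T={steps:>2}  d h_T / d h_0 = {g:.3e}")
```

Because each factor has magnitude at most `|w|`, the gradient magnitude is bounded by `|w| ** T`, so with `|w| < 1` the influence of early states shrinks geometrically with the number of steps in this toy setting.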

### Credit Assignment through Time: Alternatives to Backpropagation

- Computer Science, NIPS
- 1993

This work considers and compares alternative algorithms and architectures on tasks for which the span of the input/output dependencies can be controlled and shows performance qualitatively superior to that obtained with backpropagation.

### Invariances and Data Augmentation for Supervised Music Transcription

- Computer Science, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2018

The translation-invariant network discussed in this paper, which combines a traditional filterbank with a convolutional neural network, was the top-performing model in the 2017 MIREX Multiple Fundamental Frequency Estimation evaluation.

### Learning Features of Music from Scratch

- Computer Science, ICLR
- 2017

A multi-label classification task to predict notes in musical recordings is defined, along with an evaluation protocol, and several machine learning architectures for this task are benchmarked.

### Learning long-term dependencies in NARX recurrent neural networks

- Computer Science, IEEE Trans. Neural Networks
- 1996

It is shown that the long-term dependencies problem is lessened for a class of architectures called nonlinear autoregressive models with exogenous (NARX) recurrent neural networks, which have powerful representational capabilities.

### GloVe: Global Vectors for Word Representation

- Computer Science, EMNLP
- 2014

A new global log-bilinear regression model is proposed that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.

### Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation

- Computer Science, EMNLP
- 2014

Qualitatively, the proposed RNN Encoder‐Decoder model learns a semantically and syntactically meaningful representation of linguistic phrases.