• Corpus ID: 102350870

A Statistical Investigation of Long Memory in Language and Music

  title={A Statistical Investigation of Long Memory in Language and Music},
  author={Alexander Greaves-Tunnell and Za{\"i}d Harchaoui},
Representation and learning of long-range dependencies is a central challenge confronted in modern applications of machine learning to sequence data. Yet despite the prominence of this issue, the basic problem of measuring long-range dependence, either in a given data source or as represented in a trained deep model, remains largely limited to heuristic tools. We contribute a statistical framework for investigating long-range dependence in current applications of deep sequence modeling, drawing… 

Do RNN and LSTM have Long Memory?

It is proved that RNN and LSTM do not have long memory from a statistical perspective, and a new definition for long memory networks is further introduced, and it requires the model weights to decay at a polynomial rate.

On the Memory Mechanism of Tensor-Power Recurrent Models

This work proves that a large degree p is an essential condition to achieve the long memory effect, yet it would lead to unstable dynamical behaviors, and extends the degree p from discrete to a differentiable domain, such that it is efficiently learnable from a variety of datasets.

Probabilistic Transformer For Time Series Analysis

Deep probabilistic methods that combine state-space models (SSMs) with transformer architectures are proposed that use attention mechanism to model non-Markovian dynamics in the latent space and avoid recurrent neural networks entirely.

Stanza: A Nonlinear State Space Model for Probabilistic Inference in Non-Stationary Time Series

Stanza strikes a balance between competitive forecasting accuracy and probabilistic, interpretable inference for highly structured time series, achieving forecasting accuracy competitive with deep LSTMs on real-world datasets, especially for multi-step ahead forecasting.

Learning Long-Term Dependencies in Irregularly-Sampled Time Series

This work designs a new algorithm based on the long short-term memory (LSTM) that separates its memory from its time-continuous state within the RNN, allowing it to respond to inputs arriving at arbitrary time-lags while ensuring a constant error propagation through the memory path.

Understanding the Property of Long Term Memory for the LSTM with Attention Mechanism

A theoretical analysis of LSTM integrated with attention mechanism shows that it is capable of generating an adaptive decay rate which dynamically controls the memory decay according to the obtained attention score, and shows that attention mechanism brings significantly slower decays than the exponential decay rate of a standard L STM.

ARISE: ApeRIodic SEmi-parametric Process for Efficient Markets without Periodogram and Gaussianity Assumptions

The ApeRIodic SEmi-parametric (ARISE) process is formulated as an infinite-sum function of some known processes and employs the aperiodic spectrum estimation to determine the key hyper-parameters, thus possessing the power and potential of modeling the price data with long-term memory, non-stationarity, and a periodic spectrum.

VRT: A Video Restoration Transformer

Experimental results on video super-resolution, video deblurring, video denoising, video frame interpolation and space-time videosuper-resolution demonstrate that VRT outperforms the state-of-the-art methods by large margins.

Comparison of sequence classification techniques with BERT for named entity recognition

This thesis takes its starting point from the recent advances in Natural Language Processing being developed upon the Transformer model. One of the significant developments recently was the release



Long Short-Term Memory

A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.

Learning Longer Memory in Recurrent Neural Networks

This paper shows that learning longer term patterns in real data, such as in natural language, is perfectly possible using gradient descent, by using a slight structural modification of the simple recurrent neural network architecture.

Long Short-Term Memory as a Dynamically Computed Element-wise Weighted Sum

This work decoupling the LSTM’s gates from the embedded simple RNN, producing a new class of RNNs where the recurrence computes an element-wise weighted sum of context-independent functions of the input.

Learning long-term dependencies with gradient descent is difficult

This work shows why gradient based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases, and exposes a trade-off between efficient learning by gradient descent and latching on information for long periods.

Credit Assignment through Time: Alternatives to Backpropagation

This work considers and compares alternative algorithms and architectures on tasks for which the span of the input/output dependencies can be controlled and shows performance qualitatively superior to that obtained with backpropagation.

Invariances and Data Augmentation for Supervised Music Transcription

The translation-invariant network discussed in this paper, which combines a traditional filterbank with a convolutional neural network, was the top-performing model in the 2017 MIREX Multiple Fundamental Frequency Estimation evaluation.

Learning Features of Music from Scratch

A multi-label classification task to predict notes in musical recordings is defined, along with an evaluation protocol, and several machine learning architectures for this task are benchmarked.

Learning long-term dependencies in NARX recurrent neural networks

It is shown that the long-term dependencies problem is lessened for a class of architectures called nonlinear autoregressive models with exogenous (NARX) recurrent neural networks, which have powerful representational capabilities.

GloVe: Global Vectors for Word Representation

A new global logbilinear regression model that combines the advantages of the two major model families in the literature: global matrix factorization and local context window methods and produces a vector space with meaningful substructure.

Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation

Qualitatively, the proposed RNN Encoder‐Decoder model learns a semantically and syntactically meaningful representation of linguistic phrases.