Corpus ID: 245218798

Trading with the Momentum Transformer: An Intelligent and Interpretable Architecture

Authors: Kieran Wood, Sven Giegerich, Stephen J. Roberts, Stefan Zohren
Deep learning architectures, specifically Deep Momentum Networks (DMNs) [1904.04912], have been found to be an effective approach to momentum and mean-reversion trading. However, key challenges in recent years include learning long-term dependencies, degradation of performance when returns are considered net of transaction costs, and adapting to new market regimes, notably during the SARS-CoV-2 crisis. Attention mechanisms, or Transformer-based architectures, are a solution to such…
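A DMN maps features directly to a trading position. A minimal sketch of the usual position-sizing convention (the function name, the tanh squashing, and the volatility target are illustrative assumptions, not the paper's code):

```python
import numpy as np

def dmn_position(signal, realized_vol, vol_target=0.15):
    # Squash the raw model signal to [-1, 1], then scale so the position
    # targets a fixed annualized volatility (illustrative convention).
    return np.tanh(signal) * vol_target / (realized_vol + 1e-9)

pos = dmn_position(signal=1.0, realized_vol=0.30)
```

The volatility scaling means the network only has to learn the directional signal; risk is normalized across assets with very different volatilities.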

Dropout: a simple way to prevent neural networks from overfitting

It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
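As a concrete illustration, inverted dropout fits in a few lines of NumPy (a sketch of the technique, not the paper's implementation; the rescaling convention is the common "inverted" variant):

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    # Zero each unit with probability p during training and rescale the
    # survivors by 1/(1-p) ("inverted" dropout), so expected activations
    # match at test time, when the layer is the identity.
    if not training or p == 0.0:
        return x
    rng = rng or np.random.default_rng(0)
    mask = rng.random(x.shape) >= p  # keep each unit with probability 1 - p
    return x * mask / (1.0 - p)

y = dropout(np.ones((4, 3)), p=0.5)
```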

Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting

Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting

First, convolutional self-attention is proposed: queries and keys are produced with causal convolution so that local context is better incorporated into the attention mechanism. Second, the LogSparse Transformer is proposed, improving forecasting accuracy for time series with fine granularity and strong long-term dependencies under a constrained memory budget.
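The causal-convolution idea can be sketched on a 1-D sequence (weights and window size are illustrative; real convolutional self-attention operates on multi-channel hidden states):

```python
import numpy as np

def causal_conv_qk(x, w_q, w_k):
    # Left-pad so the output at position t sees only x[t-k+1 .. t]:
    # each query/key then summarizes local left context, not a single point.
    T, k = len(x), len(w_q)
    pad = np.concatenate([np.zeros(k - 1), x])
    Q = np.array([pad[t:t + k] @ w_q for t in range(T)])
    K = np.array([pad[t:t + k] @ w_k for t in range(T)])
    return Q, K

Q, K = causal_conv_qk(np.arange(5.0), np.ones(3), np.ones(3))
```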

Dissecting Investment Strategies in the Cross Section and Time Series

We contrast the time-series and cross-sectional performance of three popular investment strategies: carry, momentum, and value. While considerable research has examined the performance of these…

Attention is All you Need

A new simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed; it generalizes well to other tasks, applied successfully to English constituency parsing with both large and limited training data.
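The core operation, scaled dot-product attention, can be sketched as follows (single head, no masking; a minimal illustration, not the full multi-head architecture):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # softmax(Q K^T / sqrt(d_k)) V: similarity scores between queries and
    # keys become weights for an average over the values.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

out = scaled_dot_product_attention(np.eye(3), np.eye(3), np.eye(3))
```

With identity inputs, each query attends most strongly to its matching key, so the output rows remain peaked on the diagonal.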

Language Modeling with Gated Convolutional Networks

A finite-context approach through stacked convolutions is developed, which can be more efficient since it allows parallelization over sequential tokens; this is the first time a non-recurrent approach is competitive with strong recurrent models on these large-scale language tasks.
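The gating mechanism, a gated linear unit (GLU), is an elementwise product of a linear path and a sigmoid gate. A sketch with dense rather than convolutional projections, for brevity (names are illustrative):

```python
import numpy as np

def glu(x, W, b, V, c):
    # (xW + b) * sigmoid(xV + c): the sigmoid gate controls which parts of
    # the linear output flow to the next layer.
    gate = 1.0 / (1.0 + np.exp(-(x @ V + c)))
    return (x @ W + b) * gate

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4))
out = glu(x, rng.normal(size=(4, 3)), np.zeros(3),
          rng.normal(size=(4, 3)), np.zeros(3))
```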

Slow Momentum with Fast Reversion: A Trading Strategy Using Deep Learning and Changepoint Detection

A novel approach is introduced in which an online changepoint detection (CPD) module is inserted into a deep momentum network pipeline that uses a long short-term memory (LSTM) deep-learning architecture to simultaneously learn both trend estimation and position sizing.
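As a toy illustration of the online-detection idea only: a one-sided CUSUM statistic (the paper's CPD module is Gaussian-process-based, not CUSUM; this merely shows the flavor of flagging a regime shift from streaming data):

```python
import numpy as np

def cusum_flags(x, mu0=0.0, k=0.5, h=5.0):
    # Accumulate evidence of an upward mean shift away from mu0; flag a
    # changepoint once the running statistic exceeds threshold h.
    s, flags = 0.0, []
    for xi in x:
        s = max(0.0, s + xi - mu0 - k)
        flags.append(s > h)
    return np.array(flags)

flags = cusum_flags(np.concatenate([np.zeros(20), np.full(20, 2.0)]))
```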

Enhancing Time Series Momentum Strategies Using Deep Neural Networks

Backtesting on a portfolio of 88 continuous futures contracts, it is demonstrated that the Sharpe-optimised LSTM improves on traditional methods by more than a factor of two in the absence of transaction costs and continues to outperform when transaction costs of up to 2-3 basis points are considered.
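A Sharpe-optimised objective can be sketched as the negative annualized Sharpe ratio of strategy returns (a minimal sketch; the positions-times-returns convention and 252-day annualization are standard assumptions, not the paper's exact loss code):

```python
import numpy as np

def negative_sharpe(positions, returns, periods_per_year=252):
    # Per-period strategy returns, then the negated annualized Sharpe ratio,
    # so that minimizing this loss maximizes risk-adjusted performance.
    strat = positions * returns
    mu, sigma = strat.mean(), strat.std() + 1e-9
    return -np.sqrt(periods_per_year) * mu / sigma

loss = negative_sharpe(np.ones(252), np.full(252, 0.001))
```

Because the loss is differentiable in the positions, it can be backpropagated through the network end to end.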

Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting

An efficient Transformer-based model for long sequence time-series forecasting (LSTF), named Informer, whose distinctive ProbSparse self-attention mechanism achieves O(L log L) time complexity and memory usage while maintaining comparable performance on sequences' dependency alignment.

Adam: A Method for Stochastic Optimization

This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate comparable to the best known results in the online convex optimization framework.
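The update rule follows directly from that description (a NumPy sketch; the hyperparameter defaults are the commonly used ones from the paper):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # Exponential moving averages of the gradient (m) and its square (v),
    # bias-corrected because both start at zero, then a scaled step.
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Minimize f(x) = x^2 from x = 1.0.
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 501):
    theta, m, v = adam_step(theta, 2.0 * theta, m, v, t, lr=0.05)
```

The bias correction matters early on: without it, the first steps are shrunk toward zero because m and v are initialized at zero.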