Detection of false investment strategies using unsupervised learning methods

@article{LopezdePrado2021DetectionOF,
  title={Detection of false investment strategies using unsupervised learning methods},
  author={Marcos M. López de Prado and Michael J. Lewis},
  journal={Quantitative Finance},
  year={2019},
  volume={19},
  pages={1555--1565}
}
In this paper we address the problem of selection bias under multiple testing in the context of investment strategies. We introduce an unsupervised learning algorithm that determines the number of effectively uncorrelated trials carried out in the context of a discovery. This estimate is critical for computing the familywise false positive probability, and for filtering out false investment strategies. 
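To illustrate why the number of effectively uncorrelated trials matters, here is a minimal sketch of the familywise false positive probability under the simplifying assumption that the K trials are fully independent (the Šidák relation). The function names `familywise_fpr` and `per_trial_alpha` are hypothetical and are not taken from the paper, which estimates K with a clustering procedure that this sketch does not attempt to reproduce.

```python
def familywise_fpr(alpha: float, k: int) -> float:
    """Probability of at least one false positive across k independent
    trials, each tested at per-trial significance level alpha.
    Assumes independence between trials (Sidak relation)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if k < 1:
        raise ValueError("k must be a positive integer")
    return 1.0 - (1.0 - alpha) ** k


def per_trial_alpha(alpha_fw: float, k: int) -> float:
    """Per-trial significance level that yields a familywise false
    positive probability of alpha_fw across k independent trials."""
    if not 0.0 <= alpha_fw <= 1.0:
        raise ValueError("alpha_fw must lie in [0, 1]")
    if k < 1:
        raise ValueError("k must be a positive integer")
    return 1.0 - (1.0 - alpha_fw) ** (1.0 / k)


if __name__ == "__main__":
    # With 10 independent trials at the 5% level, the chance of at least
    # one false discovery is roughly 40%.
    print(familywise_fpr(0.05, 10))
    # Per-trial level needed to keep the familywise rate at 5%.
    print(per_trial_alpha(0.05, 10))
```

Because the familywise probability grows quickly with K, overstating or understating the effective number of trials changes the conclusion materially, which is why an estimate of K is needed before any strategy is filtered.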
Confidence and Power of the Sharpe Ratio under Multiple Testing
TLDR
Analytical estimates of Type I and Type II errors for the Sharpe ratios of investments are provided, along with their familywise counterparts, to help researchers carefully design experiments with high confidence and power.
How “backtest overfitting” in finance leads to false discoveries
Financial investment strategies are often designed and tested using historical market data. But this can frequently give rise to “optimal” strategies that are statistical mirages and perform poorly
A Bayesian Approach to Measurement of Backtest Overfitting
TLDR
A consistent Bayesian approach is proposed that yields the desired robust estimates on the basis of a Markov chain Monte Carlo (MCMC) simulation on a class of technical trading strategies where a seemingly profitable strategy can be selected in the naïve approach.
A Data Science Solution to the Multiple-Testing Crisis in Financial Research
TLDR
The author reduces the problem of selection bias in the context of investment strategy development to two sub-problems: determining the number of essentially independent trials and determining the variance across those trials.
On false discoveries of standard t-tests in investment management applications
TLDR
It is shown by Monte Carlo simulation that, especially in skewed and/or autocorrelated populations, test decisions based on the t-test can be severely biased.
Machine Learning for Asset Managers (Chapter 1)
Successful investment strategies are specific implementations of general theories. An investment strategy that lacks a theoretical justification is likely to be false. Hence, an asset manager should
Using Supervised Machine Learning to Detect High-Skill Mutual Fund Managers
  • Weinan Zheng, D. Wu
  • Computer Science
    2019 4th IEEE International Conference on Cybernetics (Cybconf)
  • 2019
TLDR
It is shown that a supervised machine learning algorithm can separate high- and low-skill corporate bond mutual fund managers by reading risk-related information in funds' prospectuses, and the empirical evidence is consistent with the model predictions.
Being Honest in Backtest Reporting: A Template for Disclosing Multiple Tests
Selection bias under multiple testing is a serious problem. From a practitioner’s perspective, failure to disclose the impact of multiple tests of a proposed investment strategy to clients and senior
A Robust Estimator of the Efficient Frontier
TLDR
The nested clustered optimization (NCO) algorithm is introduced, a method that tackles both sources of instability in convex optimization, along with a Monte Carlo approach that estimates the allocation error produced by various optimization methods on a particular set of input variables.
Trends and Applications of Machine Learning in Quantitative Finance
Recent advances in machine learning are finding commercial applications across many industries, not least the finance industry. This paper focuses on applications in one of the core functions of
