# Orthogonal Machine Learning: Power and Limitations

@inproceedings{Mackey2018OrthogonalML, title={Orthogonal Machine Learning: Power and Limitations}, author={Lester W. Mackey and Vasilis Syrgkanis and Ilias Zadik}, booktitle={ICML}, year={2018} }

Double machine learning provides $\sqrt{n}$-consistent estimates of parameters of interest even when high-dimensional or nonparametric nuisance parameters are estimated at an $n^{-1/4}$ rate. The key is to employ Neyman-orthogonal moment equations which are first-order insensitive to perturbations in the nuisance parameters. We show that the $n^{-1/4}$ requirement can be improved to $n^{-1/(2k+2)}$ by employing a $k$-th order notion of orthogonality that grants robustness to more complex or…
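The arithmetic behind the two rate thresholds can be sketched with a heuristic Taylor expansion. This is an illustration rather than the paper's formal statement: the symbols $m$ (moment function), $Z$ (data), $\theta_0$ (target parameter), $g_0$ (true nuisance), and $\hat g$ (nuisance estimate, treated as fixed, e.g. fit on a separate sample) are notation assumed here.

$$
\mathbb{E}\big[m(Z,\theta_0,\hat g)\big]
= \underbrace{\mathbb{E}\big[m(Z,\theta_0,g_0)\big]}_{=\,0}
+ \underbrace{\sum_{j=1}^{k}\frac{1}{j!}\,D_g^{j}\,\mathbb{E}\big[m(Z,\theta_0,g_0)\big][\hat g-g_0]^{j}}_{=\,0\ \text{under } k\text{-th order orthogonality}}
+ O\big(\|\hat g-g_0\|^{k+1}\big)
$$

$$
\|\hat g-g_0\|^{k+1}=o\big(n^{-1/2}\big)
\iff
\|\hat g-g_0\|=o\big(n^{-1/(2k+2)}\big)
$$

Setting $k=1$ recovers the familiar $n^{-1/4}$ requirement, while $k=2$ would relax it to $n^{-1/6}$.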

## 26 Citations

### Minimax semiparametric learning with approximate sparsity

- Mathematics, Computer Science
- 2019

This paper gives automatic debiased machine learners that are $1/\sqrt{n}$-consistent and asymptotically efficient under minimal conditions and gives lower bounds on the convergence rate of estimators of such objects.

### Orthogonal Statistical Learning

- Computer Science
- ArXiv
- 2019

By focusing on excess risk rather than parameter estimation, this work can give guarantees under weaker assumptions than in previous works and accommodate the case where the target parameter belongs to a complex nonparametric class.

### Coordinated Double Machine Learning

- Computer Science
- ICML
- 2022

This paper argues that a carefully coordinated learning algorithm for deep neural networks may reduce the estimation bias, and demonstrates the improved empirical performance of the proposed method through numerical experiments on both simulated and real data.

### Debiasing Linear Prediction

- Computer Science
- ArXiv
- 2019

This work shows how debiasing techniques can be used transductively to combat regularization bias in linear prediction, and provides non-asymptotic upper bounds on the prediction error of two transductive, debiased prediction rules.

### Mostly Harmless Machine Learning: Learning Optimal Instruments in Linear IV Models

- Economics
- ArXiv
- 2020

This work provides some simple theoretical results that justify incorporating machine learning in a standard linear instrumental variable setting, prevalent in empirical research in economics, and provides a simple, user-friendly upgrade to the applied economics toolbox.

### Higher-Order Orthogonal Causal Learning for Treatment Effect

- Computer Science
- ArXiv
- 2021

This paper constructs the k-order orthogonal score function for estimating the average treatment effect (ATE) and presents an algorithm that enables us to obtain the debiased estimator recovered from the score function.

### Single Point Transductive Prediction

- Computer Science
- ICML
- 2020

This work shows how techniques from semi-parametric inference can be used transductively to combat regularization bias in linear prediction, and provides non-asymptotic upper bounds on the prediction error of two transductive prediction rules.

### Partial Identification with Noisy Covariates: A Robust Optimization Approach

- Computer Science
- CLeaR
- 2022

This work formulates the identification of the average treatment effect (ATE) as a robust optimization problem, leading to an efficient algorithm that bounds the ATE with noisy covariates, and shows that this approach can extend a wide range of causal adjustment methods to perform partial identification.

### Estimating Identifiable Causal Effects through Double Machine Learning

- Computer Science, Mathematics
- AAAI
- 2021

This paper introduces a complete identification algorithm that returns an influence function (IF) for any identifiable causal functional and shows that DML-ID estimators have the key properties of debiasedness and double robustness.

### Robust Causal Inference Under Covariate Shift via Worst-Case Subpopulation Treatment Effects

- Mathematics
- COLT
- 2020

A semiparametrically efficient estimator is developed for the worst-case treatment effect, leveraging machine learning-based estimates of the heterogeneous treatment effect and propensity score, and it is proved that the estimator achieves the optimal asymptotic variance.

## References

### Debiasing the lasso: Optimal sample size for Gaussian designs

- Computer Science, Mathematics
- The Annals of Statistics
- 2018

It is proved that the debiased estimator is asymptotically Gaussian under the nearly optimal condition $s_0 = o(n/ (\log p)^2)$, and a new estimator is presented that is minimax optimal up to a factor $1+o_n(1)$ for i.i.d. Gaussian designs.

### Double machine learning for treatment and causal parameters

- Computer Science, Mathematics
- 2016

The resulting method could be called a "double ML" method because it relies on estimating primary and auxiliary predictive models; it achieves the fastest rates of convergence and exhibits robust good behavior with respect to a broader class of probability distributions than naive "single" ML estimators.

### Statistical Learning with Sparsity: The Lasso and Generalizations

- Computer Science
- 2015

Statistical Learning with Sparsity: The Lasso and Generalizations presents methods that exploit sparsity to help recover the underlying signal in a set of data and extract useful and reproducible patterns from big datasets.

### Program evaluation and causal inference with high-dimensional data

- Mathematics, Economics
- 2013

This paper shows that a key ingredient enabling honest inference is the use of orthogonal or doubly robust moment conditions in estimating certain reduced form functional parameters, and provides results on honest inference for (function-valued) parameters within this general framework where any high-quality, modern machine learning methods can be used to learn the nonparametric/high-dimensional components of the model.

### On asymptotically optimal confidence regions and tests for high-dimensional models

- Computer Science, Mathematics
- 2014

This paper proposes a general method for constructing confidence intervals and statistical tests for single or low-dimensional components of a large parameter vector in a high-dimensional model, and develops the corresponding theory, which includes a careful analysis for Gaussian, sub-Gaussian, and bounded correlated designs.

### Exact Post-Selection Inference for Sequential Regression Procedures

- Mathematics
- 2014

We propose new inference tools for forward stepwise regression, least angle regression, and the lasso. Assuming a Gaussian model for the observation vector y, we first describe a general…

### Valid post-selection inference

- Mathematics
- 2013

It is common practice in statistical data analysis to perform data-driven variable selection and derive statistical inference from the resulting model. Such inference enjoys none of the guarantees…

### Double/Debiased/Neyman Machine Learning of Treatment Effects

- Mathematics
- 2017

The application of a generic double/de-biased machine learning approach for obtaining valid inferential statements about focal parameters, using Neyman-orthogonal scores and cross-fitting, in settings where nuisance parameters are estimated using ML methods is illustrated.

### Probability: Theory and Examples

- Mathematics
- 1990

This book is an introduction to probability theory covering laws of large numbers, central limit theorems, random walks, martingales, Markov chains, ergodic theorems, and Brownian motion. It is a…