• Corpus ID: 10674153

# Orthogonal Machine Learning: Power and Limitations

@inproceedings{Mackey2018OrthogonalML,
title={Orthogonal Machine Learning: Power and Limitations},
author={Lester W. Mackey and Vasilis Syrgkanis and Ilias Zadik},
booktitle={ICML},
year={2018}
}
• Published in ICML 1 November 2017
• Mathematics, Computer Science
Double machine learning provides $\sqrt{n}$-consistent estimates of parameters of interest even when high-dimensional or nonparametric nuisance parameters are estimated at an $n^{-1/4}$ rate. The key is to employ Neyman-orthogonal moment equations which are first-order insensitive to perturbations in the nuisance parameters. We show that the $n^{-1/4}$ requirement can be improved to $n^{-1/(2k+2)}$ by employing a $k$-th order notion of orthogonality that grants robustness to more complex or…
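As a rough illustration of the rates quoted above (a heuristic sketch, not the paper's formal statement), write the moment function as $m(Z,\theta,h)$ with target $\theta_0$ and true nuisance $h_0$; these symbols are assumptions introduced here. First-order (Neyman) orthogonality asks that the moment be locally flat in the nuisance direction:

$$
\frac{\partial}{\partial r}\,\mathbb{E}\big[m(Z,\theta_0,h_0 + r(h-h_0))\big]\Big|_{r=0} = 0 .
$$

If a $k$-th order analogue leaves a nuisance-induced bias of order $\|\hat h - h_0\|^{k+1}$, then requiring that bias to be negligible at the $\sqrt{n}$ scale gives

$$
\|\hat h - h_0\|^{k+1} = o\big(n^{-1/2}\big)
\quad\Longleftrightarrow\quad
\|\hat h - h_0\| = o\big(n^{-1/(2k+2)}\big),
$$

so $k=1$ recovers the $n^{-1/4}$ requirement, $k=2$ relaxes it to $n^{-1/6}$, and $k=3$ to $n^{-1/8}$.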
25 Citations

Non-Parametric Inference Adaptive to Intrinsic Dimension
• Mathematics, Computer Science
CLeaR
• 2022
We consider non-parametric estimation and inference of conditional moment models in high dimensions. We show that even when the dimension $D$ of the conditioning variable is larger than the sample
Minimax semiparametric learning with approximate sparsity
• Mathematics, Computer Science
• 2019
This paper gives automatic debiased machine learners that are $1/\sqrt{n}$-consistent and asymptotically efficient under minimal conditions and gives lower bounds on the convergence rate of estimators of such objects.
Orthogonal Statistical Learning
• Computer Science
ArXiv
• 2019
By focusing on excess risk rather than parameter estimation, this work can give guarantees under weaker assumptions than in previous works and accommodate the case where the target parameter belongs to a complex nonparametric class.
Debiasing Linear Prediction
• Computer Science
ArXiv
• 2019
This work shows how debiasing techniques can be used transductively to combat regularization bias in linear prediction, and provides non-asymptotic upper bounds on the prediction error of two transductive, debiased prediction rules.
Mostly Harmless Machine Learning: Learning Optimal Instruments in Linear IV Models
• Economics
ArXiv
• 2020
This work provides some simple theoretical results that justify incorporating machine learning in a standard linear instrumental variable setting, prevalent in empirical research in economics, and provides a simple, user-friendly upgrade to the applied economics toolbox.
Estimating Identifiable Causal Effects through Double Machine Learning
• Computer Science, Mathematics
AAAI
• 2021
A new, general class of estimators for any identifiable causal functionals that exhibit DML properties, which is named DML-ID and shown to hold the key properties of debiasedness and double robustness.
Higher-Order Orthogonal Causal Learning for Treatment Effect
• Computer Science
ArXiv
• 2021
This paper constructs the k-order orthogonal score function for estimating the average treatment effect (ATE) and presents an algorithm that enables us to obtain the debiased estimator recovered from the score function.
Single Point Transductive Prediction
• Computer Science
ICML
• 2020
This work shows how techniques from semi-parametric inference can be used transductively to combat regularization bias in linear prediction, and provides non-asymptotic upper bounds on the prediction error of two transductive prediction rules.
Partial Identification with Noisy Covariates: A Robust Optimization Approach
• Computer Science
CLeaR
• 2022
This work formulates the identification of the average treatment effect (ATE) as a robust optimization problem, leading to an efficient algorithm that bounds the ATE with noisy covariates, and shows that this approach can extend a wide range of causal adjustment methods to perform partial identification.
Robust Causal Inference Under Covariate Shift via Worst-Case Subpopulation Treatment Effects
• Mathematics
COLT
• 2020
A semiparametrically efficient estimator is developed for the worst-case treatment effect, leveraging machine learning-based estimates of the heterogeneous treatment effect and propensity score, and it is proved that the estimator achieves the optimal asymptotic variance.

## References

Showing 1-10 of 18 references

Debiasing the lasso: Optimal sample size for Gaussian designs
• Computer Science, Mathematics
The Annals of Statistics
• 2018
It is proved that the debiased estimator is asymptotically Gaussian under the nearly optimal condition $s_0 = o(n/(\log p)^2)$, and a new estimator is proposed that is minimax optimal up to a factor $1+o_n(1)$ for i.i.d. Gaussian designs.
Double machine learning for treatment and causal parameters
• Computer Science, Mathematics
• 2016
The resulting method could be called a "double ML" method because it relies on estimating primary and auxiliary predictive models; it achieves the fastest rates of convergence and exhibits robust good behavior with respect to a broader class of probability distributions than naive "single" ML estimators.
Confidence intervals for low dimensional parameters in high dimensional linear models
• Mathematics, Computer Science
• 2011
The method proposed turns the regression data into an approximate Gaussian sequence of point estimators of individual regression coefficients, which can be used to select variables after proper thresholding, and demonstrates the accuracy of the coverage probability and other desirable properties of the confidence intervals proposed.
Statistical Learning with Sparsity: The Lasso and Generalizations
• Computer Science
• 2015
Statistical Learning with Sparsity: The Lasso and Generalizations presents methods that exploit sparsity to help recover the underlying signal in a set of data and extract useful and reproducible patterns from big datasets.
Program evaluation and causal inference with high-dimensional data
• Mathematics, Economics
• 2013
This paper shows that a key ingredient enabling honest inference is the use of orthogonal or doubly robust moment conditions in estimating certain reduced form functional parameters, and provides results on honest inference for (function-valued) parameters within this general framework where any high-quality, modern machine learning methods can be used to learn the nonparametric/high-dimensional components of the model.
On asymptotically optimal confidence regions and tests for high-dimensional models
• Computer Science, Mathematics
• 2014
A general method is proposed for constructing confidence intervals and statistical tests for single or low-dimensional components of a large parameter vector in a high-dimensional model, and the corresponding theory is developed, including a careful analysis for Gaussian, sub-Gaussian and bounded correlated designs.
Exact Post-Selection Inference for Sequential Regression Procedures
• Mathematics
• 2014
We propose new inference tools for forward stepwise regression, least angle regression, and the lasso. Assuming a Gaussian model for the observation vector y, we first describe a general
Valid post-selection inference
• Mathematics
• 2013
It is common practice in statistical data analysis to perform data-driven variable selection and derive statistical inference from the resulting model. Such inference enjoys none of the guarantees
Double/Debiased/Neyman Machine Learning of Treatment Effects
• Mathematics
• 2017
The application of a generic double/de-biased machine learning approach for obtaining valid inferential statements about focal parameters, using Neyman-orthogonal scores and cross-fitting, in settings where nuisance parameters are estimated using ML methods is illustrated.
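
The combination of a Neyman-orthogonal score with cross-fitting mentioned in the entry above can be sketched for a partially linear model $Y = \theta T + g(X) + \varepsilon$. This is a minimal illustration under stated assumptions, not the method of any listed paper: the function names `cross_fit_plr` and `simulate`, the simulated data, and the use of ordinary least squares as the nuisance learner are all hypothetical choices made here, and any ML regressor could stand in for the least-squares fits.

```python
import numpy as np

def cross_fit_plr(y, t, x, n_folds=2, seed=0):
    """Cross-fitted residual-on-residual estimate of theta in y = theta*t + g(x) + eps."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    x1 = np.column_stack([np.ones(n), x])  # design matrix with intercept
    res_y = np.empty(n)
    res_t = np.empty(n)
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)
        # Nuisance fits E[y|x] and E[t|x] use only the other folds.
        # OLS is an assumed stand-in for an arbitrary ML learner.
        beta_y, *_ = np.linalg.lstsq(x1[train], y[train], rcond=None)
        beta_t, *_ = np.linalg.lstsq(x1[train], t[train], rcond=None)
        # Residuals are computed out of sample, on the held-out fold.
        res_y[held_out] = y[held_out] - x1[held_out] @ beta_y
        res_t[held_out] = t[held_out] - x1[held_out] @ beta_t
    # Solve the orthogonal moment: mean(res_t * (res_y - theta * res_t)) = 0.
    theta = (res_t @ res_y) / (res_t @ res_t)
    # Plug-in standard error based on the same score.
    psi = res_t * (res_y - theta * res_t)
    se = np.sqrt(np.mean(psi ** 2) / n) / np.mean(res_t ** 2)
    return theta, se

def simulate(n=2000, d=5, theta=1.5, seed=0):
    """Hypothetical data: linear confounding in both treatment and outcome."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, d))
    t = x @ np.linspace(0.5, -0.5, d) + rng.normal(size=n)
    y = theta * t + x @ np.ones(d) + rng.normal(size=n)
    return y, t, x

y, t, x = simulate()
theta_hat, se = cross_fit_plr(y, t, x)
print(f"theta_hat = {theta_hat:.3f}, se = {se:.3f}")
```

Residualizing both the outcome and the treatment is what makes the score first-order insensitive to errors in either nuisance fit; fitting the nuisances on folds that exclude the evaluation points is the cross-fitting step.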