• Corpus ID: 10674153

Orthogonal Machine Learning: Power and Limitations

Lester W. Mackey, Vasilis Syrgkanis, Ilias Zadik
Double machine learning provides $\sqrt{n}$-consistent estimates of parameters of interest even when high-dimensional or nonparametric nuisance parameters are estimated at an $n^{-1/4}$ rate. The key is to employ Neyman-orthogonal moment equations which are first-order insensitive to perturbations in the nuisance parameters. We show that the $n^{-1/4}$ requirement can be improved to $n^{-1/(2k+2)}$ by employing a $k$-th order notion of orthogonality that grants robustness to more complex or… 
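
As a rough illustration of the first-order (Neyman-orthogonal) case described above, the sketch below estimates a scalar effect by cross-fitted residual-on-residual regression. The partially linear model, the linear nuisance fits, the simulated data, and the names `dml_partially_linear` and `simulate` are all assumptions made for this example; the truncated abstract does not specify them, and this is not the higher-order construction the paper proposes.

```python
import numpy as np

def dml_partially_linear(y, t, x, n_folds=2, seed=0):
    """Cross-fitted estimate of theta in y = theta * t + f(x) + eps.

    Uses the orthogonal moment (y - q(x) - theta * (t - g(x))) * (t - g(x)),
    where q(x) = E[y | x] and g(x) = E[t | x] are nuisance functions.
    """
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    design = np.column_stack([np.ones(n), x])  # linear nuisance model (assumption)
    y_res = np.empty(n)
    t_res = np.empty(n)
    for i in range(n_folds):
        held_out = folds[i]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        # Fit both nuisances on the training folds only (cross-fitting).
        beta_y, *_ = np.linalg.lstsq(design[train], y[train], rcond=None)
        beta_t, *_ = np.linalg.lstsq(design[train], t[train], rcond=None)
        # Residualize the held-out fold with the out-of-fold fits.
        y_res[held_out] = y[held_out] - design[held_out] @ beta_y
        t_res[held_out] = t[held_out] - design[held_out] @ beta_t
    # Solving the empirical orthogonal moment for theta gives this ratio.
    return float(t_res @ y_res / (t_res @ t_res))

def simulate(n=4000, theta=1.5, seed=1):
    """Hypothetical data: treatment and outcome both depend linearly on x."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, 3))
    t = x @ np.array([1.0, -0.5, 0.25]) + rng.normal(size=n)
    y = theta * t + x @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=n)
    return y, t, x

y, t, x = simulate()
theta_hat = dml_partially_linear(y, t, x)
print(f"estimated theta: {theta_hat:.3f} (simulated truth: 1.5)")
```

Because the moment is insensitive to small errors in both nuisance fits, errors in estimating `q` and `g` enter the estimate of `theta` only at second order, which is why slower nuisance rates can still be tolerated.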


Non-Parametric Inference Adaptive to Intrinsic Dimension
We consider non-parametric estimation and inference of conditional moment models in high dimensions. We show that even when the dimension $D$ of the conditioning variable is larger than the sample
Minimax semiparametric learning with approximate sparsity
This paper gives automatic debiased machine learners that are $1/\sqrt{n}$ consistent and asymptotically efficient under minimal conditions, and gives lower bounds on the convergence rate of estimators of such objects.
Orthogonal Statistical Learning
By focusing on excess risk rather than parameter estimation, this work can give guarantees under weaker assumptions than in previous works and accommodate the case where the target parameter belongs to a complex nonparametric class.
Debiasing Linear Prediction
This work shows how debiasing techniques can be used transductively to combat regularization bias in linear prediction, and provides non-asymptotic upper bounds on the prediction error of two transductive, debiased prediction rules.
Mostly Harmless Machine Learning: Learning Optimal Instruments in Linear IV Models
This work provides some simple theoretical results that justify incorporating machine learning in a standard linear instrumental variable setting, prevalent in empirical research in economics, and provides a simple, user-friendly upgrade to the applied economics toolbox.
Estimating Identifiable Causal Effects through Double Machine Learning
This work introduces a new, general class of estimators, named DML-ID, for any identifiable causal functional; the estimators exhibit DML properties and are shown to be debiased and doubly robust.
Higher-Order Orthogonal Causal Learning for Treatment Effect
This paper constructs the k-order orthogonal score function for estimating the average treatment effect (ATE) and presents an algorithm that enables us to obtain the debiased estimator recovered from the score function.
Single Point Transductive Prediction
This work shows how techniques from semi-parametric inference can be used transductively to combat regularization bias in linear prediction, and provides non-asymptotic upper bounds on the prediction error of two transductive prediction rules.
Partial Identification with Noisy Covariates: A Robust Optimization Approach
This work formulates the identification of the average treatment effect (ATE) as a robust optimization problem, which leads to an efficient algorithm that bounds the ATE with noisy covariates, and shows that this approach can extend a wide range of causal adjustment methods to perform partial identification.
Robust Causal Inference Under Covariate Shift via Worst-Case Subpopulation Treatment Effects
A semiparametrically efficient estimator is developed for the worst-case treatment effect, leveraging machine learning-based estimates of the heterogeneous treatment effect and propensity score, and it is proved that the estimator achieves the optimal asymptotic variance.


Debiasing the lasso: Optimal sample size for Gaussian designs
It is proved that the debiased estimator is asymptotically Gaussian under the nearly optimal condition $s_0 = o(n/ (\log p)^2)$, and a new estimator is introduced that is minimax optimal up to a factor $1+o_n(1)$ for i.i.d. Gaussian designs.
Double machine learning for treatment and causal parameters
The resulting method could be called a "double ML" method because it relies on estimating primary and auxiliary predictive models; it achieves the fastest rates of convergence and exhibits robust good behavior with respect to a broader class of probability distributions than naive "single" ML estimators.
Confidence intervals for low dimensional parameters in high dimensional linear models
The method proposed turns the regression data into an approximate Gaussian sequence of point estimators of individual regression coefficients, which can be used to select variables after proper thresholding, and demonstrates the accuracy of the coverage probability and other desirable properties of the confidence intervals proposed.
Statistical Learning with Sparsity: The Lasso and Generalizations
Statistical Learning with Sparsity: The Lasso and Generalizations presents methods that exploit sparsity to help recover the underlying signal in a set of data and extract useful and reproducible patterns from big datasets.
Program evaluation and causal inference with high-dimensional data
This paper shows that a key ingredient enabling honest inference is the use of orthogonal or doubly robust moment conditions in estimating certain reduced form functional parameters, and provides results on honest inference for (function-valued) parameters within this general framework where any high-quality, modern machine learning methods can be used to learn the nonparametric/high-dimensional components of the model.
On asymptotically optimal confidence regions and tests for high-dimensional models
A general method for constructing confidence intervals and statistical tests for single or low-dimensional components of a large parameter vector in a high-dimensional model and develops the corresponding theory which includes a careful analysis for Gaussian, sub-Gaussian and bounded correlated designs.
Exact Post-Selection Inference for Sequential Regression Procedures
We propose new inference tools for forward stepwise regression, least angle regression, and the lasso. Assuming a Gaussian model for the observation vector y, we first describe a general
Valid post-selection inference
It is common practice in statistical data analysis to perform data-driven variable selection and derive statistical inference from the resulting model. Such inference enjoys none of the guarantees
Double/Debiased/Neyman Machine Learning of Treatment Effects
The application of a generic double/de-biased machine learning approach for obtaining valid inferential statements about focal parameters, using Neyman-orthogonal scores and cross-fitting, in settings where nuisance parameters are estimated using ML methods is illustrated.