Corpus ID: 10674153

Orthogonal Machine Learning: Power and Limitations

@inproceedings{Mackey2018OrthogonalML,
  title={Orthogonal Machine Learning: Power and Limitations},
  author={Lester W. Mackey and Vasilis Syrgkanis and Ilias Zadik},
  booktitle={ICML},
  year={2018}
}
Double machine learning provides $\sqrt{n}$-consistent estimates of parameters of interest even when high-dimensional or nonparametric nuisance parameters are estimated at an $n^{-1/4}$ rate. The key is to employ Neyman-orthogonal moment equations which are first-order insensitive to perturbations in the nuisance parameters. We show that the $n^{-1/4}$ requirement can be improved to $n^{-1/(2k+2)}$ by employing a $k$-th order notion of orthogonality that grants robustness to more complex or… 
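The rate arithmetic and the orthogonal-moment idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a partially linear model $Y = \theta T + g(X) + \varepsilon$, $T = m(X) + \eta$, with linear nuisance functions fit by least squares and two-fold cross-fitting. The function names `required_nuisance_exponent` and `cross_fit_plr`, and the simulated data, are hypothetical choices made for illustration only.

```python
import numpy as np


def required_nuisance_exponent(k: int) -> float:
    """Exponent a in the n^{-a} nuisance rate quoted in the abstract
    for k-th order orthogonality: a = 1 / (2k + 2)."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return 1.0 / (2 * k + 2)


def cross_fit_plr(y, t, x, n_folds=2, seed=0):
    """Cross-fitted residual-on-residual estimate of theta in the
    (assumed) partially linear model, using the first-order
    Neyman-orthogonal score (y - l(x) - theta * (t - m(x))) * (t - m(x)).
    Nuisances l(x) = E[y | x] and m(x) = E[t | x] are fit by OLS here,
    which is an illustrative assumption, not a requirement."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    design = np.column_stack([np.ones(n), x])
    res_y = np.empty(n)
    res_t = np.empty(n)
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)
        # Fit both nuisance regressions on the training folds only.
        beta_y, *_ = np.linalg.lstsq(design[train], y[train], rcond=None)
        beta_t, *_ = np.linalg.lstsq(design[train], t[train], rcond=None)
        # Residualize the held-out fold with out-of-fold predictions.
        res_y[held_out] = y[held_out] - design[held_out] @ beta_y
        res_t[held_out] = t[held_out] - design[held_out] @ beta_t
    # Solve the empirical orthogonal moment equation for theta.
    return float(res_t @ res_y / (res_t @ res_t))


# Simulated data (hypothetical): true theta is 1.5.
rng = np.random.default_rng(123)
n, d, theta_true = 4000, 5, 1.5
x = rng.normal(size=(n, d))
t = x @ np.linspace(0.5, 1.0, d) + rng.normal(size=n)
y = theta_true * t + x @ np.linspace(-1.0, 1.0, d) + rng.normal(size=n)

theta_hat = cross_fit_plr(y, t, x)
print("first-order (k=1) exponent:", required_nuisance_exponent(1))
print("second-order (k=2) exponent:", required_nuisance_exponent(2))
print("estimate of theta:", theta_hat)
```

With $k = 1$ the exponent reduces to the familiar $1/4$; larger $k$ gives a smaller exponent, meaning slower nuisance estimation is tolerated. The estimator shown is only first-order orthogonal; the higher-order constructions studied in the paper are not reproduced here.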

Minimax semiparametric learning with approximate sparsity

TLDR
This paper gives automatic debiased machine learners that are $1/\sqrt{n}$-consistent and asymptotically efficient under minimal conditions, and gives lower bounds on the convergence rate of estimators of such objects.

Orthogonal Statistical Learning

TLDR
By focusing on excess risk rather than parameter estimation, this work can give guarantees under weaker assumptions than in previous works and accommodate the case where the target parameter belongs to a complex nonparametric class.

Coordinated Double Machine Learning

TLDR
This paper argues that a carefully coordinated learning algorithm for deep neural networks may reduce the estimation bias, and demonstrates the improved empirical performance of the proposed method through numerical experiments on both simulated and real data.

Debiasing Linear Prediction

TLDR
This work shows how debiasing techniques can be used transductively to combat regularization bias in linear prediction, and provides non-asymptotic upper bounds on the prediction error of two transductive, debiased prediction rules.

Mostly Harmless Machine Learning: Learning Optimal Instruments in Linear IV Models

TLDR
This work provides some simple theoretical results that justify incorporating machine learning in a standard linear instrumental variable setting, prevalent in empirical research in economics, and provides a simple, user-friendly upgrade to the applied economics toolbox.

Higher-Order Orthogonal Causal Learning for Treatment Effect

TLDR
This paper constructs the k-order orthogonal score function for estimating the average treatment effect (ATE) and presents an algorithm that enables us to obtain the debiased estimator recovered from the score function.

Single Point Transductive Prediction

TLDR
This work shows how techniques from semi-parametric inference can be used transductively to combat regularization bias in linear prediction, and provides non-asymptotic upper bounds on the prediction error of two transductive prediction rules.

Partial Identification with Noisy Covariates: A Robust Optimization Approach

TLDR
This work formulates the identification of the average treatment effect (ATE) as a robust optimization problem, which leads to an efficient algorithm that bounds the ATE with noisy covariates, and shows that this robust optimization approach can extend a wide range of causal adjustment methods to perform partial identification.

Estimating Identifiable Causal Effects through Double Machine Learning

TLDR
This paper introduces a complete identification algorithm that returns an influence function (IF) for any identifiable causal functional and shows that DML-ID estimators hold the key properties of debiasedness and double robustness.

Robust Causal Inference Under Covariate Shift via Worst-Case Subpopulation Treatment Effects

TLDR
A semiparametrically efficient estimator is developed for the worst-case treatment effect, leveraging machine learning-based estimates of the heterogeneous treatment effect and propensity score, and it is proved that the estimator achieves the optimal asymptotic variance.

References

Debiasing the lasso: Optimal sample size for Gaussian designs

TLDR
It is proved that the debiased estimator is asymptotically Gaussian under the nearly optimal condition $s_0 = o(n/(\log p)^2)$, and a new estimator is developed that is minimax optimal up to a factor $1+o_n(1)$ for i.i.d. Gaussian designs.

Double machine learning for treatment and causal parameters

TLDR
The resulting method could be called a "double ML" method because it relies on estimating primary and auxiliary predictive models; it achieves the fastest rates of convergence and exhibits robust good behavior with respect to a broader class of probability distributions than naive "single" ML estimators.

Statistical Learning with Sparsity: The Lasso and Generalizations

TLDR
Statistical Learning with Sparsity: The Lasso and Generalizations presents methods that exploit sparsity to help recover the underlying signal in a set of data and extract useful and reproducible patterns from big datasets.

Program evaluation and causal inference with high-dimensional data

TLDR
This paper shows that a key ingredient enabling honest inference is the use of orthogonal or doubly robust moment conditions in estimating certain reduced form functional parameters, and provides results on honest inference for (function-valued) parameters within this general framework where any high-quality, modern machine learning methods can be used to learn the nonparametric/high-dimensional components of the model.

On asymptotically optimal confidence regions and tests for high-dimensional models

TLDR
This work proposes a general method for constructing confidence intervals and statistical tests for single or low-dimensional components of a large parameter vector in a high-dimensional model, and develops the corresponding theory, which includes a careful analysis for Gaussian, sub-Gaussian and bounded correlated designs.

Exact Post-Selection Inference for Sequential Regression Procedures

ABSTRACT We propose new inference tools for forward stepwise regression, least angle regression, and the lasso. Assuming a Gaussian model for the observation vector y, we first describe a general

Valid post-selection inference

It is common practice in statistical data analysis to perform data-driven variable selection and derive statistical inference from the resulting model. Such inference enjoys none of the guarantees

Double/Debiased/Neyman Machine Learning of Treatment Effects

TLDR
The application of a generic double/de-biased machine learning approach for obtaining valid inferential statements about focal parameters, using Neyman-orthogonal scores and cross-fitting, in settings where nuisance parameters are estimated using ML methods is illustrated.

Probability: Theory and Examples

This book is an introduction to probability theory covering laws of large numbers, central limit theorems, random walks, martingales, Markov chains, ergodic theorems, and Brownian motion. It is a