Deep Neural Networks for Estimation and Inference

  M. Farrell, Tengyuan Liang, S. Misra
We study deep neural networks and their use in semiparametric inference. We establish novel nonasymptotic high probability bounds for deep feedforward neural nets. These deliver rates of convergence that are sufficiently fast (in some cases minimax optimal) to allow us to establish valid second-step inference after first-step estimation with deep learning, a result also new to the literature. Our nonasymptotic high probability bounds, and the subsequent semiparametric inference, treat the…


Optimal Nonparametric Inference via Deep Neural Network

Consistent Feature Selection for Analytic Deep Neural Networks

It is proved that, for a wide class of networks, including deep feed-forward neural networks, convolutional neural networks, and a major sub-class of residual neural networks, the Adaptive Group Lasso selection procedure with the group Lasso as the base estimator is selection-consistent.

Deep Nonparametric Regression on Approximately Low-dimensional Manifolds

This paper derives non-asymptotic upper bounds for the prediction error of the empirical risk minimizer for feedforward deep neural regression and proposes a notion of network relative efficiency between two types of neural networks, which provides a quantitative measure for evaluating the relative merits of different network structures.

Deep Learning for Individual Heterogeneity

Deep neural networks are well-suited to structured modeling of heterogeneity: the network architecture can be designed to match the global structure of the economic model, giving novel methodology for deep learning as well as, more formally, improved rates of convergence.

Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach

For the first time, this work provides a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.

Optimal Deep Neural Networks by Maximization of the Approximation Power

Applying the optimal architecture for deep neural networks of a given size to the Boston Housing dataset empirically confirms that the method outperforms state-of-the-art machine learning models.

Double/debiased machine learning for logistic partially linear model

We propose double/debiased machine learning approaches to infer (at the parametric rate) the parametric component of a logistic partially linear model with the binary response following a conditional…

Deep Structural Estimation: With an Application to Option Pricing

We propose a novel structural estimation framework in which we train a surrogate of an economic model with deep neural networks. Our methodology alleviates the curse of dimensionality and speeds up…

Efficient Estimation of General Treatment Effects using Neural Networks with A Diverging Number of Confounders.

A new unified approach is proposed for efficient estimation of treatment effects using feedforward artificial neural networks when the number of covariates is allowed to increase with the sample size; the estimator attains the semiparametric efficiency bound.

Bypassing the Monster: A Faster and Simpler Optimal Algorithm for Contextual Bandits under Realizability

This work designs a fast and simple algorithm that achieves the statistically optimal regret with only O(log T) calls to an offline regression oracle across all T rounds, providing the first universal and optimal reduction from contextual bandits to offline regression.
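A minimal sketch of the key primitive in this family of regression-oracle reductions: mapping the oracle's predicted rewards to an action distribution via inverse-gap weighting. The function name and parameters are illustrative, and the paper's exact schedule of oracle calls is omitted; only the sampling rule is shown.

```python
import numpy as np

def inverse_gap_weighting(y_hat, gamma):
    """Map predicted rewards for K actions to a sampling distribution.

    Each suboptimal action is played with probability inversely
    proportional to its predicted regret gap (scaled by the learning
    rate gamma); the greedy action absorbs the remaining mass.
    """
    K = len(y_hat)
    best = int(np.argmax(y_hat))
    p = np.zeros(K)
    for a in range(K):
        if a != best:
            p[a] = 1.0 / (K + gamma * (y_hat[best] - y_hat[a]))
    p[best] = 1.0 - p.sum()  # leftover mass goes to the greedy action
    return p

# Larger gamma concentrates play on the predicted-best action.
p = inverse_gap_weighting(np.array([0.9, 0.5, 0.1]), gamma=10.0)
```

Actions with a larger predicted gap are explored less, which is what keeps the regret of the reduction statistically optimal.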



Posterior Concentration for Sparse Deep Learning

This work introduces Spike-and-Slab Deep Learning (SS-DL), a fully Bayesian alternative to dropout for improving generalizability of deep ReLU networks, and shows that the posterior distribution concentrates at the near minimax rate for α-Hölder smooth maps.

Generic Machine Learning Inference on Heterogenous Treatment Effects in Randomized Experiments

This variational inference method is shown to be uniformly valid and quantifies the uncertainty coming from both parameter estimation and data splitting and could be of substantial independent interest in many machine learning applications.

Understanding deep learning requires rethinking generalization

These experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data, and confirm that simple depth two neural networks already have perfect finite sample expressivity.

Double/Debiased Machine Learning for Treatment and Structural Parameters

This work revisits the classic semiparametric problem of inference on a low dimensional parameter θ_0 in the presence of high-dimensional nuisance parameters η_0 and proves that DML delivers point estimators that concentrate in a N^(-1/2)-neighborhood of the true parameter values and are approximately unbiased and normally distributed, which allows construction of valid confidence statements.
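The DML recipe (cross-fitting plus a Neyman-orthogonal, residual-on-residual moment) can be sketched for a partially linear model. The kernel smoother below is a hypothetical stand-in for any first-step machine learner, and the data-generating functions are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, theta = 2000, 1.5

# Partially linear model: Y = theta*D + g(X) + eps,  D = m(X) + v
X = rng.uniform(-2, 2, n)
D = np.cos(X) + 0.5 * rng.standard_normal(n)
Y = theta * D + np.sin(2 * X) + 0.5 * rng.standard_normal(n)

def kernel_fit(x_train, t_train, x_eval, h=0.25):
    """Nadaraya-Watson smoother: stand-in for the first-step ML learner."""
    w = np.exp(-0.5 * ((x_eval[:, None] - x_train[None, :]) / h) ** 2)
    return (w @ t_train) / w.sum(axis=1)

# Cross-fitting: estimate nuisances on one fold, residualize the other.
folds = [np.arange(n // 2), np.arange(n // 2, n)]
num = den = 0.0
for fit_idx, pred_idx in [(folds[0], folds[1]), (folds[1], folds[0])]:
    d_res = D[pred_idx] - kernel_fit(X[fit_idx], D[fit_idx], X[pred_idx])
    y_res = Y[pred_idx] - kernel_fit(X[fit_idx], Y[fit_idx], X[pred_idx])
    num += d_res @ y_res
    den += d_res @ d_res

theta_hat = num / den  # concentrates in an N^(-1/2)-neighborhood of theta
```

Orthogonality makes the estimate first-order insensitive to nuisance-estimation error, which is why slow first-step learners still yield root-N inference on theta.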

Breaking the Curse of Dimensionality with Convex Neural Networks

  • F. Bach, J. Mach. Learn. Res., 2017
This work considers neural networks with a single hidden layer and non-decreasing homogeneous activation functions such as rectified linear units, and shows that they are adaptive to unknown underlying linear structures, such as dependence on the projection of the input variables onto a low-dimensional subspace.

Improved Rates and Asymptotic Normality for Nonparametric Neural Network Estimators

We obtain an improved approximation rate (in Sobolev norm) of r^(-1/2 - α/(d+1)) for a large class of single hidden layer feedforward artificial neural networks (ANN) with r hidden units…

Universal Function Approximation by Deep Neural Nets with Bounded Width and ReLU Activations

It is proved that ReLU nets with width d + 1 can approximate any continuous convex function of d variables arbitrarily well, and quantitative depth estimates are given for the rate of approximation of any continuous scalar function on the d-dimensional cube [0, 1]^d by ReLU nets with width d + 3.
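One intuition behind the convex-function result is that a convex function is the pointwise maximum of its tangent lines, and a narrow ReLU net can compute running maxima via max(a, b) = a + relu(b - a). A minimal one-dimensional sketch (not the paper's construction) approximating x^2 on [0, 1] this way:

```python
import numpy as np

relu = lambda z: np.maximum(z, 0.0)

# Tangent lines of f(x) = x^2 at points t: l_t(x) = 2*t*x - t^2.
ts = np.linspace(0.0, 1.0, 10)
x = np.linspace(0.0, 1.0, 1001)

approx = 2 * ts[0] * x - ts[0] ** 2
for t in ts[1:]:
    line = 2 * t * x - t ** 2
    approx = approx + relu(line - approx)  # running max via one ReLU each

err = np.max(np.abs(x ** 2 - approx))  # gap between tangents spaced h: h^2/4
```

With tangent points spaced h apart the max-affine error is h^2/4, so adding depth (more ReLU "max" steps) drives the approximation error to zero while the width stays constant.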

High-dimensional econometrics and regularized GMM

This chapter presents key concepts and theoretical results for analyzing estimation and inference in high-dimensional models, and presents results in a framework where estimators of parameters of interest may be represented directly as approximate means.

Training Neural Networks as Learning Data-adaptive Kernels: Provable Representation and Approximation Benefits

This work proves that, as the RKHS is data-adaptive and task-specific, the residual for f_* lies in a subspace that is potentially much smaller than the orthogonal complement of the RKHS, which formalizes the representation and approximation benefits of neural networks.

Deep vs. shallow networks: An approximation theory perspective

A new definition of relative dimension is proposed to encapsulate different notions of sparsity of a function class that can possibly be exploited by deep networks but not by shallow ones to drastically reduce the complexity required for approximation and learning.