# DebiNet: Debiasing Linear Models with Nonlinear Overparameterized Neural Networks

```bibtex
@inproceedings{Xu2021DebiNetDL,
  title     = {DebiNet: Debiasing Linear Models with Nonlinear Overparameterized Neural Networks},
  author    = {Shiyun Xu},
  booktitle = {AISTATS},
  year      = {2021}
}
```

Recent years have witnessed strong empirical performance of over-parameterized neural networks on various tasks, along with many theoretical advances, e.g., universal approximation and provable convergence to a global minimum. In this paper, we incorporate over-parameterized neural networks into semi-parametric models to bridge the gap between inference and prediction, especially in high-dimensional linear problems. By doing so, we can exploit a wide class of networks to approximate the nuisance…
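The abstract frames DebiNet as plugging a flexible learner into a semi-parametric (partially linear) model so that the linear coefficient of interest can be estimated without bias from the nuisance component. As a rough illustration of the underlying partialling-out recipe (Frisch–Waugh style), here is a numpy sketch that uses plain least squares as a stand-in for the network nuisance learner; the data-generating process, variable names, and estimator are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=(n, 5))                        # controls entering the nuisance
x = z @ rng.normal(size=5) + rng.normal(size=n)    # treatment, correlated with z
theta = 1.5                                        # linear coefficient of interest
y = theta * x + z @ rng.normal(size=5) + rng.normal(size=n)

def ols_fitted(a, b):
    """Fitted values from a least-squares regression of b on a."""
    coef, *_ = np.linalg.lstsq(a, b, rcond=None)
    return a @ coef

# Partial out the nuisance: residualize y and x on z, then regress residual on residual
y_res = y - ols_fitted(z, y)
x_res = x - ols_fitted(z, x)
theta_hat = (x_res @ y_res) / (x_res @ x_res)
print(theta_hat)  # close to the true theta = 1.5
```

A naive regression of `y` on `x` alone would be biased because `x` and the nuisance both depend on `z`; residualizing both variables first removes that confounding.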


## References

Showing 1–10 of 59 references

Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent

- Computer Science · NeurIPS
- 2019

This work shows that for wide neural networks the learning dynamics simplify considerably and that, in the infinite width limit, they are governed by a linear model obtained from the first-order Taylor expansion of the network around its initial parameters.
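The linearization claim above can be checked numerically: for a wide network, a first-order Taylor expansion in the parameters tracks the exact output closely under the small parameter movements typical of training. The following numpy sketch (architecture and scaling are illustrative choices, not from the cited paper) compares a two-layer net against its linearization around initialization.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 3, 512                         # input dimension, hidden width
x = rng.normal(size=d)

# Two-layer net f = v . tanh(W x), with 1/sqrt(m) output scaling at initialization
W0 = rng.normal(size=(m, d))
v0 = rng.normal(size=m) / np.sqrt(m)

def f(W, v):
    return v @ np.tanh(W @ x)

# Gradients of f with respect to the parameters at initialization
h0 = np.tanh(W0 @ x)
grad_v = h0                                   # df/dv
grad_W = np.outer(v0 * (1 - h0**2), x)        # df/dW

# A small parameter perturbation, mimicking the small weight movement during training
dW = 0.01 * rng.normal(size=(m, d))
dv = 0.01 * rng.normal(size=m)

exact = f(W0 + dW, v0 + dv)
linear = f(W0, v0) + grad_v @ dv + np.sum(grad_W * dW)
print(abs(exact - linear))  # small relative to |f|: the net tracks its linearization
```

The gap between `exact` and `linear` is second order in the perturbation, and the infinite-width theory says the relevant perturbations shrink as width grows, which is why the linear (NTK) description becomes exact in that limit.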

Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers

- Computer Science · NeurIPS
- 2019

It is proved that over-parameterized neural networks, trained with SGD (stochastic gradient descent) or its variants, can learn some notable concept classes, including those realized by two- and three-layer networks with smooth activations, in polynomial time using polynomially many samples.

Gradient Descent Provably Optimizes Over-parameterized Neural Networks

- Computer Science · ICLR
- 2019

Over-parameterization and random initialization jointly keep every weight vector close to its initialization throughout training, which yields a strong-convexity-like property showing that gradient descent converges at a linear rate to a global optimum.

A Convergence Theory for Deep Learning via Over-Parameterization

- Computer Science · ICML
- 2019

This work proves why stochastic gradient descent can find global minima of the training objective of DNNs in *polynomial time*, and implies an equivalence between over-parameterized neural networks and the neural tangent kernel (NTK) in the finite (and polynomial) width setting.

Sensitivity and Generalization in Neural Networks: an Empirical Study

- Computer Science · ICLR
- 2018

It is found that trained neural networks are more robust to input perturbations in the vicinity of the training data manifold, as measured by the norm of the input-output Jacobian of the network, and that it correlates well with generalization.
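The sensitivity measure mentioned above, the norm of the input-output Jacobian, is straightforward to compute for a small network. The following numpy sketch (a two-layer tanh net with illustrative dimensions, not the architecture from the cited study) evaluates it in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 4, 32
W1 = rng.normal(size=(m, d)) / np.sqrt(d)
w2 = rng.normal(size=m) / np.sqrt(m)

def net(x):
    return w2 @ np.tanh(W1 @ x)

def jacobian_norm(x):
    """Norm of d net(x) / d x, a local measure of input sensitivity."""
    h = W1 @ x
    jac = W1.T @ (w2 * (1 - np.tanh(h)**2))   # chain rule through the tanh layer
    return np.linalg.norm(jac)

x = rng.normal(size=d)
print(jacobian_norm(x))
```

Averaging this quantity over points near the training data gives a scalar sensitivity score that the study found to correlate with generalization.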

Breaking the Curse of Dimensionality with Convex Neural Networks

- Computer Science · J. Mach. Learn. Res.
- 2017

This work considers neural networks with a single hidden layer and non-decreasing homogeneous activation functions, such as rectified linear units, and shows that they are adaptive to unknown underlying linear structures, such as dependence on the projection of the input variables onto a low-dimensional subspace.

Connectionist nonparametric regression: Multilayer feedforward networks can learn arbitrary mappings

- Computer Science · Neural Networks
- 1990

Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks

- Computer Science · ICML
- 2019

This paper analyzes training and generalization for a simple two-layer ReLU net with random initialization, and provides the following improvements over recent works: a tighter characterization of training speed, an explanation for why training a neural network with random labels leads to slower training, and a data-dependent complexity measure.

The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study

- Computer Science · ICML
- 2019

It is found that the optimal SGD hyper-parameters are determined by a "normalized noise scale," which is a function of the batch size, learning rate, and initialization conditions, and in the absence of batch normalization, the optimal normalized noise scale is directly proportional to width.

Double/Debiased Machine Learning for Treatment and Structural Parameters

- Computer Science
- 2017

This work revisits the classic semiparametric problem of inference on a low-dimensional parameter $\theta_0$ in the presence of high-dimensional nuisance parameters $\eta_0$, and proves that DML delivers point estimators that concentrate in an $N^{-1/2}$-neighborhood of the true parameter values and are approximately unbiased and normally distributed, which allows construction of valid confidence statements.
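A key ingredient of DML is cross-fitting: nuisance functions are fit on one fold and residuals are formed on the other, so overfitting in the nuisance step does not contaminate the estimate of the target parameter. The numpy sketch below illustrates two-fold cross-fitting with least squares as a stand-in nuisance learner; the data-generating process and fold scheme are illustrative assumptions, not the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4000
z = rng.normal(size=(n, 10))                       # high-dimensional controls
x = z @ rng.normal(size=10) + rng.normal(size=n)   # treatment
theta0 = 2.0                                       # target parameter
y = theta0 * x + z @ rng.normal(size=10) + rng.normal(size=n)

def fit(a, b):
    """Least-squares coefficients of b on a (stand-in nuisance learner)."""
    coef, *_ = np.linalg.lstsq(a, b, rcond=None)
    return coef

# Two-fold cross-fitting: nuisances fit on one half, residuals formed on the other
first_half = np.arange(n) < n // 2
num, den = 0.0, 0.0
for train, test in [(first_half, ~first_half), (~first_half, first_half)]:
    y_res = y[test] - z[test] @ fit(z[train], y[train])   # residualize E[y|z]
    x_res = x[test] - z[test] @ fit(z[train], x[train])   # residualize E[x|z]
    num += x_res @ y_res
    den += x_res @ x_res
theta_hat = num / den
print(theta_hat)  # close to theta0 = 2.0
```

Pooling the residual cross-products across both folds before dividing gives a single estimate that uses every observation while never evaluating a nuisance fit on its own training data.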