• Corpus ID: 14347197

Loss Minimization and Parameter Estimation with Heavy Tails

Daniel J. Hsu and Sivan Sabato · J. Mach. Learn. Res.
This work studies applications and generalizations of a simple estimation technique that provides exponential concentration under heavy-tailed distributions, assuming only bounded low-order moments. We show that the technique can be used for approximate minimization of smooth and strongly convex losses, and specifically for least squares linear regression. For instance, our $d$-dimensional estimator requires just $\tilde{O}(d\log(1/\delta))$ random samples to obtain a constant factor… 
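The exponential concentration under only bounded low-order moments described above rests on a median-of-means style aggregation: split the sample into blocks, estimate on each block, and take the median of the block estimates. A minimal sketch of that device for mean estimation, assuming numpy; the function name and block count are illustrative, not the paper's exact estimator:

```python
import numpy as np

def median_of_means(samples, k):
    """Median-of-means estimate of the mean of a heavy-tailed sample.

    Split the n samples into k disjoint blocks, average each block, and
    return the median of the k block means.  Taking k on the order of
    log(1/delta) blocks yields exponential concentration assuming only a
    finite second moment.  (Illustrative helper, not the paper's exact
    estimator.)
    """
    samples = np.asarray(samples, dtype=float)
    blocks = np.array_split(samples, k)
    block_means = np.array([b.mean() for b in blocks])
    return np.median(block_means)

rng = np.random.default_rng(0)
# Student-t with 2.5 degrees of freedom: heavy-tailed but finite variance.
x = rng.standard_t(df=2.5, size=10_000)
est = median_of_means(x, k=30)  # close to the true mean 0 with high probability
```

The median step is what converts the polynomial (Chebyshev-type) deviation of a single block mean into an exponentially small failure probability in the number of blocks.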


Algorithms for heavy-tailed statistics: regression, covariance estimation, and beyond
This work narrows the gap between the Gaussian and heavy-tailed settings for polynomial-time estimators, introduces new techniques to high-probability estimation, and suggests numerous new algorithmic questions in the following vein.
Distributed High-dimensional Regression Under a Quantile Loss Function
This paper transforms the response variable and establishes a new connection between quantile regression and ordinary linear regression, and provides a distributed estimator that is both computationally and communicationally efficient, where only the gradient information is communicated at each iteration.
High dimensional robust M-estimation: arbitrary corruption and heavy tails
The contribution of this paper is in showing that this RDC is a flexible enough concept to recover known results, and obtain new robustness results.
Mean estimation with sub-Gaussian rates in polynomial time
This work offers the first polynomial time algorithm to estimate the mean with sub-Gaussian-size confidence intervals under such mild assumptions, based on a new semidefinite programming relaxation of a high-dimensional median.
$\ell_1$-regression with Heavy-tailed Distributions
If the input is bounded, it is shown that the classical empirical risk minimization is competent for $\ell_1$-regression even when the output is heavy-tailed, and the main advantage of this result is that it achieves a high-probability risk bound without exponential moment conditions on the input and output.
Robust descent using smoothed multiplicative noise
This work proposes a novel robust gradient descent procedure which makes use of a smoothed multiplicative noise applied directly to observations before constructing a sum of soft-truncated gradient coordinates, and shows that the procedure has competitive theoretical guarantees.
Exact minimax risk for linear least squares, and the lower tail of sample covariance matrices
The main technical contribution is the study of the lower tail of the smallest singular value of empirical covariance matrices around $0$. It establishes a lower bound on this lower tail, valid for any distribution in dimension $d \geq 2$, together with a matching upper bound under a necessary regularity condition.
This paper proposes to apply the penalized least-squares approach to appropriately truncated or shrunk data, and gives a robust covariance estimator with a concentration inequality and an optimal rate of convergence in terms of the spectral norm when the samples have only bounded fourth moments.
User-Friendly Covariance Estimation for Heavy-Tailed Distributions
This work introduces element-wise and spectrum-wise truncation operators, as well as their $M$-estimator counterparts, to robustify the sample covariance matrix and proposes tuning-free procedures that automatically calibrate the tuning parameters.


Heavy-tailed regression with a generalized median-of-means
It is shown that a random sample of size $O(d\log(1/\delta))$ suffices to obtain a constant-factor approximation to the optimal loss with probability $1-\delta$, a minimax-optimal sample complexity up to log factors.
Robust linear least squares regression
A new estimator is provided based on truncating differences of losses in a min-max framework; it satisfies a $d/n$ risk bound both in expectation and in deviations, the novelty being the absence of any exponential moment condition on the output distribution while still achieving exponential deviations.
Learning without Concentration
We obtain sharp bounds on the estimation error of the Empirical Risk Minimization procedure, performed in a convex class and with respect to the squared loss, without assuming that class members and…
Covariance estimation for distributions with 2+ε moments
We study the minimal sample size $N=N(n)$ that suffices to estimate the covariance matrix of an $n$-dimensional distribution by the sample covariance matrix in the operator norm, with an arbitrary fixed…
Nuclear norm penalization and optimal rates for noisy low rank matrix completion
A new nuclear norm penalized estimator of $A_0$ is proposed, and a general sharp oracle inequality for this estimator is established for arbitrary values of $n, m_1, m_2$ under the condition of isometry in expectation, to find the best trace regression model approximating the data.
This paper proposes a slightly more complicated construction of robust M-estimation, shows that this strategy can be used when the data are only assumed to be mixing, and applies this general approach to least-squares density estimation.
Some sharp performance bounds for least squares regression with L1 regularization
We derive sharp performance bounds for least squares regression with $L_1$ regularization from parameter estimation accuracy and feature selection quality perspectives. The main result proved for $L_1$…
Geometric median and robust estimation in Banach spaces
In many real-world applications, collected data are contaminated by noise with heavy-tailed distribution and might contain outliers of large magnitude. In this situation, it is necessary to apply…
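The robust aggregation step in the geometric-median line of work replaces a coordinate-wise median with the geometric median, the point minimizing the sum of Euclidean distances to the samples, which is computable by Weiszfeld's iteratively reweighted averaging. A minimal sketch assuming numpy; the function and its parameters are illustrative, not the paper's estimator:

```python
import numpy as np

def geometric_median(points, iters=200, eps=1e-8):
    """Geometric median of a set of points via Weiszfeld's algorithm.

    Each iteration re-averages the points with weights inversely
    proportional to their distance from the current estimate; the fixed
    point minimizes the sum of Euclidean distances.  (Illustrative
    sketch of the robust aggregation step.)
    """
    pts = np.asarray(points, dtype=float)
    m = pts.mean(axis=0)  # start from the (non-robust) mean
    for _ in range(iters):
        d = np.linalg.norm(pts - m, axis=1)
        d = np.maximum(d, eps)            # guard against division by zero
        w = 1.0 / d
        m_new = (w[:, None] * pts).sum(axis=0) / w.sum()
        if np.linalg.norm(m_new - m) < eps:
            break
        m = m_new
    return m

# A single gross outlier drags the mean far away but barely moves
# the geometric median.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [100.0, 100.0]])
gm = geometric_median(pts)
```

Unlike the mean, the geometric median has a breakdown point of 1/2, which is why it serves as a robust way to combine independent weak estimates.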
Divide and conquer kernel ridge regression: a distributed algorithm with minimax optimal rates
It is established that despite the computational speed-up, statistical optimality is retained: as long as m is not too large, the partition-based estimator achieves the statistical minimax rate over all estimators using the set of N samples.
The $L_1$ penalized LAD estimator for high dimensional linear regression
Lie Wang · J. Multivar. Anal. · 2013