• Corpus ID: 235489741

On the benefits of maximum likelihood estimation for Regression and Forecasting

  • Pranjal Awasthi, Abhimanyu Das, Rajat Sen, Ananda Theertha Suresh
We advocate for a practical Maximum Likelihood Estimation (MLE) approach towards designing loss functions for regression and forecasting, as an alternative to the typical approach of direct empirical risk minimization on a specific target metric. The MLE approach is better suited to capture inductive biases such as prior domain knowledge in datasets, and can output post-hoc estimators at inference time that can optimize different types of target metrics. We present theoretical results to… 
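The abstract's core recipe — fit a probabilistic model by MLE, then read off metric-specific predictions post hoc at inference time — can be sketched in a few lines. This is our own toy illustration assuming a Gaussian likelihood; the variable names, constants, and synthetic data are ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-Gaussian data (illustrative).
n, d = 500, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
sigma_true = 0.8
y = X @ w_true + sigma_true * rng.normal(size=n)

# MLE under a Gaussian likelihood: the weight MLE coincides with least
# squares, and the noise-scale MLE is the residual standard deviation.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
sigma_hat = np.sqrt(np.mean((y - X @ w_hat) ** 2))

# Post-hoc estimators for different target metrics, read off from the
# fitted predictive distribution N(mu, sigma_hat^2):
mu = X @ w_hat
pred_mse = mu                        # minimizes expected squared error
z90 = 1.2816                         # standard-normal 0.9 quantile
pred_q90 = mu + sigma_hat * z90      # minimizes 0.9-quantile (pinball) loss
```

The point of the sketch: one fitted likelihood yields many estimators, so the target metric can be chosen after training rather than baked into the loss.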

Figures and Tables from this paper



A modern maximum-likelihood theory for high-dimensional logistic regression

  • P. Sur, E. Candès
  • Mathematics
    Proceedings of the National Academy of Sciences
  • 2019
It is proved that the maximum-likelihood estimate (MLE) is biased, that the variability of the MLE is far greater than classically estimated, and that the likelihood-ratio test (LRT) is not distributed as a χ².

Robust linear least squares regression

A new estimator based on truncating differences of losses in a min-max framework is provided; it satisfies a d/n risk bound both in expectation and in deviation, the key novelty being the absence of any exponential moment condition on the output distribution while still achieving exponential deviation bounds.

Robust estimation via robust gradient estimation

The workhorse is a novel robust variant of gradient descent; conditions are provided under which this variant yields accurate estimators in general convex risk minimization problems.
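The robust-gradient-descent idea can be sketched on least-squares regression. Here we use a coordinate-wise median of per-block gradients as the robust gradient oracle — a simpler stand-in for the robust mean estimators the paper actually analyzes — and the data, block count, and step size are all our own choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear regression data with a handful of gross outliers in y.
n, d = 600, 2
X = rng.normal(size=(n, d))
w_true = np.array([2.0, -1.0])
y = X @ w_true + 0.1 * rng.normal(size=n)
y[:4] += 1e6                              # 4 corrupted responses

def robust_grad(X, y, w, k=15):
    """Coordinate-wise median of per-block least-squares gradients.

    Tolerates roughly k/2 corrupted points, since an outlier can
    poison at most its own block's gradient."""
    grads = [Xb.T @ (Xb @ w - yb) / len(yb)
             for Xb, yb in zip(np.array_split(X, k), np.array_split(y, k))]
    return np.median(np.stack(grads), axis=0)

# Plain gradient descent, but with the robust gradient oracle.
w = np.zeros(d)
for _ in range(300):
    w -= 0.1 * robust_grad(X, y, w)
```

Ordinary least squares on the same data is dragged far from `w_true` by the corrupted responses, while the median-aggregated iterates are not.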

Making and Evaluating Point Forecasts

Typically, point forecasting methods are compared and assessed by means of an error measure or scoring function, with the absolute error and the squared error being key examples.

Deep Factors for Forecasting

A hybrid model that incorporates the benefits of both classical and deep neural networks is proposed, which is data-driven and scalable via a latent, global, deep component, and handles uncertainty through a local classical model.

Loss Minimization and Parameter Estimation with Heavy Tails

The technique can be used for approximate minimization of smooth and strongly convex losses, and specifically for least squares linear regression and low-rank covariance matrix estimation with similar allowances on the noise and covariate distributions.

Empirical risk minimization for heavy-tailed losses

This paper discusses empirical risk minimization when the losses are not necessarily bounded and may have heavy-tailed distributions.

Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design

This work analyzes GP-UCB, an intuitive upper-confidence-based algorithm, and bounds its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental design and obtaining explicit sublinear regret bounds for many commonly used covariance functions.
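GP-UCB itself is short to state: maintain a GP posterior over the unknown objective and repeatedly query the point maximizing mean plus a multiple of the posterior standard deviation. A minimal 1-D grid sketch, with an illustrative objective, kernel lengthscale, and a constant exploration weight of our choosing (the paper's β_t grows with t):

```python
import numpy as np

def rbf(a, b, ls=0.2):
    """Squared-exponential kernel on 1-D inputs."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def gp_posterior(x_obs, y_obs, x_grid, noise=1e-2):
    """GP posterior mean and variance on a grid of candidate points."""
    K = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_inv = np.linalg.inv(K)      # fine at this scale; prefer Cholesky
    K_s = rbf(x_grid, x_obs)
    mu = K_s @ K_inv @ y_obs
    var = 1.0 - np.einsum('ij,jk,ik->i', K_s, K_inv, K_s)
    return mu, np.maximum(var, 1e-12)

rng = np.random.default_rng(1)
f = lambda x: np.sin(3 * x) * x   # "unknown" objective, made up here
grid = np.linspace(0.0, 2.0, 200)
x_obs = np.array([0.1])
y_obs = f(x_obs) + 1e-2 * rng.normal(size=1)

beta = 2.0                        # exploration weight (constant for simplicity)
for t in range(25):
    mu, var = gp_posterior(x_obs, y_obs, grid)
    x_next = grid[np.argmax(mu + beta * np.sqrt(var))]  # UCB acquisition
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, f(x_next) + 1e-2 * rng.normal())
```

The UCB rule trades off exploitation (high posterior mean) against exploration (high posterior uncertainty), which is exactly the quantity the regret analysis controls via the information gain.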

Mean Estimation and Regression Under Heavy-Tailed Distributions: A Survey

This work describes sub-Gaussian mean estimators for possibly heavy-tailed data in both the univariate and multivariate settings and focuses on estimators based on median-of-means techniques, but other methods such as the trimmed-mean and Catoni's estimators are also reviewed.
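The median-of-means idea highlighted in this survey is simple enough to sketch directly: split the sample into blocks, average each block, and return the median of the block means, so a few extreme values can poison at most their own blocks. The heavy-tailed sample and planted outliers below are our own illustration.

```python
import numpy as np

def median_of_means(x, k=20):
    """Split x into k roughly equal blocks, average each block,
    and return the median of the block means."""
    return np.median([b.mean() for b in np.array_split(x, k)])

rng = np.random.default_rng(0)
sample = rng.standard_t(df=2.5, size=10_000)   # heavy-tailed, true mean 0
sample = np.append(sample, [1e6, 1e6, 1e6])    # a few gross outliers

mom = median_of_means(sample)   # stays near the true mean
naive = sample.mean()           # dragged far away by the outliers
```

Randomly permuting before splitting is common; the sequential split here is kept deterministic for clarity.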

Efficient First-Order Contextual Bandits: Prediction, Allocation, and Triangular Discrimination

On the technical side, it is shown that the logarithmic loss and an information-theoretic quantity called the triangular discrimination play a fundamental role in obtaining first-order guarantees, and the approach is found to typically outperform comparable non-first-order methods.