On Statistical Efficiency in Learning

Jie Ding, Enmao Diao, Jiawei Zhou, and Vahid Tarokh. IEEE Transactions on Information Theory.
A central issue in many statistical learning problems is selecting an appropriate model from a set of candidates. For a given fixed dataset, large models tend to inflate variance (overfitting), while small models tend to introduce bias (underfitting). In this work, we address the critical challenge of model selection: striking a balance between model fit and model complexity, thus gaining reliable predictive power. We consider the task of approaching the theoretical limit of…
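The bias-variance tradeoff described above can be made concrete with a toy sketch (an illustration, not the paper's method): fit an intercept-only model and a simple linear model to the same data, then compare them with the Gaussian-likelihood form of AIC, n*log(RSS/n) + 2k, so the extra parameter must pay for itself.

```python
import math

# Toy data: a linear trend y = 2x + 1 plus small fixed "noise" terms
# (fixed rather than random so the example is reproducible).
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
noise = [0.1, -0.2, 0.15, -0.05, 0.2, -0.1]
ys = [2.0 * x + 1.0 + e for x, e in zip(xs, noise)]
n = len(xs)

def rss_constant(xs, ys):
    """RSS of the intercept-only model (fits the sample mean)."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def rss_linear(xs, ys):
    """RSS of the least-squares line y = a + b*x (closed form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def aic(rss, n, k):
    """Gaussian log-likelihood form of AIC: n*log(RSS/n) + 2k."""
    return n * math.log(rss / n) + 2 * k

aic_const = aic(rss_constant(xs, ys), n, k=1)
aic_lin = aic(rss_linear(xs, ys), n, k=2)
best = "linear" if aic_lin < aic_const else "constant"
print(best)  # the linear model wins on this linearly generated data
```

On data that is genuinely linear, the larger model's drop in RSS dwarfs its 2-unit complexity penalty; on pure-noise data the penalty would tip the comparison the other way.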


The Rate of Convergence of Variation-Constrained Deep Neural Networks

It is shown that a class of variation-constrained neural networks, with arbitrary width, can achieve the near-parametric rate n^(-1/2+δ) for an arbitrarily small positive constant δ.

Parallel Assisted Learning

A general method called parallel assisted learning (PAL) is developed that applies to the context where entities perform supervised learning and can collate their data according to a common data identifier.

Apply an optimized NN model to low-dimensional format speech recognition and exploring the performance with restricted factors

The results indicate that, compared to similar methods such as Bi-LSTM (bi-directional LSTM), the model achieves higher efficiency while preserving a high level of accuracy.

Federated Learning Challenges and Opportunities: An Outlook

An outlook on FL development is provided as part of the ICASSP 2022 special session entitled “Frontiers of Federated Learning: Applications, Challenges, and Opportunities,” categorized into emerging directions of FL, namely algorithm foundation, personalization, hardware and security constraints, lifelong learning, and nonstandard data.



Takeuchi's Information Criteria as a Form of Regularization

This paper presents a novel regularization approach based on TIC that does not assume a data-generation process; it yields a higher-entropy distribution through more efficient suppression of sample noise, and superior model performance under TIC-based regularization.

Risk bounds for model selection via penalization

It is shown that the quadratic risk of the minimum penalized empirical contrast estimator is bounded by an index of the accuracy of the sieve, which quantifies the trade-off among the candidate models between the approximation error and parameter dimension relative to sample size.

Model selection and multimodel inference: a practical information-theoretic approach

The second edition of this book is unique in that it focuses on methods for making formal statistical inference from all the models in an a priori set (Multi-Model Inference)…

Linear Model Selection by Cross-validation

We consider the problem of selecting a model having the best predictive ability among a class of linear models. The popular leave-one-out cross-validation method, which is asymptotically…
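Leave-one-out cross-validation is easy to sketch (a minimal illustration, not the paper's procedure): refit each candidate model n times, each time holding out one observation, and score the model by its average squared prediction error on the held-out points.

```python
def fit_constant(xs, ys):
    """Return a predictor for the intercept-only model."""
    mean = sum(ys) / len(ys)
    return lambda x: mean

def fit_linear(xs, ys):
    """Return a predictor for the least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x

def loocv_error(fit, xs, ys):
    """Mean squared leave-one-out prediction error for a fitter."""
    err = 0.0
    for i in range(len(xs)):
        xs_tr = xs[:i] + xs[i + 1:]   # training set without point i
        ys_tr = ys[:i] + ys[i + 1:]
        pred = fit(xs_tr, ys_tr)      # refit on the remaining points
        err += (ys[i] - pred(xs[i])) ** 2
    return err / len(xs)

# Roughly linear toy data (y ≈ 2x + 1 with small perturbations).
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 2.8, 5.15, 6.95, 9.2, 10.9]

e_const = loocv_error(fit_constant, xs, ys)
e_lin = loocv_error(fit_linear, xs, ys)
print("constant:", round(e_const, 3), "linear:", round(e_lin, 3))
```

Here the linear model's held-out error is far smaller, so LOOCV selects it; the snippet above concerns exactly when such selections are asymptotically valid.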

Gaussian model selection

Our purpose in this paper is to provide a general approach to model selection via penalization for Gaussian regression and to develop our point of view about this subject…

Model Selection and the Principle of Minimum Description Length

This article reviews the principle of minimum description length (MDL) for problems of model selection, and illustrates the MDL principle by considering problems in regression, nonparametric curve estimation, cluster analysis, and time series analysis.
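A crude two-part MDL sketch (an illustration of the principle, not the article's formulation): the total description length is the bits needed to encode the model's parameters plus the bits needed to encode the residuals, approximated here as (k/2)*log2(n) + (n/2)*log2(RSS/n); the model minimizing the sum is selected.

```python
import math

def description_length(rss, n, k):
    """Crude two-part code length in bits: parameter cost + residual cost.
    This is a relative quantity (it can be negative); only differences matter."""
    return 0.5 * k * math.log2(n) + 0.5 * n * math.log2(rss / n)

# Roughly linear toy data (y ≈ 2x + 1 with small perturbations).
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 2.8, 5.15, 6.95, 9.2, 10.9]
n = len(xs)

# Intercept-only model: one parameter, large residuals.
mean = sum(ys) / n
rss_const = sum((y - mean) ** 2 for y in ys)

# Least-squares line: two parameters, small residuals.
mx, my = sum(xs) / n, sum(ys) / n
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
    sum((x - mx) ** 2 for x in xs)
a = my - b * mx
rss_lin = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

dl_const = description_length(rss_const, n, k=1)
dl_lin = description_length(rss_lin, n, k=2)
best = "linear" if dl_lin < dl_const else "constant"
print(best)
```

The (k/2)*log2(n) parameter cost makes this two-part code essentially a BIC-style criterion in bits, which is one standard route from MDL to a concrete selection rule.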

Parametric estimation. Finite sample theory

The paper aims at reconsidering the famous Le Cam LAN theory. The main features of the approach which make it different from the classical one are: (1) the study is non-asymptotic…

Consistency of Bayesian procedures for variable selection

It has long been known that for the comparison of pairwise nested models, a decision based on the Bayes factor produces a consistent model selector (in the frequentist sense)…


In the problem of selecting a linear model to approximate the true unknown regression model, some necessary and/or sufficient conditions are established for the asymptotic validity of various…

Can the Strengths of AIC and BIC Be Shared?

It is shown that, in a rigorous sense, even in the setting where the true model is included among the candidates, the above-mentioned main strengths of AIC and BIC cannot be shared.