Linear Regression With Distributed Learning: A Generalization Error Perspective

Authors: Martin Hellkvist, Ayça Özçelikkale, Anders Ahlén
Journal: IEEE Transactions on Signal Processing
Distributed learning provides an attractive framework for scaling the learning task by sharing the computational load over multiple nodes in a network. Here, we investigate the performance of distributed learning for large-scale linear regression where the model parameters, i.e., the unknowns, are distributed over the network. We adopt a statistical learning approach. In contrast to works that focus on the performance on the training data, we focus on the generalization error, i.e., the… 
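
The setting described above, in which each node owns a subset of the unknowns and performance is judged on unseen data rather than on the training set, can be illustrated with a small sketch. The abstract as shown here does not name the algorithm used, so the block coordinate descent update below is an assumption chosen for illustration, as are the function name `partitioned_least_squares`, the Gaussian data model, and all problem sizes.

```python
import numpy as np


def partitioned_least_squares(A, y, blocks, n_iters=200):
    """Block coordinate descent for least squares (illustrative assumption).

    Each entry of `blocks` holds the column indices owned by one node, so
    every node only ever updates its own subset of the unknowns.
    """
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        for idx in blocks:
            # Residual with this node's current contribution added back.
            r = y - A @ x + A[:, idx] @ x[idx]
            # Local least-squares update restricted to this node's unknowns.
            x[idx] = np.linalg.lstsq(A[:, idx], r, rcond=None)[0]
    return x


# Synthetic problem (all sizes and the noise level are hypothetical).
rng = np.random.default_rng(0)
n_train, n_test, p, n_nodes = 200, 1000, 20, 4
noise_std = 0.1
x_true = rng.standard_normal(p)
A_train = rng.standard_normal((n_train, p))
A_test = rng.standard_normal((n_test, p))
y_train = A_train @ x_true + noise_std * rng.standard_normal(n_train)
y_test = A_test @ x_true + noise_std * rng.standard_normal(n_test)

# Partition the unknowns (columns) evenly over the nodes.
blocks = np.array_split(np.arange(p), n_nodes)
x_hat = partitioned_least_squares(A_train, y_train, blocks)

# Training error versus an empirical estimate of the generalization error.
train_mse = np.mean((A_train @ x_hat - y_train) ** 2)
test_mse = np.mean((A_test @ x_hat - y_test) ** 2)
print(f"training MSE: {train_mse:.4f}, test MSE: {test_mse:.4f}")
```

Changing `n_nodes`, or shrinking `n_train` toward the size of a single block, is one way to explore how the partitioning of the unknowns affects the gap between training and test error in this toy setup.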

Federated Coordinate Descent for Privacy-Preserving Multiparty Linear Regression

Theoretical security analysis and experimental results demonstrate that FCD can be performed effectively and efficiently, and that it achieves an MAE as low as that of centralized methods on three types of linear regression tasks using real-world UCI datasets.

On Distributed Exact Sparse Linear Regression over Networks

This work shows theoretically and empirically that, under appropriate assumptions, where each agent solves smaller and local integer programming problems, all agents will eventually reach a consensus on the same sparse optimal regressor.

Estimation and Model Misspecification: Fake and Missing Features

It is shown that the estimation error can be decreased by including more fake features in the model, even to the point where the model is overparametrized, i.e., the model contains more unknowns than observations.



CoCoA: A General Framework for Communication-Efficient Distributed Optimization

This work presents a general-purpose framework for distributed computing environments, CoCoA, that has an efficient communication scheme and is applicable to a wide variety of problems in machine learning and signal processing, and extends the framework to cover general non-strongly-convex regularizers, including L1-regularized problems like lasso.

Coded Stochastic ADMM for Decentralized Consensus Optimization With Edge Computing

A class of minibatch stochastic alternating direction method of multipliers (ADMMs) algorithms is explored and it is revealed that the proposed algorithm is communication efficient, rapidly responding, and robust in the presence of straggler nodes compared with state-of-the-art algorithms.

Generalization Error for Linear Regression under Distributed Learning

This work presents an analytical characterization of the dependence of the generalization error on the partitioning of the unknowns over nodes in a linear regression setting where the unknowns are distributed over a network of nodes.

Optimal Regularization Can Mitigate Double Descent

This work proves that for certain linear regression models with isotropic data distribution, optimally-tuned $\ell_2$ regularization achieves monotonic test performance as the authors grow either the sample size or the model size, and demonstrates empirically that optimally-tuned regularization can mitigate double descent for more general models, including neural networks.

Federated Learning for Wireless Communications: Motivation, Opportunities, and Challenges

An accessible introduction to the general idea of federated learning is provided, several possible applications in 5G networks are discussed, and key technical challenges and open problems for future research on federated learning in the context of wireless communications are described.

Privacy-Preserving Distributed Machine Learning via Local Randomization and ADMM Perturbation

A privacy-preserving ADMM-based DML framework with two novel features: first, the assumption commonly made in the literature that the users trust the server collecting their data is removed, and second, the framework provides heterogeneous privacy for users depending on the data's sensitivity levels and the servers' trust degrees.

Reconciling modern machine-learning practice and the classical bias–variance trade-off

This work shows how classical theory and modern practice can be reconciled within a single unified performance curve and proposes a mechanism underlying its emergence, and provides evidence for the existence and ubiquity of double descent for a wide spectrum of models and datasets.

Benign overfitting in linear regression

A characterization of linear regression problems for which the minimum norm interpolating prediction rule has near-optimal prediction accuracy shows that overparameterization is essential for benign overfitting in this setting: the number of directions in parameter space that are unimportant for prediction must significantly exceed the sample size.

Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis

The problem of parallelization in DNNs is described from a theoretical perspective, followed by approaches for its parallelization, and potential directions for parallelism in deep learning are extrapolated.

On the mean and variance of the generalized inverse of a singular Wishart matrix

We derive the first and the second moments of the Moore-Penrose generalized inverse of a singular standard Wishart matrix without relying on a density. Instead, we use the moments of an inverse