Corpus ID: 220281086

Decentralised Learning with Random Features and Distributed Gradient Descent

@article{Richards2020DecentralisedLW,
  title={Decentralised Learning with Random Features and Distributed Gradient Descent},
  author={Dominic Richards and Patrick Rebeschini and Lorenzo Rosasco},
  journal={ArXiv},
  year={2020},
  volume={abs/2007.00360}
}
We investigate the generalisation performance of Distributed Gradient Descent with Implicit Regularisation and Random Features in the homogeneous setting, where a network of agents is given data sampled independently from the same unknown distribution. Along with reducing the memory footprint, Random Features are particularly convenient in this setting as they provide a common parameterisation across agents that makes it possible to overcome previous difficulties in implementing Decentralised Kernel…
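The scheme described here pairs a shared random feature parameterisation with gossip-style consensus averaging and local gradient steps on the squared loss. Below is a minimal sketch of decentralised gradient descent with random Fourier features in that spirit, assuming a ring topology, a Gaussian-kernel feature map, and synthetic homogeneous data; it illustrates the general technique rather than the authors' exact algorithm, and every constant (number of agents, feature dimension, step size) is an arbitrary choice.

import numpy as np

rng = np.random.default_rng(0)
n_agents, n_local, d, M = 5, 50, 3, 100   # agents, samples per agent, input dim, features
sigma, step, iters = 1.0, 0.1, 200        # kernel bandwidth, step size, iterations

# Shared random Fourier feature map (approximates a Gaussian kernel).
Omega = rng.normal(scale=1.0 / sigma, size=(d, M))
b = rng.uniform(0.0, 2.0 * np.pi, size=M)

def phi(X):
    return np.sqrt(2.0 / M) * np.cos(X @ Omega + b)

# Homogeneous setting: every agent samples from the same unknown distribution.
def sample_agent():
    X = rng.normal(size=(n_local, d))
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n_local)
    return phi(X), y

data = [sample_agent() for _ in range(n_agents)]

# Doubly stochastic gossip matrix for a ring of agents.
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i - 1) % n_agents] += 0.25
    W[i, (i + 1) % n_agents] += 0.25

theta = np.zeros((n_agents, M))            # one parameter vector per agent
for _ in range(iters):
    theta = W @ theta                      # consensus (gossip) averaging step
    for i, (Phi, y) in enumerate(data):    # local gradient step, no explicit penalty
        grad = Phi.T @ (Phi @ theta[i] - y) / n_local
        theta[i] -= step * grad

Because all agents share Omega and b, their parameter vectors live in the same M-dimensional space, which is what makes the plain averaging step W @ theta well defined; regularisation is implicit, coming only from early stopping of the gradient iterations.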
Citations

One-shot distributed ridge regression in high dimensions
By analyzing the mean squared error in a high-dimensional random-effects model where each predictor has a small effect, several new phenomena are discovered and a new optimally weighted one-shot ridge regression algorithm is proposed.
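The uniformly weighted (plain averaging) version of this idea is straightforward; the sketch below is that simplified variant and does not implement the optimally weighted scheme the paper proposes. The helper names local_ridge and one_shot_ridge and the regularisation parameter lam are illustrative.

import numpy as np

def local_ridge(X, y, lam):
    # Closed-form ridge estimator fitted on one machine's local data.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def one_shot_ridge(splits, lam):
    # splits: list of (X_i, y_i) pairs held by the individual machines.
    # A single round of communication: each machine sends its estimate once,
    # and the centre averages them with uniform weights.
    return np.mean([local_ridge(X, y, lam) for X, y in splits], axis=0)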
ORCCA: Optimal Randomized Canonical Correlation Analysis
It is proved that this method, called optimal randomized CCA (ORCCA), can outperform (in expectation) the corresponding kernel CCA with a default kernel and is significantly superior to other approximation techniques in the CCA task.
Distributed Learning Systems with First-Order Methods
A brief introduction is provided to some distributed learning techniques that have recently been developed, namely lossy communication compression (e.g., quantization and sparsification), asynchronous communication, and decentralized communication.
Asymptotic Network Independence and Step-Size for A Distributed Subgradient Method
It is shown that a distributed subgradient method has this "linear speedup" property when using a class of square-summable-but-not-summable step-sizes, which includes $1/t^{\beta}$ for $\beta \in (1/2,1)$, and that the same method can fail to have this "asymptotic network independence" property under the optimally decaying step-size.

References

Showing 1-10 of 50 references.
Optimal Statistical Rates for Decentralised Non-Parametric Regression with Linear Speed-Up
The results exploit the statistical concentration of quantities held by agents and shed new light on the interplay between statistics and communication in decentralised methods.
Graph-Dependent Implicit Regularisation for Distributed Stochastic Subgradient Descent
This work proposes graph-dependent implicit regularisation strategies for distributed stochastic subgradient descent (Distributed SGD) on convex multi-agent learning problems, which avoid the need for explicit regularisation in decentralised learning, such as adding constraints to the empirical risk minimisation rule.
Decentralized Online Learning With Kernels
This work proposes an algorithm that allows each individual agent to learn a regression function close to the globally optimal one, and establishes that, with constant step-size selections, agents’ functions converge to a neighborhood of the globally optimal function while satisfying the consensus constraints as the penalty parameter is increased.
COKE: Communication-Censored Kernel Learning for Decentralized Non-parametric Learning
The random feature (RF) approximation approach is used to map the large-volume data represented in the reproducing kernel Hilbert space into a smaller RF space, which facilitates same-size parameter exchange and enables distributed agents to reach consensus on the function determined by the parameters in the RF space.
Dual Averaging for Distributed Optimization: Convergence Analysis and Network Scaling
This work develops and analyzes distributed algorithms based on dual subgradient averaging, provides sharp bounds on their convergence rates as a function of the network size and topology, and shows that the number of iterations required by the algorithm scales inversely in the spectral gap of the network.
DSA: Decentralized Double Stochastic Averaging Gradient Algorithm
The decentralized double stochastic averaging gradient (DSA) algorithm is proposed as an alternative that relies on strong convexity of the local functions and Lipschitz continuity of the local gradients to guarantee linear convergence, in expectation, of the sequence generated by DSA.
Online Distributed Learning Over Networks in RKH Spaces Using Random Fourier Features
This work proposes to approximate the solution as a fixed-size vector (of larger dimension than the input space) using the previously introduced framework of random Fourier features, paving the way for standard linear combine-then-adapt techniques.
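In the combine-then-adapt pattern mentioned here, an agent first forms a convex combination of its neighbours' fixed-size random-feature parameter vectors and then takes a stochastic gradient (LMS) step on its own streaming sample. The sketch below is a generic single-agent update under that pattern with a squared loss; the function name, weight dictionary, and step size mu are illustrative assumptions, not the paper's exact recursion.

import numpy as np

def combine_then_adapt(theta, neighbour_weights, x_feat, y, mu):
    # theta             -- dict mapping agent id to its current parameter vector
    #                      in the shared random-feature space
    # neighbour_weights -- dict mapping neighbour id to a combination weight
    #                      (weights sum to one)
    # x_feat            -- random Fourier feature vector of the agent's new sample
    # y                 -- the corresponding label
    # mu                -- step size
    # Combine: convex combination of neighbours' fixed-size parameter vectors.
    psi = sum(w * theta[j] for j, w in neighbour_weights.items())
    # Adapt: LMS step on the agent's own streaming sample.
    return psi + mu * (y - x_feat @ psi) * x_feat

# Example with two neighbours in a 4-dimensional random-feature space.
theta = {0: np.zeros(4), 1: np.ones(4)}
weights = {0: 0.5, 1: 0.5}
x = np.array([0.1, -0.2, 0.3, 0.0])
updated = combine_then_adapt(theta, weights, x, y=1.0, mu=0.05)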
Optimal Convergence for Distributed Learning with Stochastic Gradient Methods and Spectral-Regularization Algorithms
The results show that distributed SGM has a smaller theoretical computational complexity than distributed KRR and classic SGM; moreover, even for non-distributed SRA, the first optimal, capacity-dependent convergence rates are provided, covering the case where the regression function may not lie in the RKHS.
Optimal Algorithms for Smooth and Strongly Convex Distributed Optimization in Networks
The efficiency of MSDA is verified against state-of-the-art methods on two problems: least-squares regression and classification by logistic regression.
Distributed Learning with Regularized Least Squares
It is shown, with error bounds in expectation, that the global output function of this distributed learning scheme with least squares regularization in a reproducing kernel Hilbert space is a good approximation to the algorithm that processes the whole data set on a single machine.