Corpus ID: 16176375

On the Fine-Grained Complexity of Empirical Risk Minimization: Kernel Methods and Neural Networks

@inproceedings{Backurs2017OnTF,
  title={On the Fine-Grained Complexity of Empirical Risk Minimization: Kernel Methods and Neural Networks},
  author={Arturs Backurs and Piotr Indyk and Ludwig Schmidt},
  booktitle={NIPS},
  year={2017}
}
Empirical risk minimization (ERM) is ubiquitous in machine learning and underlies most supervised learning methods. While there is a large body of work on algorithms for various ERM problems, the exact computational complexity of ERM is still not understood. We address this issue for multiple popular ERM problems including kernel SVMs, kernel ridge regression, and training the final layer of a neural network. In particular, we give conditional hardness results for these problems based on complexity-theoretic assumptions such as the Strong Exponential Time Hypothesis (SETH).
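
To make the quadratic-time bottleneck behind these hardness results concrete, here is a minimal NumPy sketch of kernel ridge regression (a hedged illustration: the Gaussian kernel, variable names, and hyperparameters are assumptions, not the paper's notation). The exact ERM solution is obtained through an n x n kernel matrix:

  import numpy as np

  def kernel_ridge_regression(X, y, sigma=1.0, lam=1e-3):
      # Pairwise squared distances: Theta(n^2 d) time and Theta(n^2) memory --
      # the quadratic cost that the conditional hardness results concern.
      sq = np.sum(X ** 2, axis=1)
      dist2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
      K = np.exp(-dist2 / (2.0 * sigma ** 2))            # n x n Gaussian kernel matrix
      # Exact ERM solution in the dual: solve (K + lam * I) alpha = y.
      alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)
      return alpha

Any method that explicitly forms K pays quadratic time in the number of samples; whether that dependence can be avoided for high-accuracy solutions is the question the hardness results address.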

Paper Mentions

Streaming Complexity of SVMs
TLDR
It is shown that, for both problems in dimensions $d=1,2$, one can obtain streaming algorithms using space polynomially smaller than SGD for strongly convex functions such as the bias-regularized SVM, and polynomial lower bounds are proved for both point estimation and optimization.
The Fine-Grained Hardness of Sparse Linear Regression
TLDR
There are no better-than-brute-force algorithms for sparse linear regression, assuming any one of a variety of popular conjectures, including the weighted k-clique conjecture from the area of fine-grained complexity or the hardness of the closest vector problem from the geometry of numbers.
Statistical and Computational Trade-Offs in Kernel K-Means
TLDR
It is proved under basic assumptions that sampling Nyström landmarks makes it possible to greatly reduce computational costs without incurring any loss of accuracy, the first result of this kind for unsupervised learning.
Efficient Density Evaluation for Smooth Kernels
TLDR
This paper presents a collection of algorithms for efficient KDF evaluation under the assumption that the kernel k is "smooth", i.e., its value changes at most polynomially with the distance, and gives a general reduction from density estimation to approximate near neighbor in the underlying space.
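
As a point of reference for the cost that fast KDF-evaluation algorithms aim to beat, here is a brute-force kernel density evaluation sketch (the rational kernel 1/(1 + t) is merely one example of a kernel whose value changes polynomially with distance; all names are illustrative assumptions):

  import numpy as np

  def kernel_density_naive(queries, data, kernel=lambda d2: 1.0 / (1.0 + d2)):
      # Brute force: for each query q, average k(||q - x||^2) over all n data
      # points, i.e. Theta(m * n) kernel evaluations for m queries.
      out = []
      for q in queries:
          d2 = np.sum((data - q) ** 2, axis=1)
          out.append(np.mean(kernel(d2)))
      return np.array(out)
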
On the learnability of quantum neural networks
TLDR
This paper derives utility bounds for QNNs applied to empirical risk minimization, shows that large gate noise, few quantum measurements, and deep circuits lead to poor utility bounds, and proves that a QNN can be treated as a differentially private model.
Below P vs NP: fine-grained hardness for big data problems
TLDR
This thesis presents hardness results for several text analysis and machine learning tasks, and shows how lower bounds for edit distance, regular expression matching, and other pattern matching and string processing problems have inspired the development of efficient algorithms for some variants of these problems.
Is Input Sparsity Time Possible for Kernel Low-Rank Approximation?
TLDR
It is shown for the first time that O(nnz(A)k)-time approximation is possible for general radial basis function kernels (e.g., the Gaussian kernel) for the closely related problem of low-rank approximation of the kernelized dataset.
Tight Kernel Query Complexity of Kernel Ridge Regression and Kernel k-means Clustering
TLDR
The KRR result resolves a variant of an open question of El Alaoui and Mahoney, asking whether the effective statistical dimension is a lower bound on the sampling complexity, and a KKMC algorithm is provided that bypasses the above lower bound.
Beyond P vs. NP: Quadratic-Time Hardness for Big Data Problems
  • P. Indyk
  • Mathematics, Computer Science
  • SPAA
  • 2017
TLDR
An overview of recent research on hardness results for problems in string processing and machine learning, showing that, under a natural complexity-theoretic conjecture, near-linear time algorithms for these problems do not exist.
Subquadratic High-Dimensional Hierarchical Clustering
TLDR
Experiments are provided showing that these algorithms perform as well as the non-approximate version for classic classification tasks while achieving a significant speed-up.

References

SHOWING 1-10 OF 43 REFERENCES
On the Complexity of Learning with Kernels
TLDR
There are kernel learning problems where no such method will lead to non-trivial computational savings; lower bounds are studied on the error attainable by such methods as a function of the number of entries observed in the kernel matrix or the rank of an approximate kernel matrix.
Sharp analysis of low-rank kernel matrix approximations
  • F. Bach
  • Mathematics, Computer Science
  • COLT
  • 2013
TLDR
This paper shows that in the context of kernel ridge regression, for approximations based on a random subset of columns of the original kernel matrix, the rank p may be chosen to be linear in the degrees of freedom associated with the problem, a quantity which is classically used in the statistical analysis of such methods.
Dimension-Free Iteration Complexity of Finite Sum Optimization Problems
TLDR
This work extends the framework of (Arjevani et al., 2015) to provide new lower bounds that are dimension-free and go beyond the assumptions of current bounds, thereby covering standard finite-sum optimization methods, e.g., SAG, SAGA, SVRG, and SDCA without duality, as well as stochastic coordinate-descent methods, such as SDCA and accelerated proximal SDCA.
Randomized sketches for kernels: Fast and optimal non-parametric regression
TLDR
It is proved that it suffices to choose the sketch dimension $m$ proportional to the statistical dimension of the kernel matrix (modulo logarithmic factors), and fast, minimax-optimal approximations to the KRR estimate for non-parametric regression are obtained.
On the Computational Efficiency of Training Neural Networks
TLDR
This paper revisits the computational complexity of training neural networks from a modern perspective and provides both positive and negative results, some of which yield new provably efficient and practical algorithms for training certain types of neural networks.
Pegasos: primal estimated sub-gradient solver for SVM
TLDR
A simple and effective stochastic sub-gradient descent algorithm for solving the optimization problem cast by Support Vector Machines; the method is particularly well suited for large text classification problems and demonstrates an order-of-magnitude speedup over previous SVM learning methods.
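
For reference, a minimal sketch of the basic single-example Pegasos update (no mini-batching or projection step; parameter names are illustrative, not the paper's), i.e. stochastic sub-gradient descent on the regularized hinge-loss objective:

  import numpy as np

  def pegasos(X, y, lam=0.01, n_iters=10_000, seed=0):
      # Stochastic sub-gradient descent on (lam/2)*||w||^2 + mean hinge loss,
      # with the 1/(lam * t) step size characteristic of Pegasos.
      rng = np.random.default_rng(seed)
      n, d = X.shape
      w = np.zeros(d)
      for t in range(1, n_iters + 1):
          i = rng.integers(n)
          eta = 1.0 / (lam * t)
          if y[i] * (w @ X[i]) < 1:      # margin violated: hinge term is active
              w = (1.0 - eta * lam) * w + eta * y[i] * X[i]
          else:                          # only the regularizer contributes
              w = (1.0 - eta * lam) * w
      return w
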
Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond
Chapters 2–7 make up Part II of the book: artificial neural networks. After introducing the basic concepts of neurons and artificial neuron learning rules in Chapter 2, Chapter 3 describes a…
Random Features for Large-Scale Kernel Machines
TLDR
Two sets of random features are explored, convergence bounds are provided on their ability to approximate various radial basis kernels, and it is shown that in large-scale classification and regression tasks, linear machine learning algorithms applied to these features outperform state-of-the-art large-scale kernel machines.
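
A minimal sketch of one such feature map, random Fourier features for the Gaussian kernel (the dimension D, bandwidth sigma, and function name are illustrative assumptions, not the paper's notation):

  import numpy as np

  def random_fourier_features(X, D=500, sigma=1.0, seed=0):
      # z(x) = sqrt(2/D) * cos(W^T x + b) with W ~ N(0, 1/sigma^2) entrywise and
      # b ~ Uniform[0, 2*pi], so that E[z(x) . z(y)] equals the Gaussian kernel
      # exp(-||x - y||^2 / (2 * sigma^2)).
      rng = np.random.default_rng(seed)
      d = X.shape[1]
      W = rng.normal(scale=1.0 / sigma, size=(d, D))
      b = rng.uniform(0.0, 2.0 * np.pi, size=D)
      return np.sqrt(2.0 / D) * np.cos(X @ W + b)

A linear model trained on these features then stands in for the kernel machine at far lower cost.
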
Recursive Sampling for the Nystrom Method
We give the first algorithm for kernel Nystrom approximation that runs in linear time in the number of training points and is provably accurate for all kernel matrices, without dependence on…
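
For orientation, a sketch of the plain uniform-sampling Nystrom approximation, which the recursive sampling method above is designed to improve upon (kernel choice, landmark count m, and names are illustrative assumptions):

  import numpy as np

  def nystrom_uniform(X, m=100, sigma=1.0, seed=0):
      # Approximate the n x n Gaussian kernel matrix as K ~ C @ pinv(W) @ C.T,
      # touching only an n x m block C and an m x m block W built from m landmarks.
      rng = np.random.default_rng(seed)
      n = X.shape[0]
      S = rng.choice(n, size=min(m, n), replace=False)   # uniformly sampled landmarks

      def gauss(A, B):
          d2 = (np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :]
                - 2.0 * (A @ B.T))
          return np.exp(-d2 / (2.0 * sigma ** 2))

      C = gauss(X, X[S])       # n x m
      W = gauss(X[S], X[S])    # m x m
      return C, np.linalg.pinv(W)
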
Oracle Complexity of Second-Order Methods for Finite-Sum Problems
TLDR
Evidence is provided that second-order information can indeed be used to solve finite-sum optimization problems more efficiently, at least in terms of worst-case guarantees.