Recipe for Fast Large-scale SVM Training: Polishing, Parallelism, and more RAM!

Tobias Glasmachers

Support vector machines (SVMs) are a standard method in the machine learning toolbox, in particular for tabular data. Non-linear kernel SVMs often deliver highly accurate predictors, but at the cost of long training times. That problem is aggravated by the exponential growth of data volumes over time. It was tackled in the past mainly by two types of techniques: approximate solvers and parallel GPU implementations. In this work, we combine both approaches to design an extremely fast…


ThunderSVM: A Fast SVM Library on GPUs and CPUs

ThunderSVM is an efficient, open-source SVM toolkit that exploits the high performance of graphics processing units (GPUs) and multi-core CPUs. Its convex optimization solver is designed in a general way, so that SVC, SVR, and one-class SVMs share the same solver for ease of maintenance.

Core Vector Machines: Fast SVM Training on Very Large Data Sets

This paper shows that many kernel methods can be equivalently formulated as minimum enclosing ball (MEB) problems in computational geometry and obtains provably approximately optimal solutions using core sets. It proposes the Core Vector Machine (CVM) algorithm, which can be used with nonlinear kernels and has a time complexity that is linear in the training set size m.
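The core-set idea behind MEB approximation can be illustrated with a minimal sketch. It assumes plain Euclidean points rather than a kernel feature space, and it uses the simple Bădoiu-Clarkson update (move the center toward the current farthest point with a shrinking step) as a stand-in; it is not the CVM algorithm itself. The function name `approx_meb` and the sample points are hypothetical.

```python
import math

def approx_meb(points, n_iter=400):
    """Approximate the minimum enclosing ball of Euclidean points.

    Simplified illustration (an assumption, not the CVM solver):
    repeatedly move the center toward the farthest point with step 1/(i+1).
    """
    center = list(points[0])
    for i in range(1, n_iter + 1):
        # The farthest point from the current center drives the update.
        far = max(points, key=lambda p: math.dist(p, center))
        center = [c + (f - c) / (i + 1) for c, f in zip(center, far)]
    radius = max(math.dist(p, center) for p in points)
    return center, radius

# Corners of a square: the exact enclosing ball has center (0, 0), radius sqrt(2).
square = [(1.0, 1.0), (-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0)]
center, radius = approx_meb(square)
print(center, radius)
```

The points that are selected as "farthest" over the iterations form a small core set; the number of iterations, not the number of points, controls the approximation quality.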

Scaling Up Kernel SVM on Limited Resources: A Low-Rank Linearization Approach

This paper proposes a novel approach called low-rank linearized SVM to scale up kernel SVM on limited resources via an approximate empirical kernel map computed from efficient kernel low-rank decompositions.

Breaking the curse of kernelization: budgeted stochastic gradient descent for large-scale SVM training

Comprehensive empirical results show that budgeted stochastic gradient descent (BSGD) achieves higher accuracy than state-of-the-art budgeted online algorithms and accuracy comparable to that of non-budgeted algorithms, while achieving impressive computational efficiency in both time and space during training and prediction.
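A minimal sketch of the budgeted SGD idea follows, assuming an RBF kernel, a constant learning rate, and the simplest budget maintenance strategy (removing the support vector with the smallest absolute coefficient). The function names, hyperparameter values, and toy data are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math

def rbf(x, y, gamma=0.5):
    """Gaussian RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def decision(svs, alphas, x):
    """Kernel expansion f(x) = sum_j alpha_j * k(sv_j, x)."""
    return sum(a * rbf(sv, x) for sv, a in zip(svs, alphas))

def train_budgeted_sgd(data, budget, lam=0.01, eta=0.5, epochs=5):
    """Kernel SGD on the regularized hinge loss with at most `budget` support vectors."""
    svs, alphas = [], []
    for _ in range(epochs):
        for x, y in data:
            margin = y * decision(svs, alphas, x)
            # Regularization step: shrink all coefficients.
            alphas = [a * (1.0 - eta * lam) for a in alphas]
            if margin < 1.0:
                # Hinge loss is active: add x as a new support vector.
                svs.append(x)
                alphas.append(eta * y)
                if len(svs) > budget:
                    # Budget maintenance by removal: drop the smallest |alpha|.
                    j = min(range(len(alphas)), key=lambda i: abs(alphas[i]))
                    del svs[j]
                    del alphas[j]
    return svs, alphas

# Hypothetical toy data: two clusters with labels +1 and -1.
data = [((2.0, 2.0), 1), ((-2.0, -2.0), -1), ((2.2, 1.8), 1),
        ((-1.8, -2.1), -1), ((1.9, 2.3), 1), ((-2.2, -1.9), -1)]
svs, alphas = train_budgeted_sgd(data, budget=4)
print(len(svs), alphas)
```

Because the model never holds more than `budget` support vectors, the cost of each prediction and each update stays bounded regardless of how many training points are processed.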

Diving into the shallows: a computational perspective on large-scale shallow learning

EigenPro iteration is introduced, based on a preconditioning scheme that uses a small number of approximately computed eigenvectors. Injecting this small (computationally inexpensive and SGD-compatible) amount of approximate second-order information turns out to lead to major improvements in convergence.

Random Features for Large-Scale Kernel Machines

Two sets of random features are explored, with convergence bounds provided on their ability to approximate various radial basis kernels, and it is shown that in large-scale classification and regression tasks linear machine learning algorithms applied to these features outperform state-of-the-art large-scale kernel machines.
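As an illustration, one common random feature construction (random Fourier features with cosine components) can be sketched as below. It assumes the Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2); the helper names, the number of features, and the sample inputs are hypothetical.

```python
import math
import random

def make_rff(dim, n_features, gamma, seed=0):
    """Sample a random Fourier feature map for k(x, y) = exp(-gamma * ||x - y||^2)."""
    rng = random.Random(seed)
    # Frequencies follow N(0, 2*gamma*I), the spectral density of this kernel.
    sigma = math.sqrt(2.0 * gamma)
    W = [[rng.gauss(0.0, sigma) for _ in range(dim)] for _ in range(n_features)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def transform(x):
        return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + bj)
                for w, bj in zip(W, b)]

    return transform

def rbf(x, y, gamma):
    """Exact Gaussian kernel value for comparison."""
    return math.exp(-gamma * sum((a - c) ** 2 for a, c in zip(x, y)))

x = [0.2, -0.1, 0.4]
y = [0.0, 0.3, 0.1]
gamma = 0.5
phi = make_rff(dim=3, n_features=2000, gamma=gamma)
# The inner product of the feature vectors approximates the kernel value.
approx = sum(a * c for a, c in zip(phi(x), phi(y)))
exact = rbf(x, y, gamma)
print(exact, approx)
```

A linear model trained on `phi(x)` then behaves like an approximate kernel machine, with the approximation error shrinking as the number of random features grows.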

Dual SVM Training on a Budget

A dual subspace ascent algorithm for support vector machine training that respects a budget constraint limiting the number of support vectors and demonstrates considerable speed-ups over primal budget training methods.

Locally linear support vector machines and other local models

  • V. Kecman, J. Brooks
  • Computer Science
    The 2010 International Joint Conference on Neural Networks (IJCNN)
  • 2010
This is the first paper to prove stability bounds for local SVMs, and it shows that they are tighter than those for traditional global SVMs.

Finite Sum Acceleration vs. Adaptive Learning Rates for the Training of Kernel Machines on a Budget

Adaptive learning rates are widely used for deep learning, while acceleration techniques like stochastic average gradient and variance-reduced gradient descent can achieve a linear convergence rate.

An improved training algorithm for support vector machines

  • E. Osuna, R. Freund, F. Girosi
  • Computer Science
    Neural Networks for Signal Processing VII. Proceedings of the 1997 IEEE Signal Processing Society Workshop
  • 1997
This paper presents a decomposition algorithm that is guaranteed to solve the QP problem and that does not make assumptions on the expected number of support vectors.