Large linear classification when data cannot fit in memory

@article{Yu2010LargeLC,
  title={Large linear classification when data cannot fit in memory},
  author={Hsiang-Fu Yu and Cho-Jui Hsieh and Kai-Wei Chang and Chih-Jen Lin},
  journal={Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining},
  year={2010}
}
  • Hsiang-Fu Yu, Cho-Jui Hsieh, Kai-Wei Chang, Chih-Jen Lin
  • Published 25 July 2010
  • Computer Science
  • Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
Recent advances in linear classification have shown that for applications such as document classification, the training can be extremely efficient. However, most of the existing training methods are designed by assuming that data can be stored in the computer memory. These methods cannot be easily applied to data larger than the memory capacity due to the random access to the disk. We propose and analyze a block minimization framework for data larger than the memory size. At each step a block… 
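
To make the framework concrete, the following is a minimal out-of-core sketch in the spirit of the abstract, assuming the data have already been split into block files (hypothetical .npz files holding dense arrays 'X' and 'y') and using a few passes of dual coordinate descent for an L2-regularized, L1-loss linear SVM as the per-block sub-solver (the dual method listed under References). It illustrates the structure only and is not the authors' implementation; only the weight vector and the current block's dual variables are held in memory, and a real implementation would cache the per-block dual variables to disk as well.

import numpy as np

def dual_cd_pass(Xb, yb, w, alpha, C=1.0):
    # One pass of dual coordinate descent over one block (L1-loss linear SVM).
    for i in np.random.permutation(len(yb)):
        xi, yi = Xb[i], yb[i]
        q = xi.dot(xi)
        if q == 0.0:
            continue
        G = yi * w.dot(xi) - 1.0                    # partial gradient of the dual
        new_a = min(max(alpha[i] - G / q, 0.0), C)  # projected single-variable step
        w += (new_a - alpha[i]) * yi * xi           # maintain w = sum_j alpha_j y_j x_j
        alpha[i] = new_a
    return w, alpha

def block_minimization(block_files, n_features, outer_iters=10, inner_passes=3, C=1.0):
    # Out-of-core loop: only one block of data is in memory at any time.
    w = np.zeros(n_features)
    dual_state = {}                                 # per-block dual variables
    for _ in range(outer_iters):
        for f in block_files:
            blk = np.load(f)                        # hypothetical block file with 'X' and 'y'
            Xb, yb = blk["X"], blk["y"]
            alpha = dual_state.get(f, np.zeros(len(yb)))
            for _ in range(inner_passes):
                w, alpha = dual_cd_pass(Xb, yb, w, alpha, C)
            dual_state[f] = alpha
    return w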

Figures from this paper

Citations
Hashing Algorithms for Large-Scale Learning
TLDR
It is demonstrated that b-bit minwise hashing can be naturally integrated with linear learning algorithms such as linear SVM and logistic regression, to solve large-scale and high-dimensional statistical learning tasks, especially when the data do not fit in memory.
Training Logistic Regression and SVM on 200GB Data Using b-Bit Minwise Hashing and Comparisons with Vowpal Wabbit (VW)
TLDR
It is empirically verified that even using the simplest 2-universal hashing does not degrade learning performance, and that the preprocessing cost of b-bit minwise hashing is roughly on the same order of magnitude as the data loading time.
b-Bit Minwise Hashing for Large-Scale Linear SVM
In this paper, we propose to (seamlessly) integrate b-bit minwise hashing with linear SVM to substantially improve the training (and testing) efficiency using much smaller memory, with essentially no loss of accuracy.
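
As an illustration of how such an integration can look (a sketch, not the paper's exact scheme): each example, viewed as a set of non-zero binary feature indices, is mapped through k random hash functions, only the lowest b bits of each minimum hash value are kept, and the result is one-hot expanded into a k·2^b-dimensional vector that any linear SVM solver can consume in place of the original high-dimensional binary features. The affine hash functions and the parameters k and b below are assumptions for the example.

import random

def bbit_minwise_features(feature_sets, k=64, b=2, seed=0, prime=(1 << 61) - 1):
    # Map each example (a set of non-zero binary feature indices) to a
    # k * 2**b dimensional binary vector via b-bit minwise hashing.
    rng = random.Random(seed)
    hashes = [(rng.randrange(1, prime), rng.randrange(prime)) for _ in range(k)]
    dim = k * (1 << b)
    rows = []
    for feats in feature_sets:
        vec = [0.0] * dim
        for j, (a, c) in enumerate(hashes):
            h_min = min((a * f + c) % prime for f in feats)  # minwise hash of the set
            low = h_min & ((1 << b) - 1)                     # keep only the lowest b bits
            vec[j * (1 << b) + low] = 1.0                    # one-hot expansion per hash
        rows.append(vec)
    return rows

# example: two toy "documents" given as sets of feature indices
features = bbit_minwise_features([{1, 5, 9}, {1, 5, 42}], k=8, b=2)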
Linear support vector machines via dual cached loops
TLDR
StreamSVM is presented, the first algorithm for training linear Support Vector Machines (SVMs) that takes advantage of these properties by integrating caching with optimization: updates are performed in the dual, obviating the need to rebalance frequently visited examples.
b-Bit Minwise Hashing for Large-Scale Learning
TLDR
It is demonstrated that b-bit minwise hashing can be naturally integrated with linear learning algorithms such as linear SVM and logistic regression, to solve large-scale and high-dimensional statistical learning tasks, especially when the data do not fit in memory.
Recent Advances of Large-Scale Linear Classification
TLDR
A comprehensive survey of recent developments in linear classification is given, covering efficient optimization methods for constructing linear classifiers and their application to some large-scale problems.
Faster learning by reduction of data access time
TLDR
The idea is to reduce training time by reducing data access time: systematic sampling and cyclic/sequential sampling are proposed for selecting mini-batches from the dataset, in the empirical risk minimization setting for the strongly convex and smooth case.
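
As a rough illustration of the two sampling schemes named above (details assumed, not taken from the paper): systematic sampling draws every step-th example from a random offset, while cyclic/sequential sampling walks through contiguous slices; both patterns are friendlier to disk access than uniform random sampling.

import numpy as np

def systematic_minibatch(n, batch_size, rng):
    # Systematic sampling: random start, then every (n // batch_size)-th index.
    step = n // batch_size
    start = int(rng.integers(0, step))
    return np.arange(start, start + step * batch_size, step) % n

def cyclic_minibatches(n, batch_size):
    # Cyclic/sequential sampling: contiguous slices, cheap to read from disk.
    for lo in range(0, n, batch_size):
        yield np.arange(lo, min(lo + batch_size, n))

# example: indices of one systematic mini-batch over 10_000 examples
rng = np.random.default_rng(0)
batch = systematic_minibatch(10_000, 100, rng)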
Snap ML: A Hierarchical Framework for Machine Learning
TLDR
It is proved theoretically that such a hierarchical system can accelerate training in distributed environments where intra-node communication is cheaper than inter-node communication, and that Snap ML achieves the same test loss an order of magnitude faster than any of the previously reported results.
Stochastic, Distributed and Federated Optimization for Machine Learning
TLDR
This work proposes novel variants of stochastic gradient descent with a variance reduction property that enables linear convergence for strongly convex objectives in the distributed setting, and introduces the concept of Federated Optimization/Learning, in which machine learning problems are solved without the data being stored in any centralized manner.
Video hacked dataset for convolutionnal neural networks
TLDR
This work argues that using all images of a video to train image convolutional neural networks should be considered, and shows that using all successive images provides a significant increase in performance compared to using only spaced images on several datasets.

References

SHOWING 1-10 OF 29 REFERENCES
Slow Learners are Fast
TLDR
This paper proves that online learning with delayed updates converges well, thereby facilitating parallel online learning.
A dual coordinate descent method for large-scale linear SVM
TLDR
A novel dual coordinate descent method for linear SVM with L1- and L2-loss functions that reaches an ε-accurate solution in O(log(1/ε)) iterations is presented.
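
For reference, the dual problem solved by this kind of method and its closed-form coordinate update can be written as follows (standard formulation for the L2-regularized, L1-loss case; the notation is an assumption of this summary, not quoted from the paper, and it is the update used per block in the sketch near the top of the page):

\min_{\alpha}\; f(\alpha) = \tfrac{1}{2}\,\alpha^{\top}\bar{Q}\,\alpha - e^{\top}\alpha
\quad \text{subject to } 0 \le \alpha_i \le C,
\qquad \bar{Q}_{ij} = y_i y_j\, x_i^{\top} x_j,

\alpha_i \leftarrow \min\!\left(\max\!\left(\alpha_i - \frac{y_i\, w^{\top} x_i - 1}{x_i^{\top} x_i},\; 0\right),\; C\right),
\qquad w \leftarrow w + (\alpha_i^{\mathrm{new}} - \alpha_i^{\mathrm{old}})\, y_i\, x_i,
\qquad w = \sum_j y_j \alpha_j x_j.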
LIBSVM: A library for support vector machines
TLDR
Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
Pegasos: primal estimated sub-gradient solver for SVM
TLDR
A simple and effective stochastic sub-gradient descent algorithm for solving the optimization problem cast by Support Vector Machines, which is particularly well suited for large text classification problems, and demonstrates an order-of-magnitude speedup over previous SVM learning methods.
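
A minimal sketch of this kind of solver, assuming dense NumPy features, labels in {-1, +1}, and the usual objective (lam/2)||w||^2 plus the average hinge loss; the parameter names are illustrative:

import numpy as np

def pegasos(X, y, lam=0.01, n_iters=100_000, seed=0):
    # Stochastic sub-gradient descent on the primal SVM objective
    # (lam / 2) * ||w||^2 + mean hinge loss, with step size 1 / (lam * t).
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, n_iters + 1):
        i = int(rng.integers(n))
        eta = 1.0 / (lam * t)
        margin = y[i] * X[i].dot(w)
        w *= (1.0 - eta * lam)                # shrink: gradient of the regularizer
        if margin < 1.0:                      # hinge loss active: sub-gradient step
            w += eta * y[i] * X[i]
        norm = np.linalg.norm(w)              # optional projection onto ||w|| <= 1/sqrt(lam)
        if norm > 1.0 / np.sqrt(lam):
            w *= (1.0 / np.sqrt(lam)) / norm
    return w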
Selective block minimization for faster convergence of limited memory large-scale linear models
TLDR
It is proved that, by updating the linear model in the dual form, the proposed method fully utilizes the data in memory and converges to a globally optimal solution on the entire data.
Feature Engineering and Classifier Ensemble for KDD Cup 2010
TLDR
This team won first prize in both tracks (all teams and student teams) of KDD Cup 2010, combining the results of its student sub-teams by regularized linear regression.
Parallelized Stochastic Gradient Descent
TLDR
This paper presents the first parallel stochastic gradient descent algorithm, including a detailed analysis and experimental evidence, and introduces a novel proof technique based on contractive mappings to quantify the speed of convergence of parameter distributions to their asymptotic limits.
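
A sequential simulation of the parameter-averaging idea described above (a sketch under assumed details; a real deployment would run the workers as separate processes or machines, and the SGD variant and loss are illustrative choices):

import numpy as np

def local_sgd_svm(X, y, lam=0.01, eta=0.01, epochs=1, seed=0):
    # Plain SGD on the regularized hinge loss over one worker's shard.
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            grad = lam * w
            if y[i] * X[i].dot(w) < 1.0:
                grad = grad - y[i] * X[i]
            w -= eta * grad
    return w

def parallel_sgd(X, y, n_workers=4, **sgd_kwargs):
    # Shard the data, run SGD independently on each shard, average the results.
    shards = np.array_split(np.arange(len(y)), n_workers)
    ws = [local_sgd_svm(X[idx], y[idx], seed=k, **sgd_kwargs) for k, idx in enumerate(shards)]
    return np.mean(ws, axis=0)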
b-Bit minwise hashing
TLDR
This paper establishes the theoretical framework of b-bit minwise hashing and provides an unbiased estimator of the resemblance for any b and demonstrates that, even in the least favorable scenario, using b=1 may reduce the storage space at least by a factor of 21.3.
P-packSVM: Parallel Primal grAdient desCent Kernel SVM
TLDR
A novel P-packSVM algorithm is proposed that can solve the Support Vector Machine (SVM) optimization problem with an arbitrary kernel; it embraces the best known stochastic gradient descent method to optimize the primal objective and has a 1/ε dependency in complexity to obtain a solution of optimization error ε.
Streamed Learning: One-Pass SVMs
TLDR
A single-pass SVM which is based on the minimum enclosing ball (MEB) of streaming data, and it is shown that the MEB updates for the streaming case can be easily adapted to learn the SVM weight vector in a way similar to using online stochastic gradient updates.