# Large linear classification when data cannot fit in memory

```bibtex
@article{Yu2010LargeLC,
  title   = {Large linear classification when data cannot fit in memory},
  author  = {Hsiang-Fu Yu and Cho-Jui Hsieh and Kai-Wei Chang and Chih-Jen Lin},
  journal = {Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining},
  year    = {2010}
}
```

Recent advances in linear classification have shown that for applications such as document classification, the training can be extremely efficient. However, most existing training methods are designed under the assumption that the data can be stored in computer memory. These methods cannot be easily applied to data larger than the memory capacity because of the cost of random disk access. We propose and analyze a block minimization framework for data larger than the memory size. At each step a block…
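The block minimization idea can be illustrated with a minimal sketch (my own illustration, not the paper's code): the data are split into blocks that are loaded one at a time, while the model parameters persist in memory across blocks. Here each `load_block` callback stands in for reading one compressed block from disk, and the inner solver is a simple SGD step on the L2-regularized hinge loss.

```python
import numpy as np

def block_minimization(blocks, n_features, epochs=10, lr=0.01, reg=1e-4):
    """Block-wise training sketch: only one block of (X, y) pairs is held
    in memory at a time; the weight vector persists across blocks."""
    w = np.zeros(n_features)
    for _ in range(epochs):
        for load_block in blocks:          # each call reads one block "from disk"
            X, y = load_block()
            for xi, yi in zip(X, y):       # SGD on the L2-regularized hinge loss
                margin = yi * xi.dot(w)
                grad = reg * w - (yi * xi if margin < 1 else 0)
                w -= lr * grad
    return w
```

In a real out-of-core setting the `load_block` callbacks would deserialize blocks from disk files; keeping the blocks compressed trades CPU time for reduced I/O, which is the regime the paper targets.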

## 33 Citations

Hashing Algorithms for Large-Scale Learning

- Computer Science, NIPS
- 2011

It is demonstrated that b-bit minwise hashing can be naturally integrated with linear learning algorithms such as linear SVM and logistic regression, to solve large-scale and high-dimensional statistical learning tasks, especially when the data do not fit in memory.

Training Logistic Regression and SVM on 200GB Data Using b-Bit Minwise Hashing and Comparisons with Vowpal Wabbit (VW)

- Computer Science, ArXiv
- 2011

Empirically verifies that even using the simplest 2-universal hashing does not degrade the learning performance, and demonstrates that the preprocessing cost of b-bit minwise hashing is roughly on the same order of magnitude as the data loading time.

b-Bit Minwise Hashing for Large-Scale Linear SVM

- Computer Science, ArXiv
- 2011

In this paper, we propose to (seamlessly) integrate b-bit minwise hashing with linear SVM to substantially improve the training (and testing) efficiency using much smaller memory, with essentially no…

Linear support vector machines via dual cached loops

- Computer Science, KDD
- 2012

Presents StreamSVM, the first algorithm for training linear Support Vector Machines (SVMs) that integrates caching with optimization by performing updates in the dual, thus obviating the need to rebalance frequently visited examples.

b-Bit Minwise Hashing for Large-Scale Learning

- Computer Science
- 2011

It is demonstrated that b-bit minwise hashing can be naturally integrated with linear learning algorithms such as linear SVM and logistic regression, to solve large-scale and high-dimensional statistical learning tasks, especially when the data do not fit in memory.

Recent Advances of Large-Scale Linear Classification

- Computer Science, Proceedings of the IEEE
- 2012

A comprehensive survey of recent developments in linear classification, covering efficient optimization methods for constructing linear classifiers and their application to large-scale problems.

Faster learning by reduction of data access time

- Computer Science, Applied Intelligence
- 2018

Proposes systematic sampling and cyclic/sequential sampling for selecting mini-batches from the dataset, reducing training time by reducing data access time, with analysis under empirical risk minimization for the strongly convex and smooth case.
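The sampling idea can be illustrated with a short sketch (a hypothetical illustration, not the paper's code): systematic sampling draws a mini-batch by picking one random offset and then striding through the data, which keeps accesses nearly sequential rather than fully random.

```python
import numpy as np

def systematic_minibatch(n, batch_size, rng):
    """Systematic sampling sketch: choose one random offset, then take
    every (n // batch_size)-th index, so one random draw yields a batch
    that sweeps the data with a single, cache/disk-friendly stride."""
    stride = n // batch_size
    start = rng.integers(stride)           # the only source of randomness
    return np.arange(start, start + stride * batch_size, stride) % n
```

Cyclic/sequential sampling, by contrast, would simply walk contiguous ranges of indices in order, trading randomness for purely sequential access.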

Snap ML: A Hierarchical Framework for Machine Learning

- Computer Science, NeurIPS
- 2018

It is proved theoretically that such a hierarchical system can accelerate training in distributed environments where intra-node communication is cheaper than inter-node communication, and that Snap ML achieves the same test loss an order of magnitude faster than any of the previously reported results.

Stochastic, Distributed and Federated Optimization for Machine Learning

- Computer Science
- 2017

This work proposes novel variants of stochastic gradient descent with a variance-reduction property that enables linear convergence for strongly convex objectives in the distributed setting, and introduces the concept of Federated Optimization/Learning, in which machine learning problems are solved without the data being stored in any centralized manner.

Video hacked dataset for convolutionnal neural networks

- Computer Science
- 2017

This work argues that all images of a video should be used to train image convolutional neural networks, and shows that using all successive images provides a significant increase in performance compared to using only spaced images on several datasets.

## References

Showing 1–10 of 29 references.

Slow Learners are Fast

- Computer Science, NIPS
- 2009

This paper proves that online learning with delayed updates converges well, thereby facilitating parallel online learning.

A dual coordinate descent method for large-scale linear SVM

- Computer Science, ICML '08
- 2008

A novel dual coordinate descent method for linear SVM with L1- and L2-loss functions that reaches an ε-accurate solution in O(log(1/ε)) iterations is presented.
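The dual coordinate descent update is compact enough to sketch (a simplified illustration of the L1-loss case, without the shrinking and random-permutation heuristics of the actual method): each dual variable `alpha[i]` is optimized in closed form over `[0, C]` while the primal vector `w` is kept in sync.

```python
import numpy as np

def dual_cd_svm(X, y, C=1.0, epochs=20):
    """Dual coordinate descent sketch for the L1-loss linear SVM.
    One coordinate alpha[i] is updated at a time; maintaining
    w = sum_i alpha[i] * y[i] * X[i] makes each update O(d)."""
    n, d = X.shape
    alpha = np.zeros(n)
    w = np.zeros(d)
    Qii = (X * X).sum(axis=1)                  # diagonal of the Gram matrix
    for _ in range(epochs):
        for i in range(n):
            G = y[i] * X[i].dot(w) - 1.0       # partial gradient of the dual
            new = min(max(alpha[i] - G / Qii[i], 0.0), C)  # clip to [0, C]
            w += (new - alpha[i]) * y[i] * X[i]
            alpha[i] = new
    return w
```

Maintaining `w` incrementally instead of recomputing it from `alpha` is what makes each coordinate update cost only O(d), the key to the method's speed on sparse text data.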

LIBSVM: A library for support vector machines

- Computer Science, TIST
- 2011

Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.

Pegasos: primal estimated sub-gradient solver for SVM

- Computer Science, Math. Program.
- 2011

A simple and effective stochastic sub-gradient descent algorithm for solving the optimization problem cast by Support Vector Machines, which is particularly well suited for large text classification problems, and demonstrates an order-of-magnitude speedup over previous SVM learning methods.
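The core Pegasos update is short enough to sketch (a minimal illustration without the optional projection step or mini-batching): at step t, pick one example at random and take a sub-gradient step on the regularized hinge loss with learning rate 1/(λt).

```python
import numpy as np

def pegasos(X, y, lam=0.01, iters=5000, seed=0):
    """Pegasos sketch: stochastic sub-gradient descent on the
    L2-regularized hinge loss with the 1/(lam*t) step-size schedule."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, iters + 1):
        i = rng.integers(n)                 # sample one training example
        eta = 1.0 / (lam * t)               # decaying learning rate
        if y[i] * X[i].dot(w) < 1:          # hinge loss is active
            w = (1 - eta * lam) * w + eta * y[i] * X[i]
        else:                               # only the regularizer contributes
            w = (1 - eta * lam) * w
    return w
```

Note that the iteration count needed for a given accuracy does not grow with the dataset size, which is why this style of solver suits large text corpora.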

Selective block minimization for faster convergence of limited memory large-scale linear models

- Computer Science, KDD
- 2011

It is proved that, by updating the linear model in the dual form, the proposed method fully utilizes the data in memory and converges to a globally optimal solution on the entire data.

Feature Engineering and Classifier Ensemble for KDD Cup 2010

- Computer Science, KDD 2010
- 2010

Describes the system of the first-prize winner of both tracks (all teams and student teams) of KDD Cup 2010, which combined the results of student sub-teams by regularized linear regression.

Parallelized Stochastic Gradient Descent

- Computer Science, NIPS
- 2010

This paper presents the first parallel stochastic gradient descent algorithm, including a detailed analysis and experimental evidence, and introduces a novel proof technique, contractive mappings, to quantify the speed of convergence of parameter distributions to their asymptotic limits.

b-Bit minwise hashing

- Computer Science, WWW '10
- 2010

This paper establishes the theoretical framework of b-bit minwise hashing, provides an unbiased estimator of the resemblance for any b, and demonstrates that, even in the least favorable scenario, using b=1 may reduce the storage space at least by a factor of 21.3.

P-packSVM: Parallel Primal grAdient desCent Kernel SVM

- Computer Science, 2009 Ninth IEEE International Conference on Data Mining
- 2009

A novel P-packSVM algorithm that can solve the Support Vector Machine (SVM) optimization problem with an arbitrary kernel, which embraces the best-known stochastic gradient descent method to optimize the primal objective and has a 1/ε dependency in complexity to obtain a solution of the desired optimization error.

Streamed Learning: One-Pass SVMs

- Computer Science, IJCAI
- 2009

A single-pass SVM based on the minimum enclosing ball (MEB) of streaming data; it is shown that the MEB updates for the streaming case can be easily adapted to learn the SVM weight vector in a way similar to using online stochastic gradient updates.