Corpus ID: 9085286

Scalable Non-linear Learning with Adaptive Polynomial Expansions

@article{Agarwal2014ScalableNL,
  title={Scalable Non-linear Learning with Adaptive Polynomial Expansions},
  author={Alekh Agarwal and Alina Beygelzimer and Daniel J. Hsu and John Langford and Matus Telgarsky},
  journal={ArXiv},
  year={2014},
  volume={abs/1410.0440}
}
Can we effectively learn a nonlinear representation in time comparable to linear learning? We describe a new algorithm that explicitly and adaptively expands higher-order interaction features over base linear representations. The algorithm is designed for extreme computational efficiency, and an extensive experimental study shows that its computation/prediction tradeoff ability compares very favorably against strong baselines. 
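As a rough illustration of the adaptive expansion idea, the sketch below trains a linear model, picks the base features with the largest learned weights, and appends products of those features as new columns before retraining. The top-k selection rule, the logistic-loss trainer, and all names are illustrative assumptions, not the paper's exact procedure.

import numpy as np

def train_linear(X, y, epochs=10, lr=0.1):
    # Plain logistic regression trained with SGD; returns the weight vector.
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in np.random.permutation(len(y)):
            p = 1.0 / (1.0 + np.exp(-X[i] @ w))
            w -= lr * (p - y[i]) * X[i]
    return w

def expand_adaptively(X, y, rounds=2, top_k=5):
    # Greedy heuristic: repeatedly train a linear model, then append pairwise
    # products of the currently heaviest features as new interaction columns.
    X_cur = X.copy()
    for _ in range(rounds):
        w = train_linear(X_cur, y)
        top = np.argsort(-np.abs(w))[:top_k]
        prods = [X_cur[:, i] * X_cur[:, j]
                 for idx, i in enumerate(top) for j in top[idx + 1:]]
        X_cur = np.hstack([X_cur] + [p[:, None] for p in prods])
    return X_cur, train_linear(X_cur, y)

# Toy usage: XOR-like labels that no linear model over the base features can fit.
rng = np.random.default_rng(0)
X = rng.choice([-1.0, 1.0], size=(500, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)
X_expanded, w_final = expand_adaptively(X, y)

On this toy data the expansion adds the product feature x1*x2, which carries the label signal that the base linear representation cannot express.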

Figures and Tables from this paper

Citations
Tensor machines for learning target-specific polynomial features
TLDR
This work considers the problem of learning a small number of explicit polynomial features, finding a parsimonious feature set by optimizing, in a target-specific manner, over the hypothesis class introduced by Kar and Karnick for random feature maps; the resulting models are named Tensor Machines.
Sparse Quadratic Logistic Regression in Sub-quadratic Time
TLDR
A non-linear correlation test is proposed for the non-binary, finite-support case, which hashes a variable and then correlates it with the output variable, motivated by insights from the Boolean case.
Sparse Hierarchical Interaction Learning with Epigraphical Projection
TLDR
This work proposes a primal-dual proximal algorithm based on an epigraphical projection to optimize a general formulation of hierarchical interaction learning problems with quadratic interactions between variables, which go beyond the additive models of traditional linear learning.
Scaling Up Stochastic Dual Coordinate Ascent
TLDR
An asynchronous parallel version of the SDCA algorithm is introduced, its convergence properties are analyzed, and a solution is proposed for the primal-dual synchronization required to achieve convergence in practice.
ChaCha for Online AutoML
TLDR
The ChaCha (Champion-Challengers) algorithm for making an online choice of hyperparameters in online learning settings provides good performance across a wide array of datasets when optimizing over featurization and hyperparameter decisions.

References

SHOWING 1-10 OF 31 REFERENCES
Scalable Non-linear Learning with Adaptive Polynomial Expansions
TLDR
This work describes a new algorithm that explicitly and adaptively expands higher-order interaction features over base linear representations and shows that its computation/prediction tradeoff ability compares very favorably against strong baselines.
Iterative Construction of Sparse Polynomial Approximations
TLDR
Based on the tree-growing heuristic in LMS Trees, extended to the approximation of arbitrary polynomials of the input features, the algorithm is shown to discover a known polynomial from samples and to make accurate estimates of pixel values in an image-processing task.
Fast and scalable polynomial kernels via explicit feature maps
TLDR
A novel randomized tensor product technique, called Tensor Sketching, is proposed for approximating any polynomial kernel in O(n(d + D log D)) time; it achieves higher accuracy and often runs orders of magnitude faster than the state-of-the-art approach on large-scale real-world datasets.
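For intuition about the technique in the entry above, here is a hedged NumPy sketch of a Tensor Sketch-style construction: Count Sketch the input with p independent hash functions and multiply the sketches in the FFT domain. Dimensions, seeds, and function names are illustrative assumptions, not taken from the paper.

import numpy as np

def count_sketch(x, h, s, D):
    # Project x into D bins: coordinate j adds s[j] * x[j] to bin h[j].
    out = np.zeros(D)
    np.add.at(out, h, s * x)
    return out

def tensor_sketch(x, hashes, signs, D):
    # Multiply the FFTs of p independent Count Sketches, then invert the FFT.
    fft_prod = np.ones(D, dtype=complex)
    for h, s in zip(hashes, signs):
        fft_prod *= np.fft.fft(count_sketch(x, h, s, D))
    return np.real(np.fft.ifft(fft_prod))

rng = np.random.default_rng(0)
d, D, p = 50, 4096, 2   # input dimension, sketch dimension, kernel degree (assumed)
hashes = [rng.integers(0, D, size=d) for _ in range(p)]
signs = [rng.choice([-1.0, 1.0], size=d) for _ in range(p)]

x, y = rng.standard_normal(d), rng.standard_normal(d)
approx = tensor_sketch(x, hashes, signs, D) @ tensor_sketch(y, hashes, signs, D)
exact = (x @ y) ** p    # approx is an unbiased estimate of this kernel value

Each sketch costs on the order of d + D log D operations per vector, which is where the O(n(d + D log D)) total in the summary comes from.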
A reliable effective terascale linear learning system
We present a system and a set of techniques for learning linear predictors with convex losses on terascale data sets, with trillions of features, billions of training examples, and millions of parameters.
Normalized Online Learning
We introduce online learning algorithms which are independent of feature scales, proving regret bounds dependent on the ratio of scales present in the data rather than on the absolute scale.
Learning Nonlinear Functions Using Regularized Greedy Forest
  • Rie Johnson, Tong Zhang
  • IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014
TLDR
This paper proposes a method that directly learns decision forests via fully-corrective regularized greedy search using the underlying forest structure, achieving higher accuracy and smaller models than gradient boosting on many of the datasets tested.
Compact Random Feature Maps
TLDR
The error bounds of CRAFTMaps are proved, demonstrating their superior kernel reconstruction performance compared to previous approximation schemes, and it is shown how structured random matrices can be used to generate CRAFTMaps efficiently.
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
TLDR
This work describes and analyzes an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and yields regret guarantees that are provably as good as those of the best proximal function that can be chosen in hindsight.
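A minimal sketch of the diagonal variant of this idea applied to logistic regression, where each coordinate's step size shrinks with its accumulated squared gradient. The loss, data, and hyperparameters are illustrative assumptions; the paper treats general proximal functions and composite objectives.

import numpy as np

def adagrad_logistic(X, y, eta=0.5, eps=1e-8, epochs=5):
    w = np.zeros(X.shape[1])
    G = np.zeros(X.shape[1])                  # running sum of squared gradients
    for _ in range(epochs):
        for i in np.random.permutation(len(y)):
            p = 1.0 / (1.0 + np.exp(-X[i] @ w))
            g = (p - y[i]) * X[i]             # per-example logistic-loss gradient
            G += g * g
            w -= eta * g / (np.sqrt(G) + eps) # per-coordinate adaptive step size
    return w

# Toy usage on synthetic linearly separable data.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
y = (X @ rng.standard_normal(10) > 0).astype(float)
w_hat = adagrad_logistic(X, y)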
Parallel Boosting with Momentum
TLDR
The resulting algorithm, called BOOM for boosting with momentum, enjoys the merits of both techniques, retaining the momentum and convergence properties of the accelerated gradient method while taking into account the curvature of the objective function.
Fast Kernel Classifiers with Online and Active Learning
TLDR
This contribution presents an online SVM algorithm based on the premise that active example selection can yield faster training, higher accuracies, and simpler models, using only a fraction of the training example labels.
...