Corpus ID: 16100393

Tensor machines for learning target-specific polynomial features

Jiyan Yang, Alex Gittens
Recent years have demonstrated that using random feature maps can significantly decrease the training and testing times of kernel-based algorithms without significantly lowering their accuracy. Regrettably, because random features are target-agnostic, typically thousands of such features are necessary to achieve acceptable accuracies. In this work, we consider the problem of learning a small number of explicit polynomial features. Our approach, named Tensor Machines, finds a parsimonious set of… 


Exponential Machines
This paper introduces Exponential Machines (ExM), a predictor that models all interactions of every order in a factorized format called Tensor Train (TT). It shows that the model achieves state-of-the-art performance on synthetic data with high-order interactions and performs competitively on the MovieLens 100K recommender-system dataset.
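The sketch below shows why the TT format keeps all 2^d interaction terms tractable. It assumes a model y(x) = sum over feature subsets S of W[S] * prod_{k in S} x_k, where each TT core of W is stored as a pair of slices (G0, G1) for "feature absent" and "feature present". Under that assumption, a prediction reduces to a chain of d small matrix products. The function names are hypothetical and this is not the authors' code. A brute-force expansion over all subsets is included for comparison.

```python
from itertools import product


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def exm_predict(cores, x):
    """Evaluate sum_S W[S] * prod_{k in S} x[k] for a weight tensor W in TT format.

    cores[k] = (G0, G1): hypothetical layout holding the r_{k-1} x r_k slices of
    core k for "feature k absent" and "feature k present"; r_0 = r_d = 1.
    """
    acc = [[1.0]]
    for (g0, g1), xk in zip(cores, x):
        # Folding x_k into the core sums both choices at once: G0 + x_k * G1.
        step = [[g0[i][j] + xk * g1[i][j] for j in range(len(g0[0]))]
                for i in range(len(g0))]
        acc = matmul(acc, step)
    return acc[0][0]


def exm_brute_force(cores, x):
    """Reference: expand all 2^d entries of the TT tensor explicitly."""
    total = 0.0
    for bits in product((0, 1), repeat=len(x)):
        w, coeff = [[1.0]], 1.0
        for (g0, g1), bit, xk in zip(cores, bits, x):
            w = matmul(w, g1 if bit else g0)
            coeff *= xk if bit else 1.0
        total += w[0][0] * coeff
    return total


# Toy TT cores with ranks 1-2-2-1 for d = 3 features.
cores = [
    ([[1.0, 0.5]], [[2.0, -1.0]]),
    ([[0.3, 0.0], [1.0, 2.0]], [[-0.5, 1.0], [0.2, 0.1]]),
    ([[1.0], [0.5]], [[0.7], [-1.5]]),
]
x = [0.4, -1.2, 2.0]
print(exm_predict(cores, x), exm_brute_force(cores, x))  # agree up to rounding
```

The chained form costs O(d r^2) per prediction for TT rank r. The brute-force version touches all 2^d subsets.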
CPD-Structured Multivariate Polynomial Optimization
This work introduces the Tensor-Based Multivariate Optimization (TeMPO) framework and an efficient second-order Gauss–Newton algorithm for multivariate polynomial optimization, striking a compromise between model generality and computational efficiency.
High-order Learning Model via Fractional Tensor Network Decomposition
A new notion of fractional tensor network (FrTN) decomposition is introduced, which generalizes conventional TN models by allowing the order to be an arbitrary fraction; experiments demonstrate the effectiveness of the learnable order parameters in FrTN.
Algebraic and Optimization Based Algorithms for Multivariate Regression Using Symmetric Tensor Decomposition
This work casts the multivariate regression problem as a linear system whose solution is a vectorized symmetric tensor, assumed to be of low rank, and shows that an algebraic algorithm can be derived even when the number of given data points is small.
Higher-Order Factorization Machines
The first generic yet efficient algorithms for training arbitrary-order higher-order factorization machines (HOFMs) are presented, along with new HOFM variants with shared parameters that greatly reduce model size and prediction times while maintaining similar accuracy.
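HOFMs are built from ANOVA kernels of degree m, which sum products of weighted features over all size-m feature subsets. The minimal sketch below assumes a single weight vector p and omits the per-degree parameters, rank, and shared-parameter variants. It evaluates the kernel with a standard dynamic program in O(d m) operations instead of enumerating all C(d, m) subsets. The function names are hypothetical, and a brute-force reference is included.

```python
from itertools import combinations
from math import prod


def anova_kernel(p, x, m):
    """Degree-m ANOVA kernel: sum over j1 < ... < jm of prod_k p[jk] * x[jk]."""
    d = len(x)
    # a[t][j] holds the degree-t kernel restricted to the first j features.
    a = [[0.0] * (d + 1) for _ in range(m + 1)]
    a[0] = [1.0] * (d + 1)  # degree 0: the empty product is 1
    for t in range(1, m + 1):
        for j in range(1, d + 1):
            # Either feature j is left out, or it is the last factor of the product.
            a[t][j] = a[t][j - 1] + p[j - 1] * x[j - 1] * a[t - 1][j - 1]
    return a[m][d]


def anova_brute_force(p, x, m):
    """Reference implementation that enumerates every size-m subset."""
    return sum(prod(p[j] * x[j] for j in idx) for idx in combinations(range(len(x)), m))


print(anova_kernel([1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 1.0], 3))  # -46.0
```

The recursion only ever looks one row back, so a memory-lean version could keep two rows instead of the full table.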
Deep & Cross Network for Ad Click Predictions
This paper proposes the Deep & Cross Network (DCN), which keeps the benefits of a DNN model, and beyond that, it introduces a novel cross network that is more efficient in learning certain bounded-degree feature interactions.
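In DCN, each cross layer computes x_{l+1} = x_0 x_l^T w_l + b_l + x_l. Because x_l^T w_l is a scalar, a layer needs only O(d) parameters and time, and stacking L layers yields feature interactions up to degree L + 1. The pure-Python sketch below follows that formula. The names are illustrative, and the deep branch, embedding layer, and training are omitted.

```python
def cross_layer(x0, xl, w, b):
    """One cross layer: x_{l+1} = x0 * (xl . w) + b + xl."""
    s = sum(xi * wi for xi, wi in zip(xl, w))  # the scalar x_l^T w_l
    return [x0i * s + bi + xli for x0i, bi, xli in zip(x0, b, xl)]


def cross_network(x0, weights, biases):
    """Stack cross layers; with L layers the output is a polynomial in x0 of degree <= L + 1."""
    x = list(x0)
    for w, b in zip(weights, biases):
        x = cross_layer(x0, x, w, b)
    return x


# 1-D check: two layers with w = 1, b = 0 give t^3 + 2t^2 + t, so t = 2 yields 18.
print(cross_network([2.0], [[1.0], [1.0]], [[0.0], [0.0]]))  # [18.0]
```

The residual term x_l lets each layer add one degree of interaction while keeping all the lower-degree terms.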
Cross-GCN: Enhancing Graph Convolutional Network with k-Order Feature Interactions
This work designs a new operator named Cross-feature Graph Convolution, which explicitly models arbitrary-order cross features in a GCN with complexity linear in the feature dimension and the order, and conducts experiments to validate its effectiveness.
Using Support Tensor Machine for Predicting Cell Penetrating Peptides by Fusing DipC and TipC
It is shown that fusing TipC and DipC features yields higher accuracy than single-feature methods, and that STM achieves higher accuracy than SVM in identifying cell penetrating peptides under different feature representations, which offers a new approach to predicting cell penetrating peptides.


Fast and scalable polynomial kernels via explicit feature maps
A novel randomized tensor product technique, called Tensor Sketching, is proposed for approximating any polynomial kernel in $O(n(d + D \log D))$ time, and achieves higher accuracy and often runs orders of magnitude faster than the state-of-the-art approach for large-scale real-world datasets.
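Tensor Sketching approximates the homogeneous polynomial kernel (x . y)^p by taking p independent count sketches of x and combining them by circular convolution. The result equals a count sketch of the p-fold tensor product x ⊗ ... ⊗ x. The paper computes the convolution with FFTs to obtain the D log D term. The stdlib-only sketch below instead uses a direct O(D^2) convolution for readability, with fully random hash and sign tables drawn from Python's `random` module. Both choices are assumptions, and all names are hypothetical.

```python
import random


def count_sketch(x, h, s, D):
    """Count sketch of x: coordinate j is added, with sign s[j], into bucket h[j]."""
    out = [0.0] * D
    for j, xj in enumerate(x):
        out[h[j]] += s[j] * xj
    return out


def circular_convolve(a, b):
    """Direct circular convolution (the paper uses FFTs for this step)."""
    D = len(a)
    return [sum(a[k] * b[(i - k) % D] for k in range(D)) for i in range(D)]


def make_tensor_sketch(d, D, p, rng):
    """Draw p independent (hash, sign) tables and return a feature map R^d -> R^D."""
    hashes = [[rng.randrange(D) for _ in range(d)] for _ in range(p)]
    signs = [[rng.choice((-1.0, 1.0)) for _ in range(d)] for _ in range(p)]

    def feature_map(x):
        z = count_sketch(x, hashes[0], signs[0], D)
        for h, s in zip(hashes[1:], signs[1:]):
            z = circular_convolve(z, count_sketch(x, h, s, D))
        return z

    return feature_map


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Each sketch is an unbiased estimate; averaging independent sketches reduces variance.
rng = random.Random(0)
x = [1.0, 0.5, -1.0, 2.0]
y = [1.0, 0.5, -1.0, 1.5]
trials = 300
estimate = 0.0
for _ in range(trials):
    phi = make_tensor_sketch(len(x), 64, 2, rng)  # fresh hash/sign tables each trial
    estimate += dot(phi(x), phi(y)) / trials
print(estimate, dot(x, y) ** 2)  # the average should be close to (x . y)^2 = 27.5625
```

In practice a single sketch with a larger D is used rather than averaging many small ones. The loop here only makes the unbiasedness visible.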
Compact Random Feature Maps
Error bounds for CRAFTMaps are proved, demonstrating their superior kernel reconstruction performance compared to previous approximation schemes, and it is shown how structured random matrices can be used to generate CRAFTMaps efficiently.
Random Features for Large-Scale Kernel Machines
Two sets of random features are explored, convergence bounds are provided on their ability to approximate various radial basis kernels, and it is shown that in large-scale classification and regression tasks, linear machine learning algorithms applied to these features outperform state-of-the-art large-scale kernel machines.
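For the Gaussian kernel, random Fourier features map x to sqrt(2/D) cos(w_i . x + b_i) for i = 1..D. The frequencies w_i are drawn from the kernel's Fourier transform and the phases b_i are uniform on [0, 2π], so inner products of the features estimate the kernel. The minimal stdlib sketch below assumes the kernel exp(-||x - y||^2 / (2 sigma^2)) and uses hypothetical function names. The second feature family studied in that paper, random binning, is not shown.

```python
import math
import random


def make_rff(d, D, sigma, rng):
    """Random Fourier features for the Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    # The Fourier transform of this kernel is the Gaussian N(0, sigma^-2 I).
    ws = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(d)] for _ in range(D)]
    bs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(D)]
    scale = math.sqrt(2.0 / D)

    def feature_map(x):
        return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
                for w, b in zip(ws, bs)]

    return feature_map


def gaussian_kernel(x, y, sigma):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2.0 * sigma ** 2))


rng = random.Random(0)
phi = make_rff(3, 5000, 1.0, rng)
x, y = [0.2, -0.4, 0.1], [0.5, 0.0, -0.3]
approx = sum(a * b for a, b in zip(phi(x), phi(y)))
print(approx, gaussian_kernel(x, y, 1.0))  # exact value is exp(-0.205); approx is close for large D
```

A linear model trained on phi(x) then stands in for the kernel machine. Its cost grows with D rather than with the number of training points.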
A la Carte - Learning Fast Kernels
This work introduces a family of fast, flexible, lightly parametrized and general purpose kernel learning methods, derived from Fastfood basis function expansions, and provides mechanisms to learn the properties of groups of spectral frequencies in these expansions.
An Algorithm for Training Polynomial Networks
The main goal of this paper is the derivation of an efficient layer-by-layer algorithm for training deep neural networks; the algorithm is a universal learner in the sense that its training error is guaranteed to decrease at every iteration and can eventually reach zero under mild conditions.
Predictive low-rank decomposition for kernel methods
This paper presents an algorithm that can exploit side information (e.g., classification labels, regression responses) when computing low-rank decompositions of kernel matrices, together with simulation results showing that the algorithm yields decompositions of significantly smaller rank than those found by incomplete Cholesky decomposition.
Learning Sparse Polynomial Functions
For an unknown polynomial $f(x)$ of degree $d$ with $k$ monomials, it is shown how to reconstruct $f$ to within error $\epsilon$, given only a set of examples $x_i$ drawn uniformly from the $n$-dimensional cube together with the evaluations $f(x_i)$.
Efficient SVM Training Using Low-Rank Kernel Representations
This work shows that for a low rank kernel matrix it is possible to design a better interior point method (IPM) in terms of storage requirements as well as computational complexity and derives an upper bound on the change in the objective function value based on the approximation error and the number of active constraints (support vectors).
Random Laplace Feature Maps for Semigroup Kernels on Histograms
A new randomized technique, called random Laplace features, is developed to approximate a family of kernel functions adapted to the semigroup structure of $\mathbb{R}_+^d$, which is the natural algebraic structure on the set of histograms and other non-negative data representations.
Tensor sparsification via a bound on the spectral norm of random tensors
A simple element-wise sparsification algorithm is presented that zeroes out all sufficiently small elements of a tensor A, keeps all sufficiently large elements, and retains some of the remaining elements with probabilities proportional to the square of their magnitudes.
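The following sketch illustrates this three-way rule on a tensor stored sparsely as a dict from index tuples to values. The following are all assumptions chosen for illustration and are not taken from the paper, whose actual thresholds are not given in this summary:

- the cut-offs `small` and `large`
- the sampling budget `s`
- the cap at probability 1
- rescaling each kept entry by 1/p, which makes every entry unbiased

```python
import random


def sparsify(entries, small, large, s, rng):
    """Element-wise sparsification of a tensor stored as {index tuple: value}.

    Hypothetical parameters: |a| < small is zeroed, |a| >= large is kept as-is,
    and each remaining entry is kept with probability p = min(1, s * a^2 / total),
    then rescaled by 1/p (an assumed choice that keeps entries unbiased).
    """
    middle = {k: v for k, v in entries.items() if small <= abs(v) < large}
    total = sum(v * v for v in middle.values())
    out = {k: v for k, v in entries.items() if abs(v) >= large}
    for k, v in middle.items():
        p = min(1.0, s * v * v / total) if total > 0 else 0.0
        if rng.random() < p:
            out[k] = v / p
    return out


entries = {(0, 0, 0): 5.0, (0, 1, 0): 0.01, (1, 0, 1): 0.5, (1, 1, 1): -0.6, (0, 0, 1): 3.0}
# The tiny entry is dropped; with this large budget both mid-range entries survive unscaled.
print(sparsify(entries, small=0.1, large=2.0, s=100.0, rng=random.Random(0)))
# {(0, 0, 0): 5.0, (0, 0, 1): 3.0, (1, 0, 1): 0.5, (1, 1, 1): -0.6}
```

With a smaller budget `s`, mid-range entries are sampled less often, and each one that survives is scaled up to compensate.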