Space-Efficient Feature Maps for String Alignment Kernels

Yasuo Tabei, Yoshihiro Yamanishi, R. Pagh
2019 IEEE International Conference on Data Mining (ICDM)
String kernels are attractive data analysis tools for analyzing string data. Among them, alignment kernels are known for their high prediction accuracies in string classifications when tested in combination with SVM in various applications. However, alignment kernels have a crucial drawback in that they scale poorly due to their quadratic computation complexity in the number of input strings, which limits large-scale applications in practice. We address this need by presenting the first… 

Space-Efficient Feature Maps for String Alignment Kernels
This work presents the first approximation for string alignment kernels, called space-efficient feature maps for edit distance with moves (SFMEDM), which leverages a metric embedding named edit-sensitive parsing and feature maps (FMs) of random Fourier features (RFFs) for large-scale string analyses.
Protein homology detection using string alignment kernels
New kernels for strings adapted to biological sequences, called local alignment kernels, are proposed; they measure the similarity between two sequences by summing up scores obtained from local alignments with gaps of the sequences.
Text Classification using String Kernels
A novel kernel is introduced for comparing two text documents: an inner product in the feature space of all subsequences of length k, which can be efficiently evaluated by a dynamic programming technique.
Efficient Approximation Algorithms for Strings Kernel Based Sequence Classification
Novel techniques to efficiently and accurately estimate the pairwise similarity score are developed, which make it possible to use much larger values of $k$ and $m$ and to get higher predictive accuracy in sequence classification algorithms.
The Spectrum Kernel: A String Kernel for SVM Protein Classification
A new sequence-similarity kernel, the spectrum kernel, is introduced for use with support vector machines (SVMs) in a discriminative approach to the protein classification problem and performs well in comparison with state-of-the-art methods for homology detection.
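As a minimal sketch of the idea behind the spectrum kernel, the similarity of two strings can be computed as the inner product of their k-mer count vectors. The function names `spectrum_features` and `spectrum_kernel` are illustrative choices and are not taken from the paper; this direct counting version is only meant to show the definition, not an optimized implementation.

```python
from collections import Counter


def spectrum_features(s: str, k: int) -> Counter:
    """Count every contiguous length-k substring (k-mer) of s."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def spectrum_kernel(s: str, t: str, k: int) -> int:
    """Inner product of the k-mer count vectors of s and t."""
    fs, ft = spectrum_features(s, k), spectrum_features(t, k)
    # Iterate over the smaller map; a Counter returns 0 for missing k-mers.
    if len(fs) > len(ft):
        fs, ft = ft, fs
    return sum(count * ft[kmer] for kmer, count in fs.items())
```

Because the feature map is explicit, a linear SVM trained on the count vectors is equivalent to a kernel SVM with this kernel, at the cost of a feature space that grows with the alphabet size and k.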
Fast and scalable polynomial kernels via explicit feature maps
A novel randomized tensor product technique, called Tensor Sketching, is proposed for approximating any polynomial kernel in $O(n(d + D \log D))$ time, and achieves higher accuracy and often runs orders of magnitude faster than the state-of-the-art approach for large-scale real-world datasets.
EmbedJoin: Efficient Edit Similarity Joins via Embeddings
This paper proposes an algorithm named EmbedJoin which scales very well with string length and distance threshold, built on the recent advance of metric embeddings for edit distance, and is very different from all the previous approaches.
A Kernel for Time Series Based on Global Alignments
A new family of kernels is proposed to handle time series, notably speech data, within the framework of kernel methods, which includes popular algorithms such as the support vector machine; the kernels are proved to be positive definite under favorable conditions.
Random Features for Large-Scale Kernel Machines
Two sets of random features are explored, convergence bounds are provided on their ability to approximate various radial basis kernels, and it is shown that in large-scale classification and regression tasks linear machine learning algorithms applied to these features outperform state-of-the-art large-scale kernel machines.
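The random-feature idea can be illustrated with a small sketch for one radial basis kernel, the Gaussian kernel exp(-||x - y||^2 / (2 sigma^2)). The choice of the Gaussian kernel, the helper names `make_rff` and `gaussian_kernel`, and the parameters `num_features`, `sigma`, and `seed` are assumptions made for this illustration, not details from the listed papers. Frequencies are drawn from a normal distribution with standard deviation 1/sigma and phases uniformly from [0, 2*pi], so that the inner product of two feature vectors approximates the kernel value.

```python
import math
import random


def make_rff(dim: int, num_features: int, sigma: float, seed: int = 0):
    """Return a feature map z such that z(x) . z(y) approximates the Gaussian kernel."""
    rng = random.Random(seed)
    # One random frequency vector and one random phase per output feature.
    ws = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
          for _ in range(num_features)]
    bs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def feature_map(x):
        return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
                for w, b in zip(ws, bs)]

    return feature_map


def gaussian_kernel(x, y, sigma: float) -> float:
    """Exact Gaussian kernel, used here only to check the approximation."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))
```

A linear model trained on `feature_map(x)` then stands in for a kernel machine; increasing `num_features` tightens the approximation at the cost of longer feature vectors.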
D2KE: From Distance to Kernel and Embedding
This work proposes a general framework to derive a family of positive definite kernels from a given dissimilarity measure, which subsumes the widely-used representative-set method as a special case, and relates to the well-known distance substitution kernel in a limiting case.