• Publications
A Kernel Two-Sample Test
TLDR
This work proposes a framework for analyzing and comparing distributions and uses it to construct statistical tests of whether two samples are drawn from different distributions. It presents two distribution-free tests based on large deviation bounds for the maximum mean discrepancy (MMD).
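As a rough illustration of the maximum mean discrepancy named in the summary above, the following sketch computes the standard biased empirical estimate of the squared MMD between two samples. The Gaussian kernel, the bandwidth `sigma`, and the function names are assumptions chosen for the example, not details taken from the paper, and the sketch stops short of the test thresholds the paper derives.

```python
import numpy as np


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    # Pairwise squared Euclidean distances via broadcasting.
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))


def mmd2_biased(x, y, sigma=1.0):
    """Biased empirical estimate of the squared MMD between samples x and y."""
    k_xx = gaussian_kernel(x, x, sigma)
    k_yy = gaussian_kernel(y, y, sigma)
    k_xy = gaussian_kernel(x, y, sigma)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()
```

A larger value suggests the two samples come from different distributions. Turning the statistic into a test still requires a rejection threshold, which this sketch does not compute.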
A Kernel Method for the Two-Sample-Problem
TLDR
This work proposes two statistical tests to determine if two samples are from different distributions, and applies this approach to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where the test performs strongly.
Weisfeiler-Lehman Graph Kernels
TLDR
A family of efficient kernels for large graphs with discrete node labels based on the Weisfeiler-Lehman test of isomorphism on graphs that outperform state-of-the-art graph kernels on several graph classification benchmark data sets in terms of accuracy and runtime.
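To make the idea behind Weisfeiler-Lehman kernels more concrete, here is a minimal sketch of a subtree-style variant: node labels are repeatedly refined from the labels of their neighbours, and two graphs are compared by counting the labels they share. The adjacency-dict representation, the function names, and the use of nested tuples in place of compressed labels are assumptions made for this example. It is not the authors' implementation and omits the efficiency measures the paper relies on.

```python
from collections import Counter


def wl_features(adj, labels, iterations=2):
    """Count node labels over successive Weisfeiler-Lehman refinements."""
    current = dict(labels)
    counts = Counter(current.values())
    for _ in range(iterations):
        refined = {}
        for node, neighbours in adj.items():
            # New label = own label plus the sorted multiset of neighbour labels.
            neighbour_labels = tuple(sorted(current[n] for n in neighbours))
            refined[node] = (current[node], neighbour_labels)
        current = refined
        counts.update(current.values())
    return counts


def wl_subtree_kernel(adj1, labels1, adj2, labels2, iterations=2):
    """Dot product of the label-count vectors of two graphs."""
    f1 = wl_features(adj1, labels1, iterations)
    f2 = wl_features(adj2, labels2, iterations)
    return sum(f1[label] * f2[label] for label in f1.keys() & f2.keys())


# Usage: a triangle and a three-node path, all nodes labelled "a".
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path = {0: [1], 1: [0, 2], 2: [1]}
uniform = {0: "a", 1: "a", 2: "a"}
similarity = wl_subtree_kernel(triangle, uniform, path, uniform, iterations=1)
```

Keeping the refined labels as tuples avoids a shared label dictionary across graphs, at the cost of labels that grow with each iteration. A practical implementation would compress them to short identifiers.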
Correcting Sample Selection Bias by Unlabeled Data
TLDR
Presents a nonparametric method that directly produces resampling weights without distribution estimation, by matching the distributions of the training and testing sets in feature space.
Integrating structured biological data by Kernel Maximum Mean Discrepancy
TLDR
A novel statistical test of whether two samples are from the same distribution, compatible with both multivariate and structured data, that is fast, easy to implement, and works well, as confirmed by the experiments.
Protein function prediction via graph kernels
TLDR
A new approach that combines sequential, structural and chemical information into one graph model of proteins, derivable from protein sequence and structure only, is competitive with vector models that require additional protein information, such as the size of surface pockets.
Efficient graphlet kernels for large graph comparison
TLDR
In this article, two theoretically grounded speedup schemes are introduced, one based on sampling and the second specifically designed for bounded degree graphs, to efficiently compare large graphs that cannot be tackled by existing graph kernels.
Shortest-path kernels on graphs
TLDR
This work proposes graph kernels based on shortest paths, which are computable in polynomial time, retain expressivity and are still positive definite, and shows significantly higher classification accuracy than walk-based kernels.
Whole-genome sequencing of multiple Arabidopsis thaliana populations
TLDR
Describes the majority of common small-scale polymorphisms, as well as many larger insertions and deletions, in the A. thaliana pan-genome, along with their effects on gene function and the patterns of local and global linkage among these variants.
Covariate Shift by Kernel Mean Matching
This chapter contains sections titled: Introduction, Sample Reweighting, Distribution Matching, Risk Estimates, The Connection to Single Class Support Vector Machines, Experiments, and Conclusion.