Mahito Sugiyama

Random walk kernels measure graph similarity by counting matching walks in two graphs. In their most popular form of geometric random walk kernels, longer walks of length k are downweighted by a factor of λ (λ < 1) to ensure convergence of the corresponding geometric series. We know from the field of link prediction that this downweighting often leads to a…
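As a rough illustration of the walk counting described in this abstract, the sketch below sums matching walks of two unlabeled graphs over their direct product graph. It assumes the common formulation in which a walk of length k is weighted by λ^k; the function names and the `max_len` truncation are hypothetical choices made for this example and are not taken from the paper. The infinite series only converges when λ is small enough relative to the largest eigenvalue of the product graph, which is why the sketch truncates the sum instead of using a closed form.

```python
def product_adjacency(a1, a2):
    """Adjacency matrix of the direct product graph (Kronecker product).

    Product vertex (i, j) is stored at index i * len(a2) + j.
    """
    n1, n2 = len(a1), len(a2)
    return [
        [a1[i][k] * a2[j][l] for k in range(n1) for l in range(n2)]
        for i in range(n1)
        for j in range(n2)
    ]


def geometric_walk_kernel(a1, a2, lam, max_len=50):
    """Truncated geometric random walk kernel (illustrative sketch).

    Sums lam**k times the number of matching walks of length k
    for k = 0 .. max_len. max_len is an assumption of this sketch.
    """
    ax = product_adjacency(a1, a2)
    n = len(ax)
    # walks[v] = number of walks of the current length starting at product vertex v
    walks = [1.0] * n
    total = 0.0
    weight = 1.0  # lam ** k
    for _ in range(max_len + 1):
        total += weight * sum(walks)
        # Extend every walk by one step in the product graph.
        walks = [sum(row[u] * walks[u] for u in range(n)) for row in ax]
        weight *= lam
    return total


# Two single-edge graphs compared with lam = 0.5.
edge = [[0, 1], [1, 0]]
similarity = geometric_walk_kernel(edge, edge, 0.5)
```

For the two single-edge graphs, every walk length contributes four matching walks, so the weighted sum approaches 4 / (1 - 0.5) = 8 as `max_len` grows.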
MOTIVATION: As an increasing number of genome-wide association studies reveal the limitations of the attempt to explain phenotypic heritability by single genetic loci, there is a recent focus on associating complex phenotypes with sets of genetic loci. Although several methods for multi-locus mapping have been proposed, it is often unclear how to relate the…
MOTIVATION: Genetic heterogeneity, the fact that several sequence variants give rise to the same phenotype, is a phenomenon that is of the utmost interest in the analysis of complex phenotypes. Current approaches for finding regions in the genome that exhibit genetic heterogeneity suffer from at least one of two shortcomings: (i) they require the definition…
We present a method for finding all subgraphs whose occurrence is significantly enriched in a particular class of graphs while correcting for multiple testing. Although detecting such significant subgraphs is a crucial step for further analysis across application domains, multiple testing of subgraphs has not been investigated before as it is not only…
We propose a new formulation of multi-task feature selection coupled with multiple network regularizers, and show that the problem can be exactly and efficiently solved by maximum flow algorithms. This method contributes to one of the central topics in data mining: How to exploit structural information in multivariate data analysis, which has numerous…
In this paper we integrate two essential processes, discretization of continuous data and learning of a model that explains them, towards fully computational machine learning from continuous data. Discretization is fundamental for machine learning and data mining, since every continuous datum, e.g., a real-valued datum obtained by observation in the real…
We present learning of figures, nonempty compact sets in Euclidean space, based on Gold’s learning model aiming at a computable foundation for binary classification of multivariate data. Encoding real vectors with no numerical error requires infinite sequences, resulting in a gap between each real vector and its discretized representation used for the…
We present a novel algorithm for significant pattern mining, Westfall-Young light. The target patterns are statistically significantly enriched in one of two classes of objects. Our method corrects for multiple hypothesis testing and correlations between patterns via the Westfall-Young permutation procedure, which empirically estimates the null distribution…
In recent years spectral clustering has become one of the most popular clustering algorithms. It is a simple yet powerful method for finding structure in data using spectral properties of an associated pairwise similarity matrix. It is simple to implement, can be solved efficiently by standard linear algebra software, and very often outperforms traditional…