Tomoko Matsui

We propose in this paper a new family of kernels to handle time series, notably speech data, within the framework of kernel methods, which includes popular algorithms such as the support vector machine. These kernels elaborate on the well-known dynamic time warping (DTW) family of distances by considering the same set of elementary operations, namely …
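As background for the DTW-based kernels, the following is a minimal sketch of the classical DTW distance recursion and its elementary operations (match, insertion, deletion); it is generic textbook DTW, not the kernel construction proposed in the paper.

import numpy as np

def dtw_distance(x, y):
    # x: (n, d) and y: (m, d) arrays of feature vectors (e.g. acoustic frames)
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(x[i - 1] - y[j - 1])  # local frame distance
            # elementary operations: match / insertion / deletion
            cost[i, j] = d + min(cost[i - 1, j - 1],
                                 cost[i - 1, j],
                                 cost[i, j - 1])
    return cost[n, m]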
This paper presents an overview of the IR for Spoken Documents Task at the NTCIR-9 Workshop. In this task, a spoken term detection (STD) subtask and an ad-hoc spoken document retrieval (SDR) subtask are conducted. Both subtasks target the search for terms, passages, and documents included in academic and simulated lectures of the Corpus of Spontaneous …
The initiation of DNA replication is tightly regulated in eukaryotic cells to ensure that the genome is precisely duplicated once and only once per cell cycle. This is accomplished by controlling the assembly of a prereplicative complex (pre-RC), which involves the sequential binding to replication origins of the origin recognition complex (ORC), Cdc6/Cdc18, …
One significant problem for spoken language systems is how to cope with users' out-of-domain (OOD) utterances which cannot be handled by the back-end application system. In this paper, we propose a novel OOD detection framework, which makes use of the classification confidence scores of multiple topics and applies a linear discriminant model to perform …
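A rough sketch of the general idea (not the paper's exact model): collect the confidence scores produced by several in-domain topic classifiers and feed them to a linear discriminant that separates in-domain from out-of-domain utterances. Scikit-learn's LinearDiscriminantAnalysis stands in here for whatever linear discriminant the paper actually trains, and the in-domain/OOD labels are a hypothetical training setup.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def train_ood_detector(topic_scores, in_domain_labels):
    # topic_scores: (N, K) confidence scores of N utterances from K topic classifiers
    # in_domain_labels: 1 for in-domain, 0 for out-of-domain
    lda = LinearDiscriminantAnalysis()
    lda.fit(topic_scores, in_domain_labels)
    return lda

def is_out_of_domain(lda, utterance_scores):
    # linearly combine the per-topic confidence scores and classify
    return lda.predict(np.asarray(utterance_scores).reshape(1, -1))[0] == 0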
Gaussian Processes (GPs) are Bayesian nonparametric models that have become increasingly popular for their ability to capture highly nonlinear relationships in data across a range of tasks, such as dimensionality reduction, time series analysis, and novelty detection, as well as classical regression and classification. In this paper, we investigate …
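For reference, a minimal GP regression predictor with an RBF kernel, in standard textbook form; this is not the specific model the paper investigates, and the hyperparameters are placeholders.

import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    # A: (n, d), B: (m, d); returns the (n, m) kernel matrix
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * sq / lengthscale**2)

def gp_predict(X_train, y_train, X_test, noise=1e-2):
    # posterior mean and variance of GP regression with Gaussian observation noise
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test)
    K_ss = rbf_kernel(X_test, X_test)
    mean = K_s.T @ np.linalg.solve(K, y_train)
    var = np.diag(K_ss - K_s.T @ np.linalg.solve(K, K_s)) + noise
    return mean, var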
We propose a new approach to isolated-word speech recognition based on penalized logistic regression machines (PLRMs). With this approach, we combine the hidden Markov model (HMM) with multiclass logistic regression, resulting in a powerful speech recognizer that provides the posterior probability of each word. Experiments on the English E-set show …
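As a loose illustration of the idea of stacking a penalized multiclass logistic regression on top of per-word HMM scores to obtain word posteriors, the sketch below uses scikit-learn's L2-penalized LogisticRegression as a stand-in; the PLRM's actual penalization and training procedure are not reproduced here, and the HMM log-likelihood features are assumed to be computed elsewhere.

import numpy as np
from sklearn.linear_model import LogisticRegression

def train_word_posterior_model(hmm_log_likelihoods, word_labels, penalty_strength=1.0):
    # hmm_log_likelihoods: (N, V) log-likelihood of each utterance under each word HMM
    clf = LogisticRegression(C=1.0 / penalty_strength, max_iter=1000)
    clf.fit(hmm_log_likelihoods, word_labels)
    return clf

def recognize(clf, utterance_log_likelihoods):
    # posterior probability over the vocabulary, plus the most probable word
    posteriors = clf.predict_proba(np.asarray(utterance_log_likelihoods).reshape(1, -1))[0]
    return clf.classes_[np.argmax(posteriors)], posteriors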
Availability of large amounts of raw unlabeled data has sparked the recent surge in semi-supervised learning research. In most works, however, it is assumed that labeled and unlabeled data come from the same distribution. This restriction is removed in the self-taught learning approach, where unlabeled data can be different but nevertheless have similar …
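A hedged sketch of the usual self-taught learning recipe (learn a representation from unlabeled data, then train a supervised classifier on labeled data mapped into that representation); PCA and a linear SVM stand in here for whichever feature learner and classifier the paper actually uses.

from sklearn.decomposition import PCA
from sklearn.svm import LinearSVC

def self_taught_classifier(X_unlabeled, X_labeled, y_labeled, n_components=50):
    # Stage 1: learn a feature representation from unlabeled data,
    # which may come from a different but related distribution.
    feature_learner = PCA(n_components=n_components).fit(X_unlabeled)
    # Stage 2: train a supervised classifier on the labeled data
    # projected into the learned feature space.
    Z_labeled = feature_learner.transform(X_labeled)
    clf = LinearSVC().fit(Z_labeled, y_labeled)
    return feature_learner, clf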