In this paper, we propose a new Bayesian model for fully unsupervised word segmentation and an efficient blocked Gibbs sampler combined with dynamic programming for inference. Our model is a nested hierarchical Pitman-Yor language model, where a Pitman-Yor spelling model is embedded in the word model. We confirmed that it significantly outperforms previous ...
This paper describes the NiCT-ATR statistical machine translation (SMT) system used for the IWSLT 2006 evaluation campaign. We participated in all four language pair translation tasks (CE, JE, AE and IE) and both tracks (OPEN and CSTAR). We used a phrase-based SMT system in the OPEN track and a hybrid multiple translation engine in the CSTAR track. We also ...
We present a novel stochastic model of singing voice fundamental frequency (F0) contours for characterizing expressive dynamic components, such as vibrato and portamento. Although dynamic components can be important features for any singing voice application, modeling and extracting these components from a raw F0 contour have yet to be accomplished. ...
This paper presents a new class of tensor factorization called positive semidefinite tensor factorization (PSDTF) that decomposes a set of positive semidefinite (PSD) matrices into convex combinations of fewer PSD basis matrices. PSDTF can be viewed as a natural extension of nonnegative matrix factorization. One of the main problems of PSDTF is that ...
The aim of this work is to apply a sampling approach to speech modeling and to propose a Gibbs-sampling-based Multi-scale Mixture Model (M3). The proposed approach focuses on the multi-scale property of speech dynamics, i.e., dynamics in speech can be observed on, for instance, short-time acoustical, linguistic-segmental, and utterance-wise temporal scales. ...
We present a novel statistical model for the dynamics of various singing behaviors, such as vibrato and overshoot, in a fundamental frequency (F0) contour. These dynamics are important cues for perceiving the individuality of a singer, and can be a useful measure for various applications, such as singing skill evaluation and singing voice synthesis. While most ...
This paper presents a new fundamental technique for source separation of single-channel audio signals. Although non-negative matrix factorization (NMF) has recently become very popular for music source separation, it deals only with the amplitude or power of the spectrogram of a given mixture signal and completely discards the phase. The component ...
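The point about NMF discarding phase can be illustrated with a small sketch. This is not the method of the paper above; it is a generic NMF with the standard multiplicative updates for the Euclidean cost, applied to a random toy "spectrogram". The function name `nmf`, the rank, the iteration count, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factorize a nonnegative matrix V (freq x frames) as W @ H.

    Uses multiplicative updates for the squared Euclidean cost.
    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps   # nonnegative spectral bases
    H = rng.random((rank, n_frames)) + eps  # nonnegative activations
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy complex "spectrogram" (assumed data, not a real audio signal).
rng = np.random.default_rng(1)
X = rng.standard_normal((16, 40)) + 1j * rng.standard_normal((16, 40))

# NMF only ever sees the magnitude; the phase np.angle(X) is dropped here.
V = np.abs(X)
W, H = nmf(V, rank=4)

# A copy of X with a different phase has the same magnitude,
# so NMF cannot tell the two signals apart.
X_rotated = X * np.exp(1j * 0.7)
same_input = np.allclose(np.abs(X_rotated), V)
```

Because `V` is identical for `X` and `X_rotated`, any factorization of `V` is blind to the phase difference between them, which is the limitation the abstract refers to.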
Long-distance language modeling is important not only in speech recognition and machine translation, but also in high-dimensional discrete sequence modeling in general. However, the problem of context length has been largely neglected so far, and a naïve bag-of-words history has been employed in natural language processing. In contrast, in this paper we view ...