This paper presents an eigenspace-based fast speaker adaptation approach which can improve the modeling accuracy of the conventional maximum likelihood linear regression (MLLR) techniques when only very limited adaptation data is available. The proposed eigenspace-based MLLR approach was developed by introducing a priori knowledge analysis on the training
The prosody of fluent connected speech is much more complicated than the concatenation of individual sentence intonations into strings. Prosody frameworks and modeling should be based on a better understanding of both the production and perception of fluent speech. We analyzed speech corpora of read Mandarin Chinese discourses from a top-down perspective on perceived units
In this paper, we investigate the problem of automatic singer identification, detection and tracking in popular music recordings with one or multiple singers. This problem reflects an important issue in multimedia applications that require the transcription and indexing of music data to meet the increasing demand for content-based information retrieval. The
In this paper, we propose three divide-and-conquer approaches for Bayesian information criterion (BIC)-based speaker segmentation. The approaches detect speaker changes by recursively partitioning a large analysis window into two sub-windows and recursively verifying the merging of two adjacent audio segments using ΔBIC, a widely-adopted distance
The MATBN Mandarin Chinese broadcast news corpus contains a total of 198 hours of broadcast news from the Public Television Service Foundation (Taiwan) with corresponding transcripts. The primary purpose of this collection is to provide training and testing data for continuous speech recognition evaluation in the broadcast news domain. In this paper, we
This paper presents an effective technique for automatically clustering undocumented music recordings based on their associated singer. This serves as an indispensable step towards indexing and content-based information retrieval of music by singer. The proposed clustering system operates in an unsupervised manner, in which no prior information is available
Retrieving audio material based on audio queries is an important and challenging issue in the research field of content-based access to popular music. As part of this research field, we present a preliminary investigation into retrieving cover versions of songs specified by users. The technique enables users to listen to songs with an identical tune, but
We propose a self-splitting Gaussian mixture learning (SGML) algorithm for Gaussian mixture modelling. The SGML algorithm is deterministic and is able to find an appropriate number of components of the Gaussian mixture model (GMM) based on a self-splitting validity measure, Bayesian information criterion (BIC). It starts with a single component in the