Cross-language speaker adaptation has many interesting applications, e.g., speech-to-speech translation. However, in cross-language speaker adaptation, a common phoneme set, assumed to be shared by different speakers of the same language, no longer exists. Instead, a nearest-neighbor-based phoneme mapping from one language to the other has been…
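A nearest-neighbor phoneme mapping can be sketched as follows. This is a minimal illustration, assuming each phoneme is summarized by a single feature vector (for example, a mean acoustic vector) and that similarity is measured by Euclidean distance; the excerpt does not say which representation or distance the authors use. The function name, phoneme labels, and vectors are hypothetical toy values.

```python
import math

def nearest_neighbor_mapping(source, target):
    """Map each source-language phoneme to its closest target-language phoneme.

    source, target: dict of phoneme label -> feature vector (list of floats).
    Euclidean distance is an assumption made for this sketch.
    """
    mapping = {}
    for src_ph, src_vec in source.items():
        # Pick the target phoneme whose vector is nearest to this source vector.
        mapping[src_ph] = min(
            target, key=lambda tgt_ph: math.dist(src_vec, target[tgt_ph])
        )
    return mapping

# Hypothetical 2-D "feature vectors" for two Mandarin (pinyin) initials and
# three English (ARPAbet-style) phonemes; real systems use model statistics.
mandarin = {"zh": [1.0, 2.0], "x": [4.0, 0.5]}
english = {"jh": [1.2, 2.1], "sh": [3.8, 0.7], "s": [6.0, 0.0]}
mapping = nearest_neighbor_mapping(mandarin, english)  # → {"zh": "jh", "x": "sh"}
```

Because each source phoneme is mapped independently, several source phonemes may end up sharing the same target phoneme.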
In this paper, we present our Hidden Markov Model (HMM)-based Mandarin Chinese Text-to-Speech (TTS) system. Mandarin Chinese, or Putonghua ("the common spoken language"), is a tone language in which each of the 400-plus base syllables can carry up to 5 different lexical tone patterns. Their segmental and supra-segmental information is first modeled by 3…
In this paper, we explore how to construct stylistic TTS databases from audio books, in which a storyteller performs multiple roles. The goal is to identify and build a set of speech corpora, each of which not only portrays a representative voice style performed by the speaker but also contains enough sentences to synthesize natural speech using unit…
This paper proposes a hierarchical framework, consisting of three layers of classifiers, for automatic stress detection in English speech utterances. The top two layers are a linguistic classifier, which assigns stressed labels to all content words and unstressed labels to all function words, and an acoustic classifier, which assigns stressed and…
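The linguistic layer is rule-like and can be sketched directly. The snippet below covers only that layer, since the acoustic layer and the remaining layer are not described in the excerpt. It assumes a hypothetical, deliberately tiny `FUNCTION_WORDS` set; a real system would rely on a complete closed-class lexicon or a part-of-speech tagger.

```python
# Hypothetical, deliberately small function-word list (illustration only).
FUNCTION_WORDS = {
    "a", "an", "the", "of", "to", "in", "on", "at", "and", "or",
    "but", "is", "are", "was", "were", "it", "for", "with", "by",
}

def linguistic_stress_labels(words):
    """Linguistic classifier: content words -> 'stressed',
    function words -> 'unstressed'. Acoustics are ignored at this layer."""
    return [
        (word, "unstressed" if word.lower() in FUNCTION_WORDS else "stressed")
        for word in words
    ]

labels = linguistic_stress_labels("The cat sat on the mat".split())
```

Labels from this layer are purely lexical, so any word-level stress variation in the actual utterance has to be picked up by the acoustic classifier that follows it.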
This paper proposes a new approach for measuring the target cost in unit selection, in which the difference between the target and candidate units is estimated by the Kullback-Leibler divergence (KLD) between their context-dependent Hidden Markov Models (HMMs). To model the left/right phonetic context, biphone models are generated by merging regular…
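The KLD between two HMMs has no closed form in general, so some approximation is needed. Under a simplifying assumption that is not necessarily the paper's formulation, two left-to-right HMMs with the same number of states and one diagonal Gaussian per state can be compared by summing state-wise Gaussian KLDs while ignoring transition probabilities. The sketch below uses the closed-form KLD between diagonal Gaussians and symmetrizes it so the cost does not depend on argument order; the function names and the list-of-(mean, variance) HMM representation are hypothetical.

```python
import math

def kld_diag_gaussian(mu0, var0, mu1, var1):
    """Closed-form KL(N0 || N1) for diagonal-covariance Gaussians."""
    return 0.5 * sum(
        v0 / v1 + (m1 - m0) ** 2 / v1 - 1.0 + math.log(v1 / v0)
        for m0, v0, m1, v1 in zip(mu0, var0, mu1, var1)
    )

def symmetric_kld(mu0, var0, mu1, var1):
    """KL(N0 || N1) + KL(N1 || N0), so the cost is order-independent."""
    return kld_diag_gaussian(mu0, var0, mu1, var1) + kld_diag_gaussian(mu1, var1, mu0, var0)

def hmm_target_cost(target_hmm, candidate_hmm):
    """Approximate HMM-to-HMM divergence as the sum of symmetric KLDs between
    aligned states (one Gaussian per state; transitions ignored).

    Each HMM is a list of (mean_vector, variance_vector) pairs, one per state.
    """
    if len(target_hmm) != len(candidate_hmm):
        raise ValueError("HMMs must have the same number of states")
    return sum(
        symmetric_kld(t_mu, t_var, c_mu, c_var)
        for (t_mu, t_var), (c_mu, c_var) in zip(target_hmm, candidate_hmm)
    )

# Toy 3-state, 1-D models that differ only in the middle state's mean.
target = [([0.0], [1.0])] * 3
candidate = [([0.0], [1.0]), ([1.0], [1.0]), ([0.0], [1.0])]
cost = hmm_target_cost(target, candidate)  # → 1.0
```

Because the KLD can be computed from model parameters alone, costs of this kind could in principle be precomputed between context-dependent models rather than evaluated per utterance.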
Over the last decade, the Internet has become one of the most important means of communication in all social areas. The success of Web technology adoption in the private sector has put pressure on the public sector to adopt the Internet to present information and service resources. The concept of creating more efficient and convenient interaction…
Identifying the language origin of a name in English is important for generating its correct pronunciation. In this paper, N-grams of syllable-based letter clusters are proposed for the task. The performance of an N-gram model over a set of frequently used letter clusters (corresponding to syllables) is compared to that of a letter N-gram model in a four-language…
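In a simple setup, the letter N-gram baseline and a cluster-based variant can share one model and differ only in how a name is tokenized. The sketch below is a minimal illustration, assuming a bigram model with add-one smoothing and classification by highest log-likelihood. `vowel_consonant_clusters` is a hypothetical, crude stand-in (maximal vowel or consonant runs), not the paper's set of frequently used syllable-based clusters, and the two-language toy training names are invented for illustration.

```python
import math
import re
from collections import Counter

def tokens_to_ngrams(tokens, n):
    """Pad a token sequence with boundary markers and return its n-grams."""
    padded = ["<s>"] * (n - 1) + list(tokens) + ["</s>"]
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]

class NGramModel:
    """N-gram model over arbitrary tokens with add-one (Laplace) smoothing."""

    def __init__(self, n=2):
        self.n = n
        self.ngram_counts = Counter()
        self.context_counts = Counter()
        self.vocab = {"</s>"}

    def train(self, token_seqs):
        for tokens in token_seqs:
            self.vocab.update(tokens)
            for gram in tokens_to_ngrams(tokens, self.n):
                self.ngram_counts[gram] += 1
                self.context_counts[gram[:-1]] += 1
        return self

    def log_prob(self, tokens):
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen tokens
        return sum(
            math.log((self.ngram_counts[g] + 1) / (self.context_counts[g[:-1]] + v))
            for g in tokens_to_ngrams(tokens, self.n)
        )

def letter_tokens(name):
    return list(name.lower())

def vowel_consonant_clusters(name):
    # Hypothetical stand-in for letter clusters: maximal vowel / consonant runs.
    return re.findall(r"[aeiou]+|[^aeiou]+", name.lower())

def classify(name, models, tokenize):
    """Return the language whose model gives the name the highest log-likelihood."""
    tokens = tokenize(name)
    return max(models, key=lambda lang: models[lang].log_prob(tokens))

# Toy training names (illustrative only, not the paper's data).
TRAIN = {
    "italian": ["rossi", "bianchi", "esposito", "romano", "colombo"],
    "japanese": ["tanaka", "suzuki", "takahashi", "watanabe", "yamamoto"],
}
letter_models = {
    lang: NGramModel(n=2).train([letter_tokens(w) for w in names])
    for lang, names in TRAIN.items()
}
```

Training and classifying with `vowel_consonant_clusters` instead of `letter_tokens` switches the modeling unit from letters to clusters; with real data, the choice of N and of smoothing method would also matter.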