Cross-language speaker adaptation has many interesting applications, e.g. speech-to-speech translation. However, in cross-language speaker adaptation, the common phoneme set that different speakers of the same language are assumed to share no longer exists. Instead, a nearest-neighbor-based phoneme mapping from one language to the other has been…
In most state-of-the-art voice conversion systems, the speech quality of converted utterances is still unsatisfactory. In this paper, the STRAIGHT analysis-synthesis framework is used to improve the quality. Smoothed GMM and MAP adaptation are proposed for spectrum conversion to avoid the over-smoothing seen in the traditional GMM method. Since frames are…
In this paper we present our Hidden Markov Model (HMM)-based Mandarin Chinese Text-to-Speech (TTS) system. Mandarin Chinese, or Putonghua, "the common spoken language", is a tone language where each of the 400-plus base syllables can have up to 5 different lexical tone patterns. Their segmental and supra-segmental information is first modeled by 3…
Cullin-RING ligases (CRLs) represent the largest E3 ubiquitin ligase family in eukaryotes, and the identification of their substrates is critical to understanding regulation of the proteome. Using genetic and pharmacologic Cullin inactivation coupled with genetic (GPS) and proteomic (QUAINT) assays, we have identified hundreds of proteins whose stabilities…
In this paper, we explore how to construct stylistic TTS databases from audio books, in which a storyteller performs multiple roles. The goal is to identify and build a set of speech corpora, each of which not only portrays a representative voice style performed by the speaker, but also has sufficient sentences to synthesize natural speech using unit…
Over the last decade, the Internet has become one of the most important means of communication in all social areas. The success of Web technology adoption in the private sector has put pressure on the public sector to adopt the Internet to present information and service resources. The concept of creating more efficient and convenient interaction…
This paper proposes a hierarchical framework, which consists of three layers of classifiers, for automatic stress detection in English speech utterances. The top two layers are a linguistic classifier, which assigns stressed labels to all content words and unstressed labels to all function words, and an acoustic classifier, which assigns stressed and…
This paper proposes a new approach for measuring the target cost in unit selection, where the difference between the target and candidate units is estimated by the Kullback-Leibler Divergence (KLD) between their context-dependent Hidden Markov Models (HMMs). In order to model the left/right phonetic context, biphone models are generated by merging regular…
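As a rough illustration of a KLD-based target cost, the sketch below computes the closed-form KL divergence between two diagonal-covariance Gaussians and sums a symmetrized version over aligned HMM states. The single-Gaussian-per-state assumption, the one-to-one state alignment, the symmetrization, and all function names are assumptions made for this example; the abstract does not specify how the KLD between HMMs is computed.

```python
import math

def kld_diag_gaussian(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for two diagonal-covariance Gaussians (closed form)."""
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total

def symmetric_kld(mu_p, var_p, mu_q, var_q):
    """Symmetrized divergence: KL(p || q) + KL(q || p)."""
    return (kld_diag_gaussian(mu_p, var_p, mu_q, var_q)
            + kld_diag_gaussian(mu_q, var_q, mu_p, var_p))

def target_cost(target_states, candidate_states):
    """Sum the symmetric KLD over aligned states.

    Each state is a (means, variances) pair. Assumes (hypothetically) that
    both models have the same number of states and align one-to-one.
    """
    return sum(
        symmetric_kld(t_mu, t_var, c_mu, c_var)
        for (t_mu, t_var), (c_mu, c_var) in zip(target_states, candidate_states)
    )
```

Under these assumptions, identical models give a cost of zero, and the cost grows as the state distributions of the candidate unit drift away from those of the target.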
Identifying the language origin of a name in English is important for generating its correct pronunciation. In this paper, N-grams of syllable-based letter clusters are proposed for the task. The performance of the N-gram model of a set of frequently used letter clusters (corresponding to syllables) is compared to that of a letter N-gram model in a four-language…
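For orientation, a letter N-gram baseline of the kind the abstract compares against might look like the sketch below. It is not the proposed syllable-based letter-cluster model. The bigram order, add-one smoothing, boundary symbols, vocabulary size, and function names are all assumptions chosen for the example.

```python
import math
from collections import Counter

def train_letter_bigrams(names):
    """Count letter bigrams, with ^ and $ marking the name boundaries."""
    bigrams = Counter()
    contexts = Counter()
    for name in names:
        padded = "^" + name.lower() + "$"
        for a, b in zip(padded, padded[1:]):
            bigrams[(a, b)] += 1
            contexts[a] += 1
    return bigrams, contexts

def log_prob(name, model, vocab_size):
    """Log-probability of a name under an add-one-smoothed bigram model."""
    bigrams, contexts = model
    padded = "^" + name.lower() + "$"
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
        for a, b in zip(padded, padded[1:])
    )

def guess_origin(name, models, vocab_size=28):
    """Return the language whose model scores the name highest.

    vocab_size=28 assumes 26 letters plus the two boundary symbols.
    """
    return max(models, key=lambda lang: log_prob(name, models[lang], vocab_size))
```

Here `models` would map each language label to the output of `train_letter_bigrams` on a list of names of that origin; a cluster-based variant would swap the single-letter tokens for letter clusters.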