Emphasis plays an important role in expressive speech synthesis by highlighting the focus of an utterance to draw the listener's attention. As only a few words in a sentence are emphasized, limited data is one of the most important problems for emphatic speech synthesis. In this paper, we analyze contrastive (neutral versus…
With the rapid development of wireless communication technology and the proliferation of mobile devices, mobile applications are increasingly widely used. Because existing mobile applications need long development cycles to support a variety of mobile devices, this paper puts forward an HTML5-based mobile middleware on…
This paper investigates the incorporation of hidden Markov model (HMM) based emphatic speech synthesis for audio exaggeration into an audio-visual speech synthesis framework for corrective feedback in computer-aided pronunciation training (CAPT). To improve the voice quality of the synthetic emphatic speech, this paper proposes a new method for HMM…
Speech is widely used to express one's emotion, intention, desire, etc. in social network communication, producing abundant internet speech data with different speaking styles. Such data provides a good resource for social multimedia research. However, because different styles are mixed together in internet speech data, how to classify such data…
Labeling emphatic words in speech recordings plays an important role in building a speech corpus for expressive speech synthesis. People generally pronounce some words more strongly than usual, making the speech more expressive and signaling the focus of the sentence. Contrastive word pairs are often pronounced with stronger prominence and their presence…
Speech is bimodal in nature. There are close correlations between acoustic speech signals and visual gestures such as lip movements, facial expressions and head motions. For a speech-driven talking avatar, how to derive more representative acoustic features from which to predict more accurate and realistic visual gestures still remains the research…