Automatic prosodic event detection is important for both speech understanding and natural speech synthesis, since prosody provides information beyond the short-term segmental features and lexical representation of an utterance. Similar to previous work, this paper focuses on the automatic detection of a coarse-level representation of pitch accents, …
Accurately sensing a user’s interest in spoken dialog plays a significant role in many applications, such as tutoring systems and customer service systems. In addition to the widely used acoustic evidence, we introduce different lexical features for interest-level prediction and evaluate the impact of automatic speech recognition (ASR) on the effectiveness …
Most previous approaches to automatic prosodic event detection are based on supervised learning and rely on a corpus annotated with the prosodic labels of interest to train the classification models. However, creating such resources is expensive and time-consuming. In this paper, we exploit semi-supervised …
Emotion recognition from speech plays an important role in developing affective and intelligent systems. This study investigates sentence-level emotion recognition. We propose a two-step approach that leverages information from subsentence segments for sentence-level decisions. First, we use a segment-level emotion classifier to generate predictions for …
Phonetic transcriptions are often manually encoded in a pronunciation lexicon. This process is time-consuming and requires linguistic expertise, and it is very difficult to keep the result consistent. To address these problems, we present a model that produces Korean pronunciation variants based on morphophonological analysis. By analyzing phonological …
The aim of this study is to investigate the effect of cross-lingual data on human perception and automatic classification of emotion from speech. We use four databases covering three languages (English, Chinese, and German) and two types of speech (acted and improvised). For automatic classification, there is a significant degradation using cross-corpus than …