Automatic music type classification is very helpful for the management of digital music databases. In this paper, an Octave-based Spectral Contrast feature is proposed to represent the spectral characteristics of a music clip. It captures the relative spectral distribution rather than the average spectral envelope. Experiments showed that Octave-based Spectral …
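As a rough illustration of how such a feature can be extracted, the sketch below uses librosa's octave-based spectral contrast implementation; the file name, sampling rate, number of bands, and clip-level pooling are assumptions for the example, not the configuration from the paper.

```python
# Minimal sketch: octave-based spectral contrast for a music clip.
# "clip.wav" is a hypothetical file path used only for illustration.
import librosa
import numpy as np

y, sr = librosa.load("clip.wav", sr=22050)

# Per-frame difference between spectral peaks and valleys in each
# octave-scaled sub-band, capturing the relative spectral distribution
# rather than the average spectral envelope.
contrast = librosa.feature.spectral_contrast(y=y, sr=sr, n_bands=6)

# Summarize the clip by mean and standard deviation of each band over time.
clip_feature = np.concatenate([contrast.mean(axis=1), contrast.std(axis=1)])
print(clip_feature.shape)  # (14,) -> (n_bands + 1) * 2 statistics
```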
Emotion is an important element in expressive speech synthesis. Unlike traditional discrete emotion simulations, this paper attempts to synthesize emotional speech using "strong", "medium", and "weak" intensity classifications. It tests several models: a linear modification model (LMM), a Gaussian mixture model (GMM), and a classification and regression …
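To make the idea of a linear modification model concrete, here is a minimal sketch that scales neutral prosodic parameters toward an emotional target by intensity level. The intensity gains and target ratios are illustrative placeholders, not values or formulas taken from the paper.

```python
# Minimal sketch of a linear modification model (LMM) over prosody.
# All numeric values below are assumed for illustration only.
import numpy as np

INTENSITY_GAIN = {"weak": 0.3, "medium": 0.6, "strong": 1.0}  # assumed levels

def linear_modify(f0, duration, target_f0_ratio=1.2, target_dur_ratio=0.9,
                  intensity="medium"):
    """Interpolate linearly between neutral prosody and an emotional target."""
    g = INTENSITY_GAIN[intensity]
    f0_ratio = 1.0 + g * (target_f0_ratio - 1.0)
    dur_ratio = 1.0 + g * (target_dur_ratio - 1.0)
    return f0 * f0_ratio, duration * dur_ratio

f0 = np.array([200.0, 220.0, 210.0])   # Hz, per-syllable mean F0 (toy data)
dur = np.array([0.18, 0.22, 0.20])     # seconds (toy data)
print(linear_modify(f0, dur, intensity="strong"))
```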
Human emotion is a temporally dynamic event which can be inferred from both audio and video feature sequences. In this paper we investigate a long short-term memory recurrent neural network (LSTM-RNN) based encoding method for categorical emotion recognition in video. The LSTM-RNN is able to incorporate knowledge about how emotion evolves over long ranges …
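A minimal PyTorch sketch of this kind of encoding is shown below: an LSTM reads a per-frame feature sequence and the final hidden state is mapped to emotion categories. The feature dimension, hidden size, and number of classes are assumptions for the example, not the paper's configuration.

```python
# Minimal sketch: LSTM encoding of a frame-feature sequence into emotion logits.
import torch
import torch.nn as nn

class EmotionLSTM(nn.Module):
    def __init__(self, feat_dim=128, hidden_dim=64, n_classes=7):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, n_classes)

    def forward(self, x):                 # x: (batch, time, feat_dim)
        _, (h_n, _) = self.lstm(x)        # h_n: (1, batch, hidden_dim)
        return self.classifier(h_n[-1])   # (batch, n_classes) logits

model = EmotionLSTM()
frames = torch.randn(2, 50, 128)          # 2 clips, 50 frames of features each
print(model(frames).shape)                # torch.Size([2, 7])
```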
Twelve sediment cores were collected in July 2007 in open waters of western Bohai Bay, the Port of Tianjin, and the adjacent estuaries of the Haihe and Yongding Rivers. While overall concentrations of trace metals at incremental depths in these cores met China's Marine Sediment Quality criteria (GB 18668-2002), the magnitude of both metal enrichment …
This paper focuses on two key problems for audio-visual emotion recognition in video. One is the temporal alignment of the audio and visual streams for feature-level fusion. The other is locating and re-weighting perceptual attention across the whole audio-visual stream for better recognition. The long short-term memory recurrent neural network (LSTM-RNN) …
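The sketch below illustrates one generic way to combine these two ideas: time-aligned audio and visual features are concatenated frame by frame (feature-level fusion), and a soft temporal attention re-weights the LSTM outputs before classification. This is a standard attention formulation chosen for illustration, not necessarily the mechanism used in the paper; all dimensions are assumptions.

```python
# Minimal sketch: frame-level audio-visual fusion with temporal attention pooling.
import torch
import torch.nn as nn

class FusionAttentionLSTM(nn.Module):
    def __init__(self, audio_dim=40, visual_dim=128, hidden_dim=64, n_classes=7):
        super().__init__()
        self.lstm = nn.LSTM(audio_dim + visual_dim, hidden_dim, batch_first=True)
        self.att = nn.Linear(hidden_dim, 1)         # one attention score per frame
        self.classifier = nn.Linear(hidden_dim, n_classes)

    def forward(self, audio, visual):               # both: (batch, time, dim)
        fused = torch.cat([audio, visual], dim=-1)  # feature-level fusion
        h, _ = self.lstm(fused)                     # (batch, time, hidden_dim)
        w = torch.softmax(self.att(h), dim=1)       # (batch, time, 1) weights
        pooled = (w * h).sum(dim=1)                 # attention-re-weighted summary
        return self.classifier(pooled)

model = FusionAttentionLSTM()
out = model(torch.randn(2, 100, 40), torch.randn(2, 100, 128))
print(out.shape)                                    # torch.Size([2, 7])
```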
Affective computing is currently one of the most active research topics and is attracting increasingly intensive attention. This strong interest is driven by a wide spectrum of promising applications in many areas such as virtual reality, smart surveillance, and perceptual interfaces. Affective computing involves a multidisciplinary knowledge background …
This paper presents our contribution to the Audio/Visual + Emotion Challenge (AV+EC 2015), whose goal is to predict continuous values of the emotion dimensions arousal and valence from audio, visual, and physiological modalities. The state-of-the-art classifier for dimensional recognition, the long short-term memory recurrent neural network (LSTM-RNN), is utilized. Except …
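For dimensional recognition of this kind, a frame-wise regression head on top of an LSTM is a common setup; the sketch below shows that pattern with arousal and valence predicted at every time step. The input size and network depth are illustrative assumptions, not the challenge-submission configuration.

```python
# Minimal sketch: frame-wise regression of arousal and valence with an LSTM.
import torch
import torch.nn as nn

class DimensionalLSTM(nn.Module):
    def __init__(self, feat_dim=100, hidden_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden_dim, 2)   # [arousal, valence] per frame

    def forward(self, x):                      # x: (batch, time, feat_dim)
        h, _ = self.lstm(x)
        return self.head(h)                    # (batch, time, 2)

model = DimensionalLSTM()
pred = model(torch.randn(1, 300, 100))         # one 300-frame sequence
print(pred.shape)                              # torch.Size([1, 300, 2])
```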
Understanding nonverbal behaviors in human-machine interaction is a complex and challenging task. One of the key aspects is to recognize human emotional states accurately. This paper presents our contribution to the Audio/Visual Emotion Challenge (AVEC'14), whose goal is to predict the continuous values of the emotion dimensions arousal, valence, and dominance at each …
This paper introduces the CASIA audio emotion recognition method for the audio sub-challenge of the Audio/Visual Emotion Challenge 2011 (AVEC 2011). Two popular pattern recognition techniques, SVM and AdaBoost, are adopted to solve the emotion recognition problem. The feature set is also briefly investigated by comparing the performance of classifiers built on the …
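A comparison of this kind can be set up concisely with scikit-learn; the sketch below cross-validates an SVM and an AdaBoost classifier on clip-level feature vectors. The random data, feature dimensionality, and number of classes are stand-ins for the real challenge features and labels.

```python
# Minimal sketch: comparing SVM and AdaBoost on pre-extracted audio features.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 60))          # 200 clips x 60-dim feature vectors (toy)
y = rng.integers(0, 4, size=200)        # 4 emotion classes (toy labels)

svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
ada = AdaBoostClassifier(n_estimators=100)

for name, clf in [("SVM", svm), ("AdaBoost", ada)]:
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```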
This paper proposes a method to generate natural prosody parameters in a Chinese-English mixed-language speech synthesis system built from separate Chinese and English corpora and a small bilingual corpus. Prosodic assimilation of English words to Chinese contexts can be observed in the bilingual corpus. The most obvious assimilation characteristics …