Speech Emotion Recognition Based on Deep Belief Networks and Wavelet Packet Cepstral Coefficients


This paper proposes a speech signal processing method that combines wavelet-packet-based adaptive filter-bank construction with Deep Belief Network (DBN) feature learning. On this basis, a set of acoustic features, Coiflet Wavelet Packet Cepstral Coefficients (CWPCC), is extracted for speech emotion recognition. CWPCC extends the conventional Mel-Frequency Cepstral Coefficients (MFCC) by adapting the filter-bank structure to the decision task. DBNs are artificial neural networks with more than one hidden layer, which are first pre-trained layer by layer and then fine-tuned with the back-propagation algorithm. Well-trained deep neural networks can model complex, non-linear features of the input training data and can better predict the probability distribution over classification labels. The speech emotion recognition system is built from the CWPCC feature set, the DBN feature learning structure, and a Support Vector Machine classifier. Experimental results on the Berlin emotional speech database show that Coiflet Wavelet Packet features are better suited to speech emotion recognition than other acoustic features, and that the proposed DBN feature learning structure combined with CWPCC improves emotion recognition performance over the conventional emotion recognition method.
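The abstract does not give the CWPCC computation itself, so the following is only a minimal sketch of the general idea of wavelet packet cepstral coefficients: replace the mel filter bank of MFCC with wavelet packet sub-bands, take the log energy of each sub-band, and decorrelate with a DCT. It rests on several assumptions that are not from the paper: a Haar filter stands in for the Coiflet filter to keep the code dependency-free, the full packet tree is used instead of a task-adapted one, and the function names, `levels`, `n_coeffs`, and `eps` are hypothetical.

```python
import math

# Haar analysis filter gain. Haar is only a dependency-free stand-in
# (assumption); the abstract names Coiflet wavelets.
_S = 1.0 / math.sqrt(2.0)


def haar_split(x):
    """One filter-and-downsample step: returns (approximation, detail)."""
    low = [(x[i] + x[i + 1]) * _S for i in range(0, len(x) - 1, 2)]
    high = [(x[i] - x[i + 1]) * _S for i in range(0, len(x) - 1, 2)]
    return low, high


def wavelet_packet_bands(frame, levels):
    """Full wavelet packet tree: every node is split at every level.

    Assumption: the paper adapts the tree to the decision task; this
    sketch keeps the complete tree. Bands are returned in natural
    (not frequency-sorted) order.
    """
    if len(frame) < 2 ** levels:
        raise ValueError("frame is too short for the requested depth")
    bands = [list(frame)]
    for _ in range(levels):
        next_bands = []
        for band in bands:
            low, high = haar_split(band)
            next_bands.extend([low, high])
        bands = next_bands
    return bands  # 2**levels sub-bands


def wp_cepstral_coefficients(frame, levels=3, n_coeffs=6, eps=1e-10):
    """Log sub-band energies followed by an unnormalized DCT-II."""
    bands = wavelet_packet_bands(frame, levels)
    # Mean energy per sub-band; eps avoids log(0) on silent bands.
    log_e = [math.log(sum(v * v for v in b) / len(b) + eps) for b in bands]
    n = len(log_e)
    return [
        sum(log_e[m] * math.cos(math.pi * k * (m + 0.5) / n) for m in range(n))
        for k in range(n_coeffs)
    ]
```

Calling `wp_cepstral_coefficients(frame)` on one windowed speech frame yields a short feature vector; in the pipeline the abstract describes, such per-frame features would then feed the DBN feature learner and the Support Vector Machine classifier.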

Cite this paper

@inproceedings{Huang2016SpeechER, title={Speech Emotion Recognition Based on Deep Belief Networks and Wavelet Packet Cepstral Coefficients}, author={Yongming Huang and Ao Wu and Guobao Zhang and Yue Li}, year={2016} }