Learning the Semantics of Audio Signals


This paper proposes a tempo feature extraction method. Tempo information is modeled by the narrow-band, low-pass temporal modulation component, which is decomposed into a modulation spectrum via joint frequency analysis. In the implementation, the modulation spectrum is estimated directly from the modified discrete cosine transform (MDCT) coefficients, which are the output of a partial MP3 (MPEG-1 Layer 3) decoder. Log-scale modulation frequency coefficients are then extracted from the amplitude of the modulation spectrum. The tempo feature is applied to automatic music emotion classification, and classification accuracy is further improved by several hybrid classification methods based on posterior fusion. The experimental results confirm the effectiveness of the presented tempo feature and of the hybrid classification approach.
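To make the feature pipeline described above more concrete, the following is a minimal sketch, not the authors' implementation. It assumes the per-frame subband magnitudes (for example, absolute MDCT coefficients from a partial decoder) are already available as a list of frames, and it approximates the "log-scale modulation frequency coefficients" by evaluating the amplitude of the modulation spectrum at logarithmically spaced modulation frequencies. The function name `log_modulation_coeffs`, its parameters, and the default frequency range of 0.5 to 8 Hz are all illustrative assumptions.

```python
import cmath
import math


def log_modulation_coeffs(envelopes, frame_rate, n_coeffs=9, f_min=0.5, f_max=8.0):
    """Sketch of a log-scale modulation feature (hypothetical helper).

    envelopes  : list of frames, each a list of non-negative subband magnitudes
                 (assumed stand-in for |MDCT| coefficients per frame).
    frame_rate : frames per second of the envelope sequence.
    Returns (freqs, coeffs): the log-spaced modulation frequencies in Hz and
    the log amplitude of the modulation spectrum at each of them.
    """
    n_frames = len(envelopes)
    n_sub = len(envelopes[0])

    # Remove the per-subband mean so the DC component does not dominate.
    means = [sum(frame[b] for frame in envelopes) / n_frames for b in range(n_sub)]

    # Logarithmically spaced modulation frequencies between f_min and f_max.
    ratio = (f_max / f_min) ** (1.0 / (n_coeffs - 1))
    freqs = [f_min * ratio ** i for i in range(n_coeffs)]

    coeffs = []
    for f in freqs:
        amp = 0.0
        for b in range(n_sub):
            # Fourier sum of the subband envelope at modulation frequency f.
            acc = sum(
                (envelopes[t][b] - means[b])
                * cmath.exp(-2j * math.pi * f * t / frame_rate)
                for t in range(n_frames)
            )
            amp += abs(acc) / n_frames
        # Average over subbands, then take the log (small offset avoids log(0)).
        coeffs.append(math.log(amp / n_sub + 1e-10))
    return freqs, coeffs
```

For an envelope that pulses at 2 Hz (120 beats per minute), the largest coefficient is expected at the modulation frequency closest to 2 Hz. The resulting vector could then be passed to a classifier as the tempo feature.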

Cite this paper

@inproceedings{Shi2006LearningTS,
  title={Learning the Semantics of Audio Signals},
  author={Yuan-Yuan Shi and Xuan Zhu and Hyoung-Gook Kim and Ki-Wan Eom and Ji-yeun Kim},
  year={2006}
}