• Corpus ID: 42192857


  Authors: Vikaskumar Ghodasara Daimi, Syed Naser, Shefali Waldekar, Goutam Saha
Classifying an audio stream as either speech or music is receiving widespread attention due to its varied applications. In this paper, we propose a novel block-based mel-frequency cepstral coefficient (MFCC) feature extraction method for music and speech classification. We found that the proposed features give better classification accuracy than conventional MFCC features and zero-crossing rate (ZCR) features. Here, we use a support vector machine (SVM) classifier with 3-fold cross… 
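
For readers unfamiliar with the ZCR baseline named in the abstract, the following is a minimal sketch of a frame-wise zero-crossing rate in plain Python. It is not the authors' implementation: the function names, the frame length, and the hop size are illustrative assumptions, and the abstract does not say how the paper frames the signal.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ.

    Zero is treated as non-negative (an assumption of this sketch).
    """
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1
        for prev, cur in zip(frame, frame[1:])
        if (prev >= 0) != (cur >= 0)
    )
    return crossings / (len(frame) - 1)


def frame_zcr(signal, frame_len, hop):
    """ZCR of every full frame; frame_len and hop are in samples (hypothetical values)."""
    return [
        zero_crossing_rate(signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]
```

Speech tends to alternate between voiced segments with low ZCR and unvoiced segments with high ZCR, which is why the statistic is a common baseline feature for speech/music discrimination.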


Wavelet Transform Based Mel-scaled Features for Acoustic Scene Classification

This paper attempts acoustic scene classification (ASC) through a novel use of wavelet transform based mel-scaled features; the proposed features are shown to possess better discriminative properties than other spectral features within a similar classification framework.

Analysis and classification of acoustic scenes with wavelet transform-based mel-scaled features

This paper classifies acoustic scenes through a novel use of wavelet-based mel-scaled features, outperforming two benchmark systems: one based on mel-frequency cepstral coefficients and Gaussian mixture models, the other on log mel-band energies and a multi-layer perceptron.

Audio indexing using feature warping and fusion techniques

This paper reports on the improvement of speech and music indexation performance under various noisy conditions for radio broadcast using warped features fused with traditional features at the output

A comparison of features for speech, music discrimination

This paper examines the discrimination achieved by four types of feature (amplitude, cepstra, pitch, and zero-crossings) using common training and test sets and the same classifier.

Automatic transcription of general audio data: preliminary analyses

  • M. Spina, V. Zue
  • Computer Science
    Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96
  • 1996
Preliminary analyses and experiments on data collected from a radio news program found that relatively straightforward acoustic measurements and classification techniques achieved better than 80% classification accuracy for seven salient sound classes present in the data, and nearly 94% classification accuracy for a speech/non-speech decision.

Real-time discrimination of broadcast speech/music

  • J. Saunders
  • Computer Science
    1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings
  • 1996
This paper describes a technique that successfully discriminates speech from music on broadcast FM radio; it robustly distinguishes the two classes and runs easily in real time.

Design, analysis and experimental evaluation of block based transformation in MFCC computation for speaker recognition. Speech Communication

  • 2012
