Corpus ID: 7441905


Authors: Jani Penttilä, Johannes Peltola, Tapio Seppänen
In recent years, content-based audio signal classification and retrieval has attracted growing interest among researchers around the world. This paper describes a technique for automatically discriminating between speech and music in audio signals. Our goal was to achieve reliable classification results using computationally inexpensive time-domain features. The classification results for lengthy real-world signals are presented as filtered time series that show the…


Rhythm detection for speech-music discrimination in MPEG compressed domain
A novel approach to speech-music discrimination based on rhythm (or beat) detection is introduced; it uses just three features computed from data taken directly from an MPEG-1 bitstream.
Detecting Semantic Concepts from Video Using Temporal Gradients and Audio Classification
New methods are described for detecting semantic concepts in digital video based on audible and visual content; the Temporal Gradient Correlogram captures temporal correlations of gradient edge directions from sampled shot frames.
Bayesian approach to sensor-based context awareness
Naive Bayesian networks were applied to classify the contexts of a mobile device user in her normal daily activities, using an extensive set of audio features derived partly from the algorithms of the upcoming MPEG-7 standard.
TRECVID 2003 Experiments at Media Team Oulu and VTT
The most recent version of VIRE contains an interactive cluster-temporal browser of video shots that exploits three semantic levels of similarity (visual, conceptual and lexical) and capitalises on late fusion of feature queries; it was evaluated in the manual search task.
A Survey on Human Activity Recognition using Wearable Sensors
The state of the art in human activity recognition (HAR) based on wearable sensors is surveyed, and a two-level taxonomy according to the learning approach and the response time is proposed.
Automatic Annotation of Daily Activity from Smartphone-Based Multisensory Streams
A flexible framework for incorporating heterogeneous sensory modalities combined with state-of-the-art classifiers for sequence labeling is presented, and the accuracy and efficiency of the proposed system for practical lifelogging applications are evaluated.
Candela-Storage, Analysis, and Retrieval of Video Content in Distributed Systems: Personal Mobile Multimedia Management
The CANDELA personal mobile multimedia management platform is presented, which implements an end-to-end system for personal video production, retrieval, and consumption utilizing mobile devices and distributed databases.
On the Automatic Recognition of Human Activities using Heterogeneous Wearable Sensors


Construction and evaluation of a robust multifeature speech/music discriminator
  • E. D. Scheirer, M. Slaney
  • Computer Science
  • 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing
  • 1997
A real-time computer system capable of distinguishing speech signals from music signals over a wide range of digital audio input is constructed and extensive data on system performance and the cross-validated training/test setup used to evaluate the system is provided.
Genre classification system of TV sound signals based on a spectrogram analysis
A genre classification system for TV sound signals is proposed to automatically provide a proper timbre to the listener. The classification accuracy for speech and music was 95%, and the accuracy for popular, jazz, and classical music was 75%, 30%, and 60%, respectively.
Detection of human speech in structured noise
  • J. Hoyt, H. Wechsler
  • Computer Science
  • Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing
  • 1994
This paper describes research to develop an efficient system that provides a binary decision as to the presence of speech in a short (one to three second) time sample of an acoustic signal. A method…
Real-time discrimination of broadcast speech/music
  • J. Saunders
  • Computer Science
  • 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings
  • 1996
A technique that successfully discriminates speech from music on broadcast FM radio is described; it robustly distinguishes the two classes and runs easily in real time.
A comparison of features for speech, music discrimination
This paper examines the discrimination achieved by several different features, using common training and test sets and the same classifier, on four types of feature: amplitude, cepstra, pitch and zero-crossings.
Content-Based Classification, Search, and Retrieval of Audio
The audio analysis, search, and classification engine described here reduces sounds to perceptual and acoustical features, which lets users search or retrieve sounds by any one feature or a combination of them, by specifying previously learned classes based on these features.
Automatic transcription of general audio data: preliminary analyses
  • M. S. Spina, V. Zue
  • Computer Science
  • Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96
  • 1996
Preliminary analyses and experiments conducted on data collected from a radio news program found that, using relatively straightforward acoustic measurements and classification techniques, it was possible to achieve better than 80% classification accuracy for seven salient sound classes present in the data, and nearly 94% classification accuracy for a speech/non-speech decision.
An overview of audio information retrieval
  • J. Foote
  • Computer Science
  • Multimedia Systems
  • 1999
The state of the art in audio information retrieval is reviewed, and recent advances in automatic speech recognition, word spotting, speaker and music identification, and audio similarity are presented with a view towards making audio less “opaque”.
Computational auditory scene analysis
A segregation system is presented that is consistent with psychological and physiological findings and performs significantly better than the frame-based segregation scheme described by Meddis and Hewitt (1992).
Spectral analysis and discrimination by zero-crossings
  • B. Kedem
  • Computer Science
  • Proceedings of the IEEE
  • 1986
The theme of this work is that higher order crossings analysis provides a useful descriptive as well as an analytical tool that can in many respects match spectral analysis.