• Corpus ID: 18224226

Feature selection for multimodal: acoustic event detection

  • Taras Butko
  • Published 8 July 2011
  • Computer Science
The detection of the Acoustic Events (AEs) naturally produced in a meeting room may help to describe the human and social activity. […] Two basic detection approaches are investigated in this work: a joint segmentation and classification using Hidden Markov Models (HMMs) with Gaussian Mixture Densities (GMMs), and a detection-by-classification approach using discriminative Support Vector Machines (SVMs). For the first case, a fast one-pass-training feature selection algorithm is developed in this…
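As a rough illustration of the detection-by-classification idea named in the abstract, the sketch below assumes a classifier (an SVM, for instance) has already assigned one label to each short analysis frame, and shows only the step that turns those labels into events with onset and offset times. The function name `frames_to_events`, the `hop_seconds` parameter, and the `"silence"` background label are hypothetical and not taken from the thesis.

```python
from itertools import groupby


def frames_to_events(frame_labels, hop_seconds, background="silence"):
    """Merge runs of identical per-frame labels into (onset, offset, label) events.

    frame_labels: one class label per analysis frame, assumed to come from
                  some frame-level classifier (not implemented here).
    hop_seconds:  assumed time step between consecutive frames.
    background:   label treated as "no event" and skipped.
    """
    events = []
    index = 0  # index of the first frame in the current run
    for label, run in groupby(frame_labels):
        length = len(list(run))
        if label != background:
            onset = index * hop_seconds
            offset = (index + length) * hop_seconds
            events.append((onset, offset, label))
        index += length
    return events


# Hypothetical frame-level decisions, one per 0.5 s frame.
labels = ["silence", "clap", "clap", "silence", "steps", "steps", "steps"]
events = frames_to_events(labels, hop_seconds=0.5)
```

Here `events` holds one tuple per detected event; real systems typically also smooth the frame labels before merging, which this sketch leaves out.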

Detection of Acoustic Events by using MFCC and Spectro-Temporal Gabor Filterbank Features

This work uses the DCASE dataset, published for an international IEEE AASP challenge on Acoustic Event Detection, which includes "office live" recordings made in an office environment, and proposes using Gabor filterbank features in addition to MFCCs.

Sound event recognition in unstructured environments using spectrogram image processing

The approach taken is to interpret the sound event as a two-dimensional spectrogram image, with time and frequency as the two axes, which enables novel methods for SER based on spectrogram image processing, inspired by techniques from the field of image processing.

Random Regression Forests for Acoustic Event Detection and Classification

This paper poses event detection and localization as a regression task and proposes an approach based on random forest regression, which shows superior performance on two databases, ITC-Irst and UPC-TALP.

Acoustic event detection and localization using distributed microphone arrays

This thesis focuses on both acoustic event detection (AED) and acoustic source localization (ASL), when several sources may be simultaneously present in a room, and shows the advantage of carrying out the two tasks, recognition and localization, with a single system.

Learning Representations for Nonspeech Audio Events Through Their Similarities to Speech Patterns

This work considers speech patterns as basic acoustic concepts, which embody and represent the target nonspeech signal, and proposes an algorithm to select a sufficient subset, which provides an approximate representation capability of the entire set of available speech patterns.

Automatic analysis of the acoustic environment of a preterm infant in a neonatal intensive care unit

The goal is to develop robust systems able to detect and identify the sounds that appear in such an environment; this work targets the detection of the two most relevant types of sounds: equipment alarms and vocalizations.

Study of sound detection in a mobile device environment (Estudi de detecció de sons en un entorn de dispositiu mòbil)

This project aims to study enhanced algorithms for sound signals by analysing three well-known algorithms, DTW, GMM and HMM, when applied to various types of sounds with different features.
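Of the three algorithms named above, DTW (dynamic time warping) is the simplest to show in a few lines. The following is a minimal textbook sketch for one-dimensional sequences with an absolute-difference local cost; it is not the project's implementation, and the function name `dtw_distance` is hypothetical.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences.

    Uses the standard recurrence with steps (i-1, j), (i, j-1), (i-1, j-1)
    and |a[i] - b[j]| as the local cost. Runs in O(len(a) * len(b)).
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b[j-1] is reused
                cost[i][j - 1],      # b advances, a[i-1] is reused
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# A time-stretched copy of a sequence aligns with zero cost.
same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
# A constant offset of 1 over two frames accumulates a cost of 2.
shifted = dtw_distance([0, 0], [1, 1])
```

For sound signals the inputs would normally be sequences of feature vectors rather than scalars, with a vector distance replacing the absolute difference.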


The crime rates in Mexico have been increasing in recent years; every day, there are reports on social media and in the news where assaults and verbal aggression by criminals can be seen. Public

Integration of audio technologies in smart environments (Integración de tecnologías de audio en entornos inteligentes)

The process of implementing speech technologies for smart rooms on a distributed computing platform.



Fusion of audio and video modalities for detection of acoustic events

This work aims at improving the AED accuracy by using two complementary audio-based AED systems, built with SVM and HMM classifiers, and also a video-based AED system, which employs the output of a 3D video tracking algorithm to improve detection of steps.

Acoustic Event Detection and Classification

The human activity that takes place in meeting rooms or classrooms is reflected in a rich variety of acoustic events (AE), produced either by the human body or by objects handled by humans, so the

Acoustic event detection in meeting-room environments

Inclusion of Video Information for Detection of Acoustic Events Using the Fuzzy Integral

Experimental results show that video information can be successfully used to improve the results of audio-based AED, and the fuzzy integral is used to fuse the outputs of the three detection systems.
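To make the fusion step concrete, here is a minimal sketch of one common form of fuzzy integral, the Choquet integral, applied to the scores of three detectors. The summary above does not say which variant the paper uses, so this is an assumption; the source names (`svm`, `hmm`, `video`), the weights, and the helper `additive_measure` are hypothetical.

```python
from itertools import combinations


def choquet_integral(scores, measure):
    """Choquet integral of per-source scores with respect to a fuzzy measure.

    scores:  dict mapping source name -> score (e.g. in [0, 1]).
    measure: dict mapping frozenset of source names -> importance of that
             coalition, with measure[all sources] == 1 and monotone in the set.
    """
    ordered = sorted(scores, key=scores.get)  # ascending by score
    total = 0.0
    previous = 0.0
    for i, source in enumerate(ordered):
        # Coalition of all sources whose score is at least the current one.
        coalition = frozenset(ordered[i:])
        total += (scores[source] - previous) * measure[coalition]
        previous = scores[source]
    return total


def additive_measure(weights):
    """Build an additive fuzzy measure from per-source weights (sum to 1).

    With an additive measure the Choquet integral reduces to a weighted mean;
    a non-additive measure can additionally model interaction between sources.
    """
    names = list(weights)
    measure = {}
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            measure[frozenset(subset)] = sum(weights[name] for name in subset)
    return measure


# Hypothetical detector outputs for one event class and hypothetical weights.
scores = {"svm": 0.2, "hmm": 0.9, "video": 0.4}
measure = additive_measure({"svm": 0.5, "hmm": 0.3, "video": 0.2})
fused = choquet_integral(scores, measure)
```

With this additive measure, `fused` matches the weighted mean of the three scores; the practical interest of the fuzzy integral lies in choosing a non-additive measure for the coalitions.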

Multimedia content analysis-using both audio and visual clues

This work describes audio and visual features that can effectively characterize scene content, presents selected algorithms for segmentation and classification, and reviews some testbed systems for video archiving and retrieval.

SOLAR: sound object localization and retrieval in complex audio environments

  • Derek Hoiem, Yan Ke, R. Sukthankar
  • Computer Science
    Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005.
  • 2005
The SOLAR system is presented, a system capable of finding sound objects in complex audio data extracted from movies, and employs boosted decision tree classifiers to select suitable features for modeling each sound object and to discriminate between the object of interest and all other sounds.

Environmental sound recognition using MP-based features

A novel matching-pursuit-based method for extracting features from environmental sounds that is flexible, yet intuitive and physically interpretable, and can be used to supplement another well-known audio feature, MFCC, to yield higher recognition accuracy for environmental sounds.

Content analysis for audio classification and segmentation

A robust approach that is capable of classifying and segmenting an audio stream into speech, music, environment sound, and silence is proposed, and an unsupervised speaker segmentation algorithm using a novel scheme based on quasi-GMM and LSP correlation analysis is developed.

Feature analysis and selection for acoustic event detection

This work proposes quantifying the discriminative capability of each feature component according to the approximated Bayesian accuracy and deriving a discriminative feature set for acoustic event detection.

Multimodal Semantic Analysis and Annotation for Basketball Video

A new multiple-modality method for extracting semantic information from basketball video is presented and the positions of potential semantic events, such as "foul" and "shot at the basket," are located with additional domain knowledge.