Ecological Acoustics Perspective for Content-Based Retrieval of Environmental Sounds

Gerard Roma, Jordi Janer, Stefan Kersten, Mattia Schirosa, Perfecto Herrera, Xavier Serra
EURASIP Journal on Audio, Speech, and Music Processing
In this paper we present a method to search for environmental sounds in large unstructured databases of user-submitted audio, using a general sound events taxonomy from ecological acoustics. We discuss the use of Support Vector Machines to classify sound recordings according to the taxonomy and describe two use cases for the obtained classification models: a content-based web search interface for a large audio database and a method for segmenting field recordings to assist sound design. 
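The classification step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it swaps in a linear SVM trained by plain subgradient descent on the hinge loss, combined one-vs-rest, and it runs on hand-made three-dimensional toy vectors rather than real audio descriptors. The class names ("solid", "liquid", "gas") and all function names are assumptions chosen for illustration only.

```python
# Toy one-vs-rest linear SVM. Everything here (features, class names,
# hyperparameters) is a hypothetical illustration, not the paper's setup.

def train_linear_svm(samples, labels, epochs=200, lr=0.1, lam=0.01):
    """Binary linear SVM (labels in {-1, +1}) trained by subgradient
    descent on the L2-regularised hinge loss."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            score = sum(wj * xj for wj, xj in zip(w, x)) + b
            # Regulariser step: shrink the weights slightly.
            w = [(1.0 - lr * lam) * wj for wj in w]
            # Hinge-loss step: only samples inside the margin contribute.
            if y * score < 1.0:
                w = [wj + lr * y * xj for wj, xj in zip(w, x)]
                b += lr * y
    return w, b


def train_one_vs_rest(samples, class_names):
    """Train one binary SVM per class (that class against all others)."""
    models = {}
    for name in sorted(set(class_names)):
        labels = [1 if c == name else -1 for c in class_names]
        models[name] = train_linear_svm(samples, labels)
    return models


def predict(models, x):
    """Return the class whose SVM gives the highest decision score."""
    def score(model):
        w, b = model
        return sum(wj * xj for wj, xj in zip(w, x)) + b
    return max(models, key=lambda name: score(models[name]))


# Hypothetical feature vectors: one dominant dimension per class.
train_x = [
    [0.9, 0.1, 0.0], [0.8, 0.1, 0.1], [1.0, 0.0, 0.1],
    [0.1, 0.9, 0.0], [0.0, 0.8, 0.2], [0.1, 1.0, 0.1],
    [0.0, 0.1, 0.9], [0.1, 0.0, 0.8], [0.1, 0.2, 1.0],
]
train_y = ["solid"] * 3 + ["liquid"] * 3 + ["gas"] * 3

models = train_one_vs_rest(train_x, train_y)
print(predict(models, [0.85, 0.1, 0.05]))
```

In practice the input vectors would be descriptors extracted from each recording, and a library SVM with a tuned kernel would normally replace the hand-written trainer.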
Active learning of custom sound taxonomies in unstructured audio data
A system for content-based retrieval of audio clips from a large unstructured database that allows users to devise their own sound taxonomies for organizing sounds is described.
Development of the Database for Environmental Sound Research and Application (DESRA): Design, Functionality, and Retrieval Considerations
The database will include a large number of sounds produced by different sound sources, with a thorough background for each sound file, including experimentally obtained perceptual data, so that it can contain a wide variety of acoustic, contextual, semantic, and behavioral information related to an individual sound.
Towards equalization of environmental sounds using auditory-based features
Algorithms based on auditory filter banks and sound texture synthesis are developed. They can be integrated with existing audio engines and can additionally support the development of dedicated high-level audio tools for content authoring or sample-based transformations.
Generalisation in Environmental Sound Classification: The ‘Making Sense of Sounds’ Data Set and Challenge
A baseline classification system, a deep convolutional network, is introduced. Its strong average accuracy on the evaluation data is discussed in the light of two alternative explanations: an unlikely accidental category bias in the sound recordings, or a more plausible true acoustic grounding of the high-level categories.
A General Framework for Visualization of Sound Collections in Musical Interfaces
A general framework for devising interactive applications based on the content-based visualization of sound collections, which allows for a modular combination of different techniques for sound segmentation, analysis, and dimensionality reduction, using the reduced feature space for interactive applications.
Authoring augmented soundscapes with user-contributed content
A complete augmented soundscapes system that, in an autonomous and continuous manner, spatializes virtual acoustic sources in a geographic location and combines the traditional text-query with content-based audio classification.
Improving the description of instrumental sounds by using ontologies and automatic content analysis
A methodology is defined for building a sound collection using a proposed ontology of tags and content analysis of its sounds, making it possible to automatically describe new sounds that are later integrated into the collection.
Autonomous Generation of Soundscapes using Unstructured Sound Databases
The challenges and current state of the art related to soundscapes modeling and design are presented and some applications and use-cases are mentioned within the contexts of interactive (non-linear) and linear sound design.
Automatic Urban Sound Classification Using Feature Learning Techniques
This thesis describes the process of curating a sizable dataset of real-world acoustic sounds collected from the Freesound archive and takes an established algorithm for feature learning, which has been applied effectively in image recognition and music informatics, and applies it to the classification of urban environmental sounds.
Classification of Environmental Sounds and Strategies of Categorization
In this article we report on listener categorization of meaningful environmental sounds. A starting point for this study was the phenomenological taxonomy proposed by Gaver (1993b). In the first


Semantic-audio retrieval
  • M. Slaney
  • Computer Science
    2002 IEEE International Conference on Acoustics, Speech, and Signal Processing
  • 2002
A system that connects sounds and words in linked multi-dimensional vector spaces, retrieving sounds from natural-language queries and describing new sounds with words.
Classification and retrieval of sound effects in audiovisual data management
  • Tong Zhang, C.-C. Jay Kuo
  • Computer Science
    Conference Record of the Thirty-Third Asilomar Conference on Signals, Systems, and Computers (Cat. No.CH37020)
  • 1999
A method for classifying sound effects is proposed that exploits time-frequency analysis of audio signals and uses a hidden Markov model as the classifier. A query-by-example retrieval approach for sound effects is built on top of the archiving scheme and is shown to be highly efficient and effective.
General sound classification and similarity in MPEG-7
  • M. Casey
  • Computer Science
    Organised Sound
  • 2001
A system for generalised sound classification and similarity using a machine-learning framework that has been incorporated into the MPEG-7 international standard for multimedia content description and is therefore publicly available in the form of standardised interfaces and software reference tools for developers and researchers.
Categorization of environmental sounds
Rate of spectral dynamics is suggested as a possible scheme to categorize sound signals in the environment based on the results of measures to analyze the spectral dynamics of environmental sound signals.
Environmental Sound Recognition With Time–Frequency Audio Features
An empirical feature analysis for audio environment characterization is performed, and the matching pursuit algorithm is proposed as a way to obtain effective time-frequency features that yield higher recognition accuracy for environmental sounds.
Large-scale content-based audio retrieval from text queries
A machine learning approach for retrieving sounds that is novel in that it uses free-form text queries rather than sound-sample-based queries, searches by audio content rather than via textual metadata, and can scale to a very large number of audio documents and a very rich query vocabulary.
Nearest-Neighbor Automatic Sound Annotation with a WordNet Taxonomy
A nearest-neighbor classifier with a database of isolated sounds unambiguously linked to concepts in WordNet, a semantic network that organizes real-world knowledge, is used to avoid the need for a huge number of classifiers to distinguish many different sound classes.
Environmental sound description: comparison and generalization of 4 timbre studies
The aim of this study is to adapt the principles of sound timbre description, originally used for musical sounds, to environmental sounds. In order to reach this goal, we inventoried 4 timbre studies
A Framework for Soundscape Analysis And Re-synthesis
This paper presents a methodology for the synthesis and interactive exploration of real soundscapes. We propose a soundscape analysis method that relies upon a sound object behavior typology and a
The bag-of-frames approach to audio pattern recognition: a sufficient model for urban soundscapes but not for polyphonic music.
This paper proposes to explicitly examine the difference between urban soundscapes and polyphonic music with respect to their modeling with the BOF approach, and reveals critical differences in the temporal and statistical structure of the typical frame distribution of each type of signal.