Multichannel environmental sound segmentation

Yui Sudo, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai — Applied Intelligence
This paper proposes a multichannel environmental sound segmentation method. Environmental sound segmentation is an integrated method that achieves sound source localization, sound source separation, and classification simultaneously. When multiple microphones are available, spatial features can be used to improve the localization and separation accuracy of sounds arriving from different directions; however, conventional methods have three drawbacks: (a) Sound source localization and sound source separation…
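The abstract above notes that spatial features from multiple microphones can improve localization and separation accuracy. As an illustration only (not the paper's implementation; function and argument names are hypothetical), here is a minimal sketch of one widely used spatial feature, the inter-channel phase difference (IPD), computed from a multichannel STFT:

```python
import numpy as np

def ipd_features(multichannel_stft, ref_ch=0):
    """Inter-channel phase differences (IPDs) relative to a reference
    microphone -- a common spatial feature for multichannel models.

    multichannel_stft: complex array of shape (channels, freq, frames).
    Returns sin/cos-encoded IPDs of shape (2*(channels-1), freq, frames).
    """
    ref = multichannel_stft[ref_ch]
    feats = []
    for ch in range(multichannel_stft.shape[0]):
        if ch == ref_ch:
            continue
        # Phase difference between channel `ch` and the reference channel.
        ipd = np.angle(multichannel_stft[ch]) - np.angle(ref)
        # sin/cos encoding avoids the 2*pi phase wrap-around discontinuity.
        feats.append(np.sin(ipd))
        feats.append(np.cos(ipd))
    return np.stack(feats)
```

The sin/cos encoding is a standard design choice: raw phase differences jump at ±π, which hurts learning, while the encoded pair varies smoothly with direction of arrival.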

Sound Classification and Processing of Urban Environments: A Systematic Literature Review

The review finds that deep learning architectures, attention mechanisms, data augmentation techniques, and pretraining are the most important factors to consider when building an efficient sound classification model.

Multi-channel Environmental Sound Segmentation utilizing Sound Source Localization and Separation U-Net

A multi-channel environmental sound segmentation method is proposed that combines a U-Net, which simultaneously performs sound source localization and sound source separation, with a convolutional neural network that classifies the separated sounds.

Multi-channel Environmental Sound Segmentation

DeepLabv3+, one of the state-of-the-art methods for image semantic segmentation, is applied to environmental sound segmentation, and the input features are expanded to multi-channel to improve performance on overlapping sounds.

Environmental sound segmentation utilizing Mask U-Net

An environmental sound segmentation method is proposed that combines U-Net-based segmentation with CNN-based sound event detection over 75 classes of environmental sounds; it improved learning speed and sound source separation compared with the conventional method.

2D sound source position estimation using microphone arrays and its application to a VR-based bird song analysis system

An outlier removal method is proposed that takes the properties of the observed sounds into consideration and leads to system design guidelines that ensure predictable performance.

A Dataset of Reverberant Spatial Sound Scenes with Moving Sources for Sound Event Localization and Detection

This report presents the dataset and evaluation setup of the Sound Event Localization and Detection (SELD) task for the DCASE 2020 Challenge, along with a baseline system that is an updated version of the one used in the previous challenge, with input-feature and training modifications to improve its performance.

Integrating Spectral and Spatial Features for Multi-Channel Speaker Separation

This paper tightly integrates spectral and spatial information for deep-learning-based multi-channel speaker separation. The key idea is to localize individual speakers so that an enhancement network…

A real-time super-resolution robot audition system that improves the robustness of simultaneous speech recognition

This study addresses a framework for a robot audition system, including sound source localization (SSL) and sound source separation (SSS), that can robustly recognize simultaneous speech in a real environment, and proposes two SSL methods: MUSIC based on generalized singular value decomposition (GSVD-MUSIC) and hierarchical SSL (H-SSL).
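GSVD-MUSIC builds on the standard MUSIC algorithm: the spatial covariance of the microphone signals is decomposed, and directions whose steering vectors are nearly orthogonal to the noise subspace produce peaks in a pseudo-spectrum. A minimal sketch of plain narrowband MUSIC for a linear array (not the GSVD or hierarchical variants proposed in the paper; names are illustrative):

```python
import numpy as np

def music_spectrum(X, mic_positions, freq, n_sources, angles_deg, c=343.0):
    """Narrowband MUSIC pseudo-spectrum for a linear microphone array.

    X: complex snapshots of shape (n_mics, n_frames) at one frequency bin.
    mic_positions: 1-D microphone coordinates in meters (linear array).
    Returns the pseudo-spectrum evaluated at candidate DOAs `angles_deg`.
    """
    n_mics = X.shape[0]
    # Spatial covariance matrix and its eigendecomposition.
    R = X @ X.conj().T / X.shape[1]
    eigvals, eigvecs = np.linalg.eigh(R)   # eigenvalues in ascending order
    En = eigvecs[:, : n_mics - n_sources]  # noise-subspace eigenvectors
    spectrum = []
    for theta in np.deg2rad(angles_deg):
        # Far-field steering vector for the candidate direction.
        delays = mic_positions * np.sin(theta) / c
        a = np.exp(-2j * np.pi * freq * delays)
        # Peaks occur where `a` is nearly orthogonal to the noise subspace.
        spectrum.append(1.0 / np.linalg.norm(En.conj().T @ a) ** 2)
    return np.array(spectrum)
```

The direction estimate is the angle at which the pseudo-spectrum peaks; the GSVD variant cited above replaces the plain eigendecomposition to improve robustness against correlated noise.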

A Sequence Matching Network for Polyphonic Sound Event Localization and Detection

A two-step approach is proposed that decouples the learning of the sound event detection and direction-of-arrival estimation systems, which allows flexibility in the system design and improves the performance of the overall sound event localization and detection system.

Semi-automatic bird song analysis by spatial-cue-based integration of sound source detection, localization, separation, and identification

This paper proposes a system that uses automated methods from robot audition, including sound source detection, localization, separation and identification, and employs a semi-automatic annotation approach that requires much less pre-annotation.

Bird Song Scene Analysis Using a Spatial-Cue-Based Probabilistic Model

This paper addresses bird song scene analysis based on semi-automatic annotation by proposing a new Spatial-Cue-Based Probabilistic Model (SCBPM) for their integration focusing on spatial information.