Corpus ID: 232147547

ODAS: Open embeddeD Audition System

@article{Grondin2021ODASOE,
  title={ODAS: Open embeddeD Audition System},
  author={Fran{\c{c}}ois Grondin and Dominic L{\'e}tourneau and C{\'e}dric Godin and Jean-Samuel Lauzon and Jonathan Vincent and Simon Michaud and Samuel Faucher and Fran{\c{c}}ois Michaud},
  journal={ArXiv},
  year={2021},
  volume={abs/2103.03954}
}
Artificial audition aims at providing hearing capabilities to machines, computers and robots. Existing frameworks in robot audition offer interesting sound source localization, tracking and separation performance, but involve a significant amount of computation that limits their use on robots with embedded computing capabilities. This paper presents ODAS, the Open embeddeD Audition System framework, which includes strategies to reduce the computational load and perform robot audition tasks on…
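The localization and tracking tasks mentioned in the abstract all build on estimating the time difference of arrival (TDOA) between microphone pairs. As an illustration only (not ODAS's actual API, which is a C library), here is a minimal NumPy sketch of GCC-PHAT, the phase-transform cross-correlation commonly used for this step; the sampling rate, delay, and signals below are made up for the example.

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None, interp=4):
    """Estimate the time delay of `sig` relative to `ref` with GCC-PHAT.

    The phase transform (PHAT) whitens the cross-spectrum so that the
    correlation peak depends on phase (i.e., delay) rather than amplitude.
    """
    n = sig.size + ref.size
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)
    R /= np.maximum(np.abs(R), 1e-12)            # PHAT weighting
    cc = np.fft.irfft(R, n=interp * n)           # interpolated correlation
    max_shift = interp * n // 2
    if max_tau is not None:
        max_shift = min(int(interp * fs * max_tau), max_shift)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / float(interp * fs)

# Toy example: a copy of white noise delayed by 12 samples (hypothetical setup).
fs = 16000
rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
delay_samples = 12
y = np.concatenate((np.zeros(delay_samples), x))[:x.size]
tau = gcc_phat(y, x, fs)                         # close to 12 / 16000 s
```

Frameworks like those surveyed below typically run this per microphone pair and per frame, then feed the delays (or the correlation itself) into a localization stage.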

Citations

Lightweight Online Separation of the Sound Source of Interest through BLSTM-Based Binary Masking
This paper proposes a two-step technique: 1) a phase-based beamformer that provides, in addition to an estimate of the source of interest (SOI), an estimate of the cumulative environmental interference; and 2) a BLSTM-based time-frequency binary masking stage that calculates a binary mask aiming to separate the SOI from that cumulative environmental interference.

References

Showing 1–10 of 40 references
Lightweight and Optimized Sound Source Localization and Tracking Methods for Open and Closed Microphone Array Configurations
A novel sound source localization method, called SRP-PHAT-HSDA, that scans space with coarse and fine resolution grids to reduce the number of memory lookups, and a modified 3D Kalman (M3K) method capable of simultaneously tracking the directions of multiple sound sources in 3D.
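The coarse-then-fine scanning idea can be sketched in NumPy. The square array geometry, sampling rate, and source direction below are hypothetical, and this is a simplified single-frame, far-field, 2-D azimuth version of SRP-PHAT, not the paper's SRP-PHAT-HSDA implementation.

```python
import numpy as np

fs = 16000
c = 343.0  # speed of sound, m/s
# Hypothetical square array of 4 microphones (coordinates in metres).
mics = np.array([[ 0.05,  0.05], [-0.05,  0.05],
                 [-0.05, -0.05], [ 0.05, -0.05]])

def steering_delays(azimuth):
    """Far-field TDOAs (seconds) of each mic relative to the array centre."""
    u = np.array([np.cos(azimuth), np.sin(azimuth)])
    return -(mics @ u) / c                      # plane wave from `azimuth`

def srp_phat(X, freqs, azimuths):
    """Steered response power with PHAT weighting for candidate azimuths."""
    Xw = X / np.maximum(np.abs(X), 1e-12)       # PHAT: keep phase only
    powers = []
    for az in azimuths:
        steer = np.exp(2j * np.pi * freqs[None, :]
                       * steering_delays(az)[:, None])
        aligned = np.sum(Xw * steer, axis=0)    # sum over microphones
        powers.append(np.sum(np.abs(aligned) ** 2))
    return np.array(powers)

# Simulate one STFT frame of a broadband source arriving from 70 degrees.
rng = np.random.default_rng(1)
n = 2048
S = np.fft.rfft(rng.standard_normal(n))
freqs = np.fft.rfftfreq(n, 1.0 / fs)
tau_true = steering_delays(np.deg2rad(70.0))
X = S[None, :] * np.exp(-2j * np.pi * freqs[None, :] * tau_true[:, None])

# Coarse scan (10 degree cells), then a fine scan (1 degree) around the best
# coarse cell, mimicking the coarse/fine grid strategy described above.
coarse = np.deg2rad(np.arange(0, 360, 10))
best = coarse[np.argmax(srp_phat(X, freqs, coarse))]
fine = best + np.deg2rad(np.arange(-10, 10.5, 1))
est = fine[np.argmax(srp_phat(X, freqs, fine))]
```

The two-pass scan evaluates 36 + 21 grid points instead of 360, which is the kind of lookup reduction the hierarchical search is after.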
The ManyEars open framework
The integration of the ManyEars Library with Willow Garage’s Robot Operating System is presented, and the customized microphone board and sound card, distributed as an open hardware solution for implementing robotic audition systems, are introduced.
3D Localization of a Sound Source Using Mobile Microphone Arrays Referenced by SLAM
The approach explored in this paper consists of two robots, each equipped with a microphone array, localizing themselves in a shared reference map using SLAM; data from the microphone arrays are then used to triangulate the 3D location of a sound source relative to the same map.
A Perceptually-Motivated Approach for Low-Complexity, Real-Time Enhancement of Fullband Speech
This work proposes PercepNet, an efficient approach that relies on human perception of speech by focusing on the spectral envelope and the periodicity of the speech, and demonstrates high-quality, real-time enhancement of fullband speech using less than 5% of a CPU core.
GEV Beamforming Supported by DOA-Based Masks Generated on Pairs of Microphones
The solution presented in this paper is to train a neural network on pairs of microphones with different spacings and acoustic conditions, and then use this network to estimate a time-frequency mask from all the microphone pairs forming an array of arbitrary shape; the mask is then used to perform generalized eigenvalue (GEV) beamforming.
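A toy version of the GEV beamforming stage itself (the stage fed by the network's mask) can be written with NumPy. The simulated steering vector, noise level, and the oracle mask standing in for the network output are all illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(2)
n_mics, n_frames = 4, 500

# Simulated STFT frames at a single frequency bin: one point source with
# steering vector d, plus spatially white noise (illustrative scenario).
d = np.exp(2j * np.pi * rng.random(n_mics))            # unit-modulus steering
s = rng.standard_normal(n_frames) + 1j * rng.standard_normal(n_frames)
noise = 0.5 * (rng.standard_normal((n_mics, n_frames))
               + 1j * rng.standard_normal((n_mics, n_frames)))
X = d[:, None] * s[None, :] + noise

# In the paper's pipeline a neural network predicts the time-frequency mask;
# here an oracle mask (1 = speech-dominated frame) stands in for it.
mask = (np.abs(s) > np.median(np.abs(s))).astype(float)

# Mask-weighted spatial covariance matrices of speech and noise.
Phi_s = (mask * X) @ X.conj().T / mask.sum()
Phi_n = ((1 - mask) * X) @ X.conj().T / (1 - mask).sum()

# GEV beamformer: the weight vector maximising the output SNR is the
# principal generalized eigenvector of the matrix pencil (Phi_s, Phi_n).
eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Phi_n, Phi_s))
w = eigvecs[:, np.argmax(eigvals.real)]
y = w.conj() @ X                                        # beamformed output
```

In practice this is solved independently per frequency bin (often followed by a distortion-correcting postfilter, since GEV fixes only the SNR-optimal direction, not the output scale).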
A Deep Residual Network for Large-Scale Acoustic Scene Analysis
The task of training a multi-label event classifier directly from the audio recordings of AudioSet is studied, and the models are found to be able to localize audio events when a finer time resolution is needed.
Multi-Speaker DOA Estimation Using Deep Convolutional Networks Trained With Noise Signals
The proposed convolutional neural network based supervised learning method for estimating the direction of arrival (DOA) of multiple speakers is shown to adapt to unseen acoustic conditions and to be robust to unseen noise types.
Sound Event Localization and Detection Using CRNN on Pairs of Microphones
This paper proposes sound event localization and detection methods for multichannel recordings, based on two convolutional recurrent neural networks that perform sound event detection (SED) and time difference of arrival (TDOA) estimation on each pair of microphones in a microphone array.
The Pytorch-kaldi Speech Recognition Toolkit
Experiments conducted on several datasets and tasks show that PyTorch-Kaldi can effectively be used to develop modern state-of-the-art speech recognizers.
Unsupervised Speech Enhancement Based on Multichannel NMF-Informed Beamforming for Noise-Robust Automatic Speech Recognition
Online MVDR beamforming informed by multichannel nonnegative matrix factorization (MNMF) is proposed, with effective initialization and incremental updating of the MNMF parameters; it outperformed a state-of-the-art DNN-based beamforming method in unknown environments that did not match the training data.