Time Difference of Arrival Estimation from Frequency-Sliding Generalized Cross-Correlations Using Convolutional Neural Networks

  • Luca Comanducci, Maximo Cobos, Fabio Antonacci, Augusto Sarti
  • Published 3 February 2020
  • Computer Science
  • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Interest in deep learning methods for solving traditional signal processing tasks has been growing steadily in recent years. Time delay estimation (TDE) in adverse scenarios is a challenging problem, where classical approaches based on generalized cross-correlations (GCCs) have been widely used for decades. Recently, the frequency-sliding GCC (FS-GCC) was proposed as a novel technique for TDE based on a sub-band analysis of the cross-power spectrum phase, providing a structured two…
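
For orientation, the classical GCC baseline mentioned in the abstract can be sketched as follows. This is a minimal GCC-PHAT illustration, not the FS-GCC or the CNN of the paper; the function name `gcc_phat`, the sign convention, and the regularization constant are assumptions made for this example.

```python
import numpy as np

def gcc_phat(x1, x2, fs):
    """Estimate the delay of x1 relative to x2, in seconds (hypothetical helper).

    A positive result means x1 arrives later than x2.
    """
    # Zero-pad so the circular correlation does not wrap around.
    n = len(x1) + len(x2)
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)

    # Cross-power spectrum, then PHAT weighting: keep only the phase.
    cross = X1 * np.conj(X2)
    cross /= np.abs(cross) + 1e-12  # small constant avoids division by zero

    # Back to the lag domain.
    cc = np.fft.irfft(cross, n=n)

    # Reorder so that index max_shift corresponds to zero lag.
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))

    lag = int(np.argmax(cc)) - max_shift
    return lag / fs
```

Calling `gcc_phat(x1, x2, fs=16000)` on two microphone signals returns the lag of the correlation peak converted to seconds. In noisy or reverberant conditions this peak can be spurious, which is the difficulty the sub-band analysis described above is meant to address.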

SyncNet: Using Causal Convolutions and Correlating Objective for Time Delay Estimation in Audio Signals
This paper proposes a machine-learning-based method, i.e., a semi-causal convolutional neural network consisting of a set of causal and anti-causal layers with a novel correlation-based objective function, for robust and reliable time-delay estimation in audio signals in noisy and reverberant environments.
Time Difference of Arrival Estimation with Deep Learning – From Acoustic Simulations to Recorded Data
For the reduction of DNN-based TDoA estimation error, the role of different input normalization techniques, mixing of simulated and real data for training, and applying an adversarial domain adaptation technique is investigated.
Sound Event Localization and Detection using Squeeze-Excitation Residual CNNs
This work aims to improve the accuracy results of the baseline CRNN presented in DCASE 2020 Task 3 by adding residual squeeze-excitation blocks in the convolutional part of the CRNN.
Robust Sound Source Tracking Using SRP-PHAT and 3D Convolutional Neural Networks
A new single sound source DOA estimation and tracking system based on the well-known SRP-PHAT algorithm and a three-dimensional Convolutional Neural Network that uses 3D convolutional layers to accurately perform the tracking of a sound source even in highly reverberant scenarios where most state-of-the-art techniques fail.
High-precision time delay estimation of narrowband radio signal by PHAT-LSTM
Simulation results show that the root mean square error (RMSE) of the PHAT-LSTM is decreased in low signal-to-noise ratio (SNR) compared with traditional TDE methods.
Direction of Arrival Estimation of Sound Sources Using Icosahedral CNNs
It is proved that using models that fit the equivariances of the problem allows us to outperform other state-of-the-art models with a lower computational cost and more robustness, obtaining root mean square localization errors lower than 10° even in scenarios with a reverberation time T60 of 1.5 s.
Sound Localization by Self-Supervised Time Delay Estimation
This work adapts the contrastive random walk of Jabri et al. to learn a cycle-consistent representation from unlabeled stereo sounds, resulting in a model that performs on par with supervised methods on “in the wild” internet recordings.
Directional Clustering with Polyharmonic Phase Estimation for Enhanced Speaker Localization
To reduce the shortcomings of signal acquisition with large-aperture arrays and reduce the impact of noise and interference, a Time-Frequency masking approach is proposed applying Complex Angular Central Gaussian Mixture Models for sound source directional clustering and inter-component phase analysis for polyharmonic speech component restoration.
Acoustic Emission Waveform Picking with Time Delay Neural Networks during Rock Deformation Laboratory Experiments
We report a new method using a time delay neural network to transform acoustic emission (AE) waveforms into a time series of instantaneous frequency content and permutation entropy. This permits
Source Mechanisms of Laboratory Earthquakes During Fault Nucleation and Formation
Identifying deformation and pre‐failure mechanisms preceding faulting is key for fault mechanics and for interpreting precursors to fault rupture. This study presents the results of a new and robust


Time Difference of Arrival Estimation of Speech Signals Using Deep Neural Networks with Integrated Time-frequency Masking
  • Pasi Pertilä, M. Parviainen
  • Physics
    ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2019
A direct formulation of the TF masking as a part of a DNN-based ASL structure is proposed, and combined with the use of recurrent layers it exploits the sequential progression of speaker related TDoAs.
Exploiting CNNs for Improving Acoustic Source Localization in Noisy and Reverberant Conditions
Experiments with both simulated and real acoustic data demonstrate the superior localization performance of the proposed SRP beamformer with respect to other state-of-the-art techniques.
Frequency-Sliding Generalized Cross-Correlation: A Sub-Band Time Delay Estimation Approach
A sub-band GCC representation of the cross-power spectrum phase that, despite having a reduced temporal resolution, provides a more suitable representation for estimating the true TDOA, and a set of low-rank approximation alternatives for processing the sub-band GCC matrix, leading to better TDOA estimates and source localization performance.
Sound Source Localization Using Deep Learning Models
This study shows that with end-to-end training and generic preprocessing, the performance of deep residual networks not only surpasses the block level accuracy of linear models on nearly clean environments but also shows robustness to challenging conditions by exploiting the time delay on power information.
Direction of Arrival Estimation for Multiple Sound Sources Using Convolutional Recurrent Neural Network
The results show that the proposed DOAnet is capable of estimating the number of sources and their respective DOAs with good precision and of generating SPS with high signal-to-noise ratio.
Towards End-to-End Acoustic Localization Using Deep Learning: From Audio Signals to Source Position Coordinates
This paper presents a novel approach for indoor acoustic source localization using microphone arrays, based on a Convolutional Neural Network designed to directly estimate the three-dimensional position of a single acoustic source using the raw audio signal as the input information and avoiding the use of hand-crafted audio features.
Broadband DOA Estimation Using Convolutional Neural Networks Trained with Noise Signals
Through experimental evaluation, the ability of the proposed noise-trained CNN framework to generalize to speech sources is demonstrated, and the robustness of the system to noise, small perturbations in microphone positions, as well as its ability to adapt to different acoustic conditions is investigated.
A learning-based approach to direction of arrival estimation in noisy and reverberant environments
A learning-based approach that can learn from a large amount of simulated noisy and reverberant microphone array inputs for robust DOA estimation and uses a multilayer perceptron neural network to learn the nonlinear mapping from such features to the DOA.
Sound Source Localization Based on Deep Neural Networks with Directional Activate Function Exploiting Phase Information
  • Ryu Takeda, Kazunori Komatani
  • Computer Science
    2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2016
This paper describes sound source localization (SSL) based on deep neural networks (DNNs) using discriminative training and indicates that the method outperformed the naive DNN-based SSL by 20 points in terms of the block-level accuracy.
A Modified SRP-PHAT Functional for Robust Real-Time Sound Source Localization With Scalable Spatial Sampling
This letter introduces an effective strategy that extends the conventional SRP-PHAT functional with the aim of considering the volume surrounding the discrete locations of the spatial grid, increasing its robustness and allowing for a coarser spatial grid.