Channel Selection for Distant Speech Recognition Exploiting Cepstral Distance

@inproceedings{Guerrero2016ChannelSF,
  title={Channel Selection for Distant Speech Recognition Exploiting Cepstral Distance},
  author={Cristina Guerrero and Georgina Tryfou and Maurizio Omologo},
  booktitle={INTERSPEECH},
  year={2016}
}
In a multi-microphone distant speech recognition task, the redundancy that results from having multiple instances of the same source signal can be exploited through channel selection. In this work, we propose the use of cepstral distance to assess the available channels, in both an informed and a blind fashion. In the informed approach, the distances between the close-talk signal and each of the channels are calculated. In the blind method, the cepstral distances…
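The informed approach described in the abstract can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration, not the authors' implementation: the function names (`frame_cepstra`, `cepstral_distance`, `select_channel_informed`), the frame length, hop size and number of coefficients, the use of a plain real cepstrum, and the rule of picking the channel with the smallest distance to the close-talk signal. The abstract is truncated before the blind method is described, so only the informed case is sketched.

```python
import numpy as np


def frame_cepstra(signal, frame_len=400, hop=160, n_ceps=13):
    """Real cepstrum per frame: inverse FFT of the log magnitude spectrum.

    Frame length, hop and n_ceps are illustrative defaults (assumed values).
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    ceps = np.empty((n_frames, n_ceps))
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len] * window
        # Small floor avoids log(0) on silent bins.
        log_mag = np.log(np.abs(np.fft.rfft(frame)) + 1e-10)
        ceps[i] = np.fft.irfft(log_mag)[:n_ceps]
    return ceps


def cepstral_distance(a, b):
    """Mean per-frame Euclidean distance between cepstra, skipping c0."""
    ca, cb = frame_cepstra(a), frame_cepstra(b)
    n = min(len(ca), len(cb))
    return float(np.mean(np.linalg.norm(ca[:n, 1:] - cb[:n, 1:], axis=1)))


def select_channel_informed(close_talk, channels):
    """Assumed selection rule: pick the channel closest to the close-talk signal."""
    distances = [cepstral_distance(close_talk, ch) for ch in channels]
    return int(np.argmin(distances)), distances


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    close_talk = rng.standard_normal(16000)
    # Hypothetical distant channel: the close-talk signal plus a single echo.
    echo = np.zeros(801)
    echo[0], echo[-1] = 1.0, 0.5
    distant = np.convolve(close_talk, echo)[: len(close_talk)]
    best, dists = select_channel_informed(close_talk, [distant, close_talk.copy()])
    print("selected channel:", best, "distances:", dists)
```

In this toy setup the second channel is an exact copy of the close-talk signal, so its distance is zero and it is the one selected. Real distant-microphone recordings would also need time alignment with the close-talk signal before comparison, which this sketch does not attempt.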

Cepstral distance based channel selection for distant speech recognition

Information Fusion Approaches for Distant Speech Recognition in a Multi-microphone Setting

TLDR
Two original solutions are presented, based on information fusion approaches at different levels of the recognition system, one at front-end stage and one at post-decoding stage, namely for the problems of channel selection (CS) and hypothesis combination.

Deep Learning for Distant Speech Recognition

TLDR
Inspired by the idea that cooperation across different DNNs could be the key for counteracting the harmful effects of noise and reverberation, a novel deep learning paradigm called “network of deep neural networks” is proposed.

A reassigned front-end for speech recognition

  • G. Tryfou, M. Omologo
  • Computer Science
  • 2017 25th European Signal Processing Conference (EUSIPCO)
  • 2017
TLDR
This paper introduces the use of the TFRCC features, a time-frequency reassigned feature set, as a front-end for speech recognition and proves the superiority of these features compared to a MFCC baseline.

References

Channel selection measures for multi-microphone speech recognition

Multi-source far-distance microphone selection and combination for automatic transcription of lectures

TLDR
This work shows how the best of several far field channels can be selected based on a signal-to-noise ratio criterion, and how the signals from multiple channels could be combined at either the waveform level using blind channel combination or at the hypothesis level using confusion network techniques to improve the accuracy of a far field lecture transcription system.

Channel selection in the short-time modulation domain for distant speech recognition

TLDR
A channel selection approach is proposed that selects reliable channels using a criterion operating in the short-term modulation spectrum domain; the criterion quantifies the relative strength of speech from each microphone and speech obtained from beamforming modulations.

Channel selection and reverberation-robust automatic speech recognition

TLDR
This thesis is focused on ASR applications in a room environment, where reverberation is the dominant source of distortion, and considers both single- and multi-microphone setups, and provides an overview of the CS measures presented in the literature so far, and compares them experimentally.

Towards Microphone Selection Based on Room Impulse Response Energy-Related Measures

TLDR
Preliminary experiments for a large vocabulary continuous speech recognition task are offered which show how microphone selection using an ideal relative energy measure can largely improve the recognition rate.

On the potential of channel selection for recognition of reverberated speech with multiple microphones

TLDR
It is experimentally shown that there exists a large margin for WER reduction by channel selection, and several possible methods which do not require any a-priori classification are discussed.

The third ‘CHiME’ speech separation and recognition challenge: Dataset, task and baselines

TLDR
The design and outcomes of the 3rd CHiME Challenge, which targets the performance of automatic speech recognition in a real-world, commercially-motivated scenario: a person talking to a tablet device that has been fitted with a six-channel microphone array, are presented.

Speech Dereverberation

TLDR
Speech Dereverberation presents the most important current approaches to the problem of reverberation and defines the current state of the art and encourages further work on this topic by offering open research questions to exercise the curiosity of the reader.

The DIRHA-ENGLISH corpus and related tasks for distant-speech recognition in domestic environments

TLDR
A first set of baseline results obtained using different techniques, including Deep Neural Networks (DNN), aligned with the state-of-the-art at international level are reported.

Channel selection based on multichannel cross-correlation coefficients for distant speech recognition

TLDR
This work presents a new channel selection method in order to increase the computational efficiency of beamforming for distant speech recognition (DSR) without sacrificing performance.