Corpus ID: 3264579

Multi-Channel Speech Recognition: LSTMs All the Way Through

@inproceedings{Erdogan2016MultiChannelSR,
  title={Multi-Channel Speech Recognition: LSTMs All the Way Through},
  author={Hakan Erdogan and Tomoki Hayashi and J. Hershey and T. Hori and Chiori Hori and Wei-Ning Hsu and Suyoun Kim and Jonathan Le Roux and Zhong Meng and Shinji Watanabe},
  year={2016}
}
Long Short-Term Memory recurrent neural networks (LSTMs) have demonstrable advantages on a variety of sequential learning tasks. In this paper we demonstrate an LSTM “triple threat” system for speech recognition, where LSTMs drive the three main subsystems: microphone array processing, acoustic modeling, and language modeling. This LSTM trifecta is applied to the CHiME-4 distant recognition challenge. Our state-of-the-art ASR systems for the previous CHiME challenge employed LSTM mask…
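The LSTM mask-based front end the abstract alludes to typically feeds a time-frequency mask into spatial-covariance estimation and a beamformer. The sketch below shows the mask-weighted covariance and an MVDR beamformer in NumPy; the LSTM mask estimator is replaced by a random stand-in mask, and all shapes and values are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def spatial_covariance(Y, mask):
    """Mask-weighted spatial covariance per frequency bin.
    Y: (C, T, F) complex STFT of C microphone channels.
    mask: (T, F) real-valued time-frequency mask in [0, 1]."""
    C, T, F = Y.shape
    Phi = np.zeros((F, C, C), dtype=complex)
    for f in range(F):
        Yf = Y[:, :, f]                      # (C, T)
        w = mask[:, f]                       # (T,) per-frame weights
        Phi[f] = (w * Yf) @ Yf.conj().T / max(w.sum(), 1e-10)
    return Phi

def mvdr_weights(Phi_speech, Phi_noise):
    """MVDR beamformer per frequency bin, steering toward the principal
    eigenvector of the speech spatial covariance."""
    F, C, _ = Phi_speech.shape
    W = np.zeros((F, C), dtype=complex)
    for f in range(F):
        _, v = np.linalg.eigh(Phi_speech[f])  # ascending eigenvalues
        d = v[:, -1]                          # principal eigenvector
        num = np.linalg.solve(Phi_noise[f] + 1e-6 * np.eye(C), d)
        W[f] = num / (d.conj() @ num)         # distortionless constraint
    return W

# toy example: 4 mics, 50 frames, 8 frequency bins
rng = np.random.default_rng(0)
Y = rng.standard_normal((4, 50, 8)) + 1j * rng.standard_normal((4, 50, 8))
speech_mask = rng.uniform(0, 1, (50, 8))      # stand-in for an LSTM's output
Phi_s = spatial_covariance(Y, speech_mask)
Phi_n = spatial_covariance(Y, 1.0 - speech_mask)
W = mvdr_weights(Phi_s, Phi_n)
enhanced = np.einsum('fc,ctf->tf', W.conj(), Y)  # (T, F) single-channel output
print(enhanced.shape)
```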
Building state-of-the-art distant speech recognition using the CHiME-4 challenge with a setup of speech enhancement baseline
This paper describes a new baseline system for automatic speech recognition (ASR) in the CHiME-4 challenge to promote the development of noisy ASR in speech processing communities by providing 1) …
Speaker Adaptation for Multichannel End-to-End Speech Recognition
TLDR
Experimental results using CHiME-4 show that the proposed multi-path adaptation scheme improves ASR performance and adapting the encoder network is more effective than adapting the neural beamformer, attention mechanism, or decoder network.
DenseNet BLSTM for Acoustic Modeling in Robust ASR
TLDR
The DenseNet topology is modified to become a kind of feature extractor for the subsequent BLSTM network operating on whole speech utterances and is able to consistently outperform a top-performing baseline based on wide residual networks and BLSTMs, providing a 2.4% relative WER reduction on the real test set.
Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition
TLDR
An internal LM estimation (ILME) method to facilitate a more effective integration of the external LM with all pre-existing E2E models with no additional model training, including the most popular recurrent neural network transducer (RNN-T) and attention-based encoder-decoder (AED) models.
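The ILME summary above amounts to a log-linear score combination at decoding time: subtract an estimate of the E2E model's internal LM score and add the external LM score. A minimal numeric sketch follows; the hypothesis scores and interpolation weights are invented for illustration and are not taken from the paper.

```python
def ilme_score(log_p_e2e, log_p_ilm, log_p_ext, lam_ilm=0.3, lam_ext=0.6):
    """ILME-style fusion: subtract the estimated internal LM score and
    add the external LM score when ranking hypotheses."""
    return log_p_e2e - lam_ilm * log_p_ilm + lam_ext * log_p_ext

# toy hypotheses: (E2E log-prob, internal LM log-prob, external LM log-prob)
hyps = {
    "the cat sat": (-4.2, -6.0, -5.0),
    "the cat sad": (-4.0, -9.5, -9.0),
}
scored = {h: ilme_score(*s) for h, s in hyps.items()}
best = max(scored, key=scored.get)
print(best)
```

Note that the E2E score alone would prefer "the cat sad"; the LM fusion with internal-LM subtraction flips the ranking toward the linguistically plausible hypothesis.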
Joint Training of Complex Ratio Mask Based Beamformer and Acoustic Model for Noise Robust ASR
  • Y. Xu, Chao Weng, +4 authors Dong Yu
  • Computer Science
  • ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2019
TLDR
The complex ratio mask (CRM) is proposed to estimate the covariance matrix for the beamformer, and a long short-term memory (LSTM) based language model is utilized to re-score hypotheses, which further improves the overall performance.
L2RS: A Learning-to-Rescore Mechanism for Automatic Speech Recognition
TLDR
A novel Learning-to-Rescore (L2RS) mechanism is proposed, which is specialized for utilizing a wide range of textual information from the state-of-the-art NLP models and automatically deciding their weights to rescore the N-best lists for ASR systems.
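The core mechanic described above, re-ranking an N-best list by a weighted combination of per-hypothesis features, can be sketched as follows. In L2RS the weights are learned; here they are fixed, and the feature names and values are purely illustrative.

```python
def rescore_nbest(nbest, weights):
    """Rank an N-best list by a weighted sum of per-hypothesis features.
    L2RS learns these weights; they are fixed here for illustration."""
    def score(feats):
        return sum(weights[k] * v for k, v in feats.items())
    return sorted(nbest, key=lambda h: score(h[1]), reverse=True)

# illustrative features: acoustic score, LM score, semantic (NLP-model) score
nbest = [
    ("recognize speech",   {"am": -10.0, "lm": -3.0, "sem": 0.9}),
    ("wreck a nice beach", {"am": -9.5,  "lm": -6.0, "sem": 0.2}),
]
weights = {"am": 1.0, "lm": 1.0, "sem": 5.0}
ranked = rescore_nbest(nbest, weights)
print(ranked[0][0])
```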
Character-Aware Attention-Based End-to-End Speech Recognition
TLDR
A novel character-aware (CA) AED model in which each WSU embedding is computed by summarizing the embeddings of its constituent characters using a CA-RNN, which significantly reduces the model parameters in a traditional AED.
Non-Uniform MCE Training of Deep Long Short-Term Memory Recurrent Neural Networks for Keyword Spotting
TLDR
A deep bidirectional long short-term memory (BLSTM) hidden Markov model (HMM) based acoustic model is trained with a non-uniform boosted minimum classification error (BMCE) criterion, which imposes a higher error cost on keywords than on non-keywords.
Unified Architecture for Multichannel End-to-End Speech Recognition With Neural Beamforming
TLDR
This paper proposes a unified architecture for end-to-end automatic speech recognition (ASR) that encompasses microphone-array signal processing, such as a state-of-the-art neural beamformer, within the end-to-end framework, and demonstrates the effectiveness of the proposed method on multichannel ASR benchmarks in noisy environments.
Deep Learning for Environmentally Robust Speech Recognition
TLDR
A review of recently developed, representative deep learning approaches for tackling non-stationary additive and convolutional degradation of speech, with the aim of providing guidelines for those involved in the development of environmentally robust speech recognition systems.

References

SHOWING 1-10 OF 26 REFERENCES
Speech enhancement and recognition using multi-task learning of long short-term memory recurrent neural networks
TLDR
Several integration architectures are proposed and tested, including a pipeline architecture of LSTM-based SE and ASR with sequence training, an alternating estimation architecture, and a multi-task hybrid LSTM network architecture.
The MERL/SRI system for the 3RD CHiME challenge using beamforming, robust feature extraction, and advanced speech recognition
This paper introduces the MERL/SRI system designed for the 3rd CHiME speech separation and recognition challenge (CHiME-3). Our proposed system takes advantage of recurrent neural networks (RNNs) …
The NTT CHiME-3 system: Advances in speech enhancement and recognition for mobile multi-microphone devices
TLDR
NTT's CHiME-3 system is described, which integrates advanced speech enhancement and recognition techniques, achieving a 3.45% development error rate and a 5.83% evaluation error rate.
Recurrent deep neural networks for robust speech recognition
TLDR
Full recurrent connections are added to a certain hidden layer of a conventional feedforward DNN, allowing the model to capture the temporal dependency in deep representations and achieve state-of-the-art performance without front-end preprocessing, speaker adaptive training, or multiple decoding passes.
Deep beamforming networks for multi-channel speech recognition
  • X. Xiao, Shinji Watanabe, +7 authors Dong Yu
  • Computer Science, Engineering
  • 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2016
TLDR
This work proposes to represent the stages of acoustic processing, including beamforming, feature extraction, and acoustic modeling, as three components of a single unified computational network that obtained a 3.2% absolute word error rate reduction compared to a conventional pipeline of independent processing stages.
Neural Network Adaptive Beamforming for Robust Multichannel Speech Recognition
TLDR
A neural network adaptive beamforming (NAB) technique that uses LSTM layers to predict time domain beamforming filter coefficients at each input frame and achieves a 12.7% relative improvement in WER over a single channel model.
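The entries above quote both absolute WER reductions (e.g. 3.2% absolute) and relative ones (e.g. 12.7% relative); the two are easy to conflate. A relative reduction is the absolute drop divided by the baseline WER. The numbers below are illustrative, not taken from any of the papers listed.

```python
def wer_reductions(baseline_wer, new_wer):
    """Absolute (percentage points) and relative (percent) WER reductions."""
    absolute = baseline_wer - new_wer
    relative = 100.0 * absolute / baseline_wer
    return absolute, relative

# e.g. a baseline at 20.0% WER improved to 17.46% WER
abs_red, rel_red = wer_reductions(20.0, 17.46)
print(abs_red, rel_red)
```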
End-to-end attention-based large vocabulary speech recognition
TLDR
This work investigates an alternative method for sequence modelling based on an attention mechanism that allows a Recurrent Neural Network (RNN) to learn alignments between sequences of input frames and output labels.
KL-divergence regularized deep neural network adaptation for improved large vocabulary speech recognition
TLDR
Experiments demonstrate that the proposed adaptation technique can provide 2%-30% relative error reduction against the already very strong speaker independent CD-DNN-HMM systems using different adaptation sets under both supervised and unsupervised adaptation setups.
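The KL-divergence regularization summarized above is equivalent to training the adapted model with cross-entropy against an interpolated target: a mix of the hard label and the speaker-independent (SI) model's posterior, with the regularization weight ρ controlling the mix. A NumPy sketch, with ρ and all distributions chosen purely for illustration:

```python
import numpy as np

def kld_regularized_target(one_hot, p_si, rho=0.5):
    """Interpolated training target for KL-regularized adaptation:
    (1 - rho) * hard label + rho * SI-model posterior. Cross-entropy
    against this target equals adding a KL regularizer toward the
    SI model's outputs with weight rho."""
    return (1.0 - rho) * one_hot + rho * p_si

def cross_entropy(target, p):
    return -np.sum(target * np.log(p))

one_hot   = np.array([0.0, 1.0, 0.0])       # ground-truth senone label
p_si      = np.array([0.2, 0.7, 0.1])        # SI model posterior (illustrative)
p_adapted = np.array([0.1, 0.85, 0.05])      # adapted model posterior
target = kld_regularized_target(one_hot, p_si, rho=0.5)
loss = cross_entropy(target, p_adapted)
print(target, loss)
```

A larger ρ keeps the adapted model closer to the SI model, which is what prevents overfitting to small adaptation sets.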
Joint CTC-attention based end-to-end speech recognition using multi-task learning
TLDR
A novel method for end-to-end speech recognition to improve robustness and achieve fast convergence by using a joint CTC-attention model within the multi-task learning framework, thereby mitigating the alignment issue.
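The multi-task objective in joint CTC-attention training is a convex combination of the two per-utterance losses, with an interpolation weight λ. A minimal sketch; the loss values and λ below are illustrative stand-ins for the outputs of real CTC and attention branches.

```python
def joint_ctc_attention_loss(loss_ctc, loss_att, lam=0.2):
    """Multi-task objective: lam * CTC loss + (1 - lam) * attention loss.
    The CTC branch supplies monotonic alignment information; the
    attention branch supplies the main recognition objective."""
    return lam * loss_ctc + (1.0 - lam) * loss_att

# toy per-utterance negative log-likelihoods from the two branches
print(joint_ctc_attention_loss(12.0, 8.0))
```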
Towards End-To-End Speech Recognition with Recurrent Neural Networks
This paper presents a speech recognition system that directly transcribes audio data with text, without requiring an intermediate phonetic representation. The system is based on a combination of the …