Hybrid speech recognition with Deep Bidirectional LSTM

Abstract

Deep Bidirectional LSTM (DBLSTM) recurrent neural networks have recently been shown to give state-of-the-art performance on the TIMIT speech database. However, the results in that work relied on recurrent-neural-network-specific objective functions, which are difficult to integrate with existing large vocabulary speech recognition systems. This paper investigates the use of DBLSTM as an acoustic model in a standard neural network-HMM hybrid system. We find that a DBLSTM-HMM hybrid matches the results of the previous work on TIMIT. It also outperforms both GMM and deep network benchmarks on a subset of the Wall Street Journal corpus. However, the improvement in word error rate over the deep network is modest, despite a large increase in frame-level accuracy. We conclude that the hybrid approach with DBLSTM appears to be well suited to tasks where acoustic modelling predominates. Further investigation is needed to understand how to translate the improvements in frame-level accuracy into better word error rates.
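To make the "hybrid" idea concrete: in a conventional neural network-HMM hybrid, the network's per-frame state posteriors are typically divided by the state priors to obtain scaled likelihoods that an HMM decoder can consume. The sketch below illustrates only that conversion, in the log domain. The function name, the toy posteriors, and the priors are hypothetical values chosen for illustration and are not taken from the paper.

```python
import math


def scaled_log_likelihoods(log_posteriors, log_priors):
    """Turn per-frame log p(state | frame) into log p(frame | state) + const.

    By Bayes' rule, p(x | s) is proportional to p(s | x) / p(s), so in the
    log domain the state log-prior is subtracted from each log-posterior.
    """
    return [
        [lp - prior for lp, prior in zip(frame, log_priors)]
        for frame in log_posteriors
    ]


# Hypothetical toy values: 2 frames, 3 HMM states (assumed, not from the paper).
posteriors = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
priors = [0.5, 0.25, 0.25]

log_post = [[math.log(p) for p in frame] for frame in posteriors]
log_prior = [math.log(p) for p in priors]

# One row of scaled log-likelihoods per frame, one column per state.
scores = scaled_log_likelihoods(log_post, log_prior)
```

In a full system these scores would replace the GMM emission scores during decoding; the decoder, lexicon, and language model are outside the scope of this sketch.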

DOI: 10.1109/ASRU.2013.6707742
