Multiple feature extraction for RNN-based Assamese speech recognition for speech to text conversion application

Abstract

This work proposes a prototype model for speech recognition in the Assamese language using Linear Predictive Coding (LPC) and Mel-frequency cepstral coefficient (MFCC) features. The speech recognizer is part of a speech-to-text conversion system. The LPC and MFCC features are extracted by two separate Recurrent Neural Networks (RNNs), which are used to recognize vocal extracts of Assamese, a major language of the north-eastern part of India. The decision block is designed as a combined framework of the RNN blocks used to extract the features. With this combined architecture, the system achieves a 10% gain in recognition rate over the case where the individual architectures are used on their own.
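The abstract does not say how the decision block combines the two RNN outputs. As a rough illustration only, the sketch below assumes each network emits one score per class and that the scores are fused by a weighted average before picking the best class. The function names `fuse_scores` and `decide`, the `weight` parameter, and the example values are all hypothetical and are not taken from the paper.

```python
def fuse_scores(lpc_scores, mfcc_scores, weight=0.5):
    """Weighted average of two per-class score vectors.

    Assumed fusion rule for illustration; the paper's actual
    decision block may work differently.
    """
    if len(lpc_scores) != len(mfcc_scores):
        raise ValueError("score vectors must have the same length")
    return [weight * a + (1.0 - weight) * b
            for a, b in zip(lpc_scores, mfcc_scores)]


def decide(lpc_scores, mfcc_scores, weight=0.5):
    """Return the index of the class with the highest fused score."""
    fused = fuse_scores(lpc_scores, mfcc_scores, weight)
    return max(range(len(fused)), key=fused.__getitem__)


# Illustrative values only, not data from the paper.
lpc_out = [0.2, 0.5, 0.3]    # hypothetical output of the LPC-based RNN
mfcc_out = [0.1, 0.3, 0.6]   # hypothetical output of the MFCC-based RNN
predicted = decide(lpc_out, mfcc_out)
```

In this toy case the two networks disagree on their own (the LPC scores favour class 1, the MFCC scores favour class 2), and the fused decision follows the more confident network. This is one plausible way a combined block could improve on either network alone.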

Cite this paper

@article{Dutta2012MultipleFE,
  title={Multiple feature extraction for RNN-based Assamese speech recognition for speech to text conversion application},
  author={Kalyan Dutta and Kandarpa Kumar Sarma},
  journal={2012 International Conference on Communications, Devices and Intelligent Systems (CODIS)},
  year={2012},
  pages={600-603}
}