Implementation and Performance Evaluation of Speaker Adaptive Continuous Hindi ASR using Tri-phone based Acoustic Modelling

Mohit Dua, Ankith Jain Rakesh Kumar, Tripti Chaudhary
A speech interface to computers is the next big step that computer science needs to take for general users. Speaking in our native language is a natural and effortless task that is carried out with great speed and ease. Speech recognition will play an important role in taking technology to the common man. The need is not only for a speech interface but for a speech interface in a local language such as Hindi. In this paper, we evaluate the performance of Hindi Automatic Speech Recognition (ASR) by using two most…
1 Citation

ASRoIL: a comprehensive survey for automatic speech recognition of Indian languages
This systematic survey sums up the best available research on automatic speech recognition of Indian languages by synthesizing the results of several studies and analyzing the opportunities, challenges, techniques, methods, and evidence they present.


Discriminative Techniques for Hindi Speech Recognition System
The existing discriminative techniques like maximum mutual information estimation (MMIE), minimum classification error (MCE), and minimum phone error (MPE) are reviewed, and a comparative study in the context of Hindi language ASR is presented.
Performance evaluation of sequentially combined heterogeneous feature streams for Hindi speech recognition system
An investigation of whether different types of features, such as MFCC, PLP, and gravity centroids, can be integrated to improve ASR performance for Hindi shows a significant improvement for a few such combinations when applied to medium-size lexicons in typical field conditions.
Acoustic Analysis for Automatic Speech Recognition
While the main focus in ASR is to obtain spectral envelope measures, human speech communication efficiently exploits the manipulation of one's vocal-cord vibration rate, and so F0 extraction and its integration into ASR are also reviewed.
Speaker adaptation techniques for automatic speech recognition
Adaptation techniques, including maximum a posteriori (MAP) estimation, maximum likelihood linear regression (MLLR), and eigenvoice, are surveyed.
Spoken Language Processing: A Guide to Theory, Algorithm and System Development
Spoken Language Processing draws on the latest advances and techniques from multiple fields: computer science, electrical engineering, acoustics, linguistics, mathematics, psychology, and beyond to create the state of the art in spoken language technology.
Automatic Speech Recognition: A Review
The main objective of the review paper is to bring to light the progress made in ASR for different languages and the technological viewpoint of ASR in different countries, to compare and contrast the techniques used in the various stages of speech recognition, and to identify research topics in this challenging field.
50 Years of Progress in Speech and Speaker Recognition Research
The major themes and advances made in the past fifty years of research are surveyed to provide a technological perspective and an appreciation of the fundamental progress that has been accomplished in this important area of speech communication.
Indian Language Speech Database: A Review
Various speech databases developed in different Indian languages for speech recognition and text-to-speech systems are discussed.
Survey of current speech technology
Speech recognition and speech synthesis are technologies of particular interest for their support of direct communication between humans and computers through a communications mode humans commonly
A unified approach to statistical language modeling for Chinese
The paper presents a unified approach to Chinese statistical language modeling, which automatically and consistently gathers a high-quality training data set from the Web, creates a high-quality lexicon, and segments the training data using this lexicon, all using a maximum likelihood principle, which is consistent with the trigram training.