Advances In Children's Speech Recognition Within An Interactive Literacy Tutor


In this paper we present recent advances in acoustic and language modeling that improve recognition performance when children read out loud within digital books. First, we extend previous work by incorporating cross-utterance word history information and dynamic n-gram language modeling. By additionally incorporating Vocal Tract Length Normalization (VTLN), Speaker-Adaptive Training (SAT), and iterative unsupervised structural maximum a posteriori linear regression (SMAPLR) adaptation, we demonstrate a 54% reduction in word error rate. Next, we show how data from children’s read-aloud sessions can be used to improve accuracy in a spontaneous story summarization task, yielding an error reduction of 15% over previously published results. Finally, we describe a novel real-time implementation of our research system that incorporates time-adaptive acoustic and language modeling.
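
The abstract reports its gains as reductions in word error rate (WER). As a reading aid, the sketch below shows how WER is conventionally computed (word-level edit distance divided by the reference length) and how a relative reduction between two systems is derived. It is not the authors' scoring tool: the function names and example sentences are hypothetical, and it assumes the quoted percentages are relative reductions, which the abstract does not state explicitly.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Fraction by which improved_wer is lower than baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - improved_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical sentences, not data from the paper.
    reference = "the cat sat on the mat"
    hypothesis = "the cat sat on mat"
    print(f"WER: {word_error_rate(reference, hypothesis):.3f}")
```

Under this definition, a system whose WER falls from a hypothetical 20% to 10% achieves a 50% relative reduction, even though the absolute drop is only 10 points.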

Cite this paper

@inproceedings{Hagen2004AdvancesIC,
  title={Advances In Children's Speech Recognition Within An Interactive Literacy Tutor},
  author={Andreas Hagen and Bryan L. Pellom and Sarel van Vuuren and Ronald A. Cole},
  year={2004}
}