Jorge Wuth

  • Citations Per Year
Learn More
In this paper the application of automatic speech recognition (ASR) technology in CAPT (Computer Aided Pronunciation Training) is addressed. A method to automatically generate the competitive lexicon, required by an ASR engine to compare the pronunciation of a target word with its correct and wrong phonetic realization, is presented. In order to enable the(More)
In this paper, a novel confidence-based reinforcement learning (RL) scheme to correct observation log-likelihoods and to address the problem of unsupervised compensation with limited estimation data is proposed. A two-step Viterbi decoding is presented which estimates a correction factor for the observation log-likelihoods that makes the recognized and(More)
The problem of pronunciation evaluation of sentences is defined as the combination of word based subjective pronunciation scores. The mean subjective word score criterion is proposed and modeled with the combination of word-based objective assessment. The word objective metric requires no a priori studies of common mistakes, and it makes use of class based(More)
This paper addresses the problem of time-varying channels in speech-recognition-based human-robot interaction using Locally-Normalized Filter-Bank features (LNFB), and training strategies that compensate for microphone response and room acoustics. Testing utterances were generated by re-recording the Aurora-4 testing database using a PR2 mobile robot,(More)
This paper describes a novel feature-space VTLN method that models frequency warping as a linear interpolation of contiguous Mel filter-bank energies. The presented technique aims to reduce the distortion in the Mel filter-bank energy estimation due to the harmonic composition of voiced speech intervals and DFT sampling when the central frequency of(More)
This paper proposes a novel feature-space VTLN (vocal tract length normalization) method that models frequency warping as a linear interpolation of contiguous Mel filter-bank energies. The presented technique aims to reduce the distortion in the Mel filter-bank energy estimation due to the harmonic composition of voiced speech intervals and DFT (discrete(More)
  • 1