Phoneme recognition using time-delay neural networks
The authors present a time-delay neural network (TDNN) approach to phoneme recognition which is characterized by two important properties: (1) using a three-layer arrangement of simple computing…
Readings in speech recognition
This chapter discusses four main approaches to speech recognition: template-based, knowledge-based, stochastic, and connectionist.
Online handwriting recognition: the NPen++ recognizer
Initial recognition rates for whole sentences are promising and show that the MS-TDNN architecture is suited to recognizing handwritten data ranging from single characters to whole sentences.
Toward Multilingual Neural Machine Translation with Universal Encoder and Decoder
In this paper, we present our first attempts in building a multilingual Neural Machine Translation framework under a unified approach. We are then able to employ attention-based NMT for many-to-many…
Recognizing emotion in speech
A new method of extracting prosodic features from speech, based on a smoothing spline approximation of the pitch contour, is presented, which obtains classification performance that is close to human performance on the task.
Language-independent and language-adaptive acoustic modeling for speech recognition
Different methods for multilingual acoustic model combination and a polyphone decision tree specialization procedure are introduced for estimating acoustic models for a new target language using speech data from varied source languages, but only limited data from the target language.
A real-time face tracker
  • Jie Yang, A. Waibel
  • Computer Science
  • Proceedings Third IEEE Workshop on Applications…
  • 2 December 1996
The authors present a real-time face tracker that can track a person's face while the person moves freely in a room and can be applied to teleconferencing and many HCI applications including lip reading and gaze tracking.
A time-delay neural network architecture for isolated word recognition
A translation-invariant back-propagation network is described that performs better than a sophisticated continuous acoustic parameter hidden Markov model on a noisy, 100-speaker confusable…
Interpreting BLEU/NIST Scores: How Much Improvement do We Need to Have a Better System?
A novel method of calculating the confidence intervals for BLEU/NIST scores using bootstrapping is reported, which can determine whether two MT systems are significantly different from each other.
Extracting deep bottleneck features using stacked auto-encoders
It is found that increasing the number of auto-encoders in the network produces more useful features, but requires pre-training, especially when little training data is available.