Guangpu Huang

This paper presents a deep neural network (DNN) approach to sentence boundary detection in broadcast news. We extract prosodic and lexical features at each inter-word position in the transcripts and learn a sequential classifier to label these positions as either boundary or non-boundary. This work is realized by a hybrid DNN-CRF (conditional random field)...
This paper presents a method to improve a language model for a limited-resource language by using statistical machine translation from a related language to generate data for the target language. In this work, the machine translation model is trained on a corpus of parallel Mandarin-Cantonese subtitles and used to translate a large set of Mandarin...
This research extends our earlier work on using machine translation (MT) and word-based recurrent neural networks to augment language model training data for keyword search in conversational Cantonese speech. MT-based data augmentation is applied to two language pairs: English-Lithuanian and English-Amharic. Using filtered N-best MT hypotheses for language...
This paper shows that the integration of statistical and connectionist methods can greatly enhance human-computer interaction through speech. The research approach is inspired by recent advances in high-performance automatic speech recognition (ASR) systems and neurocognitive research on natural language understanding (NLU). A modest hybrid...
This paper describes a recurrent neural network (RNN)-based articulatory-phonetic inversion (API) model for improved speech recognition. A specialized optimization algorithm is introduced to enable human-like heuristic learning in an efficient data-driven manner to capture the dynamic nature of English speech pronunciations. The API model demonstrates...
In this paper, we examine the feasibility of articulatory-phonetic inversion (API) conditioned on auditory qualities for improved speech recognition. We introduce an efficient data-driven heuristic learning algorithm to capture the articulatory-phonetic features (APFs) of English speech. We then report the performance of the combined auditory...
This paper presents a deep neural network-conditional random field (DNN-CRF) system with multi-view features for sentence unit detection on English broadcast news. We propose a set of multi-view features extracted from the acoustic, articulatory, and linguistic domains, and use them together in the DNN-CRF model to predict sentence boundaries. We...
This paper studies the dual aspects of speech recognition and synthesis using consonant-vowel speech patterns. It examines the feasibility of bi-directional phonetic modeling, i.e., articulatory encoding and auditory decoding, in a neural-based computational model supplemented with an interactive learning algorithm. Simulation results demonstrate that...
We describe a neural-based articulatory-phonetic inversion model to improve the recognition of acoustically varying vowels and syllable-initial plosives. The model uses a set of continuous-valued articulatory-phonetic features (APFs) to explore the interactions between the motor control of articulators and acoustic-phonetic events. We...