Publications
Unsupervised spoken keyword spotting via segmental DTW on Gaussian posteriorgrams
TLDR
An unsupervised learning framework is presented for detecting spoken keywords: segmental dynamic time warping compares the Gaussian posteriorgrams of keyword samples with those of test utterances, and the resulting alignment scores yield the keyword detection result.
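The detection idea in the TLDR above can be illustrated with a small sketch. It assumes posteriorgrams are given as NumPy arrays (one row per frame, one column per Gaussian component posterior) and uses -log of the inner product of two posterior vectors as the frame distance, which is an assumption here. It also swaps in a simplified search: banded DTW started every few frames of the test utterance, rather than the paper's exact segmental DTW. The names `frame_distance`, `banded_dtw_matrix`, `keyword_score`, `band` and `step` are hypothetical.

```python
import numpy as np

def frame_distance(p, q, eps=1e-12):
    # Assumed frame distance: -log of the inner product of two posterior vectors;
    # eps guards against log(0) when the vectors share no probability mass.
    return -np.log(max(float(np.dot(p, q)), eps))

def banded_dtw_matrix(P, Q, band):
    # Cumulative DTW cost between P (n x k) and Q (m x k),
    # restricted to alignment cells with |i - j| <= band.
    n, m = len(P), len(Q)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - band), min(m, i + band) + 1):
            d = frame_distance(P[i - 1], Q[j - 1])
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D

def keyword_score(keyword, utterance, band=3, step=3):
    # Hypothetical simplified search: try start frames every `step` frames,
    # align the keyword against the next n + band frames, keep the best end point.
    # Returns a per-frame cost; lower means the keyword is more likely present.
    n = len(keyword)
    best = np.inf
    for start in range(0, max(1, len(utterance) - n + 1), step):
        segment = utterance[start:start + n + band]
        D = banded_dtw_matrix(keyword, segment, band)
        best = min(best, D[n, 1:].min() / n)
    return best
```

Because each posteriorgram row is a probability distribution, the inner product is large only when both frames put mass on the same Gaussian components, so the assumed -log distance is small for acoustically similar frames and grows quickly for dissimilar ones.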
Unsupervised Pattern Discovery in Speech
TLDR
It is shown how pattern discovery can be used to automatically acquire lexical entities directly from an untranscribed audio stream by exploiting the structure of repeating patterns within the speech signal.
Speech database development at MIT: TIMIT and beyond
TLDR
The experiences of researchers at MIT in collecting two large speech databases with somewhat complementary objectives, TIMIT and VOYAGER, are described.
A probabilistic framework for segment-based speech recognition
TLDR
This work examines a maximum a posteriori decoding strategy for feature-based recognizers and develops a normalization criterion useful for a segment-based speech recognizer.
Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data
TLDR
A factorized hierarchical variational autoencoder is proposed that learns disentangled and interpretable representations from sequential data without supervision. It is formulated explicitly within a factorized hierarchical graphical model that imposes sequence-dependent priors and sequence-independent priors on different sets of latent variables.
Highway long short-term memory RNNs for distant speech recognition
TLDR
This paper extends deep long short-term memory (DLSTM) recurrent neural networks by introducing gated direct connections between memory cells in adjacent layers, and introduces latency-controlled bidirectional LSTMs (BLSTMs), which can exploit the whole history while keeping latency under control.
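The gated direct connection between memory cells of adjacent layers, described in the TLDR above, can be sketched as a single LSTM time step with one extra "depth" gate. This is a simplified illustration, not the paper's exact formulation: here the depth gate is assumed to be computed from the layer input, the previous hidden state and the lower layer's current cell state through one dense layer, and peephole terms are omitted. All names (`highway_lstm_step`, `init_params`, the `params` keys) are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def highway_lstm_step(x, h_prev, c_prev, c_lower, params):
    # One time step of a layer whose cell also receives the lower layer's
    # cell state c_lower through a depth gate d (assumed gate inputs).
    z = np.concatenate([x, h_prev])
    i = sigmoid(params["Wi"] @ z + params["bi"])   # input gate
    f = sigmoid(params["Wf"] @ z + params["bf"])   # forget gate
    o = sigmoid(params["Wo"] @ z + params["bo"])   # output gate
    g = np.tanh(params["Wg"] @ z + params["bg"])   # candidate cell update
    d = sigmoid(params["Wd"] @ np.concatenate([z, c_lower]) + params["bd"])  # depth gate
    # Standard LSTM cell update plus the gated highway path from the layer below.
    c = f * c_prev + i * g + d * c_lower
    h = o * np.tanh(c)
    return h, c

def init_params(input_dim, hidden, seed=0):
    # Small random weights for the four standard gates and the depth gate.
    rng = np.random.default_rng(seed)
    p = {}
    for gate in "ifog":
        p["W" + gate] = rng.normal(0.0, 0.1, (hidden, input_dim + hidden))
        p["b" + gate] = np.zeros(hidden)
    p["Wd"] = rng.normal(0.0, 0.1, (hidden, input_dim + 2 * hidden))
    p["bd"] = np.zeros(hidden)
    return p
```

When the depth gate is closed (d near 0) the step reduces to an ordinary LSTM cell; when it is open, the lower layer's cell state is added directly, giving gradients a short path across layers.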
A Nonparametric Bayesian Approach to Acoustic Model Discovery
TLDR
An unsupervised model is presented that simultaneously segments the speech, discovers a proper set of sub-word units, and learns a Hidden Markov Model for each induced acoustic unit; it outperforms a language-mismatched acoustic model.
JUPITER: a telephone-based conversational interface for weather information
TLDR
This paper describes the development of JUPITER in terms of its underlying human language technologies, as well as other system-related issues such as utterance rejection and content harvesting.
What do Neural Machine Translation Models Learn about Morphology?
TLDR
This work analyzes the representations learned by neural MT models at various levels of granularity and empirically evaluates the quality of the representations for learning morphology through extrinsic part-of-speech and morphological tagging tasks.
An Unsupervised Autoregressive Model for Speech Representation Learning
TLDR
Speech representations learned by the proposed unsupervised autoregressive neural model significantly improve performance on both phone classification and speaker verification over surface features and other supervised and unsupervised approaches.
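The training objective behind the autoregressive model in the TLDR above can be sketched as follows. The model is known as autoregressive predictive coding (APC): after reading frames up to time t, it predicts the frame a fixed number of steps ahead and is trained with an L1 loss. The sketch only computes that loss for given predictions; the network itself is left out, and the reduction (sum over feature dimensions, mean over frames) is an assumption. `apc_l1_loss`, `shift` and the `copy_last_frame_predictor` baseline are hypothetical names.

```python
import numpy as np

def apc_l1_loss(predictions, features, shift=3):
    # predictions[t] is the model's guess for features[t + shift], made after
    # seeing features[0..t]; only the first T - shift outputs have a target.
    T = len(features)
    targets = features[shift:]
    preds = predictions[:T - shift]
    # L1 distance per frame (summed over feature dims), averaged over frames.
    return np.abs(preds - targets).sum(axis=1).mean()

def copy_last_frame_predictor(features):
    # Hypothetical trivial baseline: predict that the future frame
    # equals the current one.
    return features.copy()
```

For example, on a 2-dimensional feature sequence that increases by 1 per frame, the copy-last-frame baseline is off by `shift` in each dimension, so with `shift=3` its loss is 6.0, while predictions equal to the true future frames give a loss of 0. Predicting several frames ahead, rather than the next frame, discourages the model from simply exploiting the strong local smoothness of speech features.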