• Publications
Hidden Markov Acoustic Modeling With Bootstrap and Restructuring for Low-Resourced Languages
TLDR
This paper proposes an acoustic modeling approach based on bootstrap and restructuring to deal with data sparsity for low-resourced languages, within the speed, memory, and response-latency requirements of real-world applications.
  • 22
  • 1
Improved Image Captioning with Adversarial Semantic Alignment
TLDR
We propose a new conditional GAN for image captioning that enforces semantic alignment between images and captions through a co-attentive discriminator and a context-aware LSTM sequence generator.
  • 14
  • 1
Learning Implicit Generative Models by Matching Perceptual Features
TLDR
We propose a new, effective moment matching (MM) approach that learns implicit generative models by matching the mean and covariance of features extracted from the convolutional layers of pretrained ConvNets.
  • 8
  • 1
Parameter optimization for vocal tract length normalization
TLDR
This paper focuses on the optimization of model parameters for vocal tract length normalization (VTLN) with extensive results for an optimal frequency range.
  • 9
  • 1
Beyond linear transforms: efficient non-linear dynamic adaptation for noise robust speech recognition
TLDR
In this paper, we present new theory and results that combine constrained Maximum Likelihood Linear Regression (MLLR), also known as feature-space MLLR, a state-of-the-art model adaptation technique, with Dynamic Noise Adaptation (DNA), a state-of-the-art noise adaptation algorithm.
  • 4
  • 1
A bandpass transform for speaker normalization
TLDR
We introduce a new spectral transformation for speaker normalization with two degrees of freedom, enabling complex warpings of the frequency axis that differ from those in previous work with the bilinear transform.
  • 2
  • 1
Adversarial Semantic Alignment for Improved Image Captions
TLDR
We study image captioning as conditional GAN training, proposing both a context-aware LSTM captioner and a co-attentive discriminator that enforces semantic alignment between images and captions.
  • 18
Factorial Hidden Restricted Boltzmann Machines for noise robust speech recognition
TLDR
We present the Factorial Hidden Restricted Boltzmann Machine (FHRBM) for robust speech recognition.
  • 7
The 2001 BYBLOS English large vocabulary conversational speech recognition system
TLDR
This paper describes the BYBLOS system that BBN used to participate in the 2001 NIST Hub-5 evaluation benchmark, and presents algorithmic improvements made to the system, along with experimental results.
  • 13
Robust speech recognition using dynamic noise adaptation
TLDR
We show that a model-based technique, dynamic noise adaptation, can substantially improve the performance of commercial-grade speech recognizers trained on large amounts of data, deliver match-trained word error rate (WER) performance, and improve our best recognizer in low SNR conditions.
  • 8