Kevin W. Wilson

We present a technique for denoising speech using nonnegative matrix factorization (NMF) in combination with statistical speech and noise models. We compare our new technique to standard NMF and to a state-of-the-art Wiener filter implementation and show improvements in speech quality across a range of interfering noise types.
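As a concrete illustration of this line of work, the following Python/NumPy sketch shows plain semi-supervised NMF denoising: basis matrices for speech and noise (here `W_speech` and `W_noise`, names assumed) are learned offline, only the activations are fit to the noisy magnitude spectrogram, and speech is recovered with a soft mask. The paper's actual system additionally incorporates statistical speech and noise models, which this sketch omits.

```python
import numpy as np

def nmf_activations(V, W, n_iter=100, eps=1e-9):
    """Fit H >= 0 so that W @ H approximates V (Euclidean multiplicative updates)."""
    H = np.random.rand(W.shape[1], V.shape[1])
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H

def nmf_denoise(V_noisy, W_speech, W_noise, eps=1e-9):
    """Estimate the speech magnitude spectrogram from a noisy one."""
    W = np.hstack([W_speech, W_noise])   # fixed, pre-trained bases
    H = nmf_activations(V_noisy, W)      # only activations are adapted
    k = W_speech.shape[1]
    V_speech = W_speech @ H[:k]          # speech part of the model
    mask = V_speech / (W @ H + eps)      # Wiener-like soft mask
    return mask * V_noisy
```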
In this paper, we present a system that combines sound and vision to track multiple people. In a cluttered or noisy scene, multi-person tracking estimates have a distinctly non-Gaussian distribution. We apply a particle filter with audio and video state components, and derive observation likelihood methods based on both audio and video measurements. Our…
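For readers unfamiliar with the machinery, a single particle-filter step with audio-visual fusion looks roughly like the Python sketch below; the random-walk motion model and the two likelihood functions are placeholder assumptions, not the paper's models.

```python
import numpy as np

def pf_step(particles, weights, audio_llh, video_llh, motion_std=0.05):
    """One predict/update/resample step of an audio-visual particle filter."""
    # Predict: random-walk motion model (placeholder).
    particles = particles + np.random.normal(0.0, motion_std, particles.shape)
    # Update: fuse modalities by multiplying per-particle likelihoods.
    weights = weights * audio_llh(particles) * video_llh(particles)
    weights = weights / weights.sum()
    # Resample (systematic) to avoid weight degeneracy.
    n = len(weights)
    positions = (np.arange(n) + np.random.rand()) / n
    idx = np.minimum(np.searchsorted(np.cumsum(weights), positions), n - 1)
    return particles[idx], np.full(n, 1.0 / n)
```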
Learning an acoustic model directly from the raw waveform has been an active area of research. However, waveform-based models have not yet matched the performance of log-mel-trained neural networks. We will show that raw waveform features match the performance of log-mel filterbank energies when used with a state-of-the-art CLDNN acoustic model trained on…
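A minimal PyTorch sketch of the kind of learned front end such raw-waveform models use in place of log-mel features (layer sizes and names are illustrative assumptions, not the paper's architecture): filter the waveform with learned FIR filters, rectify, pool to the frame rate, and log-compress.

```python
import torch
import torch.nn as nn

class RawWaveformFrontEnd(nn.Module):
    """Learned filterbank-like features computed directly from the waveform."""
    def __init__(self, n_filters=40, filter_len=400, frame_hop=160):
        super().__init__()
        # One learned FIR filter per output channel.
        self.conv = nn.Conv1d(1, n_filters, kernel_size=filter_len,
                              padding=filter_len // 2, bias=False)
        self.pool = nn.MaxPool1d(kernel_size=frame_hop, stride=frame_hop)

    def forward(self, wav):               # wav: (batch, 1, n_samples)
        x = torch.relu(self.conv(wav))    # filterbank-like analysis
        x = self.pool(x)                  # decimate to the frame rate
        return torch.log(x + 1e-6)        # log compression, as with log-mel
```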
We present an approach to detecting and recognizing spoken isolated phrases based solely on visual input. We adopt an architecture that first employs discriminative detection of visual speech and articulatory features, and then performs recognition using a model that accounts for the loose synchronization of the feature streams. Discriminative classifiers…
We present a technique for denoising speech using temporally regularized nonnegative matrix factorization (NMF). In previous work [1], we used a regularized NMF update to impose structure within each audio frame. In this paper, we add frame-to-frame regularization across time and show that this additional regularization can also improve our speech denoising…
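One standard way to realize such frame-to-frame regularization, sketched below under the assumption of a quadratic smoothness penalty lam * sum_t ||h_t - h_{t-1}||^2 (the paper's exact penalty may differ), is to split the penalty gradient into its negative and positive parts and fold them into the multiplicative activation update:

```python
import numpy as np

def smooth_H_update(V, W, H, lam=0.1, eps=1e-9):
    """One multiplicative update of H with a temporal smoothness penalty."""
    left = np.hstack([H[:, :1], H[:, :-1]])   # h_{t-1}, edge replicated
    right = np.hstack([H[:, 1:], H[:, -1:]])  # h_{t+1}, edge replicated
    num = W.T @ V + lam * (left + right)      # negative-gradient terms
    den = W.T @ W @ H + 2.0 * lam * H + eps   # positive-gradient terms
    return H * num / den
```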
BACKGROUND Heterotopic ossification in the extremities remains a common complication in the setting of high-energy wartime trauma, particularly in blast-injured amputees and in those in whom the definitive amputation was performed within the zone of injury. The purposes of this cohort study were to report the experience of one major military medical center…
Standard deep neural network-based acoustic models for automatic speech recognition (ASR) rely on hand-engineered input features, typically log-mel filterbank magnitudes. In this paper, we describe a convolutional neural network / deep neural network (CNN-DNN) acoustic model which takes raw multichannel waveforms as input, i.e., without any preceding feature extraction…
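The core of such a model can be pictured as a multichannel time-convolution layer that implements filter-and-sum beamforming with learned filters, as in the PyTorch sketch below (sizes are illustrative assumptions): each output channel applies one FIR filter per microphone and sums across microphones, so the spatial filtering is trained jointly with the rest of the acoustic model.

```python
import torch
import torch.nn as nn

class FilterAndSumLayer(nn.Module):
    """Learned filter-and-sum beamforming as the first network layer."""
    def __init__(self, n_mics=2, n_looks=10, filter_len=81):
        super().__init__()
        # Conv1d filters each input (microphone) channel separately and
        # sums the results, which is exactly filter-and-sum beamforming;
        # each output channel acts like one learned "look direction".
        self.conv = nn.Conv1d(n_mics, n_looks, kernel_size=filter_len,
                              padding=filter_len // 2, bias=False)

    def forward(self, wavs):      # wavs: (batch, n_mics, n_samples)
        return self.conv(wavs)    # (batch, n_looks, n_samples)
```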
In this paper, we present a probabilistic tracking framework that combines sound and vision to achieve more robust and accurate tracking of multiple objects. In a cluttered or noisy scene, our measurements have a non-Gaussian, multi-modal distribution. We apply a particle filter to track multiple people using combined audio and video observations. We have…
Multichannel ASR systems commonly separate speech enhancement, including localization, beamforming, and postfiltering, from acoustic modeling. Recently, we explored doing multichannel enhancement jointly with acoustic modeling, where beamforming and frequency decomposition were folded into one layer of the neural network [1, 2]. In this paper, we explore…
We present an algorithm to find a low-dimensional decomposition of a spectrogram by formulating it as a regularized non-negative matrix factorization (NMF) problem with a regularization term chosen to encourage independence. This algorithm provides a better decomposition than standard NMF when the underlying sources are independent. It is directly…
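In general form, the objective being minimized is a regularized NMF cost of the shape below; the specific independence-encouraging penalty R is the paper's contribution and is not reproduced here.

```latex
\min_{W \ge 0,\; H \ge 0} \; \lVert V - W H \rVert_F^2 \;+\; \lambda\, R(H)
```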