We present a technique for denoising speech using nonnegative matrix factorization (NMF) in combination with statistical speech and noise models. We compare our new technique to standard NMF and to a state-of-the-art Wiener filter implementation and show improvements in speech quality across a range of interfering noise types.
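The abstract above does not spell out the factorization or the reconstruction step, so the following is a minimal sketch of dictionary-based NMF denoising under common assumptions: Euclidean-cost multiplicative updates on magnitude spectrograms, pretrained speech and noise dictionaries, and a Wiener-style mask built from the speech part of the reconstruction. The function names (nmf, denoise) and the masking step are illustrative and are not taken from the paper, which additionally incorporates statistical speech and noise models.

    import numpy as np

    def nmf(V, rank, n_iter=200, W_fixed=None, eps=1e-9):
        """Multiplicative-update NMF minimizing squared Euclidean error, V ~= W @ H."""
        n_freq, n_frames = V.shape
        rng = np.random.default_rng(0)
        W = rng.random((n_freq, rank)) if W_fixed is None else W_fixed
        H = rng.random((rank, n_frames))
        for _ in range(n_iter):
            H *= (W.T @ V) / (W.T @ W @ H + eps)
            if W_fixed is None:
                W *= (V @ H.T) / (W @ H @ H.T + eps)
        return W, H

    def denoise(V_noisy, W_speech, W_noise, eps=1e-9):
        """Decompose a noisy magnitude spectrogram over concatenated speech and
        noise dictionaries, then apply a Wiener-style mask from the speech part."""
        W = np.hstack([W_speech, W_noise])
        _, H = nmf(V_noisy, W.shape[1], W_fixed=W)
        k = W_speech.shape[1]
        speech_hat = W_speech @ H[:k]
        noise_hat = W_noise @ H[k:]
        mask = speech_hat / (speech_hat + noise_hat + eps)
        return mask * V_noisy

In practice the dictionaries W_speech and W_noise would be fit by running nmf on clean-speech and noise training spectrograms, and the masked magnitudes recombined with the noisy phase for resynthesis.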
In this paper, we present a system that combines sound and vision to track multiple people. In a cluttered or noisy scene, multi-person tracking estimates have a distinctly non-Gaussian distribution. We apply a particle filter with audio and video state components, and derive observation likelihood methods based on both audio and video measurements. Our …
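The snippet leaves out the filter mechanics; below is a minimal bootstrap particle-filter step, assuming a 2-D position state, a random-walk motion model, and conditionally independent audio and video observation likelihoods supplied as callables. The names audio_lik and video_lik are placeholders, not functions from the paper.

    import numpy as np

    rng = np.random.default_rng(0)

    def particle_filter_step(particles, weights, audio_obs, video_obs,
                             audio_lik, video_lik, motion_std=0.05):
        """One predict/update/resample step for a bootstrap particle filter whose
        observation likelihood fuses audio and video measurements."""
        # Predict: random-walk motion model over the (x, y) position state.
        particles = particles + rng.normal(0.0, motion_std, particles.shape)
        # Update: assume audio and video are conditionally independent given the state.
        weights = weights * audio_lik(particles, audio_obs) * video_lik(particles, video_obs)
        weights = weights / weights.sum()
        # Resample (systematic) when the effective sample size collapses.
        if 1.0 / np.sum(weights ** 2) < 0.5 * len(weights):
            positions = (rng.random() + np.arange(len(weights))) / len(weights)
            idx = np.searchsorted(np.cumsum(weights), positions)
            idx = np.minimum(idx, len(weights) - 1)
            particles = particles[idx]
            weights = np.full(len(weights), 1.0 / len(weights))
        return particles, weights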
Learning an acoustic model directly from the raw waveform has been an active area of research. However, waveform-based models have not yet matched the performance of log-mel trained neural networks. We will show that raw waveform features match the performance of log-mel filterbank energies when used with a state-of-the-art CLDNN acoustic model trained on …
We present a technique for denoising speech using temporally regularized nonnegative matrix factorization (NMF). In previous work [1], we used a regularized NMF update to impose structure within each audio frame. In this paper, we add frame-to-frame regularization across time and show that this additional regularization can also improve our speech denoising …
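The abstract does not give the cost function, so the sketch below adds frame-to-frame smoothness to Euclidean NMF in one common way, a squared-difference penalty on neighbouring activation columns, folded into heuristic multiplicative updates. The penalty weight lam and the specific regularizer are assumptions; the paper's temporal regularizer (and its per-frame regularizer from [1]) may differ.

    import numpy as np

    def temporal_nmf(V, rank, lam=0.1, n_iter=200, eps=1e-9):
        """NMF with a penalty lam * sum_t ||h_t - h_{t-1}||^2 on the activations,
        optimized with heuristic multiplicative updates."""
        n_freq, n_frames = V.shape
        rng = np.random.default_rng(0)
        W = rng.random((n_freq, rank))
        H = rng.random((rank, n_frames))
        for _ in range(n_iter):
            W *= (V @ H.T) / (W @ H @ H.T + eps)
            # Sum of temporal neighbours of each activation column (zero-padded at the edges).
            neighbors = np.zeros_like(H)
            neighbors[:, 1:] += H[:, :-1]
            neighbors[:, :-1] += H[:, 1:]
            # Interior columns have two neighbours; the first and last have one.
            counts = np.full(n_frames, 2.0)
            counts[[0, -1]] = 1.0
            H *= (W.T @ V + lam * neighbors) / (W.T @ W @ H + lam * counts * H + eps)
        return W, H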
This paper presents a multi-modal approach to locate a speaker in a scene and determine to whom he or she is speaking. We present a simple probabilistic framework that combines multiple cues derived from both audio and video information. A purely visual cue is obtained using a head tracker to identify possible speakers in a scene and provide both their 3-D …
Standard deep neural network-based acoustic models for automatic speech recognition (ASR) rely on hand-engineered input features, typically log-mel filterbank magnitudes. In this paper, we describe a convolutional neural network - deep neural network (CNN-DNN) acoustic model which takes raw multichannel waveforms as input, i.e. without any preceding feature …
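As a rough illustration of the architecture class described above (not the paper's configuration), here is a toy raw-waveform model in PyTorch: a learned time-convolution filterbank applied directly to multichannel samples, followed by a small fully connected network producing per-frame outputs. Channel count, filter sizes, stride, layer widths and the number of output classes are all placeholder values.

    import torch
    import torch.nn as nn

    class RawWaveformCNNDNN(nn.Module):
        """Toy sketch: a time-convolution front end over raw multichannel audio
        followed by a fully connected (DNN) classifier over each frame."""
        def __init__(self, n_channels=2, n_filters=64, kernel=400, stride=160, n_classes=42):
            super().__init__()
            # Learned filterbank applied directly to the waveform (no log-mel front end).
            self.conv = nn.Conv1d(n_channels, n_filters, kernel_size=kernel, stride=stride)
            self.dnn = nn.Sequential(
                nn.Linear(n_filters, 512), nn.ReLU(),
                nn.Linear(512, 512), nn.ReLU(),
                nn.Linear(512, n_classes),
            )

        def forward(self, wav):                      # wav: (batch, channels, samples)
            feats = torch.relu(self.conv(wav))       # (batch, filters, frames)
            feats = torch.log1p(feats)               # simple compressive nonlinearity
            return self.dnn(feats.transpose(1, 2))   # per-frame logits

    # Example: one second of 16 kHz stereo audio -> per-frame outputs.
    model = RawWaveformCNNDNN()
    logits = model(torch.randn(1, 2, 16000))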
We present an algorithm to find a low-dimensional decomposition of a spectrogram by formulating it as a regularized non-negative matrix factorization (NMF) problem with a regularization term chosen to encourage independence. This algorithm provides a better decomposition than standard NMF when the underlying sources are independent. It is directly …
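The snippet does not state the regularizer, so the following sketch uses one plausible reading: a penalty on co-activation of different NMF components (the inner products between distinct activation rows), folded into a heuristic multiplicative update. This decorrelation term is only a stand-in for whatever independence measure the paper actually regularizes.

    import numpy as np

    def nmf_independence(V, rank, lam=0.05, n_iter=300, eps=1e-9):
        """Euclidean NMF with an extra penalty lam * sum_{i != j} <h_i, h_j>
        discouraging co-activation of different components, as a crude proxy
        for independence between their activation time series."""
        n_freq, n_frames = V.shape
        rng = np.random.default_rng(0)
        W = rng.random((n_freq, rank))
        H = rng.random((rank, n_frames))
        for _ in range(n_iter):
            W *= (V @ H.T) / (W @ H @ H.T + eps)
            # For each frame, the summed activation of all *other* components.
            cross = H.sum(axis=0, keepdims=True) - H
            H *= (W.T @ V) / (W.T @ W @ H + lam * cross + eps)
        return W, H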
Joint multichannel enhancement and acoustic modeling using neural networks has shown promise over the past few years. However, one shortcoming of previous work [1, 2, 3] is that the filters learned during training are fixed for decoding, potentially limiting the ability of these models to adapt to previously unseen or changing conditions. In this paper we …
BACKGROUND Heterotopic ossification in the extremities remains a common complication in the setting of high-energy wartime trauma, particularly in blast-injured amputees and in those in whom the definitive amputation was performed within the zone of injury. The purposes of this cohort study were to report the experience of one major military medical center …