Segregating speech from one monaural recording has proven to be very challenging. Monaural segregation of voiced speech has been studied in previous systems that incorporate auditory scene analysis principles. A major problem for these systems is their inability to deal with the high-frequency part of speech. Psychoacoustic evidence suggests that different …
Determining multiple pitches in noisy and reverberant speech is an important and challenging task. We propose a robust multipitch tracking algorithm in the presence of both background noise and room reverberation. A new channel selection method is utilized in conjunction with an auditory front-end to extract periodicity features in the time-frequency space. …
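The periodicity features mentioned above can be made concrete with a small sketch. The version below is a hypothetical stand-in, not the authors' algorithm: a Butterworth band-pass bank replaces the auditory (gammatone) front-end, and a normalized running autocorrelation per channel and frame serves as the periodicity feature, whose peaks mark pitch-period candidates. The channel spacing, frame length, and lag range are all assumptions.

```python
# Hypothetical periodicity-feature extraction; a Butterworth band-pass bank
# stands in for the auditory front-end used in the paper.
import numpy as np
from scipy.signal import butter, lfilter

def bandpass_bank(x, fs, edges):
    """Filter x through adjacent band-pass channels delimited by `edges` (Hz)."""
    channels = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        b, a = butter(2, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        channels.append(lfilter(b, a, x))
    return np.stack(channels)                       # (n_channels, n_samples)

def periodicity_features(x, fs, frame=0.02, max_lag=0.0125):
    """Normalized autocorrelation per channel and frame; peaks suggest pitch periods."""
    edges = np.geomspace(80, min(5000.0, fs / 2 - 1), 17)  # assumed channel layout
    chans = bandpass_bank(x, fs, edges)
    hop, max_l = int(frame * fs), int(max_lag * fs)
    feats = []
    for t in range(0, chans.shape[1] - hop - max_l, hop):
        seg = chans[:, t:t + hop + max_l]
        ac = np.array([[np.dot(c[:hop], c[lag:lag + hop]) for lag in range(max_l)]
                       for c in seg])
        ac /= np.maximum(ac[:, :1], 1e-12)          # normalize by zero-lag energy
        feats.append(ac)
    return np.array(feats)                          # (n_frames, n_channels, n_lags)
```

Lags at which many channels show autocorrelation peaks are the multipitch candidates a tracker would then follow over time.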
We study image segmentation on the basis of locally excitatory, globally inhibitory oscillator networks (LEGION), whereby the phases of oscillators encode the binding of pixels. We introduce a lateral potential for each oscillator so that only oscillators with strong connections from their neighborhood can develop high potentials. Based on the concept …
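To give a feel for dynamics of this kind, here is a heavily simplified sketch of a LEGION-style update: Terman-Wang relaxation oscillators on a pixel grid, coupled to their four neighbors and to a global inhibitor. The parameter values, the toroidal boundary handling via np.roll, and the omission of the lateral-potential mechanism are all simplifications for illustration; this is not the paper's model.

```python
# Simplified LEGION-style grid dynamics (illustrative parameters, wrap-around
# borders, no lateral potential); one forward-Euler step.
import numpy as np

def legion_step(x, y, z, I, dt=0.05, eps=0.02, gamma=6.0, beta=0.1,
                w_local=1.5, w_z=1.0, theta=0.0):
    """x, y: oscillator state grids; z: scalar global inhibitor; I: pixel input."""
    active = (x > theta).astype(float)
    # Excitatory coupling from the four nearest neighbors (toroidal wrap).
    S = w_local * (np.roll(active, 1, 0) + np.roll(active, -1, 0)
                   + np.roll(active, 1, 1) + np.roll(active, -1, 1))
    S -= w_z * z                                      # global inhibition
    dx = 3 * x - x ** 3 + 2 - y + I + S               # fast (excitatory) variable
    dy = eps * (gamma * (1 + np.tanh(x / beta)) - y)  # slow (recovery) variable
    z_new = z + dt * (float(active.any()) - z)        # inhibitor tracks any activity
    return x + dt * dx, y + dt * dy, z_new
```

Iterated over many steps, oscillators driven by similar, connected pixels tend to synchronize while the global inhibitor desynchronizes different groups, which is the phase-based binding the abstract describes.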
Intelligibility of ideal binary masked noisy speech was measured on a group of normal-hearing individuals across mixture signal-to-noise ratio (SNR) levels, masker types, and local criteria for forming the binary mask. The binary mask is computed from time-frequency decompositions of target and masker signals using two different schemes: an ideal binary …
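The ideal binary mask itself is compact enough to state in code. A minimal sketch, assuming an STFT stands in for the paper's time-frequency decomposition and that target, masker, and mixture are equal-length signals available before mixing:

```python
# Ideal binary mask: keep a time-frequency unit when local SNR exceeds the
# local criterion LC (dB). STFT is an assumed stand-in for the decomposition.
import numpy as np
from scipy.signal import stft, istft

def ideal_binary_mask(target, masker, fs, lc_db=0.0, nperseg=512):
    _, _, T = stft(target, fs, nperseg=nperseg)
    _, _, M = stft(masker, fs, nperseg=nperseg)
    local_snr = 10 * np.log10((np.abs(T) ** 2 + 1e-12) /
                              (np.abs(M) ** 2 + 1e-12))
    return (local_snr > lc_db).astype(float)        # 1 keeps the unit, 0 discards

def apply_mask(mixture, mask, fs, nperseg=512):
    _, _, X = stft(mixture, fs, nperseg=nperseg)
    _, out = istft(X * mask, fs, nperseg=nperseg)
    return out
```

Raising lc_db makes the mask more conservative, retaining only units where the target clearly dominates, which is exactly the local-criterion dimension varied in the listening tests.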
Under noise-free conditions, the quality of reverberant speech depends on two distinct perceptual components: coloration and long-term reverberation. They correspond to two physical variables: the signal-to-reverberant energy ratio (SRR) and reverberation time, respectively. Inspired by this observation, we propose a two-stage reverberant speech enhancement …
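To make the long-term reverberation component concrete, the sketch below shows one common way to suppress it: spectral subtraction of a delayed, attenuated copy of the speech power spectrum as an estimate of late reverberation. The scaling factor, frame delay, and spectral floor are illustrative assumptions, and this is not the paper's two-stage algorithm, which also handles coloration.

```python
# Hedged sketch: suppress estimated late reverberation by spectral subtraction.
import numpy as np
from scipy.signal import stft, istft

def suppress_late_reverb(x, fs, scale=0.3, delay_frames=7, nperseg=512):
    _, _, X = stft(x, fs, nperseg=nperseg)
    P = np.abs(X) ** 2
    # Model late-reverberant power as an attenuated, delayed copy of P.
    R = np.zeros_like(P)
    R[:, delay_frames:] = scale * P[:, :-delay_frames]
    cleaned = np.maximum(P - R, 0.05 * P)           # floor limits musical noise
    _, out = istft(np.sqrt(cleaned) * np.exp(1j * np.angle(X)), fs,
                   nperseg=nperseg)
    return out
```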
At a cocktail party, one can selectively attend to a single voice and filter out all other acoustic interference. How to simulate this perceptual ability remains a great challenge. This paper describes a novel, supervised learning approach to speech segregation, in which a target speech signal is separated from interfering sounds using spatial cues …
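Spatial features of the kind such a supervised system could be trained on are easy to sketch. The hypothetical example below computes, per time-frequency unit, the interaural level difference (ILD) and a phase-derived interaural time difference (ITD) from a binaural pair; the STFT front-end and parameter choices are assumptions rather than the paper's exact feature set, and phase-derived ITD is only unambiguous at low frequencies.

```python
# Hypothetical binaural features per T-F unit: ILD (dB) and phase-based ITD (s).
import numpy as np
from scipy.signal import stft

def spatial_features(left, right, fs, nperseg=512):
    _, _, L = stft(left, fs, nperseg=nperseg)
    _, _, R = stft(right, fs, nperseg=nperseg)
    ild = 20 * np.log10((np.abs(L) + 1e-12) / (np.abs(R) + 1e-12))
    freqs = np.fft.rfftfreq(nperseg, 1 / fs)[:, None]   # one row per STFT bin
    ipd = np.angle(L * np.conj(R))                      # interaural phase difference
    itd = np.where(freqs > 0,
                   ipd / (2 * np.pi * np.maximum(freqs, 1e-9)),
                   0.0)
    return itd, ild     # per-unit features, e.g. input to a trained classifier
```

A classifier trained on such features can label each unit as target-dominant or interference-dominant, which amounts to estimating a binary mask.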
A multistage neural model is proposed for an auditory scene analysis task: segregating speech from interfering sound sources. The core of the model is a two-layer oscillator network that performs stream segregation on the basis of oscillatory correlation. In the oscillatory correlation framework, a stream is represented by a population of synchronized oscillators …
Based on a local spatial/frequency representation, we employ a spectral histogram as a feature statistic for texture classification. The spectral histogram consists of marginal distributions of responses of a bank of filters and implicitly encodes the local structure of images through the filtering stage and the global appearance through the histogram stage. …
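A minimal sketch of a spectral histogram, assuming a tiny illustrative filter bank (intensity, two gradients, a Laplacian) and 8-bit image patches; a real system would use a richer bank and tuned bin ranges:

```python
# Spectral histogram sketch: marginal histograms of filter responses,
# concatenated into a single feature vector. The filter bank is an assumption.
import numpy as np
from scipy.ndimage import convolve

FILTERS = [
    (np.array([[1.0]]), (0.0, 1.0)),                          # intensity
    (np.array([[-1.0, 1.0]]), (-1.0, 1.0)),                   # horizontal gradient
    (np.array([[-1.0], [1.0]]), (-1.0, 1.0)),                 # vertical gradient
    (np.array([[0, -1, 0], [-1, 4, -1], [0, -1, 0]], float),
     (-4.0, 4.0)),                                            # Laplacian
]

def spectral_histogram(patch, bins=11):
    """Concatenated marginal histograms of filter responses over one patch."""
    patch = patch.astype(float) / 255.0             # assume 8-bit grayscale input
    feats = []
    for kernel, rng in FILTERS:
        resp = convolve(patch, kernel, mode="reflect")
        hist, _ = np.histogram(resp, bins=bins, range=rng, density=True)
        feats.append(hist)
    return np.concatenate(feats)                    # one feature vector per patch
```

Two patches can then be compared with a histogram distance such as chi-square or L1, which is how a feature statistic like this supports classification.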
In traditional methods for noise-robust automatic speech recognition, the acoustic models are typically trained on clean speech or on multi-condition data processed by the same feature enhancement algorithm expected to be used in decoding. In this paper, we propose a noise adaptive training (NAT) algorithm that can be applied to all training data …
When a target speech signal is obscured by an interfering speech waveform, comprehension of the target message depends both on the successful detection of the energy from the target speech waveform and on the successful extraction and recognition of the spectro-temporal energy pattern of the target out of a background of acoustically similar masker …