Segregating speech from one monaural recording has proven to be very challenging. Monaural segregation of voiced speech has been studied in previous systems that incorporate auditory scene analysis principles. A major problem for these systems is their inability to deal with the high-frequency part of speech. Psychoacoustic evidence suggests that different…
Under noise-free conditions, the quality of reverberant speech is dependent on two distinct perceptual components: coloration and long-term reverberation. They correspond to two physical variables: signal-to-reverberant energy ratio (SRR) and reverberation time, respectively. Inspired by this observation, we propose a two-stage reverberant speech…
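As a rough illustration (not taken from the paper itself), the SRR can be approximated from a room impulse response by splitting it into an early part, which produces coloration, and a late reverberant tail. The 50 ms split point used here is a common convention, not a value from the abstract.

```python
import numpy as np

def srr_db(rir, fs=16000, early_ms=50.0):
    """Approximate signal-to-reverberant energy ratio (SRR) in dB from
    a room impulse response (RIR): energy of the early part (direct
    sound plus early reflections) over energy of the late tail.
    The early_ms split point is a conventional choice, an assumption
    here rather than the paper's definition."""
    split = int(fs * early_ms / 1000.0)
    early = np.sum(rir[:split] ** 2)
    late = np.sum(rir[split:] ** 2)
    return 10.0 * np.log10(early / late)

# Toy RIR: a unit direct impulse plus one late reflection at 62.5 ms.
rir = np.zeros(16000)
rir[0] = 1.0
rir[1000] = 1.0
```

With equal early and late energy the ratio is 0 dB; scaling the direct impulse up raises the SRR, mirroring how a stronger direct path makes reverberation less audible.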
Determining multiple pitches in noisy and reverberant speech is an important and challenging task. We propose a robust multipitch tracking algorithm in the presence of both background noise and room reverberation. A new channel selection method is utilized in conjunction with an auditory front-end to extract periodicity features in the time-frequency space.…
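A minimal sketch of what a periodicity feature looks like, under the common assumption that it is derived from a normalized autocorrelation searched over plausible pitch lags (the specific front-end and channel selection of the paper are not reproduced here):

```python
import numpy as np

def periodicity_features(frame, fs=16000, fmin=80.0, fmax=400.0):
    """Normalized autocorrelation of one signal frame, searched over
    the lag range corresponding to [fmin, fmax] Hz. Returns the
    frequency of the best lag and the autocorrelation peak value,
    which serves as a periodicity strength. A simplification of the
    auditory-front-end features described in the abstract."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    ac = ac / (ac[0] + 1e-12)          # normalize: ac[0] == 1
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    return fs / lag, ac[lag]

# A 40 ms frame of a clean 200 Hz tone should yield f0 near 200 Hz.
t = np.arange(640) / 16000.0
f0, strength = periodicity_features(np.sin(2 * np.pi * 200.0 * t))
```

In noisy or reverberant channels the peak value drops, which is one reason channel selection (keeping only channels with strong periodicity) helps multipitch tracking.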
We study image segmentation on the basis of locally excitatory, globally inhibitory oscillator networks (LEGION), whereby the phases of oscillators encode the binding of pixels. We introduce a lateral potential for each oscillator so that only oscillators with strong connections from their neighborhood can develop high potentials. Based on the concept…
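The binding idea can be illustrated with a much-simplified Kuramoto-style phase model: one oscillator per pixel, with excitatory coupling only between neighboring pixels of similar intensity. This is a toy stand-in, not LEGION's relaxation oscillators or its global inhibitor, but it shows how similarity-gated lateral connections synchronize pixels within a region while distinct regions keep distinct phases.

```python
import numpy as np

def kuramoto_segmentation(image, steps=2000, dt=0.05, coupling=1.0, thresh=0.5):
    """Toy phase-oscillator segmentation (Kuramoto-style, NOT LEGION):
    4-neighbor pixels couple excitatorily only when their intensity
    difference is below thresh. Same-region oscillators pull each
    other into a common phase; disconnected regions stay apart."""
    h, w = image.shape
    n = h * w
    W = np.zeros((n, n))               # symmetric coupling matrix
    for r in range(h):
        for c in range(w):
            i = r * w + c
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < h and cc < w and abs(image[r, c] - image[rr, cc]) < thresh:
                    j = rr * w + cc
                    W[i, j] = W[j, i] = 1.0
    theta = np.linspace(0.0, 2.0, n)   # deterministic initial phases
    for _ in range(steps):
        diff = theta[None, :] - theta[:, None]   # theta_j - theta_i
        theta = theta + dt * coupling * np.sum(W * np.sin(diff), axis=1)
    return theta.reshape(h, w)

# Two homogeneous regions: left half dark, right half bright.
img = np.zeros((4, 4))
img[:, 2:] = 1.0
phases = kuramoto_segmentation(img)
```

After the simulation, phases within each half collapse to a single value, while the two halves settle at different phases because no coupling crosses the intensity boundary.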
At a cocktail party, one can selectively attend to a single voice and filter out all the other acoustical interferences. How to simulate this perceptual ability remains a great challenge. This paper describes a novel, supervised learning approach to speech segregation, in which a target speech signal is separated from interfering sounds using spatial…
Intelligibility of ideal binary masked noisy speech was measured on a group of normal-hearing individuals across mixture signal-to-noise ratio (SNR) levels, masker types, and local criteria for forming the binary mask. The binary mask is computed from time-frequency decompositions of target and masker signals using two different schemes: an ideal binary…
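The ideal binary mask criterion itself is simple to state: a time-frequency (T-F) unit is kept when its local target-to-masker energy ratio exceeds the local criterion (LC). A minimal sketch, assuming magnitude T-F decompositions as inputs:

```python
import numpy as np

def ideal_binary_mask(target_tf, masker_tf, local_criterion_db=0.0):
    """Ideal binary mask from T-F magnitude decompositions of the
    target and masker: mask = 1 where the local SNR (in dB) exceeds
    the local criterion (LC), else 0."""
    eps = 1e-12  # guard against silent (zero-energy) units
    local_snr_db = 10.0 * np.log10(
        (np.abs(target_tf) ** 2 + eps) / (np.abs(masker_tf) ** 2 + eps)
    )
    return (local_snr_db > local_criterion_db).astype(np.float32)

# Toy 2x3 T-F grids: units where the target dominates are retained.
target = np.array([[1.0, 0.1, 2.0], [0.5, 3.0, 0.2]])
masker = np.array([[0.5, 1.0, 1.0], [1.0, 0.5, 0.2]])
mask = ideal_binary_mask(target, masker, local_criterion_db=0.0)
```

Varying `local_criterion_db` trades off how aggressively masker-dominated units are discarded, which is exactly the LC dimension swept in the intelligibility measurements.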
The discovery of long-range synchronous oscillations in the visual cortex has triggered much interest in understanding the underlying neural mechanisms and in exploring possible applications of neural oscillations. Many neural models thus proposed end up relying on global connections, leading to the question of whether lateral connections alone can produce…
Based on a local spatial/frequency representation, we employ a spectral histogram as a feature statistic for texture classification. The spectral histogram consists of marginal distributions of responses of a bank of filters and encodes implicitly the local structure of images through the filtering stage and the global appearance through the histogram stage.…
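The construction can be sketched directly: filter the image with a bank, histogram each response, and concatenate the normalized marginals. The tiny bank below (intensity plus horizontal and vertical differences) is a toy stand-in for the Gabor-style filters used in practice.

```python
import numpy as np

def spectral_histogram(image, n_bins=8):
    """Spectral histogram feature: concatenated marginal histograms of
    a filter bank's responses. The filtering stage captures local
    structure; the histogram stage captures global appearance. The
    three-filter bank here is an illustrative assumption, not the
    paper's filter set."""
    image = image.astype(float)
    responses = [
        image,                          # intensity (delta filter)
        image[:, 1:] - image[:, :-1],   # horizontal derivative
        image[1:, :] - image[:-1, :],   # vertical derivative
    ]
    feats = []
    for resp in responses:
        lo, hi = float(resp.min()), float(resp.max())
        hist, _ = np.histogram(resp, bins=n_bins, range=(lo, hi + 1e-9))
        feats.append(hist / hist.sum())  # normalize each marginal
    return np.concatenate(feats)

feat = spectral_histogram(np.arange(16.0).reshape(4, 4))
```

Because each marginal is normalized to a distribution, features from differently sized images remain comparable, which is what makes the statistic usable for classification.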
Separating singing voice from music accompaniment is very useful in many applications, such as lyrics recognition and alignment, singer identification, and music information retrieval. Although speech separation has been extensively studied for decades, singing voice separation has been little investigated. We propose a system to separate singing voice from…
A multistage neural model is proposed for an auditory scene analysis task: segregating speech from interfering sound sources. The core of the model is a two-layer oscillator network that performs stream segregation on the basis of oscillatory correlation. In the oscillatory correlation framework, a stream is represented by a population of synchronized…