In emotion classification of speech signals, the most commonly employed features are statistics of the fundamental frequency, the energy contour, the duration of silences, and voice quality. However, the performance of systems employing these features degrades substantially when more than two categories of emotion are to be classified. In this paper, a text-independent method …
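As a rough illustration of the kind of prosodic statistics this abstract refers to, the sketch below extracts pitch, energy-contour and silence measures from a speech file with librosa; the specific thresholds and summary statistics are assumptions, not the paper's feature set.

```python
# Illustrative sketch (not the paper's method): prosodic statistics of the kind
# named above -- fundamental frequency, energy contour and silence proportion.
import numpy as np
import librosa

def prosodic_features(path, silence_db=-40.0):
    y, sr = librosa.load(path, sr=16000)
    # F0 track via probabilistic YIN; unvoiced frames come back as NaN.
    f0, voiced_flag, _ = librosa.pyin(y, fmin=60.0, fmax=400.0, sr=sr)
    f0 = f0[~np.isnan(f0)]
    if f0.size == 0:
        f0 = np.zeros(1)                            # guard for unvoiced input
    # Short-time energy (RMS) contour and a crude silence ratio.
    rms = librosa.feature.rms(y=y)[0]
    rms_db = librosa.amplitude_to_db(rms, ref=np.max)
    silence_ratio = float(np.mean(rms_db < silence_db))
    return np.array([
        f0.mean(), f0.std(), f0.max() - f0.min(),   # pitch statistics
        rms.mean(), rms.std(),                      # energy statistics
        silence_ratio,                              # proportion of silent frames
    ])
```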
We present a prototype that automatically aligns acoustic musical signals with their corresponding textual lyrics, in a manner similar to manually-aligned karaoke. We tackle this problem using a multimodal approach, where the appropriate pairing of audio and text processing helps create a more accurate system. Our audio processing technique uses a …
We propose a novel technique for the automatic classification of vocal and non-vocal regions in an acoustic musical signal. Our technique combines harmonic content attenuation, guided by higher-level musical knowledge of the key, with sub-band energy processing to obtain features from the musical audio signal. We employ a Multi-Model Hidden Markov …
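The general pipeline named here, sub-band energy features classified with a hidden Markov model, can be sketched as follows. This uses a plain two-state GaussianHMM from hmmlearn rather than the paper's multi-model HMM, the key-based harmonic attenuation step is omitted, and the band layout and file name are assumptions.

```python
# Hedged sketch of the general idea (sub-band energies fed to an HMM), not the
# paper's multi-model HMM or its key-based harmonic attenuation.
import numpy as np
import librosa
from hmmlearn import hmm

def subband_energies(y, sr, n_bands=8):
    S = np.abs(librosa.stft(y, n_fft=1024, hop_length=512)) ** 2
    bands = np.array_split(S, n_bands, axis=0)           # split frequency bins
    feats = np.stack([b.sum(axis=0) for b in bands], 1)  # per-frame band energy
    return np.log(feats + 1e-10)

y, sr = librosa.load("song.wav", sr=16000)    # hypothetical input file
X = subband_energies(y, sr)
# Two hidden states standing in for "vocal" vs. "non-vocal" regions; which
# state corresponds to which class must be decided afterwards.
model = hmm.GaussianHMM(n_components=2, covariance_type="diag", n_iter=25)
model.fit(X)
labels = model.predict(X)                     # per-frame state sequence
```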
Technological advances in wearable sensors and biosignal processing have made it possible to detect human stress from physiological features. However, inter-subject differences in stress responses present a major challenge for reliable and accurate stress estimation. This research proposes a novel cluster-based analysis method to measure perceived …
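A loose illustration of cluster-based personalisation for stress estimation: subjects are grouped by their baseline physiology and a separate classifier is fitted per group. The feature layout, cluster count and choice of scikit-learn models are assumptions rather than the paper's method.

```python
# Sketch only: group subjects by resting physiology, then fit one stress
# classifier per group to soften inter-subject differences.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def fit_cluster_models(baselines, features, labels, subject_ids, n_clusters=3):
    # baselines: one row of resting physiological features per subject
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(baselines)
    models = {}
    for c in range(n_clusters):
        subjects_in_c = np.where(km.labels_ == c)[0]
        mask = np.isin(subject_ids, subjects_in_c)
        models[c] = LogisticRegression(max_iter=1000).fit(features[mask], labels[mask])
    return km, models
```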
This paper describes the details of our systems for the feature extraction and search tasks of TRECVID-2004. For feature extraction, we emphasize the use of a visual auto-concept annotation technique, fused with text and specialized detectors, to induce concepts in videos. For the search task, our emphasis is twofold. First, we employ query-specific …
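A toy sketch of late fusion of text-retrieval scores with visual concept-detector scores, in the spirit of the fusion mentioned above; the min-max normalisation and fixed weighting are illustrative assumptions, not the submitted system's scheme.

```python
# Toy late-fusion sketch: combine normalised scores from text search and
# visual concept detectors into a single ranking score per video shot.
import numpy as np

def fuse_scores(text_scores, concept_scores, w_text=0.6):
    def norm(s):
        s = np.asarray(s, dtype=float)
        rng = s.max() - s.min()
        return (s - s.min()) / rng if rng > 0 else np.zeros_like(s)
    return w_text * norm(text_scores) + (1.0 - w_text) * norm(concept_scores)

# ranking = np.argsort(-fuse_scores(text_scores, concept_scores))
```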
The sensory drive theory of speciation predicts that populations of the same species inhabiting different environments can differ in sensory traits, and that this sensory difference can ultimately drive speciation. However, even in the best-known examples of sensory-ecology-driven speciation, it is uncertain whether the variation in sensory traits is the …
We present LyricAlly, a prototype that automatically aligns acoustic musical signals with their corresponding textual lyrics, in a manner similar to manually-aligned karaoke. We tackle this problem with a multimodal approach, using an appropriate pairing of audio and text processing to create the resulting prototype. LyricAlly's acoustic signal …
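A heavily reduced sketch of the alignment idea: dynamic time warping between detected vocal-segment durations and durations estimated from the lyric lines (for example, proportional to syllable counts). This is a simplification for illustration, not LyricAlly's actual algorithm.

```python
# Minimal DTW over two duration sequences; backtracking over D would recover
# the segment-to-line mapping. Inputs are assumed to be lists of seconds.
import numpy as np

def dtw_align(seg_durations, line_durations):
    n, m = len(seg_durations), len(line_durations)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(seg_durations[i - 1] - line_durations[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]    # total alignment cost
```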
This paper presents a design strategy for the speaker diarization system in the I2R submissions to the 2007 and 2009 NIST Rich Transcription Meeting Recognition Evaluations (RT07 and RT09) for the multiple distant microphone (MDM) condition. The system features two algorithms supporting two important steps in the diarization process. The first step is Initial …
This paper describes the I2R/NTU system submitted to the NIST Rich Transcription 2007 (RT-07) Meeting Recognition evaluation, Multiple Distant Microphone (MDM) task. In our system, speaker turn detection and clustering are performed using Direction of Arrival (DOA) information. Purification of the resultant speaker clusters is then carried out by performing GMM …
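A hedged sketch of the two stages named in this abstract: initial clustering from per-frame DOA estimates, followed by purification with per-cluster GMMs over acoustic features. The cluster count, feature choice and scikit-learn models are assumptions, not the submitted system.

```python
# Sketch of DOA-based initial clustering followed by GMM-based purification.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

def diarize(doa_per_frame, mfcc_per_frame, n_speakers=4):
    # Step 1: initial speaker turns from spatial (DOA) information only.
    init = KMeans(n_clusters=n_speakers, n_init=10, random_state=0)
    labels = init.fit_predict(doa_per_frame.reshape(-1, 1))
    # Step 2: purification -- model each cluster acoustically and reassign
    # every frame to the speaker model that scores it highest.
    gmms = [GaussianMixture(n_components=8).fit(mfcc_per_frame[labels == k])
            for k in range(n_speakers)]
    scores = np.stack([g.score_samples(mfcc_per_frame) for g in gmms], axis=1)
    return scores.argmax(axis=1)
```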