Learn More
We present our ongoing work in addressing the issue of overlapped speech in speaker diarization through the use of overlap segmentation, overlapped speech exclusion, and overlap segment labeling. Using feature analysis, we identify the most salient features from a candidate list including those from our previous system and a set of newly proposed features.(More)
We describe the development of a speech activity detection system using an HMM-based segmenter for automatic speech recognition on individual headset microphones in multispeaker meetings. We look at cross-channel features (energy and correlation based) to incorporate into the segmenter for the purpose of addressing errors related to cross-channel phenomena(More)
We describe the development of our speech recognition system for the NIST Spring 2005 Meeting Rich Transcription (RT-05S) evaluation, highlighting improvements made since last year [1]. The system is based on the SRIICSI-UW RT-04F conversational telephone speech (CTS) recognition system, with meeting-adapted models and various audio preprocessing steps.(More)
State-of-the-art speaker diarization systems for meetings are now at a point where overlapped speech contributes significantly to the errors made by the system. However, little if no work has yet been done on detecting overlapped speech. We present our initial work toward developing an overlap detection system for improved meeting diarization. We(More)
We examine the suitability of using decision processes to model real-world systems of intelligent adversaries. Decision processes have long been used to study cooperative multiagent interactions, but their practical applicability to adversarial problems has received minimal study. We address the pros and cons of applying sequential decision-making in this(More)
Audio Segmentation for Meetings Speech Processing by Kofi Agyeman Boakye Doctor of Philosophy in Engineering—Electrical Engineering and Computer Sciences University of California, Berkeley Professor Nelson Morgan, Chair Perhaps more than any other domain, meetings represent a rich source of content for spoken language research and technology. Two common(More)
We have created a large diverse set of cars from overhead images, which are useful for training a deep learner to binary classify, detect and count them. The dataset and all related material will be made publically available. The set contains contextual matter to aid in identification of difficult targets. We demonstrate classification and detection on this(More)