Models of speech recognition (by both human and machine) have traditionally assumed the phoneme to serve as the fundamental unit of phonetic and phonological analysis. However, phoneme-centric models have failed to provide a convincing theoretical account of the process by which the brain extracts meaning from the speech signal and have fared poorly in…
In collaboration with colleagues at UW, OGI, IBM, and SRI, we are developing technology to process spoken language from informal meetings. The work includes a substantial data collection and transcription effort, and has required a nontrivial degree of infrastructure development. We are undertaking this because the new task area provides a significant…
A beat-synchronous chroma representation enables the matching of cover versions of popular music using global cross-correlation across time- and transposition-skew.
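To make the matching step concrete, here is a minimal Python sketch, assuming the beat-synchronous chroma features have already been extracted as 12-row lists (one column per beat). The function names `normalize_columns` and `xcorr_chroma` are hypothetical, and the scoring is simplified: the published system's exact normalization and any filtering of the correlation are not reproduced. Transposition skew is handled by rotating the pitch-class axis of one song through all 12 semitone shifts, and time skew by sliding one beat sequence against the other.

```python
import math

def normalize_columns(chroma):
    """Scale each beat's 12-bin chroma vector to unit Euclidean norm.
    All-zero columns (e.g. silence) are left as zeros."""
    n_beats = len(chroma[0])
    out = [[0.0] * n_beats for _ in range(12)]
    for b in range(n_beats):
        norm = math.sqrt(sum(chroma[p][b] ** 2 for p in range(12)))
        if norm > 0.0:
            for p in range(12):
                out[p][b] = chroma[p][b] / norm
    return out

def xcorr_chroma(a, b):
    """Cross-correlate two beat-synchronous chroma matrices (12 x beats)
    over every semitone rotation and every beat lag (hypothetical helper).
    Returns (best_score, semitone_shift, beat_lag). A score of 1.0 means every
    beat of the shorter piece matches exactly (up to loudness scaling)."""
    a, b = normalize_columns(a), normalize_columns(b)
    na, nb = len(a[0]), len(b[0])
    best = (-math.inf, 0, 0)
    for shift in range(12):
        # Rotate b's pitch-class axis; shift = k undoes a transposition of b up by k semitones.
        rb = [b[(p + shift) % 12] for p in range(12)]
        for lag in range(-(nb - 1), na):
            # Beat i of a is aligned with beat i - lag of b.
            total = 0.0
            for i in range(max(0, lag), min(na, nb + lag)):
                j = i - lag
                total += sum(a[p][i] * rb[p][j] for p in range(12))
            score = total / min(na, nb)
            if score > best[0]:
                best = (score, shift, lag)
    return best
```

For example, comparing a chroma matrix with a copy transposed up three semitones and delayed by two beats should return a score of about 1.0 with `semitone_shift == 3` and `beat_lag == -2`. Because both inputs are beat-synchronous, the lag is measured in beats rather than seconds, so tempo differences between versions largely drop out of the search.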
Building machines that emulate the kinds of acoustic information processing that human beings take for granted has proved unexpectedly difficult; the human auditory system is extremely sophisticated in its adaptation to the sounds of the real world, and uses an impressive array of features as cues to organization and interpretation. As more of these cues…
A neural net classifier is trained to identify the pitch of a frame of subband autocorrelation principal components. Accuracy is greatly improved for noisy, band-limited speech when the training data are matched to those conditions.
Recognising speech in the presence of non-stationary noise presents a great challenge. Missing data techniques allow recognition based on a subset of features which reflect the speech and not the interference, but identifying these valid features is difficult. Rather than relying only on low-level signal features to locate the target (such as energy…
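The core idea of missing-data recognition can be illustrated with a diagonal-covariance Gaussian: when the features are independent given the class, marginalising over unreliable dimensions amounts to leaving them out of the likelihood. The Python sketch below assumes a binary reliability mask is already available (estimating that mask is exactly the hard part described above); `masked_log_likelihood` and `classify` are hypothetical names, and the two-class models are toy values rather than trained speech models.

```python
import math

def masked_log_likelihood(x, mask, mean, var):
    """Diagonal-Gaussian log-likelihood of x, using only dimensions whose
    mask entry is True (unreliable dimensions are marginalised out)."""
    ll = 0.0
    for xi, reliable, mu, v in zip(x, mask, mean, var):
        if reliable:
            ll += -0.5 * (math.log(2.0 * math.pi * v) + (xi - mu) ** 2 / v)
    return ll

def classify(x, mask, models):
    """Return the name of the model with the highest masked log-likelihood.
    `models` maps a class name to a (mean, var) pair of equal-length lists."""
    return max(models, key=lambda name: masked_log_likelihood(x, mask, *models[name]))

# Toy example (assumed values): dimension 1 has been swamped by noise.
models = {"speech": ([0.0, 0.0], [1.0, 1.0]), "other": ([5.0, 5.0], [1.0, 1.0])}
x = [0.1, 5.0]
print(classify(x, [True, True], models))   # → other (noisy feature trusted)
print(classify(x, [True, False], models))  # → speech (noisy feature marginalised)
```

With every feature trusted, the corrupted dimension pulls the decision toward the wrong class; dropping it from the likelihood restores the correct one.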
This thesis has one main goal: to design algorithms that computers can use to produce expressive-sounding rhythmic phrases. First, I describe four elements that can characterize musical rhythm: metric structure, tempo variation, deviations, and ametric phrases. The first three elements can be used successfully to model percussive rhythm. Second, I describe two…
[Beyond the spectral envelope as the fundamental representation for speech recognition] State-of-the-art automatic speech recognition (ASR) systems continue to improve, yet there remain many tasks for which the technology is inadequate. The core acoustic operation has essentially remained the same for decades: a single feature vector (derived from the…
This paper describes the SPRACH system developed for the 1998 Hub-4E broadcast news evaluation. The system is based on the connectionist-HMM framework and uses both recurrent neural network and multi-layer perceptron acoustic models. We describe both a system designed for the primary transcription hub, and a system for the less than 10 times real-time…
This paper describes the participation of the THISL group at the TREC-8 Spoken Document Retrieval (SDR) track. The THISL SDR system consists of the real-time version of the ABBOT large vocabulary speech recognition system and the THISLIR text retrieval system. The TREC-8 evaluation assessed SDR performance on a corpus of 500 hours of broadcast news material…