We present "Transcriber", a tool for assisting in the creation of speech corpora, and describe some aspects of its development and use. Transcriber was designed for the manual segmentation and transcription of long-duration broadcast news recordings, including annotation of speech turns, topics and acoustic conditions. It is highly portable, relying on the …
The problem of quantitatively comparing the performance of different broad-coverage grammars of English has to date resisted solution. Prima facie, known English grammars appear to disagree strongly with each other as to the elements of even the simplest sentences. For instance, the grammars of Steve Abney, […], Don Hindle (AT&T), Bob Ingria (BBN), and …
We describe a formal model for annotating linguistic artifacts, from which we derive an application programming interface (API) to a suite of tools for manipulating these annotations. The abstract logical model provides for a range of storage formats and promotes the reuse of tools that interact through this API. We focus first on "Annotation Graphs," a …
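The annotation graph idea can be illustrated with a minimal Python sketch, assuming the usual formulation: a directed graph whose nodes may be anchored to time offsets in the signal and whose arcs carry an annotation type and a label. The class and method names (`AnnotationGraph`, `add_arc`, `arcs_of_type`, `time_span`) are hypothetical and are not the API described in the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Arc:
    src: int    # start node id
    dst: int    # end node id
    kind: str   # annotation type, e.g. "word" or "phone"
    label: str  # annotation content, e.g. "she"


@dataclass
class AnnotationGraph:
    # node id -> time offset in seconds, or None for an unanchored node
    offsets: Dict[int, Optional[float]] = field(default_factory=dict)
    arcs: List[Arc] = field(default_factory=list)

    def add_node(self, node_id: int, offset: Optional[float] = None) -> None:
        self.offsets[node_id] = offset

    def add_arc(self, src: int, dst: int, kind: str, label: str) -> Arc:
        if src not in self.offsets or dst not in self.offsets:
            raise KeyError("both endpoints must already be nodes")
        if src == dst:
            raise ValueError("self-loops are not allowed")
        t0, t1 = self.offsets[src], self.offsets[dst]
        # Only a partial sanity check (not full acyclicity): when both endpoints
        # are anchored, the arc must not run backwards in time.
        if t0 is not None and t1 is not None and t0 > t1:
            raise ValueError("arc would run backwards in time")
        arc = Arc(src, dst, kind, label)
        self.arcs.append(arc)
        return arc

    def arcs_of_type(self, kind: str) -> List[Arc]:
        # All arcs of one annotation type, in insertion order.
        return [a for a in self.arcs if a.kind == kind]

    def time_span(self, arc: Arc) -> Tuple[Optional[float], Optional[float]]:
        # Either end may be None if its node is unanchored.
        return self.offsets[arc.src], self.offsets[arc.dst]


# Usage: one word arc spanning two phone arcs over shared, time-anchored nodes.
g = AnnotationGraph()
g.add_node(0, 0.0)
g.add_node(1, 0.15)
g.add_node(2, 0.42)
word = g.add_arc(0, 2, "word", "she")
g.add_arc(0, 1, "phone", "sh")
g.add_arc(1, 2, "phone", "iy")
print(g.time_span(word))  # (0.0, 0.42)
```

Because word and phone arcs share nodes, different annotation layers stay aligned to the same time points without duplicating offsets.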
This study investigates F0 declination in broadcast news speech in English and Mandarin Chinese. The results demonstrate a strong relationship between utterance length and declination slope. Shorter utterances have steeper declination even after excluding the initial rising and final lowering effects. Both topline and baseline show declination, but they …
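As a rough illustration of what a declination slope is, one common way to estimate it is a least-squares line fit of F0 against time within an utterance. The sketch below assumes F0 has already been extracted at known frame times with unvoiced frames coded as 0, and approximates the exclusion of initial rising and final lowering by trimming a hypothetical fraction `trim` of the voiced span at each end; it is not necessarily the procedure used in the study.

```python
from typing import Sequence


def declination_slope(times: Sequence[float], f0: Sequence[float], trim: float = 0.0) -> float:
    """Least-squares slope of F0 (Hz) against time (s), in Hz per second.

    Frames with F0 <= 0 are treated as unvoiced and ignored. `trim` is a
    hypothetical parameter: the fraction of the voiced span dropped at each end,
    a crude stand-in for excluding initial rises and final lowering.
    """
    if len(times) != len(f0):
        raise ValueError("times and f0 must have the same length")
    if not 0.0 <= trim < 0.5:
        raise ValueError("trim must be in [0, 0.5)")
    pts = [(t, f) for t, f in zip(times, f0) if f > 0]
    if pts and trim > 0:
        start = min(t for t, _ in pts)
        end = max(t for t, _ in pts)
        margin = trim * (end - start)
        pts = [(t, f) for t, f in pts if start + margin <= t <= end - margin]
    if len(pts) < 2:
        raise ValueError("need at least two voiced frames")
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_f = sum(f for _, f in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    sxy = sum((t - mean_t) * (f - mean_f) for t, f in pts)
    if sxx == 0:
        raise ValueError("frame times must not all be equal")
    return sxy / sxx


# Usage: a synthetic 2 s contour falling linearly from 220 Hz at 20 Hz/s,
# with some unvoiced (0 Hz) frames mixed in.
times = [i * 0.01 for i in range(200)]
f0 = [0.0 if i % 25 == 0 else 220.0 - 20.0 * t for i, t in enumerate(times)]
print(round(declination_slope(times, f0), 6))  # -20.0
```

The same fit could be applied separately to local F0 peaks and valleys to obtain topline and baseline slopes.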
This study attempts to improve automatic phonetic segmentation within the HMM framework. Experiments were conducted to investigate the use of phone boundary models, the use of precise phonetic segmentation for training HMMs, and the difference between context-dependent and context-independent phone models in terms of forced alignment performance. Results …
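Forced alignment can be made concrete with a deliberately simplified sketch: a Viterbi search that assigns each frame to one phone of a known transcript, in transcript order, with one state per phone and no transition probabilities. The function `force_align` and its input (a matrix of per-frame, per-phone log-likelihoods, assumed to come from already-trained acoustic models) are hypothetical; real HMM aligners use multi-state phone models and transition scores, and this is not the system evaluated in the study.

```python
import math
from typing import List, Tuple


def force_align(loglik: List[List[float]]) -> List[Tuple[int, int]]:
    """Viterbi forced alignment with one state per phone (a simplification).

    loglik[t][j] is the log-likelihood of frame t under the j-th phone of the
    known transcript. Each phone must cover at least one contiguous frame, in
    transcript order. Returns (start_frame, end_frame_exclusive) per phone.
    """
    T = len(loglik)
    J = len(loglik[0]) if T else 0
    if J == 0 or T < J:
        raise ValueError("need at least one phone and at least as many frames as phones")
    NEG = -math.inf
    best = [[NEG] * J for _ in range(T)]  # best[t][j]: best score with frame t in phone j
    back = [[0] * J for _ in range(T)]    # phone occupied at frame t-1 on that best path
    best[0][0] = loglik[0][0]             # alignment must start in the first phone
    for t in range(1, T):
        for j in range(J):
            stay = best[t - 1][j]                              # remain in phone j
            advance = best[t - 1][j - 1] if j > 0 else NEG     # move on from phone j-1
            back[t][j] = j if stay >= advance else j - 1
            best[t][j] = max(stay, advance) + loglik[t][j]
    # Trace back from the last frame, which must lie in the last phone.
    path = [0] * T
    j = J - 1
    for t in range(T - 1, -1, -1):
        path[t] = j
        j = back[t][j]
    # Collapse the frame-level path into one segment per phone.
    segments, start = [], 0
    for t in range(1, T + 1):
        if t == T or path[t] != path[t - 1]:
            segments.append((start, t))
            start = t
    return segments


# Usage: 6 frames, 3 phones; frames 0-1 fit phone 0, 2-3 phone 1, 4-5 phone 2.
frame_best = [0, 0, 1, 1, 2, 2]
ll = [[0.0 if j == frame_best[t] else -10.0 for j in range(3)] for t in range(6)]
print(force_align(ll))  # [(0, 2), (2, 4), (4, 6)]
```

The segment boundaries returned here are what a forced-alignment evaluation would compare against hand-labeled phone boundaries.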
This paper describes the creation and content of two corpora, TDT-2 and TDT-3, created for the DARPA-sponsored Topic Detection and Tracking project. The research goal in the TDT program is to create the core technology of a news understanding system that can process multilingual news content, categorizing individual stories according to the topic(s) they …
We present a new method for measuring the "darkness" of /l/, and use it to investigate the variation of English /l/ in a large speech corpus that is automatically aligned with phones predicted from an orthographic transcript. We found a correlation between the rime duration and /l/-darkness for syllable-final /l/, but no correlation between /l/ duration and …
The Linguistic Data Consortium (LDC) is a non-profit consortium of universities, companies and government research laboratories that supports education, research and technology development in language-related disciplines by collecting or creating, distributing and archiving language resources, including data and accompanying tools, standards and formats. LDC …