David McKelvie

...ing service in physics and the manufacturer of the INSPEC database, indexed 174,000 items in one year alone (1996), of which about 146,500 were journal articles. However, these already impressive numbers exclude less important journals, workshop proceedings, conference papers and non-English material. Indeed, the growth rate is probably exponential: Maron and ...
We describe an application of sentence alignment techniques and approximate string matching to the problem of extracting lexicographically interesting word-word pairs from multilingual corpora. Since our interest is in support systems for lexicographers rather than in fully automatic construction of lexicons, we would like to provide access to parameters ...
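As a rough illustration of the idea in the abstract above, the sketch below scores word-word pairs drawn from sentence pairs that are assumed to be already aligned. It is not the authors' method: Python's `difflib.SequenceMatcher` stands in for whatever approximate string matching the paper uses, and the function names, the sample sentences and the `threshold` parameter are all hypothetical. Exposing the threshold as a parameter mirrors the abstract's point that a lexicographer, rather than the program, should control how permissive the matching is.

```python
from difflib import SequenceMatcher
from itertools import product


def similarity(a: str, b: str) -> float:
    """Approximate string similarity in [0, 1] (stand-in measure, case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_pairs(aligned, threshold=0.7):
    """Return ((source_word, target_word), score) items, best first.

    `aligned` is an iterable of (source_sentence, target_sentence) pairs that
    are assumed to have been aligned beforehand. `threshold` is a hypothetical
    tuning parameter left to the user.
    """
    pairs = {}
    for src, tgt in aligned:
        # Compare every source word with every target word in the same pair.
        for s, t in product(src.split(), tgt.split()):
            score = similarity(s, t)
            if score >= threshold:
                pairs[(s.lower(), t.lower())] = score
    return sorted(pairs.items(), key=lambda item: -item[1])


# Hypothetical English-French sentence pair, for illustration only.
aligned = [
    ("the parliament adopted the resolution",
     "le parlement a adopté la résolution"),
]

for (src_word, tgt_word), score in candidate_pairs(aligned):
    print(f"{src_word}\t{tgt_word}\t{score:.2f}")
```

Surface similarity of this kind only finds cognate-like pairs. Translations with unrelated spellings would need a different signal, such as co-occurrence counts across many aligned sentences.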
This paper describes the design and implementation of the MATE workbench, a program which provides support for the annotation of speech and text. It provides facilities for flexible display and editing of such annotations, and complex querying of the resulting corpus. The workbench offers a more flexible approach than most existing annotation tools, which ...
This paper describes the LT NSL system (McKelvie et al., 1996), an architecture for writing corpus processing tools. This system is then compared with two other systems which address similar issues, the GATE system (Cunningham et al., 1995) and the IMS Corpus Workbench (Christ, 1994). In particular, we address the advantages and disadvantages of an SGML ...
We investigate how non-linguistic factors influence rates of disfluency in spontaneous speech in a set of task-oriented dialogues (the HCRC Map Task Corpus). The factors we consider are: sex of the speaker; sex of the addressee; conversational role; ability to see the addressee; familiarity with the addressee; and practice at the task. Our analyses examined ...
In this paper, we describe the results of an experiment to study the effectiveness of using acoustic stress to improve automatic speech recognition. The CSTR speech recognition system uses hidden semi-Markov models (HSMM) with a separate lexical search component. A hybrid prosodic component has been included which determines the sentence-level stress and ...
The rapid growth in availability of high-quality recordings of natural spoken dialogue (and natural spoken material more generally) has encouraged us to improve the interchange of transcripts of such material, so that these resources can be easily exploited by the scientific community as a whole. In this paper, we describe a new SGML architecture which ...
Large-scale linguistic annotation is currently employed for a wide range of purposes, including comparing communication under different conditions, testing psycholinguistic hypotheses, and training natural language engines. Current software support for linguistic annotation is poor, with much of it written for one-off tasks using special-purpose data ...