For the purpose of developing pronunciation training tools for second language learning, a corpus of non-native speech data has been collected. It consists of almost 18 hours of annotated speech signals spoken by Italian and German learners of English. The corpus is based on 250 utterances selected from typical second language learning exercises. It has …
Machine Readable Dictionaries (MRDs) have been used in a variety of language processing tasks including word sense disambiguation, text segmentation, information retrieval and information extraction. In this paper we describe the utilization of semantic knowledge acquired from an MRD for language modelling tasks in relation to speech recognition …
The title of this paper playfully contrasts two rather different approaches to language analysis. The "Noisy Channels" are the promoters of statistically based approaches to language learning; many of these studies are based on Shannon's Noisy Channel model. The "Braying Donkeys" are those oriented towards theoretically motivated language models. …
Several research projects around the world are building grammatically analysed corpora; that is, collections of text annotated with part-of-speech wordtags and syntax trees. However, projects have used quite different wordtagging and parsing schemes. Developers of corpora adhere to a variety of competing models or theories of grammar and parsing, with the …
This paper is a case study of user involvement in the requirements specification for project ISLE: Interactive Spoken Language Education. Developers of Spoken Language Dialogue Systems (SLDS) should involve users from the outset, particularly if the aim is to develop novel solutions for a generic target application area or market. As well as target end-users, SLDS …
This paper presents a study on the use of wide-coverage semantic knowledge for large vocabulary (theoretically unrestricted) domain-independent speech recognition. A machine readable dictionary was used to provide the semantic information about the words and a semantic model was developed based on the conceptual association between words as computed …
The aim of our project is to develop a system for detecting student copying in Biomedical Science laboratory practical reports. We compare contrasting approaches: "simple" methods, namely Zipping, based on a standard file-compression tool, and Bigrams, a basic comparison of frequent bigrams; and "smart" methods using commercial-strength plagiarism-checking …
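To make the two "simple" methods concrete, here is a minimal Python sketch of what a compression-based and a bigram-based comparison could look like. It is an illustration under stated assumptions, not the system the abstract describes: the abstract does not name the compression tool or the scoring formula, so `zlib`, the normalized compression distance, the Jaccard overlap, and the names `zipping_distance`, `bigram_overlap` and `top_n` are all hypothetical choices.

```python
import re
import zlib
from collections import Counter


def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed text (assumed stand-in for 'Zipping')."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def zipping_distance(a: str, b: str) -> float:
    """Normalized compression distance: near 0 for near-copies, near 1 for unrelated text.

    Assumption: the source does not give a formula; NCD is one common choice.
    """
    ca, cb = compressed_size(a), compressed_size(b)
    cab = compressed_size(a + b)
    return (cab - min(ca, cb)) / max(ca, cb)


def frequent_bigrams(text: str, top_n: int = 50) -> set:
    """The top_n most frequent word bigrams of a text (top_n is a hypothetical parameter)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(zip(words, words[1:]))
    return {bigram for bigram, _ in counts.most_common(top_n)}


def bigram_overlap(a: str, b: str, top_n: int = 50) -> float:
    """Jaccard overlap of the two reports' frequent bigrams, from 0.0 to 1.0."""
    ba, bb = frequent_bigrams(a, top_n), frequent_bigrams(b, top_n)
    if not ba and not bb:
        return 0.0
    return len(ba & bb) / len(ba | bb)
```

In use, each pair of reports would be scored with both functions, and pairs with an unusually low `zipping_distance` or an unusually high `bigram_overlap` would be flagged for manual inspection. Any threshold would have to be calibrated on real reports, since zlib's fixed overhead distorts the distance for very short texts.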