Automatic extraction of multiword expressions (MWEs) presents a tough challenge for the NLP community and corpus linguistics. Although various statistically driven and knowledge-based approaches have been proposed and tested, efficient MWE extraction remains an unsolved issue. In this paper, we present our research work in which we tested approaching...
In this paper, we describe the approaches taken by two teams of researchers to the identification of spelling variants. Each team is working on a different language (English and German), but both are using historical texts from much the same time period (17th-19th century). The approaches differ in a number of other respects, for example...
Analysis of English historical texts poses a number of obstacles for standard corpus analysis and annotation techniques. In addition to non-standard spellings and contractions, there are difficulties at the morphological, phonetic and syntactic levels. Our response has been to develop a VARiant Detector (VARD). We trained VARD on 16th-19th century data, ...
The bias of computer searches towards form (e.g. a letter or string of letters) can be a major difficulty for linguistic analyses of texts. In particular, how form relates to context in interactive texts tends to be overlooked. Corpus linguists have been seeking to improve our understanding of the relationships between forms (e.g. collocations, lexical...
It is widely accepted that texts representative of a variety of genres display a marked degree of spelling variation throughout the Early Modern English (EModE) period, in spite of the English language's gradual standardization. Until recently, our knowledge of this gradual standardization of variant spelling forms was largely founded on qualitative studies...
Automatic extraction of multiword expressions (MWEs) presents a tough challenge for the NLP community and corpus linguistics. Indeed, although numerous knowledge-based symbolic approaches and statistically driven algorithms have been proposed, efficient MWE extraction remains an unsolved issue. [...] for MWE extraction, and explore the...
As reported by Wilson and Rayson (1993) and Rayson and Wilson (1996), the UCREL semantic analysis system (USAS) has been designed to undertake the automatic semantic analysis of present-day English (henceforth PresDE) texts. In this paper, we report on the feasibility of (re)training the USAS system to cope with English from earlier periods, specifically...
Semantic annotation is an important and challenging issue in corpus linguistics and language engineering. While such a tool is available for English in Lancaster (Wilson and Rayson 1993), few such tools have been reported for other languages. In a joint Benedict project funded by the European Community under the 'Information Society Technologies Programme', ...
Semantic lexical resources play an important part in both linguistic study and natural language engineering. In Lancaster, a large semantic lexical resource has been built over the past 14 years, which provides a knowledge base for the USAS semantic tagger. Capturing semantic lexicological theory and empirical lexical usage information extracted from...