Automatic extraction of multiword expressions (MWEs) presents a tough challenge for the NLP community and corpus linguistics. Although various statistically driven or knowledge-based approaches have been proposed and tested, efficient MWE extraction remains an unsolved issue. In this paper, we present our research work in which we tested approaching…
In this paper, we report on our experiment to extract Chinese multiword expressions from corpus resources as part of a larger research effort to improve a machine translation (MT) system. For existing MT systems, the issue of multi-word expression (MWE) identification and accurate interpretation from source to target language remains an unsolved problem.…
In this paper we present the METER Corpus, a novel resource for the study and analysis of journalistic text reuse. The corpus consists of a set of news stories written by the Press Association (PA), the major UK news agency, and a set of stories about the same news events, as published in various British newspapers. In some cases the newspaper stories are…
We introduce an annotation type system for a data-driven NLP core system. The specifications cover formal document structure and document meta information, as well as the linguistic levels of morphology, syntax and semantics. The type system is embedded in the framework of the Unstructured Information Management Architecture (UIMA).
Semantic lexical resources play an important part in both linguistic study and natural language engineering. In Lancaster, a large semantic lexical resource has been built over the past 14 years, which provides a knowledge base for the USAS semantic tagger. Capturing semantic lexicological theory and empirical lexical usage information extracted from…
The problem we address in this paper is that of providing contextual examples of translation equivalents for words from the general lexicon using comparable corpora and semantic annotation that is uniform for the source and target languages. For a sentence, phrase or a query expression in the source language, the tool detects the semantic type of the…
This paper reports on an experiment in which we explore a new approach to the automatic measurement of multi-word expression (MWE) compositionality. We propose an algorithm which ranks MWEs by their compositionality relative to a semantic field taxonomy based on the Lancaster English semantic lexicon (Piao et al., 2005a). The semantic information provided…