Anders Nøklestad

Automatic markup and editing of anaphora and coreference is performed within one system. The processing is trained using memory-based learning, and representations derive from various lexical resources. The current model reaches an expected combined precision and recall of F=62. Further improvement of the coreference detection is work in progress.
We describe a web-based corpus query system, Glossa, which combines the expressiveness of regular query languages with the user-friendliness of a graphical interface. Since corpus users are usually linguists with little interest in technical matters, we have developed a system where the user need not have any prior knowledge of the search system.
We describe the development of a database containing informant judgments on a range of test sentences. The database is intended as a research resource for linguists interested in morphosyntactic variation across Scandinavian dialects. We present the data types contained in the base, and how they are used to create a user-friendly search interface. The …
A general-purpose text corpus meant for linguists and lexicographers needs to satisfy quality criteria on at least four different levels. The first two criteria are fairly well established: the corpus should have a wide variety of texts and be tagged according to a fine-grained system. The last two criteria are much less widely appreciated, unfortunately.
This paper describes the Nordic Dialect Corpus, a corpus that consists of transcribed spoken dialects, with sound and video, from five North European languages (Danish, Faroese, Icelandic, Norwegian and Swedish). The paper focuses on recent developments that have been added as a result of wishes expressed by the linguist users. These include map …