Maria Sukhareva

For the study of historical language varieties, the sparsity of training data poses immense problems for syntactic annotation and for the development of NLP tools that automate the process. In this paper, we explore strategies to compensate for the lack of training data by including data from related varieties in a series of annotation projection experiments …
Focused retrieval (a.k.a. passage retrieval) is important in its own right and as an intermediate step in question answering systems. We present a new Web-based collection for focused retrieval. The document corpus is Category A of the ClueWeb12 collection. Forty-nine queries from the educational domain were created. The 100 documents most highly …
This paper describes a novel approach to finding evidence for implicit semantic roles. Our data-driven models generalize over large amounts of explicit annotations only, in order to acquire information about implicit roles. We establish a generic background knowledge base of probabilistic predicate-role co-occurrences in an unsupervised manner, and estimate …
NLP has started to use LOD extensively in various scenarios, such as: exploring knowledge datasets (DBPedia, FreeBase, GeoNames, etc.) for annotation and information extraction; publishing language resources as LOD (WordNet, FrameNet, etc.); aggregating the available data for various tasks (BabelNet, Global WordNet Grid); creation of …
We provide an overview of ongoing efforts to facilitate the study of older Germanic languages currently pursued at the [...]. We describe created resources, such as a parallel corpus of Germanic Bibles and a morphosyntactically annotated corpus of Old High German (OHG) and Old Saxon, a lexicon of OHG in XML and a multilingual etymological database. We discuss NLP …
Ancient corpora contain various multilingual patterns. This poses numerous problems for their manual annotation and automatic processing. We introduce a lexicon building system, called Lexicon Expander, that has an integrated language detection module, the Language Detection (LD) Toolkit. The Lexicon Expander post-processes the output of the LD Toolkit which …
In the LOD era, the conceptual interoperability of language resources is established by using modular architectures like the Ontologies of Linguistic Annotations (Chiarcos, 2008a, OLiA). Available as a part of the Linguistic Linked Open Data (LLOD) cloud, OLiA provides ontological representations of annotation schemes for over 70 languages, as well as …