Maria Sukhareva

Focused retrieval (a.k.a. passage retrieval) is important in its own right and as an intermediate step in question answering systems. We present a new Web-based collection for focused retrieval. The document corpus is Category A of the ClueWeb12 collection. Forty-nine queries from the educational domain were created. The 100 documents most highly …
This paper describes a novel approach to finding evidence for implicit semantic roles. Our data-driven models generalize only over large amounts of explicit annotations in order to acquire information about implicit roles. We establish a generic background knowledge base of probabilistic predicate-role co-occurrences in an unsupervised manner, and estimate …
We provide an overview of ongoing efforts to facilitate the study of older Germanic languages currently pursued at the … We describe the resources created, such as a parallel corpus of Germanic Bibles, a morphosyntactically annotated corpus of Old High German (OHG) and Old Saxon, a lexicon of OHG in XML, and a multilingual etymological database. We discuss NLP …
Ancient corpora contain various multilingual patterns. This poses numerous problems for their manual annotation and automatic processing. We introduce a lexicon building system, called Lexicon Expander, that has an integrated language detection module, the Language Detection (LD) Toolkit. The Lexicon Expander post-processes the output of the LD Toolkit, which …