• Publications
  • Influence
Inverted indexing for cross-lingual NLP
We present a novel, count-based approach to obtaining inter-lingual word representations based on inverted indexing of Wikipedia. We present experiments applying these representations to 17 datasets …
  • 85
  • 9
Lemmatization and Morphosyntactic Tagging of Croatian and Serbian
We investigate state-of-the-art statistical models for lemmatization and morphosyntactic tagging of Croatian and Serbian. The models stem from a new manually annotated SETIMES.HR corpus of Croatian, …
  • 53
  • 9
Treebank Translation for Cross-Lingual Parser Induction
Cross-lingual learning has become a popular approach to facilitate the development of resources and tools for low density languages. Its underlying idea is to make use of existing tools and …
  • 53
  • 5
New Inflectional Lexicons and Training Corpora for Improved Morphosyntactic Annotation of Croatian and Serbian
In this paper we present newly developed inflectional lexicons and manually annotated corpora of Croatian and Serbian. We introduce hrLex and srLex – two freely available inflectional lexicons of …
  • 36
  • 4
Distant Supervision from Disparate Sources for Low-Resource Part-of-Speech Tagging
We introduce DsDs: a cross-lingual neural part-of-speech tagger that learns from disparate sources of distant supervision, and realistically scales to hundreds of low-resource languages. The model …
  • 21
  • 4
Parsing Universal Dependencies without training
We propose UDP, the first training-free parser for Universal Dependencies (UD). Our algorithm is based on PageRank and a small set of head attachment rules. It features two-step decoding to guarantee …
  • 15
  • 4
If all you have is a bit of the Bible: Learning POS taggers for truly low-resource languages
We present a simple method for learning part-of-speech taggers for languages like Akawaio, Aukan, or Cakchiquel – languages for which nothing but a translation of parts of the Bible exists. By …
  • 53
  • 3
JW300: A Wide-Coverage Parallel Corpus for Low-Resource Languages
Viable cross-lingual transfer critically depends on the availability of parallel texts. Shortage of such resources imposes a development and evaluation bottleneck in multilingual processing. We …
  • 27
  • 3
Croatian Dependency Treebank: Recent Development and Initial Experiments
We present the current state of development of the Croatian Dependency Treebank – with special emphasis on adapting the Prague Dependency Treebank formalism to Croatian language specifics – and …
  • 23
  • 3
Baselines and test data for cross-lingual inference
The recent years have seen a revival of interest in textual entailment, sparked by i) the emergence of powerful deep neural network learners for natural language processing and ii) the timely …
  • 13
  • 3