Corpus Selection Approaches for Multilingual Parsing from Raw Text to Universal Dependencies

@inproceedings{Hornby2017CorpusSA,
  title={Corpus Selection Approaches for Multilingual Parsing from Raw Text to Universal Dependencies},
  author={Ryan Hornby and Clark Taylor and Jungyeul Park},
  booktitle={CoNLL Shared Task},
  year={2017}
}
This paper describes UALing’s approach to the CoNLL 2017 UD Shared Task, which uses corpus selection techniques to reduce training data size. The methodology is simple: we use similarity measures to select a corpus from the available training data (even from multiple corpora for surprise languages) and use the resulting corpus to complete the parsing task. Training and parsing are done with the baseline UDPipe system (Straka et al., 2016). While our approach reduces the size of training data…
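The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which similarity measures are used, so the cosine similarity over character trigram counts below is an assumption chosen for illustration, and the names `char_ngrams`, `cosine`, and `select_sentences` are hypothetical. Training a parser on the selected subset (for example with UDPipe) is not shown.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in a string."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_sentences(candidates, target_sample, k):
    """Return the k candidate sentences most similar to the target sample.

    Assumption: similarity is measured on character trigrams; the paper
    may use a different measure or a different selection unit.
    """
    target = char_ngrams(target_sample)
    ranked = sorted(
        candidates,
        key=lambda sent: cosine(char_ngrams(sent), target),
        reverse=True,
    )
    return ranked[:k]
```

Calling `select_sentences(pool, sample, k)` with a pool of training sentences and a short sample of the target text keeps only the `k` closest sentences, which is one way a training set could be shrunk before parsing. Character n-grams are used here because they need no tokenizer or annotation, which suits raw text in a language with little training data.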