• Publications
Parallel Data, Tools and Interfaces in OPUS
TLDR
This paper presents the current status of OPUS, a growing language resource of parallel corpora and related tools.
  • 895
  • 103
News from OPUS — A collection of multilingual parallel corpora with tools and interfaces
TLDR
The OPUS corpus is a growing resource providing various multilingual parallel corpora from different domains.
  • 577
  • 53
OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles
TLDR
We present a new major release of the OpenSubtitles collection of parallel corpora.
  • 392
  • 50
Findings of the VarDial Evaluation Campaign 2017
TLDR
We present the results of the VarDial Evaluation Campaign on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects, which we organized as part of the fourth edition of the VarDial workshop at EACL’2017.
  • 135
  • 18
Discriminating between Similar Languages and Arabic Dialect Identification: A Report on the Third DSL Shared Task
TLDR
We present the results of the third edition of the Discriminating between Similar Languages (DSL) shared task, which was organized as part of the VarDial workshop at COLING’2016.
  • 156
  • 16
Merging Comparable Data Sources for the Discrimination of Similar Languages: The DSL Corpus Collection
TLDR
This paper presents the compilation of the DSL corpus collection created for the DSL (Discriminating Similar Languages) shared task to be held at the VarDial workshop at COLING 2014.
  • 84
  • 16
OpenSubtitles2018: Statistical Rescoring of Sentence Alignments in Large, Noisy Parallel Corpora
TLDR
We present a new release of the OpenSubtitles collection of parallel corpora, which is extracted from a total of 3.7 million subtitles spread over 60 languages.
  • 74
  • 16
Recycling Translations: Extraction of Lexical Data from Parallel Corpora and their Application in Natural Language Processing
TLDR
The focus of this thesis is on re-using translations in natural language processing.
  • 102
  • 12
Identifying idiomatic expressions using automatic word-alignment
TLDR
We investigate whether automatic word-alignment in existing parallel corpora facilitates the classification of candidate expressions along a continuum ranging from literal and transparent expressions to idiomatic and opaque expressions.
  • 101
  • 10
Character-based Joint Segmentation and POS Tagging for Chinese using Bidirectional RNN-CRF
TLDR
We apply the BiRNN-CRF model for general sequence tagging to joint segmentation and POS tagging for Chinese and achieve state-of-the-art accuracy.
  • 65
  • 9