The JRC-Acquis: A Multilingual Aligned Parallel Corpus with 20+ Languages
TLDR
We present a new, unique and freely available parallel corpus containing European Union (EU) documents of mostly legal nature, with an average size of nearly 9 million words per language.
  • 601
  • 86
DGT-TM: A freely available Translation Memory in 22 languages
TLDR
The European Commission's Directorate General for Translation, together with the EC's Joint Research Centre, is making available a large translation memory (TM; i.e. sentences and their professionally produced translations) covering twenty-two official European Union (EU) languages and their 231 language pairs.
  • 118
  • 22
Sentiment Analysis in the News
TLDR
In this paper, we summarised our insights regarding sentiment classification for news and applied different methods to test the appropriateness of different resources and approaches to the task defined.
  • 232
  • 12
462 Machine Translation Systems for Europe
TLDR
We built 462 machine translation systems for all language pairs of the Acquis Communautaire corpus.
  • 98
  • 11
Cross-Lingual Document Similarity Calculation Using the Multilingual Thesaurus EUROVOC
TLDR
We present an approach to calculating the semantic similarity of documents written in the same or in different languages.
  • 112
  • 9
Creating sentiment dictionaries via triangulation
TLDR
In this paper, we present results that verify our triangulation hypothesis, by evaluating triangulated lists and comparing them to non-triangulated machine-translated word lists.
  • 112
  • 8
Automatic annotation of multilingual text collections with a conceptual thesaurus
TLDR
This paper presents an almost language-independent system that maps documents written in different languages onto the same multilingual conceptual thesaurus, EUROVOC.
  • 101
  • 8
JRC Eurovoc Indexer JEX - A freely available multi-label categorisation tool
TLDR
JEX is JRC-developed multi-label classification software that learns from manually labelled data to automatically assign EuroVoc descriptors to new documents in a profile-based category-ranking task.
  • 38
  • 8
Experiments to Improve Named Entity Recognition on Turkish Tweets
TLDR
We report on experiments aimed at improving named entity recognition on Turkish tweets, using two different annotated data sets.
  • 37
  • 8
An introduction to the Europe Media Monitor family of applications
TLDR
We present here the four publicly accessible systems of the Europe Media Monitor (EMM) family of applications, which cover between 19 and 50 languages.
  • 90
  • 7