The JRC-Acquis: A Multilingual Aligned Parallel Corpus with 20+ Languages
TLDR
We present a new, unique and freely available parallel corpus containing European Union (EU) documents of mostly legal nature, with an average size of nearly 9 million words per language.
  • 601
  • 86
BalkaNet: Aims, Methods, Results and Perspectives. A General Overview
BalkaNet is an EC-funded project (IST-2000-29388) that started in September 2001 and will end in August 2004. It aims at developing [109] aligned wordnets for the following Balkan languages:
  • 152
  • 15
Tagging Romanian texts: a case study for QTAG, a language independent probabilistic tagger
TLDR
This paper describes an experiment on tagging Romanian using QTAG, a part-of-speech tagger that was originally developed for English, but with a clear separation between the (probabilistic) processing engine and the (language-specific) resource data.
  • 99
  • 9
ACCURAT - Analysis and Evaluation of Comparable Corpora for Under Resourced Areas of Machine Translation
TLDR
We exploit the fact that non-parallel bi- or multilingual text resources are much more widely available than parallel translation data.
  • 32
  • 6
Fine-Grained Word Sense Disambiguation Based on Parallel Corpora, Word Alignment, Word Clustering and Aligned Wordnets
TLDR
The paper presents a method for word sense disambiguation based on parallel corpora, relying on automatic extraction of translation equivalents and supported by available aligned wordnets.
  • 84
  • 5
Multext-East: Parallel and Comparable Corpora and Lexicons for Six Central and Eastern European Languages
TLDR
The EU Copernicus project Multext-East has created a multilingual corpus of text and speech data, covering the six languages of the project: Bulgarian, Czech, Estonian, Hungarian, Romanian, and Slovene.
  • 108
  • 5
The Romanian wordnet in a nutshell
TLDR
The Romanian wordnet has been under continuous development for more than 10 years; the paper offers quantitative data for its current version.
  • 18
  • 5
Sense Discrimination with Parallel Corpora
TLDR
This paper describes an experiment that uses translation equivalents derived from parallel corpora to determine sense distinctions that can be used for automatic sense-tagging.
  • 144
  • 4
Improved Lexical Alignment by Combining Multiple Reified Alignments
TLDR
We describe a word alignment platform that provides the text pre-processing (tokenization, POS-tagging, lemmatization, chunking, sentence alignment) required for accurate word alignment.
  • 42
  • 4
A Collection of Comparable Corpora for Under-resourced Languages
TLDR
This paper presents work on collecting comparable corpora for 9 language pairs.
  • 24
  • 4