• Publications
The JRC-Acquis: A Multilingual Aligned Parallel Corpus with 20+ Languages
TLDR
We present a new, unique and freely available parallel corpus containing European Union (EU) documents of mostly legal nature, with an average size of nearly 9 million words per language.
  • 601
  • 86
Parallel corpora for medium density languages
TLDR
A general methodology for rapidly collecting, building, and aligning parallel corpora for medium density languages, illustrating our main points on the case of Hungarian, Romanian, and Slovenian.
  • 388
  • 51
GYDER: Maxent Metonymy Resolution
TLDR
Though the GYDER system achieved the highest accuracy scores in all six subtasks of the metonymy resolution shared task at SemEval-2007, we don't consider the results (72.80% accuracy for org, 84.36% for loc) particularly impressive, and argue that metonymy resolution needs more features.
  • 16
  • 6
DCEP - Digital Corpus of the European Parliament
TLDR
We are presenting a new highly multilingual document-aligned parallel corpus called DCEP - Digital Corpus of the European Parliament.
  • 25
  • 4
Hunmorph: Open Source Word Analysis
TLDR
We added an offline resource management component, hunlex, which complements the efficiency of our runtime layer with a high-level description language and a configurable precompiler.
  • 75
  • 3
Hungarian named entity recognition with a maximum entropy approach
TLDR
We introduce the hunner open source language-independent named entity recognition system, and present results for Hungarian.
  • 21
  • 1
Entropy measures and predictive recognition as mirrored in gating and lexical decision over multimorphemic Hungarian noun forms
Our paper is an attempt to indicate the relevance of information theoretical accounts to understand word recognition and morphological processing in Hungarian, along with other studies using more ...
  • 5
  • 1
Web-based frequency dictionaries for medium density languages
TLDR
The paper describes a new, freely available, web-based frequency dictionary of Hungarian that is being used for both purposes and the language-independent techniques used for creating it.
  • 44
Using a morphological analyzer in high precision POS tagging of Hungarian
TLDR
The paper presents an evaluation of maxent POS disambiguation systems that incorporate an open source morphological analyzer to constrain the probabilistic models.
  • 20