Corpus ID: 1607756

Design and Development of a Named Entity Recognizer for an Agglutinative Language

@inproceedings{Alegria2004DesignAD,
  title={Design and Development of a Named Entity Recognizer for an Agglutinative Language},
  author={I. Alegria and Olatz Arregi and Irene Balza and N. Ezeiza and Izaskun Fern{\'a}ndez and R. Urizar},
  year={2004}
}
This paper presents the conclusions reached from the development of a system for Named Entity recognition in written Basque. The system was designed in four steps: first, the development of a recognizer based on linguistic information represented in finite-state transducers; second, the generation of semi-automatically annotated corpora from the output of these transducers; third, the achievement of the best possible recognizer by training different ML techniques on these corpora; and finally…

Ihardetsi : A Question Answering system for Basque built on reused linguistic processors
This paper presents Ihardetsi, a question answering system oriented to Basque. We describe the main architecture of the system, paying special attention to the use of linguistic resources and tools.
Simple or Complex? Assessing the readability of Basque Texts
ErreXail, a readability assessment system for Basque, is presented; it will serve as the preprocessing module of a text simplification system. The best-performing and most predictive features are identified.
Using Machine Learning Techniques to Build a Comma Checker for Basque
Research on using machine learning techniques to build a comma checker, to be integrated into a grammar checker for Basque, is described, and it is shown that the results can be improved by training on a larger and more homogeneous corpus.
Ihardetsi: A Basque Question Answering System at QA@CLEF 2008
Ihardetsi, a question answering system for Basque, is described, along with a machine translation system that first processes a question in the source language, then translates it into the target language and sends the resulting Basque question as input to the monolingual module.
A Comparison of Architectures and Pretraining Methods for Contextualized Multilingual Word Embeddings
A comprehensive comparison of state-of-the-art multilingual word and sentence encoders on the tasks of named entity recognition (NER) and part-of-speech (POS) tagging is performed, and a new method for creating multilingual contextualized word embeddings is proposed.
Information Retrieval and Information Extraction for Less Resourced Languages (IE-IR-LRL)
This presentation is intended to provide some background information as well as a broader picture of some of the issues involved in developing language technology – especially information extraction…
Document Expansion for Cross-Lingual Passage Retrieval
The participation of the joint Elhuyar-IXA group in the ResPubliQA exercise at QA@CLEF 2010 shows that IR provides good results in the monolingual task, that the performance drop in the cross-lingual system was much greater than in previous CLIR experiments, and that expansion improves the results in…
Ihardetsi Question Answering System at QA@CLEF 2008
Ihardetsi, a question answering system for Basque, is described, along with a machine translation system that processes a question in the source language, translates it into the target language, and sends the resulting Basque question to the Ihardetsi system.
Elhuyar-IXA: Semantic Relatedness and Cross-lingual Passage Retrieval
The participation of the joint Elhuyar-IXA group in the ResPubliQA exercise at QA@CLEF shows that IR provides good results in the monolingual task, that the cross-lingual system performs worse than the monolingual runs, and that semantic relatedness improves the results in both tasks.
An XML Framework for a Basque Question Answering System
This paper presents a general platform for a Basque monolingual question answering (QA) system, paying special attention to the integration of the development and evaluation environments and to the systematic use of XML declarative files to control the execution of the modules and the communication between them.

References

Named Entity Recognition and Classification for texts in Basque
This paper presents a system for Named Entity (NE) recognition in written Basque to be used in a CLIR application. Being an agglutinative language, Basque has highly inflected forms, so a previous…
Combining Stochastic and Rule-Based Methods for Disambiguation in Agglutinative Languages
In this paper we present the results of the combination of stochastic and rule-based disambiguation methods applied to the Basque language. The methods we have used in disambiguation are Constraint…
Named Entity Recognition Using a Character-based Probabilistic Approach
A named entity recognition and classification system that uses only probabilistic character-level features and performs a preprocessing stage in which capitalisation is restored to sentence-initial and all-caps words with high accuracy.
Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition
The CoNLL-2003 shared task on language-independent named entity recognition is described, and a general overview of the participating systems and a discussion of their performance are presented.
A Simple Named Entity Extractor using AdaBoost
The system presented here is a replication, with some minor changes, of the system that obtained the best results in the CoNLL-2002 NEE task. It can be considered a benchmark of state-of-the-art technology for the current edition, and it will also allow comparisons between the training corpora of both editions.
Overview of MUC-7
The task of Coreference (CO) had its origins in Semeval, an attempt after MUC-5 to define semantic research tasks that needed to be solved to be successful at generating scenario templates. In the…
Introduction to the CoNLL-2002 Shared Task: Language-Independent Named Entity Recognition
The CoNLL-2002 shared task on language-independent named entity recognition is described, and a general overview of the participating systems and a discussion of their performance are presented.
Finite State Morphology
This volume is a practical guide to finite-state theory and the affiliated programming languages lexc and xfst, and readers will learn how to write tokenizers, spelling checkers, and especially morphological analyzer/generators for words in English, French, Finnish, Hungarian and other languages.
Description of the LTG System Used for MUC-7
The basic building blocks in this system are reusable text handling tools which are modular tools with stream input/output; each tool does a very specific job, but can be combined with other tools in a Unix pipeline.
Tackling the Poor Assumptions of Naive Bayes Text Classifiers
This paper proposes simple, heuristic solutions to some of the problems with Naive Bayes classifiers, addressing both systemic issues and problems that arise because text is not actually generated according to a multinomial model.