Proper Name Extraction from Non-Journalistic Texts
@inproceedings{Poibeau2000ProperNE, title={Proper Name Extraction from Non-Journalistic Texts}, author={T. Poibeau and Leila Kosseim}, booktitle={The Clinician}, year={2000} }
This paper discusses the influence of the corpus on the automatic identification of proper names in texts. Techniques developed for the newswire genre are generally not sufficient to deal with larger corpora containing texts that do not follow strict writing constraints (for example, e-mail messages, transcriptions of oral conversations, etc.). After a brief review of the research performed on news texts, we present some of the problems involved in the analysis of two different corpora: e-mails…
117 Citations
MODERN STATISTICAL AND LINGUISTIC APPROACHES TO PROCESSING TEXTS IN NATURAL LANGUAGES
- Computer Science
- 2016
The aim of this paper is to provide an overview of modern approaches to text processing, using as examples the tasks of named entity recognition and identifying the relationships between entities.
Unsupervised Extraction of Keywords from News Archives
- Economics, LTC
- 2009
A comparison of four unsupervised algorithms for automatically acquiring the set of keywords that best characterise a particular multimedia archive, the Belga News Archive, shows that the most successful algorithm is TextRank, derived from Google's PageRank.
Names, Right or Wrong: Named Entities in an OCRed Historical Finnish Newspaper Collection
- Computer Science, DATeCH
- 2017
Evaluation results of NER on data from the digitized Finnish historical newspaper collection Digi are reported, and a rule-based tagger of Finnish, FiNER, provided by the FIN-CLARIN consortium, is evaluated.
“A Novel of Character”: Towards the Automatic Annotation of Characters in a Large Corpus of French Novels
- Computer Science
- 2019
It is shown that the automatic annotation of large literary corpora makes it possible to check whether traditional classifications exhibit specific structural patterns that could be identified automatically.
A Method for Proper Noun Extraction in Kurdish
- Linguistics, SLATE
- 2017
An application that recognizes Kurdish person names, based on an architecture which includes a number of name lists, a set of rules, and a set of processes, can help the study of Information Retrieval (IR) in Kurdish to advance and can also be used in Kurdish machine translation.
Old Content and Modern Tools - Searching Named Entities in a Finnish OCRed Historical Newspaper Collection 1771-1910
- Computer Science, Digit. Humanit. Q.
- 2017
The first large-scale trials and evaluation of NER on data from the digitized Finnish historical newspaper collection Digi are reported; these are the first published large-scale results of NER on a historical Finnish OCRed newspaper collection.
Modern Tools for Old Content - in Search of Named Entities in a Finnish OCRed Historical Newspaper Collection 1771-1910
- Computer Science, LWDA
- 2016
First trials and evaluation of NER on data from the digitized Finnish historical newspaper collection Digi show that at best about half of the named entities can be recognized, even in quite erroneous OCRed text.
Kalpa Publications in Computing
- Computer Science
- 2019
This paper describes some of the main ideas towards a method for associating locations with geographical data while removing possible confusion between entities with the same name, and describes the research proposal, which focuses on ambiguity detection.
Name identification and extraction with formal concept analysis
- Computer Science, Int. J. Mach. Learn. Cybern.
- 2017
This paper describes how FCA identifies and extracts personal names as units of thought, similar to the decoding of text sequences by the Viterbi algorithm as used with Hidden Markov Models.
Name identification and extraction with formal concept analysis
- Computer Science, International Journal of Machine Learning and Cybernetics
- 2016
This paper describes how FCA identifies and extracts personal names as units of thought, similar to the decoding of text sequences by the Viterbi algorithm as used with Hidden Markov Models.
28 References
NAMED ENTITY EXTRACTION FROM SPEECH
- Computer Science
- 1998
A hidden Markov model is used to extract information from broadcast news, with the encouraging result that a language-independent, trainable information extraction algorithm degrades on speech input by at most the word error rate of the recognizer.
Using Collocation Statistics in Information Extraction
- Computer Science, MUC
- 1998
The main objective in participating in MUC-7 is to investigate and experiment with the use of collocation statistics in information extraction, which refers to the frequency counts of the collocational relations extracted from a parsed corpus.
Named Entity Extraction from Broadcast News
- Computer Science
- 1999
This paper explores the effects of word error rate, loss of textual clues, amount of training data, changes in guidelines, and out-of-vocabulary errors in the context of the Hub4e-IE evaluation.
Locating Noun Phrases with Finite State Transducers
- Computer Science, COLING-ACL
- 1998
We present a method for constructing, maintaining and consulting a database of proper nouns. We describe noun phrases composed of a proper noun and/or a description of a human occupation. They are…
Combining words and prosody for information extraction from speech
- Computer Science, EUROSPEECH
- 1999
In experiments on the Broadcast News corpus, it is found that prosodic cues alone allow sentence and topic segmentation that is at least as good as word-based methods alone, and that combining both types of cues gives significant wins.
FASTUS: A Finite-state Processor for Information Extraction from Real-world Text
- Computer Science, IJCAI
- 1993
FASTUS has been evaluated on several blind tests that demonstrate that state-of-the-art performance on information-extraction tasks is obtainable with surprisingly little computational effort.
exibum : Un systeme experimental d'extraction d'information bilingue
- Computer Science
- 1998
The rapid results obtained through this experiment demonstrate the great advantage of system re-use in this domain, and leave us optimistic for the future development of multilingual information extraction systems.
The context of oral and written language: A framework for mode and medium switching
- Linguistics, Language in Society
- 1988
This article demonstrates that our descriptions of orality and literacy – from the traditional dichotomy to the more recent continuum – are inadequate, largely because they are grounded in…
Electric language : A new variety of English
- Sociology
- 1996
The authors analyse the lexical and grammatical features of a large corpus of Computer-Mediated Communication (CMC) messages sent to an electronic bulletin board system…
MITRE: description of the Alembic system used for MUC-6
- Computer Science, MUC
- 1995
As with several other veteran MUC participants, MITRE's Alembic system has undergone a major transformation in the past two years. The genesis of this transformation occurred during a dinner…