Use of named entity recognition and co-reference resolution tools for segmenting English texts

Abstract

In this paper we examine the benefit of applying named entity recognition (NER) and co-reference resolution to an English corpus used for text segmentation. The aim is to examine whether combining text segmentation with information extraction helps identify the various topics that appear in a document. NER was performed on the English corpus in two ways: (a) by using already available NER and co-reference resolution tools, and (b) by manually annotating the text to cover four types of named entities and substituting every reference to the same instance with the same named entity identifier. The benefit of manual annotation over a combination of existing tools was evaluated using two well-known text segmentation algorithms. The comparison leads to the conclusion that the benefit depends heavily on the segment's topic and length, the number of named entity instances appearing in it, and the model on which each NER and co-reference resolution tool was trained.
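The substitution step in the manual annotation can be illustrated with a minimal sketch. It assumes the annotation is available as a mapping from mention strings to entity identifiers; the function name `substitute_mentions`, the example mentions, and the identifier labels (`PERSON_1`, `LOCATION_1`) are hypothetical and are not taken from the paper.

```python
import re


def substitute_mentions(text, mention_to_id):
    """Replace every listed mention with its entity identifier."""
    # Try longer mentions first so "Maria Keller" wins over "Keller".
    mentions = sorted(mention_to_id, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(m) for m in mentions) + r")\b"
    )
    return pattern.sub(lambda m: mention_to_id[m.group(0)], text)


# Hypothetical annotation: all references to one instance share one identifier.
mention_to_id = {
    "Maria Keller": "PERSON_1",
    "Keller": "PERSON_1",
    "the minister": "PERSON_1",
    "Athens": "LOCATION_1",
}

text = "Maria Keller visited Athens. Later the minister spoke, and Keller left."
result = substitute_mentions(text, mention_to_id)
print(result)
```

After substitution, every reference to the same instance contributes the same token to the vocabulary, so a segmentation algorithm that relies on word repetition would treat them as one term.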

DOI: 10.1145/2801948.2802004

Cite this paper

@inproceedings{Fragkou2015UseON, title={Use of named entity recognition and co-reference resolution tools for segmenting english texts}, author={Pavlina Fragkou}, booktitle={Panhellenic Conference on Informatics}, year={2015} }