Entity-Based Cross-Document Coreferencing Using the Vector Space Model

Amit Bagga and Breck Baldwin
Cross-document coreference occurs when the same person, place, event, or concept is discussed in more than one text source. Computer recognition of this phenomenon is important because it helps break "the document boundary" by allowing a user to examine information about a particular entity from multiple text sources at the same time. In this paper we describe a cross-document coreference resolution algorithm which uses the Vector Space Model to resolve ambiguities between people having the… 
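The Vector Space Model idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes each document (or an extract of it that mentions the ambiguous name) has already been tokenized, weights terms with a plain TF-IDF scheme, and links two documents when their cosine similarity exceeds a threshold. The function names, the weighting formula, the single-link grouping, and the threshold value are all illustrative assumptions, not details taken from the paper.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict) per tokenized document."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Terms that occur in every document get weight log(1) = 0.
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster(docs, threshold=0.1):
    """Group document indices whose similarity exceeds the threshold.

    The threshold is a hypothetical value chosen for illustration.
    Documents are merged single-link style using a small union-find.
    """
    vectors = tfidf_vectors(docs)
    parent = list(range(len(docs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vectors[i], vectors[j]) > threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Given three toy documents that all mention "john smith", two about a car-company chairman and one about a baseball pitcher, `cluster` would place the first two in one group and the third in its own group, because the shared name itself carries zero weight and only the surrounding context distinguishes the entities.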
A system for cross-document coreference developed at FBK that, unlike other systems that need a fixed threshold to group names referring to the same entity, first tries to guess the correct number of entities by evaluating cluster quality and then corefers person names to these entities.
Enhancing Cross Document Coreference of Web Documents with Context Similarity and Very Large Scale Text Categorization
This work offers a holistic view of using document-level categories, sub-document level context and extracted entities and relations for the Cross Document Coreference task, and proposes to use ranked categories as coreference information, particularly suitable for web documents that are widely different in style and content.
A Methodology for Cross-document Coreference
This section describes a cross-document coreference resolution system that resolves coreferences for both entities and events using the Vector Space Model, built upon CAMP, the University of Pennsylvania's within-document coreference system.
Cross-Document Coreference Resolution Based on Automatic Text Summary
This approach extracts query-specific, informative-indicative summaries from the original text using the Hobbs algorithm and measures the similarity between two summaries, yielding an automatic text summary-based cross-document coreference resolution (ATSCDCR) system.
Cross-Document Coreference on a Large Scale Corpus
This paper created entity models for different test sets and compared the following disambiguation and clustering techniques to cluster the entity models in order to create coreference chains: Incremental Vector Space, KL-Divergence, Agglomerative Vector Space.
A Study of the Effect of Document Representations in Clustering-Based Cross-Document Coreference Resolution
  • Horacio Saggion
  • Multi-source, Multilingual Information Extraction and Summarization
  • 2013
This work describes experiments aiming at identifying the contribution of semantic information and summarization in a cross-document coreference resolution system that uses a clustering-based algorithm to group documents referring to the same entity.
Streaming Cross Document Entity Coreference Resolution
This paper explores identifying coreferent entity mentions across documents in high-volume streaming text, including methods for utilizing orthographic and contextual information, and shows that the approach scales to data at least an order of magnitude larger than previously reported methods.
Person Cross Document Coreference with Name Perplexity Estimates
It is shown that the amount of context required can be dynamically controlled on the basis of the prior probabilities of coreference and a new statistical model is presented for the computation of these probabilities.
Experiments on Semantic-based Clustering for Cross-document Coreference
This work describes clustering experiments for cross-document coreference for the first Web People Search Evaluation and presents an analysis of the impact that semantic information and text summarization have in the clustering process.
Cross document person name disambiguation using entity profiles
This work introduces novel features based on topic models and on document-level entity profiles, which are sets of information collected for each ambiguous person across the entire document.


How Much Processing Is Required for Cross-Document Coreference?
This paper describes the authors' position on the amount of processing needed to resolve cross-document coreferences, compares it with that of the MUC-6 organizing committee, and backs the position by providing details of the cross-document coreference system.
A model-theoretic coreference scoring scheme
This note describes a scoring scheme for the coreference task in MUC6. It improves on the original approach by: (1) grounding the scoring scheme in terms of a model; (2) producing more intuitive
Algorithms for Scoring Coreference Chains
This paper presents several different scoring algorithms, details their respective strengths and weaknesses for varying classes of processing, and demonstrates that tasks like information extraction have very different needs from information retrieval in terms of how to score the performance of coreference annotation.
University of Pennsylvania: Description of the University of Pennsylvania System Used for MUC-6
An intensive effort was begun, with full-time participation from Baldwin and Reynar and part-time efforts from the other authors, and a simplistic coreference resolution system that resolved proper nouns by means of string matching was implemented.
Whither Written Language Evaluation?
Common evaluations have grown to be a major component of all the ARPA Human Language Technology programs and have been a major impetus in the development of systems for performing such "information extraction" tasks, and thus in demonstrating the potential practical value of some of the written language processing technology.
How Much Processing Is Required for Cross-Document Coreference? To appear at The First International Conference on Language Resources and Evaluation Workshop on Linguistics Coreference
  • 1998
Algorithms for Scoring Coreference Chains. To appear at The First International Conference on Language Resources and Evaluation Workshop on Linguistics Coreference
  • 1998