This paper presents a classifier-combination experimental framework for named entity recognition in which four diverse classifiers (robust linear classifier, maximum entropy, transformation-based learning, and hidden Markov model) are combined under different conditions. When no gazetteer or other additional training resources are used, the combined...
Entity detection and tracking is a relatively new addition to the repertoire of natural language tasks. In this paper, we present a statistical language-independent framework for identifying and tracking named, nominal and pronominal references to entities within unrestricted text documents, and chaining them into clusters corresponding to each logical...
Classifier combination is an effective and broadly useful method of improving system performance. This article investigates in depth a large number of both well-established and novel classifier combination approaches for the word sense disambiguation task, studied over a diverse classifier pool which includes feature-enhanced Naïve Bayes, Cosine, Decision...
In this paper we give an overview of the Tri-lingual Entity Discovery and Linking task at the Knowledge Base Population (KBP) track at TAC2015. This year we introduced a new end-to-end Tri-lingual entity discovery and linking task which requires a system to take raw texts from three languages (English, Chinese and Spanish) as input, automatically extract...
This paper presents a comprehensive empirical exploration and evaluation of a diverse range of data characteristics which influence word sense disambiguation performance. It focuses on a set of six core supervised algorithms, including three variants of Bayesian classifiers, a cosine model, non-hierarchical decision lists, and an extension of the...
In this paper we consider the problem of building a system to predict readability of natural-language documents. Our system is trained using diverse features based on syntax and language models which are generally indicative of readability. The experimental results on a dataset of documents from a mix of genres show that the predictions of the learned...
Information extraction is one of the fundamentally important tasks in Natural Language Processing, and as such it has been the subject of many evaluations and competitions. The latest such evaluation, the Knowledge Base Population (KBP) part of the Text Analysis Conference 2010, is focusing on two aspects: entity linking and slot filling. This paper...
We consider the problem of using sentence compression techniques to facilitate query-focused multi-document summarization. We present a sentence-compression-based framework for the task, and design a series of learning-based compression models built on parse trees. An innovative beam search decoder is proposed to efficiently find highly probable...