Learn More
The paper proposes a Constrained Entity-Alignment F-Measure (CEAF) for evaluating coreference resolution. The metric is computed by aligning reference and system entities (or coreference chains) with the constraint that a system (reference) entity is aligned with at most one reference (system) entity. We show that the best alignment is a maximum bipartite(More)
This paper proposes a new approach for coreference resolution which uses the Bell tree to represent the search space and casts the coreference resolution problem as finding the best path from the root of the Bell tree to the leaf nodes. A Maximum Entropy model is used to rank these paths. The coreference performance on the 2002 and 2003 Automatic Content(More)
Entity detection and tracking is a relatively new addition to the repertoire of natural language tasks. In this paper, we present a statistical language-independent framework for identifying and tracking named, nominal and pronom-inal references to entities within unrestricted text documents, and chaining them into clusters corresponding to each logical(More)
It is necessary to have a (large) annotated corpus to build a statistical parser. Acquisition of such a corpus is costly and time-consuming. This paper presents a method to reduce this demand using active learning, which selects what samples to annotate, instead of annotating blindly the whole training corpus. Sample selection for annotation is based upon "(More)
The definitions of two coreference scoring metrics—B 3 and CEAF—are underspeci-fied with respect to predicted, as opposed to key (or gold) mentions. Several variations have been proposed that manipulate either, or both, the key and predicted mentions in order to get a one-to-one mapping. On the other hand, the metric BLANC was, until recently, limited to(More)
In this paper we consider the problem of building a system to predict readability of natural-language documents. Our system is trained using diverse features based on syntax and language models which are generally indicative of readability. The experimental results on a dataset of documents from a mix of genres show that the predictions of the learned(More)
The paper presents a maximum entropy Chinese character-based parser trained on the Chinese Treebank (" CTB " henceforth). Word-based parse trees in CTB are first converted into character-based trees, where word-level part-of-speech (POS) tags become constituent labels and character-level tags are derived from word-level POS tags. A maximum entropy parser is(More)
We present a constituent parsing-based reordering technique that improves the performance of the state-of-the-art Eng-lish-to-Japanese phrase translation system that includes distortion models by 4.76 BLEU points. The phrase translation model with reordering applied at the pre-processing stage outperforms a syntax based translation system that incorporates(More)