Learn More
We present a noun phrase coreference system that extends the work of Soon et al. (2001) and, to our knowledge, produces the best results to date on the MUC6 and MUC-7 coreference resolution data sets — F-measures of 70.4 and 63.4, respectively. Improvements arise from two sources: extra-linguistic changes to the learning framework and a large-scale(More)
This paper examines two problems in document-level sentiment analysis: (1) determining whether a given document is a review or not, and (2) classifying the polarity of a review as positive or negative. We first demonstrate that review identification can be performed with high accuracy using only unigrams as features. We then examine the role of four types(More)
We present a supervised learning approach to identification of anaphoric and non-anaphoric noun phrases and show how such information can be incorporated into a coreference resolution system. The resulting system outperforms the best MUC-6 and MUC-7 coreference resolution systems on the corresponding MUC coreference data sets, obtaining F-measures of 66.2(More)
Knowledge of the anaphoricity of a noun phrase might be profitably exploited by a coreference system to bypass the resolution of non-anaphoric noun phrases. Perhaps surprisingly, recent attempts to incorporate automatically acquired anaphoricity information into coreference systems, however, have led to the degradation in resolution performance. This paper(More)
In this paper, we view coreference resolution as a problem of ranking candidate partitions generated by different coreference systems. We propose a set of partition-based features to learn a ranking model for distinguishing good and bad partitions. Our approach compares favorably to two state-of-the-art coreference systems when evaluated on three standard(More)
We present a generative model for unsupervised coreference resolution that views coreference as an EM clustering process. For comparison purposes, we revisit Haghighi and Klein’s (2007) fully-generative Bayesian model for unsupervised coreference resolution, discuss its potential weaknesses and consequently propose three modifications to their model.(More)
State-of-the-art approaches for unsupervised keyphrase extraction are typically evaluated on a single dataset with a single parameter setting. Consequently, it is unclear how effective these approaches are on a new dataset from a different domain, and how sensitive they are to changes in parameter settings. To gain a better understanding of state-of-the-art(More)
Most machine learning solutions to noun phrase coreference resolution recast the problem as a classification task. We examine three potential problems with this reformulation, namely, skewed class distributions, the inclusion of “hard” training instances, and the loss of transitivity inherent in the original coreference relation. We show how these problems(More)