Learn More
Traditional learning-based coreference re-solvers operate by training a mention-pair classifier for determining whether two mentions are coreferent or not. Two independent lines of recent research have attempted to improve these mention-pair classifiers, one by learning a mention-ranking model to rank preceding mentions for a given anaphor, and the other by(More)
While world knowledge has been shown to improve learning-based coreference resolvers, the improvements were typically obtained by incorporating world knowledge into a fairly weak baseline resolver. Hence, it is not clear whether these benefits can carry over to a stronger baseline. Moreover, since there has been no attempt to apply different sources of(More)
Traditional learning-based coreference resolvers operate by training the mention-pair model for determining whether two mentions are coreferent or not. Though conceptually simple and easy to understand, the mention-pair model is linguistically rather unappealing and lags far behind the heuristic-based coreference models proposed in the pre-statistical NLP(More)
We examine the task of resolving complex cases of definite pronouns, specifically those for which traditional linguistic constraints on coreference (e.g., Binding Constraints, gender and number agreement) as well as commonly-used resolution heuristics (e.g., string-matching facilities, syntactic salience) are not useful. Being able to solve this task has(More)
An entity in a dialogue may be old, new, or mediated/inferrable with respect to the hearer's beliefs. Knowing the information status of the entities participating in a dialogue can therefore facilitate its interpretation. We address the under-investigated problem of automatically determining the information status of discourse entities. Specifically, we(More)
Research in named entity recognition and mention detection has typically involved a fairly small number of semantic classes, which may not be adequate if semantic class information is intended to support natural language applications. Motivated by this observation, we examine the under-studied problem of semantic sub-type induction, where the goal is to(More)
While information status (IS) plays a crucial role in discourse processing, there have only been a handful of attempts to automatically determine the IS of discourse entities. We examine a related but more challenging task, fine-grained IS determination, which involves classifying a discourse entity as one of 16 IS subtypes. We investigate the use of rich(More)
We investigate new methods for creating and applying ensembles for coreference resolution. While existing ensembles for coreference resolution are typically created using different learning algorithms , clustering algorithms or training sets, we harness recent advances in coreference modeling and propose to create our ensemble from a variety of supervised(More)
Named entity recognition for morphologically rich, case-insensitive languages, including the majority of semitic languages, Iranian languages, and Indian languages, is inherently more difficult than its English counterpart. Worse still, progress on machine learning approaches to named entity recognition for many of these languages is currently hampered by(More)
To build a coreference resolver for a new language, the typical approach is to first coreference-annotate documents from this target language and then train a resolver on these annotated documents using supervised learning techniques. However, the high cost associated with manually coreference-annotating documents needed by a supervised approach makes it(More)