Most current statistical natural language processing models use only local features so as to permit dynamic programming in inference, but this makes them unable to fully account for the long distance structure that is prevalent in language use. We show how to solve this dilemma with Gibbs sampling, a simple Monte Carlo method used to perform approximate ...
We describe the design and use of the Stanford CoreNLP toolkit, an extensible pipeline that provides core natural language analysis. This toolkit is quite widely used, both in the research NLP community and also among commercial and government users of open source NLP technology. We suggest that this follows from a simple, approachable design, ...
Discriminative feature-based methods are widely used in natural language processing, but sentence parsing is still dominated by generative methods. While prior feature-based dynamic programming parsers have restricted training and evaluation to artificially short sentences, we present the first general, feature-rich discriminative parser, based on a ...
We describe a machine learning system for the recognition of names in biomedical texts. The system makes extensive use of local and syntactic features within the text, as well as external resources including the web and gazetteers. It achieves an F-score of 70% on the Coling 2004 NLPBA/BioNLP shared task of identifying five biomedical named entities in the ...
Good automatic information extraction tools offer hope for automatic processing of the exploding biomedical literature, and successful named entity recognition is a key component for such tools. We present a maximum-entropy based system incorporating a diverse set of features for identifying gene and protein names in biomedical abstracts. This system was ...
We present a machine learning approach to robust textual inference, in which parses of the text and the hypothesis sentences are used to measure their asymmetric "similarity", and thereby to decide if the hypothesis can be inferred. This idea is realized in two different ways. In the first, each sentence is represented as a graph (extracted from a ...