Open Information Extraction from the Web
Open IE (OIE), a new extraction paradigm where the system makes a single data-driven pass over its corpus and extracts a large set of relational tuples without requiring any human input, is introduced.
Identifying Relations for Open Information Extraction
Two simple syntactic and lexical constraints on binary relations expressed by verbs are introduced in the ReVerb Open IE system, which more than doubles the area under the precision-recall curve relative to previous extractors such as TextRunner and WOEpos.
Named Entity Recognition in Tweets: An Experimental Study
The novel T-NER system doubles F1 score compared with the Stanford NER system. It achieves this by leveraging the redundancy inherent in tweets, using LabeledLDA to exploit Freebase dictionaries as a source of distant supervision.
Open Language Learning for Information Extraction
Open Information Extraction (IE) systems extract relational tuples from text, without requiring a pre-specified vocabulary, by identifying relation phrases and associated arguments in arbitrary…
Web document clustering: a feasibility demonstration
To satisfy the stringent requirements of the Web domain, an incremental, linear-time algorithm called Suffix Tree Clustering (STC) is introduced, which creates clusters based on phrases shared between documents. STC is shown to be faster than standard clustering methods in this domain.
Extracting Product Features and Opinions from Reviews
Opine is introduced, an unsupervised information-extraction system which mines reviews in order to build a model of important product features, their evaluation by reviewers, and their relative quality across products.
Unsupervised named-entity extraction from the Web: An experimental study
An overview of KnowItAll's novel architecture and design principles is presented, emphasizing its distinctive ability to extract information without any hand-labeled training examples, and three distinct ways to address this challenge are presented and evaluated.
Web-scale information extraction in KnowItAll: (preliminary results)
KnowItAll, a system that aims to automate the tedious process of extracting large collections of facts from the web in an autonomous, domain-independent, and scalable manner, is introduced.
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
A new question set, text corpus, and baselines assembled to encourage AI research in advanced question answering constitute the AI2 Reasoning Challenge (ARC), which requires far more powerful knowledge and reasoning than previous challenges such as SQuAD or SNLI.
Towards a theory of natural language interfaces to databases
This paper proves that, for a broad class of semantically tractable natural language questions, Precise is guaranteed to map each question to the corresponding SQL query, and shows that Precise compares favorably with Mooney's learning NLI and with Microsoft's English Query product.