Open Information Extraction from the Web
Open IE (OIE), a new extraction paradigm where the system makes a single data-driven pass over its corpus and extracts a large set of relational tuples without requiring any human input, is introduced.
The Tradeoffs Between Open and Traditional Relation Extraction
A new model for Open IE called O-CRF is presented, and it is shown to achieve higher precision and nearly double the recall of the model employed by TEXTRUNNER, the previous state-of-the-art Open IE system.
Scaling to Very Very Large Corpora for Natural Language Disambiguation
This paper examines methods for effectively exploiting very large corpora when labeled data comes at a cost, and evaluates the performance of different learning methods on a prototypical natural language disambiguation task, confusion set disambiguation.
Web question answering: is more always better?
This paper describes a question answering system that is designed to capitalize on the tremendous amount of data that is now available online, and uses the redundancy available in large corpora as an important resource to simplify the query rewrites and support answer mining from returned snippets.
Headline Generation Based on Statistical Translation
This paper presents results on experiments using this approach, in which statistical models of the term selection and term ordering are jointly applied to produce summaries in a style learned from a training corpus.
Data-Intensive Question Answering
Uses the redundancy of the answers themselves to improve the final result of information retrieval, a redundancy that stems from the very large amount of information currently available.
TextRunner: Open Information Extraction on the Web
The TextRunner system demonstrates a new kind of information extraction, called Open Information Extraction (OIE), in which the system makes a single, data-driven pass over the entire corpus and extracts a large set of relational tuples, without requiring any human input.
An Analysis of the AskMSR Question-Answering System
The architecture of the AskMSR question answering system is described, the contributions of different system components to accuracy are evaluated, and strategies for predicting when the system is likely to give an incorrect answer are explored.
Part-of-Speech Tagging in Context
A new HMM tagger is presented that exploits context on both sides of a word to be tagged, and it is shown how this new tagger achieves state-of-the-art results in a supervised, non-training intensive framework.
Machine Reading
This paper investigates how to leverage advances in machine learning and probabilistic reasoning to understand text.