Open Information Extraction from the Web
- Michele Banko, Michael J. Cafarella, S. Soderland, M. Broadhead, Oren Etzioni
- Computer Science · CACM
- 6 January 2007
Open IE (OIE) is introduced: a new extraction paradigm in which the system makes a single data-driven pass over its corpus and extracts a large set of relational tuples without requiring any human input.
The Tradeoffs Between Open and Traditional Relation Extraction
A new model for Open IE called O-CRF is presented, and it is shown to achieve higher precision and nearly double the recall of the model employed by TEXTRUNNER, the previous state-of-the-art Open IE system.
Scaling to Very Very Large Corpora for Natural Language Disambiguation
This paper examines methods for effectively exploiting very large corpora when labeled data comes at a cost, and evaluates the performance of different learning methods on a prototypical natural language disambiguation task, confusion set disambiguation.
Web question answering: is more always better?
- S. Dumais, Michele Banko, E. Brill, Jimmy J. Lin, Andrew Y. Ng
- Computer Science · Annual International ACM SIGIR Conference on…
- 11 August 2002
This paper describes a question answering system that is designed to capitalize on the tremendous amount of data that is now available online, and uses the redundancy available in large corpora as an important resource to simplify the query rewrites and support answer mining from returned snippets.
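To make the redundancy idea concrete, here is a minimal Python sketch of answer mining by voting, under stated assumptions: each retrieved snippet votes once for every distinct candidate n-gram it contains, so an answer that recurs across many snippets rises to the top. The stopword list and the functions `tokenize`, `ngrams`, and `mine_answers` are hypothetical illustrations; the sketch omits the query rewrites the summary mentions and is not the system's actual pipeline.

```python
from collections import Counter

# Hypothetical stopword list for illustration; a real system would use a larger one.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "by", "is", "was", "and", "to", "who", "what"}


def tokenize(text):
    """Lowercase whitespace tokens and strip surrounding punctuation."""
    tokens = (t.strip(".,;:?!\"'()").lower() for t in text.split())
    return [t for t in tokens if t]


def ngrams(tokens, max_n):
    """Yield every 1..max_n-gram of `tokens` as a tuple of words."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def mine_answers(question, snippets, max_n=3, top_k=3):
    """Rank candidate answers by the number of snippets that contain them."""
    question_words = set(tokenize(question))
    votes = Counter()
    for snippet in snippets:
        candidates = set()
        for gram in ngrams(tokenize(snippet), max_n):
            # Drop candidates that merely echo words from the question.
            if any(w in question_words for w in gram):
                continue
            # Drop candidates that begin or end with a function word.
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            candidates.add(" ".join(gram))
        # A set, so each snippet votes at most once per candidate.
        votes.update(candidates)
    return votes.most_common(top_k)
```

For example, for the question "Who wrote Hamlet?" and the snippets "Hamlet was written by Shakespeare.", "Shakespeare wrote Hamlet around 1600." and "The play Hamlet, by William Shakespeare, is a tragedy.", the top-ranked candidate is `("shakespeare", 3)`, because it is the only candidate that appears in all three snippets. No single snippet needs to be parsed precisely; agreement across snippets does the work.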
Headline Generation Based on Statistical Translation
- Michele Banko, Vibhu Mittal, M. Witbrock
- Computer Science · Annual Meeting of the Association for…
- 3 October 2000
This paper presents experimental results for a headline-generation approach in which statistical models of term selection and term ordering are jointly applied to produce summaries in a style learned from a training corpus.
An Analysis of the AskMSR Question-Answering System
- E. Brill, S. Dumais, Michele Banko
- Computer Science · Conference on Empirical Methods in Natural…
- 6 July 2002
The architecture of the AskMSR question answering system is described, the contributions of different system components to accuracy are evaluated, and strategies for predicting when the system is likely to give an incorrect answer are explored.
Data-Intensive Question Answering
- E. Brill, Jimmy J. Lin, Michele Banko, S. Dumais, Andrew Y. Ng
- Computer Science · Text Retrieval Conference
Uses the redundancy of the answers themselves to improve the final result of information retrieval, a redundancy that arises from the very large amount of information currently available.
TextRunner: Open Information Extraction on the Web
- A. Yates, Michele Banko, M. Broadhead, Michael J. Cafarella, Oren Etzioni, S. Soderland
- Computer Science · North American Chapter of the Association for…
- 23 April 2007
The TextRunner system demonstrates a new kind of information extraction, called Open Information Extraction (OIE), in which the system makes a single, data-driven pass over the entire corpus and extracts a large set of relational tuples, without requiring any human input.
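As a rough illustration of what a single, data-driven pass with no human input can look like, the following Python sketch scans part-of-speech-tagged sentences once and emits every (argument, relation, argument) triple that matches a fixed noun-phrase, verb-phrase, noun-phrase tag pattern; no relation names are supplied in advance. It assumes input already tagged with Penn Treebank tags, and the names `category`, `chunk`, and `extract` are hypothetical. It is not TextRunner's extractor, which learns which candidate tuples to keep; here a hand-written tag pattern stands in for that learned component.

```python
from itertools import groupby
from typing import Iterable, List, Tuple

# A sentence is a list of (token, Penn Treebank POS tag) pairs (assumed input format).
TaggedSentence = List[Tuple[str, str]]
Triple = Tuple[str, str, str]

REL_TAGS = {"RB", "IN", "TO", "RP"}


def category(tag: str) -> str:
    """Coarse chunk category for a single POS tag."""
    if tag.startswith("NN") or tag in {"DT", "JJ", "CD", "PRP$"}:
        return "NP"
    if tag.startswith("VB") or tag in REL_TAGS:
        return "REL"
    return "O"


def chunk(sentence: TaggedSentence) -> List[Tuple[str, str]]:
    """Group adjacent tokens of the same category into (label, text) chunks."""
    chunks = []
    for label, group in groupby(sentence, key=lambda pair: category(pair[1])):
        pairs = list(group)
        tags = [tag for _, tag in pairs]
        # An argument must contain a noun; a relation must contain a verb.
        if label == "NP" and not any(t.startswith("NN") for t in tags):
            label = "O"
        if label == "REL" and not any(t.startswith("VB") for t in tags):
            label = "O"
        chunks.append((label, " ".join(tok for tok, _ in pairs)))
    return chunks


def extract(corpus: Iterable[TaggedSentence]) -> List[Triple]:
    """Make one pass over the corpus, emitting every NP-REL-NP chunk triple."""
    triples = []
    for sentence in corpus:
        chunks = chunk(sentence)
        for (l1, a1), (l2, rel), (l3, a2) in zip(chunks, chunks[1:], chunks[2:]):
            if (l1, l2, l3) == ("NP", "REL", "NP"):
                triples.append((a1, rel, a2))
    return triples
```

On the tagged sentences "Edison invented the phonograph in 1877." and "Einstein was born in Ulm.", `extract` returns `[("Edison", "invented", "the phonograph"), ("Einstein", "was born in", "Ulm")]`. The trailing "in 1877" is dropped because a preposition without a verb does not form a relation chunk, which shows how much a fixed pattern misses compared with a learned extractor.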
Part-of-Speech Tagging in Context
- Michele Banko, Robert C. Moore
- Computer Science · International Conference on Computational…
- 23 August 2004
A new HMM tagger is presented that exploits context on both sides of the word being tagged, and it is shown that this tagger achieves state-of-the-art results in a supervised, non-training-intensive framework.
- Oren Etzioni, Michele Banko, Michael J. Cafarella
- Computer Science · AAAI Conference on Artificial Intelligence
- 16 July 2006
This paper investigates how to leverage advances in machine learning and probabilistic reasoning to understand text.