• Publications
Open Information Extraction from the Web
TLDR
This paper introduces Open IE (OIE), a new extraction paradigm where the system makes a single data-driven pass over its corpus and extracts a large set of relational tuples without requiring any human input.
Unsupervised named-entity extraction from the Web: An experimental study
TLDR
The KnowItAll system aims to automate the tedious process of extracting large collections of facts (e.g., names of scientists or politicians) from the Web in an unsupervised, domain-independent, scalable manner.
WebTables: exploring the power of tables on the web
TLDR
We extracted 14.1 billion HTML tables from Google's general-purpose web crawl, and used statistical classification techniques to find the estimated 154M that contain high-quality relational data.
Web-scale information extraction in KnowItAll: (preliminary results)
TLDR
This paper introduces KnowItAll, a system that aims to automate the tedious process of extracting large collections of facts from the web in an autonomous, domain-independent, and scalable manner.
Data Integration for the Relational Web
TLDR
This paper describes Octopus, a system that combines search, extraction, data cleaning and integration, and enables users to create new data sets from those found on the Web.
TextRunner: Open Information Extraction on the Web
TLDR
We demonstrate a new kind of information extraction, called Open Information Extraction (OIE), in which the system makes a single, data-driven pass over the entire corpus and extracts a large set of relational tuples, without requiring any human input.
Uncovering the Relational Web
TLDR
We extracted 14.1 billion HTML tables from a several-billion-page portion of Google's general-purpose web crawl, and estimate that 154 million of these tables contain high-quality relational-style data.
Automatic web spreadsheet data extraction
TLDR
This paper introduces a system that automatically extracts relational data from spreadsheets, thereby enabling relational spreadsheet integration.
KnowItNow: Fast, Scalable Information Extraction from the Web
TLDR
Numerous NLP applications rely on search-engine queries, both to extract information from and to compute statistics over the Web corpus.
Automatic Optimization for MapReduce Programs
TLDR
This paper covers Manimal, which automatically analyzes MapReduce programs and applies appropriate data-aware optimizations, requiring no additional help from the programmer.