WebTables: exploring the power of tables on the web
TLDR
We extracted 14.1 billion HTML tables from Google's general-purpose web crawl, and used statistical classification techniques to find the estimated 154M that contain high-quality relational data.
  • 591
  • 61
The MADlib Analytics Library or MAD Skills, the SQL
TLDR
MADlib is a free, open-source library of in-database analytic methods that can be installed and executed within a relational database engine that supports extensible SQL.
  • 321
  • 38
Uncovering the Relational Web
TLDR
We extracted 14.1 billion HTML tables from a several-billion-page portion of Google's general-purpose web crawl, and estimate that 154 million of these tables contain high-quality relational-style data.
  • 149
  • 15
Knowledge expansion over probabilistic knowledge bases
TLDR
We present ProbKB, a probabilistic knowledge base designed to infer missing facts in a scalable, probabilistic, and principled manner using a relational DBMS.
  • 59
  • 12
BayesStore: managing large, uncertain data repositories with probabilistic graphical models
TLDR
We introduce BayesStore, a novel probabilistic data management architecture based on a novel first-order statistical model, and we redefine traditional query processing operators to manipulate the data and the probabilistic models of the database in an efficient manner.
  • 166
  • 11
Ontological Pathfinding
TLDR
We propose the Ontological Pathfinding algorithm (OP) that scales to web-scale knowledge bases via a series of parallelization and optimization techniques.
  • 37
  • 8
ScaLeKB: scalable learning and inference over large knowledge bases
TLDR
We propose the Ontological Pathfinding (OP) algorithm to mine first-order inference rules from these web knowledge bases.
  • 27
  • 7
Functional Dependency Generation and Applications in Pay-As-You-Go Data Integration Systems
Recently, the opportunity of extracting structured data from the Web has been identified by a number of research projects. One such example is that millions of relational-style HTML tables can be…
  • 49
  • 5
A data science challenge for converting airborne remote sensing data into ecological information
Ecology has reached the point where data science competitions, in which multiple groups solve the same problem using the same data by different methods, will be productive for advancing quantitative…
  • 15
  • 4
Probabilistic Data Management for Pervasive Computing: The Data Furnace Project
TLDR
The wide deployment of wireless sensor and RFID (Radio Frequency IDentification) devices is one of the key enablers for next-generation pervasive computing applications, including large-scale environmental monitoring and control, context-aware computing, and “smart digital homes.”
  • 48
  • 3