• Publications
Parallel stochastic gradient algorithms for large-scale matrix completion
  • B. Recht, C. Ré
  • Mathematics, Computer Science
  • Math. Program. Comput.
  • 21 April 2013
Jellyfish, an algorithm for solving data-processing problems with matrix-valued decision variables regularized to have low rank, is developed; it is orders of magnitude more efficient than existing codes.
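The factored SGD update that Jellyfish parallelizes can be illustrated with a minimal single-threaded sketch: approximate a partially observed matrix M as L @ R.T by stepping on one observed entry at a time. All names and hyperparameters (rank, lr, mu) here are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, rank, lr, mu = 50, 40, 5, 0.05, 0.01

# Ground-truth low-rank matrix and a random ~30% sample of observed entries.
M = rng.standard_normal((m, rank)) @ rng.standard_normal((rank, n))
observed = [(i, j) for i in range(m) for j in range(n) if rng.random() < 0.3]

# Low-rank factors, initialized small.
L = 0.1 * rng.standard_normal((m, rank))
R = 0.1 * rng.standard_normal((n, rank))

for epoch in range(30):
    for k in rng.permutation(len(observed)):
        i, j = observed[k]
        err = L[i] @ R[j] - M[i, j]   # residual on this observed entry
        gi = err * R[j] + mu * L[i]   # gradient w.r.t. row L[i]
        gj = err * L[i] + mu * R[j]   # gradient w.r.t. row R[j]
        L[i] -= lr * gi
        R[j] -= lr * gj

rmse = np.sqrt(np.mean([(L[i] @ R[j] - M[i, j]) ** 2 for i, j in observed]))
print(f"training RMSE: {rmse:.3f}")
```

Because each update touches only one row of L and one row of R, updates on entries in disjoint rows and columns commute, which is the structure Jellyfish exploits to run them in parallel with essentially no locking.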
Snorkel: Rapid Training Data Creation with Weak Supervision
Snorkel is a first-of-its-kind system that enables users to train state-of-the-art models without hand labeling any training data and proposes an optimizer for automating tradeoff decisions that gives up to 1.8× speedup per pipeline execution.
Data Programming: Creating Large Training Sets, Quickly
A paradigm for the programmatic creation of training sets called data programming is proposed in which users express weak supervision strategies or domain heuristics as labeling functions, which are programs that label subsets of the data, but that are noisy and may conflict.
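A hypothetical sketch of the labeling-function idea: users write noisy, possibly conflicting heuristics, and their votes are combined into training labels. The real data programming approach fits a generative model over labeling-function accuracies; plain majority vote stands in for it here, and all function names and rules are invented for illustration.

```python
import re

SPAM, HAM, ABSTAIN = 1, 0, -1

# Each labeling function labels a subset of the data and abstains elsewhere.
def lf_contains_link(text):
    return SPAM if re.search(r"https?://", text) else ABSTAIN

def lf_shouty_word(text):
    return SPAM if re.search(r"\b[A-Z]{4,}\b", text) else ABSTAIN

def lf_greeting(text):
    return HAM if text.lower().startswith(("hi", "hello")) else ABSTAIN

LFS = [lf_contains_link, lf_shouty_word, lf_greeting]

def weak_label(text):
    """Combine noisy LF votes by majority (ties break toward SPAM)."""
    votes = [lf(text) for lf in LFS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    return SPAM if votes.count(SPAM) >= votes.count(HAM) else HAM

print(weak_label("FREE money now http://a.example"))  # both spam LFs fire
print(weak_label("hello there, meeting at noon?"))    # only greeting fires
```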
Tuffy: Scaling up Statistical Inference in Markov Logic Networks using an RDBMS
This work presents Tuffy, a scalable Markov Logic Networks framework that achieves scalability via three novel contributions: a bottom-up approach to grounding, a novel hybrid architecture that allows AI-style local search to be performed efficiently using an RDBMS, and a theoretical insight that shows when one can improve the efficiency of stochastic local search.
The MADlib Analytics Library or MAD Skills, the SQL
The MADlib project is introduced, including the background that led to its beginnings and the motivation for its open-source nature; an overview of the library's architecture and design patterns is provided, along with a description of various statistical methods in that context.
HoloClean: Holistic Data Repairs with Probabilistic Inference
A series of optimizations are introduced which ensure that inference over HoloClean's probabilistic model scales to instances with millions of tuples, and yields an average F1 improvement of more than 2× against state-of-the-art methods.
EmptyHeaded: A Relational Engine for Graph Processing
EmptyHeaded is presented, a high-level engine that supports a rich Datalog-like query language and achieves performance comparable to that of low-level engines, competing with the best-of-breed low-level engine (Galois): comparable performance on PageRank and at most 3x worse performance on SSSP.
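The flavor of query EmptyHeaded targets can be sketched with the classic triangle query, Triangle(x, y, z) :- Edge(x, y), Edge(y, z), Edge(x, z), evaluated by intersecting adjacency sets in the style of worst-case-optimal join algorithms. The toy graph and names below are illustrative, not from the paper.

```python
from collections import defaultdict

# Small undirected graph; triangles here are (0,1,2) and (1,2,3).
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (1, 3)]
adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

def count_triangles(adj):
    count = 0
    for x in adj:
        for y in adj[x]:
            if y <= x:          # enforce x < y < z to count each once
                continue
            # z must be adjacent to both x and y: one set intersection,
            # the core primitive of worst-case-optimal join evaluation.
            count += sum(1 for z in adj[x] & adj[y] if z > y)
    return count

print(count_triangles(adj))  # → 2
```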
An asynchronous parallel stochastic coordinate descent algorithm
We describe an asynchronous parallel stochastic coordinate descent algorithm for minimizing smooth unconstrained or separably constrained functions. The method achieves a linear convergence rate on …
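The coordinate update being parallelized can be sketched sequentially on a smooth quadratic f(x) = 0.5 xᵀAx - bᵀx with A symmetric positive definite; the paper's contribution is running such updates asynchronously across threads, while this single-threaded loop (with illustrative names and sizes) shows only the per-coordinate step.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
Q = rng.standard_normal((n, n))
A = Q @ Q.T + n * np.eye(n)   # well-conditioned SPD matrix
b = rng.standard_normal(n)
x = np.zeros(n)

for step in range(2000):
    i = rng.integers(n)            # pick a random coordinate
    grad_i = A[i] @ x - b[i]       # partial derivative df/dx_i
    x[i] -= grad_i / A[i, i]       # exact minimization along coordinate i

x_star = np.linalg.solve(A, b)
print(np.linalg.norm(x - x_star))
```

Each iteration reads the full vector x but writes only one coordinate, which is why, with bounded staleness, the asynchronous variant can tolerate threads working from slightly out-of-date copies of x.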
DAWNBench: An End-to-End Deep Learning Benchmark and Competition
DAWNBench is introduced, a benchmark and competition focused on end-to-end training time to achieve a state-of-the-art accuracy level, as well as inference with that accuracy, and will provide a useful, reproducible means of evaluating the many tradeoffs in deep learning systems.
Towards a unified architecture for in-RDBMS analytics
This work proposes a unified architecture for in-database analytics that requires changes to only a few dozen lines of code to integrate a new statistical technique, and demonstrates the feasibility of this architecture by integrating several popular analytics techniques into two commercial and one open-source RDBMS.