Snorkel: Rapid Training Data Creation with Weak Supervision
- Alexander J. Ratner, Stephen H. Bach, Henry R. Ehrenberg, Jason Alan Fries, Sen Wu, C. Ré
- Computer Science
- Proc. VLDB Endow.
- 1 November 2017
Snorkel is a first-of-its-kind system that enables users to train state-of-the-art models without hand labeling any training data and proposes an optimizer for automating tradeoff decisions that gives up to 1.8× speedup per pipeline execution.
Parallel stochastic gradient algorithms for large-scale matrix completion
Jellyfish, an algorithm for solving data-processing problems with matrix-valued decision variables regularized to have low rank, is developed, which is orders of magnitude more efficient than existing codes.
Data Programming: Creating Large Training Sets, Quickly
A paradigm for the programmatic creation of training sets called data programming is proposed in which users express weak supervision strategies or domain heuristics as labeling functions, which are programs that label subsets of the data, but that are noisy and may conflict.
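The idea of labeling functions can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the paper's method: the three heuristics, the spam-detection task, and the names `lf_contains_free`, `lf_many_exclamations`, `lf_short_greeting`, and `majority_vote` are all hypothetical, and a simple majority vote stands in for whatever model is used to reconcile noisy, conflicting votes.

```python
from collections import Counter

# Label values (assumed for this sketch): a function may abstain on an example.
ABSTAIN, NEG, POS = -1, 0, 1

def lf_contains_free(text):
    # Heuristic: the word "free" suggests spam.
    return POS if "free" in text.lower() else ABSTAIN

def lf_many_exclamations(text):
    # Heuristic: three or more exclamation marks suggest spam.
    return POS if text.count("!") >= 3 else ABSTAIN

def lf_short_greeting(text):
    # Heuristic: a short message opening with a greeting is probably not spam.
    is_greeting = text.lower().startswith(("hi", "hello"))
    return NEG if is_greeting and len(text) < 40 else ABSTAIN

LABELING_FUNCTIONS = [lf_contains_free, lf_many_exclamations, lf_short_greeting]

def majority_vote(text):
    # Collect non-abstaining votes; each function labels only a subset of the data.
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    # Conflicting functions that tie leave the example unlabeled.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]
```

Calling `majority_vote("Hi, free lunch today")` shows the conflict case: one function votes positive, another votes negative, and the tie leaves the example unlabeled.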
Tuffy: Scaling up Statistical Inference in Markov Logic Networks using an RDBMS
This work presents Tuffy, a scalable Markov Logic Networks framework that achieves scalability via three novel contributions: a bottom-up approach to grounding, a novel hybrid architecture that allows AI-style local search to be performed efficiently using an RDBMS, and a theoretical insight that shows when one can improve the efficiency of stochastic local search.
HoloClean: Holistic Data Repairs with Probabilistic Inference
A series of optimizations is introduced which ensures that inference over HoloClean's probabilistic model scales to instances with millions of tuples; HoloClean yields an average F1 improvement of more than 2× against state-of-the-art methods.
The MADlib Analytics Library or MAD Skills, the SQL
The MADlib project is introduced, including the background that led to its beginnings and the motivation for its open-source nature. An overview of the library's architecture and design patterns is provided, along with a description of various statistical methods in that context.
EmptyHeaded: A Relational Engine for Graph Processing
- Christopher R. Aberger, Andrew Lamb, Susan Tu, Andres Nötzli, K. Olukotun, C. Ré
- Computer Science
- ACM Trans. Database Syst.
- 9 March 2015
EmptyHeaded is presented, a high-level engine that supports a rich datalog-like query language and achieves performance comparable to that of low-level engines, and competes with the best-of-breed low-level engine (Galois), achieving comparable performance on PageRank and at most 3× worse performance on SSSP.
An asynchronous parallel stochastic coordinate descent algorithm
- Ji Liu, Stephen J. Wright, C. Ré, Victor Bittorf, S. Sridhar
- Computer Science
- J. Mach. Learn. Res.
- 8 November 2013
We describe an asynchronous parallel stochastic coordinate descent algorithm for minimizing smooth unconstrained or separably constrained functions. The method achieves a linear convergence rate on…
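For readers unfamiliar with the basic technique, here is a minimal sketch of stochastic coordinate descent in its plain sequential form. It is not the asynchronous parallel algorithm from the paper: it runs in a single thread, and the objective is assumed to be the smooth unconstrained quadratic f(x) = ½·xᵀAx − bᵀx with A symmetric positive definite. The function name `stochastic_coordinate_descent` and its parameters are hypothetical.

```python
import random

def stochastic_coordinate_descent(A, b, num_steps=5000, seed=0):
    """Minimize 0.5 * x^T A x - b^T x by updating one random coordinate per step.

    A is assumed symmetric positive definite (list of rows), b a list of floats.
    """
    rng = random.Random(seed)
    n = len(b)
    x = [0.0] * n
    for _ in range(num_steps):
        # Pick a coordinate uniformly at random.
        i = rng.randrange(n)
        # Partial derivative of the objective with respect to x[i].
        grad_i = sum(A[i][j] * x[j] for j in range(n)) - b[i]
        # Step size 1 / A[i][i] minimizes the quadratic exactly along coordinate i.
        x[i] -= grad_i / A[i][i]
    return x

# Small example: the minimizer solves A x = b.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = stochastic_coordinate_descent(A, b)
```

For this 2×2 system the minimizer is x = (1/11, 7/11), which the iterate approaches as the number of steps grows.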
DAWNBench: An End-to-End Deep Learning Benchmark and Competition
DAWNBench is introduced, a benchmark and competition focused on end-to-end training time to achieve a state-of-the-art accuracy level, as well as inference with that accuracy; it is intended to provide a useful, reproducible means of evaluating the many tradeoffs in deep learning systems.
Towards a unified architecture for in-RDBMS analytics
This work proposes a unified architecture for in-database analytics that requires changes to only a few dozen lines of code to integrate a new statistical technique, and demonstrates the feasibility of this architecture by integrating several popular analytics techniques into two commercial and one open-source RDBMS.