Snorkel: Rapid Training Data Creation with Weak Supervision
TLDR
We present Snorkel, a first-of-its-kind system that enables users to train state-of-the-art models without hand labeling any training data.
  • 305
  • 40
SwellShark: A Generative Model for Biomedical Named Entity Recognition without Labeled Data
TLDR
We present SwellShark, a framework for building biomedical named entity recognition (NER) systems quickly and without hand-labeled data.
  • 31
  • 10
Monitoring hand hygiene via human observers: how should we be sampling?
OBJECTIVE: To explore how hand hygiene observer scheduling influences the number of events and unique individuals observed. DESIGN: We deployed a mobile sensor network to capture detailed movement…
  • 43
  • 3
Data programming with DDLite: putting humans in a different part of the loop
TLDR
We introduce DDLite, an interactive development framework for data programming, and report feedback collected from users across a diverse set of entity extraction tasks.
  • 18
  • 1
Estimating the efficacy of symptom-based screening for COVID-19
There is substantial interest in using presenting symptoms to prioritize testing for COVID-19 and establish symptom-based surveillance. However, little is currently known about the specificity of…
  • 6
  • 1
Weakly supervised classification of aortic valve malformations using unlabeled cardiac MRI sequences
TLDR
We develop a weakly supervised deep learning model for classification of aortic valve malformations using up to 4,000 unlabeled cardiac MRI sequences from the UK Biobank.
  • 25
Brundlefly at SemEval-2016 Task 12: Recurrent Neural Networks vs. Joint Inference for Clinical Temporal Information Extraction
TLDR
We find that a joint inference-based approach using structured prediction outperforms a vanilla recurrent neural network that incorporates word embeddings trained on a variety of large clinical document sets.
  • 26
ShortFuse: Biomedical Time Series Representations in the Presence of Structured Information
TLDR
We present ShortFuse, a method that boosts the accuracy of deep learning models for time series by explicitly modeling temporal interactions and dependencies with structured covariates.
  • 12
Medical device surveillance with electronic health records
TLDR
We developed and validated state-of-the-art deep learning methods that identify patient outcomes from electronic health records without requiring hand-labeled training data.
  • 8
Assessing the accuracy of automatic speech recognition for psychotherapy
TLDR
We show that automatic speech recognition is feasible in psychotherapy, but further improvements are needed before widespread use.
  • 4