• Publications
On the Opportunities and Risks of Foundation Models
TLDR
This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities, including their emergent properties, to their applications.
Data Noising as Smoothing in Neural Network Language Models
TLDR
This paper derives a connection between input noising in neural network language models and smoothing in $n$-gram models and draws upon ideas from smoothing to develop effective noising schemes.
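One of the schemes the paper relates to smoothing replaces input tokens with draws from the unigram distribution at some probability. The sketch below illustrates that idea in isolation; the replacement probability, toy corpus, and function name are illustrative rather than taken from the paper.

```python
import random
from collections import Counter

def unigram_noise(tokens, corpus_tokens, gamma=0.25, rng=random):
    """With probability gamma, replace each token with a draw from the
    corpus unigram distribution (gamma and data here are illustrative)."""
    counts = Counter(corpus_tokens)
    vocab, weights = zip(*counts.items())
    return [rng.choices(vocab, weights=weights)[0] if rng.random() < gamma else tok
            for tok in tokens]

corpus = "the cat sat on the mat the dog sat on the rug".split()
print(unigram_noise("the cat sat on the mat".split(), corpus))
```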
DisSent: Learning Sentence Representations from Explicit Discourse Relations
TLDR
It is demonstrated that the automatically curated corpus allows a bidirectional LSTM sentence encoder to yield high-quality sentence embeddings and can serve as a supervised fine-tuning dataset for larger models such as BERT.
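The supervision signal comes from sentence pairs joined by explicit discourse markers, which can be harvested automatically from raw text. The sketch below shows the general idea with a small marker list and a regex heuristic; the paper curates its corpus more carefully and with a richer marker set, so treat the details here as illustrative.

```python
import re

# A few explicit discourse markers used as labels; the paper's marker set
# and extraction procedure are richer than this regex heuristic.
MARKERS = ["because", "but", "although", "so", "while"]

def extract_pairs(sentence):
    """Split a sentence on an explicit discourse marker, yielding
    (clause_1, marker, clause_2) training triples."""
    for m in MARKERS:
        match = re.search(rf",?\s+\b{m}\b\s+", sentence, flags=re.IGNORECASE)
        if match:
            s1, s2 = sentence[:match.start()], sentence[match.end():]
            if s1.strip() and s2.strip():
                yield s1.strip(), m, s2.strip()

for triple in extract_pairs("She stayed home because it was raining"):
    print(triple)  # ('She stayed home', 'because', 'it was raining')
```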
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
TLDR
Evaluation of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters, finds that model performance and calibration both improve with scale but remain poor in absolute terms.
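Calibration here refers to how well a model's stated confidence tracks its actual accuracy. A standard way to quantify that is expected calibration error over predicted-answer probabilities, which the sketch below computes; BIG-bench's own metrics and task formats differ in detail, so this is only a minimal illustration with toy numbers.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard ECE: bin predictions by confidence and compare average
    confidence to empirical accuracy in each bin."""
    confidences, correct = np.asarray(confidences), np.asarray(correct)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap
    return ece

# Toy multiple-choice predictions: confidence in the chosen answer, and whether it was right.
print(expected_calibration_error([0.9, 0.8, 0.6, 0.55], [1, 1, 0, 1]))
```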
Pragmatic Issue-Sensitive Image Captioning
TLDR
The Issue-Sensitive Image Captioning (ISIC) model is built on top of state-of-the-art pretrained neural image captioners and explicitly uses image partitions to control caption generation, producing captions that are both descriptive and issue-sensitive.
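The core idea is that an "issue" partitions the candidate images, and a caption is issue-sensitive to the extent that a listener who hears it can identify which cell of the partition the target image falls in. The sketch below shows that reranking criterion with toy compatibility scores and a made-up partition; it is not the paper's actual model or data, only an illustration of the scoring idea.

```python
import numpy as np

# Toy literal scores: rows are candidate captions, columns are images.
# Higher means the caption fits that image better (illustrative numbers).
captions = ["a dog on grass", "a brown dog", "an animal outdoors"]
literal = np.array([[0.9, 0.2, 0.8],
                    [0.7, 0.8, 0.1],
                    [0.6, 0.6, 0.6]])

# The "issue" partitions images into cells, e.g. {outdoor} vs {indoor}.
partition = [[0, 2], [1]]   # image indices grouped by cell
target_cell = 0             # the cell containing the target image

def issue_sensitive_scores(literal, partition, target_cell):
    """Score captions by how much probability a literal listener,
    projected onto the issue's cells, puts on the target cell."""
    listener = literal / literal.sum(axis=1, keepdims=True)              # P(image | caption)
    cell_mass = np.stack([listener[:, cell].sum(axis=1) for cell in partition], axis=1)
    return cell_mass[:, target_cell]                                     # P(target cell | caption)

for cap, score in zip(captions, issue_sensitive_scores(literal, partition, target_cell)):
    print(f"{score:.2f}  {cap}")
```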
DeepTag: inferring all-cause diagnoses from clinical notes in under-resourced medical domain
TLDR
A deep learning algorithm, DeepTag, automatically infers diagnostic codes from veterinary free-text notes and enables automated disease annotation across a broad range of clinical diagnoses with minimal preprocessing.
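Code inference here is a multi-label text classification problem: one free-text note can imply several diagnosis codes at once. The sketch below sets up that task framing with a simple bag-of-words baseline and made-up notes and codes; DeepTag itself is a deep learning model, so this illustrates only the problem setup, not the paper's architecture.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Made-up veterinary notes and diagnosis codes, purely for illustration.
notes = [
    "vomiting and diarrhea for two days, mild dehydration",
    "pruritus and hair loss on flanks, suspect flea allergy dermatitis",
    "polyuria and polydipsia, elevated blood glucose",
    "coughing and exercise intolerance, heart murmur on auscultation",
]
codes = [["gastroenteritis"], ["dermatitis"], ["diabetes"], ["cardiac"]]

binarizer = MultiLabelBinarizer()
y = binarizer.fit_transform(codes)

# A bag-of-words baseline standing in for the paper's deep model.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(notes, y)

# Score a new note against every code and print them ranked.
probs = model.predict_proba(["three days of diarrhea and vomiting, dehydrated"])[0]
for code, p in sorted(zip(binarizer.classes_, probs), key=lambda t: -t[1]):
    print(f"{p:.2f}  {code}")
```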
LitGen: Genetic Literature Recommendation Guided by Human Explanations
TLDR
This work proposes LitGen, the first machine learning system that can retrieve papers for a particular genetic variant and filter them by the specific evidence types curators use to assess pathogenicity; it uses semi-supervised deep learning to predict the type of evidence provided by each paper.
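A common semi-supervised recipe for this kind of evidence-type prediction is self-training: fit a classifier on the small labeled set, adopt its confident predictions on unlabeled papers as pseudo-labels, and refit. The sketch below shows that loop with a simple classifier, a made-up confidence threshold, and made-up abstracts; LitGen's actual model and data pipeline are more involved.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny made-up abstracts labeled with an evidence type, plus unlabeled ones.
labeled = ["functional assay shows loss of protein activity",
           "variant segregates with disease in a large pedigree"]
labels = ["functional", "segregation"]
unlabeled = ["enzyme activity was abolished in transfected cells",
             "co-segregation observed across three generations"]

vec = TfidfVectorizer().fit(labeled + unlabeled)
clf = LogisticRegression(max_iter=1000).fit(vec.transform(labeled), labels)

# Self-training step: adopt confident predictions on unlabeled papers as
# pseudo-labels and refit (threshold chosen arbitrarily for illustration).
probs = clf.predict_proba(vec.transform(unlabeled))
confident = probs.max(axis=1) >= 0.55
pseudo = clf.predict(vec.transform(unlabeled))[confident]

X = vec.transform(labeled + [u for u, keep in zip(unlabeled, confident) if keep])
y = np.concatenate([labels, pseudo])
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict(vec.transform(["knockout mouse recapitulates the phenotype"])))
```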
VetTag: improving automated veterinary diagnosis coding via large-scale language modeling
TLDR
A large-scale algorithm that automatically predicts all 4577 standard veterinary diagnosis codes from free text, showing that hierarchical training can address severe data imbalance for fine-grained diagnoses with few training cases and adding insight into the power of unsupervised learning for clinical natural language processing.
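Hierarchical training exploits the fact that fine-grained codes roll up into coarser categories, so a note carrying a rare leaf code still provides supervision at the parent level. The sketch below shows one way to wire that up, with a shared encoder, two classification heads, and a made-up code hierarchy; the paper's model and the real code hierarchy are far richer than this.

```python
import torch
import torch.nn as nn

# Toy hierarchy: each fine-grained diagnosis code maps to a coarse parent
# category (made-up sizes; the real code set has thousands of leaves).
N_FINE, N_COARSE, DIM = 12, 3, 32
fine_to_coarse = torch.tensor([i % N_COARSE for i in range(N_FINE)])

class HierarchicalTagger(nn.Module):
    """Shared text encoder with two heads; supervising the coarse head lets
    rare fine codes borrow signal from their better-populated parents."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(DIM, DIM), nn.ReLU())  # stand-in for a real text encoder
        self.fine_head = nn.Linear(DIM, N_FINE)
        self.coarse_head = nn.Linear(DIM, N_COARSE)

    def forward(self, x):
        h = self.encoder(x)
        return self.fine_head(h), self.coarse_head(h)

model = HierarchicalTagger()
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(8, DIM)                     # stand-in for encoded notes
fine_y = torch.randint(0, N_FINE, (8,))     # gold fine-grained codes
coarse_y = fine_to_coarse[fine_y]           # implied parent categories

fine_logits, coarse_logits = model(x)
loss = loss_fn(fine_logits, fine_y) + loss_fn(coarse_logits, coarse_y)
loss.backward()
print(float(loss))
```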
...
...