GoEmotions: A Dataset of Fine-Grained Emotions
TLDR
This work introduces GoEmotions, the largest manually annotated dataset of 58k English Reddit comments, labeled for 27 emotion categories or Neutral, and demonstrates the high quality of the annotations via Principal Preserved Component Analysis.
Analysis of the reputation system and user contributions on a question answering website: StackOverflow
TLDR
This study of the popular Q&A website StackOverflow, where users ask and answer questions about software development, algorithms, math, and other technical topics, finds that although most questions on the site are asked by low-reputation users, a high-reputation user on average asks more questions than a low-reputation user.
Bugram: Bug detection with n-gram language models
TLDR
This paper proposes Bugram, a new approach that leverages n-gram language models instead of rules to detect bugs, and suggests that Bugram is complementary to existing rule-based bug detection approaches.
Natural Language Models for Predicting Programming Comments
TLDR
This work predicts comments from Java source files of open-source projects using topic models and n-grams, and analyzes the models' performance given varying amounts of background data on the project being predicted.
Graph Agreement Models for Semi-Supervised Learning
TLDR
This work proposes Graph Agreement Models (GAM), which introduce an auxiliary model that predicts the probability of two nodes sharing the same label as a learned function of their features; the approach achieves state-of-the-art results on semi-supervised learning datasets.
Can self‐inhibitory peptides be derived from the interfaces of globular protein–protein interactions?
TLDR
This large-scale study assesses whether self‐inhibitory peptides can be derived from protein domains with globular architectures, and provides an elaborate framework for the in silico selection of candidate inhibitory molecules for protein–protein interactions.
Detection of peptide‐binding sites on protein surfaces: The first step toward the modeling and targeting of peptide‐mediated interactions
TLDR
This work presents PeptiMap, a protocol for accurately mapping peptide‐binding sites on protein structures, based on experimental evidence that peptide‐binding sites also bind small organic molecules of various shapes and polarities.
KB-LDA: Jointly Learning a Knowledge Base of Hierarchy, Relations, and Facts
TLDR
This work proposes an unsupervised model that jointly learns a latent ontological structure of an input corpus, and identifies facts from the corpus that match the learned structure.
Bootstrapping Biomedical Ontologies for Scientific Text using NELL
TLDR
This work presents an open information extraction system for biomedical text based on NELL (the Never-Ending Language Learner), which significantly improves on NELL's original bootstrapping algorithm in two types of tasks: learning terms from biomedical categories, and named-entity recognition for biomedical entities using a learned lexicon.