Content-Based Citation Recommendation
We present a content-based method for recommending citations in an academic paper draft. We embed a given query document into a vector space, then use its nearest neighbors as candidates, and rerank ...
  • 38
  • 10
Construction of the Literature Graph in Semantic Scholar
We describe a deployed scalable system for organizing published scientific literature into a heterogeneous graph to facilitate algorithmic manipulation and discovery. The resulting literature graph ...
  • 89
  • 8
Part-of-speech histograms for genre classification of text
This work addresses the problem of classifying the genre of text, which is useful for a variety of language processing problems. We propose statistics of POS histograms as classification features, ...
  • 27
  • 5
Completely Lazy Learning
Local classifiers are sometimes called lazy learners because they do not train a classifier until presented with a test sample. However, such methods are generally not completely lazy because the ...
  • 49
  • 4
Multi-Task Averaging
We present a multi-task learning approach to jointly estimate the means of multiple independent data sets. The proposed multi-task averaging (MTA) algorithm results in a convex combination of the ...
  • 13
  • 1
Classifying Factored Genres with Part-of-Speech Histograms
This work addresses the problem of genre classification of text and speech transcripts, with the goal of handling genres not seen in training. Two frameworks employing different statistics on ...
  • 12
  • 1
Citation Count Analysis for Papers with Preprints
We explore the degree to which papers prepublished on arXiv garner more citations, in an attempt to paint a sharper picture of fairness issues related to prepublishing. A paper's citation count is ...
  • 10
  • 1
SPECTER: Document-level Representation Learning using Citation-informed Transformers
Representation learning is a critical ingredient for natural language processing systems. Recent Transformer language models like BERT learn powerful textual representations, but these models are ...
  • 5
  • 1
Quantifying Sex Bias in Clinical Studies at Scale With Automated Data Extraction
Key Points. Question: What is the magnitude of female underrepresentation in clinical studies? Findings: In this cross-sectional study, machine reading to extract sex data from 43 135 published articles ...
  • 9
Precursor charge state prediction for electron transfer dissociation tandem mass spectra
Electron-transfer dissociation (ETD) induces fragmentation along the peptide backbone by transferring an electron from a radical anion to a protonated peptide. In contrast with collision-induced ...
  • 9