TLDR: Extreme Summarization of Scientific Documents
This work introduces SCITLDR, a new multi-target dataset of 5.4K TLDRs over 3.2K papers, and proposes CATTS, a simple yet effective learning strategy for generating TLDRs that exploits titles as an auxiliary training signal.
Why Swear? Analyzing and Inferring the Intentions of Vulgar Expressions
This work introduces a novel data set of 7,800 tweets from users with known demographic traits where all instances of vulgar words are annotated with one of the six categories of vulgar word use and presents the first analysis of the pragmatic aspects of vulgarity and how they relate to social factors.
Expressively vulgar: The socio-dynamics of vulgarity and its effects on sentiment analysis in social media
This study performs a large-scale, data-driven empirical analysis of vulgar words, examining the socio-cultural and pragmatic aspects of vulgarity using tweets from users with known demographics.
Citation Text Generation
This paper establishes the task of citation text generation with a standard evaluation corpus, develops several strong baseline models for the task, and provides extensive automatic and human evaluations to illustrate the successes and shortcomings of current text generation techniques.
Explaining Relationships Between Scientific Documents
This paper establishes a dataset of 622K examples from 154K documents and pretrains a large language model to serve as the foundation for autoregressive approaches to the task of explaining relationships between two scientific documents using natural language text.
Improving the Accessibility of Scientific Documents: Current State, User Needs, and a System Solution to Enhance Scientific PDF Accessibility for Blind and Low Vision Users
A small sample of papers was evaluated for successful extraction of display equations and categories of paper objects identified for evaluation along with the common errors seen for each category, including semantic categories and common extraction errors.
Faithful and Plausible Explanations of Medical Code Predictions
This work proposes to train a proxy model that mimics the behavior of the trained model and provides fine-grained control over these trade-offs, and evaluates the approach on the task of assigning ICD codes to clinical notes to demonstrate that explanations from the proxy model are faithful and replicate the trained model's behavior.
SciA11y: Converting Scientific Papers to Accessible HTML
SciA11y uses machine learning models to extract and understand the content of scientific PDFs, and reorganizes the resulting paper components into a form that better supports skimming and scanning for blind and low vision readers.
Proxy Model Explanations for Time Series RNNs
A proxy model approach is introduced that is fast to train, faithful to the original model, and globally consistent in its explanations, and that improves over existing methods in an application to political event forecasting.
Model Distillation for Faithful Explanations of Medical Code Predictions
Machine learning models that offer excellent predictive performance often lack the interpretability necessary to support integrated human machine decision-making. In clinical medicine and other