• Publications
MS^2: Multi-Document Summarization of Medical Studies
TLDR
This work releases MS^2 (Multi-Document Summarization of Medical Studies), a dataset of over 470K documents and 20K summaries derived from the scientific literature. It is the first large-scale, publicly available multi-document summarization dataset in the biomedical domain, and it facilitates the development of systems that can assess and aggregate contradictory evidence across multiple studies.
MultiCite: Modeling realistic citations requires moving beyond the single-sentence single-label setting
TLDR
This work proposes a novel framework for citation context analysis (CCA) as a document-level context extraction and labeling task, and releases MULTICITE, a new dataset of 12,653 citation contexts from over 1,200 computational linguistics papers, the largest collection of expert-annotated citation contexts to date.
Improving the Accessibility of Scientific Documents: Current State, User Needs, and a System Solution to Enhance Scientific PDF Accessibility for Blind and Low Vision Users
TLDR
A small sample of papers was evaluated for successful extraction of display equations and other categories of paper objects, and the common extraction errors seen for each category, including the semantic categories, were identified.
VILA: Improving Structured Content Extraction from Scientific PDFs Using Visual Layout Groups
TLDR
This work introduces new methods that explicitly model VIsual LAyout (VILA) groups, that is, text lines or text blocks, to further improve performance, and shows that simply inserting special tokens denoting layout group boundaries into model inputs can lead to a 1.9% Macro F1 improvement in token classification.
Incorporating Visual Layout Structures for Scientific Text Classification
TLDR
This work introduces new methods for incorporating VIsual LAyout (VILA) structures, e.g., the grouping of page texts into text lines or text blocks, into language models to further improve performance, and designs a hierarchical model, H-VILA, that encodes the text based on layout structures.
A Search Engine for Discovery of Scientific Challenges and Directions (preprint)
TLDR
This work introduces the novel task of extracting and searching for scientific challenges and directions, to facilitate rapid knowledge discovery over a large corpus of interdisciplinary work relating to the COVID-19 pandemic, ranging from biomedicine to areas such as AI and economics.
ACCoRD: A Multi-Document Approach to Generating Diverse Descriptions of Scientific Concepts
TLDR
ACCoRD, an end-to-end system tackling the novel task of generating sets of descriptions of scientific concepts, is presented, and a user study demonstrates that users prefer descriptions produced by the system and prefer multiple descriptions to a single “best” description.
Generating Scientific Claims for Zero-Shot Scientific Fact Checking
TLDR
This work proposes scientific claim generation, the task of generating one or more atomic and verifiable claims from scientific sentences, and demonstrates its usefulness in zero-shot fact checking for biomedical claims. It also proposes CLAIMGEN-BART, a new supervised method for generating claims supported by the literature, as well as KBIN, a novel method for generating claim negations.