ScispaCy: Fast and Robust Models for Biomedical Natural Language Processing
TLDR
This paper describes scispaCy, a new Python library and set of models for practical biomedical/scientific text processing that heavily leverages the spaCy library. It details the performance of two packages of models released in scispaCy and demonstrates their robustness on several tasks and datasets.
Pretrained Language Models for Sequential Sentence Classification
TLDR
This work constructs a joint sentence representation that allows BERT Transformer layers to directly utilize contextual information from all words in all sentences, and achieves state-of-the-art results on four datasets, including a new dataset of structured scientific abstracts.
High-Precision Extraction of Emerging Concepts from Scientific Literature
TLDR
This work presents an unsupervised concept extraction method for scientific literature that achieves much higher precision than previous work, and a substantially better precision-yield trade-off across the top 15,000 extractions.
S2AND: A Benchmark and Evaluation System for Author Name Disambiguation
TLDR
This work presents S2AND, a unified benchmark dataset for author name disambiguation (AND) on scholarly papers, together with an open-source reference model implementation, and releases the dataset, model code, trained models, and evaluation suite to the research community.
Towards Personalized Descriptions of Scientific Concepts
TLDR
This paper proposes generating personalized scientific concept descriptions tailored to the user’s expertise and context, outlines a complete architecture for the task, and releases an expert-annotated resource, ACCoRD.
Infrastructure for Rapid Open Knowledge Network Development
TLDR
This paper describes a National Science Foundation Convergence Accelerator project to build a set of Knowledge Network Programming Infrastructure systems that address the frustratingly slow process of building, using, and scaling large knowledge networks.
Don't Say What You Don't Know: Improving the Consistency of Abstractive Summarization by Constraining Beam Search
TLDR
This work presents PINOCCHIO, a new decoding method that improves the consistency of a transformer-based abstractive summarizer by constraining beam search to avoid hallucinations.
S2AMP: A High-Coverage Dataset of Scholarly Mentorship Inferred from Publications
TLDR
The first dataset has over 300,000 ground-truth academic mentor-mentee pairs obtained from multiple diverse, manually curated sources and linked to the Semantic Scholar (S2) knowledge graph; a second dataset is formed by applying a classifier trained on the first to the complete co-authorship graph of S2.