• Publications
Construction of the Literature Graph in Semantic Scholar
TLDR
This paper reduces literature graph construction to familiar NLP tasks, points out research challenges arising from differences from the standard formulations of these tasks, and reports empirical results for each task.
From 'F' to 'A' on the N.Y. Regents Science Exams: An Overview of the Aristo Project
TLDR
Success is reported on the Grade 8 New York Regents Science Exam, where for the first time a system scores more than 90 percent on the exam’s non-diagram, multiple-choice (NDMC) questions, demonstrating that modern natural language processing methods can achieve mastery of this task.
A Simple Yet Strong Pipeline for HotpotQA
TLDR
This paper presents a simple pipeline based on BERT that outperforms large-scale language models on both question answering and support identification on HotpotQA (and achieves performance very close to that of a RoBERTa model).
Documenting the English Colossal Clean Crawled Corpus
TLDR
This work provides some of the first documentation of the English Colossal Clean Crawled Corpus (C4), one of the largest corpora of text available, and hosts an indexed version of C4 at https://c4-search.allenai.org/, allowing anyone to search it.
Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus
TLDR
This work provides some of the first documentation for the Colossal Clean Crawled Corpus (C4), a dataset created by applying a set of filters to a single snapshot of Common Crawl. It also evaluates the text that was removed and shows that blocklist filtering disproportionately removes text from and about minority individuals.
IKE - An Interactive Tool for Knowledge Extraction
TLDR
IKE is a new extraction tool that performs fast, interactive bootstrapping to develop high-quality extraction patterns for targeted relations and is the first interactive extraction tool to seamlessly integrate symbolic and distributional methods for search.