Never-Ending Learning
TLDR
The Never-Ending Language Learner, which achieves some of the desired properties of a never-ending learner, is described, and lessons learned are discussed.
A Dataset of Peer Reviews (PeerRead): Collection, Insights and NLP Applications
TLDR
The first public dataset of scientific peer reviews available for research purposes (PeerRead v1) is presented and it is shown that simple models can predict whether a paper is accepted with up to 21% error reduction compared to the majority baseline.
Keyword search on external memory data graphs
TLDR
This paper proposes a graph representation technique that combines a condensed version of the graph (the "supernode graph") which is always memory resident, along with whatever parts of the detailed graph are in a cache, to form a multi-granular graph representation.
Tracking State Changes in Procedural Text: a Challenge Dataset and Models for Process Paragraph Comprehension
TLDR
A new dataset and models for comprehending paragraphs about processes, an important genre of text describing a dynamic world, are presented and two new neural models that exploit alternative mechanisms for state prediction are introduced, in particular using LSTM input encoding and span prediction.
Reasoning about Actions and State Changes by Injecting Commonsense Knowledge
TLDR
This paper shows how the predicted effects of actions in the context of a paragraph can be improved in two ways: by incorporating global, commonsense constraints (e.g., a non-existent entity cannot be destroyed), and by biasing reading with preferences from large-scale corpora.
WebSets: extracting sets of entities from the web using unsupervised information extraction
TLDR
This work describes an open-domain information extraction method for extracting concept-instance pairs from an HTML corpus that relies on a novel approach for clustering terms found in HTML tables, and then assigning concept names to these clusters using Hearst patterns.
Pretrained Language Models for Sequential Sentence Classification
TLDR
This work constructs a joint sentence representation that allows BERT Transformer layers to directly utilize contextual information from all words in all sentences, and achieves state-of-the-art results on four datasets, including a new dataset of structured scientific abstracts.
Domain-Targeted, High Precision Knowledge Extraction
TLDR
This work has created a domain-targeted, high-precision knowledge extraction pipeline, leveraging Open IE, crowdsourcing, and a novel canonical schema learning algorithm (called CASI), that produces high-precision knowledge targeted to a particular domain, in this case elementary science.
Structure, tie persistence and event detection in large phone and SMS networks
TLDR
This paper studies the communication records of 2 million anonymized customers of a large mobile phone company with 50 million interactions over a period of 6 months and proposes a change-point detection method in user behaviors using eigenvalue analysis of various behavioral features extracted over time.
From 'F' to 'A' on the N.Y. Regents Science Exams: An Overview of the Aristo Project
TLDR
Unprecedented success on the Grade 8 New York Regents Science Exam, where for the first time a system scores more than 90% on the exam's non-diagram, multiple choice (NDMC) questions, demonstrates that modern NLP methods can result in mastery on this task.