The Never-Ending Language Learner (NELL), which achieves some of the desired properties of a never-ending learner, is described, and lessons learned are discussed.
A Dataset of Peer Reviews (PeerRead): Collection, Insights and NLP Applications
The first public dataset of scientific peer reviews available for research purposes (PeerRead v1) is presented and it is shown that simple models can predict whether a paper is accepted with up to 21% error reduction compared to the majority baseline.
Keyword search on external memory data graphs
This paper proposes a graph representation technique that combines a condensed version of the graph (the "supernode graph") which is always memory resident, along with whatever parts of the detailed graph are in a cache, to form a multi-granular graph representation.
Tracking State Changes in Procedural Text: a Challenge Dataset and Models for Process Paragraph Comprehension
A new dataset and models for comprehending paragraphs about processes, an important genre of text describing a dynamic world, are presented and two new neural models that exploit alternative mechanisms for state prediction are introduced, in particular using LSTM input encoding and span prediction.
Pretrained Language Models for Sequential Sentence Classification
- Arman Cohan, Iz Beltagy, Daniel King, Bhavana Dalvi, Daniel S. Weld
- Computer Science, EMNLP
- 9 September 2019
This work constructs a joint sentence representation that allows BERT Transformer layers to directly utilize contextual information from all words in all sentences, and achieves state-of-the-art results on four datasets, including a new dataset of structured scientific abstracts.
ProofWriter: Generating Implications, Proofs, and Abductive Statements over Natural Language
This work shows that a generative model, called ProofWriter, can reliably generate both implications of a theory and the natural language proofs that support them, and shows that generative techniques can perform a type of abduction with high precision.
Explaining Answers with Entailment Trees
ENTAILMENTBANK, the first dataset to contain multistep entailment trees, is created. It provides a new type of dataset (multistep entailments) and baselines, offering the community a new avenue for generating richer, more systematic explanations.
Reasoning about Actions and State Changes by Injecting Commonsense Knowledge
- Niket Tandon, Bhavana Dalvi, Joel Grus, Wen-tau Yih, Antoine Bosselut, Peter Clark
- Computer Science, EMNLP
- 29 August 2018
This paper shows how the predicted effects of actions in the context of a paragraph can be improved in two ways: by incorporating global, commonsense constraints (e.g., a non-existent entity cannot be destroyed), and by biasing reading with preferences from large-scale corpora.
Domain-Targeted, High Precision Knowledge Extraction
This work creates a domain-targeted, high-precision knowledge extraction pipeline, leveraging Open IE, crowdsourcing, and a novel canonical schema learning algorithm (called CASI), that produces high-precision knowledge targeted to a particular domain, in this case elementary science.
WebSets: extracting sets of entities from the web using unsupervised information extraction
This work describes an open-domain information extraction method for extracting concept-instance pairs from an HTML corpus that relies on a novel approach for clustering terms found in HTML tables, and then assigning concept names to these clusters using Hearst patterns.
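The cluster-naming step in the WebSets summary can be illustrated with a minimal sketch. It is not the WebSets implementation: it assumes a single Hearst pattern ("X such as A, B, and C"), single-word concepts and instances, and a simple majority vote. It also assumes the term clusters have already been built from HTML tables by some earlier step. The names `hearst_pairs` and `label_cluster` are hypothetical.

```python
import re
from collections import Counter

# Assumption: one classic Hearst pattern, "<concept> such as <a>, <b>, and <c>".
# Longer separators come first so ", and " is preferred over the bare ", ".
SUCH_AS = re.compile(
    r"(\w+) such as ((?:\w+(?:, and |, or | and | or |, ))*\w+)"
)
SEPARATOR = re.compile(r",\s*(?:and|or)\s+|\s+(?:and|or)\s+|,\s*")


def hearst_pairs(text):
    """Return (concept, instance) pairs found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(text):
        concept = match.group(1).lower()
        for instance in SEPARATOR.split(match.group(2)):
            if instance:
                pairs.append((concept, instance.lower()))
    return pairs


def label_cluster(cluster, pairs):
    """Pick the concept that the pattern links to the most cluster members.

    Hypothetical voting scheme: each (concept, instance) pair whose instance
    is in the cluster counts as one vote for that concept.
    """
    members = {term.lower() for term in cluster}
    votes = Counter(concept for concept, inst in pairs if inst in members)
    return votes.most_common(1)[0][0] if votes else None


if __name__ == "__main__":
    corpus = (
        "countries such as France, Spain, and Italy border the sea. "
        "Cities such as Paris and Rome attract tourists."
    )
    pairs = hearst_pairs(corpus)
    print(label_cluster({"France", "Italy", "Germany"}, pairs))
```

Here the cluster {France, Italy, Germany} is labeled with the concept that the pattern associates with most of its members. A cluster with no matching instances gets no label.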