Local and Global Algorithms for Disambiguation to Wikipedia
TLDR
This work analyzes approaches that utilize information from Wikipedia link structure to arrive at coherent sets of disambiguations for a given document, and compares them to more traditional (local) approaches.
Unsupervised named-entity extraction from the Web: An experimental study
TLDR
An overview of KnowItAll's novel architecture and design principles is presented, emphasizing its distinctive ability to extract information without any hand-labeled training examples, and three distinct ways to address this challenge are presented and evaluated.
Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks
TLDR
It is consistently found that multi-phase adaptive pretraining offers large gains in task performance, and it is shown that adapting to a task corpus augmented using simple data selection strategies is an effective alternative, especially when resources for domain-adaptive pretraining might be unavailable.
Web-scale information extraction in KnowItAll (preliminary results)
TLDR
KnowItAll, a system that aims to automate the tedious process of extracting large collections of facts from the web in an autonomous, domain-independent, and scalable manner, is introduced.
Construction of the Literature Graph in Semantic Scholar
TLDR
This paper reduces literature graph construction to familiar NLP tasks, points out research challenges due to differences from standard formulations of these tasks, and reports empirical results for each task.
TabEL: Entity Linking in Web Tables
TLDR
TabEL differs from previous work by weakening the assumption that the semantics of a table can be mapped to pre-defined types and relations found in the target KB, and enforces soft constraints in the form of a graphical model that assigns higher likelihood to sets of entities that tend to co-occur in Wikipedia documents and tables.
Locating Complex Named Entities in Web Text
TLDR
This paper investigates a novel approach to the first step in Web NER, locating complex named entities in Web text, and shows that named entities can be viewed as a species of multiword units, which can be detected by accumulating n-gram statistics over the Web corpus.
Abductive Commonsense Reasoning
TLDR
This study introduces a challenge dataset, ART, that consists of over 20k commonsense narrative contexts and 200k explanations, and conceptualizes two new tasks: Abductive NLI, a multiple-choice question answering task for choosing the more likely explanation, and Abductive NLG, a conditional generation task for explaining given observations in natural language.
Definition Modeling: Learning to Define Word Embeddings in Natural Language
TLDR
The results show that a model that controls dependencies between the word being defined and the definition words performs significantly better, and that a character-level convolution layer designed to leverage morphology can complement word-level embeddings.
KnowItNow: Fast, Scalable Information Extraction from the Web
TLDR
A novel architecture for IE that obviates queries to commercial search engines is introduced, embodied in a system called KnowItNow that performs high-precision IE in minutes instead of days, and the tradeoff between recall and speed is quantified.