PPDB: The Paraphrase Database
TLDR
The 1.0 release of the paraphrase database, PPDB, contains over 220 million paraphrase pairs, consisting of 73 million phrasal and 8 million lexical paraphrases, as well as 140 million paraphrase patterns, which capture many meaning-preserving syntactic transformations.
PPDB 2.0: Better paraphrase ranking, fine-grained entailment relations, word embeddings, and style classification
TLDR
PPDB 2.0 includes a discriminatively re-ranked set of paraphrases that achieve a higher correlation with human judgments than PPDB 1.0's heuristic rankings.
Hypothesis Only Baselines in Natural Language Inference
TLDR
This approach, which is referred to as a hypothesis-only model, is able to significantly outperform a majority-class baseline across a number of NLI datasets and suggests that statistical irregularities may allow a model to perform NLI in some datasets beyond what should be achievable without access to the context.
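As an illustration of the idea (not the paper's actual models or data), a hypothesis-only baseline can be approximated by training any text classifier on the hypothesis alone and comparing it against the majority class; the sketch below uses bag-of-words logistic regression over toy placeholder examples.

```python
# Minimal sketch of a hypothesis-only NLI baseline (illustrative only; the toy
# examples below are placeholders, not SNLI/MultiNLI data, and the paper's own
# models are more sophisticated than bag-of-words logistic regression).
from collections import Counter
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Each item is (premise, hypothesis, label); the premise is deliberately ignored.
train = [
    ("A man is playing a guitar.", "A person is making music.", "entailment"),
    ("A man is playing a guitar.", "Nobody is holding an instrument.", "contradiction"),
    ("A dog runs through a field.", "The dog is chasing a ball.", "neutral"),
    ("Two kids are at the beach.", "Children are outdoors.", "entailment"),
]
hypotheses = [h for _, h, _ in train]
labels = [y for _, _, y in train]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(hypotheses)          # features come from the hypothesis only
clf = LogisticRegression(max_iter=1000).fit(X, labels)

# Majority-class baseline for comparison.
majority = Counter(labels).most_common(1)[0][0]
print("majority class:", majority)
print("hypothesis-only prediction:",
      clf.predict(vectorizer.transform(["Children are outdoors."]))[0])
```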
What do you learn from context? Probing for sentence structure in contextualized word representations
TLDR
A novel edge probing task design is introduced and a broad suite of sub-sentence tasks derived from the traditional structured NLP pipeline is constructed to investigate how sentence structure is encoded across a range of syntactic, semantic, local, and long-range phenomena.
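A rough sketch of the edge-probing setup, under the assumption that the contextual encoder is frozen and only a lightweight classifier over pooled span representations is trained; random vectors stand in for real contextualized embeddings, and the paper's span-pair and projection details are omitted.

```python
# Edge-probing sketch: pool the frozen token vectors inside each labeled span
# and train only a small classifier on top. Random vectors stand in for real
# contextualized representations here.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
hidden = 768                      # typical contextual-encoder width (assumption)

def span_repr(token_vecs, start, end):
    """Mean-pool the frozen token vectors of a span [start, end)."""
    return token_vecs[start:end].mean(axis=0)

# Fake "sentences": each is a (num_tokens x hidden) matrix of frozen embeddings,
# plus one labeled span per sentence (e.g. a constituent or SRL argument).
examples, labels = [], []
for label in ["NP", "VP", "PP"] * 10:
    sent = rng.normal(size=(12, hidden))
    examples.append(span_repr(sent, 3, 6))
    labels.append(label)

probe = MLPClassifier(hidden_layer_sizes=(256,), max_iter=300, random_state=0)
probe.fit(np.stack(examples), labels)   # only the probe is trained; the encoder stays frozen
```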
Annotated Gigaword
TLDR
This work has created layers of annotation on the English Gigaword v.5 corpus to render it useful as a standardized corpus for knowledge extraction and distributional semantics, and provides to the community a public reference set based on current state-of-the-art syntactic analysis and coreference resolution.
Gender Bias in Coreference Resolution
TLDR
A novel, Winograd schema-style set of minimal pair sentences that differ only by pronoun gender is introduced, and systematic gender bias in three publicly-available coreference resolution systems is evaluated and confirmed.
Open Domain Targeted Sentiment
TLDR
The intuition behind this work is that sentiment expressed towards an entity, targeted sentiment, may be viewed as a span of sentiment expressed across the entity, and this representation allows us to model sentiment detection as a sequence tagging problem, jointly discovering people and organizations along with whether there is sentiment directed towards them.
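One way to picture the sequence-tagging formulation is a collapsed BIO scheme in which each token label jointly encodes entity boundaries and sentiment polarity; the labels below are illustrative and may not match the paper's exact tag set.

```python
# Illustrative sketch of a collapsed tagging scheme: entity span and sentiment
# polarity are folded into one BIO-style label per token, so joint entity and
# targeted-sentiment detection becomes ordinary sequence tagging.
# (Example sentence and label names are invented for illustration.)
tokens = ["I", "love", "AcmeCorp", "'s", "service", "but", "BigCorp", "support", "is", "awful"]
tags   = ["O", "O", "B-ORG-positive", "O", "O", "O", "B-ORG-negative", "O", "O", "O"]

for tok, tag in zip(tokens, tags):
    print(f"{tok:10s} {tag}")
```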
ReCoRD: Bridging the Gap between Human and Machine Commonsense Reading Comprehension
TLDR
This work presents a large-scale dataset, ReCoRD, for machine reading comprehension requiring commonsense reasoning, and demonstrates that the performance of state-of-the-art MRC systems falls far behind human performance.
Answer Extraction as Sequence Tagging with Tree Edit Distance
TLDR
A linear-chain Conditional Random Field is constructed over pairs of questions and their candidate answer sentences, learning the association between questions and answer types, and casting answer extraction as a sequence tagging problem for the first time.
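A minimal sketch of the tagging formulation, assuming the third-party sklearn_crfsuite package and toy hand-built features; the paper's tree-edit-distance alignment features are not reproduced here.

```python
# Answer extraction as BIO sequence tagging with a linear-chain CRF
# (illustrative only; toy features and data, not the paper's implementation).
import sklearn_crfsuite

def token_features(question, sent, i):
    tok = sent[i]
    return {
        "lower": tok.lower(),
        "is_digit": tok.isdigit(),
        "in_question": tok.lower() in {q.lower() for q in question},  # crude overlap feature
        "prev": sent[i - 1].lower() if i > 0 else "<s>",
    }

question = "Who wrote Hamlet ?".split()
sentence = "Hamlet was written by William Shakespeare .".split()
gold     = ["O", "O", "O", "O", "B-ANS", "I-ANS", "O"]

X = [[token_features(question, sentence, i) for i in range(len(sentence))]]
y = [gold]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, y)
print(crf.predict(X)[0])   # tags marking the predicted answer span
```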
Efficient spoken term discovery using randomized algorithms
TLDR
This paper investigates the use of randomized algorithms that operate directly on the raw acoustic features to produce sparse approximate similarity matrices in O(n) space and O(n log n) time, and demonstrates that these techniques enable spoken term discovery performance capable of outperforming a model-based strategy in the zero-resource setting.
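As a loose illustration of the randomized-hashing idea, the sketch below uses sign random projections (an LSH family for cosine similarity) to bucket acoustic-like frames so that only colliding pairs are compared; the features are synthetic, and the paper's full pipeline (e.g. segmental dynamic time warping over matched regions) is not shown.

```python
# Sign-random-projection hashing: frames that share a short binary signature are
# treated as candidate matches, keeping the similarity structure sparse instead
# of computing a dense n x n matrix. Random features stand in for real frames.
from collections import defaultdict
import numpy as np

rng = np.random.default_rng(0)
n_frames, dim, n_bits = 2000, 39, 12          # 39-dim MFCC-like frames (assumption)
frames = rng.normal(size=(n_frames, dim))

hyperplanes = rng.normal(size=(n_bits, dim))
codes = (frames @ hyperplanes.T > 0)          # one n_bits-bit signature per frame

buckets = defaultdict(list)
for i, code in enumerate(codes):
    buckets[code.tobytes()].append(i)

# Only frames sharing a signature are compared, giving a sparse set of
# candidate matches instead of all n*(n-1)/2 pairs.
candidate_pairs = [(i, j) for idx in buckets.values() if len(idx) > 1
                   for a, i in enumerate(idx) for j in idx[a + 1:]]
print(f"{len(candidate_pairs)} candidate pairs out of {n_frames * (n_frames - 1) // 2}")
```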