PPDB: The Paraphrase Database
The 1.0 release of the paraphrase database, PPDB, contains over 220 million paraphrase pairs, consisting of 73 million phrasal and 8 million lexical paraphrases, as well as 140 million paraphrase patterns, which capture many meaning-preserving syntactic transformations.
PPDB 2.0: Better paraphrase ranking, fine-grained entailment relations, word embeddings, and style classification
- Ellie Pavlick, Pushpendre Rastogi, Juri Ganitkevitch, Benjamin Van Durme, Chris Callison-Burch
- Computer Science, ACL
- 1 July 2015
PPDB 2.0 includes a discriminatively re-ranked set of paraphrases that achieve a higher correlation with human judgments than PPDB 1.0's heuristic rankings.
Hypothesis Only Baselines in Natural Language Inference
- Adam Poliak, Jason Naradowsky, Aparajita Haldar, Rachel Rudinger, Benjamin Van Durme
- Computer Science, *SEMEVAL
- 2 May 2018
This approach, which is referred to as a hypothesis-only model, is able to significantly outperform a majority-class baseline across a number of NLI datasets and suggests that statistical irregularities may allow a model to perform NLI in some datasets beyond what should be achievable without access to the context.
What do you learn from context? Probing for sentence structure in contextualized word representations
A novel edge probing task design is introduced, and a broad suite of sub-sentence tasks derived from the traditional structured NLP pipeline is constructed to investigate how sentence structure is encoded across a range of syntactic, semantic, local, and long-range phenomena.
Gender Bias in Coreference Resolution
- Rachel Rudinger, Jason Naradowsky, Brian Leonard, Benjamin Van Durme
- Computer Science, NAACL
- 25 April 2018
A novel, Winograd schema-style set of minimal pair sentences that differ only by pronoun gender is introduced, and systematic gender bias in three publicly available coreference resolution systems is evaluated and confirmed.
Annotated Gigaword
- Courtney Napoles, Matthew R. Gormley, Benjamin Van Durme
- Computer Science, AKBC-WEKEX@NAACL-HLT
- 7 June 2012
This work has created layers of annotation on the English Gigaword v.5 corpus to render it useful as a standardized corpus for knowledge extraction and distributional semantics, and provides to the community a public reference set based on current state-of-the-art syntactic analysis and coreference resolution.
ReCoRD: Bridging the Gap between Human and Machine Commonsense Reading Comprehension
- Sheng Zhang, Xiaodong Liu, Jingjing Liu, Jianfeng Gao, Kevin Duh, Benjamin Van Durme
- Computer Science, ArXiv
- 30 October 2018
This work presents a large-scale dataset, ReCoRD, for machine reading comprehension requiring commonsense reasoning, and demonstrates that the performance of state-of-the-art MRC systems falls far behind human performance.
Open Domain Targeted Sentiment
- Margaret Mitchell, Jacqui Aguilar, Theresa Wilson, Benjamin Van Durme
- Computer Science, EMNLP
- 1 October 2013
The intuition behind this work is that sentiment expressed towards an entity, targeted sentiment, may be viewed as a span of sentiment expressed across the entity, and this representation allows us to model sentiment detection as a sequence tagging problem, jointly discovering people and organizations along with whether there is sentiment directed towards them.
Collecting Diverse Natural Language Inference Problems for Sentence Representation Evaluation
We present a large-scale collection of diverse natural language inference (NLI) datasets that help provide insight into how well a sentence representation captures distinct types of reasoning. The…
Answer Extraction as Sequence Tagging with Tree Edit Distance
A linear-chain Conditional Random Field is constructed over pairs of questions and their possible answer sentences, learning the association between questions and answer types and casting answer extraction as an answer sequence tagging problem for the first time.