Publications
DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs
This work introduces DROP, a new reading comprehension benchmark that requires Discrete Reasoning Over the content of Paragraphs. It also presents a new model that combines reading comprehension methods with simple numerical reasoning to achieve 51% F1.
Supervised Open Information Extraction
This work presents a novel formulation of Open IE as a sequence tagging problem, which addresses challenges such as encoding multiple extractions for a predicate. It also presents a supervised model that outperforms the existing state-of-the-art Open IE systems on benchmark datasets.
Evaluating Gender Bias in Machine Translation
This work devises an automatic gender bias evaluation method, based on morphological analysis, for eight target languages with grammatical gender. It shows that four popular industrial MT systems and two recent state-of-the-art academic MT models are significantly prone to gender-biased translation errors in all tested target languages.
Crowdsourcing Question-Answer Meaning Representations
This work develops a crowdsourcing scheme showing that question-answer meaning representations (QAMRs) can be labeled with very little training. A qualitative analysis demonstrates that the crowd-generated question-answer pairs cover the vast majority of predicate-argument relationships in existing datasets.
Creating a Large Benchmark for Open Information Extraction
This work develops a methodology that leverages the recent QA-SRL annotation to create the first independent, large-scale Open IE annotation, and uses it to automatically compare the most prominent Open IE systems.
Getting More Out Of Syntax with PropS
This work presents PropS, an output representation designed to explicitly and uniformly express much of the proposition structure implied by syntax, along with an associated tool for extracting it from dependency trees.
The Right Tool for the Job: Matching Model and Instance Complexities
This work proposes a modification to contextual representation fine-tuning that allows an early (and fast) “exit” from neural network calculations for simple instances and a late (and accurate) exit for hard instances during inference.
Recognizing Mentions of Adverse Drug Reaction in Social Media Using Knowledge-Infused Recurrent Models
This work uses the CADEC corpus to train a recurrent neural network (RNN) transducer, integrated with knowledge graph embeddings of DBpedia, and shows the resulting model to be highly accurate.
Evaluating Question Answering Evaluation
This work studies the suitability of existing QA metrics and explores using BERTScore, a recently proposed metric for evaluating translation, for QA. Although BERTScore fails to provide stronger correlation with human judgements, the work finds that future efforts tailoring a BERT-based metric to QA evaluation may prove fruitful.
Active Learning for Coreference Resolution using Discrete Annotation
We improve upon pairwise annotation for active learning in coreference resolution by asking annotators to identify mention antecedents if a presented mention pair is deemed not coreferent. This…