Neural Motifs: Scene Graph Parsing with Global Context
TLDR
This work analyzes the role of motifs (regularly appearing substructures) in scene graphs and introduces Stacked Motif Networks, a new architecture designed to capture higher-order motifs in scene graphs that improves on the previous state of the art by an average relative improvement of 3.6% across evaluation settings.
SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference
TLDR
This paper introduces the task of grounded commonsense inference, unifying natural language inference and commonsense reasoning, and proposes Adversarial Filtering (AF), a novel procedure that constructs a de-biased dataset by iteratively training an ensemble of stylistic classifiers and using them to filter the data.
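Since the filtering procedure is described here only at a high level, a minimal sketch may help. It follows the spirit of AF: repeatedly train classifiers on held-out splits, then discard the examples they solve most easily. The feature matrix `X`, the logistic-regression stand-in for the "stylistic classifiers," and all hyperparameters below are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of adversarial filtering over precomputed features X
# and labels y. All names and hyperparameters are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

def adversarial_filter(X, y, n_iters=5, n_partitions=10,
                       test_frac=0.2, drop_per_iter=500):
    rng = np.random.default_rng(0)
    keep = np.arange(len(y))  # indices of examples still in the dataset
    for _ in range(n_iters):
        if len(keep) <= drop_per_iter:
            break  # avoid emptying the dataset
        correct = np.zeros(len(keep))
        counts = np.zeros(len(keep))
        for _ in range(n_partitions):
            # random train/test partition of the surviving examples
            perm = rng.permutation(len(keep))
            n_test = int(test_frac * len(keep))
            test, train = perm[:n_test], perm[n_test:]
            clf = LogisticRegression(max_iter=1000)
            clf.fit(X[keep[train]], y[keep[train]])
            preds = clf.predict(X[keep[test]])
            correct[test] += preds == y[keep[test]]
            counts[test] += 1
        # predictability: fraction of held-out evaluations each example passes
        score = np.divide(correct, np.maximum(counts, 1))
        easy = np.argsort(-score)[:drop_per_iter]  # most predictable examples
        keep = np.delete(keep, easy)               # filter them out
    return keep  # indices of the de-biased subset
```

The key design choice is that each example is only ever scored by classifiers that never saw it during training, so the score reflects exploitable stylistic bias rather than memorization.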
From Recognition to Cognition: Visual Commonsense Reasoning
TLDR
To move towards cognition-level understanding, a new reasoning engine, Recognition to Cognition Networks (R2C), is presented that models the necessary layered inferences for grounding, contextualization, and reasoning.
Defending Against Neural Fake News
TLDR
This work presents Grover, a model for controllable text generation, and finds that the best current discriminators can distinguish neural fake news from real, human-written news with 73% accuracy, assuming access to a moderate level of training data; the best defense against Grover turns out to be Grover itself, with 92% accuracy.
MOSI: Multimodal Corpus of Sentiment Intensity and Subjectivity Analysis in Online Opinion Videos
TLDR
This paper introduces the first opinion-level annotated corpus of sentiment and subjectivity analysis in online videos, the Multimodal Opinion-level Sentiment Intensity (MOSI) dataset, which is rigorously annotated with labels for subjectivity, sentiment intensity, per-frame and per-opinion visual features, and per-millisecond audio features.
Multimodal Sentiment Intensity Analysis in Videos: Facial Gestures and Verbal Messages
TLDR
This article addresses the fundamental question of exploiting the dynamics between visual gestures and verbal messages to better model sentiment, introducing the first multimodal dataset with opinion-level sentiment intensity annotations and proposing a new computational representation, called the multimodal dictionary, based on a language-gesture study.
HellaSwag: Can a Machine Really Finish Your Sentence?
TLDR
The construction of HellaSwag, a new challenge dataset, and its resulting difficulty shed light on the inner workings of deep pretrained models and suggest a new path forward for NLP research, in which benchmarks co-evolve with the state of the art in an adversarial way, so as to present ever-harder challenges.
PIQA: Reasoning about Physical Commonsense in Natural Language
TLDR
The task of physical commonsense reasoning and a corresponding benchmark dataset, Physical Interaction: Question Answering (PIQA), are introduced, and an analysis of the dimensions of knowledge that existing models lack is provided, offering significant opportunities for future research.
Adversarial Filters of Dataset Biases
TLDR
This work presents extensive supporting evidence that AFLite is broadly applicable for reduction of measurable dataset biases, and that models trained on the filtered datasets yield better generalization to out-of-distribution tasks.
Zero-Shot Activity Recognition with Verb Attribute Induction
TLDR
Experimental results confirm that action attributes inferred from language can provide a predictive signal for zero-shot prediction of previously unseen activities.
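The attribute-based zero-shot setup in this last entry can be made concrete with a small sketch: score each unseen activity by comparing attributes predicted from the input against per-class attribute vectors inferred from language. The cosine-similarity decision rule and all array names below are illustrative assumptions, not the paper's exact method.

```python
# Minimal sketch of attribute-based zero-shot activity recognition.
# pred_attrs:  (n_examples, n_attrs) attributes predicted per example
# class_attrs: (n_classes,  n_attrs) attributes inferred from language
import numpy as np

def zero_shot_predict(pred_attrs, class_attrs):
    """Assign each example to the unseen class whose language-inferred
    attribute vector is most similar (cosine) to its predicted attributes."""
    a = pred_attrs / np.linalg.norm(pred_attrs, axis=1, keepdims=True)
    b = class_attrs / np.linalg.norm(class_attrs, axis=1, keepdims=True)
    return np.argmax(a @ b.T, axis=1)  # index of best-matching unseen class
```

Because the class attribute vectors come from language rather than labeled examples, the same scoring rule extends to activities never seen during training.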