ImageNet Large Scale Visual Recognition Challenge
TLDR
The creation of this benchmark dataset and the resulting advances in object recognition are described, and state-of-the-art computer vision accuracy is compared with human accuracy.
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
TLDR
The Visual Genome dataset is presented, which contains over 108K images where each image has an average of 35 objects, 26 attributes, and 21 pairwise relationships between objects; it represents the densest and largest dataset of image descriptions, objects, attributes, relationships, and question-answer pairs.
Visual Relationship Detection with Language Priors
TLDR
This work proposes a model that can scale to predict thousands of types of relationships from a few examples and improves on prior work by leveraging language priors from semantic word embeddings to fine-tune the likelihood of a predicted relationship.
Visual7W: Grounded Question Answering in Images
TLDR
A semantic link between textual descriptions and image regions, established through object-level grounding, enables a new type of QA with visual answers in addition to the textual answers used in previous work; a novel LSTM model with spatial attention is proposed to tackle the 7W QA tasks.
The future of crowd work
TLDR
This paper outlines a framework that will enable crowd work that is complex, collaborative, and sustainable, and lays out research challenges in twelve major areas: workflow, task assignment, hierarchy, real-time response, synchronous collaboration, quality control, crowds guiding AIs, AIs guiding crowds, platforms, job design, reputation, and motivation.
Image retrieval using scene graphs
TLDR
A conditional random field model is introduced that reasons about possible groundings of scene graphs to test images; the full model improves object localization compared to baseline methods and outperforms retrieval methods that use only objects or low-level image features.
Soylent: a word processor with a crowd inside
TLDR
Soylent, a word processing interface that enables writers to call on Mechanical Turk workers to shorten, proofread, and otherwise edit parts of their documents on demand, is presented, along with the Find-Fix-Verify crowd programming pattern, which splits tasks into a series of generation and review stages.
Twitinfo: aggregating and visualizing microblogs for event exploration
TLDR
TwitInfo allows users to browse a large collection of tweets using a timeline-based display that highlights peaks of high tweet activity, and can identify 80-100% of manually labeled peaks, facilitating a relatively complete view of each event studied.
Empath: Understanding Topic Signals in Large-Scale Text
TLDR
Empath is a tool that can generate and validate new lexical categories on demand from a small set of seed terms; it draws connotations between words and phrases by learning a neural embedding across more than 1.8 billion words of modern fiction.
Short and tweet: experiments on recommending content from information streams
TLDR
This paper studied content recommendation on Twitter to better direct user attention and explored three separate dimensions in designing such a recommender: content sources, topic interest models for users, and social voting.