Purely bottom-up, unsupervised segmentation of a single image into two segments remains a challenging task for computer vision. Co-segmentation is the problem of jointly segmenting several images that share similar foreground objects but have different backgrounds. In this work, we combine existing tools from bottom-up image segmentation, such as normalized cuts, …
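For reference, normalized cuts is usually computed through a spectral relaxation of the normalized graph Laplacian. The sketch below is a generic illustration of that step on a made-up toy affinity matrix, not the co-segmentation method described above.

```python
import numpy as np

def normalized_cut_embedding(W, k=2):
    """Spectral embedding used by normalized cuts.

    W: symmetric affinity matrix between pixels/superpixels.
    Returns the k eigenvectors of the normalized Laplacian with the
    smallest eigenvalues (skipping the trivial constant one).
    """
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d + 1e-12))
    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}
    L = np.eye(W.shape[0]) - D_inv_sqrt @ W @ D_inv_sqrt
    eigvals, eigvecs = np.linalg.eigh(L)
    return eigvecs[:, 1:k + 1]  # drop the constant eigenvector

# Toy usage: two loosely connected groups of nodes separate into two segments.
W = np.array([[0, 1, 1, 0.05, 0],
              [1, 0, 1, 0,    0],
              [1, 1, 0, 0,    0.05],
              [0.05, 0, 0, 0, 1],
              [0, 0, 0.05, 1, 0]], dtype=float)
emb = normalized_cut_embedding(W, k=1)
labels = (emb[:, 0] > 0).astype(int)
print(labels)
```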
We introduce a model for bidirectional retrieval of images and sentences through a deep, multi-modal embedding of visual and natural language data. Unlike previous models that directly map whole images or sentences into a common embedding space, our model works at a finer level and embeds fragments of images (objects) and fragments of sentences (typed dependency relations) …
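A common way to realize such fragment-level embeddings is to score an image-sentence pair by matching each sentence fragment to its best image fragment in the shared space. The snippet below is a simplified illustration of that idea; the matrices, dimensions, and the exact scoring rule are assumptions, not the paper's formulation.

```python
import numpy as np

def fragment_alignment_score(image_fragments, sentence_fragments):
    """Score an image-sentence pair at the fragment level.

    Each row of the two matrices is one embedded fragment (an object
    region or a dependency relation) in a shared space.  Every sentence
    fragment is matched to its best-scoring image fragment and the
    matches are summed -- a simplified fragment-alignment score.
    """
    sims = sentence_fragments @ image_fragments.T   # pairwise dot products
    return sims.max(axis=1).sum()                   # best image fragment per sentence fragment

rng = np.random.default_rng(0)
img = rng.normal(size=(5, 64))    # 5 detected object fragments
sent = rng.normal(size=(3, 64))   # 3 dependency-relation fragments
print(fragment_alignment_score(img, sent))
```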
Recurrent neural networks are powerful models for learning temporal patterns in sequential data. For a long time it was believed that recurrent networks are difficult to train with simple optimizers such as stochastic gradient descent, owing to the so-called vanishing gradient problem. In this paper, we show that learning longer-term patterns in real data …
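One family of structural modifications that eases learning of long-range patterns with plain SGD adds a slowly changing context state alongside the ordinary hidden state. The sketch below illustrates that idea; the update equations, dimensions, and the leak rate alpha are illustrative assumptions rather than the exact model from the paper.

```python
import numpy as np

def slow_context_step(x, h, c, Wx, Wh, Wc, B, alpha=0.95):
    """One step of a recurrent cell augmented with a slowly changing
    context state (a hedged sketch, not the paper's exact equations).

    c is a leaky average of the input, so information persists over
    long spans; h is a standard sigmoid hidden state that also reads c.
    """
    c_new = (1.0 - alpha) * (B @ x) + alpha * c          # slow context units
    h_new = 1.0 / (1.0 + np.exp(-(Wx @ x + Wh @ h + Wc @ c_new)))
    return h_new, c_new

# Toy dimensions: 10-dim input, 20 hidden units, 5 context units.
rng = np.random.default_rng(0)
Wx, Wh, Wc = rng.normal(size=(20, 10)), rng.normal(size=(20, 20)), rng.normal(size=(20, 5))
B = rng.normal(size=(5, 10))
h, c = np.zeros(20), np.zeros(5)
for t in range(100):
    x = rng.normal(size=10)
    h, c = slow_context_step(x, h, c, Wx, Wh, Wc, B)
```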
In this paper, we tackle the problem of performing efficient co-localization in images and videos. Co-localization is the problem of simultaneously localizing (with bounding boxes) objects of the same class across a set of distinct images or videos. Building on recent state-of-the-art methods, we show how to naturally incorporate temporal …
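Co-localization is often posed as choosing one candidate box per image so that the chosen boxes are mutually similar. The sketch below uses a crude greedy refinement to illustrate the setup; the feature matrices and the greedy rule are assumptions and stand in for the joint optimization used by actual methods.

```python
import numpy as np

def colocalize(box_features):
    """Pick one candidate box per image so the chosen boxes look alike.

    box_features: list of (n_i, d) arrays, one per image, each row the
    descriptor of one candidate bounding box.  A box is scored by its
    mean similarity to the currently selected boxes in the other images,
    and the selection is refined for a few sweeps.
    """
    chosen = [0] * len(box_features)                     # start with the first candidate everywhere
    for _ in range(10):                                  # refinement sweeps
        for i, feats in enumerate(box_features):
            others = np.stack([box_features[j][chosen[j]]
                               for j in range(len(box_features)) if j != i])
            sims = feats @ others.T                      # similarity to the other images' picks
            chosen[i] = int(sims.mean(axis=1).argmax())
    return chosen

rng = np.random.default_rng(1)
images = [rng.normal(size=(8, 32)) for _ in range(4)]    # 4 images, 8 candidate boxes each
print(colocalize(images))
```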
We present a new clustering algorithm based on a convex relaxation of hierarchical clustering, which yields a family of objective functions with a natural geometric interpretation. We give efficient algorithms for computing the continuous regularization path of solutions and discuss the relative advantages of the parameters. Our method experimentally …
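A standard convex relaxation of this kind penalizes pairwise differences between cluster centroids, so that increasing the regularization weight along the path fuses centroids into a hierarchy. The function below writes out one such objective for illustration; the weights and the exact form of the penalty are assumptions and may differ from the paper's family of objectives.

```python
import numpy as np

def convex_clustering_objective(U, X, lam, weights=None):
    """A standard 'convex clustering' objective (illustrative form):

        0.5 * sum_i ||u_i - x_i||^2  +  lam * sum_{i<j} w_ij ||u_i - u_j||

    As lam grows along the regularization path, centroids u_i fuse,
    producing a hierarchy of clusterings.
    """
    n = X.shape[0]
    if weights is None:
        weights = np.ones((n, n))
    fit = 0.5 * np.sum((U - X) ** 2)
    fuse = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            fuse += weights[i, j] * np.linalg.norm(U[i] - U[j])
    return fit + lam * fuse

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
print(convex_clustering_objective(X.copy(), X, lam=0.5))
```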
Visual question answering (VQA) is an interesting learning setting for evaluating the abilities and shortcomings of current systems for image understanding. Many of the recently proposed VQA systems include attention or memory mechanisms designed to perform "reasoning". Furthermore, for the task of multiple-choice VQA, nearly all of these systems train a …
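For multiple-choice VQA, a minimal baseline simply scores each (image, question, candidate answer) triple and picks the argmax. The sketch below shows such a scoring step with a single linear layer; it is a toy baseline under assumed feature dimensions, not any particular published system.

```python
import numpy as np

def score_choices(image_feat, question_feat, answer_feats, W):
    """Score each multiple-choice answer for one (image, question) pair.

    A deliberately simple baseline: concatenate image, question and
    candidate-answer features, score the triple with a linear layer W,
    and return the index of the highest-scoring choice.
    """
    scores = []
    for a in answer_feats:
        triple = np.concatenate([image_feat, question_feat, a])
        scores.append(float(W @ triple))
    return int(np.argmax(scores))

rng = np.random.default_rng(2)
img, q = rng.normal(size=512), rng.normal(size=300)
choices = [rng.normal(size=300) for _ in range(4)]        # 4 candidate answers
W = rng.normal(size=512 + 300 + 300)
print(score_choices(img, q, choices, W))
```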
Continuous word representations, trained on large unlabeled corpora, are useful for many natural language processing tasks. Many popular models that learn such representations ignore the morphology of words by assigning a distinct vector to each word. This is a limitation, especially for morphologically rich languages with large vocabularies and many rare words …
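Subword-aware models typically represent a word as the sum of vectors for its character n-grams, so rare or unseen words still receive sensible embeddings through shared n-grams. The sketch below illustrates that construction; the n-gram range, dimensionality, and hashing-based initialization are assumptions for the example.

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word with boundary markers, as used by
    subword-aware embedding models (the n-gram range is illustrative)."""
    w = f"<{word}>"
    return [w[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

def word_vector(word, ngram_table, dim=100):
    """Represent a word as the sum of its character n-gram vectors, so
    rare and unseen words still get a meaningful representation."""
    vec = np.zeros(dim)
    for g in char_ngrams(word):
        if g not in ngram_table:   # lazily allocate a vector per n-gram
            ngram_table[g] = np.random.default_rng(abs(hash(g)) % (2**32)).normal(size=dim)
        vec += ngram_table[g]
    return vec

table = {}
v1 = word_vector("running", table)
v2 = word_vector("runner", table)   # shares n-grams such as "<run" and "runn" with "running"
print(float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))))
```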