Scene Parsing through ADE20K Dataset
- Bolei Zhou, Hang Zhao, Xavier Puig, S. Fidler, Adela Barriuso, A. Torralba
- Computer Science · Computer Vision and Pattern Recognition
- 21 July 2017
The ADE20K dataset, spanning diverse annotations of scenes, objects, parts of objects, and in some cases even parts of parts, is introduced and it is shown that the trained scene parsing networks can lead to applications such as image content removal and scene synthesis.
Skip-Thought Vectors
- Ryan Kiros, Yukun Zhu, S. Fidler
- Computer Science · NIPS
- 22 June 2015
We describe an approach for unsupervised learning of a generic, distributed sentence encoder. Using the continuity of text from books, we train an encoder-decoder model that tries to reconstruct the…
Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books
- Yukun Zhu, Ryan Kiros, S. Fidler
- Computer Science · IEEE International Conference on Computer Vision
- 22 June 2015
To align movies and books, this work proposes a neural sentence embedding trained in an unsupervised way on a large corpus of books, together with a video-text neural embedding for computing similarities between movie clips and sentences in the book.
The Role of Context for Object Detection and Semantic Segmentation in the Wild
- Roozbeh Mottaghi, Xianjie Chen, A. Yuille
- Computer Science · IEEE Conference on Computer Vision and Pattern…
- 23 June 2014
A novel deformable part-based model is proposed that exploits both local context around each candidate detection and global context at the level of the scene; this context significantly helps in detecting objects at all scales.
VSE++: Improving Visual-Semantic Embeddings with Hard Negatives
- Fartash Faghri, David J. Fleet, J. Kiros, S. Fidler
- Computer Science · British Machine Vision Conference
- 1 July 2017
This work introduces a simple change to the loss functions commonly used for multi-modal embeddings. Inspired by hard negative mining, the use of hard negatives in structured prediction, and ranking loss functions, the change yields significant gains in retrieval performance.
Semantic Understanding of Scenes Through the ADE20K Dataset
- Bolei Zhou, Hang Zhao, Xavier Puig, S. Fidler, Adela Barriuso, A. Torralba
- Computer Science · International Journal of Computer Vision
- 18 August 2016
This work presents a densely annotated dataset ADE20K, which spans diverse annotations of scenes, objects, parts of objects, and in some cases even parts of parts, and shows that the networks trained on this dataset are able to segment a wide variety of scenes and objects.
Scaling Egocentric Vision: The EPIC-KITCHENS Dataset
- D. Damen, Hazel Doughty, Michael Wray
- Computer Science · ArXiv
- 8 April 2018
This paper introduces EPIC-KITCHENS, a large-scale egocentric video benchmark recorded by 32 participants in their native kitchen environments. The participants narrated their own videos after recording, so the narrations reflect true intention, and ground truths were crowd-sourced based on these narrations.
Detect What You Can: Detecting and Representing Objects Using Holistic Models and Body Parts
- Xianjie Chen, Roozbeh Mottaghi, Xiaobai Liu, S. Fidler, R. Urtasun, A. Yuille
- Computer Science · IEEE Conference on Computer Vision and Pattern…
- 8 June 2014
This work proposes a novel approach to handling large deformations and partial occlusions in animals by representing them in terms of body parts. Applied to the six animal categories in the PASCAL VOC dataset, it significantly improves on the state of the art (by 4.1% AP) and provides a richer representation for objects.
Monocular 3D Object Detection for Autonomous Driving
- Xiaozhi Chen, Kaustav Kundu, Ziyu Zhang, Huimin Ma, S. Fidler, R. Urtasun
- Computer Science · Computer Vision and Pattern Recognition
- 27 June 2016
This work proposes an energy minimization approach that places object candidates in 3D using the fact that objects should be on the ground plane, and achieves the best detection performance among published monocular competitors on the challenging KITTI benchmark.
MovieQA: Understanding Stories in Movies through Question-Answering
- Makarand Tapaswi, Yukun Zhu, R. Stiefelhagen, A. Torralba, R. Urtasun, S. Fidler
- Computer Science · Computer Vision and Pattern Recognition
- 9 December 2015
The MovieQA dataset, which aims to evaluate automatic story comprehension from both video and text, is introduced and existing QA techniques are extended to show that question-answering with such open-ended semantics is hard.
...