Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation
- Liang-Chieh Chen, Yukun Zhu, G. Papandreou, Florian Schroff, Hartwig Adam
- Computer Science, European Conference on Computer Vision
- 7 February 2018
This work extends DeepLabv3 by adding a simple yet effective decoder module to refine the segmentation results, especially along object boundaries, and applies the depthwise separable convolution to both Atrous Spatial Pyramid Pooling and decoder modules, resulting in a faster and stronger encoder-decoder network.
Searching for MobileNetV3
- Andrew G. Howard, M. Sandler, Hartwig Adam
- Computer Science, IEEE International Conference on Computer Vision
- 6 May 2019
This paper starts the exploration of how automated search algorithms and network design can work together to harness complementary approaches improving the overall state of the art of MobileNets.
Skip-Thought Vectors
- Ryan Kiros, Yukun Zhu, S. Fidler
- Computer Science, NIPS
- 22 June 2015
We describe an approach for unsupervised learning of a generic, distributed sentence encoder. Using the continuity of text from books, we train an encoder-decoder model that tries to reconstruct the…
Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books
- Yukun Zhu, Ryan Kiros, S. Fidler
- Computer Science, IEEE International Conference on Computer Vision
- 22 June 2015
To align movies and books, this work proposes a neural sentence embedding trained in an unsupervised way from a large corpus of books, as well as a video-text neural embedding for computing similarities between movie clips and sentences in the book.
MovieQA: Understanding Stories in Movies through Question-Answering
- Makarand Tapaswi, Yukun Zhu, R. Stiefelhagen, A. Torralba, R. Urtasun, S. Fidler
- Computer Science, Computer Vision and Pattern Recognition
- 9 December 2015
The MovieQA dataset, which aims to evaluate automatic story comprehension from both video and text, is introduced and existing QA techniques are extended to show that question-answering with such open-ended semantics is hard.
Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation
- Bowen Cheng, Maxwell D. Collins, Liang-Chieh Chen
- Computer Science, Computer Vision and Pattern Recognition
- 22 November 2019
For the first time, a bottom-up approach delivers state-of-the-art results on panoptic segmentation and performs on par with several top-down approaches on the challenging COCO dataset.
3D Object Proposals for Accurate Object Class Detection
- Xiaozhi Chen, Kaustav Kundu, R. Urtasun
- Computer Science, NIPS
- 7 December 2015
This method exploits stereo imagery to place proposals in the form of 3D bounding boxes in the context of autonomous driving and outperforms all existing results on all three KITTI object classes.
Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation
- Huiyu Wang, Yukun Zhu, Bradley Green, Hartwig Adam, A. Yuille, Liang-Chieh Chen
- Computer Science, European Conference on Computer Vision
- 17 March 2020
This paper factorizes 2D self-attention into two 1D self-attentions, a novel building block that one could stack to form axial-attention models for image classification and dense prediction, and achieves state-of-the-art results on Mapillary Vistas and Cityscapes.
Spatially Adaptive Computation Time for Residual Networks
- Michael Figurnov, Maxwell D. Collins, R. Salakhutdinov
- Computer Science, Computer Vision and Pattern Recognition
- 7 December 2016
Experimental results show that this model improves the computational efficiency of Residual Networks on the challenging ImageNet classification and COCO object detection datasets, and that its computation time maps on the visual saliency dataset cat2000 correlate surprisingly well with human eye fixation positions.
Searching for Efficient Multi-Scale Architectures for Dense Image Prediction
- Liang-Chieh Chen, Maxwell D. Collins, Jonathon Shlens
- Computer Science, Neural Information Processing Systems
- 11 September 2018
This work constructs a recursive search space for meta-learning techniques for dense image prediction focused on the tasks of scene parsing, person-part segmentation, and semantic image segmentation and demonstrates that even with efficient random search, this architecture can outperform human-invented architectures.