Image Inpainting for Irregular Holes Using Partial Convolutions
- Guilin Liu, F. Reda, Kevin J. Shih, Ting-Chun Wang, Andrew Tao, Bryan Catanzaro
- Computer Science, European Conference on Computer Vision
- 20 April 2018
This work proposes partial convolutions, in which the convolution is masked and renormalized so that it is conditioned only on valid pixels, and shows that the approach outperforms other methods for irregular masks.
Improving Semantic Segmentation via Video Propagation and Label Relaxation
- Yi Zhu, Karan Sapra, Bryan Catanzaro
- Computer Science, Computer Vision and Pattern Recognition
- 4 December 2018
This paper presents a video prediction-based methodology to scale up training sets by synthesizing new training samples in order to improve the accuracy of semantic segmentation networks, and introduces a novel boundary label relaxation technique that makes training robust to annotation noise and propagation artifacts along object boundaries.
Graphical Contrastive Losses for Scene Graph Parsing
- Ji Zhang, Kevin J. Shih, A. Elgammal, Andrew Tao, Bryan Catanzaro
- Computer Science, Computer Vision and Pattern Recognition
- 7 March 2019
A set of contrastive loss formulations, collectively termed the Graphical Contrastive Losses, is proposed to target specific types of errors in the scene graph parsing problem, and shows improved results over the best previous methods on the Visual Genome and Visual Relationship Detection datasets.
Where to Look: Focus Regions for Visual Question Answering
- Kevin J. Shih, Saurabh Singh, Derek Hoiem
- Computer Science, Computer Vision and Pattern Recognition
- 23 November 2015
A method that learns to answer visual questions by selecting image regions relevant to the text-based query. It exhibits significant improvements on questions such as "what color," where a specific location must be evaluated, and "what room," where it selectively identifies informative image regions.
SDC-Net: Video Prediction Using Spatially-Displaced Convolution
- F. Reda, Guilin Liu, Bryan Catanzaro
- Computer Science, European Conference on Computer Vision
- 8 September 2018
The SDC module for video frame prediction uses spatially-displaced convolution, inheriting the merits of both vector-based and kernel-based approaches while ameliorating their respective disadvantages.
Learning Collections of Part Models for Object Recognition
- Ian Endres, Kevin J. Shih, Johnston Jiaa, Derek Hoiem
- Computer Science, IEEE Conference on Computer Vision and Pattern…
- 23 June 2013
The detection system is competitive with the best existing systems and outperforms other HOG-based detectors on the more deformable categories. The paper also evaluates the part detectors' ability to discriminate and localize annotated keypoints on PASCAL VOC 2010.
Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis
- Rafael Valle, Kevin J. Shih, R. Prenger, Bryan Catanzaro
- Computer Science, International Conference on Learning…
- 12 May 2020
Mean opinion scores (MOS) show that Flowtron matches state-of-the-art TTS models in speech quality. Results are also provided on control of speech variation, interpolation between samples, and style transfer between speakers seen and unseen during training.
One TTS Alignment to Rule Them All
- Rohan Badlani, Adrian Lancucki, Kevin J. Shih, Rafael Valle, Wei Ping, Bryan Catanzaro
- Computer Science, IEEE International Conference on Acoustics…
- 23 August 2021
This paper leverages the alignment mechanism proposed in RAD-TTS and improves alignment convergence speed, simplifies the training pipeline by eliminating the need for external aligners, enhances robustness to errors on long utterances, and improves the perceived speech synthesis quality, as judged by human evaluators.
Graphical Contrastive Losses for Scene Graph Generation
- Ji Zhang, Kevin J. Shih, A. Elgammal, Andrew Tao, Bryan Catanzaro
- Computer Science, ArXiv
- 7 March 2019
A set of contrastive loss formulations, collectively termed the Graphical Contrastive Losses, is proposed to target specific types of errors in the scene graph generation problem.
Partial Convolution based Padding
- Guilin Liu, Kevin J. Shih, Bryan Catanzaro
- Computer Science, ArXiv
- 28 November 2018
This paper presents a simple yet effective padding scheme that can be used as a drop-in module for existing convolutional neural networks and demonstrates that it consistently achieves better accuracy than standard zero padding.