Efficient Spatially Sparse Inference for Conditional GANs and Diffusion Models

Muyang Li, Ji Lin, Chenlin Meng, Stefano Ermon, Song Han, Jun-Yan Zhu
During image editing, existing deep generative models tend to re-synthesize the entire output from scratch, including the unedited regions. This leads to a significant waste of computation, especially for minor editing operations. In this work, we present Spatially Sparse Inference (SSI), a general-purpose technique that selectively performs computation for edited regions and accelerates various generative models, including both conditional GANs and diffusion models. Our key observation is that… 
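The idea of computing only in edited regions can be illustrated with a minimal NumPy sketch: compare the edited input against the original, reduce the per-pixel difference to a tile-level mask, and recompute features only inside the changed tiles while reusing cached results elsewhere. All names here (`edit_mask`, `sparse_update`) are illustrative, not the paper's API, and the per-tile loop stands in for the fused sparse kernels an actual implementation would use.

```python
import numpy as np

def edit_mask(original, edited, tile=8, thresh=1e-3):
    """Mark tiles whose pixels changed between the two images."""
    diff = np.abs(edited - original).max(axis=-1)  # per-pixel change, (H, W)
    h, w = diff.shape
    # reduce to a (H//tile, W//tile) grid: a tile is active if any pixel moved
    grid = diff.reshape(h // tile, tile, w // tile, tile).max(axis=(1, 3))
    return grid > thresh

def sparse_update(cached_feat, feat_fn, original, edited, tile=8):
    """Recompute features only inside edited tiles; reuse the cache elsewhere."""
    mask = edit_mask(original, edited, tile)
    out = cached_feat.copy()
    for ty, tx in zip(*np.nonzero(mask)):
        ys, xs = ty * tile, tx * tile
        patch = edited[ys:ys + tile, xs:xs + tile]
        out[ys:ys + tile, xs:xs + tile] = feat_fn(patch)
    return out
```

For a small brush stroke, only a handful of tiles are active, so the cost scales with the edited area rather than the full image.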

Image Deblurring with Domain Generalizable Diffusion Models

This work investigates the generalization ability of icDPMs in deblurring and proposes a simple but effective guidance that significantly alleviates artifacts and improves out-of-distribution performance.

Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models

A simple, lightweight image-editing algorithm in which the mixing weights of two text embeddings are optimized for style matching and content preservation, outperforming diffusion-model-based image-editing algorithms that require fine-tuning.



GAN Compression: Efficient Architectures for Interactive Conditional GANs

A general-purpose compression framework that reduces the inference time and model size of the generator in cGANs and decouples model training and architecture search via weight sharing.

ILVR: Conditioning Method for Denoising Diffusion Probabilistic Models

This work proposes Iterative Latent Variable Refinement (ILVR), a method to guide the generative process in DDPM to generate high-quality images based on a given reference image, which allows adaptation of a single DDPM without any additional learning in various image generation tasks.

Anycost GANs for Interactive Image Synthesis and Editing

This paper trains the Anycost GAN to support elastic resolutions and channels for faster image generation at versatile speeds and develops new encoder training and latent code optimization techniques to encourage consistency between the different sub-generators during image projection.

Exploring Sparsity in Image Super-Resolution for Efficient Inference

A Sparse Mask SR (SMSR) network learns sparse masks to prune redundant computation, achieving state-of-the-art performance while reducing FLOPs by 41%/33%/27% for ×2/×3/×4 SR.

Spatially Adaptive Feature Refinement for Efficient Inference

Results show that SAR refines less than 40% of the regions in the feature representations of a ResNet for 97% of the samples in the ImageNet validation set while achieving accuracy comparable to the original model, revealing the high computational redundancy in the spatial dimension of CNNs.

Large Scale Image Completion via Co-Modulated Generative Adversarial Networks

The new Paired/Unpaired Inception Discriminative Score (P-IDS/U-IDS), which robustly measures the perceptual fidelity of inpainted images relative to real images via linear separability in a feature space, is proposed.

Teachers Do More Than Teach: Compressing Image-to-Image Models

  • Qing Jin, Jian Ren, S. Tulyakov
  • Computer Science
    2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2021
This work revisits the search space of generative models, introduces an inception-based residual block into generators, and proposes a one-step pruning algorithm that searches a student architecture from the teacher model and substantially reduces the searching cost.

Dynamic Convolutions: Exploiting Spatial Sparsity for Faster Inference

An efficient CUDA implementation of dynamic convolutions conditioned on the input image is provided, using a gather-scatter approach and achieving a significant improvement in inference speed on MobileNetV2 and ShuffleNet V2.
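The gather-scatter pattern can be sketched in NumPy: nonzero positions of a spatial mask are gathered, the convolution is evaluated only at those positions, and the results are scattered back into a dense output. This is a single-channel illustration of the pattern, not the paper's CUDA kernel; the Python loop stands in for the batched GPU gather.

```python
import numpy as np

def gather_scatter_conv(x, weight, mask):
    """3x3 'same' convolution evaluated only at masked positions.

    x:      (H, W) input feature map (single channel for clarity)
    weight: (3, 3) kernel
    mask:   (H, W) boolean — positions where the output is needed
    """
    xp = np.pad(x, 1)                        # zero padding for 'same' output
    out = np.zeros_like(x)
    ys, xs = np.nonzero(mask)                # gather: indices of active pixels
    for y, x0 in zip(ys, xs):
        patch = xp[y:y + 3, x0:x0 + 3]       # local 3x3 neighborhood
        out[y, x0] = (patch * weight).sum()  # scatter the result back
    return out
```

Compute is proportional to the number of active pixels, which is the source of the speed-up when the mask is sparse.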

SBNet: Sparse Blocks Network for Fast Inference

This work leverages the sparsity structure of computation masks and proposes a novel tiling-based sparse convolution algorithm that is effective on LiDAR-based 3D object detection, and reports significant wall-clock speed-ups compared to dense convolution without noticeable loss of accuracy.
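A tiling-based variant, in the spirit of SBNet, operates on whole blocks rather than individual pixels: the mask is reduced to a block grid, each active block is gathered together with a one-pixel halo (the extra border a 3x3 kernel needs), convolved densely, and scattered back. This is an illustrative single-channel NumPy sketch, not SBNet's actual implementation.

```python
import numpy as np

def block_sparse_conv(x, weight, block_mask, block=4):
    """Block-sparse 3x3 'same' convolution: dense conv only inside active blocks.

    x:          (H, W) input feature map
    weight:     (3, 3) kernel
    block_mask: (H//block, W//block) boolean — which tiles to compute
    """
    xp = np.pad(x, 1)
    out = np.zeros_like(x)
    for by, bx in zip(*np.nonzero(block_mask)):
        y0, x0 = by * block, bx * block
        # gather the block plus a 1-pixel halo required by the 3x3 kernel
        tile = xp[y0:y0 + block + 2, x0:x0 + block + 2]
        # dense convolution over the tile (a vectorized kernel would run here)
        for i in range(block):
            for j in range(block):
                out[y0 + i, x0 + j] = (tile[i:i + 3, j:j + 3] * weight).sum()
    return out
```

Working on contiguous tiles keeps memory access regular, which is why block-level sparsity translates into wall-clock speed-ups where per-pixel sparsity often does not.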

Generating Diverse High-Fidelity Images with VQ-VAE-2

It is demonstrated that a multi-scale hierarchical organization of VQ-VAE, augmented with powerful priors over the latent codes, can generate samples whose quality rivals that of state-of-the-art Generative Adversarial Networks on multifaceted datasets such as ImageNet, while not suffering from GANs' known shortcomings such as mode collapse and lack of diversity.