Corpus ID: 236469286

CRD-CGAN: Category-Consistent and Relativistic Constraints for Diverse Text-to-Image Generation

@article{Hu2021CRDCGANCA,
  title={CRD-CGAN: Category-Consistent and Relativistic Constraints for Diverse Text-to-Image Generation},
  author={Tao Hu and Chengjiang Long and Chunxia Xiao},
  journal={ArXiv},
  year={2021},
  volume={abs/2107.13516}
}
Generating photo-realistic images from a text description is a challenging problem in computer vision. Previous works have shown promising performance in generating synthetic images conditioned on text with Generative Adversarial Networks (GANs). In this paper, we focus on category-consistent and relativistic diverse constraints to optimize the diversity of synthetic images. Based on those constraints, a category-consistent and relativistic diverse conditional GAN (CRD-CGAN) is proposed to…
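The abstract is truncated before the constraints are defined. As a rough sketch under stated assumptions, a category-consistency constraint can be read as an auxiliary-classifier cross-entropy that ties every image generated for one text description to that description's category; the function name and the softmax-cross-entropy form below are illustrative, not taken from the paper:

```python
import numpy as np

def category_consistency_loss(logits, labels):
    """Illustrative category-consistency term (assumed form, not the
    paper's exact loss): cross-entropy from an auxiliary classifier head
    on the generated images, pushing every image synthesized for one
    text description toward that description's category.
    `logits` is (batch, num_classes); `labels` is (batch,) of class ids."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# A classifier that is confident in the correct classes yields a
# near-zero loss; confident misclassification yields a large one.
logits = np.array([[9.0, 0.0, 0.0],
                   [0.0, 9.0, 0.0]])
labels = np.array([0, 1])
loss = category_consistency_loss(logits, labels)
```

In a full conditional-GAN training loop this term would be added to the generator objective alongside the adversarial loss, so that diversity-seeking updates cannot drift the samples out of the conditioned category.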
Fine-Grained Image Generation from Bangla Text Description using Attentional Generative Adversarial Network
TLDR: This work proposes a Bangla Attentional Generative Adversarial Network (AttnGAN) that allows intensified, multi-stage processing for high-resolution Bangla text-to-image generation and produces fine-grained images for the first time.
Dual Graph Convolutional Networks with Transformer and Curriculum Learning for Image Captioning
  • Xinzhi Dong, Chengjiang Long, Wenju Xu, Chunxia Xiao
  • Computer Science
  • ArXiv
  • 2021
TLDR: This paper proposes Dual Graph Convolutional Networks (Dual-GCN) with transformer and curriculum learning for image captioning, which not only uses an object-level GCN to capture object-to-object spatial relations within a single image but also adopts an image-level GCN to capture the feature information provided by similar images.
Scene Inference for Object Illumination Editing
  • Zhongyun Bao, Chengjiang Long, +4 authors Chunxia Xiao
  • Computer Science
  • ArXiv
  • 2021
TLDR: A physically-based rendering method is applied to create a large-scale, high-quality dataset, named the IH dataset, which provides rich illumination information for the seamless illumination integration task, and a deep learning-based SI-GAN method, a multi-task collaborative network, is proposed that makes full use of a multi-scale attention mechanism and an adversarial learning strategy.
A Hybrid Video Anomaly Detection Framework via Memory-Augmented Flow Reconstruction and Flow-Guided Frame Prediction
TLDR: HF-VAD is proposed, a Hybrid framework that integrates Flow reconstruction and Frame prediction seamlessly to handle Video Anomaly Detection; it employs a Conditional Variational Autoencoder (CVAE), which captures the high correlation between video frames and optical flow, to predict the next frame given several previous frames.
Luminance Attentive Networks for HDR Image and Panorama Reconstruction
  • Hanning Yu, Wentao Liu, Chengjiang Long, Bo Dong, Qin Zou, Chunxia Xiao
  • Computer Science, Engineering
  • ArXiv
  • 2021
TLDR: Extensive experiments show that the proposed approach, LANet, can reconstruct visually convincing HDR images and demonstrate its superiority over state-of-the-art approaches in terms of all metrics in inverse tone mapping.
MSR-GCN: Multi-Scale Residual Graph Convolution Networks for Human Motion Prediction
TLDR: A novel Multi-Scale Residual Graph Convolution Network (MSR-GCN) is proposed for the human pose prediction task in an end-to-end manner, where GCNs are used to extract features from fine to coarse scale and then from coarse to fine scale.

References

SHOWING 1-10 OF 61 REFERENCES
Adversarial Learning of Semantic Relevance in Text to Image Synthesis
TLDR: A new approach that improves the training of generative adversarial nets (GANs) for synthesizing diverse images from a text input is described; it is based on the conditional version of GANs and expands on previous work leveraging an auxiliary task in the discriminator.
StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks
TLDR: This paper proposes Stacked Generative Adversarial Networks (StackGAN) to generate 256×256 photo-realistic images conditioned on text descriptions and introduces a novel Conditioning Augmentation technique that encourages smoothness in the latent conditioning manifold.
RiFeGAN: Rich Feature Generation for Text-to-Image Synthesis From Prior Knowledge
TLDR: A novel rich-feature-generating text-to-image synthesis method, called RiFeGAN, is proposed to enrich the given description; it exploits multi-caption attentional generative adversarial networks to synthesize images from those features.
Lightweight dynamic conditional GAN with pyramid attention for text-to-image synthesis
TLDR: LD-CGAN is a compact and structured single-stream network consisting of one generator and two independent discriminators to regularize and generate 64×64 and 128×128 images in one feed-forward process, and it significantly decreases the number of parameters and computation time.
Object-Driven Text-To-Image Synthesis via Adversarial Training
TLDR: A thorough comparison between the classic grid attention and the new object-driven attention is provided by analyzing their mechanisms and visualizing their attention layers, offering insight into how the proposed model generates complex scenes in high quality.
Multi-Sentence Auxiliary Adversarial Networks for Fine-Grained Text-to-Image Synthesis
TLDR: A new method for text-to-image synthesis, dubbed Multi-Sentence Auxiliary Generative Adversarial Networks (MA-GAN), is proposed; it significantly outperforms the state-of-the-art methods and guarantees the generation similarity of related sentences by exploring the semantic correlation between different sentences describing the same image.
Mode Seeking Generative Adversarial Networks for Diverse Image Synthesis
TLDR: This work proposes a simple yet effective regularization term to address the mode collapse issue for cGANs: it explicitly maximizes the ratio of the distance between generated images to the distance between their corresponding latent codes, thus encouraging the generators to explore more minor modes during training.
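The ratio described in that summary can be written down directly. The sketch below is a plain-NumPy illustration using L1 distances, not the authors' implementation; the function name and the choice of L1 are assumptions for the example:

```python
import numpy as np

def mode_seeking_term(img1, img2, z1, z2, eps=1e-8):
    """Mode-seeking regularization ratio: distance between two generated
    images divided by the distance between their latent codes. The
    generator is trained to MAXIMIZE this ratio (equivalently, minimize
    its reciprocal), so that distinct latent codes cannot collapse onto
    near-identical images. L1 distances are an illustrative choice."""
    d_img = np.mean(np.abs(img1 - img2))  # distance between generated images
    d_z = np.mean(np.abs(z1 - z2))        # distance between latent codes
    return d_img / (d_z + eps)

# A collapsed generator (identical outputs for different codes) scores 0,
# the worst possible value for a term being maximized.
z1, z2 = np.zeros(16), np.ones(16)
collapsed = mode_seeking_term(np.zeros((8, 8)), np.zeros((8, 8)), z1, z2)
diverse = mode_seeking_term(np.zeros((8, 8)), np.ones((8, 8)), z1, z2)
```

This is also the diversity mechanism the CRD-CGAN abstract appears to build on with its "relativistic diverse" constraint, per the truncated abstract above.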
StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks
TLDR: Extensive experiments demonstrate that the proposed stacked generative adversarial networks significantly outperform other state-of-the-art methods in generating photo-realistic images.
Semantics Disentangling for Text-To-Image Generation
TLDR: A novel photo-realistic text-to-image generation model is proposed that implicitly disentangles semantics to fulfill both high-level semantic consistency and low-level semantic diversity, together with a visual-semantic embedding strategy by semantic-conditioned batch normalization to find diverse low-level semantics.
Generative Adversarial Text to Image Synthesis
TLDR: A novel deep architecture and GAN formulation is developed to effectively bridge advances in text and image modeling, translating visual concepts from characters to pixels.