SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation

Yen-Chi Cheng, Hsin-Ying Lee, S. Tulyakov, Alexander G. Schwing, Liangyan Gui
In this work, we present a novel framework built to simplify 3D asset generation for amateur users. To enable interactive generation, our method supports a variety of input modalities that can easily be provided by a human, including images, text, partially observed shapes, and combinations of these, further allowing users to adjust the strength of each input. At the core of our approach is an encoder-decoder, compressing 3D shapes into a compact latent representation, upon which a diffusion model is…
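The abstract describes diffusion run on a compact latent code produced by an encoder-decoder rather than on the raw 3D volume. A minimal sketch of the forward (noising) half of such a latent diffusion process is below; this is illustrative only, not the authors' code, and the latent size and noise schedule are hypothetical.

```python
import numpy as np

# Sketch of DDPM-style forward noising applied to a compact latent z,
# as when a diffusion model is trained on top of an encoder-decoder's
# latent space. Sizes and schedule are illustrative assumptions.
rng = np.random.default_rng(0)
T = 100                               # number of diffusion steps (assumed)
betas = np.linspace(1e-4, 0.02, T)    # linear noise schedule (assumed)
alphas_bar = np.cumprod(1.0 - betas)  # cumulative product \bar{alpha}_t

def q_sample(z0, t):
    """Sample z_t ~ q(z_t | z_0) in closed form; returns (z_t, noise)."""
    eps = rng.standard_normal(z0.shape)
    zt = np.sqrt(alphas_bar[t]) * z0 + np.sqrt(1.0 - alphas_bar[t]) * eps
    return zt, eps

# A stand-in "latent" for one shape, e.g. an 8x8x8x4 code from the encoder.
z0 = rng.standard_normal((8, 8, 8, 4))
zt, eps = q_sample(z0, t=50)  # noised latent keeps the compact shape
```

A denoising network would then be trained to predict `eps` from `zt` and `t`; generation runs the learned reverse process from pure noise and decodes the result back to an SDF volume.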

Locally Attentional SDF Diffusion for Controllable 3D Shape Generation

A diffusion-based 3D generation framework, locally attentional SDF diffusion, models plausible 3D shapes from 2D sketch image input, empowered by a novel view-aware local attention mechanism for image-conditioned shape generation that greatly improves local controllability and model generalizability.

SALAD: Part-Level Latent Diffusion for 3D Shape Generation and Manipulation

This model achieves state-of-the-art generation quality, enables part-level shape editing and manipulation without any additional training in the conditional setup, and proposes a cascaded framework to effectively learn diffusion with high-dimensional embedding vectors of parts.

RenderDiffusion: Image Diffusion for 3D Reconstruction, Inpainting and Generation

RenderDiffusion is presented, the first diffusion model for 3D generation and inference, trained using only monocular 2D supervision, that generates and renders an intermediate three-dimensional representation of a scene in each denoising step and allows for 2D inpainting to edit 3D scenes.

Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models

Text2Room is presented, a method for generating room-scale textured 3D meshes from a given text prompt as input; it proposes a continuous alignment strategy that iteratively fuses scene frames with the existing geometry to create a seamless mesh.

Sin3DM: Learning a Diffusion Model from a Single 3D Textured Shape

  • Rundi Wu, Ruoshi Liu, Carl Vondrick, Changxi Zheng
  • Computer Science
  • 2023
Sin3DM, a diffusion model that learns the internal patch distribution from a single 3D textured shape, generates high-quality variations with fine geometry and texture details, facilitating applications such as retargeting, outpainting, and local editing.

Towards Language-guided Interactive 3D Generation: LLMs as Layout Interpreter with Generative Feedback

  • Yiqi Lin, Hao Wu, Lin Wang
  • Computer Science
  • 2023
A novel language-guided interactive 3D generation system that integrates LLMs as a 3D layout interpreter into the off-the-shelf layout-to-3D generative models, allowing users to flexibly and interactively generate visual content.

DiffRF: Rendering-Guided 3D Radiance Field Diffusion

DiffRF is introduced, a novel approach for 3D radiance field synthesis based on denoising diffusion probabilistic models which directly operates on an explicit voxel grid representation and learns multi-view consistent priors, enabling free-view synthesis and accurate shape generation.

CommonScenes: Generating Commonsense 3D Indoor Scenes with Scene Graphs

CommonScenes is a fully generative model that converts scene graphs into corresponding controllable 3D scenes, which are semantically realistic and conform to commonsense; it shows clear advantages over other methods regarding generation consistency, quality, and diversity.

Text2Tex: Text-driven Texture Synthesis via Diffusion Models

Text2Tex is presented, a novel method for generating high-quality textures for 3D meshes from the given text prompts that significantly outperforms the existing text-driven approaches and GAN-based methods.

Make-A-Story: Visual Memory Conditioned Consistent Story Generation

This work proposes a novel autoregressive diffusion-based framework with a visual memory module that implicitly captures the actor and background context across the generated frames, and implements sentence-conditioned soft attention over the memories, enabling effective reference resolution and learning to maintain scene and actor consistency when needed.

Cross-Modal 3D Shape Generation and Manipulation

A generic multi-modal generative model that couples the 2D modalities and implicit 3D representations through shared latent spaces is proposed; it is conceptually simple, easy to implement, robust to input domain shifts, and flexible enough to support diverse reconstruction from partial 2D inputs.

AutoSDF: Shape Priors for 3D Completion, Reconstruction and Generation

This paper proposes an autoregressive prior for 3D shapes to solve multimodal 3D tasks such as shape completion, reconstruction, and generation, and shows that the proposed method outperforms the specialized state-of-the-art methods trained for individual tasks.

Efficient Geometry-aware 3D Generative Adversarial Networks

This work introduces an expressive hybrid explicit-implicit network architecture that synthesizes not only high-resolution multi-view-consistent images in real time but also produces high-quality 3D geometry by decoupling feature generation and neural rendering.

3DAvatarGAN: Bridging Domains for Personalized Editable Avatars

A novel inversion method for 3D-GANs linking the latent spaces of the source and the target domains is proposed, which allows for the generation, editing, and animation of personalized artistic 3D avatars on artistic datasets.

3D generation on ImageNet

A 3D generator with Generic Priors (3DGP) is developed: a 3D synthesis framework with more general assumptions about the training data, and it is demonstrated that it scales to very challenging datasets, like ImageNet.

SDFDiff: Differentiable Rendering of Signed Distance Fields for 3D Shape Optimization

It is demonstrated that the proposed SDFDiff, a novel approach for image-based shape optimization using differentiable rendering of 3D shapes represented by signed distance functions, can be integrated with deep learning models, which opens up options for learning approaches on 3D objects without 3D supervision.
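SDFDiff's core operation is rendering a shape represented by a signed distance function in a way that admits gradients. A minimal sketch of the underlying rendering step, sphere tracing, is below; it is an illustration under simplified assumptions (an analytic sphere SDF, a single ray, and a finite difference standing in for automatic differentiation), not the paper's implementation.

```python
import numpy as np

# Sphere tracing: march along a ray, stepping by the SDF value, until
# the surface (SDF ~ 0) is reached. Differentiating the resulting depth
# w.r.t. shape parameters is what enables image-based optimization.

def sdf_sphere(p, radius):
    """Analytic signed distance to a sphere of the given radius at the origin."""
    return np.linalg.norm(p) - radius

def sphere_trace(origin, direction, radius, max_steps=64, eps=1e-5):
    """Return the depth t at which the ray origin + t*direction hits the SDF."""
    t = 0.0
    for _ in range(max_steps):
        d = sdf_sphere(origin + t * direction, radius)
        if d < eps:
            return t
        t += d  # safe step: the SDF bounds the distance to the surface
    return t

o = np.array([0.0, 0.0, -3.0])   # camera origin
v = np.array([0.0, 0.0, 1.0])    # ray direction (toward the sphere)
depth = sphere_trace(o, v, radius=1.0)            # hits the surface at z = -1
h = 1e-4
grad = (sphere_trace(o, v, 1.0 + h) - depth) / h  # d(depth)/d(radius)
```

Growing the radius moves the hit point toward the camera, so the finite-difference gradient of depth with respect to radius is negative; an autodiff framework would propagate this same signal back into a learned SDF.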

3D Shape Generation and Completion through Point-Voxel Diffusion

Point-Voxel Diffusion is a unified, probabilistic formulation for unconditional shape generation and conditional, multi-modal shape completion that marries denoising diffusion models with the hybrid point-voxel representation of 3D shapes.

Generative Multiplane Images: Making a 2D GAN 3D-Aware

This work modifies a classical GAN, i.e., StyleGANv2, as little as possible to produce a multiplane-image-style generator branch that produces a set of alpha maps conditioned on their depth, alleviating memory concerns and enabling fast training of GMPIs in less than half a day at a resolution of $1024^2$.

Pix3D: Dataset and Methods for Single-Image 3D Shape Modeling

A novel model is designed that simultaneously performs 3D reconstruction and pose estimation; this multi-task learning approach achieves state-of-the-art performance on both tasks.

GRAF: Generative Radiance Fields for 3D-Aware Image Synthesis

This paper proposes a generative model for radiance fields which have recently proven successful for novel view synthesis of a single scene, and introduces a multi-scale patch-based discriminator to demonstrate synthesis of high-resolution images while training the model from unposed 2D images alone.