
Multi-Plane Program Induction with 3D Box Priors

Yikai Li, Jiayuan Mao, Xiuming Zhang, Bill Freeman, Joshua B. Tenenbaum, Noah Snavely, Jiajun Wu
We consider two important aspects in understanding and editing images: modeling regular, program-like texture or patterns in 2D planes, and 3D posing of these planes in the scene. Unlike prior work on image-based program synthesis, which assumes the image contains a single visible 2D plane, we present Box Program Induction (BPI), which infers a program-like scene representation that simultaneously models repeated structure on multiple 2D planes, the 3D position and orientation of the planes… 

Worldsheet: Wrapping the World in a 3D Sheet for View Synthesis from a Single Image

This work presents Worldsheet, a method for novel view synthesis using just a single RGB image as input, and proposes a novel differentiable texture sampler that allows the authors' wrapped mesh sheet to be textured and rendered differentiably into an image from a target viewpoint.
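The differentiable texture sampler is what lets the wrapped mesh be trained end to end; its core building block is bilinear interpolation, whose sample weights vary smoothly with the sampling coordinates. A minimal single-channel sketch of that building block (NumPy, illustrative only, not the authors' implementation):

```python
import numpy as np

def bilinear_sample(texture, u, v):
    """Sample a 2D texture at continuous coordinates (u, v) with
    bilinear interpolation; the weights are smooth in (u, v), which
    is what makes this style of sampler differentiable."""
    h, w = texture.shape
    u = np.clip(u, 0.0, w - 1.0)
    v = np.clip(v, 0.0, h - 1.0)
    x0, y0 = int(np.floor(u)), int(np.floor(v))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    du, dv = u - x0, v - y0
    top = (1 - du) * texture[y0, x0] + du * texture[y0, x1]
    bot = (1 - du) * texture[y1, x0] + du * texture[y1, x1]
    return (1 - dv) * top + dv * bot

tex = np.arange(16, dtype=float).reshape(4, 4)
print(bilinear_sample(tex, 1.5, 1.5))  # center of the 2x2 block {5, 6, 9, 10} -> 7.5
```

In practice this is done batched on GPU (e.g., a grid-sampling op), but the gradient path through the interpolation weights is the same idea.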

Look Outside the Room: Synthesizing A Consistent Long-Term 3D Scene Video from A Single Image

  • Xuanchi Ren, Xiaolong Wang
  • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
This paper proposes a novel approach to synthesize a consistent long-term video given a single scene image and a trajectory of large camera motions, and introduces a locality constraint based on the input cameras to guide self-attention among a large number of patches across space and time.

Image Synthesis with Appearance Decomposition

This thesis focuses on studying how appearance decomposition can improve image synthesis methods using two examples, and introduces a periodicity-aware single image framework to synthesize a scene of near-periodic patterns (NPP).

Learning to Infer 3D Shape Programs with Differentiable Renderer

An analytical yet differentiable executor that is more faithful and controllable in interpreting shape programs (particularly under extrapolation) and more sample-efficient (it requires no training). It facilitates the generator's learning when ground-truth programs are unavailable, and should be especially useful when new shape-program components are introduced, whether by human designers or, in the context of library learning, by algorithms themselves.

Holistically-Attracted Wireframe Parsing: From Supervised to Self-Supervised Learning

The proposed Holistically-Attracted Wireframe Parsing (HAWP) uses a parsimonious representation that encodes a line segment as a closed-form 4D geometric vector, lifting line segments in a wireframe into an end-to-end trainable holistic attraction field with built-in geometry-awareness, context-awareness, and robustness.

Learning Continuous Implicit Representation for Near-Periodic Patterns

A neural implicit representation using a coordinate-based MLP, optimized on a single image, that handles both the global consistency and the local variations of near-periodic patterns and improves the robustness of the method.
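Coordinate-based MLPs of this kind are typically fed Fourier (positional-encoding) features rather than raw coordinates, because a plain MLP on (x, y) struggles to fit high-frequency repeating detail. A minimal sketch of such an encoding (the layout and frequency schedule here are illustrative, not this paper's exact choice):

```python
import numpy as np

def fourier_features(coords, n_freqs=4):
    """Map 2D coordinates to sin/cos features at octave-spaced
    frequencies, the usual front end of a coordinate-based MLP."""
    coords = np.atleast_2d(coords)             # (N, 2)
    freqs = 2.0 ** np.arange(n_freqs) * np.pi  # (n_freqs,)
    angles = coords[:, :, None] * freqs        # (N, 2, n_freqs)
    feats = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return feats.reshape(coords.shape[0], -1)  # (N, 2 * 2 * n_freqs)

xy = np.array([[0.25, 0.75]])
print(fourier_features(xy).shape)  # (1, 16)
```

The MLP then maps these features to RGB; periodicity-aware variants bias the frequency set toward the pattern's detected period.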

PlaneTR: Structure-Guided Transformers for 3D Plane Recovery

A neural network built upon Transformers, namely PlaneTR, to simultaneously detect and reconstruct planes from a single image and achieves state-of-the-art performance on the ScanNet and NYUv2 datasets.

Test-time adaptation with slot-centric models

Slot-TTA is proposed, a semi-supervised instance segmentation model equipped with a slot-centric image rendering component that is adapted per scene at test time through gradient descent on reconstruction or novel view synthesis objectives, and it is shown that test-time adaptation greatly improves segmentation in out-of-distribution scenes.

Quasi-globally Optimal and Real-time Visual Compass in Manhattan Structured Environments

We present a drift-free visual compass to estimate the three-degrees-of-freedom (DoF) rotational motion of a camera by recognizing structural regularities in a Manhattan world (MW), which posits that man-made environments are organized around three mutually orthogonal directions.
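A standard ingredient in Manhattan-world rotation estimation is aligning observed directions (e.g., vanishing directions or surface normals) with the three canonical axes. A hedged sketch using the Kabsch/orthogonal-Procrustes solution, which is a generic alignment step and not this paper's quasi-globally optimal solver:

```python
import numpy as np

def manhattan_rotation(normals, axes):
    """Kabsch / orthogonal-Procrustes fit of a rotation R such that
    R n_i ~ a_i for each observed direction n_i paired with its
    assigned Manhattan axis a_i (rows of `normals` and `axes`)."""
    H = axes.T @ normals                # 3x3 correlation matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(U @ Vt))  # guard against reflections
    return U @ np.diag([1.0, 1.0, d]) @ Vt

# Directions observed under a known 30-degree yaw, paired with x, y, z.
t = np.deg2rad(30.0)
R_true = np.array([[np.cos(t), -np.sin(t), 0.0],
                   [np.sin(t),  np.cos(t), 0.0],
                   [0.0,        0.0,       1.0]])
axes = np.eye(3)
normals = (R_true.T @ axes.T).T         # what the rotated camera observes
R_est = manhattan_rotation(normals, axes)
print(np.allclose(R_est, R_true))       # True: the yaw is recovered
```

Real systems must additionally handle noisy directions, the sign ambiguity of each axis, and the assignment of observations to axes.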

Generating Fast and Slow: Scene Decomposition via Reconstruction

GFS-Nets (Generating Fast and Slow Networks) are proposed to address the problem of segmenting scenes into constituent entities; the proposed curriculum suffices to break the reconstruction-segmentation trade-off, and slow inference greatly improves segmentation in out-of-distribution scenes.

Perspective Plane Program Induction From a Single Image

The proposed framework, Perspective Plane Program Induction (P3I), combines search-based and gradient-based algorithms to efficiently solve the inverse graphics problem of inferring a holistic representation for natural images.
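The search-plus-gradient strategy can be illustrated on a 1D analogue: estimate the period of a repeating signal with a coarse grid search over candidate periods, then refine by gradient descent on a smooth self-matching loss. This is a toy stand-in for the idea, not P3I's actual objective or parameterization:

```python
import numpy as np

def signal(x, period=3.7):
    return np.sin(2 * np.pi * x / period)

def repetition_loss(p, x, y):
    """How badly the signal matches a copy of itself shifted by p.
    Linear interpolation makes the loss smooth in p, standing in
    for differentiable image warping."""
    y_shift = np.interp(x + p, x, y)
    valid = x + p <= x[-1]
    return np.mean((y[valid] - y_shift[valid]) ** 2)

x = np.linspace(0.0, 20.0, 2000)
y = signal(x)

# Stage 1: coarse grid search over candidate periods.
cands = np.linspace(1.0, 5.0, 41)
p = cands[np.argmin([repetition_loss(c, x, y) for c in cands])]

# Stage 2: gradient refinement (finite-difference gradient).
for _ in range(200):
    eps = 1e-4
    g = (repetition_loss(p + eps, x, y) - repetition_loss(p - eps, x, y)) / (2 * eps)
    p -= 0.5 * g
print(round(p, 2))  # close to the true period 3.7
```

The same division of labor appears in 2D: discrete search handles non-convex structure (which repetition? which plane?), while gradients fine-tune the continuous parameters.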

SynSin: End-to-End View Synthesis From a Single Image

This work proposes a novel differentiable point cloud renderer that is used to transform a latent 3D point cloud of features into the target view and outperforms baselines and prior work on the Matterport, Replica, and RealEstate10K datasets.

Image completion using planar structure guidance

We propose a method for automatically guiding patch-based image completion using mid-level structural cues. Our method first estimates planar projection parameters, softly segments the known region

PlaneRCNN: 3D Plane Detection and Reconstruction From a Single Image

A deep neural architecture that detects and reconstructs piecewise planar regions from a single RGB image using a variant of Mask R-CNN and refines an arbitrary number of segmentation masks with a novel loss enforcing the consistency with a nearby view during training.

Learning to Infer and Execute 3D Shape Programs

This paper proposes 3D shape programs, integrating bottom-up recognition systems with top-down, symbolic program structure to capture both low-level geometry and high-level structural priors for 3D shapes.
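The "program" side of such systems can be pictured as a tiny DSL whose primitives rasterize into geometry. Below is a hypothetical toy executor for cuboid primitives, illustrative only and not the paper's DSL:

```python
import numpy as np

def execute(program, size=16):
    """Execute a toy shape 'program': a list of cuboid primitives,
    each given as (origin, extent) in voxel units, rasterized into
    a boolean occupancy grid."""
    vox = np.zeros((size, size, size), dtype=bool)
    for (x, y, z), (dx, dy, dz) in program:
        vox[x:x + dx, y:y + dy, z:z + dz] = True
    return vox

# A table-like shape: one top slab plus four legs.
table = [((2, 2, 10), (12, 12, 2))] + [
    ((x, y, 0), (2, 2, 10)) for x in (2, 12) for y in (2, 12)
]
vox = execute(table)
print(int(vox.sum()))  # 12*12*2 + 4*(2*2*10) = 448
```

A real shape-program DSL adds loops and symmetry operators over such primitives, which is what lets it capture high-level structural priors compactly.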

Im2Struct: Recovering 3D Shape Structure from a Single RGB Image

This work develops a convolutional-recursive auto-encoder comprised of structure parsing of a 2D image followed by structure recovering of a cuboid hierarchy, which achieves unprecedentedly faithful and detailed recovery of diverse 3D part structures from single-view 2D images.

Free-Form Image Inpainting With Gated Convolution

The proposed gated convolution solves the issue of vanilla convolution treating all input pixels as valid, and generalizes partial convolution by providing a learnable dynamic feature-selection mechanism for each channel at each spatial location across all layers.
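The gating idea itself is compact: a feature branch and a gate branch share the input, and a sigmoid gate softly weights the feature response at every spatial location, instead of a hard 0/1 validity mask. A naive single-channel NumPy sketch of this mechanism (the real model uses learned multi-channel convolutions at every layer):

```python
import numpy as np

def conv2d(img, kernel):
    """Naive 'valid' 2D cross-correlation for a single channel."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def gated_conv(img, k_feat, k_gate):
    """Gating in the spirit of gated convolution: a sigmoid gate,
    computed per spatial location, softly decides how much of the
    feature response passes through."""
    feat = np.tanh(conv2d(img, k_feat))
    gate = 1.0 / (1.0 + np.exp(-conv2d(img, k_gate)))
    return feat * gate

rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8))
out = gated_conv(img, rng.standard_normal((3, 3)), rng.standard_normal((3, 3)))
print(out.shape)  # (6, 6)
```

Because the gate is learned rather than propagated from the input mask, the network can attend to free-form holes and even user-sketch guidance.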

ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes

This work introduces ScanNet, an RGB-D video dataset containing 2.5M views in 1513 scenes annotated with 3D camera poses, surface reconstructions, and semantic segmentations, and shows that using this data helps achieve state-of-the-art performance on several 3D scene understanding tasks.

Foreground-Aware Image Inpainting

  • Wei Xiong, Jiahui Yu, Jiebo Luo
  • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
This work proposes a foreground-aware image inpainting system that explicitly disentangles structure inference from content completion, and shows that with such disentanglement the contour completion model predicts reasonable object contours and substantially improves the performance of image inpainting.

Generative Image Inpainting with Contextual Attention

This work proposes a new deep generative model-based approach which can not only synthesize novel image structures but also explicitly utilize surrounding image features as references during network training to make better predictions.