Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision Supplementary Materials

Abstract

Similar to the spatial transformer network introduced in [1], we propose a two-step procedure: (1) performing dense sampling from the input volume (in 3D world coordinates) to the output volume (in screen coordinates), and (2) flattening the 3D spatial output across the disparity dimension. In the experiment, we assume that the transformation matrix is always given as input, parametrized by the viewpoint α. Again, the 3D point $(x^s_i, y^s_i, z^s_i)$ in the input volume $V \in \mathbb{R}^{H \times W \times D}$ and the corresponding point $(x^t_i, y^t_i, d^t_i)$ in the output volume $U \in \mathbb{R}^{H' \times W' \times D'}$ are linked by the perspective transformation matrix $\Theta_{4 \times 4}$. Here, $(W, H, D)$ and $(W', H', D')$ are the width, height and depth of the input and output volume, respectively.

$$
\begin{bmatrix} x^s_i \\ y^s_i \\ z^s_i \\ 1 \end{bmatrix}
=
\begin{bmatrix}
\theta_{11} & \theta_{12} & \theta_{13} & \theta_{14} \\
\theta_{21} & \theta_{22} & \theta_{23} & \theta_{24} \\
\theta_{31} & \theta_{32} & \theta_{33} & \theta_{34} \\
\theta_{41} & \theta_{42} & \theta_{43} & \theta_{44}
\end{bmatrix}
\begin{bmatrix} \tilde{x}^t_i \\ \tilde{y}^t_i \\ \tilde{z}^t_i \\ 1 \end{bmatrix}
\qquad (2)
$$
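To make the two-step procedure concrete, the sketch below resamples an occupancy volume through a given 4×4 perspective matrix Θ and then max-projects the result across the disparity dimension. This is a minimal NumPy sketch under stated assumptions, not the paper's implementation: the function name `perspective_transform`, the normalized [-1, 1] grid convention, the (y, x, z) indexing of the volume, and the nearest-neighbour sampling (in place of the differentiable sampling a trainable network would require) are all assumptions made for brevity.

```python
import numpy as np

def perspective_transform(V, theta, out_shape):
    """Sketch of the two-step perspective transformer (assumed helper).

    V         : input volume of shape (H, W, D), indexed as V[y, x, z],
                e.g. occupancy values in [0, 1].
    theta     : 4x4 perspective transformation matrix Theta mapping
                homogeneous target coordinates to source coordinates.
    out_shape : (H_out, W_out, D_out) of the output volume U.

    Returns the resampled volume U and its max-projection over the
    disparity axis (a 2D silhouette). Nearest-neighbour sampling is used
    here for brevity instead of differentiable trilinear sampling.
    """
    H, W, D = V.shape
    H_out, W_out, D_out = out_shape

    # Step 1: dense sampling. Build a normalized grid of target
    # coordinates in [-1, 1] for every output cell (x along width,
    # y along height, z along the disparity dimension).
    ys, xs, zs = np.meshgrid(
        np.linspace(-1, 1, H_out),
        np.linspace(-1, 1, W_out),
        np.linspace(-1, 1, D_out),
        indexing="ij",
    )
    ones = np.ones_like(xs)
    target = np.stack([xs, ys, zs, ones], axis=0).reshape(4, -1)  # 4 x N

    # Map target coordinates back to source (world) coordinates with Theta
    # and dehomogenize (a no-op when the last row of Theta is (0, 0, 0, 1)).
    source = theta @ target            # 4 x N
    source = source[:3] / source[3:4]  # 3 x N: (x_s, y_s, z_s)

    # Convert normalized source coordinates to voxel indices and clamp.
    ix = np.clip(np.round((source[0] + 1) * 0.5 * (W - 1)).astype(int), 0, W - 1)
    iy = np.clip(np.round((source[1] + 1) * 0.5 * (H - 1)).astype(int), 0, H - 1)
    iz = np.clip(np.round((source[2] + 1) * 0.5 * (D - 1)).astype(int), 0, D - 1)

    U = V[iy, ix, iz].reshape(H_out, W_out, D_out)

    # Step 2: flatten across the disparity dimension; the max projection
    # gives the 2D silhouette seen from the given viewpoint.
    silhouette = U.max(axis=2)
    return U, silhouette
```

As a quick sanity check, calling `perspective_transform(V, np.eye(4), V.shape)` with the identity matrix returns the input volume unchanged together with its max-projection along the depth axis.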

Cite this paper

@inproceedings{Yan2016PerspectiveTN,
  title  = {Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision Supplementary Materials},
  author = {Xinchen Yan and Jimei Yang and Ersin Yumer and Yijie Guo and Honglak Lee},
  year   = {2016}
}