Corpus ID: 229924024

Language-Mediated, Object-Centric Representation Learning

Ruocheng Wang, Jiayuan Mao, S. Gershman, Jiajun Wu

We present Language-mediated, Object-centric Representation Learning (LORL), a paradigm for learning disentangled, object-centric scene representations from vision and language. LORL builds upon recent advances in unsupervised object segmentation, notably MONet and Slot Attention. While these algorithms learn an object-centric representation just by reconstructing the input image, LORL enables them to further learn to associate the learned representations with concepts, i.e., words for object…