Visually Grounded Compound PCFGs

@article{Zhao2020VisuallyGC,
  title={Visually Grounded Compound PCFGs},
  author={Yanpeng Zhao and Ivan Titov},
  journal={ArXiv},
  year={2020},
  volume={abs/2009.12404}
}
Exploiting visual groundings for language understanding has recently been drawing much attention. In this work, we study visually grounded grammar induction and learn a constituency parser from both unlabeled text and its visual groundings. Existing work on this task (Shi et al., 2019) optimizes a parser via Reinforce and derives the learning signal only from the alignment of images and sentences. While their model is relatively accurate overall, its error distribution is very uneven, with low…
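
The abstract describes two complementary training signals: a differentiable language-modeling objective from the compound PCFG and an image-sentence alignment loss. The following is a minimal sketch of how such a joint objective could be combined, assuming a standard max-margin alignment loss over in-batch negatives; the function names, dimensions, margin, and weighting alpha are illustrative assumptions, not the authors' implementation.

# A minimal sketch (not the authors' code) of a joint objective: a
# language-modeling term plus an image-sentence alignment term. All names,
# dimensions, and hyperparameters here are illustrative assumptions.
import torch
import torch.nn.functional as F

def hinge_alignment_loss(txt_emb, img_emb, margin=0.2):
    # Max-margin image-sentence matching loss over in-batch negatives.
    txt = F.normalize(txt_emb, dim=-1)   # (B, D) sentence embeddings
    img = F.normalize(img_emb, dim=-1)   # (B, D) image embeddings
    scores = txt @ img.t()               # (B, B) cosine similarities
    pos = scores.diag().unsqueeze(1)     # matched pairs sit on the diagonal
    # Penalize negatives that come within `margin` of the positive score,
    # in both directions (text -> image and image -> text).
    cost_txt = (margin + scores - pos).clamp(min=0)
    cost_img = (margin + scores - pos.t()).clamp(min=0)
    mask = torch.eye(scores.size(0), dtype=torch.bool)
    return (cost_txt.masked_fill(mask, 0).sum()
            + cost_img.masked_fill(mask, 0).sum()) / scores.size(0)

def joint_loss(neg_elbo, txt_emb, img_emb, alpha=1.0):
    # Total loss: the grammar's negative evidence lower bound (language
    # modeling) plus the weighted alignment term. `neg_elbo` is a stand-in
    # for what the compound PCFG component would supply.
    return neg_elbo + alpha * hinge_alignment_loss(txt_emb, img_emb)

# Toy usage with random tensors standing in for real encoders.
loss = joint_loss(torch.tensor(42.0), torch.randn(8, 128), torch.randn(8, 128))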

Citations

Clustering Contextualized Representations of Text for Unsupervised Syntax Induction
Proposes a deep embedded clustering approach that jointly transforms contextualized text representations into a lower-dimensional, cluster-friendly space and clusters them, and enhances the representations by augmenting them with task-specific ones.
VLGrammar: Grounded Grammar Induction of Vision and Language
Presents VLGrammar, a method that uses compound probabilistic context-free grammars (compound PCFGs) to induce a language grammar and an image grammar simultaneously, together with a novel contrastive learning framework that guides the joint learning of both modules.
Video-aided Unsupervised Grammar Induction
Investigates video-aided grammar induction, which learns a constituency parser from both unlabeled text and its corresponding video, and proposes a Multi-Modal Compound PCFG model (MMC-PCFG) that outperforms each individual modality and previous state-of-the-art systems on three benchmarks.
An Empirical Study of Compound PCFGs
Relies on a fast implementation of C-PCFGs to conduct evaluation complementary to prior work and highlights three key findings: C-PCFGs are data-efficient; they make the best use of global sentence-level information in preterminal rule probabilities; and the best configurations of C-PCFGs on English do not always generalize to morphology-rich languages.
Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis
Proposes the Bi-Bimodal Fusion Network (BBFN), a novel end-to-end network that performs fusion (relevance increment) and separation (difference increment) on pairwise modality representations and significantly outperforms the state of the art.
Dependency Induction Through the Lens of Visual Perception
Finds that concreteness is a strong indicator for learning dependency grammars, improving the direct attachment score (DAS) by over 50% compared to state-of-the-art models trained on pure text.
Grounding ‘Grounding’ in NLP
Investigates the gap between definitions of “grounding” in NLP and cognitive science, and presents ways to create new tasks or repurpose existing ones to advance toward a more complete sense of grounding.
KANDINSKYPatterns - An experimental exploration environment for Pattern Analysis and Machine Intelligence
Discusses existing diagnostic tests and test datasets such as CLEVR, CLEVRER, CLOSURE, CURI, Bongard-LOGO, and V-PROM, and presents the KANDINSKYPatterns, named after the Russian artist Wassily Kandinsky, which have computationally controllable properties and are easily distinguishable by human observers.
Neural Bi-Lexicalized PCFG Induction
Proposes an approach to parameterize L-PCFGs without making implausible independence assumptions; it directly models bilexical dependencies while reducing both the learning and representation complexities of L-PCFGs.
PCFGs Can Do Better: Inducing Probabilistic Context-Free Grammars with Many Symbols
Presents a new parameterization of PCFGs based on tensor decomposition, which has at most quadratic computational complexity in the number of symbols and therefore allows a much larger number of symbols to be used.
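
As a concrete illustration of the tensor-decomposition idea summarized above, here is a small NumPy sketch assuming a rank-R CP decomposition of the binary-rule tensor; the symbol count N, rank R, and factor names are illustrative, not the cited paper's configuration.

# A minimal sketch of decomposed rule scores: instead of storing a full
# binary-rule tensor T[A, B, C] over N symbols (O(N^3) entries), keep three
# rank-R factors, so rule scores and inside updates avoid the cubic cost.
import numpy as np

N, R = 30, 8                       # symbol count and decomposition rank
rng = np.random.default_rng(0)
U = rng.random((N, R))             # factor for the parent symbol A
V = rng.random((N, R))             # factor for the left child B
W = rng.random((N, R))             # factor for the right child C

# Full tensor, reconstructed only to verify the factored computation:
# T[a, b, c] = sum_r U[a, r] * V[b, r] * W[c, r]
T = np.einsum('ar,br,cr->abc', U, V, W)

# Scoring one rule A -> B C never needs the N^3 tensor:
a, b, c = 3, 11, 27
factored_score = (U[a] * V[b] * W[c]).sum()
assert np.allclose(T[a, b, c], factored_score)

# The same trick speeds up the inside pass: contracting the children
# through the rank dimension avoids materializing T at all.
beta_B = rng.random(N)             # toy inside scores for left children
beta_C = rng.random(N)             # toy inside scores for right children
inside_slow = np.einsum('abc,b,c->a', T, beta_B, beta_C)   # cubic in N
inside_fast = U @ ((V.T @ beta_B) * (W.T @ beta_C))        # linear in N per factor
assert np.allclose(inside_slow, inside_fast)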

References

Showing 1-10 of 54 references
Visually Grounded Neural Syntax Acquisition
We present the Visually Grounded Neural Syntax Learner (VG-NSL), an approach for learning syntactic representations and structures without any explicit supervision. The model learns by looking at natural images and reading paired captions.
Unsupervised Learning of PCFGs with Normalizing Flow
Presents a neural PCFG inducer that employs context embeddings (Peters et al., 2018) in a normalizing flow model to extend PCFG induction to use semantic and morphological information.
What is Learned in Visually Grounded Neural Syntax Acquisition
Considers the case study of the Visually Grounded Neural Syntax Learner, a recent approach for learning syntax from a visual training signal, and finds that significantly less expressive versions of the model produce similar predictions and perform just as well, or even better.
Learning visually grounded words and syntax for a scene description task
D. Roy. Computer Speech & Language, 2002.
Describes a spoken language generation system that learns to describe objects in computer-generated visual scenes; it generates syntactically well-formed compound adjective-noun phrases as well as relative spatial clauses, and its descriptions were comparable to human-generated ones.
Cooperative Learning of Disjoint Syntax and Semantics
Presents a recursive model inspired by Choi et al. (2018) that reaches near-perfect accuracy on this task and performs competitively on several natural language tasks, such as natural language inference and sentiment analysis.
Viterbi Training Improves Unsupervised Dependency Parsing
We show that Viterbi (or "hard") EM is well-suited to unsupervised grammar induction. It is more accurate than standard inside-outside re-estimation (classic EM), significantly faster, and simpler.
Incorporating Visual Semantics into Sentence Representations within a Grounded Space
Proposes a model that transfers visual information to textual representations by learning an intermediate representation space, the grounded space, and shows that this model outperforms the previous state of the art on classification and semantic relatedness tasks.
Parsing with Compositional Vector Grammars
Introduces a Compositional Vector Grammar (CVG), which combines PCFGs with a syntactically untied recursive neural network that learns syntactico-semantic, compositional vector representations and improves performance on the types of ambiguities that require semantic information, such as PP attachments.
Guiding Unsupervised Grammar Induction Using Contrastive Estimation
Shows that, using the same features, log-linear dependency grammar models trained using contrastive estimation (CE) can drastically outperform EM-trained generative models on the task of matching human linguistic annotations (the MATCHLINGUIST task).
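
To make the contrastive-estimation idea above concrete, here is a toy sketch assuming a bag-of-features log-linear model and a neighborhood of perturbed examples; the features and neighborhood used are illustrative assumptions, not those of the cited paper.

# A toy sketch of contrastive estimation (CE): the observed example is
# trained to outrank a small "neighborhood" of perturbed versions of itself
# under a log-linear model, so the normalizer runs over the neighborhood
# rather than over all possible sentences.
import math

def score(weights, features):
    # Log-linear score: sum of the weights of the active features.
    return sum(weights.get(f, 0.0) for f in features)

def ce_log_likelihood(weights, observed, neighborhood):
    # log p(observed | {observed} U neighborhood), the quantity CE maximizes.
    logits = [score(weights, observed)] + [score(weights, n) for n in neighborhood]
    m = max(logits)  # stabilized log-sum-exp
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return logits[0] - log_z

# Toy usage: the neighborhood contains the observed example with one
# feature-changing perturbation (e.g., an adjacent word transposition).
w = {"det_before_noun": 1.5, "noun_before_det": -0.5}
obs = ["det_before_noun"]
neigh = [["noun_before_det"]]
print(ce_log_likelihood(w, obs, neigh))
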
Multimodal Distributional Semantics
Proposes a flexible architecture to integrate text- and image-based distributional information, and shows in a set of empirical tests that the integrated model is superior to the purely text-based approach and provides somewhat complementary semantic information.