Dependency Induction Through the Lens of Visual Perception

@inproceedings{Su2021DependencyIT,
  title={Dependency Induction Through the Lens of Visual Perception},
  author={Ruisi Su and Shruti Rijhwani and Hao Zhu and Junxian He and Xinyu Wang and Yonatan Bisk and Graham Neubig},
  booktitle={CoNLL},
  year={2021}
}
Most previous work on grammar induction focuses on learning phrasal or dependency structure purely from text. However, because the signal provided by text alone is limited, recently introduced visually grounded syntax models make use of multimodal information, leading to improved performance in constituency grammar induction. Compared to dependency grammars, however, constituency grammars do not provide a straightforward way to incorporate visual information without enforcing language…
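
The abstract is truncated before the method is described, but its premise, that word-level visual signals can inform dependency attachment, can be illustrated with a minimal sketch. Everything below (the concreteness table, the scoring rule, all names and numbers) is a hypothetical Python illustration, not the paper's actual model.

# Hypothetical sketch: biasing unsupervised dependency arc scores with
# word-level visual concreteness. All names and numbers are illustrative
# assumptions, not the model from the paper.

# Toy concreteness scores in [0, 1]: how reliably a word maps to a visual region.
CONCRETENESS = {"dog": 0.9, "ball": 0.85, "chases": 0.6, "the": 0.05, "a": 0.05}

def arc_score(head: str, dep: str, text_score: float, alpha: float = 0.5) -> float:
    """Combine a text-only arc score with a visual prior.

    The visual prior favors arcs whose head is visually concrete, on the
    intuition that grounded content words anchor dependency structure.
    """
    visual_prior = CONCRETENESS.get(head, 0.0)
    return (1 - alpha) * text_score + alpha * visual_prior

# Example: "chases" becomes a more plausible head for "dog" than "the",
# even though the text-only scores are close.
print(arc_score("chases", "dog", text_score=0.40))  # boosted by concreteness
print(arc_score("the", "dog", text_score=0.45))     # penalized: "the" is not concrete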

References

Showing 1–10 of 29 references

Visually Grounded Compound PCFGs

This work studies visually grounded grammar induction, learning a constituency parser from both unlabeled text and its visual groundings, and shows that an extension of the probabilistic context-free grammar (PCFG) model enables fully differentiable, end-to-end visually grounded learning.

Unsupervised Learning of Syntactic Structure with Invertible Neural Projections

A novel generative model is proposed that jointly learns discrete syntactic structure and continuous word representations in an unsupervised fashion by cascading an invertible neural network with a structured generative prior; invertibility keeps exact inference and marginal likelihood computation tractable so long as the prior is well-behaved.
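
The invertibility that keeps exact inference tractable can be illustrated with an affine coupling layer, one standard construction for invertible projections; this is an assumption for illustration and may differ from the paper's exact parameterization.

import numpy as np

# Minimal invertible projection via an affine coupling layer: the first half
# of the vector passes through unchanged, and the second half is affinely
# transformed conditioned on the first, so the inverse is closed-form.
rng = np.random.default_rng(0)
D = 4                                     # toy embedding dimension
W_s = rng.normal(size=(D // 2, D // 2))   # conditioner producing the log-scale
W_t = rng.normal(size=(D // 2, D // 2))   # conditioner producing the shift

def forward(x):
    x1, x2 = x[: D // 2], x[D // 2 :]
    log_s, t = np.tanh(W_s @ x1), W_t @ x1
    return np.concatenate([x1, x2 * np.exp(log_s) + t])

def inverse(y):
    y1, y2 = y[: D // 2], y[D // 2 :]
    log_s, t = np.tanh(W_s @ y1), W_t @ y1
    return np.concatenate([y1, (y2 - t) * np.exp(-log_s)])

x = rng.normal(size=D)
assert np.allclose(inverse(forward(x)), x)  # the projection round-trips exactly

Because the mapping is exactly invertible, densities assigned by a structured prior in the latent space translate into exact likelihoods over the observed embeddings.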

Visually Grounded Neural Syntax Acquisition

We present the Visually Grounded Neural Syntax Learner (VG-NSL), an approach for learning syntactic representations and structures without any explicit supervision; the model learns by looking at natural images paired with descriptive text.

Dependency Grammar Induction with a Neural Variational Transition-based Parser

A neural transition-based parser for dependency grammar induction whose inference procedure utilizes rich neural features while running in linear time, achieving performance comparable to graph-based models on both the English Penn Treebank and the Universal Dependency Treebank.
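
The linear time complexity follows from the transition-based formulation: each word is shifted once and attached by exactly one arc. Below is a minimal arc-standard sketch in Python; arc-standard is a standard transition system used here for illustration, and the paper's transition set and neural scoring function differ.

# Arc-standard parsing: O(n) transitions for a sentence of n words.
def parse(words, choose_action):
    """Run arc-standard transitions; `choose_action` stands in for a learned policy."""
    stack, buffer, arcs = [], list(range(len(words))), []
    while buffer or len(stack) > 1:
        action = choose_action(stack, buffer)
        if action == "SHIFT" and buffer:
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC" and len(stack) >= 2:
            dep = stack.pop(-2)             # second-from-top depends on top
            arcs.append((stack[-1], dep))   # (head, dependent)
        elif action == "RIGHT-ARC" and len(stack) >= 2:
            dep = stack.pop()               # top depends on second-from-top
            arcs.append((stack[-1], dep))
    return arcs

# Toy policy: shift everything, then reduce into a chain where each word
# attaches to the word on its left.
toy = lambda stack, buffer: "SHIFT" if buffer else "RIGHT-ARC"
print(parse(["the", "dog", "barks"], toy))  # [(1, 2), (0, 1)]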

Constituency Parsing with a Self-Attentive Encoder

It is demonstrated that replacing an LSTM encoder with a self-attentive architecture can lead to improvements to a state-of-the-art discriminative constituency parser, and it is found that separating positional and content information in the encoder can lead to improved parsing accuracy.

CRF Autoencoder for Unsupervised Dependency Parsing

An unsupervised dependency parsing model based on the CRF autoencoder, whose discriminative, globally normalized encoder allows the use of rich features as well as universal linguistic priors; performance is evaluated on eight multilingual treebanks.

The Return of Lexical Dependencies: Neural Lexicalized PCFGs

Novel neural models of lexicalized PCFGs are presented that overcome sparsity problems and effectively induce both constituents and dependencies within a single model, yielding stronger results on both representations than modeling either formalism alone.
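
To see how a single lexicalized grammar yields both formalisms: in a lexicalized tree every node carries a head word, and each binary rule whose non-head child carries a different head contributes one dependency arc. The sketch below uses a hypothetical tuple encoding of trees, not the paper's data structures.

# Reading dependencies off a lexicalized constituency tree.
# Nodes: ("label", head_word, left_child, right_child); leaves: ("label", word).
def extract_deps(node, arcs=None):
    if arcs is None:
        arcs = []
    if len(node) == 2:             # leaf
        return arcs
    _, head, left, right = node
    for child in (left, right):
        if child[1] != head:       # non-head child: its head depends on ours
            arcs.append((head, child[1]))
        extract_deps(child, arcs)
    return arcs

tree = ("S", "chases",
        ("NP", "dog", ("DT", "the"), ("NN", "dog")),
        ("VP", "chases",
         ("VB", "chases"),
         ("NP", "cat", ("DT", "the"), ("NN", "cat"))))
print(extract_deps(tree))
# [('chases', 'dog'), ('dog', 'the'), ('chases', 'cat'), ('cat', 'the')]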

Head-Driven Statistical Models for Natural Language Parsing

M. Collins, Computational Linguistics, 2003
Three statistical models for natural language parsing are described, leading to approaches in which a parse tree is represented as the sequence of decisions corresponding to a head-centered, top-down derivation of the tree.

Unified Vision-Language Pre-Training for Image Captioning and VQA

VLP is the first reported model that achieves state-of-the-art results on both vision-language generation and understanding tasks as disparate as image captioning and visual question answering, across three challenging benchmark datasets: COCO Captions, Flickr30k Captions, and VQA 2.0.

Corpus-Based Induction of Syntactic Structure: Models of Dependency and Constituency

This work presents a generative model for the unsupervised learning of dependency structures and describes the multiplicative combination of this dependency model with a model of linear constituency; the combined model outperforms either component alone and is robust cross-linguistically.
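
The dependency component is the classic dependency model with valence (DMV), whose generative story is easy to sketch: each head decides, per direction, whether to stop generating dependents, conditioned on whether it has already attached one in that direction. The probability tables below are toy assumptions, and the full model additionally conditions these decisions on word classes and generates a root.

import math

# DMV-style log-probability of a fixed dependency tree (simplified sketch).
P_STOP = {  # P(stop | direction, already has a dependent in that direction)
    ("left", False): 0.4, ("left", True): 0.7,
    ("right", False): 0.3, ("right", True): 0.8,
}
P_ATTACH = 0.1  # toy uniform probability of choosing the attached word

def tree_log_prob(heads):
    """heads[i] = index of word i's head (-1 for the root); returns log P(tree)."""
    n = len(heads)
    logp = 0.0
    for h in range(n):
        for direction, deps in (
            ("left", [d for d in range(n) if heads[d] == h and d < h]),
            ("right", [d for d in range(n) if heads[d] == h and d > h]),
        ):
            has = False
            for _ in deps:  # one continue-and-attach decision per dependent
                logp += math.log(1 - P_STOP[(direction, has)]) + math.log(P_ATTACH)
                has = True
            logp += math.log(P_STOP[(direction, has)])  # final stop decision
    return logp

# Toy tree over ["the", "dog", "barks"]: "barks" is the root,
# "dog" attaches to "barks", and "the" attaches to "dog".
print(tree_log_prob([1, 2, -1]))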