R2D2: Recursive Transformer based on Differentiable Tree for Interpretable Hierarchical Language Modeling

Xiang Hu, Haitao Mi, Zujie Wen, Yafang Wang, Yi Su, Jing Zheng, Gerard de Melo
Human language understanding operates at multiple levels of granularity (e.g., words, phrases, and sentences) with increasing levels of abstraction that can be hierarchically combined. However, existing deep models with stacked layers do not explicitly model any sort of hierarchical process. In this paper, we propose a recursive Transformer model based on differentiable CKY style binary trees to emulate this composition process, and we extend the bidirectional language model pre-training… 
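To make the idea of a differentiable CKY-style chart concrete, the following is a minimal sketch, not the authors' implementation. It fills a chart bottom-up and represents each span as a softmax-weighted mixture over all of its binary split points, which keeps the whole computation differentiable. The names `compose`, `score`, and `soft_cky_encode` are hypothetical, and the two helper functions are simple placeholders for the learned components (the paper uses a Transformer as the composition function; the exact scoring and selection mechanism is not stated in the abstract above).

```python
import math
from typing import List, Optional

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def compose(left: Vector, right: Vector) -> Vector:
    """Placeholder (assumption) for a learned composition function."""
    return [(a + b) / 2.0 for a, b in zip(left, right)]


def score(left: Vector, right: Vector) -> float:
    """Placeholder (assumption) for a learned split-point scorer."""
    return sum(a * b for a, b in zip(left, right))


def soft_cky_encode(tokens: List[Vector]) -> List[List[Optional[Vector]]]:
    """Fill a CKY-style chart; chart[i][j] encodes the span tokens[i..j]."""
    n = len(tokens)
    if n == 0:
        raise ValueError("need at least one token")
    chart: List[List[Optional[Vector]]] = [[None] * n for _ in range(n)]
    # Width-1 spans are the token vectors themselves.
    for i, vec in enumerate(tokens):
        chart[i][i] = list(vec)
    # Build longer spans from shorter ones, as in CKY.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width - 1
            candidates, scores = [], []
            for k in range(i, j):  # split into [i..k] and [k+1..j]
                left, right = chart[i][k], chart[k + 1][j]
                candidates.append(compose(left, right))
                scores.append(score(left, right))
            weights = softmax(scores)
            dim = len(candidates[0])
            # Soft mixture over all binary splits of this span.
            chart[i][j] = [
                sum(w * c[d] for w, c in zip(weights, candidates))
                for d in range(dim)
            ]
    return chart


chart = soft_cky_encode([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
sentence_vector = chart[0][2]  # representation of the full three-token span
```

The chart has O(n^2) cells and each cell considers O(n) splits, so this naive version is cubic in sentence length. Replacing the soft mixture with a hard choice of the best split per cell would yield a single binary tree instead of a weighted mixture of trees.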

Fast-R2D2: A Pretrained Recursive Neural Network based on Pruned CKY for Grammar Induction and Text Representation

This paper uses a top-down parser as a model-guided pruning method, which also enables parallel encoding during inference, and proposes a unified R2D2 method that overcomes local optima and slow inference issues.

Forming Trees with Treeformers

This paper introduces Treeformer, an architecture inspired by the CKY algorithm and the Transformer that learns a composition operator and a pooling function in order to construct hierarchical encodings for phrases and sentences.

Learning with Latent Structures in Natural Language Processing: A Survey

This work surveys three main families of methods for learning models with latent structures: surrogate gradients, continuous relaxation, and marginal likelihood maximization via sampling. Latent structures are motivated as a way to incorporate better inductive biases for improved end-task performance and better interpretability.



Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks

The novel recurrent architecture, ordered neurons LSTM (ON-LSTM), achieves good performance on four different tasks: language modeling, unsupervised parsing, targeted syntactic evaluation, and logical inference.

Learning to Compose Words into Sentences with Reinforcement Learning

Reinforcement learning is used to learn tree-structured neural networks for computing representations of natural language sentences. The learned trees capture some linguistically intuitive structures but differ from conventional English syntactic structures.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

BERT is a new language representation model designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. The pre-trained model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

Unsupervised Recurrent Neural Network Grammars

An inference network parameterized as a neural CRF constituency parser is developed to maximize the evidence lower bound and apply amortized variational inference to unsupervised learning of RNNGs.

Unsupervised Parsing with S-DIORA: Single Tree Encoding for Deep Inside-Outside Recursive Autoencoders

S-DIORA, an improved variant of DIORA that encodes a single tree rather than a softly-weighted mixture of trees by employing a hard argmax operation and a beam at each cell in the chart, is introduced.

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

This work presents two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT, and uses a self-supervised loss that focuses on modeling inter-sentence coherence.

Unsupervised Latent Tree Induction with Deep Inside-Outside Recursive Auto-Encoders

DIORA is a fully unsupervised method for discovering syntax that simultaneously learns representations for constituents within the induced tree. It outperforms previously reported results for unsupervised binary constituency parsing on the benchmark WSJ dataset.

The Forest Convolutional Network: Compositional Distributional Semantics with a Neural Chart and without Binarization

A new model, the Forest Convolutional Network, is introduced that avoids all of the challenges of current recursive neural network approaches for computing sentence meaning, by taking a parse forest as input, rather than a single tree, and by allowing arbitrary branching factors.

Deep Contextualized Word Representations

A new type of deep contextualized word representation is introduced that models both complex characteristics of word use and how these uses vary across linguistic contexts, allowing downstream models to mix different types of semi-supervision signals.

Grammar Induction with Neural Language Models: An Unusual Replication

This replication study finds that the model it re-examines (PRPN, the Parsing-Reading-Predict Network) represents the first empirical success for latent tree learning, and that neural network language modeling warrants further study as a setting for grammar induction.