Bridging CNNs, RNNs, and Weighted Finite-State Machines

Roy Schwartz, Sam Thomson, and Noah A. Smith
Recurrent and convolutional neural networks comprise two distinct families of models that have proven to be useful for encoding natural language utterances. In this paper we present SoPa, a new model that aims to bridge these two approaches. SoPa combines neural representation learning with weighted finite-state automata (WFSAs) to learn a soft version of traditional surface patterns. We show that SoPa is an extension of a one-layer CNN, and that such CNNs are equivalent to a restricted version… 
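The abstract's central claim — that a soft pattern is a linear-chain WFSA whose scores reduce, under restrictions, to a one-layer CNN — can be illustrated with a minimal sketch. This is not the authors' code; the pattern weights `u` and the max-plus scoring (no self-loops or epsilon transitions) are simplifying assumptions made here for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a pattern of length 3 over 5-dim word embeddings.
# Each pattern state i carries a weight vector u[i]; the transition score
# for consuming token x while moving from state i to i+1 is u[i] . x.
emb_dim, pat_len = 5, 3
u = rng.normal(size=(pat_len, emb_dim))

def pattern_score(tokens):
    """Max-plus 'forward' pass: best score of matching the pattern
    against any contiguous span of `tokens` (shape [n, emb_dim])."""
    n = len(tokens)
    best = -np.inf
    for start in range(n - pat_len + 1):
        span = tokens[start:start + pat_len]
        # One accepting path per span: sum its per-state transition scores.
        best = max(best, float(np.sum(u * span)))
    return best

sentence = rng.normal(size=(7, emb_dim))
s = pattern_score(sentence)

# With max-plus scoring and no self-loops/epsilons, this is exactly a
# width-3 convolution filter followed by max-pooling over time:
conv = np.array([float(np.sum(u * sentence[i:i + pat_len]))
                 for i in range(len(sentence) - pat_len + 1)])
assert np.isclose(s, conv.max())
```

The richer SoPa model relaxes exactly these restrictions (e.g. self-loops and epsilon transitions let a pattern match non-contiguous spans), which is what takes it beyond the plain CNN.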


Rational Recurrences

It is shown that several recent neural models use rational recurrences, and one such model is presented, which performs better than two recent baselines on language modeling and text classification and demonstrates that transferring intuitions from classical models like WFSAs can be an effective approach to designing and understanding neural models.

RNN Architecture Learning with Sparse Regularization

This work applies group lasso to rational RNNs (Peng et al., 2018), a family of models that is closely connected to weighted finite-state automata (WFSAs) and shows that sparsifying such models makes them easier to visualize, and presents models that rely exclusively on as few as three WFSAs after pruning more than 90% of the weights.
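The sparsification mechanism in that summary is the group lasso: a sum of unweighted L2 norms, one per parameter group, which drives entire groups (here, entire WFSAs) to zero so they can be pruned. A minimal sketch, with the per-WFSA grouping and the regularization strength `lam` as assumptions of this illustration:

```python
import numpy as np

def group_lasso_penalty(groups, lam=0.01):
    """Sum of L2 norms, one per parameter group. Unlike plain L1,
    this pushes whole groups to zero together, so a zeroed group
    (a whole WFSA) can be removed from the model entirely."""
    return lam * sum(np.linalg.norm(g) for g in groups)

rng = np.random.default_rng(1)
# Five hypothetical WFSAs, each a block of transition parameters.
wfsa_params = [rng.normal(size=(4, 8)) for _ in range(5)]
penalty = group_lasso_penalty(wfsa_params)
assert penalty > 0.0
```

In training, this penalty is added to the task loss; after convergence, groups whose norm is (near) zero are pruned, which is how the paper arrives at models using as few as three WFSAs.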

A Formal Hierarchy of RNN Architectures

It is hypothesized that the practical learnable capacity of unsaturated RNNs obeys a similar hierarchy, and empirical results to support this conjecture are provided.

Weighted Automata Extraction from Recurrent Neural Networks via Regression on State Spaces

This work presents a method to extract a weighted finite automaton (WFA) from a recurrent neural network (RNN) based on the WFA learning algorithm by Balle and Mohri, which is an extension of Angluin's classic L* algorithm.

Neural Finite-State Transducers: Beyond Rational Relations

Neural finite state transducers are introduced, a family of string transduction models defining joint and conditional probability distributions over pairs of strings that compete favorably against seq2seq models while offering interpretable paths that correspond to hard monotonic alignments.

Cold-start and Interpretability: Turning Regular Expressions into Trainable Recurrent Neural Networks

FA-RNNs are proposed, a type of recurrent neural network that combines the advantages of neural networks and regular expression rules; they significantly outperform previous neural approaches in both zero-shot and low-resource settings and remain very competitive in rich-resource settings.

The Role of Interpretable Patterns in Deep Learning for Morphology

A modified version of the standard sequence-to-sequence model is presented, in which the encoder is a pattern-matching network: each pattern scores all possible N-character subwords on the source side, and the highest-scoring subword's score is used to initialize the decoder as well as the input to the attention mechanism.

The Surprising Computational Power of Nondeterministic Stack RNNs

This paper shows that nondeterminism and the neural controller interact to produce two more unexpected abilities of the nondeterministic stack RNN, which can recognize not only CFLs, but also many non-context-free languages.

Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval

This paper presents RetoMaton (retrieval automaton), which approximates the datastore search based on (1) saving pointers between consecutive datastore entries, and (2) clustering of entries into "states".

Higher-order Derivatives of Weighted Finite-state Machines

This work examines the computation of higher-order derivatives with respect to the normalization constant for weighted finite-state machines and provides a general algorithm for evaluating derivatives of all orders, which has not been previously described in the literature.
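The quantity in question can be made concrete with a toy machine. The sketch below is not the paper's algorithm (which evaluates derivatives of all orders exactly); it only checks, by finite differences, a second derivative of the normalization constant of a hypothetical 2-state machine whose transition weights are scaled by a scalar parameter `theta`:

```python
import numpy as np

alpha = np.array([1.0, 0.0])      # initial weights
omega = np.array([0.0, 1.0])      # final weights
A = np.array([[0.5, 0.5],
              [0.2, 0.8]])        # transition weights
T = 3                             # path length

def Z(theta):
    """Normalization constant: total weight of all length-T paths
    when every transition is scaled by theta."""
    M = np.linalg.matrix_power(theta * A, T)
    return float(alpha @ M @ omega)

# Second derivative by central finite differences; an exact algorithm
# (the subject of the paper) would instead propagate derivatives
# through the semiring computation.
h = 1e-4
d2 = (Z(1 + h) - 2 * Z(1) + Z(1 - h)) / h**2

# Analytically Z(theta) = theta**T * Z(1), so d2Z/dtheta2 at theta=1
# is T*(T-1)*Z(1).
assert np.isclose(d2, T * (T - 1) * Z(1.0), rtol=1e-4)
```

Finite differences become numerically unstable for higher orders, which is one motivation for an exact general-purpose algorithm.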

Recurrent Neural Networks as Weighted Language Recognizers

It is shown that approximations and heuristic algorithms are necessary in practical applications of single-layer, ReLU-activation, rational-weight RNNs with softmax, which are commonly used in natural language processing applications.

Recurrent Additive Networks

Recurrent additive networks are introduced, a new gated RNN which is distinguished by the use of purely additive latent state updates, and it is formally shown that RAN states are weighted sums of the input vectors, and that the gates only contribute to computing the weights of these sums.
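The "purely additive latent state updates" can be shown directly. Below is a minimal sketch (not the authors' code; parameter shapes and initialization are assumptions) of one RAN-style recurrence, with a numerical check of the formal claim that the state is a weighted sum of the projected inputs:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
d = 4
# Hypothetical parameters: content projection plus input/forget gates.
Wc, Wi, Ui, Wf, Uf = (rng.normal(scale=0.5, size=(d, d)) for _ in range(5))
xs = [rng.normal(size=d) for _ in range(3)]

c = np.zeros(d)
contribs = []  # per-input contributions to the final state
for x in xs:
    i = sigmoid(Wi @ x + Ui @ c)   # input gate
    f = sigmoid(Wf @ x + Uf @ c)   # forget gate
    x_tilde = Wc @ x               # content: depends on the input only
    # Purely additive update -- no nonlinearity wraps c itself.
    contribs = [f * w for w in contribs] + [i * x_tilde]
    c = i * x_tilde + f * c
h = np.tanh(c)                     # output nonlinearity

# Because the update is additive, c unrolls exactly into a weighted
# sum of the projected inputs (the gates only set the weights).
assert np.allclose(c, sum(contribs))
```

The gates still depend nonlinearly on the state, but since they only scale the additive terms, each input's total contribution to `c` remains a simple elementwise weight.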

Convolutional Sequence to Sequence Learning

This work introduces an architecture based entirely on convolutional neural networks, which outperforms the accuracy of the deep LSTM setup of Wu et al. (2016) on both WMT'14 English-German and WMT'14 English-French translation at an order of magnitude faster speed, both on GPU and CPU.

Weighting Finite-State Transductions With Neural Context

This work proposes to keep the traditional architecture, which uses a finite-state transducer to score all possible output strings, but to augment the scoring function with the help of recurrent networks, and defines a probability distribution over aligned output strings in the form of a weighted finite-state automaton.

Character-Aware Neural Language Models

A simple neural language model that relies only on character-level inputs is presented; it is able to encode, from characters only, both semantic and orthographic information, suggesting that for many languages, character inputs are sufficient for language modeling.

A Primer on Neural Network Models for Natural Language Processing

This tutorial surveys neural network models from the perspective of natural language processing research, in an attempt to bring natural-language researchers up to speed with the neural techniques.

Deep Unordered Composition Rivals Syntactic Methods for Text Classification

This work presents a simple deep neural network that competes with and, in some cases, outperforms such models on sentiment analysis and factoid question answering tasks while taking only a fraction of the training time.

Modeling Skip-Grams for Event Detection with Convolutional Neural Networks

This work proposes to improve current CNN models for event detection (ED) by introducing non-consecutive convolution, leading to significant performance improvements over the current state-of-the-art systems.
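The difference between consecutive and non-consecutive convolution can be sketched by brute force. This is an illustration, not the paper's implementation (an efficient version would compute the max with dynamic programming rather than enumerate subsequences); the filter `W` and dimensions are assumptions:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
emb_dim, k, n = 5, 3, 6
W = rng.normal(size=(k, emb_dim))       # one hypothetical filter of width k
tokens = rng.normal(size=(n, emb_dim))  # a toy sentence

# Standard convolution + max-pooling: only contiguous k-grams.
consecutive = max(float(np.sum(W * tokens[i:i + k]))
                  for i in range(n - k + 1))

# Non-consecutive convolution: every ordered k-subsequence (skip-gram)
# of the sentence is scored, and the best is kept.
nonconsecutive = max(float(sum(W[j] @ tokens[idx[j]] for j in range(k)))
                     for idx in combinations(range(n), k))

# Contiguous k-grams are a subset of the skip-grams, so the
# non-consecutive max always dominates.
assert nonconsecutive >= consecutive
```

For event detection, this lets a filter match trigger-argument patterns whose words are separated by intervening material, which a contiguous window would miss.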

Representation of Linguistic Form and Function in Recurrent Neural Networks

A method is proposed for estimating the contribution of individual input tokens to the networks' final prediction; it shows that the visual pathway pays selective attention to lexical categories and grammatical functions that carry semantic information, and learns to treat word types differently depending on their grammatical function and their position in the sequential structure of the sentence.

Understanding Neural Networks through Representation Erasure

This paper proposes a general methodology to analyze and interpret decisions from a neural model by observing the effects on the model of erasing various parts of the representation, such as input word-vector dimensions, intermediate hidden units, or input words.
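The erasure methodology is easy to state in code: remove one part of the representation at a time and measure how much the model's decision moves. A minimal sketch with a toy linear scorer standing in for the neural model (the scorer, dimensions, and data are assumptions of this illustration, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)
emb_dim, n_words = 8, 5
# Toy "model": a linear scorer over the averaged word vectors.
w = rng.normal(size=emb_dim)
words = rng.normal(size=(n_words, emb_dim))

def score(word_vectors):
    return float(w @ word_vectors.mean(axis=0))

base = score(words)

# Importance of word i = change in the model's score when it is erased.
importance = []
for i in range(n_words):
    kept = np.delete(words, i, axis=0)
    importance.append(base - score(kept))

most_important = int(np.argmax(np.abs(importance)))
```

The same loop applies to erasing word-vector dimensions or hidden units instead of whole words; only the slice being deleted changes.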