DyNet: The Dynamic Neural Network Toolkit
DyNet is a toolkit for implementing neural network models based on dynamic declaration of network structure. It has an optimized C++ backend and a lightweight graph representation, and it is designed to let users implement their models in a way that is idiomatic in their preferred programming language.
A Dependency Parser for Tweets
A new dependency parser for English tweets, TWEEBOPARSER, which builds on several contributions: new syntactic annotations for a corpus of tweets, with conventions informed by the domain; adaptations to a statistical parsing algorithm; and a new approach to exploiting out-of-domain Penn Treebank data.
Episodic Memory in Lifelong Language Learning
This work proposes an episodic memory model that performs sparse experience replay and local adaptation to mitigate catastrophic forgetting in a lifelong language learning setup where a model needs to learn from a stream of text examples without any dataset identifier.
Segmental Recurrent Neural Networks
Experiments on handwriting recognition and joint Chinese word segmentation/POS tagging show that segmental recurrent neural networks obtain substantially higher accuracies compared to models that do not explicitly represent segments.
What Do Recurrent Neural Network Grammars Learn About Syntax?
By training grammars without nonterminal labels, it is found that phrasal representations depend minimally on nonterminals, providing support for the endocentricity hypothesis.
Document Context Language Models
This work presents and empirically evaluates a set of multi-level recurrent neural network language models, called Document-Context Language Models (DCLMs), which incorporate contextual information both within and beyond the sentence.
Segmental Recurrent Neural Networks for End-to-End Speech Recognition
This work discusses practical training and decoding issues, along with a method for speeding up training, in the context of speech recognition. The model is self-contained and can be trained end-to-end.
A Mutual Information Maximization Perspective of Language Representation Learning
We show state-of-the-art word representation learning methods maximize an objective function that is a lower bound on the mutual information between different parts of a word sequence (i.e., a
Random Feature Attention
This work proposes RFA, a linear time and space attention that uses random feature methods to approximate the softmax function, and shows that it is competitive in terms of both accuracy and efficiency on three long text classification datasets.
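The idea of approximating softmax attention with random features can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes trigonometric random Fourier features with Gaussian projections, omits any refinements the paper may add, and all function names are hypothetical. The key and value sequence is summarised once, so the cost per query does not grow with the number of keys.

```python
import math
import random


def random_features(x, projections):
    """Map x to features phi(x) such that phi(x) . phi(y) approximates exp(x . y).

    Assumes each projection vector is drawn from a standard Gaussian.
    """
    scale = math.exp(sum(v * v for v in x) / 2.0) / math.sqrt(len(projections))
    feats = []
    for w in projections:
        dot = sum(wi * xi for wi, xi in zip(w, x))
        feats.append(scale * math.sin(dot))
        feats.append(scale * math.cos(dot))
    return feats


def softmax_attention(query, keys, values):
    """Exact (unscaled) softmax attention for one query, used as a reference."""
    weights = [math.exp(sum(q * k for q, k in zip(query, key))) for key in keys]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]


def rfa_attention(queries, keys, values, projections):
    """Approximate softmax attention using random features.

    Keys and values are folded into two running sums, so each query costs
    O(feature_dim * value_dim) regardless of sequence length.
    """
    feat_dim = 2 * len(projections)
    dim = len(values[0])
    kv_sum = [[0.0] * dim for _ in range(feat_dim)]  # sum over keys of phi(k) v^T
    k_sum = [0.0] * feat_dim                         # sum over keys of phi(k)
    for key, value in zip(keys, values):
        phi_k = random_features(key, projections)
        for i, f in enumerate(phi_k):
            k_sum[i] += f
            for j in range(dim):
                kv_sum[i][j] += f * value[j]
    outputs = []
    for query in queries:
        phi_q = random_features(query, projections)
        denom = sum(f * z for f, z in zip(phi_q, k_sum))
        outputs.append(
            [sum(phi_q[i] * kv_sum[i][j] for i in range(feat_dim)) / denom
             for j in range(dim)]
        )
    return outputs


if __name__ == "__main__":
    rng = random.Random(0)
    d, num_projections = 4, 2000
    projections = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(num_projections)]
    keys = [[rng.uniform(-0.3, 0.3) for _ in range(d)] for _ in range(5)]
    values = [[rng.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(5)]
    query = [rng.uniform(-0.3, 0.3) for _ in range(d)]
    print("approximate:", rfa_attention([query], keys, values, projections)[0])
    print("exact:      ", softmax_attention(query, keys, values))
```

The approximation error shrinks as the number of random projections grows. The sketch keeps the input vectors small in norm, because the variance of trigonometric features increases with the norm of the inputs.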
Learning and Evaluating General Linguistic Intelligence
This work analyzes state-of-the-art natural language understanding models and conducts an extensive empirical investigation to evaluate them against general linguistic intelligence criteria, and proposes a new evaluation metric based on an online encoding of the test data that quantifies how quickly an existing agent (model) learns a new task.
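A metric based on online encoding can be illustrated with a toy example. The sketch below is an assumption-laden stand-in rather than the paper's procedure: the "model" is just a Laplace-smoothed label counter that ignores the inputs, and the function name is hypothetical. Each label is charged -log2 of the probability the model assigned to it before seeing it, and the model is updated afterwards, so a learner that adapts quickly accumulates fewer bits.

```python
import math


def online_codelength(labels, num_classes):
    """Total bits needed to encode labels one at a time with an adapting model.

    Toy assumption: the model is a Laplace-smoothed label frequency table.
    """
    counts = [1] * num_classes  # one pseudo-count per class
    total_bits = 0.0
    for y in labels:
        prob = counts[y] / sum(counts)   # prediction made before seeing y
        total_bits += -math.log2(prob)   # cost of encoding y under that prediction
        counts[y] += 1                   # update the model after encoding
    return total_bits


if __name__ == "__main__":
    print(online_codelength([0, 0, 0, 0], num_classes=2))
    print(online_codelength([0, 1, 0, 1], num_classes=2))
```

Under this toy model the first example always costs log2 of the number of classes, since nothing has been learned yet, and a predictable label stream yields a shorter code than an unpredictable one.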