A Primer on Neural Network Models for Natural Language Processing

@article{Goldberg2016APO,
  title={A Primer on Neural Network Models for Natural Language Processing},
  author={Yoav Goldberg},
  journal={ArXiv},
  year={2016},
  volume={abs/1510.00726}
}
Over the past few years, neural networks have re-emerged as powerful machine-learning models, yielding state-of-the-art results in fields such as image recognition and speech processing. More recently, neural network models started to be applied also to textual natural language signals, again with very promising results. This tutorial surveys neural network models from the perspective of natural language processing research, in an attempt to bring natural-language researchers up to speed with… 

Neural Network Methods for Natural Language Processing

TLDR
This book focuses on the application of neural network models to natural language data, and introduces more specialized neural network architectures, including 1D convolutional neural networks, recurrent neural Networks, conditioned-generation models, and attention-based models.

An analysis of convolutional neural networks for sentence classification

TLDR
A series of experiments with Convolutional Neural Networks for sentence-level classification tasks under different hyperparameter settings is presented, showing how sensitive model performance is to changes in these configurations.
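
For orientation, here is a minimal numpy sketch of the convolution-and-max-pooling pipeline whose hyperparameters (filter width, number of feature maps, activation, pooling) such experiments vary; all sizes and the random initialization below are illustrative assumptions, not settings from the paper.

```python
import numpy as np

# Minimal 1D convolution + max-over-time pooling over word embeddings,
# the basic sentence-classification pipeline whose hyperparameters
# (filter width, feature maps, pooling, regularization) are varied.
rng = np.random.default_rng(0)

vocab_size, embed_dim = 100, 8                 # illustrative sizes
filter_width, n_filters = 3, 4
n_classes = 2

E = rng.normal(size=(vocab_size, embed_dim))              # word embeddings
W = rng.normal(size=(n_filters, filter_width, embed_dim)) # conv filters
b = np.zeros(n_filters)
U = rng.normal(size=(n_filters, n_classes))               # softmax weights

def sentence_logits(word_ids):
    X = E[word_ids]                            # (T, embed_dim)
    T = len(word_ids)
    # slide each filter over all windows of `filter_width` consecutive words
    feats = np.array([
        [np.sum(W[f] * X[t:t + filter_width]) + b[f]
         for t in range(T - filter_width + 1)]
        for f in range(n_filters)
    ])
    pooled = np.tanh(feats).max(axis=1)        # max over time
    return pooled @ U                          # class scores

print(sentence_logits([4, 17, 32, 9, 50]))
```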

Word Based Prediction Using LSTM Neural Networks for Language Modeling

  • Computer Science
  • 2021
TLDR
This work analyzes the Long Short-Term Memory neural network architecture on an English and a large French language modeling task and gains considerable improvements in WER on top of a state-of-the-art speech recognition system.

Natural language generation as neural sequence learning and beyond

TLDR
A deep reinforcement learning framework is proposed to inject prior knowledge into neural NLG models and is applied to sentence simplification; a novel hierarchical recurrent neural network architecture for representing and generating multiple sentences is also proposed.

Empirical Exploration of Novel Architectures and Objectives for Language Models

TLDR
An empirical comparison of LSTM and CNN language models on English broadcast news and various conversational telephone speech transcription tasks is presented, and a novel criterion for training language models that combines word and class prediction in a multi-task learning framework is proposed.

Chapter 1 Analyzing Neural Network Optimization with Gradient Tracing

  • Brian DuSell
  • Computer Science
  • 2018
TLDR
In this chapter, the effect of neural network components on trainability is explored with a graph-traversal technique dubbed “gradient tracing,” which analyzes the extent to which certain components influence parameter updates and facilitate learning.

Natural Language Understanding with Distributed Representation

TLDR
This lecture note introduces readers to a neural-network-based approach to natural language understanding/processing, first covering the basics of machine learning and neural networks and only then introducing how they are applied to natural language.

The Roles of Language Models and Hierarchical Models in Neural Sequence-to-Sequence Prediction

TLDR
It is shown how traditional symbolic statistical machine translation models can still improve neural machine translation while reducing the risk of common pathologies of NMT such as hallucinations and neologisms.

Neural information retrieval: at the end of the early years

TLDR
The successes of neural IR thus far are highlighted, obstacles to its wider adoption are cataloged, and potentially promising directions for future research are suggested.

Survey of Neural Text Representation Models

TLDR
This survey systematizes and analyzes 50 neural models from the last decade, focusing on task-independent representation models, discusses their advantages and drawbacks, and identifies promising directions for future neural text representation models.
...

References

Showing 1-10 of 199 references

LSTM Neural Networks for Language Modeling

TLDR
This work analyzes the Long Short-Term Memory neural network architecture on an English and a large French language modeling task and gains considerable improvements in WER on top of a state-of-the-art speech recognition system.
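
As a reference point for what such a model computes, the following is a minimal numpy sketch of a single LSTM step driving a next-word softmax, as in an LSTM language model; gate biases are omitted and all sizes and initializations are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

# One LSTM step of a language model: the hidden state summarizes the word
# history seen so far and feeds a softmax distribution over the next word.
rng = np.random.default_rng(0)
vocab, embed, hidden = 50, 8, 16

E = rng.normal(scale=0.1, size=(vocab, embed))             # word embeddings
# one weight matrix per gate, acting on [embedding; previous hidden state]
Wi, Wf, Wo, Wc = (rng.normal(scale=0.1, size=(hidden, embed + hidden))
                  for _ in range(4))
V = rng.normal(scale=0.1, size=(vocab, hidden))            # output projection

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(word_id, h, c):
    x = np.concatenate([E[word_id], h])
    i, f, o = sigmoid(Wi @ x), sigmoid(Wf @ x), sigmoid(Wo @ x)
    c = f * c + i * np.tanh(Wc @ x)            # gated cell-state update
    h = o * np.tanh(c)
    logits = V @ h
    probs = np.exp(logits - logits.max())
    return h, c, probs / probs.sum()           # next-word distribution

h = c = np.zeros(hidden)
for w in [3, 14, 7]:                           # toy word-id sequence
    h, c, p_next = lstm_step(w, h, c)
print(p_next.shape)                            # (50,) distribution over vocab
```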

Statistical Language Models Based on Neural Networks

TLDR
Although these models are computationally more expensive than N-gram models, the presented techniques make it possible to apply them to state-of-the-art systems efficiently, achieving the best published performance on the well-known Penn Treebank setup.

Recurrent neural network based language model

TLDR
Results indicate that it is possible to obtain around a 50% reduction in perplexity by using a mixture of several RNN LMs, compared to a state-of-the-art backoff language model.
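
The interpolation behind such a mixture is easy to illustrate: per-word probabilities from several models are combined with fixed weights and perplexity is computed over the text. The toy probabilities and weights below are invented purely to show the arithmetic, not results from the paper.

```python
import numpy as np

# Perplexity of a language model over a text, given the probability it
# assigns to each word: exp of the average negative log-probability.
def perplexity(word_probs):
    word_probs = np.asarray(word_probs)
    return float(np.exp(-np.mean(np.log(word_probs))))

# probability each model assigns to each word of a 5-word test text (toy values)
p_rnn_a   = np.array([0.10, 0.20, 0.05, 0.30, 0.15])
p_rnn_b   = np.array([0.12, 0.15, 0.08, 0.25, 0.20])
p_backoff = np.array([0.05, 0.10, 0.02, 0.20, 0.10])

weights = np.array([0.4, 0.4, 0.2])            # mixture weights, sum to 1
p_mix = (weights[:, None] * np.vstack([p_rnn_a, p_rnn_b, p_backoff])).sum(axis=0)

print(perplexity(p_backoff), perplexity(p_mix))  # the mixture scores lower
```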

A unified architecture for natural language processing: deep neural networks with multitask learning

We describe a single convolutional neural network architecture that, given a sentence, outputs a host of language processing predictions: part-of-speech tags, chunks, named entity tags, semantic

Character-Aware Neural Language Models

TLDR
A simple neural language model that relies only on character-level inputs is able to encode, from characters alone, both semantic and orthographic information, suggesting that for many languages character inputs are sufficient for language modeling.
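
To make the idea concrete, here is a small numpy sketch of one common way to build a word vector from characters alone: convolving character n-gram filters over character embeddings and max-pooling over time. The sizes, the ord-based character lookup, and the single filter width are illustrative assumptions, not the paper's exact architecture (which combines multiple filter widths with a highway layer).

```python
import numpy as np

# Build a word representation from its spelling: embed characters, convolve
# character n-gram filters over them, and keep the max activation per filter.
rng = np.random.default_rng(0)
n_chars, char_dim, width, n_filters = 128, 4, 3, 6

C = rng.normal(size=(n_chars, char_dim))           # character embeddings
F = rng.normal(size=(n_filters, width, char_dim))  # character n-gram filters

def char_word_vector(word):
    X = C[[ord(ch) % n_chars for ch in word]]      # (len(word), char_dim)
    windows = range(len(word) - width + 1)
    acts = np.array([[np.tanh(np.sum(F[f] * X[t:t + width]))
                      for t in windows] for f in range(n_filters)])
    return acts.max(axis=1)                        # one value per filter

print(char_word_vector("unhappily"))               # 6-dimensional word vector
```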

Natural Language Understanding with Distributed Representation

TLDR
This lecture note introduces readers to a neural-network-based approach to natural language understanding/processing, first covering the basics of machine learning and neural networks and only then introducing how they are applied to natural language.

Learning Longer Memory in Recurrent Neural Networks

TLDR
This paper shows that learning longer-term patterns in real data, such as natural language, is possible using gradient descent, through a slight structural modification of the simple recurrent neural network architecture.

A Convolutional Neural Network for Modelling Sentences

TLDR
A convolutional architecture dubbed the Dynamic Convolutional Neural Network (DCNN) is described that is adopted for the semantic modelling of sentences and induces a feature graph over the sentence that is capable of explicitly capturing short and long-range relations.
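
One concrete ingredient of the DCNN is easy to show in isolation: k-max pooling, which keeps the k largest activations per feature dimension in their original order, so sentences of any length map to a fixed-size, order-preserving summary. The sketch below is a plain numpy rendering of that operation; the DCNN's dynamic choice of k per layer is not modeled here.

```python
import numpy as np

# k-max pooling: for each channel, keep the k largest activations while
# preserving their original positions along the time axis.
def k_max_pool(feature_map, k):
    """feature_map: (time, channels) array; returns (k, channels)."""
    T = feature_map.shape[0]
    top = np.argsort(feature_map, axis=0)[T - k:]  # top-k indices per channel
    top = np.sort(top, axis=0)                     # restore word order
    return np.take_along_axis(feature_map, top, axis=0)

x = np.array([[0.1, 3.0],
              [2.0, 0.2],
              [0.5, 1.5],
              [1.0, 0.4]])
print(k_max_pool(x, k=2))
# [[2.  3. ]
#  [1.  1.5]]
```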

On the Properties of Neural Machine Translation: Encoder–Decoder Approaches

TLDR
It is shown that the neural machine translation performs relatively well on short sentences without unknown words, but its performance degrades rapidly as the length of the sentence and the number of unknown words increase.

Joint Language and Translation Modeling with Recurrent Neural Networks

TLDR
This work presents a joint language and translation model based on a recurrent neural network that predicts target words from an unbounded history of both source and target words, showing competitive accuracy compared to traditional channel model features.
...