Corpus ID: 182952667

A Survey on Neural Network Language Models

Kun Jing and Jungang Xu
As the core component of a Natural Language Processing (NLP) system, a Language Model (LM) provides word representations and probability estimates for word sequences. The structure of classic NNLMs is described first, and then some major improvements are introduced and analyzed. Corpora and toolkits for NNLMs are summarized and compared, and some research directions for NNLMs are discussed.
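The "probability indication of word sequences" the abstract refers to is the standard chain-rule factorization P(w_1..w_n) = ∏_i P(w_i | w_1..w_{i-1}). A minimal sketch using a toy count-based bigram model (an illustration of the factorization only, not the neural models the survey covers):

```python
import math
from collections import Counter

# Toy corpus; raw counts stand in for a trained model's estimates.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def bigram_prob(prev, word):
    """P(word | prev) from raw counts (no smoothing; illustration only)."""
    return bigrams[(prev, word)] / unigrams[prev]

def sequence_log_prob(words):
    """Chain rule: log P(w1..wn) = sum_i log P(w_i | w_{i-1})."""
    return sum(math.log(bigram_prob(p, w)) for p, w in zip(words, words[1:]))

print(sequence_log_prob("the cat sat on the mat".split()))
```

An NNLM replaces the count-based estimate with a neural network that conditions on learned word representations, which is what lets it generalize to unseen n-grams.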


Word Embedding Evaluation in Downstream Tasks and Semantic Analogies

The results show that a diverse and comprehensive corpus can often outperform a larger, less textually diverse corpus, and that batch training may cause quality loss in the authors' models.

Survey of Neural Text Representation Models

This survey systematizes and analyzes 50 neural models from the last decade, focusing on task-independent representation models, discusses their advantages and drawbacks, and identifies promising directions for future neural text representation models.

A Study of Pre-trained Language Models in Natural Language Processing

This article is expected to provide a practical guide for learners to understand, use, and develop PLMs, drawing on the abundant literature available for various NLP tasks.

SG-Drop: Faster Skip-Gram by Dropping Context Words

The SG-Drop model is designed to reduce training time efficiently, with a hyperparameter that controls training time; it can train word embeddings faster than reducing training epochs while better preserving their quality.
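The SG-Drop summary above describes dropping context words to cut skip-gram training time. A minimal sketch of the idea, assuming a simple per-context-word drop probability (the parameter name `drop_rate` is hypothetical, not taken from the paper):

```python
import random

def context_pairs(tokens, window=2, drop_rate=0.5, rng=None):
    """Generate (center, context) skip-gram training pairs, randomly
    dropping each context word with probability `drop_rate`.
    Fewer pairs means proportionally less training work per epoch."""
    rng = rng or random.Random(0)
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and rng.random() >= drop_rate:
                yield center, tokens[j]

tokens = "we train word embeddings faster by dropping context".split()
kept = list(context_pairs(tokens, drop_rate=0.5))
full = list(context_pairs(tokens, drop_rate=0.0))
print(len(kept), "of", len(full), "pairs kept")
```

Unlike cutting epochs, this thins every center word's context uniformly, which is one plausible reason quality degrades more gracefully.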

Computational Modeling of Agglutinative Languages: The Challenge for Southern Bantu Languages

This work focuses on the adoption of sub-word models for the Southern Bantu languages, which are agglutinative languages that have words built out of distinctly identifiable sub-parts that carry specific meanings and functions.

Improving the Performance of the LSTM and HMM Models via Hybridization

This work investigates the effectiveness of a combination of the Hidden Markov Model (HMM) with the Long Short-Term Memory (LSTM) model via a process known as hybridization, which is introduced in this paper.

Statistical machine translation outperforms neural machine translation in software engineering: why and how

This work hypothesizes that software engineering (SE) corpora have inherent characteristics that pose challenges for NMT relative to a state-of-the-art translation engine based on Statistical Machine Translation, and implements and optimizes the original SMT and NMT systems to mitigate those challenges.

autoBOT: evolving neuro-symbolic representations for explainable low resource text classification

The proposed autoBOT method offers competitive classification performance on fourteen real-world classification tasks when compared against a competitive autoML approach that evolves ensemble models, as well as state-of-the-art neural language models such as BERT and RoBERTa.

A Survey of the State-of-the-Art Models in Neural Abstractive Text Summarization

The use of pre-trained language models in combination with neural network architectures for the abstractive summarization task is suggested, with a transformer-based encoder-decoder architecture found to be the new state-of-the-art.



Impact of Word Classing on Recurrent Neural Network Language Model

In experiments with a standard test set, it is found that a 5% to 7% relative reduction in perplexity (PPL) can be obtained by the Brown algorithm, compared to the frequency-based word-classing method.
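Word classing factors the output probability as P(w | h) = P(c(w) | h) * P(w | c(w), h), so the model evaluates a softmax over classes plus one class's words instead of the full vocabulary. A toy numeric sketch of the factorization for one fixed history (all probabilities made up for illustration):

```python
# Hypothetical class assignment and probabilities; not from the paper.
word_class = {"cat": "animal", "dog": "animal", "mat": "object"}

# P(class | history) and P(word | class, history) for one fixed history.
p_class = {"animal": 0.6, "object": 0.4}
p_word_given_class = {
    "animal": {"cat": 0.5, "dog": 0.5},
    "object": {"mat": 1.0},
}

def p_word(word):
    """Class-factored probability: P(w | h) = P(c(w) | h) * P(w | c(w), h)."""
    c = word_class[word]
    return p_class[c] * p_word_given_class[c][word]

print(p_word("cat"))  # 0.6 * 0.5 = 0.3
```

The Brown algorithm versus frequency-based classing differ only in how `word_class` is built; the factored probability computation is the same.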

LSTM Neural Networks for Language Modeling

This work analyzes the Long Short-Term Memory neural network architecture on an English and a large French language modeling task and gains considerable improvements in WER on top of a state-of-the-art speech recognition system.

Factored Neural Language Models

A new type of neural probabilistic language model is presented that learns a mapping from both words and explicit word features into a continuous space that is then used for word prediction and significantly reduces perplexity on sparse-data tasks.

Factored Language Model based on Recurrent Neural Network

This study extends RNNLM by explicitly integrating additional linguistic information, including morphological, syntactic, or semantic factors, which are expected to enhance RNNLMs.

Structured Output Layer Neural Network Language Models for Speech Recognition

The Structured OUtput Layer (SOUL) NNLM, a novel neural network language model that relies on word clustering to structure the output vocabulary, is extended to handle arbitrarily sized vocabularies, dispensing with the shortlists commonly used in NNLMs.

Character-Aware Neural Language Models

A simple neural language model that relies only on character-level inputs is able to encode both semantic and orthographic information from characters alone, suggesting that for many languages, character inputs are sufficient for language modeling.
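One way to see why character-level inputs capture orthographic information: morphologically related words share most of their character subunits. A tiny sketch using character n-grams with boundary markers (the paper itself uses a character CNN; this decomposition is an illustrative stand-in for character-only word representations):

```python
def char_ngrams(word, n=3):
    """Character n-grams with boundary markers, e.g. '<ca', 'cat', 'at>'.
    A character-level model builds word representations from such subword
    units instead of a whole-word lookup table."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

print(char_ngrams("cat"))  # ['<ca', 'cat', 'at>']
```

For example, "cat" and "cats" share the n-grams '<ca' and 'cat', so their representations overlap even if one of them never appears in training.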

Character-Word LSTM Language Models

We present a Character-Word Long Short-Term Memory Language Model which both reduces the perplexity with respect to a baseline word-level language model and reduces the number of parameters of the model.

A Neural Knowledge Language Model

A Neural Knowledge Language Model (NKLM) which combines symbolic knowledge provided by a knowledge graph with the RNN language model, and shows that the NKLM significantly improves the perplexity while generating a much smaller number of unknown words.

RNNLM - Recurrent Neural Network Language Modeling Toolkit

We present a freely available open-source toolkit for training recurrent neural network based language models. It can be easily used to improve existing speech recognition and machine translation systems.

Deep Contextualized Word Representations

A new type of deep contextualized word representation is introduced that models both complex characteristics of word use and how these uses vary across linguistic contexts, allowing downstream models to mix different types of semi-supervision signals.