Estimation of Gap Between Current Language Models and Human Performance

@inproceedings{Shen2017EstimationOG,
  title={Estimation of Gap Between Current Language Models and Human Performance},
  author={Xiaoyu Shen and Youssef Oualil and Clayton Greenberg and Mittul Singh and Dietrich Klakow},
  booktitle={INTERSPEECH},
  year={2017}
}
Language models (LMs) have improved dramatically in recent years due to the wide application of neural networks. This raises the question of how far we are from the perfect language model and how much more research is needed in language modelling. As for perplexity, giving a value for human perplexity (as an upper bound of what is reasonably expected from an LM) is difficult. Word error rate (WER) has the disadvantage that it also measures the quality of other components of a speech… 
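The two evaluation measures discussed in the abstract can be made concrete with a minimal sketch, not taken from the paper: perplexity is the exponential of the average per-token negative log-probability assigned by the LM, and WER is the word-level Levenshtein distance normalized by the reference length. The token probabilities and sentences below are invented purely for illustration.

import math

def perplexity(token_probs):
    """Perplexity = exp of the average negative log-probability per token."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

def word_error_rate(reference, hypothesis):
    """WER = (substitutions + insertions + deletions) / reference length,
    computed via standard Levenshtein alignment over words."""
    r, h = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i
    for j in range(len(h) + 1):
        dp[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution or match
    return dp[len(r)][len(h)] / len(r)

# Invented example values: perplexity of ~6.7 for the 5-token probabilities,
# and WER of 1/6 ~ 0.17 for one substitution ("the" -> "a") over 6 reference words.
print(perplexity([0.2, 0.1, 0.3, 0.05, 0.25]))
print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))

The sketch also illustrates the contrast the abstract draws: perplexity depends only on the LM's probabilities, while WER is computed on the output of a full recognition pipeline and therefore reflects the quality of its other components as well.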

Citations of this paper

Improving Language Model Performance with Smarter Vocabularies
TLDR
This article will explore using part-of-speech (POS) tagging to identify word types and then use this information to create a “smarter” vocabulary that achieves a lower perplexity score, for a given epoch, than a similar model using a top-N type vocabulary.
Generating Sentiment-Preserving Fake Online Reviews Using Neural Language Models and Their Human- and Machine-based Detection
TLDR
It is shown that a low-skilled threat model can be built just by combining publicly available LMs, and that the produced fake reviews can fool both humans and machines.
Closing Brackets with Recurrent Neural Networks
TLDR
This work investigates whether recurrent neural networks are capable of learning the rules of opening and closing brackets by applying them to synthetic Dyck languages that consist of different types of brackets, and provides an analysis of the statistical properties of these languages as a baseline.
Extractive Summary as Discrete Latent Variables
TLDR
It is found that extracting tokens as latent variables significantly outperforms the state-of-the-art discrete latent variable models such as VQ-VAE and is speculated that this extraction process may be useful for unsupervised hierarchical text generation.
Diversifying Dialogue Generation with Non-Conversational Text
TLDR
This paper collects a large-scale non-conversational corpus from multiple sources including forum comments, idioms and book snippets, and presents a training paradigm to effectively incorporate this text via iterative back-translation.
Unsupervised Pidgin Text Generation By Pivoting English Data and Self-Training
TLDR
This work develops techniques targeted at bridging the gap between Pidgin English and English in the context of natural language generation, and first trains a data-to-English text generation system, before employing techniques from unsupervised neural machine translation and self-training to establish the Pidgin-to-English cross-lingual alignment.
Dependency Learning for Legal Judgment Prediction with a Unified Text-to-Text Transformer
TLDR
This work proposes leveraging a unified text-to-text Transformer for LJP, where the dependencies among sub-tasks can be naturally established within the auto-regressive decoder, and shows that this unified transformer, albeit pretrained on general-domain text, outperforms pretrained models tailored specifically for the legal domain.
...

References

Showing 1-10 of 33 references
English Conversational Telephone Speech Recognition by Humans and Machines
TLDR
An independent set of human performance measurements on two conversational tasks is performed, and it is found that human performance may be considerably better than earlier reported, giving the community a significantly harder goal to achieve.
One billion word benchmark for measuring progress in statistical language modeling
TLDR
A new benchmark corpus with almost one billion words of training data is proposed for measuring progress in statistical language modeling; it is useful for quickly evaluating novel language modeling techniques and for comparing their contribution when combined with other advanced techniques.
Towards improved language model evaluation measures
TLDR
New measures of language model quality are described that retain the ease of computation and task independence that are perplexity’s strengths, yet are considerably better correlated with word error rate.
Achieving Human Parity in Conversational Speech Recognition
TLDR
The human error rate on the widely used NIST 2000 test set is measured, and the latest automated speech recognition system is found to reach human parity, establishing a new state of the art and edging past the human benchmark.
A Neural Probabilistic Language Model
TLDR
This work proposes to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences.
Recurrent neural network based language model
TLDR
Results indicate that it is possible to obtain around a 50% reduction in perplexity by using a mixture of several RNN LMs, compared to a state-of-the-art backoff language model.
Exploring the Limits of Language Modeling
TLDR
This work explores recent advances in Recurrent Neural Networks for large-scale Language Modeling, and extends current models to deal with two key challenges present in this task: corpora and vocabulary sizes, and the complex, long-term structure of language.
Testing the correlation of word error rate and perplexity
Strategies for training large scale neural network language models
TLDR
This work describes how to effectively train neural network based language models on large data sets and introduces a hash-based implementation of a maximum entropy model that can be trained as part of the neural network model.
Generating Sentences from a Continuous Space
TLDR
This work introduces and studies an RNN-based variational autoencoder generative model that incorporates distributed latent representations of entire sentences, allowing it to explicitly model holistic properties of sentences such as style, topic, and high-level syntactic features.
...