Syllable-aware Neural Language Models: A Failure to Beat Character-aware Ones

@inproceedings{assylbekov-etal-2017-syllable,
  title={Syllable-aware Neural Language Models: A Failure to Beat Character-aware Ones},
  author={Zhenisbek Assylbekov and Rustem Takhanov and Bagdat Myrzakhmetov and Jonathan North Washington},
  booktitle={Conference on Empirical Methods in Natural Language Processing},
  year={2017}
}
Syllabification does not seem to improve word-level RNN language modeling quality when compared to character-based segmentation. However, our best syllable-aware language model, achieving performance comparable to the competitive character-aware model, has 18%-33% fewer parameters and is trained 1.2-2.2 times faster. 
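The comparison above rests on how a word is segmented before the subword units are composed into a word representation. A minimal sketch of the two segmentations, using a deliberately naive vowel-group heuristic as a stand-in syllabifier (the paper's actual syllabification tools are not reproduced here):

```python
def char_segments(word):
    # character-based segmentation: one symbol per character
    return list(word)

def naive_syllables(word):
    # Toy heuristic, NOT a real syllabifier: close a syllable after
    # each vowel and attach trailing consonants to the last syllable.
    vowels = set("aeiou")
    syllables, current = [], ""
    for ch in word:
        current += ch
        if ch in vowels:
            syllables.append(current)
            current = ""
    if current:
        if syllables:
            syllables[-1] += current
        else:
            syllables = [current]
    return syllables

print(char_segments("model"))    # ['m', 'o', 'd', 'e', 'l']
print(naive_syllables("model"))  # ['mo', 'del']
```

A word yields fewer syllables than characters, which is why a syllable-aware model can get away with shorter subword sequences and fewer parameters, at the cost of a larger subword vocabulary.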


Revisiting Neural Language Modelling with Syllables

With comparable perplexity, syllables are shown to outperform characters, annotated morphemes, and unsupervised subwords, and the overlap of syllables with the other subword pieces is analysed.

Reusing Weights in Subword-Aware Neural Language Models

The best morpheme-aware model with properly reused weights beats the competitive word-level model by a large margin across multiple languages and has 20%-87% fewer parameters.

Revisiting Syllables in Language Modelling and Their Application on Low-Resource Machine Translation

In pairwise and multilingual systems, syllables outperform unsupervised subwords and other morphological segmentation methods when translating into a highly synthetic language with a transparent orthography (Shipibo-Konibo).

Learning Syllables Using Conv-LSTM Model for Swahili Word Representation and Part-of-speech Tagging

The study contributes to the processing of agglutinative, syllable-based languages with quality word embeddings built from syllables and a robust Conv-LSTM model that learns syllables, applicable not only to language modeling and POS tagging but also to other downstream NLP tasks.

Better Word Representation Vectors Using Syllabic Alphabet: A Case Study of Swahili

Inspired by how Swahili is taught in beginner classes, syllables are encoded instead of characters, character n-grams, or morphemes of words, and quality word embeddings are generated using a convolutional neural network.

Major–Minor Long Short-Term Memory for Word-Level Language Model

The LM with MMLSTMs surpasses the existing state-of-the-art model on the Penn Treebank and WikiText-2 data sets and outperforms the baseline by 3.3 points in perplexity on the WikiText-103 data set without increasing model parameter counts.

Subword-level Word Vector Representations for Korean

This paper decomposes Korean words to the jamo level, beyond the character level, allowing a systematic use of subword information, and shows that the simple method outperforms word2vec and character-level Skip-Grams on semantic and syntactic similarity and analogy tasks and contributes positively toward downstream NLP tasks such as sentiment analysis.
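Jamo-level decomposition of the kind described above can be done purely arithmetically, because precomposed Hangul syllables (U+AC00 to U+D7A3) are laid out as initial x medial x final in the Unicode standard. A minimal sketch (compatibility-jamo tables; this is the standard Unicode decomposition, not necessarily the paper's exact preprocessing):

```python
# Compatibility jamo for the three positions of a Hangul syllable block
CHOSEONG = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")              # 19 initials
JUNGSEONG = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")        # 21 medials
JONGSEONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28 finals (incl. none)

def to_jamo(syllable):
    # Decompose one precomposed Hangul syllable into (initial, medial, final)
    code = ord(syllable) - 0xAC00
    assert 0 <= code < 11172, "not a precomposed Hangul syllable"
    initial, rest = divmod(code, 21 * 28)
    medial, final = divmod(rest, 28)
    return CHOSEONG[initial], JUNGSEONG[medial], JONGSEONG[final]

print(to_jamo("한"))  # ('ㅎ', 'ㅏ', 'ㄴ')
```

Feeding these jamo sequences, rather than whole characters, to a subword composition model is what gives the method its finer-grained view of Korean morphology.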

Revisiting CNN for Highly Inflected Bengali and Hindi Language Modeling

This is the first study on the effectiveness of different architectures drawn from three deep learning paradigms (Convolutional, Recurrent, and Transformer neural nets) for modeling two widely used languages, Bengali and Hindi.

CNN for Modeling Sanskrit Originated Bengali and Hindi Language

In this first study of architectures from the Convolutional, Recurrent, and Transformer neural net paradigms for modeling Bengali and Hindi, the proposed model outperforms pretrained BERT with 16x fewer parameters and achieves much better performance than SOTA LSTMs on multiple real-world datasets.

Topics in Natural Language Processing Japanese Morphological Analysis

The various methods that have been proposed are introduced, information on Japanese corpora and dictionaries for NLP research is collected, several morphological analysers are evaluated on the Japanese lemmatisation task, and future directions based on recurrent neural network language modelling are proposed.

Character-Word LSTM Language Models

We present a Character-Word Long Short-Term Memory Language Model which both reduces the perplexity with respect to a baseline word-level language model and reduces the number of parameters of the model.


A simple technique for learning sub-word level units from data is proposed, and it is shown that neural network based models can be an order of magnitude smaller than compressed n-gram models at the same level of performance when applied to a Broadcast News RT04 speech recognition task.

Character-Aware Neural Language Models

A simple neural language model that relies only on character-level inputs is able to encode, from characters alone, both semantic and orthographic information, suggesting that for many languages character inputs are sufficient for language modeling.

Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation

A model for constructing vector representations of words by composing characters using bidirectional LSTMs that requires only a single vector per character type and a fixed set of parameters for the compositional model, which yields state-of-the-art results in language modeling and part-of-speech tagging.

From Characters to Words to in Between: Do We Capture Morphology?

None of the character-level models match the predictive accuracy of a model with access to true morphological analyses, even when learned from an order of magnitude more data.

Strategies for Training Large Vocabulary Neural Language Models

A systematic comparison of strategies to represent and train large vocabularies, including softmax, hierarchical softmax, target sampling, noise contrastive estimation, and self-normalization; self-normalization is extended to be a proper estimator of likelihood, and an efficient variant of softmax is introduced.

Compositional Morphology for Word Representations and Language Modelling

This paper presents a scalable method for integrating compositional morphological representations into a vector-based probabilistic language model; intrinsic and extrinsic evaluations across a range of languages demonstrate that the learned morphological representations both perform well on word similarity tasks and lead to substantial reductions in perplexity.

Co-learning of Word Representations and Morpheme Representations

This paper introduces morphological knowledge as both an additional input representation and auxiliary supervision for the neural network framework, producing morpheme representations that can be further employed to infer the representations of rare or unknown words from their morphological structure.

Efficient Estimation of Word Representations in Vector Space

Two novel model architectures for computing continuous vector representations of words from very large data sets are proposed and it is shown that these vectors provide state-of-the-art performance on the authors' test set for measuring syntactic and semantic word similarities.

Neural Machine Translation of Rare Words with Subword Units

This paper introduces a simpler and more effective approach, making the NMT model capable of open-vocabulary translation by encoding rare and unknown words as sequences of subword units, and empirically shows that subword models improve over a back-off dictionary baseline for the WMT 15 translation tasks English-German and English-Russian by 1.3 BLEU.