Learning to Capitalize with Character-Level Recurrent Neural Networks: An Empirical Study

@inproceedings{susanto2016capitalize,
  title={Learning to Capitalize with Character-Level Recurrent Neural Networks: An Empirical Study},
  author={Raymond Hendy Susanto and Hai Leong Chieu and Wei Lu}
}
In this paper, we investigate case restoration for text without case information. Previous such work operates at the word level. We propose an approach using character-level recurrent neural networks (RNNs), which performs competitively compared to language modeling and conditional random field (CRF) approaches. We further provide quantitative and qualitative analysis of how the RNN helps improve truecasing.
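The abstract frames truecasing as a character-level prediction problem. A minimal sketch of that framing (an assumption for illustration, not the paper's actual code): each character of the lowercased input receives a binary label, U (uppercase) or L (keep lowercase), and the cased output is reconstructed by applying the predicted labels. The function names below are hypothetical.

```python
# Truecasing as character-level sequence labeling: a model predicts one
# label per input character; here we only show the label encoding and
# the decoding step that restores case from predicted labels.

def labels_from_pair(cased: str) -> list[str]:
    """Derive gold per-character labels from a cased reference string."""
    return ["U" if c.isupper() else "L" for c in cased]

def apply_labels(lowercased: str, labels: list[str]) -> str:
    """Restore case by applying per-character labels to lowercased input."""
    return "".join(c.upper() if lab == "U" else c
                   for c, lab in zip(lowercased, labels))

reference = "Paris is in France"
gold = labels_from_pair(reference)
restored = apply_labels(reference.lower(), gold)
assert restored == reference
```

In the paper's setting, a character-level RNN would produce the label sequence; the encoding and decoding around it stay exactly this simple, which is part of the appeal of working at the character level.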


Position-Invariant Truecasing with a Word-and-Character Hierarchical Recurrent Neural Network

A fast, accurate and compact two-level hierarchical word-and-character-based recurrent neural network model is proposed, the first of its kind for truecasing, which improves the performance of downstream NLP tasks such as named entity recognition and language modeling.

Capitalization Normalization for Language Modeling with an Accurate and Efficient Hierarchical RNN Model

This work uses the truecaser to normalize user-generated text in a Federated Learning framework for language modeling, and demonstrates that the improvement translates to reduced prediction error rates in a virtual keyboard application.

Reproducing "NER and POS When Nothing Is Capitalized"

It is shown that lowercasing 50% of the dataset provides the best performance, matching the claims of the original paper and suggesting that there might be some hidden factors impacting performance.

Capitalization Feature and Learning Rate for Improving NER Based on RNN BiLSTM-CRF

This experiment uses a deep learning approach based on a Recurrent Neural Network Bidirectional Long Short-Term Memory Conditional Random Field (RNN-BiLSTM-CRF) and compares three optimization algorithms: Stochastic Gradient Descent (SGD), Adaptive Moment Estimation (Adam), and Adadelta, on the CoNLL2003 dataset.

An Efficient Architecture for Predicting the Case of Characters using Sequence Models

This paper attempts to solve the problem by restoring the correct case of characters, commonly known as truecasing, using a combination of convolutional neural networks, bi-directional long short-term memory networks, and conditional random fields, which works at the character level without any explicit feature engineering.

Do Character-Level Neural Network Language Models Capture Knowledge of Multiword Expression Compositionality?

Experimental results on two kinds of MWEs and two languages suggest that character-level neural network language models capture knowledge of multiword expression compositionality, in particular for English noun compounds and the particle component of English verb-particle constructions.

NER and POS When Nothing Is Capitalized

This work shows that the most effective strategy is a concatenation of cased and lowercased training data, producing a single model with high performance on both cased and uncased text, and this result holds across tasks and input representations.

Case-Sensitive Neural Machine Translation

Two types of case-sensitive neural machine translation (NMT) approaches are introduced to alleviate the above problems: i) adding case tokens into the decoding sequence, and ii) adopting case prediction to the conventional NMT.

Truecasing German user-generated conversational text

It is shown that while RNNs reach higher accuracy, especially on large datasets, character n-gram models with interpolation are still competitive, in particular on mixed-case words where their fall-back mechanisms come into play.

Toward Human-Friendly ASR Systems: Recovering Capitalization and Punctuation for Vietnamese Text

This paper proposes to combine the transformer decoder with a conditional random field (CRF) to restore punctuation and capitalization for Vietnamese automatic speech recognition (ASR) output by chunking input sentences and merging output sequences, and shows that the proposed method delivers the best results.

Character-Aware Neural Language Models

A simple neural language model that relies only on character-level inputs is able to encode, from characters only, both semantic and orthographic information, suggesting that on many languages character inputs are sufficient for language modeling.

Capitalizing Machine Translation

A probabilistic bilingual capitalization model for capitalizing machine translation outputs using conditional random fields significantly outperforms a strong monolingual capitalization model baseline, especially when working with small datasets and/or European language pairs.

Visualizing and Understanding Recurrent Networks

This work uses character-level language models as an interpretable testbed to provide an analysis of LSTM representations, predictions and error types, and reveals the existence of interpretable cells that keep track of long-range dependencies such as line lengths, quotes and brackets.

Recurrent neural network based language model

Results indicate that it is possible to obtain around a 50% reduction in perplexity by using a mixture of several RNN LMs, compared to a state-of-the-art backoff language model.

Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation

Qualitatively, the proposed RNN Encoder‐Decoder model learns a semantically and syntactically meaningful representation of linguistic phrases.

Learning long-term dependencies with gradient descent is difficult

This work shows why gradient based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases, and exposes a trade-off between efficient learning by gradient descent and latching on information for long periods.

ResToRinG CaPitaLiZaTion in #TweeTs

A statistical truecaser for tweets is built using a 3-gram language model trained on truecased newswire texts and tweets; the truecasing method shows improvements in named entity recognition and part-of-speech tagging tasks.

Effective Approaches to Attention-based Neural Machine Translation

A global approach which always attends to all source words and a local one that only looks at a subset of source words at a time are examined, demonstrating the effectiveness of both approaches on the WMT translation tasks between English and German in both directions.

Weak Semi-Markov CRFs for Noun Phrase Chunking in Informal Text

This paper introduces a new annotated corpus based on an existing informal text corpus: the NUS SMS Corpus, and explores several graphical models, including a novel variant of the semi-Markov conditional random fields (semi-CRF) for the task of noun phrase chunking.

Adaptation of Maximum Entropy Capitalizer: Little Data Can Help a Lot