• Corpus ID: 9506393

Neural Network Language Model for Chinese Pinyin Input Method Engine

@inproceedings{Chen2015NeuralNL,
  title={Neural Network Language Model for Chinese Pinyin Input Method Engine},
  author={Shenyuan Chen and Hai Zhao and Rui Wang},
  booktitle={PACLIC},
  year={2015}
}
Neural network language models (NNLMs) have been shown to outperform traditional n-gram language models, but their computational cost makes them hard to use directly in decoding. In this paper, an efficient solution is proposed: the NNLM is converted into a back-off n-gram language model, and the converted model is integrated into a pinyin IME. Experimental results show that the proposed method gives better predictive performance for pinyin IME decoding with satisfactory efficiency.
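As a rough illustration of the conversion idea (not the authors' exact procedure), the sketch below probes a trained NNLM on the n-grams observed in a tokenized corpus and writes them out in the ARPA back-off format that conventional n-gram decoders consume. It is a minimal Python sketch: nnlm_prob is a hypothetical stand-in for the network's conditional probability, and the constant back-off weight is a placeholder, since real converters re-estimate back-off weights so that each history's probability mass sums to one.

import math
from collections import defaultdict

def collect_ngrams(corpus, order=3):
    # Gather every n-gram up to the target order from a tokenized corpus.
    ngrams = set()
    for sent in corpus:
        tokens = ["<s>"] + sent + ["</s>"]
        for n in range(1, order + 1):
            for i in range(len(tokens) - n + 1):
                ngrams.add(tuple(tokens[i:i + n]))
    return ngrams

def convert_nnlm_to_arpa(nnlm_prob, corpus, order=3, backoff=-0.5):
    # Score each collected n-gram with the NNLM and emit ARPA-format lines.
    by_order = defaultdict(list)
    for ng in collect_ngrams(corpus, order):
        logp = math.log10(nnlm_prob(ng[:-1], ng[-1]))
        by_order[len(ng)].append((ng, logp))
    lines = ["\\data\\"]
    for n in range(1, order + 1):
        lines.append("ngram %d=%d" % (n, len(by_order[n])))
    for n in range(1, order + 1):
        lines.append("\\%d-grams:" % n)
        for ng, logp in sorted(by_order[n]):
            entry = "%.4f\t%s" % (logp, " ".join(ng))
            if n < order:
                entry += "\t%.4f" % backoff  # placeholder back-off weight
            lines.append(entry)
    lines.append("\\end\\")
    return "\n".join(lines)

The resulting text can be loaded by any ARPA-capable decoder, which is what makes the converted model usable in the IME's first-pass search.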

Citations

Open Vocabulary Learning for Neural Chinese Pinyin IME
TLDR
A neural P2C conversion model is proposed, augmented by an online-updated vocabulary with a sampling mechanism to support open vocabulary learning while the IME is in use; it outperforms commercial IMEs and state-of-the-art traditional models on a standard corpus and a true input-history dataset across multiple metrics.
Neural-based Pinyin-to-Character Conversion with Adaptive Vocabulary
TLDR
A neural P2C conversion model augmented by a large online-updating vocabulary with a target-vocabulary sampling mechanism is proposed; it reduces decoding time on CPUs by up to 50% on pinyin-to-character tasks with no or only negligible change in conversion accuracy.
Pinyin-to-Chinese conversion on sentence-level for domain-specific applications using self-attention model
TLDR
A neural self-attention model is proposed for Pinyin Sequence to Chinese Sequence (PS2CS) conversion, which directly infers the entire Chinese sequence by feeding the unsegmented pinyin character sequence into the model.
KNPTC: Knowledge and Neural Machine Translation Powered Chinese Pinyin Typo Correction
TLDR
KNPTC is applied to correct typos in real-life datasets, achieving a 32.77% average improvement in typo-correction accuracy compared against the state-of-the-art system.
Neural or Statistical: An Empirical Study on Language Models for Chinese Input Recommendation on Mobile
TLDR
An extensive empirical study is conducted to show the differences between statistical and neural language models; the two approaches have their own individual advantages, and a hybrid approach brings a significant improvement.
Enabling Real-time Neural IME with Incremental Vocabulary Selection
TLDR
This work identifies the bottleneck of neural IME decoding as the heavy softmax computation over a large vocabulary and proposes an approach that incrementally builds a subset vocabulary from the word lattice, which is potentially applicable to other incremental sequence-to-sequence decoding tasks such as real-time continuous speech recognition.
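To make the lattice-driven idea concrete, a minimal numpy sketch of a softmax restricted to the candidate words that a pinyin word lattice proposes might look like this; the function and parameter names are illustrative assumptions, not taken from the paper, and the output layer is assumed to be a plain linear projection.

import numpy as np

def lattice_softmax(hidden, out_weights, out_bias, lattice_word_ids):
    # Touch only the output rows for words present in the decoder's word
    # lattice instead of the full |V| x d projection; skipping the rest
    # of the vocabulary is where the speedup comes from.
    ids = np.asarray(sorted(lattice_word_ids))
    logits = out_weights[ids] @ hidden + out_bias[ids]
    logits -= logits.max()      # shift for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return dict(zip(ids.tolist(), probs.tolist()))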
Chinese Pinyin Aided IME, Input What You Have Not Keystroked Yet
TLDR
The proposed neural P2C model encodes the previous input utterance as extra context, enabling an IME to predict a character sequence from incomplete pinyin input; this is presented as the first engineering practice of building a Chinese aided IME.
Moon IME: Neural-based Chinese Pinyin Aided Input Method with Customizable Association
TLDR
Moon IME is presented, a pinyin IME that integrates an attention-based neural machine translation (NMT) model with information retrieval (IR) to offer an entertaining and customizable association ability.
Real-time Neural-based Input Method
TLDR
This work applies an LSTM-based language model to an input method and evaluates its performance on both prediction and conversion tasks with the Japanese BCCWJ corpus; it also proposes an incremental softmax approximation approach, which computes the softmax over a selected subset vocabulary and fixes the stale probabilities when the vocabulary is updated in later steps.
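One way such an incremental approximation could be organized, assuming cached per-word logits and a running normalizer (the class and its methods are hypothetical, not the paper's API):

import numpy as np

class IncrementalSoftmax:
    # Cache exp(logit) per candidate word and let the normalizer grow with
    # the subset vocabulary, so probabilities handed out earlier can be
    # refreshed by one division instead of being recomputed from scratch.
    def __init__(self, hidden, out_weights, out_bias):
        self.hidden = hidden
        self.out_weights = out_weights
        self.out_bias = out_bias
        self.exp_logits = {}   # word id -> exp(logit), computed once
        self.z = 0.0           # running normalizer over the current subset

    def extend(self, new_word_ids):
        # Score only the words newly admitted to the subset vocabulary.
        for wid in new_word_ids:
            if wid not in self.exp_logits:
                e = float(np.exp(self.out_weights[wid] @ self.hidden
                                 + self.out_bias[wid]))
                self.exp_logits[wid] = e
                self.z += e

    def prob(self, word_id):
        # Dividing by the current normalizer fixes probabilities that went
        # stale while the subset was smaller.
        return self.exp_logits[word_id] / self.z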
Moto: Enhancing Embedding with Multiple Joint Factors for Chinese Text Classification
TLDR
This work proposes a novel model, Moto (Enhancing Embedding with Multiple Joint Factors), designs an attention mechanism to distill the useful parts by fusing four levels of information more effectively, and conducts extensive experiments on four popular tasks.

References

SHOWING 1-10 OF 30 REFERENCES
Neural Network Based Bilingual Language Model Growing for Statistical Machine Translation
TLDR
The results show that the proposed neural network based bilingual LM growing method can improve both the perplexity score for LM evaluation and the BLEU score for SMT, and significantly outperforms existing LM growing methods without requiring an extra corpus.
Converting Continuous-Space Language Models into N-Gram Language Models for Statistical Machine Translation
TLDR
This work proposes a method for converting CSLMs into back-off n-gram language models (BNLMs) so that they can be used in first-pass decoding for SMT, and shows that the converted models outperform the original BNLMs and are comparable with the traditional use of CSLMs in reranking.
Converting Neural Network Language Models into Back-off Language Models for Efficient Decoding in Automatic Speech Recognition
TLDR
An approximate method is proposed for converting a feedforward NNLM into a back-off n-gram language model that can be used directly in existing LVCSR decoders; the method can be applied to any type of non-back-off language model to enable efficient decoding.
Training Continuous Space Language Models: Some Practical Issues
TLDR
This work studies the performance and behavior of two neural statistical language models so as to highlight some important caveats of the classical training algorithms, and introduces a new initialization scheme and new training techniques to greatly reduce the training time and to significantly improve performance.
Fast and Robust Neural Network Joint Models for Statistical Machine Translation
TLDR
A novel formulation for a neural network joint model (NNJM) augments the NNLM with a source context window; the model is purely lexicalized and can be integrated into any MT decoder.
A Joint Graph Model for Pinyin-to-Chinese Conversion with Typo Correction
TLDR
A joint graph model is proposed to globally optimize pinyin-to-character conversion and typo correction for IME; evaluation results show that the proposed method outperforms both existing academic and commercial IMEs.
Decoding with Large-Scale Neural Language Models Improves Translation
TLDR
This work develops a new model that combines the neural probabilistic language model of Bengio et al., rectified linear units, and noise-contrastive estimation, and incorporates it into a machine translation system both by reranking k-best lists and by direct integration into the decoder.
Bilingual Continuous-Space Language Model Growing for Statistical Machine Translation
TLDR
A novel neural network based bilingual LM growing method enables the use of a bilingual parallel corpus for LM growing in SMT; the new method significantly outperforms existing approaches in both SMT performance and computational efficiency.
A New Statistical Approach To Chinese Pinyin Input
TLDR
This approach uses a trigram-based language model and statistically based segmentation to deal with real input; it includes a typing model that enables spelling correction in sentence-based pinyin input and a spelling model for English that enables modeless pinyin input.
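The sentence-based conversion behind this line of work is essentially a Viterbi search over the candidate characters of each syllable. A minimal bigram version is sketched below (a trigram model, as in the paper, extends the state to character pairs); candidates and bigram_logp are stand-ins for the dictionary and the smoothed language model.

def pinyin_to_chars(syllables, candidates, bigram_logp):
    # best maps a path-final character to (log prob, best path ending there).
    best = {"<s>": (0.0, ["<s>"])}
    for syl in syllables:
        nxt = {}
        for cur in candidates(syl):
            # Keep only the highest-scoring way to reach cur at this step.
            score, path = max(
                (p + bigram_logp(prev, cur), path)
                for prev, (p, path) in best.items()
            )
            nxt[cur] = (score, path + [cur])
        best = nxt
    score, path = max(best.values())
    return path[1:], score   # drop the <s> start marker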
A Machine Translation Approach for Chinese Whole-Sentence Pinyin-to-Character Conversion
TLDR
This paper introduces a new approach to the Chinese pinyin-to-character (PTC) conversion problem that effectively combines features of the continuous source sequence and the target sequence.