Corpus ID: 11512678

A Convolutional Architecture for Word Sequence Prediction

@inproceedings{Wang2015ACA,
  title={A Convolutional Architecture for Word Sequence Prediction},
  author={Mingxuan Wang and Zhengdong Lu and Hang Li and Wenbin Jiang and Qun Liu},
  year={2015}
}
We propose a convolutional neural network, named genCNN, for word sequence prediction. Different from previous work on neural network-based language modeling and generation (e.g., RNN or LSTM), we choose not to greedily summarize the history of words as a fixed-length vector. Instead, we use a convolutional neural network to predict the next word with the history of words of variable length. Also different from the existing feedforward networks for language modeling, our model can effectively… 
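To make the idea above concrete, here is a minimal sketch, assuming plain NumPy, of predicting the next word by convolving over a variable-length word history rather than compressing it into a fixed-length state. It is only an illustration of the general approach, not the authors' genCNN architecture; all layer sizes, variable names, and the max-over-time pooling are assumptions made for the example.

# Illustrative sketch only: a CNN next-word predictor over a variable-length
# history; sizes and names are hypothetical, not from the paper.
import numpy as np

rng = np.random.default_rng(0)

VOCAB, EMB, FILTERS, WIDTH = 1000, 32, 64, 3     # assumed sizes

E = rng.normal(0, 0.1, (VOCAB, EMB))             # word embeddings
W = rng.normal(0, 0.1, (FILTERS, WIDTH * EMB))   # convolution filters over word windows
b = np.zeros(FILTERS)
U = rng.normal(0, 0.1, (VOCAB, FILTERS))         # output (softmax) weights

def predict_next(history):
    """history: list of word ids of arbitrary length (>= WIDTH)."""
    x = E[history]                                            # (T, EMB)
    # slide filters of width WIDTH over the whole variable-length history
    windows = np.stack([x[t:t + WIDTH].ravel()
                        for t in range(len(history) - WIDTH + 1)])
    h = np.maximum(windows @ W.T + b, 0.0)                    # ReLU feature maps
    pooled = h.max(axis=0)                                    # max-pool over time -> fixed-size summary
    logits = U @ pooled
    p = np.exp(logits - logits.max())
    return p / p.sum()                                        # distribution over the next word

probs = predict_next([5, 42, 7, 19, 3])
print(int(probs.argmax()))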

Citations

An Empirical Study of Language CNN for Image Captioning
TLDR
This paper introduces a language CNN model suited to statistical language modeling tasks and shows that it is competitive with state-of-the-art methods in image captioning.
Recurrent neural network language models for automatic speech recognition
TLDR
A novel pre-training method for RNNLMs is presented, in which the output weights of a feed-forward neural network language model (NNLM) are shared with the RNNLM, and the context of RNNLMs is enhanced using prosody and syntactic features.
Recurrent Highway Networks with Language CNN for Image Captioning
TLDR
This model can naturally exploit the hierarchical and temporal structure of history words, which is critical for image caption generation, and is competitive with state-of-the-art methods.
Character-Level Quantum Mechanical Approach for a Neural Language Model
TLDR
A character-level neural language model (NLM) based on quantum theory, which integrates a convolutional neural network based on network-in-network (NIN) with a quantum semantic space model.
Learning Semantically Coherent and Reusable Kernels in Convolution Neural Nets for Sentence Classification
TLDR
Experimental results on several benchmark datasets show the usefulness of the core ideas of learning semantically coherent kernels and leveraging reusable kernels for efficient learning, achieving performance close to state-of-the-art methods while retaining semantic and reusability properties.
Locally Smoothed Neural Networks
TLDR
A novel locally smoothed neural network (LSNN) is proposed, which represents the weight matrix of the locally connected layer as the product of a kernel and a smoother, where the kernel is shared over different local receptive fields and the smoother determines the importance and relations of the different local receptive fields.
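As a rough illustration of the kernel-times-smoother construction described in the summary above, the hedged sketch below builds each local receptive field's weights from one shared kernel scaled by a per-field smoother value; the shapes, names, and the simple scalar smoother are assumptions for the example, not the paper's exact formulation.

# Hedged sketch: shared kernel, per-field smoother; sizes are hypothetical.
import numpy as np

rng = np.random.default_rng(1)

FIELDS, FIELD_SIZE, OUT = 4, 5, 3                     # assumed sizes
kernel = rng.normal(0, 0.1, (OUT, FIELD_SIZE))        # shared across receptive fields
smoother = np.abs(rng.normal(1.0, 0.2, FIELDS))       # importance of each field

def locally_smoothed_layer(x):
    """x: input split into FIELDS local receptive fields of FIELD_SIZE each."""
    fields = x.reshape(FIELDS, FIELD_SIZE)
    # weights for field i are the shared kernel scaled by that field's smoother value
    return np.concatenate([smoother[i] * (kernel @ fields[i]) for i in range(FIELDS)])

out = locally_smoothed_layer(rng.normal(size=FIELDS * FIELD_SIZE))
print(out.shape)   # (FIELDS * OUT,)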
Topic Augmented Neural Network for Short Text Conversation
TLDR
The extensive evaluation of TANN, using large human-annotated data sets, shows that TANN outperforms simple neural network methods and beats other typical matching models by a large margin.
Topic Augmented Neural Network for Retrieval-based Chatbots
TLDR
The extensive evaluation of TANN, using large human-annotated data sets, shows that TANN outperforms simple neural network methods and beats other typical matching models by a large margin.
Response selection with topic clues for retrieval-based chatbots
Deep Learning applied to NLP
TLDR
This paper tries to explain the basics of CNNs, their different variations, and how they have been applied to NLP.
...

References

SHOWING 1-10 OF 47 REFERENCES
A Convolutional Neural Network for Modelling Sentences
TLDR
A convolutional architecture dubbed the Dynamic Convolutional Neural Network (DCNN) is described and adopted for the semantic modelling of sentences; it induces a feature graph over the sentence that is capable of explicitly capturing short- and long-range relations.
Joint Language and Translation Modeling with Recurrent Neural Networks
TLDR
This work presents a joint language and translation model based on a recurrent neural network which predicts target words from an unbounded history of both source and target words, and shows competitive accuracy compared to the traditional channel model features.
Neural Machine Translation by Jointly Learning to Align and Translate
TLDR
It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
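The soft-search idea summarized above can be sketched as a weighted sum over all source annotations, with weights recomputed at each target step. The snippet below uses a simplified dot-product scorer for illustration; the paper's alignment model is a jointly learned scoring network, and all names and shapes here are assumptions.

# Minimal soft-attention sketch (dot-product scorer; illustrative only).
import numpy as np

rng = np.random.default_rng(2)

def soft_attention(source_annotations, decoder_state):
    """source_annotations: (T, d); decoder_state: (d,). Returns a context vector (d,)."""
    scores = source_annotations @ decoder_state        # relevance of each source position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                           # soft alignment over the source
    return weights @ source_annotations                # context used to predict the next target word

context = soft_attention(rng.normal(size=(6, 8)), rng.normal(size=8))
print(context.shape)   # (8,)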
Convolutional Neural Network Architectures for Matching Natural Language Sentences
TLDR
Convolutional neural network models for matching two sentences are proposed by adapting the convolutional strategy from vision and speech; they nicely represent the hierarchical structures of sentences with their layer-by-layer composition and pooling.
A Neural Probabilistic Language Model
TLDR
This work proposes to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences.
Recurrent Continuous Translation Models
We introduce a class of probabilistic continuous translation models called Recurrent Continuous Translation Models that are purely based on continuous representations for words, phrases and sentences.
Decoding with Large-Scale Neural Language Models Improves Translation
TLDR
This work develops a new model that combines the neural probabilistic language model of Bengio et al., rectified linear units, and noise-contrastive estimation, and incorporates it into a machine translation system both by reranking k-best lists and by direct integration into the decoder.
Improving deep neural networks for LVCSR using rectified linear units and dropout
TLDR
Modelling deep neural networks with rectified linear unit (ReLU) non-linearities with minimal human hyper-parameter tuning on a 50-hour English Broadcast News task shows a 4.2% relative improvement over a DNN trained with sigmoid units, and a 14.4% relative improvement over a strong GMM/HMM system.
Domain Adaptation via Pseudo In-Domain Data Selection
TLDR
The results show that more training data is not always better, and that best results are attained via proper domain-relevant data selection, as well as combining in- and general-domain systems during decoding.
Efficient Estimation of Word Representations in Vector Space
TLDR
Two novel model architectures for computing continuous vector representations of words from very large data sets are proposed and it is shown that these vectors provide state-of-the-art performance on the authors' test set for measuring syntactic and semantic word similarities.
...