Corpus ID: 9506832

The Importance of Subword Embeddings in Sentence Pair Modeling

@inproceedings{Lan2018THEIO,
  title={The Importance of Subword Embeddings in Sentence Pair Modeling},
  author={Wuwei Lan and Wei Xu},
  booktitle={NAACL 2018},
  year={2018}
}
Sentence pair modeling is critical for many NLP tasks, such as paraphrase identification, semantic textual similarity, and natural language inference. Most state-of-the-art neural models for these tasks rely on pretrained word embeddings and compose sentence-level semantics in varied ways; however, few works have attempted to verify whether we really need pretrained embeddings in these tasks. In this paper, we study how effective subword-level (character and character n-gram) representations are…
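As a rough illustration of what a subword-level representation looks like in practice, the sketch below composes a word vector from hashed character n-gram embeddings. This is a fastText-style scheme offered only as a minimal sketch; the n-gram range, hashing trick, and bucket count are illustrative assumptions, not the authors' exact setup.

```python
# Minimal sketch: subword-level word vectors from hashed character n-grams.
# The n-gram range (3-6), hashing trick, and bucket count are assumptions.
import numpy as np

DIM, BUCKETS = 100, 2**16
rng = np.random.default_rng(0)
ngram_table = rng.normal(scale=0.1, size=(BUCKETS, DIM))  # one vector per hash bucket

def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word wrapped in boundary markers < and >."""
    w = f"<{word}>"
    return [w[i:i + n] for n in range(n_min, n_max + 1) for i in range(len(w) - n + 1)]

def word_vector(word):
    """Compose a word vector by averaging its hashed character n-gram vectors."""
    idx = [hash(g) % BUCKETS for g in char_ngrams(word)]
    return ngram_table[idx].mean(axis=0)

# Morphological variants share n-grams, so their vectors stay close even if
# one of them never appeared in training data.
v1, v2 = word_vector("paraphrase"), word_vector("paraphrases")
print(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))
```

One property this buys for sentence pair modeling is graceful handling of out-of-vocabulary and misspelled words, which word-level pretrained embeddings can only map to a single unknown token.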

Citations

Neural Network Models for Paraphrase Identification, Semantic Textual Similarity, Natural Language Inference, and Question Answering
TLDR
It is shown that encoding contextual information with LSTMs and modeling inter-sentence interactions are critical; the Enhanced Sequential Inference Model performs best so far on larger datasets, while the Pairwise Word Interaction Model achieves the best performance when less data is available.
Co-Stack Residual Affinity Networks with Multi-level Attention Refinement for Matching Text Sequences
TLDR
This paper proposes Co-Stack Residual Affinity Networks (CSRAN), a new and universal neural architecture for matching text sequences, and introduces a new bidirectional alignment mechanism that learns affinity weights by fusing sequence pairs across stacked hierarchies.

References

Showing 1-10 of 29 references
Inter-Weighted Alignment Network for Sentence Pair Modeling
TLDR
A model is proposed to measure the similarity of a sentence pair with a focus on interaction information, and a word-level similarity matrix is used to discover fine-grained alignments between the two sentences.
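For intuition, here is a minimal sketch of a word-level similarity matrix between two sentences; cosine similarity and the random toy embeddings are illustrative assumptions, not the paper's exact scoring function.

```python
# Minimal sketch: word-level similarity matrix for fine-grained alignment.
# Cosine similarity and random toy embeddings are illustrative assumptions.
import numpy as np

def similarity_matrix(A, B):
    """S[i, j] = cosine similarity of word i in sentence 1 and word j in sentence 2."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A @ B.T

rng = np.random.default_rng(0)
s1 = rng.normal(size=(5, 50))  # 5-word sentence, 50-d word embeddings
s2 = rng.normal(size=(7, 50))  # 7-word sentence
S = similarity_matrix(s1, s2)  # shape (5, 7)
print(S.argmax(axis=1))        # greedy alignment: best match in s2 for each word of s1
```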
Towards Universal Paraphrastic Sentence Embeddings
TLDR
This work considers the problem of learning general-purpose, paraphrastic sentence embeddings based on supervision from the Paraphrase Database, and compares six compositional architectures, finding that the most complex architectures, such as long short-term memory (LSTM) recurrent neural networks, perform best on the in-domain data.
ABCNN: Attention-Based Convolutional Neural Network for Modeling Sentence Pairs
TLDR
This work presents a general Attention-Based Convolutional Neural Network (ABCNN) for modeling a pair of sentences and proposes three attention schemes that integrate mutual influence between sentences into CNNs; thus, the representation of each sentence takes its counterpart into consideration.
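A minimal sketch of such an attention matrix between two sentences' representations, using a 1/(1 + Euclidean distance) match score (one of the scores the ABCNN paper discusses); the toy dimensions and random inputs are illustrative assumptions.

```python
# Minimal sketch: ABCNN-style attention matrix between two feature maps.
# The 1/(1 + euclidean distance) match score is one option from the paper;
# dimensions and random inputs are illustrative assumptions.
import numpy as np

def attention_matrix(F0, F1):
    """A[i, j] = match score between column i of F0 and column j of F1."""
    diff = F0[:, :, None] - F1[:, None, :]   # (d, len0, len1)
    dist = np.sqrt((diff ** 2).sum(axis=0))  # pairwise Euclidean distances
    return 1.0 / (1.0 + dist)

rng = np.random.default_rng(0)
A = attention_matrix(rng.normal(size=(50, 5)), rng.normal(size=(50, 7)))
print(A.shape)  # (5, 7): each sentence's convolution input can be reweighted by A
```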
From Paraphrase Database to Compositional Paraphrase Model and Back
TLDR
This work proposes models to leverage the phrase pairs from the Paraphrase Database to build parametric paraphrase models that score paraphrase pairs more accurately than the PPDB’s internal scores while simultaneously improving its coverage.
Boosting Named Entity Recognition with Neural Character Embeddings
TLDR
This work proposes a language-independent NER system that uses only automatically learned features and demonstrates that the same neural network that has been successfully applied to POS tagging can also achieve state-of-the-art results for language-independent NER, using the same hyperparameters and without any handcrafted features.
Multilingual Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Models and Auxiliary Loss
TLDR
This work presents a novel bi-LSTM model that combines the POS tagging loss function with an auxiliary loss function accounting for rare words; the model obtains state-of-the-art performance across 22 languages and works especially well for morphologically complex languages.
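A minimal sketch of such a joint objective: a shared bi-LSTM feeds both a POS head and an auxiliary head. The auxiliary target used here (binned word log frequency) and the equal weighting of the two losses are assumptions, not necessarily the paper's exact setup.

```python
# Minimal sketch: bi-LSTM tagger with an auxiliary loss over a shared encoder.
# Auxiliary target (binned word log frequency) and 1:1 loss weighting are assumptions.
import torch
import torch.nn as nn

bilstm = nn.LSTM(64, 32, bidirectional=True, batch_first=True)
pos_head = nn.Linear(64, 17)  # e.g. 17 universal POS tags
aux_head = nn.Linear(64, 10)  # 10 frequency bins (assumed auxiliary task)

x = torch.randn(1, 6, 64)                 # one 6-word sentence of word embeddings
h, _ = bilstm(x)                          # shared representation: (1, 6, 64)
pos_gold = torch.randint(0, 17, (1, 6))   # toy gold tags
freq_gold = torch.randint(0, 10, (1, 6))  # toy gold frequency bins

ce = nn.CrossEntropyLoss()
loss = (ce(pos_head(h).flatten(0, 1), pos_gold.flatten())
        + ce(aux_head(h).flatten(0, 1), freq_gold.flatten()))
loss.backward()  # gradients flow into the shared bi-LSTM from both tasks
```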
Character-Aware Neural Language Models
TLDR
A simple neural language model that relies only on character-level inputs is shown to encode both semantic and orthographic information from characters alone, suggesting that for many languages character inputs are sufficient for language modeling.
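A minimal sketch of the character-level input pipeline such a model uses: embed characters, run a 1-D convolution over them, and max-pool over time to obtain a word representation. Vocabulary size, filter count, and kernel width here are illustrative assumptions.

```python
# Minimal sketch: character-aware word encoding via char-CNN + max-over-time pooling.
# Vocabulary size, filter count, and kernel width are illustrative assumptions.
import torch
import torch.nn as nn

class CharCNNWordEncoder(nn.Module):
    def __init__(self, n_chars=128, char_dim=16, n_filters=64, width=3):
        super().__init__()
        self.embed = nn.Embedding(n_chars, char_dim)
        self.conv = nn.Conv1d(char_dim, n_filters, kernel_size=width, padding=1)

    def forward(self, char_ids):                  # (batch, word_len)
        x = self.embed(char_ids).transpose(1, 2)  # (batch, char_dim, word_len)
        h = torch.relu(self.conv(x))              # (batch, n_filters, word_len)
        return h.max(dim=2).values                # max-over-time pooling

enc = CharCNNWordEncoder()
word = torch.tensor([[ord(c) for c in "subword"]])  # ASCII codes as toy char ids
print(enc(word).shape)  # torch.Size([1, 64])
```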
Neural Paraphrase Identification of Questions with Noisy Pretraining
TLDR
A variant of the decomposable attention model is shown to result in accurate performance on the problem of question paraphrase identification, while being far simpler than many competing neural architectures.
From Characters to Words to in Between: Do We Capture Morphology?
TLDR
None of the character-level models match the predictive accuracy of a model with access to true morphological analyses, even when learned from an order of magnitude more data.
Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation
TLDR
A model for constructing vector representations of words by composing characters using bidirectional LSTMs that requires only a single vector per character type and a fixed set of parameters for the compositional model, which yields state-of-the-art results in language modeling and part-of-speech tagging.
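A minimal sketch of that composition: run a bidirectional LSTM over character embeddings and read out the final forward and backward states as the word vector. Concatenating the two final states is one common readout, assumed here rather than taken from the paper, and all dimensions are illustrative.

```python
# Minimal sketch: compose a word vector from characters with a bidirectional LSTM.
# The concatenated final-state readout and all dimensions are assumptions.
import torch
import torch.nn as nn

class CharBiLSTMWordEncoder(nn.Module):
    def __init__(self, n_chars=128, char_dim=16, hidden=50):
        super().__init__()
        self.embed = nn.Embedding(n_chars, char_dim)
        self.lstm = nn.LSTM(char_dim, hidden, bidirectional=True, batch_first=True)

    def forward(self, char_ids):               # (batch, word_len)
        x = self.embed(char_ids)               # (batch, word_len, char_dim)
        _, (h, _) = self.lstm(x)               # h: (2, batch, hidden)
        return torch.cat([h[0], h[1]], dim=1)  # (batch, 2 * hidden)

enc = CharBiLSTMWordEncoder()
word = torch.tensor([[ord(c) for c in "morphology"]])  # ASCII codes as toy char ids
print(enc(word).shape)  # torch.Size([1, 100])
```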