• Corpus ID: 49434227

Enhancing Sentence Embedding with Generalized Pooling

Qian Chen, Zhenhua Ling, Xiao-Dan Zhu
Pooling is an essential component of a wide variety of sentence representation and embedding models. This paper explores generalized pooling methods to enhance sentence embedding. We propose vector-based multi-head attention that includes the widely used max pooling, mean pooling, and scalar self-attention as special cases. The model benefits from properly designed penalization terms to reduce redundancy in multi-head attention. We evaluate the proposed model on three different tasks: natural… 
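
The claim that vector-based attention covers max pooling, mean pooling, and scalar self-attention as special cases can be illustrated with a small sketch. The abstract does not give the parameterization of the attention scores, so the code below takes the scores as a plain input. The function names, the toy hidden states, and the score matrices are all assumptions made for illustration. The sketch only shows the pooling step for a single head, with one softmax over tokens per feature dimension. It does not cover multiple heads or the penalization terms.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def vector_attention_pool(hidden, scores):
    """Pool T token vectors of size d into one vector of size d.

    hidden[t][j] is feature j of token t. scores[t][j] is the unnormalized
    attention score for that same entry (hypothetical input; in a trained
    model it would come from a learned scoring network).
    """
    num_tokens, dim = len(hidden), len(hidden[0])
    pooled = []
    for j in range(dim):
        # Each feature dimension gets its own distribution over tokens.
        weights = softmax([scores[t][j] for t in range(num_tokens)])
        pooled.append(sum(weights[t] * hidden[t][j] for t in range(num_tokens)))
    return pooled


# Toy hidden states: 3 tokens, 3 features (made-up numbers).
hidden = [
    [1.0, -2.0, 0.5],
    [3.0, 0.0, -1.0],
    [2.0, 4.0, 0.0],
]

# Mean pooling: identical scores everywhere give uniform weights.
uniform_scores = [[0.0] * 3 for _ in hidden]
mean_pooled = vector_attention_pool(hidden, uniform_scores)

# Max pooling (as a limit): scores proportional to the features with a
# large scale push each per-dimension softmax towards a one-hot vector.
sharp_scores = [[1000.0 * h for h in row] for row in hidden]
max_pooled = vector_attention_pool(hidden, sharp_scores)

# Scalar self-attention: one score per token, shared by all dimensions.
token_scores = [0.1, 2.0, -0.3]
shared_scores = [[s] * 3 for s in token_scores]
scalar_pooled = vector_attention_pool(hidden, shared_scores)

print(mean_pooled, max_pooled, scalar_pooled)
```

Taking the scores as an argument keeps the sketch independent of any particular scoring network, so the three special cases differ only in how the score matrix is filled in.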

Enhancing sentence embedding with dynamic interaction

A new dynamic interaction method for improving the final sentence representation, which aims to make the states of the last layer more conducive to the next classification layer by introducing constraints from the states of the previous layers.

Natural Language Inference with Hierarchical BiLSTM Max Pooling Architecture

This model beats the InferSent model in 8 out of 10 recently published SentEval probing tasks designed to evaluate sentence embeddings' ability to capture some of the important linguistic properties of sentences.

Generalize Sentence Representation with Self-Inference

Experimental results demonstrate that the proposed Self Inference Neural Network model sets a new state of the art on MultiNLI and SciTail and is competitive on the remaining two datasets among all sentence encoding methods.

Vector-Based Attentive Pooling for Text-Independent Speaker Verification

A vector-based attentive pooling method that adopts vectorial attention instead of scalar attention and can extract fine-grained features for discriminating between speakers in text-independent speaker verification.

Sentence embeddings in NLI with iterative refinement encoders

This work proposes a hierarchy of bidirectional LSTM and max pooling layers that implements an iterative refinement strategy and yields state-of-the-art results on the SciTail dataset as well as strong results for Stanford Natural Language Inference and Multi-Genre Natural Language Inference.

Enhanced-RCNN: An Efficient Method for Learning Sentence Similarity

Experimental results show that the enhanced recurrent convolutional neural network model (Enhanced-RCNN) outperforms the baselines and achieves competitive performance on two real-world paraphrase identification datasets.

REGMAPR - Text Matching Made Easy

REGMAPR achieves state-of-the-art results for paraphrase detection on the SICK dataset and for textual entailment on the SNLI dataset among models that do not use inter-sentence attention.

Cell-aware Stacked LSTMs for Modeling Sentences

The suggested architecture accepts both hidden and memory cell states of the preceding layer and fuses information from the left and the lower context using the soft gating mechanism of LSTMs to modulate the amount of information to be delivered not only in horizontal recurrence but also in vertical connections.

Multilingual NMT with a Language-Independent Attention Bridge

A new framework for the efficient development of multilingual neural machine translation (NMT) using this model and scheduled training. It achieves substantial improvements over strong bilingual models and performs well for zero-shot translation, which demonstrates its capacity for abstraction and transfer learning.

Improving Self-Attention Networks With Sequential Relations

Experiments in natural language inference, machine translation and sentiment analysis tasks show that the sequential relation modeling helps self-attention networks outperform existing approaches.



Learning Natural Language Inference using Bidirectional LSTM model and Inner-Attention

A sentence encoding-based model for recognizing textual entailment that uses the sentence's first-stage representation to attend to the words appearing in the sentence itself, which is called "Inner-Attention" in this paper.

A Structured Self-attentive Sentence Embedding

A new model for extracting an interpretable sentence embedding by introducing self-attention is proposed, which uses a 2-D matrix to represent the embedding, with each row of the matrix attending on a different part of the sentence.

A Convolutional Neural Network for Modelling Sentences

A convolutional architecture dubbed the Dynamic Convolutional Neural Network (DCNN) is described that is adopted for the semantic modelling of sentences and induces a feature graph over the sentence that is capable of explicitly capturing short and long-range relations.

Shortcut-Stacked Sentence Encoders for Multi-Domain Inference

This work presents a simple sequential sentence encoder based on stacked bidirectional LSTM-RNNs with shortcut connections and fine-tuning of word embeddings that achieves the new state-of-the-art encoding result on the original SNLI dataset.

Skip-Thought Vectors

We describe an approach for unsupervised learning of a generic, distributed sentence encoder. Using the continuity of text from books, we train an encoder-decoder model that tries to reconstruct the

Recurrent Neural Network-Based Sentence Encoder with Gated Attention for Natural Language Inference

This paper describes a model (alpha) that is ranked among the top in the Shared Task, on both the in-domain test set and the cross-domain test set, demonstrating that the model generalizes well to cross-domain data.

A Compare-Propagate Architecture with Alignment Factorization for Natural Language Inference

A new compare-propagate architecture is introduced where alignment pairs are compared and then propagated to upper layers for enhanced representation learning, and novel factorization layers are adopted for efficient compression of alignment vectors into scalar-valued features, which are then used to augment the base word representations.

Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank

A Sentiment Treebank that includes fine-grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences and presents new challenges for sentiment compositionality. The paper also introduces the Recursive Neural Tensor Network.

Distraction-based neural networks for modeling documents

This paper proposes neural models that train computers not just to pay attention to specific regions and content of input documents with attention models, but also to distract them so that they traverse between different content of a document and better grasp the overall meaning for summarization.

Supervised Learning of Universal Sentence Representations from Natural Language Inference Data

It is shown how universal sentence representations trained using the supervised data of the Stanford Natural Language Inference datasets can consistently outperform unsupervised methods like SkipThought vectors on a wide range of transfer tasks.