Representation learning for very short texts using weighted word embedding aggregation

@article{Boom2016RepresentationLF,
  title={Representation learning for very short texts using weighted word embedding aggregation},
  author={Cedric De Boom and Steven Van Canneyt and Thomas Demeester and Bart Dhoedt},
  journal={Pattern Recognition Letters},
  year={2016},
  volume={80},
  pages={150--156}
}

Citations

Semantically Enriched Weighted Word Embedding for Short Text Representation
TLDR
An aggregated weighted word embedding representation for short texts, called Semantically Enriched Weighted Word Embedding (SEWWE), is designed and shown to outperform other tf-idf based methods.
Short Texts Semantic Similarity Based on Word Embeddings
TLDR
This paper describes experiments carried out to evaluate the performance of different forms of word embeddings and their aggregations in the task of measuring the similarity of short texts, and tests five approaches for aggregating word embeddings into text representations.
A Self-Training Approach for Short Text Clustering
TLDR
A method is proposed that learns discriminative features from both an autoencoder and a sentence embedding, then uses assignments from a clustering algorithm as supervision to update the weights of the encoder network.
A Comparison of Approaches for Measuring the Semantic Similarity of Short Texts Based on Word Embeddings
TLDR
A set of experiments is carried out to evaluate and compare the performance of different approaches for measuring the semantic similarity of short texts; the results indicate that the extended methods perform better than the original ones in most cases.
Mining Summary of Short Text with Centroid Similarity Distance
TLDR
This paper shows that the centroid embeddings approach can be applied to short text to capture semantically similar sentences for summarization, and demonstrates that this approach can outperform other methods on two annotated LREC track datasets.
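The centroid idea summarized above can be sketched in a few lines: rank sentences by cosine similarity to the mean of all sentence embeddings and keep the top ones. This is a minimal illustration only; `centroid_summary` is a hypothetical name, not the paper's implementation.

```python
import numpy as np

def centroid_summary(sentence_vectors, k=2):
    """Return indices of the k sentences whose embeddings are most
    cosine-similar to the centroid of all sentence embeddings."""
    vecs = np.asarray(sentence_vectors, dtype=float)
    centroid = vecs.mean(axis=0)
    # pairwise denominators for cosine similarity, guarded against zero norms
    norms = np.linalg.norm(vecs, axis=1) * np.linalg.norm(centroid)
    sims = vecs @ centroid / np.where(norms == 0, 1.0, norms)
    return np.argsort(-sims)[:k].tolist()
```

The sentence vectors themselves could come from any aggregation scheme, such as an average of word embeddings.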
Task-Optimized Word Embeddings for Text Classification Representations
TLDR
This paper proposes a supervised algorithm that produces a task-optimized weighted average of word embeddings for a given task, which combines the compactness and expressiveness of word-embedding representations with the word-level insights of a BoW-type model, where weights correspond to actual words.
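A task-optimized weighted average of this kind can be sketched with plain numpy: per-word weights and a logistic classifier are trained jointly by SGD, so the aggregation itself is tuned for the task. This is an illustrative toy under assumed conventions, not the paper's algorithm; all names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_task_weights(docs, labels, embeddings, lr=0.5, epochs=500):
    """Jointly learn per-word weights and a logistic classifier so that
    the weighted average of word vectors is optimized for the task."""
    vocab = sorted({w for d in docs for w in d})
    a = {w: 1.0 for w in vocab}          # trainable per-word weights
    dim = len(next(iter(embeddings.values())))
    u, b = np.zeros(dim), 0.0            # logistic-regression parameters
    for _ in range(epochs):
        for doc, y in zip(docs, labels):
            # task-weighted average of the word embeddings
            r = sum(a[w] * embeddings[w] for w in doc) / len(doc)
            p = sigmoid(u @ r + b)
            err = p - y                  # gradient of log loss wrt the logit
            u -= lr * err * r
            b -= lr * err
            for w in doc:                # gradient wrt each word weight
                a[w] -= lr * err * (u @ embeddings[w]) / len(doc)
    return a, u, b
```

After training, the learned weights `a` are inspectable per word, which is the BoW-style interpretability the summary refers to.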
Short Text Representation Model Construction Method Based on Novel Semantic Aggregation Technology
TLDR
Compared with existing short text semantic representation models, the representation model proposed in this paper shows high semantic representation ability both in specific domains and in the open domain.
Exponential Word Embeddings: Models and Approximate Learning
TLDR
This thesis shows that a representation based on multiple vectors per word easily overcomes this limitation by having different vectors represent the different meanings of a word, which is especially beneficial when training data is noisy and scarce.
Selective word encoding for effective text representation
TLDR
This paper adapts a trainable orderless aggregation algorithm to obtain a more discriminative abstract representation for text representation and proposes an effective term-weighting scheme to compute the relative importance of words from the context based on their conjunction with the problem in an end-to-end learning manner.
...

References

Showing 1-10 of 44 references
Learning Semantic Similarity for Very Short Texts
TLDR
The conclusion is made that combining word embeddings with tf-idf information might lead to a better model of semantic content within very short text fragments; this is a first step towards a hybrid method that combines the strength of dense distributed representations, as opposed to sparse term matching, with the ability to automatically reduce the impact of less informative terms.
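The combination described above can be sketched as an idf-weighted average of word vectors: each word's embedding is scaled by its inverse document frequency before averaging, so common words contribute less. A minimal sketch with toy data; `idf_weights` and `weighted_embedding` are illustrative names, not from the paper.

```python
import math
import numpy as np

def idf_weights(corpus):
    """Inverse document frequency for every word in a tokenised corpus."""
    n_docs = len(corpus)
    df = {}
    for doc in corpus:
        for word in set(doc):
            df[word] = df.get(word, 0) + 1
    return {w: math.log(n_docs / d) for w, d in df.items()}

def weighted_embedding(doc, embeddings, idf):
    """Aggregate word vectors into one text vector, weighted by idf so
    that frequent, uninformative words contribute less."""
    pairs = [(idf[w], embeddings[w]) for w in doc
             if w in embeddings and w in idf]
    total = sum(w for w, _ in pairs)
    if not pairs or total == 0:
        return np.zeros(len(next(iter(embeddings.values()))))
    return sum(w * v for w, v in pairs) / total
```

Text similarity then reduces to cosine similarity between two such aggregated vectors.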
Short Text Similarity with Word Embeddings
TLDR
This work proposes to go from word-level to text-level semantics by combining insights from methods based on external sources of semantic knowledge with word embeddings, and derives multiple types of meta-features from the comparison of the word vectors for short text pairs and from the vector means of their respective word embeddings.
Short Text Clustering via Convolutional Neural Networks
TLDR
The extensive experimental study on two public short text datasets shows that the deep feature representation learned by the proposed convolutional neural networks approach can achieve a significantly better performance than some other existing features, such as term frequency-inverse document frequency, Laplacian eigenvectors and average embedding, for clustering.
From Word Embeddings To Document Distances
TLDR
It is demonstrated on eight real world document classification data sets, in comparison with seven state-of-the-art baselines, that the Word Mover's Distance metric leads to unprecedented low k-nearest neighbor document classification error rates.
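Word Mover's Distance is the solution of a small transportation problem: move the normalised bag-of-words mass of one document onto the other at minimal total embedding distance. The sketch below solves it with scipy's general LP solver for clarity; real implementations (e.g. gensim's `wmdistance`) use dedicated optimal-transport solvers.

```python
import numpy as np
from scipy.optimize import linprog

def word_movers_distance(doc_a, doc_b, embeddings):
    """Exact Word Mover's Distance between two token lists, solved as a
    transportation problem via linear programming."""
    words_a, words_b = sorted(set(doc_a)), sorted(set(doc_b))
    # normalised bag-of-words weights (nBOW)
    d_a = np.array([doc_a.count(w) for w in words_a], float)
    d_b = np.array([doc_b.count(w) for w in words_b], float)
    d_a /= d_a.sum()
    d_b /= d_b.sum()
    # cost matrix: Euclidean distance between word vectors
    C = np.array([[np.linalg.norm(embeddings[u] - embeddings[v])
                   for v in words_b] for u in words_a])
    n, m = len(words_a), len(words_b)
    # flatten the transport matrix T into n*m variables; equality
    # constraints force row sums to d_a and column sums to d_b
    A_eq = []
    for i in range(n):
        row = np.zeros(n * m)
        row[i * m:(i + 1) * m] = 1
        A_eq.append(row)
    for j in range(m):
        col = np.zeros(n * m)
        col[j::m] = 1
        A_eq.append(col)
    b_eq = np.concatenate([d_a, d_b])
    res = linprog(C.ravel(), A_eq=np.array(A_eq), b_eq=b_eq, bounds=(0, None))
    return res.fun
```

For documents of realistic length the LP grows quadratically in vocabulary size, which is why the paper's cheaper lower bounds matter for k-nearest-neighbor search.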
Distributed Representations of Sentences and Documents
TLDR
Paragraph Vector is an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of texts, such as sentences, paragraphs, and documents, and its construction gives the algorithm the potential to overcome the weaknesses of bag-of-words models.
A Short Texts Matching Method Using Shallow Features and Deep Features
TLDR
A model that generates deep features describing the semantic relevance between short “text objects” is designed, and it achieves state-of-the-art performance by combining shallow features and deep features.
Skip-Thought Vectors
We describe an approach for unsupervised learning of a generic, distributed sentence encoder. Using the continuity of text from books, we train an encoder-decoder model that tries to reconstruct the surrounding sentences of an encoded passage.
#TagSpace: Semantic Embeddings from Hashtags
TLDR
A convolutional neural network that learns feature representations for short textual posts using hashtags as a supervised signal; it outperforms a number of baselines on a document recommendation task and is useful for other tasks as well.
Deep Convolutional Neural Networks for Sentiment Analysis of Short Texts
TLDR
A new deep convolutional neural network is proposed that exploits character- to sentence-level information to perform sentiment analysis of short texts, and it achieves state-of-the-art results for single-sentence sentiment prediction.
Semi-supervised Convolutional Neural Networks for Text Categorization via Region Embedding
TLDR
The proposed scheme for embedding learning is based on the idea of two-view semi-supervised learning: training is done on unlabeled data, yet the learned region embedding is intended to be useful for the supervised task of interest.
...