Efficient Sentence Embedding via Semantic Subspace Analysis

Bin Wang, Fenxiao Chen, Yun Cheng Wang, and C.-C. Jay Kuo. "Efficient Sentence Embedding via Semantic Subspace Analysis." 2020 25th International Conference on Pattern Recognition (ICPR).
A novel sentence embedding method built upon semantic subspace analysis, called semantic subspace sentence embedding (S3E), is proposed in this work. Given that word embeddings capture semantic relationships and that semantically similar words tend to form semantic groups in a high-dimensional embedding space, we develop a sentence representation scheme by analyzing the semantic subspaces of a sentence's constituent words. Specifically, we construct the sentence model from two aspects. First, we…
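The abstract stops short of the construction details, but the general recipe it hints at, grouping word vectors into semantic subspaces and pooling statistics of the groups, can be sketched as follows. This is a loose illustration under assumed simplifications (plain k-means, unweighted group means, a Gram-matrix descriptor), not the paper's exact S3E formulation:

```python
import numpy as np

def s3e_sketch(word_vecs, n_groups=2, seed=0):
    """Loose sketch of subspace-style pooling (assumed simplification,
    not the paper's exact S3E): partition the words into semantic groups,
    average each group, then keep the upper triangle of the groups'
    inner-product (Gram) matrix as a compact sentence descriptor."""
    rng = np.random.default_rng(seed)
    centroids = word_vecs[rng.choice(len(word_vecs), n_groups,
                                     replace=False)].astype(float)
    for _ in range(10):  # plain k-means over the sentence's words
        labels = np.linalg.norm(word_vecs[:, None] - centroids[None],
                                axis=2).argmin(1)
        for g in range(n_groups):
            if (labels == g).any():
                centroids[g] = word_vecs[labels == g].mean(0)
    groups = np.stack([word_vecs[labels == g].mean(0) if (labels == g).any()
                       else np.zeros(word_vecs.shape[1])
                       for g in range(n_groups)])
    gram = groups @ groups.T                 # (n_groups, n_groups)
    iu = np.triu_indices(n_groups)
    return gram[iu]                          # vectorized upper triangle
```

In a real implementation the grouping would be learned once over the whole vocabulary, typically with frequency-based word weights, rather than per sentence.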
2 Citations

Mining Latent Semantic Correlation inspired by Quantum Entanglement
A QE-inspired network is implemented under the constraints of quantum formalism, and Local Semantic Measurement and Extraction are proposed to effectively capture probability-distribution information from the entangled state of a bipartite quantum system; the approach not only has a clear geometrical motivation but also supports a well-founded probabilistic interpretation.
Task-Specific Dependency-based Word Embedding Methods


Parameter-free Sentence Embedding via Orthogonal Basis
An innovative method based on an orthogonal basis combines pre-trained word embeddings into sentence representations; it shows superior performance compared with non-parameterized alternatives and is competitive with approaches that rely on either large amounts of labelled data or prolonged training time.
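The orthogonal-basis idea can be illustrated with a short Gram-Schmidt sketch: each word is weighted by its "novelty", the norm of its component orthogonal to the span of the preceding words. This is an assumed simplification of the cited method, not its full scoring scheme:

```python
import numpy as np

def novelty_weighted_embedding(word_vecs):
    """Sketch of orthogonal-basis weighting (inspired by, not identical
    to, the cited method): a word contributing a direction not spanned
    by earlier words gets a large weight; a near-redundant word gets
    almost none."""
    basis = []              # orthonormal basis built so far
    weights = []
    for v in word_vecs:
        r = v.astype(float)
        for b in basis:
            r -= (r @ b) * b            # remove projections on earlier words
        nov = np.linalg.norm(r)         # 'novelty' of this word
        weights.append(nov)
        if nov > 1e-12:
            basis.append(r / nov)
    w = np.array(weights)
    w = w / w.sum() if w.sum() > 0 else np.full(len(w), 1.0 / len(w))
    return (w[:, None] * word_vecs).sum(axis=0)
```

For example, feeding in the same vector twice gives the second copy zero weight, since it adds no new direction.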
Efficient Sentence Embedding using Discrete Cosine Transform
DCT embeddings indeed preserve more syntactic information compared with vector averaging, and the model yields better overall performance in downstream classification tasks that correlate with syntactic features, which illustrates the capacity of DCT to preserve word order information.
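A minimal version of DCT pooling: run a type-II DCT along the word sequence and keep the first k coefficients per embedding dimension. Coefficient 0 is proportional to the plain average, so averaging is the k=1 special case; higher coefficients encode low-frequency word-order structure. The function name and choice of k are illustrative:

```python
import numpy as np

def dct_sentence_embedding(word_vecs, k=2):
    """Sketch of DCT-based pooling: type-II DCT along the word axis,
    truncated to the first k coefficients per dimension, flattened
    into a fixed-size (k * d) sentence vector."""
    n, d = word_vecs.shape
    pos = (np.arange(n) + 0.5) / n                        # sample positions
    basis = np.cos(np.pi * np.outer(np.arange(k), pos))   # (k, n) DCT-II rows
    coeffs = basis @ word_vecs                            # (k, d)
    return coeffs.reshape(-1)                             # (k * d,)
```

Row 0 of the basis is all ones, so the k=1 output is exactly the (unnormalized) sum of the word vectors.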
Vector of Locally-Aggregated Word Embeddings (VLAWE): A Novel Document-level Representation
The VLAWE representation, learned in an unsupervised manner, is fed into a classifier and shown to be useful for a diverse set of text classification tasks, comparing favorably with a broad range of recent state-of-the-art methods.
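VLAD-style aggregation over word embeddings can be sketched in a few lines, assuming cluster centroids have already been learned offline (e.g. with k-means over the vocabulary's embeddings):

```python
import numpy as np

def vlawe(word_vecs, centroids):
    """Sketch of VLAWE: assign each word to its nearest centroid, sum
    the residuals (word - centroid) within each cluster, and concatenate
    the per-cluster residual sums into one document vector."""
    labels = np.linalg.norm(word_vecs[:, None] - centroids[None],
                            axis=2).argmin(1)
    parts = [(word_vecs[labels == k] - c).sum(0) if (labels == k).any()
             else np.zeros_like(c)
             for k, c in enumerate(centroids)]
    return np.concatenate(parts)
```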
EigenSent: Spectral sentence embeddings using higher-order Dynamic Mode Decomposition
This work explores an algorithm rooted in fluid dynamics, known as higher-order Dynamic Mode Decomposition, which is designed to capture the eigenfrequencies, and hence the fundamental transition dynamics, of periodic and quasi-periodic systems.
SCDV : Sparse Composite Document Vectors using soft clustering over distributional representations
Through extensive experiments on multi-class and multi-label classification tasks, this work outperforms the previous state-of-the-art method, NTSG, and achieves a significant reduction in training and prediction times compared to other representation methods.
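A rough sketch of the SCDV construction, with softmax responsibilities standing in for the paper's GMM posteriors and a crude hard threshold for the sparsity step; names and defaults are illustrative:

```python
import numpy as np

def scdv_sketch(word_vecs, idf, centroids, sparsity=0.04):
    """Sketch of SCDV-style document vectors: each word spawns a
    cluster-weighted copy of itself in every topic slot, these copies
    are concatenated over topics, scaled by idf, averaged over words,
    and finally sparsified by zeroing small entries."""
    d2 = ((word_vecs[:, None] - centroids[None]) ** 2).sum(2)   # (n, K)
    resp = np.exp(-(d2 - d2.min(1, keepdims=True)))             # stable softmax
    resp /= resp.sum(1, keepdims=True)                          # soft assignment
    wtv = (resp[:, :, None] * word_vecs[:, None, :]).reshape(len(word_vecs), -1)
    doc = (idf[:, None] * wtv).mean(0)                          # (K * d,)
    doc[np.abs(doc) < sparsity * np.abs(doc).max()] = 0.0       # sparsify
    return doc
```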
Skip-Thought Vectors
We describe an approach for unsupervised learning of a generic, distributed sentence encoder. Using the continuity of text from books, we train an encoder-decoder model that tries to reconstruct the…
GloVe: Global Vectors for Word Representation
A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context-window methods, and produces a vector space with meaningful substructure.
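The model's objective is a weighted least-squares fit of word-vector dot products to log co-occurrence counts; the loss can be written directly (a sketch over a dense co-occurrence matrix, with the paper's weighting function f):

```python
import numpy as np

def glove_loss(X, W, Wc, b, bc, x_max=100.0, alpha=0.75):
    """GloVe weighted least-squares objective:
    sum over nonzero co-occurrences of
    f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)^2."""
    i, j = np.nonzero(X)
    f = np.minimum((X[i, j] / x_max) ** alpha, 1.0)   # weighting function
    err = (W[i] * Wc[j]).sum(1) + b[i] + bc[j] - np.log(X[i, j])
    return (f * err ** 2).sum()
```

A full trainer would minimize this with AdaGrad over word vectors W, context vectors Wc, and the two bias terms.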
From Word Embeddings To Document Distances
It is demonstrated on eight real world document classification data sets, in comparison with seven state-of-the-art baselines, that the Word Mover's Distance metric leads to unprecedented low k-nearest neighbor document classification error rates.
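The exact metric requires an optimal-transport solver, but the commonly used "relaxed WMD" lower bound, where each word ships all its mass to its nearest counterpart in the other document, takes only a few lines (uniform word weights assumed):

```python
import numpy as np

def relaxed_wmd(vecs_a, vecs_b):
    """Sketch of the relaxed Word Mover's Distance lower bound: each
    word moves all its mass to the nearest word in the other document;
    the larger of the two directed costs is returned."""
    D = np.linalg.norm(vecs_a[:, None] - vecs_b[None], axis=2)  # pair costs
    ab = D.min(axis=1).mean()    # a -> b under uniform weights
    ba = D.min(axis=0).mean()    # b -> a
    return max(ab, ba)
```

The bound is tight enough to prune most candidates in k-nearest-neighbor search before running the exact transport solve; identical documents score zero.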
P-SIF: Document Embeddings Using Partition Averaging
P-SIF is a partitioned word-averaging model for representing long documents that retains the simplicity of weighted word averaging while taking a document's topical structure into account: it computes one weighted average per topic partition and concatenates the partition vectors to represent the overall document.
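Partition averaging can be sketched by combining SIF weighting with a hard topic assignment (the paper uses soft topic weights); parameter names are illustrative:

```python
import numpy as np

def psif_sketch(word_vecs, word_probs, centroids, a=1e-3):
    """Sketch of P-SIF-style pooling: words get SIF weights a/(a + p(w)),
    are hard-assigned to the nearest topic centroid, averaged within
    each partition, and the partition vectors are concatenated."""
    w = a / (a + word_probs)                            # SIF weights
    labels = np.linalg.norm(word_vecs[:, None] - centroids[None],
                            axis=2).argmin(1)
    parts = []
    for k in range(len(centroids)):
        m = labels == k
        parts.append((w[m, None] * word_vecs[m]).sum(0) / w[m].sum()
                     if m.any() else np.zeros(word_vecs.shape[1]))
    return np.concatenate(parts)
```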
Concatenated p-mean Word Embeddings as Universal Cross-Lingual Sentence Representations
It is shown that the concatenation of different types of power mean word embeddings considerably closes the gap to state-of-the-art methods monolingually and substantially outperforms these more complex techniques cross-lingually.
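Power-mean pooling is simple to state concretely; here is a sketch with a hand-picked set of p values (the paper additionally concatenates p-means taken over several different embedding spaces):

```python
import numpy as np

def p_mean_embedding(word_vecs, ps=(1.0, 3.0, np.inf, -np.inf)):
    """Concatenated power means per dimension: p=1 is the plain average,
    p=+inf/-inf give max/min pooling; odd integer p keeps signs well
    defined for negative coordinates."""
    parts = []
    for p in ps:
        if p == np.inf:
            parts.append(word_vecs.max(axis=0))
        elif p == -np.inf:
            parts.append(word_vecs.min(axis=0))
        else:
            m = (word_vecs ** p).mean(axis=0)
            parts.append(np.sign(m) * np.abs(m) ** (1.0 / p))  # signed p-th root
    return np.concatenate(parts)
```

With d-dimensional word vectors and |ps| chosen means, the sentence vector has d * |ps| dimensions.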