High-Dimensional Vector Semantics

@article{Andrecut2018HighDimensionalVS,
  title={High-Dimensional Vector Semantics},
  author={M. Andrecut},
  journal={ArXiv},
  year={2018},
  volume={abs/1802.09914}
}
  • M. Andrecut
  • Published 2018
  • Computer Science, Mathematics
  • ArXiv
In this paper we explore the “vector semantics” problem from the perspective of the “almost orthogonal” property of high-dimensional random vectors. We show that this intriguing property can be used to “memorize” random vectors by simply adding them, and we provide an efficient probabilistic solution to the set membership problem. We also discuss several applications to word context vector embeddings, document sentence similarity, and spam filtering.
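As an illustration of the memorization idea (a minimal Python sketch; the dimension, vector distribution, and decision threshold are assumptions chosen here for illustration, not the paper's code), the snippet below sums a set of random unit vectors into a single "memory" vector and answers membership queries with a dot-product threshold: because high-dimensional random vectors are almost orthogonal, a stored vector projects onto the sum with a value close to 1, while an unrelated vector projects close to 0.

import numpy as np

# Sketch: "memorize" random vectors by summing them, then test set membership
# with a dot-product threshold. Parameters below are illustrative assumptions.
rng = np.random.default_rng(0)
d, n = 10_000, 100                      # dimension and number of stored items

def random_unit_vector(d, rng):
    v = rng.standard_normal(d)
    return v / np.linalg.norm(v)

stored = [random_unit_vector(d, rng) for _ in range(n)]
memory = np.sum(stored, axis=0)         # the "memory" is just the vector sum

def is_member(x, memory, threshold=0.5):
    # For a stored item, x . memory is about 1 plus noise; for an unrelated
    # probe it is about 0 plus noise, so a fixed threshold separates the two.
    return float(x @ memory) > threshold

print(is_member(stored[3], memory))                   # True with high probability
print(is_member(random_unit_vector(d, rng), memory))  # False with high probability

The error probability of such a membership test shrinks as the dimension grows relative to the number of stored vectors, since the cross terms in the dot product concentrate around zero.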
1 Citation

Additive Feature Hashing
TLDR
It is shown that additive feature hashing can be performed directly by adding the hash values and converting them into high-dimensional numerical vectors, and the results are illustrated numerically using synthetic, language recognition, and SMS spam detection data.
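A hedged sketch of the additive-hashing idea summarized above (the character 3-gram features, the CRC32-based seeding, and the 4096-dimensional vectors are assumptions made here for illustration, not details from the cited paper): each feature's hash value seeds a pseudorandom ±1 vector, and the feature vectors are simply added to obtain a high-dimensional numerical representation of the text, which can then be compared by cosine similarity.

import zlib
import numpy as np

D = 4096  # embedding dimension (assumed)

def feature_vector(feature: str, d: int = D) -> np.ndarray:
    # The feature's hash value seeds a deterministic pseudorandom +/-1 vector.
    seed = zlib.crc32(feature.encode("utf-8"))
    rng = np.random.default_rng(seed)
    return rng.choice([-1.0, 1.0], size=d)

def embed(text: str, n: int = 3, d: int = D) -> np.ndarray:
    # Additive hashing: sum the vectors of all character n-grams, then normalize.
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    v = np.zeros(d)
    for g in grams:
        v += feature_vector(g, d)
    return v / (np.linalg.norm(v) + 1e-12)

# Texts sharing many n-grams end up with a larger cosine similarity.
print(float(embed("free prize, click now") @ embed("free prize!! click now")))
print(float(embed("free prize, click now") @ embed("meeting agenda attached")))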

References

Showing 1-10 of 11 references
Efficient Estimation of Word Representations in Vector Space
TLDR
Two novel model architectures for computing continuous vector representations of words from very large data sets are proposed and it is shown that these vectors provide state-of-the-art performance on the authors' test set for measuring syntactic and semantic word similarities.
Neural Word Embedding as Implicit Matrix Factorization
TLDR
It is shown that using a sparse Shifted Positive PMI word-context matrix to represent words improves results on two word similarity tasks and one of two analogy tasks, and it is conjectured that this stems from the weighted nature of SGNS's factorization.
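To make the Shifted Positive PMI (SPPMI) construction concrete, here is a small toy example (a sketch written for this summary, not code from the cited work): PMI is computed from a word-context co-occurrence matrix, shifted by log k, where k is the number of negative samples in SGNS, and negative entries are clipped to zero, yielding the sparse SPPMI matrix.

import numpy as np

def sppmi(counts: np.ndarray, k: int = 5) -> np.ndarray:
    # PMI(w, c) = log( P(w, c) / (P(w) P(c)) ), computed from raw counts.
    total = counts.sum()
    p_wc = counts / total
    p_w = counts.sum(axis=1, keepdims=True) / total
    p_c = counts.sum(axis=0, keepdims=True) / total
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_wc / (p_w * p_c))
    pmi[~np.isfinite(pmi)] = 0.0        # zero co-occurrences contribute nothing
    return np.maximum(pmi - np.log(k), 0.0)   # shift by log k, keep positives

# Toy 3-word x 4-context co-occurrence matrix.
counts = np.array([[10.0, 0.0, 2.0, 1.0],
                   [0.0, 8.0, 1.0, 3.0],
                   [2.0, 1.0, 5.0, 0.0]])
print(sppmi(counts, k=2))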
Random indexing of text samples for latent semantic analysis
TLDR
Latent Semantic Analysis is a method of computing vectors that capture word meaning from a words-by-contexts matrix built from a text corpus, and random indexing of text samples provides an efficient way to obtain such vectors.
An Introduction to Random Indexing
TLDR
The Random Indexing word space approach is introduced, which presents an efficient, scalable and incremental alternative to standard word space methods.
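A simplified sketch of the Random Indexing idea (the dimension, the sparsity, and the use of whole documents as contexts are assumptions made here for illustration): each context receives a sparse random index vector with a few ±1 entries, and a word's vector is accumulated incrementally by adding the index vectors of every context in which the word occurs, so the full words-by-contexts matrix never has to be built or factorized.

import numpy as np
from collections import defaultdict

D, NONZERO = 2000, 10
rng = np.random.default_rng(42)

def index_vector() -> np.ndarray:
    # Sparse random index vector: a few +/-1 entries at random positions.
    v = np.zeros(D)
    pos = rng.choice(D, size=NONZERO, replace=False)
    v[pos] = rng.choice([-1.0, 1.0], size=NONZERO)
    return v

word_vectors = defaultdict(lambda: np.zeros(D))

def update(document: str) -> None:
    # Incremental step: add the document's index vector to every word it contains.
    doc_index = index_vector()
    for word in document.lower().split():
        word_vectors[word] += doc_index

update("vectors capture word meaning")
update("random vectors capture context")
# Words sharing contexts accumulate overlapping index vectors.
print(float(word_vectors["vectors"] @ word_vectors["capture"]))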
An evaluation of Naive Bayesian anti-spam filtering
TLDR
It is concluded that additional safety nets are needed for the Naive Bayesian anti-spam filter to be viable in practice.
A Primer on Neural Network Models for Natural Language Processing
TLDR
This tutorial surveys neural network models from the perspective of natural language processing research, in an attempt to bring natural-language researchers up to speed with the neural techniques.
Approximation with random bases: Pro et Contra
TLDR
This work considers and analyzes published procedures, both randomized and deterministic, for selecting elements from families of parameterized elementary functions that have been shown to ensure a rate of convergence in the L2 norm of order O(1/N), where N is the number of elements.
Neural Network Methods for Natural Language Processing
Neural networks are a family of powerful machine learning models. This book focuses on the application of neural network models to natural language data. The first half of the book (Parts I and II) ...
Context vectors: general purpose approximate meaning representations self-organized from raw data. In J. M. Zurada et al., Computational Intelligence: Imitating Life.
  • 1994