• Corpus ID: 4765505

Context is Everything: Finding Meaning Statistically in Semantic Spaces

  • E. Zelikman
  • Published 22 March 2018
  • Computer Science
  • ArXiv
This paper introduces Contextual Salience (CoSal), a simple and explicit measure of a word's importance in context that is more theoretically natural, practically simpler, and more accurate than tf-idf. CoSal supports very small contexts (20 or more sentences), handles out-of-context words, and is easy to calculate. A word vector space generated with both bigram phrases and unigram tokens reveals that contextually significant words disproportionately define phrases. This relationship is…
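The abstract does not give the formula, but one natural statistical reading of "importance in context" is distance from the context's distribution of word vectors. The sketch below is a hypothetical illustration using the Mahalanobis distance, not the paper's exact definition:

```python
import numpy as np

def contextual_salience(word_vec, context_vecs, eps=1e-6):
    """Score a word's importance as the Mahalanobis distance of its
    vector from the distribution of the context's word vectors.
    This is an illustrative sketch, not the paper's exact formula."""
    context_vecs = np.asarray(context_vecs, dtype=float)
    mu = context_vecs.mean(axis=0)
    # Regularize the covariance so it stays invertible for small contexts.
    cov = np.cov(context_vecs, rowvar=False) + eps * np.eye(context_vecs.shape[1])
    diff = np.asarray(word_vec, dtype=float) - mu
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))
```

Under this reading, a word whose vector sits far from the context's typical vectors (relative to their covariance) scores as salient, while a word near the context mean scores low.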

Figures and Tables from this paper

Citations

Improving Context-Aware Semantic Relationships in Sparse Mobile Datasets
New algorithms for building external features into sentence embeddings and semantic similarity scores are developed and tested on embedding spaces built from Twitter data, showing that applying PCA with eight components to the embedding space and appending multimodal features yields the best outcomes.
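As a rough illustration of the pipeline this summary describes (reduce the embedding space to eight principal components, then append external features), here is a minimal numpy sketch; the function names and feature layout are assumptions, not taken from the cited paper:

```python
import numpy as np

def pca_reduce(X, n_components=8):
    """Project rows of X onto their top principal components via SVD."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def augment_embeddings(embeddings, extra_features, n_components=8):
    """Reduce sentence embeddings to n_components dimensions, then append
    external (e.g. multimodal) feature columns. Illustrative sketch only."""
    reduced = pca_reduce(np.asarray(embeddings, dtype=float), n_components)
    return np.hstack([reduced, np.asarray(extra_features, dtype=float)])
```

For a batch of 50-dimensional sentence embeddings with three extra features per sentence, the result is an 11-dimensional representation per sentence.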
Data-to-Text Generation with Attention Recurrent Unit
A novel Attention Recurrent Unit is proposed to generate short descriptive texts conditioned on database records, and a method called DoubleAtten is designed to enhance the attention distribution when computing the generation probabilities.
NRF2 drug repurposing using a question-answer artificial intelligence system
A question-answer artificial intelligence (QAAI) system capable of repurposing drug compounds is presented, and its ability to predict new indications for existing drugs is demonstrated.
Repurposing Zileuton as a Depression Drug Using an AI and In Vitro Approach
The results indicate that reorienting zileuton usage to modulate M1 macrophages could be a novel and safer therapeutic option for treating depression, based on the integration of RNA-seq analysis and text mining using deep neural networks.

References

Improving Word Representations via Global Context and Multiple Word Prototypes
A new neural network architecture is presented which learns word embeddings that better capture the semantics of words by incorporating both local and global document context, and accounts for homonymy and polysemy by learning multiple embeddings per word.
A Simple but Tough-to-Beat Baseline for Sentence Embeddings
GloVe: Global Vectors for Word Representation
A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
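The "global log-bilinear regression" objective GloVe minimizes can be written down directly: a weighted least-squares fit of dot products to log co-occurrence counts. The sketch below is a naive reference computation of that loss for small dense inputs, not the paper's optimized training code:

```python
import numpy as np

def glove_weight(x, x_max=100.0, alpha=0.75):
    """GloVe's weighting f(x): down-weights rare pairs, caps frequent ones."""
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0)

def glove_loss(W, W_tilde, b, b_tilde, X):
    """Weighted least-squares objective:
    sum_ij f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)^2,
    summed over word pairs with nonzero co-occurrence count X_ij."""
    loss = 0.0
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            if X[i, j] > 0:
                err = W[i] @ W_tilde[j] + b[i] + b_tilde[j] - np.log(X[i, j])
                loss += glove_weight(X[i, j]) * err ** 2
    return float(loss)
```

Training minimizes this loss over the vectors W, W_tilde and biases; the log on X_ij is what makes the model log-bilinear.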
Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
A Sentiment Treebank including fine-grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences is introduced, presenting new challenges for sentiment compositionality, along with the Recursive Neural Tensor Network to address them.
Deep Contextualized Word Representations
A new type of deep contextualized word representation is introduced that models both complex characteristics of word use and how these uses vary across linguistic contexts, allowing downstream models to mix different types of semi-supervision signals.
Local Vector-based Models for Sense Discrimination
A local vector-based model of reduced dimensionality, which is linguistically coherent and can be computed for multivariate Gaussian mixtures, is proposed; experiments show the advantages of unrestricted Gaussian models over K-means.
Learned in Translation: Contextualized Word Vectors
Context vectors taken from the deep LSTM encoder of an attentional sequence-to-sequence model trained for machine translation are used to contextualize word vectors, improving performance over unsupervised word and character vectors alone on a wide variety of common NLP tasks.
A Compressed Sensing View of Unsupervised Text Embeddings, Bag-of-n-Grams, and LSTMs
Using compressed sensing theory it is shown that representations combining the constituent word vectors can be information-preserving linear measurements of Bag-of-n-Grams (BonG) representations of text, leading to a new theoretical result about LSTMs: embeddings derived from a low-memory LSTM are provably at least as powerful on classification tasks as a linear classifier over BonG vectors.
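The "linear measurements" claim is concrete: summing a text's word vectors is the same as multiplying a matrix whose columns are word vectors by the text's BonG count vector. A minimal unigram sketch (the vocabulary and vectors are toy assumptions):

```python
import numpy as np

def bong_counts(tokens, vocab):
    """Bag-of-words count vector x over a fixed vocabulary (unigram BonG)."""
    x = np.zeros(len(vocab))
    for t in tokens:
        x[vocab[t]] += 1
    return x

# Summing word vectors equals A @ x, where the columns of A are the word
# vectors: the embedding is a linear measurement of the BonG vector.
vocab = {"the": 0, "cat": 1, "sat": 2}
A = np.random.default_rng(1).normal(size=(5, 3))  # 5-dim toy word vectors
tokens = ["the", "cat", "sat", "the"]
summed = sum(A[:, vocab[t]] for t in tokens)
assert np.allclose(summed, A @ bong_counts(tokens, vocab))
```

Compressed sensing then asks when the sparse count vector x can be recovered from the low-dimensional measurement A @ x, which is what makes such embeddings information-preserving.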
Supervised Learning of Universal Sentence Representations from Natural Language Inference Data
It is shown how universal sentence representations trained using the supervised data of the Stanford Natural Language Inference datasets can consistently outperform unsupervised methods like SkipThought vectors on a wide range of transfer tasks.
Word Representations via Gaussian Embedding
This paper advocates for density-based distributed embeddings and presents a method for learning representations in the space of Gaussian distributions, investigates the ability of these embeddings to model entailment and other asymmetric relationships, and explores novel properties of the representation.
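Density embeddings can express asymmetric relations because divergences between distributions are asymmetric. Below is a sketch of the KL divergence between diagonal Gaussians, one quantity such models can use to score directional relations like entailment (a generic formula, not the paper's training objective):

```python
import numpy as np

def kl_diag_gaussians(mu1, var1, mu2, var2):
    """KL( N(mu1, diag var1) || N(mu2, diag var2) ). Asymmetric, so it can
    encode directional relations between word densities."""
    mu1, var1, mu2, var2 = map(np.asarray, (mu1, var1, mu2, var2))
    return float(0.5 * np.sum(
        np.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0))
```

Intuitively, a general word can be given a broad variance so that specific words' narrower densities sit inside it at low KL cost, while the reverse direction is penalized.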