• Corpus ID: 2504414

Bayesian Paragraph Vectors

  • Geng Ji, Robert Bamler, Erik B. Sudderth, Stephan Mandt
Word2vec (Mikolov et al., 2013) has proven to be successful in natural language processing by capturing the semantic relationships between different words. Built on top of single-word embeddings, paragraph vectors (Le and Mikolov, 2014) find fixed-length representations for pieces of text with arbitrary lengths, such as documents, paragraphs, and sentences. In this work, we propose a novel interpretation for neural-network-based paragraph vectors by developing an unsupervised generative model… 

Exponential Word Embeddings: Models and Approximate Learning

This thesis shows that a representation based on multiple vectors per word easily overcomes this limitation by having different vectors represent the different meanings of a word, which is especially beneficial when training data is noisy and scarce.

Identification, Interpretability, and Bayesian Word Embeddings

  • Adam M. Lauretig
  • Computer Science
    Proceedings of the Third Workshop on Natural Language Processing and Computational Social Science
  • 2019
It is found that inaugural addresses became less internationalist after 1945, which goes against the conventional wisdom, and that an increase in bellicosity is associated with an increase in hostile actions by the United States, showing that elite deliberations are not cheap talk and helping confirm the validity of the model.

Augmenting and Tuning Knowledge Graph Embeddings

This work proposes an efficient method for large-scale hyperparameter tuning by interpreting knowledge graph embedding models in a probabilistic framework, and uses a variational expectation-maximization approach to tune thousands of such hyperparameters with minimal additional cost.

Learning Hawkes Processes from a Handful of Events

This work develops an efficient algorithm based on variational expectation-maximization that significantly outperforms state-of-the-art methods under short observation sequences and is able to optimize over an extended set of hyper-parameters.



Bayesian Neural Word Embedding

Experimental results demonstrate the performance of the proposed scalable Bayesian neural word embedding algorithm for word analogy and similarity tasks on six different datasets and show it is competitive with the original Skip-Gram method.

word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method

This note is an attempt to explain equation (4) (negative sampling) in "Distributed Representations of Words and Phrases and their Compositionality" by Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado and Jeffrey Dean.

Skip-Thought Vectors

We describe an approach for unsupervised learning of a generic, distributed sentence encoder. Using the continuity of text from books, we train an encoder-decoder model that tries to reconstruct the

Dynamic Word Embeddings

Experimental results on three different corpora demonstrate that the dynamic model infers word embedding trajectories that are more interpretable and lead to higher predictive likelihoods than competing methods that are based on static models trained separately on time slices.

Distributed Representations of Sentences and Documents

Paragraph Vector is an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of texts, such as sentences, paragraphs, and documents, and its construction gives the algorithm the potential to overcome the weaknesses of bag-of-words models.

GloVe: Global Vectors for Word Representation

A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.

Efficient Estimation of Word Representations in Vector Space

Two novel model architectures for computing continuous vector representations of words from very large data sets are proposed and it is shown that these vectors provide state-of-the-art performance on the authors' test set for measuring syntactic and semantic word similarities.

Unsupervised Construction of Large Paraphrase Corpora: Exploiting Massively Parallel News Sources

Investigation of unsupervised techniques for acquiring monolingual sentence-level paraphrases from a corpus of temporally and topically clustered news articles collected from thousands of web-based news sources shows that edit distance data is cleaner and more easily aligned than the heuristic data.

Learning word embeddings efficiently with noise-contrastive estimation

This work proposes a simple and scalable new approach to learning word embeddings based on training log-bilinear models with noise-contrastive estimation, and achieves results comparable to the best ones reported, using four times less data and more than an order of magnitude less computing time.

Deep Sentence Embedding Using Long Short-Term Memory Networks: Analysis and Application to Information Retrieval

  • H. Palangi, L. Deng, R. Ward
  • Computer Science
    IEEE/ACM Transactions on Audio, Speech, and Language Processing
  • 2016
A model that addresses sentence embedding, a hot topic in current natural language processing research, using recurrent neural networks (RNN) with Long Short-Term Memory (LSTM) cells is developed and is shown to significantly outperform several existing state-of-the-art methods.