Comparative study of LSA vs Word2vec embeddings in small corpora: a case study in dreams database

  • E. Altszyler, M. Sigman, D. Slezak
  • Published 2016
  • Computer Science
  • ArXiv
  • Word embeddings have been extensively studied in large text datasets. However, only a few studies analyze semantic representations of small corpora, which are particularly relevant in single-person text production studies. In the present paper, we compare the capabilities of Skip-gram and LSA in this scenario, and we test both techniques on extracting relevant semantic patterns from single-series dream reports. LSA performed better than Skip-gram with small training corpora in two semantic tests. As a…
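The LSA side of this comparison can be illustrated with a minimal sketch, assuming the classic construction: build a term-document count matrix, weight it, take a truncated SVD, and use the rows of U_k S_k as word vectors. This is not the authors' pipeline; the toy corpus (a hypothetical stand-in for dream reports), the log1p weighting, and k = 2 are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical toy corpus standing in for a small series of dream reports.
docs = [
    "i was flying over the city at night",
    "i was falling from a tall building in the city",
    "my mother was in the kitchen cooking dinner",
    "my father and mother were cooking in the kitchen",
    "i was flying and then falling into the sea",
]

def lsa_word_vectors(docs, k):
    """Return (vocab, vectors), where vectors are the rows of U_k * S_k
    from a truncated SVD of the weighted term-document matrix."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    # Term-document count matrix: one row per word, one column per document.
    X = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.split():
            X[index[w], j] += 1
    # Simple log weighting (an assumption; LSA is often run with log-entropy or tf-idf).
    X = np.log1p(X)
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    # Keep only the k largest singular directions.
    return vocab, U[:, :k] * S[:k]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

vocab, vecs = lsa_word_vectors(docs, k=2)
idx = {w: i for i, w in enumerate(vocab)}
print(cosine(vecs[idx["mother"]], vecs[idx["kitchen"]]))
print(cosine(vecs[idx["mother"]], vecs[idx["flying"]]))
```

Because "mother" and "kitchen" occur in exactly the same two toy documents, their LSA vectors coincide and their cosine similarity is 1 up to floating-point error, while "flying", which occurs in different documents, scores lower.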

    Explore Further: Topics Discussed in This Paper

    Bigger does not mean better! We prefer specificity
    • 6 citations
    Word Embedding in Small Corpora: A Case Study in Quran
    Text Analytics Techniques in the Digital World: Word Embeddings and Bias
    • 1 citation
    DeepMove: Learning Place Representations through Large Scale Movement Data
    • 6 citations


    Publications referenced by this paper.
    Efficient Estimation of Word Representations in Vector Space
    • 15,078 citations
    Distributed Representations of Words and Phrases and their Compositionality
    • 18,830 citations
    Software Framework for Topic Modelling with Large Corpora
    • 2,597 citations
    Improving Distributional Similarity with Lessons Learned from Word Embeddings
    • 932 citations
    Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors
    • 1,120 citations
    A unified architecture for natural language processing: deep neural networks with multitask learning
    • 4,030 citations
    Neural Word Embedding as Implicit Matrix Factorization
    • 1,205 citations
    Semantic Compositionality through Recursive Matrix-Vector Spaces
    • 1,103 citations
    Extracting semantic representations from word co-occurrence statistics: stop-lists, stemming, and SVD
    • 238 citations