P-SIF: Document Embeddings Using Partition Averaging

@article{Gupta2020PSIFDE,
  title={P-SIF: Document Embeddings Using Partition Averaging},
  author={Vivek Gupta and A. Saw and Pegah Nokhiz and Praneeth Netrapalli and Piyush Rai and P. Talukdar},
  journal={ArXiv},
  year={2020},
  volume={abs/2005.09069}
}
Simple weighted averaging of word vectors often yields effective sentence representations that outperform sophisticated seq2seq neural models on many tasks. While it is desirable to use the same method to represent documents as well, the effectiveness is lost when representing long documents involving multiple sentences. One of the key reasons is that a longer document is likely to contain words from many different topics; hence, creating a single vector while ignoring all…
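
As a rough, hypothetical illustration of the partition-averaging idea (a sketch under stated assumptions, not the authors' P-SIF implementation), the Python snippet below partitions the vocabulary by k-means over the word vectors, forms a SIF-style weighted average of word vectors within each partition, and concatenates the per-partition averages into a single document vector. The common-component removal step used by SIF is omitted, and toy random vectors stand in for pretrained embeddings.

```python
# Hedged sketch of partition-averaged document embeddings (not the authors' exact P-SIF code).
# Assumptions: word vectors are given as a {word: np.ndarray} dict; partitions come from
# k-means over the word vectors; SIF weights use a / (a + p(w)) with unigram probabilities.
import numpy as np
from sklearn.cluster import KMeans

def partition_averaged_embedding(doc_tokens, word_vecs, word_probs, n_partitions=3, a=1e-3):
    """Concatenate per-partition SIF-weighted averages of word vectors for one document."""
    vocab = list(word_vecs)
    dim = len(next(iter(word_vecs.values())))

    # Partition the vocabulary by clustering word vectors (a stand-in for a learned topic model).
    km = KMeans(n_clusters=n_partitions, n_init=10, random_state=0)
    labels = dict(zip(vocab, km.fit_predict(np.array([word_vecs[w] for w in vocab]))))

    doc_vec = np.zeros(n_partitions * dim)
    counts = np.zeros(n_partitions)
    for w in doc_tokens:
        if w not in word_vecs:
            continue
        k = labels[w]
        weight = a / (a + word_probs.get(w, 1e-6))   # SIF-style down-weighting of frequent words
        doc_vec[k * dim:(k + 1) * dim] += weight * word_vecs[w]
        counts[k] += 1
    for k in range(n_partitions):                    # average within each non-empty partition
        if counts[k] > 0:
            doc_vec[k * dim:(k + 1) * dim] /= counts[k]
    return doc_vec

# Toy usage with random vectors standing in for pretrained embeddings.
rng = np.random.default_rng(0)
vocab = ["market", "stock", "gene", "protein", "goal", "match"]
vecs = {w: rng.normal(size=50) for w in vocab}
probs = {w: 1.0 / len(vocab) for w in vocab}
print(partition_averaged_embedding(["stock", "market", "goal"], vecs, probs).shape)  # (150,)
```

With n_partitions partitions and d-dimensional word vectors, the document embedding has n_partitions × d dimensions, so topically distinct words contribute to different blocks of the vector rather than being averaged into one.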