P-SIF: Document Embeddings Using Partition Averaging
@article{Gupta2020PSIFDE, title={P-SIF: Document Embeddings Using Partition Averaging}, author={Vivek Gupta and A. Saw and Pegah Nokhiz and Praneeth Netrapalli and Piyush Rai and P. Talukdar}, journal={ArXiv}, year={2020}, volume={abs/2005.09069} }
Simple weighted averaging of word vectors often yields effective sentence representations that outperform sophisticated seq2seq neural models on many tasks. It would be desirable to represent documents the same way, but the method loses its effectiveness on long documents made up of multiple sentences. One of the key reasons is that a longer document is likely to contain words from many different topics; hence, creating a single vector while ignoring all…
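The contrast the abstract draws, one weighted average per document versus averages kept separately per topic, can be illustrated with a minimal NumPy sketch. It assumes SIF-style weighting a/(a + p(w)) and removal of the first singular vector (as in Arora et al., 2017). The partitioning step here, a softmax over cosine similarity to K centroids, is a hypothetical stand-in chosen for brevity; the paper's own partitioning procedure may differ. All function names (`sif_weights`, `soft_partition`, `partition_embed`, etc.) are illustrative, not the released code's API.

```python
import numpy as np

def sif_weights(probs, a=1e-3):
    # SIF weight a / (a + p(w)): frequent words are down-weighted.
    return a / (a + probs)

def remove_common_component(X):
    # Subtract each row's projection onto the first right singular vector of X.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    u = vt[0]
    return X - np.outer(X @ u, u)

def sif_embed(docs, word_vecs, probs, a=1e-3):
    # One weighted average per document (docs: lists of word indices, non-empty).
    w = sif_weights(probs, a)
    out = np.zeros((len(docs), word_vecs.shape[1]))
    for i, doc in enumerate(docs):
        idx = np.asarray(doc)
        out[i] = (w[idx, None] * word_vecs[idx]).mean(axis=0)
    return out

def soft_partition(word_vecs, centroids, temperature=1.0):
    # HYPOTHETICAL soft assignment of each word to K partitions:
    # softmax over cosine similarity to the centroids. Rows sum to 1.
    wv = word_vecs / np.linalg.norm(word_vecs, axis=1, keepdims=True)
    c = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    logits = wv @ c.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)  # shape (V, K)

def partition_embed(docs, word_vecs, probs, assign, a=1e-3):
    # Keep a separate weighted average for each partition and concatenate
    # the K blocks, giving a K * d dimensional document vector.
    w = sif_weights(probs, a)
    K, d = assign.shape[1], word_vecs.shape[1]
    out = np.zeros((len(docs), K * d))
    for i, doc in enumerate(docs):
        idx = np.asarray(doc)
        # (n, K, d): each word vector scaled by its SIF weight and partition weight
        contrib = w[idx, None, None] * assign[idx, :, None] * word_vecs[idx, None, :]
        out[i] = contrib.mean(axis=0).reshape(K * d)
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    V, d, K = 50, 8, 4
    word_vecs = rng.normal(size=(V, d))      # toy embeddings
    probs = rng.dirichlet(np.ones(V))        # toy unigram distribution
    docs = [list(rng.integers(0, V, size=n)) for n in (5, 12, 30)]
    centroids = word_vecs[rng.choice(V, size=K, replace=False)]
    assign = soft_partition(word_vecs, centroids)
    sif = remove_common_component(sif_embed(docs, word_vecs, probs))
    psif = remove_common_component(partition_embed(docs, word_vecs, probs, assign))
    print(sif.shape, psif.shape)  # (3, 8) (3, 32)
```

Because each word's partition weights sum to 1, adding the K blocks of a partition-averaged vector gives back the plain weighted average. Keeping the blocks apart is what lets words from different topics stay distinguishable instead of being collapsed into one vector, at the cost of a K-fold larger dimension.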
Supplemental Code
GitHub Repo
Source code for our AAAI 2020 paper P-SIF: Document Embeddings using Partition Averaging
3 Citations
On Dimensional Linguistic Properties of the Word Embedding Space
- Computer Science
- RepL4NLP@ACL
- 2020
- 1 citation
Corruption Is Not All Bad: Incorporating Discourse Structure into Pre-training via Corruption for Essay Scoring
- Computer Science
- ArXiv
- 2020
64 References
Efficient Vector Representation for Documents through Corruption
- Computer Science
- ICLR
- 2017
- 77 citations
- Highly Influential
Distributed Representations of Sentences and Documents
- Computer Science
- ICML
- 2014
- 5,712 citations
- Highly Influential
Words are not Equal: Graded Weighting Model for Building Composite Document Vectors
- Computer Science
- ICON
- 2015
- 8 citations
A Simple but Tough-to-Beat Baseline for Sentence Embeddings
- Computer Science
- ICLR
- 2017
- 741 citations
- Highly Influential
LTSG: Latent Topical Skip-Gram for Mutually Learning Topic Model and Vector Representations
- Computer Science
- PRCV
- 2018
- 12 citations
- Highly Influential
Word Mover's Embedding: From Word2Vec to Document Embedding
- Computer Science, Mathematics
- EMNLP
- 2018
- 51 citations
- Highly Influential
SCDV: Sparse Composite Document Vectors using soft clustering over distributional representations
- Computer Science
- EMNLP
- 2017
- 29 citations
- Highly Influential
Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec
- Computer Science
- ArXiv
- 2016
- 95 citations
- Highly Influential