Corpus ID: 195658268

Hierarchical Optimal Transport for Document Representation

@article{Yurochkin2019HierarchicalOT,
  title={Hierarchical Optimal Transport for Document Representation},
  author={Mikhail Yurochkin and Sebastian Claici and Edward Chien and Farzaneh Mirzazadeh and Justin M. Solomon},
  journal={ArXiv},
  year={2019},
  volume={abs/1906.10827}
}
The ability to measure similarity between documents enables intelligent summarization and analysis of large corpora. [...] We then solve an optimal transport problem on the smaller topic space to compute a similarity score. We give conditions on the topics under which this construction defines a distance, and we relate it to the word mover's distance. We evaluate our technique for $k$-NN classification and show better interpretability and scalability with comparable performance to current methods at…
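To make the key step above concrete, the following is a minimal sketch of the topic-space optimal transport idea, not the authors' released implementation. It assumes the POT (Python Optimal Transport) library, per-document topic proportions from a topic model such as LDA, that model's topic-word distributions, and pre-trained word embeddings; the function and variable names are illustrative.

import numpy as np
import ot  # POT: Python Optimal Transport

def topic_ground_costs(topic_word, word_vectors):
    # Ground cost between topics: optimal transport (word mover's) distance
    # between topics viewed as distributions over the vocabulary.
    # topic_word: (K, V), rows sum to 1; word_vectors: (V, d) embeddings.
    word_costs = ot.dist(word_vectors, word_vectors, metric='euclidean')
    K = topic_word.shape[0]
    C = np.zeros((K, K))
    for i in range(K):
        for j in range(i + 1, K):
            C[i, j] = C[j, i] = ot.emd2(topic_word[i], topic_word[j], word_costs)
    return C

def hierarchical_ot_distance(topics_a, topics_b, C):
    # Document-to-document distance: optimal transport between the two
    # documents' topic proportions, with topic-to-topic ground costs C.
    return ot.emd2(topics_a, topics_b, C)

In practice each topic would be truncated to its most probable words so that the inner transport problems stay small; this sketch omits that detail.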
Multi-Vector Models with Textual Guidance for Fine-Grained Scientific Document Similarity
TLDR: This work presents ASPIRE, a new scientific document similarity model based on matching fine-grained aspects that improves performance on document similarity tasks across four datasets; it includes a fast method that matches only single sentence pairs and a method that makes sparse multiple matches with optimal transport.
Neural Sinkhorn Topic Model
TLDR: A novel OT-based topic modelling framework that enjoys appealing simplicity, effectiveness, and efficiency, and significantly outperforms several state-of-the-art models in terms of both topic quality and document representations.
Semantics-assisted Wasserstein Learning for Topic and Word Embeddings
TLDR: Sawl formulates an NMF-like unified objective that integrates a regularized Wasserstein distance loss with a factorization of word context information; it can refine word embeddings to capture corpus-specific semantics, enabling topics and word embeddings to boost each other.
Re-evaluating Word Mover's Distance
TLDR: It is found that WMD in high-dimensional spaces behaves more similarly to BOW than in low-dimensional spaces due to the curse of dimensionality.
Interpretable contrastive word mover's embedding
TLDR: It is shown that a popular approach to the supervised embedding of documents for classification, contrastive Word Mover's Embedding, can be significantly enhanced by adding interpretability, and can help LS researchers gain insight into student understanding and assess evidence of scientific thought processes.
Supervised Tree-Wasserstein Distance
TLDR: This work rewrites the Wasserstein distance on a tree metric in terms of the parent-child relationships of the tree and formulates it as a continuous optimization problem using a contrastive loss; the resulting STW distance can be computed quickly and improves the accuracy of document classification tasks.
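For reference, the unsupervised tree-Wasserstein distance that this construction builds on has a closed form as a weighted sum of mass differences over the edges of the tree; the notation below is the standard one and is not taken verbatim from the STW paper:

W_T(\mu, \nu) = \sum_{e \in T} w_e \,\bigl| \mu(\Gamma_e) - \nu(\Gamma_e) \bigr|

where w_e is the weight of edge e and \Gamma_e is the set of leaves in the subtree below e.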
Hierarchical Optimal Transport for Multimodal Distribution Alignment
TLDR: This work introduces a hierarchical formulation of OT that leverages clustered structure in data to improve alignment in noisy, ambiguous, or multimodal settings, and demonstrates that when clustered structure exists in datasets and is consistent across trials or time points, a hierarchical alignment strategy can provide significant improvements in cross-domain alignment.
Geometric Dataset Distances via Optimal Transport
TLDR: This work proposes an alternative notion of distance between datasets that is model-agnostic, does not involve training, can compare datasets even if their label sets are completely disjoint, and has solid theoretical footing.
Hierarchical Optimal Transport for Robust Multi-View Learning
TLDR: The proposed hierarchical optimal transport (HOT) method is applicable to both unsupervised and semi-supervised learning, and experimental results show that it performs robustly on both synthetic and real-world tasks.
Gaussian-Smoothed Optimal Transport: Metric Structure and Statistical Efficiency
TLDR: This work proposes a novel Gaussian-smoothed OT (GOT) framework that achieves the best of both worlds: it preserves the 1-Wasserstein metric structure while alleviating the curse of dimensionality in empirical approximation.
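The Gaussian smoothing referred to here convolves each measure with an isotropic Gaussian before computing the 1-Wasserstein distance; in generic notation (not copied from the paper),

W_1^{(\sigma)}(\mu, \nu) = W_1\bigl(\mu \ast \mathcal{N}(0, \sigma^2 I_d),\; \nu \ast \mathcal{N}(0, \sigma^2 I_d)\bigr)

with the smoothing parameter \sigma trading off between the two regimes.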

References

Showing 1-10 of 40 references
From Word Embeddings To Document Distances
TLDR: It is demonstrated on eight real-world document classification data sets, in comparison with seven state-of-the-art baselines, that the Word Mover's Distance metric leads to unprecedentedly low k-nearest neighbor document classification error rates.
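A minimal sketch of the Word Mover's Distance computation summarized here, assuming the POT (Python Optimal Transport) library and pre-trained word embeddings; this is illustrative rather than the paper's original code.

import ot  # POT: Python Optimal Transport

def word_movers_distance(weights_a, vectors_a, weights_b, vectors_b):
    # weights_*: normalized bag-of-words frequencies over each document's unique words
    # vectors_*: the matching rows of a pre-trained word-embedding matrix
    cost = ot.dist(vectors_a, vectors_b, metric='euclidean')  # word-to-word costs
    return ot.emd2(weights_a / weights_a.sum(), weights_b / weights_b.sum(), cost)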
Automatic Evaluation of Topic Coherence
TLDR: A simple co-occurrence measure based on pointwise mutual information over Wikipedia data achieves results at or near the level of inter-annotator correlation for this task; other Wikipedia-based lexical relatedness methods also achieve strong results.
Supervised Word Mover's Distance
TLDR: This paper proposes an efficient technique to learn a supervised metric, called the Supervised WMD (S-WMD) metric, and provides an arbitrarily close approximation of the original WMD distance that results in a practical and efficient update rule.
Distributed Representations of Words and Phrases and their Compositionality
TLDR: This paper presents a simple method for finding phrases in text, shows that learning good vector representations for millions of phrases is possible, and describes a simple alternative to the hierarchical softmax called negative sampling.
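The negative sampling objective mentioned here replaces the full softmax with a handful of binary discriminations per training pair; in the notation commonly used for this model,

\log \sigma\bigl(v'^{\top}_{w_O} v_{w_I}\bigr) + \sum_{i=1}^{k} \mathbb{E}_{w_i \sim P_n(w)} \bigl[\log \sigma\bigl(-v'^{\top}_{w_i} v_{w_I}\bigr)\bigr]

where w_I is the input word, w_O the observed context word, and the k negative words w_i are drawn from a noise distribution P_n.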
Topic mover's distance based document classification
  • Xinhui Wu, H. Li
  • Computer Science
  • 2017 IEEE 17th International Conference on Communication Technology (ICCT)
  • 2017
TLDR: Experiments on document classification over six real-world datasets show that, compared with word-based WMD, the proposed TMD achieves much lower time complexity with the same accuracy.
Word Mover’s Embedding: From Word2Vec to Document Embedding
TLDR: The Word Mover's Embedding (WME) is proposed, a novel approach to building an unsupervised document (sentence) embedding from pre-trained word embeddings that consistently matches or outperforms state-of-the-art techniques, with significantly higher accuracy on short texts.
Distilled Wasserstein Learning for Word Embedding and Topic Modeling
We propose a novel Wasserstein method with a distillation mechanism, yielding joint learning of word embeddings and topics. The proposed method is based on the fact that the Euclidean distance…
Indexing by Latent Semantic Analysis
TLDR: A new method for automatic indexing and retrieval that takes advantage of implicit higher-order structure in the association of terms with documents ("semantic structure") in order to improve the detection of relevant documents on the basis of terms found in queries.
GloVe: Global Vectors for Word Representation
TLDR: A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
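The global log-bilinear objective referred to here fits word and context vectors to the logarithm of the co-occurrence counts, with a weighting function that damps rare and very frequent pairs; in the usual notation,

J = \sum_{i,j=1}^{V} f(X_{ij}) \bigl( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \bigr)^2

where X_{ij} counts co-occurrences of words i and j.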
Latent Dirichlet Allocation
We propose a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams [6], and…