# A New Parallel Algorithm for Sinkhorn Word-Movers Distance and Its Performance on PIUMA and Xeon CPU

@article{Tithi2021ANP, title={A New Parallel Algorithm for Sinkhorn Word-Movers Distance and Its Performance on PIUMA and Xeon CPU}, author={Jesmin Jahan Tithi and Fabrizio Petrini}, journal={ArXiv}, year={2021}, volume={abs/2107.06433} }

The Word Mover's Distance (WMD) measures the semantic dissimilarity between two text documents by computing the cost of optimally moving all words of a source (query) document to the most similar words of a target document. Computing WMD between two documents is costly because it requires solving an optimization problem that costs $O(V^3 \log V)$, where $V$ is the number of unique words in the document. Fortunately, WMD can be framed as an Earth Mover's Distance (EMD) for which the algorithmic…
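
For orientation, the WMD optimization from "From Word Embeddings To Document Distances" and the entropy-regularized variant from "Sinkhorn Distances: Lightspeed Computation of Optimal Transport" (both in the references below) can be written as follows. The symbols ($C$, $T$, $\mathrm{cnt}$, $\lambda$, $u$, $v$) are chosen here for illustration and may differ from the paper's own notation.

```latex
% d, d': normalized bag-of-words (nBOW) vectors over n words; x_i: word embeddings.
% U(d, d'): nonnegative n x n matrices with row sums d and column sums d'.
\begin{aligned}
\mathrm{WMD}(d, d') &= \min_{T \in U(d, d')} \sum_{i=1}^{n} \sum_{j=1}^{n} T_{ij} C_{ij},
  \qquad C_{ij} = \lVert \mathbf{x}_i - \mathbf{x}_j \rVert_2,
  \qquad d_i = \frac{\mathrm{cnt}_i}{\sum_{k} \mathrm{cnt}_k}, \\[4pt]
T^{\lambda} &= \operatorname*{arg\,min}_{T \in U(d, d')} \sum_{i,j} T_{ij} C_{ij} - \frac{1}{\lambda} h(T),
  \qquad h(T) = -\sum_{i,j} T_{ij} \log T_{ij}, \\
T^{\lambda} &= \operatorname{diag}(u)\, K \operatorname{diag}(v),
  \qquad K = e^{-\lambda C} \ \text{(elementwise)}, \\
u &\leftarrow d \oslash (K v), \qquad v \leftarrow d' \oslash (K^{\top} u).
\end{aligned}
```

The last line is Sinkhorn's alternating row and column rescaling; the regularized cost $\sum_{i,j} T^{\lambda}_{ij} C_{ij}$ approximates WMD, and the approximation tightens as $\lambda$ grows.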

#### References

Showing 10 of the paper's 22 references.

From Word Embeddings To Document Distances

- Mathematics, Computer Science
- ICML
- 2015

It is demonstrated on eight real-world document classification data sets, in comparison with seven state-of-the-art baselines, that the Word Mover's Distance metric leads to unprecedentedly low k-nearest-neighbor document classification error rates.
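
A minimal sketch of WMD-based k-nearest-neighbor classification, under simplifying assumptions: each document is a list of the same number $n$ of word vectors, each word carrying weight $1/n$. In that special case some optimal transport plan is a permutation matrix scaled by $1/n$ (the Birkhoff-von Neumann theorem), so brute force over permutations gives the exact WMD for tiny documents. The function names and the toy 2-D "embeddings" are hypothetical; general WMD with unequal word weights needs an EMD or LP solver.

```python
import math
from collections import Counter
from itertools import permutations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def wmd_uniform(doc_a, doc_b):
    """Exact WMD for two documents of n words, each word with weight 1/n.

    With uniform weights an optimal flow can be taken to be a permutation,
    so trying all n! matchings is exact but only feasible for tiny n.
    """
    if len(doc_a) != len(doc_b):
        raise ValueError("sketch only handles documents of equal length")
    n = len(doc_a)
    best = min(
        sum(euclidean(doc_a[i], doc_b[p[i]]) for i in range(n))
        for p in permutations(range(n))
    )
    return best / n


def knn_predict(query, train_docs, train_labels, k=3):
    """Label the query by majority vote among its k WMD-nearest training documents."""
    ranked = sorted(
        (wmd_uniform(query, doc), label)
        for doc, label in zip(train_docs, train_labels)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Brute force costs $n!$ matchings per document pair, so this only illustrates the definition; practical systems use EMD or Sinkhorn solvers instead.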

Word Mover's Distance for Agglomerative Short Text Clustering

- Computer Science
- ACIIDS
- 2019

This paper investigates the word mover’s distance metric for automatically clustering short texts using word semantic information, adopting an agglomerative strategy as the clustering method to group texts efficiently by similarity.
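
A sketch of bottom-up (agglomerative) clustering over a precomputed pairwise distance matrix, such as pairwise WMD values between short texts. The linkage options (average, single, complete) are standard choices assumed here; the summary above does not say which linkage the ACIIDS paper uses, and the function name `agglomerative` is hypothetical.

```python
def agglomerative(dist, n_clusters, linkage="average"):
    """Bottom-up clustering from a symmetric pairwise distance matrix.

    dist[i][j] could hold, e.g., the WMD between short texts i and j.
    Naive search over all cluster pairs at every merge; fine for small inputs.
    Returns clusters as sorted lists of document indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def cluster_dist(a, b):
        pairs = [dist[x][y] for x in a for y in b]
        if linkage == "single":
            return min(pairs)
        if linkage == "complete":
            return max(pairs)
        if linkage == "average":
            return sum(pairs) / len(pairs)
        raise ValueError(f"unknown linkage: {linkage}")

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters under the chosen linkage.
        _, i, j = min(
            (cluster_dist(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return sorted(sorted(c) for c in clusters)
```

Because the input is a plain distance matrix, any document distance (exact WMD, a relaxed WMD, or a Sinkhorn approximation) can be plugged in without changing the clustering code.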

Word Mover’s Embedding: From Word2Vec to Document Embedding

- Computer Science, Mathematics
- EMNLP
- 2018

The Word Mover’s Embedding (WME) is proposed, a novel approach to building an unsupervised document (sentence) embedding from pre-trained word embeddings; it consistently matches or outperforms state-of-the-art techniques, with significantly higher accuracy on short texts.

Classifying Extremely Short Texts by Exploiting Semantic Centroids in Word Mover's Distance Space

- Computer Science
- WWW
- 2019

A better regularized word mover's distance (RWMD) based centroid classifier for short texts, named RWMD-CC, is proposed; it computes a representative semantic centroid for each category under the RWMD measure and classifies a test document by finding the closest semantic centroid.

Building Earth Mover's Distance on Bilingual Word Embeddings for Machine Translation

- Computer Science
- AAAI
- 2016

Earth Mover's Distance is introduced to provide a natural formulation that translates words in a holistic fashion, addressing the limitations of the nearest-neighbor approach; it is also extended to a new task, identifying parallel sentences, which is useful for statistical machine translation systems and thereby expands the application realm of bilingual word embeddings.

Efficient Estimation of Word Representations in Vector Space

- Computer Science
- ICLR
- 2013

Two novel model architectures for computing continuous vector representations of words from very large data sets are proposed and it is shown that these vectors provide state-of-the-art performance on the authors' test set for measuring syntactic and semantic word similarities.

Pair-Wise: Automatic Essay Evaluation using Word Mover's Distance

- Computer Science
- CSEDU
- 2018

A pair-wise semantic similarity approach to essay evaluation based on the Word Mover’s Distance is proposed; it relies on neural word embeddings to measure the similarity between words.

Using Centroids of Word Embeddings and Word Mover's Distance for Biomedical Document Retrieval in Question Answering

- Computer Science
- BioNLP@ACL
- 2016

A document retrieval method for question answering is proposed that represents documents and questions as weighted centroids of word embeddings and reranks the retrieved documents with a relaxation of Word Mover's Distance; the method is competitive with PUBMED.
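
A sketch of the two ingredients named in the summary: a weighted centroid of word embeddings for first-stage retrieval by cosine similarity, and a relaxed WMD for reranking. The relaxation shown is the one defined in "From Word Embeddings To Document Distances": each word sends all of its mass to its nearest word in the other document, and the larger of the two one-sided costs is a lower bound on WMD. Whether the BioNLP paper uses exactly this relaxation, and which word weights it uses (IDF is a common choice), are assumptions here; all function names are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def weighted_centroid(vectors, weights):
    """Weighted average of word vectors (weights, e.g. IDF, are an assumption)."""
    total = sum(weights)
    return tuple(
        sum(w * v[k] for v, w in zip(vectors, weights)) / total
        for k in range(len(vectors[0]))
    )


def one_sided_relaxed(vecs_a, weights_a, vecs_b):
    # Each word of A moves all of its (normalized) mass to its nearest word in B.
    total = sum(weights_a)
    return sum(
        (w / total) * min(euclidean(x, y) for y in vecs_b)
        for x, w in zip(vecs_a, weights_a)
    )


def relaxed_wmd(vecs_a, weights_a, vecs_b, weights_b):
    """Max of the two one-sided relaxations: a lower bound on WMD."""
    return max(
        one_sided_relaxed(vecs_a, weights_a, vecs_b),
        one_sided_relaxed(vecs_b, weights_b, vecs_a),
    )


def centroid_retrieve(query, docs, top_k):
    """First stage: rank (vectors, weights) documents by centroid cosine similarity."""
    qc = weighted_centroid(*query)
    scores = [cosine(qc, weighted_centroid(*d)) for d in docs]
    return sorted(range(len(docs)), key=lambda i: -scores[i])[:top_k]


def rerank(query, docs, indices):
    """Second stage: reorder candidate indices by relaxed WMD, closest first."""
    qv, qw = query
    return sorted(indices, key=lambda i: relaxed_wmd(qv, qw, *docs[i]))
```

Centroids make the first stage cheap (one vector per document), while the relaxed WMD is only computed for the short candidate list.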

A faster strongly polynomial minimum cost flow algorithm

- Mathematics, Computer Science
- STOC '88
- 1988

This algorithm improves on the best previous strongly polynomial algorithm, due to Galil and Tardos, by a factor of m/n, and is even more efficient if the number of arcs with finite upper bounds, say m', is much less than m.

Sinkhorn Distances: Lightspeed Computation of Optimal Transport

- Computer Science, Mathematics
- NIPS
- 2013

This work smooths the classic optimal transport problem with an entropic regularization term and shows that the resulting optimum is also a distance, which can be computed through Sinkhorn's matrix scaling algorithm several orders of magnitude faster than with transport solvers.
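
A minimal, dependency-free sketch of Sinkhorn's matrix scaling for entropy-regularized optimal transport: it alternately rescales the rows and columns of the kernel $K = e^{-\lambda C}$ until the transport plan's marginals match. With `r` and `c` set to the nBOW vectors of two documents and `cost[i][j]` to the distance between their word embeddings, the returned cost approximates WMD. The function name and the parameters `lam`, `n_iter`, and `tol` with their defaults are assumptions; this is a serial illustration, not the paper's parallel algorithm for PIUMA or Xeon.

```python
import math


def sinkhorn(r, c, cost, lam=10.0, n_iter=1000, tol=1e-9):
    """Entropy-regularized optimal transport via Sinkhorn matrix scaling.

    r, c: source and target histograms (each sums to 1, entries positive).
    cost: len(r) x len(c) matrix of travel costs.
    Returns (transport_cost, plan) with plan = diag(u) K diag(v).
    """
    n, m = len(r), len(c)
    # Gibbs kernel K = exp(-lam * cost); very large lam * cost underflows to 0.
    K = [[math.exp(-lam * cost[i][j]) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Rescale rows so the plan's row sums equal r.
        u = [r[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Rescale columns so the plan's column sums equal c.
        v = [c[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
        # Column sums are now exact; stop once row sums are within tol of r.
        row_err = max(
            abs(u[i] * sum(K[i][j] * v[j] for j in range(m)) - r[i])
            for i in range(n)
        )
        if row_err < tol:
            break
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    total = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return total, plan
```

A larger `lam` brings the plan closer to the unregularized optimum but makes `K` underflow sooner; implementations that need a large `lam` usually switch to log-domain updates.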