# Sublinear Time Approximation of Text Similarity Matrices

```bibtex
@article{Ray2021SublinearTA,
  title   = {Sublinear Time Approximation of Text Similarity Matrices},
  author  = {Archan Ray and Nicholas Monath and Andrew McCallum and Cameron Musco},
  journal = {ArXiv},
  year    = {2021},
  volume  = {abs/2112.09631}
}
```

We study algorithms for approximating pairwise similarity matrices that arise in natural language processing. Generally, computing a similarity matrix for n data points requires Ω(n²) similarity computations. This quadratic scaling is a significant bottleneck, especially when similarities are computed via expensive functions, e.g., via transformer models. Approximation methods reduce this quadratic complexity, often by using a small subset of exactly computed similarities to approximate the…
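One standard way to approximate a similarity matrix from a small subset of exactly computed entries is the Nyström method, which several of the references below study. The following is a minimal sketch under that framing, not the paper's own algorithm; the function name `nystrom_approx` and the dot-product test kernel are illustrative choices.

```python
import numpy as np

def nystrom_approx(points, sim, m, rng=None):
    """Approximate the n x n similarity matrix S[i, j] = sim(points[i], points[j])
    from only O(n * m) exact similarity evaluations (Nystrom method sketch)."""
    rng = np.random.default_rng(rng)
    n = len(points)
    # Sample m landmark points uniformly without replacement.
    landmarks = rng.choice(n, size=m, replace=False)
    # C holds similarities between every point and the m landmarks (n x m).
    C = np.array([[sim(points[i], points[j]) for j in landmarks] for i in range(n)])
    # W is the landmark-landmark block of S (m x m).
    W = C[landmarks, :]
    # Nystrom estimate: S_hat = C @ pinv(W) @ C.T, a rank-<=m approximation.
    return C @ np.linalg.pinv(W) @ C.T
```

For a linear kernel this recovers S exactly once the landmarks span the data's row space; for general kernels it yields a low-rank estimate whose quality depends on the landmark sample, which is what the sampling-based references below analyze.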


## References

Showing 1–10 of 68 references.

Sublinear Time Low-Rank Approximation of Distance Matrices

- Computer Science · NeurIPS
- 2018

A recursive algorithm based on additive projection-cost-preserving sampling is developed, and it is shown that, for any underlying distance metric d, an additive-error low-rank approximation can be computed in sublinear time.

Improved Approximation Algorithms for Large Matrices via Random Projections

- Computer Science · 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06)
- 2006

The key idea is that low dimensional embeddings can be used to eliminate data dependence and provide more versatile, linear time pass efficient matrix computation.

On the Nyström Method for Approximating a Gram Matrix for Improved Kernel-Based Learning

- Computer Science, Mathematics · J. Mach. Learn. Res.
- 2005

An algorithm to compute an easily-interpretable low-rank approximation to an n x n Gram matrix G such that computations of interest may be performed more rapidly.

Fast and stable deterministic approximation of general symmetric kernel matrices in high dimensions

- Computer Science · ArXiv
- 2021

This paper develops a unified theoretical framework for analyzing Nyström approximations that is valid for both SPSD and indefinite kernels and is independent of the specific scheme for selecting landmark points. It also proposes the anchor net method, which operates entirely on the dataset without requiring access to K or its matrix-vector product.

Metric and non-metric proximity transformations at linear costs

- Computer Science · Neurocomputing
- 2015

Supervised low rank indefinite kernel approximation using minimum enclosing balls

- Computer Science · Neurocomputing
- 2018

CUR Low Rank Approximation of a Matrix at Sub-linear Cost

- Computer Science · ArXiv
- 2019

It is proved that some old and new sublinear-cost algorithms solve the dual LRA problem; that is, with high probability they compute a close LRA of a random matrix admitting LRA.

Sublinear Time Low-Rank Approximation of Positive Semidefinite Matrices

- Computer Science, Mathematics · 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS)
- 2017

It is shown how to compute a relative-error low-rank approximation to any positive semidefinite (PSD) matrix in sublinear time, and time lower bounds for low-rank approximation of PSD matrices are proved.

Similarity Search in High Dimensions via Hashing

- Computer Science · VLDB
- 1999

Experimental results indicate that the novel hashing-based scheme for approximate similarity search scales well even for a relatively large number of dimensions, and provide evidence that the method improves running time over other methods for searching in high-dimensional spaces based on hierarchical tree decomposition.
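As a concrete illustration of hashing-based similarity search, here is a minimal random-hyperplane (SimHash) sketch for cosine similarity. This is a related LSH family rather than necessarily the hash scheme of the paper above, and `simhash_signatures` is a hypothetical helper name.

```python
import numpy as np

def simhash_signatures(X, n_bits, rng=None):
    """Map each row of X to an n_bits binary signature using random hyperplanes.
    Rows with high cosine similarity agree on most signature bits, so candidate
    neighbors can be found by comparing short signatures instead of full vectors."""
    rng = np.random.default_rng(rng)
    # One random hyperplane per output bit; each bit records which side
    # of the hyperplane the point falls on.
    planes = rng.standard_normal((X.shape[1], n_bits))
    return (X @ planes > 0).astype(np.uint8)
```

Identical directions produce identical signatures, while opposite directions flip every bit, which is why signature Hamming distance tracks angular distance.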

Relative-Error CUR Matrix Decompositions

- Computer Science, MathematicsSIAM J. Matrix Anal. Appl.
- 2008

These two algorithms are the first polynomial time algorithms for such low-rank matrix approximations that come with relative-error guarantees; previously, in some cases, it was not even known whether such matrix decompositions exist.