Corpus ID: 245329838

Sublinear Time Approximation of Text Similarity Matrices

@article{Ray2021SublinearTA,
  title={Sublinear Time Approximation of Text Similarity Matrices},
  author={Archan Ray and Nicholas Monath and Andrew McCallum and Cameron Musco},
  journal={ArXiv},
  year={2021},
  volume={abs/2112.09631}
}
We study algorithms for approximating pairwise similarity matrices that arise in natural language processing. Generally, computing a similarity matrix for n data points requires Ω(n^2) similarity computations. This quadratic scaling is a significant bottleneck, especially when similarities are computed via expensive functions, e.g., via transformer models. Approximation methods reduce this quadratic complexity, often by using a small subset of exactly computed similarities to approximate the…
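The Ω(n^2) bottleneck can be made concrete with a small counter. A minimal sketch (the `sim` function below is a toy dot product standing in for an expensive model call such as a transformer forward pass; all names are illustrative, not from the paper):

```python
import numpy as np

calls = 0

def sim(a, b):
    """Stand-in for an expensive similarity function; we count
    invocations to expose the quadratic cost."""
    global calls
    calls += 1
    return float(np.dot(a, b))

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 8))
n = len(X)

# Exact similarity matrix: n*(n-1)/2 calls for the strict upper
# triangle (the diagonal of self-similarities is handled trivially).
K = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        K[i, j] = K[j, i] = sim(X[i], X[j])
```

For n = 40 this already makes 780 calls; approximation methods of the kind surveyed here aim to replace most of these calls with cheap estimates.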

References

SHOWING 1-10 OF 68 REFERENCES
Sublinear Time Low-Rank Approximation of Distance Matrices
A recursive algorithm based on additive projection-cost-preserving sampling is developed, and it is shown that for any underlying distance metric d, an additive-error low-rank approximation can be achieved in sublinear time.
Improved Approximation Algorithms for Large Matrices via Random Projections
  • Tamás Sarlós
  • Computer Science
    2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06)
  • 2006
The key idea is that low-dimensional embeddings can be used to eliminate data dependence and provide more versatile, linear-time, pass-efficient matrix computation.
On the Nyström Method for Approximating a Gram Matrix for Improved Kernel-Based Learning
An algorithm to compute an easily interpretable low-rank approximation to an n × n Gram matrix G such that computations of interest may be performed more rapidly.
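The Nyström idea referenced above can be sketched in a few lines: sample m landmark points, compute only the n × m block of exact similarities, and extend it to a full-matrix estimate. This is a minimal sketch under simplifying assumptions (uniform landmark sampling, a generic `similarity` callback), not the specific algorithm of any paper listed here:

```python
import numpy as np

def nystrom_approx(X, similarity, m, seed=None):
    """Approximate the full n x n similarity matrix from m sampled
    landmark columns, using O(n*m) exact similarity computations
    instead of O(n^2)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    idx = rng.choice(n, size=m, replace=False)  # landmark points
    # n x m block of exact similarities against the landmarks
    C = np.array([[similarity(X[i], X[j]) for j in idx] for i in range(n)])
    W = C[idx]                                  # m x m core block
    return C @ np.linalg.pinv(W) @ C.T          # rank-<=m approximation

# Toy usage with an RBF similarity on random vectors.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
sim = lambda a, b: float(np.exp(-np.sum((a - b) ** 2)))
K_hat = nystrom_approx(X, sim, m=20, seed=1)
```

The pseudoinverse of the core block W is one standard choice for the extension; regularized variants trade accuracy for stability when W is ill-conditioned.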
Fast and stable deterministic approximation of general symmetric kernel matrices in high dimensions
This paper develops a unified theoretical framework for analyzing Nyström approximations, valid for both SPSD and indefinite kernels and independent of the specific scheme for selecting landmark points, and proposes the anchor net method, which operates entirely on the dataset without requiring access to K or its matrix-vector product.
Metric and non-metric proximity transformations at linear costs
Supervised low rank indefinite kernel approximation using minimum enclosing balls
CUR Low Rank Approximation of a Matrix at Sub-linear Cost
It is proved that some old and new sublinear-cost algorithms solve the dual LRA problem, that is, with high probability compute a close LRA of a random matrix admitting LRA.
Sublinear Time Low-Rank Approximation of Positive Semidefinite Matrices
It is shown how to compute a relative-error low-rank approximation to any positive semidefinite (PSD) matrix in sublinear time, and time lower bounds for low-rank approximation of PSD matrices are proved.
Similarity Search in High Dimensions via Hashing
Experimental results indicate that the novel hashing-based scheme for approximate similarity search scales well even for a relatively large number of dimensions, and provide evidence that the method improves running time over other methods for searching high-dimensional spaces based on hierarchical tree decomposition.
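The hashing idea behind that line of work can be illustrated with sign random projections (SimHash), one classic LSH family for cosine similarity; this is a hedged toy sketch, not the scheme of the cited paper:

```python
import numpy as np

def simhash_signatures(X, n_bits=32, seed=0):
    """Random-hyperplane LSH: points with high cosine similarity
    tend to agree on many sign bits of their projections."""
    rng = np.random.default_rng(seed)
    planes = rng.normal(size=(X.shape[1], n_bits))
    return (X @ planes >= 0).astype(np.uint8)   # n x n_bits sign matrix

rng = np.random.default_rng(1)
a = rng.normal(size=8)
# A point, a tiny perturbation of it, and its negation.
X = np.stack([a, a + 0.01 * rng.normal(size=8), -a])
sigs = simhash_signatures(X, n_bits=32)
agree = lambda i, j: int((sigs[i] == sigs[j]).sum())
# Near-duplicate vectors share far more bits than opposed ones.
```

Bucketing points by their signatures turns similarity search into hash lookups, avoiding a linear scan over all candidates.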
Relative-Error CUR Matrix Decompositions
These two algorithms are the first polynomial-time algorithms for such low-rank matrix approximations that come with relative-error guarantees; previously, in some cases, it was not even known whether such matrix decompositions exist.
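A CUR decomposition of the kind these references study expresses A approximately as a product of sampled columns C, a small linking matrix U, and sampled rows R. The following is a minimal sketch using uniform sampling and a pseudoinverse link (one common choice), not the sampling distributions that give the cited relative-error guarantees:

```python
import numpy as np

def cur_approx(A, c, r, seed=0):
    """Sample c columns and r rows of A and combine them as C @ U @ R,
    touching only O((c + r) * n) entries of A instead of all n^2."""
    rng = np.random.default_rng(seed)
    cols = rng.choice(A.shape[1], size=c, replace=False)
    rows = rng.choice(A.shape[0], size=r, replace=False)
    C, R = A[:, cols], A[rows, :]
    U = np.linalg.pinv(C[rows, :])   # pseudoinverse of the intersection block
    return C @ U @ R

# When rank(A) is below the sample sizes, this recovery is exact
# (the sampled intersection captures the full row/column space).
rng = np.random.default_rng(2)
A = rng.normal(size=(30, 3)) @ rng.normal(size=(3, 30))  # rank-3 matrix
A_hat = cur_approx(A, c=8, r=8)
```

Unlike an SVD, the factors C and R consist of actual columns and rows of A, which is what makes the decomposition "easily interpretable" in the sense used above.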