• Publications
Frequent Directions: Simple and Deterministic Matrix Sketching
TLDR
FrequentDirections outperforms exemplar implementations of existing streaming algorithms in the space-error tradeoff and is mergeable and hence trivially parallelizable.
Outlier Robust ICP for Minimizing Fractional RMSD
TLDR
A new distance measure is formalized, fractional root mean squared distance (FRMSD), which incorporates the fraction of inliers into the distance function and is guaranteed to converge to a locally optimal solution.
Attenuating Bias in Word Vectors
TLDR
Simple new ways to detect the most stereotypically gendered words in an embedding and remove the bias from them are explored; names are verified to be masked carriers of gender bias and are then used as a tool to attenuate bias in embeddings.
Relative Errors for Deterministic Low-Rank Matrix Approximations
TLDR
It is shown that Frequent Directions cannot be adapted to a sparse version in an obvious way that retains the l original rows of the matrix, as opposed to a linear combination or sketch of the rows.
Mergeable summaries
TLDR
This paper demonstrates that the MG and the SpaceSaving summaries for heavy hitters are indeed mergeable or can be made mergeable after appropriate modifications, and provides the best known randomized streaming bound for ε-approximate quantiles that depends only on ε, of size O((1/ε) log^{3/2}(1/ε)).
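To illustrate what "mergeable" means for the MG (Misra-Gries) heavy-hitters summary, here is a minimal sketch. It assumes the commonly described merge rule: add the two summaries counter by counter, then, if more than k counters remain, subtract the (k+1)-th largest count from every counter and keep only the positive ones. The function names `mg_update`, `mg_build`, and `mg_merge` are hypothetical and chosen for this example only.

```python
from collections import Counter


def mg_update(summary, item, k):
    """Process one stream item, keeping at most k counters."""
    if item in summary or len(summary) < k:
        summary[item] = summary.get(item, 0) + 1
    else:
        # No free counter: decrement all counters and drop those at zero.
        for key in list(summary):
            summary[key] -= 1
            if summary[key] == 0:
                del summary[key]


def mg_build(stream, k):
    """Build a Misra-Gries summary of a stream with at most k counters."""
    summary = {}
    for item in stream:
        mg_update(summary, item, k)
    return summary


def mg_merge(s1, s2, k):
    """Merge two summaries into one that again has at most k counters."""
    merged = Counter(s1)
    merged.update(s2)  # add counts for items present in both summaries
    if len(merged) > k:
        # Subtract the (k+1)-th largest count and keep positive counters.
        threshold = sorted(merged.values(), reverse=True)[k]
        return {key: c - threshold for key, c in merged.items() if c > threshold}
    return dict(merged)


left = mg_build(["a"] * 5 + ["b"] * 3 + ["c"], k=2)
right = mg_build(["a"] * 4 + ["d"] * 2 + ["e"], k=2)
merged = mg_merge(left, right, k=2)
print(merged)
```

The merged summary never overestimates a true count, and because the output has the same size bound as its inputs, summaries built on separate machines can be combined in any order.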
Quality and efficiency for kernel density estimates in large data
Kernel density estimates are important for a broad variety of applications. Their construction has been well-studied, but existing techniques are expensive on massive datasets and/or only provide
Distributed Trajectory Similarity Search
TLDR
This paper proposes a distributed query framework to process trajectory similarity search over a large set of trajectories and implements the framework in Spark, a popular distributed data processing engine, by carefully considering different design choices.
Radio tomographic imaging and tracking of stationary and moving people via kernel distance
TLDR
This work presents and evaluates a system which can locate stationary or moving people, without calibration, by using kernel distance to quantify the difference between two histograms of signal strength measurements.
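As a rough illustration of comparing two signal-strength histograms with a kernel distance, the sketch below uses the standard form D(P, Q)^2 = κ(P, P) + κ(Q, Q) − 2κ(P, Q), where κ sums a kernel over all pairs of weighted bins. The Gaussian kernel, the bandwidth `sigma`, the dictionary representation of a histogram, and the sample values are all assumptions made for this example; the paper's actual kernel and parameters may differ.

```python
import math


def cross_similarity(p, q, sigma):
    """Sum of kernel values over all pairs of weighted bins from p and q.

    Each histogram maps a bin value (e.g. signal strength in dBm)
    to its weight. A Gaussian kernel is assumed here.
    """
    return sum(
        wp * wq * math.exp(-((x - y) ** 2) / (2 * sigma ** 2))
        for x, wp in p.items()
        for y, wq in q.items()
    )


def kernel_distance(p, q, sigma=1.0):
    """Kernel distance between two weighted histograms."""
    squared = (
        cross_similarity(p, p, sigma)
        + cross_similarity(q, q, sigma)
        - 2 * cross_similarity(p, q, sigma)
    )
    # Guard against tiny negative values from floating-point rounding.
    return math.sqrt(max(squared, 0.0))


# Hypothetical normalized histograms of signal strength measurements.
baseline = {-60: 0.5, -61: 0.5}
slightly_shifted = {-61: 0.5, -62: 0.5}
strongly_shifted = {-70: 0.5, -71: 0.5}

print(kernel_distance(baseline, slightly_shifted))
print(kernel_distance(baseline, strongly_shifted))
```

Unlike a bin-by-bin difference, this distance grows smoothly as the mass of one histogram moves away from the other, so a small shift in signal strength yields a small distance.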
Spatial scan statistics: approximations and performance study
TLDR
A simple exact algorithm for finding the largest discrepancy region in a domain and a new approximation algorithm for a large class of discrepancy functions (including the Kulldorff scan statistic) that improves the approximation versus run time trade-off of prior methods are described.
On Measuring and Mitigating Biased Inferences of Word Embeddings
TLDR
A mechanism for measuring stereotypes using the task of natural language inference is designed, a reduction in invalid inferences is demonstrated via bias mitigation strategies on static word embeddings (GloVe), and it is shown that for gender bias, these techniques extend to contextualized embeddings when applied selectively only to the static components of contextualized embeddings.
...
...