• Publications
Not All Claims are Created Equal: Choosing the Right Statistical Approach to Assess Hypotheses
TLDR
This work argues that practitioners should decide on their target hypothesis before choosing an assessment method. It provides best practices and guidelines tailored to NLP research, along with an easy-to-use package for Bayesian assessment of hypotheses that complements existing tools.
A Practical Algorithm for Distributed Clustering and Outlier Detection
TLDR
To the best of the authors' knowledge, this is the first practical algorithm with theoretical guarantees for distributed clustering with outliers, and it is reported to be clearly superior to all the baseline algorithms in almost all metrics.
On the Capabilities and Limitations of Reasoning for Natural Language Understanding
TLDR
This work presents the first formal framework to study empirical observations of linguistic variability in undirected graphs, addressing the ambiguity, redundancy, incompleteness, and inaccuracy that the use of language introduces when representing a hidden conceptual space.
ParsiNLU: A Suite of Language Understanding Challenges for Persian
TLDR
This work introduces ParsiNLU, the first benchmark in the Persian language that includes a range of language understanding tasks (reading comprehension, textual entailment, and so on). It presents the first results for state-of-the-art monolingual and multilingual pre-trained language models on this benchmark and compares them with human performance.
Palindrome Recognition In The Streaming Model
TLDR
A one-pass randomized algorithm is given that solves the Palindrome Problem with an additive error using $O(\sqrt{n})$ space, along with two variants of the algorithm that solve related, practical problems.
Streaming Periodicity with Mismatches
TLDR
A one-pass streaming algorithm is given that computes the $k$-periods of a string $S$ using $\text{poly}(k, \log n)$ bits of space, regardless of period length.
PhISCS-BnB: A Fast Branch and Bound Algorithm for the Perfect Tumor Phylogeny Reconstruction Problem
TLDR
PhISCS-BnB is a Branch and Bound algorithm that computes the most likely perfect phylogeny (PP) on an input genotype matrix extracted from an SCS data set. It not only offers an optimality guarantee, but is also 10 to 100 times faster than the best available methods on simulated tumor SCS data.
On the Possibilities and Limitations of Multi-hop Reasoning Under Linguistic Imperfections
TLDR
This work presents a framework to quantify the amount and effect of ambiguity, redundancy, incompleteness, and inaccuracy that the use of language introduces when representing a hidden conceptual space. It suggests an alternative path forward: focusing on aligning the two spaces via richer representations before investing in reasoning with many hops.
Effective sketching methods for value function approximation
TLDR
This work explores the utility of sketching for matrix methods and demonstrates how to use sketching more sparingly, with only a left-sided sketch, while still enabling significant computational gains and the use of matrix-based learning algorithms that are less sensitive to parameters.
...