Matching Map Recovery with an Unknown Number of Outliers

Arshak Minasyan, Tigran Galstyan, S. A. Hunanyan, Arnak S. Dalalyan

We consider the problem of finding the matching map between two sets of d-dimensional noisy feature vectors. The distinctive feature of our setting is that we do not assume that all the vectors of the first set have their corresponding vector in the second set. If n and m are the sizes of these two sets, we assume that the matching map that should be recovered is defined on a subset of unknown cardinality k* ≤ min(n, m). We show that, in the high-dimensional setting, if the signal-to-noise…
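
To make the problem setting concrete, here is a minimal sketch of what a partial matching between two point sets looks like. It is not the estimator studied in the paper: the abstract is truncated before any procedure is described. The sketch assumes the matching size k is known and uses brute-force search over injective partial matchings with a squared-Euclidean cost, whereas the paper targets the harder case where k* is unknown. The names `sq_dist` and `best_k_matching` and the toy data are hypothetical.

```python
from itertools import combinations, permutations


def sq_dist(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def best_k_matching(X, Y, k):
    """Return (cost, pairs) for the injective matching of exactly k pairs
    (i, j), i indexing X and j indexing Y, that minimizes the total squared
    distance. Exhaustive search: only suitable for tiny illustrative inputs.
    """
    best_cost, best_pairs = float("inf"), []
    for rows in combinations(range(len(X)), k):       # which points of X are matched
        for cols in permutations(range(len(Y)), k):   # their images in Y, no repeats
            cost = sum(sq_dist(X[i], Y[j]) for i, j in zip(rows, cols))
            if cost < best_cost:
                best_cost, best_pairs = cost, list(zip(rows, cols))
    return best_cost, best_pairs


# Toy data (assumed): two noisy inlier pairs plus one outlier in each set.
X = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
Y = [(5.1, 4.9), (0.1, -0.1), (-7.0, 8.0)]

cost, pairs = best_k_matching(X, Y, k=2)
print(pairs)
```

With k = 2 the search pairs each inlier of X with its perturbed copy in Y and leaves the two outliers unmatched. Forcing k = 3 would add an outlier-to-outlier pair at a much larger cost, which is the kind of gap a data-driven choice of k has to detect.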

One-Way Matching of Datasets with Low Rank Signals

Under a stylized model, it is shown that linear assignment with projected data achieves fast rates of convergence and sometimes even minimax rate optimality for this task.

Maximum Flow and Minimum-Cost Flow in Almost-Linear Time

We give an algorithm that computes exact maximum flows and minimum-cost flows on directed graphs with m edges and polynomially bounded integral demands, costs, and capacities in $m^{1+o(1)}$ time.

Optimal detection of the feature matching map in presence of noise and outliers

The main result shows that, in the high-dimensional setting, a detection region of unknown injection may be characterized by the sets of vectors for which the inlier-inlier distance is of order at least d^{1/4} and the inlier-outlier distance…

Minimax Rates in Permutation Estimation for Feature Matching

The problem of matching two sets of features appears in various computer vision tasks and can often be formalized as a problem of permutation estimation. A theoretical analysis of the accuracy of several natural estimators is provided.

SCANPY: large-scale single-cell gene expression data analysis

This work presents Scanpy, a scalable toolkit for analyzing single-cell gene expression data that includes methods for preprocessing, visualization, clustering, pseudotime and trajectory inference, differential expression testing, and simulation of gene regulatory networks, and AnnData, a generic class for handling annotated data matrices.

Random Graph Matching in Geometric Models: the Case of Complete Graphs

An approximate maximum likelihood estimator is derived, which provably achieves, with high probability, perfect recovery of π* when σ = o(n^{-2/d}) and almost perfect recovery with a vanishing fraction of errors when ρ = n^{-1/d}.

Matrix Reordering for Noisy Disordered Matrices: Optimality and Computationally Efficient Algorithms

This work first establishes the fundamental statistical limit for the matrix reordering problem in a decision-theoretic framework and shows that a constrained least square estimator is rate-optimal, and proposes a novel polynomial-time adaptive sorting algorithm.

Localization in 1D non-parametric latent space models from pairwise affinities

An estimation procedure is introduced that provably localizes all the latent positions in a one-dimensional torus with a maximum error of the order of √(log(n)/n), with high probability, and is proven to be minimax optimal.

Strong recovery of geometric planted matchings

The problem of efficiently recovering the matching between an unlabelled collection of n points in R^d and a small random perturbation of those points is studied, and it is shown that the MLE makes n^{δ+o(1)} errors for an explicit δ ∈ (0, 1).