This is an introductory survey of the emerging theory of two new classes of (discrete, countable) groups, called hyperlinear and sofic groups. They can be characterized as subgroups of metric ultraproducts of families of, respectively, unitary groups U(n) and symmetric groups S_n, n ∈ N. Hyperlinear groups come from the theory of operator algebras (Connes'…

We suggest that the curse of dimensionality affecting the similarity-based search in large datasets is a manifestation of the phenomenon of concentration of measure on high-dimensional structures. We prove that, under certain geometric assumptions on the query domain Ω and the dataset X, if Ω satisfies the so-called concentration property, then for most…

We offer a theoretical validation of the curse of dimensionality in the pivot-based indexing of datasets for similarity search, by proving, in the framework of statistical learning, that in high dimensions no pivot-based indexing scheme can essentially outperform the linear scan. A study of the asymptotic performance of pivot-based indexing schemes is…

Exchangeable random variables form an important and well-studied generalization of i.i.d. variables; however, simple examples show that no nontrivial concept or function classes are PAC learnable under general exchangeable data inputs X1, X2, …. Inspired by the work of Berti and Rigo on a Glivenko–Cantelli theorem for exchangeable inputs, we propose a…

We perform a deeper analysis of an axiomatic approach to the concept of intrinsic dimension of a dataset proposed by us in the IJCNN'07 paper. The main features of our approach are that a high intrinsic dimension of a dataset reflects the presence of the curse of dimensionality (in a certain mathematically precise sense), and that dimension of a discrete…

We propose an axiomatic approach to the concept of an intrinsic dimension of a dataset, based on a viewpoint of geometry of high-dimensional structures. Our first axiom postulates that high values of dimension be indicative of the presence of the curse of dimensionality (in a certain precise mathematical sense). The second axiom requires the dimension to…

We suggest a variation of the Hellerstein-Koutsoupias-Papadimitriou indexability model for datasets equipped with a similarity measure, with the aim of better understanding the structure of indexing schemes for similarity-based search and the geometry of similarity workloads. This in particular provides a unified approach to a great variety of schemes used…

We propose a family of very efficient hierarchical indexing schemes for ungapped, score matrix-based similarity search in large datasets of short (4-12 amino acid) protein fragments. This type of similarity search is important both as a building block for more complex algorithms and for possible use in direct biological investigations, and…

We discuss some aspects of approximating functions on high-dimensional data sets with additive functions or ANOVA decompositions, that is, sums of functions depending on fewer variables each. It is seen that under appropriate smoothness conditions, the errors of the ANOVA decompositions are of order O(n m/2) for independent predictor variables and…