Petros Drineas

A problem for many kernel-based methods is that the amount of computation required to find the solution scales as O(n³), where n is the number of training examples. We develop and analyze an algorithm to compute an easily-interpretable low-rank approximation to an n × n Gram matrix G such that computations of interest may be performed more rapidly. The …
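A minimal numpy sketch of a Nyström-type construction in this spirit: sample a few columns of G, rescale, and build a rank-k approximation from the sampled block. The function name, the diagonal-based sampling probabilities, and the parameters c and k are illustrative choices here, not the paper's exact algorithm.

```python
import numpy as np

def nystrom_approx(G, c, k, seed=0):
    """Nystrom-type rank-k approximation of an n x n PSD Gram matrix G.

    Samples c columns with probability proportional to G_ii^2 (one
    sampling scheme analyzed in this line of work), rescales them, and
    returns C @ pinv(W_k) @ C.T, where W is the c x c block of G on the
    sampled indices and W_k is its best rank-k approximation.
    """
    rng = np.random.default_rng(seed)
    n = G.shape[0]
    p = np.diag(G) ** 2
    p = p / p.sum()
    idx = rng.choice(n, size=c, replace=True, p=p)
    scale = 1.0 / np.sqrt(c * p[idx])      # keeps the estimator unbiased
    C = G[:, idx] * scale                  # n x c rescaled sampled columns
    W = C[idx, :] * scale[:, None]         # c x c intersection block
    U, s, _ = np.linalg.svd(W)
    inv_s = np.where(s[:k] > 1e-12, 1.0 / s[:k], 0.0)
    return C @ (U[:, :k] * inv_s) @ U[:, :k].T @ C.T
```

The payoff is that downstream kernel computations touch only the n x c factor C and a small c x c core instead of the full n x n matrix.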
In many applications, the data consist of (or may be naturally formulated as) an m × n matrix A. It is often of interest to find a low-rank approximation to A, i.e., an approximation D to the matrix A of rank not greater than a specified rank k, where k is much smaller than m and n. Methods such as the singular value decomposition (SVD) may be used to find an …
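For reference, the SVD-based construction mentioned here is standard (Eckart–Young): truncating the SVD at rank k gives the best rank-k approximation in both the Frobenius and spectral norms. A short sketch:

```python
import numpy as np

def best_rank_k(A, k):
    """Best rank-k approximation to A, in both the Frobenius and
    spectral norms (Eckart-Young), via the truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
```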
In many applications, the data consist of (or may be naturally formulated as) an m × n matrix A which may be stored on disk but which is too large to be read into random access memory (RAM) or to practically perform superlinear polynomial-time computations on it. Two algorithms are presented which, when given an m × n matrix A, compute approximations to A …
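A sketch in the spirit of these pass-efficient algorithms, assuming column sampling proportional to squared norms (the sampling distribution used in this line of work); the function name and parameters are illustrative:

```python
import numpy as np

def sampled_svd(A, c, k, seed=0):
    """Sample c columns of A with probability proportional to their
    squared Euclidean norms, rescale, take the top k left singular
    vectors H_k of the sample, and use H_k @ H_k.T @ A as a rank-k
    approximation. The column norms can be accumulated in one pass
    over data on disk; only the m x c sample must fit in RAM."""
    rng = np.random.default_rng(seed)
    sq_norms = (A ** 2).sum(axis=0)
    p = sq_norms / sq_norms.sum()
    idx = rng.choice(A.shape[1], size=c, replace=True, p=p)
    C = A[:, idx] / np.sqrt(c * p[idx])
    H, _, _ = np.linalg.svd(C, full_matrices=False)
    Hk = H[:, :k]
    return Hk @ (Hk.T @ A)
```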
Many data analysis applications deal with large matrices and involve approximating the matrix using a small number of “components.” Typically, these components are linear combinations of the rows and columns of the matrix, and are thus difficult to interpret in terms of the original features of the input data. In this paper, we propose and study matrix …
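One way to keep the components interpretable is a CUR-style decomposition, where C and R are actual columns and rows of the matrix. A hedged sketch, assuming leverage-score sampling (used in this line of work); the function name and parameters c, r are illustrative:

```python
import numpy as np

def cur_sketch(A, k, c, r, seed=0):
    """CUR-style decomposition: pick c actual columns and r actual rows
    of A with probabilities given by rank-k leverage scores, then set
    U = pinv(C) @ A @ pinv(R) so that C @ U @ R approximates A while
    remaining expressed in the original rows and columns."""
    rng = np.random.default_rng(seed)
    Uf, _, Vt = np.linalg.svd(A, full_matrices=False)
    p_col = (Vt[:k, :] ** 2).sum(axis=0) / k    # column leverage scores
    p_row = (Uf[:, :k] ** 2).sum(axis=1) / k    # row leverage scores
    cols = rng.choice(A.shape[1], size=c, replace=False, p=p_col)
    rows = rng.choice(A.shape[0], size=r, replace=False, p=p_row)
    C, R = A[:, cols], A[rows, :]
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
    return C, U, R
```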
Motivated by applications in which the data may be formulated as a matrix, we consider algorithms for several common linear algebra problems. These algorithms make more efficient use of computational resources, such as the computation time, random access memory (RAM), and the number of passes over the data, than do previously known algorithms for these …
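A representative primitive from this Monte Carlo family is approximate matrix multiplication by sampling outer products; a minimal sketch (names and the parameter c are illustrative):

```python
import numpy as np

def approx_matmul(A, B, c, seed=0):
    """Estimate A @ B from c sampled outer products, drawn with
    probability proportional to ||A column i|| * ||B row i|| and
    rescaled so the estimate is unbiased. One pass collects the norms,
    a second draws the sample."""
    rng = np.random.default_rng(seed)
    w = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p = w / w.sum()
    idx = rng.choice(A.shape[1], size=c, replace=True, p=p)
    return (A[:, idx] / (c * p[idx])) @ B[idx, :]
```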
We consider the problem of selecting the “best” subset of exactly k columns from an m × n matrix A. In particular, we present and analyze a novel two-stage algorithm that runs in O(min{mn², m²n}) time and returns as output an m × k matrix C consisting of exactly k columns of A. In the first stage (the randomized stage), the algorithm randomly selects O(k log …
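A hedged sketch of a two-stage selection of this flavor, with a randomized oversampling stage followed by a deterministic pruning stage; here column-pivoted QR stands in for a rank-revealing QR, and the oversampling factor is an illustrative choice, not the paper's exact procedure:

```python
import numpy as np
from scipy.linalg import qr

def two_stage_css(A, k, oversample=4, seed=0):
    """Randomized stage: sample extra candidate columns using rank-k
    leverage scores. Deterministic stage: column-pivoted QR on the
    candidates, keeping exactly k columns of A."""
    rng = np.random.default_rng(seed)
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    p = (Vt[:k, :] ** 2).sum(axis=0) / k
    n_cand = min(oversample * k, A.shape[1])
    cand = rng.choice(A.shape[1], size=n_cand, replace=False, p=p)
    _, _, piv = qr(A[:, cand], pivoting=True)
    keep = cand[piv[:k]]
    return A[:, keep], keep
```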
We consider the problem of partitioning a set of m points in the n-dimensional Euclidean space into k clusters (usually m and n are variable, while k is fixed), so as to minimize the sum of squared distances between each point and its cluster center. This formulation is usually the objective of the k-means clustering algorithm (Kanungo et al., 2000). We …
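For concreteness, the objective in question is the one minimized by the standard Lloyd's iteration; the sketch below is that textbook heuristic, not the algorithm of the paper:

```python
import numpy as np

def lloyd_kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's iteration for the k-means objective: assign each
    point to its nearest center, recompute centers as cluster means,
    repeat; returns the sum-of-squared-distances cost."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    cost = ((X - centers[labels]) ** 2).sum()
    return labels, centers, cost
```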
We consider low-rank reconstruction of a matrix using a subset of its columns and we present asymptotically optimal algorithms for both spectral norm and Frobenius norm reconstruction. The main tools we introduce to obtain our results are: (i) the use of fast approximate SVD-like decompositions for column-based matrix reconstruction, and (ii) two …
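Once a column subset is chosen, the reconstruction and the two error measures are straightforward to evaluate; a small helper (illustrative names):

```python
import numpy as np

def column_reconstruction_error(A, cols):
    """Reconstruct A from the span of a chosen column subset via
    projection (optimal in the Frobenius norm, and the usual benchmark
    in the spectral norm) and return both error norms."""
    C = A[:, cols]
    E = A - C @ np.linalg.pinv(C) @ A
    return np.linalg.norm(E, 2), np.linalg.norm(E, 'fro')
```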
Least squares approximation is a technique to find an approximate solution to a system of linear equations that has no exact solution. In a typical setting, one lets n be the number of constraints and d be the number of variables, with n ≫ d. Then, existing exact methods find a solution vector in O(nd²) time. We present two randomized algorithms that provide …
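A sketch of the row-sampling idea behind such randomized least-squares solvers. Caveat: computing exact leverage scores via QR costs O(nd²) itself, so the fast algorithms in this line of work use cheaper sketches (e.g., a randomized Hadamard transform followed by uniform sampling); this variant only illustrates the principle:

```python
import numpy as np

def sketched_lstsq(A, b, c, seed=0):
    """Sample c of the n rows with probability proportional to their
    leverage scores, rescale, and solve the small c x d least-squares
    problem exactly."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(A)                 # thin QR of the n x d matrix
    lev = (Q ** 2).sum(axis=1)             # row leverage scores, sum to d
    p = lev / lev.sum()
    idx = rng.choice(A.shape[0], size=c, replace=True, p=p)
    s = 1.0 / np.sqrt(c * p[idx])
    x, *_ = np.linalg.lstsq(A[idx] * s[:, None], b[idx] * s, rcond=None)
    return x
```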
Existing methods to ascertain small sets of markers for the identification of human population structure require prior knowledge of individual ancestry. Based on Principal Components Analysis (PCA), and recent results in theoretical computer science, we present a novel algorithm that, applied on genomewide data, selects small subsets of SNPs (PCA-correlated …
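A hedged sketch of PCA-based marker selection in this spirit; the function name, the scoring rule (squared loadings on the top components, a leverage-score-style criterion), and the parameters k and m are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def select_snps(G, k, m):
    """Given an (individuals x SNPs) genotype matrix G, compute the top
    k principal components and keep the m SNPs with the largest squared
    loadings across them; no prior ancestry labels are needed."""
    Gc = G - G.mean(axis=0)                # center each SNP column
    _, _, Vt = np.linalg.svd(Gc, full_matrices=False)
    scores = (Vt[:k, :] ** 2).sum(axis=0)  # per-SNP weight on top k PCs
    return np.argsort(scores)[::-1][:m]    # indices of the chosen SNPs
```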