Ahmed K. Farahat

Learn More
The Nyström method is an efficient technique for obtaining a low-rank approximation of a large kernel matrix based on a subset of its columns. The quality of the Nyström approximation highly depends on the subset of columns used, which are usually selected using random sampling. This paper presents a novel recursive algorithm for calculating the Nyström(More)
In data mining applications, data instances are typically described by a huge number of features. Most of these features are irrelevant or redundant, which negatively affects the efficiency and effectiveness of different learning algorithms. The selection of relevant features is a crucial task which can be used to allow a better understanding of data or(More)
Dictionary learning and sparse representation (DLSR) is a recent and successful mathematical model for data representation that achieves state-ofthe-art performance in various fields such as pattern recognition, machine learning, computer vision, and medical imaging. The original formulation for DLSR is based on the minimization of the reconstruction error(More)
The Kernel Mean Matching (KMM) is an elegant algorithm that produces density ratios between training and test data by minimizing their maximum mean discrepancy in a kernel space. The applicability of KMM to large-scale problems is however hindered by the quadratic complexity of calculating and storing the kernel matrices over training and test data. To(More)
In today’s information systems, the availability of massive amounts of data necessitates the development of fast and accurate algorithms to summarize these data and represent them in a succinct format. One crucial problem in big data analytics is the selection of representative instances from large and massively distributed data, which is formally known as(More)
Document clustering algorithms usually use vector space model (VSM) as their underlying model for document representation. VSM assumes that terms are independent and accordingly ignores any semantic relations between them. This results in mapping documents to a space where the proximity between document vectors does not reflect their true semantic(More)
Reducing the dimensionality of the data has been a challenging task in data mining and machine learning applications. In these applications, the existence of irrelevant and redundant features negatively affects the efficiency and effectiveness of different learning algorithms. Feature selection is one of the dimension reduction techniques, which has been(More)
Document clustering algorithms usually use vector space model (VSM) as their underlying model for document representation. VSM assumes that terms are independent and accordingly ignores any semantic relations between them. This results in mapping documents to a space where the proximity between document vectors does not reflect their true semantic(More)
The kernel k-means is an effective method for data clustering which extends the commonly-used k-means algorithm to work on a similarity matrix over complex data structures. It is, however, computationally very complex as it requires the complete kernel matrix to be calculated and stored. Further, its kernelized nature hinders the parallelization of its(More)
Given a very large data set distributed over a cluster of several nodes, this paper addresses the problem of selecting a few data instances that best represent the entire data set. The solution to this problem is of a crucial importance in the big data era as it enables data analysts to understand the insights of the data and explore its hidden structure.(More)