Alejandro Murua

Learn More
The goal of clustering is to identify distinct groups in a dataset. Compared to non-parametric clustering methods like complete linkage, hierarchical model-based clustering has the advantage of offering a way to estimate the number of groups present in the data. However, its computational cost is quadratic in the number of items to be clustered, and it is(More)
The goal of clustering is to identify distinct groups in a dataset. The basic idea of model-based clustering is to approximate the data density by a mixture model, typically a mixture of Gaussians, and to estimate the parameters of the component densities, the mixing fractions, and the number of components from the data. The number of distinct groups in the(More)
We describe a probabilistic approach to simultaneous image segmentation and intensity estimation for complementary DNA microarray experiments. The approach overcomes several limitations of existing methods. In particular, it (a) uses a flexible Markov random field approach to segmentation that allows for a wider range of spot shapes than existing methods,(More)
ÐA useful notion of weak dependence between many classifiers constructed with the same training data is introduced. It is shown that if both this weak dependence is low and the expected margins are large, then decison rules based on linear combinations of these classifiers can achieve error rates that decrease exponentially fast. Empirical results with(More)
An unsupervised stochastic clustering method based on the ferromagnetic Potts spin model is introduced as a powerful tool to determine functionally connected regions. The method provides an intuitively simple approach to clustering and makes no assumptions of the number of clusters in the data or their underlying distribution. The performance of the method(More)
We explore the possibility of recognizing speech signals using a large collection of coarse acoustic events, which describe temporal relations between a small number of local features of the spectrogram. The major issue of invariance to changes in duration of speech signal events is addressed by defining temporal relations in a rather coarse manner,(More)
Many clustering methods, such as K-means, kernel K-means, and MNcut clustering, follow the same recipe: (1) choose a measure of similarity between observations; (ii) define a figure of merit assigning a large value to partitions of the data that put similar observations in the same cluster; (iii) optimize this figure of merit over partitions. Potts model(More)