Marcílio Carlos Pereira de Souto

The use of clustering methods for the discovery of cancer subtypes has drawn a great deal of attention in the scientific community. While bioinformaticians have proposed new clustering methods that take advantage of characteristics of the gene expression data, the medical community has a preference for using "classic" clustering methods. There have been no …
We present a novel framework that applies a meta-learning approach to clustering algorithms. Given a dataset, our meta-learning approach provides a ranking for the candidate algorithms that could be used with that dataset. This ranking could, among other things, support non-expert users in the algorithm selection task. In order to evaluate the framework …
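As a rough illustration of ranking candidate algorithms from dataset characteristics, the sketch below averages the known ranks of the k most similar datasets. The abstract is truncated and does not say which meta-learner the framework uses, so the nearest-neighbour rule, the function name `rank_algorithms`, the meta-feature values, and the algorithm names are all assumptions made for illustration.

```python
from math import dist

# Hypothetical meta-database: each entry holds a dataset's meta-features
# and the observed rank of each candidate algorithm (1 = best).
META_DB = [
    {"features": (0.1, 0.9), "ranks": {"kmeans": 1, "hierarchical": 2, "spectral": 3}},
    {"features": (0.2, 0.8), "ranks": {"kmeans": 2, "hierarchical": 1, "spectral": 3}},
    {"features": (0.9, 0.1), "ranks": {"kmeans": 3, "hierarchical": 2, "spectral": 1}},
]

def rank_algorithms(meta_db, query_features, k=2):
    """Return algorithm names ordered by mean rank over the k nearest datasets."""
    # Sort known datasets by Euclidean distance in meta-feature space.
    neighbours = sorted(
        meta_db, key=lambda entry: dist(entry["features"], query_features)
    )[:k]
    algorithms = list(neighbours[0]["ranks"])
    mean_rank = {
        name: sum(n["ranks"][name] for n in neighbours) / len(neighbours)
        for name in algorithms
    }
    # Lower mean rank means the algorithm is recommended earlier.
    return sorted(algorithms, key=lambda name: mean_rank[name])

if __name__ == "__main__":
    print(rank_algorithms(META_DB, (0.85, 0.2), k=1))
```

With `k=1` the recommendation simply copies the ranking of the single closest dataset; larger `k` smooths the ranking across several similar datasets.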
In this paper, we present an algorithm for cluster analysis that provides a robust way to deal with datasets presenting different types of clusters and allows finding more than one structure in a dataset. Our approach is based on ideas from cluster ensembles and multi-objective clustering. We apply a Pareto-based multi-objective genetic algorithm with a …
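The Pareto-based selection mentioned above rests on the notion of dominance. The sketch below shows only that concept, not the genetic algorithm itself: it filters a set of candidate partitions, each scored on two objectives to be minimized, down to the non-dominated ones. The objective values and the function names are hypothetical, since the truncated abstract does not name the objectives.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one.

    Both arguments are tuples of objective values to be minimized.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better

def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical objective scores for four candidate partitions.
CANDIDATES = [(1, 5), (2, 2), (3, 3), (5, 1)]

if __name__ == "__main__":
    print(pareto_front(CANDIDATES))
```

Here `(3, 3)` is dropped because `(2, 2)` is better on both objectives, while the remaining three points represent different trade-offs and are all kept.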
Different algorithms have been proposed in the literature to cluster gene expression data; however, no single algorithm can be considered the best independently of the data. In this work, we applied the concepts of Meta-Learning to relate features of gene expression data sets to the performance of clustering algorithms. In our context, each …
For highly imbalanced data sets, almost all the instances are labeled as one class, whereas far fewer examples are labeled as the other classes. In this paper, we present an empirical comparison of seven different clustering evaluation indices when used to assess partitions generated from highly imbalanced data sets. Some of the metrics are based on …
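To see why the choice of index matters on imbalanced data, consider the small sketch below. It uses purity purely as an example of an external index; the truncated abstract does not list the seven indices compared, so purity is an assumption and may not be among them. The class proportions are likewise made up.

```python
from collections import Counter

def purity(labels_true, labels_pred):
    """Fraction of instances that belong to the majority class of their cluster."""
    clusters = {}
    for true_label, cluster_id in zip(labels_true, labels_pred):
        clusters.setdefault(cluster_id, []).append(true_label)
    # For each cluster, count the members of its most frequent class.
    majority_total = sum(
        Counter(members).most_common(1)[0][1] for members in clusters.values()
    )
    return majority_total / len(labels_true)

# Hypothetical imbalanced data set: 95 instances of class "a", 5 of class "b".
TRUE_LABELS = ["a"] * 95 + ["b"] * 5
# A trivial partition that puts every instance in one cluster.
ONE_CLUSTER = [0] * 100

if __name__ == "__main__":
    print(purity(TRUE_LABELS, ONE_CLUSTER))
```

The trivial one-cluster partition ignores the minority class entirely, yet its purity equals the majority-class proportion, which is the kind of behaviour an empirical comparison of indices would need to expose.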
Ensembles of classifiers are an effective way of improving the performance of individual classifiers. However, choosing the ensemble members can be a very difficult task, and in some cases a poor choice leads to ensembles with no performance improvement. In order to avoid this situation, there is a need to find effective classifier member selection …
Diversity is considered one of the main prerequisites for the efficient use of ensemble systems. One way of increasing diversity is through the use of feature selection methods in ensemble systems. In this paper, a class-based feature selection method for ensemble systems is proposed. The proposed method is inserted into the filter approach of feature …
In this work, we present the use of Ranking Meta-Learning approaches for ranking and selecting algorithms for problems of time series forecasting and clustering of gene expression data. Given a problem (forecasting or clustering), the Meta-Learning approach provides a ranking of the candidate algorithms, according to the characteristics of the problem’s …
Multi-classifier systems, also known as ensembles or committees, have been widely used to solve several classification problems, because they usually provide better performance than the individual classifiers. However, in order to build robust ensembles, the individual classifiers must be both accurate and diverse among themselves – this is …
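The interplay between accuracy and diversity can be sketched with a toy majority-vote ensemble. The three prediction vectors below are invented for illustration, and majority voting and pairwise disagreement are assumed here as a common combination rule and diversity measure; the truncated abstract does not state which ones the paper uses.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier label lists into one label per instance."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*predictions)]

def disagreement(pred_a, pred_b):
    """Fraction of instances on which two classifiers predict different labels."""
    return sum(a != b for a, b in zip(pred_a, pred_b)) / len(pred_a)

def accuracy(truth, predicted):
    """Fraction of instances predicted correctly."""
    return sum(t == p for t, p in zip(truth, predicted)) / len(truth)

# Hypothetical ground truth and three classifiers that each err once,
# on different instances.
TRUTH = [1, 0, 1, 0]
C1 = [0, 0, 1, 0]
C2 = [1, 1, 1, 0]
C3 = [1, 0, 0, 0]

if __name__ == "__main__":
    combined = majority_vote([C1, C2, C3])
    print(accuracy(TRUTH, C1), accuracy(TRUTH, combined), disagreement(C1, C2))
```

Each classifier alone is correct on three of four instances, but because their errors fall on different instances the vote recovers every label. If all three made the same error, the ensemble could not improve on its members.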
This paper investigates the performance of some multi-classifier systems, focusing on the benefits that can be gained when integrating different types of classifiers (hybrid multi-classifier systems). An empirical evaluation shows that the integration of different types of classifiers can lead to an improvement in performance in some practical …