José M. Peña

Data mining plays a key role in most enterprises, which must analyse large amounts of data in order to achieve higher profits. However, because of the large datasets involved in this process, the data mining field faces several technological challenges. Grid computing takes advantage of the low-load periods of all the computers connected to a …
Evolutionary techniques are among the most successful paradigms in the field of optimization. In this paper we present a new hybrid algorithm, named GA-EDA, based on genetic algorithms and estimation of distribution algorithms. The original objective is to benefit from both approaches. In order to evaluate this new …
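The excerpt does not say how GA-EDA combines its two components, so the following is only a minimal sketch of the general idea under stated assumptions: a binary toy problem (OneMax), a UMDA-style univariate model standing in for the EDA, and a fixed split (`ga_share`, a hypothetical parameter) of each generation's offspring between GA operators and model sampling. All function names are illustrative, not the paper's.

```python
import random


def onemax(bits):
    # Toy fitness function (assumption): the number of 1s in the string.
    return sum(bits)


def ga_offspring(parents, n, p_mut, rng):
    # GA component: uniform crossover of two random parents, then bit-flip mutation.
    length = len(parents[0])
    children = []
    for _ in range(n):
        a, b = rng.sample(parents, 2)
        child = [a[i] if rng.random() < 0.5 else b[i] for i in range(length)]
        child = [1 - g if rng.random() < p_mut else g for g in child]
        children.append(child)
    return children


def eda_offspring(parents, n, rng):
    # EDA component (UMDA-style): estimate P(x_i = 1) from the selected
    # parents and sample each bit independently from those marginals.
    length = len(parents[0])
    probs = [sum(p[i] for p in parents) / len(parents) for i in range(length)]
    return [[1 if rng.random() < probs[i] else 0 for i in range(length)]
            for _ in range(n)]


def hybrid_ga_eda(length=20, pop_size=50, generations=40, ga_share=0.5, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=onemax, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection
        n_ga = int(round(ga_share * pop_size))  # hypothetical fixed GA/EDA split
        children = ga_offspring(parents, n_ga, 1.0 / length, rng)
        children += eda_offspring(parents, pop_size - n_ga, rng)
        # Elitism: pool parents and children and keep the best pop_size.
        pop = sorted(parents + children, key=onemax, reverse=True)[:pop_size]
    return max(pop, key=onemax)
```

Because parents are pooled with children before truncation, the best fitness never decreases; setting `ga_share` to 0 or 1 reduces the sketch to a pure EDA or a pure GA, which is a convenient baseline for comparison.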
This paper studies filter and hybrid filter-wrapper feature subset selection for unsupervised learning (data clustering). We constrain the search for the best feature subset by scoring the dependence of every feature on the rest of the features, conjecturing that these scores single out some irrelevant features. We report experimental results on …
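One way to score "the dependence of every feature on the rest of the features" for discrete data is the average mutual information between a feature and each other feature. The sketch below assumes that measure (the abstract does not name one) and the reading that features with low scores are candidates for removal; `mutual_information` and `dependence_scores` are hypothetical helpers.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    # Empirical mutual information (in nats) between two discrete sequences:
    # sum over observed pairs of p(x,y) * log(p(x,y) / (p(x) p(y))).
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum((c / n) * log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def dependence_scores(columns):
    # columns: one equal-length list of values per feature.
    # Score each feature by its average mutual information with all the others.
    scores = []
    for i, col in enumerate(columns):
        mis = [mutual_information(col, other)
               for j, other in enumerate(columns) if j != i]
        scores.append(sum(mis) / len(mis))
    return scores
```

A feature that is statistically independent of every other feature gets a score of zero, which under this assumption makes it the first candidate to drop before clustering.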
Personalized recommender systems can be classified into three main categories: content-based systems, mostly used to make suggestions based on the text of web documents; collaborative filtering systems, which use ratings from many users to suggest a document or an action to a given user; and hybrid solutions. In the collaborative filtering task we can find …
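A minimal user-based collaborative filtering sketch: predict a user's rating for an item as a similarity-weighted average of other users' ratings for that item. Cosine similarity over co-rated items, the dictionary layout, and the function names are assumptions for illustration, not the method of the truncated paper.

```python
from math import sqrt


def cosine_similarity(a, b):
    # a, b: dicts mapping item -> rating. Compare only the co-rated items.
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    na = sqrt(sum(a[i] ** 2 for i in common))
    nb = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (na * nb) if na and nb else 0.0


def predict_rating(ratings, user, item):
    # ratings: dict mapping user -> {item: rating}.
    # Weighted average of positively similar users' ratings for `item`;
    # returns None when no such user has rated it.
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        sim = cosine_similarity(ratings[user], theirs)
        if sim <= 0:
            continue
        num += sim * theirs[item]
        den += sim
    return num / den if den else None
```

Mean-centred (Pearson) similarity is a common alternative when users rate on different scales.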
Successful secondary structure predictions provide a starting point for direct tertiary structure modelling, and can also significantly improve sequence analysis and sequence-structure threading, aiding structure and function determination. Hence, improving the predictive accuracy of secondary structure prediction is essential for future …
The emergence of applications with greater processing and speedup requirements, such as Grand Challenge Applications (GCA), brings new computing and I/O needs. Many of these applications require access to huge data repositories and other I/O sources, which makes the I/O phase a bottleneck in computing systems because of its poor performance. In this sense, …
Electronic, web-based commerce enables and demands the application of intelligent methods to analyze information collected from consumer web sessions. We propose a method for increasing the granularity of user session analysis by isolating useful subsessions within web page access sessions, where each subsession represents a frequently traversed path …
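The notion of a "frequently traversed path" can be illustrated by counting contiguous page subsequences and keeping those that occur in at least `min_support` sessions. This is a hypothetical sketch: the length bounds, support threshold, and function name are assumptions, and the abstract does not describe how the proposed method actually isolates subsessions.

```python
from collections import Counter


def frequent_subpaths(sessions, min_len=2, max_len=4, min_support=2):
    # sessions: list of page-access sequences, e.g. [["home", "catalog"], ...].
    # Support of a subpath = number of sessions containing it contiguously.
    support = Counter()
    for session in sessions:
        seen = set()
        for length in range(min_len, max_len + 1):
            for start in range(len(session) - length + 1):
                seen.add(tuple(session[start:start + length]))
        support.update(seen)  # count each subpath at most once per session
    return {path: n for path, n in support.items() if n >= min_support}
```

Counting each subpath once per session measures how many visitors followed it, rather than letting one session that loops over the same pages inflate its frequency.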
The use of parallel file systems is a high-performance solution to the problem known as the I/O crisis in parallel or distributed environments. In recent years, clusters have become one of the cheapest and most flexible frameworks for deploying parallel and distributed applications. Both parallel file systems and clusters have been successfully …
Recent work in supervised learning has shown that a surprisingly simple Bayesian classifier that assumes conditional independence among features given the class, called naïve Bayes, is competitive with state-of-the-art classifiers. In this paper a new naïve Bayes classifier called Interval Estimation naïve Bayes is proposed. Interval Estimation …
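For reference, the standard naïve Bayes classifier described in the first sentence can be sketched for categorical features as follows: each class is scored by its log prior plus the sum of per-feature log likelihoods, with Laplace smoothing (`alpha`, an assumption). This is plain naïve Bayes, not the Interval Estimation variant, whose details are cut off in the excerpt; the function names are illustrative.

```python
from collections import Counter, defaultdict
from math import log


def train_naive_bayes(X, y):
    # X: list of feature tuples (categorical values), y: list of class labels.
    class_counts = Counter(y)
    # feature_counts[c][j][v] = how often feature j took value v in class c.
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)  # distinct values seen for each feature j
    for row, c in zip(X, y):
        for j, v in enumerate(row):
            feature_counts[c][j][v] += 1
            values[j].add(v)
    return class_counts, feature_counts, values


def predict(model, row, alpha=1.0):
    # argmax_c  log P(c) + sum_j log P(x_j | c), assuming the features are
    # conditionally independent given the class.
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for c, n_c in class_counts.items():
        score = log(n_c / total)
        for j, v in enumerate(row):
            count = feature_counts[c][j][v]
            score += log((count + alpha) / (n_c + alpha * len(values[j])))
        if score > best_score:
            best, best_score = c, score
    return best
```

Working in log space avoids underflow when many small probabilities are multiplied, and smoothing keeps an unseen feature value from zeroing out a class.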