Learn More
In this paper we show an instance-based reasoning e-mail filtering model that outperforms classical machine learning techniques and other successful lazy learners approaches in the domain of anti-spam filtering. The architecture of the learning-based anti-spam filter is based on a tuneable enhanced instance retrieval network able to accurately generalize(More)
A great amount of machine learning techniques have been applied to problems where data is collected over an extended period of time. However, the disadvantage with many real-world applications is that the distribution underlying the data is likely to change over time. In these situations, a problem that many global eager learners face is their inability to(More)
In this paper we analyse the strengths and weaknesses of the mainly used feature selection methods in text categorization when they are applied to the spam problem domain. Several experiments with different feature selection methods and content-based filtering techniques are carried out and discussed. Information Gain, χ 2-text, Mutual Information and(More)
In this paper we propose a novel feature selection method able to handle concept drift problems in spam filtering domain. The proposed technique is applied to a previous successful instance-based reasoning e-mail filtering system called SpamHunting. Our achieved information criterion is based on several ideas extracted from the well-known information(More)
Many real applications have the imbalanced class distribution problem, where one of the classes is represented by a very small number of cases compared to the other classes. One of the systems affected are those related to the recovery and classification of scientific documentation. Sampling strategies such as Oversampling and Subsampling are popular in(More)
Information Retrieval focuses on finding documents whose content matches with a user query from a large document collection. As formulating well-designed queries is difficult for most users, it is necessary to use query expansion to retrieve relevant information. Query expansion techniques are widely applied for improving the efficiency of the textual(More)
Resumen Este trabajo presenta un nuevo método para la compresión de textos, que permite la búsqueda directa de palabras y frases dentro del texto sin necesidad de descomprimirlo. Este método es directamente comparable, en tasa de compresión, con las técnicas basadas en Huffman orientadas a palabras y proporciona una compresión más simple y rápida,(More)