Spam filtering is a text categorization task that has attracted significant attention due to the ever-growing volume of junk email on the Internet. While current best-practice systems use Naive Bayes filtering and other probabilistic methods, we propose using a statistical, but non-probabilistic, classifier based on the Winnow algorithm. The feature(More)
This paper discusses the design decisions underlying the CRM114 Discriminator software, how it can be configured as a spam filter, and what we may glean from the preliminary TREC 2005 results. Unlike most other filters, CRM114 is not a fixed-purpose antispam filter; rather, it's a general purpose language meant to expedite the creation of text filters. The(More)
A large number of spam filtering and other mail classification systems have been proposed and implemented in the recent past. This paper describes a possible unification of these filters, allowing their technology to be described in a uniform way, and allowing comparison between similar systems to be considered in a more analytic style. In particular,(More)
OSBF-Lua is a C module for the Lua language which implements a Bayesian classifier enhanced with Orthogonal Sparse Bigrams (OSB) for feature extraction and Exponential Differential Document Count (EDDC) for feature selection. These two techniques, combined with the new training method introduced for TREC 2006, produce a highly accurate filter, yet very(More)
We consider the well-known K-nearest-neighbor (KNN) classifier as a spam filter. We compare KNN-based classification in both equal-vote and decreasing-rank forms to a well-tested Markov Random Field (MRF) spam classifier. As KNN classification is known to be asymptotically bounded to be not worse than twice the performance of the best possible probabilistic(More)
INTRODUCTION: A high prevalence of pre-pregnancy excess weight, as well as evidence of an increased risk of maternal and perinatal complications associated with nutritional status, has been observed lately. Considering the possible ethnic and environmental influences, few studies have assessed this risk in pregnant Brazilian women. OBJECTIVES: This study aimed at(More)