Due to the complexity of biomedical problems, it is difficult or even impossible to build a perfect model with 100% prediction accuracy. Hence a more realistic target is to build an effective Decision Support System (DSS). Here "effective" means that a DSS should not only predict unseen samples accurately, but also work in a human-understandable way. In this...
Due to the complexity of biomedical classification problems, it is impossible to build a perfect classifier with 100% prediction accuracy. Hence a more realistic target is to build an effective Decision Support System (DSS). Here "effective" means that a DSS should not only predict unseen samples accurately, but also work in a human-understandable way. In this...
Unwanted and malicious messages dominate email traffic and pose a great threat to the utility of email communications. Reputation systems have been gaining momentum as a solution. Such systems extract email senders' behavior data based on global sending distribution, analyze it, and assign a trust value to each IP address sending email messages. We...
Millions of new domains are registered every day, and many of them are malicious. It is challenging to keep track of malicious domains by Web content analysis alone because of the large number of domains. One interesting pattern in legitimate domain names is that many of them consist of English words or look like meaningful English, while many malicious domain...
Selecting informative and discriminative genes from huge microarray gene expression data is an important and challenging bioinformatics research topic. This paper proposes a fuzzy-granular method for the gene selection task. First, genes are grouped into different function granules with the fuzzy C-means algorithm (FCM). Then informative genes in each...
When hundreds of thousands of applications need to be analyzed within a short period of time, existing static and dynamic malware detection methods may become less desirable because they could quickly exhaust system and human resources. Additionally, many behavioral malware detection methods may not be practical because they require the collection of...
A hybrid Computational Intelligence-based Knowledge Discovery system is presented in this paper. The system works in three phases. In phase 1, many feature selection algorithms are utilized to select informative cancer-related genes from microarray expression data. Compared with other algorithms, our GSVM-RFE algorithm demonstrates superior performance on...
Spam sender detection based on email subject data is a complex large-scale text mining task. The dataset consists of email subject lines and the corresponding IP address of the email sender. A fast and accurate classifier is desirable in such an application. In this research, a highly scalable SVM modeling method, named Granular SVM with Random granulation...
To discriminate spam Web hosts/pages from normal ones, text-based and link-based data are provided for Web Spam Challenge Track II. Given a small fraction of labeled nodes (about 10%) in a Web linkage graph, the challenge is to predict whether each of the other nodes is spam or normal. We extract features from link-based data, and then combine them with text-based...