Yuanchen He

Unwanted and malicious messages dominate email traffic and pose a great threat to the utility of email communications. Reputation systems have been gaining momentum as a solution. Such systems extract email senders' behavior data based on global sending distribution, analyze it, and assign a trust value to each IP address sending email messages. We...
Due to the complexity of biomedical classification problems, it is impossible to build a perfect classifier with 100% prediction accuracy. Hence a more realistic target is to build an effective Decision Support System (DSS). Here "effective" means that a DSS should not only predict unseen samples accurately, but also work in a human-understandable way. In this...
Millions of new domains are registered every day, and many of them are malicious. Because of the large number of domains, it is challenging to keep track of malicious ones by Web content analysis alone. One interesting pattern in legitimate domain names is that many of them consist of English words or look like meaningful English, while many malicious...
Spam sender detection based on email subject data is a complex large-scale text mining task. The dataset consists of email subject lines and the corresponding IP address of the email sender. A fast and accurate classifier is desirable in such an application. In this research, a highly scalable SVM modeling method, named Granular SVM with Random granulation...
To discriminate spam Web hosts/pages from normal ones, text-based and link-based data are provided for Web Spam Challenge Track II. Given a small fraction of labeled nodes (about 10%) in a Web linkage graph, the challenge is to predict whether each of the other nodes is spam or normal. We extract features from link-based data, and then combine them with text-based...