Yasuhide Kawada

Learn More
This paper studies how to reduce the amount of human supervision for identifying splogs / authentic blogs in the context of continuously updating splog data sets year by year. Following the previous works on active learning, against the task of splog / authentic blog detection, this paper empirically examines several strategies for selective sampling in(More)
This paper focuses on analyzing (Japanese) splogs based on various characteristics of keywords contained in them. We estimate the behavior of spammers when creating splogs from other sources by analyzing the characteristics of keywords contained in splogs. Since splogs often cause noises in word occurrence statistics in the blogosphere, we assume that we(More)
During the last few decades, the requirments of the international market imposed by economic forces have led to the necessity to develop effective and efficent electronic natural language processing tools. Many Machine Translation (MT) systems are being developed world wide, especially in Japan and Europe to address this chalanges in the 21 century. The(More)
Spam blogs or splogs are blogs hosting spam posts, created using machine generated or hijacked content for the sole purpose of hosting advertisements or increasing the number of inlinks of target sites. Among those splogs, this paper focuses on detecting a group of splogs which are estimated to be created by an identical spammer. We especially show that(More)
Among other domains and topics on which some issues are frequently argued in the blogosphere, the domain of crime is one of the most seriously discussed by various kinds of bloggers. Such information on crimes in blogs is especially valuable for outsiders from abroad who are not familiar with cultures and crimes in foreign countries. This paper proposes a(More)
This paper focuses on analyzing (Japanese) splogs based on various characteristics of keywords contained in them. We estimate the behavior of spammers when creating splogs from other sources by analyzing the characteristics of keywords contained in splogs. Since splogs often cause noises in word occurrence statistics in the blogosphere, we assume that we(More)
This paper presents techniques of retrieving useful information from a mixture of Web pages collected from either question-answer sites (Q&A sites) or Web search engines. The proposed techniques are designed to discover the maximum possible amount of know-how knowledge from such collections of Web pages, where know-how knowledge is defined as text(More)
This paper addresses the problem of identifying irrelevant items from a small set of similar documents using Web search engine suggests. Specifically, we collected volumes of Web pages through Web search engines and inspected the page contents using topic models. Among each cluster of pages sharing the same topic indicated by the topic model, our technique(More)