Yasuhide Kawada

This paper studies how to reduce the amount of human supervision needed to identify splogs and authentic blogs in the context of splog data sets that are continuously updated year by year. Following previous work on active learning, this paper empirically examines several strategies for selective sampling for the task of splog / authentic blog detection in…
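A minimal sketch of pool-based selective sampling with uncertainty sampling, one common active-learning strategy; the classifier, features, and labels below are illustrative placeholders, not the data or the specific strategies examined in the paper.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_pool = rng.random((500, 20))                            # placeholder blog features
y_pool = (X_pool[:, 0] + X_pool[:, 1] > 1.0).astype(int)  # placeholder splog labels

# seed the labeled set with a few examples of each class
seed = np.concatenate([np.where(y_pool == 0)[0][:5], np.where(y_pool == 1)[0][:5]])
labeled = [int(i) for i in seed]
unlabeled = [i for i in range(len(X_pool)) if i not in labeled]

clf = LogisticRegression(max_iter=1000)
for _ in range(20):                                       # 20 rounds of sampling
    clf.fit(X_pool[labeled], y_pool[labeled])
    proba = clf.predict_proba(X_pool[unlabeled])[:, 1]
    # uncertainty sampling: query the instance the current model is least sure about
    pick = unlabeled[int(np.argmin(np.abs(proba - 0.5)))]
    labeled.append(pick)                                  # a human would label this one
    unlabeled.remove(pick)

print(f"labeled examples after sampling: {len(labeled)}")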
This paper focuses on analyzing (Japanese) splogs based on various characteristics of the keywords contained in them. We estimate the behavior of spammers when they create splogs from other sources by analyzing the characteristics of keywords contained in splogs. Since splogs often introduce noise into word occurrence statistics in the blogosphere, we assume that we…
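An illustrative sketch of the kind of keyword-level comparison this line of work relies on: contrasting a keyword's relative frequency in splogs against authentic blogs to surface splog-biased keywords. The toy corpora and the whitespace tokenizer are assumptions, not the paper's data or method.

from collections import Counter

splog_docs = ["cheap watch replica watch cheap", "affiliate link affiliate offer"]
authentic_docs = ["today I cooked curry for dinner", "my trip to Kyoto was great"]

def relative_freqs(docs):
    counts = Counter(w for d in docs for w in d.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

f_splog = relative_freqs(splog_docs)
f_auth = relative_freqs(authentic_docs)

# keywords whose relative frequency in splogs far exceeds that in authentic blogs
bias = {w: f_splog[w] / f_auth.get(w, 1e-6) for w in f_splog}
for w, ratio in sorted(bias.items(), key=lambda kv: -kv[1])[:5]:
    print(w, round(ratio, 1))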
During the last few decades, the requirements of the international market imposed by economic forces have led to the necessity of developing effective and efficient electronic natural language processing tools. Many Machine Translation (MT) systems are being developed worldwide, especially in Japan and Europe, to address these challenges in the 21st century. The…
This paper presents techniques for retrieving useful information from a mixture of Web pages collected from either question-answer sites (Q&A sites) or Web search engines. The proposed techniques are designed to discover the maximum possible amount of know-how knowledge from such collections of Web pages, where know-how knowledge is defined as text…
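An illustrative sketch, not the paper's method, of filtering page text for know-how sentences, here approximated by simple procedural cues such as step markers and "how to" phrasing; the cue patterns and the sample text are assumptions.

import re

# rough procedural cues used only for illustration
PROCEDURAL_CUES = re.compile(
    r"^(first|next|then|finally)\b|^\d+\.\s|(how to|you should|make sure to)",
    re.IGNORECASE,
)

def know_how_sentences(page_text):
    sentences = re.split(r"(?<=[.!?])\s+", page_text)
    return [s for s in sentences if PROCEDURAL_CUES.search(s)]

page = ("First, unplug the router. Wait 30 seconds. "
        "The weather was nice yesterday. Then plug it back in.")
print(know_how_sentences(page))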
This paper addresses the problem of identifying irrelevant items from a small set of similar documents using Web search engine suggests. Specifically, we collected volumes of Web pages through Web search engines and inspected the page contents using topic models. Among each cluster of pages sharing the same topic indicated by the topic model, our technique…
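A minimal sketch of one way to realize the step described above, assuming the idea is to fit a topic model over the collected pages, group pages by their dominant topic, and flag pages that fit their own cluster's topic only weakly as candidate irrelevant items. The documents, topic count, and threshold are illustrative, not the paper's settings.

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "camera lens aperture shutter photo",
    "lens camera tripod photo light",
    "stock market shares trading price",
    "market price stock invest trading",
    "recipe pasta tomato garlic dinner",   # likely irrelevant to the others
]
X = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
theta = lda.transform(X)                   # per-document topic distribution

dominant = theta.argmax(axis=1)
for topic in np.unique(dominant):
    for i in np.where(dominant == topic)[0]:
        if theta[i, topic] < 0.6:          # weak fit to its own cluster
            print(f"doc {i} is a candidate irrelevant item in topic {topic}")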
This paper proposes to utilize a search engine as a social sensor for predicting market shares. More specifically, this paper studies the task of comparing, across several companies that supply products in a given product domain, the rates of concern of those who search for Web pages. In this paper, we measure the concerns of those who…
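A minimal sketch of the social-sensor idea under the assumption that per-company searcher interest (for example, query or page-hit counts) is available; the counts and the "actual" market shares below are hard-coded placeholders, not real data.

searcher_interest = {            # placeholder counts of searcher interest per company
    "Company A": 12000,
    "Company B": 8000,
    "Company C": 4000,
}
actual_share = {"Company A": 0.55, "Company B": 0.30, "Company C": 0.15}

total = sum(searcher_interest.values())
predicted_share = {c: n / total for c, n in searcher_interest.items()}

for c in searcher_interest:
    print(f"{c}: predicted {predicted_share[c]:.2f} vs actual {actual_share[c]:.2f}")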
In this paper, we address the issue of how to overview the knowledge of a given query keyword. We especially focus on the concerns of those who search for Web pages with a given query keyword, and study how to efficiently overview the whole list of Web search information needs for that keyword. First, we collect the Web search information needs of a given…
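A minimal sketch, assuming that Web search information needs are approximated by search-engine suggest queries for the keyword and that an overview is produced by clustering them; the keyword, the suggest list, and the cluster count are illustrative only.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

suggests = [                      # placeholder suggest queries for the keyword "camera"
    "camera price comparison", "camera cheap price",
    "camera how to clean lens", "camera lens cleaning kit",
    "camera repair shop near me", "camera repair cost",
]
X = TfidfVectorizer().fit_transform(suggests)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# each cluster is one coarse "information need" behind the keyword
for k in range(3):
    print(f"cluster {k}:", [s for s, l in zip(suggests, labels) if l == k])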