We are investigating automatic generation of a review (or survey) article in a specific subject domain. In a research paper, there are passages where the author describes the essence of a cited paper and the differences between the current paper and the cited paper (we call them citing areas). These passages can be considered as a kind of summary of the…
This paper introduces the Patent Mining Task of the Seventh NTCIR Workshop and the test collections produced in this task. The task's goal was the classification of research papers written in either Japanese or English in terms of the International Patent Classification (IPC) system, which is a global standard. For this task, 12 participant groups submitted…
Collecting all the papers in a research field is a first step towards an exhaustive survey. A number of research paper databases are available for searching papers. However, searchers are compelled to repeat the same search operation for each database if there are multiple databases for a research field. To improve such inefficient searching, we have…
We propose a method for detecting survey articles in a multilingual database. Generally, a survey article cites many important papers in a research domain. Using this feature, it is possible to detect survey articles. We applied HITS, which was devised to retrieve Web pages using the notions of authority and hub. We can consider that important papers and…
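As a rough illustration of the HITS idea mentioned in the abstract above, the following minimal sketch runs the standard hub/authority iteration on a toy citation graph. It is not the authors' implementation: the `hits` function, the `toy` graph, and the reading that a paper citing many highly cited papers receives a high hub score are assumptions made only for this example.

```python
from math import sqrt


def hits(citations, iterations=50):
    """Run basic HITS on a citation graph.

    citations: dict mapping a paper id to the list of paper ids it cites
    (hypothetical input format chosen for this sketch).
    Returns (hub, authority) score dicts, each L2-normalized.
    """
    nodes = set(citations)
    for cited in citations.values():
        nodes.update(cited)

    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}

    for _ in range(iterations):
        # Authority score: sum of hub scores of the papers that cite it.
        auth = {n: 0.0 for n in nodes}
        for src, cited in citations.items():
            for dst in cited:
                auth[dst] += hub[src]
        norm = sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}

        # Hub score: sum of authority scores of the papers it cites.
        hub = {n: sum(auth[d] for d in citations.get(n, ())) for n in nodes}
        norm = sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}

    return hub, auth


# Toy graph (invented data): "survey" cites three papers, p4 and p5 cite one each.
toy = {
    "survey": ["p1", "p2", "p3"],
    "p4": ["p1"],
    "p5": ["p2"],
}
hub, auth = hits(toy)
top_hub = max(hub, key=hub.get)
```

In this toy graph the paper with the highest hub score is the one that cites the most highly cited papers, which is the property the abstract relies on for detecting survey articles.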
We report the outline of Text Summarization Challenge 2 (TSC2 hereafter), a sequel text summarization evaluation conducted as one of the tasks at the NTCIR Workshop 3. First, we briefly describe the previous evaluation, Text Summarization Challenge (TSC1), as an introduction to TSC2. Then we explain TSC2 including the participants, the two tasks in TSC2, data…
In this paper, we introduce a large-scale test collection for multiple document summarization, the Text Summarization Challenge 3 (TSC3) corpus. We detail the corpus construction and evaluation measures. The significant feature of the corpus is that it annotates not only the important sentences in a document set, but also those among them that have the same…
Travelers who plan to visit a particular tourist spot need information about it. In this paper, we propose a method for extracting and organizing appropriate information from weblogs (blogs). Recently, an increasing number of travelers have been writing about their travel experiences in blogs. We call these travel blog entries, and they contain much…
In this paper, we propose a method for compiling travel information automatically. For the compilation, we focus on travel blogs, which are defined as travel journals written by bloggers in diary form. We consider that travel blogs are a useful information source for obtaining travel information, because many bloggers' travel experiences are written in…
In this paper, we describe a method for automatic acquisition of script knowledge from a Japanese text collection. Script knowledge represents a typical sequence of actions that occur in a particular situation. We extracted sequences (pairs) of actions occurring in time order from a Japanese text collection and then chose those that were typical of certain…