Tetsuya Nakatoh

Search results generated by searchable databases are served dynamically and are far larger in volume than the static documents on the Web. These result pages are referred to as the Deep Web. To integrate data across different searchable databases, the target data must be extracted from their result pages. We propose a test bed for information extraction from search results.
A Deep Web wrapper is a program that extracts contents from search results. We propose a new automatic wrapper generation algorithm that discovers a repetitive pattern in search results. The repetitive pattern is expressed as token sequences consisting of HTML tags, plain text, and wildcards. The algorithm applies string matching with mismatches to …
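The abstract above is truncated before the details of the matching step, so the following is not the author's algorithm. It is a minimal Python sketch, under our own assumptions, of the general technique it names: HTML is reduced to a token sequence of tags plus a generic `TEXT` token, a fixed-length candidate pattern is matched against that sequence while tolerating up to `k` mismatching tokens, and the positions where the matched occurrences disagree are turned into wildcards. The function names (`tokenize`, `mismatches`, `find_occurrences`, `discover_pattern`), the `*` wildcard token, and the fixed pattern length are hypothetical choices made for illustration.

```python
import re
from typing import List, Tuple

WILDCARD = "*"  # hypothetical wildcard token: matches any single token
TOKEN_RE = re.compile(r"<[^>]+>|[^<]+")           # a tag, or a run of text
TAG_NAME_RE = re.compile(r"<\s*(/?)\s*([A-Za-z0-9]+)")


def tokenize(html: str) -> List[str]:
    """Turn HTML into tag tokens ('<td>', '</td>') and a generic 'TEXT' token."""
    tokens: List[str] = []
    for piece in TOKEN_RE.findall(html):
        if piece.startswith("<"):
            m = TAG_NAME_RE.match(piece)
            if m:  # comments, doctypes, etc. do not match and are skipped
                tokens.append("<" + m.group(1) + m.group(2).lower() + ">")
        elif piece.strip():
            tokens.append("TEXT")  # every run of plain text becomes one token
    return tokens


def mismatches(pattern: List[str], window: List[str]) -> int:
    """Count positions where a non-wildcard pattern token differs from the window."""
    return sum(1 for p, t in zip(pattern, window) if p != WILDCARD and p != t)


def find_occurrences(tokens: List[str], pattern: List[str], k: int) -> List[int]:
    """Greedy left-to-right, non-overlapping matches with at most k mismatches."""
    hits, i, m = [], 0, len(pattern)
    while i + m <= len(tokens):
        if mismatches(pattern, tokens[i:i + m]) <= k:
            hits.append(i)
            i += m  # skip past the matched record
        else:
            i += 1
    return hits


def discover_pattern(tokens: List[str], length: int, k: int) -> Tuple[List[str], List[int]]:
    """Pick the length-`length` substring that repeats most often (with mismatches)
    and replace the positions where its occurrences disagree with wildcards."""
    best: List[str] = []
    best_hits: List[int] = []
    for start in range(len(tokens) - length + 1):
        candidate = tokens[start:start + length]
        hits = find_occurrences(tokens, candidate, k)
        if len(hits) > len(best_hits):  # ties keep the earliest candidate
            best, best_hits = candidate, hits
    pattern = list(best)
    for h in best_hits:
        for j, t in enumerate(tokens[h:h + length]):
            if pattern[j] != t:
                pattern[j] = WILDCARD
    return pattern, best_hits


# Toy result page: three records, the last one using <th> instead of <td>.
html = ("<table><tr><td>A</td><td>1</td></tr>"
        "<tr><td>B</td><td>2</td></tr>"
        "<tr><th>C</th><td>3</td></tr></table>")
pattern, hits = discover_pattern(tokenize(html), length=8, k=2)
print(pattern)  # ['<tr>', '*', 'TEXT', '*', '<td>', 'TEXT', '</td>', '</tr>']
print(hits)     # [1, 9, 17]
```

This sketch only tolerates substitutions; records of different lengths (inserted or deleted tags) would need an edit-distance style alignment, which plain mismatch counting does not cover.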
A literature survey of scientific articles depends on the relevance and quality of the retrieved list. Relevance can be controlled by an appropriate search query and by the relevance ranking of the search results. Citation count (CC) is widely used as a simple and useful measure of article quality. However, articles with a high citation count …
Individual opinions and experiences are published on the Web as CGM (consumer-generated media). A tourism blog in which a tourist describes his experiences and impressions of a certain area is very helpful to other tourists. However, a user cannot obtain such valuable information without knowing the relation between blog articles and concrete place names. We …
Blog articles by tourists contain interesting, personal accounts of where and how they traveled, what they did, and what they thought. In many cases, such individual experiences are more helpful than the general, official information about a tourist resort provided by tourist agencies. However, it is not easy to choose related articles and to extract …