This paper presents an overview of the RITE (Recognizing Inference in TExt) task in NTCIR-9. We evaluate systems that automatically recognize entailment, paraphrase, and contradiction between two texts written in Japanese, Simplified Chinese, or Traditional Chinese. The task consists of four subtasks: Binary classification of entailment (BC); Multi-class...
This paper presents an overview of the RITE-2 (Recognizing Inference in TExt) task in NTCIR-10. We evaluated systems that automatically recognize semantic relations between sentences, such as paraphrase, entailment, and contradiction, in Japanese, Simplified Chinese, and Traditional Chinese. The tasks in RITE-2 are Binary Classification of entailment (BC Subtask),...
In this paper, we first describe the concept of data overlay, a mechanism for implementing arbitrary data structures on top of any structured P2P DHT. With this abstraction, we developed a highly scalable, efficient, and robust infrastructure, called SOMO, to perform resource management for P2P DHTs. It does so by gathering and disseminating system...
The main approaches to corpus-based semantic class mining are distributional similarity (DS) and pattern-based (PB) methods. In this paper, we perform an empirical comparison of them, based on a publicly available dataset containing 500 million web pages, using various categories of queries. We further propose a frequency-based rule to select appropriate approaches...
This paper discusses large-scale keyword searching on top of peer-to-peer (P2P) networks. The state-of-the-art keyword searching techniques for unstructured and structured P2P systems are query flooding and inverted list intersection, respectively. However, it has been demonstrated that P2P-based large-scale full-text searching is not feasible by using...
Current web search engines return result pages containing mostly text summaries, even though the matched web pages may contain informative pictures. A text excerpt (i.e., a snippet) is generated for each returned page by selecting keywords around the matched query terms, providing context for the user's relevance judgment. However, in many scenarios, we found that...
Discovering significant types of relations from the web is challenging because of its open nature. Unsupervised algorithms have been developed to extract relations from a corpus without knowing the relations in advance, but most of them rely on tagging arguments of predefined types. Recently, a new algorithm was proposed to jointly extract relations and their...
A semantic class is a collection of items (words or phrases) that stand in a semantic peer or sibling relationship. This paper studies the use of topic models to automatically construct semantic classes, taking as the source data a collection of raw semantic classes (RASCs), which were extracted by applying predefined patterns to web pages. The...
Modern web search engines, while indexing billions of web pages, are expected to process queries and return results in a very short time. Many approaches have been proposed for efficiently computing top-k query results, but most of them ignore one key factor in the ranking functions of commercial search engines: term proximity, which is the metric of the...