Cohesiveness Relationships to Empower Keyword Search on Tree Data on the Web

Abstract

Keyword search has for several years been the most popular technique for retrieving information from semistructured data on the web. The reasons for this success are well known and twofold: (1) the user does not need to master a complex query language to specify her requests for data, and (2) she does not need any knowledge of the structure of the data sources. However, these advantages come with two drawbacks. (1) Because keyword queries are imprecise, there is usually a huge number of candidate results, of which only very few match the user’s intent. Unfortunately, the existing semantics are ad hoc and generally fail to “guess” the user’s intent. (2) As the number of keywords and the size of the data grow, the existing approaches do not scale satisfactorily. In this paper, we focus on keyword search on tree data and introduce keyword queries that can express cohesiveness relationships. Intuitively, a cohesiveness relationship on keywords indicates that the instances of these keywords in a query result should form a cohesive whole, in which instances of the other keywords do not interpolate. Cohesive keyword queries also allow keyword repetition and nesting of cohesiveness relationships. Most importantly, despite their increased expressiveness, they retain both advantages of plain keyword search. We provide formal semantics for cohesive keyword queries on tree data that ranks query results by the proximity of the keyword instances. We design a stack-based algorithm that builds a lattice of keyword partitions to compute keyword queries efficiently and further leverages cohesiveness relationships to significantly reduce the dimensionality of the lattice. We implemented our approach and ran extensive experiments to measure the effectiveness of keyword queries and the efficiency and scalability of our algorithm. Our results demonstrate that our approach outperforms previous filtering semantics and that our algorithm scales smoothly, achieving interactive response times on queries of 20 frequent keywords on large datasets.
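To make the intuition behind a cohesiveness relationship concrete, the following minimal Python sketch checks one plausible reading of it: the chosen instances of the grouped keywords are cohesive if no instance of an outside keyword falls inside the subtree rooted at their lowest common ancestor. This reading is an assumption made for illustration only; it is not the formal semantics of the paper, and it ignores ranking, nesting, and keyword repetition. The Dewey-code node encoding and the names `lca`, `in_subtree`, and `is_cohesive` are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Assumption: a tree node is identified by its Dewey code, i.e. the
# sequence of child positions on the path from the root; the root is ().
Dewey = Tuple[int, ...]


def lca(nodes: Iterable[Dewey]) -> Dewey:
    """Lowest common ancestor = longest common prefix of the Dewey codes."""
    nodes = list(nodes)
    prefix = nodes[0]
    for node in nodes[1:]:
        i = 0
        while i < min(len(prefix), len(node)) and prefix[i] == node[i]:
            i += 1
        prefix = prefix[:i]
    return prefix


def in_subtree(node: Dewey, root: Dewey) -> bool:
    """True if `node` is `root` itself or one of its descendants."""
    return node[:len(root)] == root


def is_cohesive(assignment: Dict[str, Dewey], group: Iterable[str]) -> bool:
    """Illustrative check (not the paper's definition): the instances of the
    keywords in `group` are cohesive if no instance of a keyword outside
    `group` lies in the subtree rooted at the group's LCA."""
    group = set(group)
    group_root = lca(assignment[k] for k in group)
    return not any(
        in_subtree(node, group_root)
        for keyword, node in assignment.items()
        if keyword not in group
    )


# Hypothetical candidate results for the keywords XML, John, Smith,
# where John and Smith are meant to form a cohesive whole.
same_author = {"John": (0, 0, 0), "Smith": (0, 0, 1), "XML": (0, 2)}
different_authors = {"John": (0, 0, 0), "Smith": (0, 1, 1), "XML": (0, 2)}

print(is_cohesive(same_author, ["John", "Smith"]))
print(is_cohesive(different_authors, ["John", "Smith"]))
```

In the first assignment, John and Smith meet under a single node that does not contain the XML instance, so the check passes. In the second, they meet only at a node whose subtree also contains the XML instance, so the candidate is rejected under this reading.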

Cite this paper

@article{Dimitriou2015CohesivenessRT, title={Cohesiveness Relationships to Empower Keyword Search on Tree Data on the Web}, author={Aggeliki Dimitriou and Ananya Dass and Dimitri Theodoratos}, journal={CoRR}, year={2015}, volume={abs/1508.04957} }