Database Content Exploration and Exploratory Analysis of User Queries

Abstract

Content providers, such as enterprises and organizations that publish their content on the Internet, aim to make their content visible and easily accessible to users. The vast amount of data contained in databases impedes their efforts, as users often find it challenging to navigate the available data and find the items that best suit their needs. It is therefore necessary for content providers to motivate users to explore the available data and to assist them in finding items that interest them. State-of-the-art approaches such as top-k queries are not appropriate for data exploration, as they require users to be aware of the database structure and the content they are exploring. In this thesis, we study the problem of enhancing the visibility of database content through exploratory search and analysis. We propose exploratory algorithms that return a small number of results to the user while still providing a wide overview of the available content. In addition, we present algorithms that identify items that are appealing to users and can be exploited to offer users insight into the available items and to motivate them to explore the database. In particular, the main contributions of the thesis are:

• We develop a framework for organizing and summarizing keyword search results based on their textual content and temporal data.
• We introduce a new type of query, the eXploratory Top-k Join (XTJk) query, which creates object combinations that are better suited to user preferences than single objects, and we present algorithms for the efficient processing of XTJk queries.
• We introduce the continuous influential query, which returns objects that are continuously attractive to a large number of users for long periods, and we present algorithms for the efficient retrieval of continuous influential objects.
• We model the diversity of database objects based on user preferences, and we propose efficient algorithms for selecting products that are attractive to a wide range of users with diverse preferences.
• We describe the Best-terms problem, that is, the problem of increasing the rank of a spatio-textual object by enhancing its textual description. We show that the problem is NP-hard, and we present approximate algorithms that retrieve high-quality results.

The proposed approaches have been assessed through extensive experiments on both synthetic and real datasets, and the results demonstrate the efficiency of the proposed methods.

Cite this paper

@inproceedings{Gkorgkas2015DatabaseCE,
  title={Database Content Exploration and Exploratory Analysis of User Queries},
  author={Orestis Gkorgkas},
  year={2015}
}