HengHa: data harvesting detection on hidden databases


The back-end databases of web-based applications are a major data security concern for enterprises, and the problem becomes more critical as enterprise-hosted web applications proliferate in the cloud. Prior work has concentrated on malicious attacks that break into the database by exploiting vulnerabilities in web applications. Little work has addressed the threat of <i>data harvesting</i> through web form interfaces, in which an attacker iteratively submits legitimate queries and analyzes the returned results to design new queries, thereby harvesting large portions of the underlying data and learning sensitive information. To defend against data harvesting without compromising usability, we take a detection approach. We summarize the characteristics of data harvesting and propose the notions of <i>query correlation</i> and <i>result coverage</i> for detecting it. We design a detection system called <i>HengHa</i>, in which Heng examines the correlation among the queries in a session and Ha evaluates the data coverage of the results of the queries in the same session. Experimental results verify the effectiveness and efficiency of HengHa for data harvesting detection.
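
To make the two notions concrete, the following is a minimal illustrative sketch and not the paper's actual algorithm. It assumes that a query counts as correlated when it reuses a value returned by an earlier query in the same session, and that coverage is the fraction of distinct records returned so far. The class `SessionMonitor`, its fields, the assumption that the database size is known, and the thresholds are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SessionMonitor:
    """Hypothetical per-session tracker; an assumption, not HengHa's design."""
    db_size: int                                    # assumed known record count
    seen_ids: set = field(default_factory=set)      # record ids returned so far
    seen_values: set = field(default_factory=set)   # attribute values returned so far
    n_queries: int = 0
    n_correlated: int = 0

    def observe(self, query_values, result_rows):
        """Record one query.

        query_values: values used in the query's predicates.
        result_rows: iterable of (record_id, set_of_attribute_values).
        """
        self.n_queries += 1
        # Assumed rule: a query is correlated if it reuses a value that an
        # earlier query in this session returned.
        if set(query_values) & self.seen_values:
            self.n_correlated += 1
        for record_id, values in result_rows:
            self.seen_ids.add(record_id)
            self.seen_values.update(values)

    def query_correlation(self):
        """Fraction of queries built from earlier results (Heng-like signal)."""
        return self.n_correlated / self.n_queries if self.n_queries else 0.0

    def result_coverage(self):
        """Fraction of distinct records returned so far (Ha-like signal)."""
        return len(self.seen_ids) / self.db_size if self.db_size else 0.0

    def is_suspicious(self, corr_threshold=0.5, cov_threshold=0.3):
        # Hypothetical thresholds; flag only when both signals are high.
        return (self.query_correlation() >= corr_threshold
                and self.result_coverage() >= cov_threshold)
```

In this sketch a session is flagged only when both signals are high, reflecting the idea that a harvester both derives new queries from earlier results and accumulates a large share of the data. How the actual system combines the two components is not stated in the abstract.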

DOI: 10.1145/1866835.1866847

Cite this paper

@inproceedings{Wang2010HengHaDH, title={HengHa: data harvesting detection on hidden databases}, author={Shiyuan Wang and Divyakant Agrawal and Amr El Abbadi}, booktitle={CCSW}, year={2010} }