Manually tuning tens to hundreds of configuration parameters in a complex software system like a database or an application server is an arduous task. Recent work has looked into automated approaches for recommending good configuration settings that adaptively search the full space of possible configurations. These approaches are based on conducting…
Solid-state drives are becoming a viable alternative to magnetic disks in database systems, but their performance characteristics, particularly those caused by their erase-before-write behavior, make conventional database indexes a poor fit. There have been various proposals of indexes specialized for these devices, but to make such indexes practical, we…
Recent studies in classification have proposed ways of exploiting the association rule mining paradigm. These studies have performed extensive experiments to show their techniques to be both efficient and accurate. However, existing studies in this paradigm either do not provide any theoretical justification behind their approaches or assume independence…
In this paper, we look at the problem of assigning labels to nodes of a dynamic XML tree such that the labels encode all ancestor-descendant relationships between the nodes and the document-order between the nodes. Such labeling facilitates efficient XML query processing. A number of labeling schemes have been designed for this task. These schemes can be…
We consider the problem of efficiently computing weighted proximity best-joins over multiple lists, with applications in information retrieval and extraction. We are given a multi-term query, and for each query term, a list of all its matches with scores, sorted by locations. The problem is to find the overall best matchset, consisting of one match from…
Permutation is a fundamental operator for array data, with applications in, for example, changing matrix layouts and reorganizing data cubes. We consider the problem of permuting large quantities of data stored on secondary storage that supports fast random block accesses, such as solid state drives and distributed key-value stores. Faster random accesses…
CERTIFICATE
It is certified that the work contained in this thesis, titled "Classifying Categorical Data" by Risi Vardhan Thonangi, has been carried out under my supervision and has not been submitted elsewhere for a degree.