Association rule mining is an indispensable tool for discovering insights from large databases and data warehouses. Because the data in a warehouse is multi-dimensional, it is often useful to mine rules over subsets of the data defined by selections over the dimensions. Such interactive rule mining over multi-dimensional query windows is difficult since rule mining…
Master data management (MDM) integrates data from multiple structured data sources and builds a consolidated 360-degree view of business entities such as customers and products. Today's MDM systems are not prepared to integrate information from unstructured data sources, such as news reports, emails, call-center transcripts, and chat logs. However, those…
The top-k retrieval problem requires finding the k objects most similar to a given query object. Similarities between objects are most often computed as aggregated similarities of their attribute values. We consider the case where the similarities between attribute values are arbitrary (non-metric), due to which standard space partitioning indexes…
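As a point of reference for the problem statement above, here is a minimal brute-force sketch of top-k retrieval with aggregated attribute similarities. It is not the method of the paper, whose abstract is truncated here; it only illustrates the problem setting. The aggregation by summation, the function names `top_k`, `colour_sim`, and `size_sim`, and the toy data are all assumptions made for illustration.

```python
import heapq

def top_k(query, objects, attribute_sims, k):
    """Return the k objects with the highest aggregated similarity to query.

    attribute_sims maps an attribute name to an arbitrary similarity
    function; nothing here requires the functions to be metric.
    """
    def score(obj):
        # Aggregation by summation is an assumed choice for this sketch.
        return sum(sim(query[a], obj[a]) for a, sim in attribute_sims.items())
    # Linear scan over all objects: no index is used.
    return heapq.nlargest(k, objects, key=score)

# Hypothetical non-metric similarity: a hand-written lookup table.
COLOUR_SIM = {("red", "red"): 1.0, ("red", "orange"): 0.8, ("red", "blue"): 0.1}

def colour_sim(a, b):
    return COLOUR_SIM.get((a, b), COLOUR_SIM.get((b, a), 0.0))

def size_sim(a, b):
    return 1.0 / (1.0 + abs(a - b))

objects = [
    {"name": "A", "colour": "orange", "size": 10},
    {"name": "B", "colour": "blue", "size": 10},
    {"name": "C", "colour": "red", "size": 14},
]
query = {"colour": "red", "size": 10}
sims = {"colour": colour_sim, "size": size_sim}

best = top_k(query, objects, sims, k=2)
print([o["name"] for o in best])
```

The scan evaluates every object, which is the baseline cost that an index for non-metric similarities would aim to reduce.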
A skyline query returns a set of objects that are not dominated by other objects. An object is said to dominate another if it is closer to the query than the latter on all factors under consideration. In this paper, we consider the case where the similarity measures may be arbitrary and do not necessarily come from a metric space. We first explore…
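The dominance definition above can be sketched as a naive quadratic skyline computation. This is an illustrative baseline only, not the approach the paper explores. It assumes each object has already been reduced to a tuple of per-factor similarity scores to the query (higher means closer); the names `dominates` and `skyline` and the sample scores are hypothetical.

```python
def dominates(a, b):
    """True if a is strictly closer to the query than b on every factor."""
    return all(x > y for x, y in zip(a, b))

def skyline(scores):
    """Return the names of objects not dominated by any other object.

    scores maps an object name to a tuple of per-factor similarities
    to the query. Every pair is compared, so the cost is quadratic.
    """
    return [
        name
        for name, vec in scores.items()
        if not any(
            dominates(other, vec)
            for other_name, other in scores.items()
            if other_name != name
        )
    ]

# Assumed toy data: two factors per object.
scores = {
    "A": (0.9, 0.2),
    "B": (0.5, 0.5),
    "C": (0.4, 0.4),
    "D": (0.1, 0.8),
}

result = skyline(scores)
print(result)
```

In this toy data, C is excluded because B is closer to the query on both factors, while A, B, and D each win on at least one factor against every other object.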
This paper illustrates the utility of URL information in unsupervised learning. We first outline the motivation for using URL information, and then present two techniques for unsupervised learning from URL corpora. First, we devise a similarity measure for URL pairs, lay out the intuitions behind it, and verify its goodness by using it for…
Contact centers provide dialog-based support to organizations to address various customer-related issues. We have observed that the calls received at contact centers mostly follow well-defined patterns. Such call flows not only specify how an agent should proceed in a call, handle objections, persuade customers, follow compliance issues, etc., but also help…
Computer server management is an important component of the global IT (information technology) services business. The providers of server management services face unrelenting efficiency challenges in order to remain competitive with other providers. Server system administrators (SAs) represent the majority of the workers in this industry, and their primary…
The next-generation high-speed optical Internet will be required to support a broad range of emerging applications that may not only require significant bandwidth, but also have strict requirements with respect to end-to-end delay and reliability of transmitted data. In optical burst switching (OBS), data to be transmitted is assembled into…
Case-based reasoning (CBR) has been shown to be of considerable utility in a spam-filtering task. In the course of this study, we propose that the non-random, skewed distribution of the cases in a case base is crucial, especially in the context of a classification task like spam filtering. In this paper, we propose approaches to improve the performance of a…