Structural information (such as layout and look-and-feel) has been extensively used in the literature for extraction of interesting or relevant data, efficient storage, and query optimization. Traditionally, tree models (such as DOM trees) have been used to represent structural information, especially in the case of HTML and XML documents. However, ...
Short Messaging Service (SMS) is popularly used to provide information access to people on the move. This has resulted in the growth of SMS-based Question Answering (QA) services. However, automatically handling SMS questions poses significant challenges due to the inherent noise in SMS questions. In this work we present an automatic FAQ-based question ...
In the real world, noise is ubiquitous in text communications. Text produced by processing signals intended for human use is often noisy for automated computer processing. Automatic speech recognition, optical character recognition, and machine translation all introduce processing noise. Also, digital text produced in informal settings such as online ...
In this paper we investigate the problem of processing multi-way interval joins on a map-reduce platform. We look at join queries formed by interval predicates as defined by Allen's interval algebra. These predicates can be classified into two groups: colocation-based predicates and sequence-based predicates. A colocation predicate requires two intervals to ...
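As an illustration of the interval predicates mentioned in the abstract above, the following is a minimal sketch of several of Allen's basic relations in Python, together with a naive nested-loop join. It is not the paper's map-reduce method. The `Interval` type, the function names, the `naive_join` helper, and the use of integer endpoints are all assumptions made for this example.

```python
from typing import Callable, List, NamedTuple, Tuple


class Interval(NamedTuple):
    """Hypothetical interval type with start < end."""
    start: int
    end: int


# A subset of Allen's basic relations, written as "a <relation> b".
def before(a: Interval, b: Interval) -> bool:
    return a.end < b.start


def meets(a: Interval, b: Interval) -> bool:
    return a.end == b.start


def overlaps(a: Interval, b: Interval) -> bool:
    return a.start < b.start < a.end < b.end


def starts(a: Interval, b: Interval) -> bool:
    return a.start == b.start and a.end < b.end


def during(a: Interval, b: Interval) -> bool:
    return b.start < a.start and a.end < b.end


def finishes(a: Interval, b: Interval) -> bool:
    return b.start < a.start and a.end == b.end


def equals(a: Interval, b: Interval) -> bool:
    return a.start == b.start and a.end == b.end


def naive_join(
    left: List[Interval],
    right: List[Interval],
    predicate: Callable[[Interval, Interval], bool],
) -> List[Tuple[Interval, Interval]]:
    """Single-machine nested-loop join, shown only as a reference baseline."""
    return [(a, b) for a in left for b in right if predicate(a, b)]


if __name__ == "__main__":
    r = [Interval(1, 5), Interval(6, 7)]
    s = [Interval(3, 8), Interval(9, 12)]
    print(naive_join(r, s, overlaps))
    print(naive_join(r, s, before))
```

A two-way join compares every pair, which is the quadratic cost that a distributed approach would presumably aim to avoid; a multi-way join chains such predicates across more than two relations.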
Implementing a CRM Analytics solution for a business involves many steps including data extraction, populating the extracted data into a warehouse, and running an appropriate mining algorithm. We propose a CRM Analytics Framework that provides an end-to-end framework for developing and deploying prepackaged predictive modeling business solutions, intended ...
Recent times have seen tremendous growth in mobile-based data services that people access through Short Message Service (SMS). In a multilingual society it is essential that data services developed for a specific language also be made accessible through other local languages. In this paper, we present a service that ...
In this paper we investigate the problem of processing multi-way spatial joins on a map-reduce platform. We look at two common spatial predicates: overlap and range. We address these two classes of join queries, discuss the challenges and outline novel approaches for executing these queries on a map-reduce framework. We then discuss how we can ...
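The two spatial predicates named in the abstract above can be sketched as follows. The abstract does not define them, so this example assumes axis-aligned rectangles, treats overlap as sharing at least one point, and treats range as "minimum Euclidean distance at most d". The `Rect` type and all function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Hypothetical axis-aligned rectangle with x1 <= x2 and y1 <= y2."""
    x1: float
    y1: float
    x2: float
    y2: float


def overlap(a: Rect, b: Rect) -> bool:
    """True if the rectangles share at least one point (assumed definition)."""
    return a.x1 <= b.x2 and b.x1 <= a.x2 and a.y1 <= b.y2 and b.y1 <= a.y2


def min_distance(a: Rect, b: Rect) -> float:
    """Smallest Euclidean distance between any two points of a and b."""
    dx = max(a.x1 - b.x2, b.x1 - a.x2, 0.0)
    dy = max(a.y1 - b.y2, b.y1 - a.y2, 0.0)
    return math.hypot(dx, dy)


def in_range(a: Rect, b: Rect, d: float) -> bool:
    """True if the rectangles lie within distance d (assumed definition)."""
    return min_distance(a, b) <= d


if __name__ == "__main__":
    a = Rect(0, 0, 2, 2)
    b = Rect(1, 1, 3, 3)
    c = Rect(5, 6, 7, 8)
    print(overlap(a, b), overlap(a, c))
    print(min_distance(a, c), in_range(a, c, 5.0))
```

Under these assumptions, overlap is the special case of range with d = 0.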
Noise in textual data, such as that introduced by multilinguality, misspellings, abbreviations, deletions, phonetic spellings, and non-standard transliteration, poses considerable problems for text mining. Such corruptions are very common in instant messenger (IM) and short message service (SMS) data and adversely affect off-the-shelf text mining methods. ...
Typical commercial Web sites publish information from multiple back-end data sources; these data sources are also updated very frequently. Given the size of most commercial sites today, it becomes essential to have an automated means of checking for correctness and consistency of data. The eShopmonitor allows users to specify items of interest to be ...
Data presented on commerce sites runs into thousands of pages, and is typically delivered from multiple back-end sources. This makes it difficult to identify incorrect, anomalous, or interesting data such as $9.99 air fares, missing links, drastic changes in prices, and the addition of new products or promotions. We describe a system that monitors Web sites ...