In the real world, noise is ubiquitous in text communications. Text produced by processing signals intended for human use is often noisy for automated computer processing. Automatic speech recognition, optical character recognition, and machine translation all introduce processing noise. Also, digital text produced in informal settings such as online ...
Structural information (such as layout and look-and-feel) has been extensively used in the literature for extraction of interesting or relevant data, efficient storage, and query optimization. Traditionally, tree models (such as DOM trees) have been used to represent structural information, especially in the case of HTML and XML documents. However, ...
Short Messaging Service (SMS) is popularly used to provide information access to people on the move. This has resulted in the growth of SMS-based Question Answering (QA) services. However, automatically handling SMS questions poses significant challenges due to the inherent noise in SMS questions. In this work we present an automatic FAQ-based question ...
Recent times have seen tremendous growth in mobile-based data services that people access through Short Message Service (SMS). In a multilingual society it is essential that data services developed for a specific language also be made accessible through other local languages. In this paper, we present a service that ...
Noise in textual data, such as that introduced by multilinguality, misspellings, abbreviations, deletions, phonetic spellings, non-standard transliteration, etc., poses considerable problems for text mining. Such corruptions are very common in instant messenger (IM) and short message service (SMS) data and adversely affect off-the-shelf text mining methods. ...
In this paper we investigate the problem of processing multi-way spatial joins on a map-reduce platform. We look at two common spatial predicates: overlap and range. We address these two classes of join queries, discuss the challenges, and outline novel approaches for executing these queries on a map-reduce framework. We then discuss how we can ...
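As a rough illustration of the two predicate classes named above, the sketch below evaluates overlap and range on axis-aligned rectangles. The `Rect` type, the function names, and the choice of minimum Euclidean distance for the range predicate are assumptions made for this example; the abstract does not specify the paper's definitions, and nothing here reflects its map-reduce execution strategy.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Rect:
    """Hypothetical axis-aligned rectangle with x1 <= x2 and y1 <= y2."""
    x1: float
    y1: float
    x2: float
    y2: float


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the two rectangles share at least one point."""
    return a.x1 <= b.x2 and b.x1 <= a.x2 and a.y1 <= b.y2 and b.y1 <= a.y2


def within_range(a: Rect, b: Rect, d: float) -> bool:
    """True if the minimum distance between the rectangles is at most d.

    Assumption: "range" is read here as a distance threshold.
    """
    # Gap along each axis; zero when the projections intersect.
    dx = max(a.x1 - b.x2, b.x1 - a.x2, 0.0)
    dy = max(a.y1 - b.y2, b.y1 - a.y2, 0.0)
    return hypot(dx, dy) <= d
```

A multi-way join would apply such predicates across several relations; this sketch only covers the pairwise test.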
Typical commercial Web sites publish information from multiple back-end data sources; these data sources are also updated very frequently. Given the size of most commercial sites today, it becomes essential to have an automated means of checking for correctness and consistency of data. The eShopmonitor allows users to specify items of interest to be ...
In this paper we address the problem of extracting important (and unimportant) discourse patterns from call center conversations. Call centers provide dialog-based call-in support for customers to address their queries, requests, and complaints. A call center is the direct interface between an organization and its customers, and it is important to capture ...
In this paper we investigate the problem of processing multi-way interval joins on a map-reduce platform. We look at join queries formed by interval predicates as defined by Allen's interval algebra. These predicates can be classified into two groups: colocation-based predicates and sequence-based predicates. A colocation predicate requires two intervals to ...
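For readers unfamiliar with Allen's interval algebra, the sketch below names which of the thirteen basic relations holds between two intervals. The function name `allen_relation` and the `(start, end)` tuple representation are assumptions for this example. It does not attempt the colocation/sequence grouping, whose definition is truncated above, and it is not the paper's join algorithm.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (start, end), assumed to satisfy start < end


def allen_relation(a: Interval, b: Interval) -> str:
    """Return the name of the Allen relation that holds from a to b."""
    (s1, e1), (s2, e2) = a, b
    # Disjoint or touching intervals.
    if e1 < s2:
        return "before"
    if e2 < s1:
        return "after"
    if e1 == s2:
        return "meets"
    if e2 == s1:
        return "met-by"
    # From here on the intervals share interior points.
    if s1 == s2 and e1 == e2:
        return "equals"
    if s1 == s2:
        return "starts" if e1 < e2 else "started-by"
    if e1 == e2:
        return "finishes" if s1 > s2 else "finished-by"
    if s2 < s1 and e1 < e2:
        return "during"
    if s1 < s2 and e2 < e1:
        return "contains"
    # Partial overlap: the direction depends on which interval starts first.
    return "overlaps" if s1 < s2 else "overlapped-by"
```

An interval join predicate can then be written as a test on the returned name, for example `allen_relation(a, b) == "during"`.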