A novel probabilistic retrieval model is presented. It provides a basis for interpreting TF-IDF term weights as making relevance decisions. It simulates local relevance decision-making at every location in a document and combines these “local” relevance decisions into a “document-wide” relevance decision for the document.
User Generated Content (UGC) has become the fastest-growing sector of the WWW. Data mining from UGC presents challenges not typically found in text mining from documents. UGC can be semi-structured, and its content can be very short and informal, containing relatively little content, much like a chat or an email conversation. In addition, UGC can be viewed...
Introduction. This is my personal "summary in 337 one-liners" of A Survey in Indexing and Searching XML Documents by Luk et al. (2002) [1]. I focus on technical aspects, omitting all system names and references. In my opinion, one cannot learn any technique from the survey: it only mentions various techniques but does not explain any. Alas, my 337...
Pattern discovery from time series is of fundamental importance. Particularly when patterns derived by domain experts do not exist or are incomplete, an algorithm that automatically discovers specific patterns or shapes in the time series data is necessary. Such an algorithm is noteworthy in that it does not assume prior knowledge of the number of...
In an ad-hoc retrieval task, the query is usually short and the user expects to find the relevant documents in the first several result pages. We explored the possibility of using Wikipedia articles as an external corpus to expand ad-hoc queries. Results show promising improvements on measures that emphasize weak queries.