Victor Lavrenko

Learn More
We explore the relation between classical probabilistic models of information retrieval and the emerging language modeling approaches. It has long been recognized that the primary obstacle to effective performance of classical models is the need to estimate a relevance model: probabilities of words in the relevant class. We propose a novel technique for(More)
Retrieving images in response to textual queries requires some knowledge of the semantics of the picture. Here, we show how we can do both automatic image annotation and retrieval (using one word queries) from images and videos using a multiple Bernoulli relevance model. The model assumes that a training set of images or videos along with keyword(More)
Libraries have traditionally used manual image annotation for indexing and then later retrieving their image collections. However, manual image annotation is an expensive and labor intensive procedure and hence there has been great interest in coming up with automatic ways to retrieve images based on content. Here, we propose an automatic approach to(More)
With the recent rise in popularity and size of social media, there is a growing need for systems that can extract useful information from this amount of data. We address the problem of detecting new events from a stream of Twitter posts. To make event detection feasible on web-scale corpora, we present an algorithm based on locality-sensitive hashing which(More)
Twitter is a very popular way for people to share information on a bewildering multitude of topics. Tweets are propagated using a variety of channels: by following users or lists, by searching or by retweeting. Of these vectors, retweeting is arguably the most effective, as it can potentially reach the most people, given its viral nature. A key task is(More)
A GENERATIVE THEORY OF RELEVANCE SEPTEMBER 2004 VICTOR LAVRENKO B.Sc., UNIVERSITY OF MASSACHUSETTS AMHERST M.Sc., UNIVERSITY OF MASSACHUSETTS AMHERST Ph.D., UNIVERSITY OF MASSACHUSETTS AMHERST Directed by: Professor W. Bruce Croft and Professor James Allan We present a new theory of relevance for the field of Information Retrieval. Relevance is viewed as a(More)
Most offline handwriting recognition approaches proceed by segmenting words into smaller pieces (usually characters) which are recognized separately. The recognition result of a word is then the composition of the individually recognized parts. Inspired by results in cognitive psychology, researchers have begun to focus on holistic word recognition(More)