James Lanagan

For the first year of the TREC Microblog Track, the CLARITY group concentrated on a number of areas: investigating the underlying term weighting scheme for ranking tweets, incorporating query expansion to introduce new terms into the query, and introducing an element of temporal re-weighting based on the temporal distribution of assumed relevant…
The automatic summarisation of sports video is of growing importance with the increased availability of on-demand content. Consumers who are unable to view events live often want to watch a summary that allows them to quickly come to terms with all that has happened during a sporting event. Sports forums show that it is not only summaries that are…
In this paper we examine the effectiveness of using a filtered stream of tweets from Twitter to automatically identify events of interest within the video of live sports transmissions. We show that using just the volume of tweets generated at any moment of a game actually provides a very accurate means of event detection, as well as an automatic method for…
Role analysis in online communities allows us to understand and predict users' behavior. Though several approaches have been followed, there is still a lack of generalization of their methods and their results. In this paper, we discuss the ground theory of roles and search for a consistent and computable definition that allows the automatic detection of…
Technology usage is changing rapidly and is becoming a more mobile, more social and more multimedia-based experience. This is especially true in the area of content creation where mobile social applications used by crowds of people are challenging traditional ways of creating and distributing content, especially for applications like news dissemination.…
The use of effective term frequency weighting and document length normalisation strategies has been shown over a number of decades to have a significant positive effect on document retrieval. When dealing with much shorter documents, such as those obtained from microblogs, it would seem intuitive that these would have less benefit. In this paper we…
In this paper we examine the effects of noise when creating a real-world weblog corpus for information retrieval. We focus on the DiffPost (Lee et al. 2008) approach to noise removal from blog pages, examining the difficulties encountered when crawling the blogosphere during the creation of a real-world corpus of blog pages. We introduce and evaluate a…