There is a wide range of evaluation metrics available for comparing the quality of text clustering algorithms. In this article, we define a few intuitive formal constraints on such metrics which shed light on which aspects of the quality of a clustering are captured by different metric families. These formal constraints are validated in an experiment involving …
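One metric family often examined in this line of work is BCubed precision and recall, which score each item by how homogeneous and how complete its own cluster is. The following is a minimal Python sketch under the standard pairwise definitions; the function and variable names are ours, not an API from the paper:

```python
def bcubed(clusters, labels):
    """BCubed precision and recall for a clustering.

    clusters: dict item -> system cluster id
    labels:   dict item -> gold category
    """
    items = list(clusters)
    p = r = 0.0
    for e in items:
        same_cluster = [x for x in items if clusters[x] == clusters[e]]
        same_label   = [x for x in items if labels[x]   == labels[e]]
        # items that share both e's cluster and e's category (e included)
        overlap = sum(1 for x in same_cluster if labels[x] == labels[e])
        p += overlap / len(same_cluster)   # homogeneity of e's cluster
        r += overlap / len(same_label)     # completeness for e's category
    n = len(items)
    return p / n, r / n

# toy example: cluster 2 mixes two gold categories
clusters = {"a": 1, "b": 1, "c": 2, "d": 2}
labels   = {"a": "x", "b": "x", "c": "x", "d": "y"}
print(bcubed(clusters, labels))  # (0.75, 0.666...)
```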
The ambiguity of person names in the Web has become a new area of interest for NLP researchers. This challenging problem has been formulated as the task of clustering Web search results (returned in response to a person name query) according to the individual they mention. In this paper we compare the coverage, reliability and independence of a number of …
The third WePS (Web People Search) Evaluation campaign took place in 2009-2010 and attracted the participation of 13 research groups from Europe, Asia and North America. Given the top web search results for a person name, two tasks were addressed: a clustering task, which consists of grouping together web pages referring to the same person, and an …
This paper summarizes the goals, organization, and results of the second RepLab competitive evaluation campaign for Online Reputation Management Systems (RepLab 2013). RepLab focused on the process of monitoring the reputation of companies and individuals, and asked participant systems to annotate different types of information on tweets containing the …
This paper summarizes the definition, resources, evaluation methodology and metrics, participation and comparative results for the second task of the WePS-3 evaluation campaign. The so-called Online Reputation Management task consists of filtering Twitter posts containing a given company name depending on whether the post is actually related to the …
This paper presents a probabilistic framework, QARLA, for the evaluation of text summarisation systems. The input of the framework is a set of manual (reference) summaries, a set of baseline (automatic) summaries and a set of similarity metrics between summaries. It provides i) a measure to evaluate the quality of any set of similarity metrics, ii) a …
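As a rough illustration of the framework's probabilistic flavour, the sketch below estimates a QUEEN-style score: the probability, over triples of reference summaries, that an automatic summary is at least as similar to a reference as two references are to each other, according to every metric in the set. This is a reconstruction from the published definition, not QARLA's official code, and all names are illustrative:

```python
from itertools import permutations

def queen(auto, models, metrics):
    """QUEEN-style quality estimate for one automatic summary.

    auto:    the automatic summary (any object the metrics accept)
    models:  list of manual reference summaries (needs at least 3)
    metrics: list of similarity functions f(a, b) -> float
    """
    trials = hits = 0
    for m1, m2, m3 in permutations(models, 3):  # ordered triples of references
        trials += 1
        # auto must look at least as close to m1 as m2 is to m3, for ALL metrics
        if all(q(auto, m1) >= q(m2, m3) for q in metrics):
            hits += 1
    return hits / trials if trials else 0.0

# toy similarity metric: Jaccard overlap of word sets
def word_overlap(a, b):
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / max(len(wa | wb), 1)

refs = ["the cat sat on the mat", "a cat sat on a mat", "the cat is on the mat"]
print(queen("the cat sat on a mat", refs, [word_overlap]))  # ~0.667
```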
We deal with shrinking the stream of tweets for scheduled events in real time, following two steps: (i) sub-event detection, which determines if something new has occurred, and (ii) tweet selection, which picks a tweet to describe each sub-event. By comparing summaries in three languages to live reports by journalists, we show that simple text analysis …
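A minimal sketch of such a two-step pipeline is shown below; the burst threshold and the term-frequency scoring are illustrative assumptions, not the exact method evaluated in the paper:

```python
from collections import Counter

def detect_subevent(prev_rate, curr_rate, factor=1.7):
    """Step (i): flag a sub-event when the tweet rate jumps sharply."""
    return curr_rate > factor * max(prev_rate, 1)

def select_tweet(tweets):
    """Step (ii): pick the tweet whose words best match the window's vocabulary."""
    vocab = Counter(w for t in tweets for w in t.lower().split())
    return max(tweets, key=lambda t: sum(vocab[w] for w in set(t.lower().split())))

# toy stream: one-minute windows of tweets during a match
windows = [["kick off soon", "waiting for the match"],
           ["GOAL!!!", "goal for the home side", "what a goal", "amazing goal 1-0"]]
prev = len(windows[0])
for window in windows[1:]:
    if detect_subevent(prev, len(window)):
        print("sub-event:", select_tweet(window))
    prev = len(window)
```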
This paper describes the organisation and results of RepLab 2014, the third competitive evaluation campaign for Online Reputation Management systems. This year the focus lay on two new tasks: reputation dimensions classification and author profiling, which complement the aspects of reputation analysis studied in the previous campaigns. The participants …
This paper presents the Unanimous Improvement Ratio (UIR), a measure that makes it possible to compare systems using two evaluation metrics without depending on relative metric weights. For clustering tasks, this kind of measure becomes necessary given the trade-off between precision- and recall-oriented metrics (e.g., Purity and Inverse Purity), which usually depends …
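To make the trade-off concrete, the sketch below computes Purity and Inverse Purity, together with a UIR-style ratio under the reading that one system unanimously improves another on a test case when it is at least as good on every metric and strictly better on at least one; this reading and all names are our assumptions, not the paper's reference implementation:

```python
from collections import Counter

def purity(clusters, labels):
    """Purity: each system cluster votes for its majority gold class."""
    by_cluster = {}
    for item, c in clusters.items():
        by_cluster.setdefault(c, []).append(labels[item])
    return sum(Counter(ls).most_common(1)[0][1]
               for ls in by_cluster.values()) / len(clusters)

def inverse_purity(clusters, labels):
    """Inverse Purity: swap the roles of clusters and gold classes."""
    return purity(labels, clusters)

def uir(scores_x, scores_y):
    """UIR-style ratio over test cases.

    scores_x, scores_y: one dict of metric -> value per test case.
    """
    def unanimous(a, b):  # a matches b on every metric, beats it on some
        return all(a[m] >= b[m] for m in a) and any(a[m] > b[m] for m in a)
    x_wins = sum(unanimous(a, b) for a, b in zip(scores_x, scores_y))
    y_wins = sum(unanimous(b, a) for a, b in zip(scores_x, scores_y))
    return (x_wins - y_wins) / len(scores_x)

# toy example: two test cases scored with both metrics
x = [{"purity": 0.8, "inv_purity": 0.6}, {"purity": 0.7, "inv_purity": 0.7}]
y = [{"purity": 0.7, "inv_purity": 0.5}, {"purity": 0.9, "inv_purity": 0.6}]
print(uir(x, y))  # 0.5: x unanimously wins case 1, case 2 is mixed
```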
This paper summarizes the goals, organization and results of the first RepLab competitive evaluation campaign for Online Reputation Management Systems (RepLab 2012). RepLab focused on the reputation of companies, and asked participant systems to annotate different types of information on tweets containing the names of several companies. Two tasks were …