Overview of the 6th International Competition on Plagiarism Detection
TLDR
This paper overviews 18 plagiarism detectors that have been developed and evaluated within PAN'10.
  • 374
  • 37
Improving the Reproducibility of PAN's Shared Tasks: - Plagiarism Detection, Author Identification, and Author Profiling
TLDR
This paper reports on the PAN 2014 evaluation lab, which hosts three shared tasks on plagiarism detection, author identification, and author profiling.
  • 185
  • 18
TIRA Integrated Research Architecture
Data and software are immaterial. Scientists in computer science hence have the unique chance to let other scientists easily reproduce their findings. Similarly, and with the same ease, the …
  • 147
  • 17
The Clickbait Challenge 2017: Towards a Regression Model for Clickbait Strength
TLDR
This paper reports on the results of the Clickbait Challenge 2017, a shared task inviting the submission of clickbait detectors for a comparative evaluation.
  • 32
  • 6
Recent Trends in Digital Text Forensics and Its Evaluation - Plagiarism Detection, Author Identification, and Author Profiling
TLDR
We present a standardized evaluation framework for each of the three tasks and discuss the evaluation results of the altogether 58 submitted contributions.
  • 75
  • 3
Crowdsourcing a Large Corpus of Clickbait on Twitter
TLDR
This paper introduces the Webis Clickbait Corpus 2017, a large-scale corpus of teaser messages, which has been built to address automatic clickbait detection.
  • 44
  • 3
The optimum clustering framework: implementing the cluster hypothesis
TLDR
We present a theoretic foundation for optimum document clustering with respect to the estimates of the relevance probability for the query-document pairs.
  • 22
  • 2
Unsupervised Sparsification of Similarity Graphs
TLDR
Sparsification improves both the runtime and the quality of cluster algorithms that exploit pairwise object similarities, i.e., that rely on similarity graphs.
  • 5
  • 2
TIRA: Configuring, Executing, and Disseminating Information Retrieval Experiments
TLDR
We present the TIRA (Testbed for Information Retrieval Algorithms) web framework that addresses the outlined challenges and possesses a unique set of compelling features in comparison to existing web-based solutions.
  • 51
  • 1
From keywords to keyqueries: content descriptors for the web
TLDR
We introduce the concept of keyqueries as dynamic content descriptors for documents and present an exhaustive search algorithm along with effective pruning strategies.
  • 18
  • 1