Overview of the 6th International Competition on Plagiarism Detection
TLDR
This paper overviews 18 plagiarism detectors that have been developed and evaluated within PAN'10, highlighting several important aspects of plagiarism detection, such as obfuscation, intrinsic vs. external plagiarism, and plagiarism case length.
Webis: An Ensemble for Twitter Sentiment Detection
TLDR
Four Twitter sentiment classification approaches that participated in previous SemEval editions with diverse feature sets are reproduced, and the ensemble of the reproduced approaches serves as a strong baseline in the current edition where it is top-ranked on the 2015 test set.
Query segmentation revisited
TLDR
A new method for query segmentation that is easy to implement, fast, and that comes with a segmentation accuracy comparable to current state-of-the-art techniques is introduced.
ChatNoir: a search engine for the ClueWeb09 corpus
TLDR
The ChatNoir search engine is scalable and returns the first results within three seconds, which is significantly faster than Indri; this allows for implementing reproducible experiments based on retrieving documents from the ClueWeb09 corpus.
The Clickbait Challenge 2017: Towards a Regression Model for Clickbait Strength
TLDR
The Clickbait Challenge 2017 was a shared task inviting the submission of clickbait detectors for a comparative evaluation; a total of 13 detectors were submitted, achieving significant improvements over the previous state of the art in terms of detection performance.
Clickbait Detection
TLDR
This paper proposes a new model for the detection of clickbait, i.e., short messages that lure readers to click a link, based on 215 features that enables a random forest classifier to achieve 0.79 ROC-AUC at 0.76 precision and 0.76 recall.
A News Editorial Corpus for Mining Argumentation Strategies
TLDR
A novel corpus with 300 editorials from three diverse news portals that provides the basis for mining argumentation strategies and reveals different strategies across the news portals, exemplifying the benefit of studying editorials, a so far underresourced text genre in argument mining.
Who Wrote the Web? Revisiting Influential Author Identification Research Applicable to Information Retrieval
TLDR
This paper selects 15 of the most influential papers for author identification and recruits a group of students to reimplement them from scratch, laying the groundwork for integrating author identification with information retrieval to eventually scale the former to the web.
Towards optimum query segmentation: in doubt without
TLDR
It turns out that more accurate segmentation does not necessarily yield better retrieval performance, so a new in-doubt-without variant is proposed which achieves the best retrieval performance despite leaving many queries unsegmented.
From search session detection to search mission detection
TLDR
A new algorithm for logical session detection is presented, which follows the state-of-the-art cascading method's rationale of combining effectiveness with efficiency, and a new publicly available corpus of 8800 queries labeled with session and mission information is introduced.