Plagiarism analysis, authorship identification, and near-duplicate detection PAN'07

Benno Stein, Moshe Koppel, and E. Stamatatos. SIGIR Forum.
The goal of the workshop was to bring together experts and prospective researchers around the exciting and future-oriented topic of plagiarism analysis, authorship identification, and high similarity search. This topic is receiving increasing attention, in part because information about nearly any subject can be found on the World Wide Web.
Deception detection: dependable or defective?
By reference to several experiments on materials that can have a strongly deceptive framing, it is shown how the nature of the deception largely dictates which detection methods can be deployed effectively.
Domain bias in distinguishing Flemish and Dutch subtitles
It is shown that the best estimate of how recognizable the two language varieties are is obtained when training on one domain and testing on another, and the study investigates how the relation between training and test domains influences recognition quality.
Profiling Fake News spreaders through Stylometry and Lexical Features. UniOR NLP @PAN2020
This paper describes an approach to the Profiling Fake News Spreaders on Twitter task at PAN 2020 that combines several machine learning algorithms with strictly stylometric features, emoji categories, and a set of lexical features related to the vocabulary of fake news headlines.
A Decade of Shared Tasks in Digital Text Forensics at PAN
This paper traces how both the examined tasks and the developed datasets in digital text forensics evolved over the last decade, and briefly introduces the upcoming PAN 2019 shared tasks.
A Stylometric Investigation of Character Voices in Literary Fiction
Krishnapriya Vishnubhotla. Master of Science, Graduate Department of Computer Science, University of Toronto, 2019. Characters in a …
Improving author verification based on topic modeling
Comparison with state-of-the-art methods demonstrates the great potential of the approaches presented in this study and shows that the proposed extrinsic models are very competitive even when genre-agnostic external documents are used.
Precise Detection of Content Reuse in the Web
It is shown that *bad neighborhoods*, clusters of pages where copied content is frequent, help identify copying on the web, and that cryptographic hashing is much more precise than alternatives such as locality-sensitive hashing, avoiding the thousands of false positives that would otherwise occur.
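To illustrate why exact cryptographic hashing is precise for content reuse detection, here is a minimal sketch. It assumes that the hashed unit is a normalized word n-gram ("shingle") and uses SHA-256 from Python's standard `hashlib`. The summary of "Precise Detection of Content Reuse in the Web" does not say which unit the paper hashes, and the sketch does not model bad neighborhoods. The function names (`normalize`, `shingle_hashes`, `find_shared_content`) and the default shingle length of 8 words are hypothetical.

```python
import hashlib
import re
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting changes do not alter hashes."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def shingle_hashes(text: str, n: int = 8) -> set[str]:
    """SHA-256 digests of every word n-gram in the normalized text.

    Assumption: the hashed unit is a word n-gram; n=8 is an illustrative default.
    """
    words = normalize(text).split()
    hashes = set()
    for i in range(len(words) - n + 1):
        shingle = " ".join(words[i:i + n])
        hashes.add(hashlib.sha256(shingle.encode("utf-8")).hexdigest())
    return hashes


def find_shared_content(pages: dict[str, str], n: int = 8) -> dict[str, set[str]]:
    """Map each shingle hash that occurs on two or more pages to the ids of those pages."""
    index: dict[str, set[str]] = defaultdict(set)
    for page_id, text in pages.items():
        for h in shingle_hashes(text, n):
            index[h].add(page_id)
    # Only identical normalized shingles collide, so every reported match is exact.
    return {h: ids for h, ids in index.items() if len(ids) > 1}
```

Because two shingles share a digest only when their normalized text is identical (SHA-256 collisions are negligible in practice), every reported match is exact rather than probabilistic. Locality-sensitive hashing, by contrast, deliberately lets similar but non-identical inputs collide. The trade-off is recall: paraphrased or heavily edited copies, in which few n-grams survive verbatim, are missed.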
A Change Tracking Framework for Financial Documents
A graph-based approach called DeepAntara1 is devised, and its performance on the change tracking task is demonstrated on multiple sentence pairs extracted from different versions of publicly available financial CRS treaties.
A System for Predicting Health of an E-Contract
This paper describes Fitcon, a contract mining system that detects service level agreements in contracts, tracks delivery performance against them, and predicts the health of long-term contracts. It is the first such system to be deployed in production for large-scale determination and prediction of contract health.
Applying the Seed-and-Extend Strategy to Text-Alignment
The alignment of reused passages between documents is a central task when handling large collections. Current alignment algorithms for generic text similarity relations are too heterogeneous and thus …
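The summary of "Applying the Seed-and-Extend Strategy to Text-Alignment" is cut off before it describes the paper's algorithm. The following is therefore only a generic sketch of the seed-and-extend idea and should not be read as the paper's method. It assumes that seeds are exact word 3-gram matches between a source and a suspicious document, and that extension greedily merges seeds lying within `max_gap` words of an existing passage in both documents. All names (`Passage`, `find_seeds`, `extend_seeds`, `align`) and parameter defaults are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    """Aligned half-open word ranges [start, end) in the source and suspicious documents."""
    src_start: int
    src_end: int
    susp_start: int
    susp_end: int


def find_seeds(src: list[str], susp: list[str], n: int) -> list[tuple[int, int]]:
    """Seeding: all (i, j) where the word n-gram at src[i] equals the one at susp[j].

    Assumption: seeds are exact word n-gram matches.
    """
    index: dict[tuple[str, ...], list[int]] = {}
    for i in range(len(src) - n + 1):
        index.setdefault(tuple(src[i:i + n]), []).append(i)
    seeds = []
    for j in range(len(susp) - n + 1):
        for i in index.get(tuple(susp[j:j + n]), []):
            seeds.append((i, j))
    return sorted(seeds)  # ordered by source position, then suspicious position


def extend_seeds(seeds: list[tuple[int, int]], n: int, max_gap: int) -> list[Passage]:
    """Extension: merge each seed into the first passage it continues within max_gap words in BOTH documents."""
    passages: list[Passage] = []
    for i, j in seeds:
        for p in passages:
            if (p.src_start <= i <= p.src_end + max_gap
                    and p.susp_start <= j <= p.susp_end + max_gap):
                p.src_end = max(p.src_end, i + n)
                p.susp_end = max(p.susp_end, j + n)
                break
        else:
            # No existing passage continues here, so this seed starts a new one.
            passages.append(Passage(i, i + n, j, j + n))
    return passages


def align(src_text: str, susp_text: str, n: int = 3, max_gap: int = 4,
          min_words: int = 0) -> list[Passage]:
    """Seed-and-extend on lowercased whitespace tokens; drop passages shorter than min_words in the source."""
    src, susp = src_text.lower().split(), susp_text.lower().split()
    passages = extend_seeds(find_seeds(src, susp, n), n, max_gap)
    return [p for p in passages if p.src_end - p.src_start >= min_words]


if __name__ == "__main__":
    source = "alpha beta gamma delta epsilon zeta eta theta"
    suspicious = "one two alpha beta gamma delta x epsilon zeta eta theta three"
    # The inserted "x" is bridged by max_gap=4, giving one passage; max_gap=0 splits it in two.
    print(align(source, suspicious))
    print(align(source, suspicious, max_gap=0))
```

In the usage example, the suspicious text copies the whole source but inserts one extra word. With `max_gap=4`, the seeds on either side of the insertion merge into a single passage covering all eight source words. With `max_gap=0`, the same seeds yield two separate passages. The gap threshold is the main knob trading fragmented alignments against over-merging unrelated matches. Extension here is a single greedy pass, so a real system would likely add a second merge step between passages.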