Vandalism Detection in Wikidata

@inproceedings{Heindorf2016VandalismDI,
  title={Vandalism Detection in Wikidata},
  author={Stefan Heindorf and Martin Potthast and Benno Stein and Gregor Engels},
  booktitle={Proceedings of the 25th ACM International on Conference on Information and Knowledge Management},
  year={2016}
}
Wikidata is the new, large-scale knowledge base of the Wikimedia Foundation. […] We propose a set of 47 features that exploit both content and context information, and we report on 4 classifiers of increasing effectiveness tailored to this learning task. Our approach is evaluated on the recently published Wikidata Vandalism Corpus WDVC-2015 and it achieves an area under curve value of the receiver operating characteristic, ROC-AUC, of 0.991. It significantly outperforms the state of the art…
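
The ROC-AUC figure quoted in the abstract can be read as the probability that a randomly chosen vandalism edit receives a higher score than a randomly chosen regular edit. The following is a minimal sketch of that pairwise definition, not the authors' evaluation code; the function name `roc_auc`, the toy labels, and the toy scores are all assumptions made up for illustration.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise ROC-AUC: fraction of (positive, negative) pairs in which
    the positive example gets the higher score; ties count as half a win.

    This O(n_pos * n_neg) loop is for illustration only and would be far
    too slow for a corpus-scale evaluation.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = vandalism, 0 = regular edit.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.20, 0.80, 0.90]
auc = roc_auc(labels, scores)
print(f"ROC-AUC = {auc:.3f}")
```

A value of 1.0 means every vandalism edit is ranked above every regular edit, while 0.5 corresponds to random ranking.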

Wikidata Vandalism Detection - The Loganberry Vandalism Detector at WSDM Cup 2017
TLDR
This paper provides the details of the submission that obtained an ROC-AUC score of 0.91976 in the final evaluation of the WSDM 2017 Wiki Vandalism Detection Challenge.
Debiasing Vandalism Detection Models at Wikidata
TLDR
The model FAIR-S reduces the bias ratio of the state-of-the-art vandalism detector WDVD from 310.7 to only 11.9 while maintaining high predictive performance at 0.963 ROC-AUC and 0.316 PR-AUC.
A Production Oriented Approach for Vandalism Detection in Wikidata - The Buffaloberry Vandalism Detector at WSDM Cup 2017
TLDR
The task at the WSDM Cup 2017 was to come up with a fast and reliable prediction system that narrows down suspicious edits for human revision, and this work was able to outperform all other contestants, while incorporating new interesting features.
Vandalism detection in crowdsourced knowledge bases
TLDR
This thesis develops novel machine learning-based vandalism detectors for Wikidata, the largest structured, crowdsourced knowledge base on the web, and enables a conscious trade-off between predictive performance and bias and might play an important role towards a more accurate and welcoming web in times of fake news and biased AI systems.
Attention-Based Vandalism Detection in OpenStreetMap
TLDR
Ovid relies on a novel neural architecture that adopts a multi-head attention mechanism to summarize information indicating vandalism from OSM changesets effectively and introduces a set of original features that capture changeset, user, and edit information.
Large-Scale Vandalism Detection with Linear Classifiers - The Conkerberry Vandalism Detector at WSDM Cup 2017
TLDR
The second-place solution to the cup is presented, showing that simple linear classification can achieve competitive performance: the approach reaches an AUROC of 0.938 on the test data.
Vandalism Detection in OpenStreetMap via User Embeddings
TLDR
A user embedding approach to create OSM user embeddings and add embedding features to a machine learning model to improve vandalism detection in OSM is described and validated.
Overview of the Wikidata Vandalism Detection Task at WSDM Cup 2017
TLDR
This task was organized as a software submission task: to maximize reproducibility as well as to foster future research and development on this task, the participants were asked to submit their working software to the TIRA experimentation platform along with the source code for open source release.
Ovid: A Machine Learning Approach for Automated Vandalism Detection in OpenStreetMap
TLDR
Ovid relies on a neural network architecture that adopts a multi-head attention mechanism to effectively summarize information indicating vandalism from OpenStreetMap changesets to facilitate automated vandalism detection.

References

SHOWING 1-10 OF 49 REFERENCES
Wikipedia Vandalism Detection: Combining Natural Language, Metadata, and Reputation Features
TLDR
An effort to integrate three of the leading approaches to Wikipedia vandalism detection: a spatio-temporal analysis of metadata (STiki), a reputation-based system (WikiTrust), and natural language processing features improves the state-of-the-art from all previous methods and establishes a new baseline for Wikipedia vandalism detection.
Towards Vandalism Detection in Knowledge Bases: Corpus Construction and Analysis
TLDR
The construction of the Wikidata Vandalism Corpus WDVC-2015 is reported on, the first corpus for vandalism in knowledge bases, based on the entire revision history of Wikidata, the knowledge base underlying Wikipedia, which shows that public knowledge bases must be used with caution.
Automatic Vandalism Detection in Wikipedia
TLDR
The characteristics of vandalism as humans recognize it are discussed, features are developed to render vandalism detection as a machine learning task, and logistic regression is used to achieve 83% precision at 77% recall with the model.
Detecting Wikipedia Vandalism using WikiTrust - Lab Report for PAN at CLEF 2010
TLDR
A simple Web API is implemented that provides the vandalism estimate for every revision of the English Wikipedia, and can be used both to identify vandalism that needs to be reverted, and to select high-quality, non-vandalized recent revisions of any given Wikipedia article.
Detecting Wikipedia vandalism via spatio-temporal analysis of revision metadata
TLDR
A classifier is produced which flags vandalism at performance comparable to the natural-language efforts the authors intend to complement, and has been used to locate over 5,000 manually-confirmed incidents of vandalism outside their labeled set.
Context-Aware Detection of Sneaky Vandalism on Wikipedia Across Multiple Languages
TLDR
A novel context-aware and cross-language vandalism detection technique that scales to the size of the full Wikipedia and extends the types of vandalism detectable beyond past feature-based approaches is proposed.
Elusive vandalism detection in wikipedia: a text stability-based approach
TLDR
This work identifies a number of vandal edits that can take hours, even days, to correct and proposes a text stability-based approach for detecting them and shows that text-stability is able to improve the performance of the selected machine-learning algorithms significantly.
A content-context-centric approach for detecting vandalism in Wikipedia
TLDR
A content-context-aware vandalism detection framework is proposed to quantify how well the words contained in the edit fit into the topic and the existing content of the Wikipedia article, and two novel metrics, called WWW co-occurrence probability and top-ranked co-occurrence probability, are presented for this purpose.
Cross Language Prediction of Vandalism on Wikipedia Using Article Views and Revisions
TLDR
Results show characteristic vandal traits can be learned from view and edit patterns, and models built in one language can be applied to other languages.
Wikipedia Vandalism Detection Through Machine Learning: Feature Review and New Proposals - Lab Report for PAN at CLEF 2010
TLDR
The framework presented in (Potthast, Stein, and Gerling, 2008) for Wikipedia vandalism detection is extended, and several vandalism indicating features are extracted from edits in a vandalism corpus and are fed to a supervised learning algorithm.