Disinformation on the Web: Impact, Characteristics, and Detection of Wikipedia Hoaxes
@article{Kumar2016DisinformationOT,
  title   = {Disinformation on the Web: Impact, Characteristics, and Detection of Wikipedia Hoaxes},
  author  = {Srijan Kumar and Robert West and Jure Leskovec},
  journal = {Proceedings of the 25th International Conference on World Wide Web},
  year    = {2016}
}
Wikipedia is a major source of information for many people. […] Key Result: We find that humans are not particularly good at this task and that our automated classifier outperforms them by a wide margin.
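To make the abstract's key result concrete, here is a minimal sketch of a feature-based hoax scorer. The features (word count, reference density, wiki-link density), weights, and the logistic form are illustrative assumptions chosen for this sketch; they are not the feature set or fitted model from the paper.

```python
from math import exp

# Hypothetical appearance features for a Wikipedia article (assumption:
# not the paper's actual feature set).
def extract_features(article):
    n_words = max(len(article["text"].split()), 1)
    return {
        "n_words": n_words,
        "ref_density": article["num_refs"] / n_words,
        "link_density": article["num_wiki_links"] / n_words,
    }

# Illustrative hand-set weights: genuine articles tend to be longer and
# more densely referenced/linked than hoaxes (assumed, not fitted).
WEIGHTS = {"n_words": 0.005, "ref_density": 20.0, "link_density": 10.0}
BIAS = -2.0

def hoax_probability(article):
    """Return P(hoax) from a logistic model over the features above.

    A higher weighted score indicates a more genuine-looking article,
    so P(hoax) decreases as the score increases.
    """
    feats = extract_features(article)
    score = BIAS + sum(WEIGHTS[k] * feats[k] for k in WEIGHTS)
    return 1.0 / (1.0 + exp(score))
```

A short, unreferenced stub scores a high hoax probability, while a long, well-referenced article scores low; a real classifier would learn such weights from labeled hoax and non-hoax articles.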
268 Citations
Vandals and Hoaxes on the Web
- Computer Science, CyberSafety@CIKM
- 2016
This talk describes algorithms for identifying two kinds of undesirable actors and acts on Wikipedia, vandals and hoaxes, and presents VEWS (Vandal Early Warning System), the state-of-the-art system for detecting vandals on Wikipedia.
Citation Needed: A Taxonomy and Algorithmic Assessment of Wikipedia's Verifiability
- Computer Science, WWW
- 2019
This paper provides an empirical characterization of the reasons why and how Wikipedia cites external sources to comply with its own verifiability guidelines, and designs algorithmic models to determine whether a statement requires a citation and to predict the citation reason.
False Information on Web and Social Media: A Survey
- Computer Science, ArXiv
- 2018
A comprehensive survey spanning diverse aspects of false information is presented, namely the actors involved in spreading false information, the rationale behind successfully deceiving readers, quantifying the impact of false information, and algorithms developed to detect false information.
Quantifying Engagement with Citations on Wikipedia
- Computer Science, WWW
- 2020
This work built client-side instrumentation for logging all interactions with links leading from English Wikipedia articles to cited references during one month, and conducted the first analysis of readers’ interactions with citations, finding that overall engagement with citations is low and that references are consulted more commonly when Wikipedia itself does not contain the information sought by the user.
A Framework for Hoax News Detection and Analyzer used Rule-based Methods
- Computer Science, International Journal of Advanced Computer Science and Applications
- 2019
This study proposes automatic hoax-news detection, automatic multilanguage detection, and a collection of datasets grouped into four categories of hoax news, measured with text-similarity techniques; the work can be extended to hate speech, black campaigns, and blockchain techniques to ward off hoaxes.
Detecting Undisclosed Paid Editing in Wikipedia
- Computer Science, WWW
- 2020
This paper proposes a machine learning-based framework for identifying undisclosed paid articles in Wikipedia, using a set of features based on both the content of the articles and the edit-history patterns of the users who create them.
Detecting pages to protect in Wikipedia across multiple languages
- Computer Science, Social Network Analysis and Mining
- 2019
The problem of deciding whether a page in a collaborative environment such as Wikipedia should be protected is treated as a binary classification task, and a novel set of features based on users' page-revision behavior and page categories is proposed for deciding which pages to protect.
Identifying Disinformation Websites Using Infrastructure Features
- Computer Science, FOCI @ USENIX Security Symposium
- 2020
The hypothesis is that while disinformation websites may be perceptually similar to authentic news websites, there may also be significant non-perceptual differences in the domain registrations, TLS/SSL certificates, and web hosting configurations.
Supporting Early and Scalable Discovery of Disinformation Websites
- Computer Science, ArXiv
- 2020
It is shown that automated identification using similar features can effectively support human judgments for early and scalable discovery of disinformation websites, and the resulting system significantly exceeds the state of the art in detecting disinformation websites.
References
SHOWING 1-10 OF 53 REFERENCES
Rumor has it: Identifying Misinformation in Microblogs
- Computer Science, EMNLP
- 2011
This paper addresses the problem of rumor detection in microblogs and explores the effectiveness of three categories of features for correctly identifying rumors: content-based, network-based, and microblog-specific memes; the authors believe theirs is the first large-scale dataset on rumor detection.
Information credibility on twitter
- Computer Science, WWW
- 2011
There are measurable differences in the way messages propagate, that can be used to classify them automatically as credible or not credible, with precision and recall in the range of 70% to 80%.
On measuring the quality of Wikipedia articles
- Computer Science, WICOW '10
- 2010
The experiment shows that using special-purpose models for information quality captures user sentiment about Wikipedia articles better than using a single model for both categories of articles.
The spreading of misinformation online
- Computer Science, Proceedings of the National Academy of Sciences
- 2016
A massive quantitative analysis of Facebook shows that information related to distinct narratives (conspiracy theories and scientific news) generates homogeneous and polarized communities having similar information consumption patterns, and derives a data-driven percolation model of rumor spreading that demonstrates that homogeneity and polarization are the main determinants for predicting cascades' size.
Automatic detection of rumor on Sina Weibo
- Computer Science, MDS '12
- 2012
This first study of rumor analysis and detection on Sina Weibo, China's leading micro-blogging service, examines an extensive set of features that can be extracted from microblogs and trains a classifier to automatically detect rumors in a mixed set of true and false information.
How do users evaluate the credibility of Web sites?: a study with over 2,500 participants
- Computer Science, DUX '03
- 2003
Comments on the top 18 areas that people noticed when evaluating Web site credibility are shared, and reasons for the prominence of design look are discussed.
Tweeting is believing?: understanding microblog credibility perceptions
- Computer Science, CSCW
- 2012
It is shown that users are poor judges of truthfulness based on content alone, and instead are influenced by heuristics such as user name when making credibility assessments.
Containment of misinformation spread in online social networks
- Computer Science, WebSci '12
- 2012
Empirical results indicate that the β1T Node Protectors methods are among the best at identifying those important nodes, in comparison with other available methods for limiting viral propagation of misinformation in OSNs.
Fact-checking Effect on Viral Hoaxes: A Model of Misinformation Spread in Social Networks
- Computer Science, WWW
- 2015
The approach allows quantitatively gauging the minimal reaction necessary to eradicate a hoax, and the model is characterized by four parameters: spreading rate, gullibility, the probability of verifying a hoax, and the probability of forgetting one's current belief.
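The four-parameter model above can be illustrated with a minimal mean-field simulation. This is a simplified sketch, assuming a well-mixed population split into susceptible, believer, and fact-checker fractions; the update rules and parameter values are this sketch's assumptions, not the paper's exact equations.

```python
def simulate(beta, alpha, p_verify, p_forget, steps=300, b0=0.1):
    """Evolve (susceptible, believer, fact-checker) fractions.

    beta: spreading rate, alpha: gullibility, p_verify: probability a
    believer verifies the hoax, p_forget: probability of forgetting
    one's current belief (simplified rendition of the four parameters).
    """
    s, b, fc = 1.0 - b0, b0, 0.0
    for _ in range(steps):
        # Exposure pressure from believers (boosted by gullibility)
        # and from fact-checkers (reduced by gullibility).
        f = beta * b * (1.0 + alpha)
        g = beta * fc * (1.0 - alpha)
        norm = 1.0 + f + g
        p_believe, p_check = f / norm, g / norm
        s, b, fc = (
            s * (1.0 - p_believe - p_check) + (b + fc) * p_forget,
            b * (1.0 - p_forget) * (1.0 - p_verify) + s * p_believe,
            fc * (1.0 - p_forget)
            + b * (1.0 - p_forget) * p_verify
            + s * p_check,
        )
    return s, b, fc
```

Running this with a higher `p_verify` drives the believer fraction down, mirroring the paper's point that a sufficient fact-checking reaction can eradicate a hoax.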