Manipulation among the Arbiters of Collective Intelligence

Sanmay Das, Allen Lavoie, and Malik Magdon-Ismail. ACM Transactions on the Web (TWEB), pp. 1-25.

Our reliance on networked, collectively built information is a vulnerability when the quality or reliability of this information is poor. Wikipedia, one such collectively built information source, is often our first stop for information on all kinds of topics; its quality has stood up to many tests, and it prides itself on having a “neutral point of view.” Enforcement of neutrality is in the hands of comparatively few, powerful administrators. In this article, we document that a surprisingly… 

Controversy Detection in Wikipedia Using Collective Classification

This work proposes a stacked model which exploits the dependencies among related pages of controversial topics to improve classification of controversial web pages when compared to a model that examines each page in isolation, demonstrating that controversial topics exhibit homophily.
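To make the stacking idea concrete, here is a minimal sketch, assuming each page already has a first-stage controversy score from an isolated classifier. The second stage blends a page's own score with the mean score of its related pages. The function name `stacked_scores` and the fixed weights `w_self`/`w_nbr` are hypothetical placeholders. A real stacked model would train the second-stage classifier on such neighbor features, and this is not the paper's implementation.

```python
from statistics import mean

def stacked_scores(base, neighbors, w_self=0.6, w_nbr=0.4):
    """Second-stage score: blend each page's first-stage score with the mean
    first-stage score of its related pages. Weights are hypothetical; a real
    stacked model would learn this combination from training data."""
    out = {}
    for page, score in base.items():
        nbr_vals = [base[n] for n in neighbors.get(page, []) if n in base]
        # A page with no related pages falls back to its own score.
        nbr_avg = mean(nbr_vals) if nbr_vals else score
        out[page] = w_self * score + w_nbr * nbr_avg
    return out

base = {"A": 0.9, "B": 0.4, "C": 0.2}
neighbors = {"A": ["B"], "B": ["A"], "C": []}
print(stacked_scores(base, neighbors))
```

Under homophily, page B's modest isolated score is pulled upward because its related page A looks strongly controversial.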

Detecting pages to protect in Wikipedia across multiple languages

The problem of deciding whether a page should be protected in a collaborative environment such as Wikipedia is treated as a binary classification task, and a novel set of features for deciding which pages to protect, based on users' page revision behavior and page categories, is proposed.

Detecting Biased Statements in Wikipedia

A supervised classification approach is proposed that relies on an automatically created lexicon of bias words and on other syntactic and semantic characteristics of biased statements; it detects biased statements with an accuracy of 74%.

Social Motivation and Point of View (Doctoral Consortium)

This work investigates scalable algorithms for finding user behavior changes, predicting the effect of feedback on where users will make contributions, and evaluating the topics and points of view presented in peer-produced content.

The Congressional Classification Challenge: Domain Specificity and Partisan Intensity

Surprisingly, cross-domain learning performance, which benchmarks the ability to generalize from one of these datasets to another, is found to be generally poor, even though the algorithms perform very well in within-dataset cross-validation tests.

Mind Your POV

It is shown that after an article is tagged for NPOV, there is a significant decrease in biased language in the article, as measured by several lexicons, which suggests that NPOV tagging and discussion does improve content, but has less success enculturating editors to the site's linguistic norms.

Improving Linguistic Bias Detection in Wikipedia using Cross-Domain Adaptive Pre-Training

The effectiveness of bias detection via cross-domain pre-training of deep transformer models is studied. The cross-domain bias classifier with a continually pre-trained RoBERTa model achieves a precision of 89% and an F1 score of 87%, and can detect subtle forms of bias with higher accuracy than existing methods.

Concealing Communities Within the Crowd

This study investigates organizational identity and member identification in a hidden organization operating within a crowd-based collective. Specifically, it draws from Scott’s hidden organization…

Telling Apart Tweets Associated with Controversial versus Non-Controversial Topics

Features specific to Twitter, or to social media in general, are shown to be more prevalent in tweets on controversial topics than in tweets on non-controversial ones; these findings will inform future investigations into the relationship between language use on social media and the perceived controversiality of topics.

Probabilistic Approaches to Controversy Detection

A probabilistic framework for detecting controversy on the web is introduced, together with a language modeling approach to the problem, based on insights from social science research.



Manipulation among the arbiters of collective intelligence: how Wikipedia administrators mold public opinion

Neither prior history nor vote counts during an administrator's election can identify those editors most likely to change their behavior in this suspicious manner, and an alternative measure, which gives more weight to influential voters, can successfully reject these suspicious candidates.
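The contrast between a raw vote count and a measure that weights influential voters can be sketched as follows. This is a minimal illustration, assuming each voter has a precomputed influence score; the influence values, the helper names, and the idea of comparing both fractions against one threshold are hypothetical and are not the measure defined in the paper.

```python
def raw_support(votes):
    """Fraction of support votes, counting every voter equally."""
    return sum(1 for v in votes.values() if v) / len(votes)

def weighted_support(votes, influence):
    """Fraction of support, weighting each voter by a (hypothetical) influence score."""
    total = sum(influence[u] for u in votes)
    return sum(influence[u] for u, v in votes.items() if v) / total

votes = {"u1": True, "u2": True, "u3": True, "u4": False}
influence = {"u1": 1.0, "u2": 1.0, "u3": 1.0, "u4": 5.0}
print(raw_support(votes))                   # 0.75
print(weighted_support(votes, influence))   # 0.375
```

A candidate who clears a 70% bar on raw support would fail it once the single influential opposing vote is weighted up, which is the kind of rejection the summary describes.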

“Googlearchy”: How a Few Heavily-Linked Sites Dominate Politics on the Web

This paper proposes a new methodology for measuring the link structure surrounding political Web sites, based on a large literature in computer science that ties a site’s visibility to the number of inbound hyperlinks it receives.

Who moderates the moderators?: crowdsourcing abuse detection in user-generated content

This paper introduces a framework to address the problem of moderating online content using crowdsourced ratings, and presents efficient algorithms to accurately detect abuse that only require knowledge about the identity of a single 'good' agent, who rates contributions accurately more than half the time.

Mopping up: modeling Wikipedia promotion decisions

This paper presents a model of the behavior of candidates for promotion to administrator status in Wikipedia. It uses a policy capture framework to highlight similarities and differences in the…

Collective wisdom: information growth in wikis and blogs

This model is able to reproduce many features of the edit dynamics observed on Wikipedia and on blogs collected from LiveJournal; in particular, it captures the observed rise in the edit rate, followed by 1/t decay.
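The qualitative shape described here, a rising edit rate followed by 1/t decay, can be illustrated with a simple curve. The function below is a hypothetical example chosen only because it has that shape: r(t) = a·t / (t² + t0²) rises for t < t0, peaks at t = t0 with value a/(2·t0), and behaves like a/t for large t. It is not the model from the paper, and `a` and `t0` are arbitrary illustrative parameters.

```python
def edit_rate(t, a=100.0, t0=5.0):
    """Illustrative edit-rate curve: rises for t < t0, peaks at t = t0,
    then decays approximately as a / t for t much larger than t0."""
    return a * t / (t * t + t0 * t0)

for t in (1, 5, 10, 50, 100, 1000):
    # t * edit_rate(t) approaches a as t grows, which is the 1/t signature.
    print(t, round(edit_rate(t), 3), round(t * edit_rate(t), 2))
```

Plotting t · r(t) and watching it level off at a constant is a quick way to check for 1/t decay in observed edit rates.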

Finding social roles in Wikipedia

The number of new editors playing helpful roles in a single month's cohort nearly equals the number found in the dedicated sample, suggesting that informal socialization has the potential to provide sufficient role-related labor despite growth and change in Wikipedia.

Mining latent relations in peer-production environments: a case study with Wikipedia article similarity and controversy

A new similarity measure, called expert-based similarity, is proposed to discover semantic relations among Wikipedia articles from the co-editorship perspective and to discern the influence and impact of several factors hypothesized to generate controversies in Wikipedia articles.

Assessing the value of cooperation in Wikipedia

It is shown that the accretion of edits to an article is described by a simple stochastic mechanism, resulting in a heavy tail of highly visible articles with a large number of edits, which validates Wikipedia as a successful collaborative effort.
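One simple stochastic mechanism that yields a heavy tail is multiplicative growth, where each period an article's edit count is multiplied by an independent random factor, so log-counts perform a random walk and counts end up roughly lognormal. The simulation below is a minimal sketch under that assumption; the parameters (`n_articles`, `steps`, `mu`, `sigma`) are hypothetical and not fitted to Wikipedia data, and the code is not presented as the paper's exact model.

```python
import random
import statistics

def simulate_edit_counts(n_articles=2000, steps=50, mu=0.05, sigma=0.2, seed=0):
    """Multiplicative growth: at each step, every article's edit count is
    multiplied by an independent factor drawn from lognormal(mu, sigma)."""
    rng = random.Random(seed)
    counts = [1.0] * n_articles
    for _ in range(steps):
        counts = [c * rng.lognormvariate(mu, sigma) for c in counts]
    return counts

counts = simulate_edit_counts()
median = statistics.median(counts)
print(f"median={median:.1f} mean={statistics.mean(counts):.1f} max={max(counts):.1f}")
```

The mean sits well above the median and the largest articles exceed the typical one by orders of magnitude, which is the "heavy tail of highly visible articles" pattern.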

Automatic Vandalism Detection in Wikipedia: Towards a Machine Learning Approach

This study investigates the possibility of using machine learning techniques to build an autonomous system capable of distinguishing vandalism from legitimate edits, and finds that elementary features are not sufficient to build such a system.

On ranking controversies in Wikipedia: models and evaluation

Three models are proposed to identify controversial articles in Wikipedia, namely the Basic model and two Controversy Rank (CR) models, which draw clues from collaboration and edit history instead of interpreting the actual articles or edited content.