Size matters: word count as a measure of quality on wikipedia

Joshua Evan Blumenstock
Wikipedia, "the free encyclopedia", now contains over two million English articles, and is widely regarded as a high-quality, authoritative encyclopedia. Some Wikipedia articles, however, are of questionable quality, and it is not always apparent to the visitor which articles are good and which are bad. We propose a simple metric -- word count -- for measuring article quality. In spite of its striking simplicity, we show that this metric significantly outperforms the more complex methods… 
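As a rough illustration of the idea in the abstract, a word-count metric can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the author's implementation: the function names, the whitespace tokenization, and the cutoff value are all hypothetical and chosen only for the example.

```python
# Hypothetical cutoff for illustration only; the abstract above does not state one.
WORD_THRESHOLD = 2000


def word_count(text: str) -> int:
    """Count whitespace-delimited tokens in the article text."""
    return len(text.split())


def looks_high_quality(text: str, threshold: int = WORD_THRESHOLD) -> bool:
    """Flag an article as likely high quality when it meets the word cutoff."""
    return word_count(text) >= threshold


if __name__ == "__main__":
    stub = "A short stub article."
    long_article = "word " * 2500
    print(word_count(stub), looks_high_quality(stub))
    print(word_count(long_article), looks_high_quality(long_article))
```

In practice the cutoff would be tuned on labeled articles (for example, featured versus non-featured), and real article text would need its markup stripped before counting.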

Automatically Assessing the Quality of Wikipedia Articles
This work describes how a very simple metric – word count – can be used as a proxy for article quality, and discusses the implications of this result for Wikipedia in particular and for quality assessment in general.
An Empirical Study to Predict the Quality of Wikipedia Articles
To identify article quality, a few metrics such as article length, number of edits, article age and article ranking are considered, and a significant correlation with article ratings is observed.
Determining Quality of Articles in Polish Wikipedia Based on Linguistic Features
Over 100 linguistic features are considered to determine the quality of Wikipedia articles in the Polish language, and the importance of linguistic features for quality prediction is discussed.
Relative Quality Assessment of Wikipedia Articles in Different Languages Using Synthetic Measure
This paper proposes to use a synthetic measure for automatic quality evaluation of the articles in different languages based on important features in Wikipedia to help decide which language version is more complete and correct.
A Psycho-Lexical Approach to the Assessment of Information Quality on Wikipedia
  • Qi Su, Pengyuan Liu
  • Computer Science
    2015 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT)
  • 2015
This paper describes how to assess the quality of Wikipedia articles automatically using a psycho-lexical resource, the Linguistic Inquiry and Word Count (LIWC) dictionary, by training a classifier on different LIWC categories.
Relative Quality and Popularity Evaluation of Multilingual Wikipedia Articles
The proposed method allows us to find articles with better quality that can be used to automatically enrich other language editions of Wikipedia, and the correlation between quality and popularity of Wikipedia articles of selected topics in various languages was investigated.
Identifying featured articles in wikipedia: writing style matters
A machine learning approach is presented that exploits an article's character trigram distribution and targets writing style rather than meta features such as the edit history; the approach is robust, straightforward to implement, and outperforms existing solutions.
Measuring the Quality of Edits to Wikipedia
This paper uses human raters recruited through Amazon Mechanical Turk to validate an automated quality metric that reflects an intuitive concept of "quality" while remaining scalable and efficient to run.
Measures for Quality Assessment of Articles and Infoboxes in Multilingual Wikipedia
This study presents and classifies measures that can be extracted from Wikipedia articles for the purpose of automatic quality assessment in different languages, and also describes extraction methods for the various sources of measures that can be used in quality assessment.
An investigation of the relationship between the amount of extra-textual data and the quality of Wikipedia articles
This paper investigates the correlation between article ratings and the count of media items in Wikipedia through a series of experiments and shows that the two are correlated.


Assessing Information Quality of a Community-Based Encyclopedia
This work proposes seven IQ metrics that can be evaluated automatically and tests the set on a representative sample of Wikipedia content, along with a number of statistical characterizations of Wikipedia articles, their content construction, process metadata and social context.
Analysis of the discussion pages and other process-oriented pages within the Wikipedia project helps in understanding how high quality is maintained in a project where anyone may participate with no prior vetting.
Puppy smoothies: Improving the reliability of open, collaborative wikis
This paper provides a practical proposal for improving user confidence in wiki information by coloring the text of a wiki article based on the venerability of the text on the philosophy that bad information is less likely to survive a collaborative editing process over large numbers of edits.
Wikipedia as Participatory Journalism: Reliable Sources? Metrics for evaluating collaborative media as a news resource
  • A. Lih
  • Computer Science, Political Science
  • 2004
This study examines the growth of Wikipedia, analyzes the crucial technologies and community policies that have enabled the project to prosper, establishes a set of metrics based on established encyclopedia taxonomies, and analyzes the trends in Wikipedia being used as a source.
A content-driven reputation system for the wikipedia
The results show that the notion of reputation has good predictive value: changes performed by low-reputation authors have a significantly larger than average probability of having poor quality, as judged by human observers, and of being later undone, as measured by the algorithms.
Computing trust from revision history
This paper explores ways of utilizing the revision history of an article to assess its trustworthiness, uses this revision history-based trust model to assess a chain of successive versions of articles in Wikipedia, and evaluates the assessments produced by the model.
  • WWW
  • 2008